M4 20Hz Cleaned or Rotated Data NOT always 12,000 Records

Hello,

I recently started using the M4 20Hz MAT files and noticed an inconsistency in the record lengths for some of the Sonic variables. The NWTC website indicates that cleaned or rotated data will always be 12,000 records long (as of software version 1.21). This does not appear to be the case. The following details are from a single M4 20Hz MAT file (version 1.23). Notice that some of the cleaned or rotated variables contain 12,000 records while other cleaned or rotated variables do not.

Notice that many of the 30m variables are fine (contain 12,000 records). Also notice that the variable “Sonic_temp_30” should probably be “Sonic_Temp_30” (capital T). I only mention this typo because the 30m variables seem to behave differently in this file (it may help with troubleshooting).

I know the formatting below is terrible! (couldn’t seem to make it much better, sorry)
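For anyone who wants to reproduce the table below, here is a rough Python sketch of how the lengths can be tabulated (illustrative only, not the script I actually used; it assumes scipy is available for the loading step):

```python
# Sketch: tabulate record lengths of the variables in an M4 20 Hz MAT file.
# The dict-based helper is the part that matters; loading is kept separate.
def variable_lengths(mat_dict):
    """Return {name: record_count} for array-valued entries in a MAT dict."""
    lengths = {}
    for name, value in mat_dict.items():
        if name.startswith('__'):        # skip MAT-file metadata keys
            continue
        if hasattr(value, '__len__'):
            lengths[name] = len(value)
    return lengths

# Loading would look something like this (hypothetical local path):
# from scipy.io import loadmat
# lengths = variable_lengths(loadmat('10122_01_10_00_020.mat'))
```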

Current File Name: 10122_01_10_00_020.mat

tower.processing.code

   date: [2013 1 29 17 0 0]
version: 1.2300

Var_Name Var_Length
Sonic_Temp_100 11999
Sonic_Temp_131 11999
Sonic_Temp_15 11999
Sonic_Temp_50 11999
Sonic_Temp_76 11999
Sonic_Temp_clean_100m 11999
Sonic_Temp_clean_131m 11999
Sonic_Temp_clean_15m 11999
Sonic_Temp_clean_30m 12000
Sonic_Temp_clean_50m 11999
Sonic_Temp_clean_76m 11999
Sonic_Temp_rotated_100m 11999
Sonic_Temp_rotated_131m 11999
Sonic_Temp_rotated_15m 11999
Sonic_Temp_rotated_30m 12000
Sonic_Temp_rotated_50m 11999
Sonic_Temp_rotated_76m 11999
Sonic_cleaned_timestamp 11999
Sonic_rotated_timestamp 12000
Sonic_temp_30 11999
Sonic_u_100m 11999
Sonic_u_131m 11999
Sonic_u_15m 11999
Sonic_u_30m 12000
Sonic_u_50m 11999
Sonic_u_76m 11999
Sonic_v_100m 11999
Sonic_v_131m 11999
Sonic_v_15m 11999
Sonic_v_30m 12000
Sonic_v_50m 11999
Sonic_v_76m 11999
Sonic_w_100m 11999
Sonic_w_131m 11999
Sonic_w_15m 11999
Sonic_w_30m 12000
Sonic_w_50m 11999
Sonic_w_76m 11999
Sonic_x_100 11999
Sonic_x_131 11999
Sonic_x_15 11999
Sonic_x_30 11999
Sonic_x_50 11999
Sonic_x_76 11999
Sonic_x_clean_100m 11999
Sonic_x_clean_131m 11999
Sonic_x_clean_15m 11999
Sonic_x_clean_30m 12000
Sonic_x_clean_50m 11999
Sonic_x_clean_76m 11999
Sonic_y_100 11999
Sonic_y_131 11999
Sonic_y_15 11999
Sonic_y_30 11999
Sonic_y_50 11999
Sonic_y_76 11999
Sonic_y_clean_100m 11999
Sonic_y_clean_131m 11999
Sonic_y_clean_15m 11999
Sonic_y_clean_30m 12000
Sonic_y_clean_50m 11999
Sonic_y_clean_76m 11999
Sonic_z_100 11999
Sonic_z_131 11999
Sonic_z_15 11999
Sonic_z_30 11999
Sonic_z_50 11999
Sonic_z_76 11999
Sonic_z_clean_100m 11999
Sonic_z_clean_131m 11999
Sonic_z_clean_15m 11999
Sonic_z_clean_30m 12000
Sonic_z_clean_50m 11999
Sonic_z_clean_76m 11999
time_UTC 11999

Another example file from the M4 20Hz data: 10122_16_20_00_020.mat.
This file seems to be OK at 131 m and 30 m (12,000 records); the other heights contain 11,997 records.

Regards,
Everett

I did some more work here. I looked at all variable lengths for the files in the following directory:
path: 'S:\Projects\MetData\M4Twr\2012\10\12'

The attached file is a tab delimited file that shows the variable lengths for these 144 files. Open with Excel or Notepad++ (no carriage returns so regular Notepad is ugly).

Everett
M4_20Hz_Variable_Length.txt (108 KB)

Hi Everett,

Thanks again for posting your comments to the forums, rather than sending me emails. This way everyone gets to learn about any issues with the data.

I took a look at the file you sent through. What appears to be happening is that for a given file, the raw data might only be 11,998 records long because we missed a sample at some point. Maintaining 20 Hz can be a real challenge sometimes. Then, when I go through the data processing routines I remap the sonic data to a continuous 20-Hz time series (note that where there are readings I don’t resample, I only shift things by a hundredth of a second). That means I have a true 20-Hz clean signal. I can do this because I know that the data system was actually triggered at 20 Hz, but sometimes we have hiccoughs in getting the time stamp.

If a file ends short, I extend it using the mean of the time series.

So my original data might look like this:
elapsed time [s], value
0, 3.4
0.05, 3.6
0.099, 3.75
0.15, 3.8
0.25, 4.0

599.90, 12

- END OF FILE -

And then it gets mapped to the nearest 0.05 second point and interpolated where there is no measurement:

0, 3.4
0.05, 3.6
0.1, 3.75 ← data remapped to 20-Hz time series
0.15, 3.8
0.25, 4.0

599.90, 12
599.95, 8.9 ← or whatever the mean is

- END OF FILE -

This only happens to the ‘clean’ sonic data in the 20-Hz files, so you only see this effect in some columns.
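The remapping can be sketched like this (a Python illustration of the logic described above, not the actual MATLAB processing code; the function name and defaults are my own):

```python
import numpy as np

def remap_to_20hz(t, x, window_s=600.0, freq=20.0):
    """Snap irregularly timed samples onto a uniform 20 Hz grid.

    t: elapsed time [s] of each raw sample; x: the raw readings.
    Real readings are only shifted to the nearest grid point, never
    resampled; interior gaps are interpolated; a file that ends short
    is padded out to the full window with the series mean.
    """
    t, x = np.asarray(t, float), np.asarray(x, float)
    n = int(round(window_s * freq))
    grid = np.arange(n) / freq                   # 0, 0.05, ..., window end
    out = np.full(n, np.nan)
    idx = np.round(t * freq).astype(int)         # nearest 0.05 s slot
    keep = (idx >= 0) & (idx < n)
    out[idx[keep]] = x[keep]                     # shift, don't resample
    missing = np.isnan(out)
    last = idx[keep].max()                       # last real sample's slot
    interior = missing & (np.arange(n) <= last)  # gaps inside the record
    out[interior] = np.interp(grid[interior], grid[~missing], out[~missing])
    tail = np.arange(n) > last                   # file ended short
    out[tail] = out[:last + 1].mean()            # pad with the mean
    return grid, out
```

Running it on the toy series above snaps the 0.099 s sample to 0.10 s, interpolates the gap at 0.20 s, and fills the missing final point with the mean.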

Perfect!

Thanks Andy

Hi Andy

I think there could still be a problem with the timestamps for the M4 20Hz data. There is a mismatch between the number of records for some of the variables and the associated timestamp within the file.

For instance:

In my first post, I showed a long column of data with variable names and the associated number of records for the variables. Near the bottom of the list you will see that “Sonic_z_clean_30m” has 12,000 values while “Sonic_z_clean_50m” only has 11,999 values. Although these are both “cleaned” variables they have a different number of values. Also notice that the “Sonic_cleaned_timestamp” has 11,999 values. This issue will cause indexing errors and variable length errors in Matlab. A similar situation occurs with the rotated data.

For example, the code you posted in your “software version” post will not work for all files:

figure
plot(24*60*60*(time_UTC.val-time_UTC.val(1)),Sonic_z_15.val,'ko') % raw data
hold on
plot(Sonic_cleaned_timestamp.val,Sonic_z_clean_15m.val,'r+') % cleaned data
plot(Sonic_rotated_timestamp.val,Sonic_w_15m.val,'bx') % rotated data

The 2nd plot statement will throw a “vector length error” if I use the file “10122_00_00_00_020.mat”. The “Sonic_cleaned_timestamp” has 11,998 values while the “Sonic_z_clean_15m” has 12,000 values. It seems that the “Sonic_cleaned_timestamp” was not remapped to a 20Hz signal.

To make a long story short, here is my understanding when the data is at least 95% complete:

  1. Cleaned variables should always have 12,000 values and should exactly match the number of values in the “Sonic_cleaned_timestamp” (currently not always true).
  2. Rotated variables should always have 12,000 values and should exactly match the number of values in the “Sonic_rotated_timestamp” (currently not always true).
  3. Raw variables should have the same number of values as the raw timestamp (time_UTC), (currently seems to be true).
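Rules 1 and 2 are easy to express as a quick check (an illustrative Python helper of my own, not your code; rule 3 would be checked the same way against time_UTC):

```python
def check_consistency(lengths, window=12000):
    """Flag variables whose record count breaks rules 1 and 2 above.

    `lengths` maps variable name -> record count.
    Returns a list of human-readable problems (empty means consistent).
    """
    problems = []
    groups = (('_clean', 'Sonic_cleaned_timestamp'),
              ('_rotated', 'Sonic_rotated_timestamp'))
    for tag, ts_name in groups:
        ts_len = lengths.get(ts_name)
        for name, count in lengths.items():
            if tag in name and name != ts_name:
                # must be a full window AND match the group's timestamp
                if count != window or count != ts_len:
                    problems.append(f'{name}: {count} records '
                                    f'(timestamp has {ts_len})')
    return problems
```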

I apologize if I am beating a dead horse here, but after reading your timestamp QC algorithm, it does not seem like the situations I mentioned above should be possible when the number of samples is at least 95% of 12,000.

Best regards,
Everett

I think you found one of the special conditions under which my code breaks down.

As far as I can tell, what is happening there is that the data for one (or more) of the sonics for the 12,000th sample in the time series are NaN. Because there’s no data, there’s no last record created. I think this happens because I consider all sonics independently. This might take some time to fix! I’ll post on here when I’ve figured out the solution.

FIRST EDIT:

  • The elapsed time should always be 12,000 points, because it’s defined right at the start for all sonics together. It’s defined as
dt_full = [0:1/tower.daqfreq:(tower.windowsize-1)/tower.daqfreq]';

where

tower.daqfreq = 20; tower.windowsize = 12000;

  • The actual time series of measurements only includes the samples in the file, which can occasionally be less than that. So there’s possibly a mismatch.
dt_clean = [0:1/tower.daqfreq:max(dt)]';

where dt is the time elapsed since the first timestamp in the data file:

dt = (timestamp-timestamp(1,:))*60*60*24;
  • Interim solution: plot your data as plot(x(1:length(y)),y,'b-'). That should work while I figure out how to recode this. I may need to add some extra variables to the output.
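In other words (a Python sketch of the two MATLAB definitions above; `max_dt` is just a hypothetical value for a file that ends early):

```python
import numpy as np

daqfreq, windowsize = 20, 12000

# dt_full covers the whole 10-minute window regardless of the data:
dt_full = np.arange(windowsize) / daqfreq       # 0, 0.05, ..., 599.95

# dt_clean only runs to the last timestamp actually present, so a file
# whose last sample landed at 599.80 s gives a shorter vector:
max_dt = 599.80                                  # hypothetical last sample
dt_clean = np.arange(int(round(max_dt * daqfreq)) + 1) / daqfreq

def trim_pair(x, y):
    """Interim workaround: trim both vectors to the common length."""
    m = min(len(x), len(y))
    return x[:m], y[:m]
```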

SECOND EDIT
Still something not quite right here.

To try to fix this I’ve modified the variables slightly. I have written out the clean and rotated time stamps for each sonic, so there’s no assumption of any of them being on a common time stamp. The timestamps are written out as Sonic_dt_clean_zm and Sonic_dt_rotated_zm.

I’ve uploaded some data with this format to [url]http://wind.nrel.gov/MetData/M4Twr/V1.25RC/04293_20_40_00_030.mat[/url].

Try using this code with the new data.

figure
plot(Sonic_dt_clean_50m.val,Sonic_x_clean_50m.val)
hold on
plot(60*60*24*(time_UTC.val-time_UTC.val(1)),Sonic_x_50.val,'r.')
plot(Sonic_dt_rotated_50m.val,Sonic_u_50m.val,'g.')

If this solution works, I’ll go ahead and reprocess the data.

I didn’t have any problems with this file. However, since this file was a full file to start with (12,000 records), it might be a good idea to test the code on a few files that have some missing records. The following file names would be a good test:

  1. 10122_00_00_00_020.mat
  2. 10122_01_10_00_020.mat
  3. 10122_20_50_00_020.mat

I did notice that a few of the variable “labels” for the file “04293_20_40_00_030.mat” may not be correct. Also, quite a few of the variable “units” for the file “04293_20_40_00_030.mat” are not correct. I have posted an Excel file that identifies the issues.
M4_20Hz_Label_or_Units_Problem.xlsx (15.2 KB)

That was a very helpful spreadsheet, thanks! The example files you chose let me isolate the problem, which was that I was using a slightly different time stamp for the sonics than I should have been (too many variables starting with ‘dt…’!). The impact of that should have been minimal, though. At worst the time stamp may have been a few samples off. It’s now correct.

I’ve uploaded a new test file at [url]http://wind.nrel.gov/MetData/M4Twr/V1_26RC2/10122_00_00_00_020.mat[/url]. That file is processed with a new code version (release candidate #2 for version 1.26). The changes include the following:

  • More uniform sonic anemometer variable names (you may need to update some scripts to use the new variable names)
  • Units for the sonic anemometers and for some other variables are now correct

Data processing has not been changed.

That is great news!

I was a little slow about responding last time, but I will run your new test file this evening and get you some feedback no later than first thing tomorrow morning.

I didn’t have any problems working with the new file. It seems like the record length issues are solved. I did notice one more possible “units” issue though for the DeltaT variables. Two are in “K” and one is in deg C.

DeltaT_134_88m =

   val: [11998x1 double]
 label: 'Delta T (134 m)'
 units: 'K'
height: 134

DeltaT_26_3m =

   val: [11998x1 double]
 label: 'Delta T (3 m)'
 units: '°C'
height: 3

DeltaT_88_26m =

   val: [11998x1 double]
 label: 'Delta T (26 m)'
 units: 'K'
height: 26

Having said that, I would be very interested to see how file: “10122_01_10_00_020.mat” works with the version 1.26 fixes. This was the file that started crashing my code and alerted me that something wasn’t quite right. I don’t think it is worth holding up the version 1.26 release to test this one file though. If version 1.26 has no record length issues with file “10122_01_10_00_020.mat”, then I think it would be safe to say that the record length issues are “fixed”.

Thanks again for the feedback. I’ve released 1.26 with the corrections to the units that you identified (see this post). I’ve also tidied up the variables for M5.