I recently started using the M4 20Hz MAT files and noticed an inconsistency with the record lengths for some of the Sonic variables. The NWTC website indicates that cleaned or rotated data will always be 12,000 records long (as per software version 1.21). This does not appear to be the case. The following details are from a single M4 20Hz MAT file VER 1.23. Notice that some of the cleaned or rotated variables contain 12,000 records while other cleaned or rotated variables do not.
Notice that many of the 30m variables are fine (contain 12,000 records). Also notice that the variable “Sonic_temp_30” should probably be “Sonic_Temp_30” (capital T). I only mention this typo because the 30m variables seem to behave differently in this file (it may help with troubleshooting).
I know the formatting below is terrible! (couldn’t seem to make it much better, sorry)
Current File Name: 10122_01_10_00_020.mat
date: [2013 1 29 17 0 0]
Another example file from the M4 20Hz data:10122_16_20_00_020.mat.
This file seems to be OK at 131m and 30m (12,000 records), while the other heights contain 11,997 records.
I did some more work here. I looked at all variable lengths for the files in the following directory:
The attached file is a tab delimited file that shows the variable lengths for these 144 files. Open with Excel or Notepad++ (no carriage returns so regular Notepad is ugly).
M4_20Hz_Variable_Length.txt (108 KB)
Thanks again for posting your comments to the forums, rather than sending me emails. This way everyone gets to learn about any issues with the data.
I took a look at the file you sent through. What appears to be happening is that for a given file, the raw data might only be 11,998 records long because we missed a sample at some point. Maintaining 20 Hz can be a real challenge sometimes. Then, when I go through the data processing routines I remap the sonic data to a continuous 20-Hz time series (note that where there are readings I don’t resample, I only shift things by a hundredth of a second). That means I have a true 20-Hz clean signal. I can do this because I know that the data system was actually triggered at 20 Hz, but sometimes we have hiccoughs in getting the time stamp.
If a file ends short, I extend it using the mean of the time series.
So my original data might look like this:
elapsed time [s], value
And then it gets mapped to the nearest 0.05 second point and interpolated where there is no measurement:
0.1, 3.75 ← data remapped to 20-Hz time series
599.95, 8.9 ← or whatever the mean is
This only happens to the ‘clean’ sonic data in the 20-Hz files, so you only see this effect in some columns.
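The actual processing is done in Matlab; as a rough illustration, here is the remap-and-fill idea sketched in Python/NumPy. The function name, the linear interior interpolation, and the argument defaults are my assumptions, not the actual routine:

```python
import numpy as np

def remap_to_20hz(t, x, window_s=600.0, freq_hz=20.0):
    """Snap irregular raw samples onto a continuous 20-Hz grid.

    t : elapsed time [s] of each raw sample (may have jitter or gaps)
    x : the raw readings
    Each reading is shifted to the nearest 0.05-s grid point (no resampling);
    interior gaps are interpolated, and a file that ends short is extended
    with the mean of the series.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    n = int(round(window_s * freq_hz))    # 12,000 points for a 10-minute file
    grid = np.arange(n) / freq_hz         # 0.00, 0.05, ..., 599.95 s
    out = np.full(n, np.nan)
    idx = np.clip(np.round(t * freq_hz).astype(int), 0, n - 1)
    out[idx] = x                          # shift readings to nearest grid point
    obs = ~np.isnan(out)
    last = idx.max()
    out[:last + 1] = np.interp(grid[:last + 1], grid[obs], out[obs])  # interior gaps
    out[last + 1:] = x.mean()             # extend a short file with the mean
    return grid, out
```

With a 600-s window this always returns a full 12,000-point series, even when a few samples were missed.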
I think there could still be a problem with the timestamps for the M4 20Hz data. There is a mismatch between the number of records for some of the variables and the associated timestamp within the file.
In my first post, I showed a long column of data with variable names and the associated number of records for the variables. Near the bottom of the list you will see that “Sonic_z_clean_30m” has 12,000 values while “Sonic_z_clean_50m” only has 11,999 values. Although these are both “cleaned” variables they have a different number of values. Also notice that the “Sonic_cleaned_timestamp” has 11,999 values. This issue will cause indexing errors and variable length errors in Matlab. A similar situation occurs with the rotated data.
For example, the code you posted in your “software version” post will not work for all files:
plot(24*60*60*(time_UTC.val-time_UTC.val(1)),Sonic_z_15.val,'ko') % raw data
plot(Sonic_cleaned_timestamp.val,Sonic_z_clean_15m.val,'r+') % cleaned data
plot(Sonic_rotated_timestamp.val,Sonic_w_15m.val,'bx') % rotated data
The 2nd plot statement will throw a “vector length error” if I use the file “10122_00_00_00_020.mat”. The “Sonic_cleaned_timestamp” has 11,998 values while the “Sonic_z_clean_15m” has 12,000 values. It seems that the “Sonic_cleaned_timestamp” was not remapped to a 20Hz signal.
To make a long story short, here is my understanding when the data is at least 95% complete:
- Cleaned variables should always have 12,000 values and should exactly match the number of values in the “Sonic_cleaned_timestamp” (currently not always true).
- Rotated variables should always have 12,000 values and should exactly match the number of values in the “Sonic_rotated_timestamp” (currently not always true).
- Raw variables should have the same number of values as the raw timestamp (time_UTC), (currently seems to be true).
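A sketch of how those three rules could be checked automatically (in Python; `variables` stands in for the dict of arrays loaded from a .mat file, and the helper name is hypothetical):

```python
def check_record_lengths(variables, expected=12000):
    """Flag cleaned/rotated variables whose lengths break the rules above."""
    problems = []
    for name, values in variables.items():
        if "clean" in name or "rotated" in name:
            stamp = ("Sonic_cleaned_timestamp" if "clean" in name
                     else "Sonic_rotated_timestamp")
            if len(values) != expected:
                problems.append(f"{name}: {len(values)} records, expected {expected}")
            if stamp in variables and len(values) != len(variables[stamp]):
                problems.append(f"{name}: length differs from {stamp}")
    return problems
```

Feeding it the variables from a file like “10122_00_00_00_020.mat” should report both the short timestamp and the mismatch against the cleaned variables.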
I apologize if I am beating a dead horse here, but after reading your timestamp QC algorithm, it does not seem like the situations I mentioned above should be possible when the number of samples is at least 95% of 12,000. Is that right?
I think you found one of the special conditions under which my code breaks down.
As far as I can tell, what is happening there is that the data for one (or more) of the sonics for the 12,000th sample in the time series are NaN. Because there’s no data, there’s no last record created. I think this happens because I consider all sonics independently. This might take some time to fix! I’ll post on here when I’ve figured out the solution.
- The elapsed time should always be 12,000 points (0 to 599.95 s at 20 Hz), because it’s defined right at the start for all sonics together. It’s defined as
dt_full = [0:1/tower.daqfreq:(tower.windowsize-1)/tower.daqfreq]';
tower.daqfreq = 20;
tower.windowsize = 12000;
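In Python terms, that colon expression works out as follows (an illustrative translation, not the processing code itself):

```python
daqfreq, windowsize = 20, 12000
# dt_full = [0:1/tower.daqfreq:(tower.windowsize-1)/tower.daqfreq]' in Matlab
dt_full = [i / daqfreq for i in range(windowsize)]  # 0.00, 0.05, ..., 599.95 s
```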
- The actual time series of measurements only includes the samples in the file, which can occasionally be less than that. So there’s possibly a mismatch.
dt_clean = [0:1/tower.daqfreq:max(dt)]';
where dt is the time elapsed since the first timestamp in the data file:
dt = (timestamp-timestamp(1,:))*60*60*24;
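The two expressions above, sketched in Python for illustration (the rounding is my approximation of Matlab’s colon-operator tolerance, and the helper names are mine):

```python
def elapsed_seconds(timestamps):
    # Matlab datenums are in units of days, hence the *60*60*24
    return [(t - timestamps[0]) * 60 * 60 * 24 for t in timestamps]

def clean_grid(dt, daqfreq=20):
    # dt_clean = [0:1/daqfreq:max(dt)]' -- the grid stops at the last sample
    # actually present, which is how a short file ends up with fewer than
    # 12,000 points
    n = int(round(max(dt) * daqfreq)) + 1
    return [i / daqfreq for i in range(n)]
```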
Interim solution: plot your data as plot(x(1:length(y)),y,'b-'). That should work while I figure out how to recode this. I may need to add some extra variables to the output.
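The same workaround written out as a small helper (a Python sketch; the helper name is mine):

```python
def align_lengths(x, y):
    # Truncate both series to the shorter length so element-wise use
    # (plotting, differencing, etc.) cannot raise a length error
    n = min(len(x), len(y))
    return x[:n], y[:n]
```

In matplotlib terms, that would be plot(*align_lengths(timestamp, values)).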
Still something not quite right here.
To try to fix this I’ve modified the variables slightly. I have written out the clean and rotated time stamps for each sonic, so there’s no assumption that they share a common time stamp. The timestamps are written out as Sonic_dt_clean_zm and Sonic_dt_rotated_zm (where z is the measurement height).
I’ve uploaded some data with this format to http://wind.nrel.gov/MetData/M4Twr/V1.25RC/04293_20_40_00_030.mat.
Try using this code with the new data.
If this solution works, I’ll go ahead and reprocess the data.
I didn’t have any problems with this file. However, since this file was complete to start with (12,000 records), I wonder if it might be a good idea to test the code on a few files that have some missing records? The following file names would be a good test:
I did notice that a few of the variable “labels” for the file “04293_20_40_00_030.mat” may not be correct. Also, quite a few of the variable “units” for the file “04293_20_40_00_030.mat” are not correct. I have posted an Excel file that identifies the issues.
M4_20Hz_Label_or_Units_Problem.xlsx (15.2 KB)
That was a very helpful spreadsheet, thanks! The example files you chose let me isolate the problem, which was that I was using a slightly different time stamp for the sonics than I should have been (too many variables starting with ‘dt…’!). The impact of that should have been minimal, though. At worst the time stamp may have been a few samples off. It’s now correct.
I’ve uploaded a new test file at http://wind.nrel.gov/MetData/M4Twr/V1_26RC2/10122_00_00_00_020.mat. That file is processed with a new code version (release candidate #2 for version 1.26). The changes include the following:
- More uniform sonic anemometer variable names (you may need to update some scripts to use the new variable names)
- Units for the sonic anemometers and for some other variables are now correct
Data processing has not been changed.
That is great news!
I was a little slow about responding last time, but I will run your new test file this evening and get you some feedback no later than first thing tomorrow morning.
I didn’t have any problems working with the new file. It seems like the record length issues are solved. I did notice one more possible “units” issue though for the DeltaT variables. Two are in “K” and one is in deg C.
val: [11998x1 double]
label: 'Delta T (134 m)'
val: [11998x1 double]
label: 'Delta T (3 m)'
val: [11998x1 double]
label: 'Delta T (26 m)'
Having said that, I would be very interested to see how file: “10122_01_10_00_020.mat” works with the version 1.26 fixes. This was the file that started crashing my code and alerted me that something wasn’t quite right. I don’t think it is worth holding up the version 1.26 release to test this one file though. If version 1.26 has no record length issues with file “10122_01_10_00_020.mat”, then I think it would be safe to say that the record length issues are “fixed”.
Thanks again for the feedback. I’ve released 1.26 with the corrections to the units that you identified (see this post). I’ve also tidied up the variables for M5.