Report all data, even if it fails tests?

Andy.Clifton · May 20, 2013, 7:48pm

Should we calculate all data and provide flags, even if we know results are likely to be rubbish?

yes, but provide flags that show quality
no, where there is less than 95% data just give NaN or similar

0 voters

A recent thread (Possible 95% Threshold Discrepancy) raises an interesting question.

Even though I know the data might be wrong, should all calculations be done anyway?

Let me explain using an example: the sonic anemometers measure at 20 Hz. Sometimes those sonics can’t acquire 12,000 valid samples per 10-minute interval. To mitigate this, I have a quality threshold that defines the minimum share of samples required for me to “rotate” the sonics and calculate things like fluxes (for example, w’T’). Currently, if the fraction of valid data falls below 0.95, I don’t bother to calculate anything requiring rotation of the sonics, as the results are likely junk.

Assuming the code works, I would instead flag these data and leave it up to the user to apply their own filters to decide what is “good” or “bad” data.
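
As a rough illustration of the two poll options, here is a minimal Python sketch. It uses a simple mean as a stand-in for the rotated statistics (the real processing code is not shown in this thread), and the names `valid_fraction`, `summarize_interval`, and `report_all` are hypothetical. With `report_all=True` the value is always computed and a pass/fail flag is returned alongside it; with `report_all=False` anything below the threshold becomes NaN.

```python
import math

SAMPLE_RATE_HZ = 20                              # sonic sampling rate
INTERVAL_S = 600                                 # 10-minute interval
EXPECTED_SAMPLES = SAMPLE_RATE_HZ * INTERVAL_S   # 12,000 samples


def _is_valid(x):
    """A sample counts as valid if it is present and not NaN."""
    return x is not None and not math.isnan(x)


def valid_fraction(samples):
    """Fraction of the expected 12,000 samples that are valid."""
    return sum(1 for x in samples if _is_valid(x)) / EXPECTED_SAMPLES


def summarize_interval(samples, min_fraction=0.95, report_all=True):
    """Return a stand-in statistic (the mean) plus quality information.

    report_all=True  -> always compute the value, let the user filter on 'passed'
    report_all=False -> return NaN when the interval is below the threshold
    """
    frac = valid_fraction(samples)
    passed = frac >= min_fraction
    valid = [x for x in samples if _is_valid(x)]
    if valid and (passed or report_all):
        value = sum(valid) / len(valid)
    else:
        value = math.nan
    return {"mean": value, "valid_fraction": frac, "passed": passed}
```

A user who prefers the current behaviour can recover it from the flagged output by discarding every interval where `passed` is false.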

Everett.Perry · May 27, 2013, 5:04pm

I am using the rotated data for my work. I am also using the 20Hz data extensively. For my purposes it is better to have data that does not meet the 95% rotation threshold listed as NaN or similar (which is how I voted in the poll).

However, I could scan through “quality” flags and determine the 95% threshold for myself. In other words, I can use the data either way with no loss of continuity. I just wanted to mention that in case it is a close poll.

Jennifer.Rinker · July 12, 2013, 5:38pm

I voted “no” simply because I’m worried that the change would be implemented without me noticing, and then I would be analyzing data that weren’t very good. As Everett mentioned, if I knew about the change I could simply check the quality flag and ditch the bad data myself. Another solution would be to place the “bad” data in a separate directory from the “good” data, but I’m not sure that would be easy to implement with the current data structures.

Andy.Clifton · August 15, 2013, 4:47pm

Jenni, now that you’ve been using the data for a while and (hopefully) understand better how it works and what is in there, do you still think that you might not notice new data, or not use the QC flags to check if the data are valid? I’ve held off making this change, but I’m thinking it might be helpful still.

Jennifer.Rinker · November 1, 2013, 3:48pm

Yep, I would be fine now with providing “bad” data but flagging it with a QC. My only request would be to keep some updated document of the meanings of the QC codes for each instrument, since they might be a bit fluid over time. Additionally, if it would be easy to put the QC codes in the 20 Hz structures in addition to the 10-minute structures, that would be a huge benefit to me.

Also, apologies for the huge delay in response. I assumed that if you replied to a thread you were auto-subscribed.

Andy.Clifton · November 18, 2013, 9:39pm

Believe it or not, the QC codes are constant over time. The basic QC code hasn’t changed for over a year, mainly because I figured that people would not want to deal with codes changing. The basic concept is set out in the unofficial guide (p. 15), and repeated here for interest. N.B. I’ve updated this file today to add the reason for code 1002.

– QC codes indicating that data are ‘flagged’ (possibly bad) are in the range 1000 to 4999. Reasons for flagging channels include:

1001 irregular timing. The period between measurements should be 0.05 seconds at a data acquisition rate of 20 Hz. If more than 1% of data are more than 5% from the ideal period, this QC code is set.

1002 insufficient data in the wind speed time series

1003, 1004 If the number of points within the manufacturer’s limits or users’ limits is below a threshold set in the configuration file. These threshold values are the range rate (QC code 1003) and the accept rate (QC code 1004).

1006 if the standard deviation drops below 0.01% of the mean and so a channel is assumed to have a constant value during the measurement interval.

20nn if a channel is flagged because it is linked with another channel that has been flagged, where nn is the number of the channel that was flagged.

– QC codes indicating that channels or data have failed are greater than 5000. Reasons for marking channels as failed include:

5001 if a channel is empty.

5002 if all data in a channel have known ‘bad’ values, e.g. -999.

5003 if all data in a channel are not-a-number (NaN).

5004 if the boom speed exceeds 0.1 m/s at any time during the 10 minute interval.

5005 if the channel is affected by a known outage.

60nn if a channel fails because it is linked with another channel that has failed, where nn is the number of the channel that failed.
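
To illustrate how a user might act on these codes, here is a minimal Python sketch based only on the ranges listed above. The function names are hypothetical, labelling codes outside the listed ranges as "unlisted" is an assumption, and the timing check is one reading of the 1001 rule rather than the actual processing code.

```python
def classify_qc(code):
    """Return (status, linked_channel) for a QC code.

    status is 'flagged' (1000 to 4999), 'failed' (greater than 5000),
    or 'unlisted' for anything the list above does not cover (assumption).
    linked_channel is nn for 20nn / 60nn codes, otherwise None.
    """
    if 2000 <= code <= 2099:
        return ("flagged", code - 2000)
    if 6000 <= code <= 6099:
        return ("failed", code - 6000)
    if 1000 <= code <= 4999:
        return ("flagged", None)
    if code > 5000:
        return ("failed", None)
    return ("unlisted", None)


def irregular_timing(timestamps, rate_hz=20.0, max_dev=0.05, max_share=0.01):
    """One reading of the 1001 rule.

    True if more than max_share (1%) of the sample-to-sample periods
    differ from the ideal period (0.05 s at 20 Hz) by more than max_dev (5%).
    timestamps are sample times in seconds.
    """
    ideal = 1.0 / rate_hz
    periods = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not periods:
        return False
    n_off = sum(1 for p in periods if abs(p - ideal) > max_dev * ideal)
    return n_off / len(periods) > max_share
```

For example, `classify_qc(2007)` reports a channel flagged because of channel 7, and a record with one sample in fifty missing would trip `irregular_timing`.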

I will look into adding some of this to the 20-Hz data as well. Did you dig around in the structures to see what was already there?

Jennifer.Rinker · November 19, 2013, 5:11pm

I’ve looked at the raw and cleaned/rotated sonic structures and in the tower structure in the 20 Hz .mat files from M4. The sonic structures have four fields (val, label, units, height) and the tower structure has several fields that are related to the sonics, but neither of them have any “flag” field or anything that I could recognize as QC codes. It’s possible I’m looking in the wrong place, though.

Andy.Clifton · November 19, 2013, 5:21pm

It seems I didn’t add information on the QC state of each piece of 20-Hz data. I’ll look into it, but it will double the file size and will require that I rewrite almost all of the codes. Not a high priority, I’m afraid.
