Report all data, even if it fails tests?

Should we calculate all data and provide flags, even if we know results are likely to be rubbish?

  • yes, but provide flags that show quality
  • no, where there is less than 95% data just give NaN or similar


A recent thread (“Possible 95% Threshold Discrepancy”) raises an interesting question.

Even though I know the data might be wrong, should all calculations be done anyway?

Let me explain using an example: the sonic anemometers measure at 20 Hz. Sometimes the sonics can’t acquire the full 12,000 valid samples per 10-minute interval. To mitigate this, I can set a quality threshold that defines the minimum fraction of valid samples required for me to “rotate” the sonics and calculate things like fluxes (for example, w’T’). Currently, if the fraction of valid data falls below 0.95, I don’t bother to calculate anything requiring rotation of the sonics, as the results are likely junk.
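To make the arithmetic concrete, here is a minimal sketch of that check in Python. It is not the actual processing code: the function and constant names are hypothetical, and it assumes missing samples show up as NaN or `None`. Only the 20 Hz rate, the 10-minute interval, and the 0.95 threshold come from the description above.

```python
import math

SAMPLE_RATE_HZ = 20                  # sonic sampling rate
INTERVAL_SECONDS = 10 * 60           # one 10-minute averaging interval
EXPECTED_SAMPLES = SAMPLE_RATE_HZ * INTERVAL_SECONDS  # 20 * 600 = 12,000
MIN_VALID_FRACTION = 0.95            # rotation threshold


def valid_fraction(samples):
    """Fraction of the expected 12,000 samples that are present and not NaN."""
    valid = sum(1 for s in samples if s is not None and not math.isnan(s))
    return valid / EXPECTED_SAMPLES


def meets_rotation_threshold(samples, min_fraction=MIN_VALID_FRACTION):
    """True if the interval has enough valid samples to be worth rotating."""
    return valid_fraction(samples) >= min_fraction
```

With these numbers, an interval needs at least 11,400 valid samples to pass; 11,399 or fewer falls below the threshold.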

Assuming the code works, I would instead flag these data and leave it up to the user to apply their own filters to decide what is “good” or “bad” data.

I am using the rotated data for my work. I am also using the 20 Hz data extensively. For my purposes it is better to have data that does not meet the 95% rotation threshold listed as NaN or similar (which is how I voted in the poll).

However, I could scan through “quality” flags and determine the 95% threshold for myself. In other words, I can use the data either way with no loss of continuity. I just wanted to mention that in case it is a close poll.
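As a rough illustration of that user-side filtering, the sketch below assumes each 10-minute value comes with the fraction of valid samples behind it. The function name and the idea of a per-interval "valid fraction" are assumptions for illustration, not the actual fields or QC codes in the data files.

```python
import math


def mask_low_quality(values, valid_fractions, min_fraction=0.95):
    """Replace values whose interval falls below min_fraction with NaN.

    values          -- one computed quantity per 10-minute interval (e.g. a flux)
    valid_fractions -- fraction of valid 20 Hz samples in each interval
                       (hypothetical stand-in for a quality flag)
    """
    return [
        v if f >= min_fraction else math.nan
        for v, f in zip(values, valid_fractions)
    ]


fluxes = [0.12, 0.30, 0.08]
fractions = [0.99, 0.90, 0.95]
filtered = mask_low_quality(fluxes, fractions)
# The second interval is masked; the first and third are kept.
```

Applying the mask reproduces the "NaN below 95%" behavior, while a user who wants a looser or stricter cutoff only has to change `min_fraction`.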

I voted “no” simply because I’m worried that the change would be implemented without me noticing, and then I would be analyzing data that weren’t very good. As Everett mentioned, if I knew about the change I could simply check the quality flag and ditch the bad data myself. Another solution would be to place the “bad” data in a separate directory from the “good” data, but I’m not sure that would be easy to implement with the current data structures.

Jenni, now that you’ve been using the data for a while and (hopefully) understand better how it works and what is in there, do you still think that you might not notice new data, or not use the QC flags to check if the data are valid? I’ve held off making this change, but I’m thinking it might be helpful still.

Yep, I would be fine now with providing “bad” data but flagging it with a QC. My only request would be to keep some updated document of the meanings of the QC codes for each instrument, since they might be a bit fluid over time. Additionally, if it would be easy to put the QC codes in the 20 Hz structures in addition to the 10-minute structures, that would be a huge benefit to me.

Also, apologies for the huge delay in response. I assumed that if you replied to a thread you were auto-subscribed. :stuck_out_tongue:

Believe it or not, the QC codes are constant over time. The basic QC code hasn’t changed for over a year, mainly because I figured that people would not want to deal with codes changing. The basic concept is set out in the unofficial guide (p. 15), and repeated here for interest. N.B. I’ve updated this file today to add the reason for code 1002.

I will look into adding some of this to the 20-Hz data as well. Did you dig around in the structures to see what was already there?

I’ve looked at the raw and cleaned/rotated sonic structures and at the tower structure in the 20 Hz .mat files from M4. The sonic structures have four fields (val, label, units, height) and the tower structure has several fields that are related to the sonics, but neither of them has a “flag” field or anything else that I could recognize as QC codes. It’s possible I’m looking in the wrong place, though.

It seems I didn’t add information on the QC state of each piece of 20-Hz data. I’ll look into it, but it will double the file size and will require that I rewrite almost all of the codes. Not a high priority, I’m afraid.