Hello everyone (again),
I have a question about the data quality of the 20 Hz sonic anemometers on the M4 tower. There are typically hundreds of “spikes” in a given 10-minute record (see subplot 1 below), most of which are removed in the cleaning process (see subplot 2). My question is: does this extremely large number of spikes affect the other data points in the record? One rule of thumb is to hard-flag records where the number of spikes exceeds 1% of the number of samples in the record [1], but by that criterion I would be flagging almost every record in the set. Do I need to worry about the quality of the 20 Hz cleaned and rotated sonic data, or is this behavior normal?
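For reference, here is roughly how I am applying that 1% rule of thumb (a minimal sketch in Python; the function and variable names are my own, not from the reference):

```python
import numpy as np

# A 10-minute record at 20 Hz contains 12,000 samples
SAMPLES_PER_RECORD = 10 * 60 * 20

def hard_flag(n_spikes, n_samples=SAMPLES_PER_RECORD, threshold=0.01):
    """Hard-flag a record when the spike count exceeds 1% of the
    samples in the record (rule of thumb from Vickers & Mahrt, 1997)."""
    return n_spikes > threshold * n_samples
```

Since 1% of 12,000 samples is only 120, the hundreds of spikes I see per record trip this flag almost every time.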
Thanks for your advice,
[1] Vickers, Dean, and L. Mahrt. “Quality control and flux sampling problems for tower and aircraft data.” Journal of Atmospheric and Oceanic Technology 14.3 (1997): 512-526.
The level of acceptable missing data is very, very application-specific. I can cope with 5% missing data for most of my applications (I think), but you may need to implement a flag depending on your needs. For those reasons I am very glad when people take the time to understand the data and report issues back (as you have - thank you!). We are always trying to improve the quality of the data.
There may be two things going on here; one is related to the data quality, and the other is an instrument issue:
- When we lose communications with a sonic, we put a -97 in the data stream that we obtain from the device. That -97 is very easy to detect and remove; you can see that in the first plot. Those are not spikes per se, but issues with the data stream (there is a sketch for masking these just after this list).
- Sometimes, just sometimes, a raindrop, fly, or dust hits the sonic transducers and causes chaos. That shows up as a very small spike, which is still present in the second plot.
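If you want to separate the two cases in your own processing, the -97 dropout values are trivial to mask before any spike test. A minimal sketch, assuming the raw wind component is a NumPy array (the names here are illustrative, not our production code):

```python
import numpy as np

DROPOUT_SENTINEL = -97.0  # inserted when communications with the sonic are lost

def mask_dropouts(u):
    """Replace communications-dropout sentinels with NaN so they count
    as missing data rather than as physical spikes."""
    u = np.asarray(u, dtype=float).copy()
    u[u == DROPOUT_SENTINEL] = np.nan
    return u
```

Once the dropouts are NaN, whatever remains above your spike threshold is a candidate for a true (transducer) spike.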
So, there are two questions:
- How to interpolate between data points when we lose data, and
- How to detect true spikes.
And two solutions:
- The missing time-series data are currently replaced by linear interpolation if less than 8% of the data are missing. If less than 5% of the data are missing, the sonic data will be rotated into the prevailing wind (see the sketch after this list).
- From what I recall, my data-processing code detects spikes by looking for a very large change in one direction, followed immediately by a very large change in the other direction. I think the magnitude threshold is set at the 99th percentile of the changes, i.e., small changes are rejected. It looks like the code needs to expand that limit a bit.
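To make both steps concrete, here is a rough sketch of that logic. It is not the production code; the thresholds and names are illustrative:

```python
import numpy as np

def despike(u, pctl=99.0):
    """Flag samples where a very large change in one direction is followed
    immediately by a very large change in the other direction. "Very large"
    means above the given percentile of the absolute sample-to-sample
    change, so small changes are rejected."""
    u = np.asarray(u, dtype=float)
    du = np.diff(u)
    limit = np.nanpercentile(np.abs(du), pctl)
    spikes = np.zeros(u.shape, dtype=bool)
    # du[i-1] is the step into sample i, du[i] is the step out of it
    reversal = np.sign(du[:-1]) == -np.sign(du[1:])
    spikes[1:-1] = (np.abs(du[:-1]) > limit) & (np.abs(du[1:]) > limit) & reversal
    return spikes

def fill_missing(u, interp_max=0.08, rotate_max=0.05):
    """Linearly interpolate missing samples if less than 8% of the record
    is missing; also report whether the record qualifies for rotation
    into the prevailing wind (less than 5% missing)."""
    u = np.asarray(u, dtype=float).copy()
    missing = np.isnan(u)
    frac = missing.mean()
    if frac < interp_max:
        idx = np.arange(u.size)
        u[missing] = np.interp(idx[missing], idx[~missing], u[~missing])
    return u, frac < rotate_max
```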
You may want to look at a more recent data file. We were aware of this problem and have been trying to fix it. I believe we recently traced the cause of these communications dropouts and fixed it when we upgraded the DAS on M5. So, I would be very interested to know if you see this in data from M5 since 11/15/2013. We are aiming to resolve it on M4 soon.
Excellent, this addresses my concerns. Since most of the “spikes” in the raw data come from loss of communication, not from dust storms or sonic malfunctions, I think I can use the interpolated records without issue. Thanks!