I have a few questions about the 20 Hz sonic anemometer data that I thought I’d bring up on the forums. Please feel free to direct me where to find the answers, if applicable, rather than answer them yourself.
- Some records still have “spikes” in the data after Andy’s denoising (e.g., 2013/02/11 at 30m), I assume because these spikes last for more than 1 sample and so are not caught by the denoising method laid out in the data guides. I’m working on a few methods for removing/interpolating these spikes, but how are other people dealing with this issue? The QC codes for that particular record are 1002, 1003, and 2023-2024, which I believe are all ignored.
- What does QC 1002 mean? I can’t find it in the unofficial guide or the official guide to the data. Additionally, what is the “range rate” (QC 1003) for the anemometers?
- I see that the raw sonic data generally has lots of spikes before denoising, often more than a hundred in a given record. Is there any concern that this affects the quality of the sonic data?
Lastly, I’ve been seeing some records that have been quantized. I don’t have any examples right now, nor have I checked to see if any QC codes will flag such an occurrence in the sonic data, but I wanted to make a note of it so people know it could be an issue.
All the best,
Jenni - good questions. I’ve tried to answer them one at a time below.
I think the events in the sonic data that you mean are these 10-20 second “ramps”, where the 3 components and the temperature simultaneously depart from the trend (see the events at ~200 and ~400 seconds in the image below). Can you confirm these are what you are referring to?
We’ve seen these before. When they occur we usually flag the data and swap out the sonic as soon as possible. I was thinking about how to detect these, and wondered if I could do something like cusum(difference) in 10-20 second windows: where the cusum is high, we know something is going on.
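Here is a rough sketch of what that windowed cusum idea could look like. It is only one possible reading of “cusum(difference)”: the difference is taken as each sample minus the mean of the whole record, and the score for a window is the largest absolute cumulative sum reached inside it. The function name, the 15-second window, and the choice of reference value are all assumptions for illustration, not anything taken from the processing code.

```python
from statistics import fmean

def windowed_cusum_peaks(x, fs=20.0, window_s=15.0):
    """Return the largest |cumulative sum| of (sample - record mean)
    in each non-overlapping window of window_s seconds.

    Assumptions: fs is the sampling rate in Hz, and the record mean is
    used as the reference level (a median or a detrended value could be
    swapped in).
    """
    n = int(round(fs * window_s))      # samples per window
    ref = fmean(x)                     # reference level for the deviations
    peaks = []
    for start in range(0, len(x) - n + 1, n):
        running = 0.0
        peak = 0.0
        for value in x[start:start + n]:
            running += value - ref     # cumulative sum of deviations
            peak = max(peak, abs(running))
        peaks.append(peak)
    return peaks
```

Ordinary turbulence swings above and below the reference, so the running sum tends to stay small, while a sustained departure of 10-20 seconds keeps adding with the same sign and the window score grows. Choosing the threshold for “high” would still need to be done by looking at known good and bad records.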
We may also have found the hardware problem that is causing this - I’ll update later if this is the case.
From the unofficial guide, p. 20:
The unofficial guide PDF should be searchable, which might help you in cases like this in the future.
Generally these single-point spikes are only expected to affect the high frequencies in the data. They should not affect the mean values much. You may want to try taking a complete 10-minute, 20 Hz data stream and randomly adding / subtracting 100-200 points to see the effect on the statistics.
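A minimal sketch of that experiment, reading “adding / subtracting points” as adding a positive or negative spike at 100-200 randomly chosen samples. The record here is a synthetic stand-in (Gaussian noise around 8 m/s), not real tower data, and the spike amplitude of 10 and the helper name `add_random_spikes` are made up for illustration.

```python
import random
from statistics import fmean, pstdev

def add_random_spikes(x, n_spikes, amplitude, rng):
    """Return a copy of x with +/-amplitude added at n_spikes random samples."""
    y = list(x)
    for i in rng.sample(range(len(y)), n_spikes):
        y[i] += rng.choice((-1.0, 1.0)) * amplitude
    return y

rng = random.Random(0)  # fixed seed so the comparison is repeatable

# Synthetic stand-in for a 10-minute, 20 Hz record (12,000 samples).
clean = [8.0 + rng.gauss(0.0, 1.0) for _ in range(12000)]
spiked = add_random_spikes(clean, n_spikes=150, amplitude=10.0, rng=rng)

print("mean:", fmean(clean), "->", fmean(spiked))
print("std: ", pstdev(clean), "->", pstdev(spiked))
```

With n spikes of amplitude A in N samples, the mean can shift by at most n·A/N (here 150 × 10 / 12000 = 0.125), and less when the signs partly cancel. The standard deviation and higher-order statistics are more sensitive, so it is worth comparing those as well.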
I’m not sure what you mean by this?
The spikes that I am concerned about range from 2 samples in width (0.1 seconds) to ~6 samples (0.3 seconds) and are present in the Sonic_u data, so it’s already been cleaned and rotated. For example, the first plot in the image below is the sonic-u data from Feb. 11, 2013 at 10:10 and 30m; the spike is pretty obvious. I just re-downloaded this file from the MetData directory to make sure it wasn’t an old version. I’ve been experimenting with a few detection methods and I think I now have one that will detect these spikes successfully, but I wanted to note it in case others weren’t aware.
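For anyone who wants a starting point, here is one generic way to catch spikes a few samples wide. This is not the detection method mentioned above (that one isn’t described in this thread), just a common running-median approach. The window of 21 samples and the threshold of 5 are assumed values that would need tuning on real records.

```python
from statistics import median

def flag_short_spikes(x, half_window=10, threshold=5.0):
    """Flag samples that sit far from a running median.

    The window (2 * half_window + 1 samples) should be more than twice
    the widest spike of interest, so the spike cannot pull the median.
    The distance is scaled by 1.4826 * MAD of the residuals, which
    approximates one standard deviation for Gaussian data.
    """
    n = len(x)
    baseline = [
        median(x[max(0, i - half_window):min(n, i + half_window + 1)])
        for i in range(n)
    ]
    resid = [xi - bi for xi, bi in zip(x, baseline)]
    mad = median([abs(r) for r in resid])
    scale = 1.4826 * mad if mad > 0 else 1e-12   # guard against a zero MAD
    return [abs(r) > threshold * scale for r in resid]
```

The flagged samples could then be replaced by interpolating between their unflagged neighbours. A threshold this simple will also flag genuine sharp gusts in very turbulent records, so it is worth checking a handful of flagged records by eye.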
As for quantization, some records seem to have undergone some quantization in the data acquisition process, and because I’m looking at Fourier phase characteristics I have to be really sensitive about any quantization in the records. I defined a quantization “event” as 5 or more consecutive samples with essentially the same value, and flagged records that had more than 10 quantization events. Of the 27,000 records that I currently have on my desktop (up through 02/2013 and including some of 03/2013), about 1,000 have been flagged as quantized with this definition. The second subplot shows a record that has a bit of quantization; I couldn’t find a more demonstrative record very quickly but I have seen records that were much more pronounced.
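The definition above can be sketched as follows. The thresholds (5 or more consecutive samples, more than 10 events) come from the description; the tolerance `tol` that stands in for “essentially the same value” and the function names are assumptions.

```python
def count_quantization_events(x, min_run=5, tol=1e-6):
    """Count runs of at least min_run consecutive samples whose
    neighbouring values differ by no more than tol."""
    events = 0
    run = 1                              # length of the current run
    for prev, cur in zip(x, x[1:]):
        if abs(cur - prev) <= tol:
            run += 1
        else:
            if run >= min_run:
                events += 1
            run = 1
    if run >= min_run:                   # close a run that reaches the end
        events += 1
    return events

def is_quantized(x, min_run=5, max_events=10, tol=1e-6):
    """Flag a record that has more than max_events quantization events."""
    return count_quantization_events(x, min_run, tol) > max_events
```

Note that this compares each sample with the one before it, so a slow drift in steps smaller than `tol` would still count as one run. Comparing against the first value of the run instead would be stricter.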
EDIT: I would have sworn I searched for the 1002 QC code, but apparently it was only in my dreams. Sorry about that.
Thanks, Jenni. These are all fairly different questions, and I don’t know that they belong together. For the sake of my / your sanity, maybe you could split these questions out into four new threads (one for each question)? I’ll repost my answers.
I have been following this thread. If you split it up, could you please note whether you are seeing the behavior on M4, M5, or both?
My observations are only from the M4 tower data; I’ve added notes in the other two threads about which data I am analyzing. Thanks for the note!