M4 & M5 20Hz File Corruption

Hello,

I have noticed that after downloading some of the 20Hz files from the NWTC server, Matlab will not open some of the files and gives a message that the files may be corrupt. The total number of corrupt files is well over 1200 files.

I have attached 3 files that show some of the 20Hz corrupted files. I can only attach 3 files (forum limit) but there are more.

If anyone using the 20Hz data could download one of the NWTC 20Hz files that I list as corrupt and report here whether it opens in Matlab, that would be very helpful in getting this corruption issue figured out.

If you would rather not download the files that I have attached, I have listed the first 10 corrupt file names for M4, July 2013, 20Hz below (M4_ has been prepended; otherwise each file name is the same as on the server). Pick a file, download it from the NWTC server, and report back whether Matlab will open it.

M4_07013_01_30_00_026.mat is corrupt
M4_07013_01_40_00_026.mat is corrupt
M4_07013_01_50_00_026.mat is corrupt
M4_07013_02_00_00_026.mat is corrupt
M4_07013_02_10_00_026.mat is corrupt
M4_07013_02_20_00_026.mat is corrupt
M4_07013_02_30_00_026.mat is corrupt
M4_07013_02_40_00_026.mat is corrupt
M4_07013_02_50_00_026.mat is corrupt
M4_07013_03_20_00_026.mat is corrupt

Thanks,
Everett
M5_20Hz_Corrupt_Files_Jan_2013.txt (19.2 KB)
M4_20Hz_Corrupt_Files_June_2013.txt (891 Bytes)
M4_20Hz_Corrupt_Files_Jul_2013.txt (27.5 KB)

This is now (probably) fixed.

Background
We first process files internally and save them on our own file server. Those processed files are periodically copied to the web server, where you can download them. Much of this is done using compiled Matlab scripts.

Issue
In some situations, two Matlab jobs can access the same file at the same time. This causes corruption if one job tries to save while the other is using the file.
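One common way to guard against this kind of collision is to save to a temporary file and then rename it over the target, so a second job never sees a half-written file. The sketch below is only an illustration of that pattern, not the fix that was actually applied here; `atomicSave` is a hypothetical helper name, and it assumes that a rename within the same folder is effectively a single step on the file system in use.

```matlab
function atomicSave(targetFile, S)
% Hypothetical helper: save the fields of struct S to targetFile by
% writing a temporary file first and then renaming it into place.
    folder = fileparts(targetFile);
    if isempty(folder)
        folder = pwd;                       % target is in the current folder
    end
    tmpFile = [tempname(folder) '.mat'];    % unique name in the same folder
    save(tmpFile, '-struct', 'S');          % the slow write goes to the temp file
    [ok, msg] = movefile(tmpFile, targetFile, 'f');  % replace the target
    if ~ok
        if exist(tmpFile, 'file')
            delete(tmpFile);                % do not leave the temp file behind
        end
        error('atomicSave:moveFailed', ...
              'Could not replace %s: %s', targetFile, msg);
    end
end
```

A call such as `atomicSave('M4_example.mat', struct('data', data))` (file and variable names are placeholders) would then replace a bare `save`. This does not stop two jobs from writing the same file in turn, but a reader should only ever find a complete file.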

Solution
I found that some files on our server were corrupt. Those have all been deleted, and the offending files on the network have been updated. There may still be some older files that I didn’t find: if you see any files with this problem, please report them here.

Hi Andy,

Good News: The Jan M5 20Hz (2013) data files are 100% free from corruption (downloaded 2 days ago).

However, the M4 July (2013) 20Hz files have the exact same file corruption issues (complete download 2 days ago). There are 738 corrupt files for this month. My program uses a Try/Catch block to identify whether a file will open in Matlab or not. I did notice a single Matlab error (about midway through the M4 Jul file that I attached) that my Try/Catch block may not actually be catching. Thought I would post it in case it is helpful.
The error: “unexpected end-of-file while reading compressed data”
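For anyone who wants to run the same kind of check, here is a minimal sketch of a try/catch scan over one folder of downloaded files. It is not the actual program described above: `findCorruptMatFiles` and `dataDir` are made-up names. It also checks `lastwarn` after each load, on the assumption (not confirmed in this thread) that a message like the one quoted could be issued as a warning, which a try/catch block alone would not see.

```matlab
function bad = findCorruptMatFiles(dataDir)
% Hypothetical helper: list the .mat files in dataDir that fail to load,
% or that load but leave a warning behind.
    files = dir(fullfile(dataDir, '*.mat'));
    bad = struct('name', {}, 'message', {});
    for k = 1:numel(files)
        f = fullfile(dataDir, files(k).name);
        lastwarn('');                 % clear any warning from a previous file
        try
            S = load(f);              %#ok<NASGU> read the whole file
            msg = lastwarn;           % non-empty if load only warned
        catch err
            msg = err.message;        % load raised an error
        end
        if ~isempty(msg)
            bad(end+1) = struct('name', files(k).name, ...
                                'message', msg); %#ok<AGROW>
        end
    end
end
```

Calling `bad = findCorruptMatFiles('M4_2013_07')` (folder name is a placeholder) returns a struct array, and `{bad.name}` gives the list of suspect files along with the message recorded for each.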

There are still a few “minor offenders” (fewer than 25 corrupt files for an entire month). This is inconsequential to me. However, I will continue to monitor this issue and will post the minor offenders once I have additional confirmation of file corruption.

Everett

All of the 2013 files for M4 and M5 on the web have now been updated, and there shouldn’t be any old / corrupt files on the web… It seems that the files you found as being corrupt were not properly processed. I am trying to track down the problem with the processing, and will post that data when I have it.

The last reprocessing left some gaps in the online data. I’m in the process of updating the data that are available online.

Hi Andy,

I seem to have misplaced a post from you. Perhaps it was edited or deleted and is no longer important. However, I believe you asked me whether the M4 corrupt files from July were from July 1st and 2nd (2013 20Hz). There are corrupt files for these dates, but nearly every day of July has corrupt files.

Sorry for the slow response and the bother if this is no longer relevant.

It sounds like this problem is about to be a thing of the past but I will keep you posted if I see anything.

Everett

I’m pretty sure everything is OK now. I really hope so, anyway…

Files for 2013 with modification dates of November 23 or November 24 should all be fine. I will also update the 2012 files over the next week or so.

Andy,

Sounds great!

I will now verify the entire 2013 M4 & M5 20Hz data sets (even the data I didn’t request). This may include downloading about 1/2TB so it may take a few days to download. I will report all findings here.

When you finish the 2012 data, let me know and I will download the entire 2012 M4 & M5 20Hz data sets (I can use this in my research) and will perform the same testing and report back here.

I have to say that you are a relentless problem solver and I thank you for all of your help on this!

Everett

Andy,

The 20Hz M4 & M5 data for 2013 is clean of all file corruption. It looks like the file corruption issue is fixed.

Everett