How are you accessing/updating your wind data?

Hello,

My current method of using the M4 met tower data is to go to the URL provided by Andy Clifton and download the .mat files manually to my hard drive. However, this is time-consuming, and if the data are re-analyzed at NREL and updated online, the data on my hard drive become stale and I have to re-download everything. This is not ideal, but I can’t figure out how to tell MATLAB to take the .mat files directly from the URL. fread, urlwrite, and urlread have all failed me.

How are other people accessing the data? And how do I know if the data I have on my hard drive are up-to-date? Is there a log so that I can know when the data on the website have been updated and re-uploaded?

Thanks!

Jenni

The problem is that we’re using the HTTP directory listing, which I now realise will cause trouble for wget, curl, and co. What you need to do is get the text of the directory listing and convert it into file names; you can then download each of those files. I know this is tedious, but you should be able to figure it out.
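If you want to stay in MATLAB, the same scrape-the-listing idea should work there too. Here is a rough, untested sketch; I’m assuming the directory URL layout used below and the 18-character file-name pattern that the R script matches (the date path is just an example):

[code]% rough sketch: scrape one day's directory listing, then fetch each .mat file
dirURL  = 'http://wind.nrel.gov/MetData/135mData/M5Twr/20Hz/mat/2012/01/01/';
listing = urlread(dirURL);                  % raw HTML of the directory listing
names   = unique(regexp(listing, '[0-9_]{18}\.mat', 'match'));
for k = 1:numel(names)
    urlwrite([dirURL names{k}], names{k});  % saves into the current folder
end[/code]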

Here’s an example of how you can get the file names and download them using R.

[code]# script to demonstrate downloading NWTC met tower data using R

# START OF INPUTS

# get the location of the files online (only need the root directory)
NREL.URL.Base <- "http://wind.nrel.gov/MetData/135mData/M5Twr/20Hz/mat"

# and define the location we will write to locally (again, only the root)
Local.URL.Base <- "~/Documents/temp"

# define the years, months, and days we want
my.years  = c("2011","2012")
my.months = c("01","02","03","04","05","06","07","08","09","10","11","12")
my.days   = c("01","02")

# END OF INPUTS

# load packages
require(RCurl)

# define the connection we will use to the NREL database
NREL.con = getCurlHandle(ftp.use.epsv = FALSE,
                         maxconnects = 1,
                         fresh.connect = 0,
                         timeout = 60,
                         useragent = "R")

# loop through the times and dates we defined
for (year in my.years){
  for (month in my.months){
    for (day in my.days){
      date.path <- paste(year,"/",
                         month,"/",
                         day,"/",
                         sep = "")
      # make the URL we want to check
      source.file.path <- paste(NREL.URL.Base,"/",
                                date.path,
                                sep = "")
      if (url.exists(url = source.file.path)){
        # get a file listing
        source.listing <- unlist(strsplit(getURL(source.file.path,
                                                 curl = NREL.con,
                                                 verbose = FALSE),
                                          "\n"))
        # scrape that listing into a list of files
        matches <- regexpr(pattern = "#?[0-9\\_]{18}\\.mat(?=<)",
                           text = source.listing,
                           perl = TRUE)
        mat.files <- NULL
        for (row in 1:NROW(matches)){
          if (matches[row] > 0){
            mat.files <- c(mat.files,
                           substr(source.listing[row],
                                  matches[row],
                                  matches[row] + attr(matches,"match.length")[row] - 1))
          }
        }
        # make a directory to dump the files into
        dest.file.path = paste(Local.URL.Base,"/",
                               date.path,
                               sep = "")
        dir.create(dest.file.path,
                   recursive = TRUE)
        for (row in 1:NROW(mat.files)){
          download.file(url = paste(NREL.URL.Base,"/",
                                    date.path,
                                    mat.files[row],
                                    sep = ""),
                        destfile = paste(Local.URL.Base,"/",
                                         date.path,
                                         mat.files[row],
                                         sep = ""))
        }
      } # end of the url.exists check
    } # end of the day loop
  } # end of the month loop
} # end of the year loop
[/code]

Hi Jenni,

I have some code that may help. It is a bit clunky but works pretty well (actually, it is very clunky). I could not figure out how to get a directory listing for the URL. However, I noticed that the file names are generally very similar and predictable (except sometimes for the last few characters). This program will download the “predictable” names, and then you will only have to fetch manually the few files it doesn’t catch. I built a counter into the program to let you know if an expected file is not found.

Currently, the code is set to download all files for Jan 24th 2013 for the M5 20Hz data.

The best way to use it is to download a single day at a time (should be 144 ten-minute files: 24 hours × 6 files per hour). At least with this program, you won’t have to “click” on every file on the URL.

As far as updated versions go, I have had to download the data more than once to make sure I have the latest version. I just keep tabs on the forums to see when updates come out (or whether an update even affects me).
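If you want to check whether a local file is stale without just re-downloading it, one option is to compare the server’s Last-Modified time against the local file’s date. This is a rough, untested sketch using MATLAB’s Java bridge (the file name is only an example built from the pattern above):

[code]%compare the server's Last-Modified time with the local file's date
fileURL = 'http://wind.nrel.gov/MetData/135mData/M5Twr/20Hz/mat/2013/01/24/01243_00_00_00_003.mat';
u = java.net.URL(fileURL);
conn = u.openConnection();
remoteMillis = double(conn.getLastModified()); %ms since 1970 UTC (0 if the server doesn't say)
remoteDatenum = remoteMillis/86400e3 + datenum(1970,1,1); %convert to a MATLAB datenum
localInfo = dir('01243_00_00_00_003.mat');
if isempty(localInfo) || localInfo.datenum < remoteDatenum %note: ignores the time-zone offset
	urlwrite(fileURL, '01243_00_00_00_003.mat'); %fetch the new/updated copy
end[/code]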

Hope this helps. It looks like some of the long comments in the code have been wrapped (they might look strange when pasted into MATLAB).
Everett

PS: Looks like Andy has already posted something that is much more clever!

[code]%NWTC_20Hz_mat_file_saver.m
%This program will save files from the NREL HTTP site offered by Dr. Andrew
%Clifton. The program will take a little insight to run, but should be
%pretty straightforward. The user will have to modify the three variables
%listed below (and a few other locations marked in the program) to start at
%the right file on the server. I couldn't find a way to get a directory
%listing for the server, but most of the files end with "_020.mat" or
%something similar, so this program will get those files. Then I only have
%to get the few files manually that don't end with "_020.mat". Last update
%by Everett Perry: 07/12/2013 (comments only)
%
%###################### Set these three variables #################
startHour = 0;
startMin = 0; %Do not set to 50; it will cause problems for the minCntr loop. (0, 10, 20, 30, 40 only.) Should just fix this.
the_year = '3'; %this will be either '2' or '3' (i.e. 2012 or 2013)
%#################################################################

sec = '00';
noFileCntr = 0;
for day_cntr = 24:24 %Change this to run consecutive days (currently set at 24 to run day 24 only)
	for hourCntr = startHour:23 %i.e. 0 to 23 hours
		for minCntr = startMin:10:50 %i.e. 0 to 50 minutes, in 10-minute steps
			if hourCntr < 10
				hourStr = ['0' num2str(hourCntr)];
			elseif hourCntr >= 10
				hourStr = num2str(hourCntr);
			else
				disp('An error has occurred in the Hour of the filename!');
				disp('Program Terminated');
				clear
				return %Clear variables and KILL program
			end
			if minCntr == 0
				minStr = ['0' num2str(minCntr)];
			elseif minCntr ~= 0
				minStr = num2str(minCntr);
			else
				disp('An error has occurred in the Minute of the filename!');
				disp('Program Terminated');
				clear
				return %Clear variables and KILL program
			end
			hour_min = [hourStr '_' minStr];

			%Some old notes here (outdated)
			% urlFileName = ['01183_' hour_min '_' sec '_016.mat'];
			% fullURL = ['http://wind.nrel.gov/MetData/135mData/M4Twr/20Hz/mat/2013/01/18/' urlFileName];
			% %http://wind.nrel.gov/MetData/135mData/M4Twr/20Hz/mat/2013/01/20/
			% %http://wind.nrel.gov/MetData/135mData/M4Twr/20Hz/mat/2013/01/17/
			% %http://wind.nrel.gov/MetData/135mData/M5Twr/20Hz/mat/2013/01/
			% %filename = ['K:\X Research\NWTC Research\M4_Tower\M4_20Hz_Updated\mat\2013\01\20\' urlFileName];
			% filename = ['K:\X Research\NWTC Research\M4_Tower\M4_Ver1.27\Jan_2013\01-18-2013\' urlFileName];

			%###################################################################
			%###################################################################

			urlFileName = ['01' num2str(day_cntr) the_year '_' hour_min '_' sec '_003.mat']; %would have to change the '01' and the '_003.mat' (month and extension)
			fullURL = ['http://wind.nrel.gov/MetData/135mData/M5Twr/20Hz/mat/2013/01/' num2str(day_cntr) '/' urlFileName]; %would have to change this M4 or M5, 2012 or 2013, 01 (month) etc
			filename = ['K:\X Research\NWTC Research\M5_Tower\M5_Ver1.27\Jan_2013\01-' num2str(day_cntr) '-2013\' urlFileName]; %Your directory here

			%###################################################################
			%###################################################################

			[F,STATUS] = urlwrite(fullURL,filename);
			pause(2) %I included a small pause here so I don't swamp the server

			if STATUS == 0
				disp(' '); %Create a blank line
				disp('An error has occurred!!!');
				disp('It does not look like the filename exists!!!');
				disp('Verify the following filename');
				disp(' '); %Create a blank line
				disp(urlFileName);
				disp(' '); %Create a blank line
				noFileCntr = noFileCntr+1;
				disp('Program Terminated');
				clear
				return %Clear variables and KILL program
			end %End if
		end %End minCntr
	end %End hourCntr
end %day_cntr

disp(' '); %Blank Line
disp('noFileCntr');
disp(noFileCntr);
disp('Done');
clear[/code]

Thanks for the responses! I ended up using Andy’s code because it can download the data from all of the days with minimal fuss, even with my inexperience with R. In case someone else is interested in implementing Andy’s script, here’s a brief idiot’s guide to getting it working on Windows 7:

  1. Download/install the latest version of R. Add the path to Rscript.exe to your system PATH variable, restart your computer (may not be necessary, but just in case).
  2. Open the R GUI from your start menu and go to Packages → Install Packages. Pick a mirror site (I’m not sure the choice matters; I used USA (CA1)), then choose the RCurl package. If prompted, choose to use a personal library. Once the package is installed, close the GUI.
  3. In your command line, change your directory to wherever you have saved the script, then enter the command “Rscript yourscript.r” (substituting your script’s file name). Things should start downloading to your specified folder.

This may not be the most correct/elegant method, but it worked on my computer so I thought I’d post it just in case.

Jenni

UPDATE: I didn’t check the .mat files before posting yesterday, but because Windows is sensitive to binary/text file modes, I needed to modify Andy’s script slightly so that the files would open properly in MATLAB. Specifically, in the “download.file” command, I had to specify that the download mode was binary by adding the flag 'mode = “wb” '. See the code excerpt below.

          download.file(url = paste(NREL.URL.Base,"/",
                                    date.path,
                                    mat.files[row],
                                    sep = ""),
                        destfile = paste(Local.URL.Base,"/",
                                         date.path,
                                         mat.files[row],     
                                         sep = ""),
                        mode = "wb")
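One last check that may save someone a headache: if you want to verify that files downloaded before this fix aren’t corrupt, you can simply try loading each one in MATLAB (the file name here is only an example):

[code]% files fetched in text mode on Windows will typically fail to load
try
    load('01243_00_00_00_003.mat');
catch err
    warning('Possible corrupt text-mode download: %s', err.message);
end[/code]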

Hey all,

I’ve extended the work here in a Python script that will download new data or update old local data if the online modification date is newer than the local creation date. If you’re interested, details are here: Python Script to Update Local Copies of M4 Data.

Cheers,
Jenni