Importing NLDAS and GLDAS data into R



OK, I just spent the entire day obtaining and learning how to import the NLDAS and GLDAS data into R.  This could have been made a lot simpler with better meta-data descriptions and “readme” files placed in locations they need placed.  In any case, I’m posting this to save anyone else wanting to use this data some precious time.  In case you didn’t know (I didn’t until yesterday), the NASA Land Data Assimilation Systems (NLDAS) and Global Land Data Assimilation Systems (GLDAS) are measured/interpolated and/or modeled climate and land surface variables for essentially the conterminous US (NLDAS) at 0.125 deg resolution or the world (GLDAS) at 1 deg resolution.  There are a lot of variables of interest, including the basic set of min/max air temperature and precipitation, plus snowfall, soil temperature, LAI, albedo, incoming/net shortwave and longwave radiation, etc.  The data is available in sets representing calculations at 3-hr intervals or monthly intervals or averages for each month across the given time period.  Most of the temporal extents of these models cover 1979 to the present.

Both NLDAS and GLDAS have version 1 and 2, the latter being newer and more sophisticated.  Both have also been run with 3 land surface models: Mosaic, Noah, VIC–but wait, there’s a fourth, SAC, which is not described in the ReadMe file for NLDAS2.  There are also “FORA” and “FORB” data sets put alongside the three models with little explanation as to what they are.  They contains the forcing variables used by the land surface models.  Note that the forcing variables remain unchanged across the three land surface models, so if you want a variable in the forcing set you can just get it from FORA or FORB.

Obtaining the data

If you want the entire hourly dataset it will take a long time to download as each model set contains tens of thousands of files.  The monthly is only a few thousand; the monthly averages only 24 (one raster file plus one XML file per month)–these have the word “climatology” in their data set names.

There are many ways to get the data, including using wget which snatches files from a list of links but whose help is written for Unix/Linux, which is a girl I used to know.  You can also get subsets of the hourly data using the Simple Subset Wizard (search for “NLDAS” or “GLDAS”).  The SSW can also export the files in NetCDF format, which obviates the stuff below but I found the SSW did not always give me the full set of results.  Hourly/monthly data are available from NASA’s Mirador (same search) or GES DISC.  I used the latter then DownloadThemAll, a Firefox plugin.  This still took a lot of clicking, but not near as much as if I had done it manually.  (You’ve also got this badly documented FTP site.)

Extracting the data

So… the problem is that G/NLDAS files are stored in GRIB format, akin to a raster brick, but with no meta data on layer identity that is automatically imported into R or ArcMap when the raster is loaded.  The XML file that comes with each GRIB file has a list of variables, but they are not in the order they appear when imported into R. Likewise, when imported into R the metadata that should come with a GRIB file is not associated with the file contents, so you are left with a long series of rasters with many meaningless numbers.


In R:

grib <- readGDAL(‘<filename of GRIB file>’) # read GRIB file
grib <- brick(grib) # convert to raster brick
grib # notice brick has N layers

2. Download wgrib.

3. Open a command (DOS) window and navigate to the folder with wgrib.  In Windows you can get a DOS window by pressing the Windows key then typing “cmd”.

4. Issue “wgrib <filename with no spaces>”.  The output will show a table with variable names and attrbutes for each layer. You will need to copy the GRIB file into the same folder as wgrib or put it into a folder with no spaces in its name or any of its parent folders.  Probably a way around this…

5. Consult the metadata file “README.NLDAS2.pdf” from the G/NLDAS website and see Table 4a therein.  Find the “Short Name” of the variable you want.

# 6. Now look for that variable name in the DOS command window. The output from wgrib will show you the layer number of that variable.  Remember this number… call it “x”.

7. Back in R:

myLayer <- grib[[x]] # the layer you want

NB This seems to work for every variable except TSOIL (soil temperature) which for the file I experimented with has 3 such layers.  I am guessing these pertain to the three soil layers for the particular land surface model.  There was also a band named “var255” at the end, which had what seem like meaningful values of some variable.

Note that the layer you want may not be in the same place across land surface models–i.e., LAI may be layer x in one and y in another.

As my professor said to me once, if things don’t go well for you at least make it better for the next person.

Leave a Reply