Archive for the ‘Nichey’ Category

Welcome to our new postdoc, Stephen Murphy!

Wednesday, December 12th, 2018

We are very excited to host Stephen, who is interested in combining aspects of biogeography and macroecology with his ongoing work in community ecology. Stephen graduated from Ohio State University, and is now supported by a grant from the Institute of Museum and Library Services for developing and testing methods for using vaguely georeferenced and dated herbarium and natural history museum records.  Welcome, Stephen!

Welcome to our new postdoc, Kelley Erickson!

Monday, September 17th, 2018

We are excited to serve as the new intellectual home of Kelley Erickson, a recent graduate of the University of Miami where she studied the demography of the highly invasive shrub Schinus terebinthifolia (Brazilian peppertree). Kelley is working on incorporating issues related to detectability in species distribution models in a project sponsored by the Institute of Museum and Library Services.  Welcome, Kelley!

NSF Advances in Biological Informatics

Friday, March 2nd, 2018

Awesome news! We were just informed that the National Science Foundation will fund our proposal to use pollen, genetic, and distributional data to estimate the spatial dynamics of how trees migrated poleward after the last glacial maximum.  This is a collaborative project with Sean Hoban (Morton Arboretum), Andria Dawson (Mount Royal University), John Robinson (Michigan State University), and Allan Strand (College of Charleston). We will be hiring two postdocs over the 3 years of the grant. The first position will be based at The Morton Arboretum near Chicago, Illinois and the second at Michigan State University.


bayesLopod: Species distribution modeling with “messy” data

Thursday, March 1st, 2018

Collectively, biodiversity databases represent over a billion specimens and sightings of species.  Unfortunately, quite often 60-90% or more of that data does not meet the standards necessary for biogeographic analysis: coordinates are missing or blatantly wrong, dates are missing, and some identifications can be questionable.  Typically this data is discarded before analysis, even though it represents hundreds, perhaps thousands of years of person-work in collection and curation.  More importantly, many of the discarded records are probably critical to understanding historical distributions and environmental tolerances of species because they represent the only known collections from a given location or on the edge of the range. Many of these erstwhile “unusable” records are historical and yet very valuable.  Indeed, using historical records indicative of pre-anthropogenic range contractions better estimates  species’ environmental tolerances.

We are excited to announce Version 1.01 of bayesLopod, a Bayesian modeling framework that can use vaguely-georeferenced specimen records and estimate the probability that a record that falls outside the body of the distribution was incorrectly identified. bayesLopod is an R package that relies on Stan, a Bayesian coding language (which you do not need to know!) that approximates posteriors very fast compared to BUGS or JAGS.  The input is either a points file, raster, or a shapefile, with detections and some background estimate of sampling effort.  The output is in the same data format and provides an estimate of the probability of occupancy and the probability that a sample unit (e.g., raster cell) contains an incorrectly-identified record.  We hope this tool can help conservation biogeographers better address pressing questions about Earth’s biodiversity.

Available on CRAN and GitHub.

The range of Andropogon gerardii

The bayesLopod model juxtaposes records of a species (top left) with sampling intensity (bottom left) to estimate the range (main panel). The model can utilize badly georeferenced specimens (this species has only 90 accurately-referenced records but the model uses 5300 records).



Upscaling biodiversity

Tuesday, January 23rd, 2018

Our long-awaited paper on predicting country-scale biodiversity from small plots is out! Of 19 “upscaling” techniques, the most successful method was able to predict total plant richness in the United Kingdom with <10% error, though few techniques were able to recreate the shape of the actual species-area relationship.

Kunin, W.E., Harte, J., He, Fangliang, Hui, C., Jobe, R.T., Ostling, A., Polce, C., Šizling, A., Smith, A.B., Smith, Krister, Smart, S.M., Storch, D., Tjørve, E., Ugland, K-I., Ulrich, W., and Varma, V.  Accepted.  Up-scaling biodiversity: Estimating the species-area relationship from small samples.  Ecological Monographs. doi: 10.1002/ecm.1284




Phenotypic distribution modeling

Tuesday, November 14th, 2017

Our latest paper in Global Change Biology on modeling intraspecific phenotypic variation has gotten great press!  Combined, the news outlets covering our research reach ~78 million people and included The San Francisco Chronicle, The Seattle Times, US News and World Report, The Topeka Capital Journal, The Manhattan Mercury, and numerous other regional newspapers, radio stations (e.g., KWMU 90.7), TV stations (e.g., KWCH12), and science news websites (e.g., Science News Online)!

Smith, A.B., Alsdurf, J., Knapp, M. and Johnson, L.C.  2017.  Phenotypic distribution models corroborate species distribution models: A shift in the role and prevalence of a dominant prairie grass in response to climate change.  Global Change Biology 23:4365-4375. doi: 10.1111/gcb.13666

Change in biomass of Andropogon gerardii due to climate change


Climate paths and climate change communication

Tuesday, February 28th, 2017

Climate path of St. Louis, Missouri, USA

How can we communicate global warming to local audiences (= everybody who lives in a place)? Recently I made a poster showing the locations whose current climate resembles the future climate of St. Louis.

But how did I know where to locate the “future” St. Louis climatically?  By running species distribution models in “reverse”.  First, I created a set of 100 points to represent St. Louis.  (I gave them all exactly the same coordinates.  This inflates the sample size artificially, but it doesn’t matter much since to a first approximation St. Louis is a point, and it does allow me to use more complex fitting features in Maxent.)

Second, I associated these points with the climate layers I have for the 2070s (once for each emissions scenario).

I then trained a Maxent model using this future climate data, then projected it back to the present.

Finally, I calculated the geographic center of gravity of all cells using the predicted suitability as weights.  The center of gravity is the average location of the “future” climate of St. Louis!  I found I got slightly better (= more intuitive) results by thresholding first, then using suitability values above the threshold as weights. I also found I got better results when using only mean annual temperature and precipitation, rather than all 19 WorldClim variables.
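The weighted center of gravity described above can be sketched in a few lines of base R.  This is only an illustration under stated assumptions, not the code used for the poster: the four cells and their suitability values are made up, the function name weightedCentroid and the threshold of 0.5 are hypothetical, and averaging longitude and latitude directly ignores differences in cell area.

```r
# Toy grid: cell-center coordinates with made-up suitability values
cells <- data.frame(
  lon  = c(-92, -91, -90, -89),
  lat  = c(36, 37, 38, 39),
  suit = c(0.10, 0.20, 0.60, 0.80)
)

# Suitability-weighted centroid (hypothetical helper).
# Cells at or below `threshold` get a weight of zero;
# cells above it are weighted by their suitability.
weightedCentroid <- function(lon, lat, suit, threshold = 0) {
  w <- ifelse(suit > threshold, suit, 0)
  c(lon = sum(w * lon) / sum(w), lat = sum(w * lat) / sum(w))
}

# Using every cell as-is
centroidAll <- weightedCentroid(cells$lon, cells$lat, cells$suit)

# Thresholding first, then weighting by suitability above the threshold
centroidThresholded <- weightedCentroid(
  cells$lon, cells$lat, cells$suit, threshold = 0.5
)
```

With real output, lon and lat would be the cell-center coordinates of the projected raster and suit the predicted suitability of each cell.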

This procedure is fairly simple and takes advantage of the fact that 1) “species” distribution models are not just for species, and 2) the output of an SDM (or whatever you want to call them) is really just an index of similarity in a multivariate space (= climate layers) between one set of points (= presences) and another set of points (= all grid cells in the layer to which you’re projecting).

I’ll be trying the poster out at the Missouri Botanical Garden’s upcoming Science Open House–hopefully it will spark some conversation!

Which is worse for biodiversity, a dollar of beef or gasoline?

Thursday, November 10th, 2016

Which produces more climate change, consumption of gasoline or beef? We have a good idea about the answer to this question.  But now ask, which displaces more biodiversity?  We have no idea–until now.  Just today our article on biodiversity impacts of economic consumption was released in Conservation Letters.  Spearheaded by Justin Kitzes and chaperoned by John Harte, this analysis considers all the direct and indirect impacts of consumption across the entire world economic system.  For example, agriculture directly displaces biodiversity, but so does the insurance industry by its need for paper, transportation, energy, and so on.  Personally, I am surprised by the impact of eating rice versus, say, buying paper–the Earth would be much better if we could digest the latter!

Kitzes, J., Berlow, E., Conlisk, E., Erb, K., Iha, K., Martinez, N., Newman, E.A., Plutzar, C., Smith, A.B., and Harte, J.  In press.  Consumption-based conservation targeting: Linking biodiversity loss to upstream demand through a global wildlife footprint.  Conservation Letters.

A Perfect Storm of Threats

Wednesday, November 2nd, 2016

Number of rare plant species threatened by recreation

Just out: a new analysis by Haydee Hernández-Yáñez and 7 other students at the University of Missouri-Saint Louis and myself on the threats that affect all known rare plants of the US! This is a reprise of the analysis by David Wilcove and colleagues from 1998.  We already got coverage on NPR and Inside Science!

Not appearing in the analysis is the spatial aspect (see image to the right), which we decided to drop near the end because the article was getting too long.  Still, I’m hoping this will become something else on its own!

Hernández-Yáñez, H., Kos, J.T., Bast, M.D., Griggs, J.L., Hage, P.A., Killian, A., Whitmore, M.B., Loza, M. L., Smith, A.B.  2016.  A systematic assessment of threats affecting the rare plants of the United States.  Biological Conservation 203:260-267.

Importing NLDAS and GLDAS data into R

Monday, August 29th, 2016


OK, I just spent the entire day obtaining and learning how to import the NLDAS and GLDAS data into R.  This could have been made a lot simpler with better metadata descriptions and “readme” files placed where they are needed.  In any case, I’m posting this to save anyone else wanting to use this data some precious time.  In case you didn’t know (I didn’t until yesterday), the North American Land Data Assimilation System (NLDAS) and Global Land Data Assimilation System (GLDAS) provide measured/interpolated and/or modeled climate and land surface variables for essentially the conterminous US (NLDAS) at 0.125 deg resolution or the world (GLDAS) at 1 deg resolution.  There are a lot of variables of interest, including the basic set of min/max air temperature and precipitation, plus snowfall, soil temperature, LAI, albedo, incoming/net shortwave and longwave radiation, etc.  The data is available in sets representing calculations at 3-hr intervals or monthly intervals or averages for each month across the given time period.  Most of the temporal extents of these models cover 1979 to the present.

Both NLDAS and GLDAS have versions 1 and 2, the latter being newer and more sophisticated.  Both have also been run with 3 land surface models: Mosaic, Noah, VIC–but wait, there’s a fourth, SAC, which is not described in the ReadMe file for NLDAS2.  There are also “FORA” and “FORB” data sets put alongside the three models with little explanation as to what they are.  They contain the forcing variables used by the land surface models.  Note that the forcing variables remain unchanged across the three land surface models, so if you want a variable in the forcing set you can just get it from FORA or FORB.

Obtaining the data

If you want the entire hourly dataset it will take a long time to download as each model set contains tens of thousands of files.  The monthly set has only a few thousand files; the monthly averages have only 24 (one raster file plus one XML file per month)–these have the word “climatology” in their data set names.

There are many ways to get the data, including using wget which snatches files from a list of links but whose help is written for Unix/Linux, which is a girl I used to know.  You can also get subsets of the hourly data using the Simple Subset Wizard (search for “NLDAS” or “GLDAS”).  The SSW can also export the files in NetCDF format, which obviates the stuff below but I found the SSW did not always give me the full set of results.  Hourly/monthly data are available from NASA’s Mirador (same search) or GES DISC.  I used the latter then DownloadThemAll, a Firefox plugin.  This still took a lot of clicking, but not near as much as if I had done it manually.  (You’ve also got this badly documented FTP site.)

Extracting the data

So… the problem is that G/NLDAS files are stored in GRIB format, akin to a raster brick, but with no metadata on layer identity that is automatically imported into R or ArcMap when the raster is loaded.  The XML file that comes with each GRIB file has a list of variables, but they are not in the order they appear when imported into R. Likewise, when imported into R the metadata that should come with a GRIB file is not associated with the file contents, so you are left with a long series of rasters with many meaningless numbers.


1. In R:

library(rgdal) # provides readGDAL()
library(raster) # provides brick()
grib <- readGDAL('<filename of GRIB file>') # read GRIB file
grib <- brick(grib) # convert to raster brick
grib # notice brick has N layers

2. Download wgrib.

3. Open a command (DOS) window and navigate to the folder with wgrib.  In Windows you can get a DOS window by pressing the Windows key then typing “cmd”.

4. Issue “wgrib <filename with no spaces>”.  The output will show a table with variable names and attributes for each layer. You will need to copy the GRIB file into the same folder as wgrib or put it into a folder with no spaces in its name or any of its parent folders.  Probably a way around this…

5. Consult the metadata file “README.NLDAS2.pdf” from the G/NLDAS website and see Table 4a therein.  Find the “Short Name” of the variable you want.

6. Now look for that variable name in the DOS command window. The output from wgrib will show you the layer number of that variable.  Remember this number… call it “x”.

7. Back in R:

myLayer <- grib[[x]] # the layer you want

NB This seems to work for every variable except TSOIL (soil temperature) which for the file I experimented with has 3 such layers.  I am guessing these pertain to the three soil layers for the particular land surface model.  There was also a band named “var255” at the end, which had what seem like meaningful values of some variable.

Note that the layer you want may not be in the same place across land surface models–i.e., LAI may be layer x in one and y in another.
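If you would rather not read layer numbers off the screen, the lookup in step 6 can be sketched in base R.  This is a rough sketch under assumptions, not something from the NLDAS documentation: it assumes each line of the wgrib inventory is colon-delimited with the record number in the first field and the short name in the fourth, the sample lines below are made up for illustration, and layersFor is a hypothetical helper name.

```r
# Made-up inventory lines in the colon-delimited layout assumed above.
# With wgrib installed, real lines could be captured with something like
#   inventory <- system2("wgrib", "<filename of GRIB file>", stdout = TRUE)
inventory <- c(
  "1:0:d=16010100:TMP:kpds5=11:kpds6=105:kpds7=2:TR=0:P1=0:P2=0:TimeU=1:2 m above gnd:anl:NAve=0",
  "2:100:d=16010100:TSOIL:kpds5=85:kpds6=112:kpds7=10:TR=0:P1=0:P2=0:TimeU=1:0-10 cm down:anl:NAve=0",
  "3:200:d=16010100:TSOIL:kpds5=85:kpds6=112:kpds7=2600:TR=0:P1=0:P2=0:TimeU=1:10-40 cm down:anl:NAve=0",
  "4:300:d=16010100:TSOIL:kpds5=85:kpds6=112:kpds7=10340:TR=0:P1=0:P2=0:TimeU=1:40-100 cm down:anl:NAve=0",
  "5:400:d=16010100:LAI:kpds5=182:kpds6=1:kpds7=0:TR=0:P1=0:P2=0:TimeU=1:sfc:anl:NAve=0"
)

# Split each line on ":" and keep the record number and short name
fields <- strsplit(inventory, ":", fixed = TRUE)
inv <- data.frame(
  layer     = as.integer(vapply(fields, function(f) f[1], character(1))),
  shortName = vapply(fields, function(f) f[4], character(1)),
  stringsAsFactors = FALSE
)

# All layer numbers carrying a given short name (hypothetical helper)
layersFor <- function(inv, name) inv$layer[inv$shortName == name]

x <- layersFor(inv, "LAI")           # a single layer number
soilLayers <- layersFor(inv, "TSOIL") # several layers share this name
```

Because the lookup is by name, it can be rerun for each land surface model instead of assuming the layer sits in the same position, and a variable like TSOIL simply returns more than one layer number.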

As my professor said to me once, if things don’t go well for you at least make it better for the next person.