Maximizing the information inherent in museum and herbarium specimen collections

An herbarium specimen of Guazuma tomentosa

Problem #1: We typically infer the distribution of a species, but the evidence of distribution (herbarium or museum data) is collected in a non-standardized manner.

Problem #2: Museum/herbarium data often contain “false positives,” or specimens that have been misidentified.

Problem #3: Lack of evidence of occurrence is not evidence of absence. False absences obscure the true distribution of a species.

Problem #4: Many herbarium/museum records are georeferenced only to large geopolitical units (e.g., states/provinces). Discarding these records ignores possible occurrences in those regions.

Solution: The enmSdmBayes software for the R Statistical Environment meets these needs by:

  • Allowing collector-level covariates to correct for non-systematic sampling
  • Estimating the probability that occurrences at a site are actually misidentifications, without the need for a “gold standard” data set
  • Estimating the probability of presence and absence
enmSdmBayes: easy-to-use, open-source, informative output

How do I install enmSdmBayes?

Installation instructions and tutorials for the latest version are found here.

What kind of data does enmSdmBayes use?

eSB can accommodate data in shapefile form, where one field represents the number of collections of a focal species and another field represents the total number of collections (e.g., of species in the same family). eSB can also accommodate CSV (spreadsheet) files with one row per record. Regardless of the input format, it is assumed that specimens can be placed within “higher” geographies (e.g., states/provinces) and “lower” geographies (e.g., counties) nested within the higher geographies.
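To illustrate how the two input layouts relate, the sketch below tallies one-row-per-record CSV data into the two counts described above: collections of the focal species and total collections per county. It is a minimal illustration in Python, not part of eSB. The column names (`state`, `county`, `species`), the function name, and the sample records are all assumptions made up for this example.

```python
import csv
import io
from collections import defaultdict


def count_collections(csv_text, focal_species):
    """Tally focal-species and total collections per (state, county)."""
    counts = defaultdict(lambda: {"focal": 0, "total": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        # An empty county means the record is located only to the state.
        key = (row["state"], row["county"])
        counts[key]["total"] += 1
        if row["species"] == focal_species:
            counts[key]["focal"] += 1
    return dict(counts)


# Hypothetical records, invented for illustration only.
records = """state,county,species
Missouri,Boone,Andropogon gerardii
Missouri,Boone,Andropogon virginicus
Missouri,Boone,Andropogon gerardii
Missouri,,Andropogon gerardii
Kansas,Riley,Schizachyrium scoparium
"""

counts = count_collections(records, "Andropogon gerardii")
for (state, county), c in sorted(counts.items()):
    print(state, county or "(county unknown)", c["focal"], c["total"])
```

The record with an empty county is kept under its state rather than dropped, which mirrors the idea that specimens located only to a higher geography still carry information.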

What kind of outputs does eSB produce?

eSB can estimate and produce maps of:

  • The probability of occurrence of a species;
  • Uncertainty in this probability;
  • The probability of detection of the species assuming it is present;
  • Uncertainty in the probability of detection; and
  • The probability that a specimen in a sample unit (e.g., a county) is falsely identified as the focal species if the species does not actually exist there.
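The first and third quantities in this list interact in a way a small calculation can make concrete. The sketch below applies Bayes' rule to ask how likely a species is to be present in a county where none of the collections are of that species. This is a textbook occupancy-model calculation offered only as an illustration. It is not necessarily how eSB computes its estimates, it ignores false identifications, and the function and parameter names are assumptions.

```python
def prob_present_given_no_detection(psi, p, n):
    """P(species present | 0 focal specimens among n collections).

    psi: prior probability that the species occurs in the unit
    p:   probability that a single collection is of the focal species,
         given that the species is present
    n:   total number of collections made in the unit
    """
    # Probability the species is present but missed in all n collections.
    missed = psi * (1.0 - p) ** n
    # Probability the species is absent (false identifications ignored).
    absent = 1.0 - psi
    return missed / (missed + absent)


# With no collections, the data say nothing and the prior is returned.
no_sampling = prob_present_given_no_detection(0.5, 0.1, 0)
# More collections without a detection make presence less plausible.
light_sampling = prob_present_given_no_detection(0.5, 0.1, 1)
heavy_sampling = prob_present_given_no_detection(0.5, 0.1, 10)
print(no_sampling, light_sampling, heavy_sampling)
```

The point of the sketch is that an empty county with few collections remains nearly as uncertain as before, while an empty county with many collections gives real evidence of absence.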
The range of Andropogon gerardii
The enmSdmBayes model juxtaposes records of a species (top left) with sampling intensity (bottom left) to estimate the range (main panel). The model can use poorly georeferenced specimens: this species has only 90 accurately georeferenced records, but the model uses 5300 records.
Top left: Variable detectability.
Top right: Constant detectability.
Bottom left: Variable detection with conditional autoregression in occupancy.
Bottom right: Constant detection with conditional autoregression in occupancy.

Integration into biodiversity web portals using tropicosMassModeling

The statistical engine underlying enmSdmBayes can be implemented as a stand-alone system that continuously serves maps of species’ likely distributions on a biodiversity web portal. We have implemented such a system for TROPICOS, the largest primary specimen database of plants in the world and the database for the Missouri Botanical Garden. As part of the redevelopment of the TROPICOS web portal, a “Conservation” tab is being added for each species. This tab will serve maps automatically generated by enmSdmBayes. Users will also be able to download shapefiles of the model output.

Instructions and code for installing this system for use with other biodiversity data are available.


This project was made possible in part by the Institute of Museum and Library Services National Leadership grant to ABS (FAIN MG-30-15-0094-15).