bayesLopod: Species distribution modeling with “messy” data

Collectively, biodiversity databases represent over a billion specimens and sightings of species. Unfortunately, 60-90% or more of those records often do not meet the standards necessary for biogeographic analysis: coordinates are missing or blatantly wrong, dates are missing, and some identifications are questionable. Typically these records are discarded before analysis, even though they represent hundreds, perhaps thousands, of person-years of collection and curation. More importantly, many of the discarded records are probably critical to understanding the historical distributions and environmental tolerances of species, because they are the only known collections from a given location or from the edge of the range. Many of these supposedly “unusable” records are historical, which makes them especially valuable: including historical records that reflect ranges before anthropogenic contractions yields better estimates of species’ environmental tolerances.

We are excited to announce Version 1.01 of bayesLopod, a Bayesian modeling framework that can use vaguely georeferenced specimen records and estimate the probability that a record falling outside the main body of the distribution was incorrectly identified. bayesLopod is an R package that relies on Stan, a probabilistic programming language (which you do not need to know!) that often approximates posteriors much faster than BUGS or JAGS. The input is a points file, a raster, or a shapefile containing detections and some background estimate of sampling effort. The output is in the same data format and provides, for each sample unit (e.g., raster cell), an estimate of the probability of occupancy and of the probability that the unit contains an incorrectly identified record. We hope this tool can help conservation biogeographers better address pressing questions about Earth’s biodiversity.
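To give a feel for how detections and sampling effort can be combined, here is a minimal conceptual sketch in Python. It is not the bayesLopod code or its API: the function names and the fixed values of psi (prior occupancy), p (detection rate where the species is present), and q (rate of misidentified records where it is absent) are assumptions made for illustration, whereas a full Bayesian model would estimate such quantities from the data.

```python
from math import comb


def binom_pmf(y, n, rate):
    """Probability of y detections in n sampling events at a given rate."""
    return comb(n, y) * rate**y * (1 - rate) ** (n - y)


def occupancy_posterior(detections, effort, psi=0.5, p=0.3, q=0.01):
    """Posterior probability that a cell is occupied.

    psi, p and q are hypothetical fixed values chosen for illustration only.
    """
    if effort == 0:
        # No sampling at all: the data say nothing, so return the prior.
        return psi
    like_occupied = binom_pmf(detections, effort, p)  # records are true detections
    like_absent = binom_pmf(detections, effort, q)    # records are misidentifications
    numerator = psi * like_occupied
    return numerator / (numerator + (1 - psi) * like_absent)


# (detections, sampling effort) for four hypothetical cells
cells = [(0, 0), (0, 50), (1, 200), (12, 40)]
posteriors = [occupancy_posterior(y, n) for y, n in cells]

for (y, n), post in zip(cells, posteriors):
    print(f"{y:>3} detections in {n:>3} samples -> P(occupied) = {post:.3f}")
```

In this toy setting, an unsampled cell keeps its prior, a single record among 200 sampling events is more plausibly a misidentification than evidence of presence, and 12 records among 40 events strongly support occupancy.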

Available on CRAN and GitHub.

The range of Andropogon gerardii

The bayesLopod model juxtaposes records of a species (top left) with sampling intensity (bottom left) to estimate the range (main panel). The model can use poorly georeferenced specimens: this species has only 90 accurately georeferenced records, but the model uses 5300 records.


