Collectively, biodiversity databases represent over a billion specimens and sightings of species. Unfortunately, 60-90% or more of those records often fail to meet the standards necessary for biogeographic analysis: coordinates are missing or blatantly wrong, dates are missing, or identifications are questionable. Typically these data are discarded before analysis, even though they represent hundreds, perhaps thousands, of person-years of collection and curation work. More importantly, many of the discarded records are probably critical to understanding the historical distributions and environmental tolerances of species, because they represent the only known collections from a given location or from the edge of a species’ range. Many of these formerly “unusable” records are historical and nonetheless very valuable. Indeed, including historical records that predate anthropogenic range contractions yields better estimates of species’ environmental tolerances.

We are excited to announce Version 1.01 of bayesLopod, a Bayesian modeling framework that can use vaguely georeferenced specimen records and estimate the probability that a record falling outside the main body of the distribution was incorrectly identified. bayesLopod is an R package that relies on Stan, a probabilistic programming language (which you do not need to know!) that approximates posteriors much faster than BUGS or JAGS. The input is a points file, a raster, or a shapefile containing detections and some background estimate of sampling effort. The output is in the same data format as the input and provides, for each sample unit (e.g., raster cell), an estimate of the probability of occupancy and the probability that the unit contains an incorrectly identified record. We hope this tool can help conservation biogeographers better address pressing questions about Earth’s biodiversity.
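
To give a feel for how detections and sampling effort can be combined into a probability of occupancy, here is a minimal Python sketch of the general idea. It is not bayesLopod's code or its R interface. The function names and the fixed values for the prior occupancy (`psi`), the detection rate in occupied cells (`p`), and the false-detection rate in unoccupied cells (`q`) are assumptions made for illustration; a full Bayesian model would estimate such quantities from the data rather than fix them in advance.

```python
from math import comb


def binom_pmf(k, n, rate):
    """Probability of k detections in n sampling events at a given rate."""
    return comb(n, k) * rate**k * (1 - rate) ** (n - k)


def occupancy_posterior(detections, effort, psi=0.5, p=0.3, q=0.01):
    """Posterior probability that a cell is occupied (toy model).

    Assumed, illustrative parameters (not taken from bayesLopod):
      psi: prior probability that a cell is occupied
      p:   per-event detection rate in an occupied cell
      q:   per-event false-detection (misidentification) rate
           in an unoccupied cell
    """
    like_occupied = binom_pmf(detections, effort, p)
    like_unoccupied = binom_pmf(detections, effort, q)
    numerator = psi * like_occupied
    return numerator / (numerator + (1 - psi) * like_unoccupied)


# Hypothetical cells: (detections of the species, total sampling events)
cells = {
    "core": (6, 20),
    "edge": (1, 20),
    "well_sampled_no_records": (0, 40),
    "never_sampled": (0, 0),
}

for name, (detections, effort) in cells.items():
    prob = occupancy_posterior(detections, effort)
    print(f"{name}: P(occupied) = {prob:.3f}")
```

In this toy version, a cell that was never sampled simply returns the prior, a well-sampled cell with many detections is almost certainly occupied, and a single detection amid heavy sampling effort is more plausibly a misidentification. That last case is the kind of record the package is designed to flag.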