It is well known that herbarium and museum specimen records provide imperfect, and often biased, snapshots of where species occur. A number of bias-correction methods exist, some from the species distribution modeling (SDM) framework based on data filtering and some from the occupancy modeling (OM) framework based on collector-specific covariates. However, we don’t really know whether it is necessary to correct for bias collector by collector, or whether we can describe and correct collector behavior in the aggregate using covariates like month of collection. Likewise, we don’t have a good handle on how much (if any) data filtering is necessary to remove bias. To address these issues, Kelley constructed hierarchical Bayesian models of herbarium specimens to ascertain whether SDM-like or OM-like approaches work better.
The upshot? SDM- and OM-like methods work best when combined. However, covariates specific to individual collectors (i.e., random intercepts for each collector) were rarely important. Rather, collector behavior can be described and corrected for adequately using metadata on phenology, the likelihood of collecting the focal species, and other factors that are obtainable from specimen records yet often overlooked.
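To make the contrast between aggregate covariates and collector-specific random intercepts concrete, here is a minimal sketch of a logit-scale detection probability. It is not the model from the paper: the function names, covariate names (`peak_month`, `first_collection`), and all coefficient values are hypothetical and chosen only for illustration.

```python
import math


def inv_logit(x: float) -> float:
    """Map a value on the logit scale to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def detection_prob(intercept, covariate_effects, covariates, collector_effect=0.0):
    """Probability that a collecting event yields the focal species, given presence.

    collector_effect plays the role of a collector-specific random intercept;
    leaving it at 0.0 describes collector behavior with shared covariates only.
    """
    linear = intercept + collector_effect
    for name, value in covariates.items():
        linear += covariate_effects[name] * value
    return inv_logit(linear)


# Hypothetical coefficients and covariate values (assumptions, not estimates).
effects = {"peak_month": 0.8, "first_collection": 0.5}
visit = {"peak_month": 1.0, "first_collection": 0.0}

# Aggregate description: covariates shared across all collectors.
p_aggregate = detection_prob(-1.0, effects, visit)

# Same event with a hypothetical collector-specific intercept added.
p_collector = detection_prob(-1.0, effects, visit, collector_effect=0.3)
```

If the collector-specific term is close to zero for most collectors, the two probabilities nearly coincide, which is the sense in which random intercepts for each collector can be "rarely important."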
Figure: Uneven (biased) sampling of Anacardiaceae in Florida by: (a) collector, (b) month, (c) decade, (d) type of collector, (e) whether or not a specimen was a given collector’s first collection of that species, (f-g) counties collected by the two most prolific collectors of Anacardiaceae in Florida, and (h) collections of Anacardiaceae by county. Also see maps for all species.
Accounting for imperfect detection in data from museums and herbaria when modeling species distributions: Combining and contrasting data-level versus model-level bias correction. [article]
Erickson, K.D. and Smith, A.B. 2021. Ecography 44:1341-1352.