Below are some useful R scripts I’ve written for handling biodiversity data and species distribution/ecological niche modeling. If you use any of them, all I ask is you please let me know. You can install two packages which contain most of the functions below (plus many more) directly from GitHub! Just follow these directions:
install.packages('devtools') # if you haven't done this already
library(devtools)
install_github('adamlilith/omnibus')
install_github('adamlilith/legendary')
install_github('adamlilith/enmSdm')
install_github('adamlilith/fasterRaster')
# you may also need to install the "sp", "dismo", "raster", and other
# packages (all from CRAN), depending on the functions you wish to use
library(omnibus)
library(legendary)
library(enmSdm)
library(fasterRaster)
Below are some of the functions in these packages (plus a few more).
art: Aligned rank transformation for non-parametric ANOVA [package omnibus]
To date, you cannot perform a non-parametric ANOVA on two or more factors unless one of the factors is random (and if you have three factors, you're out of luck). Alternatively, data that violate parametric assumptions can be transformed and then analyzed in a normal ANOVA framework. This script implements the aligned rank transform for up to 4 factors. You can also download the stand-alone program ARTool by Jacob Wobbrock et al. See their short article for a succinct explanation. Note: By default the script uses the mean as the measure of centrality. Other metrics can be used, though the mean is likely the best (Peterson 2002).
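As a rough illustration of the idea for two factors, here is a minimal base-R sketch. The function name art2 and its arguments are hypothetical (this is not the art function in omnibus, whose interface may differ). For each effect, the response is stripped of all effects (residual from the cell mean), the estimated effect of interest is added back, and the result is ranked.

```r
# Aligned rank transform for a two-factor design (illustrative sketch only;
# art2 is a hypothetical name, not the omnibus function)
art2 <- function(y, a, b) {
  a <- factor(a)
  b <- factor(b)
  grand <- mean(y)
  meanA <- ave(y, a)        # marginal mean for each observation's level of A
  meanB <- ave(y, b)        # marginal mean for each observation's level of B
  meanAB <- ave(y, a, b)    # cell mean for each observation
  resid <- y - meanAB       # response with all effects removed
  aligned <- list(
    A = resid + (meanA - grand),
    B = resid + (meanB - grand),
    AB = resid + (meanAB - meanA - meanB + grand)
  )
  lapply(aligned, rank)     # ties receive average ranks
}

set.seed(1)
d <- expand.grid(a = c("a1", "a2"), b = c("b1", "b2", "b3"), rep = 1:5)
d$y <- rexp(nrow(d))        # skewed response
ranks <- art2(d$y, d$a, d$b)

# Run the full factorial ANOVA on each set of ranks, but interpret only
# the effect the ranks were aligned for (here, the interaction)
summary(aov(ranks$AB ~ a * b, data = d))
```

Each set of ranks gets its own ANOVA, and only the matching row of each table is read.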
trainMaxEnt: Calibrate Maxent regularization parameter using AICc (uses Maxent 3.3.3k) [package enmSdm]
trainMaxNet: Calibrate Maxent regularization parameter using AICc (uses Maxent 3.4.1+ — also known as “Maxnet”) [package enmSdm]
These functions follow Warren & Seifert's algorithm for calibrating the master regularization parameter (beta) using AICc. They also test all possible combinations of feature classes.
Special note: These scripts do not use rasters to do the AIC calculation, unlike what is proposed in Warren & Seifert. Using the rasters takes a long time and, if you're using non-random background sites anyway, is inappropriate. This important tidbit was gleaned from a personal communication from Dan Warren cited in Wright, A.N., Hijmans, R.J., Schwartz, M.W., and Shaffer, H.B. 2015. Multiple sources of uncertainty affect metrics for ranking conservation risk under climate change. Diversity and Distributions 21:111-122.
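To make the AICc step concrete, here is a small base-R sketch. It assumes raw Maxent predictions are standardized to sum to 1 across the presence and background sites (rather than across a raster), that k is the number of non-zero model coefficients, and that n is the number of presences. The function name aiccSketch is hypothetical, and the actual calculation inside trainMaxEnt and trainMaxNet may differ in detail.

```r
# AICc from raw Maxent predictions at sites (illustrative sketch only)
# rawPres: raw predictions at presence sites
# rawAll:  raw predictions at presence plus background sites (assumption)
# k:       number of non-zero coefficients in the model
aiccSketch <- function(rawPres, rawAll, k) {
  n <- length(rawPres)
  if (k >= n - 1) return(NA_real_)            # correction term undefined
  logLik <- sum(log(rawPres / sum(rawAll)))   # standardize, then sum log-probabilities
  2 * k - 2 * logLik + (2 * k * (k + 1)) / (n - k - 1)
}

# Compare candidate models and keep the one with the lowest AICc
aiccSketch(rawPres = rep(0.1, 5), rawAll = rep(0.1, 10), k = 1)
```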
contBoyce: Calculate the Continuous Boyce Index [package enmSdm]
The CBI is a measure of model accuracy like AUC, but specifically designed for cases where one has no true absences. See Boyce et al. (2002) for the Boyce Index and Hirzel et al. (2006) for the continuous version, which this function calculates.
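The general recipe can be sketched in base R as follows: slide overlapping windows across the range of predictions, compute the ratio of the proportion of presences to the proportion of background sites in each window, and take the Spearman correlation between that ratio and the window midpoint. The function name cbiSketch and its defaults (101 windows, each 10% of the prediction range) are assumptions for illustration and are not necessarily those of contBoyce.

```r
# Continuous Boyce Index (illustrative sketch only)
# pres: predictions at presence sites; bg: predictions at background sites
cbiSketch <- function(pres, bg, numBins = 101, binWidth = 0.1) {
  lo <- min(c(pres, bg))
  hi <- max(c(pres, bg))
  width <- binWidth * (hi - lo)
  starts <- seq(lo, hi - width, length.out = numBins)  # overlapping windows
  mids <- starts + width / 2
  pe <- vapply(starts, function(s) {
    p <- mean(pres >= s & pres <= s + width)  # proportion of presences in window
    e <- mean(bg >= s & bg <= s + width)      # proportion of background in window
    if (e == 0) NA_real_ else p / e
  }, numeric(1))
  cor(mids, pe, method = "spearman", use = "complete.obs")
}

set.seed(123)
bg <- runif(10000)             # background predictions
presGood <- rbeta(500, 4, 1)   # presences concentrated at high predictions
presBad <- rbeta(500, 1, 4)    # presences concentrated at low predictions
cbiGood <- cbiSketch(presGood, bg)
cbiBad <- cbiSketch(presBad, bg)
```

A model that ranks presences above background should give a value near 1, and one that ranks them below should give a value near -1.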
Takes a network object created in the network package and returns a network with all connected vertices removed (i.e., just those nodes with no edges). If the network is fully saturated and allows loops, returns an empty network object. Otherwise, if it is fully saturated, the function returns a network with a single vertex. Tip: Sometimes coercing a set of geographic points into a network (with edges defined by some minimum distance) and then applying this function is faster than the geoThin function.
elimCellDups: Eliminate duplicate points in a raster cell [package enmSdm]
Takes a data frame with records that have coordinates, overlays it with a raster, and returns a data frame with just one record per cell.
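The same idea can be sketched without a raster object by computing a grid-cell index directly from the coordinates. The function name oneRecordPerCell, the column names, and the assumption of a regular grid defined by an origin and a cell size are all hypothetical. elimCellDups itself works with an actual raster.

```r
# Keep one record per grid cell (illustrative sketch only)
# xmin, ymin: lower-left corner of the grid; res: cell size (assumed square cells)
oneRecordPerCell <- function(df, xmin, ymin, res,
                             lon = "longitude", lat = "latitude") {
  col <- floor((df[[lon]] - xmin) / res)
  row <- floor((df[[lat]] - ymin) / res)
  cellId <- paste(row, col)
  df[!duplicated(cellId), , drop = FALSE]   # first record in each cell is kept
}

records <- data.frame(
  longitude = c(0.1, 0.4, 1.2, 0.3),
  latitude = c(0.2, 0.3, 0.1, 1.7)
)
thinned <- oneRecordPerCell(records, xmin = 0, ymin = 0, res = 1)
```

Here the first two records fall in the same cell, so only the first is retained.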
geoFold: Assign “geographic k-folds” to sites [package enmSdm]
Divides sites into k groups such that there is as little spatial overlap between each group as possible.
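One simple way to get spatially separated groups is to cluster sites by distance and cut the tree into k groups, as in the base-R sketch below. This is not necessarily how geoFold does it. The function name spatialFolds is hypothetical, and plain Euclidean distance on the coordinates is an assumption (for longitude/latitude data a geodesic distance would be more appropriate).

```r
# Assign sites to k spatially clustered folds (illustrative sketch only)
# xy: two-column matrix of coordinates; Euclidean distance is assumed
spatialFolds <- function(xy, k) {
  tree <- hclust(dist(xy), method = "complete")
  cutree(tree, k = k)   # integer fold label for each site
}

set.seed(42)
xy <- rbind(
  cbind(rnorm(10, 0, 0.1), rnorm(10, 0, 0.1)),    # clump 1
  cbind(rnorm(10, 10, 0.1), rnorm(10, 0, 0.1)),   # clump 2
  cbind(rnorm(10, 0, 0.1), rnorm(10, 10, 0.1))    # clump 3
)
folds <- spatialFolds(xy, k = 3)
table(folds)
```

Note that this sketch makes no attempt to balance the number of sites per fold.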
geoThin: Thin geographic points so that none are within a given distance of one another [package enmSdm]
Thins geographic points such that none are within a user-defined distance of one another. If ties exist, removes points with greatest number of neighbors first, then points closest to geographic center of all points. See also geoThinApprox() below for a faster but random version.
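The removal rule described above can be sketched in base R like this. The function name thinPoints is hypothetical and planar (Euclidean) distances are assumed, so this is an illustration of the logic, not the geoThin implementation.

```r
# Deterministic thinning (illustrative sketch only; Euclidean distance assumed)
thinPoints <- function(xy, minDist) {
  xy <- as.matrix(xy)
  keep <- seq_len(nrow(xy))
  center <- colMeans(xy)   # geographic center of all original points
  repeat {
    d <- as.matrix(dist(xy[keep, , drop = FALSE]))
    diag(d) <- Inf                      # a point is not its own neighbor
    nNeigh <- rowSums(d < minDist)      # neighbors within minDist
    if (all(nNeigh == 0)) break
    cand <- which(nNeigh == max(nNeigh))
    if (length(cand) > 1) {
      # tie: remove the candidate closest to the center
      offsets <- sweep(xy[keep[cand], , drop = FALSE], 2, center)
      cand <- cand[which.min(sqrt(rowSums(offsets^2)))]
    }
    keep <- keep[-cand]
  }
  xy[keep, , drop = FALSE]
}

pts <- rbind(c(0, 0), c(0.5, 0), c(1, 0), c(10, 10))
kept <- thinPoints(pts, minDist = 0.75)
```

In this example the middle point of the first three has two close neighbors, so it is removed first and the remaining points already satisfy the distance criterion.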
geoThinApprox: Thin geographic points so that none are within a given distance of one another [package enmSdm]
Thins geographic points such that none are within a user-defined distance of one another. If ties exist, removes points with greatest number of neighbors first, and if ties among these exist, then removes a point randomly. See also geoThin() above for a slower but deterministic version.
trainBrt, trainCrf, trainGam, trainGlm, trainLars, trainMaxEnt, trainMaxNet, trainNs, trainRf: Calibrate boosted regression trees, conditional random forests, generalized additive models, generalized linear models, least angle regression (with interactions and higher-order terms), Maxent (older and newer versions), natural splines, and random forests [in package enmSdm]
These functions are wrappers for model-specific functions like glm() or maxent() that implement “best-practices” calibration, depending on the algorithm (e.g., AICc-based model selection for GAM, GLM, and Maxent, deviance reduction for BRTs, and so on).
yearFromDate: Returns a year from a messy date [package omnibus]
Have you ever had a list of dates all in different formats like “2012-01-29”, “Nov 23, 1973”, “12 Nov 18”, and so on? This script takes such a list of dates and returns the year in which each occurred. When millennium and century cannot be inferred, a dummy value of “99” is prepended to the output (e.g., “71” becomes “9971”).
Attempts to replace diacritically marked characters with unmarked characters (e.g., à, á, â, and ã all simply become “a”). Note that this won’t actually replace all characters, but it tries! Useful for matching names between different sources, some with and some without diacritics.
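A stripped-down version of the idea, using regular expressions in base R, might look like the sketch below. The function name extractYear is hypothetical, and the rule that a two-digit year is the last token in the string is a simplifying assumption. yearFromDate handles many more cases.

```r
# Extract a year from messy date strings (illustrative sketch only)
extractYear <- function(x) {
  vapply(x, function(d) {
    # a stand-alone four-digit number is taken to be the year
    four <- regmatches(d, regexpr("(?<![0-9])[0-9]{4}(?![0-9])", d, perl = TRUE))
    if (length(four) == 1) return(as.integer(four))
    # otherwise assume a trailing two-digit year and prepend the dummy "99"
    two <- regmatches(d, regexpr("[0-9]{2}$", d))
    if (length(two) == 1) return(as.integer(paste0("99", two)))
    NA_integer_
  }, integer(1), USE.NAMES = FALSE)
}

years <- extractYear(c("2012-01-29", "Nov 23, 1973", "12 Nov 71"))
```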
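For a sense of how such a replacement can work, here is a base-R sketch built on chartr with a small, deliberately incomplete character map. The function name stripDiacritics is hypothetical and a UTF-8 session is assumed.

```r
# Replace common diacritically marked characters (illustrative sketch only;
# the map below covers only some Latin characters)
stripDiacritics <- function(x) {
  marked <- "àáâãäåèéêëìíîïòóôõöùúûüñçÀÁÂÃÄÅÈÉÊËÌÍÎÏÒÓÔÕÖÙÚÛÜÑÇ"
  plain  <- "aaaaaaeeeeiiiiooooouuuuncAAAAAAEEEEIIIIOOOOOUUUUNC"
  chartr(marked, plain, x)   # one-to-one character substitution
}

cleaned <- stripDiacritics(c("São Tomé", "Ångström", "Española"))
```

Characters outside the map pass through unchanged, which is why a function like this can only try to catch everything.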
Others’ Useful Packages
Biodiversity: Data Access
auk: Access eBird data
helminthR: London Natural History Museum helminth parasite database
paleobioDB: Paleobiology Database
naturalis: Naturalis Biodiversity Center of the Netherlands
PresSPickR: GBIF, Bioatles, and potentially others
rcites: Data on species protected by CITES or CMS (Convention on Migratory Species)
rebird: Access eBird data
rredlist: IUCN Red List
rgbif: Access GBIF data
rbison: Access BISON data
spocc: For downloading data from GBIF, BISON, AntWeb, and others.
Biodiversity: Data Cleaning
biogeo: Specimen data cleaning
CoordinateCleaner: Specimen data cleaning
spoccutils: Light cleaning and visualization of biodiversity data
taxonomyCleanr: Clean taxonomic data with a taxonomic name resolution service (TNRS)
wikitaxa: Taxonomic information from Wikipedia/Wikidata/Wikispecies
worms: Taxonomic information from the World Register of Marine Species (WoRMS)
osmdata: Access Open Street Map data
stplanr: Transport planning
Climate: Data Access
clifro: New Zealand National Climate Database
GSODR: Global Surface Summary of the Day (GSOD) weather data
rnoaa: NOAA weather data
weathercan: Environment and Climate Change Canada
Data Cleaning (General)
parsedate: Parses messy dates
dismo: The must-have package
iSdm: Invasive species distribution and niche modeling
GIS Data (General)
landscapetools: enhanced raster tools
rasterVis: raster plotting
rayshader: mind-blowing plotting
tiler: create raster tiles for geographic and non-geographic raster data
GIS: Polygons, Lines, and Points
recogeo: reconcile geographic differences between shapefiles of the same features
Data-to-Viz: taxonomy of graphics
rayshader: great for geographic data
brranching: Obtain phylogenies
taxa: Data structures for taxonomies
rmarkdown book for free!
Remote Sensing: Data Access
rLandsat: LANDSAT data
MODIStsp: MODIS time series
smapr: NASA’s Soil Moisture Active-Passive data
ggeffects: Calculate marginal effects
“Uber” plot for evaluating models in one set of commands
tidybayes: extracting/visualizing posteriors from popular platforms
landscapetools: neutral landscape models