R Tools

Below are some useful R scripts I’ve written for handling biodiversity data and species distribution/ecological niche modeling. If you use any of them, all I ask is that you please let me know. You can install two packages that contain most of the functions below (plus many more) directly from GitHub! Just follow these directions:

install.packages('devtools') # if you haven't done this already
# you may also need to install the "sp", "dismo", "raster", and other
# packages (all from CRAN), depending on the functions you wish to use


Below are some of the functions in these packages (plus a few more).

art: Aligned rank transformation for non-parametric ANOVA [package omnibus]

To date, you cannot perform a non-parametric ANOVA on two or more factors unless one of them is random (and if you have three factors, you’re out of luck). Alternatively, non-parametric data can be transformed to meet normality assumptions and then analyzed in a standard ANOVA framework. This script implements the aligned rank transform for up to 4 factors. You can also download the stand-alone program ARTool by Jacob Wobbrock et al. See their short article for a succinct explanation. Note: By default the script uses the mean as a measure of centrality. Other metrics can be used, though the mean is likely the best (Peterson 2002).
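As a rough illustration of the aligned rank transform in the simplest case of two factors, here is a minimal base-R sketch. It is not the art function from omnibus: the name artTwoWay and its arguments are hypothetical, and it assumes the mean as the measure of centrality.

```r
# Hypothetical helper (not the omnibus function): aligned rank transform
# for a two-factor design, using the mean as the measure of centrality.
artTwoWay <- function(y, a, b) {
  a <- factor(a)
  b <- factor(b)
  grand <- mean(y)
  cellMean <- ave(y, a, b)  # mean of each a-by-b cell
  aMean <- ave(y, a)        # marginal mean for each level of a
  bMean <- ave(y, b)        # marginal mean for each level of b
  resid <- y - cellMean     # strip all effects, leaving residuals
  # estimated effect for each term of interest
  effects <- list(
    a = aMean - grand,
    b = bMean - grand,
    ab = cellMean - aMean - bMean + grand
  )
  # align (residual + one effect), then rank with ties averaged
  lapply(effects, function(eff) rank(resid + eff, ties.method = 'average'))
}

# simulated example with skewed responses
set.seed(1)
dat <- expand.grid(a = c('low', 'high'), b = c('x', 'y', 'z'), rep = 1:10)
dat$y <- rexp(nrow(dat)) + ifelse(dat$a == 'high' & dat$b == 'z', 2, 0)
ranks <- artTwoWay(dat$y, dat$a, dat$b)

# run the full factorial ANOVA on each set of ranks, but interpret only
# the term the ranks were aligned for (here, the interaction)
anova(lm(ranks$ab ~ a * b, data = dat))
```

Each set of ranks is meant for testing one term only, which is why the sketch returns a separate vector for each main effect and for the interaction.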

trainMaxEnt: Calibrate Maxent regularization parameter using AICc (uses Maxent 3.3.3k) [package enmSdm]

trainMaxNet: Calibrate Maxent regularization parameter using AICc (uses Maxent 3.4.1+ — also known as “Maxnet”) [package enmSdm]

These functions follow Warren & Seifert’s algorithm for calibrating the master regularization parameter (beta) using AICc. They also test all possible combinations of feature classes.

Special note: These scripts do not use rasters to do the AICc calculation, unlike what is proposed in Warren & Seifert. Using the rasters takes a long time and, if you’re using non-random background sites anyway, is inappropriate. This important tidbit was gleaned from a personal communication from Dan Warren cited in Wright, A.N., Hijmans, R.J., Schwartz, M.W., and Shaffer, H.B. 2015. Multiple sources of uncertainty affect metrics for ranking conservation risk under climate change. Diversity and Distributions 21:111-122.
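To make the site-based AICc calculation concrete, here is a minimal sketch. It is not the code inside trainMaxEnt or trainMaxNet. The function name aiccFromRaw is hypothetical, and it assumes you already have raw Maxent output at the presence and background sites and know the number of non-zero coefficients.

```r
# Hypothetical helper: AICc for a Maxent-type model from raw predictions at
# sites (no rasters). Assumes "allRaw" holds raw output at every presence and
# background site, and "k" is the number of non-zero coefficients.
aiccFromRaw <- function(presRaw, allRaw, k) {
  n <- length(presRaw)
  if (k >= n - 1) return(NA_real_)  # AICc undefined for too many parameters
  logLik <- sum(log(presRaw / sum(allRaw)))  # standardize so scores sum to 1
  -2 * logLik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
}

# made-up raw predictions for two candidate settings of beta
set.seed(1)
bgRaw <- runif(1000)
presSimple <- runif(30, 0.5, 1)
presComplex <- runif(30, 0.6, 1)
candidates <- c(
  simple = aiccFromRaw(presSimple, c(presSimple, bgRaw), k = 3),
  complex = aiccFromRaw(presComplex, c(presComplex, bgRaw), k = 12)
)
names(which.min(candidates))  # setting with the lowest AICc
```

In practice you would compute this for every combination of beta and feature classes and keep the model with the lowest AICc.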

contBoyce: Calculate the Continuous Boyce Index [package enmSdm]

The CBI is a measure of model accuracy like AUC, but specifically designed for cases where one has no true absences.  See Boyce et al. (2002) for the Boyce Index and Hirzel et al. (2006) for the continuous version, which this function calculates.
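The idea behind the continuous version can be sketched in a few lines of base R. This is not contBoyce itself: the function name boyceSketch, the number of windows, and the window width are all assumptions made for illustration. It slides overlapping windows across the range of predictions, computes the predicted-to-expected (P/E) ratio in each, and correlates that ratio with the window midpoint.

```r
# Hypothetical sketch of the continuous Boyce index.
# pres: predictions at presence sites; bg: predictions at background sites
boyceSketch <- function(pres, bg, numBins = 101, binWidth = 0.1) {
  rng <- range(c(pres, bg))
  width <- binWidth * diff(rng)  # each window spans 10% of the range
  lows <- seq(rng[1], rng[2] - width, length.out = numBins)
  highs <- lows + width
  # P/E ratio: share of presences vs. share of background in each window
  pe <- vapply(seq_along(lows), function(i) {
    p <- mean(pres >= lows[i] & pres <= highs[i])
    e <- mean(bg >= lows[i] & bg <= highs[i])
    if (e == 0) NA_real_ else p / e
  }, numeric(1))
  mids <- (lows + highs) / 2
  cor(mids, pe, method = 'spearman', use = 'complete.obs')
}

# simulated example: presences are concentrated at high predicted values
set.seed(1)
bg <- runif(10000)
pres <- sqrt(runif(1000))
cbi <- boyceSketch(pres, bg)
```

Values near 1 mean presences become steadily more frequent than expected as predictions increase. Values near 0 suggest a model no better than chance.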

deconNet: Deconstruct network

Takes a network object created in the network package and returns a network with all connected vertices removed (i.e., just those nodes with no edges). If the network is fully saturated and allows loops, returns an empty network object. Otherwise, if it is fully saturated, the function returns a network with a single vertex. Tip: Sometimes coercing a set of geographic points into a network (with edges defined by some minimum distance) and then applying this function is faster than using the geoThin function.

elimCellDups: Eliminate duplicate points in a raster cell  [package enmSdm]

Takes a data frame with records that have coordinates, overlays it with a raster, and returns a data frame with just one record per cell.
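The same idea can be sketched without a raster object by computing cell numbers directly for a regular grid. This is not elimCellDups itself: the function name, the argument names, and the column names 'longitude' and 'latitude' are hypothetical. It assumes a grid defined by its western edge, northern edge, cell size, and number of columns.

```r
# Hypothetical sketch: keep the first record in each cell of a regular grid.
# xmin/ymax: west and north edges of the grid; res: cell size; ncol: columns
oneRecordPerCell <- function(df, xmin, ymax, res, ncol,
                             x = 'longitude', y = 'latitude') {
  col <- floor((df[[x]] - xmin) / res) + 1
  row <- floor((ymax - df[[y]]) / res) + 1
  cell <- (row - 1) * ncol + col  # cell number, counted row by row
  df[!duplicated(cell), , drop = FALSE]
}

# made-up records: the first two fall in the same 1-degree cell
records <- data.frame(
  species = 'Species a',
  longitude = c(-90.2, -90.7, -88.1),
  latitude = c(38.6, 38.1, 38.6)
)
thinned <- oneRecordPerCell(records, xmin = -100, ymax = 50, res = 1, ncol = 40)
```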

geoFold: Assign “geographic k-folds” to sites [package enmSdm]

Divides sites into k groups such that there is as little spatial overlap between each group as possible.
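One simple way to get spatially separated groups is to cluster the coordinates and cut the tree into k groups. The sketch below does that in base R. It is not necessarily how geoFold works internally: the function name geoFoldSketch is hypothetical, and it treats coordinates as planar rather than using geographic distances.

```r
# Hypothetical sketch: assign spatially clumped folds by hierarchical
# clustering of coordinates (planar distances assumed for simplicity).
geoFoldSketch <- function(xy, k) {
  d <- dist(as.matrix(xy))
  cutree(hclust(d, method = 'complete'), k = k)
}

# made-up sites in three clumps
set.seed(1)
sites <- rbind(
  cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
  cbind(rnorm(10, 10, 0.5), rnorm(10, 10, 0.5)),
  cbind(rnorm(10, 20, 0.5), rnorm(10, 0, 0.5))
)
folds <- geoFoldSketch(sites, k = 3)
table(folds)  # number of sites in each fold
```

Note that this makes no attempt to balance the number of sites per fold.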

geoThin: Thin geographic points so that none are within a given distance of one another [package enmSdm]

Thins geographic points such that none are within a user-defined distance of one another.  If ties exist, removes points with greatest number of neighbors first, then points closest to geographic center of all points.  See also geoThinApprox() below for a faster but random version.
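The removal rule can be sketched as a simple loop. This is not the geoThin code: the function name thinPoints is hypothetical, and it uses planar (Euclidean) distances for simplicity, whereas real longitude/latitude data would need geographic distances.

```r
# Hypothetical sketch of deterministic thinning with planar distances.
thinPoints <- function(xy, minDist) {
  xy <- as.matrix(xy)
  keep <- seq_len(nrow(xy))
  center <- colMeans(xy)  # center of all original points
  repeat {
    d <- as.matrix(dist(xy[keep, , drop = FALSE]))
    diag(d) <- Inf
    neighbors <- rowSums(d < minDist)  # neighbors that are too close
    if (all(neighbors == 0)) break
    # candidates: points with the most neighbors
    worst <- which(neighbors == max(neighbors))
    if (length(worst) > 1) {
      # tie: remove the candidate closest to the center of all points
      offsets <- sweep(xy[keep[worst], , drop = FALSE], 2, center)
      worst <- worst[which.min(sqrt(rowSums(offsets^2)))]
    }
    keep <- keep[-worst]
  }
  xy[keep, , drop = FALSE]
}

# made-up points: the second has two close neighbors and is removed first
pts <- cbind(x = c(0, 0.5, 1, 10), y = c(0, 0, 0, 10))
thinnedPts <- thinPoints(pts, minDist = 0.6)
```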

geoThinApprox: Thin geographic points so that none are within a given distance of one another [package enmSdm]

Thins geographic points such that none are within a user-defined distance of one another.  If ties exist, removes points with greatest number of neighbors first, and if ties among these exist, then removes a point randomly.  See also geoThin() above for a slower but deterministic version.

trainBrt, trainCrf, trainGam, trainGlm, trainLars, trainMaxEnt, trainMaxNet, trainNs, trainRf: Calibrate boosted regression trees, conditional random forests, generalized additive models, generalized linear models, least angle regression (with interactions and higher-order terms), Maxent (older and newer versions), natural splines, and random forests [package enmSdm]

These functions are wrappers for model-specific functions like glm() or maxent() that implement “best-practices” calibration, depending on the algorithm (e.g., AICc-based model selection for GAM, GLM, and Maxent, deviance reduction for BRTs, and so on).

yearFromDate: Returns a year from a messy date [package omnibus]

Have you ever had a list of dates all in different formats like “2012-01-29”, “Nov 23, 1973”, “12 Nov 18”, and so on? This script takes that list of dates and returns the year in which each occurred. When millennium and century cannot be inferred, a dummy value of “99” is prepended to the output (e.g., “71” becomes “9971”).
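A much-simplified version of the idea, using regular expressions in base R, might look like the sketch below. It is not the yearFromDate code and handles far fewer formats. The function name yearSketch is hypothetical, and it assumes that any four-digit number starting with 1 or 20 is the year and that an ambiguous two-digit year appears at the end of the string.

```r
# Hypothetical sketch: pull a year out of dates in mixed formats.
yearSketch <- function(x) {
  out <- rep(NA_integer_, length(x))
  # first choice: an unambiguous four-digit year (1000-1999 or 2000-2099)
  m <- regexpr('(?<![0-9])(1[0-9]{3}|20[0-9]{2})(?![0-9])', x, perl = TRUE)
  out[m > 0] <- as.integer(regmatches(x, m))
  # fallback: trailing two-digit year, flagged by prepending "99"
  todo <- which(is.na(out))
  if (length(todo) > 0) {
    m2 <- regexpr('[0-9]{2}$', x[todo])
    out[todo[m2 > 0]] <- 9900L + as.integer(regmatches(x[todo], m2))
  }
  out
}

years <- yearSketch(c('2012-01-29', 'Nov 23, 1973', '12 Nov 18', '71'))
```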

replaceDiacritics: Remove diacritics

Attempts to replace diacritically marked characters with unmarked characters (e.g., à, á, â, and ã all simply become “a”). Note that this won’t actually replace all characters, but it tries! Useful for matching names between different sources, some with and some without diacritics.
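For a handful of common lowercase characters, the same effect can be had with base R's chartr(). This sketch is not replaceDiacritics: the function name stripMarks is hypothetical, and the character map is deliberately short and covers lowercase letters only.

```r
# Hypothetical sketch: map a few marked lowercase characters to plain ones.
stripMarks <- function(x) {
  marked <- 'àáâãäåèéêëìíîïòóôõöùúûüñç'
  plain <- 'aaaaaaeeeeiiiiooooouuuunc'
  chartr(marked, plain, x)  # one-for-one character translation
}

cleaned <- stripMarks(c('São Tomé', 'Curaçao'))
```

Once marks are stripped from both sources, names can be compared directly with == or match().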


Others’ Useful Packages

Biodiversity: Data Access

auk: Access eBird data

helminthR: London Natural History Museum helminth parasite database

paleobioDB: Paleobiology Database

naturalis: Naturalis Biodiversity Center of the Netherlands

PresSPickR: GBIF, Bioatles, and potentially others

rcites: Data on species protected by CITES or CMS (Convention on Migratory Species)

rebird: Access eBird data

rredlist: IUCN Red List

rfishbase: FishBase

rgbif: Access GBIF data

rbison: Access BISON data

spocc: For downloading data from GBIF, BISON, AntWeb, and others.

Biodiversity: Data Cleaning

biogeo: Specimen data cleaning

CoordinateCleaner: Specimen data cleaning

spoccutils: Light cleaning and visualization of biodiversity data

Biodiversity: Taxonomy

taxonomyCleanr: Clean taxonomic data with a taxonomic name resolution service (TNRS)

wikitaxa: Taxonomic information from Wikipedia/Wikidata/Wikispecies

worms: Taxonomic information from the World Register of Marine Species (WoRMS)

Built Environments

osmdata: Access OpenStreetMap data

stplanr: Transport planning

Climate: Data Access

clifro: New Zealand National Climate Database

GSODR: Global Surface Summary of the Day (GSOD) weather data

rnoaa: NOAA weather data

prism: PRISM

weathercan: Environment and Climate Change Canada

Data Cleaning (General)

parsedate: Parses messy dates


blockCV: Creation of quasi-independent spatial or environmental cross-validation blocks (also GH)

dismo: The must-have package

iSdm: Invasive species distribution and niche modeling

GIS Data (General)

FedData: For downloading GIS data from several US government data sources (CRAN, GitHub)

geonames: Interface with GeoNames server

weathercan: Download weather station data from Environment and Climate Change Canada (ECCC)

GIS: Rasters

landscapetools: enhanced raster tools

raster: obviously!

rasterVis: raster plotting

rayshader: mind-blowing plotting

tiler: create raster tiles for geographic and non-geographic raster data

GIS: Polygons, Lines, and Points

recogeo: reconcile geographic differences between shapefiles of the same features


Data-to-Viz: taxonomy of graphics

viridis: palette

rayshader: great for geographic data


brranching: Obtain phylogenies

taxa: Data structures for taxonomies


rmarkdown: book for free!

Remote Sensing: Data Access

rLandsat: LANDSAT data

MODIStsp: MODIS time series

smapr: NASA’s Soil Moisture Active-Passive data

Statistics (General)

ggeffects: Calculate marginal effects

“Uber” plot for evaluating models in one set of commands

Statistics: Bayesian

tidybayes: extracting/visualizing posteriors from popular platforms

Statistics: Geographic

landscapetools: neutral landscape models

Taxonomic Name Resolution Services

ritis: Also see ITIS