R Tools

Below are some useful R scripts I’ve written for handling biodiversity data and species distribution/ecological niche modeling. If you use any of them, all I ask is that you please let me know. You can install two packages that contain most of the functions below (plus many more) directly from GitHub! Just follow these directions:

install.packages('devtools') # if you haven't done this already
# you may also need to install the "sp", "dismo", "raster", and other
# packages (all from CRAN), depending on the functions you wish to use


Below are some of the functions in these packages (plus a few more).

art: Aligned rank transformation for non-parametric ANOVA [package omnibus]

To date, you cannot perform a non-parametric ANOVA on two factors unless one of them is random, and if you have three factors then you’re out of luck. Alternatively, non-parametric data can be transformed to meet normality assumptions and analyzed in a normal ANOVA framework. This script implements the aligned rank transform for up to 4 factors. You can also download the stand-alone program ARTools by Jacob Wobbrock et al. See their short article for a succinct explanation. Note: By default the script uses the mean as the measure of centrality. Other metrics can be used, though the mean is likely the best (Peterson 2002).
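As a rough illustration of the aligned rank transform for two factors, here is a base-R sketch. It is not the art function itself: the simulated data and all variable names are made up for the example, and it uses the mean as the measure of centrality. For each effect, the response is aligned (residual plus the estimated effect of interest), ranked, and then analyzed in a full-factorial ANOVA in which only the aligned-for effect is interpreted.

```r
# Simulated, balanced two-factor data (hypothetical example data)
set.seed(1)
d <- expand.grid(rep = 1:10, A = c("a1", "a2"), B = c("b1", "b2", "b3"))
d$y <- rexp(nrow(d)) + ifelse(d$A == "a2", 1, 0)  # skewed response with an effect of A

# Means needed for alignment
grand    <- mean(d$y)
cellMean <- ave(d$y, d$A, d$B)  # mean of each A x B cell
aMean    <- ave(d$y, d$A)       # marginal mean for each level of A
bMean    <- ave(d$y, d$B)       # marginal mean for each level of B

# Residuals after removing all effects
res <- d$y - cellMean

# Aligned responses: residual + estimated effect of interest
alignedA  <- res + (aMean - grand)
alignedB  <- res + (bMean - grand)
alignedAB <- res + (cellMean - aMean - bMean + grand)

# Rank each aligned response (ties get average ranks)
d$rankA  <- rank(alignedA)
d$rankB  <- rank(alignedB)
d$rankAB <- rank(alignedAB)

# Full-factorial ANOVA on each set of ranks; read off only the matching effect
pA  <- anova(lm(rankA  ~ A * B, data = d))["A",   "Pr(>F)"]
pB  <- anova(lm(rankB  ~ A * B, data = d))["B",   "Pr(>F)"]
pAB <- anova(lm(rankAB ~ A * B, data = d))["A:B", "Pr(>F)"]

print(c(A = pA, B = pB, AB = pAB))
```

Each aligned response gets its own ANOVA because aligning for one effect strips out the others, so the remaining rows of each table should be ignored.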

trainMaxEnt: Calibrate Maxent regularization parameter using AICc (uses Maxent 3.3.3k) [package enmSdm]

trainMaxNet: Calibrate Maxent regularization parameter using AICc (uses Maxent 3.4.1+ — also known as “Maxnet”) [package enmSdm]

These functions follow Warren & Seifert’s algorithm for calibrating the master regularization parameter (beta) using AICc. They also test all possible combinations of feature classes.

Special note: These scripts do not use rasters to do the AIC calculation, unlike what is proposed in Warren & Seifert. Using the rasters takes a long time and, if you’re using non-random background sites anyway, is inappropriate. This important tidbit was gleaned from a personal communication from Dan Warren cited in Wright, A.N., Hijmans, R.J., Schwartz, M.W., and Shaffer, H.B. 2015. Multiple sources of uncertainty affect metrics for ranking conservation risk under climate change. Diversity and Distributions 21:111-122.
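To make the raster-free AICc idea concrete, here is a minimal base-R sketch. It assumes you already have Maxent raw output at the presence sites and at the background sites, plus the number of parameters k (coefficients with non-zero weight). The function name aiccSketch and its arguments are hypothetical, not the signature of trainMaxEnt or trainMaxNet, and returning Inf for over-parameterized models is my own choice for the example.

```r
# AICc from raw Maxent predictions at presences and background sites.
# rawPres: raw output at presence sites (hypothetical argument name)
# rawBg:   raw output at background sites (hypothetical argument name)
# k:       number of model parameters with non-zero weight
aiccSketch <- function(rawPres, rawBg, k) {
  n <- length(rawPres)

  # Standardize so predictions sum to 1 across presences + background,
  # instead of across every cell of a raster
  total  <- sum(rawPres, rawBg)
  logLik <- sum(log(rawPres / total))

  # The small-sample correction is undefined when k >= n - 1
  if (k >= n - 1) return(Inf)

  2 * k - 2 * logLik + (2 * k * (k + 1)) / (n - k - 1)
}

# Toy usage: 10 presences, 90 background sites, 2 parameters
aicc <- aiccSketch(rawPres = rep(1, 10), rawBg = rep(1, 90), k = 2)
print(aicc)
```

In a calibration loop you would compute this value for each combination of beta and feature classes and keep the model with the lowest AICc.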

contBoyce: Calculate the Continuous Boyce Index [package enmSdm]

The CBI is a measure of model accuracy like AUC, but specifically designed for cases where one has no true absences.  See Boyce et al. (2002) for the Boyce Index and Hirzel et al. (2006) for the continuous version, which this function calculates.
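The idea behind the continuous version can be sketched in a few lines of base R. This is a simplified illustration, not the contBoyce function: the name cbiSketch, its arguments, and the default window settings are assumptions for the example. A window slides across the range of predictions; in each window the proportion of presences is divided by the proportion of background sites (the predicted-to-expected ratio), and the index is the Spearman correlation between that ratio and the window midpoint.

```r
# pres: model predictions at presence sites
# bg:   model predictions at background sites
# numBins, binWidth: number of windows and window width as a
#   fraction of the prediction range (hypothetical defaults)
cbiSketch <- function(pres, bg, numBins = 101, binWidth = 0.1) {
  rng   <- range(c(pres, bg))
  width <- binWidth * diff(rng)

  # Lower and upper edges of overlapping moving windows
  lows  <- seq(rng[1], rng[2] - width, length.out = numBins)
  highs <- lows + width

  # Predicted-to-expected ratio in each window
  pe <- mapply(function(lo, hi) {
    p <- mean(pres >= lo & pres <= hi)  # proportion of presences in window
    e <- mean(bg >= lo & bg <= hi)      # proportion of background in window
    if (e == 0) NA else p / e
  }, lows, highs)

  mids <- (lows + highs) / 2
  cor(mids, pe, method = "spearman", use = "complete.obs")
}

# Toy usage: presences concentrated at high predicted values
set.seed(1)
pres <- rbeta(200, 3, 1)
bg   <- runif(2000)
cbi  <- cbiSketch(pres, bg)
print(cbi)
```

Values near 1 mean presences become steadily more frequent, relative to background, as predictions increase. Values near 0 mean the model does no better than random.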

deconNet: Deconstruct network

Takes a network object created in the network package and returns a network with all connected vertices removed (i.e., just those nodes with no edges). If the network is fully saturated and allows loops, returns an empty network object. Otherwise, if it is fully saturated, the function returns a network with a single vertex. Tip: Sometimes coercing a set of geographic points into a network (with edges defined by some minimum distance) then applying this function is faster than the geoThin function.

elimCellDups: Eliminate duplicate points in a raster cell  [package enmSdm]

Takes a data frame with records that have coordinates, overlays it with a raster, and returns a data frame with just one record per cell.
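The core of the idea can be shown without any spatial packages. The sketch below assumes a regular grid described by its western edge (xmin), northern edge (ymax), cell size (res), and number of columns (ncol), and assumes coordinate columns named lon and lat. All of these names, and the function onePerCell, are hypothetical; the real function works with a raster object instead.

```r
# Keep the first record that falls in each grid cell.
onePerCell <- function(df, xmin, ymax, res, ncol) {
  # Column and row of the cell containing each point
  col <- floor((df$lon - xmin) / res) + 1
  row <- floor((ymax - df$lat) / res) + 1

  # Single cell index, numbered row by row from the top-left corner
  cell <- (row - 1) * ncol + col

  # Drop every record after the first one in a cell
  df[!duplicated(cell), , drop = FALSE]
}

# Toy usage: records 1 and 2 share a cell, so record 2 is dropped
recs <- data.frame(
  id  = 1:4,
  lon = c(0.2, 0.7, 1.5, 0.5),
  lat = c(9.8, 9.1, 9.5, 8.5)
)
thinned <- onePerCell(recs, xmin = 0, ymax = 10, res = 1, ncol = 10)
print(thinned$id)  # 1 3 4
```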

geoThin: Thin geographic points so that none are within a given distance of one another [package enmSdm]

Thins geographic points such that none are within a user-defined distance of one another.  If ties exist, removes points with greatest number of neighbors first, then points closest to geographic center of all points.  See also geoThinApprox() below for a faster but random version.
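Here is a minimal base-R sketch of this kind of deterministic thinning. It is not the geoThin function: the name thinSketch is made up, it assumes projected coordinates so that plain Euclidean distance is meaningful, and the tie-breaking rule reflects my reading of the description above (most neighbors first, then closest to the center of all original points).

```r
# xy:      two-column matrix or data frame of projected coordinates
# minDist: minimum allowed distance between any two retained points
thinSketch <- function(xy, minDist) {
  xy     <- as.matrix(xy)
  center <- colMeans(xy)       # geographic center of all original points
  keep   <- seq_len(nrow(xy))  # indices of points still retained

  repeat {
    if (length(keep) < 2) break

    # Pairwise distances among retained points
    d <- as.matrix(dist(xy[keep, , drop = FALSE]))
    diag(d) <- Inf

    # Number of too-close neighbors for each retained point
    neighbors <- rowSums(d < minDist)
    if (all(neighbors == 0)) break

    # Candidates for removal: points with the most neighbors
    worst <- which(neighbors == max(neighbors))

    # Break ties by removing the candidate closest to the center
    if (length(worst) > 1) {
      offsets <- sweep(xy[keep[worst], , drop = FALSE], 2, center)
      worst   <- worst[which.min(sqrt(rowSums(offsets^2)))]
    }

    keep <- keep[-worst]
  }

  xy[keep, , drop = FALSE]
}

# Toy usage on random points in the unit square
set.seed(42)
pts <- cbind(runif(50), runif(50))
out <- thinSketch(pts, minDist = 0.2)
print(nrow(out))
```

Recomputing the full distance matrix on every pass is simple but slow for large data sets, which is where an approximate or network-based approach becomes attractive.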

geoThinApprox: Thin geographic points so that none are within a given distance of one another [package enmSdm]

Thins geographic points such that none are within a user-defined distance of one another.  If ties exist, removes points with greatest number of neighbors first, and if ties among these exist, then removes a point randomly.  See also geoThin() above for a slower but deterministic version.

returnYear: Return year from messy dates

Have you ever had a list of dates all in different formats like “2012-01-29”, “Nov 23, 1973”, “12 Nov 18”, and so on? This script takes that list of dates and returns the year in which each occurred. When millennium and century cannot be inferred, a dummy value of “99” is prepended to the output (e.g., “71” becomes “9971”).
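A much-simplified sketch of the approach, using only base R regular expressions, might look like the following. It is not the returnYear script: the name yearSketch is hypothetical, and the rule for two-digit years (take the last two-digit number in the string) is a simplifying assumption that will guess wrong for some formats.

```r
yearSketch <- function(dates) {
  vapply(dates, function(d) {
    # Prefer an unambiguous stand-alone four-digit year
    m4 <- regmatches(d, regexpr("(?<![0-9])[0-9]{4}(?![0-9])", d, perl = TRUE))
    if (length(m4) == 1) return(as.integer(m4))

    # Otherwise assume the last stand-alone two-digit number is the year
    # and prepend the dummy "99" for the unknown century
    m2 <- regmatches(d, gregexpr("(?<![0-9])[0-9]{2}(?![0-9])", d, perl = TRUE))[[1]]
    if (length(m2) > 0) return(as.integer(paste0("99", m2[length(m2)])))

    NA_integer_
  }, integer(1), USE.NAMES = FALSE)
}

years <- yearSketch(c("2012-01-29", "Nov 23, 1973", "12 Nov 18", "71"))
print(years)  # 2012 1973 9918 9971
```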

replaceDiacritics: Remove diacritics

Attempts to replace diacritically marked characters with unmarked characters (e.g., à, á, â, and ã all simply become “a”). Note that this won’t actually replace all characters, but it tries! Useful for matching names between different sources, some with and some without diacritics.
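One simple way to do this kind of replacement in base R is a character-for-character lookup with chartr(). The sketch below is only an illustration with a deliberately short lookup table. The name stripSketch is made up, and the actual script covers far more characters.

```r
stripSketch <- function(x) {
  # Marked characters, written as Unicode escapes: a-grave, a-acute,
  # a-circumflex, a-tilde, a-diaeresis, e-grave, e-acute, e-circumflex,
  # e-diaeresis, n-tilde
  marked <- "\u00e0\u00e1\u00e2\u00e3\u00e4\u00e8\u00e9\u00ea\u00eb\u00f1"

  # Unmarked replacements, in the same order and of the same length
  plain <- "aaaaaeeeen"

  chartr(marked, plain, x)
}

cleaned <- stripSketch(c("Pe\u00f1a", "\u00e0\u00e1\u00e2\u00e3"))
print(cleaned)  # "Pena" "aaaa"
```

Characters missing from the lookup table pass through unchanged, which is why a replacement like this can only ever be a best effort.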


Others’ Useful Packages


Biodiversity: Data Access

spocc: For downloading data from GBIF, BISON, AntWeb, and others.

Biodiversity: Data Cleaning

biogeo: For cleaning biodiversity data.

spoccutils: Light cleaning and visualization of biodiversity data.

Built Environments

osmdata: Access Open Street Map data

stplanr: Transport planning

GIS Data (General)

FedData: For downloading GIS data from several US government data sources (CRAN, GitHub)


brranching: Obtain phylogenies


ggeffects: Calculate marginal effects

Taxonomic Name Resolution Services

ritis: Also see ITIS