## R Code

Below are some useful R scripts I’ve written for handling biodiversity data and species distribution modeling. If you use any of them, all I ask is that you please let me know.

**New**: You can download two packages that contain all of these functions and more. The packages will soon be on CRAN!

**omnibus (zip):** collection of functions (needed for enmSdm)

**enmSdm (zip):** functions for parameterizing species distribution/ecological niche models

*art*: Aligned rank transformation for non-parametric ANOVA

To date, you cannot perform a non-parametric ANOVA on two or more factors unless there are exactly two factors and one of them is random (if you have three factors then you’re out of luck). Alternatively, non-parametric data can be transformed to meet normality assumptions and analyzed in a normal ANOVA framework. This script implements the aligned rank transform for up to 4 factors. You can also download the stand-alone program ARTool by Jacob Wobbrock et al. See their short article for a succinct explanation. Note: By default the script uses the mean as the measure of central tendency. Other metrics can be used, though the mean is likely the best (Peterson 2002).
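As a rough illustration of the alignment step, here is a minimal base R sketch for the two-factor case only. The function name `alignRankSketch` and its arguments are hypothetical and are not the interface of the *art* script. It centers on the mean, as the script does by default.

```r
# Sketch of the aligned rank transform for two crossed factors.
# Hypothetical helper, not the interface of the art script.
alignRankSketch <- function(y, a, b) {
  grand <- mean(y)
  cellMean <- ave(y, a, b)   # mean of each a-by-b cell
  aMean <- ave(y, a)         # marginal mean for each level of a
  bMean <- ave(y, b)         # marginal mean for each level of b
  resid <- y - cellMean      # strip all effects, leaving residuals

  # Add back only the effect of interest, one column per effect
  aligned <- data.frame(
    a = resid + (aMean - grand),
    b = resid + (bMean - grand),
    ab = resid + (cellMean - aMean - bMean + grand)
  )
  # Rank each aligned response (ties get average ranks)
  ranks <- as.data.frame(lapply(aligned, rank))
  list(aligned = aligned, ranks = ranks)
}

# Toy 2 x 2 design with three replicates per cell
y <- c(1.2, 0.8, 1.5, 3.1, 2.7, 3.6, 2.0, 2.4, 1.9, 9.5, 8.7, 10.2)
a <- factor(rep(c("a1", "a2"), each = 6))
b <- factor(rep(rep(c("b1", "b2"), each = 3), times = 2))
art <- alignRankSketch(y, a, b)

# Ordinary ANOVA on the ranks aligned for factor a
fitA <- anova(lm(art$ranks$a ~ a * b))
```

Each set of ranks is analyzed in its own ANOVA, and only the term the ranks were aligned for (here, the row for `a`) should be interpreted.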

*trainMaxEnt*: Calibrate Maxent regularization parameter using AICc (uses Maxent 3.3.3k)

This function follows Warren & Seifert’s algorithm for calibrating the master regularization parameter (beta) using AICc. It duplicates the same function in ENMTools. *Updated May 8th, 2017: Now cycles through all possible combinations of feature classes.*

Special note: These scripts do not use rasters to do the AIC calculation, unlike what is proposed in Warren & Seifert. Using the rasters takes a *long* time and, if you’re using non-random background sites anyway, is inappropriate. This important tidbit was gleaned from a personal communication from Dan Warren noted in Wright, A.N., Hijmans, R.J., Schwartz, M.W., and Shaffer, H.B. 2015. Multiple sources of uncertainty affect metrics for ranking conservation risk under climate change. *Diversity and Distributions* 21:111-122.
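To make the site-based calculation concrete, here is a minimal sketch of AICc computed from raw Maxent output at presence and background sites. The helper `aiccSketch` is hypothetical and is not the code inside *trainMaxEnt*. Normalizing the raw values over the presence plus background sites, and returning `NA` when there are too many parameters, are assumptions of this sketch.

```r
# AICc from raw Maxent output at sites rather than across whole rasters.
# Hypothetical helper; the normalization over presence + background sites
# is an assumption of this sketch.
aiccSketch <- function(rawPres, rawBg, k) {
  n <- length(rawPres)                 # number of presence sites
  if (k >= n - 1) return(NA_real_)     # small-sample correction is undefined or negative
  total <- sum(rawPres, rawBg)         # rescale so raw values sum to 1 across sites
  logLik <- sum(log(rawPres / total))  # log-likelihood of the presences
  -2 * logLik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
}

# Example: 10 presences, 90 background sites, 2 non-zero model coefficients
aicc <- aiccSketch(rawPres = rep(1, 10), rawBg = rep(1, 90), k = 2)
```

Here `k` stands for the number of model coefficients that are not zero. The beta value with the lowest AICc would be the one selected.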

*contBoyce*: Calculate the Continuous Boyce Index

The CBI is a measure of model accuracy like AUC, but specifically designed for cases where one has no true absences. See Boyce et al. (2002) for the Boyce Index and Hirzel et al. (2006) for the continuous version, which this function calculates. *Updated May 8th, 2017: Added the ability for the user to specify weights assigned to presence/background sites.*
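The idea behind the index can be sketched in a few lines: slide overlapping windows across the range of predictions, compute the ratio of the proportion of presences to the proportion of background sites in each window, and correlate that ratio with the window position. The function `cbiSketch` below and its defaults are hypothetical. It ignores site weights and is not the *contBoyce* code.

```r
# Minimal continuous Boyce index: Spearman correlation between window
# position and the predicted-to-expected (P/E) ratio.
# Hypothetical helper without support for site weights.
cbiSketch <- function(pres, bg, numBins = 101, binWidth = 0.1) {
  lo <- min(pres, bg)
  hi <- max(pres, bg)
  width <- binWidth * (hi - lo)
  starts <- seq(lo, hi - width, length.out = numBins)
  ends <- starts + width

  pe <- vapply(seq_along(starts), function(i) {
    p <- mean(pres >= starts[i] & pres <= ends[i])  # proportion of presences
    e <- mean(bg >= starts[i] & bg <= ends[i])      # proportion of background
    if (e == 0) NA_real_ else p / e
  }, numeric(1))

  mids <- (starts + ends) / 2
  ok <- !is.na(pe)
  cor(mids[ok], pe[ok], method = "spearman")
}

# Toy predictions: presences concentrated at high values
bg <- (1:1000) / 1000
pres <- bg^(1 / 4)
cbi <- cbiSketch(pres, bg)
```

Values near 1 indicate that presences become more frequent, relative to background, as predictions increase. Values near 0 indicate a model no better than random, and negative values indicate a model that predicts the reverse of the observed pattern.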

Takes a network object created in the *network* package and returns a network with all connected vertices removed (i.e., just those nodes with no edges). If the network is fully saturated and allows loops, returns an empty network object. Otherwise, if it is fully saturated, the function returns a network with a single vertex. Tip: Sometimes coercing a set of geographic points into a network (with edges defined by some minimum distance) then applying a function is faster than the *geogThin* function.

*elimCellDups*: Eliminate duplicate points in a raster cell

Takes a data frame with records that have coordinates, overlays it with a raster, and returns a data frame with just one record per cell.
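The core of the operation is to assign each record a cell identifier and drop repeats. Here is a minimal base R sketch that describes the grid by its west edge, north edge, and cell size instead of a raster object. The function `onePerCellSketch` and its argument names are hypothetical, not those of *elimCellDups*.

```r
# Keep the first record in each grid cell.
# Hypothetical helper: the grid is described by its west edge (xmin),
# north edge (ymax), and cell size (res) instead of a raster object.
onePerCellSketch <- function(df, xmin, ymax, res,
                             x = "longitude", y = "latitude") {
  col <- floor((df[[x]] - xmin) / res)   # column index, counted from the west
  row <- floor((ymax - df[[y]]) / res)   # row index, counted from the north
  cellId <- paste(row, col)
  df[!duplicated(cellId), , drop = FALSE]
}

recs <- data.frame(
  species = c("sp1", "sp1", "sp1"),
  longitude = c(0.1, 0.2, 1.5),
  latitude = c(0.1, 0.3, 0.5)
)
# The first two records share a cell, so only the first of them is kept
thinned <- onePerCellSketch(recs, xmin = 0, ymax = 1, res = 1)
```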

*geogThin*: Thin geographic points so that none are within a given distance of one another

Thins geographic points such that none are within a user-defined distance of one another. Removes points with the greatest number of neighbors first and, if ties exist, then points closest to the geographic center of all points. See also *geogThinApprox()* below for a faster but random version.
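The removal rule can be sketched as a simple loop. The function `thinSketch` below is hypothetical and is not the *geogThin* code. For brevity it assumes planar coordinates and Euclidean distance, whereas longitude/latitude data would need great-circle distances.

```r
# Deterministic thinning: repeatedly drop the point with the most neighbors
# closer than minDist; break ties by dropping the point nearest the centroid.
# Hypothetical helper using planar (Euclidean) distance.
thinSketch <- function(xy, minDist) {
  xy <- as.matrix(xy)
  keep <- seq_len(nrow(xy))
  center <- colMeans(xy)   # geographic center of all original points

  repeat {
    d <- as.matrix(dist(xy[keep, , drop = FALSE]))
    tooClose <- d < minDist
    diag(tooClose) <- FALSE
    nNeigh <- rowSums(tooClose)
    if (all(nNeigh == 0)) break

    cand <- which(nNeigh == max(nNeigh))   # most crowded points
    offsets <- sweep(xy[keep[cand], , drop = FALSE], 2, center)
    toCenter <- sqrt(rowSums(offsets^2))
    keep <- keep[-cand[which.min(toCenter)]]
  }
  keep   # row indices of the retained points
}

pts <- cbind(x = c(0, 1, 2, 10), y = c(0, 0, 0, 0))
kept <- thinSketch(pts, minDist = 1.5)
```

In this toy example the second point has two neighbors within 1.5 units, so it is removed first, after which no remaining pair is too close.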

*geogThinApprox*: Thin geographic points so that none are within a given distance of one another

Thins geographic points such that none are within a user-defined distance of one another. If ties exist, removes points with greatest number of neighbors first, and if ties among these exist, then removes a point randomly. See also *geogThin()* above for a slower but deterministic version.

*returnYear*: Return year from messy dates

Have you ever had a list of dates all in different formats like “2012-01-29”, “Nov 23, 1973”, “12 Nov 18”, and so on? This script takes such a list of dates and returns the year in which each occurred. When millennium and century cannot be inferred, a dummy value of “99” is prepended to the output (e.g., “71” becomes “9971”).
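A much-simplified sketch of the idea uses two regular expressions. The function `yearSketch` is hypothetical and handles far fewer formats than *returnYear*. It assumes that a two-digit year, when present, is the last item in the string, and it returns integers.

```r
# Extract a year from messy date strings.
# Hypothetical helper; assumes a two-digit year is the last item in the string.
yearSketch <- function(x) {
  out <- rep(NA_character_, length(x))

  # First choice: a stand-alone run of exactly four digits
  four <- regexpr("(?<![0-9])[0-9]{4}(?![0-9])", x, perl = TRUE)
  out[four > 0] <- regmatches(x, four)

  # Fallback: a trailing two-digit year, flagged with the dummy "99" prefix
  two <- regexpr("(?<![0-9])[0-9]{2}$", x, perl = TRUE)
  twoDigit <- rep(NA_character_, length(x))
  twoDigit[two > 0] <- regmatches(x, two)
  useTwo <- is.na(out) & two > 0
  out[useTwo] <- paste0("99", twoDigit[useTwo])

  as.integer(out)
}

years <- yearSketch(c("2012-01-29", "Nov 23, 1973", "12 Nov 18", "71"))
```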

*replaceDiacritics*: Remove diacritics

Attempts to replace diacritically marked characters with their unmarked equivalents (e.g., à, á, â, and ã all simply become “a”). Note that this won’t actually replace all characters, but it tries! Useful for matching names between different sources, some with and some without diacritics.
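One simple way to do this is with a lookup table of regular expressions. The function `stripMarksSketch` below is hypothetical and is not the *replaceDiacritics* code. It covers only a handful of lowercase Latin letters, written as Unicode escapes so the code does not depend on the file's encoding.

```r
# Replace a few common lowercase diacritics with plain letters.
# Hypothetical helper with a deliberately incomplete lookup table.
stripMarksSketch <- function(x) {
  lookup <- c(
    a = "[\u00e0\u00e1\u00e2\u00e3\u00e4\u00e5]",
    e = "[\u00e8\u00e9\u00ea\u00eb]",
    i = "[\u00ec\u00ed\u00ee\u00ef]",
    o = "[\u00f2\u00f3\u00f4\u00f5\u00f6]",
    u = "[\u00f9\u00fa\u00fb\u00fc]",
    n = "\u00f1",
    c = "\u00e7"
  )
  for (plain in names(lookup)) {
    x <- gsub(lookup[[plain]], plain, x, perl = TRUE)
  }
  x
}

cleaned <- stripMarksSketch("S\u00e3o Tom\u00e9")
```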