--- title: "An Introduction to iNEXT.beta3D via Examples" author: "Anne Chao and Kai-Hsiang Hu" date: "Latest version 1.1.0 (Nov. 2024)" output: rmarkdown::html_vignette: vignette: > %\VignetteIndexEntry{An Introduction to iNEXT.beta3D via Examples} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.retina = 2, fig.align = 'center', fig.width = 7, fig.height = 5, warning = FALSE, message = FALSE) options("width" = 200, digits = 3) library(iNEXT.beta3D) ``` The Latest Update in Nov. 2024: In earlier versions, diversity decomposition (alpha, beta, gamma, and dissimilarity) was performed only for all assemblages of datasets. In the updated version, we have added a logical argument "by_pair" in the main function "iNEXTbeta3D" to specify whether diversity decomposition will be performed for pairs of assemblages or not. If "by_pair = TRUE", alpha/beta/gamma diversity or dissimilarity will be computed for all pairs of assemblages in the input data; if "by_pair = FALSE", alpha/beta/gamma diversity or dissimilarity will be computed for K assemblages (i.e., K can be greater than two) when data for K assemblages are provided in the input data. Default is "by_pair = FALSE". \ The package `iNEXT.beta3D` (iNterpolation and EXTrapolation with beta diversity for three dimensions of biodiversity) is a sequel to `iNEXT.` The three dimensions (3D) of biodiversity include taxonomic diversity (TD), phylogenetic diversity (PD) and functional diversity (FD). This document provides an introduction demonstrating how to run `iNEXT.beta3D`. An online version [iNEXT.beta3D Online](https://chao.shinyapps.io/iNEXT_beta3D/) is also available for users without an R background. A unified framework based on Hill numbers and their generalizations is adopted to quantify TD, PD and FD. TD quantifies the effective number of species, mean-PD (PD divided by tree depth) quantifies the effective number of lineages, and FD quantifies the effective number of virtual functional groups (or functional "species"). Thus, TD, mean-PD, and FD are all in the same units of species/lineage equivalents and can be meaningfully compared; see Chao et al. (2021) for a review of the unified framework. For each of the three dimensions, `iNEXT.beta3D` focuses on the multiplicative diversity decomposition (alpha, beta and gamma) of orders q = 0, 1 and 2 based on sampling data. Beta diversity quantifies the extent of among-assemblage differentiation, or the changes in species/lineages/functional-groups composition and abundance among assemblages. `iNEXT.beta3D` features standardized 3D estimates with a common sample size (for alpha and gamma diversity) or sample coverage (for alpha, beta and gamma diversity). `iNEXT.beta3D` also features coverage-based standardized estimates of four classes of dissimilarity measures. Based on the rarefaction and extrapolation (R/E) method for Hill numbers (TD) of orders q = 0, 1 and 2, Chao et al. (2023b) developed the pertinent R/E theory for taxonomic beta diversity with applications to real-world spatial, temporal and spatio-temporal data. An application to Gentry's global forest data along with a concise description of the theory is provided in Chao et al. (2023a). The extension to phylogenetic and functional beta diversity is generally parallel. The `iNEXT.beta3D` package features two types of R/E sampling curves: 1. Sample-size-based (or size-based) R/E sampling curves: This type of sampling curve plots standardized 3D gamma and alpha diversity with respect to sample size. Note that the size-based beta diversity is not a statistically valid measure (Chao et al. 2023b) and thus the corresponding sampling curve is not provided. 2. Sample-coverage-based (or coverage-based) R/E sampling curves: This type of sampling curve plots standardized 3D gamma, alpha, and beta diversity as well as four classes of dissimilarity measures with respect to sample coverage (an objective measure of sample completeness). Sufficient data are needed to run `iNEXT.beta3D`. If your data comprise only a few species and their abundances/phylogenies/traits, it is probable that the data lack sufficient information to run `iNEXT.beta3D`. ## HOW TO CITE iNEXT.beta3D If you publish your work based on results from `iNEXT.beta3D`, you should make reference to at least one of the following methodology papers (2023a, b) and also cite the `iNEXT.beta3D` package: - Chao, A., Chiu, C.-H., Hu, K.-H., and Zeleny, D. (2023a). Revisiting Alwyn H. Gentry's forest transect data: a statistical sampling-model-based approach. Japanese Journal of Statistics and Data Science, 6, 861-884. (https://doi.org/10.1007/s42081-023-00214-1) - Chao, A., Thorn, S., Chiu, C.-H., Moyes, F., Hu, K.-H., Chazdon, R. L., Wu, J., Magnago, L. F. S., Dornelas, M., Zeleny, D., Colwell, R. K., and Magurran, A. E. (2023b). Rarefaction and extrapolation with beta diversity under a framework of Hill numbers: the iNEXT.beta3D standardization. Ecological Monographs e1588.(https://doi.org/10.1002/ecm.1588) - Chao, A. and Hu, K.-H. (2023). The iNEXT.beta3D package: interpolation and extrapolation with beta diversity for three dimensions of biodiversity. R package available from CRAN. ## SOFTWARE NEEDED TO RUN iNEXT.beta3D IN R - Required: [R](https://cran.r-project.org/) - Suggested: [RStudio IDE](https://posit.co/products/open-source/rstudio/#Desktop) ## HOW TO RUN iNEXT.beta3D: The `iNEXT.beta3D` package is available from CRAN and can be downloaded from Anne Chao's Github [iNEXT.beta3D_github](https://github.com/AnneChao/iNEXT.beta3D) using the following commands. For a first-time installation, additional visualization extension package (`ggplot2` from CRAN) and relevant package (`iNEXT.3D` from CRAN) must be installed and loaded. ```{r eval=FALSE} ## install iNEXT.beta3D package from CRAN install.packages("iNEXT.beta3D") ## install the latest version from github install.packages('devtools') library(devtools) install_github('AnneChao/iNEXT.beta3D') ## import packages library(iNEXT.beta3D) ``` There are three main functions in this package: - **iNEXTbeta3D**: computes standardized 3D estimates with a common sample size (for alpha and gamma diversity) or sample coverage (for alpha, beta and gamma diversity) for default sample sizes or coverage values. This function also computes coverage-based standardized 3D estimates of four classes of dissimilarity measures for default coverage values. In addition, this function also computes standardized 3D estimates with a particular vector of user-specified sample sizes or coverage values. - **ggiNEXTbeta3D**: Visualizes the output from the function `iNEXTbeta3D`. - **DataInfobeta3D**: Provides basic data information for (1) the reference sample in each assemblage, (2) the gamma reference sample in the pooled assemblage, and (3) the alpha reference sample in the joint assemblage. ## DATA INPUT FORMAT To assess beta diversity among assemblages, information on shared/unique species and their abundances is required. Thus, species identity (or any unique identification code) and assemblage affiliation must be provided in the data. In any input dataset, set row name of the data to be species name (or identification code) and column name to be assemblage name. Two types of species abundance/incidence data are supported: 1. Individual-based abundance data (`datatype = "abundance"`): Input data for a single dataset with N assemblages consist of a species-by-assemblage abundance matrix/data.frame. Users can input several datasets which may represent data collected from various localities, regions, plots, time periods, ..., etc. Input data for multiple datasets then consist of a list of matrices; each matrix represents a species-by-assemblage abundance matrix for one of the datasets. Different datasets can have different numbers of assemblages. `iNEXTbeta3D` computes beta diversity and dissimilarity among assemblages within each dataset. 2. Sampling-unit-based incidence raw data (`datatype = "incidence_raw"`): Input data for a dataset with N assemblages consist of a list of matrices/data.frames, with each matrix representing a species-by-sampling-unit incidence raw matrix for one of the N assemblages; each element in the incidence raw matrix is 1 for a detection, and 0 for a non-detection. Users can input several datasets. Input data then consist of multiple lists with each list comprising a list of species-by-sampling-unit incidence matrices; see an example below. The number of sampling units can vary with datasets (but within a dataset, the number of sampling units in each assemblage must be the same). `iNEXTbeta3D` computes beta diversity and dissimilarity among assemblages within each dataset based on incidence-based frequency counts obtained from all sampling units. #### Species abundance data format We use the tree species abundance data collected from three rainforest fragments/localities in Brazil to assess beta diversity between Edge and Interior assemblages/habitats within each fragment; see Chao et al. (2023b) for analysis details. The data (named `"Brazil_rainforests"`) consist of a list of three matrices (for three fragments named "Marim", "Rebio2", and "Rochedo", respectively); each matrix represents a species-by-assemblage abundance matrix, and there are two assemblages ("Edge" and "Interior") in each fragment. The demo data are slightly different from those analyzed in Chao et al. (2023b) because seven species are removed from the original pooled data due to lack of phylogenetic information. Run the following code to view the data: (Here we only show the first 15 rows for each matrix.) ```{r eval=FALSE} data(Brazil_rainforests) Brazil_rainforests ``` ```{r echo=FALSE} data(Brazil_rainforests) lapply(Brazil_rainforests, function(x) x[1:15,]) ``` #### Species incidence raw data format We use tree species data collected from two second-growth rainforests, namely Cuatro Rios (CR) and Juan Enriquez (JE) in Costa Rica, as demo data to assess temporal beta diversity between two years (2005, 2011, and 2017) within each forest. Each year is designated as an assemblage. The data in each forest were collected from a 1-ha (50 m x 200 m) forest plot. Because individual trees of some species may exhibit intra-specific aggregation within a 1 ha area, they may not be suitable for modelling as independent sampling units. In this case, it is statistically preferable to first convert species abundance records in each forest to occurrence or incidence (detection/non-detection) data in subplots/quadrats; see Chao et al. (2023b) for analysis details. Each 1-ha forest was divided into 100 subplots (each with 0.01 ha) and only species' incidence records in each subplot were used to compute the incidence frequency for a species (i.e., the number of subplots in which that species occurred). By treating the incidence frequency of each species among subplots as a "proxy" for its abundance, the `iNEXT.beta3D` standardization can be adapted to deal with spatially aggregated data and to avoid the effect of intra-specific aggregation. The data (named `"Second_growth_forests"`) consist of two lists (for two forests named "CR 2005 vs. 2011 vs. 2017" and "JE 2005 vs. 2011 vs. 2017", respectively). Each list consists of three matrices; the first matrix represents the species-by-subplot incidence data in 2005, the second matrix represents the species-by-subplot incidence data in 2011, and the third matrix represents the species-by-subplots incidence data in 2017. Run the following code to view the incidence raw data: (Here we only show the first ten rows and six columns for each matrix; there are 100 columns/subplots in each forest and each year.) ```{r eval=FALSE} data(Second_growth_forests) Second_growth_forests ``` ```{r echo=FALSE} data(Second_growth_forests) lapply(Second_growth_forests, function(x) lapply(x, function(y) y[1:10,1:6])) ``` #### Phylogenetic tree format for PD To perform PD analysis, the phylogenetic tree (in Newick format) spanned by species observed in all datasets must be stored in a data file. For example, the phylogenetic tree for all observed species (including species in "Marim", "Rebio2", and "Rochedo" fragments) is stored in a data file named `"Brazil_tree"` for demonstration purpose. A partial list of the tip labels and node labels are shown below. ```{r} data(Brazil_tree) Brazil_tree ``` #### Species pairwise distance matrix format for FD To perform FD analysis, the species-pairwise distance matrix (Gower distance computed from species traits) for species observed in all datasets must be stored in a matrix/data.frame format. Typically, the distance between any two species is computed from species traits using the Gower distance. In our demo data, the distance matrix for all species (including species in both "Marim", "Rebio2", and "Rochedo" fragments) is stored in a data file named `"Brazil_distM"` for demonstration purpose. Here we only show the first three rows and three columns of the distance matrix. ```{r eval=FALSE} data(Brazil_distM) Brazil_distM ``` ```{r echo=FALSE} data(Brazil_distM) round(Brazil_distM[1:3,1:3], 4) ``` ## MAIN FUNCTION: iNEXTbeta3D() We first describe the main function `iNEXTbeta3D()` with default arguments: ```{r eval=FALSE} iNEXTbeta3D(data, diversity = "TD", q = c(0, 1, 2), datatype = "abundance", base = "coverage", level = NULL, nboot = 10, conf = 0.95, PDtree = NULL, PDreftime = NULL, PDtype = "meanPD", FDdistM = NULL, FDtype = "AUC", FDtau = NULL, FDcut_number = 30, by_pair = FALSE) ``` The arguments of this function are briefly described below, and will be explained in more details by illustrative examples in later text. By default (with the standardization base = "coverage"), this function computes coverage-based standardized 3D gamma, alpha, beta diversity, and four dissimilarity indices for coverage up to one (for q = 1, 2) or up to the coverage of double the reference sample size (for q = 0). If users set the standardization base to base="size", this function computes size-based standardized 3D gamma and alpha diversity estimates up to double the reference sample size in each dataset. In addition, this function also computes standardized 3D estimates with a particular vector of user-specified sample sizes or coverage values.
Argument Description
data (a) For datatype = "abundance", species abundance data for a single dataset can be input as a matrix/data.frame (species-by-assemblage); data for multiple datasets can be input as a list of matrices/data.frames, with each matrix representing a species-by-assemblage abundance matrix for one of the datasets. (b) For datatype = "incidence_raw", data for a single dataset with N assemblages can be input as a list of matrices/data.frames, with each matrix representing a species-by-sampling-unit incidence matrix for one of the assemblages; data for multiple datasets can be input as multiple lists.
diversity selection of diversity type: diversity = "TD" = Taxonomic diversity, diversity = "PD" = Phylogenetic diversity, and diversity = "FD" = Functional diversity.
q a numerical vector specifying the diversity orders. Default is c(0, 1, 2).
datatype data type of input data: individual-based abundance data (datatype = "abundance") or species by sampling-units incidence matrix (datatype = "incidence_raw") with all entries being 0 (non-detection) or 1 (detection).
base standardization base: coverage-based rarefaction and extrapolation for gamma, alpha, beta diversity, and four classes of dissimilarity indices (base = "coverage"), or sized-based rarefaction and extrapolation for gamma and alpha diversity (base = "size"). Default is base = "coverage".
level A numerical vector specifying the particular values of sample coverage (between 0 and 1 when base = "coverage") or sample sizes (base = "size") that will be used to compute standardized diversity/dissimilarity. Asymptotic diversity estimator can be obtained by setting level = 1 (i.e., complete coverage for base = "coverage"). By default (with base = "coverage"), this function computes coverage-based standardized 3D gamma, alpha, beta diversity, and four dissimilarity indices for coverage from 0.5 up to one (for q = 1, 2) or up to the coverage of double the reference sample size (for q = 0), in increments of 0.025. The extrapolation limit for beta diversity is defined as that for alpha diversity. If users set base = "size", this function computes size-based standardized 3D gamma and alpha diversity estimates based on 40 equally-spaced sample sizes/knots from sample size 1 up to double the reference sample size.
nboot a positive integer specifying the number of bootstrap replications when assessing sampling uncertainty and constructing confidence intervals. Bootstrap replications are generally time consuming. Set `nboot = 0` to skip the bootstrap procedures. Default is `nboot = 10`. If more accurate results are required, set `nboot = 100` (or `nboot = 200`).
conf a positive number < 1 specifying the level of confidence interval. Default is conf = 0.95.
PDtree (required argument for diversity = "PD"), a phylogenetic tree in Newick format for all observed species in the pooled assemblage.
PDreftime (argument only for diversity = "PD"), a numerical value specifying reference time for PD. Default is PDreftime=NULL. (i.e., the age of the root of PDtree)
PDtype (argument only for diversity = "PD"), select PD type: PDtype = "PD" (effective total branch length) or PDtype = "meanPD" (effective number of equally divergent lineages). Default is PDtype = "meanPD", where meanPD = PD/tree depth.
FDdistM (required argument for diversity = "FD"), a species pairwise distance matrix for all species in the pooled assemblage.
FDtype (argument only for diversity = "FD"), select FD type: FDtype = "tau_value" for FD under a specified threshold value, or FDtype = "AUC" (area under the curve of tau-profile) for an overall FD which integrates all threshold values between zero and one. Default is FDtype = "AUC".
FDtau (argument only for diversity = "FD" and FDtype="tau_value"), a numerical value between 0 and 1 specifying the tau value (threshold level) that will be used to compute FD. If FDtau = NULL (default), then the threshold level is set to be the mean distance between any two individuals randomly selected from the pooled dataset (i.e., quadratic entropy).
FDcut_number (argument only for diversity = "FD" and FDtype="AUC"), a numeric number to cut [0, 1] interval into equal-spaced sub-intervals to obtain the AUC value by integrating the tau-profile. Equivalently, the number of tau values that will be considered to compute the integrated AUC value. Default is FDcut_number = 30. A larger value can be set to obtain more accurate AUC value.
by_pair a logical variable specifying whether to perform diversity decomposition for all pairs of assemblages or not. If by_pair = TRUE, alpha/beta/gamma diversity will be computed for all pairs of assemblages in the input data; if by_pair = FALSE, alpha/beta/gamma diversity will be computed for multiple assemblages (i.e, more than two assemblages) in the input data. Default is FALSE.
This function returns an `"iNEXTbeta3D"` object which can be further used to make plots using the function `ggiNEXTbeta3D()` to be described below. (only accept the outcome from `iNEXTbeta3D` under `by_pair = FALSE`) ## Output of the main function iNEXTbeta3D() By default (with `base = 'coverage'`), the `iNEXTbeta3D()` function for each of the three dimensions (TD, PD, and FD) returns the `"iNEXTbeta3D"` object including seven data frames for each dataset: - gamma (standardized gamma diversity) - alpha (standardized alpha diversity) - beta (standardized beta diversity) - 1-C (standardized Sorensen-type non-overlap index) - 1-U (standardized Jaccard-type non-overlap index) - 1-V (standardized Sorensen-type turnover index) - 1-S (standardized Jaccard-type turnover index) When users set `base = 'size'`, the `iNEXTbeta3D()` function for each of the three dimensions (TD, PD, and FD) returns the `"iNEXTbeta3D"` object including two data frames for each dataset: - gamma (size-based standardized gamma diversity) - alpha (size-based standardized alpha diversity) Size-based beta diversity and dissimilarity indices are not statistically valid measures and thus are not provided. ## GRAPHIC DISPLAYS: FUNCTION ggiNEXTbeta3D() The function `ggiNEXTbeta3D()` with default arguments is described as follows: (only accept the outcome from `iNEXTbeta3D` under `by_pair = FALSE`) ```{r eval=FALSE} ggiNEXTbeta3D(output, type = "B") ```
Argument Description
output output from the function iNEXTbeta3D.
type (argument only for `base = "coverage"`), type = 'B' for plotting the rarefaction and extrapolation sampling curves for gamma, alpha, and beta diversity; type = 'D' for plotting the rarefaction and extrapolation sampling curves for four dissimilarity indices. Skip the argument for plotting size-based rarefaction and extrapolation sampling curves for gamma and alpha diversity.
The `ggiNEXTbeta3D()` function is a wrapper around the `ggplot2` package to create a R/E curve using a single line of code. The resulting object is of class `"ggplot"`, so it can be manipulated using the `ggplot2` tools. Users can visualize the displays of coverage-based R/E sampling curves of gamma, alpha and beta diversity as well as four classes of dissimilarity indices by setting the parameter type. ## TAXONOMIC DIVERSITY (TD): RAREFACTION/EXTRAPOLATION VIA EXAMPLES ### EXAMPLE 1: Abundance data with default sample sizes or coverage values (not by pairs) First, we run the `iNEXTbeta3D()` function with `Brazil_rainforests` abundance data to compute coverage-based taxonomic gamma, alpha, beta diversity, and four dissimilarity indices under `base = 'coverage'` by running the following code: ```{r eval=FALSE} ## Coverage-based R/E Analysis with taxonomic diversity for abundance data (not by pairs) data(Brazil_rainforests) output_TDc_abun = iNEXTbeta3D(data = Brazil_rainforests[1:2], diversity = 'TD', datatype = 'abundance', base = "coverage", nboot = 10) output_TDc_abun ``` The output contains seven data frames: `gamma`, `alpha`, `beta`, `1-C`, `1-U`, `1-V`, `1-S`. For each data frame, it includes the name of dataset (`Dataset`), combinations of assemblage pairs (`Pair`, if calculating not by pairs, then there is no such column), the diversity order of q (`Order.q`), the target standardized coverage value (`SC`), the corresponding sample size (`Size`), the estimated diversity/dissimilarity estimate (`Alpha/Beta/Gamma/Dissimilarity`), `Method` (Rarefaction, Observed, or Extrapolation, depending on whether the target coverage is less than, equal to, or greater than the coverage of the reference sample), standard error of standardized estimate (`s.e.`), the bootstrap lower and upper confidence limits for the diversity/dissimilarity with a default significance level of 0.95 (`LCL`, `UCL`). These estimates with confidence intervals in the output are then used for plotting rarefaction and extrapolation curves. Our diversity/dissimilarity estimates and related statistics in the default output are displayed for the standardized coverage value from 0.5 to the coverage value of twice the reference sample size (for q = 0), or from 0.5 to 1.0 (for q = 1 and 2), in increments of 0.025. In addition, the results for the following four coverage value are also added: `SC(n, alpha)`, `SC(2n, alpha)`, `SC(n, gamma)` and `SC(2n, gamma)` if these values are in the above-specified range. Here `SC(n, alpha)` and `SC(2n, alpha)` represent, respectively, the coverage estimate for the alpha reference sample size n and the extrapolated sample with size 2n in the joint assemblage. These values can be found as `SC(n)` and `SC(2n)` for `"Joint assemblage (for alpha)"` in the column "Assemblage" from the output of the function `DataInfobeta3D`; see later text. Similar definitions pertain to `SC(n, gamma)` and `SC(2n, gamma)` for the gamma reference sample; these two values can also be found as `SC(n)` and `SC(2n)` for `"Pooled assemblage (for gamma)"` in the column "Assemblage" from the output of the function `DataInfobeta3D`. For beta diversity and dissimilarity, the observed sample coverage and extrapolation limit are defined the same as the alpha diversity. The corresponding coverage values for incidence data are denoted as, respectively, `SC(T, alpha)`, `SC(2T, alpha)`, `SC(T, gamma)` and `SC(2T, gamma)` in the output. Because all the diversity/dissimilarity estimates are computed for the standardized coverage range values starting from 0.5, the default setting with `level = NULL` does not work if the observed sample coverage in the alpha/gamma reference sample is less than 50%. In this case, readers should specify sample coverage values using the argument `level`, instead of using `level = NULL`. The suggested maximum coverage value that readers can specify is `SC(2n, alpha)`. Beyond the limit, beta diversity and dissimilarity estimates may be subject to some bias. Below we show the output for taxonomic beta diversity between the "Edge" and "Interior" habitats in the "Marim" fragment. ```{r echo=FALSE} output_TDc_abun = iNEXTbeta3D(data = Brazil_rainforests[1:2], diversity = 'TD', datatype = 'abundance', base = "coverage", nboot = 10) tmp = output_TDc_abun$Marim$beta tmp[,c(3:5, 7:9)] = round(tmp[,c(3:5, 7:9)], 3) tmp = tmp[,-ncol(tmp)] tmp ``` Run the following code to display the two types of curves: ```{r, fig.align='center', fig.height=6, fig.width=6} ## Coverage-based R/E curves for taxonomic gamma, alpha and beta diversity ggiNEXTbeta3D(output_TDc_abun, type = 'B') ``` ```{r, fig.align='center', fig.height=8, fig.width=6} ## Coverage-based R/E curves for four taxonomic dissimilarity indices ggiNEXTbeta3D(output_TDc_abun, type = 'D') ``` The following commands return the size-based R/E sampling curves for gamma and alpha taxonomic diversity: ```{r, fig.align='center', fig.height=5, fig.width=6} ## Size-based R/E curves with taxonomic gamma and alpha diversity (not by pairs) data(Brazil_rainforests) output_TDs_abun = iNEXTbeta3D(data = Brazil_rainforests[1:2], diversity = 'TD', datatype = 'abundance', base = "size", nboot = 10) ggiNEXTbeta3D(output_TDs_abun) ``` ### EXAMPLE 2: Abundance data for all pairs of assemblages with user-specified sample sizes or coverage values In addition to the default sample sizes or coverage values, `iNEXTbeta3D` also computes standardized 3D estimates for all pairs of assemblages with a particular vector of user-specified sample sizes or coverage values. The following commands return the TD estimates with two user-specified levels of sample coverage (e.g., 85% and 90%). Only the output for gamma, alpha and beta is shown below in each dataset; the output for 1-C, 1-U, 1-V, 1-S is omitted. ```{r eval=FALSE} ## Coverage-based R/E Analysis for all pairs of assemblages with taxonomic diversity for abundance data data(Brazil_rainforests) data = list("Edge" = sapply(Brazil_rainforests, function(x) x[,1]), "Interior" = sapply(Brazil_rainforests, function(x) x[,2])) output_TDc_abun_byuser = iNEXTbeta3D(data = data, diversity = 'TD', datatype = 'abundance', base = "coverage", nboot = 10, level = c(0.85, 0.9), by_pair = TRUE) output_TDc_abun_byuser ``` ```{r echo=FALSE} data = list("Edge" = sapply(Brazil_rainforests, function(x) x[,1]), "Interior" = sapply(Brazil_rainforests, function(x) x[,2])) output_TDc_abun_byuser = iNEXTbeta3D(data = data, diversity = 'TD', datatype = 'abundance', base = "coverage", nboot = 10, level = c(0.85, 0.9), by_pair = TRUE) tmp = output_TDc_abun_byuser tmp = lapply(tmp, function(x) lapply(x[1:3], function(y) { y[,c(4:6, 8:10)] = round(y[,c(4:6, 8:10)], 3) y = y[,-ncol(y)] y })) class(tmp) = "iNEXTbeta3D" tmp ``` The following commands return the TD estimates for all pairs of assemblages with two user-specified levels of sample sizes (e.g., 300 and 500). ```{r eval=FALSE} ## Size-based R/E for all pairs of assemblages with taxonomic gamma and alpha diversity data(Brazil_rainforests) data = list("Edge" = sapply(Brazil_rainforests, function(x) x[,1]), "Interior" = sapply(Brazil_rainforests, function(x) x[,2])) output_TDs_abun_byuser = iNEXTbeta3D(data = data, diversity = 'TD', datatype = 'abundance', base = "size", nboot = 10, level = c(300, 500), by_pair = TRUE) output_TDs_abun_byuser ``` ```{r echo=FALSE} data = list("Edge" = sapply(Brazil_rainforests, function(x) x[,1]), "Interior" = sapply(Brazil_rainforests, function(x) x[,2])) output_TDs_abun_byuser = iNEXTbeta3D(data = data, diversity = 'TD', datatype = 'abundance', base = "size", nboot = 10, level = c(300, 500), by_pair = TRUE) tmp = output_TDs_abun_byuser tmp = lapply(tmp, function(x) lapply(x, function(y) { y[,c(4:6, 8:10)] = round(y[,c(4:6, 8:10)], 3) y = y[,-ncol(y)] y })) class(tmp) = "iNEXTbeta3D" tmp ``` ### EXAMPLE 3: Incidence data with default sample sizes or coverage values (not by pairs) We can also use incidence raw data (`Second_growth_forests`) to compute coverage-based standardized gamma, alpha, beta diversity, and four dissimilarities under `base = 'coverage'`, and also size-based standardized gamma and alpha diversity. Run the following code to perform incidence data analysis. The output data frame is similar to that based on abundance data and thus is omitted. ```{r eval=FALSE} ## Coverage-based R/E Analysis with taxonomic diversity for incidence raw data (not by pairs) data(Second_growth_forests) data = list("CR 2005 vs. 2017" = Second_growth_forests[[1]][c(1,3)], "JE 2005 vs. 2017" = Second_growth_forests[[2]][c(1,3)]) output_TDc_inci = iNEXTbeta3D(data = data, diversity = 'TD', datatype = 'incidence_raw', base = "coverage", nboot = 10) output_TDc_inci ``` ```{r echo=FALSE} data = list("CR 2005 vs. 2017" = Second_growth_forests[[1]][c(1,3)], "JE 2005 vs. 2017" = Second_growth_forests[[2]][c(1,3)]) output_TDc_inci = iNEXTbeta3D(data = data, diversity = 'TD', datatype = 'incidence_raw', base = "coverage", nboot = 10) # lapply(output_TDc_inci, function(x) # # lapply(x[1:3], function(y) { # # index1 = which(y$SC == min(y$SC)) # index2 = which( (y$Order.q != 0 & y$SC == max(y$SC)) | (y$Order.q == 0 & y$SC == max(y[y$Order.q == 0, "SC"])) ) # index3 = which(y$Method %in% c("Observed", "Observed_C(n, alpha)")) # # index = sort(unique(c(index1, index1 + 1, index2, index2 - 1, index3 - 1, index3, index3 + 1))) # # tmp = y[index,] # tmp[,c(3:5, 7:9)] = round(tmp[,c(3:5, 7:9)], 3) # tmp # }) # ) ``` The same procedures can be applied to incidence data. Based on the demo dataset, we display below the coverage-based R/E curves for comparing temporal beta diversity between 2005 and 2017 in two second-growth forests (CR and JE) by running the following code: ```{r, fig.align='center', fig.height=6, fig.width=6} ## Coverage-based R/E curves with taxonomic gamma, alpha and beta diversity ggiNEXTbeta3D(output_TDc_inci, type = 'B') ``` The following commands return the size-based R/E sampling curves for gamma and alpha taxonomic diversity: ```{r, fig.align='center', fig.height=5, fig.width=6} ## Size-based R/E curves with taxonomic gamma and alpha diversity (not by pairs) data(Second_growth_forests) data = list("CR 2005 vs. 2017" = Second_growth_forests[[1]][c(1,3)], "JE 2005 vs. 2017" = Second_growth_forests[[2]][c(1,3)]) output_TDs_inci = iNEXTbeta3D(data = data, diversity = 'TD', datatype = 'incidence_raw', base = "size", nboot = 10) ggiNEXTbeta3D(output_TDs_inci) ``` ### EXAMPLE 4: Incidence data for all pairs of assemblages with user-specified sample sizes or coverage values As with abundance data, user can also specify sample sizes (i.e. number of sampling units) or coverage values to obtain the pertinent output for all pairs of assemblages. The code for examples is given below with two user-specified levels of sample coverage values (e.g., 90% and 95%), but the output is omitted. ```{r eval=FALSE} ## Coverage-based R/E Analysis for all pairs of assemblages with taxonomic diversity for incidence data data(Second_growth_forests) output_TDc_inci_byuser = iNEXTbeta3D(data = Second_growth_forests, diversity = 'TD', datatype = 'incidence_raw', base = "coverage", nboot = 10, level = c(0.9, 0.95), by_pair = TRUE) output_TDc_inci_byuser ``` The following commands return the TD estimates for all pairs of assemblages with two user-specified levels of sample sizes (e.g., 100 and 200). ```{r eval=FALSE} ## Size-based R/E for all pairs of assemblages with taxonomic gamma and alpha diversity data(Second_growth_forests) output_TDs_inci_byuser = iNEXTbeta3D(data = Second_growth_forests, diversity = 'TD', datatype = 'incidence_raw', base = "size", nboot = 10, level = c(100, 200), by_pair = TRUE) output_TDs_inci_byuser ``` ## PHYLOGENETIC DIVERSITY (PD): RAREFACTION/EXTRAPOLATION VIA EXAMPLES ### EXAMPLE 5: Abundance data with default sample sizes or coverage values (not by pairs) As with taxonomic diversity, `iNEXT.beta3D` computes coverage-based standardized phylogenetic gamma, alpha, beta diversity as well as four classes of phylogenetic dissimilarity indices; it also computes size-based standardized phylogenetic gamma and alpha diversity. The species names (or identification codes) in the phylogenetic tree must exactly match with those in the corresponding species abundance/incidence data. Two types of phylogenetic rarefaction and extrapolation curves (coverage- and size-based sampling curves) are also provided. The required argument for performing PD analysis is `PDtree`. For example, the phylogenetic tree for all observed species (including species in "Marim", "Rebio2", and "Rochedo" fragments) is stored in a data file named `"Brazil_tree"`. Then we enter the argument `PDtree = Brazil_tree`. Two optional arguments are: `PDtype` and `PDreftime`. There are two options for `PDtype`: `"PD"` (effective total branch length) or `"meanPD"` (effective number of equally divergent lineages, meanPD = PD/tree depth). Default is `PDtype = "meanPD"`. `PDreftime` is a numerical value specifying a reference time for computing phylogenetic diversity. By default (`PDreftime = NULL`), the reference time is set to the tree depth, i.e., age of the root of the phylogenetic tree. Run the following code to perform PD analysis. The output data frame is similar to that based on abundance data and thus is omitted. ```{r eval=FALSE} ## Coverage-based R/E Analysis with phylogenetic diversity for abundance data (not by pairs) data(Brazil_rainforests) data(Brazil_tree) output_PDc_abun = iNEXTbeta3D(data = Brazil_rainforests[1:2], diversity = 'PD', datatype = 'abundance', base = "coverage", nboot = 10, PDtree = Brazil_tree, PDreftime = NULL, PDtype = 'meanPD') output_PDc_abun ``` ```{r echo=FALSE} data(Brazil_rainforests) data(Brazil_tree) output_PDc_abun = iNEXTbeta3D(data = Brazil_rainforests[1:2], diversity = 'PD', datatype = 'abundance', base = "coverage", nboot = 10, PDtree = Brazil_tree, PDreftime = NULL, PDtype = 'meanPD') ``` Run the following code to display the R/E curves for phylogenetic gamma, alpha, and beta diversity: ```{r, fig.align='center', fig.height=6, fig.width=6} ## Coverage-based R/E sampling curves for phylogenetic gamma, alpha and beta diversity ggiNEXTbeta3D(output_PDc_abun, type = 'B') ``` The following commands return the size-based R/E sampling curves for gamma and alpha phylogenetic diversity: ```{r, fig.align='center', fig.height=5, fig.width=6} ## Size-based R/E curves for phylogenetic gamma and alpha diversity (not by pairs) data(Brazil_rainforests) data(Brazil_tree) output_PDs_abun = iNEXTbeta3D(data = Brazil_rainforests[1:2], diversity = 'PD', datatype = 'abundance', base = "size", nboot = 10, PDtree = Brazil_tree, PDreftime = NULL, PDtype = 'meanPD') ggiNEXTbeta3D(output_PDs_abun) ``` ## FUNCTIONAL DIVERSITY (FD): RAREFACTION/EXTRAPOLATION VIA EXAMPLES ### EXAMPLE 6: Abundance data with default sample sizes or coverage values (not by pairs) As with taxonomic and phylogenetic diversity, `iNEXT.beta3D` computes coverage-based standardized functional gamma, alpha, beta diversity as well as four classes of functional dissimilarity indices; it also computes size-based standardized functional gamma and alpha diversity. The species names (or identification codes) in the distance matrix must exactly match with those in the corresponding species abundance/incidence data. Two types of functional rarefaction and extrapolation curves (coverage- and size-based sampling curves) are also provided. The required argument for performing FD analysis is `FDdistM`. For example, the distance matrix for all species (including species in "Marim", "Rebio2", and "Rochedo" fragments) is stored in a data file named `"Brazil_distM"`. Then we enter the argument `FDdistM = Brazil_distM`. Three optional arguments are (1) `FDtype`: `FDtype = "AUC"`means FD is computed from the area under the curve of a tau-profile by integrating all plausible threshold values between zero and one; `FDtype = "tau_value"` means FD is computed under a specific threshold value to be specified in the argument `FD_tau`. (2) `FD_tau`: a numerical value specifying the tau value (threshold level) that will be used to compute FD. If `FDtype = "tau_value"` and `FD_tau = NULL`, then the threshold level is set to be the mean distance between any two individuals randomly selected from the pooled data over all datasets (i.e., quadratic entropy). (3) `FDcut_number` is a numeric number to cut [0, 1] interval into equal-spaced sub-intervals to obtain the AUC value. Default is `FDcut_number = 30`. If more accurate integration is desired, then use a larger integer. Run the following code to perform FD analysis. The output data frame is similar to that based on abundance data and thus is omitted; see later graphical display of the output. ```{r eval=FALSE} ## Coverage-based R/E Analysis with functional diversity for abundance data - FDtype = 'AUC' (area ## under curve) by considering all threshold values between zero and one (not by pairs) data(Brazil_rainforests) data(Brazil_distM) output_FDc_abun = iNEXTbeta3D(data = Brazil_rainforests[1:2], diversity = 'FD', datatype = 'abundance', base = "coverage", nboot = 10, FDdistM = Brazil_distM, FDtype = 'AUC', FDcut_number = 30) output_FDc_abun ``` ```{r echo=FALSE} data(Brazil_rainforests) data(Brazil_distM) output_FDc_abun = iNEXTbeta3D(data = Brazil_rainforests[1:2], diversity = 'FD', datatype = 'abundance', base = "coverage", nboot = 10, FDdistM = Brazil_distM, FDtype = 'AUC', FDcut_number = 30) ``` Run the following code to display the R/E curves for functional gamma, alpha, and beta diversity: ```{r, fig.align='center', fig.height=6, fig.width=6} ## Coverage-based R/E sampling curves for functional gamma, alpha and beta diversity ggiNEXTbeta3D(output_FDc_abun, type = 'B') ``` The following commands return the size-based R/E sampling curves for gamma and alpha functional diversity: ```{r, fig.align='center', fig.height=5, fig.width=6} ## Size-based R/E curves for functional gamma and alpha diversity (not by pairs) data(Brazil_rainforests) data(Brazil_distM) output_FDs_abun = iNEXTbeta3D(data = Brazil_rainforests[1:2], diversity = 'FD', datatype = 'abundance', base = "size", nboot = 10, FDdistM = Brazil_distM, FDtype = 'AUC', FDcut_number = 30) ggiNEXTbeta3D(output_FDs_abun) ``` ## DATA INFORMATION: FUNCTION DataInfobeta3D() The function `DataInfobeta3D()` provides basic data information for (1) the reference sample in each individual assemblage, (2) the gamma reference sample in the pooled assemblage, and (3) the alpha reference sample in the joint assemblage. The function `DataInfobeta3D()` with default arguments is shown below: ```{r eval=FALSE} DataInfobeta3D(data, diversity = "TD", datatype = "abundance", PDtree = NULL, PDreftime = NULL, FDdistM = NULL, FDtype = "AUC", FDtau = NULL, by_pair = FALSE) ``` All arguments in the above function are the same as those for the main function `iNEXTbeta3D`. Running the `DataInfobeta3D()` function returns basic data information including sample size, observed species richness, two sample coverage estimates (`SC(n)` and `SC(2n)`) as well as other relevant information in each of the three dimensions of diversity. We use `Brazil_rainforests` data to demo the function for each dimension. ```{r eval=FALSE} ## Data information for taxonomic diversity (not by pairs) data(Brazil_rainforests) DataInfobeta3D(data = Brazil_rainforests[1:2], diversity = 'TD', datatype = 'abundance') ``` ```{r echo=FALSE} ## Data information for taxonomic diversity (not by pairs) data(Brazil_rainforests) tmp = DataInfobeta3D(data = Brazil_rainforests[1:2], diversity = 'TD', datatype = 'abundance') tmp[,c("SC(n)","SC(2n)")] = round(tmp[,c("SC(n)","SC(2n)")], 3) tmp ``` Output description: - `Dataset` = the input datasets. - `Pair` = combinations of assemblage pairs (if calculating not by pairs, then there is no such column). - `Assemblage` = Individual assemblages, `'Pooled assemblage'` (for gamma) or `'Joint assemblage'` (for alpha). - `n` = number of observed individuals in the reference sample (sample size). - `S.obs` = number of observed species in the reference sample. - `SC(n)` = sample coverage estimate of the reference sample. - `SC(2n)` = sample coverage estimate of twice the reference sample size. - `f1`-`f5` = the first five species abundance frequency counts in the reference sample. ```{r eval=FALSE} ## Data information for taxonomic diversity for all pairs of assemblages data(Brazil_rainforests) data = list("Edge" = sapply(Brazil_rainforests, function(x) x[,1]), "Interior" = sapply(Brazil_rainforests, function(x) x[,2])) DataInfobeta3D(data = data, diversity = 'TD', datatype = 'abundance', by_pair = TRUE) ``` ```{r echo=FALSE} ## Data information for taxonomic diversity for all pairs of assemblages data(Brazil_rainforests) data = list("Edge" = sapply(Brazil_rainforests, function(x) x[,1]), "Interior" = sapply(Brazil_rainforests, function(x) x[,2])) tmp = DataInfobeta3D(data = data, diversity = 'TD', datatype = 'abundance', by_pair = TRUE) tmp[,c("SC(n)","SC(2n)")] = round(tmp[,c("SC(n)","SC(2n)")], 3) tmp ``` Output description: definitions are the same as before and thus are omitted. ```{r eval=FALSE} ## Data information for phylogenetic diversity (not by pairs) data(Brazil_rainforests) data(Brazil_tree) DataInfobeta3D(data = Brazil_rainforests[1:2], diversity = 'PD', datatype = 'abundance', PDtree = Brazil_tree, PDreftime = NULL) ``` ```{r echo=FALSE} ## Data information for phylogenetic diversity (not by pairs) data(Brazil_rainforests) data(Brazil_tree) tmp = DataInfobeta3D(data = Brazil_rainforests[1:2], diversity = 'PD', datatype = 'abundance', PDtree = Brazil_tree, PDreftime = NULL) tmp[,c("SC(n)","SC(2n)")] = round(tmp[,c("SC(n)","SC(2n)")], 3) tmp ``` Information description: - `Dataset`, `Pair`, `Assemblage`, `n`, `S.obs`, `SC(n)` and `SC(2n)`: definitions are the same as in the TD output. - `PD.obs` = the observed total branch length in the phylogenetic tree spanned by all observed species. - `f1*`,`f2*` = the number of singletons and doubletons in the node/branch abundance set. - `g1`,`g2` = the total branch length of those singletons/doubletons in the node/branch abundance set. - `Reftime` = reference time for phylogenetic diversity (the age of the root of phylogenetic tree). ```{r eval=FALSE} ## Data information for functional diversity (under a specified threshold level, FDtype = 'tau_value', ## and not by pairs) data(Brazil_rainforests) data(Brazil_distM) DataInfobeta3D(data = Brazil_rainforests[1:2], diversity = 'FD', datatype = 'abundance', FDdistM = Brazil_distM, FDtype = 'tau_value', FDtau = NULL) ``` ```{r echo=FALSE} ## Data information for functional diversity (under a specified threshold level, FDtype = 'tau_value', and not by pairs) data(Brazil_rainforests) data(Brazil_distM) tmp = DataInfobeta3D(data = Brazil_rainforests[1:2], diversity = 'FD', datatype = 'abundance', FDdistM = Brazil_distM, FDtype = 'tau_value', FDtau = NULL) tmp[,c("SC(n)","SC(2n)","Tau")] = round(tmp[,c("SC(n)","SC(2n)","Tau")], 3) tmp ``` Information description: - `Dataset`, `Pair`, `Assemblage`, `n`, `S.obs`, `SC(n)` and `SC(2n)`: definitions are the same as in the TD output. - `a1*`,`a2*` = the number of singletons (`a1*`) and of doubletons (`a2*`) among the functionally indistinct set at the specified threshold level `'Tau'`. - `h1`,`h2` = the total contribution of singletons (`h1`) and of doubletons (`h2`) at the specified threshold level `'Tau'`. - `Tau` = the specified threshold level of distinctiveness. Default is dmean (the mean distance between any two individuals randomly selected from the pooled data over all datasets). ```{r eval=FALSE} ## Data information for functional diversity (FDtype = 'AUC' and not by pairs) data(Brazil_rainforests) data(Brazil_distM) DataInfobeta3D(data = Brazil_rainforests[1:2], diversity = 'FD', datatype = 'abundance', FDdistM = Brazil_distM, FDtype = 'AUC') ``` ```{r echo=FALSE} ## Data information for functional diversity (FDtype = 'AUC' and not by pairs) data(Brazil_rainforests) data(Brazil_distM) tmp = DataInfobeta3D(data = Brazil_rainforests[1:2], diversity = 'FD', datatype = 'abundance', FDdistM = Brazil_distM, FDtype = 'AUC') tmp[,c("SC(n)","SC(2n)","dmean","dmax")] = round(tmp[,c("SC(n)","SC(2n)","dmean","dmax")], 3) tmp ``` Information description: - `Dataset`, `Pair`, `Assemblage`, `n`, `S.obs`, `SC(n)` and `SC(2n)`: definitions are the same as in TD and thus are omitted. - `dmin` = the minimum distance among all non-diagonal elements in the distance matrix. - `dmean` = the mean distance between any two individuals randomly selected from each assemblage. - `dmax` = the maximum distance among all elements in the distance matrix. Below We use the demo dataset (`Second-growth forests`) to show the output of the function `DataInfobeta3D` for incidence data: ```{r eval=FALSE} ## Data information for taxonomic diversity with incidence data (not by pairs) data(Second_growth_forests) data = list("CR 2005 vs. 2017" = Second_growth_forests[[1]][c(1,3)], "JE 2005 vs. 2017" = Second_growth_forests[[2]][c(1,3)]) DataInfobeta3D(data = data, diversity = 'TD', datatype = 'incidence_raw') ``` ```{r echo=FALSE} ## Data information for taxonomic diversity with incidence data (not by pairs) data(Second_growth_forests) data = list("CR 2005 vs. 2017" = Second_growth_forests[[1]][c(1,3)], "JE 2005 vs. 2017" = Second_growth_forests[[2]][c(1,3)]) tmp = DataInfobeta3D(data = data, diversity = 'TD', datatype = 'incidence_raw') tmp[,c("SC(T)","SC(2T)")] = round(tmp[,c("SC(T)","SC(2T)")], 3) tmp ``` Information description: - `Dataset` = the input datasets. - `Pair` = combinations of assemblage pairs (if calculating not by pairs, then there is no such column). - `Assemblage` = Individual assemblages, `'Pooled assemblage'` (for gamma) or `'Joint assemblage'` (for alpha). - `T` = number of sampling units in the reference sample (sample size for incidence data). - `U` = total number of incidences in the reference sample. - `S.obs` = number of observed species in the reference sample. - `SC(T)` = sample coverage estimate of the reference sample. - `SC(2T)` = sample coverage estimate of twice the reference sample size. - `Q1`-`Q5` = the first five species incidence frequency counts in the reference sample. ```{r eval=FALSE} ## Data information for taxonomic diversity for all pairs of assemblages (incidence data) data(Second_growth_forests) data = Second_growth_forests names(data) = c("CR", "JE") DataInfobeta3D(data = data, diversity = 'TD', datatype = 'incidence_raw', by_pair = TRUE) ``` ```{r echo=FALSE} ## Data information for taxonomic diversity for all pairs of assemblages (incidence data) data(Second_growth_forests) data = Second_growth_forests names(data) = c("CR", "JE") tmp = DataInfobeta3D(data = data, diversity = 'TD', datatype = 'incidence_raw', by_pair = TRUE) tmp[,c("SC(T)","SC(2T)")] = round(tmp[,c("SC(T)","SC(2T)")], 3) tmp ``` Output description: definitions are the same as before and thus are omitted. ## License and feedback The `iNEXT.beta3D` package is licensed under the GPLv3. To help refine `iNEXT.beta3D`, users' comments or feedback would be welcome (please send them to Anne Chao or report an issue on the `iNEXT.beta3D` github [iNEXT.beta3D_github](https://github.com/AnneChao/iNEXT.beta3D). ## References - Chao, A., Chiu, C.-H., Hu, K.-H., and Zeleny, D. (2023a). Revisiting Alwyn H. Gentry's forest transect data: a statistical sampling-model-based approach. Japanese Journal of Statistics and Data Science, 6, 861-884. (https://doi.org/10.1007/s42081-023-00214-1) - Chao, A., Henderson, P. A., Chiu, C.-H., Moyes, F., Hu, K.-H., Dornelas, M. and Magurran, A. E. (2021). Measuring temporal change in alpha diversity: a framework integrating taxonomic, phylogenetic and functional diversity and the iNEXT.3D standardization. Methods in Ecology and Evolution, 12, 1926-1940. - Chao, A., Thorn, S., Chiu, C.-H., Moyes, F., Hu, K.-H., Chazdon, R. L., Wu, J., Magnago, L. F. S., Dornelas, M., Zeleny, D., Colwell, R. K., and Magurran, A. E. (2023b). Rarefaction and extrapolation with beta diversity under a framework of Hill numbers: the iNEXT.beta3D standardization. Ecological Monographs e1588.