A Quick Introduction to iNEXT.3D via Examples

iNEXT.3D (INterpolation and EXTrapolation for three dimensions of biodiversity) is a sequel to iNEXT (Hsieh et al., 2016). Here the three dimensions (3D) of diversity include taxonomic diversity (TD), phylogenetic diversity (PD) and functional diversity (FD). An online version “iNEXT.3D Online” (https://chao.shinyapps.io/iNEXT_3D/) is also available for users without an R background.

A unified framework based on Hill numbers (for TD) and their generalizations (Hill-Chao numbers, for PD and FD) is adopted to quantify 3D. In this framework, TD quantifies the effective number of species, PD quantifies the effective total branch length, mean-PD (PD divided by tree depth) quantifies the effective number of lineages, and FD quantifies the effective number of virtual functional groups (or functional “species”). Thus, TD, mean-PD, and FD are all in the same units of species/lineage equivalents and can be meaningfully compared; see Chao et al. (2014) for the basic standardization theory for TD, and Chao et al. (2021) for a review of the unified theory for 3D.

For each of the three dimensions of biodiversity, iNEXT.3D features two statistical analyses (non-asymptotic and asymptotic):

A non-asymptotic approach based on interpolation and extrapolation for 3D diversity (i.e., Hill-Chao numbers)

iNEXT.3D computes the estimated 3D diversity for standardized samples with a common sample size or sample completeness. This approach aims to compare diversity estimates for equally-large (with a common sample size) or equally-complete (with a common sample coverage) samples; it is based on the seamless rarefaction and extrapolation (R/E) sampling curves of Hill-Chao numbers for q = 0, 1 and 2. For each dimension of biodiversity, iNEXT.3D offers three types of R/E sampling curves:

Sample-size-based (or size-based) R/E sampling curves: This type of sampling curve plots the diversity estimates with respect to sample size.
Coverage-based R/E sampling curves: This type of sampling curve plots the diversity estimates with respect to sample coverage.
Sample completeness curve: This curve depicts how sample coverage varies with sample size. The sample completeness curve provides a bridge between the size- and coverage-based R/E sampling curves.

An asymptotic approach to infer asymptotic 3D diversity (i.e., Hill-Chao numbers)

iNEXT.3D computes the estimated asymptotic 3D diversity and also plots 3D diversity profiles (q-profiles) for q between 0 and 2, in comparison with the observed diversity. Typically, the asymptotic estimates for q ≥ 1 are reliable, but for q < 1 (especially for q = 0, species richness), the asymptotic estimates represent only lower bounds. iNEXT.3D also features a time-profile (which depicts the observed and asymptotic estimate of PD or mean PD with respect to reference times), and a tau-profile (which depicts the observed and asymptotic estimate of FD with respect to threshold level tau).

How to cite

If you publish your work based on results from iNEXT.3D package, you should make references to the following methodology paper and the package:

Chao, A., Henderson, P. A., Chiu, C.-H., Moyes, F., Hu, K-H., Dornelas, M and. Magurran, A. E. (2021). Measuring temporal change in alpha diversity: a framework integrating taxonomic, phylogenetic and functional diversity and the iNEXT.3D standardization. Methods in Ecology and Evolution, 12, 1926-1940.
Chao, A. and Hu, K.-H. (2023). The iNEXT.3D package: interpolation and extrapolation for three dimensions of biodiversity. R package available from CRAN.

SOFTWARE NEEDED TO RUN iNEXT.3D IN R

Required: R
Suggested: RStudio IDE

HOW TO RUN iNEXT.3D:

The iNEXT.3D package can be downloaded from CRAN or Anne Chao’s iNEXT.3D_github using the commands below. For a first-time installation, some additional packages must be installed and loaded; see package manual.

## install iNEXT.3D package from CRAN
install.packages("iNEXT.3D")  

## or install the latest version from github
install.packages('devtools')
library(devtools)
install_github('AnneChao/iNEXT.3D')

## import packages
library(iNEXT.3D)

There are six main functions in this package:

Two functions for non-asymptotic analysis with graphical displays:

iNEXT3D computes standardized 3D diversity estimates of order q = 0, 1 and 2 for rarefied and extrapolated samples at specified sample coverage values and sample sizes.
ggiNEXT3D visualizes the output from the function iNEXT3D.

Two functions for point estimation and basic data information

estimate3D computes 3D diversity of order q = 0, 1 and 2 with a particular set of user-specified level of sample sizes or sample coverage values.
DataInfo3D provides basic data information based on the observed data.

Two functions for asymptotic analysis with graphical displays:

ObsAsy3D computes observed and asymptotic diversity of order q between 0 and 2 (in increments of 0.2) for 3D diversity; it also computes observed and asymptotic PD for specified reference times, and observed and asymptotic FD for specified threshold levels.
ggObsAsy3D visualizes the output from the function ObsAsy3D.

DATA INPUT FORMAT

Species abundance/incidence data format

Although species identities/names are not required to assess TD or compare TD across individual assemblages (as in the iNEXT package), they are required for PD and FD. Thus, for iNEXT.3D package, information on species identity (or any unique identification code) and assemblage affiliation is required. Two types of species abundance/incidence data are supported:

Individual-based abundance data (datatype = "abundance"): When there are multiple assemblages, in addition to the assemblage/site names (as column names) and the species names (as row names), species abundance data (reference sample) can be input as a species (in rows) by assemblage (in columns) matrix/data.frame or a list of species abundance vectors. In the special case that there is only one assemblage, all data should be read in one column.
Sampling-unit-based incidence data: Incidence-raw data (datatype = "incidence_raw"): for each assemblage, input data for a reference sample consist of a species-by-sampling-unit matrix, in addition to the sampling-unit names (as column names) and the species names (as row names). When there are N assemblages, input data consist of N lists of matrices, and each matrix is a species-by-sampling-unit matrix. Each element in the incidence raw matrix is 1 for a detection, and 0 for a non-detection. Input a matrix which combines data for all assemblages is allowed, but the argument nT in the function iNEXT3D must be specified so that the number of sampling units in each assemblage is specified.

For example, the dataset Brazil_rainforest_abun_data included in the iNEXT.3D package consists of species sample abundances of two assemblages/habitats: “Edge” and “Interior”. Run the following code to view the first 15 rows of the abundance data.

data("Brazil_rainforest_abun_data")
Brazil_rainforest_abun_data

                         Edge Interior
Carpotroche_brasiliensis   11       21
Astronium_concinnum       110       11
Astronium_graveolens       36        7
Spondias_macrocarpa        12        1
Spondias_venulosa           2        0
Tapirira_guianensis         7        1
Thyrsodium_spruceanum      11       11
Anaxagorea_silvatica        1       13
Annona_acutiflora           1        1
Annona_cacans               0        2
Annona_dolabripetala        3        3
Annona_sp                   0        1
Duguetia_chrysocarpa        1        1
Ephedranthus_sp1            1        0
Ephedranthus_sp2            0        1

We use data (Fish_incidence_data) collected from two time periods, namely "2013-2015" and "2016-2018", as an example. Each time period is designated as an assemblage. The purpose was to compare 3D diversity of the two time periods. In each time period, species incidence/occurrence was recorded in 36 sampling units in each assemblage; each sampling unit represents a sampling date. Thus, there are 36 columns in each time period. Run the following code to view the first 6 rows and 6 columns for each matrix.

data("Fish_incidence_data")
Fish_incidence_data

$`2013-2015`
                    17/01/2013 18/02/2013 19/03/2013 17/04/2013 16/05/2013 14/06/2013
Agonus_cataphractus          0          1          1          1          0          0
Alosa_fallax                 0          0          0          0          0          0
Ammodytes_tobianus           0          0          0          0          0          0
Anguilla_anguilla            0          1          1          0          0          0
Aphia_minuta                 0          0          0          0          1          1
Arnoglossus_laterna          0          0          0          0          0          0

$`2016-2018`
                    18/01/2016 15/02/2016 16/03/2016 14/04/2016 12/05/2016 10/06/2016
Agonus_cataphractus          1          1          1          1          1          0
Alosa_fallax                 0          0          0          0          0          0
Ammodytes_tobianus           0          0          0          0          0          0
Anguilla_anguilla            0          0          0          0          0          0
Aphia_minuta                 0          0          0          0          1          0
Arnoglossus_laterna          0          0          0          0          0          0

Phylogenetic tree format for PD

To perform PD analysis, the phylogenetic tree (in Newick format) spanned by species observed in the pooled data is required. For the dataset Fish_incidence_data, the phylogenetic tree for all observed species (including species in both time periods) is stored in the file fish_phylo_tree; for the dataset Brazil_rainforest_abun_data, the phylogenetic tree for all observed species (including species in both Edge and Interior habitats) is stored in the file Brazil_rainforest_phylo_tree. A partial list of the tip labels and node labels are shown below.

data("Brazil_rainforest_phylo_tree")
Brazil_rainforest_phylo_tree

Phylogenetic tree with 425 tips and 205 internal nodes.

Tip labels:
  Carpotroche_brasiliensis, Casearia_ulmifolia, Casearia_sp4, Casearia_sylvestris, Casearia_sp2, Casearia_sp3, ...
Node labels:
  magnoliales_to_asterales, poales_to_asterales, , , , celastrales_to_malpighiales, ...

Rooted; includes branch lengths.

Species pairwise distance matrix format for FD

To perform FD analysis, the species-pairwise distance matrix (Gower distance computed from species traits) for species observed in the pooled data is required in a matrix/data.frame format. For the dataset Fish_incidence_data, the distance matrix for all observed species (including species in both time periods) is stored in the file fish_dist_matrix; for the dataset Brazil_rainforest_abun_data, the distance matrix for all species (including species in both Edge and Interior habitats) is stored in the file Brazil_rainforest_dist_matrix. The distance matrix for the first 3 Brazil rainforest tree species is shown below.

data("Brazil_rainforest_distance_matrix")
Brazil_rainforest_distance_matrix

                         Carpotroche_brasiliensis Astronium_concinnum Astronium_graveolens
Carpotroche_brasiliensis                    0.000               0.522                0.522
Astronium_concinnum                         0.522               0.000                0.000
Astronium_graveolens                        0.522               0.000                0.000

MAIN FUNCTION iNEXT3D(): RAREFACTION/EXTRAPOLATION

We first describe the main function iNEXT3D() with default arguments:

iNEXT3D(data, diversity = 'TD', q = c(0,1,2), datatype = "abundance", 
        size = NULL, endpoint = NULL, knots = 40, nboot = 50, conf = 0.95, nT = NULL, 
        PDtree = NULL, PDreftime = NULL, PDtype = 'meanPD', 
        FDdistM, FDtype = 'AUC', FDtau = NULL, FDcut_number = 50)

The arguments of this function are briefly described below, and will be explained in more details by illustrative examples in later text. This main function computes standardized 3D diversity estimates of order q = 0, 1 and 2, the sample coverage estimates, and related statistics for K (if knots = K in the specified argument) evenly-spaced knots (sample sizes) between size 1 and the endpoint, where the endpoint is described below. Each knot represents a particular sample size for which 3D diversity estimates will be calculated. By default, endpoint = double the reference sample size for abundance data or double the total sampling units for incidence data. For example, if endpoint = 10, knot = 4 is specified, diversity estimates will be computed for a sequence of samples with sizes (1, 4, 7, 10).

Argument	Description
`data`	For `datatype = “abundance”`, data can be input as a vector of species abundances (for a single assemblage), matrix/data.frame (species by assemblages), or a list of species abundance vectors. For `datatype = “incidence_raw”`, data can be input as a list of matrices/data.frames (species by sampling units); data can also be input as a single matrix/data.frame by merging all sampling units across assemblages based on species identity; in this case, the number of sampling units (nT, see below) must be specified.
`diversity`	selection of diversity type: `‘TD’` = Taxonomic diversity, `‘PD’` = Phylogenetic diversity, and `‘FD’` = Functional diversity.
`q`	a numerical vector specifying the diversity orders. Default is `c(0, 1, 2)`.
`datatype`	data type of input data: individual-based abundance data (`datatype = “abundance”`), or species by sampling-units incidence/occurrence matrix (`datatype = “incidence_raw”`) with all entries being 0 (non-detection) or 1 (detection).
`size`	an integer vector of sample sizes (number of individuals or sampling units) for which diversity estimates will be computed. If `NULL`, then diversity estimates will be computed for those sample sizes determined by the specified/default `endpoint` and `knots`.
`endpoint`	an integer specifying the sample size that is the `endpoint` for rarefaction/extrapolation. If NULL, then `endpoint` `=` double the reference sample size.
`knots`	an integer specifying the number of equally-spaced `knots` (say K, default is 40) between size 1 and the `endpoint`; each knot represents a particular sample size for which diversity estimate will be calculated. If the `endpoint` is smaller than the reference sample size, then `iNEXT3D()` computes only the rarefaction estimates for approximately K evenly spaced `knots`. If the `endpoint` is larger than the reference sample size, then `iNEXT3D()` computes rarefaction estimates for approximately K/2 evenly spaced `knots` between sample size 1 and the reference sample size, and computes extrapolation estimates for approximately K/2 evenly spaced `knots` between the reference sample size and the `endpoint`.
`nboot`	a positive integer specifying the number of bootstrap replications when assessing sampling uncertainty and constructing confidence intervals. Enter 0 to skip the bootstrap procedures. Default is 50.
`conf`	a positive number < 1 specifying the level of confidence interval. Default is 0.95.
`nT`	(required only when `datatype = “incidence_raw”` and input data in a single matrix/data.frame) a vector of nonnegative integers specifying the number of sampling units in each assemblage. If assemblage names are not specified(i.e., `names(nT) = NULL`), then assemblages are automatically named as “assemblage1”, “assemblage2”,…, etc.
`PDtree`	(required argument for `diversity = “PD”`), a phylogenetic tree in Newick format for all observed species in the pooled assemblage.
`PDreftime`	(argument only for `diversity = “PD”`), a vector of numerical values specifying reference times for PD. Default is `NULL` (i.e., the age of the root of PDtree).
`PDtype`	(argument only for `diversity = “PD”`), select PD type: `PDtype = “PD”` (effective total branch length) or `PDtype = “meanPD”` (effective number of equally divergent lineages). Default is `“meanPD”`, where `meanPD = PD/tree depth`.
`FDdistM`	(required argument for `diversity = “FD”`), a species pairwise distance matrix for all species in the pooled assemblage.
`FDtype`	(argument only for `diversity = “FD”`), select FD type: `FDtype = “tau_values”` for FD under specified threshold values, or `FDtype = “AUC”` (area under the curve of tau-profile) for an overall FD which integrates all threshold values between zero and one. Default is `“AUC”`.
`FDtau`	(argument only for `diversity = “FD”` and `FDtype = “tau_values”`), a numerical vector between 0 and 1 specifying tau values (threshold levels). If `NULL` (default), then threshold is set to be the mean distance between any two individuals randomly selected from the pooled assemblage (i.e., quadratic entropy).
`FDcut_number`	(argument only for `diversity = “FD”` and `FDtype = “AUC”`), a numeric number to cut [0, 1] interval into equal-spaced sub-intervals to obtain the AUC value by integrating the tau-profile. Equivalently, the number of tau values that will be considered to compute the integrated AUC value. Default is `FDcut_number = 50`. A larger value can be set to obtain more accurate AUC value.

For each dimension of diversity (TD, PD, FD), the main function iNEXT3D() returns the iNEXT3D object, which can be further used to make plots using the function ggiNEXT3D() to be described below. The "iNEXT3D" object includes three lists:

$TDInfo ($PDInfo,or $FDInfo) for summarizing data information.
$TDiNextEst ($PDiNextEst, or $FDiNextEst) for showing diversity estimates along with related statistics for a series of rarefied and extrapolated samples; there are two data frames ($size_based and $coverage_based) conditioning on standardized sample size or sample coverage, respectively.
$TDAsyEst ($PDAsyEst, or $FDAsyEst) for showing asymptotic diversity estimates along with related statistics.

FUNCTION ggiNEXT3D(): GRAPHIC DISPLAYS

The function ggiNEXT3D(), which extends ggplot2 with default arguments, is described as follows:

ggiNEXT3D(output, type = 1:3, facet.var = "Assemblage", color.var = "Order.q")

Here output is the iNEXT3D() object. Three types of curves are allowed for 3D diversity:

Sample-size-based R/E curve (type = 1): This curve plots diversity estimates with confidence intervals as a function of sample size.
Sample completeness curve (type = 2): This curve plots the sample coverage with respect to sample size.
Coverage-based R/E curve (type = 3): This curve plots the diversity estimates with confidence intervals as a function of sample coverage.

The argument facet.var = "Order.q", facet.var = "Assemblage", facet.var = "Both", or facet.var = "None" is used to create a separate plot for each value of the specified variable.

The ggiNEXT3D() function is a wrapper with the package ggplot2 to create a rarefaction/extrapolation sampling curve in a single line of code. The figure object is of class "ggplot", so it can be manipulated by using the ggplot2 tools.

TAXONOMIC DIVERSITY (TD): RAREFACTION/EXTRAPOLATION VIA EXAMPLES

EXAMPLE 1: TD rarefaction/extrapolation for abundance data

Based on the dataset (Brazil_rainforest_abun_data) included in the package, the following commands return all numerical results for TD. The first list of the output ($TDInfo) returns basic data information including the name of the Assemblage, sample size (n), observed species richness (S.obs), sample coverage estimate of the reference sample with size n (SC(n)), sample coverage estimate of the extrapolated sample with size 2n (SC(2n)) as well as the first five species abundance frequency counts in the reference sample (f1-f5). The output is identical to that based on the function DataInfo3D() by specifying diversity = 'TD' and datatype = "abundance"; see later text). Thus, if only data information is required, the simpler function DataInfo3D() (see later text) can be used to obtain the same output. More information about the observed diversity (for any order q between 0 and 2) can be obtained by function ObsAsy3D(), which will be introduced later.

data(Brazil_rainforest_abun_data)
output_TD_abun <- iNEXT3D(Brazil_rainforest_abun_data, diversity = 'TD', q = c(0,1,2), 
                          datatype = "abundance")
output_TD_abun$TDInfo

$TDInfo
  Assemblage    n S.obs SC(n) SC(2n)  f1 f2 f3 f4 f5
1       Edge 1794   319 0.939  0.974 110 48 38 28 13
2   Interior 2074   356 0.941  0.973 123 48 41 32 19

The second list of the output ($TDiNextEst) includes size- and coverage-based standardized diversity estimates and related statistics computed for 40 knots by default (for example in the “Edge” assemblage, corresponding to the target sample size m = 1, 95, 189, …, 1699, 1794, 1795, 1899, …, 3588), which locates the reference sample size at the mid-point of the selected knots. There are two data frames ($size_based and $coverage_based).

The first data frame ($size_based) includes the name of the Assemblage, diversity order (Order.q), the target sample size (m), the Method (Rarefaction, Observed, or Extrapolation, depending on whether the size m is less than, equal to, or greater than the reference sample size), the diversity estimate of order q (qTD), the lower and upper confidence limits of diversity (qTD.LCL and qTD.UCL) conditioning on the sample size, and the corresponding sample coverage estimate (SC) along with the lower and upper confidence limits of sample coverage (SC.LCL and SC.UCL). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve. If the argument nboot is greater than zero, then a bootstrap method is applied to obtain the confidence intervals for the diversity and sample coverage estimates. Otherwise, all confidence intervals will not be computed. Here only the first six rows of the $size_based output are displayed:

output_TD_abun$TDiNextEst$size_based

  Assemblage Order.q   m      Method     qTD qTD.LCL qTD.UCL    SC SC.LCL SC.UCL
1       Edge       0   1 Rarefaction   1.000   1.000   1.000 0.012  0.010  0.013
2       Edge       0  95 Rarefaction  66.306  65.080  67.532 0.484  0.468  0.500
3       Edge       0 189 Rarefaction 106.743 104.144 109.342 0.638  0.622  0.653
4       Edge       0 284 Rarefaction 137.029 133.162 140.896 0.718  0.704  0.733
5       Edge       0 378 Rarefaction 161.010 156.005 166.016 0.768  0.755  0.782
6       Edge       0 472 Rarefaction 181.073 175.028 187.119 0.803  0.791  0.816

The second data frame ($coverage_based) includes the name of assemblage, the diversity order (Order.q), the target sample coverage value (SC), the corresponding sample size (m), the Method (Rarefaction, Observed, or Extrapolation, depending on whether the coverage SC is less than, equal to, or greater than the reference sample coverage), the diversity estimate of order q (qTD), the lower and upper confidence limits of diversity (qTD.LCL and qTD.UCL) conditioning on the target sample coverage value. Here only the first six rows of the $coverage_based output are displayed below: (Note for a fixed coverage value, the confidence interval in the $coverage_based table is wider than the corresponding interval in the $size_based table. This is because, for a given coverage value, the sample size needed to attain a fixed coverage value varies with bootstrap replication, leading to higher uncertainty on the resulting diversity estimate.)

output_TD_abun$TDiNextEst$coverage_based

  Assemblage Order.q    SC   m      Method     qTD qTD.LCL qTD.UCL
1       Edge       0 0.012   1 Rarefaction   1.000   0.895   1.105
2       Edge       0 0.484  95 Rarefaction  66.306  62.007  70.605
3       Edge       0 0.638 189 Rarefaction 106.743  99.796 113.690
4       Edge       0 0.718 284 Rarefaction 137.029 127.925 146.134
5       Edge       0 0.768 378 Rarefaction 161.010 150.110 171.911
6       Edge       0 0.803 472 Rarefaction 181.073 168.601 193.546

The third list of the output ($TDAsyEst) includes the name of the Assemblage, diversity label (qTD, species richness for q = 0, Shannon diversity for q = 1, and Simpson diversity for q = 2), the observed diversity (TD_obs), asymptotic diversity estimate (TD_asy) and its estimated bootstrap standard error (s.e.) as well as the confidence intervals for asymptotic diversity (qTD.LCL and qTD.UCL). These statistics are computed only for q = 0, 1 and 2. More detailed information about asymptotic and observed diversity estimates for any order q between 0 and 2 can be obtained from function ObsAsy3D(). The output for $TDAsyEst is shown below:

output_TD_abun$TDAsyEst

  Assemblage               qTD  TD_obs  TD_asy   s.e. qTD.LCL qTD.UCL
1       Edge  Species richness 319.000 444.971 28.698 388.724 501.219
2       Edge Shannon diversity 155.386 178.000  5.428 167.360 188.639
3       Edge Simpson diversity  82.023  85.905  4.851  76.397  95.414
4   Interior  Species richness 356.000 513.518 31.183 452.401 574.635
5   Interior Shannon diversity 163.514 186.983  5.341 176.515 197.451
6   Interior Simpson diversity  72.153  74.718  3.925  67.025  82.410

The ggiNEXT3D function can be used to make graphical displays for rarefaction and extrapolation sampling curves. When facet.var = "Assemblage" is specified in the ggiNEXT3D function, it creates a separate plot for each assemblage; within each assemblage, different color curves represent diversity of different orders. An example for showing sample-size-based rarefaction/extrapolation curves (type = 1) is given below:

# TD sample-size-based R/E curves, separating by "Assemblage"
ggiNEXT3D(output_TD_abun, type = 1, facet.var = "Assemblage")

When facet.var = "Order.q" is specified in the ggiNEXT3D function, it creates a separate plot for each diversity order; within each plot, different color curves represent different assemblages. An example is shown below:

# TD sample-size-based R/E curves, separating by "Order.q"
ggiNEXT3D(output_TD_abun, type = 1, facet.var = "Order.q")

The following commands return the sample completeness (sample coverage) curve (type = 2) in which different colors represent different assemblages.

# Sample completeness curves for abundance data, separating by "Assemblage"
ggiNEXT3D(output_TD_abun, type = 2, color.var = "Assemblage")

The following commands return the coverage-based rarefaction/extrapolation sampling curves in which different color curves represent three diversity orders within each assemblage (facet.var = "Assemblage"), or represent two assemblages within each diversity order (facet.var = "Order.q"), respectively.

# TD coverage-based R/E curves, separating by "Assemblage"
ggiNEXT3D(output_TD_abun, type = 3, facet.var = "Assemblage")

# TD coverage-based R/E curves, separating by "Order.q"
ggiNEXT3D(output_TD_abun, type = 3, facet.var = "Order.q")

EXAMPLE 2: TD rarefaction/extrapolation for incidence data

Based on the dataset (Fish_incidence_data) included in the package, the following commands return all numerical results for TD. The first list of the output ($TDInfo) returns basic data information including the name of the Assemblage, number of sampling units (T), total number of incidences (U), observed species richness (S.obs), sample coverage estimate of the reference sample with size T (SC(T)), sample coverage estimate of the extrapolated sample with size 2T (SC(2T)) as well as the first five species incidence frequency counts in the reference sample (Q1-Q5). The output is identical to that based on the function DataInfo3D() by specifying diversity = 'TD' and datatype = "incidence_raw"; see later text). Thus, if only data information is required, the simpler function DataInfo3D() (see later text) can be used to obtain the same output. More information about the observed diversity (for any order q between 0 and 2) can be obtained by function ObsAsy3D(), which will be introduced later.

data(Fish_incidence_data)
output_TD_inci <- iNEXT3D(Fish_incidence_data, diversity = 'TD', q = c(0, 1, 2), 
                          datatype = "incidence_raw")
output_TD_inci$TDInfo

$TDInfo
  Assemblage  T   U S.obs SC(T) SC(2T) Q1 Q2 Q3 Q4 Q5
1  2013-2015 36 532    50 0.980  0.993 11  6  4  1  3
2  2016-2018 36 522    53 0.976  0.989 13  5  5  2  3

The second list of the output ($TDiNextEst) includes size- and coverage-based standardized diversity estimates and related statistics computed for 40 knots by default (for example in the "2013-2015" time period, corresponding to the target number of sample units mT = 1, 2, 4, …, 34, 36, 37, 38, …, 72), which locates the reference sampling units at the mid-point of the selected knots. There are two data frames ($size_based and $coverage_based).

The first data frame ($size_based) includes the name of the Assemblage, diversity order (Order.q), the target number of sampling units (mT), the Method (Rarefaction, Observed, or Extrapolation, depending on whether the target number of sample units mT is less than, equal to, or greater than the number of sampling units in the reference sample), the diversity estimate of order q (qTD), the lower and upper confidence limits of diversity (qTD.LCL and qTD.UCL) conditioning on the sample size, and the corresponding sample coverage estimate (SC) along with the lower and upper confidence limits of sample coverage (SC.LCL and SC.UCL). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve. If the argument nboot is greater than zero, then a bootstrap method is applied to obtain the confidence intervals for the diversity and sample coverage estimates. Otherwise, all confidence intervals will not be computed. Here only the first six rows of the $size_based output are displayed:

output_TD_inci$TDiNextEst$size_based

  Assemblage Order.q mT      Method    qTD qTD.LCL qTD.UCL    SC SC.LCL SC.UCL
1  2013-2015       0  1 Rarefaction 14.778  14.101  15.455 0.606  0.578  0.633
2  2013-2015       0  2 Rarefaction 20.603  19.535  21.671 0.749  0.727  0.770
3  2013-2015       0  4 Rarefaction 27.079  25.482  28.677 0.851  0.833  0.868
4  2013-2015       0  6 Rarefaction 31.121  29.129  33.114 0.894  0.878  0.910
5  2013-2015       0  8 Rarefaction 34.042  31.709  36.375 0.919  0.905  0.933
6  2013-2015       0 10 Rarefaction 36.319  33.686  38.952 0.934  0.921  0.947

The second data frame ($coverage_based) includes the name of assemblage, the diversity order (Order.q), the target sample coverage value (SC), the corresponding number of sampling units (mT), the Method (Rarefaction, Observed, or Extrapolation, depending on whether the coverage SC is less than, equal to, or greater than the reference sample coverage), the diversity estimate of order q (qTD), the lower and upper confidence limits of diversity (qTD.LCL and qTD.UCL) conditioning on the target sample coverage value. Here only the first six rows of the $coverage_based output are displayed below: (Note for a fixed coverage value, the confidence interval in the $coverage_based table is wider than the corresponding interval in the $size_based table. This is because, for a given coverage value, the sample size needed to attain a fixed coverage value varies with bootstrap replication, leading to higher uncertainty on the resulting diversity estimate.)

output_TD_inci$TDiNextEst$coverage_based

  Assemblage Order.q    SC mT      Method    qTD qTD.LCL qTD.UCL
1  2013-2015       0 0.606  1 Rarefaction 14.778  13.716  15.840
2  2013-2015       0 0.749  2 Rarefaction 20.603  18.968  22.238
3  2013-2015       0 0.851  4 Rarefaction 27.079  24.660  29.499
4  2013-2015       0 0.894  6 Rarefaction 31.121  28.088  34.154
5  2013-2015       0 0.919  8 Rarefaction 34.042  30.509  37.574
6  2013-2015       0 0.934 10 Rarefaction 36.319  32.366  40.272

output_TD_inci$TDAsyEst

  Assemblage               qTD TD_obs TD_asy   s.e. qTD.LCL qTD.UCL
1  2013-2015  Species richness 50.000 59.803  7.712  44.689  74.918
2  2013-2015 Shannon diversity 30.089 31.542  1.102  29.381  33.702
3  2013-2015 Simpson diversity 23.961 24.394  0.739  22.946  25.842
4  2016-2018  Species richness 53.000 69.431 23.365  23.636 115.225
5  2016-2018 Shannon diversity 31.534 33.393  1.199  31.044  35.742
6  2016-2018 Simpson diversity 24.889 25.409  0.899  23.648  27.170

# TD sample-size-based R/E curves for incidence data, separating by "Assemblage"
ggiNEXT3D(output_TD_inci, type = 1, facet.var = "Assemblage")

# TD sample-size-based R/E curves for incidence data, separating by "Order.q"
ggiNEXT3D(output_TD_inci, type = 1, facet.var = "Order.q")

The following commands return the sample completeness (sample coverage) curve (type = 2) in which different colors are used for different assemblages.

# Sample completeness curves for incidence data, separating by "Assemblage"
ggiNEXT3D(output_TD_inci, type = 2, color.var = "Assemblage")

# TD coverage-based R/E curves for incidence data, separating by "Assemblage"
ggiNEXT3D(output_TD_inci, type = 3, facet.var = "Assemblage")

# TD coverage-based R/E curves for incidence data, separating by "Order.q"
ggiNEXT3D(output_TD_inci, type = 3, facet.var = "Order.q")

PHYLOGENETIC DIVERSITY (PD): RAREFACTION/EXTRAPOLATION VIA EXAMPLES

EXAMPLE 3: PD rarefaction/extrapolation for abundance data

Based on the dataset (Brazil_rainforest_abun_data) and the phylogentic tree (Brazil_rainforest_phylo_tree) included in the package, the following commands return all numerical results for PD. The first list of the output ($PDInfo) returns basic data information including the name of the Assemblage, sample size (n), observed species richness (S.obs), sample coverage estimate of the reference sample with size n (SC(n)), sample coverage estimate of the extrapolated sample with size 2n (SC(2n)), the observed total branch length in the phylogenetic tree spanned by all observed specise (PD.obs), the number of singletons and doubletons in the node/branch abundance set (f1*,f2*), the total branch length of those singletons and doubletons in the node/branch abundance set (g1,g2), and the reference time (Reftime). The output is identical to that based on the function DataInfo3D() by specifying diversity = 'PD' and datatype = "abundance"; see later text). Thus, if only data information is required, the simpler function DataInfo3D() (see later text) can be used to obtain the same output. More information about the observed diversity (for any order q between 0 and 2) can be obtained by function ObsAsy3D(), which will be introduced later.

The required argument for performing PD analysis is PDtree. For example, the phylogenetic tree for all observed species (including species in both Edge and Interior habitats) is stored in Brazil_rainforest_phylo_tree. Then we enter the argument PDtree = Brazil_rainforest_phylo_tree. Two optional arguments are: PDtype and PDreftime. There are two options for PDtype: "PD" (effective total branch length) or "meanPD" (effective number of equally divergent lineages, meanPD = PD/tree depth). Default is PDtype = "meanPD". PDreftime is a numerical value specifying a reference time for computing phylogenetic diversity. By default (PDreftime = NULL), the reference time is set to the tree depth, i.e., age of the root of the phylogenetic tree. Run the following code to perform PD analysis.

data(Brazil_rainforest_abun_data)
data(Brazil_rainforest_phylo_tree)
data <- Brazil_rainforest_abun_data
tree <- Brazil_rainforest_phylo_tree
output_PD_abun <- iNEXT3D(data, diversity = 'PD', q = c(0, 1, 2), datatype = "abundance", 
                          nboot = 20, PDtree = tree)
output_PD_abun$PDInfo

$PDInfo
# A tibble: 2 × 11
  Assemblage     n S.obs `SC(n)` `SC(2n)` PD.obs `f1*` `f2*`    g1    g2 Reftime
  <chr>      <int> <int>   <dbl>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1 Edge        1794   319   0.939    0.974  24516   110    52  6578  2885     400
2 Interior    2074   356   0.941    0.973  27727   123    56  7065  3656     400

The second list of the output ($PDiNextEst) includes size- and coverage-based standardized diversity estimates and related statistics computed for 40 knots by default (for example in the “Edge” assemblage, corresponding to the target sample size m = 1, 95, 189, …, 1699, 1794, 1795, 1899, …, 3588), which locates the reference sample size at the mid-point of the selected knots. There are two data frames ($size_based and $coverage_based).

The first data frame ($size_based) includes the name of the Assemblage, diversity order (Order.q), the target sample size (m), the Method (Rarefaction, Observed, or Extrapolation, depending on whether the size m is less than, equal to, or greater than the reference sample size), the diversity estimate of order q (qPD), the lower and upper confidence limits of diversity (qPD.LCL and qPD.UCL) conditioning on the sample size, the corresponding sample coverage estimate (SC) along with the lower and upper confidence limits of sample coverage (SC.LCL and SC.UCL), the reference time (Reftime) and the type of PD (Type). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve. If the argument nboot is greater than zero, then a bootstrap method is applied to obtain the confidence intervals for the diversity and sample coverage estimates. Otherwise, all confidence intervals will not be computed. Here only the first six rows of the $size_based output are displayed:

output_PD_abun$PDiNextEst$size_based

  Assemblage Order.q   m      Method    qPD qPD.LCL qPD.UCL    SC SC.LCL SC.UCL Reftime   Type
1       Edge       0   1 Rarefaction  1.000   0.988   1.012 0.012  0.010  0.013     400 meanPD
2       Edge       0  95 Rarefaction 18.547  18.171  18.922 0.484  0.467  0.501     400 meanPD
3       Edge       0 189 Rarefaction 26.723  26.005  27.441 0.638  0.624  0.651     400 meanPD
4       Edge       0 284 Rarefaction 32.305  31.311  33.299 0.718  0.706  0.731     400 meanPD
5       Edge       0 378 Rarefaction 36.498  35.286  37.710 0.768  0.756  0.781     400 meanPD
6       Edge       0 472 Rarefaction 39.882  38.484  41.279 0.803  0.791  0.815     400 meanPD

The second data frame ($coverage_based) includes the name of assemblage, the diversity order (Order.q), the target sample coverage value (SC), the corresponding sample size (m), the Method (Rarefaction, Observed, or Extrapolation, depending on whether the coverage SC is less than, equal to, or greater than the reference sample coverage), the diversity estimate of order q (qPD), the lower and upper confidence limits of diversity (qPD.LCL and qPD.UCL) conditioning on the target sample coverage value, the reference times (Reftime) and the type of PD (Type). Here only the first six rows of the $coverage_based output are displayed below: (Note for a fixed coverage value, the confidence interval in the $coverage_based table is wider than the corresponding interval in the $size_based table. This is because, for a given coverage value, the sample size needed to attain a fixed coverage value varies with bootstrap replication, leading to higher uncertainty on the resulting diversity estimate.)

output_PD_abun$PDiNextEst$coverage_based

  Assemblage Order.q    SC   m      Method    qPD qPD.LCL qPD.UCL Reftime   Type
1       Edge       0 0.012   1 Rarefaction  1.000   0.913   1.087     400 meanPD
2       Edge       0 0.484  95 Rarefaction 18.547  17.518  19.576     400 meanPD
3       Edge       0 0.638 189 Rarefaction 26.723  25.361  28.085     400 meanPD
4       Edge       0 0.718 284 Rarefaction 32.305  30.631  33.979     400 meanPD
5       Edge       0 0.768 378 Rarefaction 36.498  34.516  38.481     400 meanPD
6       Edge       0 0.803 472 Rarefaction 39.882  37.606  42.157     400 meanPD

The third list of the output ($PDAsyEst) includes the name of the Assemblage, PD (or meanPD) for q = 0, 1, and 2 (qPD), the observed diversity (PD_obs), asymptotic diversity estimates (PD_asy), estimated asymptotic bootstrap standard error (s.e.) as well as the confidence intervals for asymptotic diversity with q = 0, 1, and 2 (qPD.LCL and qPD.UCL), the reference times (Reftime) and the type of PD (Type). These statistics are computed only for q = 0, 1 and 2. More detailed information about asymptotic and observed diversity estimates for any order q between 0 and 2 can be obtained from function ObsAsy3D(). The output is shown below:

output_PD_abun$PDAsyEst

  Assemblage      qPD PD_obs PD_asy  s.e. qPD.LCL qPD.UCL Reftime   Type
1       Edge q = 0 PD 61.290 80.027 5.150  69.933  90.122     400 meanPD
2       Edge q = 1 PD  5.246  5.372 0.110   5.156   5.587     400 meanPD
3       Edge q = 2 PD  1.797  1.798 0.028   1.742   1.853     400 meanPD
4   Interior q = 0 PD 69.318 86.375 3.026  80.444  92.306     400 meanPD
5   Interior q = 1 PD  5.721  5.854 0.128   5.603   6.105     400 meanPD
6   Interior q = 2 PD  1.914  1.915 0.032   1.853   1.977     400 meanPD

# PD sample-size-based R/E curves, separating by "Assemblage"
ggiNEXT3D(output_PD_abun, type = 1, facet.var = "Assemblage")

# PD sample-size-based R/E curves, separating by "Order.q"
ggiNEXT3D(output_PD_abun, type = 1, facet.var = "Order.q")

The following commands return the sample completeness (sample coverage) curve (type = 2) in which different colors are used for different assemblages.

# Sample completeness curves for abundance data, separating by "Assemblage"
ggiNEXT3D(output_PD_abun, type = 2, color.var = "Assemblage")

# PD coverage-based R/E curves, separating by "Assemblage"
ggiNEXT3D(output_PD_abun, type = 3, facet.var = "Assemblage")

# PD coverage-based R/E curves, separating by "Order.q"
ggiNEXT3D(output_PD_abun, type = 3, facet.var = "Order.q")

EXAMPLE 4: PD rarefaction/extrapolation for incidence data

Based on the dataset (Fish_incidence_data) included in the package and the phylogentic tree (Fish_phylo_tree), the following commands return all numerical results for PD. The first list of the output ($PDInfo) returns basic data information including the name of the Assemblage, number of sampling units (T), total number of incidences (U), observed species richness (S.obs), sample coverage estimate of the reference sample with size T (SC(T)), sample coverage estimate of the extrapolated sample with size 2T (SC(2T)), the observed total branch length in the phylogenetic tree spanned by all observed species (PD.obs), the singletons/doubletons in the sample branch incidence (Q1*,Q2*), the total branch length of those singletons/doubletons in the sample branch incidence (R1,R2), and the reference time (Reftime). The output is identical to that based on the function DataInfo3D() by specifying diversity = 'PD' and datatype = "incidence_raw"; see later text). Thus, if only data information is required, the simpler function DataInfo3D() (see later text) can be used to obtain the same output. More information about the observed diversity (for any order q between 0 and 2) can be obtained by function ObsAsy3D(), which will be introduced later.

The required argument for performing PD analysis is PDtree. For example, the phylogenetic tree for all observed species (including species in both "2013-2015" and "2016-2018" time periods) is stored in Fish_phylo_tree. Then we enter the argument PDtree = Fish_phylo_tree. Two optional arguments are: PDtype and PDreftime. There are two options for PDtype: "PD" (effective total branch length) or "meanPD" (effective number of equally divergent lineages, meanPD = PD/tree depth). Default is PDtype = "meanPD". PDreftime is a numerical value specifying a reference time for computing phylogenetic diversity. By default (PDreftime = NULL), the reference time is set to the tree depth, i.e., age of the root of the phylogenetic tree. Run the following code to perform PD analysis.

data(Fish_incidence_data)
data(Fish_phylo_tree)
data <- Fish_incidence_data
tree <- Fish_phylo_tree
output_PD_inci <- iNEXT3D(data, diversity = 'PD', q = c(0, 1, 2), 
                          datatype = "incidence_raw", nboot = 20, PDtree = tree)
output_PD_inci$PDInfo

$PDInfo
# A tibble: 2 × 12
  Assemblage     T     U S.obs `SC(T)` `SC(2T)` PD.obs `Q1*` `Q2*`    R1    R2 Reftime
  <chr>      <int> <int> <int>   <dbl>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1 2013-2015     36   532    50   0.98     0.993   9.62    11     7 0.69  1.23    0.977
2 2016-2018     36   522    53   0.976    0.989   9.44    13     6 0.368 0.345   0.977

The second list of the output ($PDiNextEst) includes size- and coverage-based standardized diversity estimates and related statistics computed for 40 knots by default (for example in the "2013-2015" time period, corresponding to the target number of sample units mT = 1, 2, 4, …, 34, 36, 37, 38, …, 72), which locates the reference sampling units at the mid-point of the selected knots. There are two data frames ($size_based and $coverage_based).

The first data frame ($size_based) includes the name of the Assemblage, diversity order (Order.q), the target number of sample units (mT), the Method (Rarefaction, Observed, or Extrapolation, depending on whether the target number of sample units mT is less than, equal to, or greater than the number of sampling units in the reference sample), the diversity estimate of order q (qPD), the lower and upper confidence limits of diversity (qPD.LCL and qPD.UCL) conditioning on the sample size, the corresponding sample coverage estimate (SC) along with the lower and upper confidence limits of sample coverage (SC.LCL and SC.UCL), the reference time (Reftime) and the type of PD (Type). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve. If the argument nboot is greater than zero, then a bootstrap method is applied to obtain the confidence intervals for the diversity and sample coverage estimates. Otherwise, all confidence intervals will not be computed. Here only the first six rows of the $size_based output are displayed:

output_PD_inci$PDiNextEst$size_based

  Assemblage Order.q mT      Method   qPD qPD.LCL qPD.UCL    SC SC.LCL SC.UCL Reftime   Type
1  2013-2015       0  1 Rarefaction 5.744   5.559   5.929 0.606  0.584  0.628   0.977 meanPD
2  2013-2015       0  2 Rarefaction 6.813   6.578   7.048 0.749  0.727  0.770   0.977 meanPD
3  2013-2015       0  4 Rarefaction 7.716   7.442   7.991 0.851  0.832  0.870   0.977 meanPD
4  2013-2015       0  6 Rarefaction 8.130   7.782   8.477 0.894  0.877  0.911   0.977 meanPD
5  2013-2015       0  8 Rarefaction 8.389   7.970   8.809 0.919  0.903  0.934   0.977 meanPD
6  2013-2015       0 10 Rarefaction 8.589   8.108   9.070 0.934  0.920  0.948   0.977 meanPD

The second data frame ($coverage_based) includes the name of assemblage, the diversity order (Order.q), the target sample coverage value (SC), the corresponding number of sample units (mT), the Method (Rarefaction, Observed, or Extrapolation, depending on whether the coverage SC is less than, equal to, or greater than the reference sample coverage), the diversity estimate of order q (qPD), the lower and upper confidence limits of diversity (qPD.LCL and qPD.UCL) conditioning on the target sample coverage value, the reference time (Reftime) and the type of PD (Type). Here only the first six rows of the $coverage_based output are displayed below: (Note for a fixed coverage value, the confidence interval in the $coverage_based table is wider than the corresponding interval in the $size_based table. This is because, for a given coverage value, the sample size needed to attain a fixed coverage value varies with bootstrap replication, leading to higher uncertainty on the resulting diversity estimate.)

output_PD_inci$PDiNextEst$coverage_based

  Assemblage Order.q    SC mT      Method   qPD qPD.LCL qPD.UCL Reftime   Type
1  2013-2015       0 0.606  1 Rarefaction 5.744   5.505   5.983   0.977 meanPD
2  2013-2015       0 0.749  2 Rarefaction 6.813   6.523   7.103   0.977 meanPD
3  2013-2015       0 0.851  4 Rarefaction 7.716   7.360   8.073   0.977 meanPD
4  2013-2015       0 0.894  6 Rarefaction 8.130   7.714   8.545   0.977 meanPD
5  2013-2015       0 0.919  8 Rarefaction 8.389   7.913   8.866   0.977 meanPD
6  2013-2015       0 0.934 10 Rarefaction 8.589   8.057   9.121   0.977 meanPD

The third list of the output ($PDAsyEst) includes the name of the Assemblage, PD (or meanPD) for q = 0, 1, and 2 (qPD), the observed diversity (PD_obs), asymptotic diversity estimate (PD_asy) and its estimated bootstrap standard error (s.e.), the confidence intervals for asymptotic diversity (qPD.LCL and qPD.UCL), the reference time (Reftime) and the type of PD (Type) . These statistics are computed only for q = 0, 1 and 2. More detailed information about asymptotic and observed diversity estimates for any order q between 0 and 2 can be obtained from function ObsAsy3D(). The output is shown below:

output_PD_inci$PDAsyEst

  Assemblage      qPD PD_obs PD_asy  s.e. qPD.LCL qPD.UCL Reftime   Type
1  2013-2015 q = 0 PD  9.847 10.039 1.063   7.956  12.123   0.977 meanPD
2  2013-2015 q = 1 PD  7.635  7.729 0.174   7.388   8.069   0.977 meanPD
3  2013-2015 q = 2 PD  7.013  7.057 0.166   6.732   7.383   0.977 meanPD
4  2016-2018 q = 0 PD  9.659  9.854 0.984   7.925  11.783   0.977 meanPD
5  2016-2018 q = 1 PD  7.781  7.859 0.111   7.642   8.077   0.977 meanPD
6  2016-2018 q = 2 PD  7.202  7.244 0.113   7.023   7.465   0.977 meanPD

# PD sample-size-based R/E curves for incidence data, separating by "Assemblage"
ggiNEXT3D(output_PD_inci, type = 1, facet.var = "Assemblage")

# PD sample-size-based R/E curves for incidence data, separating by "Order.q"
ggiNEXT3D(output_PD_inci, type = 1, facet.var = "Order.q")

The following commands return the sample completeness (sample coverage) curve (type = 2) in which different colors are used for different assemblages.

# Sample completeness curves for incidence data, separating by "Assemblage"
ggiNEXT3D(output_PD_inci, type = 2, color.var = "Assemblage")

# PD coverage-based R/E curves for incidence data, separating by "Assemblage"
ggiNEXT3D(output_PD_inci, type = 3, facet.var = "Assemblage")

# PD coverage-based R/E curves for incidence data, separating by "Order.q"
ggiNEXT3D(output_PD_inci, type = 3, facet.var = "Order.q")

FUNCTIONAL DIVERSITY (FD): RAREFACTION/EXTRAPOLATION VIA EXAMPLES

EXAMPLE 5: FD rarefaction/extrapolation for abundance data

Based on the dataset (Brazil_rainforest_abun_data) and the the distance matrix (Brazil_rainforest_distance_matrix) included in the package, the following commands return all numerical results for FD. The first list of the output ($FDInfo) returns basic data information including the name of the Assemblage, sample size (n), observed species richness (S.obs), sample coverage estimate of the reference sample with size n (SC(n)), sample coverage estimate of the extrapolated sample with size 2n (SC(2n)), and the minimum, mean, and maximum distance among all non-diagonal elements in the distance matrix(dmin, dmean, dmax). The output is identical to that based on the function DataInfo3D() by specifying diversity = 'FD' and datatype = "abundance"; see later text). Thus, if only data information is required, the simpler function DataInfo3D() (see later text) can be used to obtain the same output. More information about the observed diversity (for any order q between 0 and 2) can be obtained by function ObsAsy3D(), which will be introduced later.

The required argument for performing FD analysis is FDdistM. For example, the distance matrix for all species (including species in both Edge and Interior habitats) is stored in Brazil_rainforest_distance_matrix. Then we enter the argument FDdistM = Brazil_rainforest_distance_matrix Three optional arguments are (1) FDtype: FDtype = "AUC" means FD is computed from the area under the curve of a tau-profile by integrating all plausible threshold values between zero and one; FDtype = "tau_values" means FD is computed under specific threshold values to be specified in the argument FD_tau. (2) FD_tau: a numerical value specifying the tau value (threshold level) that will be used to compute FD. If FDtype = "tau_values" and FD_tau = NULL, then the threshold level is set to be the mean distance between any two individuals randomly selected from the pooled data over all data (i.e., quadratic entropy).

data(Brazil_rainforest_abun_data)
data(Brazil_rainforest_distance_matrix)
data <- Brazil_rainforest_abun_data
distM <- Brazil_rainforest_distance_matrix
output_FD_abun <- iNEXT3D(data, diversity = 'FD', datatype = "abundance", nboot = 10, 
                          FDdistM = distM, FDtype = 'AUC')
output_FD_abun$FDInfo

$FDInfo
  Assemblage    n S.obs SC(n) SC(2n) dmin dmean  dmax
1       Edge 1794   319 0.939  0.974    0 0.372 0.776
2   Interior 2074   356 0.941  0.973    0 0.329 0.776

The second list of the output ($FDiNextEst) includes size- and coverage-based standardized diversity estimates and related statistics computed for 40 knots by default (for example in the “Edge” assemblage, corresponding to the target sample size m = 1, 95, 189, …, 1699, 1794, 1795, 1899, …, 3588), which locates the reference sample size at the mid-point of the selected knots. There are two data frames ($size_based and $coverage_based).

The first data frame ($size_based) includes the name of the Assemblage, diversity order (Order.q), the target sample size (m), the Method (Rarefaction, Observed, or Extrapolation, depending on whether the size m is less than, equal to, or greater than the reference sample size), the diversity estimate of order q (qFD), the lower and upper confidence limits of diversity (qFD.LCL and qFD.UCL) conditioning on the sample size, and the corresponding sample coverage estimate (SC) along with the lower and upper confidence limits of sample coverage (SC.LCL and SC.UCL). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve. If the argument nboot is greater than zero, then a bootstrap method is applied to obtain the confidence intervals for the diversity and sample coverage estimates. Otherwise, all confidence intervals will not be computed. Here only the first six rows of the $size_based output are displayed:

output_FD_abun$FDiNextEst$size_based

  Assemblage Order.q   m      Method    qFD qFD.LCL qFD.UCL    SC SC.LCL SC.UCL
1       Edge       0   1 Rarefaction  1.000   1.000   1.000 0.012  0.010  0.013
2       Edge       0  95 Rarefaction 10.900  10.499  11.301 0.484  0.460  0.508
3       Edge       0 189 Rarefaction 12.993  12.326  13.659 0.638  0.616  0.660
4       Edge       0 284 Rarefaction 14.129  13.241  15.017 0.718  0.700  0.737
5       Edge       0 378 Rarefaction 14.860  13.790  15.931 0.768  0.753  0.784
6       Edge       0 472 Rarefaction 15.383  14.159  16.607 0.803  0.789  0.817

The second data frame ($coverage_based) includes the name of assemblage, the diversity order (Order.q), the target sample coverage value (SC), the corresponding sample size (m), the Method (Rarefaction, Observed, or Extrapolation, depending on whether the coverage SC is less than, equal to, or greater than the reference sample coverage), the diversity estimate of order q (qFD), and the lower and upper confidence limits of diversity (qFD.LCL and qFD.UCL) conditioning on the target sample coverage value. Here only the first six rows of the $coverage_based output are displayed below: (Note for a fixed coverage value, the confidence interval in the $coverage_based table is wider than the corresponding interval in the $size_based table. This is because, for a given coverage value, the sample size needed to attain a fixed coverage value varies with bootstrap replication, leading to higher uncertainty on the resulting diversity estimate.)

output_FD_abun$FDiNextEst$coverage_based

  Assemblage Order.q    SC   m      Method    qFD qFD.LCL qFD.UCL
1       Edge       0 0.012   1 Rarefaction  1.000   0.915   1.085
2       Edge       0 0.484  95 Rarefaction 10.900  10.583  11.217
3       Edge       0 0.638 189 Rarefaction 12.993  12.396  13.589
4       Edge       0 0.718 284 Rarefaction 14.129  13.251  15.007
5       Edge       0 0.768 378 Rarefaction 14.860  13.730  15.991
6       Edge       0 0.803 472 Rarefaction 15.383  14.016  16.750

The third list of the output ($FDAsyEst) includes the name of the Assemblage, FD for q = 0, 1, and 2 (qFD), the observed diversity (FD_obs), asymptotic diversity estimate (FD_asy) and its estimated bootstrap standard error (s.e.) as well as the confidence intervals for asymptotic diversity (qFD.LCL and qFD.UCL). These statistics are computed only for q = 0, 1 and 2. More detailed information about asymptotic and observed diversity estimates for any order q between 0 and 2 can be obtained from function ObsAsy3D(). The output is shown below:

output_FD_abun$FDAsyEst

  Assemblage           qFD FD_obs FD_asy  s.e. qFD.LCL qFD.UCL
1       Edge q = 0 FD(AUC) 17.851 19.008 9.661   0.073  37.943
2       Edge q = 1 FD(AUC) 11.781 12.058 0.641  10.802  13.314
3       Edge q = 2 FD(AUC)  9.139  9.254 0.346   8.575   9.932
4   Interior q = 0 FD(AUC) 17.168 18.215 6.593   5.292  31.138
5   Interior q = 1 FD(AUC)  9.716  9.935 0.257   9.431  10.440
6   Interior q = 2 FD(AUC)  7.007  7.067 0.188   6.698   7.435

# FD sample-size-based R/E curves, separating by "Assemblage"
ggiNEXT3D(output_FD_abun, type = 1, facet.var = "Assemblage")

# FD sample-size-based R/E curves, separating by "Order.q"
ggiNEXT3D(output_FD_abun, type = 1, facet.var = "Order.q")

The following commands return the sample completeness (sample coverage) curve (type = 2) in which different colors are used for different assemblages.

# Sample completeness curves for abundance data, separating by "Assemblage"
ggiNEXT3D(output_FD_abun, type = 2, color.var = "Assemblage")

# FD coverage-based R/E curves, separating by "Assemblage"
ggiNEXT3D(output_FD_abun, type = 3, facet.var = "Assemblage")

# FD coverage-based R/E curves, separating by "Order.q"
ggiNEXT3D(output_FD_abun, type = 3, facet.var = "Order.q")

EXAMPLE 6: FD rarefaction/extrapolation for incidence data

Based on the dataset (Fish_incidence_data) and the the distance matrix (Fish_distance_matrix) included in the package, the following commands return all numerical results for FD. The first list of the output ($FDInfo) returns basic data information including the name of the Assemblage, number of sampling units (T), total number of incidences (U), observed species richness (S.obs), sample coverage estimate of the reference sample with size T (SC(T)), sample coverage estimate of the reference sample with size 2T (SC(2T)), and the minimum, mean, and maximum distance among all non-diagonal elements in the distance matrix(dmin, dmean, dmax). The output is identical to that based on the function DataInfo3D() by specifying diversity = 'FD' and datatype = "incidence_raw"; see later text). Thus, if only data information is required, the simpler function DataInfo3D() (see later text) can be used to obtain the same output. More information about the observed diversity (for any order q between 0 and 2) can be obtained by function ObsAsy3D(), which will be introduced later.

The required argument for performing FD analysis is FDdistM. For example, the distance matrix for all species (including species in both "2013-2015" and "2016-2018" time periods) is stored in Fish_distance_matrix. Then we enter the argument FDdistM = Fish_distance_matrix Three optional arguments are (1) FDtype: FDtype = "AUC" means FD is computed from the area under the curve of a tau-profile by integrating all plausible threshold values between zero and one; FDtype = "tau_values" means FD is computed under specific threshold values to be specified in the argument FD_tau. (2) FD_tau: a numerical value specifying the tau value (threshold level) that will be used to compute FD. If FDtype = "tau_values" and FD_tau = NULL, then the threshold level is set to be the mean distance between any two individuals randomly selected from the pooled data over all data (i.e., quadratic entropy).

data(Fish_incidence_data)
data(Fish_distance_matrix)
data <- Fish_incidence_data
distM <- Fish_distance_matrix
output_FD_inci <- iNEXT3D(data, diversity = 'FD', datatype = "incidence_raw", nboot = 20, 
                          FDdistM = distM, FDtype = 'AUC')
output_FD_inci$FDInfo

$FDInfo
  Assemblage  T   U S.obs SC(T) SC(2T)  dmin dmean  dmax
1  2013-2015 36 532    50 0.980  0.993 0.006 0.240 0.733
2  2016-2018 36 522    53 0.976  0.989 0.006 0.237 0.733

The second list of the output ($FDiNextEst) includes size- and coverage-based standardized diversity estimates and related statistics computed for 40 knots by default (for example in the "2013-2015" time period, corresponding to the target number of sample units mT = 1, 2, 4, …, 34, 36, 37, 38, …, 72), which locates the reference sampling units at the mid-point of the selected knots. There are two data frames ($size_based and $coverage_based).

The first data frame ($size_based) includes the name of the Assemblage, diversity order (Order.q), the target number of sample units (mT), the Method (Rarefaction, Observed, or Extrapolation, depending on whether the target number of sample units mT is less than, equal to, or greater than the number of sampling units in the reference sample), the diversity estimate of order q (qFD), the lower and upper confidence limits of diversity (qFD.LCL and qFD.UCL) conditioning on the sample size, and the corresponding sample coverage estimate (SC) along with the lower and upper confidence limits of sample coverage (SC.LCL and SC.UCL). These sample coverage estimates with confidence intervals are used for plotting the sample completeness curve. If the argument nboot is greater than zero, then a bootstrap method is applied to obtain the confidence intervals for the diversity and sample coverage estimates. Otherwise, all confidence intervals will not be computed. Here only the first six rows of the $size_based output are displayed:

output_FD_inci$FDiNextEst$size_based

  Assemblage Order.q mT      Method    qFD qFD.LCL qFD.UCL    SC SC.LCL SC.UCL
1  2013-2015       0  1 Rarefaction 14.778  14.067  15.489 0.606  0.578  0.634
2  2013-2015       0  2 Rarefaction 15.318  14.583  16.054 0.749  0.727  0.770
3  2013-2015       0  4 Rarefaction 15.888  15.112  16.663 0.851  0.832  0.870
4  2013-2015       0  6 Rarefaction 16.224  15.404  17.044 0.894  0.877  0.911
5  2013-2015       0  8 Rarefaction 16.463  15.597  17.329 0.919  0.903  0.934
6  2013-2015       0 10 Rarefaction 16.652  15.741  17.563 0.934  0.919  0.948

The second data frame ($coverage_based) includes the name of assemblage, the diversity order (Order.q), the target sample coverage value (SC), the corresponding number of sample units (mT), the Method (Rarefaction, Observed, or Extrapolation, depending on whether the coverage SC is less than, equal to, or greater than the reference sample coverage), the diversity estimate of order q (qFD), and the lower and upper confidence limits of diversity (qFD.LCL and qFD.UCL) conditioning on the target sample coverage value. Here only the first six rows of the $coverage_based output are displayed below: (Note for a fixed coverage value, the confidence interval in the $coverage_based table is wider than the corresponding interval in the $size_based table. This is because, for a given coverage value, the sample size needed to attain a fixed coverage value varies with bootstrap replication, leading to higher uncertainty on the resulting diversity estimate.)

output_FD_inci$FDiNextEst$coverage_based

  Assemblage Order.q    SC mT      Method    qFD qFD.LCL qFD.UCL
1  2013-2015       0 0.606  1 Rarefaction 14.778  13.909  15.646
2  2013-2015       0 0.749  2 Rarefaction 15.318  14.411  16.226
3  2013-2015       0 0.851  4 Rarefaction 15.888  14.967  16.808
4  2013-2015       0 0.894  6 Rarefaction 16.224  15.268  17.179
5  2013-2015       0 0.919  8 Rarefaction 16.463  15.459  17.467
6  2013-2015       0 0.934 10 Rarefaction 16.652  15.587  17.717

The third list of the output ($FDAsyEst) includes the name of the Assemblage, FD for q = 0, 1, and 2 (qFD), the observed diversity (FD_obs), asymptotic diversity estimate (FD_asy) and its estimated bootstrap standard error (s.e.), and the confidence intervals for asymptotic diversity (qFD.LCL and qFD.UCL). These statistics are computed only for q = 0, 1 and 2. More detailed information about asymptotic and observed diversity estimates for any order q between 0 and 2 can be obtained from function ObsAsy3D(). The output is shown below:

output_FD_inci$FDAsyEst

  Assemblage           qFD FD_obs FD_asy  s.e. qFD.LCL qFD.UCL
1  2013-2015 q = 0 FD(AUC) 17.904 18.916 2.717  13.590  24.242
2  2013-2015 q = 1 FD(AUC) 15.944 16.044 0.347  15.364  16.724
3  2013-2015 q = 2 FD(AUC) 15.463 15.491 0.314  14.875  16.106
4  2016-2018 q = 0 FD(AUC) 17.739 19.768 4.425  11.096  28.440
5  2016-2018 q = 1 FD(AUC) 15.749 15.868 0.471  14.945  16.791
6  2016-2018 q = 2 FD(AUC) 15.275 15.306 0.475  14.374  16.237

# FD sample-size-based R/E curves for incidence data, separating by "Assemblage"
ggiNEXT3D(output_FD_inci, type = 1, facet.var = "Assemblage")

# FD sample-size-based R/E curves for incidence data, separating by "Order.q"
ggiNEXT3D(output_FD_inci, type = 1, facet.var = "Order.q")

The following commands return the sample completeness (sample coverage) curve (type = 2) in which different colors are used for different assemblages.

# Sample completeness curves for incidence data, separating by "Assemblage"
ggiNEXT3D(output_FD_inci, type = 2, color.var = "Assemblage")

# FD coverage-based R/E curves for incidence data, separating by "Assemblage"
ggiNEXT3D(output_FD_inci, type = 3, facet.var = "Assemblage")

# FD coverage-based R/E curves for incidence data, separating by "Order.q"
ggiNEXT3D(output_FD_inci, type = 3, facet.var = "Order.q")

FUNCTION DataInfo3D(): DATA INFORMATION

The function DataInfo3D() provides basic data information for the reference sample in each individual assemblage. The function DataInfo3D() with default arguments is shown below:

DataInfo3D(data, diversity = "TD", datatype = "abundance", 
           nT = NULL, PDtree, PDreftime = NULL, 
           FDdistM, FDtype = "AUC", FDtau = NULL)

All arguments in the above function are the same as those for the main function iNEXT3D. Running the DataInfo3D() function returns basic data information including sample size, observed species richness, two sample coverage estimates (SC(n) and SC(2n)) as well as other relevant information in each of the three dimensions of diversity. We use Brazil_rainforest_abun_data and Fish_incidence_data to demo the function for each dimension of diversity.

TAXONOMIC DIVERSITY (TD): Basic data information for abundance data

data(Brazil_rainforest_abun_data)
DataInfo3D(Brazil_rainforest_abun_data, diversity = 'TD', datatype = "abundance")

  Assemblage    n S.obs SC(n) SC(2n)  f1 f2 f3 f4 f5
1       Edge 1794   319 0.939  0.974 110 48 38 28 13
2   Interior 2074   356 0.941  0.973 123 48 41 32 19

Output description:

Assemblage = assemblage name.
n = number of observed individuals in the reference sample (sample size).
S.obs = number of observed species in the reference sample.
SC(n) = sample coverage estimate of the reference sample with size n.
SC(2n) = sample coverage estimate of the reference sample with size 2n.
f1-f5 = the first five species abundance frequency counts in the reference sample.

TAXONOMIC DIVERSITY (TD): Basic data information for incidence data

data(Fish_incidence_data)
DataInfo3D(Fish_incidence_data, diversity = 'TD', datatype = "incidence_raw")

  Assemblage  T   U S.obs SC(T) SC(2T) Q1 Q2 Q3 Q4 Q5
1  2013-2015 36 532    50 0.980  0.993 11  6  4  1  3
2  2016-2018 36 522    53 0.976  0.989 13  5  5  2  3

Output description:

Assemblage = assemblage name.
T = number of sampling units in the reference sample (sample size for incidence data).
U = total number of incidences in the reference sample.
S.obs = number of observed species in the reference sample.
SC(T) = sample coverage estimate of the reference sample with size T.
SC(2T) = sample coverage estimate of the reference sample with size 2T.
Q1-Q5 = the first five species incidence frequency counts in the reference sample.

PHYLOGENETIC DIVERSITY (PD): Basic data information for abundance data

data(Brazil_rainforest_abun_data)
data(Brazil_rainforest_phylo_tree)
data <- Brazil_rainforest_abun_data
tree <- Brazil_rainforest_phylo_tree
DataInfo3D(data, diversity = 'PD', datatype = "abundance", PDtree = tree)

# A tibble: 2 × 11
  Assemblage     n S.obs `SC(n)` `SC(2n)` PD.obs `f1*` `f2*`    g1    g2 Reftime
  <chr>      <int> <int>   <dbl>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1 Edge        1794   319   0.939    0.974  24516   110    52  6578  2885     400
2 Interior    2074   356   0.941    0.973  27727   123    56  7065  3656     400

Output description:

Assemblage, n, S.obs, SC(n) and SC(2n): definitions are the same as in the TD abundance output and thus are omitted.
PD.obs = the observed total branch length in the phylogenetic tree spanned by all observed species.
f1*,f2* = the number of singletons and doubletons in the node/branch abundance set.
g1,g2 = the total branch length of those singletons/doubletons in the node/branch abundance set.
Reftime = reference time for phylogenetic diversity (the age of the root of phylogenetic tree).

PHYLOGENETIC DIVERSITY (PD): Basic data information for incidence data

data(Fish_incidence_data)
data(Fish_phylo_tree)
data <- Fish_incidence_data
tree <- Fish_phylo_tree
DataInfo3D(data, diversity = 'PD', datatype = "incidence_raw", PDtree = tree)

# A tibble: 2 × 12
  Assemblage     T     U S.obs `SC(T)` `SC(2T)` PD.obs `Q1*` `Q2*`    R1    R2 Reftime
  <chr>      <int> <int> <int>   <dbl>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1 2013-2015     36   532    50   0.98     0.993   9.62    11     7 0.69  1.23    0.977
2 2016-2018     36   522    53   0.976    0.989   9.44    13     6 0.368 0.345   0.977

Output description:

Assemblage, T, U, S.obs, SC(T) and SC(2T): definitions are the same as in the TD incidence output and thus are omitted.
PD.obs = the observed total branch length in the phylogenetic tree spanned by all observed species.
Q1*,Q2* = the singletons/doubletons in the sample branch incidence.
R1,R2 = the total branch length of those singletons/doubletons in the sample branch incidence.
Reftime = reference time.

FUNCTIONAL DIVERSITY (FD): Basic data information for abundance data

data(Brazil_rainforest_abun_data)
data(Brazil_rainforest_distance_matrix)
data <- Brazil_rainforest_abun_data
distM <- Brazil_rainforest_distance_matrix
DataInfo3D(data, diversity = 'FD', datatype = "abundance", 
           FDdistM = distM, FDtype = 'AUC')

  Assemblage    n S.obs SC(n) SC(2n) dmin dmean  dmax
1       Edge 1794   319 0.939  0.974    0 0.372 0.776
2   Interior 2074   356 0.941  0.973    0 0.329 0.776

Output description:

Assemblage, n, S.obs, SC(n) and SC(2n): definitions are the same as in TD abundance output and thus are omitted.
dmin = the minimum distance among all non-diagonal elements in the distance matrix.
dmean = the mean distance between any two individuals randomly selected from each assemblage.
dmax = the maximum distance among all elements in the distance matrix.

FUNCTIONAL DIVERSITY (FD): Basic data information for incidence data

data(Fish_incidence_data)
data(Fish_distance_matrix)
data <- Fish_incidence_data
distM <- Fish_distance_matrix
DataInfo3D(data, diversity = 'FD', datatype = "incidence_raw", 
           FDdistM = distM, FDtype = 'AUC')

  Assemblage  T   U S.obs SC(T) SC(2T)  dmin dmean  dmax
1  2013-2015 36 532    50 0.980  0.993 0.006 0.240 0.733
2  2016-2018 36 522    53 0.976  0.989 0.006 0.237 0.733

Output description:

Assemblage, T, U, S.obs, SC(T) and SC(2T): definitions are the same as in the TD incidence output and thus are omitted.
dmin = the minimum distance among all non-diagonal elements in the distance matrix.
dmean = the mean distance between any two individuals randomly selected from each assemblage.
dmax = the maximum distance among all elements in the distance matrix.

FUNCTION estimate3D(): POINT ESTIMATION

estimate3D is used to compute 3D diversity (TD, PD, FD) estimates with q = 0, 1, 2 under any specified levels of sample size (when base = "size") and sample coverage values (when base = "coverage") for abundance data (datatype = "abundance") or incidence data (datatype = "incidence_raw"). When base = "size", level can be specified with a particular vector of sample sizes (greater than 0); if level = NULL, this function computes the diversity estimates for the minimum sample size among all samples extrapolated to the double reference sizes. When base = "coverage", level can be specified with a particular vector of sample coverage values (between 0 and 1); if level = NULL, this function computes the diversity estimates for the minimum sample coverage among all samples extrapolated to the double reference sizes. All arguments in the function are the same as those for the main function iNEXT3D.

estimate3D(data, diversity = "TD", q = c(0, 1, 2), datatype = "abundance", 
           base = "coverage", level = NULL, nboot = 50, conf = 0.95, 
           nT = NULL, PDtree, PDreftime = NULL, PDtype = "meanPD", 
           FDdistM, FDtype = "AUC", FDtau = NULL, FDcut_number = 50)

TAXONOMIC DIVERSITY (TD): point estimation

Example 7a: TD for abundance data with two target coverage values (93% and 97%)

The following commands return the TD estimates with two specified levels of sample coverage (93% and 97%) based on the Brazil_rainforest_abun_data.

data(Brazil_rainforest_abun_data)
output_est_TD_abun <- estimate3D(Brazil_rainforest_abun_data, diversity = 'TD', q = c(0,1,2), 
                                 datatype = "abundance", base = "coverage", level = c(0.93, 0.97))
output_est_TD_abun

   Assemblage Order.q   SC        m        Method     qTD   s.e. qTD.LCL qTD.UCL
1        Edge       0 0.93 1547.562   Rarefaction 302.879 13.192 277.023 328.735
2        Edge       0 0.97 3261.971 Extrapolation 383.307 19.554 344.983 421.632
3        Edge       1 0.93 1547.562   Rarefaction 152.374  5.021 142.533 162.215
4        Edge       1 0.97 3261.971 Extrapolation 166.837  5.724 155.618 178.056
5        Edge       2 0.93 1547.562   Rarefaction  81.437  3.942  73.711  89.163
6        Edge       2 0.97 3261.971 Extrapolation  83.726  4.196  75.502  91.950
7    Interior       0 0.93 1699.021   Rarefaction 331.917 13.002 306.435 357.400
8    Interior       0 0.97 3883.447 Extrapolation 433.807 22.707 389.303 478.311
9    Interior       1 0.93 1699.021   Rarefaction 159.330  5.573 148.408 170.253
10   Interior       1 0.97 3883.447 Extrapolation 175.739  6.077 163.829 187.650
11   Interior       2 0.93 1699.021   Rarefaction  71.611  4.210  63.359  79.862
12   Interior       2 0.97 3883.447 Extrapolation  73.326  4.386  64.730  81.921

Example 7b: TD for incidence data with two target coverage values (97.5% and 99%)

The following commands return the TD estimates with two specified levels of sample coverage (97.5% and 99%) for the Fish_incidence_data.

data(Fish_incidence_data)
output_est_TD_inci <- estimate3D(Fish_incidence_data, diversity = 'TD', q = c(0, 1, 2), 
                                 datatype = "incidence_raw", base = "coverage", 
                                 level = c(0.975, 0.99))
output_est_TD_inci

   Assemblage Order.q    SC     mT        Method    qTD   s.e. qTD.LCL qTD.UCL
1   2013-2015       0 0.975 29.169   Rarefaction 47.703  3.128  41.573  53.834
2   2013-2015       0 0.990 58.667 Extrapolation 54.914  8.422  38.407  71.421
3   2013-2015       1 0.975 29.169   Rarefaction 29.773  0.929  27.953  31.592
4   2013-2015       1 0.990 58.667 Extrapolation 30.751  1.019  28.753  32.748
5   2013-2015       2 0.975 29.169   Rarefaction 23.861  0.693  22.504  25.219
6   2013-2015       2 0.990 58.667 Extrapolation 24.126  0.721  22.713  25.540
7   2016-2018       0 0.975 34.825   Rarefaction 52.574  8.444  36.025  69.124
8   2016-2018       0 0.990 76.971 Extrapolation 62.688 17.236  28.906  96.470
9   2016-2018       1 0.975 34.825   Rarefaction 31.479  1.158  29.210  33.747
10  2016-2018       1 0.990 76.971 Extrapolation 32.721  1.205  30.360  35.083
11  2016-2018       2 0.975 34.825   Rarefaction 24.872  0.761  23.380  26.363
12  2016-2018       2 0.990 76.971 Extrapolation 25.163  0.778  23.638  26.688

PHYLOGENETIC DIVERSITY (PD): point estimation

Example 8a: PD for abundance data with two target sample sizes (1500 and 3500)

The following commands return the PD estimates with two specified levels of sample sizes (1500 and 3500) for the Brazil_rainforest_abun_data.

data(Brazil_rainforest_abun_data)
data(Brazil_rainforest_phylo_tree)
data <- Brazil_rainforest_abun_data
tree <- Brazil_rainforest_phylo_tree
output_est_PD_abun <- estimate3D(data, diversity = 'PD', datatype = "abundance", 
                                 base = "size", level = c(1500, 3500), PDtree = tree)
output_est_PD_abun

   Assemblage Order.q    m        Method    SC    qPD  s.e. qPD.LCL qPD.UCL Reftime   Type
1        Edge       0 1500   Rarefaction 0.928 58.370 1.284  55.854  60.886     400 meanPD
2        Edge       0 3500 Extrapolation 0.973 71.893 2.269  67.446  76.339     400 meanPD
3        Edge       1 1500   Rarefaction 0.928  5.224 0.111   5.006   5.441     400 meanPD
4        Edge       1 3500 Extrapolation 0.973  5.320 0.113   5.099   5.541     400 meanPD
5        Edge       2 1500   Rarefaction 0.928  1.797 0.028   1.742   1.851     400 meanPD
6        Edge       2 3500 Extrapolation 0.973  1.797 0.028   1.742   1.852     400 meanPD
7    Interior       0 1500   Rarefaction 0.922 63.555 1.221  61.163  65.948     400 meanPD
8    Interior       0 3500 Extrapolation 0.965 78.004 2.256  73.582  82.426     400 meanPD
9    Interior       1 1500   Rarefaction 0.922  5.675 0.114   5.452   5.898     400 meanPD
10   Interior       1 3500 Extrapolation 0.965  5.784 0.116   5.557   6.012     400 meanPD
11   Interior       2 1500   Rarefaction 0.922  1.913 0.034   1.847   1.980     400 meanPD
12   Interior       2 3500 Extrapolation 0.965  1.914 0.034   1.847   1.981     400 meanPD

Example 8b: PD for incidence data with two target coverage values (97.5% and 99%)

The following commands return the PD estimates with two specified levels of sample coverage (97.5% and 99%) for the Fish_incidence_data.

data(Fish_incidence_data)
data(Fish_phylo_tree)
data <- Fish_incidence_data
tree <- Fish_phylo_tree
output_est_PD_inci <- estimate3D(data, diversity = 'PD', datatype = "incidence_raw", 
                                 base = "coverage", level = c(0.975, 0.99), PDtree = tree)
output_est_PD_inci

   Assemblage Order.q    SC     mT        Method    qPD  s.e. qPD.LCL qPD.UCL   Reftime   Type
1   2013-2015       0 0.975 29.169   Rarefaction  9.672 0.355   8.976  10.369 0.9770115 meanPD
2   2013-2015       0 0.990 58.667 Extrapolation 10.018 0.513   9.013  11.023 0.9770115 meanPD
3   2013-2015       1 0.975 29.169   Rarefaction  7.612 0.134   7.349   7.876 0.9770115 meanPD
4   2013-2015       1 0.990 58.667 Extrapolation  7.680 0.131   7.423   7.937 0.9770115 meanPD
5   2013-2015       2 0.975 29.169   Rarefaction  7.003 0.135   6.739   7.266 0.9770115 meanPD
6   2013-2015       2 0.990 58.667 Extrapolation  7.030 0.133   6.770   7.291 0.9770115 meanPD
7   2016-2018       0 0.975 34.825   Rarefaction  9.646 0.492   8.682  10.611 0.9770115 meanPD
8   2016-2018       0 0.990 76.971 Extrapolation  9.831 0.724   8.412  11.250 0.9770115 meanPD
9   2016-2018       1 0.975 34.825   Rarefaction  7.779 0.126   7.531   8.026 0.9770115 meanPD
10  2016-2018       1 0.990 76.971 Extrapolation  7.835 0.128   7.585   8.085 0.9770115 meanPD
11  2016-2018       2 0.975 34.825   Rarefaction  7.201 0.123   6.959   7.442 0.9770115 meanPD
12  2016-2018       2 0.990 76.971 Extrapolation  7.224 0.122   6.984   7.464 0.9770115 meanPD

FUNCTIONAL DIVERSITY (FD): point estimation

Example 9a: FD for abundance data with two target coverage values (93% and 97%)

The following commands return the FD estimates with two specified levels of sample coverage (93% and 97%) for the Brazil_rainforest_abun_data.

data(Brazil_rainforest_abun_data)
data(Brazil_rainforest_distance_matrix)
data <- Brazil_rainforest_abun_data
distM <- Brazil_rainforest_distance_matrix
output_est_FD_abun <- estimate3D(data, diversity = 'FD', datatype = "abundance", 
                                 base = "coverage", level = c(0.93, 0.97), nboot = 10, 
                                 FDdistM = distM, FDtype = 'AUC')
output_est_FD_abun

   Assemblage Order.q   SC        m        Method    qFD  s.e. qFD.LCL qFD.UCL
1        Edge       0 0.93 1547.562   Rarefaction 17.590 2.119  13.437  21.743
2        Edge       0 0.97 3261.971 Extrapolation 18.578 2.917  12.861  24.295
3        Edge       1 0.93 1547.562   Rarefaction 11.732 0.334  11.078  12.386
4        Edge       1 0.97 3261.971 Extrapolation 11.932 0.355  11.236  12.628
5        Edge       2 0.93 1547.562   Rarefaction  9.120 0.287   8.558   9.683
6        Edge       2 0.97 3261.971 Extrapolation  9.190 0.295   8.611   9.769
7    Interior       0 0.93 1699.021   Rarefaction 16.890 2.258  12.464  21.316
8    Interior       0 0.97 3883.447 Extrapolation 17.839 3.527  10.925  24.752
9    Interior       1 0.93 1699.021   Rarefaction  9.668 0.217   9.242  10.094
10   Interior       1 0.97 3883.447 Extrapolation  9.841 0.227   9.397  10.286
11   Interior       2 0.93 1699.021   Rarefaction  6.994 0.139   6.722   7.267
12   Interior       2 0.97 3883.447 Extrapolation  7.035 0.141   6.758   7.311

Example 9b: FD for incidence data with two target number of sampling units (30 and 70)

The following commands return the FD estimates with two specified levels of sample sizes (30 and 70) for the Fish_incidence_data.

data(Fish_incidence_data)
data(Fish_distance_matrix)
data <- Fish_incidence_data
distM <- Fish_distance_matrix
output_est_FD_inci <- estimate3D(data, diversity = 'FD', datatype = "incidence_raw", 
                                 base = "size", level = c(30, 70), nboot = 10, 
                                 FDdistM = distM, FDtype = 'AUC')
output_est_FD_inci

   Assemblage Order.q mT        Method    SC    qFD  s.e. qFD.LCL qFD.UCL
1   2013-2015       0 30   Rarefaction 0.976 17.748 0.339  17.084  18.411
2   2013-2015       0 70 Extrapolation 0.993 18.558 0.636  17.312  19.804
3   2013-2015       1 30   Rarefaction 0.976 15.929 0.344  15.256  16.603
4   2013-2015       1 70 Extrapolation 0.993 16.006 0.330  15.359  16.654
5   2013-2015       2 30   Rarefaction 0.976 15.459 0.361  14.752  16.167
6   2013-2015       2 70 Extrapolation 0.993 15.477 0.361  14.770  16.185
7   2016-2018       0 30   Rarefaction 0.972 17.503 0.526  16.473  18.533
8   2016-2018       0 70 Extrapolation 0.988 18.705 1.092  16.565  20.845
9   2016-2018       1 30   Rarefaction 0.972 15.729 0.365  15.014  16.444
10  2016-2018       1 70 Extrapolation 0.988 15.817 0.361  15.110  16.524
11  2016-2018       2 30   Rarefaction 0.972 15.268 0.348  14.586  15.950
12  2016-2018       2 70 Extrapolation 0.988 15.290 0.348  14.608  15.971

FUNCTION ObsAsy3D: ASYMPTOTIC AND OBSERVED DIVERSITY PROFILES

ObsAsy3D(data, diversity = "TD", q = seq(0, 2, 0.2), datatype = "abundance",
         nboot = 50, conf = 0.95, nT = NULL, 
         method = c("Asymptotic", "Observed"),
         PDtree, PDreftime = NULL, PDtype = "meanPD",
         FDdistM, FDtype = "AUC", FDtau = NULL, FDcut_number = 50
         )

All arguments in the above function are the same as those for the main function iNEXT3D (except that the default of q here is seq(0, 2, 0.2)). The function ObsAsy3D() computes observed and asymptotic diversity of order q between 0 and 2 (in increments of 0.2) for 3D diversity; these 3D values with different order q can be used to depict a q-profile in the ggObsAsy3D function.

It also computes observed and asymptotic PD for various reference times by specifying the argument PDreftime; these PD values with different reference times can be used to depict a time-profile in the ggObsAsy3D function.

It also computes observed and asymptotic FD for various threshold tau levels by specifying the argument FDtau; these FD values with different threshold levels can be used to depict a tau-profile in the ggObsAsy3D function.

For each dimension, by default, both the observed and asymptotic diversity estimates will be computed.

FUNCTION ggObsAsy3D(): GRAPHIC DISPLAYS OF DIVERSITY PROFILES

ggObsAsy3D(output, profile = "q")

ggObsAsy3D is a ggplot2 extension for an ObsAsy3D object to plot 3D q-profile (which depicts the observed diversity and asymptotic diversity estimate with respect to order q) for q between 0 and 2 (in increments of 0.2).

It also plots time-profile (which depicts the observed and asymptotic estimate of PD or mean PD with respect to reference times when diversity = "PD" specified in the ObsAsy3D function), and tau-profile (which depicts the observed and asymptotic estimate of FD with respect to threshold level tau when diversity = "FD" and FDtype = "tau_values" specified in the ObsAsy3D function) based on the output from the function ObsAsy3D.

In the plot of profiles, only confidence intervals of the asymptotic diversity will be shown when both the observed and asymptotic diversity estimates are computed.

TAXONOMIC DIVERSITY (TD): q-profiles

Example 10a: TD q-profiles for abundance data

The following commands returns the observed and asymptotic taxonomic diversity (‘TD’) for the Brazil_rainforest_abun_data, along with its confidence interval for diversity order q between 0 to 2. Here only the first ten rows of the output are shown.

data(Brazil_rainforest_abun_data)
output_ObsAsy_TD_abun <- ObsAsy3D(Brazil_rainforest_abun_data, diversity = 'TD', 
                                  datatype = "abundance")
output_ObsAsy_TD_abun

   Assemblage Order.q     qTD   s.e. qTD.LCL qTD.UCL     Method
1        Edge     0.0 444.971 27.945 390.200 499.743 Asymptotic
2        Edge     0.2 375.270 19.335 337.374 413.167 Asymptotic
3        Edge     0.4 312.452 12.736 287.490 337.414 Asymptotic
4        Edge     0.6 258.379  8.372 241.970 274.789 Asymptotic
5        Edge     0.8 213.730  5.999 201.971 225.488 Asymptotic
6        Edge     1.0 178.000  4.940 168.318 187.682 Asymptotic
7        Edge     1.2 149.914  4.508 141.079 158.750 Asymptotic
8        Edge     1.4 127.945  4.345 119.429 136.460 Asymptotic
9        Edge     1.6 110.672  4.320 102.206 119.139 Asymptotic
10       Edge     1.8  96.948  4.375  88.374 105.522 Asymptotic

The following commands plot the corresponding q-profiles, along with its confidence interval for q between 0 to 2.

# q-profile curves
ggObsAsy3D(output_ObsAsy_TD_abun)

Example 10b: TD q-profiles for incidence data

The following commands return the observed and asymptotic taxonomic diversity (‘TD’) estimates for the Fish_incidence_data, along with its confidence interval for diversity order q between 0 to 2. Here only the first ten rows of the output are shown.

data(Fish_incidence_data)
output_ObsAsy_TD_inci <- ObsAsy3D(Fish_incidence_data, diversity = 'TD', 
                                  datatype = "incidence_raw")
output_ObsAsy_TD_inci

   Assemblage Order.q    qTD  s.e. qTD.LCL qTD.UCL     Method
1   2013-2015     0.0 59.803 9.032  42.101  77.506 Asymptotic
2   2013-2015     0.2 50.828 5.275  40.489  61.167 Asymptotic
3   2013-2015     0.4 43.790 3.055  37.802  49.778 Asymptotic
4   2013-2015     0.6 38.458 1.903  34.728  42.188 Asymptotic
5   2013-2015     0.8 34.490 1.350  31.844  37.136 Asymptotic
6   2013-2015     1.0 31.542 1.079  29.428  33.656 Asymptotic
7   2013-2015     1.2 29.328 0.932  27.501  31.154 Asymptotic
8   2013-2015     1.4 27.635 0.844  25.980  29.289 Asymptotic
9   2013-2015     1.6 26.312 0.787  24.769  27.855 Asymptotic
10  2013-2015     1.8 25.255 0.748  23.789  26.722 Asymptotic

The following commands plot the corresponding q-profiles, along with its confidence interval for q between 0 to 2.

# q-profile curves
ggObsAsy3D(output_ObsAsy_TD_inci)

PHYLOGENETIC DIVERSITY (PD): time-profiles and q-profiles

Example 11a: PD time-profiles for abundance data

The following commands return the observed and asymptotic phylogenetic diversity (‘PD’) estimates for the Brazil_rainforest_abun_data, along with its confidence interval for diversity order q = 0, 1, 2 under reference times from 0.01 to 400 (tree height). Here only the first ten rows of the output are shown.

data(Brazil_rainforest_abun_data)
data(Brazil_rainforest_phylo_tree)
data <- Brazil_rainforest_abun_data
tree <- Brazil_rainforest_phylo_tree
output_ObsAsy_PD_abun <- ObsAsy3D(data, diversity = 'PD', q = c(0, 1, 2), 
                                  PDreftime = seq(0.01, 400, length.out = 20),
                                  datatype = "abundance", nboot = 20, PDtree = tree)
output_ObsAsy_PD_abun

   Assemblage Order.q     qPD   s.e. qPD.LCL qPD.UCL     Method Reftime   Type
1        Edge       0 444.971 28.711 388.699 501.244 Asymptotic   0.100 meanPD
2        Edge       1 178.000  5.395 167.426 188.573 Asymptotic   0.100 meanPD
3        Edge       2  85.905  4.390  77.301  94.510 Asymptotic   0.100 meanPD
4    Interior       0 513.518 36.385 442.205 584.831 Asymptotic   0.100 meanPD
5    Interior       1 186.983  4.265 178.624 195.342 Asymptotic   0.100 meanPD
6    Interior       2  74.718  3.501  67.856  81.579 Asymptotic   0.100 meanPD
7        Edge       0 371.100 28.093 316.039 426.161 Asymptotic  10.354 meanPD
8        Edge       1 141.418  4.460 132.677 150.160 Asymptotic  10.354 meanPD
9        Edge       2  72.848  3.408  66.169  79.528 Asymptotic  10.354 meanPD
10   Interior       0 413.568 25.035 364.500 462.635 Asymptotic  10.354 meanPD

The argument profile = "time" in the ggObsAsy3D function creates a separate plot for each diversity order q = 0, 1, and 2 with x-axis being “Reference time”. Different assemblages will be represented by different color lines.

# time-profile curves
ggObsAsy3D(output_ObsAsy_PD_abun, profile = "time")

Example 11b: PD q-profiles for incidence data

The following commands return the observed and asymptotic taxonomic diversity (‘PD’) estimates for the Fish_incidence_data, along with its confidence interval for diversity order q between 0 to 2. Here only the first ten rows of the output are shown.

data(Fish_incidence_data)
data(Fish_phylo_tree)
data <- Fish_incidence_data
tree <- Fish_phylo_tree
output_ObsAsy_PD_inci <- ObsAsy3D(data, diversity = 'PD', q = seq(0, 2, 0.2), 
                                  datatype = "incidence_raw", nboot = 20, PDtree = tree, 
                                  PDreftime = NULL)
output_ObsAsy_PD_inci

   Assemblage Order.q    qPD  s.e. qPD.LCL qPD.UCL     Method Reftime   Type
1   2013-2015     0.0 10.039 0.834   8.406  11.673 Asymptotic   0.977 meanPD
2   2013-2015     0.2  9.462 0.423   8.633  10.292 Asymptotic   0.977 meanPD
3   2013-2015     0.4  8.802 0.263   8.286   9.318 Asymptotic   0.977 meanPD
4   2013-2015     0.6  8.329 0.188   7.961   8.697 Asymptotic   0.977 meanPD
5   2013-2015     0.8  7.985 0.154   7.683   8.287 Asymptotic   0.977 meanPD
6   2013-2015     1.0  7.729 0.139   7.456   8.002 Asymptotic   0.977 meanPD
7   2013-2015     1.2  7.533 0.134   7.270   7.795 Asymptotic   0.977 meanPD
8   2013-2015     1.4  7.378 0.133   7.116   7.639 Asymptotic   0.977 meanPD
9   2013-2015     1.6  7.252 0.136   6.986   7.517 Asymptotic   0.977 meanPD
10  2013-2015     1.8  7.147 0.139   6.874   7.419 Asymptotic   0.977 meanPD

The following commands plot the corresponding q-profiles, along with its confidence interval for q between 0 to 2, for the default reference time = 0.977 (the tree depth).

# q-profile curves
ggObsAsy3D(output_ObsAsy_PD_inci, profile = "q")

FUNCTIONAL DIVERSITY (FD): tau-profiles and q-profiles

Example 12a: FD tau-profiles for abundance data

The following commands returns observed and asymptotic functional diversity (‘FD’) for Brazil_rainforest_abun_data, along with its confidence interval at diversity order q = 0, 1, 2 under tau values from 0 to 1. Here only the first ten rows of the output are shown.

data(Brazil_rainforest_abun_data)
data(Brazil_rainforest_distance_matrix)
data <- Brazil_rainforest_abun_data
distM <- Brazil_rainforest_distance_matrix
output_ObsAsy_FD_abun_tau <- ObsAsy3D(data, diversity = 'FD', q = c(0, 1, 2), 
                                      datatype = "abundance", nboot = 10, FDdistM = distM, 
                                      FDtype = 'tau_values', FDtau = seq(0, 1, 0.05))
output_ObsAsy_FD_abun_tau

   Assemblage Order.q     qFD   s.e. qFD.LCL qFD.UCL     Method  Tau
1        Edge       0 444.971 34.314 377.717 512.226 Asymptotic 0.00
2        Edge       1 178.000  4.857 168.479 187.520 Asymptotic 0.00
3        Edge       2  85.905  4.235  77.604  94.206 Asymptotic 0.00
4        Edge       0  79.904 26.295  28.368 131.440 Asymptotic 0.05
5        Edge       1  45.187  1.448  42.350  48.024 Asymptotic 0.05
6        Edge       2  32.092  1.128  29.882  34.302 Asymptotic 0.05
7        Edge       0  73.276 25.542  23.215 123.336 Asymptotic 0.10
8        Edge       1  42.200  1.439  39.380  45.019 Asymptotic 0.10
9        Edge       2  30.182  1.111  28.004  32.360 Asymptotic 0.10
10       Edge       0  35.509 25.624   0.000  85.731 Asymptotic 0.15

The following commands plot the corresponding tau-profiles, along with its confidence interval for diversity order q = 0, 1, 2.

# tau-profile curves
ggObsAsy3D(output_ObsAsy_FD_abun_tau, profile = "tau")

Example 12b: FD q-profiles for abundance data

The following commands returns the observed and asymptotic taxonomic diversity (‘FD’) for the Brazil_rainforest_abun_data, along with its confidence interval for diversity order q between 0 to 2 with FDtype = 'AUC'. Here only the first ten rows of the output are shown.

data(Brazil_rainforest_abun_data)
data(Brazil_rainforest_distance_matrix)
data <- Brazil_rainforest_abun_data
distM <- Brazil_rainforest_distance_matrix
output_ObsAsy_FD_abun <- ObsAsy3D(data, diversity = 'FD', q = seq(0, 2, 0.5), 
                                  datatype = "abundance", nboot = 10, 
                                  FDdistM = distM, FDtype = 'AUC')
output_ObsAsy_FD_abun

   Assemblage Order.q    qFD  s.e. qFD.LCL qFD.UCL     Method
1        Edge     0.0 19.008 6.320   6.621  31.395 Asymptotic
2        Edge     0.5 14.714 1.215  12.333  17.094 Asymptotic
3        Edge     1.0 12.058 0.337  11.397  12.719 Asymptotic
4        Edge     1.5 10.369 0.295   9.791  10.947 Asymptotic
5        Edge     2.0  9.254 0.301   8.665   9.843 Asymptotic
6    Interior     0.0 18.215 5.591   7.257  29.174 Asymptotic
7    Interior     0.5 13.084 1.060  11.006  15.163 Asymptotic
8    Interior     1.0  9.935 0.304   9.339  10.532 Asymptotic
9    Interior     1.5  8.115 0.187   7.749   8.481 Asymptotic
10   Interior     2.0  7.067 0.150   6.772   7.361 Asymptotic

The following commands plot the corresponding q-profiles, along with its confidence interval for q between 0 to 2.

# q-profile curves
ggObsAsy3D(output_ObsAsy_FD_abun, profile = "q")

Example 12c: FD q-profiles for incidence data

The following commands returns observed and asymptotic functional diversity (‘FD’) for Fish_incidence_data, along with its confidence interval at diversity order q from 0 to 2. Here only the first ten rows of the output are shown.

data(Fish_incidence_data)
data(Fish_distance_matrix)
data <- Fish_incidence_data
distM <- Fish_distance_matrix
output_ObsAsy_FD_inci <- ObsAsy3D(data, diversity = 'FD', datatype = "incidence_raw",
                                  nboot = 20, FDdistM = distM, FDtype = 'AUC')
output_ObsAsy_FD_inci

   Assemblage Order.q    qFD  s.e. qFD.LCL qFD.UCL     Method
1   2013-2015     0.0 18.916 1.292  16.384  21.449 Asymptotic
2   2013-2015     0.2 17.828 0.660  16.535  19.120 Asymptotic
3   2013-2015     0.4 17.117 0.439  16.256  17.978 Asymptotic
4   2013-2015     0.6 16.626 0.380  15.880  17.372 Asymptotic
5   2013-2015     0.8 16.285 0.367  15.566  17.005 Asymptotic
6   2013-2015     1.0 16.044 0.363  15.333  16.755 Asymptotic
7   2013-2015     1.2 15.869 0.360  15.163  16.574 Asymptotic
8   2013-2015     1.4 15.738 0.358  15.036  16.439 Asymptotic
9   2013-2015     1.6 15.636 0.356  14.938  16.335 Asymptotic
10  2013-2015     1.8 15.556 0.355  14.860  16.253 Asymptotic

The following commands plot the corresponding q-profiles, along with its confidence interval for q between 0 to 2.

# q-profile curves
ggObsAsy3D(output_ObsAsy_FD_inci, profile = "q")

License

The iNEXT.3D package is licensed under the GPLv3. To help refine iNEXT.3D, your comments or feedback would be welcome (please send them to Anne Chao or report an issue on the iNEXT.3D github iNEXT.3D_github.

References

Chao, A., Henderson, P. A., Chiu, C.-H., Moyes, F., Hu, K.-H., Dornelas, M. and Magurran, A. E. (2021). Measuring temporal change in alpha diversity: a framework integrating taxonomic, phylogenetic and functional diversity and the iNEXT.3D standardization. Methods in Ecology and Evolution, 12, 1926-1940.
Hsieh, T. C., Ma, K-H, and Chao, A. (2016). iNEXT: An R package for rarefaction and extrapolation of species diversity (Hill numbers). Methods in Ecology and Evolution, 7, 1451-1456.