| Title: | Calculation and Interpretation of Data Cube Indicator Uncertainty |
|---|---|
| Description: | This R package provides functions to explore data cubes using simple measures and cross-validation techniques. It can also be used for uncertainty calculation using the bootstrap resampling method, and functionality is provided for efficient interpretation and visualisation of uncertainty related to indicators based on occurrence cubes. |
| Authors: | Ward Langeraert [aut, cre] (ORCID: <https://orcid.org/0000-0002-5900-8109>, affiliation: Research Institute for Nature and Forest (INBO)), Toon Van Daele [aut] (ORCID: <https://orcid.org/0000-0002-1362-853X>, affiliation: Research Institute for Nature and Forest (INBO)), Research Institute for Nature and Forest (INBO) [cph, pbl] (ROR: <https://ror.org/00j54wy13>), European Union (ID 101059592) [fnd] (grant_id: 101059592) |
| Maintainer: | Ward Langeraert <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.12.3 |
| Built: | 2026-05-29 09:18:27 UTC |
| Source: | https://github.com/b-cubed-eu/dubicube |
This function adds classified effects to a dataframe as ordered factor variables by comparing the confidence intervals with a reference and thresholds.
```r
add_effect_classification(df, cl_columns, threshold, reference = 0, coarse = TRUE)
```
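| Argument | Description |
|---|---|
| `df` | A dataframe containing summary data of confidence limits. Two columns are required containing lower and upper limits indicated by the `cl_columns` argument. |
| `cl_columns` | A vector of 2 column names in `df` indicating the lower and upper confidence limits. |
| `threshold` | A vector of either 1 or 2 thresholds. A single threshold will be transformed into `reference + c(-abs(threshold), abs(threshold))`. |
| `reference` | The null hypothesis value to compare confidence intervals against. Defaults to 0. |
| `coarse` | Logical, defaults to `TRUE`. If `TRUE`, the coarse classification columns are added as well. |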
This function is a wrapper around effectclass::classify() and
effectclass::coarse_classification() from the effectclass package
(Onkelinx, 2023). They classify effects in a stable and transparent manner.
| Symbol | Fine effect / trend | Coarse effect / trend | Rule |
|---|---|---|---|
| `++` | strong positive effect / strong increase | positive effect / increase | confidence interval above the upper threshold |
| `+` | positive effect / increase | positive effect / increase | confidence interval above reference and contains the upper threshold |
| `+~` | moderate positive effect / moderate increase | positive effect / increase | confidence interval between reference and the upper threshold |
| `~` | no effect / stable | no effect / stable | confidence interval between thresholds and contains reference |
| `-~` | moderate negative effect / moderate decrease | negative effect / decrease | confidence interval between reference and the lower threshold |
| `-` | negative effect / decrease | negative effect / decrease | confidence interval below reference and contains the lower threshold |
| `--` | strong negative effect / strong decrease | negative effect / decrease | confidence interval below the lower threshold |
| `?+` | potential positive effect / potential increase | unknown effect / unknown | confidence interval contains reference and the upper threshold |
| `?-` | potential negative effect / potential decrease | unknown effect / unknown | confidence interval contains reference and the lower threshold |
| `?` | unknown effect / unknown | unknown effect / unknown | confidence interval contains the lower and upper threshold |
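To make the classification rules concrete, here is a minimal base R sketch that assigns one of the fine symbols to a single interval. `classify_ci()` is a hypothetical helper, not part of dubicube or effectclass. It assumes a single threshold expanded symmetrically into `reference - threshold` and `reference + threshold`, and it uses strict inequalities, so intervals whose limits fall exactly on a threshold may be handled differently from effectclass::classify().

```r
# Hypothetical helper (sketch): classify one confidence interval [lcl, ucl]
# into the fine effect symbols, following the rules in the table.
classify_ci <- function(lcl, ucl, threshold, reference = 0) {
  lower <- reference - abs(threshold)  # assumed symmetric lower threshold
  upper <- reference + abs(threshold)  # assumed symmetric upper threshold
  if (lcl > upper) return("++")        # entirely above the upper threshold
  if (ucl < lower) return("--")        # entirely below the lower threshold
  if (lcl > reference) {
    # Above the reference: does the interval still reach the upper threshold?
    return(if (ucl > upper) "+" else "+~")
  }
  if (ucl < reference) {
    # Below the reference: does the interval still reach the lower threshold?
    return(if (lcl < lower) "-" else "-~")
  }
  # From here on the interval contains the reference value
  if (lcl < lower && ucl > upper) return("?")
  if (ucl > upper) return("?+")
  if (lcl < lower) return("?-")
  "~"
}

# Classify several intervals at once
mapply(classify_ci, c(-0.5, 1.2), c(0.5, 2), MoreArgs = list(threshold = 1))
```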
The returned value is a modified version of the input dataframe df with
additional columns effect_code and effect, containing the effect symbols
and descriptions, respectively, as ordered factor variables.
If coarse = TRUE (the default), the columns effect_code_coarse and
effect_coarse with the coarse classification are added as well.
Onkelinx, T. (2023). effectclass: Classification and visualisation of effects [Computer software]. https://inbo.github.io/effectclass/
```r
library(dubicube)

# Example dataset
ds <- data.frame(
  mean = c(0, 0.5, -0.5, 1, -1, 1.5, -1.5, 0.5, -0.5, 0),
  sd = c(1, 0.5, 0.5, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25, 0.5)
)
ds$lcl <- qnorm(0.05, ds$mean, ds$sd)
ds$ucl <- qnorm(0.95, ds$mean, ds$sd)

add_effect_classification(
  df = ds,
  cl_columns = c("lcl", "ucl"),
  threshold = 1,
  reference = 0,
  coarse = TRUE
)
```
This function calculates a basic confidence interval from a bootstrap sample.
It is used by calculate_bootstrap_ci().
```r
basic_ci(t0, t, conf = 0.95, h = function(t) t, hinv = function(t) t)
```
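| Argument | Description |
|---|---|
| `t0` | Original statistic. |
| `t` | Numeric vector of bootstrap replicates. |
| `conf` | A numeric value specifying the confidence level of the interval. Default is `0.95` (95% confidence level). |
| `h` | A function defining a transformation. The intervals are calculated on the scale of `h(t)` and the inverse function `hinv` is applied to the resulting intervals. Default is the identity function. |
| `hinv` | A function, like `h`, which returns the inverse of `h`. It is used to transform the intervals calculated on the scale of `h(t)` back to the original scale. Default is the identity function. |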
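$$CI_{basic} = \left[2\hat{\theta} - \hat{\theta}^*_{(1-\alpha/2)},\ 2\hat{\theta} - \hat{\theta}^*_{(\alpha/2)}\right]$$
where $\hat{\theta}$ is the original statistic t0, and $\hat{\theta}^*_{(\alpha/2)}$ and
$\hat{\theta}^*_{(1-\alpha/2)}$ are the $\alpha/2$ and $1-\alpha/2$
percentiles of the bootstrap distribution (the replicates in t), respectively, with $1 - \alpha$ the confidence level conf.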
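A minimal base R sketch of the basic interval $[2\hat{\theta} - \hat{\theta}^*_{(1-\alpha/2)},\ 2\hat{\theta} - \hat{\theta}^*_{(\alpha/2)}]$, assuming the percentiles are taken with `quantile()` (its default type 7). `basic_ci_sketch()` is a hypothetical name; basic_ci() itself interpolates on the ranks of the ordered replicates, as in boot, so its limits can differ slightly.

```r
# Hypothetical sketch of the basic bootstrap interval.
basic_ci_sketch <- function(t0, t, conf = 0.95) {
  alpha <- 1 - conf
  # Upper and lower percentiles of the bootstrap replicates (quantile type 7)
  q <- quantile(t, probs = c(1 - alpha / 2, alpha / 2), names = FALSE)
  # Reflect the percentiles around the original estimate
  c(ll = 2 * t0 - q[1], ul = 2 * t0 - q[2])
}

set.seed(123)
boot_reps <- rnorm(1000)
basic_ci_sketch(mean(boot_reps), boot_reps, conf = 0.95)
```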
A matrix with five columns:
conf: confidence level
rk_lower: rank of lower endpoint (interpolated)
rk_upper: rank of upper endpoint (interpolated)
ll: lower confidence limit
ul: upper confidence limit
This function is adapted from the internal function basic.ci()
in the boot package (Canty & Ripley, 1999).
Canty, A., & Ripley, B. (1999). boot: Bootstrap Functions (Originally by Angelo Canty for S) [Computer software]. https://CRAN.R-project.org/package=boot
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843
Other interval_calculation:
bca_ci(),
norm_ci(),
perc_ci()
```r
library(dubicube)

set.seed(123)
boot_reps <- rnorm(1000)
t0 <- mean(boot_reps)

# Basic bootstrap CI
basic_ci(t0, boot_reps, conf = 0.95)
```
Returns basic diagnostic rules used by diagnose_cube().
Each rule defines how a specific data quality metric is computed and
evaluated.
```r
basic_cube_rules()
```
Rules are implemented as lists containing:
id – name of the diagnostic metric
dimension – cube dimension being evaluated (e.g. temporal)
thresholds – reference values used to determine severity
compute() – function that calculates the metric
severity() – function assigning a severity level
message() – function generating a human-readable message
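The structure of a rule can be sketched as a plain R list with the fields listed above. Everything in this sketch is hypothetical: the rule id, the threshold values, and the argument signatures of `severity()` and `message()` are assumptions for illustration, not dubicube's actual rule definitions or defaults.

```r
# Hypothetical rule following the documented fields (illustrative only).
rule_min_years_sketch <- list(
  id = "n_years",
  dimension = "temporal",
  thresholds = c(error = 5, warning = 10),
  # Metric: number of distinct years in the cube
  compute = function(df) length(unique(df$year)),
  # Severity: compare the metric with the reference values
  severity = function(value, thresholds) {
    if (value < thresholds[["error"]]) {
      "error"
    } else if (value < thresholds[["warning"]]) {
      "warning"
    } else {
      "ok"
    }
  },
  message = function(value) sprintf("The cube covers %d distinct years.", value)
)

cube <- data.frame(year = c(2001, 2001, 2003, 2004, 2005, 2006), obs = 1:6)
value <- rule_min_years_sketch$compute(cube)
sev <- rule_min_years_sketch$severity(value, rule_min_years_sketch$thresholds)
msg <- rule_min_years_sketch$message(value)
```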
Contains the following rules:
rule_temporal_min_years(): Number of years
rule_temporal_missing_years(): Missing years
rule_spatial_min_cells(): Number of grid cells
rule_spatial_max_uncertainty(): Number of records where coordinate
uncertainty is larger than grid resolution
rule_spatial_miss_uncertainty(): Number of records with missing coordinate
uncertainty
rule_taxon_min_taxa(): Number of taxa
rule_obs_min_records(): Number of records (rows)
rule_obs_min_total(): Total number of observations (sum)
Default thresholds are used.
A list of diagnostic rule definitions.
This function calculates a Bias-Corrected and Accelerated (BCa) confidence
interval from a bootstrap sample. It is used by calculate_bootstrap_ci().
```r
bca_ci(t0, t, a, conf = 0.95, h = function(t) t, hinv = function(t) t)
```
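| Argument | Description |
|---|---|
| `t0` | Original statistic. |
| `t` | Numeric vector of bootstrap replicates. |
| `a` | Acceleration constant. See also `calculate_acceleration()`. |
| `conf` | A numeric value specifying the confidence level of the interval. Default is `0.95` (95% confidence level). |
| `h` | A function defining a transformation. The intervals are calculated on the scale of `h(t)` and the inverse function `hinv` is applied to the resulting intervals. Default is the identity function. |
| `hinv` | A function, like `h`, which returns the inverse of `h`. It is used to transform the intervals calculated on the scale of `h(t)` back to the original scale. Default is the identity function. |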
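Adjusts for bias and acceleration. Bias refers to the systematic difference between the observed statistic from the original dataset and the center of the bootstrap distribution of the statistic. The bias correction term is calculated as follows:
$$\hat{z}_0 = \Phi^{-1}\left(\frac{\#(\hat{\theta}^*_b < \hat{\theta})}{B}\right)$$
where $\#$ is the counting operator, counting the number of times
$\hat{\theta}^*_b$ is smaller than $\hat{\theta}$, and $\Phi^{-1}$ is
the inverse cumulative distribution function of the standard
normal distribution. $B$ is the number of bootstrap samples. Here, $\hat{\theta}$ is the original statistic t0 and the $\hat{\theta}^*_b$ are the replicates in t.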
Acceleration quantifies how sensitive the variability of the statistic is
to changes in the data.
See calculate_acceleration() for how this is calculated.
$\hat{a} = 0$: The statistic's variability does not depend on the data
(e.g., symmetric distribution)
$\hat{a} > 0$: Small changes in the data have a large effect on the
statistic's variability (e.g., positive skew)
$\hat{a} < 0$: Small changes in the data have a smaller effect on the
statistic's variability (e.g., negative skew).
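The bias and acceleration estimates are then used to calculate adjusted percentiles:
$$\alpha_1 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{\alpha/2}}{1 - \hat{a}\left(\hat{z}_0 + z_{\alpha/2}\right)}\right), \qquad \alpha_2 = \Phi\left(\hat{z}_0 + \frac{\hat{z}_0 + z_{1-\alpha/2}}{1 - \hat{a}\left(\hat{z}_0 + z_{1-\alpha/2}\right)}\right)$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution, $z_{\alpha/2}$ and $z_{1-\alpha/2}$ are its $\alpha/2$ and $1-\alpha/2$ quantiles, and $1 - \alpha$ is the confidence level conf. So, we get
$$CI_{bca} = \left[\hat{\theta}^*_{(\alpha_1)},\ \hat{\theta}^*_{(\alpha_2)}\right]$$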
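A minimal base R sketch of the BCa steps: the bias correction $\hat{z}_0$ from the share of replicates below $\hat{\theta}$, the adjusted percentiles $\alpha_1$ and $\alpha_2$, and the corresponding percentiles of the replicates. `bca_ci_sketch()` is a hypothetical name; it assumes `quantile()` (type 7) for the adjusted percentiles and an acceleration `a` supplied by the caller, and it does not guard against all replicates falling on one side of `t0` (then $\hat{z}_0$ is infinite).

```r
# Hypothetical sketch of a BCa interval for a vector of replicates.
bca_ci_sketch <- function(t0, t, a, conf = 0.95) {
  alpha <- 1 - conf
  # Bias correction: proportion of replicates smaller than the original estimate
  z0 <- qnorm(mean(t < t0))
  # Standard normal quantiles for alpha/2 and 1 - alpha/2
  z <- qnorm(c(alpha / 2, 1 - alpha / 2))
  # Adjusted percentiles alpha_1 and alpha_2
  adj <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
  limits <- quantile(t, probs = adj, names = FALSE)
  c(ll = limits[1], ul = limits[2])
}

set.seed(123)
boot_reps <- rnorm(1000)
bca_ci_sketch(mean(boot_reps), boot_reps, a = 0.01)
```

With `a = 0` and $\hat{z}_0 = 0$, the adjusted percentiles reduce to $\alpha/2$ and $1 - \alpha/2$, so the BCa interval coincides with the percentile interval.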
A matrix with five columns:
conf: confidence level
rk_lower: rank of lower endpoint (interpolated)
rk_upper: rank of upper endpoint (interpolated)
ll: lower confidence limit
ul: upper confidence limit
This function is adapted from the internal function bca.ci()
in the boot package (Canty & Ripley, 1999).
Canty, A., & Ripley, B. (1999). boot: Bootstrap Functions (Originally by Angelo Canty for S) [Computer software]. https://CRAN.R-project.org/package=boot
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843
Other interval_calculation:
basic_ci(),
norm_ci(),
perc_ci()
```r
library(dubicube)

set.seed(123)
boot_reps <- rnorm(1000)
t0 <- mean(boot_reps)

# Example acceleration value (normally estimated via jackknife)
a <- 0.01

# BCa bootstrap CI
bca_ci(t0, boot_reps, a, conf = 0.95)
```
This function converts a named list of "boot" objects
(typically produced by bootstrap_cube()) into a single
long-format dataframe. Each element of the list is assumed to correspond
to one group, with the list names defining the values of the grouping
variable.
```r
boot_list_to_dataframe(boot_list, grouping_var)
```
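| Argument | Description |
|---|---|
| `boot_list` | A named list of objects of class `"boot"`, typically produced by `bootstrap_cube()`. |
| `grouping_var` | A character string giving the name of the grouping variable (e.g. `"year"`). |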
This function is primarily intended for the output of bootstrap_cube()
when it uses the boot-based methods ("boot_group_specific" or "boot_whole_cube").
The function assumes that each boot object in boot_list
contains a single bootstrap statistic per replicate (i.e. boot$t
is a vector or a one-column matrix).
A dataframe with the following columns:
sample: Sample ID of the bootstrap replicate
est_original: The statistic based on the full dataset per group
rep_boot: The statistic based on a bootstrapped dataset (bootstrap
replicate)
est_boot: The bootstrap estimate (mean of bootstrap replicates per
group)
se_boot: The standard error of the bootstrap estimate (standard
deviation of the bootstrap replicates per group)
bias_boot: The bias of the bootstrap estimate per group
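The reshaping can be sketched in base R. To keep the sketch self-contained, plain lists with `t0` and `t` elements stand in for `"boot"` objects (real `"boot"` objects store the original statistic and the replicates under these names). `boot_list_to_df_sketch()` is hypothetical and may omit handling that boot_list_to_dataframe() performs.

```r
# Hypothetical sketch: long-format dataframe from a named list of boot-like objects.
boot_list_to_df_sketch <- function(boot_list, grouping_var) {
  rows <- lapply(names(boot_list), function(g) {
    b <- boot_list[[g]]
    reps <- as.vector(b$t)  # one statistic per replicate
    out <- data.frame(
      group = g,
      sample = seq_along(reps),
      est_original = b$t0,
      rep_boot = reps,
      est_boot = mean(reps),           # mean of the replicates
      se_boot = sd(reps),              # standard deviation of the replicates
      bias_boot = mean(reps) - b$t0    # bootstrap estimate minus original
    )
    names(out)[1] <- grouping_var      # list names become the grouping column
    out
  })
  do.call(rbind, rows)
}

boot_list <- list(
  "2020" = list(t0 = 2, t = matrix(c(1, 2, 3, 4), ncol = 1)),
  "2021" = list(t0 = 5, t = matrix(c(5, 5, 6, 6), ncol = 1))
)
long_df <- boot_list_to_df_sketch(boot_list, "year")
```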
Other indicator_uncertainty_helper:
bootstrap_cube_raw(),
calculate_acceleration(),
calculate_boot_ci_from_boot(),
resolve_bootstrap_method()
```r
## Not run:
library(dubicube)

# After processing a data cube with b3gbi::process_cube()

# Function to calculate statistic of interest
# Mean observations per year
mean_obs <- function(x) {
  out_df <- aggregate(obs ~ year, x, mean) # Calculate mean obs per year
  names(out_df) <- c("year", "diversity_val") # Rename columns
  return(out_df)
}
mean_obs(processed_cube$data)

# Perform bootstrapping
bootstrap_mean_obs <- bootstrap_cube(
  data_cube = processed_cube,
  fun = mean_obs,
  grouping_var = "year",
  samples = 1000,
  method = "boot_group_specific",
  seed = 123
)

bootstrap_df <- boot_list_to_dataframe(
  boot_list = bootstrap_mean_obs,
  grouping_var = "year"
)
head(bootstrap_df)
## End(Not run)
```
This function generates samples bootstrap replicates of a statistic applied
to a data cube. It resamples the data cube and computes a statistic fun for
each bootstrap replicate, optionally comparing the results to a reference
group (ref_group).
```r
bootstrap_cube(
  data_cube,
  fun,
  ...,
  grouping_var,
  samples = 1000,
  ref_group = NA,
  seed = NA,
  processed_cube = TRUE,
  method = "smart",
  progress = FALSE,
  boot_args = list()
)
```
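| Argument | Description |
|---|---|
| `data_cube` | A data cube object (class 'processed_cube' or 'sim_cube', see `b3gbi::process_cube()`). |
| `fun` | A function which, when applied to `data_cube$data`, returns the statistic(s) of interest. |
| `...` | Additional arguments passed on to `fun`. |
| `grouping_var` | A character vector specifying the grouping variable(s) for the bootstrap analysis. The function … |
| `samples` | The number of bootstrap replicates. A single positive integer. Default is 1000. |
| `ref_group` | A string indicating the reference group to compare the statistic with. Default is `NA`, meaning no reference group is used. |
| `seed` | A positive numeric value setting the seed for random number generation to ensure reproducibility. If `NA` (default), … |
| `processed_cube` | Logical. If … Default is `TRUE`. |
| `method` | A character string specifying the bootstrap method. Options include `"smart"` (default), `"whole_cube"`, `"group_specific"`, `"boot_whole_cube"` and `"boot_group_specific"`. |
| `progress` | Logical. Whether to show a progress bar. Set to `TRUE` to display a progress bar, `FALSE` (default) to suppress it. |
| `boot_args` | Named list of additional arguments passed to `boot::boot()`. |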
Bootstrapping is a statistical technique used to estimate the distribution of a statistic by resampling with replacement from the original data (Davison & Hinkley, 1997; Efron & Tibshirani, 1994). In the case of data cubes, each row is sampled with replacement. Below are the common notations used in bootstrapping:
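Original Sample Data: $\mathbf{X} = \{X_1, X_2, \ldots, X_n\}$
The initial set of data points. Here, $n$ is the sample
size. This corresponds to the number of cells in a data cube or the number
of rows in tabular format.
Statistic of Interest: $\theta$
The parameter or statistic being estimated, such as the mean
$\bar{X}$, variance $\sigma^2$, or a biodiversity indicator. Let
$\hat{\theta}$ denote the estimated value of $\theta$ calculated
from the complete dataset $\mathbf{X}$.
Bootstrap Sample: $\mathbf{X}^* = \{X_1^*, X_2^*, \ldots, X_n^*\}$
A sample of size $n$ drawn with replacement from the original sample
$\mathbf{X}$. Each $X_i^*$ is drawn independently from
$\mathbf{X}$.
A total of $B$ bootstrap samples are drawn from the original data.
Common choices for $B$ are 1000 or 10,000 to ensure a good
approximation of the distribution of the bootstrap replications (see
below).
Bootstrap Replication: $\hat{\theta}^*_b$
The value of the statistic of interest calculated from the $b$-th
bootstrap sample $\mathbf{X}^*_b$. For example, if $\theta$ is
the sample mean, $\hat{\theta}^*_b = \bar{X}^*_b$.
Bootstrap Statistics:
Bootstrap Estimate of the Statistic: $\hat{\theta}_{\text{boot}}$
The average of the bootstrap replications:
$$\hat{\theta}_{\text{boot}} = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^*_b$$
Bootstrap Bias: $\text{Bias}_{\text{boot}}$
This bias indicates how much the bootstrap estimate deviates from the original sample estimate. It is calculated as the difference between the average bootstrap estimate and the original estimate:
$$\text{Bias}_{\text{boot}} = \hat{\theta}_{\text{boot}} - \hat{\theta}$$
Bootstrap Standard Error: $\text{SE}_{\text{boot}}$
The standard deviation of the bootstrap replications, which estimates the variability of the statistic:
$$\text{SE}_{\text{boot}} = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\hat{\theta}^*_b - \hat{\theta}_{\text{boot}}\right)^2}$$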
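These definitions map directly onto a few lines of base R. This sketch bootstraps the mean of a small hypothetical vector `x` with `B = 1000` replications and computes the bootstrap estimate, bias and standard error as defined above; it illustrates the definitions, not the internals of bootstrap_cube().

```r
set.seed(123)
x <- c(3, 7, 1, 8, 4, 6, 2, 9)  # hypothetical original sample, n = 8
B <- 1000                       # number of bootstrap samples

theta_hat <- mean(x)            # statistic on the complete dataset
# Each replication: draw n values with replacement and recompute the mean
theta_star <- replicate(B, mean(sample(x, size = length(x), replace = TRUE)))

est_boot <- mean(theta_star)        # bootstrap estimate of the statistic
bias_boot <- est_boot - theta_hat   # bootstrap bias
se_boot <- sd(theta_star)           # bootstrap standard error (B - 1 denominator)
```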
There are two methods for bootstrapping:
Whole-cube bootstrapping: resampling all rows in the cube, regardless of grouping. For indicators that use data across groups.
Group-specific bootstrapping: resampling rows only within a group of interest (e.g., a species, year, or habitat). For indicators that are calculated independently per group.
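The difference between the two resampling schemes can be sketched in base R on a toy dataframe with a `year` grouping column. Both helpers are hypothetical illustrations, not dubicube functions.

```r
# Toy data cube in tabular form: one row per cell, grouped by year.
cube <- data.frame(year = rep(2020:2022, times = c(2, 3, 4)), obs = 1:9)

# Whole-cube: draw rows with replacement from the entire table.
whole_cube_resample <- function(df) {
  df[sample.int(nrow(df), replace = TRUE), , drop = FALSE]
}

# Group-specific: draw rows with replacement within each group separately,
# so every group keeps its original number of rows.
group_specific_resample <- function(df, grouping_var) {
  parts <- split(df, df[[grouping_var]])
  resampled <- lapply(parts, function(p) {
    p[sample.int(nrow(p), replace = TRUE), , drop = FALSE]
  })
  do.call(rbind, resampled)
}

set.seed(1)
wc <- whole_cube_resample(cube)
gs <- group_specific_resample(cube, "year")
```

Under whole-cube resampling the number of rows per year varies between replicates; under group-specific resampling it stays fixed and rows never move between groups.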
The default smart option (method = "smart") determines both
(i) whether the indicator is group-specific or whole-cube, and
(ii) whether the boot package should be used.
The decision is made by calculating the statistic on larger and smaller
subsets of the data (containing respectively more and fewer groups in
grouping_var). If indicator values for the common groups are identical,
the indicator is treated as group-specific; otherwise, it is treated as
whole-cube.
If no reference group is used (ref_group = NA), method = "smart"
resolves to "boot_group_specific" or "boot_whole_cube", both of which
use boot::boot(). If a reference group is specified, method = "smart"
resolves to "group_specific" or "whole_cube" and bootstrapping is
handled internally.
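The "smart" decision can be approximated as follows. This sketch simplifies the procedure by comparing the full data with a subset that drops one group, and it assumes fun returns a dataframe with the grouping column and a `diversity_val` column, as in the examples. `is_group_specific()`, `resolve_method_sketch()` and `share_obs()` are hypothetical helpers, not the package's internal logic.

```r
# Toy statistics: one row per year with a diversity_val column.
mean_obs <- function(x) {
  out <- aggregate(obs ~ year, x, mean)
  names(out) <- c("year", "diversity_val")
  out
}
share_obs <- function(x) {
  out <- aggregate(obs ~ year, x, sum)
  out$obs <- out$obs / sum(out$obs)  # depends on all years, so whole-cube
  names(out) <- c("year", "diversity_val")
  out
}

# Hypothetical: compare values for the common groups after dropping one group.
is_group_specific <- function(df, fun, grouping_var) {
  groups <- sort(unique(df[[grouping_var]]))
  full <- fun(df)
  smaller <- fun(df[df[[grouping_var]] != groups[length(groups)], , drop = FALSE])
  common <- merge(full, smaller, by = grouping_var, suffixes = c("_full", "_sub"))
  isTRUE(all.equal(common$diversity_val_full, common$diversity_val_sub))
}

# Hypothetical mapping from the decision to a method name.
resolve_method_sketch <- function(group_specific, ref_group = NA) {
  if (is.na(ref_group)) {
    if (group_specific) "boot_group_specific" else "boot_whole_cube"
  } else {
    if (group_specific) "group_specific" else "whole_cube"
  }
}

cube <- data.frame(year = rep(2020:2022, each = 2), obs = c(1, 3, 2, 2, 5, 7))
```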
A dataframe containing the bootstrap results with the following columns:
sample: Sample ID of the bootstrap replicate
est_original: The statistic based on the full dataset per group
rep_boot: The statistic based on a bootstrapped dataset (bootstrap
replicate)
est_boot: The bootstrap estimate (mean of bootstrap replicates per
group)
se_boot: The standard error of the bootstrap estimate (standard
deviation of the bootstrap replicates per group)
bias_boot: The bias of the bootstrap estimate per group
If method resolves to "boot_whole_cube" or "boot_group_specific",
the returned value is a named list of objects of class "boot", as produced
by boot::boot().
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843
Efron, B., & Tibshirani, R. J. (1994). An Introduction to the Bootstrap (1st ed.). Chapman and Hall/CRC. doi:10.1201/9780429246593
Other indicator_uncertainty:
calculate_bootstrap_ci()
```r
## Not run:
library(dubicube)

# After processing a data cube with b3gbi::process_cube()

# Function to calculate statistic of interest
# Mean observations per year
mean_obs <- function(x) {
  out_df <- aggregate(obs ~ year, x, mean) # Calculate mean obs per year
  names(out_df) <- c("year", "diversity_val") # Rename columns
  return(out_df)
}
mean_obs(processed_cube$data)

# Perform bootstrapping
bootstrap_mean_obs <- bootstrap_cube(
  data_cube = processed_cube,
  fun = mean_obs,
  grouping_var = "year",
  samples = 1000,
  seed = 123
)
## End(Not run)
```
This function generates samples bootstrap replicates of a statistic applied
to a dataframe. It resamples the data cube and computes a statistic fun for
each bootstrap replicate, optionally comparing the results to a reference
group (ref_group). Bootstrapping happens over the whole dataset
data_cube.
```r
bootstrap_cube_raw(
  data_cube,
  fun,
  ...,
  grouping_var,
  samples = 1000,
  ref_group = NA,
  seed = NA,
  progress = FALSE
)
```
| Argument | Description |
|---|---|
| `data_cube` | A dataframe. |
| `fun` | A function which, when applied to `data_cube`, returns the statistic(s) of interest. |
| `...` | Additional arguments passed on to `fun`. |
| `grouping_var` | A character vector specifying the grouping variable(s) for the bootstrap analysis. The function … |
| `samples` | The number of bootstrap replicates. A single positive integer. Default is 1000. |
| `ref_group` | A string indicating the reference group to compare the statistic with. Default is `NA`, meaning no reference group is used. |
| `seed` | A positive numeric value setting the seed for random number generation to ensure reproducibility. If `NA` (default), … |
| `progress` | Logical. Whether to show a progress bar. Set to `TRUE` to display a progress bar, `FALSE` (default) to suppress it. |
Bootstrapping is a statistical technique used to estimate the distribution of a statistic by resampling with replacement from the original data (Davison & Hinkley, 1997; Efron & Tibshirani, 1994). In the case of data cubes, each row is sampled with replacement. Below are the common notations used in bootstrapping:
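Original Sample Data: $\mathbf{X} = \{X_1, X_2, \ldots, X_n\}$
The initial set of data points. Here, $n$ is the sample
size. This corresponds to the number of cells in a data cube or the number
of rows in tabular format.
Statistic of Interest: $\theta$
The parameter or statistic being estimated, such as the mean
$\bar{X}$, variance $\sigma^2$, or a biodiversity indicator. Let
$\hat{\theta}$ denote the estimated value of $\theta$ calculated
from the complete dataset $\mathbf{X}$.
Bootstrap Sample: $\mathbf{X}^* = \{X_1^*, X_2^*, \ldots, X_n^*\}$
A sample of size $n$ drawn with replacement from the original sample
$\mathbf{X}$. Each $X_i^*$ is drawn independently from
$\mathbf{X}$.
A total of $B$ bootstrap samples are drawn from the original data.
Common choices for $B$ are 1000 or 10,000 to ensure a good
approximation of the distribution of the bootstrap replications (see
below).
Bootstrap Replication: $\hat{\theta}^*_b$
The value of the statistic of interest calculated from the $b$-th
bootstrap sample $\mathbf{X}^*_b$. For example, if $\theta$ is
the sample mean, $\hat{\theta}^*_b = \bar{X}^*_b$.
Bootstrap Statistics:
Bootstrap Estimate of the Statistic: $\hat{\theta}_{\text{boot}}$
The average of the bootstrap replications:
$$\hat{\theta}_{\text{boot}} = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^*_b$$
Bootstrap Bias: $\text{Bias}_{\text{boot}}$
This bias indicates how much the bootstrap estimate deviates from the original sample estimate. It is calculated as the difference between the average bootstrap estimate and the original estimate:
$$\text{Bias}_{\text{boot}} = \hat{\theta}_{\text{boot}} - \hat{\theta}$$
Bootstrap Standard Error: $\text{SE}_{\text{boot}}$
The standard deviation of the bootstrap replications, which estimates the variability of the statistic:
$$\text{SE}_{\text{boot}} = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} \left(\hat{\theta}^*_b - \hat{\theta}_{\text{boot}}\right)^2}$$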
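When a reference group is given, each replicate can be compared with the reference group's value within the same replicate. The sketch below assumes the comparison is a simple difference from the reference group's value, resamples the whole dataset as bootstrap_cube_raw() does, and returns `NA` when a resample happens to contain no rows of the reference group. `boot_ref_sketch()` is hypothetical and not the package's implementation.

```r
mean_obs <- function(x) {
  out <- aggregate(obs ~ year, x, mean)
  names(out) <- c("year", "diversity_val")
  out
}

# Hypothetical: whole-dataset bootstrap, each replicate expressed as a
# difference from the reference group's value in that same replicate.
boot_ref_sketch <- function(df, fun, grouping_var, ref_group, samples = 100) {
  reps <- lapply(seq_len(samples), function(b) {
    boot_df <- df[sample.int(nrow(df), replace = TRUE), , drop = FALSE]
    stat <- fun(boot_df)
    ref_val <- stat$diversity_val[stat[[grouping_var]] == ref_group]
    if (length(ref_val) == 0) ref_val <- NA_real_  # reference group not drawn
    stat <- stat[stat[[grouping_var]] != ref_group, , drop = FALSE]
    stat$diversity_val <- stat$diversity_val - ref_val
    stat$sample <- rep(b, nrow(stat))
    stat
  })
  do.call(rbind, reps)
}

cube <- data.frame(year = rep(2020:2022, each = 4), obs = c(1:4, 2:5, 6:9))
set.seed(42)
res <- boot_ref_sketch(cube, mean_obs, "year", ref_group = 2020, samples = 50)
```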
A dataframe containing the bootstrap results with the following columns:
sample: Sample ID of the bootstrap replicate
est_original: The statistic based on the full dataset per group
rep_boot: The statistic based on a bootstrapped dataset (bootstrap
replicate)
est_boot: The bootstrap estimate (mean of bootstrap replicates per
group)
se_boot: The standard error of the bootstrap estimate (standard
deviation of the bootstrap replicates per group)
bias_boot: The bias of the bootstrap estimate per group
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843
Efron, B., & Tibshirani, R. J. (1994). An Introduction to the Bootstrap (1st ed.). Chapman and Hall/CRC. doi:10.1201/9780429246593
Other indicator_uncertainty_helper:
boot_list_to_dataframe(),
calculate_acceleration(),
calculate_boot_ci_from_boot(),
resolve_bootstrap_method()
```r
## Not run:
# Function to calculate statistic of interest
# Mean observations per year
mean_obs <- function(x) {
  out_df <- aggregate(obs ~ year, x, mean) # Calculate mean obs per year
  names(out_df) <- c("year", "diversity_val") # Rename columns
  return(out_df)
}
mean_obs(data)

# Perform bootstrapping
bootstrap_mean_obs <- bootstrap_cube_raw(
  data_cube = data,
  fun = mean_obs,
  grouping_var = "year",
  samples = 1000,
  seed = 123
)
head(bootstrap_mean_obs)
## End(Not run)
```
This function calculates acceleration values, which quantify the sensitivity
of a statistic’s variability to changes in the dataset. Acceleration is used
for bias-corrected and accelerated (BCa) confidence intervals in
calculate_bootstrap_ci().
```r
calculate_acceleration(
  data_cube,
  fun,
  ...,
  grouping_var,
  ref_group = NA,
  influence_method = "usual",
  processed_cube = TRUE,
  progress = FALSE
)
```
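| Argument | Description |
|---|---|
| `data_cube` | A data cube object (class 'processed_cube' or 'sim_cube', see `b3gbi::process_cube()`). |
| `fun` | A function which, when applied to `data_cube$data`, returns the statistic(s) of interest. |
| `...` | Additional arguments passed on to `fun`. |
| `grouping_var` | A character vector specifying the grouping variable(s) for the bootstrap analysis. The function … |
| `ref_group` | A string indicating the reference group to compare the statistic with. Default is `NA`, meaning no reference group is used. |
| `influence_method` | A string specifying the method used for calculating the influence values. Default is `"usual"`. |
| `processed_cube` | Logical. If … Default is `TRUE`. |
| `progress` | Logical. Whether to show a progress bar for jackknifing. Set to `TRUE` to display a progress bar, `FALSE` (default) to suppress it. |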
Acceleration quantifies how sensitive the variability of a statistic
is to changes in the data.
$\hat{a} = 0$: The statistic's variability does not depend on the data
(e.g., symmetric distribution)
$\hat{a} > 0$: Small changes in the data have a large effect on the
statistic's variability (e.g., positive skew)
$\hat{a} < 0$: Small changes in the data have a smaller effect on the
statistic's variability (e.g., negative skew).
It is used for BCa confidence interval calculation, which adjusts for
bias and skewness in bootstrapped distributions (Davison & Hinkley, 1997,
Chapter 5). See also the empinf() function of the boot package in R
(Canty & Ripley, 1999). The acceleration is calculated as follows:
$$\hat{a} = \frac{1}{6} \frac{\sum_{i=1}^{n} I_i^3}{\left(\sum_{i=1}^{n} I_i^2\right)^{3/2}}$$
where $I_i$ denotes the influence of data point $x_i$ on the
estimation of $\theta$. $I_i$ can be estimated using jackknifing.
Examples are (1) the negative jackknife:
$I_i = (n-1)(\hat{\theta} - \hat{\theta}_{-i})$, and (2) the positive
jackknife: $I_i = (n+1)(\hat{\theta}_{-i} - \hat{\theta})$
(Frangos & Schucany, 1990). Here, $\hat{\theta}_{-i}$ is the estimated
value leaving out the $i$'th data point $x_i$. The boot
package also offers infinitesimal jackknife and regression estimation.
Implementation of these estimation methods can be explored in the
future.
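A minimal base R sketch of the negative-jackknife estimate for a single group without a reference group, using the influence values $I_i = (n-1)(\hat{\theta} - \hat{\theta}_{-i})$ and $\hat{a} = \sum_i I_i^3 / \left(6 (\sum_i I_i^2)^{3/2}\right)$. `jackknife_acceleration()` is a hypothetical helper; `stat` is any function returning a single number.

```r
# Hypothetical sketch: acceleration from negative-jackknife influence values.
jackknife_acceleration <- function(x, stat = mean) {
  n <- length(x)
  theta_hat <- stat(x)
  # Leave-one-out estimates theta_{-i}
  theta_minus <- vapply(seq_len(n), function(i) stat(x[-i]), numeric(1))
  infl <- (n - 1) * (theta_hat - theta_minus)  # negative jackknife influence
  sum(infl^3) / (6 * sum(infl^2)^(3 / 2))
}

a_sym <- jackknife_acceleration(c(-2, -1, 0, 1, 2))   # symmetric data
a_skew <- jackknife_acceleration(c(0, 0, 0, 0, 10))   # right-skewed data
```

For the mean, the influence values reduce to $x_i - \bar{x}$, so symmetric data give $\hat{a} = 0$ and right-skewed data give $\hat{a} > 0$, in line with the interpretation above.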
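If a reference group is used, jackknifing is implemented in a different way.
Consider $\hat{\theta} = \hat{\theta}_1 - \hat{\theta}_2$, where
$\hat{\theta}_1$ is the estimate for the indicator value of a
non-reference period (sample size $n_1$) and $\hat{\theta}_2$ is the
estimate for the indicator value of a reference period (sample size
$n_2$). The acceleration is now calculated as follows:
$$\hat{a} = \frac{1}{6} \frac{\sum_{i=1}^{n_1+n_2} I_i^3}{\left(\sum_{i=1}^{n_1+n_2} I_i^2\right)^{3/2}}$$
$I_i$ can be calculated using the negative or positive jackknife, such
that
$$\hat{\theta}_{-i} = \hat{\theta}_{1,-i} - \hat{\theta}_2 \quad \text{for } i = 1, \ldots, n_1,$$
and
$$\hat{\theta}_{-i} = \hat{\theta}_1 - \hat{\theta}_{2,-i} \quad \text{for } i = n_1 + 1, \ldots, n_1 + n_2.$$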
A dataframe containing the acceleration values per grouping_var.
Canty, A., & Ripley, B. (1999). boot: Bootstrap Functions (Originally by Angelo Canty for S) [Computer software]. https://CRAN.R-project.org/package=boot
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843
Frangos, C. C., & Schucany, W. R. (1990). Jackknife estimation of the bootstrap acceleration constant. Computational Statistics & Data Analysis, 9(3), 271–281. doi:10.1016/0167-9473(90)90109-U
Other indicator_uncertainty_helper:
boot_list_to_dataframe(),
bootstrap_cube_raw(),
calculate_boot_ci_from_boot(),
resolve_bootstrap_method()
```r
## Not run:
library(dubicube)

# After processing a data cube with b3gbi::process_cube()

# Function to calculate statistic of interest
# Mean observations per year
mean_obs <- function(x) {
  out_df <- aggregate(obs ~ year, x, mean) # Calculate mean obs per year
  names(out_df) <- c("year", "diversity_val") # Rename columns
  return(out_df)
}
mean_obs(processed_cube$data)

# Calculate acceleration
acceleration_df <- calculate_acceleration(
  data_cube = processed_cube,
  fun = mean_obs,
  grouping_var = "year",
  progress = FALSE
)
acceleration_df
## End(Not run)
```
This function calculates multiple types of confidence intervals
(normal, basic, percentile, BCa) for a boot object using
boot::boot.ci().
```r
calculate_boot_ci_from_boot(
  boot_obj,
  type = c("norm", "basic", "perc", "bca"),
  conf = 0.95,
  h = function(t) t,
  hinv = function(t) t,
  boot_args = list()
)
```
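| Argument | Description |
|---|---|
| `boot_obj` | A `"boot"` object. |
| `type` | A character vector specifying the type(s) of confidence intervals to compute. Options include `"norm"`, `"basic"`, `"perc"`, `"bca"`, or `"all"`. |
| `conf` | A numeric value specifying the confidence level of the intervals. Default is `0.95` (95% confidence level). |
| `h` | A function defining a transformation. The intervals are calculated on the scale of `h(t)` and the inverse function `hinv` is applied to the resulting intervals. Default is the identity function. |
| `hinv` | A function, like `h`, which returns the inverse of `h`. It is used to transform the intervals calculated on the scale of `h(t)` back to the original scale. Default is the identity function. |
| `boot_args` | Named list of additional arguments to pass to `boot::boot.ci()`. |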
A tidy dataframe with columns:
stat_index: Index of statistic in the boot object
est_original: Original estimate from full dataset
int_type: Interval type
ll: Lower confidence limit
ul: Upper confidence limit
conf: Confidence level
Other indicator_uncertainty_helper:
boot_list_to_dataframe(),
bootstrap_cube_raw(),
calculate_acceleration(),
resolve_bootstrap_method()
```r
## Not run:
library(boot)
library(dubicube)

# Function to compute the mean
mean_fun <- function(data, indices) {
  mean(data[indices])
}

# Bootstrap mean of the 'mpg' variable in mtcars
set.seed(123)
boot_obj <- boot(data = mtcars$mpg, statistic = mean_fun, R = 1000)

# Calculate confidence intervals for all types
ci_df <- calculate_boot_ci_from_boot(
  boot_obj = boot_obj,
  type = "all",
  conf = 0.95
)
ci_df
## End(Not run)
```
This function calculates confidence intervals for a dataframe containing
bootstrap replicates based on different methods, including percentile
(perc), bias-corrected and accelerated (bca), normal (norm), and basic
(basic). The function also supports a boot object from the boot
package.
```r
calculate_bootstrap_ci(
  bootstrap_results,
  grouping_var = NULL,
  type = c("perc", "bca", "norm", "basic"),
  conf = 0.95,
  h = function(t) t,
  hinv = function(t) t,
  no_bias = FALSE,
  aggregate = TRUE,
  data_cube = NA,
  fun = NA,
  ...,
  ref_group = NA,
  influence_method = ifelse(is.element("bca", type), "usual", NA),
  progress = FALSE,
  boot_args = list()
)
```
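| Argument | Description |
|---|---|
| `bootstrap_results` | A dataframe with bootstrap replicates, or a … |
| `grouping_var` | A character vector specifying the grouping variable(s) for the bootstrap analysis. The function … |
| `type` | A character vector specifying the type(s) of confidence intervals to compute. Options include `"perc"`, `"bca"`, `"norm"` and `"basic"`. |
| `conf` | A numeric value specifying the confidence level of the intervals. Default is `0.95` (95% confidence level). |
| `h` | A function defining a transformation. The intervals are calculated on the scale of `h(t)` and the inverse function `hinv` is applied to the resulting intervals. Default is the identity function. |
| `hinv` | A function, like `h`, which returns the inverse of `h`. It is used to transform the intervals calculated on the scale of `h(t)` back to the original scale. Default is the identity function. |
| `no_bias` | Logical. If … Default is `FALSE`. |
| `aggregate` | Logical. If … Default is `TRUE`. |
| `data_cube` | Only used when … |
| `fun` | Only used when … |
| `...` | Additional arguments passed on to `fun`. |
| `ref_group` | Only used when … |
| `influence_method` | A string specifying the method used for calculating the influence values. Default is `"usual"` if `type` includes `"bca"`, otherwise `NA`. |
| `progress` | Logical. Whether to show a progress bar for jackknifing. Set to `TRUE` to display a progress bar, `FALSE` (default) to suppress it. |
| `boot_args` | Named list of additional arguments passed to … |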
We consider four different types of intervals (with confidence level
$1 - \alpha$). The choice of confidence interval types and their calculation
is in line with the boot package in R (Canty & Ripley, 1999) to ensure
ease of implementation. They are based on the definitions provided by
Davison & Hinkley (1997, Chapter 5)
(see also DiCiccio & Efron, 1996; Efron, 1987).
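Percentile: Uses the percentiles of the bootstrap distribution.
$$CI_{perc} = \left[\hat{\theta}^*_{(\alpha/2)},\ \hat{\theta}^*_{(1-\alpha/2)}\right]$$
where $\hat{\theta}^*_{(\alpha/2)}$ and $\hat{\theta}^*_{(1-\alpha/2)}$
are the $\alpha/2$ and $1-\alpha/2$
percentiles of the bootstrap distribution, respectively.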
Bias-Corrected and Accelerated (BCa): Adjusts for bias and acceleration.
Bias refers to the systematic difference between the observed statistic from the original dataset and the center of the bootstrap distribution of the statistic. The bias correction term is calculated as follows:
$$\hat{z}_0 = \Phi^{-1}\left(\frac{\#(\hat{\theta}^*_b < \hat{\theta})}{B}\right)$$
where $\#$ is the counting operator, counting the number of times
$\hat{\theta}^*_b$ is smaller than $\hat{\theta}$, and $\Phi^{-1}$ is
the inverse cumulative distribution function of the standard
normal distribution. $B$ is the number of bootstrap samples.
Acceleration quantifies how sensitive the variability of the statistic is
to changes in the data.
See calculate_acceleration() on how this is calculated.
: The statistic's variability does not depend on the data
(e.g., symmetric distribution)
: Small changes in the data have a large effect on the
statistic's variability (e.g., positive skew)
: Small changes in the data have a smaller effect on the
statistic's variability (e.g., negative skew).
The bias and acceleration estimates are then used to calculate adjusted percentiles.
,
So, we get
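The BCa steps (bias correction, acceleration, adjusted percentiles) can be sketched in base R as follows. This is an illustration, not the dubicube implementation: the acceleration is estimated with the classic jackknife formula for a sample mean, which may differ from what calculate_acceleration() does, and the adjusted percentiles are read off with `stats::quantile()`. All names are hypothetical.

```r
# Hypothetical helper: BCa interval from replicates t, original estimate t0,
# and an acceleration estimate a.
bca_interval <- function(t, t0, a, conf = 0.95) {
  alpha <- 1 - conf
  # Bias correction z0: normal quantile of the share of replicates below t0
  z0 <- qnorm(sum(t < t0) / length(t))
  z <- qnorm(c(alpha / 2, 1 - alpha / 2))
  # Adjusted percentiles alpha_1 and alpha_2
  adj <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
  stats::quantile(t, probs = adj, names = FALSE)
}

set.seed(42)
x <- rexp(50)                                          # skewed original sample
t0 <- mean(x)                                          # statistic on full data
t <- replicate(2000, mean(sample(x, replace = TRUE)))  # bootstrap replicates

# Jackknife estimate of the acceleration (Efron, 1987)
jack <- vapply(seq_along(x), function(i) mean(x[-i]), numeric(1))
d <- mean(jack) - jack
a <- sum(d^3) / (6 * sum(d^2)^1.5)

bca_interval(t, t0, a)
```

With $\hat{z}_0 = 0$ and $\hat{a} = 0$ the adjusted percentiles reduce to $\alpha/2$ and $1-\alpha/2$, so BCa falls back to the percentile interval.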
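Normal: Assumes the bootstrap distribution of the statistic is approximately normal.
$$CI_{norm} = \left[\hat{\theta} - \hat{b} - \widehat{se}\, z_{1-\alpha/2},\ \hat{\theta} - \hat{b} + \widehat{se}\, z_{1-\alpha/2}\right]$$
where $\hat{\theta}$ is the statistic of the original dataset, $\hat{b}$ the bootstrap bias estimate, $\widehat{se}$ the bootstrap standard error, and $z_{1-\alpha/2}$ the $1-\alpha/2$ quantile of the
standard normal distribution.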
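A minimal sketch of the normal interval, assuming (as in boot's `norm.ci()`) that the bias is the mean of the replicates minus the original estimate and the standard error is their standard deviation. The helper `normal_interval()` and the simulated data are hypothetical.

```r
# Hypothetical helper: normal interval centred on the bias-corrected estimate
normal_interval <- function(t, t0, conf = 0.95) {
  bias <- mean(t) - t0            # bootstrap bias estimate (b-hat)
  se <- sd(t)                     # bootstrap standard error (se-hat)
  z <- qnorm(1 - (1 - conf) / 2)  # z_{1 - alpha/2}
  c(ll = t0 - bias - z * se, ul = t0 - bias + z * se)
}

set.seed(1)
t0 <- 5                                  # statistic on the original dataset
t <- rnorm(1000, mean = 5.2, sd = 0.5)   # simulated bootstrap replicates
normal_interval(t, t0, conf = 0.90)
```

Because the replicates are centred above `t0` here, the bias correction shifts the interval downwards.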
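Basic: Centers the interval using percentiles.
$$CI_{basic} = \left[2\hat{\theta} - \hat{\theta}^*_{(1-\alpha/2)},\ 2\hat{\theta} - \hat{\theta}^*_{(\alpha/2)}\right]$$
where $\hat{\theta}$ is the statistic of the original dataset, and $\hat{\theta}^*_{(\alpha/2)}$ and $\hat{\theta}^*_{(1-\alpha/2)}$
are the $\alpha/2$ and $1-\alpha/2$
percentiles of the bootstrap distribution, respectively.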
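The basic interval reflects the bootstrap percentiles around the original estimate. A minimal sketch, again assuming `stats::quantile()` for the percentiles rather than boot's interpolation; `basic_interval()` is a hypothetical name.

```r
# Hypothetical helper: basic interval from replicates t and original estimate t0
basic_interval <- function(t, t0, conf = 0.95) {
  alpha <- 1 - conf
  q <- stats::quantile(t, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  # Upper percentile gives the lower limit and vice versa
  c(ll = 2 * t0 - q[2], ul = 2 * t0 - q[1])
}

basic_interval(1:1000, t0 = 500.5)
```

For a bootstrap distribution that is symmetric around $\hat{\theta}$, as in this toy call, the basic and percentile intervals coincide; they diverge when the distribution is skewed.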
A dataframe containing the bootstrap results with the following columns:
est_original: The statistic based on the full dataset per group
est_boot: The bootstrap estimate (mean of bootstrap replicates per
group)
se_boot: The standard error of the bootstrap estimate (standard
deviation of the bootstrap replicates per group)
bias_boot: The bias of the bootstrap estimate per group
int_type: The interval type
ll: The lower limit of the confidence interval
ul: The upper limit of the confidence interval
conf: The confidence level of the interval
When aggregate = FALSE, the dataframe contains the columns from
bootstrap_results with one row per bootstrap replicate.
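The summary columns est_boot, se_boot and bias_boot can be reproduced from a replicate table with base R. This sketch assumes a long table with columns `year`, `est_original` and `rep_boot` (the replicate values); these column names are an assumption for illustration, and the bias follows the boot convention (mean of replicates minus original estimate).

```r
# Hypothetical replicate table: one row per bootstrap replicate and year
set.seed(7)
reps <- data.frame(
  year = rep(c(2001, 2002), each = 200),
  est_original = rep(c(3.0, 4.5), each = 200),
  rep_boot = c(rnorm(200, 3.1, 0.4), rnorm(200, 4.4, 0.6))
)

summarise_replicates <- function(df) {
  out <- lapply(split(df, df$year), function(g) {
    data.frame(
      year = g$year[1],
      est_original = g$est_original[1],
      est_boot = mean(g$rep_boot),                     # mean of replicates
      se_boot = sd(g$rep_boot),                        # SD of replicates
      bias_boot = mean(g$rep_boot) - g$est_original[1] # boot-style bias
    )
  })
  do.call(rbind, out)
}

summarise_replicates(reps)
```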
Canty, A., & Ripley, B. (1999). boot: Bootstrap Functions (Originally by Angelo Canty for S) [Computer software]. https://CRAN.R-project.org/package=boot
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843
DiCiccio, T. J., & Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, 11(3). doi:10.1214/ss/1032280214
Efron, B. (1987). Better Bootstrap Confidence Intervals. Journal of the American Statistical Association, 82(397), 171–185. doi:10.1080/01621459.1987.10478410
Efron, B., & Tibshirani, R. J. (1994). An Introduction to the Bootstrap (1st ed.). Chapman and Hall/CRC. doi:10.1201/9780429246593
Other indicator_uncertainty:
bootstrap_cube()
## Not run:
# After processing a data cube with b3gbi::process_cube()
# Function to calculate statistic of interest
# Mean observations per year
mean_obs <- function(x) {
  out_df <- aggregate(obs ~ year, x, mean) # Calculate mean obs per year
  names(out_df) <- c("year", "diversity_val") # Rename columns
  return(out_df)
}
mean_obs(processed_cube$data)
# Perform bootstrapping
bootstrap_mean_obs <- bootstrap_cube(
  data_cube = processed_cube,
  fun = mean_obs,
  grouping_var = "year",
  samples = 1000,
  seed = 123
)
# Calculate confidence limits
# Percentile interval
ci_mean_obs <- calculate_bootstrap_ci(
  bootstrap_results = bootstrap_mean_obs,
  grouping_var = "year",
  type = "perc",
  conf = 0.95
)
ci_mean_obs
## End(Not run)
This function performs leave-one-out (LOO) or k-fold (experimental) cross-validation (CV) on a biodiversity data cube to assess the performance of a specified indicator function. It partitions the data by a specified variable, calculates the specified indicator on training data, and compares it with the true values to evaluate the influence of one or more categories on the final result.
cross_validate_cube(data_cube, fun, ..., grouping_var, out_var = "taxonKey", crossv_method = c("loo", "kfold"), k = ifelse(crossv_method == "kfold", 5, NA), max_out_cats = 1000, processed_cube = TRUE, progress = FALSE)
data_cube |
A data cube object (class 'processed_cube' or 'sim_cube',
see |
fun |
A function which, when applied to |
... |
Additional arguments passed on to |
grouping_var |
A character vector specifying the grouping variable(s)
for |
out_var |
A string specifying the column by which the data should be
left out iteratively. Default is |
crossv_method |
Method of data partitioning.
If |
k |
Number of folds (an integer). Used only if
|
max_out_cats |
An integer specifying the maximum number of unique
categories in |
processed_cube |
Logical. If |
progress |
Logical. Whether to show a progress bar. Set to |
This function assesses the influence of each category in out_var on the
indicator value by iteratively leaving out one category at a time, similar to
leave-one-out cross-validation. K-fold CV works in a similar fashion but is
experimental and will not be covered here.
Original Sample Data: $\mathbf{X} = \{X_1, X_2, \ldots, X_n\}$
The initial set of data points, where there are $s$
different categories in out_var and $n$ total samples across all
categories (= the sample size). $n$ corresponds to the number of cells
in a data cube or the number of rows in tabular format.
Statistic of Interest: $\theta$
The parameter or statistic being estimated, such as the mean
$\bar{X}$, variance $\sigma^2$, or a biodiversity indicator. Let $\hat{\theta}$
denote the estimated value of $\theta$ calculated
from the complete dataset $\mathbf{X}$.
Cross-Validation (CV) Sample: $\mathbf{X}_{-s_j}$
The full dataset $\mathbf{X}$ excluding all samples belonging to
category $j$. This subset is used to investigate the influence of
category $j$ on the estimated statistic $\hat{\theta}$.
CV Estimate for Category $j$: $\hat{\theta}_{-j}$
The value of the statistic of interest calculated from
$\mathbf{X}_{-s_j}$, which excludes category $j$. For example, if
$\theta$ is the sample mean,
$\hat{\theta}_{-j} = \bar{X}_{-s_j}$.
Error Measures:
The Error is the difference between the statistic estimated without
category $j$ ($\hat{\theta}_{-j}$) and the statistic calculated
on the complete dataset ($\hat{\theta}$):
$$\text{Error}_j = \hat{\theta}_{-j} - \hat{\theta}$$
The Relative Error is the absolute error, normalised by the true
estimate $\hat{\theta}$ and a small error term $\epsilon$
to avoid division by zero:
$$\text{Rel. Error}_j = \frac{|\hat{\theta}_{-j} - \hat{\theta}|}{\hat{\theta} + \epsilon}$$
The Percent Error is the relative error expressed as a percentage.
Summary Measures:
The Mean Relative Error (MRE) is the average of the relative errors over all categories.
The Mean Squared Error (MSE) is the average of the squared errors.
The Root Mean Squared Error (RMSE) is the square root of the MSE.
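The leave-one-category-out procedure and its error measures can be traced by hand on a toy dataset. This base-R sketch is not cross_validate_cube() itself: the dataset, the statistic (overall mean of `obs`) and the value of $\epsilon$ are assumptions chosen for illustration, while the output column names follow the Value section.

```r
# Toy dataset (hypothetical): observations for three taxa
df <- data.frame(
  taxonKey = c("A", "A", "B", "C", "C", "C"),
  obs = c(4, 6, 10, 1, 2, 3)
)
stat <- function(x) mean(x$obs)  # statistic of interest

theta_hat <- stat(df)            # estimate on the complete dataset
eps <- 1e-8                      # small term; this value is an assumption
cats <- unique(df$taxonKey)

# Leave out one category at a time and recompute the statistic
cv <- data.frame(
  taxonkey_out = cats,
  rep_cv = vapply(cats, function(j) stat(df[df$taxonKey != j, ]),
                  numeric(1), USE.NAMES = FALSE)
)
cv$error <- cv$rep_cv - theta_hat
cv$sq_error <- cv$error^2
cv$abs_error <- abs(cv$error)
cv$rel_error <- cv$abs_error / (theta_hat + eps)
cv$perc_error <- 100 * cv$rel_error

# Summary measures over all categories
mre <- mean(cv$rel_error)
mse <- mean(cv$sq_error)
rmse <- sqrt(mse)
cv
```

Leaving out taxon C raises the mean from about 4.33 to about 6.67, so C has the largest influence on this statistic.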
A dataframe containing the cross-validation results with the following columns:
Cross-Validation id (id_cv)
The grouping variable grouping_var (e.g., year)
The category left out during each cross-validation iteration
(specified out_var with suffix '_out' in lower case)
The computed statistic values for both training (rep_cv) and true
datasets (est_original)
Error metrics: error (error), squared error (sq_error),
absolute difference (abs_error), relative difference (rel_error), and
percent difference (perc_error)
Error metrics summarised by grouping_var: mean relative error
(mre), mean squared error (mse) and root mean squared error (rmse)
See Details section on how these error metrics are calculated.
## Not run:
# After processing a data cube with b3gbi::process_cube()
# Function to calculate statistic of interest
# Mean observations per year
mean_obs <- function(x) {
  out_df <- aggregate(obs ~ year, x, mean) # Calculate mean obs per year
  names(out_df) <- c("year", "diversity_val") # Rename columns
  return(out_df)
}
mean_obs(processed_cube$data)
# Perform leave-one-species-out CV
cv_mean_obs <- cross_validate_cube(
  data_cube = processed_cube,
  fun = mean_obs,
  grouping_var = "year",
  out_var = "taxonKey",
  crossv_method = "loo",
  progress = FALSE
)
head(cv_mean_obs)
## End(Not run)
Evaluates a set of diagnostic rules describing the data quality of a biodiversity occurrence cube. Each rule computes a metric on the cube and assigns a severity level indicating potential limitations of the data for exploratory analysis or indicator calculation.
diagnose_cube(data_cube, rules = "basic", verbose = TRUE, ...)
data_cube |
A |
rules |
Diagnostic rules to evaluate. Can be:
|
verbose |
Logical indicating whether a diagnostic summary should be printed. |
... |
Additional arguments passed to |
An object of class cube_diagnostics, containing one row
per metric with the following columns:
dimension: Dimension of the cube being evaluated
(e.g. "spatial", "temporal", "taxonomical").
metric: Name of the diagnostic metric.
value: Computed metric value.
severity: Severity level ("ok", "note", "important",
"very_important").
message: Human-readable description of the diagnostic result.
The rule objects are attached as an attribute of the diagnostics object.
Other data_exploration:
filter_cube()
# Example cube
# ! Real cubes should be processed with b3gbi::process_cube()
processed_cube <- list(
  data = data.frame(
    obs = c(5, 2, 10, 1),
    year = c(2001, 2001, 2002, 2003),
    minCoordinateUncertaintyInMeters = c(50, 2000, NA, 10)
  ),
  resolutions = "10km"
)
class(processed_cube) <- "processed_cube"
# Diagnose based on default rules
diag <- diagnose_cube(processed_cube)
# Sort diagnoses
diag <- diagnose_cube(processed_cube, sort_summary = "asc")
# Only show at least important diagnoses
diag <- diagnose_cube(processed_cube, filter_summary = "important")
Filters observations from a processed_cube based on rule definitions.
Filtering reuses the rule infrastructure used by diagnose_cube(), but
applies row-level filtering logic through rule-specific filter_fn()
functions.
filter_cube(data_cube, rules = NULL, diagnostics = NULL, ..., process_cube_args = list())
data_cube |
A |
rules |
Character vector or list of cube rule objects.
Ignored if |
diagnostics |
Optional |
... |
Additional arguments passed to rule-specific |
process_cube_args |
Named list of additional arguments passed to
|
The function evaluates rule-specific filter_fn() functions that return
a logical vector indicating which rows should be removed. Only rules that
implement a filter_fn() are applied. Rules without a filtering function
are ignored.
Filtering rules operate independently from diagnostic severity levels. For example, a cube may have acceptable overall diagnostics while still containing individual observations that fail filtering criteria.
After filtering, the function attempts to rebuild the cube using
b3gbi::process_cube() to ensure cube metadata remains consistent.
If this function is unavailable or fails, the filtered data replaces
data_cube$data directly and the original cube metadata is retained.
In that case a warning is issued.
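The row-level filtering described above can be sketched in base R. Everything here is hypothetical: rules are modelled as plain lists with an optional `filter_fn`, rules without one are skipped, and a row is assumed to be removed as soon as any rule flags it. The real cube_rule objects and the rebuild via b3gbi::process_cube() are not reproduced.

```r
# Hypothetical rule objects: plain lists, some with a filter_fn
rules <- list(
  list(name = "missing_uncertainty",
       filter_fn = function(data) is.na(data$minCoordinateUncertaintyInMeters)),
  list(name = "min_years")  # no filter_fn: ignored during filtering
)

apply_filters <- function(data, rules) {
  has_fn <- vapply(rules, function(r) is.function(r$filter_fn), logical(1))
  remove <- rep(FALSE, nrow(data))
  for (r in rules[has_fn]) {
    remove <- remove | r$filter_fn(data)  # assumption: union of flagged rows
  }
  data[!remove, , drop = FALSE]
}

cube_data <- data.frame(
  obs = c(5, 2, 10, 1),
  year = c(2001, 2001, 2002, 2003),
  minCoordinateUncertaintyInMeters = c(50, 2000, NA, 10)
)
apply_filters(cube_data, rules)
```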
A filtered processed_cube.
Other data_exploration:
diagnose_cube()
# Example cube
# ! Real cubes should be processed with b3gbi::process_cube()
processed_cube <- list(
  data = data.frame(
    obs = c(5, 2, 10, 1),
    year = c(2001, 2001, 2002, 2003),
    minCoordinateUncertaintyInMeters = c(50, 2000, NA, 10)
  ),
  resolutions = "10km"
)
class(processed_cube) <- "processed_cube"
# Filter cube based on rule
filtered_cube1 <- filter_cube(
  processed_cube,
  rules = list(rule_spatial_miss_uncertainty())
)
# Filter cube based on cube diagnostics
diag <- diagnose_cube(
  processed_cube,
  rules = list(
    rule_spatial_miss_uncertainty(),
    rule_temporal_missing_years()
  )
)
filtered_cube2 <- filter_cube(
  processed_cube,
  diagnostics = diag
)
# The results are identical
identical(filtered_cube1$data, filtered_cube2$data)
This function calculates a normal confidence interval from a bootstrap
sample. It is used by calculate_bootstrap_ci().
norm_ci(t0, t, conf = 0.95, h = function(t) t, hinv = function(t) t, no_bias = FALSE)
t0 |
Original statistic. |
t |
Numeric vector of bootstrap replicates. |
conf |
A numeric value specifying the confidence level of the interval.
Default is |
h |
A function defining a transformation. The intervals are calculated
on the scale of |
hinv |
A function, like |
no_bias |
Logical. If |
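$$CI_{norm} = \left[t_0 - \hat{b} - \widehat{se}\, z_{1-\alpha/2},\ t_0 - \hat{b} + \widehat{se}\, z_{1-\alpha/2}\right]$$
where $t_0$ is the original statistic, $\hat{b}$ and $\widehat{se}$ are the bias and standard error estimated from the bootstrap replicates t, $1-\alpha$ is the confidence level conf, and $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the
standard normal distribution.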
A matrix with three columns:
conf: confidence level
ll: lower confidence limit
ul: upper confidence limit
This function is adapted from the function norm.ci()
in the boot package (Canty & Ripley, 1999).
Canty, A., & Ripley, B. (1999). boot: Bootstrap Functions (Originally by Angelo Canty for S) [Computer software]. https://CRAN.R-project.org/package=boot
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843
Other interval_calculation:
basic_ci(),
bca_ci(),
perc_ci()
set.seed(123)
boot_reps <- rnorm(1000)
t0 <- mean(boot_reps)
# Normal-based CI
norm_ci(t0, boot_reps, conf = 0.90)
# Without bias correction
norm_ci(t0, boot_reps, conf = 0.90, no_bias = TRUE)
This function calculates a percentile confidence interval from a bootstrap
sample. It is used by calculate_bootstrap_ci().
perc_ci(t, conf = 0.95, h = function(t) t, hinv = function(t) t)
t |
Numeric vector of bootstrap replicates. |
conf |
A numeric value specifying the confidence level of the interval.
Default is |
h |
A function defining a transformation. The intervals are calculated
on the scale of |
hinv |
A function, like |
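$$CI_{perc} = \left[\hat{\theta}^*_{(\alpha/2)},\ \hat{\theta}^*_{(1-\alpha/2)}\right]$$
where $\hat{\theta}^*_{(\alpha/2)}$ and $\hat{\theta}^*_{(1-\alpha/2)}$
are the $\alpha/2$ and $1-\alpha/2$ (with $1-\alpha$ = conf)
percentiles of the bootstrap distribution, respectively.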
A matrix with five columns:
conf: confidence level
rk_lower: rank of lower endpoint (interpolated)
rk_upper: rank of upper endpoint (interpolated)
ll: lower confidence limit
ul: upper confidence limit
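The "interpolated" ranks refer to how an endpoint is found when $(B+1)\alpha$ is not a whole number. The sketch below is modelled on boot's internal `norm.inter()` (interpolation between neighbouring order statistics on the normal-quantile scale); it is a simplified illustration, not the dubicube source, and `interp_endpoint()` is a hypothetical name.

```r
# Hypothetical helper, modelled on boot's internal norm.inter()
interp_endpoint <- function(t, alpha) {
  t <- sort(t[is.finite(t)])
  R <- length(t)
  rk <- (R + 1) * alpha                             # possibly fractional rank
  k <- trunc(rk)
  if (k == 0) return(c(rank = rk, value = t[1]))    # extreme endpoint
  if (k >= R) return(c(rank = rk, value = t[R]))    # extreme endpoint
  if (k == rk) return(c(rank = rk, value = t[k]))   # exact order statistic
  # Interpolate between t[k] and t[k + 1] on the normal-quantile scale
  w <- (qnorm(alpha) - qnorm(k / (R + 1))) /
    (qnorm((k + 1) / (R + 1)) - qnorm(k / (R + 1)))
  c(rank = rk, value = t[k] + w * (t[k + 1] - t[k]))
}

set.seed(123)
boot_reps <- rnorm(1000)
conf <- 0.95
alpha <- (1 + c(-conf, conf)) / 2   # 0.025 and 0.975
lower <- interp_endpoint(boot_reps, alpha[1])
upper <- interp_endpoint(boot_reps, alpha[2])
rbind(lower, upper)
```

With 1000 replicates the lower rank is $1001 \times 0.025 = 25.025$, so the limit lies between the 25th and 26th smallest replicates.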
This function is adapted from the internal function perc.ci()
in the boot package (Canty & Ripley, 1999).
Canty, A., & Ripley, B. (1999). boot: Bootstrap Functions (Originally by Angelo Canty for S) [Computer software]. https://CRAN.R-project.org/package=boot
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application (1st ed.). Cambridge University Press. doi:10.1017/CBO9780511802843
Other interval_calculation:
basic_ci(),
bca_ci(),
norm_ci()
set.seed(123)
boot_reps <- rnorm(1000) # bootstrap replicates
t0 <- mean(boot_reps) # observed statistic
# Percentile CI
perc_ci(boot_reps, conf = 0.95)
Visualises diagnostic results returned by diagnose_cube(). The plot
summarises the number of diagnostics per severity level and cube dimension.
## S3 method for class 'cube_diagnostics'
plot(x, type = "severity", ...)
x |
A |
type |
Type of plot. Options are |
... |
Additional arguments passed to other methods (currently unused). |
Three visualisations are supported:
"severity": Number of diagnostics per severity level.
"dimension": Diagnostics grouped by cube dimension.
"rule": Severity levels per diagnostic rule and dimension.
A ggplot object.
Other diagnostic_methods:
print.cube_diagnostics(),
summary.cube_diagnostics()
Displays a human-readable summary of data cube diagnostics produced by
diagnose_cube(). Each diagnostic metric is shown with a severity flag,
the metric name, and a short explanatory message.
## S3 method for class 'cube_diagnostics'
print(x, filter_summary = "ok", sort_summary = NA, ...)
x |
A |
filter_summary |
Filter the summary output based on a minimum severity
level. By default, all levels are shown: |
sort_summary |
Sort the summary output based on severity level. Options
are descending ( |
... |
Additional arguments passed to other methods (currently unused). |
Severity levels are indicated using coloured symbols:
green ball: ok
yellow ball: note
orange ball: important
red ball: very important
The input object x, returned invisibly.
Other diagnostic_methods:
plot.cube_diagnostics(),
summary.cube_diagnostics()
Resolves the effective bootstrap method to be used by
bootstrap_cube(), combining:
the scope of the indicator (group-specific vs whole-cube), and
whether a reference group is used.
resolve_bootstrap_method(df, fun, ..., cat_var, ref_group = NA, method = "smart")
df |
A dataframe. |
fun |
A function which, when applied to |
... |
Additional arguments passed to |
cat_var |
A character vector specifying the grouping variable(s)
used by |
ref_group |
A value indicating the reference group. If |
method |
Character string specifying the bootstrap method.
One of |
When method = "smart", the scope of the indicator is inferred using
derive_bootstrap_method(). If no reference group is specified
(ref_group = NA) and exactly one grouping variable is used
(length(cat_var) == 1), the corresponding boot_* method is selected.
The resolution follows these rules:
If method is not "smart", it is returned unchanged.
If method = "smart", the indicator scope is inferred using
derive_bootstrap_method().
If more than one grouping variable is specified
(length(cat_var) > 1), bootstrapping via the boot package
is disabled and the inferred non-boot method is returned.
If exactly one grouping variable is used and ref_group = NA,
the resolved method is prefixed with "boot_", resulting in
"boot_group_specific" or "boot_whole_cube".
If a reference group is specified, the non-boot variants
"group_specific" or "whole_cube" are returned.
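The resolution rules above form a small decision procedure. This hypothetical sketch takes the scope that derive_bootstrap_method() would infer as an input (`inferred_scope`) instead of calling it, so it only mirrors the documented branching, not the scope inference itself.

```r
# Hypothetical helper mirroring the documented resolution rules.
# inferred_scope stands in for the result of derive_bootstrap_method().
resolve_method_sketch <- function(method, inferred_scope, n_cat_var,
                                  ref_group = NA) {
  if (method != "smart") return(method)        # explicit choice is kept
  if (n_cat_var > 1) return(inferred_scope)    # boot package disabled
  if (length(ref_group) == 1 && is.na(ref_group)) {
    return(paste0("boot_", inferred_scope))    # one grouping var, no reference
  }
  inferred_scope                               # reference group specified
}

resolve_method_sketch("smart", "group_specific", n_cat_var = 1)
resolve_method_sketch("smart", "group_specific", 1, ref_group = "setosa")
```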
A single character string giving the resolved bootstrap method:
"whole_cube"
"group_specific"
"boot_whole_cube"
"boot_group_specific"
Other indicator_uncertainty_helper:
boot_list_to_dataframe(),
bootstrap_cube_raw(),
calculate_acceleration(),
calculate_boot_ci_from_boot()
# Example 1: Group-specific indicator without a reference group
# Mean sepal length per species (calculated independently per group)
mean_sepal_length <- function(x) {
  out_df <- aggregate(Sepal.Length ~ Species, x, mean)
  names(out_df) <- c("Species", "diversity_val")
  out_df
}
resolve_bootstrap_method(
  df = iris,
  fun = mean_sepal_length,
  cat_var = "Species",
  ref_group = NA,
  method = "smart"
)
# Example 2: Group-specific indicator with a reference group
resolve_bootstrap_method(
  df = iris,
  fun = mean_sepal_length,
  cat_var = "Species",
  ref_group = "setosa",
  method = "smart"
)
# Example 3: Indicator that depends on the whole cube
# The statistic per species depends on all species together
scaled_sepal_length <- function(x) {
  out_df <- aggregate(Sepal.Length ~ Species, x, mean)
  out_df$Sepal.Length <- out_df$Sepal.Length / nrow(out_df)
  names(out_df) <- c("Species", "diversity_val")
  out_df
}
resolve_bootstrap_method(
  df = iris,
  fun = scaled_sepal_length,
  cat_var = "Species",
  ref_group = NA,
  method = "smart"
)
Creates a diagnostic rule that evaluates whether a data cube contains a sufficient number of observation records (rows). The rule counts the number of records present in the cube and compares it to a threshold to determine the severity level.
rule_obs_min_records(thresholds = c(ok = 40, note = 30, important = 20, very_important = 0))
thresholds |
Named numeric vector with severity thresholds: ok, note, important, very_important. Defaults are used if not provided. |
An object of class cube_rule.
Other diagnostic_rules:
rule_obs_min_total(),
rule_spatial_max_uncertainty(),
rule_spatial_min_cells(),
rule_spatial_miss_uncertainty(),
rule_taxon_min_taxa(),
rule_temporal_min_years(),
rule_temporal_missing_years()
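How a metric value is compared with the named thresholds is not spelled out here. One plausible reading for "minimum" rules such as this one, which is an assumption and not the package's documented logic, is that the severity is the first level (from ok downwards) whose threshold the value reaches. The helper `classify_min_rule()` is hypothetical.

```r
# Hypothetical helper. Assumption: for minimum-type rules, higher values are
# better and the severity is the first level whose threshold is reached.
classify_min_rule <- function(value,
                              thresholds = c(ok = 40, note = 30,
                                             important = 20, very_important = 0)) {
  thresholds <- sort(thresholds, decreasing = TRUE)
  names(thresholds)[which(value >= thresholds)[1]]
}

classify_min_rule(45)  # a cube with 45 records
classify_min_rule(4)   # a cube with 4 records
```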
Creates a diagnostic rule that evaluates whether a data cube contains a sufficient number of total observations, using a named vector of thresholds for severity classification.
rule_obs_min_total(thresholds = c(ok = 40, note = 30, important = 20, very_important = 0))
thresholds |
Named numeric vector with severity thresholds: ok, note, important, very_important. Defaults are used if not provided. |
An object of class cube_rule.
Other diagnostic_rules:
rule_obs_min_records(),
rule_spatial_max_uncertainty(),
rule_spatial_min_cells(),
rule_spatial_miss_uncertainty(),
rule_taxon_min_taxa(),
rule_temporal_min_years(),
rule_temporal_missing_years()
Creates a diagnostic rule that evaluates whether a data cube contains records with high coordinate uncertainty. The rule counts the number of records (rows) in the cube where the minimum coordinate uncertainty is larger than the resolution of the grid, and compares it to a threshold to determine the severity level.
rule_spatial_max_uncertainty(thresholds = c(ok = 0, note = 1, important = 3, very_important = 5))
thresholds |
Named numeric vector with severity thresholds: ok, note, important, very_important. Defaults are used if not provided. |
An object of class cube_rule.
Other diagnostic_rules:
rule_obs_min_records(),
rule_obs_min_total(),
rule_spatial_min_cells(),
rule_spatial_miss_uncertainty(),
rule_taxon_min_taxa(),
rule_temporal_min_years(),
rule_temporal_missing_years()
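The metric behind this rule can be sketched with the example cube used elsewhere in this manual. Two assumptions are made here: the resolution string (such as "10km") is converted to metres by a hypothetical parser, and records with a missing uncertainty are not counted, since a separate rule covers them.

```r
# Hypothetical parser: convert a resolution such as "10km" or "100m" to metres
resolution_to_m <- function(res) {
  value <- as.numeric(sub("(km|m)$", "", res))
  if (grepl("km$", res)) value * 1000 else value
}

# Count records whose minimum coordinate uncertainty exceeds the grid size
count_high_uncertainty <- function(data, resolution) {
  unc <- data$minCoordinateUncertaintyInMeters
  sum(unc > resolution_to_m(resolution), na.rm = TRUE)  # NAs not counted
}

cube_data <- data.frame(
  obs = c(5, 2, 10, 1),
  minCoordinateUncertaintyInMeters = c(50, 2000, NA, 10)
)
count_high_uncertainty(cube_data, "10km")  # 0
count_high_uncertainty(cube_data, "1km")   # 1
```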
Creates a diagnostic rule that evaluates whether a data cube contains a sufficient number of spatial observations (grid cells). The rule counts the number of unique grid cells present in the cube and compares it to a threshold to determine the severity level.
rule_spatial_min_cells(thresholds = c(ok = 5, note = 3, important = 0, very_important = NULL))
thresholds |
Named numeric vector with severity thresholds: ok, note, important, very_important. Defaults are used if not provided. |
An object of class cube_rule.
Other diagnostic_rules:
rule_obs_min_records(),
rule_obs_min_total(),
rule_spatial_max_uncertainty(),
rule_spatial_miss_uncertainty(),
rule_taxon_min_taxa(),
rule_temporal_min_years(),
rule_temporal_missing_years()
Creates a diagnostic rule that evaluates whether a data cube contains records with missing coordinate uncertainty. The rule counts the number of records (rows) with missing coordinate uncertainty and compares it to a threshold to determine the severity level.
rule_spatial_miss_uncertainty(thresholds = c(ok = 0, note = 1, important = 3, very_important = 5))
thresholds |
Named numeric vector with severity thresholds: ok, note, important, very_important. Defaults are used if not provided. |
An object of class cube_rule.
Other diagnostic_rules:
rule_obs_min_records(),
rule_obs_min_total(),
rule_spatial_max_uncertainty(),
rule_spatial_min_cells(),
rule_taxon_min_taxa(),
rule_temporal_min_years(),
rule_temporal_missing_years()
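For this rule higher counts are worse, so the thresholds increase from ok to very_important. A plausible classification, which is an assumption rather than the package's documented logic, assigns the most severe level whose threshold the count reaches. `classify_max_rule()` is a hypothetical helper.

```r
# Hypothetical helper. Assumption: for count-of-problems rules, the severity
# is the highest level whose threshold the count reaches.
classify_max_rule <- function(value,
                              thresholds = c(ok = 0, note = 1,
                                             important = 3, very_important = 5)) {
  reached <- thresholds[value >= thresholds]
  names(reached)[which.max(reached)]
}

# Records with missing coordinate uncertainty in the example cube
n_missing <- sum(is.na(c(50, 2000, NA, 10)))
classify_max_rule(n_missing)  # "note"
```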
Creates a diagnostic rule that evaluates whether a data cube contains a sufficient number of taxonomical observations (taxa). The rule counts the number of unique taxa present in the cube and compares it to a threshold to determine the severity level.
rule_taxon_min_taxa(thresholds = c(ok = 5, note = 3, important = 0, very_important = NULL))
thresholds |
Named numeric vector with severity thresholds: ok, note, important, very_important. Defaults are used if not provided. |
An object of class cube_rule.
Other diagnostic_rules:
rule_obs_min_records(),
rule_obs_min_total(),
rule_spatial_max_uncertainty(),
rule_spatial_min_cells(),
rule_spatial_miss_uncertainty(),
rule_temporal_min_years(),
rule_temporal_missing_years()
Creates a diagnostic rule that evaluates whether a data cube contains a sufficient number of temporal observations (years). The rule counts the number of unique years present in the cube and compares it to a threshold to determine the severity level.
rule_temporal_min_years(thresholds = c(ok = 5, note = 3, important = 0, very_important = NULL))
thresholds |
Named numeric vector with severity thresholds: ok, note, important, very_important. Defaults are used if not provided. |
An object of class cube_rule.
Other diagnostic_rules:
rule_obs_min_records(),
rule_obs_min_total(),
rule_spatial_max_uncertainty(),
rule_spatial_min_cells(),
rule_spatial_miss_uncertainty(),
rule_taxon_min_taxa(),
rule_temporal_missing_years()
Creates a diagnostic rule that evaluates whether a data cube contains missing years. The rule counts the number of missing years present in the cube and compares it to a threshold to determine the severity level.
rule_temporal_missing_years(thresholds = c(ok = 0, note = 1, important = 3, very_important = NULL))
thresholds |
Named numeric vector with severity thresholds: ok, note, important, very_important. Defaults are used if not provided. |
An object of class cube_rule.
Other diagnostic_rules:
rule_obs_min_records(),
rule_obs_min_total(),
rule_spatial_max_uncertainty(),
rule_spatial_min_cells(),
rule_spatial_miss_uncertainty(),
rule_taxon_min_taxa(),
rule_temporal_min_years()
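"Missing years" is not defined precisely above. This sketch assumes they are the years inside the observed range (from the first to the last year) that have no records. The helper `count_missing_years()` is hypothetical.

```r
# Hypothetical helper. Assumption: missing years are gaps between the
# earliest and latest observed year.
count_missing_years <- function(years) {
  years <- unique(years[!is.na(years)])
  length(setdiff(seq(min(years), max(years)), years))
}

count_missing_years(c(2001, 2001, 2002, 2003))  # 0
count_missing_years(c(2001, 2004, 2005))        # 2 (2002 and 2003)
```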
Provides a summary of diagnostic results returned by
diagnose_cube(). The summary reports the number of evaluated
rules, counts per severity level, and the number of diagnostics
per cube dimension.
## S3 method for class 'cube_diagnostics'
summary(object, ...)
object |
A |
... |
Additional arguments passed to other methods (currently unused). |
An object of class summary_cube_diagnostics, containing
aggregated diagnostic information.
Other diagnostic_methods:
plot.cube_diagnostics(),
print.cube_diagnostics()