Wrapper around a C++ dataset class used to sample a forest. A dataset consists of up to three components: a matrix of covariates, a matrix of bases, and a vector of variance weights. Both the basis matrix and the variance weights are optional.

This class is intended for advanced use cases in which users require detailed control of sampling algorithms and data structures. Minimal input validation and error checking are performed, so users are responsible for providing correct inputs. For tutorials on the proper usage of stochtree's advanced workflow, see the vignettes at https://stochtree.ai/

Public fields

data_ptr

External pointer to a C++ ForestDataset class

Methods


Method new()

Create a new ForestDataset object.

Usage

ForestDataset$new(covariates, basis = NULL, variance_weights = NULL)

Arguments

covariates

Matrix of covariates

basis

(Optional) Matrix of bases used to define a leaf regression

variance_weights

(Optional) Vector of observation-specific variance weights

Returns

A new ForestDataset object.
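
As an illustration of the constructor, the following minimal sketch builds one dataset with all three components and one with covariates only, then queries them. The simulated data and the variable names (`X`, `W`, `w`, `dataset`, `dataset_x`) are assumptions made for this example; only methods documented on this page are called.

```r
library(stochtree)

# Simulated inputs (illustrative only)
set.seed(1)
n <- 100
p <- 5
X <- matrix(runif(n * p), ncol = p)  # covariate matrix
W <- cbind(1, rnorm(n))              # two-column basis for a leaf regression
w <- rep(1, n)                       # observation-specific variance weights

# Dataset with covariates, basis, and variance weights
dataset <- ForestDataset$new(X, basis = W, variance_weights = w)
dataset$num_observations()
dataset$num_covariates()
dataset$num_basis()

# Dataset with covariates only; basis and variance weights are optional
dataset_x <- ForestDataset$new(X)
dataset_x$has_basis()
dataset_x$has_variance_weights()
```

Because the class performs minimal validation, it is worth checking beforehand that `X` and `W` are numeric matrices with the same number of rows and that `w` has one entry per row.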


Method update_basis()

Update basis matrix in a dataset

Usage

ForestDataset$update_basis(basis)

Arguments

basis

Updated matrix of bases used to define a leaf regression


Method update_variance_weights()

Update variance_weights in a dataset

Usage

ForestDataset$update_variance_weights(variance_weights, exponentiate = F)

Arguments

variance_weights

Updated vector of variance weights used to define individual variance / case weights

exponentiate

Whether the input vector should be exponentiated before being written to the dataset's variance weights. Default: F (i.e. FALSE).
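
The two update methods can be sketched together as follows. This is a hedged example rather than a prescribed workflow: the data are simulated, the names `W_new` and `log_w` are hypothetical, and the dataset is deliberately created with a basis and variance weights already in place, with replacements of the same dimensions, on the assumption that this is the safest pattern given the minimal input checks.

```r
library(stochtree)

set.seed(2)
n <- 50
X <- matrix(rnorm(n * 3), ncol = 3)
W <- matrix(rnorm(n), ncol = 1)
dataset <- ForestDataset$new(X, basis = W, variance_weights = rep(1, n))

# Replace the basis with a new matrix of the same dimensions
W_new <- matrix(rnorm(n), ncol = 1)
dataset$update_basis(W_new)

# Supply weights on the log scale and let the dataset exponentiate them
log_w <- rnorm(n, sd = 0.1)
dataset$update_variance_weights(log_w, exponentiate = TRUE)
```

Passing `exponentiate = TRUE` is convenient when the weights are modeled on the log scale, since it avoids calling `exp()` on the R side before the update.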


Method num_observations()

Return number of observations in a ForestDataset object

Usage

ForestDataset$num_observations()

Returns

Observation count


Method num_covariates()

Return number of covariates in a ForestDataset object

Usage

ForestDataset$num_covariates()

Returns

Covariate count


Method num_basis()

Return number of bases in a ForestDataset object

Usage

ForestDataset$num_basis()

Returns

Basis count


Method get_covariates()

Return covariates as an R matrix

Usage

ForestDataset$get_covariates()

Returns

Covariate data


Method get_basis()

Return bases as an R matrix

Usage

ForestDataset$get_basis()

Returns

Basis data


Method get_variance_weights()

Return variance weights as an R vector

Usage

ForestDataset$get_variance_weights()

Returns

Variance weight data
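
A short sketch of reading data back out of a dataset with the three getters, for example to confirm that what is stored on the C++ side matches the R inputs. The simulated inputs and variable names are assumptions for illustration only.

```r
library(stochtree)

set.seed(3)
n <- 20
X <- matrix(runif(n * 2), ncol = 2)
W <- matrix(rnorm(n * 2), ncol = 2)
w <- runif(n, 0.5, 1.5)
dataset <- ForestDataset$new(X, basis = W, variance_weights = w)

# Pull each component back into R
X_back <- dataset$get_covariates()
W_back <- dataset$get_basis()
w_back <- dataset$get_variance_weights()

# Compare values only, ignoring dimnames and other attributes
covariates_match <- isTRUE(all.equal(as.numeric(X_back), as.numeric(X)))
basis_match <- isTRUE(all.equal(as.numeric(W_back), as.numeric(W)))
weights_match <- isTRUE(all.equal(as.numeric(w_back), w))
```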


Method has_basis()

Whether or not a dataset has a basis matrix

Usage

ForestDataset$has_basis()

Returns

TRUE if a basis matrix is loaded, FALSE otherwise


Method has_variance_weights()

Whether or not a dataset has variance weights

Usage

ForestDataset$has_variance_weights()

Returns

TRUE if variance weights are loaded, FALSE otherwise


Method has_auxiliary_dimension()

Whether or not a dataset has auxiliary data stored at the dimension indicated

Usage

ForestDataset$has_auxiliary_dimension(dim_idx)

Arguments

dim_idx

Dimension of auxiliary data

Returns

TRUE if auxiliary data has been allocated for dim_idx, FALSE otherwise


Method add_auxiliary_dimension()

Initialize a new dimension / lane of auxiliary data and allocate data in its place

Usage

ForestDataset$add_auxiliary_dimension(dim_size)

Arguments

dim_size

Size of the new vector of data to allocate

Returns

None


Method get_auxiliary_data_value()

Retrieve auxiliary data value

Usage

ForestDataset$get_auxiliary_data_value(dim_idx, element_idx)

Arguments

dim_idx

Dimension from which the data value is to be retrieved

element_idx

Element to retrieve from dimension dim_idx

Returns

Floating point value stored in the requested auxiliary data space


Method set_auxiliary_data_value()

Set auxiliary data value

Usage

ForestDataset$set_auxiliary_data_value(dim_idx, element_idx, value)

Arguments

dim_idx

Dimension in which the data value is to be set

element_idx

Element to set within dimension dim_idx

value

Data value to set at auxiliary data dimension dim_idx and element element_idx

Returns

None


Method get_auxiliary_data_vector()

Retrieve entire auxiliary data vector

Usage

ForestDataset$get_auxiliary_data_vector(dim_idx)

Arguments

dim_idx

Dimension to retrieve

Returns

Vector of all of the auxiliary data stored at dimension dim_idx
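
The auxiliary data methods above can be combined as in the sketch below, which allocates one dimension, fills it element by element, and reads it back. One important assumption: this page does not state an indexing convention, so the example treats `dim_idx` and `element_idx` as zero-based, matching the underlying C++ container. Verify this against your installed version before relying on it. The variable names and stored values are illustrative only.

```r
library(stochtree)

X <- matrix(runif(20), ncol = 2)
dataset <- ForestDataset$new(X)
n <- dataset$num_observations()

# ASSUMPTION: dimensions and elements are indexed from 0
dim_idx <- 0

# Allocate one auxiliary vector with one slot per observation
dataset$add_auxiliary_dimension(n)
dataset$has_auxiliary_dimension(dim_idx)

# Write one value per observation (element index is i - 1 under the assumption)
for (i in seq_len(n)) {
  dataset$set_auxiliary_data_value(dim_idx, i - 1, i / 10)
}

# Read back a single element and then the whole vector
first_value <- dataset$get_auxiliary_data_value(dim_idx, 0)
aux <- dataset$get_auxiliary_data_vector(dim_idx)
```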