Wrapper around a C++ class that stores all of the non-outcome data used in stochtree. This includes:
1. Features used for partitioning (also referred to as “covariates” in many places in these docs).
2. Basis vectors used to define non-constant leaf models. These are optional and can be added via the add_basis method.
3. Variance weights used to define heteroskedastic or otherwise weighted models. These are optional and can be added via the add_variance_weights method.
Numpy array of covariates. If the data are stored in a dataframe that contains categorical, string, time series, or other non-numeric columns, preprocess them first with the CovariateTransformer.
required
add_basis
data.Dataset.add_basis(basis)
Add basis matrix to a dataset
Parameters
| Name | Type | Description | Default |
|------|------|-------------|---------|
| basis | np.array | Numpy array of basis vectors. | required |
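As a rough illustration of how a basis might be attached, the sketch below builds a two-column basis (an intercept and one covariate) with NumPy and passes it to add_basis. The import path `from stochtree import Dataset`, the no-argument constructor, and the `add_covariates` call are assumptions that this section does not confirm. The stochtree calls are guarded so that the array-building part still runs when the package is not installed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Covariates used for partitioning: n rows, 5 numeric columns.
X = rng.uniform(size=(n, 5))

# Basis for a leaf-wise linear model: an intercept column plus the first covariate.
basis = np.column_stack([np.ones(n), X[:, 0]])

try:
    # Assumption: Dataset is importable from the top-level package.
    from stochtree import Dataset
except ImportError:
    Dataset = None

if Dataset is not None:
    dataset = Dataset()          # assumption: no-argument constructor
    dataset.add_covariates(X)    # assumption: covariates are attached this way
    dataset.add_basis(basis)     # documented above: add_basis(basis)
```

The basis has one row per observation, matching the covariate matrix, and one column per term in the leaf regression.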
update_basis
data.Dataset.update_basis(basis)
Update basis matrix in a dataset. Allows users to build an ensemble whose leaves regress on bases that are updated throughout the sampler.
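One way this could be used is sketched below: a basis that depends on a parameter `theta` is recomputed on each iteration and pushed into the dataset with update_basis. The helper `make_basis`, the values of `theta`, the import path, and the `add_covariates` call are all hypothetical or assumed, and the sampler step itself is left out. The stochtree calls are guarded so the loop runs even without the package.

```python
import numpy as np

def make_basis(x, theta):
    """Hypothetical basis: an intercept plus an exponential-decay term in x."""
    return np.column_stack([np.ones_like(x), np.exp(-theta * x)])

rng = np.random.default_rng(1)
n = 50
X = rng.uniform(size=(n, 3))
x = X[:, 0]

try:
    # Assumption: Dataset is importable from the top-level package.
    from stochtree import Dataset
except ImportError:
    Dataset = None

dataset = None
if Dataset is not None:
    dataset = Dataset()                           # assumption: no-argument constructor
    dataset.add_covariates(X)                     # assumption: covariates are attached this way
    dataset.add_basis(make_basis(x, theta=0.5))   # initial basis

shapes = []
for theta in (0.5, 1.0, 2.0):
    # Recompute the basis for the current value of theta.
    basis = make_basis(x, theta)
    shapes.append(basis.shape)
    if dataset is not None:
        dataset.update_basis(basis)  # documented above: update_basis(basis)
    # A sampler step that uses the refreshed basis would go here.
```

In this sketch the updated basis keeps the same number of rows and columns as the initial one, which seems the safest assumption when swapping a basis in place.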