Data API#
stochtree.data.Dataset()
#
Wrapper around a C++ class that stores all of the non-outcome data used in stochtree. This includes:
- Features used for partitioning (also referred to as "covariates" in many places in these docs).
- Basis vectors used to define non-constant leaf models. This is optional but may be included via the
add_basismethod. - Variance weights used to define heteroskedastic or otherwise weighted models. This is optional but may be included via the
add_variance_weightsmethod.
add_covariates(covariates)
#
Add covariates to a dataset
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
covariates
|
array
|
Numpy array of covariates. If data contain categorical, string, time series, or other columns in a
dataframe, please first preprocess using the |
required |
add_basis(basis)
#
Add basis matrix to a dataset
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
basis
|
array
|
Numpy array of basis vectors. |
required |
update_basis(basis)
#
Update basis matrix in a dataset. Allows users to build an ensemble whose leaves regress on bases that are updated throughout the sampler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
basis
|
array
|
Numpy array of basis vectors. |
required |
add_variance_weights(variance_weights)
#
Add variance weights to a dataset
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
variance_weights
|
array
|
Univariate numpy array of variance weights. |
required |
update_variance_weights(variance_weights, exponentiate=False)
#
Update variance weights in a dataset. Allows users to build an ensemble that depends on variance weights that are updated throughout the sampler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
variance_weights
|
array
|
Univariate numpy array of variance weights. |
required |
exponentiate
|
bool
|
Whether to exponentiate the variance weights before storing them in the dataset. |
False
|
num_observations()
#
Query the number of observations in a dataset
Returns:
| Type | Description |
|---|---|
int
|
Number of observations in the dataset |
num_covariates()
#
Query the number of covariates (features) in a dataset
Returns:
| Type | Description |
|---|---|
int
|
Number of covariates in the dataset |
num_basis()
#
Query the dimension of the basis vector in a dataset
Returns:
| Type | Description |
|---|---|
int
|
Dimension of the basis vector in the dataset, returning 0 if the dataset does not have a basis |
get_covariates()
#
Return the covariates in a Dataset as a numpy array
Returns:
| Type | Description |
|---|---|
array
|
Covariate data |
get_basis()
#
Return the bases in a Dataset as a numpy array
Returns:
| Type | Description |
|---|---|
array
|
Basis data |
get_variance_weights()
#
Return the variance weights in a Dataset as a numpy array
Returns:
| Type | Description |
|---|---|
array
|
Variance weights data |
has_basis()
#
Whether or not a dataset has a basis vector (for leaf regression)
Returns:
| Type | Description |
|---|---|
bool
|
|
has_variance_weights()
#
Whether or not a dataset has variance weights
Returns:
| Type | Description |
|---|---|
bool
|
|
stochtree.data.Residual(residual)
#
Wrapper around a C++ class that stores residual data used in stochtree.
This object becomes part of the real-time model "state" in that its contents
always contain a full or partial residual, depending on the state of the sampler.
Typically this object is initialized with the original outcome and then "residualized" by subtracting out the initial prediction value of every tree in every forest term (as well as the predictions of any other model term).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
residual
|
array
|
Univariate numpy array of residual values. |
required |
get_residual()
#
Extract the current values of the residual as a numpy array
Returns:
| Type | Description |
|---|---|
array
|
Current values of the residual (which may be net of any forest / other model terms) |
update_data(new_vector)
#
Update the current state of the outcome (i.e. partial residual) data by replacing each element with the elements of new_vector
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
new_vector
|
array
|
Univariate numpy array of new residual values. |
required |
add_vector(update_vector)
#
Update the current state of the outcome (i.e. partial residual) data by adding each element of update_vector
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
update_vector
|
array
|
Univariate numpy array of values to add to the current residual. |
required |
subtract_vector(update_vector)
#
Update the current state of the outcome (i.e. partial residual) data by subtracting each element of update_vector
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
update_vector
|
array
|
Univariate numpy array of values to subtracted from the current residual. |
required |