Container that stores sampled (and retained) tree ensembles from BART, BCF or a custom sampler.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| num_trees | int | Number of trees that each forest should contain | required |
| output_dimension | int | Dimension of the leaf node parameters in each tree | 1 |
| leaf_constant | bool | Whether the leaf node model is “constant” (i.e. prediction is simply a sum of leaf node parameters for every observation in a dataset) or not (i.e. each leaf node parameter is multiplied by a “basis vector” before being returned as a prediction). | True |
| is_exponentiated | bool | Whether or not the leaf node parameters are stored in log scale (in which case, they must be exponentiated before being returned as predictions). | |
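The difference between the leaf model options can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the leaf values and basis vector are made-up numbers, and it assumes that exponentiation is applied to the summed log-scale value.

```python
import numpy as np

# Hypothetical leaf values reached by one observation (illustrative numbers only).
leaf_values_univariate = np.array([0.5, -0.2, 0.1])      # 3 trees, output_dimension = 1
leaf_values_multivariate = np.array([[0.5, 1.0],
                                     [-0.2, 0.3],
                                     [0.1, -0.4]])        # 3 trees, output_dimension = 2
basis = np.array([1.0, 2.0])                              # basis vector for the observation

# leaf_constant=True: the prediction is the sum of the leaf parameters.
pred_constant = leaf_values_univariate.sum()

# leaf_constant=False: each leaf parameter vector is multiplied by the basis
# vector, and the per-tree contributions are then summed.
pred_basis = (leaf_values_multivariate @ basis).sum()

# is_exponentiated=True: values are kept in log scale and exponentiated
# before being returned (assumed here to act on the summed value).
pred_exponentiated = np.exp(leaf_values_univariate.sum())
```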
Modify the ForestContainer by removing the forest sample indexed by forest_num.
predict
forest.ForestContainer.predict(dataset)
Predict from each forest in the container, using the provided Dataset object.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | Python object wrapping the “dataset” class used by C++ sampling and prediction data structures. | required |

Returns

| Name | Type | Description |
|---|---|---|
| | np.array | Numpy array with (n, m) dimensions, where n is the number of observations in dataset and m is the number of samples in the forest container. |
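Because each column of the returned array corresponds to one retained forest sample, summaries per observation are taken along axis 1. The sketch below uses a random stand-in array rather than a real call to predict; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 100
# Stand-in for the (n, m) array returned by predict: n observations, m samples.
preds = rng.normal(size=(n, m))

# Average over the m forest samples to get one point estimate per observation.
posterior_mean = preds.mean(axis=1)

# Per-observation 95% interval across the forest samples.
lower, upper = np.quantile(preds, [0.025, 0.975], axis=1)
```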
predict_raw
forest.ForestContainer.predict_raw(dataset)
Predict raw leaf values for every forest in the container, using the provided Dataset object.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| dataset | Dataset | Python object wrapping the “dataset” class used by C++ sampling and prediction data structures. | required |

Returns

| Name | Type | Description |
|---|---|---|
| | np.array | Numpy array with (n, k, m) dimensions, where n is the number of observations in dataset, k is the dimension of the leaf parameter, and m is the number of samples in the forest container. If k = 1, the returned array has (n, m) dimensions instead. |
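Since the number of dimensions of the result depends on k, downstream code may want a uniform shape. The helper below is a hypothetical convenience (not part of the library) that restores the middle axis when k = 1.

```python
import numpy as np

def as_three_dim(raw, k):
    """Return raw predictions with explicit (n, k, m) dimensions.

    `raw` stands in for the output of predict_raw; `k` is the leaf dimension.
    """
    raw = np.asarray(raw)
    if raw.ndim == 2:
        if k != 1:
            raise ValueError("a 2-D array is only expected when k == 1")
        # Re-insert the leaf-dimension axis that was dropped for k == 1.
        return raw[:, np.newaxis, :]
    if raw.ndim == 3 and raw.shape[1] == k:
        return raw
    raise ValueError("array shape is inconsistent with leaf dimension k")

# Usage with stand-in arrays: both results have three dimensions.
flat = as_three_dim(np.zeros((10, 50)), k=1)
full = as_three_dim(np.zeros((10, 2, 50)), k=2)
```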
Set constant (root) leaf node values for every tree in the forest indexed by forest_num. Assumes the forest consists of all root (single-node) trees.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| forest_num | int | Index of the forest for which we will set root node parameters. | required |
| leaf_value | float or np.array | Constant values to which root nodes are to be set. If the trees in forest forest_num are univariate, then leaf_value must be a float, while if the trees in forest forest_num are multivariate, then leaf_value must be a np.array. | required |
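The type rule for leaf_value can be checked before calling into the container. The function below is a hypothetical pre-check, not library code; in particular, requiring the array length to equal the leaf dimension is an assumption made for this sketch.

```python
import numpy as np

def check_leaf_value(leaf_value, output_dimension):
    """Validate leaf_value against the leaf dimension (hypothetical helper)."""
    if output_dimension == 1:
        # Univariate trees: a single scalar is expected.
        is_scalar = isinstance(leaf_value, (int, float, np.floating))
        if not is_scalar or isinstance(leaf_value, bool):
            raise TypeError("univariate trees require a float leaf_value")
        return float(leaf_value)
    # Multivariate trees: a numpy array is expected.
    if not isinstance(leaf_value, np.ndarray):
        raise TypeError("multivariate trees require an np.array leaf_value")
    # Assumption: one entry per leaf dimension.
    if leaf_value.shape != (output_dimension,):
        raise ValueError("leaf_value must have one entry per leaf dimension")
    return leaf_value.astype(float)

scalar_ok = check_leaf_value(0.0, output_dimension=1)
vector_ok = check_leaf_value(np.array([0.0, 1.0]), output_dimension=2)
```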
collapse
forest.ForestContainer.collapse(batch_size)
Collapse forests in this container by a pre-specified batch size. For example, if we have a container of twenty 10-tree forests, and we specify a batch_size of 5, then this method will yield four 50-tree forests. “Excess” forests remaining after the size of a forest container is divided by batch_size will be pruned from the beginning of the container (i.e. earlier sampled forests will be deleted). This method has no effect if batch_size is larger than the number of forests in a container.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| batch_size | int | Number of forests to be collapsed into a single forest | |
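The bookkeeping described for collapse can be worked through with integer arithmetic. The function below is a hypothetical planning helper that only computes the resulting counts; it does not touch a container.

```python
def collapse_plan(num_forests, num_trees, batch_size):
    """Return (forests after collapse, trees per forest, forests pruned).

    Mirrors the documented behavior: excess forests are pruned from the
    beginning of the container, and nothing happens when batch_size exceeds
    the number of forests.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    if batch_size > num_forests:
        # No effect: the container is left unchanged.
        return num_forests, num_trees, 0
    num_pruned = num_forests % batch_size        # earliest samples are dropped
    num_collapsed = num_forests // batch_size    # one forest per full batch
    return num_collapsed, num_trees * batch_size, num_pruned

# The example from the description: twenty 10-tree forests, batch_size of 5.
documented_example = collapse_plan(20, 10, 5)
# Twenty-two forests leave two excess forests to prune.
with_excess = collapse_plan(22, 10, 5)
```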
Retrieve a vector of split counts for every training set feature in a given tree in a given forest
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| forest_num | int | Index of the forest which contains tree tree_num | required |
| tree_num | int | Index of the tree for which split counts will be retrieved | required |
| num_features | int | Total number of features in the training set | required |

Returns

| Name | Type | Description |
|---|---|---|
| | np.array | One-dimensional numpy array with one element per feature in the forest model’s training set, containing the split count for each feature for a given forest and tree. |
Retrieve a vector of split counts for every training set feature in a given forest
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| forest_num | int | Index of the forest for which split counts will be retrieved | required |
| num_features | int | Total number of features in the training set | required |

Returns

| Name | Type | Description |
|---|---|---|
| | np.array | One-dimensional numpy array with one element per feature in the forest model’s training set, containing the split count for each feature for a given forest (summed across every tree in the forest). |
Retrieve a vector of split counts for every training set feature, aggregated across ensembles and trees.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| num_features | int | Total number of features in the training set | required |

Returns

| Name | Type | Description |
|---|---|---|
| | np.array | One-dimensional numpy array with one element per feature in the forest model’s training set, containing the split count for each feature summed across every tree of every forest in the container. |
Retrieve split counts for every training set feature, reported separately for each forest and each tree in the container.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| num_features | int | Total number of features in the training set | required |

Returns

| Name | Type | Description |
|---|---|---|
| | np.array | Three-dimensional numpy array, containing the number of splits a variable receives in each tree of each forest in a ForestContainer. Array will have dimensions (m,b,p) where m is the number of forests in the container, b is the number of trees in each forest, and p is the number of features in the forest model’s training dataset. |
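The (m,b,p) array is the most granular view of split counts, and the coarser vectors can be obtained from it by indexing and summing. The sketch below uses a random stand-in array instead of a real container, and assumes the coarser summaries are plain sums over the granular counts.

```python
import numpy as np

rng = np.random.default_rng(1)
m, b, p = 4, 3, 5   # forests, trees per forest, features
# Stand-in for the granular (m, b, p) split count array.
split_counts = rng.integers(0, 4, size=(m, b, p))

# Counts for one tree of one forest: index the first two axes.
per_tree = split_counts[0, 1]

# Counts for one forest: sum over that forest's trees.
per_forest = split_counts[0].sum(axis=0)

# Counts for the whole container: sum over forests and trees.
overall = split_counts.sum(axis=(0, 1))
```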
Threshold that defines a numeric split for a given node of a given tree in a given forest in the ForestContainer. Returns np.Inf if the node is a leaf or a categorical split node.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| forest_num | int | Index of the forest to be queried | required |
| tree_num | int | Index of the tree to be queried | required |
| node_id | int | Index of the node to be queried | required |

Returns

| Name | Type | Description |
|---|---|---|
| | float | Threshold that defines a numeric split for node node_id in tree tree_num of forest forest_num. |
Array of category indices that define a categorical split for a given node of a given tree in a given forest in the ForestContainer. Returns np.array([np.Inf]) if the node is a leaf or a numeric split node.
Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| forest_num | int | Index of the forest to be queried | required |
| tree_num | int | Index of the tree to be queried | required |
| node_id | int | Index of the node to be queried | required |

Returns

| Name | Type | Description |
|---|---|---|
| | np.array | Array of category indices that define a categorical split for node node_id in tree tree_num of forest forest_num. |
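Both split queries signal "not applicable" with an infinite sentinel, so callers need to test for it before using the result. The helpers below are hypothetical and operate on stand-in return values; they use np.isinf rather than comparing against a named constant, since the np.Inf alias was removed in NumPy 2.0 in favor of np.inf.

```python
import numpy as np

def is_numeric_split(threshold):
    """True unless the threshold is the infinite sentinel value."""
    return bool(not np.isinf(threshold))

def is_categorical_split(categories):
    """True unless the array is the single-element infinite sentinel."""
    categories = np.asarray(categories, dtype=float)
    is_sentinel = categories.size == 1 and np.isinf(categories[0])
    return bool(not is_sentinel)

# Stand-in return values for a numeric split node and a leaf node.
numeric_node = is_numeric_split(0.5)
leaf_node = is_numeric_split(np.inf)

# Stand-in return values for a categorical split node and a leaf node.
categorical_node = is_categorical_split(np.array([0, 2]))
leaf_node_categories = is_categorical_split(np.array([np.inf]))
```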