In-memory Python wrapper around a C++ tree ensemble object
Parameters
Name
Type
Description
Default
num_trees
int
Number of trees that each forest should contain
required
output_dimension
int
Dimension of the leaf node parameters in each tree
1
leaf_constant
bool
Whether the leaf node model is “constant” (i.e. prediction is simply a sum of leaf node parameters for every observation in a dataset) or not (i.e. each leaf node parameter is multiplied by a “basis vector” before being returned as a prediction).
True
is_exponentiated
bool
Whether or not the leaf node parameters are stored in log scale (in which case, they must be exponentiated before being returned as predictions).
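The interaction between leaf_constant and is_exponentiated can be sketched with a toy function. This is an illustration only, not the library's implementation: predict_one and its arguments are hypothetical names, and the sketch assumes that exponentiation is applied to the summed value rather than leaf by leaf.

```python
import math


def predict_one(leaf_params, basis=None, is_exponentiated=False):
    """Toy prediction for a single observation (hypothetical helper).

    leaf_params: one entry per tree, each the list of parameters of the
        leaf that the observation falls into.
    basis: None for a constant-leaf model, otherwise a list with the same
        length as each leaf's parameter list.
    """
    total = 0.0
    for params in leaf_params:
        if basis is None:
            # Constant leaf: the single parameter is added directly.
            total += params[0]
        else:
            # Non-constant leaf: parameters are multiplied by the basis vector.
            total += sum(p * b for p, b in zip(params, basis))
    # Assumption: log-scale parameters are exponentiated after summing.
    return math.exp(total) if is_exponentiated else total


constant_pred = predict_one([[0.5], [0.25]])
basis_pred = predict_one([[1.0, 2.0], [0.5, 0.5]], basis=[1.0, 3.0])
log_scale_pred = predict_one([[0.0], [0.0]], is_exponentiated=True)
```

With output_dimension greater than 1, each leaf carries several parameters, which is why the basis-vector form needs one basis entry per leaf dimension.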
reset_root
forest.Forest.reset_root()
Reset forest to a forest with all single node (i.e. “root”) trees
reset
forest.Forest.reset(forest_container, forest_num)
Reset forest to the forest indexed by forest_num in forest_container
Parameters
Name
Type
Description
Default
forest_container
ForestContainer
Stochtree object storing tree ensembles
required
forest_num
int
Index of the ensemble used to reset the Forest
required
predict
forest.Forest.predict(dataset)
Predict from the forest, using the provided Dataset object.
Parameters
Name
Type
Description
Default
dataset
Dataset
Python object wrapping the “dataset” class used by C++ sampling and prediction data structures.
required
Returns
Name
Type
Description
np.array
One-dimensional numpy array with length equal to the number of observations in dataset.
predict_raw
forest.Forest.predict_raw(dataset)
Predict raw leaf values from the forest, using the provided Dataset object.
Parameters
Name
Type
Description
Default
dataset
Dataset
Python object wrapping the “dataset” class used by C++ sampling and prediction data structures.
required
Returns
Name
Type
Description
np.array
Numpy array with (n, k) dimensions, where n is the number of observations in dataset and k is the dimension of the leaf parameter. If k = 1, then the returned array is simply one-dimensional with n observations.
set_root_leaves
forest.Forest.set_root_leaves(leaf_value)
Set constant (root) leaf node values for every tree in the forest. Assumes the forest consists of all root (single-node) trees.
Parameters
Name
Type
Description
Default
leaf_value
float or np.array
Constant values to which root nodes are to be set. If the trees in the forest are univariate, leaf_value must be a float; if they are multivariate, leaf_value must be a np.array.
required
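Because a forest's prediction is a sum over trees, the root value passed in is usually a per-tree share of the desired starting prediction. The arithmetic can be sketched as follows; root_forest_prediction is a hypothetical helper for a univariate, constant-leaf forest, not part of the library.

```python
def root_forest_prediction(num_trees, leaf_value):
    # Each root-only tree contributes its single leaf value to the sum.
    return sum(leaf_value for _ in range(num_trees))


num_trees = 50
target_mean = 2.0

# Divide the desired starting prediction evenly across the trees.
leaf_value = target_mean / num_trees
initial_prediction = root_forest_prediction(num_trees, leaf_value)
```

Under these assumptions the initial prediction equals target_mean, up to floating-point rounding, for every observation.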
merge_forest
forest.Forest.merge_forest(other_forest)
Create a larger forest by merging the trees of this forest with those of another forest
Parameters
Name
Type
Description
Default
other_forest
Forest
Forest to be merged into this forest
required
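The effect of a merge on predictions can be sketched with toy forests represented as lists of functions. This is a conceptual illustration that assumes merging amounts to pooling the trees of both forests; the names below are hypothetical and do not call the library.

```python
def predict(forest, x):
    # A forest's prediction is the sum of its trees' predictions.
    return sum(tree(x) for tree in forest)


# Toy trees: each maps a scalar covariate to a leaf value.
forest_a = [lambda x: 1.0 if x < 0.5 else 2.0, lambda x: 0.5]
forest_b = [lambda x: -1.0 if x < 0.2 else 0.25]

# Assumed merge semantics: the merged forest holds the trees of both inputs.
merged = forest_a + forest_b

merged_pred = predict(merged, 0.1)
separate_pred = predict(forest_a, 0.1) + predict(forest_b, 0.1)
```

Under this assumption, the merged forest has as many trees as both inputs combined, and its prediction is the sum of the two original predictions.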
add_constant
forest.Forest.add_constant(constant_value)
Add a constant value to every leaf of every tree in an ensemble. If leaves are multi-dimensional, constant_value will be added to every dimension of the leaves.
Parameters
Name
Type
Description
Default
constant_value
float
Value that will be added to every leaf of every tree
Multiply every leaf of every tree by a constant value. If leaves are multi-dimensional, constant_multiple will be multiplied through every dimension of the leaves.
Parameters
Name
Type
Description
Default
constant_multiple
float
Value that will be multiplied by every leaf of every tree
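Adding a constant and multiplying by a constant affect predictions differently, which a toy constant-leaf forest makes concrete. The helpers shift_leaves and scale_leaves are hypothetical stand-ins for the two operations described above and return new lists rather than modifying a forest in place.

```python
# Toy forest: each tree is a list of scalar leaf values.
toy_forest = [[0.1, -0.2], [0.3], [0.0, 0.4, -0.1]]


def shift_leaves(forest, constant_value):
    # Add the constant to every leaf of every tree.
    return [[leaf + constant_value for leaf in tree] for tree in forest]


def scale_leaves(forest, constant_multiple):
    # Multiply every leaf of every tree by the constant.
    return [[leaf * constant_multiple for leaf in tree] for tree in forest]


def predict(forest, leaf_index_per_tree):
    # Sum the value of the leaf that the observation reaches in each tree.
    return sum(tree[i] for tree, i in zip(forest, leaf_index_per_tree))


leaf_index_per_tree = [1, 0, 2]
base = predict(toy_forest, leaf_index_per_tree)
shifted = predict(shift_leaves(toy_forest, 0.5), leaf_index_per_tree)
scaled = predict(scale_leaves(toy_forest, 2.0), leaf_index_per_tree)
```

In this sketch, a shift of c moves every prediction by the number of trees times c, because each tree contributes one shifted leaf, whereas scaling by m simply multiplies every prediction by m.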
Retrieve a vector of split counts for every training set variable in a given tree in the forest
Parameters
Name
Type
Description
Default
tree_num
int
Index of the tree for which split counts will be retrieved
required
num_features
int
Total number of features in the training set
required
Returns
Name
Type
Description
np.array
One-dimensional numpy array with as many elements as there are features in the forest model’s training set, containing the split count for each feature for a given tree of the forest.
Retrieve a vector of split counts for every training set variable in the forest
Parameters
Name
Type
Description
Default
num_features
int
Total number of features in the training set
required
Returns
Name
Type
Description
np.array
One-dimensional numpy array with as many elements as there are features in the forest model’s training set, containing the overall split count in the forest for each feature.
Retrieve a vector of split counts for every training set variable in the forest, reported separately for each tree
Parameters
Name
Type
Description
Default
num_features
int
Total number of features in the training set
required
Returns
Name
Type
Description
np.array
One-dimensional numpy array with as many elements as there are features in the forest model’s training set, containing the split count for each feature for every tree in the forest.
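The three kinds of split counts described above (for a single tree, for the whole forest, and per tree across the forest) are related by simple sums. The sketch below uses a toy representation in which each tree is just the list of feature indices used by its split nodes; the names and the nested-list layout are assumptions for illustration and do not reflect the layout of the arrays returned by the library.

```python
# Toy forest: each tree lists the feature index used at each of its splits.
toy_forest_splits = [
    [0, 2, 2],  # tree 0: one split on feature 0, two on feature 2
    [1],        # tree 1: one split on feature 1
    [],         # tree 2: root-only tree with no splits
]
num_features = 3


def tree_split_counts(split_features, num_features):
    # Count how often each feature is used as a split variable in one tree.
    counts = [0] * num_features
    for feature in split_features:
        counts[feature] += 1
    return counts


# One row of counts per tree.
per_tree_counts = [tree_split_counts(t, num_features) for t in toy_forest_splits]

# Overall counts are the column sums of the per-tree counts.
overall_counts = [sum(column) for column in zip(*per_tree_counts)]
```

The num_features argument is needed because features that never appear in a split must still be reported with a count of zero.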
num_forest_leaves
forest.Forest.num_forest_leaves()
Return the total number of leaves in a forest
Returns
Name
Type
Description
int
Number of leaves in a forest
sum_leaves_squared
forest.Forest.sum_leaves_squared()
Return the total sum of squared leaf values in a forest
Returns
Name
Type
Description
float
Sum of squared leaf values in a forest
is_leaf_node
forest.Forest.is_leaf_node(tree_num, node_id)
Whether or not a given node of a given tree of a forest is a leaf
Parameters
Name
Type
Description
Default
tree_num
int
Index of the tree to be queried
required
node_id
int
Index of the node to be queried
required
Returns
Name
Type
Description
bool
True if node node_id in tree tree_num is a leaf, False otherwise
Array of category indices that define a categorical split for a given node of a given tree of a forest. Returns np.array([np.Inf]) if the node is a leaf or a numeric split node.
Parameters
Name
Type
Description
Default
tree_num
int
Index of the tree to be queried
required
node_id
int
Index of the node to be queried
required
Returns
Name
Type
Description
np.array
Array of category indices that define a categorical split for node node_id in tree tree_num.
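Since a leaf or numeric split node is signalled by a single infinite value rather than an empty array, callers may want to test for that sentinel before using the result. A minimal sketch, using plain Python sequences in place of numpy arrays; has_categorical_split is a hypothetical helper.

```python
import math


def has_categorical_split(categories):
    """Return False when the result is the single-infinity sentinel."""
    is_sentinel = len(categories) == 1 and math.isinf(categories[0])
    return not is_sentinel


leaf_or_numeric = has_categorical_split([math.inf])
categorical = has_categorical_split([0, 2, 5])
```

Note that NumPy 2.0 removed the np.Inf alias, so code comparing against the sentinel on recent NumPy versions would use np.inf or a check such as np.isinf.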
node_leaf_values
forest.Forest.node_leaf_values(tree_num, node_id)
Leaf node value(s) for a given node of a given tree of a forest. Values are stale if the node is a split node.
Parameters
Name
Type
Description
Default
tree_num
int
Index of the tree to be queried
required
node_id
int
Index of the node to be queried
required
Returns
Name
Type
Description
np.array
Array of parameter values for node node_id in tree tree_num.
num_nodes
forest.Forest.num_nodes(tree_num)
Number of nodes in a given tree of a forest
Parameters
Name
Type
Description
Default
tree_num
int
Index of the tree to be queried
required
Returns
Name
Type
Description
int
Total number of nodes in tree tree_num.
num_leaves
forest.Forest.num_leaves(tree_num)
Number of leaves in a given tree of a forest
Parameters
Name
Type
Description
Default
tree_num
int
Index of the tree to be queried
required
Returns
Name
Type
Description
int
Total number of leaves in tree tree_num.
num_leaf_parents
forest.Forest.num_leaf_parents(tree_num)
Number of leaf parents in a given tree of a forest
Parameters
Name
Type
Description
Default
tree_num
int
Index of the tree to be queried
required
Returns
Name
Type
Description
int
Total number of leaf parents in tree tree_num.
num_split_nodes
forest.Forest.num_split_nodes(tree_num)
Number of split nodes in a given tree of a forest
Parameters
Name
Type
Description
Default
tree_num
int
Index of the tree to be queried
required
Returns
Name
Type
Description
int
Total number of split nodes in tree tree_num.
nodes
forest.Forest.nodes(tree_num)
Array of node indices in a given tree of a forest
Parameters
Name
Type
Description
Default
tree_num
int
Index of the tree to be queried
required
Returns
Name
Type
Description
np.array
Array of indices of nodes in tree tree_num.
leaves
forest.Forest.leaves(tree_num)
Array of leaf indices in a given tree of a forest
Parameters
Name
Type
Description
Default
tree_num
int
Index of the tree to be queried
required
Returns
Name
Type
Description
np.array
Array of indices of leaf nodes in tree tree_num.
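The node-level queries above (nodes, leaves, split nodes, leaf parents) can be related to one another on a toy binary tree. The dictionary representation and helper names are hypothetical, and the sketch assumes that a leaf parent is a split node whose children are both leaves.

```python
# Toy tree: node_id -> (left_child, right_child); None marks a leaf.
toy_tree = {
    0: (1, 2),
    1: (3, 4),
    2: None,
    3: None,
    4: None,
}


def is_leaf(tree, node_id):
    return tree[node_id] is None


nodes = sorted(toy_tree)
leaves = [n for n in nodes if is_leaf(toy_tree, n)]
split_nodes = [n for n in nodes if not is_leaf(toy_tree, n)]

# Assumption: a leaf parent is a split node with two leaf children.
leaf_parents = [
    n for n in split_nodes
    if all(is_leaf(toy_tree, child) for child in toy_tree[n])
]
```

For any binary tree of this form, the number of leaves is one more than the number of split nodes, which gives a quick consistency check on the counts.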
is_empty
forest.Forest.is_empty()
When a Forest object is created, it is “empty” in the sense that none of its component trees have leaves with values. There are two ways to “initialize” a Forest object. First, the set_root_leaves() method of the Forest class simply initializes every tree in the forest to a single node carrying the same (user-specified) leaf value. Second, the prepare_for_sampler() method of the ForestSampler class initializes every tree in the forest to a single node with the same value and also propagates this information through to the temporary tracking data structures in a ForestSampler object, which must be synchronized with a Forest during a forest sampler loop.
Returns
Name
Type
Description
bool
True if the Forest has not yet been initialized with a constant root value; False if the forest has already been initialized or grown.
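The lifecycle described above (empty on creation, non-empty once root values are set) can be sketched with a toy class. ToyForest is a hypothetical stand-in written for this illustration; it mirrors only the first initialization route and does not model the sampler-based one.

```python
class ToyForest:
    """Hypothetical stand-in for illustration, not the library class."""

    def __init__(self, num_trees):
        # None marks a tree whose root has not been given a leaf value yet.
        self.root_values = [None] * num_trees

    def is_empty(self):
        # Empty means that no tree has a leaf value.
        return all(value is None for value in self.root_values)

    def set_root_leaves(self, leaf_value):
        # Initialize every tree to a single node with the same value.
        self.root_values = [leaf_value] * len(self.root_values)


forest = ToyForest(num_trees=10)
empty_before = forest.is_empty()
forest.set_root_leaves(0.1)
empty_after = forest.is_empty()
```

A check of this kind is useful as a guard before prediction, since an empty forest has no leaf values to sum.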