StochTree 0.1.1
|
Class storing a "forest," or an ensemble of decision trees. More...
#include <ensemble.h>
Public Member Functions | |
TreeEnsemble (int num_trees, int output_dimension=1, bool is_leaf_constant=true, bool is_exponentiated=false) | |
Initialize a new TreeEnsemble. | |
TreeEnsemble (TreeEnsemble &ensemble) | |
Initialize an ensemble based on the state of an existing ensemble. | |
void | MergeForest (TreeEnsemble &ensemble) |
Combine two forests into a single forest by merging their trees. | |
void | AddValueToLeaves (double constant_value) |
Add a constant value to every leaf of every tree in an ensemble. If leaves are multi-dimensional, constant_value will be added to every dimension of the leaves. | |
void | MultiplyLeavesByValue (double constant_multiple) |
Multiply every leaf of every tree by a constant value. If leaves are multi-dimensional, constant_multiple will be multiplied through every dimension of the leaves. | |
Tree * | GetTree (int i) |
Return a pointer to a tree in the forest. | |
void | ResetRoot () |
Reset a TreeEnsemble to all single-node "root" trees. | |
void | ResetTree (int i) |
Reset a single tree in an ensemble. | |
void | ResetInitTree (int i) |
Reset a single tree in an ensemble. | |
void | CloneFromExistingTree (int i, Tree *tree) |
Clone a single tree in an ensemble from an existing tree, overwriting current tree. | |
void | ReconstituteFromForest (TreeEnsemble &ensemble) |
Reset an ensemble to clone another ensemble. | |
int | GetMaxLeafIndex () |
Obtain a 0-based "maximum" leaf index for an ensemble, which is equivalent to the sum of the number of leaves in each tree. This is used in conjunction with PredictLeafIndicesInplace, which returns an observation-specific leaf index for every observation-tree pair. | |
void | PredictLeafIndicesInplace (ForestDataset *dataset, std::vector< int32_t > &output, int num_trees, data_size_t n) |
Obtain a 0-based leaf index for every tree in an ensemble and for each observation in a ForestDataset. Internally, trees are stored essentially as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree level and coordinate this computation at the ensemble level. | |
void | PredictLeafIndicesInplace (Eigen::Map< Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &covariates, std::vector< int32_t > &output, int num_trees, data_size_t n) |
Obtain a 0-based leaf index for every tree in an ensemble and for each observation (row) in a covariate matrix. Internally, trees are stored essentially as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree level and coordinate this computation at the ensemble level. | |
void | PredictLeafIndicesInplace (Eigen::Map< Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &covariates, Eigen::Map< Eigen::Matrix< int, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &output, int column_ind, int num_trees, data_size_t n) |
Obtain a 0-based leaf index for every tree in an ensemble and for each observation (row) in a covariate matrix, writing the result into one column of an output matrix. Internally, trees are stored essentially as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree level and coordinate this computation at the ensemble level. | |
void | PredictLeafIndicesInplace (Eigen::MatrixXd &covariates, std::vector< int32_t > &output, int num_trees, data_size_t n) |
Obtain a 0-based leaf index for every tree in an ensemble and for each observation (row) in a covariate matrix. Internally, trees are stored essentially as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree level and coordinate this computation at the ensemble level. | |
std::vector< int32_t > | PredictLeafIndices (ForestDataset *dataset) |
Same as PredictLeafIndicesInplace, but allocates and returns the output vector itself. | |
json | to_json () |
Save to JSON. | |
void | from_json (const json &ensemble_json) |
Load from JSON. | |
Class storing a "forest," or an ensemble of decision trees.
|
inline |
Initialize a new TreeEnsemble.
num_trees | Number of trees in a forest |
output_dimension | Dimension of the leaf node parameter |
is_leaf_constant | Whether or not the leaves of each tree are treated as "constant." If true, then predicting from an ensemble is simply a matter of determining which leaf node an observation falls into. If false, prediction will multiply a leaf node's parameter(s) for a given observation by a basis vector. |
is_exponentiated | Whether or not the leaves of each tree are stored in log scale. If true, leaf predictions are exponentiated before their prediction is returned. |
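The effect of the two flags on prediction can be illustrated with a minimal, self-contained sketch. It does not use the StochTree headers: `LeafParams` and `PredictOneSketch` are hypothetical names introduced here, the constant-leaf case is assumed to be scalar, and it is an assumption of this sketch that exponentiation is applied once to the summed prediction across trees.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one tree's contribution: the parameter vector of
// the leaf that a given observation falls into.
using LeafParams = std::vector<double>;

// Sketch of ensemble prediction for a single observation.
//   leaves: the selected leaf of each tree for this observation
//   basis:  basis vector for this observation (ignored for constant leaves)
inline double PredictOneSketch(const std::vector<LeafParams>& leaves,
                               const std::vector<double>& basis,
                               bool is_leaf_constant,
                               bool is_exponentiated) {
  double total = 0.0;
  for (const LeafParams& leaf : leaves) {
    if (is_leaf_constant) {
      // Constant leaf: the contribution is the (scalar) leaf value itself.
      total += leaf[0];
    } else {
      // Non-constant leaf: inner product of leaf parameters and the basis.
      assert(leaf.size() == basis.size());
      for (std::size_t k = 0; k < leaf.size(); ++k) {
        total += leaf[k] * basis[k];
      }
    }
  }
  // Assumption: log-scale leaves are summed first, then exponentiated once.
  return is_exponentiated ? std::exp(total) : total;
}
```

With two trees whose selected leaves hold 1.0 and 0.5, the constant-leaf prediction is 1.5, or exp(1.5) when `is_exponentiated` is true.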
|
inline |
Initialize an ensemble based on the state of an existing ensemble.
ensemble | TreeEnsemble used to initialize the current ensemble |
|
inline |
Combine two forests into a single forest by merging their trees.
ensemble | Reference to another TreeEnsemble that will be merged into the current ensemble |
|
inline |
Add a constant value to every leaf of every tree in an ensemble. If leaves are multi-dimensional, constant_value will be added to every dimension of the leaves.
constant_value | Value that will be added to every leaf of every tree |
|
inline |
Multiply every leaf of every tree by a constant value. If leaves are multi-dimensional, constant_multiple will be multiplied through every dimension of the leaves.
constant_multiple | Value that will be multiplied by every leaf of every tree |
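The two leaf-wise operations described above (adding a constant and multiplying by a constant) can be sketched on a toy data structure. This is only an illustration of the "every dimension of every leaf of every tree" semantics. `ToyLeaf`, `ToyTree`, `ToyForest` and both functions are hypothetical names and are not part of StochTree.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy forest: forest[t][l][d] is dimension d of leaf l in tree t.
using ToyLeaf = std::vector<double>;
using ToyTree = std::vector<ToyLeaf>;
using ToyForest = std::vector<ToyTree>;

// Add constant_value to every dimension of every leaf of every tree.
inline void AddValueToLeavesSketch(ToyForest& forest, double constant_value) {
  for (ToyTree& tree : forest) {
    for (ToyLeaf& leaf : tree) {
      for (double& value : leaf) value += constant_value;
    }
  }
}

// Multiply every dimension of every leaf of every tree by constant_multiple.
inline void MultiplyLeavesByValueSketch(ToyForest& forest, double constant_multiple) {
  for (ToyTree& tree : forest) {
    for (ToyLeaf& leaf : tree) {
      for (double& value : leaf) value *= constant_multiple;
    }
  }
}
```

Because constant-leaf predictions are sums over trees, adding a constant c to every leaf shifts each such prediction by c times the number of trees, while multiplying every leaf by c scales the prediction by c.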
|
inline |
Return a pointer to a tree in the forest.
i | Index (0-based) of a tree to be queried |
|
inline |
Reset a single tree in an ensemble.
i | Index (0-based) of the tree to be reset |
|
inline |
Reset a single tree in an ensemble.
i | Index (0-based) of the tree to be reset |
|
inline |
Clone a single tree in an ensemble from an existing tree, overwriting current tree.
i | Index of the tree to be overwritten |
tree | Pointer to tree used to clone tree i |
|
inline |
Reset an ensemble to clone another ensemble.
ensemble | Reference to an existing TreeEnsemble |
|
inline |
Obtain a 0-based leaf index for every tree in an ensemble and for each observation in a ForestDataset. Internally, trees are stored essentially as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree level and coordinate this computation at the ensemble level.
Note: this assumes the caller has already created a vector of column indices of size dataset.NumObservations() x ensemble.NumTrees()
dataset | Dataset with which to predict leaf indices from the tree |
output | Vector of length num_trees*n which stores the leaf node prediction |
num_trees | Number of trees in an ensemble |
n | Size of dataset |
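The renumbering described above can be sketched without the library. In the sketch below, `ToyTreeLeaves`, `MaxLeafIndexSketch` and `PredictLeafIndicesSketch` are hypothetical names. The tree-major output layout (`output[j * n + i]` for tree j and observation i) and the per-tree offset equal to the number of leaves in earlier trees are assumptions chosen to be consistent with the GetMaxLeafIndex description, not a statement of the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tree. leaf_node_ids plays the role of leaves_
// (node IDs of the leaves); node_of_obs[i] is the leaf node ID that
// observation i falls into.
struct ToyTreeLeaves {
  std::vector<int> leaf_node_ids;
  std::vector<int> node_of_obs;
};

// Sum of the number of leaves in each tree.
inline int MaxLeafIndexSketch(const std::vector<ToyTreeLeaves>& trees) {
  int total = 0;
  for (const ToyTreeLeaves& tree : trees) {
    total += static_cast<int>(tree.leaf_node_ids.size());
  }
  return total;
}

// Fill output (length >= num_trees * n) with ensemble-wide leaf indices.
inline void PredictLeafIndicesSketch(const std::vector<ToyTreeLeaves>& trees,
                                     std::vector<std::int32_t>& output,
                                     std::size_t n) {
  assert(output.size() >= trees.size() * n);
  std::int32_t leaf_offset = 0;  // number of leaves in all earlier trees
  for (std::size_t j = 0; j < trees.size(); ++j) {
    const ToyTreeLeaves& tree = trees[j];
    assert(tree.node_of_obs.size() >= n);
    for (std::size_t i = 0; i < n; ++i) {
      // Tree level: renumber the node ID to its position in leaf_node_ids,
      // giving a value in 0 .. leaf_node_ids.size() - 1.
      std::int32_t local = -1;
      for (std::size_t l = 0; l < tree.leaf_node_ids.size(); ++l) {
        if (tree.leaf_node_ids[l] == tree.node_of_obs[i]) {
          local = static_cast<std::int32_t>(l);
          break;
        }
      }
      assert(local >= 0);
      // Ensemble level: shift by the offset (assumed tree-major layout).
      output[j * n + i] = leaf_offset + local;
    }
    leaf_offset += static_cast<std::int32_t>(tree.leaf_node_ids.size());
  }
}
```

Under these assumptions every entry of `output` is strictly less than the value returned by `MaxLeafIndexSketch`, so the indices can serve as column indices of a sparse observation-by-leaf indicator matrix.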
|
inline |
Obtain a 0-based leaf index for every tree in an ensemble and for each observation (row) in a covariate matrix. Internally, trees are stored essentially as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree level and coordinate this computation at the ensemble level.
Note: this assumes the caller has already created a vector of column indices of size n x num_trees
covariates | Matrix of covariates |
output | Vector of length num_trees*n which stores the leaf node prediction |
num_trees | Number of trees in an ensemble |
n | Size of dataset |
|
inline |
Obtain a 0-based leaf index for every tree in an ensemble and for each observation (row) in a covariate matrix. Internally, trees are stored essentially as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree level and coordinate this computation at the ensemble level.
Note: this assumes the caller has already created a matrix of column indices with num_trees*n rows and as many columns as forests that were requested from R / Python
covariates | Matrix of covariates |
output | Matrix with num_trees*n rows and as many columns as forests that were requested from R / Python |
column_ind | Index of column in output into which the result should be unpacked |
num_trees | Number of trees in an ensemble |
n | Size of dataset |
|
inline |
Obtain a 0-based leaf index for every tree in an ensemble and for each observation (row) in a covariate matrix. Internally, trees are stored essentially as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree level and coordinate this computation at the ensemble level.
Note: this assumes the caller has already created a vector of column indices of size n x num_trees
covariates | Matrix of covariates |
output | Vector of length num_trees*n which stores the leaf node prediction |
num_trees | Number of trees in an ensemble |
n | Size of dataset |
|
inline |
Same as PredictLeafIndicesInplace, but allocates and returns the output vector itself.
dataset | Dataset with which to predict leaf indices from the tree |
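The relationship between the allocating and in-place variants can be sketched as a thin wrapper. `InplaceFn` and `PredictLeafIndicesAllocSketch` are hypothetical names, and the callable stands in for an in-place routine. The sizing of the output vector to num_trees*n follows the parameter descriptions of the in-place overloads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical signature for an in-place routine that fills a caller-owned
// output vector of length num_trees * n.
using InplaceFn =
    std::function<void(std::vector<std::int32_t>&, int, std::size_t)>;

// Allocate the output vector, delegate to the in-place routine, return it.
inline std::vector<std::int32_t> PredictLeafIndicesAllocSketch(
    const InplaceFn& predict_inplace, int num_trees, std::size_t n) {
  assert(num_trees >= 0);
  std::vector<std::int32_t> output(static_cast<std::size_t>(num_trees) * n);
  predict_inplace(output, num_trees, n);
  return output;  // returned by value; moved or elided by the compiler
}
```

The in-place form lets a caller reuse one buffer across repeated calls, while the allocating form is more convenient for one-off predictions.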