StochTree 0.0.1
StochTree::TreeEnsemble Class Reference

Class storing a "forest," or an ensemble of decision trees. More...

#include <ensemble.h>

Public Member Functions

 TreeEnsemble (int num_trees, int output_dimension=1, bool is_leaf_constant=true, bool is_exponentiated=false)
 Initialize a new TreeEnsemble.
 
 TreeEnsemble (TreeEnsemble &ensemble)
 Initialize an ensemble based on the state of an existing ensemble.
 
Tree * GetTree (int i)
 Return a pointer to a tree in the forest.
 
void ResetRoot ()
 Reset a TreeEnsemble to all single-node "root" trees.
 
void ResetTree (int i)
 Reset a single tree in an ensemble.
 
void ResetInitTree (int i)
 Reset a single tree in an ensemble.
 
void CloneFromExistingTree (int i, Tree *tree)
 Clone a single tree in an ensemble from an existing tree, overwriting current tree.
 
void ReconstituteFromForest (TreeEnsemble &ensemble)
 Reset an ensemble to clone another ensemble.
 
int GetMaxLeafIndex ()
 Obtain a 0-based "maximum" leaf index for an ensemble, which is equivalent to the sum of the number of leaves in each tree. This is used in conjunction with PredictLeafIndicesInplace, which returns an observation-specific leaf index for every observation-tree pair.
 
void PredictLeafIndicesInplace (ForestDataset *dataset, std::vector< int32_t > &output, int num_trees, data_size_t n)
 Obtain a 0-based leaf index for every tree in an ensemble and for each observation in a ForestDataset. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree-level and coordinate this computation at the ensemble level.
 
void PredictLeafIndicesInplace (Eigen::Map< Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &covariates, std::vector< int32_t > &output, int num_trees, data_size_t n)
 Obtain a 0-based leaf index for every tree in an ensemble and for each observation (row) in a covariate matrix. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree-level and coordinate this computation at the ensemble level.
 
void PredictLeafIndicesInplace (Eigen::Map< Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &covariates, Eigen::Map< Eigen::Matrix< int, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &output, int column_ind, int num_trees, data_size_t n)
 Obtain a 0-based leaf index for every tree in an ensemble and for each observation (row) in a covariate matrix. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree-level and coordinate this computation at the ensemble level.
 
void PredictLeafIndicesInplace (Eigen::MatrixXd &covariates, std::vector< int32_t > &output, int num_trees, data_size_t n)
 Obtain a 0-based leaf index for every tree in an ensemble and for each observation (row) in a covariate matrix. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree-level and coordinate this computation at the ensemble level.
 
std::vector< int32_t > PredictLeafIndices (ForestDataset *dataset)
 Same as PredictLeafIndicesInplace but assumes responsibility for allocating and returning output vector.
 
json to_json ()
 Save to JSON.
 
void from_json (const json &ensemble_json)
 Load from JSON.
 

Detailed Description

Class storing a "forest," or an ensemble of decision trees.

Constructor & Destructor Documentation

◆ TreeEnsemble() [1/2]

StochTree::TreeEnsemble::TreeEnsemble ( int  num_trees,
int  output_dimension = 1,
bool  is_leaf_constant = true,
bool  is_exponentiated = false 
)
inline

Initialize a new TreeEnsemble.

Parameters
num_trees  Number of trees in a forest
output_dimension  Dimension of the leaf node parameter
is_leaf_constant  Whether or not the leaves of each tree are treated as "constant." If true, then predicting from an ensemble is simply a matter of determining which leaf node an observation falls into. If false, prediction will multiply a leaf node's parameter(s) for a given observation by a basis vector.
is_exponentiated  Whether or not the leaves of each tree are stored in log scale. If true, leaf predictions are exponentiated before being returned.
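
The effect of is_leaf_constant and is_exponentiated on prediction can be illustrated with a minimal sketch. Nothing below is part of the StochTree API: LeafParams and PredictOne are hypothetical names, and the sketch assumes that per-tree contributions are summed across the ensemble and that exponentiation is applied to the summed value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: the parameter vector of the leaf that one
// observation falls into, for one tree. Not a StochTree type.
using LeafParams = std::vector<double>;

// Combine the leaves hit by one observation (one entry per tree) into a
// single prediction. `basis` is ignored when is_leaf_constant is true.
// Assumption: contributions are summed, then optionally exponentiated.
double PredictOne(const std::vector<LeafParams>& leaves_hit,
                  const std::vector<double>& basis,
                  bool is_leaf_constant, bool is_exponentiated) {
  double total = 0.0;
  for (const LeafParams& leaf : leaves_hit) {
    if (is_leaf_constant) {
      assert(!leaf.empty());
      total += leaf[0];  // constant leaf: the stored value is the contribution
    } else {
      assert(leaf.size() == basis.size());
      for (std::size_t k = 0; k < leaf.size(); ++k) {
        total += leaf[k] * basis[k];  // leaf parameters times the basis vector
      }
    }
  }
  return is_exponentiated ? std::exp(total) : total;
}
```

With constant leaves the basis argument can be left empty; with non-constant leaves its length has to match output_dimension in this sketch.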

◆ TreeEnsemble() [2/2]

StochTree::TreeEnsemble::TreeEnsemble ( TreeEnsemble &  ensemble)
inline

Initialize an ensemble based on the state of an existing ensemble.

Parameters
ensemble  TreeEnsemble used to initialize the current ensemble

Member Function Documentation

◆ GetTree()

Tree * StochTree::TreeEnsemble::GetTree ( int  i)
inline

Return a pointer to a tree in the forest.

Parameters
i  Index (0-based) of a tree to be queried
Returns
Tree*

◆ ResetTree()

void StochTree::TreeEnsemble::ResetTree ( int  i)
inline

Reset a single tree in an ensemble.

Parameters
i  Index (0-based) of the tree to be reset

◆ ResetInitTree()

void StochTree::TreeEnsemble::ResetInitTree ( int  i)
inline

Reset a single tree in an ensemble.

Parameters
i  Index (0-based) of the tree to be reset

◆ CloneFromExistingTree()

void StochTree::TreeEnsemble::CloneFromExistingTree ( int  i,
Tree *  tree 
)
inline

Clone a single tree in an ensemble from an existing tree, overwriting current tree.

Parameters
i  Index of the tree to be overwritten
tree  Pointer to tree used to clone tree i
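
One way to picture "clone, overwriting current tree" is a deep copy into an owning slot. The sketch below uses hypothetical ToyTree and ToyEnsemble types and assumes the ensemble owns its trees through std::unique_ptr; it is an illustration of the semantics described above, not the StochTree implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy tree holding only leaf values. Not the StochTree Tree class.
struct ToyTree {
  std::vector<double> leaf_values;
};

// Hypothetical toy ensemble that owns its trees.
class ToyEnsemble {
 public:
  explicit ToyEnsemble(int num_trees) {
    for (int j = 0; j < num_trees; ++j) {
      trees_.push_back(std::make_unique<ToyTree>());
    }
  }

  // Non-owning pointer to tree i (0-based).
  ToyTree* GetTree(int i) {
    assert(i >= 0 && static_cast<std::size_t>(i) < trees_.size());
    return trees_[i].get();
  }

  // Replace tree i with a deep copy of *tree; the source is left untouched.
  void CloneFromExistingTree(int i, const ToyTree* tree) {
    assert(tree != nullptr);
    assert(i >= 0 && static_cast<std::size_t>(i) < trees_.size());
    trees_[i] = std::make_unique<ToyTree>(*tree);
  }

 private:
  std::vector<std::unique_ptr<ToyTree>> trees_;
};
```

In this sketch, a pointer obtained from GetTree(i) before the clone no longer refers to a live tree afterwards, so callers would need to fetch it again.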

◆ ReconstituteFromForest()

void StochTree::TreeEnsemble::ReconstituteFromForest ( TreeEnsemble &  ensemble)
inline

Reset an ensemble to clone another ensemble.

Parameters
ensemble  Reference to an existing TreeEnsemble

◆ PredictLeafIndicesInplace() [1/4]

void StochTree::TreeEnsemble::PredictLeafIndicesInplace ( ForestDataset *  dataset,
std::vector< int32_t > &  output,
int  num_trees,
data_size_t  n 
)
inline

Obtain a 0-based leaf index for every tree in an ensemble and for each observation in a ForestDataset. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree-level and coordinate this computation at the ensemble level.

Note: this assumes the creation of a vector of column indices of size dataset.NumObservations() x ensemble.NumTrees()

Parameters
dataset  Dataset with which to predict leaf indices from the tree
output  Vector of length num_trees*n which stores the leaf node prediction
num_trees  Number of trees in an ensemble
n  Size of dataset
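
The renumbering described above can be sketched with toy data. ToyTree and ToyPredictLeafIndices are hypothetical names, not StochTree types. The sketch assumes a tree-major output layout (all observations for tree 0, then tree 1, and so on) and a running offset equal to the number of leaves in the preceding trees, which is consistent with GetMaxLeafIndex being the total leaf count but is not confirmed by this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical toy tree: the node IDs of its leaves, plus the leaf node ID
// that each observation has already been routed to.
struct ToyTree {
  std::vector<int32_t> leaves;       // node IDs of the leaves
  std::vector<int32_t> node_of_obs;  // leaf node ID reached by each observation
};

// Fill `output` (length num_trees * n, tree-major by assumption) with leaf
// indices that are renumbered per tree and shifted by a running offset.
// Returns the total number of leaves across all trees.
int32_t ToyPredictLeafIndices(const std::vector<ToyTree>& trees, std::size_t n,
                              std::vector<int32_t>& output) {
  assert(output.size() == trees.size() * n);
  int32_t offset = 0;
  for (std::size_t j = 0; j < trees.size(); ++j) {
    const ToyTree& tree = trees[j];
    assert(tree.node_of_obs.size() == n);
    // Map each leaf's node ID to its position 0..leaves.size()-1.
    std::unordered_map<int32_t, int32_t> renumber;
    for (std::size_t k = 0; k < tree.leaves.size(); ++k) {
      renumber[tree.leaves[k]] = static_cast<int32_t>(k);
    }
    for (std::size_t i = 0; i < n; ++i) {
      output[j * n + i] = offset + renumber.at(tree.node_of_obs[i]);
    }
    offset += static_cast<int32_t>(tree.leaves.size());
  }
  return offset;
}
```

Under these assumptions every index lies between 0 and the returned total minus one, so the result can serve directly as column indices of a sparse indicator matrix.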

◆ PredictLeafIndicesInplace() [2/4]

void StochTree::TreeEnsemble::PredictLeafIndicesInplace ( Eigen::Map< Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &  covariates,
std::vector< int32_t > &  output,
int  num_trees,
data_size_t  n 
)
inline

Obtain a 0-based leaf index for every tree in an ensemble and for each observation (row) in a covariate matrix. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree-level and coordinate this computation at the ensemble level.

Note: this assumes the creation of a vector of column indices of size dataset.NumObservations() x ensemble.NumTrees()

Parameters
covariates  Matrix of covariates
output  Vector of length num_trees*n which stores the leaf node prediction
num_trees  Number of trees in an ensemble
n  Size of dataset

◆ PredictLeafIndicesInplace() [3/4]

void StochTree::TreeEnsemble::PredictLeafIndicesInplace ( Eigen::Map< Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &  covariates,
Eigen::Map< Eigen::Matrix< int, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &  output,
int  column_ind,
int  num_trees,
data_size_t  n 
)
inline

Obtain a 0-based leaf index for every tree in an ensemble and for each observation (row) in a covariate matrix. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree-level and coordinate this computation at the ensemble level.

Note: this assumes the creation of a matrix of column indices with num_trees*n rows and as many columns as forests that were requested from R / Python

Parameters
covariates  Matrix of covariates
output  Matrix with num_trees*n rows and as many columns as forests that were requested from R / Python
column_ind  Index of column in output into which the result should be unpacked
num_trees  Number of trees in an ensemble
n  Size of dataset
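
To make the role of column_ind concrete, the sketch below writes one forest's leaf indices into a single column of a column-major matrix. It uses a flat std::vector in place of Eigen::Map so that it stays self-contained; UnpackIntoColumn is a hypothetical helper, not a StochTree function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy one forest's leaf indices into column `column_ind` of a column-major
// matrix stored in a flat buffer with `rows` rows. In column-major storage,
// element (r, c) lives at position c * rows + r.
void UnpackIntoColumn(const std::vector<int32_t>& indices,
                      std::vector<int32_t>& matrix,
                      std::size_t rows, std::size_t column_ind) {
  assert(indices.size() == rows);
  assert((column_ind + 1) * rows <= matrix.size());
  for (std::size_t r = 0; r < rows; ++r) {
    matrix[column_ind * rows + r] = indices[r];
  }
}
```

Here rows plays the role of num_trees*n, and each column would hold the indices for one requested forest.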

◆ PredictLeafIndicesInplace() [4/4]

void StochTree::TreeEnsemble::PredictLeafIndicesInplace ( Eigen::MatrixXd &  covariates,
std::vector< int32_t > &  output,
int  num_trees,
data_size_t  n 
)
inline

Obtain a 0-based leaf index for every tree in an ensemble and for each observation (row) in a covariate matrix. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree-level and coordinate this computation at the ensemble level.

Note: this assumes the creation of a vector of column indices of size dataset.NumObservations() x ensemble.NumTrees()

Parameters
covariates  Matrix of covariates
output  Vector of length num_trees*n which stores the leaf node prediction
num_trees  Number of trees in an ensemble
n  Size of dataset

◆ PredictLeafIndices()

std::vector< int32_t > StochTree::TreeEnsemble::PredictLeafIndices ( ForestDataset *  dataset)
inline

Same as PredictLeafIndicesInplace but assumes responsibility for allocating and returning output vector.

Parameters
dataset  Dataset with which to predict leaf indices from the tree

The documentation for this class was generated from the following file: