Class storing a "forest," or an ensemble of decision trees.

#include <ensemble.h>

Public Member Functions
	TreeEnsemble (int num_trees, int output_dimension=1, bool is_leaf_constant=true, bool is_exponentiated=false)
	Initialize a new TreeEnsemble.

	TreeEnsemble (TreeEnsemble &ensemble)
	Initialize an ensemble based on the state of an existing ensemble.

void	MergeForest (TreeEnsemble &ensemble)
	Combine two forests into a single forest by merging their trees.

void	AddValueToLeaves (double constant_value)
	Add a constant value to every leaf of every tree in an ensemble. If leaves are multi-dimensional, `constant_value` will be added to every dimension of the leaves.

void	MultiplyLeavesByValue (double constant_multiple)
	Multiply every leaf of every tree by a constant value. If leaves are multi-dimensional, `constant_multiple` will be multiplied through every dimension of the leaves.

Tree *	GetTree (int i)
	Return a pointer to a tree in the forest.

void	ResetRoot ()
	Reset a `TreeEnsemble` to all single-node "root" trees.

void	ResetTree (int i)
	Reset a single tree in an ensemble.

void	ResetInitTree (int i)
	Reset a single tree in an ensemble.

void	CloneFromExistingTree (int i, Tree *tree)
	Clone a single tree in an ensemble from an existing tree, overwriting current tree.

void	ReconstituteFromForest (TreeEnsemble &ensemble)
	Reset an ensemble to clone another ensemble.

int	GetMaxLeafIndex ()
	Obtain a 0-based "maximum" leaf index for an ensemble, which is equivalent to the sum of the number of leaves in each tree. This is used in conjunction with `PredictLeafIndicesInplace`, which writes an observation-specific leaf index for every observation-tree pair into its output.

void	PredictLeafIndicesInplace (ForestDataset *dataset, std::vector< int32_t > &output, int num_trees, data_size_t n)
	Obtain a 0-based leaf index for every tree in an ensemble and for each observation in a ForestDataset. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to `leaves_.size()-1`. We compute this at the tree-level and coordinate this computation at the ensemble level.

void	PredictLeafIndicesInplace (Eigen::Map< Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &covariates, std::vector< int32_t > &output, int num_trees, data_size_t n)
	Obtain a 0-based leaf index for every tree in an ensemble and for each observation in a covariate matrix. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to `leaves_.size()-1`. We compute this at the tree-level and coordinate this computation at the ensemble level.

void	PredictLeafIndicesInplace (Eigen::Map< Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &covariates, Eigen::Map< Eigen::Matrix< int, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &output, int column_ind, int num_trees, data_size_t n)
	Obtain a 0-based leaf index for every tree in an ensemble and for each observation in a covariate matrix. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to `leaves_.size()-1`. We compute this at the tree-level and coordinate this computation at the ensemble level.

void	PredictLeafIndicesInplace (Eigen::MatrixXd &covariates, std::vector< int32_t > &output, int num_trees, data_size_t n)
	Obtain a 0-based leaf index for every tree in an ensemble and for each observation in a covariate matrix. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to `leaves_.size()-1`. We compute this at the tree-level and coordinate this computation at the ensemble level.

std::vector< int32_t >	PredictLeafIndices (ForestDataset *dataset)
	Same as `PredictLeafIndicesInplace` but assumes responsibility for allocating and returning output vector.

json	to_json ()
	Save to JSON.

void	from_json (const json &ensemble_json)
	Load from JSON.

Detailed Description

Class storing a "forest," or an ensemble of decision trees.

Constructor & Destructor Documentation

◆ TreeEnsemble() [1/2]

StochTree::TreeEnsemble::TreeEnsemble	(	int	num_trees,
		int	output_dimension = `1`,
		bool	is_leaf_constant = `true`,
		bool	is_exponentiated = `false`
	)

inline

Initialize a new TreeEnsemble.

Parameters

num_trees	Number of trees in a forest
output_dimension	Dimension of the leaf node parameter
is_leaf_constant	Whether or not the leaves of each tree are treated as "constant." If true, then predicting from an ensemble is simply a matter of determining which leaf node an observation falls into. If false, prediction will multiply a leaf node's parameter(s) for a given observation by a basis vector.
is_exponentiated	Whether or not the leaves of each tree are stored in log scale. If true, leaf predictions are exponentiated before their prediction is returned.
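
The effect of `is_leaf_constant` and `is_exponentiated` on prediction can be illustrated with a minimal sketch. The helpers `LeafContribution` and `EnsemblePrediction` below are hypothetical and are not part of the library. The sketch also assumes that exponentiation is applied to the prediction summed over trees, which the description above does not state explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a library function): contribution of one leaf
// to the prediction for one observation.
double LeafContribution(const std::vector<double>& leaf_params,
                        const std::vector<double>& basis,
                        bool is_leaf_constant) {
  if (is_leaf_constant) {
    // Constant leaf: the stored parameter is the contribution itself.
    assert(!leaf_params.empty());
    return leaf_params[0];
  }
  // Non-constant leaf: inner product of leaf parameters and basis vector.
  assert(leaf_params.size() == basis.size());
  double total = 0.0;
  for (std::size_t k = 0; k < leaf_params.size(); ++k) {
    total += leaf_params[k] * basis[k];
  }
  return total;
}

// Hypothetical helper: sum the per-tree contributions and, if requested,
// map the log-scale sum back to the original scale.
// Assumption: exponentiation is applied to the summed prediction.
double EnsemblePrediction(const std::vector<double>& contributions,
                          bool is_exponentiated) {
  double total = 0.0;
  for (double c : contributions) total += c;
  return is_exponentiated ? std::exp(total) : total;
}
```

With leaf parameters (1, 2) and basis (3, 4), the non-constant contribution is 1*3 + 2*4 = 11, whereas a constant leaf simply contributes its stored value.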

◆ TreeEnsemble() [2/2]

StochTree::TreeEnsemble::TreeEnsemble ( TreeEnsemble & ensemble )

inline

Initialize an ensemble based on the state of an existing ensemble.

Parameters

ensemble TreeEnsemble used to initialize the current ensemble

Member Function Documentation

◆ MergeForest()

void StochTree::TreeEnsemble::MergeForest ( TreeEnsemble & ensemble )

inline

Combine two forests into a single forest by merging their trees.

Parameters

ensemble Reference to another TreeEnsemble that will be merged into the current ensemble
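
One way to picture merging is as appending copies of the other forest's trees to the current one. The sketch below uses hypothetical stand-in types (`ToyTree`, `ToyForest`) and a hypothetical function `ToyMergeForest`. The compatibility check on the forest settings is an assumption made for illustration, not documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the library's tree and ensemble types.
struct ToyTree {
  std::vector<double> leaf_values;
};

struct ToyForest {
  std::vector<std::unique_ptr<ToyTree>> trees;
  int output_dimension = 1;
  bool is_leaf_constant = true;
  bool is_exponentiated = false;
};

// Append deep copies of every tree in `other` to `target`.
// Assumption: forests are only merged when their settings agree; otherwise
// `target` is left unchanged and false is returned.
bool ToyMergeForest(ToyForest& target, const ToyForest& other) {
  if (target.output_dimension != other.output_dimension ||
      target.is_leaf_constant != other.is_leaf_constant ||
      target.is_exponentiated != other.is_exponentiated) {
    return false;
  }
  const std::size_t count = other.trees.size();
  target.trees.reserve(target.trees.size() + count);
  for (std::size_t k = 0; k < count; ++k) {
    // Copy the tree so the two forests never share ownership.
    target.trees.push_back(
        std::unique_ptr<ToyTree>(new ToyTree(*other.trees[k])));
  }
  return true;
}
```

In this sketch the source forest is not modified, and the merged forest holds the sum of the two tree counts.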

◆ AddValueToLeaves()

void StochTree::TreeEnsemble::AddValueToLeaves ( double constant_value )

inline

Add a constant value to every leaf of every tree in an ensemble. If leaves are multi-dimensional, constant_value will be added to every dimension of the leaves.

Parameters

constant_value Value that will be added to every leaf of every tree

◆ MultiplyLeavesByValue()

void StochTree::TreeEnsemble::MultiplyLeavesByValue ( double constant_multiple )

inline

Multiply every leaf of every tree by a constant value. If leaves are multi-dimensional, constant_multiple will be multiplied through every dimension of the leaves.

Parameters

constant_multiple Value by which every leaf of every tree will be multiplied
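
The element-wise behavior described for MultiplyLeavesByValue (and likewise for AddValueToLeaves) on multi-dimensional leaves can be sketched as follows. The nested-vector types and the functions `AddToEveryLeaf` and `MultiplyEveryLeaf` are hypothetical stand-ins chosen for illustration. The library's internal storage is not shown in this documentation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins: a leaf is a vector of parameters, a tree is a
// list of leaves, and a forest is a list of trees.
using ToyLeaf = std::vector<double>;
using ToyTree = std::vector<ToyLeaf>;
using ToyForest = std::vector<ToyTree>;

// Add `constant_value` to every dimension of every leaf of every tree.
void AddToEveryLeaf(ToyForest& forest, double constant_value) {
  for (ToyTree& tree : forest) {
    for (ToyLeaf& leaf : tree) {
      for (double& value : leaf) value += constant_value;
    }
  }
}

// Multiply every dimension of every leaf of every tree by
// `constant_multiple`.
void MultiplyEveryLeaf(ToyForest& forest, double constant_multiple) {
  for (ToyTree& tree : forest) {
    for (ToyLeaf& leaf : tree) {
      for (double& value : leaf) value *= constant_multiple;
    }
  }
}
```

For example, a two-dimensional leaf (1, 2) becomes (2, 3) after adding 1 and (4, 6) after then multiplying by 2.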

◆ GetTree()

Tree * StochTree::TreeEnsemble::GetTree ( int i )

inline

Return a pointer to a tree in the forest.

Parameters

i	Index (0-based) of a tree to be queried

Returns: Tree*

◆ ResetTree()

void StochTree::TreeEnsemble::ResetTree ( int i )

inline

Reset a single tree in an ensemble.

Parameters

i	Index (0-based) of the tree to be reset

◆ ResetInitTree()

void StochTree::TreeEnsemble::ResetInitTree ( int i )

inline

Reset a single tree in an ensemble.

Parameters

i	Index (0-based) of the tree to be reset

◆ CloneFromExistingTree()

void StochTree::TreeEnsemble::CloneFromExistingTree	(	int	i,
		Tree *	tree
	)

inline

Clone a single tree in an ensemble from an existing tree, overwriting current tree.

Parameters

i	Index of the tree to be overwritten
tree	Pointer to tree used to clone tree `i`

◆ ReconstituteFromForest()

void StochTree::TreeEnsemble::ReconstituteFromForest ( TreeEnsemble & ensemble )

inline

Reset an ensemble to clone another ensemble.

Parameters

ensemble Reference to an existing TreeEnsemble

◆ PredictLeafIndicesInplace() [1/4]

void StochTree::TreeEnsemble::PredictLeafIndicesInplace	(	ForestDataset *	dataset,
		std::vector< int32_t > &	output,
		int	num_trees,
		data_size_t	n
	)

inline

Obtain a 0-based leaf index for every tree in an ensemble and for each observation in a ForestDataset. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree-level and coordinate this computation at the ensemble level.

Note: this assumes the creation of a vector of column indices of size dataset.NumObservations() x ensemble.NumTrees()

Parameters

dataset	Dataset with which to predict leaf indices from the tree
output	Vector of length num_trees*n which stores the leaf node prediction
num_trees	Number of trees in an ensemble
n	Size of dataset
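
The renumbering described above can be sketched with toy data. `ToyTree`, `LeafPosition`, and `ToyPredictLeafIndices` are hypothetical and do not reproduce the library's implementation. The tree-major output layout (entry `j*n + i` for tree `j` and observation `i`) and the shifting of each tree's indices by the number of leaves in the preceding trees are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tree: `leaves` plays the role of `leaves_`
// (node IDs of the leaves) and `node_of_obs[i]` is the node ID of the leaf
// that observation i falls into.
struct ToyTree {
  std::vector<std::int32_t> leaves;
  std::vector<std::int32_t> node_of_obs;
};

// Position of `node_id` within `leaves`, i.e. its renumbered index in
// [0, leaves.size()-1]. Returns -1 if `node_id` is not a leaf.
std::int32_t LeafPosition(const ToyTree& tree, std::int32_t node_id) {
  for (std::size_t k = 0; k < tree.leaves.size(); ++k) {
    if (tree.leaves[k] == node_id) return static_cast<std::int32_t>(k);
  }
  return -1;
}

// Fill `output` (length num_trees * n) with ensemble-wide leaf indices.
// Assumed layout: tree-major, with each tree's indices offset by the total
// number of leaves in the trees before it.
void ToyPredictLeafIndices(const std::vector<ToyTree>& trees,
                           std::vector<std::int32_t>& output,
                           int num_trees, int n) {
  assert(output.size() == static_cast<std::size_t>(num_trees) * n);
  std::int32_t offset = 0;
  for (int j = 0; j < num_trees; ++j) {
    const ToyTree& tree = trees[j];
    for (int i = 0; i < n; ++i) {
      output[static_cast<std::size_t>(j) * n + i] =
          offset + LeafPosition(tree, tree.node_of_obs[i]);
    }
    offset += static_cast<std::int32_t>(tree.leaves.size());
  }
}
```

Under these assumptions, the final value of `offset` is the sum of the number of leaves across trees, and every index written lies strictly below it.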

◆ PredictLeafIndicesInplace() [2/4]

void StochTree::TreeEnsemble::PredictLeafIndicesInplace	(	Eigen::Map< Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &	covariates,
		std::vector< int32_t > &	output,
		int	num_trees,
		data_size_t	n
	)

inline

Obtain a 0-based leaf index for every tree in an ensemble and for each observation in a covariate matrix. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree-level and coordinate this computation at the ensemble level.

Note: this assumes the creation of a vector of column indices of size n x num_trees

Parameters

covariates	Matrix of covariates
output	Vector of length num_trees*n which stores the leaf node prediction
num_trees	Number of trees in an ensemble
n	Size of dataset

◆ PredictLeafIndicesInplace() [3/4]

void StochTree::TreeEnsemble::PredictLeafIndicesInplace	(	Eigen::Map< Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &	covariates,
		Eigen::Map< Eigen::Matrix< int, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &	output,
		int	column_ind,
		int	num_trees,
		data_size_t	n
	)

inline

Obtain a 0-based leaf index for every tree in an ensemble and for each observation in a covariate matrix. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree-level and coordinate this computation at the ensemble level.

Note: this assumes the creation of a matrix of column indices with num_trees*n rows and as many columns as forests that were requested from R / Python

Parameters

covariates	Matrix of covariates
output	Matrix with num_trees*n rows and as many columns as forests that were requested from R / Python
column_ind	Index of column in `output` into which the result should be unpacked
num_trees	Number of trees in an ensemble
n	Size of dataset
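
Because `output` is column-major, unpacking a result into column `column_ind` amounts to writing a contiguous block of `num_trees*n` entries. The sketch below uses a flat `std::vector<int>` as a stand-in for the mapped Eigen matrix so that it has no external dependencies. `WriteColumn` is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy `leaf_indices` into column `column_ind` of a
// column-major matrix stored in the flat buffer `matrix` with `num_rows`
// rows. In column-major storage, element (r, c) lives at c * num_rows + r.
void WriteColumn(std::vector<int>& matrix, std::size_t num_rows,
                 int column_ind, const std::vector<int>& leaf_indices) {
  assert(column_ind >= 0);
  assert(leaf_indices.size() == num_rows);
  const std::size_t start = static_cast<std::size_t>(column_ind) * num_rows;
  assert(start + num_rows <= matrix.size());
  for (std::size_t r = 0; r < num_rows; ++r) {
    matrix[start + r] = leaf_indices[r];
  }
}
```

With 2 trees and 3 observations, each column has 6 rows, so column 1 occupies flat positions 6 through 11 and the other columns are untouched.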

◆ PredictLeafIndicesInplace() [4/4]

void StochTree::TreeEnsemble::PredictLeafIndicesInplace	(	Eigen::MatrixXd &	covariates,
		std::vector< int32_t > &	output,
		int	num_trees,
		data_size_t	n
	)

inline

Obtain a 0-based leaf index for every tree in an ensemble and for each observation in a covariate matrix. Internally, trees are stored as essentially vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1. We compute this at the tree-level and coordinate this computation at the ensemble level.

Note: this assumes the creation of a vector of column indices of size n x num_trees

Parameters

covariates	Matrix of covariates
output	Vector of length num_trees*n which stores the leaf node prediction
num_trees	Number of trees in an ensemble
n	Size of dataset

◆ PredictLeafIndices()

std::vector< int32_t > StochTree::TreeEnsemble::PredictLeafIndices ( ForestDataset * dataset )

inline

Same as PredictLeafIndicesInplace but assumes responsibility for allocating and returning output vector.

Parameters

dataset Dataset with which to predict leaf indices from the tree
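
The relationship between an allocating variant and an in-place variant can be sketched generically. `AllocateAndFill` is a hypothetical wrapper that accepts any callable with the shape `(output, num_trees, n)`. It only illustrates the pattern of sizing the buffer to one entry per observation-tree pair and is not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wrapper: allocate a buffer with one entry per
// observation-tree pair, let an in-place routine fill it, and return it
// by value so the caller does not manage the allocation.
template <typename InplaceFn>
std::vector<std::int32_t> AllocateAndFill(int num_trees, int n,
                                          InplaceFn fill_inplace) {
  assert(num_trees >= 0 && n >= 0);
  std::vector<std::int32_t> output(static_cast<std::size_t>(num_trees) *
                                   static_cast<std::size_t>(n));
  fill_inplace(output, num_trees, n);
  return output;
}
```

The caller trades control over the buffer for convenience. Code that predicts repeatedly with the same sizes may prefer the in-place form so the buffer can be reused.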

The documentation for this class was generated from the following file:

include/stochtree/ensemble.h
