StochTree 0.0.1
StochTree::ForestContainer Class Reference

Container of TreeEnsemble forest objects. This is the primary (in-memory) storage interface for multiple "samples" of a decision tree ensemble in stochtree.

#include <container.h>

Public Member Functions

 ForestContainer (int num_trees, int output_dimension=1, bool is_leaf_constant=true, bool is_exponentiated=false)
 Construct a new ForestContainer object.
 
 ForestContainer (int num_samples, int num_trees, int output_dimension=1, bool is_leaf_constant=true, bool is_exponentiated=false)
 Construct a new ForestContainer object.
 
void DeleteSample (int sample_num)
 Remove a forest from a container of forest samples and delete the corresponding object, freeing its memory.
 
void AddSample (TreeEnsemble &forest)
 Add a new forest to the container by copying forest.
 
void InitializeRoot (double leaf_value)
 Initialize a "root" forest of univariate trees as the first element of the container, setting all root node values in every tree to leaf_value.
 
void InitializeRoot (std::vector< double > &leaf_vector)
 Initialize a "root" forest of multivariate trees as the first element of the container, setting all root node values in every tree to leaf_vector.
 
void AddSamples (int num_samples)
 Pre-allocate space for num_samples additional forests in the container.
 
void CopyFromPreviousSample (int new_sample_id, int previous_sample_id)
 Copy the forest stored at previous_sample_id to the forest stored at new_sample_id.
 
std::vector< double > Predict (ForestDataset &dataset)
 Predict from every forest in the container on every observation in the provided dataset. The resulting vector is "column-major", where every forest in a container defines the columns of a prediction matrix and every observation in the provided dataset defines the rows. The (i,j) element of this prediction matrix can be read from the j * num_rows + i element of the returned std::vector<double>, where num_rows is equal to the number of observations in dataset (i.e. dataset.NumObservations()).
 
std::vector< double > PredictRaw (ForestDataset &dataset)
 Predict from every forest in the container on every observation in the provided dataset. The resulting vector stores a possibly three-dimensional array whose dimensions are, in order: the leaf node's raw output dimension, the observations in the dataset, and the forest samples in the container.
 
nlohmann::json to_json ()
 Save to JSON.
 
void from_json (const nlohmann::json &forest_container_json)
 Load from JSON.
 
void append_from_json (const nlohmann::json &forest_container_json)
 Append to a forest container from JSON. Requires that the container already contains a nonzero number of forests.
 

Detailed Description

Container of TreeEnsemble forest objects. This is the primary (in-memory) storage interface for multiple "samples" of a decision tree ensemble in stochtree.

Constructor & Destructor Documentation

◆ ForestContainer() [1/2]

StochTree::ForestContainer::ForestContainer ( int  num_trees,
int  output_dimension = 1,
bool  is_leaf_constant = true,
bool  is_exponentiated = false 
)

Construct a new ForestContainer object.

Parameters
num_trees  Number of trees in each forest.
output_dimension  Dimension of the leaf node parameter in each tree of each forest.
is_leaf_constant  Whether or not the leaves of each tree are treated as "constant." If true, then predicting from an ensemble is simply a matter of determining which leaf node an observation falls into. If false, prediction will multiply a leaf node's parameter(s) for a given observation by a basis vector.
is_exponentiated  Whether or not the leaves of each tree are stored in log scale. If true, predictions are exponentiated before they are returned.

◆ ForestContainer() [2/2]

StochTree::ForestContainer::ForestContainer ( int  num_samples,
int  num_trees,
int  output_dimension = 1,
bool  is_leaf_constant = true,
bool  is_exponentiated = false 
)

Construct a new ForestContainer object.

Parameters
num_samples  Initial size of a container of forest samples.
num_trees  Number of trees in each forest.
output_dimension  Dimension of the leaf node parameter in each tree of each forest.
is_leaf_constant  Whether or not the leaves of each tree are treated as "constant." If true, then predicting from an ensemble is simply a matter of determining which leaf node an observation falls into. If false, prediction will multiply a leaf node's parameter(s) for a given observation by a basis vector.
is_exponentiated  Whether or not the leaves of each tree are stored in log scale. If true, predictions are exponentiated before they are returned.
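
The effect of is_leaf_constant and is_exponentiated on prediction can be illustrated with a small, self-contained sketch. The helpers LeafContribution and ForestPrediction below are hypothetical and are not part of stochtree. The sketch also assumes that leaf contributions are summed across trees before any exponentiation is applied, which the text above does not state explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of stochtree): the contribution of a single
// leaf to the prediction for one observation.
double LeafContribution(const std::vector<double>& leaf_params,
                        const std::vector<double>& basis,
                        bool is_leaf_constant) {
  if (is_leaf_constant) {
    // Constant leaf (univariate case): the stored value is the prediction.
    assert(!leaf_params.empty());
    return leaf_params[0];
  }
  // Regression leaf: multiply the leaf parameters by the basis vector.
  assert(leaf_params.size() == basis.size());
  double result = 0.0;
  for (std::size_t k = 0; k < leaf_params.size(); ++k) {
    result += leaf_params[k] * basis[k];
  }
  return result;
}

// Hypothetical helper: combine per-tree contributions into one prediction.
// Assumption: the sum is taken on the stored (log) scale, then exponentiated.
double ForestPrediction(const std::vector<double>& leaf_contributions,
                        bool is_exponentiated) {
  double total = 0.0;
  for (double c : leaf_contributions) {
    total += c;
  }
  return is_exponentiated ? std::exp(total) : total;
}
```

With is_leaf_constant set to false, a leaf with parameters (1, 2) and a basis of (3, 4) contributes 11 under this sketch.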

Member Function Documentation

◆ DeleteSample()

void StochTree::ForestContainer::DeleteSample ( int  sample_num)

Remove a forest from a container of forest samples and delete the corresponding object, freeing its memory.

Parameters
sample_num  Index of forest to be deleted.

◆ AddSample()

void StochTree::ForestContainer::AddSample ( TreeEnsemble &  forest)

Add a new forest to the container by copying forest.

Parameters
forest  Forest to be copied and added to the container of retained forest samples.

◆ InitializeRoot() [1/2]

void StochTree::ForestContainer::InitializeRoot ( double  leaf_value)

Initialize a "root" forest of univariate trees as the first element of the container, setting all root node values in every tree to leaf_value.

Parameters
leaf_value  Value to assign to the root node of every tree.

◆ InitializeRoot() [2/2]

void StochTree::ForestContainer::InitializeRoot ( std::vector< double > &  leaf_vector)

Initialize a "root" forest of multivariate trees as the first element of the container, setting all root node values in every tree to leaf_vector.

Parameters
leaf_vector  Vector of values to assign to the root node of every tree.

◆ AddSamples()

void StochTree::ForestContainer::AddSamples ( int  num_samples)

Pre-allocate space for num_samples additional forests in the container.

Parameters
num_samples  Number of (default-constructed) forests to allocate space for in the container.

◆ CopyFromPreviousSample()

void StochTree::ForestContainer::CopyFromPreviousSample ( int  new_sample_id,
int  previous_sample_id 
)

Copy the forest stored at previous_sample_id to the forest stored at new_sample_id.

Parameters
new_sample_id  Index of the destination forest, which receives a copy of an earlier sample.
previous_sample_id  Index of the previous forest to copy to new_sample_id.
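
The sample-management methods documented above (DeleteSample, AddSamples, and CopyFromPreviousSample) can be pictured with a simplified mock. MockForest and MockContainer are hypothetical stand-ins written for illustration; they are not the stochtree classes, and details such as how indices shift after a deletion are assumptions of this sketch rather than documented behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a forest; the real TreeEnsemble is far richer.
struct MockForest {
  std::vector<double> root_values;  // one root value per tree
};

// Hypothetical stand-in for the container: it owns one forest per sample.
class MockContainer {
 public:
  explicit MockContainer(int num_trees) : num_trees_(num_trees) {}

  // Append num_samples forests whose root values are all zero.
  void AddSamples(int num_samples) {
    for (int s = 0; s < num_samples; ++s) {
      forests_.push_back(std::make_unique<MockForest>(
          MockForest{std::vector<double>(num_trees_, 0.0)}));
    }
  }

  // Overwrite the forest at new_sample_id with a deep copy of an earlier one.
  void CopyFromPreviousSample(int new_sample_id, int previous_sample_id) {
    *forests_.at(new_sample_id) = *forests_.at(previous_sample_id);
  }

  // Destroy one forest; in this mock, later samples shift down by one index.
  void DeleteSample(int sample_num) {
    assert(sample_num >= 0 && sample_num < NumSamples());
    forests_.erase(forests_.begin() + sample_num);
  }

  int NumSamples() const { return static_cast<int>(forests_.size()); }
  MockForest& At(int sample_num) { return *forests_.at(sample_num); }

 private:
  int num_trees_;
  std::vector<std::unique_ptr<MockForest>> forests_;
};
```

Because the copy is deep in this mock, later changes to the source forest do not affect the destination forest.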

◆ Predict()

std::vector< double > StochTree::ForestContainer::Predict ( ForestDataset &  dataset)

Predict from every forest in the container on every observation in the provided dataset. The resulting vector is "column-major", where every forest in a container defines the columns of a prediction matrix and every observation in the provided dataset defines the rows. The (i,j) element of this prediction matrix can be read from the j * num_rows + i element of the returned std::vector<double>, where num_rows is equal to the number of observations in dataset (i.e. dataset.NumObservations()).

Parameters
dataset  Data object containing training data, including covariates, leaf regression bases, and case weights.
Returns
std::vector<double>  Vector of predictions for every forest in the container and every observation in dataset.
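
The column-major indexing rule described for Predict can be wrapped in a small accessor. The helpers PredictionAt and MeanOverSamples below are hypothetical conveniences, not stochtree functions; they operate on any flat vector laid out as described, so the sketch does not depend on the library itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of stochtree): read the prediction for
// observation i (row) under forest sample j (column) from the flat vector.
double PredictionAt(const std::vector<double>& preds, std::size_t num_rows,
                    std::size_t i, std::size_t j) {
  assert(i < num_rows);
  assert(j * num_rows + i < preds.size());
  return preds[j * num_rows + i];
}

// Hypothetical helper: average the predictions for observation i over all
// forest samples stored in the flat vector.
double MeanOverSamples(const std::vector<double>& preds, std::size_t num_rows,
                       std::size_t i) {
  assert(num_rows > 0 && preds.size() % num_rows == 0);
  const std::size_t num_samples = preds.size() / num_rows;
  assert(num_samples > 0);
  double total = 0.0;
  for (std::size_t j = 0; j < num_samples; ++j) {
    total += PredictionAt(preds, num_rows, i, j);
  }
  return total / static_cast<double>(num_samples);
}
```

Here num_rows would be taken from dataset.NumObservations(), and the number of forest samples is recovered as the vector length divided by num_rows.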

◆ PredictRaw()

std::vector< double > StochTree::ForestContainer::PredictRaw ( ForestDataset &  dataset)

Predict from every forest in the container on every observation in the provided dataset. The resulting vector stores a possibly three-dimensional array, where the dimensions are arranged as follows.

  1. Dimension of the leaf node's raw values (1 for GaussianConstantLeafModel, GaussianUnivariateRegressionLeafModel, and LogLinearVarianceLeafModel, >1 for GaussianMultivariateRegressionLeafModel)
  2. Observations in the provided dataset.
  3. Forest samples in the container.

If the leaf nodes have univariate values, then the "first dimension" is 1 and the resulting array has the exact same layout as in Predict.

Parameters
dataset  Data object containing training data, including covariates, leaf regression bases, and case weights.
Returns
std::vector<double>  Vector of predictions for every forest in the container and every observation in dataset.
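
A possible accessor for the three-dimensional PredictRaw layout is sketched below. It assumes that the first listed dimension (the leaf output dimension) varies fastest in memory, followed by observations and then forest samples. That ordering is an assumption of this sketch: it reduces to the Predict layout when the leaf dimension is 1, but it should be checked against the library source before being relied on. RawPredictionAt is a hypothetical helper, not a stochtree function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of stochtree): read raw output component k
// for observation i under forest sample j.
// Assumed layout: index = k + output_dim * (i + num_rows * j).
double RawPredictionAt(const std::vector<double>& raw, std::size_t output_dim,
                       std::size_t num_rows, std::size_t k, std::size_t i,
                       std::size_t j) {
  assert(k < output_dim);
  assert(i < num_rows);
  const std::size_t index = k + output_dim * (i + num_rows * j);
  assert(index < raw.size());
  return raw[index];
}
```

When output_dim is 1, the index simplifies to j * num_rows + i, which matches the indexing rule given for Predict.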

The documentation for this class was generated from the following file: