StochTree 0.0.1
StochTree::Tree Class Reference

Decision tree data structure.

#include <tree.h>

Public Member Functions

void CloneFromTree (Tree *tree)
 Copy the structure and parameters of another tree. If the Tree object calling this method already has a non-root tree structure / parameters, this will be erased and replaced with a copy of tree.
 
void Reset ()
 Reset tree to empty vectors and default values of boolean / integer variables.
 
void Init (int output_dimension=1, bool is_log_scale=false)
 Initialize the tree with a single root node.
 
int AllocNode ()
 Allocate a new node and return the node's ID.
 
void DeleteNode (std::int32_t nid)
 Deletes node indexed by node ID.
 
void ExpandNode (std::int32_t nid, int split_index, double split_value, double left_value, double right_value)
 Expand a node based on a numeric split rule.
 
void ExpandNode (std::int32_t nid, int split_index, std::vector< std::uint32_t > const &categorical_indices, double left_value, double right_value)
 Expand a node based on a categorical split rule.
 
void ExpandNode (std::int32_t nid, int split_index, double split_value, std::vector< double > left_value_vector, std::vector< double > right_value_vector)
 Expand a node based on a numeric split rule.
 
void ExpandNode (std::int32_t nid, int split_index, std::vector< std::uint32_t > const &categorical_indices, std::vector< double > left_value_vector, std::vector< double > right_value_vector)
 Expand a node based on a categorical split rule.
 
void ExpandNode (std::int32_t nid, int split_index, TreeSplit &split, double left_value, double right_value)
 Expand a node based on a generic split rule.
 
void ExpandNode (std::int32_t nid, int split_index, TreeSplit &split, std::vector< double > left_value_vector, std::vector< double > right_value_vector)
 Expand a node based on a generic split rule.
 
bool IsRoot ()
 Whether or not a tree is a "stump" consisting of a single root node.
 
json to_json ()
 Convert tree to JSON and return JSON in-memory.
 
void from_json (const json &tree_json)
 Load from JSON.
 
void CollapseToLeaf (std::int32_t nid, double value)
 Collapse an internal node to a leaf node, deleting its children from the tree.
 
void CollapseToLeaf (std::int32_t nid, std::vector< double > value_vector)
 Collapse an internal node to a leaf node, deleting its children from the tree.
 
template<typename Func >
void WalkTree (Func func) const
 Iterate through all nodes in this tree.
 
bool HasVectorOutput () const
 Whether or not a tree has vector output.
 
std::int32_t OutputDimension () const
 Dimension of tree output.
 
bool IsLogScale () const
 Whether or not tree parameters should be exponentiated at prediction time.
 
std::int32_t Parent (std::int32_t nid) const
 Index of the node's parent.
 
std::int32_t LeftChild (std::int32_t nid) const
 Index of the node's left child.
 
std::int32_t RightChild (std::int32_t nid) const
 Index of the node's right child.
 
std::int32_t DefaultChild (std::int32_t nid) const
 Index of the node's "default" child (potentially used in the case of a missing feature at prediction time)
 
std::int32_t SplitIndex (std::int32_t nid) const
 Feature index defining the node's split rule.
 
bool IsLeaf (std::int32_t nid) const
 Whether the node is a leaf node.
 
bool IsRoot (std::int32_t nid) const
 Whether the node is root.
 
bool IsDeleted (std::int32_t nid) const
 Whether the node has been deleted.
 
double LeafValue (std::int32_t nid) const
 Get parameter value of a node (typically though not necessarily a leaf node)
 
double LeafValue (std::int32_t nid, std::int32_t dim_id) const
 Get parameter value of a node (typically though not necessarily a leaf node) at a given output dimension.
 
std::int32_t MaxLeafDepth () const
 Get maximum depth of all of the leaf nodes.
 
std::vector< double > LeafVector (std::int32_t nid) const
 Get vector-valued parameters of a node (typically leaf)
 
double SumSquaredNodeValues (std::int32_t nid) const
 Sum of squared parameter values for a given node (typically though not necessarily a leaf node)
 
double SumSquaredLeafValues () const
 Sum of squared values for all leaves in a tree.
 
bool HasLeafVector (std::int32_t nid) const
 Tests whether the leaf node has a non-empty leaf vector.
 
double Threshold (std::int32_t nid) const
 Get split threshold of the node.
 
std::vector< std::uint32_t > CategoryList (std::int32_t nid) const
 Get list of all categories belonging to the left child node. Categories are integers ranging from 0 to (n-1), where n is the number of categories in that particular feature. This list is assumed to be in ascending order.
 
TreeNodeType NodeType (std::int32_t nid) const
 Get the type of a node (i.e. numeric split, categorical split, leaf)
 
bool IsNumericSplitNode (std::int32_t nid) const
 Whether the node is a numeric split node.
 
bool IsCategoricalSplitNode (std::int32_t nid) const
 Whether the node is a categorical split node.
 
bool HasCategoricalSplit () const
 Query whether this tree contains any categorical splits.
 
std::vector< std::int32_t > const & GetInternalNodes () const
 Get indices of all internal nodes.
 
std::vector< std::int32_t > const & GetLeaves () const
 Get indices of all leaf nodes.
 
std::vector< std::int32_t > const & GetLeafParents () const
 Get indices of all leaf parent nodes.
 
std::vector< std::int32_t > GetNodes ()
 Get indices of all valid (non-deleted) nodes.
 
std::int32_t GetDepth (std::int32_t nid) const
 Get the depth of a node.
 
std::int32_t NumNodes () const noexcept
 Get the total number of nodes including deleted ones in this tree.
 
std::int32_t NumDeletedNodes () const noexcept
 Get the total number of deleted nodes in this tree.
 
std::int32_t NumValidNodes () const noexcept
 Get the total number of valid nodes in this tree.
 
void SetLeftChild (std::int32_t nid, std::int32_t left_child)
 Identify left child node.
 
void SetRightChild (std::int32_t nid, std::int32_t right_child)
 Identify right child node.
 
void SetChildren (std::int32_t nid, std::int32_t left_child, std::int32_t right_child)
 Identify two child nodes of the node and the corresponding parent node of the child nodes.
 
void SetParent (std::int32_t child_node, std::int32_t parent_node)
 Identify parent node.
 
void SetParents (std::int32_t nid, std::int32_t left_child, std::int32_t right_child)
 Identify parent node of the left and right node ids.
 
void SetNumericSplit (std::int32_t nid, std::int32_t split_index, double threshold)
 Create a numerical split.
 
void SetCategoricalSplit (std::int32_t nid, std::int32_t split_index, std::vector< std::uint32_t > const &category_list)
 Create a categorical split.
 
void SetLeaf (std::int32_t nid, double value)
 Set the leaf value of the node.
 
void SetLeafVector (std::int32_t nid, std::vector< double > const &leaf_vector)
 Set the leaf vector of the node; useful for multi-output trees.
 
void PredictLeafIndexInplace (ForestDataset *dataset, std::vector< int32_t > &output, int32_t offset, int32_t max_leaf)
 Obtain a 0-based leaf index for each observation in a ForestDataset. Internally, trees are stored as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1.
 
void PredictLeafIndexInplace (Eigen::MatrixXd &covariates, std::vector< int32_t > &output, int32_t offset, int32_t max_leaf)
 Obtain a 0-based leaf index for each observation (row) in a covariate matrix. Internally, trees are stored as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in the covariate matrix, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1.
 
void PredictLeafIndexInplace (Eigen::Map< Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &covariates, std::vector< int32_t > &output, int32_t offset, int32_t max_leaf)
 Obtain a 0-based leaf index for each observation (row) in a covariate matrix. Internally, trees are stored as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in the covariate matrix, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1.
 

Detailed Description

Decision tree data structure.

Member Function Documentation

◆ CloneFromTree()

void StochTree::Tree::CloneFromTree ( Tree *  tree)

Copy the structure and parameters of another tree. If the Tree object calling this method already has a non-root tree structure / parameters, this will be erased and replaced with a copy of tree.

Parameters
tree    Tree to be cloned

◆ from_json()

void StochTree::Tree::from_json ( const json &  tree_json)

Load from JSON.

Parameters
tree_json    In-memory JSON object (of type nlohmann::json)

◆ CollapseToLeaf() [1/2]

void StochTree::Tree::CollapseToLeaf ( std::int32_t  nid,
double  value 
)
inline

Collapse an internal node to a leaf node, deleting its children from the tree.

Parameters
nid    Node ID of the new leaf node
value    New leaf value

◆ CollapseToLeaf() [2/2]

void StochTree::Tree::CollapseToLeaf ( std::int32_t  nid,
std::vector< double >  value_vector 
)
inline

Collapse an internal node to a leaf node, deleting its children from the tree.

Parameters
nid    Node ID of the new leaf node
value_vector    New leaf vector value
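
The effect of collapsing a node can be illustrated with a small standalone sketch. `CollapsibleTree` and its members are hypothetical stand-ins and not StochTree's implementation; the sketch assumes that every descendant of the collapsed node is marked as deleted and that -1 denotes a missing child.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical array-based tree; -1 marks "no child".
struct CollapsibleTree {
  std::vector<std::int32_t> left;
  std::vector<std::int32_t> right;
  std::vector<double> value;
  std::vector<bool> deleted;

  bool IsLeaf(std::int32_t nid) const { return left[nid] == -1; }

  // Mark every node in the subtree rooted at nid as deleted.
  void DeleteSubtree(std::int32_t nid) {
    if (!IsLeaf(nid)) {
      DeleteSubtree(left[nid]);
      DeleteSubtree(right[nid]);
      left[nid] = -1;
      right[nid] = -1;
    }
    deleted[nid] = true;
  }

  // Turn nid into a leaf holding new_value, deleting all of its descendants.
  void CollapseToLeaf(std::int32_t nid, double new_value) {
    assert(nid >= 0 && static_cast<std::size_t>(nid) < left.size());
    assert(!deleted[nid]);
    if (!IsLeaf(nid)) {
      DeleteSubtree(left[nid]);
      DeleteSubtree(right[nid]);
      left[nid] = -1;
      right[nid] = -1;
    }
    value[nid] = new_value;
  }
};
```

In this sketch the collapsed node keeps its own ID, so references held by its parent remain valid; only the descendants are flagged as deleted.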

◆ WalkTree()

template<typename Func >
void StochTree::Tree::WalkTree ( Func  func) const
inline

Iterate through all nodes in this tree.

Template Parameters
Func    Function object type; must map std::int32_t to bool.
Parameters
func    Function that accepts a node index and returns false when iteration through a given branch of the tree should stop and true otherwise.
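
The callback contract described above can be sketched with a minimal standalone traversal. `WalkableTree`, `Walk`, and `CollectLeaves` are hypothetical names used only for illustration; the depth-first visiting order and the use of node 0 as the root are assumptions, not documented behavior of `WalkTree`.

```cpp
#include <cassert>
#include <cstdint>
#include <stack>
#include <vector>

// Hypothetical array-based tree: children stored by node ID, -1 marks "no child".
struct WalkableTree {
  std::vector<std::int32_t> left;
  std::vector<std::int32_t> right;

  bool IsLeaf(std::int32_t nid) const { return left[nid] == -1; }

  // Visit nodes depth-first starting at node 0. The callback returns false
  // to skip everything below the current node and true to keep descending.
  template <typename Func>
  void Walk(Func func) const {
    assert(!left.empty());
    std::stack<std::int32_t> pending;
    pending.push(0);
    while (!pending.empty()) {
      std::int32_t nid = pending.top();
      pending.pop();
      if (!func(nid)) continue;  // stop iterating through this branch
      if (!IsLeaf(nid)) {
        pending.push(right[nid]);  // pushed first so the left child is visited first
        pending.push(left[nid]);
      }
    }
  }
};

// Collect the IDs of all leaves by walking the whole tree.
inline std::vector<std::int32_t> CollectLeaves(const WalkableTree& tree) {
  std::vector<std::int32_t> leaves;
  tree.Walk([&](std::int32_t nid) {
    if (tree.IsLeaf(nid)) leaves.push_back(nid);
    return true;  // never prune
  });
  return leaves;
}
```

A callback that returns `false` at an internal node prunes that node's subtree only; siblings already queued are still visited.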

◆ HasVectorOutput()

bool StochTree::Tree::HasVectorOutput ( ) const
inline

Whether or not a tree has vector output.

Getters

◆ Parent()

std::int32_t StochTree::Tree::Parent ( std::int32_t  nid) const
inline

Index of the node's parent.

Parameters
nid    ID of node being queried

◆ LeftChild()

std::int32_t StochTree::Tree::LeftChild ( std::int32_t  nid) const
inline

Index of the node's left child.

Parameters
nid    ID of node being queried

◆ RightChild()

std::int32_t StochTree::Tree::RightChild ( std::int32_t  nid) const
inline

Index of the node's right child.

Parameters
nid    ID of node being queried

◆ DefaultChild()

std::int32_t StochTree::Tree::DefaultChild ( std::int32_t  nid) const
inline

Index of the node's "default" child (potentially used in the case of a missing feature at prediction time)

Parameters
nid    ID of node being queried

◆ SplitIndex()

std::int32_t StochTree::Tree::SplitIndex ( std::int32_t  nid) const
inline

Feature index defining the node's split rule.

Parameters
nid    ID of node being queried

◆ IsLeaf()

bool StochTree::Tree::IsLeaf ( std::int32_t  nid) const
inline

Whether the node is a leaf node.

Parameters
nid    ID of node being queried

◆ IsRoot()

bool StochTree::Tree::IsRoot ( std::int32_t  nid) const
inline

Whether the node is root.

Parameters
nid    ID of node being queried

◆ IsDeleted()

bool StochTree::Tree::IsDeleted ( std::int32_t  nid) const
inline

Whether the node has been deleted.

Parameters
nid    ID of node being queried

◆ LeafValue() [1/2]

double StochTree::Tree::LeafValue ( std::int32_t  nid) const
inline

Get parameter value of a node (typically though not necessarily a leaf node)

Parameters
nid    ID of node being queried

◆ LeafValue() [2/2]

double StochTree::Tree::LeafValue ( std::int32_t  nid,
std::int32_t  dim_id 
) const
inline

Get parameter value of a node (typically though not necessarily a leaf node) at a given output dimension.

Parameters
nid    ID of node being queried
dim_id    Output dimension being queried

◆ LeafVector()

std::vector< double > StochTree::Tree::LeafVector ( std::int32_t  nid) const
inline

Get vector-valued parameters of a node (typically leaf)

Parameters
nid    ID of node being queried

◆ SumSquaredNodeValues()

double StochTree::Tree::SumSquaredNodeValues ( std::int32_t  nid) const
inline

Sum of squared parameter values for a given node (typically though not necessarily a leaf node)

Parameters
nid    ID of node being queried

◆ HasLeafVector()

bool StochTree::Tree::HasLeafVector ( std::int32_t  nid) const
inline

Tests whether the leaf node has a non-empty leaf vector.

Parameters
nid    ID of node being queried

◆ Threshold()

double StochTree::Tree::Threshold ( std::int32_t  nid) const
inline

Get split threshold of the node.

Parameters
nid    ID of node being queried

◆ CategoryList()

std::vector< std::uint32_t > StochTree::Tree::CategoryList ( std::int32_t  nid) const
inline

Get list of all categories belonging to the left child node. Categories are integers ranging from 0 to (n-1), where n is the number of categories in that particular feature. This list is assumed to be in ascending order.

Parameters
nid    ID of node being queried
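
Because the category list is kept in ascending order, membership can be tested with a binary search. The following is a minimal sketch of how such a list might be used to route an observation; `RoutesLeft` is a hypothetical helper, and the rule that listed categories go to the left child is taken from the description above rather than from the library's source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true when `category` appears in the ascending
// list of categories assigned to the left child.
inline bool RoutesLeft(std::uint32_t category,
                       const std::vector<std::uint32_t>& left_categories) {
  // Binary search is only valid on a sorted range.
  assert(std::is_sorted(left_categories.begin(), left_categories.end()));
  return std::binary_search(left_categories.begin(), left_categories.end(),
                            category);
}
```

With a left-child list of {0, 2, 5}, category 2 would be routed left and category 3 right.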

◆ NodeType()

TreeNodeType StochTree::Tree::NodeType ( std::int32_t  nid) const
inline

Get the type of a node (i.e. numeric split, categorical split, leaf)

Parameters
nid    ID of node being queried

◆ IsNumericSplitNode()

bool StochTree::Tree::IsNumericSplitNode ( std::int32_t  nid) const
inline

Whether the node is a numeric split node.

Parameters
nid    ID of node being queried

◆ IsCategoricalSplitNode()

bool StochTree::Tree::IsCategoricalSplitNode ( std::int32_t  nid) const
inline

Whether the node is a categorical split node.

Parameters
nid    ID of node being queried

◆ GetDepth()

std::int32_t StochTree::Tree::GetDepth ( std::int32_t  nid) const
inline

Get the depth of a node.

Parameters
nid    Node ID
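
One common way to compute a node's depth in a tree that stores parent links is to count the steps needed to reach the root. The sketch below assumes the root has depth 0 and that a parent of -1 marks the root; `DepthOf` and the `parent` vector are hypothetical and do not reflect how `GetDepth` is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: parent[i] is the ID of node i's parent, -1 for the root.
inline std::int32_t DepthOf(const std::vector<std::int32_t>& parent,
                            std::int32_t nid) {
  assert(nid >= 0 && static_cast<std::size_t>(nid) < parent.size());
  std::int32_t depth = 0;
  // Follow parent links until the root is reached.
  while (parent[nid] != -1) {
    nid = parent[nid];
    ++depth;
  }
  return depth;
}
```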

◆ SetLeftChild()

void StochTree::Tree::SetLeftChild ( std::int32_t  nid,
std::int32_t  left_child 
)
inline

Identify left child node.

Setters

Parameters
nid    ID of node being modified
left_child    ID of the left child node

◆ SetRightChild()

void StochTree::Tree::SetRightChild ( std::int32_t  nid,
std::int32_t  right_child 
)
inline

Identify right child node.

Parameters
nid    ID of node being modified
right_child    ID of the right child node

◆ SetChildren()

void StochTree::Tree::SetChildren ( std::int32_t  nid,
std::int32_t  left_child,
std::int32_t  right_child 
)
inline

Identify two child nodes of the node and the corresponding parent node of the child nodes.

Parameters
nid    ID of node being modified
left_child    ID of the left child node
right_child    ID of the right child node
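
The description suggests that setting a node's children also records that node as the parent of both children. A minimal sketch of this bookkeeping, using a hypothetical `LinkedTree` struct in which -1 marks a missing link, might look as follows; it is an illustration under that assumption, not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tree storing parent and child links by node ID; -1 means "none".
struct LinkedTree {
  std::vector<std::int32_t> parent;
  std::vector<std::int32_t> left;
  std::vector<std::int32_t> right;

  explicit LinkedTree(std::size_t n)
      : parent(n, -1), left(n, -1), right(n, -1) {}

  void SetParent(std::int32_t child_node, std::int32_t parent_node) {
    parent[child_node] = parent_node;
  }

  // Record both child links on nid and the matching parent link on each child.
  void SetChildren(std::int32_t nid, std::int32_t left_child,
                   std::int32_t right_child) {
    assert(nid >= 0 && left_child >= 0 && right_child >= 0);
    assert(static_cast<std::size_t>(nid) < parent.size());
    assert(static_cast<std::size_t>(left_child) < parent.size());
    assert(static_cast<std::size_t>(right_child) < parent.size());
    left[nid] = left_child;
    right[nid] = right_child;
    SetParent(left_child, nid);
    SetParent(right_child, nid);
  }
};
```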

◆ SetParent()

void StochTree::Tree::SetParent ( std::int32_t  child_node,
std::int32_t  parent_node 
)
inline

Identify parent node.

Parameters
child_node    ID of child node
parent_node    ID of the parent node

◆ SetParents()

void StochTree::Tree::SetParents ( std::int32_t  nid,
std::int32_t  left_child,
std::int32_t  right_child 
)
inline

Identify parent node of the left and right node ids.

Parameters
nid    ID of parent node
left_child    ID of the left child node
right_child    ID of the right child node

◆ SetNumericSplit()

void StochTree::Tree::SetNumericSplit ( std::int32_t  nid,
std::int32_t  split_index,
double  threshold 
)

Create a numerical split.

Parameters
nid    ID of node being updated
split_index    Feature index to split
threshold    Threshold value

◆ SetCategoricalSplit()

void StochTree::Tree::SetCategoricalSplit ( std::int32_t  nid,
std::int32_t  split_index,
std::vector< std::uint32_t > const &  category_list 
)

Create a categorical split.

Parameters
nid    ID of node being updated
split_index    Feature index to split
category_list    List of categories assigned to the left child node, in ascending order (see CategoryList()).

◆ SetLeaf()

void StochTree::Tree::SetLeaf ( std::int32_t  nid,
double  value 
)

Set the leaf value of the node.

Parameters
nid    ID of node being updated
value    Leaf value

◆ SetLeafVector()

void StochTree::Tree::SetLeafVector ( std::int32_t  nid,
std::vector< double > const &  leaf_vector 
)

Set the leaf vector of the node; useful for multi-output trees.

Parameters
nid    ID of node being updated
leaf_vector    Leaf vector

◆ PredictLeafIndexInplace() [1/3]

void StochTree::Tree::PredictLeafIndexInplace ( ForestDataset *  dataset,
std::vector< int32_t > &  output,
int32_t  offset,
int32_t  max_leaf 
)

Obtain a 0-based leaf index for each observation in a ForestDataset. Internally, trees are stored as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in a dataset, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1.

Note: this is a tree-level helper function for an ensemble-level function. It assumes the creation of:

  1. a vector of column indices of size dataset.NumObservations() x ensemble.NumTrees(), stored in "tree-major" order
  2. a running counter of the number of tree-observations already indexed in the ensemble (used as offsets for the leaf number computed and returned here)

Users running this function for a single tree may simply pre-allocate an output vector as std::vector<int32_t> output(dataset->NumObservations()) and set the offset to 0.

Parameters
dataset    Dataset with which to predict leaf indices from the tree
output    Pre-allocated output vector storing a matrix of column indices, with "rows" corresponding to observations in dataset and "columns" corresponding to trees in an ensemble
offset    Bookkeeping index that determines where in the output vector the column indices should be written
max_leaf    Largest leaf value mapped so far. (Leaf indices serve as sparse column indices, so it is important that leaf values be unique to each tree.)
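
The renumbering and offset bookkeeping described above can be sketched as a standalone function. `FillLeafIndices` and its first two parameters are hypothetical: the sketch takes the leaf node ID of each observation as given instead of evaluating a tree, and it assumes that `max_leaf` is added to each renumbered index so that indices stay unique across trees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical helper. leaf_node_of_obs[i] is the node ID of the leaf that
// observation i falls into; leaves lists the node IDs of all leaves in the tree.
inline void FillLeafIndices(const std::vector<std::int32_t>& leaf_node_of_obs,
                            const std::vector<std::int32_t>& leaves,
                            std::vector<std::int32_t>& output,
                            std::int32_t offset, std::int32_t max_leaf) {
  assert(offset >= 0);
  assert(output.size() >=
         static_cast<std::size_t>(offset) + leaf_node_of_obs.size());
  // Renumber leaf node IDs to 0 .. leaves.size() - 1.
  std::map<std::int32_t, std::int32_t> renumber;
  for (std::size_t i = 0; i < leaves.size(); ++i) {
    renumber[leaves[i]] = static_cast<std::int32_t>(i);
  }
  // Write this tree's block of the tree-major output vector.
  for (std::size_t i = 0; i < leaf_node_of_obs.size(); ++i) {
    output[offset + i] = renumber.at(leaf_node_of_obs[i]) + max_leaf;
  }
}
```

An ensemble-level caller would, under these assumptions, advance `offset` by the number of observations and `max_leaf` by the number of leaves after processing each tree.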

◆ PredictLeafIndexInplace() [2/3]

void StochTree::Tree::PredictLeafIndexInplace ( Eigen::MatrixXd &  covariates,
std::vector< int32_t > &  output,
int32_t  offset,
int32_t  max_leaf 
)

Obtain a 0-based leaf index for each observation (row) in a covariate matrix. Internally, trees are stored as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in the covariate matrix, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1.

Note: this is a tree-level helper function for an ensemble-level function. It assumes the creation of:

  1. a vector of column indices of size covariates.rows() x ensemble.NumTrees(), stored in "tree-major" order
  2. a running counter of the number of tree-observations already indexed in the ensemble (used as offsets for the leaf number computed and returned here)

Users running this function for a single tree may simply pre-allocate an output vector as std::vector<int32_t> output(covariates.rows()) and set the offset to 0.

Parameters
covariates    Eigen matrix with which to predict leaf indices
output    Pre-allocated output vector storing a matrix of column indices, with "rows" corresponding to observations in covariates and "columns" corresponding to trees in an ensemble
offset    Bookkeeping index that determines where in the output vector the column indices should be written
max_leaf    Largest leaf value mapped so far. (Leaf indices serve as sparse column indices, so it is important that leaf values be unique to each tree.)

◆ PredictLeafIndexInplace() [3/3]

void StochTree::Tree::PredictLeafIndexInplace ( Eigen::Map< Eigen::Matrix< double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor > > &  covariates,
std::vector< int32_t > &  output,
int32_t  offset,
int32_t  max_leaf 
)

Obtain a 0-based leaf index for each observation (row) in a covariate matrix. Internally, trees are stored as vectors of node information, and the leaves_ vector gives us node IDs for every leaf in the tree. Here, we would like to know, for every observation in the covariate matrix, which leaf number it is mapped to. Since the leaf numbers themselves do not carry any information, we renumber them from 0 to leaves_.size()-1.

Note: this is a tree-level helper function for an ensemble-level function. It assumes the creation of:

  1. a vector of column indices of size covariates.rows() x ensemble.NumTrees(), stored in "tree-major" order
  2. a running counter of the number of tree-observations already indexed in the ensemble (used as offsets for the leaf number computed and returned here)

Users running this function for a single tree may simply pre-allocate an output vector as std::vector<int32_t> output(covariates.rows()) and set the offset to 0.

Parameters
covariates    Eigen matrix map (column-major) with which to predict leaf indices
output    Pre-allocated output vector storing a matrix of column indices, with "rows" corresponding to observations in covariates and "columns" corresponding to trees in an ensemble
offset    Bookkeeping index that determines where in the output vector the column indices should be written
max_leaf    Largest leaf value mapped so far. (Leaf indices serve as sparse column indices, so it is important that leaf values be unique to each tree.)

The documentation for this class was generated from the following file: