kernel.compute_forest_leaf_indices

kernel.compute_forest_leaf_indices(
    model_object,
    covariates,
    forest_type=None,
    propensity=None,
    forest_inds=None,
)

Compute and return a vector representation of a forest’s leaf predictions for every observation in a dataset.

The vector has a “tree-major” format that can be easily re-represented as as a CSR sparse matrix: elements are organized so that the first n elements correspond to leaf predictions for all n observations in a dataset for the first tree in an ensemble, the next n elements correspond to predictions for the second tree and so on. The “data” for each element corresponds to a uniquely mapped column index that corresponds to a single leaf of a single tree (i.e. if tree 1 has 3 leaves, its column indices range from 0 to 2, and then tree 2’s leaf indices begin at 3, etc…).

Parameters

Name	Type	Description	Default
model_object	BARTModel, BCFModel, or ForestContainer	Object corresponding to a BART / BCF model with at least one forest sample, or a low-level `ForestContainer` object.	required
covariates	np.array or pd.DataFrame	Covariates to use for prediction. Must have the same dimensions / column types as the data used to train a forest.	required
forest_type	str	Which forest to use from `model_object`. Valid inputs depend on the model type, and whether or not a given forest was sampled in that model. * BART * `'mean'`: `'mean'`: Extracts leaf indices for the mean forest * `'variance'`: Extracts leaf indices for the variance forest * BCF * `'prognostic'`: Extracts leaf indices for the prognostic forest * `'treatment'`: Extracts leaf indices for the treatment effect forest * `'variance'`: Extracts leaf indices for the variance forest * ForestContainer * `NULL`: It is not necessary to disambiguate when this function is called directly on a `ForestSamples` object. This is the default value of this	`None`
propensity	`np.array`	Optional test set propensities. Must be provided if propensities were provided when the model was sampled.	`None`
forest_inds	int or np.ndarray	Indices of the forest sample(s) for which to compute leaf indices. If not provided, this function will return leaf indices for every sample of a forest. This function uses 0-indexing, so the first forest sample corresponds to `forest_num = 0`, and so on.	`None`

Returns

Name	Type	Description
	Numpy array with dimensions `num_obs` by `num_trees`, where `num_obs` is the number of rows in `covariates` and `num_trees` is the number of trees in the relevant forest of `model_object`.