kernel.compute_forest_leaf_indices
kernel.compute_forest_leaf_indices(
model_object,
covariates,
forest_type=None,
propensity=None,
forest_inds=None,
)
Compute and return a vector representation of a forest’s leaf predictions for every observation in a dataset.
The vector has a “tree-major” format that can easily be re-represented as a CSR sparse matrix: for a dataset with n observations, the first n elements are the leaf predictions for the first tree in the ensemble, the next n elements are the predictions for the second tree, and so on. The “data” for each element is a column index that maps uniquely to a single leaf of a single tree (e.g. if tree 1 has 3 leaves, its column indices range from 0 to 2, tree 2’s leaf indices begin at 3, and so on).
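The tree-major layout described above can be illustrated with a minimal sketch. The vector `leaf_vec` below is hand-written toy data standing in for the function's output (4 observations, 2 trees); it is not produced by a fitted model, and all variable names are hypothetical. The sketch assumes leaf indices for a single forest sample and derives the CSR components (`data`, `indices`, `indptr`) using NumPy only.

```python
import numpy as np

# Hypothetical toy output for 4 observations and 2 trees:
# tree 0 has 3 leaves (columns 0-2), tree 1 has 2 leaves (columns 3-4).
num_obs, num_trees = 4, 2
leaf_vec = np.array([0, 2, 1, 0,   # tree 0, observations 0..3
                     3, 4, 4, 3])  # tree 1, observations 0..3

# Assumption: the highest-numbered leaf is occupied, so max + 1 is the
# total number of leaf columns. With real output the true count may be larger.
num_leaves = int(leaf_vec.max()) + 1

# Tree-major -> observation-major: row i lists the leaf columns of observation i.
leaf_mat = leaf_vec.reshape(num_trees, num_obs).T

# CSR components: every row holds exactly num_trees nonzero entries.
indices = leaf_mat.ravel()
indptr = np.arange(0, num_obs * num_trees + 1, num_trees)
data = np.ones(indices.size)

# Dense version of the same indicator matrix (rows: observations, columns: leaves).
dense = np.zeros((num_obs, num_leaves))
dense[np.repeat(np.arange(num_obs), num_trees), indices] = 1.0
```

If SciPy is available, the same triplet can be passed to `scipy.sparse.csr_matrix((data, indices, indptr), shape=(num_obs, num_leaves))` to obtain the sparse form directly.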
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| model_object | BARTModel, BCFModel, or ForestContainer | Object corresponding to a BART / BCF model with at least one forest sample, or a low-level ForestContainer object. | required |
| covariates | np.array or pd.DataFrame | Covariates to use for prediction. Must have the same dimensions / column types as the data used to train a forest. | required |
| forest_type | str | Which forest to use from model_object. Valid inputs depend on the model type and on whether a given forest was sampled in that model. BART: 'mean' extracts leaf indices for the mean forest; 'variance' extracts leaf indices for the variance forest. BCF: 'prognostic' extracts leaf indices for the prognostic forest; 'treatment' extracts leaf indices for the treatment effect forest; 'variance' extracts leaf indices for the variance forest. ForestContainer: None; it is not necessary to disambiguate when this function is called directly on a ForestContainer object. This is the default value. | None |
| propensity | np.array | Optional test set propensities. Must be provided if propensities were provided when the model was sampled. | None |
| forest_inds | int or np.ndarray | Indices of the forest sample(s) for which to compute leaf indices. If not provided, this function returns leaf indices for every sample of a forest. This function uses 0-indexing, so the first forest sample corresponds to forest_inds = 0, and so on. | None |
Returns
| Name | Type | Description |
|---|---|---|
| | np.array | NumPy array with dimensions num_obs by num_trees, where num_obs is the number of rows in covariates and num_trees is the number of trees in the relevant forest of model_object. |