bart.BARTModel

bart.BARTModel()

Class that handles sampling, storage, and serialization of stochastic forest models for supervised learning. The class takes its name from Bayesian Additive Regression Trees (BART), a model and MCMC sampler originally developed in Chipman, George, McCulloch (2010), but supports several sampling algorithms:

  • MCMC: The “classic” sampler defined in Chipman, George, McCulloch (2010). In order to run the MCMC sampler, set num_gfr = 0 (explained below) and then define a sampler according to several parameters:
    • num_burnin: the number of iterations to run before “retaining” samples for further analysis. These “burned in” samples are helpful for allowing a sampler to converge before retaining samples.
    • num_chains: the number of independent sequences of MCMC samples to generate (typically referred to in the literature as “chains”)
    • num_mcmc: the number of “retained” samples of the posterior distribution
    • keep_every: after a sampler has “burned in”, we run the sampler for keep_every * num_mcmc additional iterations, retaining every keep_every-th draw in a chain.
  • GFR (Grow-From-Root): A fast, greedy approximation of the BART MCMC sampling algorithm introduced in He and Hahn (2021). GFR sampler iterations are governed by the num_gfr parameter, and there are two primary ways to use this sampler:
    • Standalone: setting num_gfr > 0 and both num_burnin = 0 and num_mcmc = 0 will only run and retain GFR samples of the posterior. This is typically referred to as “XBART” (accelerated BART).
    • Initializer for MCMC: setting num_gfr > 0 and num_mcmc > 0 will use ensembles from the GFR algorithm to initialize num_chains independent MCMC BART samplers, which are run for num_mcmc iterations. This is typically referred to as “warm start BART”.
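The three sampler configurations above imply different iteration counts per chain. A minimal plain-Python sketch (no library calls) of the arithmetic, based on the parameter descriptions above:

```python
def iterations_per_chain(num_gfr, num_burnin, num_mcmc, keep_every=1):
    """Iterations run per chain. GFR draws serve either as standalone
    retained samples (num_mcmc == 0) or as chain initializers (num_mcmc > 0)."""
    if num_gfr > 0 and num_burnin == 0 and num_mcmc == 0:
        return num_gfr                           # standalone "XBART"
    return num_burnin + num_mcmc * keep_every    # (warm-started) MCMC

# "Classic" BART MCMC: num_gfr = 0
assert iterations_per_chain(0, 1000, 100, keep_every=5) == 1500
# Standalone XBART: only GFR draws are run and retained
assert iterations_per_chain(10, 0, 0) == 10
# Warm-start BART: GFR initializes each chain, then MCMC runs
assert iterations_per_chain(10, 0, 100) == 100
```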

In addition to enabling multiple samplers, we support a broad set of models. First, note that the original BART model of Chipman, George, McCulloch (2010) is

\[\begin{equation*} \begin{aligned} y &= f(X) + \epsilon\\ f(X) &\sim \text{BART}(\cdot)\\ \epsilon &\sim N(0, \sigma^2)\\ \sigma^2 &\sim IG(\nu, \nu\lambda) \end{aligned} \end{equation*}\]

In words, there is a nonparametric mean function governed by a tree ensemble with a BART prior and an additive (mean-zero) Gaussian error term, whose variance is parameterized with an inverse gamma prior.
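To make the notation concrete, here is a small numpy simulation from this data-generating process, with a simple step function standing in for the unknown f(X); the functional form and all constants are illustrative, not part of the library:

```python
import numpy as np

rng = np.random.default_rng(2024)
n, p = 500, 5
X = rng.uniform(size=(n, p))
f_X = np.where(X[:, 0] > 0.5, 2.0, -2.0)  # stand-in for the unknown f(X)
sigma2 = 0.25                             # homoskedastic error variance
y = f_X + rng.normal(0.0, np.sqrt(sigma2), size=n)
```

A BART sampler is then tasked with recovering f_X and sigma2 from (X, y) alone.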

The BARTModel class supports the following extensions of this model:

  • Leaf Regression: Rather than letting f(X) define a standard decision tree ensemble, in which each tree uses X to partition the data and then serves up constant predictions, we allow for models f(X,Z) in which X and Z together define a partitioned linear model (X partitions the data and Z serves as the basis for regression models). This model can be run by specifying leaf_basis_train in the sample method.
  • Heteroskedasticity: Rather than define \(\epsilon\) parametrically, we can let a forest \(\sigma^2(X)\) model a conditional error variance function. This can be done by setting num_trees > 0 in the variance_forest_params dictionary passed to the sample method.
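Continuing the simulation style above, a sketch of data consistent with these two extensions; the slope and variance functions are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(101)
n = 500
X = rng.uniform(size=(n, 2))
Z = rng.normal(size=(n, 1))                   # leaf regression basis
beta_X = np.where(X[:, 0] > 0.5, 1.5, -0.5)   # partitioned slope on Z
sigma2_X = np.exp(-1.0 + X[:, 1])             # conditional error variance
y = beta_X * Z[:, 0] + rng.normal(0.0, np.sqrt(sigma2_X))
```

Here Z would be passed as leaf_basis_train and the variance function would be targeted by a variance forest.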

Methods

Name Description
sample Runs a BART sampler on the provided training set. Predictions will be cached for the training set and (if provided) the test set.
predict Return predictions from every forest sampled (either / both of mean and variance).
compute_contrast Compute a contrast using a BART model by making two sets of outcome predictions and taking their difference.
compute_posterior_interval Compute posterior credible intervals for specified terms from a fitted BART model. It supports intervals for mean functions, variance functions, random effects, and overall predictions.
sample_posterior_predictive Sample from the posterior predictive distribution for outcomes modeled by BART
to_json Converts a sampled BART model to JSON string representation (which can then be saved to a file or processed using the json library).
from_json Converts a JSON string to an in-memory BART model.
from_json_string_list Convert a list of (in-memory) JSON strings that represent BART models to a single combined BART model object
is_sampled Whether or not a BART model has been sampled.
has_term Whether or not a model includes a term.
extract_parameter Extract a vector, matrix or array of parameter samples from a BART model by name.
summary Summarize a BART fit with a description of the model that was fit and numeric summaries of any sampled quantities

sample

bart.BARTModel.sample(
    X_train,
    y_train,
    leaf_basis_train=None,
    rfx_group_ids_train=None,
    rfx_basis_train=None,
    X_test=None,
    leaf_basis_test=None,
    rfx_group_ids_test=None,
    rfx_basis_test=None,
    num_gfr=5,
    num_burnin=0,
    num_mcmc=100,
    previous_model_json=None,
    previous_model_warmstart_sample_num=None,
    general_params=None,
    mean_forest_params=None,
    variance_forest_params=None,
    random_effects_params=None,
)

Runs a BART sampler on the provided training set. Predictions will be cached for the training set and (if provided) the test set. Does not require a leaf regression basis.

Parameters

Name Type Description Default
X_train np.array Training set covariates on which trees are partitioned. required
y_train np.array Training set outcome. required
leaf_basis_train np.array Optional training set basis vector used to define a regression to be run in the leaves of each tree. None
rfx_group_ids_train np.array Optional group labels used for an additive random effects model. None
rfx_basis_train np.array Optional basis for “random-slope” regression in an additive random effects model. None
X_test np.array Optional test set covariates. None
leaf_basis_test np.array Optional test set basis vector used to define a regression to be run in the leaves of each tree. Must be included / omitted consistently (i.e. if leaf_basis_train is provided, then leaf_basis_test must be provided alongside X_test). None
rfx_group_ids_test np.array Optional test set group labels used for an additive random effects model. We do not currently support (but plan to in the near future) test set evaluation for group labels that were not in the training set. None
rfx_basis_test np.array Optional test set basis for “random-slope” regression in additive random effects model. None
num_gfr int Number of “warm-start” iterations run using the grow-from-root algorithm (He and Hahn, 2021). Defaults to 5. 5
num_burnin int Number of “burn-in” iterations of the MCMC sampler. Defaults to 0. Ignored if num_gfr > 0. 0
num_mcmc int Number of “retained” iterations of the MCMC sampler. Defaults to 100. If this is set to 0, GFR (XBART) samples will be retained. 100
general_params dict Dictionary of general model parameters, each of which has a default value processed internally. See Notes for supported keys. None
mean_forest_params dict Dictionary of mean forest model parameters, each of which has a default value processed internally. See Notes for supported keys. None
variance_forest_params dict Dictionary of variance forest model parameters, each of which has a default value processed internally. See Notes for supported keys. None
random_effects_params dict Dictionary of random effects parameters, each of which has a default value processed internally. See Notes for supported keys. None
previous_model_json str JSON string containing a previous BART model. This can be used to “continue” a sampler interactively after inspecting the samples or to run parallel chains “warm-started” from existing forest samples. Defaults to None. None
previous_model_warmstart_sample_num int Sample number from previous_model_json that will be used to warm-start this BART sampler. Zero-indexed (so the first sample is used for warm-start by setting previous_model_warmstart_sample_num = 0). Defaults to None. If num_chains in the general_params dictionary is > 1, each successive chain will be initialized from a different sample, counting backwards from previous_model_warmstart_sample_num. That is, if previous_model_warmstart_sample_num = 10 and num_chains = 4, then chain 1 will be initialized from sample 10, chain 2 from sample 9, chain 3 from sample 8, and chain 4 from sample 7. If previous_model_json is provided but previous_model_warmstart_sample_num is None, the last sample in the previous model will be used to initialize the first chain, counting backwards as before. If more chains are requested than there are samples in previous_model_json, a warning will be raised and only the last sample will be used. None
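The backwards-counting chain-initialization rule described for previous_model_warmstart_sample_num can be sketched in plain Python (an illustration of the rule, not library code):

```python
def warmstart_init_samples(start_sample_num, num_chains):
    """Zero-indexed samples of the previous model used to initialize each
    chain, counting backwards from start_sample_num."""
    return [start_sample_num - chain for chain in range(num_chains)]

# previous_model_warmstart_sample_num = 10, num_chains = 4
assert warmstart_init_samples(10, 4) == [10, 9, 8, 7]
```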

Returns

Name Type Description
self BARTModel Sampled BART Model.

Notes

general_params keys

  • cutpoint_grid_size (int): Maximum number of cutpoints to consider for each feature. Defaults to 100.
  • standardize (bool): Whether or not to standardize the outcome (and store the offset / scale in the model object). Defaults to True.
  • sample_sigma2_global (bool): Whether or not to update the sigma^2 global error variance parameter based on IG(sigma2_global_shape, sigma2_global_scale). Defaults to True.
  • sigma2_init (float): Starting value of global variance parameter. Set internally to the outcome variance (standardized if standardize = True) if not set here.
  • sigma2_global_shape (float): Shape parameter in the IG(sigma2_global_shape, sigma2_global_scale) global error variance model. Defaults to 0.
  • sigma2_global_scale (float): Scale parameter in the IG(sigma2_global_shape, sigma2_global_scale) global error variance model. Defaults to 0.
  • variable_weights (np.array): Numeric weights reflecting the relative probability of splitting on each variable. Does not need to sum to 1 but cannot be negative. Defaults to uniform over the columns of X_train if not provided.
  • random_seed (int): Integer parameterizing the C++ random number generator. If not specified, the C++ random number generator is seeded according to std::random_device.
  • keep_burnin (bool): Whether or not “burnin” samples should be included in predictions. Defaults to False. Ignored if num_mcmc == 0.
  • keep_gfr (bool): Whether or not “warm-start” / grow-from-root samples should be included in predictions. Defaults to False. Ignored if num_mcmc == 0.
  • keep_every (int): How many iterations of the burned-in MCMC sampler should be run before forests and parameters are retained. Defaults to 1. Setting keep_every = k for some k > 1 will “thin” the MCMC samples by retaining every k-th sample, rather than every sample. This can reduce the autocorrelation of the MCMC samples.
  • num_chains (int): How many independent MCMC chains should be sampled. If num_mcmc = 0, this is ignored. If num_gfr = 0, each chain is run from root for num_mcmc * keep_every + num_burnin iterations, with num_mcmc samples retained. If num_gfr > 0, each MCMC chain will be initialized from a separate GFR ensemble, with the requirement that num_gfr >= num_chains. Defaults to 1. When num_chains > 1, samples from all chains are stored consecutively (chain 1 first, then chain 2, etc.). See the multi-chain vignettes for details.
  • outcome_model (stochtree.OutcomeModel): An object of class OutcomeModel specifying the outcome model. Default: OutcomeModel(outcome="continuous", link="identity"). Pre-empts the deprecated probit_outcome_model parameter if specified.
  • probit_outcome_model (bool): Deprecated in favor of outcome_model. Whether or not the outcome should be modeled as explicitly binary via a probit link. If True, y must only contain the values 0 and 1. Default: False.
  • num_threads (int): Number of threads to use in the GFR and MCMC algorithms, as well as prediction. Defaults to 1 if OpenMP is unavailable, otherwise to the maximum number of available threads.
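A hedged example of setting a few of these keys; the values shown are illustrative, not recommendations, and the commented call is hypothetical usage:

```python
general_params = {
    "standardize": True,          # center/scale the outcome internally
    "sample_sigma2_global": True, # update sigma^2 via its IG prior
    "keep_every": 5,              # thin: retain every 5th post-burn-in draw
    "num_chains": 2,              # requires num_gfr >= 2 when warm-starting
    "random_seed": 2024,          # seeds the C++ RNG
}
# model.sample(X_train, y_train, general_params=general_params)  # hypothetical
```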

mean_forest_params keys

  • num_trees (int): Number of trees in the conditional mean model. Defaults to 200. If num_trees = 0, the conditional mean will not be modeled using a forest and sampling will only proceed if num_trees > 0 for the variance forest.
  • alpha (float): Prior probability of splitting for a tree of depth 0 in the conditional mean model. Tree split prior combines alpha and beta via alpha*(1+node_depth)^-beta. Defaults to 0.95.
  • beta (float): Exponent that decreases split probabilities for nodes of depth > 0 in the conditional mean model. Tree split prior combines alpha and beta via alpha*(1+node_depth)^-beta. Defaults to 2.
  • min_samples_leaf (int): Minimum allowable size of a leaf, in terms of training samples, in the conditional mean model. Defaults to 5.
  • max_depth (int): Maximum depth of any tree in the ensemble in the conditional mean model. Defaults to 10. Can be overridden with -1 to impose no depth limit.
  • sample_sigma2_leaf (bool): Whether or not to update the tau leaf scale variance parameter based on IG(sigma2_leaf_shape, sigma2_leaf_scale). Cannot currently be set to True if leaf_basis_train has more than one column. Defaults to False.
  • sigma2_leaf_init (float): Starting value of leaf node scale parameter. Calibrated internally as 1/num_trees if not set here.
  • sigma2_leaf_shape (float): Shape parameter in the IG(sigma2_leaf_shape, sigma2_leaf_scale) leaf node parameter variance model. Defaults to 3.
  • sigma2_leaf_scale (float): Scale parameter in the IG(sigma2_leaf_shape, sigma2_leaf_scale) leaf node parameter variance model. Calibrated internally as 0.5/num_trees if not set here.
  • keep_vars (list or np.array): Variable names or column indices to include in the mean forest. Defaults to None.
  • drop_vars (list or np.array): Variable names or column indices to exclude from the mean forest. Defaults to None. Ignored if keep_vars is also set.
  • num_features_subsample (int): How many features to subsample when growing each tree for the GFR algorithm. Defaults to the number of features in the training dataset.
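The tree split prior referenced in the alpha and beta entries above is a simple function of node depth; a stdlib sketch:

```python
def split_prob(depth, alpha=0.95, beta=2.0):
    """Prior probability that a node at the given depth splits:
    alpha * (1 + depth) ** -beta, per the parameter descriptions above."""
    return alpha * (1.0 + depth) ** (-beta)

assert split_prob(0) == 0.95                   # root node splits with prob alpha
assert abs(split_prob(1) - 0.2375) < 1e-12     # deeper nodes split less often
```

Larger beta shrinks deep splits more aggressively, favoring shallow trees.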

variance_forest_params keys

  • num_trees (int): Number of trees in the conditional variance model. Defaults to 0. Variance is only modeled using a forest if num_trees > 0.
  • alpha (float): Prior probability of splitting for a tree of depth 0 in the conditional variance model. Tree split prior combines alpha and beta via alpha*(1+node_depth)^-beta. Defaults to 0.95.
  • beta (float): Exponent that decreases split probabilities for nodes of depth > 0 in the conditional variance model. Tree split prior combines alpha and beta via alpha*(1+node_depth)^-beta. Defaults to 2.
  • min_samples_leaf (int): Minimum allowable size of a leaf, in terms of training samples, in the conditional variance model. Defaults to 5.
  • max_depth (int): Maximum depth of any tree in the ensemble in the conditional variance model. Defaults to 10. Can be overridden with -1 to impose no depth limit.
  • leaf_prior_calibration_param (float): Hyperparameter used to calibrate the IG(var_forest_prior_shape, var_forest_prior_scale) conditional error variance model. Used to set var_forest_prior_shape = num_trees / leaf_prior_calibration_param^2 + 0.5 and var_forest_prior_scale = num_trees / leaf_prior_calibration_param^2 when those are not set directly. Defaults to 1.5.
  • var_forest_leaf_init (float): Starting value of root forest prediction in the heteroskedastic error variance model. Calibrated internally as np.log(0.6*np.var(y_train))/num_trees_variance if not set.
  • var_forest_prior_shape (float): Shape parameter in the IG(var_forest_prior_shape, var_forest_prior_scale) conditional error variance forest (only sampled if num_trees > 0). Calibrated internally as num_trees / leaf_prior_calibration_param^2 + 0.5 if not set here.
  • var_forest_prior_scale (float): Scale parameter in the IG(var_forest_prior_shape, var_forest_prior_scale) conditional error variance forest (only sampled if num_trees > 0). Calibrated internally as num_trees / leaf_prior_calibration_param^2 if not set here.
  • keep_vars (list or np.array): Variable names or column indices to include in the variance forest. Defaults to None.
  • drop_vars (list or np.array): Variable names or column indices to exclude from the variance forest. Defaults to None. Ignored if keep_vars is also set.
  • num_features_subsample (int): How many features to subsample when growing each tree for the GFR algorithm. Defaults to the number of features in the training dataset.
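The default calibration of the variance forest's IG prior, as described in the leaf_prior_calibration_param entry above, is a small piece of arithmetic:

```python
def var_forest_prior(num_trees, leaf_prior_calibration_param=1.5):
    """Default var_forest_prior_shape and var_forest_prior_scale when
    they are not set directly."""
    a2 = leaf_prior_calibration_param ** 2
    shape = num_trees / a2 + 0.5
    scale = num_trees / a2
    return shape, scale

shape, scale = var_forest_prior(num_trees=50)
assert abs(shape - (50 / 2.25 + 0.5)) < 1e-12
assert abs(scale - 50 / 2.25) < 1e-12
```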

random_effects_params keys

  • model_spec (str): Specification of the random effects model. Options are "custom", "intercept_only", and "intercept_plus_treatment". If "custom", a user-provided basis must be passed through rfx_basis_train. If "intercept_only", a basis of all ones is dispatched internally. Default: "custom". If "intercept_only" is set, rfx_basis_train and rfx_basis_test are ignored.
  • working_parameter_prior_mean: Prior mean for the random effects “working parameter”. Default: None. Must be a 1D numpy array matching the number of random effects bases, or a scalar expanded to a vector.
  • group_parameter_prior_mean: Prior mean for the random effects “group parameters”. Default: None. Must be a 1D numpy array matching the number of random effects bases, or a scalar expanded to a vector.
  • working_parameter_prior_cov: Prior covariance matrix for the random effects “working parameter”. Default: None. Must be a square numpy matrix matching the number of random effects bases, or a scalar expanded to a diagonal matrix.
  • group_parameter_prior_cov: Prior covariance matrix for the random effects “group parameters”. Default: None. Must be a square numpy matrix matching the number of random effects bases, or a scalar expanded to a diagonal matrix.
  • variance_prior_shape (float): Shape parameter for the inverse-gamma prior on the variance of the random effects “group parameter”. Default: 1.
  • variance_prior_scale (float): Scale parameter for the inverse-gamma prior on the variance of the random effects “group parameter”. Default: 1.
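For model_spec = "intercept_only", the internally dispatched basis is simply a column of ones; a numpy illustration of the shapes involved (the group labels are made up for illustration):

```python
import numpy as np

rfx_group_ids = np.array([0, 0, 1, 1, 2])          # one label per training row
rfx_basis = np.ones((rfx_group_ids.shape[0], 1))   # what "intercept_only" dispatches
assert rfx_basis.shape == (5, 1)
```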

predict

bart.BARTModel.predict(
    X,
    leaf_basis=None,
    rfx_group_ids=None,
    rfx_basis=None,
    type='posterior',
    terms='all',
    scale='linear',
)

Return predictions from every forest sampled (either / both of mean and variance). The return value is a dict of prediction arrays keyed by term, or a single array if only one term is requested.

Parameters

Name Type Description Default
X np.array Test set covariates. required
leaf_basis np.array Optional test set basis vector, must be provided if the model was trained with a leaf regression basis. None
rfx_group_ids np.array Optional group labels used for an additive random effects model. None
rfx_basis np.array Optional basis for “random-slope” regression in an additive random effects model. None
type str Type of prediction to return. Options are “mean”, which averages the predictions from every draw of a BART model, and “posterior”, which returns the entire matrix of posterior predictions. Default: “posterior”. 'posterior'
terms str Which model terms to include in the prediction. This can be a single term or a list of model terms. Options include “y_hat”, “mean_forest”, “rfx”, “variance_forest”, or “all”. If a model doesn’t have mean forest, random effects, or variance forest predictions, but one of those terms is requested, the request will simply be ignored. If none of the requested terms are present in a model, this function will return None along with a warning. Default: “all”. 'all'
scale str Scale of mean function predictions. Options are “linear”, which returns predictions on the original scale of the mean forest / RFX terms, “probability”, which transforms predictions into category probabilities, and “class”, which returns the predicted class label. “probability” and “class” are only valid for models fit with a probit or cloglog outcome model. Default: “linear”. 'linear'

Returns

Name Type Description
dict or np.array Dict of numpy arrays for each prediction term, or a single numpy array if only one term is requested.
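For the “probability” and “class” scales under a probit outcome model, mean-function draws on the linear scale are mapped through the standard normal CDF. A stdlib sketch; the 0.5 classification threshold here is an assumption for illustration, not a documented detail:

```python
from math import erf, sqrt

def probit_prob(z):
    """Standard normal CDF: P(y = 1 | x) = Phi(f(x)) under a probit link."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

linear_preds = [-1.0, 0.0, 1.0]            # mean-function draws, linear scale
probs = [probit_prob(z) for z in linear_preds]
labels = [int(p > 0.5) for p in probs]     # illustrative "class" thresholding
assert abs(probs[1] - 0.5) < 1e-12
assert labels == [0, 0, 1]
```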

compute_contrast

bart.BARTModel.compute_contrast(
    X_0,
    X_1,
    leaf_basis_0=None,
    leaf_basis_1=None,
    rfx_group_ids_0=None,
    rfx_group_ids_1=None,
    rfx_basis_0=None,
    rfx_basis_1=None,
    type='posterior',
    scale='linear',
)

Compute a contrast using a BART model by making two sets of outcome predictions and taking their difference. This function provides the flexibility to compute any contrast of interest by specifying covariates, leaf bases, and random effects bases / IDs for both sides of a two-term contrast. For simplicity, we refer to the subtrahend of the contrast as the “control” or Y0 term and the minuend as the Y1 term, though the requested contrast need not match the “control vs treatment” terminology of a classic two-treatment causal inference problem. We mirror the calls and terminology of the predict method, labeling each prediction data term with a 1 to denote its contribution to the treatment prediction of a contrast and a 0 to denote inclusion in the control prediction.

Parameters

Name Type Description Default
X_0 np.array or pd.DataFrame Covariates used for prediction in the “control” case. Must be a numpy array or dataframe. required
X_1 np.array or pd.DataFrame Covariates used for prediction in the “treatment” case. Must be a numpy array or dataframe. required
leaf_basis_0 np.array Bases used for prediction in the “control” case (by e.g. dot product with leaf values). None
leaf_basis_1 np.array Bases used for prediction in the “treatment” case (by e.g. dot product with leaf values). None
rfx_group_ids_0 np.array Test set group labels used for prediction from an additive random effects model in the “control” case. We do not currently support (but plan to in the near future) test set evaluation for group labels that were not in the training set. Must be a numpy array. None
rfx_group_ids_1 np.array Test set group labels used for prediction from an additive random effects model in the “treatment” case. We do not currently support (but plan to in the near future) test set evaluation for group labels that were not in the training set. Must be a numpy array. None
rfx_basis_0 np.array Test set basis used for prediction from an additive random effects model in the “control” case. None
rfx_basis_1 np.array Test set basis used for prediction from an additive random effects model in the “treatment” case. None
type str Aggregation level of the contrast. Options are “mean”, which averages the contrast evaluations over every draw of a BART model, and “posterior”, which returns the entire matrix of posterior contrast estimates. Default: “posterior”. 'posterior'
scale str Scale of the contrast. Options are “linear”, which returns predictions on the original scale of the mean forest / RFX terms, and “probability”. scale = "probability" is only valid for models fit with a probit / cloglog link on binary or ordinal outcomes. For binary outcome models, scale = "probability" will return a contrast over the probability that y == 1. For ordinal outcome models, scale = "probability" will return contrasts over the “survival function” P(y > k) for k = 1, 2, ..., K-1 where K is the total number of categories. Default: “linear”. 'linear'

Returns

Name Type Description
np.array Array of contrast evaluations, 1d if type = "mean" or 2d if type = "posterior".
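The contrast computation reduces to a difference of the two prediction sets, aggregated per the type argument. A numpy sketch, assuming (for illustration) predictions arranged as (n_observations, n_samples) arrays:

```python
import numpy as np

# Posterior mean-function draws for each side of the contrast
pred_1 = np.array([[1.0, 1.2], [2.0, 2.2]])   # "treatment" / Y1 predictions
pred_0 = np.array([[0.5, 0.4], [1.0, 1.1]])   # "control" / Y0 predictions

contrast_posterior = pred_1 - pred_0               # type = "posterior"
contrast_mean = contrast_posterior.mean(axis=1)    # type = "mean"
assert contrast_posterior.shape == (2, 2)
assert np.allclose(contrast_mean, [0.65, 1.05])
```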

compute_posterior_interval

bart.BARTModel.compute_posterior_interval(
    X=None,
    leaf_basis=None,
    rfx_group_ids=None,
    rfx_basis=None,
    terms='all',
    level=0.95,
    scale='linear',
)

Compute posterior credible intervals for specified terms from a fitted BART model. It supports intervals for mean functions, variance functions, random effects, and overall predictions.

Parameters

Name Type Description Default
X np.array Optional array or data frame of covariates at which to compute the intervals. Required if the requested term depends on covariates (e.g., mean forest, variance forest, or overall predictions). None
leaf_basis np.array Optional array of basis function evaluations for mean forest models with regression defined in the leaves. Required for “leaf regression” models. None
rfx_group_ids np.array Optional vector of group IDs for random effects. Required if the requested term includes random effects. None
rfx_basis np.array Optional matrix of basis function evaluations for random effects. Required if the requested term includes random effects. None
terms str Character string specifying the model term(s) for which to compute intervals. Options for BART models are "mean_forest", "variance_forest", "rfx", "y_hat", or "all". Defaults to "all". 'all'
scale str Scale of mean function predictions. Options are “linear”, which returns predictions on the original scale of the mean forest / RFX terms, and “probability”. scale = "probability" is only valid for models fit with a probit / cloglog link on binary or ordinal outcomes. For binary outcome models, scale = "probability" will return an interval over the probability that y == 1. For ordinal outcome models, scale = "probability" will return intervals over the “survival function” P(y > k) for k = 1, 2, ..., K-1 where K is the total number of categories. Defaults to "linear". 'linear'
level float A numeric value between 0 and 1 specifying the credible interval level. Defaults to 0.95 for a 95% credible interval. 0.95

Returns

Name Type Description
dict A dict containing the lower and upper bounds of the credible interval for the specified term. If multiple terms are requested, a dict with intervals for each term is returned.
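A credible interval at a given level is computed from equal-tail quantiles of the posterior draws; a numpy sketch of that computation, with simulated draws standing in for a fitted model's samples:

```python
import numpy as np

rng = np.random.default_rng(7)
draws = rng.normal(size=(3, 1000))   # (n_observations, n_samples) posterior draws
level = 0.95
alpha = 1.0 - level
lower = np.quantile(draws, alpha / 2.0, axis=1)        # 2.5th percentile
upper = np.quantile(draws, 1.0 - alpha / 2.0, axis=1)  # 97.5th percentile
assert lower.shape == (3,) and (lower < upper).all()
```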

sample_posterior_predictive

bart.BARTModel.sample_posterior_predictive(
    X=None,
    leaf_basis=None,
    rfx_group_ids=None,
    rfx_basis=None,
    num_draws_per_sample=None,
)

Sample from the posterior predictive distribution for outcomes modeled by BART

Parameters

Name Type Description Default
X np.array An array or data frame of covariates at which to compute the intervals. Required if the BART model depends on covariates (e.g., contains a mean or variance forest). None
leaf_basis np.array An array of basis function evaluations for mean forest models with regression defined in the leaves. Required for “leaf regression” models. None
rfx_group_ids np.array An array of group IDs for random effects. Required if the BART model includes random effects. None
rfx_basis np.array An array of basis function evaluations for random effects. Required if the BART model includes random effects. None
num_draws_per_sample int The number of posterior predictive samples to draw for each posterior sample. Defaults to a heuristic based on the number of samples in a BART model (i.e. if the BART model has >1000 draws, we use 1 draw from the likelihood per sample, otherwise we upsample to ensure intervals are based on at least 1000 posterior predictive draws). None

Returns

Name Type Description
np.array A matrix of posterior predictive samples.
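The default num_draws_per_sample heuristic described above can be sketched as follows; the exact rounding and boundary rule here is an assumption:

```python
import math

def default_draws_per_sample(num_posterior_samples, target=1000):
    """1 likelihood draw per posterior sample when there are already enough
    samples; otherwise upsample so intervals rest on >= target total draws."""
    if num_posterior_samples >= target:
        return 1
    return math.ceil(target / num_posterior_samples)

assert default_draws_per_sample(2000) == 1
assert default_draws_per_sample(100) == 10
```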

to_json

bart.BARTModel.to_json()

Converts a sampled BART model to JSON string representation (which can then be saved to a file or processed using the json library)

Returns

Name Type Description
str JSON string representing model metadata (hyperparameters), sampled parameters, and sampled forests

from_json

bart.BARTModel.from_json(json_string)

Converts a JSON string to an in-memory BART model.

Parameters

Name Type Description Default
json_string str JSON string representing model metadata (hyperparameters), sampled parameters, and sampled forests required

from_json_string_list

bart.BARTModel.from_json_string_list(json_string_list)

Convert a list of (in-memory) JSON strings that represent BART models to a single combined BART model object which can be used for prediction, etc…

Parameters

Name Type Description Default
json_string_list list of str List of JSON strings, each parseable to an object of type JSONSerializer containing the JSON representation of a BART model required

is_sampled

bart.BARTModel.is_sampled()

Whether or not a BART model has been sampled.

Returns

Name Type Description
bool True if a BART model has been sampled, False otherwise

has_term

bart.BARTModel.has_term(term)

Whether or not a model includes a term.

Parameters

Name Type Description Default
term str Character string specifying the model term to check for. Options for BART models are "mean_forest", "variance_forest", "rfx", "y_hat", or "all". required

Returns

Name Type Description
bool True if the model includes the specified term, False otherwise

extract_parameter

bart.BARTModel.extract_parameter(term)

Extract a vector, matrix or array of parameter samples from a BART model by name. Random effects are handled by a separate extract_parameter_samples method attached to the underlying RandomEffectsContainer object due to the complexity of the random effects parameters. If the requested model term is not found, an error is thrown. The following conventions are used for parameter names:

  • Global error variance: "sigma2", "global_error_scale", "sigma2_global"
  • Leaf scale: "sigma2_leaf", "leaf_scale"
  • In-sample mean function predictions: "y_hat_train"
  • Test set mean function predictions: "y_hat_test"
  • In-sample variance forest predictions: "sigma2_x_train", "var_x_train"
  • Test set variance forest predictions: "sigma2_x_test", "var_x_test"
  • Ordinal model cutpoints (valid only for ordinal cloglog models): "cloglog_cutpoints", "cutpoints"

Parameters

Name Type Description Default
term str Name of the parameter to extract (e.g., "sigma2", "y_hat_train", etc.) required

Returns

Name Type Description
np.array Array of parameter samples. If the underlying parameter is a scalar, this will be a vector of length num_samples. If the underlying parameter is vector-valued, this will be a (parameter_dimension x num_samples) matrix, and if the underlying parameter is multidimensional, this will be an array of dimension (parameter_dimension_1 x parameter_dimension_2 x … x num_samples).

summary

bart.BARTModel.summary()

Summarize a BART fit with a description of the model that was fit and numeric summaries of any sampled quantities

Prints summary directly to the console with no return type.

Returns

Name Type Description
None