Utilities API#

stochtree.sampler.RNG #

Wrapper around the C++ standard library random number generator. Accepts an optional random seed at initialization for replicability.


Name Type Description Default
random_seed int

Random seed for replicability. If not specified, the default value of -1 triggers an initialization of the RNG based on std::random_device.
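The seed convention described above can be illustrated with Python's standard library. This is only an analogy and not stochtree's implementation: `make_rng` is a hypothetical helper, and `random.SystemRandom` stands in for `std::random_device`.

```python
import random


def make_rng(random_seed: int = -1) -> random.Random:
    """Hypothetical helper mirroring the documented seed convention."""
    if random_seed == -1:
        # No seed supplied: seed the generator from operating-system entropy
        # (stand-in for std::random_device in the C++ layer).
        return random.Random(random.SystemRandom().getrandbits(64))
    # Explicit seed: every generator built with it yields the same stream.
    return random.Random(random_seed)


first = make_rng(2024)
second = make_rng(2024)
same_draws = [first.random() for _ in range(3)] == [second.random() for _ in range(3)]
```

Two generators built from the same explicit seed produce identical draws, which is the replicability property the `random_seed` argument is meant to provide.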


stochtree.preprocessing.CovariateTransformer #

Class that transforms covariates to a format that can be used to define tree splits. Modeled after the scikit-learn preprocessing classes.

fit(covariates) #

Fits a CovariateTransformer by unpacking (and storing) data type information on the input (raw) covariates and then converting to a numpy array which can be passed to a tree ensemble sampler.

If covariates is a pd.DataFrame, column dtypes will be handled as follows:

  • category: one-hot encoded if unordered, ordinal encoded if ordered
  • string: one-hot encoded
  • boolean: passed through as binary integer, treated as ordered categorical by tree samplers
  • integer (e.g. Int8, Int16, etc.): passed through as double (note: if you have categorical data stored as integers, you should explicitly convert it to categorical in pandas, see this user guide)
  • float (i.e. Float32, Float64): passed through as double
  • object: currently unsupported, convert object columns to numeric or categorical before passing
  • Datetime (i.e. datetime64): currently unsupported, though datetime columns can be converted to numeric features, see here
  • Period (i.e. period[<freq>]): currently unsupported, though period columns can be converted to numeric features, see here
  • Interval (i.e. interval, Interval[datetime64[ns]]): currently unsupported, though interval columns can be converted to numeric or categorical features, see here
  • Sparse (i.e. Sparse, Sparse[float]): currently unsupported, convert sparse columns to dense before passing

Columns with unsupported types will be ignored, with a warning.

If covariates is a np.array, columns must be numeric and the only preprocessing done is to auto-detect binary columns. All other integer-valued columns will be passed through to the tree sampler as (continuous) numeric data. If you would like to treat integer-valued data as categorical, you can either convert your numpy array to a pandas dataframe and explicitly tag such columns as ordered / unordered categorical, or preprocess manually using sklearn.preprocessing.OneHotEncoder and sklearn.preprocessing.OrdinalEncoder.


Name Type Description Default
covariates array or DataFrame

Covariates to be preprocessed.
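The encodings named in the fit() description (ordinal for ordered categories, one-hot for unordered categories and strings, and binary-column detection for numeric arrays) can be sketched in plain Python. The three helper functions below are hypothetical and purely illustrative. How CovariateTransformer orders its output columns or treats categories unseen during fitting is not specified here, so this sketch makes no claim about either.

```python
def ordinal_encode(values, ordered_levels):
    """Map each value to its position in an ordered list of levels."""
    rank = {level: position for position, level in enumerate(ordered_levels)}
    return [rank[value] for value in values]


def one_hot_encode(values, levels):
    """Produce one 0/1 indicator column per level for each value."""
    return [[1 if value == level else 0 for level in levels] for value in values]


def is_binary_column(column):
    """Check whether a numeric column contains only the values 0 and 1."""
    return set(column) <= {0, 1}


sizes = ordinal_encode(["small", "large", "medium"], ["small", "medium", "large"])
colors = one_hot_encode(["red", "blue", "red"], ["blue", "red"])
flags = [is_binary_column([0, 1, 1, 0]), is_binary_column([0, 1, 2])]
```

An ordered category stays a single column whose integer values preserve the ordering, whereas an unordered category with k levels expands into k indicator columns.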


transform(covariates) #

Runs a fitted CovariateTransformer on a new covariate set, returning a numpy array of covariates preprocessed into the format needed to sample or predict from a stochtree ensemble.


Name Type Description Default
covariates array or DataFrame

Covariates to be preprocessed.



Type Description

Numpy array of preprocessed covariates, with as many rows as in covariates and as many columns as were created during pre-processing (including one-hot encoding categorical features).

fit_transform(covariates) #

Runs the fit() and transform() methods in sequence.


Name Type Description Default
covariates array or DataFrame

Covariates to be preprocessed.



Type Description

Numpy array of preprocessed covariates, with as many rows as in covariates and as many columns as were created during pre-processing (including one-hot encoding categorical features).

fetch_original_feature_indices() #

Map features in a preprocessed covariate set back to the original set of features provided to a CovariateTransformer.


Type Description

List with as many entries as features in the preprocessed results returned by a fitted CovariateTransformer. Each element is a feature index indicating the feature from which a given preprocessed feature was generated. If a single categorical feature were one-hot encoded into 5 binary features, this method would return a list [0,0,0,0,0]. If the transformer merely passes through k numeric features, this method would return a list [0,...,k-1].
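The mapping described above can be reproduced with a few lines of plain Python. The function `original_feature_indices` and its `output_widths` argument (the number of preprocessed columns generated by each original feature) are hypothetical names used only for illustration. This is a sketch of the bookkeeping, not the library's code.

```python
def original_feature_indices(output_widths):
    """Repeat each original feature index once per column it generated."""
    indices = []
    for original_index, width in enumerate(output_widths):
        indices.extend([original_index] * width)
    return indices


one_hot_only = original_feature_indices([5])        # one categorical feature, 5 levels
passthrough = original_feature_indices([1, 1, 1])   # three numeric features
mixed = original_feature_indices([1, 3, 1])         # numeric, 3-level categorical, numeric
```

In the mixed case, the second original feature accounts for three consecutive preprocessed columns, so its index appears three times.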

stochtree.serialization.JSONSerializer #

Class that handles serialization and deserialization of stochastic forest models

return_json_string() #

Convert JSON object to in-memory string


Type Description

JSON string representing model metadata (hyperparameters), sampled parameters, and sampled forests

load_from_json_string(json_string) #

Parse in-memory JSON string to JsonCpp object


Name Type Description Default
json_string str

JSON string representing model metadata (hyperparameters), sampled parameters, and sampled forests
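The string round trip performed by return_json_string() and load_from_json_string() is analogous to dumping and parsing with Python's standard `json` module. The sketch below uses only the standard library. The field names in `model_state` are invented for illustration and do not reflect the keys stochtree actually writes.

```python
import json

# Hypothetical model state; real stochtree JSON uses its own field names.
model_state = {
    "num_samples": 100,
    "outcome_scale": 1.5,
    "variable_names": ["x1", "x2"],
}

# Analogous to return_json_string(): JSON object -> in-memory string.
json_string = json.dumps(model_state)

# Analogous to load_from_json_string(): in-memory string -> JSON object.
restored = json.loads(json_string)
```

Because the serialized form is an ordinary string, it can be written to disk, stored in a database, or sent over a network before being parsed again.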


add_forest(forest_samples) #

Adds a container of forest samples to a json object


Name Type Description Default
forest_samples ForestContainer

Samples of a tree ensemble


add_scalar(field_name, field_value, subfolder_name=None) #

Adds a scalar (numeric) value to a json object


Name Type Description Default
field_name str

Name of the json field / label under which the numeric value will be stored

field_value float

Numeric value to be stored

subfolder_name str

Name of "subfolder" under which field_name will be stored in the json hierarchy


add_boolean(field_name, field_value, subfolder_name=None) #

Adds a scalar (boolean) value to a json object


Name Type Description Default
field_name str

Name of the json field / label under which the boolean value will be stored

field_value bool

Boolean value to be stored

subfolder_name str

Name of "subfolder" under which field_name will be stored in the json hierarchy


add_string(field_name, field_value, subfolder_name=None) #

Adds a string to a json object


Name Type Description Default
field_name str

Name of the json field / label under which the string value will be stored

field_value str

String field to be stored

subfolder_name str

Name of "subfolder" under which field_name will be stored in the json hierarchy


add_numeric_vector(field_name, field_vector, subfolder_name=None) #

Adds a numeric vector (stored as a numpy array) to a json object


Name Type Description Default
field_name str

Name of the json field / label under which the numeric vector will be stored

field_vector array

Numpy array containing the vector to be stored in json. Should be one-dimensional.

subfolder_name str

Name of "subfolder" under which field_name will be stored in the json hierarchy


add_string_vector(field_name, field_vector, subfolder_name=None) #

Adds a list of strings to a json object as an array


Name Type Description Default
field_name str

Name of the json field / label under which the string list will be stored

field_vector list

Python list of strings containing the array to be stored in json

subfolder_name str

Name of "subfolder" under which field_name will be stored in the json hierarchy


get_scalar(field_name, subfolder_name=None) #

Retrieves a scalar (numeric) value from a json object


Name Type Description Default
field_name str

Name of the json field / label under which the numeric value is stored

subfolder_name str

Name of "subfolder" under which field_name is stored in the json hierarchy
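The role of subfolder_name in the add_* and get_* methods can be pictured as one extra level of nesting in the JSON object. The class below is a toy stand-in built on plain dictionaries. It is an assumption about the general idea, not a description of how JSONSerializer lays out its data internally.

```python
import json


class ToyJSONStore:
    """Hypothetical stand-in that mimics the add_scalar / get_scalar signatures."""

    def __init__(self):
        self.data = {}

    def add_scalar(self, field_name, field_value, subfolder_name=None):
        # With no subfolder, the field sits at the top level of the object;
        # otherwise it is nested under a key named after the subfolder.
        if subfolder_name is None:
            target = self.data
        else:
            target = self.data.setdefault(subfolder_name, {})
        target[field_name] = float(field_value)

    def get_scalar(self, field_name, subfolder_name=None):
        source = self.data if subfolder_name is None else self.data[subfolder_name]
        return source[field_name]


store = ToyJSONStore()
store.add_scalar("num_trees", 200)
store.add_scalar("sigma2_init", 0.5, subfolder_name="parameters")
as_text = json.dumps(store.data, sort_keys=True)
```

A value written with a given subfolder_name has to be read back with the same subfolder_name, since the top level and each subfolder are separate namespaces.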


get_boolean(field_name, subfolder_name=None) #

Retrieves a scalar (boolean) value from a json object


Name Type Description Default
field_name str

Name of the json field / label under which the boolean value is stored

subfolder_name str

Name of "subfolder" under which field_name is stored in the json hierarchy


get_string(field_name, subfolder_name=None) #

Retrieves a string value from a json object


Name Type Description Default
field_name str

Name of the json field / label under which the string value is stored

subfolder_name str

Name of "subfolder" under which field_name is stored in the json hierarchy


get_numeric_vector(field_name, subfolder_name=None) #

Retrieves a numeric vector from a json object


Name Type Description Default
field_name str

Name of the json field / label under which the numeric vector is stored

subfolder_name str

Name of "subfolder" under which field_name is stored in the json hierarchy


get_string_vector(field_name, subfolder_name=None) #

Retrieves a list of strings from a json object


Name Type Description Default
field_name str

Name of the json field / label under which the string list is stored

subfolder_name str

Name of "subfolder" under which field_name is stored in the json hierarchy


get_forest_container(forest_str) #

Converts a JSON string for a container of forests to a ForestContainer object.


Name Type Description Default
forest_str str

String containing the JSON representation of a ForestContainer



Type Description

In-memory ForestContainer python object, created from JSON string