Utilities API#

stochtree.sampler.RNG #

Wrapper around the C++ standard library random number generator. Accepts an optional random seed at initialization for replicability.

Parameters:

Name Type Description Default
random_seed int

Random seed for replicability. If not specified, the default value of -1 triggers an initialization of the RNG based on std::random_device.

-1
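
As a rough illustration of the seeding convention described above, the following sketch mimics it with Python's standard `random` module. It does not call stochtree: `make_rng` is a hypothetical helper, and `os.urandom` is only a stand-in for `std::random_device`.

```python
import os
import random


def make_rng(random_seed: int = -1) -> random.Random:
    """Hypothetical helper mirroring the documented seed convention."""
    if random_seed == -1:
        # No seed supplied: draw one from OS entropy (stand-in for std::random_device).
        random_seed = int.from_bytes(os.urandom(8), "little")
    return random.Random(random_seed)


# Two generators built from the same explicit seed produce identical draws.
rng_a = make_rng(1234)
rng_b = make_rng(1234)
draws_a = [rng_a.random() for _ in range(3)]
draws_b = [rng_b.random() for _ in range(3)]
```

Passing an explicit seed is what makes a run replicable; leaving the default of -1 gives a different stream on each initialization.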

stochtree.preprocessing.CovariateTransformer #

Class that transforms covariates to a format that can be used to define tree splits. Modeled after the scikit-learn preprocessing classes.

fit(covariates) #

Fits a CovariateTransformer by unpacking (and storing) data type information on the input (raw) covariates and then converting to a numpy array which can be passed to a tree ensemble sampler.

If covariates is a pd.DataFrame, column dtypes will be handled as follows:

  • category: one-hot encoded if unordered, ordinal encoded if ordered
  • string: one-hot encoded
  • boolean: passed through as binary integer, treated as ordered categorical by tree samplers
  • integer (e.g. Int8, Int16): passed through as double (note: if you have categorical data stored as integers, you should explicitly convert it to categorical in pandas, see this user guide)
  • float (i.e. Float32, Float64): passed through as double
  • object: currently unsupported, convert object columns to numeric or categorical before passing
  • Datetime (i.e. datetime64): currently unsupported, though datetime columns can be converted to numeric features, see here
  • Period (i.e. period[<freq>]): currently unsupported, though period columns can be converted to numeric features, see here
  • Interval (i.e. interval, Interval[datetime64[ns]]): currently unsupported, though interval columns can be converted to numeric or categorical features, see here
  • Sparse (i.e. Sparse, Sparse[float]): currently unsupported, convert sparse columns to dense before passing

Columns with unsupported types will be ignored, with a warning.
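
The dtype rules above suggest preparing a DataFrame before fitting. The sketch below uses only pandas and an invented toy frame (the column names are hypothetical). It tags integer-coded categories explicitly, declares an ordered categorical, and replaces an unsupported datetime column with a numeric feature.

```python
import pandas as pd

df = pd.DataFrame({
    "region_code": [1, 2, 1, 3],                          # categorical data stored as integers
    "shirt_size": ["small", "large", "medium", "small"],  # ordered levels stored as strings
    "signup": pd.to_datetime(["2020-01-05", "2020-02-01", "2020-01-20", "2020-03-15"]),
})

# Unordered categorical: one-hot encoded according to the list above.
df["region_code"] = df["region_code"].astype("category")

# Ordered categorical: ordinal encoded according to the list above.
df["shirt_size"] = pd.Categorical(
    df["shirt_size"], categories=["small", "medium", "large"], ordered=True
)

# Datetime columns are unsupported, so derive a numeric feature (days since a reference date).
df["signup_days"] = (df["signup"] - pd.Timestamp("2020-01-01")).dt.days
df = df.drop(columns=["signup"])
```

With stochtree installed, a frame prepared this way would then be passed to the fit() method documented here.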

If covariates is a np.array, columns must be numeric and the only preprocessing done by CovariateTransformer.fit() is to auto-detect binary columns. All other integer-valued columns will be passed through to the tree sampler as (continuous) numeric data. If you would like to treat integer-valued data as categorical, you can either convert your numpy array to a pandas dataframe and explicitly tag such columns as ordered / unordered categorical, or preprocess manually using sklearn.preprocessing.OneHotEncoder and sklearn.preprocessing.OrdinalEncoder.
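
To make the numpy case concrete, here is a minimal sketch of what detecting binary columns could look like. The exact rule stochtree applies is not described above, so `is_binary_column` is a hypothetical helper that assumes a column counts as binary when every value is 0 or 1.

```python
import numpy as np


def is_binary_column(column: np.ndarray) -> bool:
    """Hypothetical check: True if every value in the column is 0 or 1."""
    return bool(np.isin(column, [0, 1]).all())


X = np.array([
    [0.0, 3.0, 1.7],
    [1.0, 1.0, 0.2],
    [1.0, 2.0, 5.1],
    [0.0, 3.0, 2.4],
])

# One flag per column: only the first column holds 0/1 values exclusively.
binary_mask = [is_binary_column(X[:, j]) for j in range(X.shape[1])]
```

Under this assumption the second column, although integer-valued, is not flagged and would be treated as continuous numeric data, which is the situation the paragraph above warns about.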

Parameters:

Name Type Description Default
covariates array or DataFrame

Covariates to be preprocessed.

required

transform(covariates) #

Runs a fitted CovariateTransformer on a new covariate set, returning a numpy array of covariates preprocessed into the format needed to sample or predict from a stochtree ensemble.

Parameters:

Name Type Description Default
covariates array or DataFrame

Covariates to be preprocessed.

required

Returns:

Type Description
array

Numpy array of preprocessed covariates, with as many rows as in covariates and as many columns as were created during preprocessing (including the columns produced by one-hot encoding categorical features).

fit_transform(covariates) #

Runs the fit() and transform() methods in sequence.

Parameters:

Name Type Description Default
covariates array or DataFrame

Covariates to be preprocessed.

required

Returns:

Type Description
array

Numpy array of preprocessed covariates, with as many rows as in covariates and as many columns as were created during preprocessing (including the columns produced by one-hot encoding categorical features).
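
The fit / transform / fit_transform split follows the scikit-learn pattern mentioned earlier. The pure-Python sketch below illustrates that pattern with a toy one-hot encoder. `TinyOneHot` is a hypothetical stand-in, not the CovariateTransformer implementation, and its handling of unseen levels (all zeros) is an assumption of the sketch.

```python
class TinyOneHot:
    """Hypothetical stand-in illustrating the fit / transform / fit_transform pattern."""

    def fit(self, values):
        # Learn and store the category levels seen in the training data.
        self.levels_ = sorted(set(values))
        return self

    def transform(self, values):
        # Encode new data with the stored levels; unseen levels become all zeros.
        return [[1 if v == level else 0 for level in self.levels_] for v in values]

    def fit_transform(self, values):
        # Same as calling fit() and then transform() on the same data.
        return self.fit(values).transform(values)


encoder = TinyOneHot()
train_encoded = encoder.fit_transform(["a", "b", "a", "c"])
test_encoded = encoder.transform(["c", "a", "d"])
```

Because the levels are stored at fit time, the training and new data end up with the same number of columns, which is what allows a fitted transformer to be reused at prediction time.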

fetch_original_feature_indices() #

Map features in a preprocessed covariate set back to the original set of features provided to a CovariateTransformer.

Returns:

Type Description
list

List with as many entries as features in the preprocessed results returned by a fitted CovariateTransformer. Each element is a feature index indicating the feature from which a given preprocessed feature was generated. If a single categorical feature were one-hot encoded into 5 binary features, this method would return a list [0,0,0,0,0]. If the transformer merely passes through k numeric features, this method would return a list [0,...,k-1].
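
The mapping described above can be sketched in a few lines of plain Python. `original_feature_indices` is a hypothetical helper that takes, for each original feature, the number of preprocessed columns it produced. It illustrates the output format and is not stochtree's implementation.

```python
def original_feature_indices(columns_per_feature):
    """Hypothetical helper: repeat each original feature index once per output column."""
    indices = []
    for feature_index, num_columns in enumerate(columns_per_feature):
        indices.extend([feature_index] * num_columns)
    return indices


one_hot_only = original_feature_indices([5])        # one categorical feature, 5 binary columns
numeric_only = original_feature_indices([1, 1, 1])  # three numeric features passed through
mixed = original_feature_indices([1, 3, 1])         # numeric, 3-level categorical, numeric
```

In the mixed case the result is [0, 1, 1, 1, 2]: the three middle columns all point back to original feature 1.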

stochtree.serialization.JSONSerializer #

Class that handles serialization and deserialization of stochastic forest models

return_json_string() #

Convert JSON object to in-memory string

Returns:

Type Description
str

JSON string representing model metadata (hyperparameters), sampled parameters, and sampled forests

load_from_json_string(json_string) #

Parse in-memory JSON string to JsonCpp object

Parameters:

Name Type Description Default
json_string str

JSON string representing model metadata (hyperparameters), sampled parameters, and sampled forests

required
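
The pair of methods above amounts to a round trip between an in-memory object and a JSON string. The sketch below shows that round trip with Python's standard `json` module. The payload and its field names are invented for illustration and do not reflect the layout stochtree writes.

```python
import json

# Hypothetical payload; the real field layout used by stochtree is not shown here.
model_state = {"num_samples": 100, "variance_samples": [0.9, 1.1, 1.0]}

json_string = json.dumps(model_state)  # analogous to return_json_string()
restored = json.loads(json_string)     # analogous to load_from_json_string(json_string)
```

The string form is convenient for passing a model between processes or storing it in a database field without writing a file.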

add_forest(forest_samples) #

Adds a container of forest samples to a json object

Parameters:

Name Type Description Default
forest_samples ForestContainer

Samples of a tree ensemble

required

add_scalar(field_name, field_value, subfolder_name=None) #

Adds a scalar (numeric) value to a json object

Parameters:

Name Type Description Default
field_name str

Name of the json field / label under which the numeric value will be stored

required
field_value float

Numeric value to be stored

required
subfolder_name str

Name of "subfolder" under which field_name will be stored in the json hierarchy

None
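
The role of subfolder_name can be pictured as choosing between the top level of the JSON document and a named nested object. The sketch below assumes that interpretation, which the text above does not spell out. `put_field` and the field names are hypothetical.

```python
import json


def put_field(document, field_name, field_value, subfolder_name=None):
    """Hypothetical helper: store a value at the top level or inside a named sub-object."""
    target = document if subfolder_name is None else document.setdefault(subfolder_name, {})
    target[field_name] = field_value


document = {}
put_field(document, "alpha", 10.0)                             # stored at the top level
put_field(document, "beta", 1.5, subfolder_name="parameters")  # stored under "parameters"
serialized = json.dumps(document, sort_keys=True)
```

Grouping related fields under one subfolder keeps field names from colliding when several components write to the same document.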

add_boolean(field_name, field_value, subfolder_name=None) #

Adds a scalar (boolean) value to a json object

Parameters:

Name Type Description Default
field_name str

Name of the json field / label under which the boolean value will be stored

required
field_value bool

Boolean value to be stored

required
subfolder_name str

Name of "subfolder" under which field_name will be stored in the json hierarchy

None

add_string(field_name, field_value, subfolder_name=None) #

Adds a string to a json object

Parameters:

Name Type Description Default
field_name str

Name of the json field / label under which the string value will be stored

required
field_value str

String field to be stored

required
subfolder_name str

Name of "subfolder" under which field_name will be stored in the json hierarchy

None

add_numeric_vector(field_name, field_vector, subfolder_name=None) #

Adds a numeric vector (stored as a numpy array) to a json object

Parameters:

Name Type Description Default
field_name str

Name of the json field / label under which the numeric vector will be stored

required
field_vector array

Numpy array containing the vector to be stored in json. Should be one-dimensional.

required
subfolder_name str

Name of "subfolder" under which field_name will be stored in the json hierarchy

None
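
Since field_vector should be one-dimensional, it can help to validate or flatten an array before storing it. The sketch below shows one way to do that with numpy and the standard `json` module. `vector_to_json_list` is a hypothetical helper, not part of stochtree.

```python
import json

import numpy as np


def vector_to_json_list(field_vector):
    """Hypothetical helper: validate a one-dimensional array and convert it to a JSON-ready list."""
    array = np.asarray(field_vector, dtype=float)
    if array.ndim != 1:
        raise ValueError(f"expected a one-dimensional array, got {array.ndim} dimensions")
    return array.tolist()


flat = np.array([[0.5, 1.5], [2.5, 3.5]]).ravel()  # flatten a 2-D array before storing
payload = json.dumps({"weights": vector_to_json_list(flat)})
```

Flattening explicitly with `ravel()` makes the element order (row-major by default) a visible choice rather than an accident.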

add_string_vector(field_name, field_vector, subfolder_name=None) #

Adds a list of strings to a json object as an array

Parameters:

Name Type Description Default
field_name str

Name of the json field / label under which the string list will be stored

required
field_vector list

Python list of strings containing the array to be stored in json

required
subfolder_name str

Name of "subfolder" under which field_name will be stored in the json hierarchy

None

get_scalar(field_name, subfolder_name=None) #

Retrieves a scalar (numeric) value from a json object

Parameters:

Name Type Description Default
field_name str

Name of the json field / label under which the numeric value is stored

required
subfolder_name str

Name of "subfolder" under which field_name is stored in the json hierarchy

None
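
Retrieval mirrors storage: the same field_name and subfolder_name that were used to write a value locate it again. The sketch below assumes subfolders are nested JSON objects, which this reference does not state. `get_field` and the field names are hypothetical.

```python
import json

json_string = '{"alpha": 10.0, "parameters": {"beta": 1.5}}'
document = json.loads(json_string)


def get_field(document, field_name, subfolder_name=None):
    """Hypothetical helper: read a value from the top level or from a named sub-object."""
    source = document if subfolder_name is None else document[subfolder_name]
    return source[field_name]


top_level_value = get_field(document, "alpha")
nested_value = get_field(document, "beta", subfolder_name="parameters")
```

In this sketch a missing field or subfolder raises a KeyError; how the actual serializer reports a missing field is not covered here.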

get_boolean(field_name, subfolder_name=None) #

Retrieves a scalar (boolean) value from a json object

Parameters:

Name Type Description Default
field_name str

Name of the json field / label under which the boolean value is stored

required
subfolder_name str

Name of "subfolder" under which field_name is stored in the json hierarchy

None

get_string(field_name, subfolder_name=None) #

Retrieves a string value from a json object

Parameters:

Name Type Description Default
field_name str

Name of the json field / label under which the string value is stored

required
subfolder_name str

Name of "subfolder" under which field_name is stored in the json hierarchy

None

get_numeric_vector(field_name, subfolder_name=None) #

Retrieves a numeric vector from a json object

Parameters:

Name Type Description Default
field_name str

Name of the json field / label under which the numeric vector is stored

required
subfolder_name str

Name of "subfolder" under which field_name is stored in the json hierarchy

None

get_string_vector(field_name, subfolder_name=None) #

Retrieves a list of strings from a json object

Parameters:

Name Type Description Default
field_name str

Name of the json field / label under which the string list is stored

required
subfolder_name str

Name of "subfolder" under which field_name is stored in the json hierarchy

None

get_forest_container(forest_str) #

Converts a JSON string for a container of forests to a ForestContainer object.

Parameters:

Name Type Description Default
forest_str str

String containing the JSON representation of a ForestContainer

required

Returns:

Type Description
ForestContainer

In-memory ForestContainer python object, created from JSON string