Utilities API #

`stochtree.sampler.RNG` #

Wrapper around the C++ standard library random number generator. Accepts an optional random seed at initialization for replicability.

Parameters:

Name	Type	Description	Default
`random_seed`	`int`	Random seed for replicability. If not specified, the default value of `-1` triggers an initialization of the RNG based on std::random_device.	`-1`
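
The seeding convention can be illustrated by analogy with Python's standard `random` module. This is a minimal sketch, not stochtree code: `make_rng` is a hypothetical helper, and the real class wraps a C++ generator rather than `random.Random`.

```python
import random


def make_rng(random_seed: int = -1) -> random.Random:
    """Hypothetical helper mirroring the `random_seed` convention described above."""
    if random_seed == -1:
        # No seed supplied: draw entropy from the operating system,
        # loosely analogous to std::random_device in C++.
        return random.Random(random.SystemRandom().getrandbits(64))
    # Explicit seed: the stream of draws is reproducible.
    return random.Random(random_seed)


a = make_rng(2024)
b = make_rng(2024)
# Two generators built from the same seed produce the same draws.
same_stream = [a.random() for _ in range(3)] == [b.random() for _ in range(3)]
```

Passing an explicit seed is the option to use when results need to be replicated across runs.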

`stochtree.preprocessing.CovariateTransformer` #

Class that transforms covariates to a format that can be used to define tree splits. Modeled after the scikit-learn preprocessing classes.

`fit(covariates)` #

Fits a CovariateTransformer by unpacking (and storing) data type information on the input (raw) covariates and then converting to a numpy array which can be passed to a tree ensemble sampler.

If covariates is a pd.DataFrame, column dtypes will be handled as follows:

category: one-hot encoded if unordered, ordinal encoded if ordered
string: one-hot encoded
boolean: passed through as binary integer, treated as ordered categorical by tree samplers
integer (e.g. Int8, Int16, etc.): passed through as double (note: if you have categorical data stored as integers, you should explicitly convert it to categorical in pandas, see this user guide)
float (e.g. Float32, Float64): passed through as double
object: currently unsupported, convert object columns to numeric or categorical before passing
Datetime (e.g. datetime64): currently unsupported, though datetime columns can be converted to numeric features, see here
Period (e.g. period[<freq>]): currently unsupported, though period columns can be converted to numeric features, see here
Interval (e.g. interval, Interval[datetime64[ns]]): currently unsupported, though interval columns can be converted to numeric or categorical features, see here
Sparse (e.g. Sparse, Sparse[float]): currently unsupported, convert sparse columns to dense before passing

Columns with unsupported types will be ignored, with a warning.
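
The difference between the two categorical encodings listed above can be sketched in plain Python. This is a conceptual illustration only: `one_hot` and `ordinal` are hypothetical helpers, and the column order and internal representation used by CovariateTransformer may differ.

```python
def one_hot(values, categories):
    """One column per category; 1 where the value matches, else 0."""
    return [[1 if v == c else 0 for c in categories] for v in values]


def ordinal(values, ordered_categories):
    """Single column holding each value's position in the category order."""
    position = {c: i for i, c in enumerate(ordered_categories)}
    return [[position[v]] for v in values]


colors = ["red", "blue", "red"]        # unordered categorical
sizes = ["small", "large", "medium"]   # ordered categorical

color_cols = one_hot(colors, ["blue", "red"])             # two binary columns
size_cols = ordinal(sizes, ["small", "medium", "large"])  # one integer column
```

An unordered feature expands into several binary columns, while an ordered feature stays a single column whose integer codes preserve the ordering.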

If covariates is a np.array, columns must be numeric and the only preprocessing done by CovariateTransformer.fit() is to auto-detect binary columns. All other integer-valued columns will be passed through to the tree sampler as (continuous) numeric data. If you would like to treat integer-valued data as categorical, you can either convert your numpy array to a pandas dataframe and explicitly tag such columns as ordered / unordered categorical, or preprocess manually using sklearn.preprocessing.OneHotEncoder and sklearn.preprocessing.OrdinalEncoder.
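
As a rough illustration of binary-column detection, the sketch below flags columns whose values are all 0 or 1. The exact rule used by CovariateTransformer.fit() is not stated here, so this criterion is an assumption, and `detect_binary_columns` is a hypothetical helper operating on plain nested lists rather than a numpy array.

```python
def detect_binary_columns(rows):
    """Return indices of columns whose values all lie in {0, 1} (assumed rule)."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols) if all(row[j] in (0, 1) for row in rows)]


X = [
    [0, 3, 1.5],
    [1, 7, 0.2],
    [1, 2, 9.0],
]

# Column 0 only contains 0/1; column 1 is integer-valued but not binary,
# so it would be treated as continuous numeric data.
binary_cols = detect_binary_columns(X)
```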

Parameters:

Name	Type	Description	Default
`covariates`	`array or DataFrame`	Covariates to be preprocessed.	required

`transform(covariates)` #

Runs a fitted CovariateTransformer on a new covariate set, returning a numpy array of covariates preprocessed into the format needed to sample or predict from a stochtree ensemble.

Parameters:

Name	Type	Description	Default
`covariates`	`array or DataFrame`	Covariates to be preprocessed.	required

Returns:

Type	Description
`array`	Numpy array of preprocessed covariates, with as many rows as in `covariates` and as many columns as were created during pre-processing (including one-hot encoding categorical features).

`fit_transform(covariates)` #

Runs the fit() and transform() methods in sequence.

Parameters:

Name	Type	Description	Default
`covariates`	`array or DataFrame`	Covariates to be preprocessed.	required

Returns:

Type	Description
`array`	Numpy array of preprocessed covariates, with as many rows as in `covariates` and as many columns as were created during pre-processing (including one-hot encoding categorical features).
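
The relationship between fit(), transform() and fit_transform() follows the usual scikit-learn pattern, which can be sketched with a small stand-in class. `TinyOneHot` is hypothetical and handles a single categorical feature; it is not the stochtree implementation.

```python
class TinyOneHot:
    """Hypothetical stand-in with the same fit / transform / fit_transform shape."""

    def fit(self, values):
        # Learn and store the category set from the training data.
        self.categories_ = sorted(set(values))
        return self

    def transform(self, values):
        # Reuse the stored categories so the output always has the same columns.
        return [[1 if v == c else 0 for c in self.categories_] for v in values]

    def fit_transform(self, values):
        # Equivalent to calling fit() and then transform() on the same data.
        return self.fit(values).transform(values)


train = ["a", "b", "c", "a"]
new = ["c", "a"]

enc = TinyOneHot()
train_out = enc.fit_transform(train)  # 4 rows, 3 columns
new_out = enc.transform(new)          # 2 rows, still 3 columns
```

The number of output columns is fixed when the transformer is fitted, so new data passed to transform() yields the same column layout as the training data.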

`fetch_original_feature_indices()` #

Map features in a preprocessed covariate set back to the original set of features provided to a CovariateTransformer.

Returns:

Type	Description
`list`	List with as many entries as features in the preprocessed results returned by a fitted CovariateTransformer. Each element is a feature index indicating the feature from which a given preprocessed feature was generated. If a single categorical feature were one-hot encoded into 5 binary features, this method would return a list [0,0,0,0,0]. If the transformer merely passes through k numeric features, this method would return a list [0,...,k-1].
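
The index mapping described above can be reproduced with a few lines of plain Python. `original_feature_indices` is a hypothetical helper that takes, for each original feature, the number of preprocessed columns it produced; it only illustrates the shape of the returned list.

```python
def original_feature_indices(columns_per_feature):
    """Repeat each original feature index once per preprocessed column it produced."""
    indices = []
    for feature_idx, n_cols in enumerate(columns_per_feature):
        indices.extend([feature_idx] * n_cols)
    return indices


# One categorical feature one-hot encoded into 5 binary columns.
one_categorical = original_feature_indices([5])

# Three numeric features passed through unchanged.
three_numeric = original_feature_indices([1, 1, 1])

# A numeric feature, a 3-level one-hot feature, and another numeric feature.
mixed = original_feature_indices([1, 3, 1])
```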

`stochtree.serialization.JSONSerializer` #

Class that handles serialization and deserialization of stochastic forest models

`return_json_string()` #

Convert JSON object to in-memory string

Returns:

Type	Description
`str`	JSON string representing model metadata (hyperparameters), sampled parameters, and sampled forests

`load_from_json_string(json_string)` #

Parse in-memory JSON string to JsonCpp object

Parameters:

Name	Type	Description	Default
`json_string`	`str`	JSON string representing model metadata (hyperparameters), sampled parameters, and sampled forests	required

`add_forest(forest_samples)` #

Adds a container of forest samples to a json object

Parameters:

Name	Type	Description	Default
`forest_samples`	`ForestContainer`	Samples of a tree ensemble	required

`add_scalar(field_name, field_value, subfolder_name=None)` #

Adds a scalar (numeric) value to a json object

Parameters:

Name	Type	Description	Default
`field_name`	`str`	Name of the json field / label under which the numeric value will be stored	required
`field_value`	`float`	Numeric value to be stored	required
`subfolder_name`	`str`	Name of "subfolder" under which `field_name` will be stored in the json hierarchy	`None`
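
One way to picture the "subfolder" argument is as a nested JSON object. The sketch below uses the standard `json` module and a plain dict; the free function `add_scalar` and the field names are hypothetical, and the layout stochtree actually writes is an assumption here.

```python
import json


def add_scalar(store, field_name, field_value, subfolder_name=None):
    """Hypothetical dict-based analogue of the method documented above."""
    if subfolder_name is None:
        # No subfolder: the field is stored at the top level.
        store[field_name] = field_value
    else:
        # Create the "subfolder" on first use, then store the field inside it.
        store.setdefault(subfolder_name, {})[field_name] = field_value


store = {}
add_scalar(store, "num_trees", 200)
add_scalar(store, "sigma2_init", 1.5, subfolder_name="parameters")

json_string = json.dumps(store)
```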

`add_boolean(field_name, field_value, subfolder_name=None)` #

Adds a scalar (boolean) value to a json object

Parameters:

Name	Type	Description	Default
`field_name`	`str`	Name of the json field / label under which the boolean value will be stored	required
`field_value`	`bool`	Boolean value to be stored	required
`subfolder_name`	`str`	Name of "subfolder" under which `field_name` will be stored in the json hierarchy	`None`

`add_string(field_name, field_value, subfolder_name=None)` #

Adds a string to a json object

Parameters:

Name	Type	Description	Default
`field_name`	`str`	Name of the json field / label under which the string value will be stored	required
`field_value`	`str`	String field to be stored	required
`subfolder_name`	`str`	Name of "subfolder" under which `field_name` will be stored in the json hierarchy	`None`

`add_numeric_vector(field_name, field_vector, subfolder_name=None)` #

Adds a numeric vector (stored as a numpy array) to a json object

Parameters:

Name	Type	Description	Default
`field_name`	`str`	Name of the json field / label under which the numeric vector will be stored	required
`field_vector`	`array`	Numpy array containing the vector to be stored in json. Should be one-dimensional.	required
`subfolder_name`	`str`	Name of "subfolder" under which `field_name` will be stored in the json hierarchy	`None`
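
The one-dimensional requirement can be illustrated with a small validation sketch. `to_json_vector` is a hypothetical helper that works on plain Python sequences rather than numpy arrays, and the field name is made up for the example.

```python
import json


def to_json_vector(values):
    """Hypothetical helper: accept only a flat sequence of numbers."""
    out = []
    for v in values:
        if isinstance(v, (list, tuple)):
            # A nested sequence means the input is not one-dimensional.
            raise ValueError("expected a one-dimensional vector")
        out.append(float(v))
    return out


payload = json.dumps({"sigma2_samples": to_json_vector([0.9, 1, 1.25])})
```

A nested input such as `[[1, 2], [3, 4]]` raises `ValueError` in this sketch instead of being stored as a matrix.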

`add_string_vector(field_name, field_vector, subfolder_name=None)` #

Adds a list of strings to a json object as an array

Parameters:

Name	Type	Description	Default
`field_name`	`str`	Name of the json field / label under which the string list will be stored	required
`field_vector`	`list`	Python list of strings containing the array to be stored in json	required
`subfolder_name`	`str`	Name of "subfolder" under which `field_name` will be stored in the json hierarchy	`None`

`get_scalar(field_name, subfolder_name=None)` #

Retrieves a scalar (numeric) value from a json object

Parameters:

Name	Type	Description	Default
`field_name`	`str`	Name of the json field / label under which the numeric value is stored	required
`subfolder_name`	`str`	Name of "subfolder" under which `field_name` is stored in the json hierarchy	`None`
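
Retrieval mirrors storage: the optional subfolder selects a nested object before the field is looked up. The sketch below parses a JSON string with the standard `json` module; the free function `get_scalar`, the field names, and the nested layout are assumptions made for illustration.

```python
import json


def get_scalar(store, field_name, subfolder_name=None):
    """Hypothetical dict-based analogue of the lookup documented above."""
    # Look in the top level unless a subfolder is named.
    container = store if subfolder_name is None else store[subfolder_name]
    return container[field_name]


json_string = '{"num_trees": 200, "parameters": {"sigma2_init": 1.5}}'
store = json.loads(json_string)

num_trees = get_scalar(store, "num_trees")
sigma2_init = get_scalar(store, "sigma2_init", subfolder_name="parameters")
```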

`get_boolean(field_name, subfolder_name=None)` #

Retrieves a scalar (boolean) value from a json object

Parameters:

Name	Type	Description	Default
`field_name`	`str`	Name of the json field / label under which the boolean value is stored	required
`subfolder_name`	`str`	Name of "subfolder" under which `field_name` is stored in the json hierarchy	`None`

`get_string(field_name, subfolder_name=None)` #

Retrieves a string value from a json object

Parameters:

Name	Type	Description	Default
`field_name`	`str`	Name of the json field / label under which the string value is stored	required
`subfolder_name`	`str`	Name of "subfolder" under which `field_name` is stored in the json hierarchy	`None`

`get_numeric_vector(field_name, subfolder_name=None)` #

Retrieves a numeric vector from a json object

Parameters:

Name	Type	Description	Default
`field_name`	`str`	Name of the json field / label under which the numeric vector is stored	required
`subfolder_name`	`str`	Name of "subfolder" under which `field_name` is stored in the json hierarchy	`None`

`get_string_vector(field_name, subfolder_name=None)` #

Retrieves a list of strings from a json object

Parameters:

Name	Type	Description	Default
`field_name`	`str`	Name of the json field / label under which the string list is stored	required
`subfolder_name`	`str`	Name of "subfolder" under which `field_name` is stored in the json hierarchy	`None`

`get_forest_container(forest_str)` #

Converts a JSON string for a container of forests to a ForestContainer object.

Parameters:

Name	Type	Description	Default
`forest_str`	`str`	String containing the JSON representation of a `ForestContainer`	required

Returns:

Type	Description
`ForestContainer`	In-memory `ForestContainer` python object, created from JSON string
