Utilities API
stochtree.sampler.RNG
Wrapper around the C++ standard library random number generator. Accepts an optional random seed at initialization for replicability.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
random_seed | int | Random seed for replicability. If not specified, the default value of -1 is used. | -1 |
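The replicability that a seed provides can be illustrated with any seeded pseudo-random generator. The sketch below uses Python's standard random module as a stand-in, not stochtree's RNG class (whose draws come from the C++ generator); it only shows that two generators built from the same seed yield the same stream.

```python
import random

# Two generators created with the same seed produce identical streams.
rng_a = random.Random(1234)
rng_b = random.Random(1234)

draws_a = [rng_a.random() for _ in range(3)]
draws_b = [rng_b.random() for _ in range(3)]

# Identical seeds give identical draws, which is what makes a run repeatable.
print(draws_a == draws_b)
```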
stochtree.preprocessing.CovariateTransformer
Class that transforms covariates to a format that can be used to define tree splits. Modeled after the scikit-learn preprocessing classes.
fit(covariates)
Fits a CovariateTransformer by unpacking (and storing) data type information on the input (raw) covariates and then converting them to a numpy array that can be passed to a tree ensemble sampler.

If covariates is a pd.DataFrame, column dtypes are handled as follows:

- category: one-hot encoded if unordered, ordinal encoded if ordered
- string: one-hot encoded
- boolean: passed through as binary integer, treated as ordered categorical by tree samplers
- integer (e.g. Int8, Int16): passed through as double (note: if you have categorical data stored as integers, you should explicitly convert it to categorical in pandas, see this user guide)
- float (e.g. Float32, Float64): passed through as double
- object: currently unsupported, convert object columns to numeric or categorical before passing
- Datetime (e.g. datetime64): currently unsupported, though datetime columns can be converted to numeric features, see here
- Period (e.g. period[<freq>]): currently unsupported, though period columns can be converted to numeric features, see here
- Interval (e.g. interval, Interval[datetime64[ns]]): currently unsupported, though interval columns can be converted to numeric or categorical features, see here
- Sparse (e.g. Sparse, Sparse[float]): currently unsupported, convert sparse columns to dense before passing

Columns with unsupported types will be ignored, with a warning.

If covariates is a np.array, columns must be numeric and the only preprocessing done by CovariateTransformer.fit() is to auto-detect binary columns. All other integer-valued columns will be passed through to the tree sampler as (continuous) numeric data. If you would like to treat integer-valued data as categorical, you can either convert your numpy array to a pandas dataframe and explicitly tag such columns as ordered / unordered categorical, or preprocess manually using sklearn.preprocessing.OneHotEncoder and sklearn.preprocessing.OrdinalEncoder.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
covariates | array or DataFrame | Covariates to be preprocessed. | required |
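Because integer columns are passed through as numeric data, integer-coded categories need to be tagged before fitting. A minimal sketch of that tagging step in pandas follows; the data frame and its column names (x_num, region_code, severity) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical covariates: one numeric column and two integer-coded categoricals.
df = pd.DataFrame({
    "x_num": [0.5, 1.2, -0.3, 2.1],
    "region_code": [1, 3, 2, 3],  # unordered category stored as integers
    "severity": [0, 2, 1, 2],     # ordered category stored as integers
})

# Tag as unordered categorical (described above as one-hot encoded).
df["region_code"] = df["region_code"].astype("category")

# Tag as ordered categorical (described above as ordinal encoded).
df["severity"] = pd.Categorical(df["severity"], categories=[0, 1, 2], ordered=True)

print(df.dtypes)
```

After this step the two tagged columns carry the category dtype, so they would fall under the category rule rather than the integer rule.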
transform(covariates)
Runs a fitted CovariateTransformer on a new covariate set, returning a numpy array of covariates preprocessed into the format needed to sample or predict from a stochtree ensemble.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
covariates | array or DataFrame | Covariates to be preprocessed. | required |
Returns:
Type | Description |
---|---|
array | Numpy array of preprocessed covariates, with as many rows as in covariates |
fit_transform(covariates)
Runs the fit() and transform() methods in sequence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
covariates | array or DataFrame | Covariates to be preprocessed. | required |
Returns:
Type | Description |
---|---|
array | Numpy array of preprocessed covariates, with as many rows as in covariates |
fetch_original_feature_indices()
Maps features in a preprocessed covariate set back to the original set of features provided to a CovariateTransformer.
Returns:
Type | Description |
---|---|
list | List with as many entries as features in the preprocessed results returned by a fitted CovariateTransformer |
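To illustrate the kind of mapping this method describes, here is a minimal pure-Python sketch. The helper name original_feature_indices and the per-feature column counts are hypothetical; this is not stochtree's implementation, only an example of how one-hot encoding can expand a single original feature into several preprocessed columns.

```python
def original_feature_indices(columns_per_feature):
    """Return, for each preprocessed column, the index of its original feature.

    columns_per_feature[i] is the number of preprocessed columns produced by
    original feature i (assumed here: 1 for a numeric feature, k for a
    one-hot encoded categorical with k levels).
    """
    indices = []
    for feature_index, n_columns in enumerate(columns_per_feature):
        indices.extend([feature_index] * n_columns)
    return indices


# Three original features: numeric, 3-level unordered categorical, numeric.
mapping = original_feature_indices([1, 3, 1])
print(mapping)  # [0, 1, 1, 1, 2]
```

The list has one entry per preprocessed column, so it can be used to trace a split on a one-hot column back to the categorical feature it came from.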
stochtree.serialization.JSONSerializer
Class that handles serialization and deserialization of stochastic forest models
return_json_string()
Converts the JSON object to an in-memory string
Returns:
Type | Description |
---|---|
str | JSON string representing model metadata (hyperparameters), sampled parameters, and sampled forests |
load_from_json_string(json_string)
Parses an in-memory JSON string to a JsonCpp object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
json_string | str | JSON string representing model metadata (hyperparameters), sampled parameters, and sampled forests | required |
add_forest(forest_samples)
Adds a container of forest samples to a json object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
forest_samples | ForestContainer | Samples of a tree ensemble | required |
add_scalar(field_name, field_value, subfolder_name=None)
Adds a scalar (numeric) value to a json object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
field_name | str | Name of the json field / label under which the numeric value will be stored | required |
field_value | float | Numeric value to be stored | required |
subfolder_name | str | Name of "subfolder" under which field_name will be stored | None |
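The field_name / subfolder_name pairing can be pictured as one optional level of nesting in a JSON object. The sketch below uses Python's standard json module and a hypothetical add_field helper; the actual layout used by JSONSerializer is not described here, so the nesting shown is an assumption, and the field names are made up for illustration.

```python
import json


def add_field(store, field_name, field_value, subfolder_name=None):
    """Store a value at the top level, or one level down under subfolder_name."""
    if subfolder_name is None:
        store[field_name] = field_value
    else:
        # Create the "subfolder" on first use, then store the field inside it.
        store.setdefault(subfolder_name, {})[field_name] = field_value


store = {}
add_field(store, "num_samples", 100.0)
add_field(store, "sigma2_init", 1.5, subfolder_name="parameters")

# Round trip through an in-memory JSON string.
json_string = json.dumps(store)
restored = json.loads(json_string)
print(restored)
```

Reading a value back follows the same path: look in the subfolder when one is given, otherwise at the top level.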
add_boolean(field_name, field_value, subfolder_name=None)
Adds a scalar (boolean) value to a json object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
field_name | str | Name of the json field / label under which the boolean value will be stored | required |
field_value | bool | Boolean value to be stored | required |
subfolder_name | str | Name of "subfolder" under which field_name will be stored | None |
add_string(field_name, field_value, subfolder_name=None)
Adds a string to a json object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
field_name | str | Name of the json field / label under which the string value will be stored | required |
field_value | str | String field to be stored | required |
subfolder_name | str | Name of "subfolder" under which field_name will be stored | None |
add_numeric_vector(field_name, field_vector, subfolder_name=None)
Adds a numeric vector (stored as a numpy array) to a json object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
field_name | str | Name of the json field / label under which the numeric vector will be stored | required |
field_vector | array | Numpy array containing the vector to be stored in json. Should be one-dimensional. | required |
subfolder_name | str | Name of "subfolder" under which field_name will be stored | None |
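JSON has no numpy array type, so a one-dimensional vector is commonly written as a plain list of numbers and rebuilt on load. The following is a minimal sketch of that round trip using numpy and the standard json module; the field name sigma2_samples is hypothetical, and this is not necessarily how JSONSerializer stores vectors internally.

```python
import json

import numpy as np

vector = np.array([0.9, 1.1, 1.05])

# Mirror the documented requirement that the vector be one-dimensional.
if vector.ndim != 1:
    raise ValueError("expected a one-dimensional array")

# Convert to a plain Python list so the json module can serialize it.
json_string = json.dumps({"sigma2_samples": vector.tolist()})

# Rebuild the numpy array from the parsed JSON list.
restored = np.asarray(json.loads(json_string)["sigma2_samples"], dtype=float)
print(restored)
```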
add_string_vector(field_name, field_vector, subfolder_name=None)
Adds a list of strings to a json object as an array
Parameters:
Name | Type | Description | Default |
---|---|---|---|
field_name | str | Name of the json field / label under which the string list will be stored | required |
field_vector | list | Python list of strings containing the array to be stored in json | required |
subfolder_name | str | Name of "subfolder" under which field_name will be stored | None |
get_scalar(field_name, subfolder_name=None)
Retrieves a scalar (numeric) value from a json object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
field_name | str | Name of the json field / label under which the numeric value is stored | required |
subfolder_name | str | Name of "subfolder" under which field_name is stored | None |
get_boolean(field_name, subfolder_name=None)
Retrieves a scalar (boolean) value from a json object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
field_name | str | Name of the json field / label under which the boolean value is stored | required |
subfolder_name | str | Name of "subfolder" under which field_name is stored | None |
get_string(field_name, subfolder_name=None)
Retrieves a string value from a json object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
field_name | str | Name of the json field / label under which the string value is stored | required |
subfolder_name | str | Name of "subfolder" under which field_name is stored | None |
get_numeric_vector(field_name, subfolder_name=None)
Retrieves a numeric vector from a json object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
field_name | str | Name of the json field / label under which the numeric vector is stored | required |
subfolder_name | str | Name of "subfolder" under which field_name is stored | None |
get_string_vector(field_name, subfolder_name=None)
Retrieves a list of strings from a json object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
field_name | str | Name of the json field / label under which the string list is stored | required |
subfolder_name | str | Name of "subfolder" under which field_name is stored | None |
get_forest_container(forest_str)
Converts a JSON string for a container of forests to a ForestContainer object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
forest_str | str | String containing the JSON representation of a ForestContainer | required |
Returns:
Type | Description |
---|---|
ForestContainer | In-memory ForestContainer object |