tubular.nominal.OneHotEncodingTransformer

class tubular.nominal.OneHotEncodingTransformer(columns=None, separator='_', drop_original=False, copy=True, verbose=False, **kwargs)

Bases: tubular.nominal.BaseNominalTransformer, sklearn.preprocessing._encoders.OneHotEncoder

Transformer to convert categorical variables into dummy columns.

Extends the sklearn OneHotEncoder class to provide easy renaming of dummy columns.

Parameters
  • columns (str or list of strings, default = None) – Names of columns to transform

  • separator (str) – Used to create dummy column names; each name takes the format [categorical feature][separator][category level]

  • drop_original (bool, default = False) – Should original columns be dropped after creating dummy fields?

  • copy (bool, default = True) – Should X be copied prior to transform?

  • verbose (bool, default = False) – Should warnings/checkmarks get displayed?

  • **kwargs – Arbitrary keyword arguments passed on to the sklearn OneHotEncoder.__init__ method.
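The naming format described for separator can be illustrated with a minimal sketch. The helper dummy_column_names below is hypothetical and not part of tubular; it only shows how [categorical feature][separator][category level] composes into a column name.

```python
def dummy_column_names(feature, levels, separator="_"):
    """Build names as [categorical feature][separator][category level].

    Hypothetical helper for illustration only; not part of tubular.
    """
    return [f"{feature}{separator}{level}" for level in levels]


# A 'colour' column with levels 'blue' and 'red' and the default separator
print(dummy_column_names("colour", ["blue", "red"]))
# ['colour_blue', 'colour_red']
```

A separator that is unlikely to appear in the feature names themselves makes it easier to tell the feature and the level apart in the resulting names.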

separator

Separator used in naming for dummy columns.

Type

str

drop_original

Should original columns be dropped after creating dummy fields?

Type

bool

__init__(columns=None, separator='_', drop_original=False, copy=True, verbose=False, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([columns, separator, …])

Initialize self.

check_is_fitted(attribute)

Check if particular attributes are on the object.

check_mappable_rows(X)

Method to check that every row the transformer will be applied to can be mapped according to the values in the mappings dict.

check_weights_column(X, weights_column)

Helper method for validating weights column in dataframe.

classname()

Method that returns the name of the current class when called.

columns_check(X)

Method to check that the columns attribute is set and all values are present in X.

columns_set_or_check(X)

Function to check or set columns attribute.

fit(X[, y])

Gets list of levels for each column to be transformed.

fit_transform(X[, y])

Fit OneHotEncoder to X, then transform X.

get_feature_names([input_features])

DEPRECATED: get_feature_names is deprecated in 1.0 and will be removed in 1.2.

get_feature_names_out([input_features])

Get output feature names for transformation.

get_params([deep])

Get parameters for this estimator.

inverse_transform(X)

Convert the data back to the original representation.

set_params(**params)

Set the parameters of this estimator.

transform(X)

Create new dummy columns from categorical fields.

check_is_fitted(attribute)

Check if particular attributes are on the object. This is useful to do before running transform to avoid trying to transform data without first running the fit method.

Wrapper for utils.validation.check_is_fitted function.

Parameters

attribute (List) – List of str values giving the names of attributes to check exist on self.

check_mappable_rows(X)

Method to check that every row the transformer will be applied to can be mapped according to the values in the mappings dict.

Raises

ValueError – If any row in a column (c) to be mapped could not be mapped according to the mapping dict in mappings[c].
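As a rough sketch of the check described here, the function below raises ValueError when a value has no entry in the mapping for its column. It uses plain Python structures (a list of row dicts and a dict of per-column mapping dicts) as stand-ins; the function name, the data layout and the error message are assumptions, not tubular's implementation.

```python
def check_rows_mappable(rows, mappings):
    """Raise ValueError if any value in a mapped column is absent from mappings[c].

    Hypothetical stand-in: `rows` is a list of dicts, `mappings` maps each
    column name to a dict of {level: mapped value}.
    """
    for column, mapping in mappings.items():
        unmappable = [row[column] for row in rows if row[column] not in mapping]
        if unmappable:
            raise ValueError(
                f"cannot map values {unmappable!r} in column {column!r}"
            )


rows = [{"colour": "red"}, {"colour": "blue"}]
mappings = {"colour": {"red": 0, "blue": 1}}
check_rows_mappable(rows, mappings)  # every value is mappable, so nothing is raised
```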

static check_weights_column(X, weights_column)

Helper method for validating weights column in dataframe.

Parameters
  • X (pd.DataFrame) – df containing weight column

  • weights_column (str) – name of weight column

classname()

Method that returns the name of the current class when called.

columns_check(X)

Method to check that the columns attribute is set and all values are present in X.

Parameters

X (pd.DataFrame) – Data to check columns are in.

columns_set_or_check(X)

Function to check or set columns attribute.

If the columns attribute is None then set it to all object and category columns in X. Otherwise run the columns_check method.

Parameters

X (pd.DataFrame) – Data to check columns are in.
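The set-or-check logic can be sketched without pandas by using a dict of column name to dtype name as a stand-in for the DataFrame's dtypes. The function resolve_columns is hypothetical, handles only list input, and its use of ValueError for missing columns is an assumption.

```python
def resolve_columns(columns, dtypes):
    """Return the columns to transform, given {column name: dtype name}.

    Hypothetical sketch: if `columns` is None, pick every object/category
    column; otherwise check that each requested column exists.
    """
    if columns is None:
        return [name for name, dtype in dtypes.items()
                if dtype in ("object", "category")]
    missing = [name for name in columns if name not in dtypes]
    if missing:
        # The exception type is an assumption made for this sketch
        raise ValueError(f"columns not present in X: {missing!r}")
    return list(columns)


dtypes = {"colour": "object", "size": "category", "price": "float64"}
print(resolve_columns(None, dtypes))
# ['colour', 'size']
```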

fit(X, y=None)

Gets list of levels for each column to be transformed. This defines which dummy columns will be created in transform.

Parameters
  • X (pd.DataFrame) – Data to identify levels from.

  • y (None) – Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.
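To make the idea of "levels" concrete, the sketch below collects the distinct values seen in each listed column of a small table held as a dict of lists. The helper learn_levels is hypothetical, and sorting the levels is an assumption made for a deterministic result, not a statement about tubular's ordering.

```python
def learn_levels(table, columns):
    """Collect the distinct values (levels) found in each listed column.

    Hypothetical helper: `table` is a dict of {column name: list of values}.
    """
    return {column: sorted(set(table[column])) for column in columns}


table = {"colour": ["red", "blue", "red"], "price": [1.0, 2.5, 3.0]}
print(learn_levels(table, ["colour"]))
# {'colour': ['blue', 'red']}
```

Each learned level corresponds to one dummy column created later, so a level that never appears in the data passed to fit has no dummy column of its own.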

fit_transform(X, y=None)

Fit OneHotEncoder to X, then transform X.

Equivalent to fit(X).transform(X) but more convenient.

Parameters
  • X (array-like of shape (n_samples, n_features)) – The data to encode.

  • y (None) – Ignored. This parameter exists only for compatibility with Pipeline.

Returns

X_out – Transformed input. If sparse=True, a sparse matrix will be returned.

Return type

{ndarray, sparse matrix} of shape (n_samples, n_encoded_features)

get_feature_names(input_features=None)

DEPRECATED: get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.

Return feature names for output features.

Parameters

input_features (list of str of shape (n_features,)) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.

Returns

output_feature_names – Array of feature names.

Return type

ndarray of shape (n_output_features,)

get_feature_names_out(input_features=None)

Get output feature names for transformation.

Parameters

input_features (array-like of str or None, default=None) – Input features.

  • If input_features is None, then feature_names_in_ is used as feature names in. If feature_names_in_ is not defined, then names are generated: [x0, x1, …, x(n_features_in_)].

  • If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.

Returns

feature_names_out – Transformed feature names.

Return type

ndarray of str objects

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

dict

inverse_transform(X)

Convert the data back to the original representation.

When unknown categories are encountered (all zeros in the one-hot encoding), None is used to represent this category. If the feature with the unknown category has a dropped category, the dropped category will be its inverse.

Parameters

X ({array-like, sparse matrix} of shape (n_samples, n_encoded_features)) – The transformed data.

Returns

X_tr – Inverse transformed array.

Return type

ndarray of shape (n_samples, n_features)
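The handling of unknown categories can be illustrated for a single feature with a short sketch. The function inverse_one_hot is hypothetical and covers only the simple case with no dropped category: a row of indicators maps back to its category, and an all-zero row maps to None.

```python
def inverse_one_hot(indicators, categories):
    """Return the category whose indicator is 1, or None for an all-zero row.

    Hypothetical single-feature sketch; the dropped-category case described
    above is not handled here.
    """
    for indicator, category in zip(indicators, categories):
        if indicator == 1:
            return category
    return None


categories = ["blue", "red"]
print(inverse_one_hot([0, 1], categories))  # red
print(inverse_one_hot([0, 0], categories))  # None
```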

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

estimator instance
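The <component>__<parameter> form can be read as "everything before the first double underscore names the component, the rest names the parameter". The sketch below shows that split; split_nested_param is a hypothetical helper, and the names encoder and separator in the usage lines are only examples.

```python
def split_nested_param(name):
    """Split '<component>__<parameter>' at the first double underscore.

    Hypothetical helper: returns (None, name) for a plain parameter name.
    """
    component, delimiter, parameter = name.partition("__")
    if not delimiter:
        return None, name
    return component, parameter


print(split_nested_param("encoder__separator"))  # ('encoder', 'separator')
print(split_nested_param("separator"))           # (None, 'separator')
```

Splitting only at the first delimiter means deeper nesting, such as a__b__c, passes b__c on to component a to resolve in turn.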

transform(X)

Create new dummy columns from categorical fields.

Parameters

X (pd.DataFrame) – Data to apply one hot encoding to.

Returns

X_transformed – Transformed input X with dummy columns derived from categorical columns added. If drop_original = True then the original categorical columns that the dummies are created from will not be in the output X.

Return type

pd.DataFrame
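The effect of transform and drop_original can be sketched on a table held as a dict of lists, which stands in for a DataFrame here. The function one_hot_encode is hypothetical and is not tubular's implementation; it assumes the levels per column are already known, and it names each dummy column [column][separator][level].

```python
def one_hot_encode(table, levels, separator="_", drop_original=False):
    """Add one 0/1 dummy column per known level of each categorical column.

    Hypothetical sketch: `table` is {column name: list of values} and
    `levels` is {column name: list of levels}.
    """
    out = {name: list(values) for name, values in table.items()}  # copy input
    for column, column_levels in levels.items():
        for level in column_levels:
            out[f"{column}{separator}{level}"] = [
                int(value == level) for value in table[column]
            ]
        if drop_original:
            del out[column]
    return out


table = {"colour": ["red", "blue", "red"], "price": [1.0, 2.5, 3.0]}
levels = {"colour": ["blue", "red"]}
encoded = one_hot_encode(table, levels, drop_original=True)
print(list(encoded))
# ['price', 'colour_blue', 'colour_red']
```

With drop_original left at False, the colour column would remain in the output alongside the two dummy columns.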