tubular.nominal.OneHotEncodingTransformer
- class tubular.nominal.OneHotEncodingTransformer(columns=None, separator='_', drop_original=False, copy=True, verbose=False, **kwargs)
Bases: tubular.nominal.BaseNominalTransformer, sklearn.preprocessing._encoders.OneHotEncoder
Transformer to convert categorical variables into dummy columns.
Extends the sklearn OneHotEncoder class to provide easy renaming of dummy columns.
- Parameters
columns (str or list of strings, default = None) – Names of columns to transform
separator (str) – Used to create dummy column names, the name will take the format [categorical feature][separator][category level]
drop_original (bool, default = False) – Should original columns be dropped after creating dummy fields?
copy (bool, default = True) – Should X be copied prior to transform?
verbose (bool, default = False) – Should warnings/checkmarks get displayed?
**kwargs – Arbitrary keyword arguments passed on to the sklearn OneHotEncoder.__init__ method.
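As a rough illustration of the naming format described for separator, the sketch below builds dummy column names from a feature name and its category levels using only the standard library. The helper dummy_column_names is hypothetical and is not part of tubular.

```python
def dummy_column_names(feature, levels, separator="_"):
    """Build names of the form [categorical feature][separator][category level].

    Hypothetical helper for illustration only; not part of tubular.
    """
    return [f"{feature}{separator}{level}" for level in levels]


# Two levels of a "colour" feature, joined with the default separator.
print(dummy_column_names("colour", ["blue", "red"]))
```

A separator that cannot occur in the feature names or the levels makes it easier to split a dummy column name back into its two parts later on.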
- separator
Separator used in naming for dummy columns.
- Type
str
- drop_original
Should original columns be dropped after creating dummy fields?
- Type
bool
- __init__(columns=None, separator='_', drop_original=False, copy=True, verbose=False, **kwargs)
Initialize self. See help(type(self)) for accurate signature.
Methods
__init__([columns, separator, …]) – Initialize self.
check_is_fitted(attribute) – Check if particular attributes are on the object.
check_mappable_rows(X) – Method to check that all the rows to apply the transformer to are able to be mapped according to the values in the mappings dict.
check_weights_column(X, weights_column) – Helper method for validating weights column in dataframe.
classname() – Method that returns the name of the current class when called.
columns_check(X) – Method to check that the columns attribute is set and all values are present in X.
columns_set_or_check(X) – Function to check or set columns attribute.
fit(X[, y]) – Gets list of levels for each column to be transformed.
fit_transform(X[, y]) – Fit OneHotEncoder to X, then transform X.
get_feature_names([input_features]) – DEPRECATED: get_feature_names is deprecated in 1.0 and will be removed in 1.2.
get_feature_names_out([input_features]) – Get output feature names for transformation.
get_params([deep]) – Get parameters for this estimator.
inverse_transform(X) – Convert the data back to the original representation.
set_params(**params) – Set the parameters of this estimator.
transform(X) – Create new dummy columns from categorical fields.
- check_is_fitted(attribute)
Check if particular attributes are on the object. This is useful to do before running transform to avoid trying to transform data without first running the fit method.
Wrapper for utils.validation.check_is_fitted function.
- Parameters
attribute (List) – List of str values giving the names of the attributes to check exist on self.
- check_mappable_rows(X)
Method to check that all the rows to apply the transformer to are able to be mapped according to the values in the mappings dict.
- Raises
ValueError – If any of the rows in a column (c) to be mapped, could not be mapped according to the mapping dict in mappings[c].
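The check described above can be sketched in plain Python. This is a minimal illustration, not tubular's implementation: X is a dict of column name to list of values standing in for a DataFrame, the mappings argument is passed explicitly rather than read from self, and the error message is invented for the example.

```python
def check_mappable_rows(X, mappings):
    """Raise ValueError if any value in a mapped column has no entry in its mapping.

    X: dict of column name -> list of values (stand-in for a DataFrame).
    mappings: dict of column name -> dict of known values.
    """
    for column, mapping in mappings.items():
        unmappable = [value for value in X[column] if value not in mapping]
        if unmappable:
            # Message wording is an assumption made for this sketch.
            raise ValueError(
                f"column {column!r} has values with no mapping: {unmappable}"
            )


mappings = {"size": {"S": 0, "M": 1}}
check_mappable_rows({"size": ["S", "M"]}, mappings)  # every value is mappable

try:
    check_mappable_rows({"size": ["S", "XL"]}, mappings)
except ValueError as err:
    print(err)
```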
- static check_weights_column(X, weights_column)
Helper method for validating weights column in dataframe.
- Parameters
X (pd.DataFrame) – DataFrame containing the weights column.
weights_column (str) – Name of the weights column.
- classname()
Method that returns the name of the current class when called.
- columns_check(X)
Method to check that the columns attribute is set and all values are present in X.
- Parameters
X (pd.DataFrame) – Data to check columns are in.
- columns_set_or_check(X)
Function to check or set columns attribute.
If the columns attribute is None then set it to all object and category columns in X. Otherwise run the columns_check method.
- Parameters
X (pd.DataFrame) – Data to check columns are in.
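The set-or-check behaviour might look roughly like the following. This is a sketch under stated assumptions: dtypes is a plain dict of column name to dtype name standing in for the dtypes of a DataFrame, the function takes columns as an argument instead of reading an attribute, and raising ValueError for missing columns is an assumption, since the exception type is not stated here.

```python
def columns_set_or_check(columns, dtypes):
    """Return the columns to encode.

    columns: list of column names, or None to select them automatically.
    dtypes: dict of column name -> dtype name (stand-in for DataFrame dtypes).
    """
    if columns is None:
        # Default: every object or category column, in their original order.
        return [name for name, dtype in dtypes.items() if dtype in ("object", "category")]
    missing = [name for name in columns if name not in dtypes]
    if missing:
        # Exception type is an assumption made for this sketch.
        raise ValueError(f"columns not found in X: {missing}")
    return list(columns)


dtypes = {"colour": "object", "price": "float64", "size": "category"}
print(columns_set_or_check(None, dtypes))
print(columns_set_or_check(["size"], dtypes))
```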
- fit(X, y=None)
Gets list of levels for each column to be transformed. This defines which dummy columns will be created in transform.
- Parameters
X (pd.DataFrame) – Data to identify levels from.
y (None) – Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.
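Conceptually, fitting amounts to recording the distinct levels of each column. The sketch below shows that idea with a dict of lists in place of a DataFrame; the helper name fit_levels and the choice to sort the levels are assumptions made for illustration.

```python
def fit_levels(X, columns):
    """Collect the distinct levels of each column to be encoded.

    X: dict of column name -> list of values (stand-in for a DataFrame).
    Sorting the levels is an assumption that keeps the output deterministic.
    """
    return {column: sorted(set(X[column])) for column in columns}


X = {"colour": ["red", "blue", "red"], "price": [1.0, 2.5, 3.0]}
levels = fit_levels(X, ["colour"])
print(levels)  # one dummy column would later be created per recorded level
```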
- fit_transform(X, y=None)
Fit OneHotEncoder to X, then transform X.
Equivalent to fit(X).transform(X) but more convenient.
- Parameters
X (array-like of shape (n_samples, n_features)) – The data to encode.
y (None) – Ignored. This parameter exists only for compatibility with Pipeline.
- Returns
X_out – Transformed input. If sparse=True, a sparse matrix will be returned.
- Return type
{ndarray, sparse matrix} of shape (n_samples, n_encoded_features)
- get_feature_names(input_features=None)
DEPRECATED: get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.
Return feature names for output features.
- Parameters
input_features (list of str of shape (n_features,)) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.
- Returns
output_feature_names – Array of feature names.
- Return type
ndarray of shape (n_output_features,)
- get_feature_names_out(input_features=None)
Get output feature names for transformation.
- Parameters
input_features (array-like of str or None, default=None) – Input features.
If input_features is None, then feature_names_in_ is used as the input feature names. If feature_names_in_ is not defined, then names are generated: [x0, x1, …, x(n_features_in_ - 1)].
If input_features is an array-like, then input_features must match feature_names_in_ if feature_names_in_ is defined.
- Returns
feature_names_out – Transformed feature names.
- Return type
ndarray of str objects
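The rules for resolving input_features can be sketched as below. This is an illustration of the documented behaviour, not the sklearn source: the function name resolve_input_features and the error message are hypothetical, and the fitted attributes are passed in as plain arguments.

```python
def resolve_input_features(input_features=None, feature_names_in=None, n_features_in=0):
    """Decide which input feature names to use, following the rules described above."""
    if input_features is None:
        if feature_names_in is not None:
            return list(feature_names_in)
        # Zero-based generated names: x0, x1, ..., one per input feature.
        return [f"x{i}" for i in range(n_features_in)]
    input_features = list(input_features)
    if feature_names_in is not None and input_features != list(feature_names_in):
        # Message wording is an assumption made for this sketch.
        raise ValueError("input_features does not match feature_names_in_")
    return input_features


print(resolve_input_features(n_features_in=3))
print(resolve_input_features(feature_names_in=["colour", "size"]))
```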
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
dict
- inverse_transform(X)
Convert the data back to the original representation.
When unknown categories are encountered (all zeros in the one-hot encoding), None is used to represent this category. If the feature with the unknown category has a dropped category, the dropped category will be its inverse.
- Parameters
X ({array-like, sparse matrix} of shape (n_samples, n_encoded_features)) – The transformed data.
- Returns
X_tr – Inverse transformed array.
- Return type
ndarray of shape (n_samples, n_features)
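For a single feature with no dropped category, the inverse mapping can be sketched in a few lines. The helper inverse_one_hot is hypothetical and works on plain lists rather than arrays; it only illustrates the all-zeros rule described above and does not cover the dropped-category case.

```python
def inverse_one_hot(rows, categories):
    """Map each one-hot row back to its category; all-zero rows become None."""
    decoded = []
    for row in rows:
        # The position of the 1 selects the category; no 1 means an unknown category.
        decoded.append(categories[row.index(1)] if 1 in row else None)
    return decoded


rows = [[1, 0], [0, 1], [0, 0]]
print(inverse_one_hot(rows, ["blue", "red"]))  # ['blue', 'red', None]
```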
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters
**params (dict) – Estimator parameters.
- Returns
self – Estimator instance.
- Return type
estimator instance
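The <component>__<parameter> convention can be illustrated by splitting each key on its first double underscore. The function split_nested_params and the component name "encoder" are hypothetical; this is a sketch of the naming scheme only, not of how the estimator applies the values.

```python
def split_nested_params(params):
    """Separate an estimator's own parameters from <component>__<parameter> keys."""
    own, nested = {}, {}
    for key, value in params.items():
        name, delim, sub_key = key.partition("__")
        if delim:
            # Nested form: route the value to the named component.
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params({"verbose": True, "encoder__separator": "-"})
print(own)     # {'verbose': True}
print(nested)  # {'encoder': {'separator': '-'}}
```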
- transform(X)
Create new dummy columns from categorical fields.
- Parameters
X (pd.DataFrame) – Data to apply one hot encoding to.
- Returns
X_transformed – Transformed input X with dummy columns derived from categorical columns added. If drop_original = True then the original categorical columns that the dummies are created from will not be in the output X.
- Return type
pd.DataFrame
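The effect of transform, including drop_original, can be sketched with a dict of lists standing in for a DataFrame. The function one_hot_transform is hypothetical and takes the fitted levels as an explicit argument; it illustrates the documented behaviour rather than reproducing tubular's implementation.

```python
def one_hot_transform(X, levels, separator="_", drop_original=False):
    """Add one 0/1 dummy column per recorded level of each categorical column.

    X: dict of column name -> list of values (stand-in for a DataFrame).
    levels: dict of categorical column name -> list of levels found during fit.
    """
    out = {name: list(values) for name, values in X.items()}  # work on a copy
    for column, column_levels in levels.items():
        for level in column_levels:
            out[f"{column}{separator}{level}"] = [int(value == level) for value in X[column]]
        if drop_original:
            del out[column]
    return out


X = {"colour": ["red", "blue"], "n": [1, 2]}
levels = {"colour": ["blue", "red"]}
print(one_hot_transform(X, levels, drop_original=True))
```

With drop_original=False the original "colour" column would remain in the output alongside the dummy columns.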