tubular package

Submodules

tubular.base module

Contains transformers that other transformers in the package inherit from.

These transformers contain key checks to be applied in all cases.

class tubular.base.BaseTransformer(columns: list[str] | str, copy: bool = False, verbose: bool = False, return_native: bool = True)[source]

Bases: BaseEstimator, TransformerMixin

Base transformer class which all other transformers in the package inherit from.

Provides fit and transform methods (required by sklearn transformers), simple input checking and functionality to copy X prior to transform.
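The fit/transform contract described above can be sketched in plain Python. This is a simplified illustration, not tubular's implementation: `MiniBaseTransformer` is a hypothetical class, and a dict of column lists stands in for a real DataFrame.

```python
from __future__ import annotations

from typing import Any


class MiniBaseTransformer:
    """Hypothetical, simplified stand-in for the BaseTransformer contract."""

    def __init__(self, columns: str | list[str]) -> None:
        # A single column name is normalised to a one-element list,
        # mirroring BaseTransformer(columns="a") -> columns=['a'].
        self.columns = [columns] if isinstance(columns, str) else list(columns)

    def columns_check(self, X: dict[str, list[Any]]) -> None:
        # Raise if any configured column is absent from the data.
        missing = [c for c in self.columns if c not in X]
        if missing:
            msg = f"variables {missing} not in X"
            raise ValueError(msg)

    def fit(self, X: dict[str, list[Any]], y: Any = None) -> MiniBaseTransformer:
        # Base-level fit only validates the input; it learns nothing.
        self.columns_check(X)
        return self

    def transform(self, X: dict[str, list[Any]]) -> dict[str, list[Any]]:
        # Base-level transform validates and returns the data unchanged.
        self.columns_check(X)
        return X
```

Child classes would then extend fit to learn values and transform to change the data, calling the parent methods first so the column checks always run.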

Attributes:

columns: list: List of str values giving the columns in the input DataFrame that the transformer will be applied to.
copy: bool: Should X be copied before transforms are applied? This argument is no longer used and will be deprecated in a future release.
verbose: bool: Whether to print statements showing which methods are being run.
built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
return_native: bool, default = True: Controls whether transformer returns narwhals or native pandas/polars type
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> BaseTransformer(
...     columns="a",
... )
BaseTransformer(columns=['a'])

```

FITS = True

check_is_fitted(attribute: str) → None[source]

Check if particular attributes are on the object.

This is useful to do before running transform to avoid trying to transform data without first running the fit method.

Wrapper for the sklearn.utils.validation.check_is_fitted function.

Parameters:: attribute (str or list[str]) – Name, or list of names, of the attributes to check exist on self.

Example

```pycon
>>> transformer = BaseTransformer(
...     columns="a",
... )

>>> transformer.check_is_fitted("columns")

```
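A minimal sketch of the kind of check this wrapper performs, assuming sklearn-like semantics where a missing attribute raises an error. `NotFittedSketchError` and `check_is_fitted_sketch` are hypothetical names; the real method delegates to scikit-learn.

```python
from __future__ import annotations


class NotFittedSketchError(ValueError):
    """Hypothetical error raised when required attributes are missing."""


def check_is_fitted_sketch(obj: object, attribute: str | list[str]) -> None:
    # Accept a single attribute name or a list of names.
    names = [attribute] if isinstance(attribute, str) else attribute
    missing = [name for name in names if not hasattr(obj, name)]
    if missing:
        msg = f"{type(obj).__name__} is not fitted; missing attributes: {missing}"
        raise NotFittedSketchError(msg)
```

Calling it at the start of transform turns a confusing downstream failure into an explicit "not fitted" error.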

classname() → str[source]

Return the name of the current class when called.

Returns:: name of class
Return type:: str

columns_check(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) → None[source]

Check that the columns attribute is set and all values are present in X.

Parameters:: X (DataFrame) – Data to check columns are in.
Raises:: ValueError – if any values in the columns attribute are missing from X.

Examples

```pycon
>>> import polars as pl
>>> transformer = BaseTransformer(
...     columns="a",
... )

>>> df = pl.DataFrame({"a": [1, 2], "b": [3, 4]})

>>> transformer.columns_check(df)

```

Check data before fit.

Calls the columns_check method, which checks that the columns attribute is set and that all values are present in X.

Parameters:

X (DataFrame) – Data to fit the transformer on.
y (None or Series or LazyFrame, default = None) – Optional argument only required for the transformer to work with sklearn pipelines.

Returns:

self

Return type:

BaseTransformer

Examples

```pycon
>>> import polars as pl
>>> transformer = BaseTransformer(
...     columns="a",
... )
>>> df = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> transformer.fit(df)
BaseTransformer(columns=['a'])

```

classmethod from_json(json: dict[str, Any]) → BaseTransformer[source]

Rebuild transformer from json dict, ready for transform.

Parameters:: json (dict[str, dict[str, Any]]) – json-ified transformer
Returns:: reconstructed transformer class, ready for transform
Return type:: BaseTransformer
Raises:: RuntimeError – if transformer does not have to/from json functionality enabled

Examples

```pycon
>>> json_dict = {"init": {"columns": ["a", "b"]}, "fit": {}}

>>> BaseTransformer.from_json(json=json_dict)
BaseTransformer(columns=['a', 'b'])

```

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Child classes need to override this method if their behaviour is more complex than returning the input columns.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = BaseTransformer(
...     columns="a",
... )

>>> transformer.get_feature_names_out()
['a']

```
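A child class that creates a new column, rather than modifying its inputs, might override the method along these lines. `ParentSketch` and `NewColumnSketch` are hypothetical classes that only illustrate the override pattern (compare EqualityChecker, whose get_feature_names_out returns its new_column_name).

```python
class ParentSketch:
    """Hypothetical parent: reports its input columns unchanged."""

    def __init__(self, columns: list) -> None:
        self.columns = columns

    def get_feature_names_out(self) -> list:
        return list(self.columns)


class NewColumnSketch(ParentSketch):
    """Hypothetical child that writes its output to a single new column."""

    def __init__(self, columns: list, new_column_name: str) -> None:
        super().__init__(columns)
        self.new_column_name = new_column_name

    def get_feature_names_out(self) -> list:
        # Report the created column instead of the input columns.
        return [self.new_column_name]
```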

jsonable = True

lazyframe_compatible = True

polars_compatible = True

set_transform_request(*, return_native_override: bool | None | str = '$UNCHANGED$') → BaseTransformer

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: return_native_override (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_native_override parameter in transform.
Returns:: self – The updated object.
Return type:: object

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]
Raises:: RuntimeError – if transformer does not have to/from json functionality enabled

Examples

```pycon
>>> transformer = BaseTransformer(columns=["a", "b"])

>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'BaseTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True}, 'fit': {'is_fitted_': False}}

```
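The nested init/fit layout produced by to_json, and consumed by from_json, can be mimicked with the standard json module. This is an assumption-laden sketch, not tubular's code: `SketchTransformer`, the way its `from_json` passes `init` back to the constructor and restores `fit` values as attributes, and the version placeholder are all hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


class SketchTransformer:
    """Hypothetical transformer with a to_json/from_json round trip."""

    def __init__(self, columns: list[str], verbose: bool = False) -> None:
        self.columns = columns
        self.verbose = verbose

    def to_json(self) -> dict[str, Any]:
        return {
            "tubular_version": "0.0.0",  # placeholder, not a real version
            "classname": type(self).__name__,
            "init": {"columns": self.columns, "verbose": self.verbose},
            "fit": {},  # no fitted attributes in this sketch
        }

    @classmethod
    def from_json(cls, json_dict: dict[str, Any]) -> SketchTransformer:
        # Rebuild from init arguments, then restore any fitted attributes.
        instance = cls(**json_dict["init"])
        for name, value in json_dict["fit"].items():
            setattr(instance, name, value)
        # Flag the instance so callers know only transform is supported.
        instance.built_from_json = True
        return instance


# The dict survives a trip through a JSON string unchanged.
dumped = json.dumps(SketchTransformer(columns=["a", "b"]).to_json())
rebuilt = SketchTransformer.from_json(json.loads(dumped))
```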

Check data before child transform.

Calls the columns_check method, which checks that all columns in the columns attribute are present in X.

Parameters:

X (DataFrame) – Data to transform with the transformer.
return_native_override (Optional[bool]) – option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Input X, copied if specified by user.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = BaseTransformer(
...     columns="a",
... )

>>> df = pl.DataFrame({"a": [1, 2], "b": [3, 4]})

>>> transformer.transform(df)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 2   ┆ 4   │
└─────┴─────┘

```

class tubular.base.DataFrameMethodTransformer(**kwargs)[source]

Bases: DropOriginalMixin, BaseTransformer

Transformer that applies a pandas.DataFrame method.

The transformer assigns the output of the method to a new column or columns. It is possible to supply other keyword arguments (stored in the pd_method_kwargs attribute), which are passed to the pandas.DataFrame method when transform is called.

Be aware that it is possible to supply incompatible arguments to init that will only be identified when transform is run. This is because there are many combinations of method, input and output sizes. Additionally, some methods may only work as expected when called in transform with specific keyword arguments.

new_column_names

The name of the column or columns to be assigned to the output of running the pandas method in transform.

Type:: str or list of str

pd_method_name

The name of the pandas.DataFrame method to call.

Type:: str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Transform input data.

Calls the given pandas.DataFrame method and assigns the output back to the column or columns in X.

Any keyword arguments set in the pd_method_kwargs attribute are passed on to the pandas DataFrame method when calling it.

Parameters:: X (pd.DataFrame) – Data to transform.
Returns:: X – Input X with additional column or columns (self.new_column_names) added. These contain the output of running the pandas DataFrame method.
Return type:: pd.DataFrame

tubular.base.register(cls: BaseTransformer) → BaseTransformer[source]

Add transformer to registry dict.

Returns:

cls - transformer

Example:

```pycon
>>> @register
... class MyTransformer(BaseTransformer):
...     pass
...
>>> CLASS_REGISTRY["MyTransformer"]
<class 'tubular.base.MyTransformer'>

```
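A class-registry decorator of this kind takes only a few lines. This sketch assumes registration is keyed on the class name, as the CLASS_REGISTRY["MyTransformer"] lookup above suggests; `SKETCH_REGISTRY` and `register_sketch` are hypothetical names.

```python
from __future__ import annotations

SKETCH_REGISTRY: dict[str, type] = {}


def register_sketch(cls: type) -> type:
    # Store the class under its name and return it unchanged, so the
    # decorated class remains usable as normal.
    SKETCH_REGISTRY[cls.__name__] = cls
    return cls


@register_sketch
class MyTransformer:
    pass
```

Returning the class unchanged is what lets the decorator be stacked on a class definition without altering it; a registry like this is also what makes it possible to look a class up from the classname stored in a JSON dump.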

tubular.capping module

Contains transformers that apply capping to numeric columns.

class tubular.capping.BaseCappingTransformer(capping_values: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, quantiles: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, weights_column: str | None = None, **kwargs: bool)[source]

Bases: BaseNumericTransformer, WeightColumnMixin

Base class for capping transformers, contains functionality shared across capping transformer classes.

capping_values

Capping values to apply to each column, capping_values argument.

Type:: dict[str, CappingValues] or None

quantiles

Quantiles to set capping values at from input data. Will be empty after init, values populated when fit is run.

Type:: dict[str, CappingValues] or None

quantile_capping_values

Capping values learned from quantiles (if provided) to apply to each column.

Type:: dict[str, CappingValues] or None

weights_column

weights_column argument.

Type:: str or None

_replacement_values

Replacement values when capping is applied. Will be a copy of capping_values.

Type:: dict[str, CappingValues]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

FITS = True

check_capping_values_dict(capping_values_dict: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]], dict_name: str) → None[source]

Check that the passed capping_values or quantiles dictionary is valid.

Parameters:

capping_values_dict (dict[str, list[int | float | None]]) – dict of form {column_name: [lower_cap, upper_cap]}
dict_name (str) – 'capping_values' or 'quantiles'

Raises:

ValueError – if capping values are invalid, e.g. lower_cap > upper_cap.

Examples

```pycon
>>> transformer = BaseCappingTransformer(
...     capping_values={"a": [10, 20], "b": [1, 3]},
... )

>>> transformer.check_capping_values_dict(transformer.capping_values, "capping_values")

```
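The kind of validation this method performs can be sketched as follows. The exact rules are assumptions inferred from the documented [lower_cap, upper_cap] form and the lower_cap > upper_cap example: each entry must be a two-element list of numbers or None, not both None, with lower <= upper. `check_capping_values_sketch` is a hypothetical helper.

```python
from __future__ import annotations

from numbers import Number


def check_capping_values_sketch(
    capping_values_dict: dict[str, list[float | None]], dict_name: str
) -> None:
    for column, caps in capping_values_dict.items():
        # Each entry must be [lower_cap, upper_cap].
        if not isinstance(caps, list) or len(caps) != 2:
            msg = f"{dict_name}[{column!r}] should be a list of length 2"
            raise ValueError(msg)
        lower, upper = caps
        for cap in caps:
            # bool is a subclass of int, so exclude it explicitly.
            if cap is not None and (not isinstance(cap, Number) or isinstance(cap, bool)):
                msg = f"{dict_name}[{column!r}] values must be numeric or None"
                raise ValueError(msg)
        if lower is None and upper is None:
            msg = f"{dict_name}[{column!r}] has no caps set"
            raise ValueError(msg)
        if lower is not None and upper is not None and lower > upper:
            msg = f"{dict_name}[{column!r}]: lower_cap > upper_cap"
            raise ValueError(msg)
```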

Learn capping values from input data X.

Calculates the quantiles to cap at given the quantiles dictionary supplied when initialising the transformer. Saves learnt values in the quantile_capping_values and replacement_values attributes.

Parameters:

X (DataFrame) – A dataframe with required columns to be capped.
y (Series or LazyFrame or None, default = None) – Required for pipeline.

Returns:

fitted instance of class

Return type:

BaseCappingTransformer

Examples

```pycon
>>> import polars as pl

>>> transformer = BaseCappingTransformer(
...     quantiles={"a": [0.01, 0.99], "b": [0.05, 0.95]},
... )

>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})
>>> test_target = pl.Series(name="target", values=[5, 6, 7, 8])

>>> transformer.fit(test_df, test_target)
BaseCappingTransformer(quantiles={'a': [0.01, 0.99], 'b': [0.05, 0.95]})

```

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[source]

Return a JSON-serializable representation of the transformer.

Returns:: Dictionary containing all necessary attributes to recreate the transformer with from_json. Keys include 'init' (initialization parameters) and 'fit' (fitted values).
Return type:: dict

Apply capping to columns in X.

If an upper cap is set, any values above it are set to the upper replacement value; if a lower cap is set, any values below it are set to the lower replacement value. Only works for numeric columns.

Parameters:

X (DataFrame) – Data to apply capping to.
return_native_override (Optional[bool]) – Option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Transformed input X with min and max capping applied to the specified columns.

Return type:

DataFrame
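For CappingTransformer, where the replacement values are the caps themselves, capping reduces to clamping each value into [lower_cap, upper_cap] and skipping any side whose cap is None. The plain-Python sketch below uses lists in place of DataFrame columns; `cap_values` is a hypothetical helper, and passing nulls through unchanged is an assumption. It reproduces the a and b columns of the CappingTransformer example.

```python
from __future__ import annotations


def cap_values(values: list[float | None], caps: list[float | None]) -> list[float | None]:
    lower, upper = caps
    capped = []
    for value in values:
        if value is not None:
            # Clamp only on the sides that have a cap configured.
            if lower is not None and value < lower:
                value = lower
            if upper is not None and value > upper:
                value = upper
        capped.append(value)
    return capped


print(cap_values([1, 15, 18, 25], [10, 20]))  # [10, 15, 18, 20]
print(cap_values([6, 2, 7, 1], [1, 3]))  # [3, 2, 3, 1]
```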

Calculate weighted quantiles.

This method is adapted from the "Completely vectorized numpy solution" answer from user Alleo (https://stackoverflow.com/users/498892/alleo) to the following Stack Overflow question: https://stackoverflow.com/questions/21844024/weighted-percentile-using-numpy. This method is also licensed under the CC-BY-SA terms, as the original code sample posted to Stack Overflow (pre February 1, 2016) was.

Method is similar to numpy.percentile, but supports weights. Supplied quantiles should be in the range [0, 1]. Method calculates cumulative % of weight for each observation, then interpolates between these observations to calculate the desired quantiles. Null values in the observations (values) and 0 weight observations are filtered out before calculating.

Parameters:

X (DataFrame) – Dataframe with relevant columns to calculate quantiles from.
quantiles (list[Number]) – Weighted quantiles to calculate. Must all be between 0 and 1.
values_column (str) – name of relevant values column in data
weights_column (str) – name of relevant weight column in data

Returns:

interp_quantiles – List containing computed quantiles.

Return type:

list[Number]

Examples

```pycon
>>> import polars as pl
>>> x = CappingTransformer(capping_values={"a": [2, 10]})
>>> df = pl.DataFrame({"a": [1, 2, 3], "weight": [1, 1, 1]})
>>> quantiles_to_compute = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
>>> computed_quantiles = x.weighted_quantile(
...     X=df, values_column="a", weights_column="weight", quantiles=quantiles_to_compute
... )
>>> [round(q, 1) for q in computed_quantiles]
[1.0, 1.0, 1.0, 1.0, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3.0]

>>> df = pl.DataFrame({"a": [1, 2, 3], "weight": [0, 1, 0]})
>>> computed_quantiles = x.weighted_quantile(
...     X=df, values_column="a", weights_column="weight", quantiles=quantiles_to_compute
... )
>>> [round(q, 1) for q in computed_quantiles]
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]

>>> df = pl.DataFrame({"a": [1, 2, 3], "weight": [1, 1, 0]})
>>> computed_quantiles = x.weighted_quantile(
...     X=df, values_column="a", weights_column="weight", quantiles=quantiles_to_compute
... )
>>> [round(q, 1) for q in computed_quantiles]
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]

>>> df = pl.DataFrame({"a": [1, 2, 3, 4, 5], "weight": [1, 1, 1, 1, 1]})
>>> computed_quantiles = x.weighted_quantile(
...     X=df, values_column="a", weights_column="weight", quantiles=quantiles_to_compute
... )
>>> [round(q, 1) for q in computed_quantiles]
[1.0, 1.0, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

>>> df = pl.DataFrame({"a": [1, 2, 3, 4, 5], "weight": [1, 0, 1, 0, 1]})
>>> computed_quantiles = x.weighted_quantile(
...     X=df, values_column="a", weights_column="weight", quantiles=[0, 0.5, 1.0]
... )
>>> [round(q, 1) for q in computed_quantiles]
[1.0, 2.0, 5.0]

```
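The interpolation can be reproduced in plain Python. This sketch drops null values and zero-weight observations, sorts by value, uses each observation's normalised cumulative weight as its quantile position, and interpolates linearly, clamping to the end values outside the covered range (as numpy.interp does). That variant matches all of the outputs in the examples above, but it is a reconstruction from those examples rather than tubular's source; `weighted_quantile_sketch` is a hypothetical name, and it assumes at least one positive weight.

```python
from __future__ import annotations


def weighted_quantile_sketch(
    values: list[float | None], weights: list[float], quantiles: list[float]
) -> list[float]:
    # Drop null values and zero-weight observations, then sort by value.
    pairs = sorted((v, w) for v, w in zip(values, weights) if v is not None and w != 0)
    total = sum(w for _, w in pairs)

    # Cumulative share of total weight at each observation.
    positions, running = [], 0.0
    for _, w in pairs:
        running += w
        positions.append(running / total)
    xs = [v for v, _ in pairs]

    results = []
    for q in quantiles:
        if q <= positions[0]:
            results.append(float(xs[0]))
        elif q >= positions[-1]:
            results.append(float(xs[-1]))
        else:
            # Find the bracketing observations and interpolate linearly.
            i = next(k for k in range(1, len(positions)) if positions[k] >= q)
            x0, x1 = xs[i - 1], xs[i]
            p0, p1 = positions[i - 1], positions[i]
            results.append(x0 + (q - p0) / (p1 - p0) * (x1 - x0))
    return results


print([round(q, 1) for q in weighted_quantile_sketch([1, 2, 3], [1, 1, 1], [0, 0.5, 1.0])])
# [1.0, 1.5, 3.0]
```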

class tubular.capping.CappingTransformer(capping_values: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, quantiles: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, weights_column: str | None = None, **kwargs: bool)[source]

Bases: BaseCappingTransformer

Transformer to cap numeric values at both or either minimum and maximum values.

For max capping any values above the cap value will be set to the cap. Similarly for min capping any values below the cap will be set to the cap. Only works for numeric columns.

Attributes:

capping_values: dict[str, CappingValues] or None: Capping values to apply to each column, capping_values argument.
quantiles: dict[str, CappingValues] or None: Quantiles to set capping values at from input data. Will be empty after init, values populated when fit is run.
quantile_capping_values: dict[str, CappingValues] or None: Capping values learned from quantiles (if provided) to apply to each column.
weights_column: str or None: weights_column argument.
_replacement_values: dict[str, CappingValues]: Replacement values when capping is applied. Will be a copy of capping_values.
built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> import polars as pl

>>> transformer = CappingTransformer(
...     capping_values={"a": [10, 20], "b": [1, 3]},
... )

>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})

>>> transformer.transform(test_df)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 10  ┆ 3   ┆ 1   │
│ 15  ┆ 2   ┆ 2   │
│ 18  ┆ 3   ┆ 3   │
│ 20  ┆ 1   ┆ 4   │
└─────┴─────┴─────┘

>>> # transformer can also be dumped to json and reinitialised

>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'CappingTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'capping_values': {'a': [10, 20], 'b': [1, 3]}, 'quantiles': None, 'weights_column': None}, 'fit': {'is_fitted_': False}}

>>> CappingTransformer.from_json(json_dump)
CappingTransformer(capping_values={'a': [10, 20], 'b': [1, 3]})

```

FITS = True

Learn capping values from input data X.

Calculates the quantiles to cap at given the quantiles dictionary supplied when initialising the transformer. Saves learnt values in the capping_values attribute.

Parameters:

X (DataFrame) – A dataframe with required columns to be capped.
y (None) – Required for pipeline.

Returns:

fitted instance of class

Return type:

CappingTransformer

Example

```pycon
>>> import polars as pl

>>> transformer = CappingTransformer(
...     quantiles={"a": [0.01, 0.99], "b": [0.05, 0.95]},
... )

>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})

>>> transformer.fit(test_df)
CappingTransformer(quantiles={'a': [0.01, 0.99], 'b': [0.05, 0.95]})

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

class tubular.capping.OutOfRangeNullTransformer(capping_values: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, quantiles: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, weights_column: str | None = None, dtype: ]] = 'Float64', **kwargs: bool)[source]

Bases: BaseCappingTransformer

Transformer to set values outside of a range to null.

This transformer sets the cut-off values in the same way as the CappingTransformer: either the user specifies them directly in the capping_values argument, or they are calculated in the fit method if the user supplies the quantiles argument.

Attributes:

capping_values: dict[str, CappingValues] or None: Capping values to apply to each column, capping_values argument.
quantiles: dict[str, CappingValues] or None: Quantiles to set capping values at from input data. Will be empty after init, values populated when fit is run.
quantile_capping_values: dict[str, CappingValues] or None: Capping values learned from quantiles (if provided) to apply to each column.
weights_column: str or None: weights_column argument.
_replacement_values: dict[str, CappingValues]: Replacement values when capping is applied. This will contain nulls for each column.
built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> import polars as pl

>>> transformer = OutOfRangeNullTransformer(
...     capping_values={"a": [10, 20], "b": [1, 3]},
... )
>>> transformer
OutOfRangeNullTransformer(capping_values={'a': [10, 20], 'b': [1, 3]})

>>> # transform method is inherited so also demo that here

>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})

>>> transformer.transform(test_df)
shape: (4, 3)
┌──────┬──────┬─────┐
│ a    ┆ b    ┆ c   │
│ ---  ┆ ---  ┆ --- │
│ f64  ┆ f64  ┆ i64 │
╞══════╪══════╪═════╡
│ null ┆ null ┆ 1   │
│ 15.0 ┆ 2.0  ┆ 2   │
│ 18.0 ┆ null ┆ 3   │
│ null ┆ 1.0  ┆ 4   │
└──────┴──────┴─────┘

>>> # transformer can also be dumped to json and reinitialised

>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'OutOfRangeNullTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'capping_values': {'a': [10, 20], 'b': [1, 3]}, 'quantiles': None, 'weights_column': None}, 'fit': {'is_fitted_': False}}

>>> OutOfRangeNullTransformer.from_json(json_dump)
OutOfRangeNullTransformer(capping_values={'a': [10, 20], 'b': [1, 3]})

```
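Per column, the behaviour shown in the example amounts to replacing anything outside [lower, upper] with null. A plain-Python sketch, with None standing in for null and a hypothetical `null_out_of_range` helper; casting surviving values to float mirrors the f64 output above (the transformer's dtype argument defaults to 'Float64').

```python
from __future__ import annotations


def null_out_of_range(values: list[float | None], caps: list[float | None]) -> list[float | None]:
    lower, upper = caps
    result = []
    for value in values:
        below = value is not None and lower is not None and value < lower
        above = value is not None and upper is not None and value > upper
        # Out-of-range values (and existing nulls) become None; others become float.
        result.append(None if value is None or below or above else float(value))
    return result


print(null_out_of_range([1, 15, 18, 25], [10, 20]))  # [None, 15.0, 18.0, None]
print(null_out_of_range([6, 2, 7, 1], [1, 3]))  # [None, 2.0, None, 1.0]
```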

FITS = True

Learn capping values from input data X.

Calculates the quantiles to cap at given the quantiles dictionary supplied when initialising the transformer. Saves learnt values in the capping_values attribute.

Parameters:

X (DataFrame) – A dataframe with required columns to be capped.
y (None) – Required for pipeline.

Returns:

fitted instance of class

Return type:

OutOfRangeNullTransformer

Example

```pycon
>>> import polars as pl

>>> transformer = OutOfRangeNullTransformer(
...     quantiles={"a": [0.01, 0.99], "b": [0.05, 0.95]},
... )

>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})

>>> transformer.fit(test_df)
OutOfRangeNullTransformer(quantiles={'a': [0.01, 0.99], 'b': [0.05, 0.95]})

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

static set_replacement_values(capping_values: dict[str, list[int | float | None]]) → dict[str, list[bool | None]][source]

Set the _replacement_values to have all null values.

Keeps the keys of the supplied capping_values dict and replaces every entry in the lists: non-None values become None (null), and None values become False.

Returns:: replacement values for OutOfRangeNullTransformer
Return type:: dict[str, list[bool | None]]

Examples

```pycon
>>> import polars as pl

>>> capping_values = {"a": [0.1, 0.2], "b": [None, 10]}

>>> OutOfRangeNullTransformer.set_replacement_values(capping_values)
{'a': [None, None], 'b': [False, None]}

```
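The mapping in the example above (each set cap becomes None, each unset cap becomes False) fits in a dict comprehension. This is a sketch reconstructed from that single example; `replacement_values_sketch` is a hypothetical name.

```python
from __future__ import annotations


def replacement_values_sketch(
    capping_values: dict[str, list[float | None]],
) -> dict[str, list[bool | None]]:
    # A set cap is replaced with None (i.e. null); an unset cap (None)
    # is marked False, matching the documented example output.
    return {
        column: [False if cap is None else None for cap in caps]
        for column, caps in capping_values.items()
    }


print(replacement_values_sketch({"a": [0.1, 0.2], "b": [None, 10]}))
# {'a': [None, None], 'b': [False, None]}
```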

tubular.comparison module

Module for comparing and conditionally updating provided columns.

class tubular.comparison.CompareTwoColumnsTransformer(columns: ]], condition: ]], **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer to compare two columns and generate outcomes based on conditions.

This transformer evaluates a condition between two columns and generates an outcome based on the result.

polars_compatible

Indicates whether transformer has been converted to polars/pandas agnostic narwhals framework.

Type:: bool

FITS

Indicates whether transform requires fit to be run first.

Type:: bool

jsonable

Indicates if transformer supports to/from_json methods.

Type:: bool

lazyframe_compatible

Indicates whether transformer works with lazyframes.

Type:: bool

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})
>>> transformer = CompareTwoColumnsTransformer(
...     columns=["a", "b"],
...     condition=">",
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 3)
┌─────┬─────┬───────┐
│ a   ┆ b   ┆ a>b   │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ i64 ┆ bool  │
╞═════╪═════╪═══════╡
│ 1   ┆ 3   ┆ false │
│ 2   ┆ 2   ┆ false │
│ 3   ┆ 1   ┆ true  │
└─────┴─────┴───────┘

```
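Elementwise, the comparison amounts to looking up a binary operator from the condition string, with the output column named by joining the two column names around the condition (a>b in the example above). The sketch uses Python's operator module; the set of condition strings shown is an assumption (the supported set is defined by ConditionEnum), and `compare_columns_sketch` is a hypothetical helper that ignores null handling.

```python
from __future__ import annotations

import operator

# Assumed condition strings; check ConditionEnum for the supported set.
CONDITIONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def compare_columns_sketch(data: dict[str, list], columns: list[str], condition: str) -> dict[str, list]:
    left, right = columns
    op = CONDITIONS[condition]
    out = dict(data)  # keep the original columns
    out[f"{left}{condition}{right}"] = [op(a, b) for a, b in zip(data[left], data[right])]
    return out


print(compare_columns_sketch({"a": [1, 2, 3], "b": [3, 2, 1]}, ["a", "b"], ">")["a>b"])
# [False, False, True]
```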

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:: JSON representation of the transformer, including init parameters.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> from tubular.functions.comparison import ConditionEnum
>>> transformer = CompareTwoColumnsTransformer(
...     columns=["a", "b"],
...     condition=ConditionEnum.GREATER_THAN.value,
... )
>>> json_dict = transformer.to_json()
>>> from pprint import pprint
>>> pprint(json_dict, sort_dicts=True)
{'classname': 'CompareTwoColumnsTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a', 'b'],
          'condition': '>',
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

Transform two columns based on a condition to generate an outcome.

Parameters:: X (DataFrame) – DataFrame containing the columns to be transformed.
Returns:: Transformed DataFrame with the new outcome column.
Return type:: DataFrame
Raises:: TypeError – If the columns are not of a numeric type.

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})
>>> transformer = CompareTwoColumnsTransformer(
...     columns=["a", "b"],
...     condition=">",
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 3)
┌─────┬─────┬───────┐
│ a   ┆ b   ┆ a>b   │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ i64 ┆ bool  │
╞═════╪═════╪═══════╡
│ 1   ┆ 3   ┆ false │
│ 2   ┆ 2   ┆ false │
│ 3   ┆ 1   ┆ true  │
└─────┴─────┴───────┘

```

class tubular.comparison.EqualityChecker(**kwargs)[source]

Bases: DropOriginalMixin, BaseTransformer

Transformer to check if two columns are equal.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = True

get_feature_names_out() → list[str][source]

Get list of features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = EqualityChecker(
...     columns=["a", "b"],
...     new_column_name="bla",
... )

>>> transformer.get_feature_names_out()
['bla']

```

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: pd.DataFrame) → pd.DataFrame[source]

Create a column which indicates equality between the given columns.

Parameters:: X (pd.DataFrame) – Data containing the columns to compare.
Returns:: X – Transformed input X with additional boolean column.
Return type:: pd.DataFrame

class tubular.comparison.WhenThenOtherwiseTransformer(columns: ]], when_column: str, then_column: str, **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer to apply conditional logic across multiple columns.

This transformer evaluates specified columns against a condition and updates with given values based on the results.

polars_compatible

Indicates whether transformer has been converted to polars/pandas agnostic narwhals framework.

Type:: bool

FITS

Indicates whether transform requires fit to be run first.

Type:: bool

jsonable

Indicates if transformer supports to/from_json methods.

Type:: bool

lazyframe_compatible

Indicates whether transformer works with lazyframes.

Type:: bool

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [4, 5, 6],
...         "condition_col": [True, False, True],
...         "update_col": [10, 20, 30],
...     }
... )
>>> transformer = WhenThenOtherwiseTransformer(
...     columns=["a", "b"], when_column="condition_col", then_column="update_col"
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 4)
┌─────┬─────┬───────────────┬────────────┐
│ a   ┆ b   ┆ condition_col ┆ update_col │
│ --- ┆ --- ┆ ---           ┆ ---        │
│ i64 ┆ i64 ┆ bool          ┆ i64        │
╞═════╪═════╪═══════════════╪════════════╡
│ 10  ┆ 10  ┆ true          ┆ 10         │
│ 2   ┆ 5   ┆ false         ┆ 20         │
│ 30  ┆ 30  ┆ true          ┆ 30         │
└─────┴─────┴───────────────┴────────────┘

```
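For each target column the rule is: where when_column is true, take the value from then_column; otherwise keep the original value. A plain-Python sketch of that rule that reproduces the example above (`when_then_otherwise_sketch` is a hypothetical helper; null handling is not covered).

```python
from __future__ import annotations


def when_then_otherwise_sketch(
    data: dict[str, list], columns: list[str], when_column: str, then_column: str
) -> dict[str, list]:
    out = dict(data)  # untouched columns are carried over as-is
    for column in columns:
        out[column] = [
            then if cond else original
            for original, cond, then in zip(data[column], data[when_column], data[then_column])
        ]
    return out


df = {
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "condition_col": [True, False, True],
    "update_col": [10, 20, 30],
}
result = when_then_otherwise_sketch(df, ["a", "b"], "condition_col", "update_col")
print(result["a"], result["b"])  # [10, 2, 30] [10, 5, 30]
```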

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:: JSON representation of the transformer, including init parameters.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = WhenThenOtherwiseTransformer(
...     columns=["a", "b"],
...     when_column="condition_col",
...     then_column="update_col",
... )
>>> pprint(transformer.to_json(), sort_dicts=True)
{'classname': 'WhenThenOtherwiseTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a', 'b'],
          'copy': False,
          'return_native': True,
          'then_column': 'update_col',
          'verbose': False,
          'when_column': 'condition_col'},
 'tubular_version': ...}

```

Apply conditional logic to transform specified columns.

Parameters:: X (DataFrame) – DataFrame containing the columns to be transformed.
Returns:: Transformed DataFrame with updated columns based on conditions.
Return type:: DataFrame
Raises:: TypeError – If the when_column is not of type Boolean or if columns have mismatched types.

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [4, 5, 6],
...         "condition_col": [True, False, True],
...         "update_col": [10, 20, 30],
...     }
... )
>>> transformer = WhenThenOtherwiseTransformer(
...     columns=["a", "b"],
...     when_column="condition_col",
...     then_column="update_col",
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 4)
┌─────┬─────┬───────────────┬────────────┐
│ a   ┆ b   ┆ condition_col ┆ update_col │
│ --- ┆ --- ┆ ---           ┆ ---        │
│ i64 ┆ i64 ┆ bool          ┆ i64        │
╞═════╪═════╪═══════════════╪════════════╡
│ 10  ┆ 10  ┆ true          ┆ 10         │
│ 2   ┆ 5   ┆ false         ┆ 20         │
│ 30  ┆ 30  ┆ true          ┆ 30         │
└─────┴─────┴───────────────┴────────────┘
```
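Row by row, the conditional update above can be mimicked on plain Python lists. This is a minimal sketch assuming the semantics visible in the example output (take the `then_column` value where `when_column` is true, otherwise keep the original value); `when_then_otherwise` is a hypothetical helper, not part of tubular, which builds narwhals expressions instead.

```python
def when_then_otherwise(
    data: dict[str, list], columns: list[str], when_column: str, then_column: str
) -> dict[str, list]:
    """Hypothetical list-based analogue of WhenThenOtherwiseTransformer."""
    flags = data[when_column]
    if not all(isinstance(flag, bool) for flag in flags):
        # Mirrors the documented TypeError for a non-Boolean when_column.
        raise TypeError(f"{when_column} must contain only booleans")
    result = dict(data)  # shallow copy so the input dict is not mutated
    for col in columns:
        result[col] = [
            new if flag else old
            for old, new, flag in zip(data[col], data[then_column], flags)
        ]
    return result


df = {
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "condition_col": [True, False, True],
    "update_col": [10, 20, 30],
}
out = when_then_otherwise(df, ["a", "b"], "condition_col", "update_col")
print(out["a"], out["b"])  # [10, 2, 30] [10, 5, 30]
```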

tubular.dates module

Contains transformers for working with date columns.

class tubular.dates.BaseDatetimeTransformer(columns: list[str] | str, new_column_name: str, drop_original: bool = False, **kwargs: bool | None)[source]

Bases: BaseGenericDateTransformer

Extends BaseTransformer for datetime scenarios.

Attributes:

built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> BaseDatetimeTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )
BaseDatetimeTransformer(columns=['a', 'b'], new_column_name='bla')
```

FITS = False

jsonable = False

lazyframe_compatible = True

polars_compatible = True

Check types of selected columns in provided data.

Parameters:

X (DataFrame) – Data containing self.columns
return_native_override (Optional[bool]) – option to override return_native attr in transformer, useful when calling parent methods

Returns:

X (DataFrame) – Validated data

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = BaseDatetimeTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.datetime(1993, 9, 27), datetime.datetime(2005, 10, 7)],
...         "b": [datetime.datetime(1991, 5, 22), datetime.datetime(2001, 12, 10)],
...     },
... )
>>> # base transform has no effect on data
>>> transformer.transform(test_df)
shape: (2, 2)
┌─────────────────────┬─────────────────────┐
│ a                   ┆ b                   │
│ ---                 ┆ ---                 │
│ datetime[μs]        ┆ datetime[μs]        │
╞═════════════════════╪═════════════════════╡
│ 1993-09-27 00:00:00 ┆ 1991-05-22 00:00:00 │
│ 2005-10-07 00:00:00 ┆ 2001-12-10 00:00:00 │
└─────────────────────┴─────────────────────┘
```

class tubular.dates.BaseGenericDateTransformer(columns: list[str] | str, new_column_name: str, drop_original: bool = False, **kwargs: bool | None)[source]

Bases: DropOriginalMixin, BaseTransformer

Extends BaseTransformer for datetime/date scenarios.

Attributes:

built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
return_native: bool, default = True: Controls whether transformer returns narwhals or native pandas/polars type
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> BaseGenericDateTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )
BaseGenericDateTransformer(columns=['a', 'b'], new_column_name='bla')
```

FITS = False

check_columns_are_date_or_datetime(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, datetime_only: bool) → None[source]

Check types of provided columns.

Columns must be datetime or date type, depending on the datetime_only flag. If a column does not meet the expected type criteria, a TypeError is raised.

Parameters:

X (DataFrame) – Data to validate
datetime_only (bool) – Indicates whether ONLY datetime types are accepted

Raises:

TypeError – if non date/datetime types are found
TypeError – if mismatched date/datetime types are found; types should be consistent

Examples

```pycon
>>> import polars as pl
>>> import datetime

>>> transformer = BaseGenericDateTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )

>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.date(1993, 9, 27), datetime.date(2005, 10, 7)],
...         "b": [datetime.date(1991, 5, 22), datetime.date(2001, 12, 10)],
...     },
... )

>>> transformer.check_columns_are_date_or_datetime(test_df, datetime_only=False)

```
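The type rules can be illustrated on plain Python values. This hypothetical helper inspects Python objects rather than column dtypes (tubular works through narwhals dtypes), and the order of the checks is an assumption. It does highlight one subtlety any such check must handle: `datetime.datetime` is a subclass of `datetime.date`, so the datetime test has to come first.

```python
import datetime


def column_kind(values: list) -> str:
    """Classify a column of Python values as "datetime" or "date".

    datetime.datetime subclasses datetime.date, so it is tested first.
    """
    if all(isinstance(v, datetime.datetime) for v in values):
        return "datetime"
    if all(
        isinstance(v, datetime.date) and not isinstance(v, datetime.datetime)
        for v in values
    ):
        return "date"
    raise TypeError("non date/datetime values found")


def check_date_columns(data: dict[str, list], datetime_only: bool) -> None:
    """Hypothetical list-based analogue of check_columns_are_date_or_datetime."""
    kinds = {name: column_kind(values) for name, values in data.items()}
    if datetime_only and any(kind != "datetime" for kind in kinds.values()):
        raise TypeError(f"only datetime columns are accepted, got {kinds}")
    if len(set(kinds.values())) > 1:
        raise TypeError(f"mismatched date/datetime types, got {kinds}")


check_date_columns(
    {
        "a": [datetime.date(1993, 9, 27), datetime.date(2005, 10, 7)],
        "b": [datetime.date(1991, 5, 22), datetime.date(2001, 12, 10)],
    },
    datetime_only=False,
)  # passes silently
```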

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> # base classes just return inputs
>>> transformer = BaseGenericDateTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )

>>> transformer.get_feature_names_out()
['a', 'b']

>>> # other classes return new columns
>>> transformer = DateDifferenceTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )

>>> transformer.get_feature_names_out()
['bla']

```

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = BaseGenericDateTransformer(columns=["a", "b"], new_column_name="bla")

>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'BaseGenericDateTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'bla', 'drop_original': False}, 'fit': {'is_fitted_': True}}

```
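The two-level layout (`init` for constructor arguments, `fit` for fitted attributes) is what makes reconstruction from JSON possible. The sketch below assumes a from_json-style loader re-initialises the class from `init` and then sets the `fit` attributes; `StandIn` is a hypothetical class and the version string is a placeholder, so this illustrates the idea rather than tubular's own from_json.

```python
import json

# Shaped like the documented to_json() output for BaseGenericDateTransformer;
# "0.0.0" is a placeholder for the real tubular version string.
dumped = {
    "tubular_version": "0.0.0",
    "classname": "BaseGenericDateTransformer",
    "init": {
        "columns": ["a", "b"],
        "copy": False,
        "verbose": False,
        "return_native": True,
        "new_column_name": "bla",
        "drop_original": False,
    },
    "fit": {"is_fitted_": True},
}

# Only plain JSON types are used, so the dict survives a round trip through text.
restored = json.loads(json.dumps(dumped))


class StandIn:
    """Hypothetical stand-in for a transformer class."""

    def __init__(self, **init_kwargs):
        self.init_kwargs = init_kwargs


# Rebuild from the "init" level, then restore the attributes recorded under "fit".
obj = StandIn(**restored["init"])
for attr, value in restored["fit"].items():
    setattr(obj, attr, value)
```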

Validate data pre transform.

Parameters:

X (DataFrame) – Data containing self.columns
datetime_only (bool) – Indicates whether ONLY datetime types are accepted
return_native_override (Optional[bool]) – option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Validated data

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime

>>> transformer = BaseGenericDateTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )

>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.date(1993, 9, 27), datetime.date(2005, 10, 7)],
...         "b": [datetime.date(1991, 5, 22), datetime.date(2001, 12, 10)],
...     },
... )

>>> # base transform has no effect on data
>>> transformer.transform(test_df)
shape: (2, 2)
┌────────────┬────────────┐
│ a          ┆ b          │
│ ---        ┆ ---        │
│ date       ┆ date       │
╞════════════╪════════════╡
│ 1993-09-27 ┆ 1991-05-22 │
│ 2005-10-07 ┆ 2001-12-10 │
└────────────┴────────────┘

```

class tubular.dates.BetweenDatesTransformer(columns: ]], new_column_name: str, drop_original: bool = False, lower_inclusive: bool = True, upper_inclusive: bool = True, **kwargs: bool)[source]

Bases: BaseGenericDateTransformer

Transformer to generate a boolean column indicating if one date is between two others.

If any row has column_lower greater than column_upper, the output column for that row will be null instead of raising a warning.

Attributes:

built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
column_lower: str: Name of the date column giving the lower bound. This attribute is not for use in any method, use ‘columns’ instead. Here only as a fix to allow string representation of transformer.
column_upper: str: Name of the date column giving the upper bound. This attribute is not for use in any method, use ‘columns’ instead. Here only as a fix to allow string representation of transformer.
column_between: str: Name of column to check if its values fall between column_lower and column_upper. This attribute is not for use in any method, use ‘columns’ instead. Here only as a fix to allow string representation of transformer.
columns: list: Contains the names of the columns to compare in the order [column_lower, column_between, column_upper].
new_column_name: str: new_column_name argument passed when initialising the transformer.
lower_inclusive: bool: lower_inclusive argument passed when initialising the transformer.
upper_inclusive: bool: upper_inclusive argument passed when initialising the transformer.
drop_original: bool: indicates whether to drop original columns.
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> BetweenDatesTransformer(
...     columns=["a", "b", "c"],
...     new_column_name="b_between_a_c",
...     lower_inclusive=True,
...     upper_inclusive=True,
... )
BetweenDatesTransformer(columns=['a', 'b', 'c'],
                        new_column_name='b_between_a_c')
```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = BetweenDatesTransformer(
...     columns=["a", "b", "c"],
...     new_column_name="b_between_a_c",
...     lower_inclusive=True,
...     upper_inclusive=False,
... )
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'BetweenDatesTransformer', 'init': {'columns': ['a', 'b', 'c'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'b_between_a_c', 'drop_original': False, 'lower_inclusive': True, 'upper_inclusive': False}, 'fit': {'is_fitted_': True}}
```

transform(X: FrameT) → FrameT[source]

Transform - creates column indicating if middle date is between the other two.

Rows where the lower bound is greater than the upper bound will produce null in the resulting output column for that row.

Parameters:

X (pd/pl/nw.DataFrame) – Data to transform.

Returns:

X (pd/pl/nw.DataFrame) – Input X with additional column (self.new_column_name) added. This column is boolean and indicates if the middle column is between the other 2.

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = BetweenDatesTransformer(
...     columns=["a", "b", "c"],
...     new_column_name="b_between_a_c",
...     lower_inclusive=True,
...     upper_inclusive=True,
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [
...             datetime.date(1990, 9, 27),
...             datetime.date(2005, 10, 7),
...             datetime.date(2010, 1, 1),
...         ],
...         "b": [
...             datetime.date(1991, 5, 22),
...             datetime.date(2001, 12, 10),
...             datetime.date(2009, 1, 1),
...         ],
...         "c": [
...             datetime.date(1993, 4, 20),
...             datetime.date(2007, 11, 8),
...             datetime.date(2008, 1, 1),
...         ],
...     },
... )
>>> transformer.transform(test_df)
shape: (3, 4)
┌────────────┬────────────┬────────────┬───────────────┐
│ a          ┆ b          ┆ c          ┆ b_between_a_c │
│ ---        ┆ ---        ┆ ---        ┆ ---           │
│ date       ┆ date       ┆ date       ┆ bool          │
╞════════════╪════════════╪════════════╪═══════════════╡
│ 1990-09-27 ┆ 1991-05-22 ┆ 1993-04-20 ┆ true          │
│ 2005-10-07 ┆ 2001-12-10 ┆ 2007-11-08 ┆ false         │
│ 2010-01-01 ┆ 2009-01-01 ┆ 2008-01-01 ┆ null          │
└────────────┴────────────┴────────────┴───────────────┘
```
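Per row, the comparison described above can be sketched with plain `datetime.date` values. This is a minimal sketch, not tubular's narwhals implementation: `is_between` is a hypothetical helper, and mapping `lower_inclusive`/`upper_inclusive` to `<=` versus `<` is an assumption based on the parameter names. It reproduces the three rows of the example, including the null for an inverted range.

```python
import datetime
from typing import Optional


def is_between(
    lower: datetime.date,
    between: datetime.date,
    upper: datetime.date,
    lower_inclusive: bool = True,
    upper_inclusive: bool = True,
) -> Optional[bool]:
    """Hypothetical single-row analogue of BetweenDatesTransformer."""
    if lower > upper:
        # Inverted range: return None (null) instead of True/False.
        return None
    above_lower = lower <= between if lower_inclusive else lower < between
    below_upper = between <= upper if upper_inclusive else between < upper
    return above_lower and below_upper


rows = [
    (datetime.date(1990, 9, 27), datetime.date(1991, 5, 22), datetime.date(1993, 4, 20)),
    (datetime.date(2005, 10, 7), datetime.date(2001, 12, 10), datetime.date(2007, 11, 8)),
    (datetime.date(2010, 1, 1), datetime.date(2009, 1, 1), datetime.date(2008, 1, 1)),
]
print([is_between(*row) for row in rows])  # [True, False, None]
```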

class tubular.dates.DateDiffLeapYearTransformer(**kwargs)[source]

Bases: BaseGenericDateTransformer

Transformer to calculate the number of years between two dates.

!!! warning “Deprecated”: This transformer is now deprecated; use DateDifferenceTransformer instead.

columns

List of 2 columns. First column will be subtracted from second.

Type:: List[str]

new_column_name

Name given to calculated datediff column. If None then {column_upper}_{column_lower}_datediff will be used.

Type:: str, default = None

drop_original

Indicator whether to drop old columns during transform method.

Type:: bool

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = True

transform(X: FrameT) → FrameT[source]

Calculate year gap between the two provided columns.

A new column is created under new_column_name, and the original date columns are optionally removed.

Parameters:: X (pd/pl/nw.DataFrame) – Data containing self.columns
Returns:: X – Input data with the new year-gap column added (and the original date columns removed if drop_original is True)
Return type:: pd/pl/nw.DataFrame
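The docs above do not spell out how the year gap is counted. One common leap-year-safe approach, sketched below as an assumption rather than a description of DateDiffLeapYearTransformer's internals, counts whole calendar years by comparing (month, day) pairs instead of dividing a day count by 365; `whole_years_between` is a hypothetical name.

```python
import datetime


def whole_years_between(start: datetime.date, end: datetime.date) -> int:
    """Count complete years from start to end; negative if end is before start."""
    if end < start:
        return -whole_years_between(end, start)
    years = end.year - start.year
    # Anniversary of start not yet reached in end's year: drop one year.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


print(whole_years_between(datetime.date(1991, 5, 22), datetime.date(1993, 9, 27)))  # 2
print(whole_years_between(datetime.date(2000, 2, 29), datetime.date(2001, 2, 28)))  # 0
```

Under this rule a 29 February start only completes a year on 1 March in non-leap years; other conventions exist, so check the transformer's behaviour before relying on edge cases.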

class tubular.dates.DateDifferenceTransformer(columns: ]], new_column_name: str, units: ]] = 'D', drop_original: bool = False, custom_days_divider: int | None = None, **kwargs: bool)[source]

Bases: BaseGenericDateTransformer

Transformer to calculate the difference between 2 date fields in specified units.

Attributes:

built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> transformer = DateDifferenceTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
...     units="common_year",
... )
>>> transformer
DateDifferenceTransformer(columns=['a', 'b'], new_column_name='bla',
                          units='common_year')

>>> # transformer can also be dumped to json and reinitialised

>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'DateDifferenceTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'bla', 'drop_original': False, 'units': 'common_year', 'custom_days_divider': None}, 'fit': {'is_fitted_': True}}

>>> DateDifferenceTransformer.from_json(json_dump)
DateDifferenceTransformer(columns=['a', 'b'], new_column_name='bla',
                          units='common_year')

```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = DateDifferenceTransformer(columns=["a", "b"], new_column_name="a_diff_b")

>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DateDifferenceTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'a_diff_b', 'drop_original': False, 'units': 'D', 'custom_days_divider': None}, 'fit': {'is_fitted_': True}}

```

Calculate the difference between the given fields in the specified units.

Parameters:: X (DataFrame) – Data containing self.columns
Returns:: dataframe with added date difference column
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime

>>> transformer = DateDifferenceTransformer(
...     columns=["a", "b"],
...     new_column_name="a_b_difference_years",
...     units="common_year",
... )

>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.date(1993, 9, 27), datetime.date(2005, 10, 7)],
...         "b": [datetime.date(1991, 5, 22), datetime.date(2001, 12, 10)],
...     },
... )

>>> transformer.transform(test_df)
shape: (2, 3)
┌────────────┬────────────┬──────────────────────┐
│ a          ┆ b          ┆ a_b_difference_years │
│ ---        ┆ ---        ┆ ---                  │
│ date       ┆ date       ┆ f64                  │
╞════════════╪════════════╪══════════════════════╡
│ 1993-09-27 ┆ 1991-05-22 ┆ -2.353425            │
│ 2005-10-07 ┆ 2001-12-10 ┆ -3.827397            │
└────────────┴────────────┴──────────────────────┘

```
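The `common_year` figures in the example are consistent with dividing the day difference (second column minus first) by 365; for the first row that is -859 / 365 ≈ -2.353425. The sketch below reproduces both rows with the standard library. `DAYS_PER_UNIT` is a hypothetical table covering only the two units seen in this section; it does not model `custom_days_divider` or any other units tubular may accept.

```python
import datetime

# Days per unit. "D" and "common_year" (a 365-day year) reproduce the
# documented outputs; other units are deliberately not modelled here.
DAYS_PER_UNIT = {"D": 1, "common_year": 365}


def date_difference(first: datetime.date, second: datetime.date, units: str = "D") -> float:
    """Hypothetical single-row analogue: second minus first, in the given units."""
    return (second - first).days / DAYS_PER_UNIT[units]


rows = [
    (datetime.date(1993, 9, 27), datetime.date(1991, 5, 22)),
    (datetime.date(2005, 10, 7), datetime.date(2001, 12, 10)),
]
print([round(date_difference(a, b, "common_year"), 6) for a, b in rows])
# [-2.353425, -3.827397]
```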

class tubular.dates.DatetimeComponentExtractor(columns: str | list[str], include: ]], **kwargs: str | bool)[source]

Bases: BaseDatetimeTransformer

Transformer to extract numeric datetime components.

Attributes:

columns: List[str]: List of columns for processing
include: list of str: Which numeric datetime components to extract
polars_compatible: bool: Indicates whether transformer has been converted to polars/pandas agnostic framework
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes
jsonable: bool: Indicates if transformer supports to/from_json methods
FITS: bool: Indicates whether transform requires fit to be run first

Example:

```pycon
>>> transformer = DatetimeComponentExtractor(
...     columns="a",
...     include=["hour", "day"],
... )
>>> transformer
DatetimeComponentExtractor(columns=['a'], include=['hour', 'day'])

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'DatetimeComponentExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['hour', 'day']}, 'fit': {'is_fitted_': True}}

>>> DatetimeComponentExtractor.from_json(json_dump)
DatetimeComponentExtractor(columns=['a'], include=['hour', 'day'])

```

FITS = False

INCLUDE_OPTIONS: ClassVar[list[str]] = ['hour', 'day', 'month', 'year']

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: List of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = DatetimeComponentExtractor(
...     columns=["a", "b"],
...     include=["hour", "day"],
... )

>>> transformer.get_feature_names_out()
['a_hour', 'a_day', 'b_hour', 'b_day']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, Any][source]

Convert transformer to JSON format.

Returns:: JSON representation of the transformer
Return type:: dict

Examples

```pycon
>>> transformer = DatetimeComponentExtractor(
...     columns="a",
...     include=["hour", "day"],
... )

>>> transformer.to_json()
{'tubular_version': '...', 'classname': 'DatetimeComponentExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['hour', 'day']}, 'fit': {'is_fitted_': True}}

```

Transform - Extracts numeric datetime components.

Parameters:: X (DataFrame) – Data with columns to extract info from.
Returns:: X – Transformed input X with added columns of extracted information.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime

>>> transformer = DatetimeComponentExtractor(
...     columns="a",
...     include=["hour", "day"],
... )

>>> test_df = pl.DataFrame(
...     {
...         "a": [
...             datetime.datetime(1993, 9, 27, 14, 30),
...             datetime.datetime(2005, 10, 7, 9, 45),
...         ],
...         "b": [
...             datetime.datetime(1991, 5, 22, 18, 0),
...             datetime.datetime(2001, 12, 10, 23, 59),
...         ],
...     },
... )

>>> transformer.transform(test_df)
shape: (2, 4)
┌─────────────────────┬─────────────────────┬────────┬───────┐
│ a                   ┆ b                   ┆ a_hour ┆ a_day │
│ ---                 ┆ ---                 ┆ ---    ┆ ---   │
│ datetime[μs]        ┆ datetime[μs]        ┆ f32    ┆ f32   │
╞═════════════════════╪═════════════════════╪════════╪═══════╡
│ 1993-09-27 14:30:00 ┆ 1991-05-22 18:00:00 ┆ 14.0   ┆ 27.0  │
│ 2005-10-07 09:45:00 ┆ 2001-12-10 23:59:00 ┆ 9.0    ┆ 7.0   │
└─────────────────────┴─────────────────────┴────────┴───────┘

```
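On plain `datetime.datetime` values, the extraction reduces to reading attributes and naming the outputs `<column>_<component>`, in the same order get_feature_names_out reports. `extract_components` is a hypothetical helper: returning floats mirrors the f32 dtype in the example, and raising ValueError for unsupported components is an assumption about how validation behaves.

```python
import datetime

# Components listed in DatetimeComponentExtractor.INCLUDE_OPTIONS.
INCLUDE_OPTIONS = ["hour", "day", "month", "year"]


def extract_components(
    data: dict[str, list[datetime.datetime]], columns: list[str], include: list[str]
) -> dict[str, list[float]]:
    """Hypothetical analogue: one float column per (column, component) pair."""
    unsupported = [c for c in include if c not in INCLUDE_OPTIONS]
    if unsupported:
        raise ValueError(f"unsupported components: {unsupported}")
    out: dict[str, list[float]] = {}
    for col in columns:
        for component in include:
            # datetime objects expose hour, day, month and year as attributes.
            out[f"{col}_{component}"] = [float(getattr(v, component)) for v in data[col]]
    return out


data = {
    "a": [datetime.datetime(1993, 9, 27, 14, 30), datetime.datetime(2005, 10, 7, 9, 45)],
}
print(extract_components(data, ["a"], ["hour", "day"]))
# {'a_hour': [14.0, 9.0], 'a_day': [27.0, 7.0]}
```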

class tubular.dates.DatetimeInfoExtractor(columns: str | list[str], include: ]] | None = None, datetime_mappings: dict[~typing.Annotated[str, beartype.vale.Is[lambda s: ...]], dict[int, str]] | None = None, drop_original: bool | None = False, **kwargs: str | bool)[source]

Bases: BaseDatetimeTransformer

Transformer to extract various categorical features from datetime variables.

Attributes:

columns: List[str]: List of columns for processing
include: list of str, default = ["timeofday", "timeofmonth", "timeofyear", "dayofweek"]: Which datetime categorical information to extract
datetime_mappings: dict, default = None: Optional argument to define custom mappings for datetime values.
drop_original: bool: indicates whether to drop provided columns post transform
built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> transformer = DatetimeInfoExtractor(
...     columns="a",
...     include="timeofday",
... )
>>> transformer
DatetimeInfoExtractor(columns=['a'], datetime_mappings={},
                      include=['timeofday'])

>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DatetimeInfoExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['timeofday'], 'datetime_mappings': {}}, 'fit': {'is_fitted_': True}}

```

FITS = False

INCLUDE_OPTIONS = ['timeofday', 'timeofmonth', 'timeofyear', 'dayofweek']

RANGE_TO_MAP = {'dayofweek': {1, 2, 3, 4, 5, 6, 7}, 'timeofday': {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}, 'timeofmonth': {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}, 'timeofyear': {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}}
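RANGE_TO_MAP lists every value each include option can take. Since datetime_mappings is typed as a dict from option name to a `dict[int, str]`, one plausible use of RANGE_TO_MAP is to check that a custom mapping labels every value in that range. The sketch below assumes that behaviour; `check_mapping_coverage` and the half-year labels are illustrative only, not part of tubular.

```python
# Value ranges matching DatetimeInfoExtractor.RANGE_TO_MAP above.
RANGE_TO_MAP = {
    "dayofweek": set(range(1, 8)),
    "timeofday": set(range(0, 24)),
    "timeofmonth": set(range(1, 32)),
    "timeofyear": set(range(1, 13)),
}


def check_mapping_coverage(datetime_mappings: dict[str, dict[int, str]]) -> None:
    """Hypothetical check: each mapping must label exactly its option's range."""
    for option, mapping in datetime_mappings.items():
        expected = RANGE_TO_MAP[option]
        keys = set(mapping)
        missing, unexpected = expected - keys, keys - expected
        if missing or unexpected:
            raise ValueError(
                f"{option}: missing {sorted(missing)}, unexpected {sorted(unexpected)}"
            )


# Illustrative custom mapping: label months by half of the year.
halves = {"timeofyear": {m: "h1" if m <= 6 else "h2" for m in range(1, 13)}}
check_mapping_coverage(halves)  # passes silently
```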

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = DatetimeInfoExtractor(
...     columns=["a", "b"],
...     include=["timeofday", "timeofmonth"],
... )

>>> transformer.get_feature_names_out()
['a_timeofday', 'a_timeofmonth', 'b_timeofday', 'b_timeofmonth']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = DatetimeInfoExtractor(columns="a")

>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DatetimeInfoExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['timeofday', 'timeofmonth', 'timeofyear', 'dayofweek'], 'datetime_mappings': {}}, 'fit': {'is_fitted_': True}}
```

Transform - Extracts new features from datetime variables.

Parameters:

X (DataFrame) – Data with columns to extract info from.

Returns:

X (DataFrame) – Transformed input X with added columns of extracted information.

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = DatetimeInfoExtractor(
...     columns="a",
...     include="timeofmonth",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.datetime(1993, 9, 27), datetime.datetime(2005, 10, 7)],
...         "b": [datetime.datetime(1991, 5, 22), datetime.datetime(2001, 12, 10)],
...     },
... )
>>> transformer.transform(test_df)
shape: (2, 3)
┌─────────────────────┬─────────────────────┬───────────────┐
│ a                   ┆ b                   ┆ a_timeofmonth │
│ ---                 ┆ ---                 ┆ ---           │
│ datetime[μs]        ┆ datetime[μs]        ┆ enum          │
╞═════════════════════╪═════════════════════╪═══════════════╡
│ 1993-09-27 00:00:00 ┆ 1991-05-22 00:00:00 ┆ end           │
│ 2005-10-07 00:00:00 ┆ 2001-12-10 00:00:00 ┆ start         │
└─────────────────────┴─────────────────────┴───────────────┘
```

class tubular.dates.DatetimeSinusoidCalculator(columns: str | list[str], method: ]], units: ]]], period: ]]] = 6.283185307179586, drop_original: bool = False, **kwargs: bool | str)[source]

Bases: BaseDatetimeTransformer

Calculate the sine or cosine of a datetime column in a given unit (e.g. hour).

Includes the option to scale the period of the sine or cosine to match the natural period of the unit (e.g. 24 for hours).

Attributes:

columns: str or list: Columns to take the sine or cosine of.
method: str or list: The function to be calculated; either sin, cos or a list containing both.
units: str or dict: Which time unit the calculation is to be carried out on. Will take any of ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘microsecond’. Can be a string or a dict containing key-value pairs of column name and units to be used for that column.
period: str, float or dict, default = 2*np.pi: The period of the output in the units specified above. Can be a single value or a dict containing key-value pairs of column name and the period to be used for that column.
built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )
DatetimeSinusoidCalculator(columns=['a'], method=['sin'], units='month')
```

FITS = False

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )

>>> transformer.get_feature_names_out()
['sin_6.283185307179586_month_a']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DatetimeSinusoidCalculator', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'method': ['sin'], 'units': 'month', 'period': 6.283185307179586}, 'fit': {'is_fitted_': True}}
```

Transform - creates column containing sine or cosine of another datetime column.

Which function is used is stored in the self.method attribute.

Parameters:

X (pd/pl/nw.DataFrame) – Data to transform.
return_native_override (Optional[bool]) – Option to override return_native attr in transformer, useful when calling parent methods

Returns:

X (pd/pl/nw.DataFrame) – Input X with additional columns added, these are named "<method>_<period>_<units>_<original_column>" (e.g. sin_6.283185307179586_month_a)

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.datetime(1993, 9, 27), datetime.datetime(2005, 10, 7)],
...         "b": [datetime.datetime(1991, 5, 22), datetime.datetime(2001, 12, 10)],
...     },
... )
>>> transformer.transform(test_df)
shape: (2, 3)
┌─────────────────────┬─────────────────────┬───────────────────────────────┐
│ a                   ┆ b                   ┆ sin_6.283185307179586_month_a │
│ ---                 ┆ ---                 ┆ ---                           │
│ datetime[μs]        ┆ datetime[μs]        ┆ f64                           │
╞═════════════════════╪═════════════════════╪═══════════════════════════════╡
│ 1993-09-27 00:00:00 ┆ 1991-05-22 00:00:00 ┆ 0.412118                      │
│ 2005-10-07 00:00:00 ┆ 2001-12-10 00:00:00 ┆ -0.544021                     │
└─────────────────────┴─────────────────────┴───────────────────────────────┘
```
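The example values are consistent with computing `sin(2 * pi * value / period)` on the extracted unit: with the default period of 2π this reduces to `sin(month)`, giving sin(9) ≈ 0.412118 and sin(10) ≈ -0.544021. A minimal standard-library sketch, assuming that formula; `sinusoid` and `output_name` are hypothetical helpers, and the naming pattern is read off the example column.

```python
import datetime
import math


def sinusoid(value: float, method: str = "sin", period: float = 2 * math.pi) -> float:
    """Apply sin or cos so that one full cycle spans `period` units of `value`."""
    func = {"sin": math.sin, "cos": math.cos}[method]
    return func(2 * math.pi * value / period)


def output_name(method: str, period: float, units: str, column: str) -> str:
    """Column-name pattern read off the documented example output."""
    return f"{method}_{period}_{units}_{column}"


months = [datetime.datetime(1993, 9, 27).month, datetime.datetime(2005, 10, 7).month]
print(output_name("sin", 2 * math.pi, "month", "a"))  # sin_6.283185307179586_month_a
print([round(sinusoid(m), 6) for m in months])  # [0.412118, -0.544021]
```

Choosing `period=12` for months (or 24 for hours) makes the encoding complete exactly one cycle per year (or day), which is the usual reason to encode cyclical time features this way.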

class tubular.dates.SeriesDtMethodTransformer(**kwargs)[source]

Bases: BaseDatetimeTransformer

Transformer that applies a pandas.Series.dt method.

The transformer assigns the output of the method to a new column. It is possible to supply other keyword arguments (via pd_method_kwargs), which will be passed to the pandas.Series.dt method being called.

Be aware that it is possible to supply incompatible arguments to init that will only be identified when transform is run. This is because there are many combinations of method, input and output sizes. Additionally, some methods may only work as expected when called in transform with specific keyword arguments.

column

Name of column to apply transformer to. This attribute is not for use in any method, use ‘columns’ instead. Here only as a fix to allow string representation of transformer.

Type:: str

columns

Column name for transformation.

Type:: str

new_column_name

The name of the column or columns to be assigned to the output of running the pandas method in transform.

Type:: str

pd_method_name

The name of the pandas.Series.dt method to call.

Type:: str

pd_method_kwargs

Dictionary of keyword arguments to call the pd.Series.dt method with.

Type:: dict

drop_original

Indicates whether to drop self.column post transform

Type:: bool

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Transform specific column on input pandas.DataFrame (X) using the given pandas.Series.dt method.

Any keyword arguments set in the pd_method_kwargs attribute are passed onto the pd.Series.dt method when calling it.

Parameters:: X (pd.DataFrame) – Data to transform.
Returns:: X – Input X with additional column (self.new_column_name) added. These contain the output of running the pd.Series.dt method.
Return type:: pd.DataFrame

class tubular.dates.ToDatetimeTransformer(columns: str | list[str], time_format: str | None = None, **kwargs: bool)[source]

Bases: BaseTransformer

Class to convert specified columns to datetime.

Class simply uses the pd.to_datetime method on the specified columns.
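The effect of time_format can be illustrated with the standard library. The sketch below is a hypothetical pure-Python equivalent (the function name and row-of-dicts layout are assumptions); the transformer itself operates on pandas/polars frames. It parses the strings used in the transform example with `datetime.strptime`, showing that "%d/%m/%Y" reads "01/02/2020" as 1 February 2020 rather than 2 January.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def to_datetime_columns(
    rows: list[dict[str, Any]], columns: list[str], time_format: str
) -> list[dict[str, Any]]:
    """Hypothetical helper: parse string values in `columns` using `time_format`."""
    out = []
    for row in rows:
        new_row = dict(row)  # other columns pass through unchanged
        for col in columns:
            new_row[col] = datetime.strptime(row[col], time_format)
        out.append(new_row)
    return out


rows = [{"a": "01/02/2020", "b": 1}, {"a": "10/12/1996", "b": 2}]
for row in to_datetime_columns(rows, ["a"], "%d/%m/%Y"):
    print(row["a"], row["b"])
# 2020-02-01 00:00:00 1
# 1996-12-10 00:00:00 2
```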

Attributes:

built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> transformer = ToDatetimeTransformer(
...     columns="a",
...     time_format="%d/%m/%Y",
... )
>>> transformer
ToDatetimeTransformer(columns=['a'], time_format='%d/%m/%Y')

>>> # version will vary for local vs CI, so use ... as generic match
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'ToDatetimeTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'time_format': '%d/%m/%Y'}, 'fit': {'is_fitted_': True}}

```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = ToDatetimeTransformer(columns="a", time_format="%d/%m/%Y")

>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'ToDatetimeTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'time_format': '%d/%m/%Y'}, 'fit': {'is_fitted_': True}}

```

Convert specified column to datetime using pd.to_datetime.

Parameters:: X (DataFrame) – Data with column to transform.
Returns:: dataframe with provided columns converted to datetime
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = ToDatetimeTransformer(
...     columns="a",
...     time_format="%d/%m/%Y",
... )

>>> test_df = pl.DataFrame({"a": ["01/02/2020", "10/12/1996"], "b": [1, 2]})

>>> transformer.transform(test_df)
shape: (2, 2)
┌─────────────────────┬─────┐
│ a                   ┆ b   │
│ ---                 ┆ --- │
│ datetime[μs]        ┆ i64 │
╞═════════════════════╪═════╡
│ 2020-02-01 00:00:00 ┆ 1   │
│ 1996-12-10 00:00:00 ┆ 2   │
└─────────────────────┴─────┘

```

tubular.imputers module

Contains transformers that deal with imputation of missing values.

class tubular.imputers.ArbitraryImputer

Bases: BaseImputer

Transformer to impute null values with an arbitrary pre-defined value.

impute_value

Value to impute nulls with.

Type:: int or float or str or bool

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> arbitrary_imputer = ArbitraryImputer(columns=["a", "b"], impute_value=5)
>>> arbitrary_imputer
ArbitraryImputer(columns=['a', 'b'], impute_value=5)

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = arbitrary_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'ArbitraryImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'impute_value': 5}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 5, 'b': 5}}}

>>> ArbitraryImputer.from_json(json_dump)
ArbitraryImputer(columns=['a', 'b'], impute_value=5)

```
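The shape of the dictionary returned by to_json (a version tag, the class name, then separate "init" and "fit" levels) suggests how from_json can rebuild an object without refitting: re-run init with the stored arguments, then restore the fitted attributes. The following is a hypothetical sketch of that idea only; the class, its attributes and the reconstruction logic are assumptions rather than tubular's code, and the version string is a placeholder.

```python
from __future__ import annotations

from typing import Any


class JsonableImputerSketch:
    """Hypothetical object mimicking the init/fit split seen in to_json output."""

    def __init__(self, columns: list[str], impute_value: Any) -> None:
        self.columns = columns
        self.impute_value = impute_value
        # Populated at init, mirroring 'impute_values_' in the example dump
        self.impute_values_ = {col: impute_value for col in columns}
        self.built_from_json = False

    def to_json(self) -> dict[str, Any]:
        return {
            "tubular_version": "placeholder",  # real dumps record the installed version
            "classname": type(self).__name__,
            "init": {"columns": self.columns, "impute_value": self.impute_value},
            "fit": {"impute_values_": self.impute_values_},
        }

    @classmethod
    def from_json(cls, json_dict: dict[str, Any]) -> JsonableImputerSketch:
        obj = cls(**json_dict["init"])  # re-run init with the stored arguments
        for attr, value in json_dict["fit"].items():
            setattr(obj, attr, value)  # restore fitted attributes without refitting
        obj.built_from_json = True  # the documented flag that limits use to .transform
        return obj


dump = JsonableImputerSketch(["a", "b"], 5).to_json()
restored = JsonableImputerSketch.from_json(dump)
print(restored.impute_values_, restored.built_from_json)  # {'a': 5, 'b': 5} True
```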

FITS = False

jsonable = True

lazyframe_compatible = True

polars_compatible = True

Impute missing values with the supplied impute_value.

Parameters:

X (DataFrame) – Data containing columns to impute.

Returns:

X (DataFrame) – Transformed input X with nulls imputed with the specified impute_value, for the specified columns.

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = ArbitraryImputer(columns=["a", "b"], impute_value=5)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 5   ┆ 5   │
│ 2   ┆ 4   │
└─────┴─────┘
```
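Conceptually, transform replaces each null in the listed columns with the value stored for that column, leaving other values and columns alone. A minimal pure-Python sketch, assuming a column-oriented dict of lists with None as the null marker (not how tubular stores data); the helper name is hypothetical.

```python
from __future__ import annotations

from typing import Any


def impute_nulls(
    data: dict[str, list[Any]], impute_values: dict[str, Any]
) -> dict[str, list[Any]]:
    """Hypothetical helper: fill None in each column listed in impute_values."""
    out = dict(data)  # columns not being imputed are passed through unchanged
    for col, fill in impute_values.items():
        out[col] = [fill if v is None else v for v in data[col]]
    return out


test_data = {"a": [1, None, 2], "b": [3, None, 4]}
# An ArbitraryImputer-style setup: the same impute_value for every column
print(impute_nulls(test_data, {col: 5 for col in ["a", "b"]}))
# {'a': [1, 5, 2], 'b': [3, 5, 4]}
```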

class tubular.imputers.BaseImputer(columns: ]] | str, copy: bool = False, verbose: bool = False, return_native: bool = True)[source]

Bases: BaseTransformer

Contains a transform method that fills nulls with values from self.impute_values_.

Other imputers in this module should inherit from this class.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> BaseImputer(columns=["a", "b"])
BaseImputer(columns=['a', 'b'])

```

FITS = False

jsonable = False

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]
Raises:: RuntimeError: – if class is not jsonable

Examples

```pycon
>>> import polars as pl
>>> arbitrary_imputer = ArbitraryImputer(columns=["a", "b"], impute_value=1)

>>> # version will vary for local vs CI, so use ... as generic match
>>> arbitrary_imputer.to_json()
{'tubular_version': ..., 'classname': 'ArbitraryImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'impute_value': 1}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 1, 'b': 1}}}

>>> mean_imputer = MeanImputer(columns=["a", "b"])

>>> test_df = pl.DataFrame({"a": [1, None], "b": [None, 2]})

>>> _ = mean_imputer.fit(test_df)

>>> mean_imputer.to_json()
{'tubular_version': ..., 'classname': 'MeanImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 1.0, 'b': 2.0}}}

```

Impute missing values with values calculated from fit method.

Parameters:

X (DataFrame) – Data to impute.
return_native_override (Optional[bool]) – option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Transformed input X with nulls imputed with the values in self.impute_values_ for the specified columns.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl

>>> imputer = BaseImputer(columns=["a", "b"])

>>> imputer.impute_values_ = {"a": 2, "b": 3.5}

>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})

>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1   ┆ 3.0 │
│ 2   ┆ 3.5 │
│ 2   ┆ 4.0 │
└─────┴─────┘

```

class tubular.imputers.MeanImputer(columns: str | list[str], weights_column: str | None = None, **kwargs: bool)[source]

Bases: WeightColumnMixin, BaseImputer

Transformer to impute missing values with the mean of the supplied columns.

impute_values_

Created during fit method. Dictionary of float / int (mean) values of columns in the columns attribute. Keys of impute_values_ give the column names.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> import polars as pl
>>> mean_imputer = MeanImputer(
...     columns=["a", "b"],
... )
>>> mean_imputer
MeanImputer(columns=['a', 'b'])

>>> # once fit, transformer can also be dumped to json and reinitialised

>>> test_df = pl.DataFrame({"a": [0, None], "b": [None, 1]})

>>> _ = mean_imputer.fit(test_df)

>>> json_dump = mean_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MeanImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 0.0, 'b': 1.0}}}

>>> MeanImputer.from_json(json_dump)
MeanImputer(columns=['a', 'b'])

```

FITS = True

Calculate mean values to impute with from X.

Parameters:

X (DataFrame) – Data to “learn” the mean values from.
y (Series or LazyFrame or None, default = None) – Not required.

Returns:

fitted class instance.

Return type:

MeanImputer

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = MeanImputer(columns=["a", "b"])
>>> imputer = imputer.fit(test_df)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 3.0 │
│ 1.5 ┆ 3.5 │
│ 2.0 ┆ 4.0 │
└─────┴─────┘

```
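fit learns one mean per column from the non-null values. The sketch below is a hypothetical pure-Python version (helper name and dict-of-lists layout assumed). It also shows one plausible reading of weights_column, a weighted mean sum(w·x)/sum(w) over the non-null rows; tubular's exact weighting behaviour is an assumption here and should be checked against its source.

```python
from __future__ import annotations

from typing import Any


def fit_means(
    data: dict[str, list[Any]],
    columns: list[str],
    weights: list[float] | None = None,
) -> dict[str, float]:
    """Hypothetical helper: (optionally weighted) mean of non-null values per column."""
    impute_values = {}
    for col in columns:
        values = data[col]
        w = weights if weights is not None else [1.0] * len(values)
        pairs = [(x, wt) for x, wt in zip(values, w) if x is not None]  # drop nulls
        total_weight = sum(wt for _, wt in pairs)
        impute_values[col] = sum(x * wt for x, wt in pairs) / total_weight
    return impute_values


test_data = {"a": [1, None, 2], "b": [3, None, 4]}
print(fit_means(test_data, ["a", "b"]))  # {'a': 1.5, 'b': 3.5}
print(fit_means(test_data, ["a"], weights=[3.0, 1.0, 1.0]))  # {'a': 1.25}
```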

jsonable = True

lazyframe_compatible = True

polars_compatible = True

class tubular.imputers.MedianImputer(columns: str | list[str], weights_column: str | None = None, **kwargs: bool)[source]

Bases: BaseImputer, WeightColumnMixin

Transformer to impute missing values with the median of the supplied columns.

impute_values_

Created during fit method. Dictionary of float / int (median) values of columns in the columns attribute. Keys of impute_values_ give the column names.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> import polars as pl
>>> median_imputer = MedianImputer(
...     columns=["a", "b"],
... )
>>> median_imputer
MedianImputer(columns=['a', 'b'])

>>> # once fit, transformer can also be dumped to json and reinitialised

>>> test_df = pl.DataFrame({"a": [0, None], "b": [None, 1]})

>>> _ = median_imputer.fit(test_df)

>>> json_dump = median_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MedianImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 0.0, 'b': 1.0}}}

>>> MedianImputer.from_json(json_dump)
MedianImputer(columns=['a', 'b'])

```

FITS = True

Calculate median values to impute with from X.

Parameters:

X (DataFrame) – Data to “learn” the median values from.
y (Series or LazyFrame or None, default = None) – Not required.

Returns:

fitted class instance.

Return type:

MedianImputer

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = MedianImputer(columns=["a", "b"])
>>> imputer = imputer.fit(test_df)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 3.0 │
│ 1.5 ┆ 3.5 │
│ 2.0 ┆ 4.0 │
└─────┴─────┘

```
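The median counterpart of fit, again as a hypothetical pure-Python sketch that ignores nulls (helper name and data layout assumed). Only the unweighted case is shown; how weights_column changes the median is not covered here. With an even number of non-null values, `statistics.median` averages the two middle ones, which is consistent with the 1.5 and 3.5 learned in the example.

```python
from __future__ import annotations

import statistics
from typing import Any


def fit_medians(data: dict[str, list[Any]], columns: list[str]) -> dict[str, float]:
    """Hypothetical helper: median of the non-null values in each column."""
    return {
        col: statistics.median(v for v in data[col] if v is not None)
        for col in columns
    }


test_data = {"a": [1, None, 2], "b": [3, None, 4]}
print(fit_medians(test_data, ["a", "b"]))  # {'a': 1.5, 'b': 3.5}
```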

jsonable = True

lazyframe_compatible = True

polars_compatible = True

class tubular.imputers.ModeImputer(columns: str | list[str], weights_column: str | None = None, **kwargs: bool)[source]

Bases: BaseImputer, WeightColumnMixin

Transformer to impute missing values with the mode of the supplied columns.

If mode is NaN, a warning will be raised.

impute_values_

Created during fit method. Dictionary of float / int (mode) values of columns in the columns attribute. Keys of impute_values_ give the column names.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> import polars as pl
>>> mode_imputer = ModeImputer(
...     columns=["a", "b"],
... )
>>> mode_imputer
ModeImputer(columns=['a', 'b'])

>>> # once fit, transformer can also be dumped to json and reinitialised

>>> test_df = pl.DataFrame({"a": [0, None], "b": [None, 1]})

>>> _ = mode_imputer.fit(test_df)

>>> json_dump = mode_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'ModeImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 0, 'b': 1}}}

>>> ModeImputer.from_json(json_dump)
ModeImputer(columns=['a', 'b'])

```

FITS = True

Calculate mode values to impute with from X.

In the event of a tie, the highest modal value will be returned.

Parameters:

X (DataFrame) – Data to “learn” the mode values from.
y (Series or LazyFrame or None, default = None) – Not required.

Returns:

fitted class instance

Return type:

ModeImputer

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = ModeImputer(columns=["a", "b"])
>>> imputer = imputer.fit(test_df)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 2   ┆ 4   │
│ 2   ┆ 4   │
└─────┴─────┘

```
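The tie-breaking rule (the highest modal value wins) and the warning for an undefined mode can be sketched with `collections.Counter`. This is a hypothetical pure-Python version; the warning text and the use of None as the learned value for an all-null column are assumptions.

```python
from __future__ import annotations

import warnings
from collections import Counter
from typing import Any


def fit_modes(data: dict[str, list[Any]], columns: list[str]) -> dict[str, Any]:
    """Hypothetical helper: mode of non-null values per column, ties -> highest value."""
    impute_values: dict[str, Any] = {}
    for col in columns:
        counts = Counter(v for v in data[col] if v is not None)
        if not counts:
            # Nothing to learn from: no mode is defined for this column
            warnings.warn(f"{col}: mode is null", stacklevel=2)
            impute_values[col] = None
            continue
        top = max(counts.values())
        # Among all values sharing the top count, keep the largest
        impute_values[col] = max(v for v, c in counts.items() if c == top)
    return impute_values


test_data = {"a": [1, None, 2], "b": [3, None, 4], "c": ["x", "y", "y"]}
print(fit_modes(test_data, ["a", "b", "c"]))  # {'a': 2, 'b': 4, 'c': 'y'}
```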

jsonable = True

lazyframe_compatible = True

polars_compatible = True

class tubular.imputers.NearestMeanResponseImputer(**kwargs)[source]

Bases: BaseImputer

Impute nulls with the value where the average target is most similar to that for the nulls.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = True

deprecated = True

Calculate mean values to impute with.

Parameters:

X (FrameT) – Data to fit the transformer on.
y (nw.Series) – Response column used to determine the value to impute with. The average response for each level of every column is calculated. The level which has the closest average response to the average response of the unknown levels is selected as the imputation value.

Returns:

fitted class instance

Return type:

NearestMeanResponseImputer

Raises:

ValueError – if provided y contains nulls.
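The selection rule described for y can be written out directly. The following is a hypothetical pure-Python sketch for a single column (names and data layout assumed). It averages the response for each non-null level and for the null rows, then returns the level whose average is closest. It rejects a y containing nulls, matching the documented ValueError. Raising when the column has no nulls is a choice made only for this sketch.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def nearest_mean_response(x: list[Any], y: list[float]) -> Any:
    """Hypothetical helper: level of x whose mean y is closest to the mean y of null rows."""
    if any(v is None for v in y):
        raise ValueError("y contains nulls")
    null_responses = [r for level, r in zip(x, y) if level is None]
    if not null_responses:
        # Sketch-only choice: with no nulls there is nothing to match against
        raise ValueError("x has no nulls")
    null_mean = sum(null_responses) / len(null_responses)

    sums: dict[Any, float] = defaultdict(float)
    counts: dict[Any, int] = defaultdict(int)
    for level, response in zip(x, y):
        if level is not None:
            sums[level] += response
            counts[level] += 1

    # Level whose average response is nearest the null rows' average response
    return min(counts, key=lambda level: abs(sums[level] / counts[level] - null_mean))


x = [1, 1, 2, 2, None, None]
y = [1.0, 3.0, 5.0, 7.0, 6.0, 6.0]
print(nearest_mean_response(x, y))  # 2
```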

jsonable = False

lazyframe_compatible = False

polars_compatible = True

class tubular.imputers.NullIndicator(columns: ]] | str, **kwargs: bool | None)[source]

Bases: BaseTransformer

Class to create a binary indicator column for null values.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> null_indicator = NullIndicator(
...     columns=["a", "b"],
... )
>>> null_indicator
NullIndicator(columns=['a', 'b'])

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = null_indicator.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'NullIndicator', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True}, 'fit': {'is_fitted_': True}}

>>> NullIndicator.from_json(json_dump)
NullIndicator(columns=['a', 'b'])

```

FITS = False

jsonable = True

lazyframe_compatible = True

polars_compatible = True

Create new columns indicating the position of null values for each variable in self.columns.

Parameters:: X (DataFrame) – Data to add indicators to.
Returns:: dataframe with null indicator columns added
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = NullIndicator(columns=["a", "b"])
>>> imputer.transform(test_df)
shape: (3, 4)
┌──────┬──────┬─────────┬─────────┐
│ a    ┆ b    ┆ a_nulls ┆ b_nulls │
│ ---  ┆ ---  ┆ ---     ┆ ---     │
│ i64  ┆ i64  ┆ bool    ┆ bool    │
╞══════╪══════╪═════════╪═════════╡
│ 1    ┆ 3    ┆ false   ┆ false   │
│ null ┆ null ┆ true    ┆ true    │
│ 2    ┆ 4    ┆ false   ┆ false   │
└──────┴──────┴─────────┴─────────┘

```
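The indicator logic amounts to one boolean per row and column, named with the `_nulls` suffix seen in the example output. A hypothetical pure-Python sketch, assuming a dict-of-lists layout with None as null:

```python
from __future__ import annotations

from typing import Any


def add_null_indicators(
    data: dict[str, list[Any]], columns: list[str]
) -> dict[str, list[Any]]:
    """Hypothetical helper: add a '<col>_nulls' boolean column for each listed column."""
    out = dict(data)  # original columns are kept as they are
    for col in columns:
        out[f"{col}_nulls"] = [v is None for v in data[col]]
    return out


result = add_null_indicators({"a": [1, None, 2], "b": [3, None, 4]}, ["a", "b"])
print(result["a_nulls"], result["b_nulls"])  # [False, True, False] [False, True, False]
```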

tubular.mapping module

Contains transformers that apply different types of mappings to columns.

class tubular.mapping.BaseCrossColumnMappingTransformer(**kwargs)[source]

Bases: BaseMappingTransformer

Extension of BaseMappingTransformer for cross-column mapping transformers.

adjust_column

Column containing the values to be adjusted.

Type:: str

mappings

Dictionary of mappings for each column individually to be applied to the adjust_column. The dict passed to mappings in init is set to the mappings attribute.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Check that X is valid for transform and call the parent transform.

Parameters:: X (pd.DataFrame) – Data to apply adjustments to.
Returns:: X – Transformed data X with adjustments applied to specified columns.
Return type:: pd.DataFrame
Raises:: ValueError: – if provided adjust_column is not in DataFrame.

class tubular.mapping.BaseCrossColumnNumericTransformer(**kwargs)[source]

Bases: BaseCrossColumnMappingTransformer

Extension of BaseCrossColumnMappingTransformer for cross-column numerical mapping transformers.

adjust_column

Column containing the values to be adjusted.

Type:: str

mappings

Dictionary of mappings for each column individually to be applied to the adjust_column. The dict passed to mappings in init is set to the mappings attribute.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Check that X is valid for transform and call the parent transform.

Parameters:: X (pd.DataFrame) – Data to apply adjustments to.
Returns:: X – Transformed data X with adjustments applied to specified columns.
Return type:: pd.DataFrame
Raises:: TypeError: – if provided columns are non-numeric

class tubular.mapping.BaseMappingTransformMixin(columns: ]] | str, copy: bool = False, verbose: bool = False, return_native: bool = True)[source]

Bases: BaseTransformer

Mixin class providing a method to apply mappings to columns.

Transformer uses the mappings attribute which should be a dict of dicts/mappings for each required column.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

FITS = False

jsonable = False

lazyframe_compatible = True

polars_compatible = True

Apply mapping defined in the mappings dict to each column in the columns attribute.

Parameters:

X (DataFrame) – Data with nominal columns to transform.
return_native_override (Optional[bool]) – option to override return_native attr in transformer, useful when calling parent methods

Returns:

X (DataFrame) – Transformed input X with levels mapped according to the mappings dict.

No doctest is currently included for this method, as it is not intended to be used independently; it should be inherited as a mixin.
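The core mapping step can be sketched as a per-column dictionary lookup in which values without an entry are left unchanged, which is why MappingTransformer does not need self-mappings. This hypothetical pure-Python helper is an illustration only: it reuses the example mappings given for MappingTransformer's mappings parameter and does not handle return_dtypes or mappings_from_null.

```python
from __future__ import annotations

from typing import Any


def apply_mappings(
    data: dict[str, list[Any]], mappings: dict[str, dict[Any, Any]]
) -> dict[str, list[Any]]:
    """Hypothetical helper: map values column by column, keeping unmapped values."""
    out = dict(data)
    for col, mapping in mappings.items():
        # dict.get(v, v) falls back to the original value when v has no mapping
        out[col] = [mapping.get(v, v) for v in data[col]]
    return out


mappings = {"a": {1: 2, 3: 4}, "b": {"a": 1, "b": 2}}
data = {"a": [1, 3, 5], "b": ["a", "b", "c"]}
print(apply_mappings(data, mappings))  # {'a': [2, 4, 5], 'b': [1, 2, 'c']}
```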

class tubular.mapping.BaseMappingTransformer(mappings: dict[str, dict[Any, Any]], return_dtypes: dict[str, RETURN_DTYPES] | None = None, **kwargs: bool | None)[source]

Bases: BaseTransformer

Extension of BaseTransformer for mapping transformers.

mappings

Dictionary of mappings for each column individually. The dict passed to mappings in init is set to the mappings attribute.

Type:: dict

mappings_from_null

dict storing the values that nulls will be mapped to. Using an imputer is generally preferable, but this functionality is useful for inverting pipelines.

Type:: dict[str, Any]

return_dtypes

Dictionary of col:dtype for returned columns

Type:: dict[str, RETURN_DTYPES]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> BaseMappingTransformer(
...     mappings={"a": {"Y": 1, "N": 0}},
...     return_dtypes={"a": "Int8"},
... )
BaseMappingTransformer(mappings={'a': {'N': 0, 'Y': 1}},
                       return_dtypes={'a': 'Int8'})

```

FITS = False

RETURN_DTYPES: alias of Literal[‘String’, ‘Object’, ‘Categorical’, ‘Boolean’, ‘Int8’, ‘Int16’, ‘Int32’, ‘Int64’, ‘Float32’, ‘Float64’]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> mapping_transformer = BaseMappingTransformer(mappings={"a": {"x": 1}})

>>> mapping_transformer.to_json()
{'tubular_version': ..., 'classname': 'BaseMappingTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'mappings': {'a': {'x': 1}}, 'return_dtypes': {'a': 'Int64'}}, 'fit': {'is_fitted_': True}}

```

Check that the mappings dict has been fitted.

Parameters:

X (DataFrame) – Data to apply mappings to.
return_native_override (Optional[bool]) – option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Input X, copied if specified by user.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = BaseMappingTransformer(
...     mappings={"a": {"Y": 1, "N": 0}},
...     return_dtypes={"a": "Int8"},
... )

>>> test_df = pl.DataFrame({"a": ["Y", "N"], "b": [3, 4]})

>>> # base class transform has no effect on data
>>> transformer.transform(test_df)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ Y   ┆ 3   │
│ N   ┆ 4   │
└─────┴─────┘

```

class tubular.mapping.CrossColumnAddTransformer(**kwargs)[source]

Bases: BaseCrossColumnNumericTransformer

Transformer to apply an additive adjustment to values in one column based on the values of another column.

adjust_column

Column containing the values to be adjusted.

Type:: str

mappings

Dictionary of additive adjustments for each column individually to be applied to the adjust_column. The dict passed to mappings in init is set to the mappings attribute.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Transform values in given column using the values provided in the adjustments dictionary.

Parameters:: X (pd.DataFrame) – Data to apply adjustments to.
Returns:: X – Transformed data X with adjustments applied to specified columns.
Return type:: pd.DataFrame

class tubular.mapping.CrossColumnMappingTransformer(**kwargs)[source]

Bases: BaseCrossColumnMappingTransformer

Transformer to adjust values in one column based on the values of another column.

adjust_column

Column containing the values to be adjusted.

Type:: str

mappings

Dictionary of mappings for each column individually to be applied to the adjust_column. The dict passed to mappings in init is set to the mappings attribute.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Transform values in the adjust_column using the values provided in the mappings dictionary.

Parameters:: X (pd.DataFrame) – Data to apply adjustments to.
Returns:: X – Transformed data X with adjustments applied to specified columns.
Return type:: pd.DataFrame
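A hypothetical pure-Python sketch of the cross-column idea. It assumes that, for each column in mappings, rows whose value in that column has an entry get adjust_column replaced by the mapped value, and other rows keep their adjust_column value. The order in which several mapping columns are applied (later ones overwrite earlier ones) is also an assumption of this sketch.

```python
from __future__ import annotations

from typing import Any


def cross_column_map(
    rows: list[dict[str, Any]],
    adjust_column: str,
    mappings: dict[str, dict[Any, Any]],
) -> list[dict[str, Any]]:
    """Hypothetical helper: overwrite adjust_column based on values in other columns."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col, mapping in mappings.items():
            # Rows whose value in `col` has no entry keep their adjust_column value
            if row[col] in mapping:
                new_row[adjust_column] = mapping[row[col]]
        out.append(new_row)
    return out


rows = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}, {"a": 3, "b": "z"}]
print(cross_column_map(rows, "b", {"a": {1: "p", 2: "q"}}))
# [{'a': 1, 'b': 'p'}, {'a': 2, 'b': 'q'}, {'a': 3, 'b': 'z'}]
```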

class tubular.mapping.CrossColumnMultiplyTransformer(**kwargs)[source]

Bases: BaseCrossColumnNumericTransformer

Transformer to apply a multiplicative adjustment to values in one column based on the values of another column.

adjust_column

Column containing the values to be adjusted.

Type:: str

mappings

Dictionary of multiplicative adjustments for each column individually to be applied to the adjust_column. The dict passed to mappings in init is set to the mappings attribute.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Transform values in the adjust_column by multiplying them by the factors provided in the mappings dictionary.

Parameters:: X (pd.DataFrame) – Data to apply adjustments to.
Returns:: X – Transformed data X with adjustments applied to specified columns.
Return type:: pd.DataFrame

class tubular.mapping.MappingTransformer(mappings: dict[str, dict[Any, Any]], return_dtypes: dict[str, RETURN_DTYPES] | None = None, **kwargs: bool | None)[source]

Bases: BaseMappingTransformer, BaseMappingTransformMixin

Transformer to map values in columns to other values e.g. to merge two levels into one.

Note that the MappingTransformer does not require ‘self-mappings’ to be defined: if a value should map to itself, you can simply omit it from the mappings.

This transformer inherits from both BaseMappingTransformer and BaseMappingTransformMixin: BaseMappingTransformer performs the standard checks, while BaseMappingTransformMixin handles the actual mapping logic.

Parameters:

mappings (dict) – Dictionary containing column mappings. Each key is a column to apply a mapping to, and each value is the mapping dict for that column. For example, the dict {‘a’: {1: 2, 3: 4}, ‘b’: {‘a’: 1, ‘b’: 2}} specifies a mapping for column a of 1->2, 3->4 and a mapping for column b of ‘a’->1, ‘b’->2.
return_dtypes (Optional[Dict[str, RETURN_DTYPES]]) – Dictionary of col:dtype for returned columns
**kwargs – Arbitrary keyword arguments passed on to the BaseMappingTransformer.__init__ method.

mappings

Dictionary of mappings for each column individually. The dict passed to mappings in init is set to the mappings attribute.

Type:: dict

mappings_from_null

Dict storing what null values will be mapped to, per column. An imputer is generally the better tool, but this functionality is useful for inverting pipelines.

Type:: dict[str, Any]

return_dtypes

Dictionary of col:dtype for returned columns

Type:: dict[str, RETURN_DTYPES]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> transformer = MappingTransformer(
...     mappings={"a": {"Y": 1, "N": 0}},
...     return_dtypes={"a": "Int8"},
... )
>>> transformer
MappingTransformer(mappings={'a': {'N': 0, 'Y': 1}},
                   return_dtypes={'a': 'Int8'})

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MappingTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'mappings': {'a': {'Y': 1, 'N': 0}}, 'return_dtypes': {'a': 'Int8'}}, 'fit': {'is_fitted_': True}}

>>> MappingTransformer.from_json(json_dump)
MappingTransformer(mappings={'a': {'N': 0, 'Y': 1}},
                   return_dtypes={'a': 'Int8'})

```

FITS = False

jsonable = True

lazyframe_compatible = True

polars_compatible = True

Transform the input data X according to the mappings in the mappings attribute dict.

This method calls BaseMappingTransformMixin.transform. Note that it behaves differently from some of the transform methods in the nominal module, even though they also use BaseMappingTransformMixin.transform: here, a value that does not exist in the mapping is left unchanged.

Parameters:: X (DataFrame) – Data with nominal columns to transform.
Returns:: X – Transformed input X with levels mapped according to mappings dict.
Return type:: DataFrame
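The pass-through behaviour for unmapped values, combined with the mappings format described for this class, can be sketched in plain Python. `apply_mappings` and the dict-of-lists frame are hypothetical stand-ins, not part of tubular. The sketch ignores return_dtypes and null handling.

```python
def apply_mappings(data, mappings):
    """Map values column by column; values absent from a mapping pass through."""
    out = dict(data)
    for column, mapping in mappings.items():
        # dict.get(v, v) falls back to the original value, so no self-mappings are needed
        out[column] = [mapping.get(v, v) for v in data[column]]
    return out


df = {"a": [1, 3, 5], "b": ["a", "b", "c"]}
mapped = apply_mappings(df, {"a": {1: 2, 3: 4}, "b": {"a": 1, "b": 2}})
print(mapped)  # {'a': [2, 4, 5], 'b': [1, 2, 'c']}
```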

Examples

```pycon
>>> import polars as pl

>>> transformer = MappingTransformer(
...     mappings={"a": {"Y": 1, "N": 0}},
...     return_dtypes={"a": "Int8"},
... )

>>> test_df=pl.DataFrame({'a': ["Y", "N"], 'b': [3,4]})

>>> transformer.transform(test_df)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i8  ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 0   ┆ 4   │
└─────┴─────┘

```

tubular.misc module

Contains legacy transformers for introducing fixed columns and changing dtypes.

class tubular.misc.ColumnDtypeSetter(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], dtype: ]], **kwargs: bool)[source]

Bases: BaseTransformer

Transformer to cast columns in a dataframe to a given dtype.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = ColumnDtypeSetter(columns="a", dtype="Float32")
>>> pprint(transformer.to_json(), sort_dicts=True)
{'classname': 'ColumnDtypeSetter',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a'],
          'copy': False,
          'dtype': 'Float32',
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
```

Transform data.

Parameters:: X (DataFrame) – data to transform.
Returns:: transformed data
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2]})
>>> transformer = ColumnDtypeSetter(columns="a", dtype="Float32")
>>> transformer.transform(df)
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ f32 │
╞═════╡
│ 1.0 │
│ 2.0 │
└─────┘
```

class tubular.misc.RenameColumnsTransformer(columns: ]] | str, new_column_names: dict[str, str], drop_original: bool = True, **kwargs: bool)[source]

Bases: BaseTransformer, DropOriginalMixin

Transformer to rename a given set of columns.

This can be useful for personalising the auto-output names from other transformers, or for creating a few different versions of a given column to undergo separate paths of logic in a pipeline (as the expression logic effectively creates duplicates of the column).
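The rename logic, including the drop_original switch and the clash check listed under Raises for transform, can be sketched in plain Python. `rename_columns` and the dict-of-lists frame are hypothetical, and tubular works on narwhals expressions rather than Python lists.

```python
def rename_columns(data, new_column_names, drop_original=True):
    """Copy each old column under its new name, optionally dropping the original."""
    clashes = [new for new in new_column_names.values() if new in data]
    if clashes:
        raise ValueError(f"new column names already present in data: {clashes}")
    out = dict(data)
    for old, new in new_column_names.items():
        out[new] = list(data[old])
        if drop_original:
            del out[old]
    return out


df = {"a": [1, 2, 3], "b": [4, 5, 6]}
print(rename_columns(df, {"a": "new_a"}))
# {'b': [4, 5, 6], 'new_a': [1, 2, 3]}
print(rename_columns(df, {"a": "new_a"}, drop_original=False))
# {'a': [1, 2, 3], 'b': [4, 5, 6], 'new_a': [1, 2, 3]}
```

With drop_original=False the sketch keeps both the original and the renamed copy, which is the "duplicate a column for separate paths of logic" use case.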

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RenameColumnsTransformer(
...     columns="a", new_column_names={"a": "new_a"}
... )
>>> transformer
RenameColumnsTransformer(columns=['a'], new_column_names={'a': 'new_a'})

>>> # transformer can also be dumped to json and reinitialised

>>> json_dump = transformer.to_json()
>>> pprint(json_dump, sort_dicts=True)
{'classname': 'RenameColumnsTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a'],
          'copy': False,
          'drop_original': True,
          'new_column_names': {'a': 'new_a'},
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

>>> RenameColumnsTransformer.from_json(json_dump)
RenameColumnsTransformer(columns=['a'], new_column_names={'a': 'new_a'})

```

FITS = False

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = RenameColumnsTransformer(
...     columns=["a", "b"],
...     new_column_names={"a": "new_a", "b": "new_b"},
... )

>>> transformer.get_feature_names_out()
['new_a', 'new_b']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RenameColumnsTransformer(
...     columns="a", new_column_names={"a": "new_a"}
... )
>>> pprint(transformer.to_json(), sort_dicts=True)
{'classname': 'RenameColumnsTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a'],
          'copy': False,
          'drop_original': True,
          'new_column_names': {'a': 'new_a'},
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
```

Create renamed copies of columns, dropping the originals if drop_original is True.

Parameters:: X (DataFrame) – Data containing the columns to rename.
Returns:: X – Transformed input X with columns renamed.
Return type:: DataFrame
Raises:: ValueError – if new_column_names values are already present in X.

Examples

```pycon
>>> import polars as pl

>>> transformer = RenameColumnsTransformer(
...     columns="a", new_column_names={"a": "new_a"}
... )

>>> test_df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

>>> transformer.transform(test_df)
shape: (3, 2)
┌─────┬───────┐
│ b   ┆ new_a │
│ --- ┆ ---   │
│ i64 ┆ i64   │
╞═════╪═══════╡
│ 4   ┆ 1     │
│ 5   ┆ 2     │
│ 6   ┆ 3     │
└─────┴───────┘

```

Bases: BaseTransformer

Transformer to set the value of column(s) to a given constant value.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> SetValueTransformer(columns="a", value=1)
SetValueTransformer(columns=['a'], value=1)
```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = SetValueTransformer(columns="a", value=1)
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'SetValueTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'value': 1}, 'fit': {'is_fitted_': True}}
```

Set columns to value.

Parameters:: X (DataFrame) – Data containing the columns to set.
Returns:: X – Transformed input X with columns set to value.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = SetValueTransformer(columns="a", value=1)

>>> test_df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

>>> transformer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i32 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 1   ┆ 5   │
│ 1   ┆ 6   │
└─────┴─────┘

```

tubular.mixins module

Contains mixin classes for use across transformers.

class tubular.mixins.CheckNumericMixin[source]

Bases: object

Mixin class with methods for numeric transformers.
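The numeric check described below can be illustrated with a small sketch. The method's real name is missing from this extract, so `check_numeric_columns` is a hypothetical name. The real check works on dataframe dtypes. Plain lists have no dtype, so this sketch inspects values instead, ignores nulls, and treats booleans as non-numeric (an assumption).

```python
import numbers


def check_numeric_columns(data, columns):
    """Raise TypeError if any listed column holds non-numeric, non-null values."""
    non_numeric = [
        col
        for col in columns
        if not all(
            isinstance(v, numbers.Number) and not isinstance(v, bool)
            for v in data[col]
            if v is not None  # nulls are ignored by this sketch
        )
    ]
    if non_numeric:
        raise TypeError(f"The following columns are not numeric: {non_numeric}")
    return data


df = {"a": [1, 2.5, None], "b": ["x", "y", "z"]}
check_numeric_columns(df, ["a"])  # passes and returns df unchanged
```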

Check that the columns passed to numeric transformers are numeric.

Parameters:

X (DataFrame) – Data containing columns to check.
return_native (bool) – indicates whether to return nw or pd/pl dataframe

Returns:

validated dataframe

Return type:

DataFrame

Raises:

TypeError – if provided columns are non-numeric

classname() → str[source]

Get name of the current class when called.

Returns:: name of class
Return type:: str

class tubular.mixins.DropOriginalMixin[source]

Bases: object

Mixin class to validate and apply ‘drop_original’ argument used by various transformers.

Transformers using this mixin delete their input columns when the boolean drop_original argument is True.

classname() → str[source]

Get name of the current class when called.

Returns:: name of class
Return type:: str

Drop input columns from X if drop_original is set to True.

Parameters:

X (DataFrame) – Data with columns to drop.
drop_original (bool) – whether to drop the input columns from X after checks.
columns (list[str] | str | None) – Object containing columns to drop
return_native (bool) – controls whether mixin returns native or narwhals type

Returns:

X – Transformed input X with columns dropped.

Return type:

DataFrame
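The drop step amounts to a filter on column names. Here is a minimal sketch over a dict-of-lists frame. `drop_original_columns` is a hypothetical name, and the str/list/None handling follows the columns parameter type listed above.

```python
def drop_original_columns(data, drop_original, columns):
    """Return data without `columns` when drop_original is True, else unchanged."""
    if not drop_original or columns is None:
        return data
    if isinstance(columns, str):
        columns = [columns]  # accept a single column name, as the signature allows
    return {name: values for name, values in data.items() if name not in columns}


df = {"a": [1, 2], "b": [3, 4]}
print(drop_original_columns(df, True, "a"))   # {'b': [3, 4]}
print(drop_original_columns(df, False, "a"))  # {'a': [1, 2], 'b': [3, 4]}
```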

class tubular.mixins.WeightColumnMixin[source]

Bases: object

Mixin class with weights functionality.
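The two weight helpers documented below can be sketched in plain Python. `check_weights_column` mirrors the documented checks (missing or non-numeric column raises ValueError). `valid_weight_rows` is a hypothetical stand-in for the filter expression. It assumes that a valid weight is non-null, finite and non-negative. That definition is an assumption, not taken from this page.

```python
import math
import numbers


def check_weights_column(data, weights_column):
    """Raise ValueError if the weights column is missing or non-numeric."""
    if weights_column not in data:
        raise ValueError(f"weights column {weights_column!r} is missing from data")
    values = [w for w in data[weights_column] if w is not None]
    if not all(isinstance(w, numbers.Number) and not isinstance(w, bool) for w in values):
        raise ValueError(f"weights column {weights_column!r} is not numeric")


def valid_weight_rows(weights):
    """Indices of rows with usable weights: non-null, finite and non-negative (assumed)."""
    return [
        i
        for i, w in enumerate(weights)
        if w is not None and math.isfinite(w) and w >= 0
    ]


data = {"x": ["p", "q", "r", "s"], "w": [1.0, None, -2.0, float("nan")]}
check_weights_column(data, "w")  # passes: column present and numeric
print(valid_weight_rows(data["w"]))  # [0]
```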

check_weights_column(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, weights_column: str) → None[source]

Validate weights column in dataframe.

Parameters:

X (DataFrame) – input data
weights_column (str) – name of weight column

Raises:

ValueError – if weights_column is missing from data
ValueError – if weights_column is non-numeric

classname() → str[source]

Get the name of the current class when called.

Returns:: name of class
Return type:: str

static get_valid_weights_filter_expr(weights_column: str, verbose: bool = False) → Expr[source]

Get an expression that filters rows down to those with valid weights.

Parameters:

weights_column (str) – name of weight column
verbose (bool) – control verbosity of method

Returns:

expression to be used for filtering down to valid weights rows

Return type:

nw.Expr

tubular.nominal module

Contains transformers that apply encodings to nominal columns.

class tubular.nominal.GroupRareLevelsTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]] | None = None, cut_off_percent: ]] = 0.01, weights_column: str | None = None, rare_level_name: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]] = 'rare', record_rare_levels: bool = True, unseen_levels_to_rare: bool = True, **kwargs: bool)[source]

Bases: BaseTransformer, WeightColumnMixin

Group together rare levels of nominal variables into a new rare level.

Rare levels are defined by a cut off percentage, which can either be based on the number of rows or sum of weights. Any levels below this cut off value will be grouped into the rare level.
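The fit/transform split can be sketched in plain Python. `fit_rare_levels` and `transform_rare_levels` are hypothetical helpers, not tubular functions. The sketch reads "below this cut off value" as a share strictly less than cut_off_percent. Tubular's exact boundary and null handling may differ. It reproduces the x/x/y example shown for transform.

```python
from collections import defaultdict


def fit_rare_levels(values, cut_off_percent, weights=None):
    """Return (non_rare_levels, training_levels) for one column."""
    if weights is None:
        weights = [1.0] * len(values)  # row counts when no weights column is used
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    non_rare = {v for v, t in totals.items() if t / grand_total >= cut_off_percent}
    return non_rare, set(totals)


def transform_rare_levels(values, non_rare, training_levels,
                          rare_level_name="rare", unseen_levels_to_rare=True):
    """Replace rare (and, optionally, unseen) levels with rare_level_name."""
    out = []
    for v in values:
        if v in non_rare:
            out.append(v)
        elif not unseen_levels_to_rare and v not in training_levels:
            out.append(v)  # unseen level kept as-is
        else:
            out.append(rare_level_name)
    return out


non_rare, levels = fit_rare_levels(["x", "x", "y"], cut_off_percent=0.5)
print(transform_rare_levels(["x", "x", "y"], non_rare, levels, "rare_level"))
# ['x', 'x', 'rare_level']
```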

cut_off_percent

Cut off percentage (either in terms of number of rows or sum of weight) for a given nominal level to be considered rare.

Type:: float

non_rare_levels

Created in fit. A dict of non-rare levels (i.e. levels with more than cut_off_percent weight or rows) that is used to identify rare levels in transform.

Type:: dict

rare_level_name

Must be of the same type as columns. Label for the new nominal level that will be added to group together rare levels (as defined by cut_off_percent).

Type:: any

record_rare_levels

Should the ‘rare’ levels that will be grouped together be recorded? If not, they will be lost after fit and the only information remaining will be the ‘non-rare’ levels.

Type:: bool

rare_levels_record

Only created (in fit) if record_rare_levels is True. This is a dict containing a list of levels that were grouped into ‘rare’ for each column the transformer was applied to.

Type:: dict

weights_column

Name of the weights column to use if cut_off_percent should be in terms of sum of weight rather than number of rows.

Type:: str

unseen_levels_to_rare

If True, levels in new data that were not seen during fit are grouped into the rare level; if False, they are left unchanged.

Type:: bool

training_data_levels

Dictionary containing the set of values present in the training data for each column in self.columns. It only exists if unseen_levels_to_rare is set to False.

Type:: dict[set]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> GroupRareLevelsTransformer(
...     columns="a",
...     cut_off_percent=0.02,
...     rare_level_name="rare_level",
... )
GroupRareLevelsTransformer(columns=['a'], cut_off_percent=0.02,
                           rare_level_name='rare_level')
```

FITS = True

Record non-rare levels for categorical variables.

When transform is called, only levels recorded in non_rare_levels during fit will remain unchanged; all other levels will be grouped. If record_rare_levels is True, the rare levels will also be recorded.

The label for the rare levels must be of the same type as the columns.

Parameters:

X (DataFrame) – Data to identify non-rare levels from.
y (Series or LazyFrame or None, default = None) – Optional argument only required for the transformer to work with sklearn pipelines.

Returns:

fitted class instance

Return type:

GroupRareLevelsTransformer

Examples

```pycon
>>> import polars as pl

>>> transformer = GroupRareLevelsTransformer(
...     columns="a",
...     cut_off_percent=0.02,
...     rare_level_name="rare_level",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": ["w", "z"]})

>>> transformer.fit(test_df)
GroupRareLevelsTransformer(columns=['a'], cut_off_percent=0.02,
                           rare_level_name='rare_level')

```

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> import tests.test_data as d

>>> df = d.create_df_8("pandas")

>>> x = GroupRareLevelsTransformer(
...     columns=["b", "c"], cut_off_percent=0.4, unseen_levels_to_rare=False
... )

>>> x.fit(df)
GroupRareLevelsTransformer(columns=['b', 'c'], cut_off_percent=0.4,
                           unseen_levels_to_rare=False)

>>> x.to_json()
{'tubular_version': ..., 'classname': 'GroupRareLevelsTransformer', 'init': {'columns': ['b', 'c'], 'copy': False, 'verbose': False, 'return_native': True, 'cut_off_percent': 0.4, 'weights_column': None, 'rare_level_name': 'rare', 'record_rare_levels': True, 'unseen_levels_to_rare': False}, 'fit': {'is_fitted_': True, 'non_rare_levels': {'b': ['w'], 'c': ['a']}, 'training_data_levels': {'b': ['w', 'x', 'y', 'z'], 'c': ['a', 'b', 'c']}, 'rare_levels_record': {'b': ['x', 'y', 'z'], 'c': ['b', 'c']}}}

```

Group rare levels together into a new ‘rare’ level.

Parameters:: X (DataFrame) – Data with categorical variables to apply rare level grouping to.
Returns:: X – Transformed input X with rare levels grouped into a new rare level.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = GroupRareLevelsTransformer(
...     columns="a",
...     cut_off_percent=0.5,
...     rare_level_name="rare_level",
... )

>>> test_df = pl.DataFrame({"a": ["x", "x", "y"], "b": ["w", "z", "z"]})

>>> _ = transformer.fit(test_df)

>>> transformer.transform(test_df)
shape: (3, 2)
┌────────────┬─────┐
│ a          ┆ b   │
│ ---        ┆ --- │
│ str        ┆ str │
╞════════════╪═════╡
│ x          ┆ w   │
│ x          ┆ z   │
│ rare_level ┆ z   │
└────────────┴─────┘

```

Bases: BaseTransformer, WeightColumnMixin, DropOriginalMixin

Convert categorical variables to numeric by mapping levels to the mean response for each level.

For a continuous or binary response the categorical columns specified will have values replaced with the mean response for each category.

For an n > 1 level categorical response, up to n binary responses can be created, which in turn can be used to encode each categorical column specified. This will generate up to n * len(columns) new columns, with names of the form {column}_{response_level}. The original columns will be removed from the dataframe. This functionality is controlled using the ‘level’ parameter. Note that this only works for an n > 1 level categorical response: do not use the ‘level’ parameter for an n = 1 level numerical response. In that case, use the standard mean response transformer without the ‘level’ parameter.

If a categorical variable contains null values these will not be transformed.

The same weights and prior are applied to each response level in the multi-level case.
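The multi-level case reduces to one binary indicator per response level, each used as an ordinary mean response. Here is a minimal sketch without prior or weights. `multi_level_mean_response` is hypothetical. Sorting the levels when level="all" is a choice made only for this sketch. The column names it produces match the ['a_cat', 'a_dog', 'a_rat'] shown in the get_feature_names_out example.

```python
def multi_level_mean_response(column, values, response, level="all"):
    """Encode one column against each response level via a binary indicator.

    Returns {f"{column}_{level}": {category: mean_of_indicator}}; no prior or weights.
    """
    if level == "all":
        levels = sorted(set(response))
    elif isinstance(level, (list, tuple)):
        levels = list(level)
    else:
        levels = [level]  # a single response level
    encodings = {}
    for lvl in levels:
        sums, counts = {}, {}
        for value, r in zip(values, response):
            indicator = 1.0 if r == lvl else 0.0  # binary response for this level
            sums[value] = sums.get(value, 0.0) + indicator
            counts[value] = counts.get(value, 0) + 1
        encodings[f"{column}_{lvl}"] = {v: sums[v] / counts[v] for v in sums}
    return encodings


enc = multi_level_mean_response("a", ["x", "y", "x"], ["cat", "dog", "rat"])
print(list(enc))     # ['a_cat', 'a_dog', 'a_rat']
print(enc["a_cat"])  # {'x': 0.5, 'y': 0.0}
```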

columns

Categorical columns to encode in the input data.

Type:: str or list

weights_column

Weights column to use when calculating the mean response.

Type:: str or None

prior

Regularisation parameter, can be thought of roughly as the size a category should be in order for its statistics to be considered reliable (hence default value of 0 means no regularisation).

Type:: int, default = 0
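One common way to apply such a prior is to shrink each category's mean towards the overall mean. Here prior acts as a pseudo-count of rows carrying the global mean: encoding = (sum of weighted responses + prior × global mean) / (sum of weights + prior). The formula is assumed here, but it reproduces the encodings in this class's examples: 0.25 and 0.75 with prior=1, and 0.0 and 1.0 with prior=0. `mean_response_encoding` is a hypothetical helper.

```python
def mean_response_encoding(values, response, prior=0, weights=None):
    """Map each category to its (optionally weighted) mean response, shrunk by prior."""
    if weights is None:
        weights = [1.0] * len(values)
    global_mean = sum(w * r for w, r in zip(weights, response)) / sum(weights)
    weighted_sum, weight_total = {}, {}
    for value, r, w in zip(values, response, weights):
        weighted_sum[value] = weighted_sum.get(value, 0.0) + w * r
        weight_total[value] = weight_total.get(value, 0.0) + w
    # prior acts like `prior` extra rows carrying the global mean for every category
    return {
        v: (weighted_sum[v] + prior * global_mean) / (weight_total[v] + prior)
        for v in weighted_sum
    }


print(mean_response_encoding(["x", "y"], [0, 1], prior=1))  # {'x': 0.25, 'y': 0.75}
print(mean_response_encoding(["x", "y"], [0, 1], prior=0))  # {'x': 0.0, 'y': 1.0}
```

A larger prior pulls small categories further towards the global mean, while categories with a large total weight are barely affected.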

level

Parameter to control encoding against a multi-level categorical response. If None the response will be treated as binary or continuous, if ‘all’ all response levels will be encoded against and if it is a list of levels then only the levels specified will be encoded against.

Type:: str, int, float, list or None, default = None

response_levels

Only created in the multi-level case. Generated from level; a list of all the response levels to encode against.

Type:: list

mappings

Created in fit. A nested Dict of {column names : column specific mapping dictionary} pairs. Column specific mapping dictionaries contain {initial value : mapped value} pairs.

Type:: dict

mapped_columns

Only created in the multi-level case. A list of the new columns produced by encoding the columns in self.columns against multiple response levels, of the form {column}_{level}.

Type:: list

transformer_dict

Only created in the multi-level case. A dictionary of the form level : transformer containing the mean response transformers for each level to be encoded against.

Type:: dict

unseen_levels_encoding_dict

Dict containing the values (based on chosen unseen_level_handling) derived from the encoded columns to use when handling unseen levels in data passed to transform method.

Type:: dict

return_type

What type to cast return column as. Defaults to float32.

Type:: Literal[‘float32’, ‘float64’]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> import polars as pl

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )
>>> transformer
MeanResponseTransformer(columns=['a'], prior=1, unseen_level_handling='mean')

>>> # once fit, transformer can also be dumped to json and reinitialised

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [0, 1]})

>>> _ = transformer.fit(test_df[["a"]], test_df["b"])

>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MeanResponseTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None, 'prior': 1, 'level': None, 'unseen_level_handling': 'mean', 'return_type': 'Float32', 'drop_original': True}, 'fit': {'is_fitted_': True, 'mappings': {'a': {'x': 0.25, 'y': 0.75}}, 'return_dtypes': {'a': 'Float32'}, 'column_to_encoded_columns': {'a': ['a']}, 'encoded_columns': ['a'], 'unseen_levels_encoding_dict': {'a': 0.5}}}
>>> MeanResponseTransformer.from_json(json_dump)
MeanResponseTransformer(columns=['a'], prior=1, unseen_level_handling='mean')

```

FITS = True

Identify mapping of categorical levels to mean response values.

If the user specified the weights_column arg when initialising the transformer, the weighted mean response will be calculated using that column.

In the multi-level case this method learns which response levels are present and are to be encoded against.

Parameters:

X (DataFrame) – Data with categorical variable columns to transform.
y (Series or LazyFrame) – Response variable or target.

Returns:

fitted class instance

Return type:

MeanResponseTransformer

Raises:

ValueError – if y contains null values

Examples

```pycon
>>> import polars as pl

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2], "target": [0, 1]})

>>> transformer.fit(test_df, test_df["target"])
MeanResponseTransformer(columns=['a'], prior=1, unseen_level_handling='mean')

```

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> import polars as pl

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )

>>> transformer.get_feature_names_out()
['a']

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     level=["x", "y"],
...     unseen_level_handling="mean",
... )

>>> transformer.get_feature_names_out()
['a_x', 'a_y']

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     level="all",
...     unseen_level_handling="mean",
... )

>>> transformer.get_feature_names_out()
Traceback (most recent call last):
...
sklearn.exceptions.NotFittedError: ...

>>> test_df = pl.DataFrame({"a": ["x", "y", "x"], "b": ["cat", "dog", "rat"]})

>>> _ = transformer.fit(test_df, test_df["b"])

>>> transformer.get_feature_names_out()
['a_cat', 'a_dog', 'a_rat']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> import polars as pl

>>> transformer = MeanResponseTransformer(columns=["a"])

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [0, 1]})

>>> _ = transformer.fit(test_df[["a"]], test_df["b"])

>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'MeanResponseTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None, 'prior': 0, 'level': None, 'unseen_level_handling': None, 'return_type': 'Float32', 'drop_original': True}, 'fit': {'is_fitted_': True, 'mappings': {'a': {'x': 0.0, 'y': 1.0}}, 'return_dtypes': {'a': 'Float32'}, 'column_to_encoded_columns': {'a': ['a']}, 'encoded_columns': ['a']}}

```

Apply mean response encoding stored in the mappings attribute to columns.

Parameters:: X (DataFrame) – Data with nominal columns to transform.
Returns:: X – Transformed input X with levels mapped according to mappings dict.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> # example with no prior
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=0,
...     unseen_level_handling="mean",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2], "target": [0, 1]})

>>> _ = transformer.fit(test_df, test_df["target"])

>>> transformer.transform(test_df)
shape: (2, 3)
┌─────┬─────┬────────┐
│ a   ┆ b   ┆ target │
│ --- ┆ --- ┆ ---    │
│ f32 ┆ i64 ┆ i64    │
╞═════╪═════╪════════╡
│ 0.0 ┆ 1   ┆ 0      │
│ 1.0 ┆ 2   ┆ 1      │
└─────┴─────┴────────┘

>>> # example with prior
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2], "target": [0, 1]})

>>> _ = transformer.fit(test_df, test_df["target"])

>>> transformer.transform(test_df)
shape: (2, 3)
┌──────┬─────┬────────┐
│ a    ┆ b   ┆ target │
│ ---  ┆ --- ┆ ---    │
│ f32  ┆ i64 ┆ i64    │
╞══════╪═════╪════════╡
│ 0.25 ┆ 1   ┆ 0      │
│ 0.75 ┆ 2   ┆ 1      │
└──────┴─────┴────────┘

```

class tubular.nominal.NominalToIntegerTransformer(**kwargs)[source]

Bases: BaseMappingTransformMixin

Transformer to convert columns containing nominal values into integer values.

The nominal levels that are mapped to integers are not ordered in any way.
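A minimal sketch of the fit step, assuming consecutive integers starting at start_encoding. `fit_nominal_to_integer` is hypothetical. The sort only makes the sketch deterministic and carries no ordering meaning. The int8 bound is a simplified stand-in for the ValueError listed under fit.

```python
INT8_MAX = 127  # largest value representable as int8


def fit_nominal_to_integer(values, start_encoding=0):
    """Assign consecutive integers, starting at start_encoding, to each level."""
    # sorted() only makes the sketch deterministic; the encoding carries no order meaning
    levels = sorted(set(values), key=str)
    mapping = {level: start_encoding + i for i, level in enumerate(levels)}
    if mapping and max(mapping.values()) > INT8_MAX:
        raise ValueError("column has more levels than can be encoded as int8")
    return mapping


mapping = fit_nominal_to_integer(["b", "a", "b", "c"], start_encoding=1)
print(mapping)  # {'a': 1, 'b': 2, 'c': 3}
print([mapping[v] for v in ["c", "a"]])  # [3, 1]
```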

start_encoding

Value to start the encoding / mapping of nominal to integer from.

Type:: int

mappings

Created in fit. A dict of key (column names) value (mappings between levels and integers for given column) pairs.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = True

deprecated = True

fit(X: pd.DataFrame, y: pd.Series | None = None) → pd.DataFrame[source]

Create mapping between nominal levels and integer values for categorical variables.

Parameters:

X (pd.DataFrame) – Data to fit the transformer on, this sets the nominal levels that can be mapped.
y (None or pd.DataFrame or pd.Series, default = None) – Optional argument only required for the transformer to work with sklearn pipelines.

Returns:

fitted class instance

Return type:

NominalToIntegerTransformer

Raises:

ValueError – if column has more levels than can be encoded as int8.

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: pd.DataFrame) → pd.DataFrame[source]

Apply integer encoding stored in the mappings attribute to columns.

Parameters:: X (pd.DataFrame) – Data with nominal columns to transform.
Returns:: X – Transformed input X with levels mapped according to mappings dict.
Return type:: pd.DataFrame
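The fit/transform contract above can be sketched in plain Python. Fit assigns consecutive integers, starting at start_encoding, to each level. The order has no meaning; this sketch uses order of first appearance. Transform then looks the levels up. Two things here are assumptions: the int8 bound check (largest code at most 127) as the trigger for the documented ValueError, and the helper names, which are hypothetical.

```python
def fit_nominal_mapping(values, start_encoding=0):
    """Assign consecutive integers to levels in order of first appearance (hypothetical helper)."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = start_encoding + len(mapping)
    # Assumed check: the largest code must fit in an int8 (maximum 127).
    if mapping and max(mapping.values()) > 127:
        raise ValueError("column has more levels than can be encoded as int8")
    return mapping


def transform_nominal(values, mapping):
    """Replace each level by its integer code; unseen levels raise KeyError here."""
    return [mapping[value] for value in values]


mapping = fit_nominal_mapping(["b", "a", "b", "c"])
print(mapping)  # {'b': 0, 'a': 1, 'c': 2}
print(transform_nominal(["c", "b"], mapping))  # [2, 0]
```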

class tubular.nominal.OneHotEncodingTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]] | None = None, wanted_values: dict[str, ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]]] | None = None, separator: str = '_', drop_original: bool = False, **kwargs: bool)[source]

Bases: DropOriginalMixin, BaseTransformer

Transformer to convert categorical variables into dummy columns.

separator

Separator used in naming for dummy columns.

Type:: str

drop_original

Should original columns be dropped after creating dummy fields?

Type:: bool

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )
>>> transformer
OneHotEncodingTransformer(columns=['a'])

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": ["w", "z"]})

>>> _ = transformer.fit(test_df)

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'OneHotEncodingTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'wanted_values': None, 'separator': '_', 'drop_original': False}, 'fit': {'is_fitted_': True, 'categories_': {'a': ['x', 'y']}, 'new_feature_names_': {'a': ['a_x', 'a_y']}}}

>>> OneHotEncodingTransformer.from_json(json_dump)
OneHotEncodingTransformer(columns=['a'])

```

FITS = True

MAX_LEVELS = 100

Get list of levels for each column to be transformed.

This defines which dummy columns will be created in transform.

Parameters:

X (DataFrame) – Data to identify levels from.
y (None) – Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.

Returns:

fitted class instance

Return type:

OneHotEncodingTransformer

Raises:

ValueError – if column has >100 levels.

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2]})

>>> transformer.fit(test_df)
OneHotEncodingTransformer(columns=['a'])

```

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
...     wanted_values={"a": ["cat", "dog"]},
... )

>>> transformer.get_feature_names_out()
['a_cat', 'a_dog']

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )

>>> transformer.get_feature_names_out()
Traceback (most recent call last):
...
sklearn.exceptions.NotFittedError: ...

>>> test_df = pl.DataFrame({"a": ["cat", "dog", "rat"]})

>>> _ = transformer.fit(test_df)

>>> transformer.get_feature_names_out()
['a_cat', 'a_dog', 'a_rat']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(columns=["a"])

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": ["w", "z"]})

>>> _ = transformer.fit(test_df)

>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'OneHotEncodingTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'wanted_values': None, 'separator': '_', 'drop_original': False}, 'fit': {'is_fitted_': True, 'categories_': {'a': ['x', 'y']}, 'new_feature_names_': {'a': ['a_x', 'a_y']}}}

```

Create new dummy columns from categorical fields.

Parameters:

X (DataFrame) – Data to apply one hot encoding to.
return_native_override (Optional[bool]) – Controls whether the transformer returns a narwhals or native type. Option to override the return_native attr in the transformer, useful when calling parent methods.

Returns:

X_transformed – Transformed input X with dummy columns derived from categorical columns added. If drop_original = True then the original categorical columns that the dummies are created from will not be in the output X.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2]})

>>> _ = transformer.fit(test_df)

>>> transformer.transform(test_df)
shape: (2, 4)
┌─────┬─────┬───────┬───────┐
│ a   ┆ b   ┆ a_x   ┆ a_y   │
│ --- ┆ --- ┆ ---   ┆ ---   │
│ str ┆ i64 ┆ bool  ┆ bool  │
╞═════╪═════╪═══════╪═══════╡
│ x   ┆ 1   ┆ true  ┆ false │
│ y   ┆ 2   ┆ false ┆ true  │
└─────┴─────┴───────┴───────┘

```
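The behaviour documented above can be sketched in plain Python in two steps. Fit collects each column's levels, or takes them from wanted_values, and refuses more than MAX_LEVELS. Transform then emits one boolean column per level, named `<column><separator><level>`. Sorting the fitted levels is an assumption; the examples happen to show sorted levels. `fit_levels` and `one_hot` are hypothetical helpers, not tubular's API.

```python
MAX_LEVELS = 100  # mirrors the MAX_LEVELS class attribute documented above


def fit_levels(values, wanted_values=None):
    """Return the levels that will get a dummy column (hypothetical helper)."""
    if wanted_values is not None:
        return list(wanted_values)
    levels = sorted(set(values))  # sorting is an assumption
    if len(levels) > MAX_LEVELS:
        raise ValueError(f"column has {len(levels)} levels, more than {MAX_LEVELS}")
    return levels


def one_hot(column, values, levels, separator="_"):
    """Build one boolean column per level, named <column><separator><level>."""
    return {
        f"{column}{separator}{level}": [value == level for value in values]
        for level in levels
    }


values = ["x", "y", "x"]
print(one_hot("a", values, fit_levels(values)))
# {'a_x': [True, False, True], 'a_y': [False, True, False]}
```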

class tubular.nominal.OrdinalEncoderTransformer(**kwargs)[source]

Bases: BaseMappingTransformMixin, WeightColumnMixin

Encode categorical variables into ascending rank-ordered integer variables.

Maps levels to the target-mean response for that level.

Values are sorted in ascending order only, i.e. the categorical level with the lowest target mean response is encoded as 1, the next lowest as 2, and so on.

If a categorical variable contains null values these will not be transformed.

weights_column

Weights column to use when calculating the mean response.

Type:: str or None

mappings

Created in fit. Dict of key (column names) value (mapping of categorical levels to numeric, ordinal encoded response values) pairs.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = True

deprecated = True

Identify mapping of categorical levels to rank-ordered integer values by target-mean in ascending order.

If the user specified the weights_column arg when initialising the transformer, the weighted mean response will be calculated using that column.

Parameters:

X (DataFrame) – Data with categorical variable columns to transform and the response_column column specified when the object was initialised.
y (Series or LazyFrame) – Response column or target.

Returns:

fitted class instance

Return type:

OrdinalEncoderTransformer

Raises:

ValueError – if y contains nulls.

jsonable = False

lazyframe_compatible = False

polars_compatible = False

Apply ordinal encoding stored in the mappings attribute to columns.

This maps categorical levels to rank-ordered integer values by target-mean in ascending order.

Parameters:: X (DataFrame) – Data with categorical variable columns to transform.
Returns:: X – Transformed data with levels mapped to ordinal encoded values for categorical variables.
Return type:: DataFrame
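The ranking described above works as follows: compute the (optionally weighted) mean response per level, sort ascending, and number from 1. A stdlib sketch is below. Breaking ties by level value is an assumption, and the helper names are hypothetical.

```python
def fit_ordinal_mapping(levels, y, weights=None):
    """Rank levels by (weighted) mean response: lowest mean -> 1, next lowest -> 2, ..."""
    if any(target is None for target in y):
        raise ValueError("y contains nulls")
    if weights is None:
        weights = [1.0] * len(y)
    weighted_sum, total_weight = {}, {}
    for level, target, weight in zip(levels, y, weights):
        if level is None:
            continue  # null levels are not encoded
        weighted_sum[level] = weighted_sum.get(level, 0.0) + weight * target
        total_weight[level] = total_weight.get(level, 0.0) + weight
    means = {level: weighted_sum[level] / total_weight[level] for level in weighted_sum}
    # Ties are broken by level value here; that tie rule is an assumption.
    ranked = sorted(means, key=lambda level: (means[level], level))
    return {level: rank for rank, level in enumerate(ranked, start=1)}


def transform_ordinal(levels, mapping):
    """Replace each level by its rank, leaving nulls (None) untouched."""
    return [None if level is None else mapping[level] for level in levels]


mapping = fit_ordinal_mapping(["a", "a", "b"], [0, 10, 4])
print(mapping)  # {'b': 1, 'a': 2}
print(fit_ordinal_mapping(["a", "a", "b"], [0, 10, 4], weights=[3, 1, 1]))  # {'a': 1, 'b': 2}
print(transform_ordinal(["a", None, "b"], mapping))  # [2, None, 1]
```

The weighted call shows how weights_column can reorder levels: weighting the first row by 3 pulls level "a" down to a mean of 2.5, below "b".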

tubular.numeric module

Contains transformers that apply numeric functions.

class tubular.numeric.BaseNumericTransformer(columns: list[str], **kwargs: dict[str, bool])[source]

Bases: BaseTransformer, CheckNumericMixin

Extends BaseTransformer for numeric scenarios.

columns

List of columns to be operated on

Type:: List[str]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> BaseNumericTransformer(
...     columns="a",
... )
BaseNumericTransformer(columns=['a'])

```

FITS = False

Validate data and attributes prior to the child object's fit logic.

Parameters:

X (DataFrame) – A dataframe containing the required columns
y (Series | None) – Required for pipeline.

Returns:

fitted class instance.

Return type:

BaseNumericTransformer

Examples

```pycon
>>> import polars as pl

>>> transformer = BaseNumericTransformer(
...     columns="a",
... )

>>> test_df = pl.DataFrame({"a": [1, 2], "b": [3, 4]})

>>> transformer.fit(test_df)
BaseNumericTransformer(columns=['a'])

```

jsonable = False

lazyframe_compatible = True

polars_compatible = True

Validate data and attributes prior to the child object's transform logic.

Parameters:

X (DataFrame) – Data to transform.
return_native_override (Optional[bool]) – Option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Validated data

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = BaseNumericTransformer(
...     columns="a",
... )

>>> test_df = pl.DataFrame({"a": [1, 2], "b": [3, 4]})

>>> # base class has no effect on data
>>> transformer.transform(test_df)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 2   ┆ 4   │
└─────┴─────┘

```

class tubular.numeric.CutTransformer(**kwargs)[source]

Bases: BaseNumericTransformer

Class to bin a column into discrete intervals.

Class simply uses the [pd.cut](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html) method on the specified column.
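With explicit bin edges and its defaults (right=True, include_lowest=False), pd.cut puts each value into a right-closed interval (edges[i], edges[i+1]]. Values outside the outer edges become NaN. The stdlib sketch below illustrates that rule. It returns integer interval indices rather than pandas Categorical labels. How CutTransformer forwards its arguments to pd.cut is not shown here, and the `cut` helper is hypothetical.

```python
from bisect import bisect_left


def cut(values, edges):
    """Index of the right-closed interval (edges[i], edges[i+1]] holding each value.

    Values outside (edges[0], edges[-1]] map to None, as pd.cut yields NaN
    with its defaults right=True, include_lowest=False.
    """
    labels = []
    for value in values:
        position = bisect_left(edges, value)  # index of the first edge >= value
        if 1 <= position <= len(edges) - 1:
            labels.append(position - 1)
        else:
            labels.append(None)
    return labels


print(cut([0, 5, 10, 15, 25], [0, 10, 20]))  # [None, 0, 0, 1, None]
```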

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Discretise specified column using pd.cut.

Parameters:: X (pd.DataFrame) – Data with column to transform.
Returns:: Dataframe with binned column
Return type:: pd.DataFrame

class tubular.numeric.DifferenceTransformer(columns: ]], **kwargs: bool | None)[source]

Bases: BaseNumericTransformer

Transformer that performs a subtraction operation between two columns.

This transformer subtracts one column of a DataFrame from another and stores the result in a new column.

columns

List of exactly two column names to operate on. The second column is subtracted from the first.

Type:: ListOfTwoStrs

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> transformer = DifferenceTransformer(columns=["a", "b"])
>>> transformer.columns
['a', 'b']

```

FITS = False

get_feature_names_out() → list[str][source]

Get the names of the output features.

Returns:: List containing the name of the new column created by the transformation.
Return type:: list[str]

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

Transform the DataFrame by applying the subtraction operation between two columns.

Parameters:: X (DataFrame) – DataFrame containing the columns to operate on.
Returns:: Transformed DataFrame with the new column containing the subtraction results.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = DifferenceTransformer(columns=["a", "b"])
>>> test_df = pl.DataFrame({"a": [100, 200, 300], "b": [80, 150, 200]})
>>> transformer.transform(test_df)
shape: (3, 3)
┌─────┬─────┬───────────┐
│ a   ┆ b   ┆ a_minus_b │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ i64       │
╞═════╪═════╪═══════════╡
│ 100 ┆ 80  ┆ 20        │
│ 200 ┆ 150 ┆ 50        │
│ 300 ┆ 200 ┆ 100       │
└─────┴─────┴───────────┘

```

class tubular.numeric.InteractionTransformer(**kwargs)[source]

Bases: BaseNumericTransformer

Generates interaction features.

Transformer generates a new column for every combination of the selected columns, up to the maximum degree provided (for sklearn versions higher than 1.0.0, only interactions of degree greater than or equal to the minimum degree are computed). Each interaction column is the product of the columns in that combination. For example, with 3 columns ["a", "b", "c"] and a maximum degree of 3, the possible combinations are:

- of degree 1: ["a", "b", "c"]
- of degree 2: ["a b", "b c", "a c"]
- of degree 3: ["a b c"]
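The combination logic above can be sketched with itertools.combinations. This swaps in the stdlib for sklearn's PolynomialFeatures internals, so it illustrates the approach and is not tubular's code. Here `data` is a plain dict of column lists, and `interaction_features` is a hypothetical helper.

```python
from itertools import combinations
from math import prod


def interaction_features(data, columns, min_degree=2, max_degree=2):
    """Add one product column per combination of `columns` with degree in [min_degree, max_degree].

    New columns are named by joining the source column names with a single space.
    """
    result = dict(data)
    for degree in range(min_degree, max_degree + 1):
        for combo in combinations(columns, degree):
            name = " ".join(combo)
            # zip the selected columns row-wise and multiply each row's values
            result[name] = [prod(row) for row in zip(*(data[col] for col in combo))]
    return result


data = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
out = interaction_features(data, ["a", "b", "c"], min_degree=2, max_degree=3)
print(list(out))  # ['a', 'b', 'c', 'a b', 'a c', 'b c', 'a b c']
print(out["a b c"])  # [15, 48]
```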

min_degree

minimum degree of interaction features to be considered

Type:: int

max_degree

maximum degree of interaction features to be considered

Type:: int

nb_features_to_interact

number of selected columns from which interactions should be computed. (=len(columns))

Type:: int

nb_combinations

number of new interaction features

Type:: int

interaction_colname

names of the new interaction features. Each name is the combination of the original column names joined with a single space, e.g. the interaction of ["col1", "col2", "col3"] is named "col1 col2 col3".

Type:: list

nb_feature_out

number of total columns of transformed dataset, including new interaction features

Type:: int

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

MIN_DEGREE_VALUE = 2

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Generate interaction features using the “product” pandas.DataFrame method.

Parameters:: X (pd.DataFrame) – Data to transform.
Returns:: X – Input X with additional column or columns (self.interaction_colname) added. These contain the output of running the product pandas DataFrame method on identified combinations.
Return type:: pd.DataFrame
Raises:: TypeError – for invalid PolynomialFeatures._combinations arguments.

class tubular.numeric.LogTransformer(**kwargs)[source]

Bases: BaseNumericTransformer, DropOriginalMixin

Transformer to apply log transformation.

Transformer has the option to add 1 to the columns to log and drop the original columns.

add_1

Should 1 be added to the columns before they are logged?

Type:: bool

drop_original

Should the original columns be dropped after the logged columns are created?

Type:: bool

suffix

The suffix to add onto the end of column names for new columns.

Type:: str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = True

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Apply the log transform to the specified columns.

If the drop_original attribute is True then the original columns are dropped. If the add_1 attribute is True then 1 is added to the original columns before they are logged.

Parameters:: X (pd.DataFrame) – The dataframe to be transformed.
Returns:: X – The dataframe with the specified columns logged, optionally dropping the original columns if self.drop_original is True.
Return type:: pd.DataFrame
Raises:: ValueError – if provided columns contain negative values.
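The transform can be sketched on plain column lists, covering the add_1 and drop_original options and the negative-value check. The function name is hypothetical, and so is the `<column>_<suffix>` output naming with a default suffix of "log". The transformer's real naming is governed by its suffix attribute.

```python
import math


def log_transform(data, columns, add_1=False, drop_original=False, suffix="log"):
    """Log selected columns, optionally adding 1 first and dropping the originals.

    Output naming <column>_<suffix> and the default suffix are assumptions.
    """
    result = dict(data)
    for col in columns:
        if any(value < 0 for value in data[col]):
            raise ValueError(f"column {col} contains negative values")
        offset = 1 if add_1 else 0
        # without add_1, a zero value makes math.log raise its own ValueError
        result[f"{col}_{suffix}"] = [math.log(value + offset) for value in data[col]]
        if drop_original:
            del result[col]
    return result


print(log_transform({"a": [0, 9], "b": [1, 2]}, ["a"], add_1=True, drop_original=True))
# {'b': [1, 2], 'a_log': [0.0, 2.302585092994046]}
```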

class tubular.numeric.OneDKmeansTransformer(columns: str | ~typing.Annotated[list[str], beartype.vale.Is[lambda list_arg: ...]], new_column_name: str, n_init: str | int = 'auto', n_clusters: int = 8, drop_original: bool = False, kmeans_kwargs: dict[str, object] | None = None, **kwargs: bool)[source]

Bases: BaseNumericTransformer, DropOriginalMixin

Generates a new column based on kmeans algorithm.

Transformer runs the kmeans algorithm with the given number of clusters and then identifies the bin cuts from the results. Finally it passes them into a cut function.
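The cluster-then-cut idea can be sketched without sklearn: plain one-dimensional Lloyd's iterations, then each cluster's maximum used as an upper bin edge. Several parts are assumptions. The evenly spaced initialisation replaces sklearn's KMeans initialisation. Using cluster maxima as edges is consistent with (but not confirmed by) the bins [3, 4] in the to_json example for this class. Sending values above the last edge to the last bin is also an assumption. All helper names are hypothetical.

```python
from bisect import bisect_left


def kmeans_1d(values, n_clusters, n_iter=100):
    """Lloyd's algorithm in one dimension, starting from evenly spaced sorted values."""
    ordered = sorted(values)
    if n_clusters == 1:
        centroids = [sum(ordered) / len(ordered)]
    else:
        step = (len(ordered) - 1) / (n_clusters - 1)
        centroids = [float(ordered[round(i * step)]) for i in range(n_clusters)]
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in ordered:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to its cluster mean (empty clusters keep theirs).
        new_centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return [cluster for cluster in clusters if cluster]


def fit_bins(values, n_clusters):
    """Use the maximum of each cluster as an upper bin edge."""
    return sorted(max(cluster) for cluster in kmeans_1d(values, n_clusters))


def assign_bins(values, edges):
    """Bin index = first edge >= value; values above the last edge go to the last bin (assumption)."""
    return [min(bisect_left(edges, value), len(edges) - 1) for value in values]


print(fit_bins([1, 2, 3, 10, 11, 12], n_clusters=2))  # [3, 12]
print(assign_bins([1, 3, 4, 12, 20], [3, 12]))  # [0, 0, 1, 1, 1]
```

On a = [1, 2, 3, 4] this sketch converges to edges [2, 4] rather than [3, 4]. K-means solutions depend on initialisation, so treat the sketch as illustrative only.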

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )
OneDKmeansTransformer(columns=['a'], kmeans_kwargs={'random_state': 42},
                      n_clusters=2, new_column_name='new')

```

FITS = True

fit(X: FrameT, y: IntoSeriesT | None = None) → OneDKmeansTransformer[source]

Fit transformer to input data.

Parameters:

X (pd/pl.DataFrame) – Dataframe with columns to learn the cluster bins from.
y (None) – Required for pipeline.

Returns:

Fitted class instance.

Return type:

OneDKmeansTransformer

Raises:

ValueError – if columns in X contain missing values.

Examples

```pycon
>>> import polars as pl

>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )

>>> test_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

>>> transformer.fit(test_df)
OneDKmeansTransformer(columns=['a'], kmeans_kwargs={'random_state': 42},
                      n_clusters=2, new_column_name='new')

```

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="kmeans_column",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )

>>> transformer.get_feature_names_out()
['kmeans_column']

```

jsonable = True

lazyframe_compatible = False

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:: JSON representation of the transformer, including init parameters.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> import polars as pl
>>> x = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )
>>> test_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
>>> x.fit(test_df)
OneDKmeansTransformer(columns=['a'], kmeans_kwargs={'random_state': 42},
                      n_clusters=2, new_column_name='new')
>>> x.to_json()
{'tubular_version': ..., 'classname': 'OneDKmeansTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'new', 'n_init': 'auto', 'n_clusters': 2, 'drop_original': False, 'kmeans_kwargs': {'random_state': 42}}, 'fit': {'is_fitted_': True, 'bins': [3, 4]}}

```

transform(X: FrameT) → FrameT[source]

Generate bins for the input pd/pl.DataFrame (X) from the Kmeans results and add the resulting column to X.

Parameters:: X (pl/pd.DataFrame) – Data to transform.
Returns:: X – Input X with additional cluster column added.
Return type:: pl/pd.DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )

>>> test_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

>>> _ = transformer.fit(test_df)
>>> transformer.transform(test_df)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ new │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 5   ┆ 0   │
│ 2   ┆ 6   ┆ 0   │
│ 3   ┆ 7   ┆ 0   │
│ 4   ┆ 8   ┆ 1   │
└─────┴─────┴─────┘

```

class tubular.numeric.PCATransformer(**kwargs)[source]

Bases: BaseNumericTransformer

Generates variables using Principal component analysis (PCA).

Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space.

It is based on the sklearn class sklearn.decomposition.PCA.

pca

Type:: PCA class from sklearn.decomposition

n_components_

The estimated number of components. When n_components is set to ‘mle’ or a number between 0 and 1 (with svd_solver == ‘full’) this number is estimated from input data. Otherwise it equals the parameter n_components, or the lesser value of n_features and n_samples if n_components is None.

Type:: int

feature_names_out

list of feature names representing the new dimensions.

Type:: list or None

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = True

deprecated = True

fit(X: DataFrame, y: Series | None = None) → DataFrame[source]

Fit PCA to input data.

Parameters:

X (pd.DataFrame) – Dataframe with columns to fit the PCA on.
y (None) – Required for pipeline.

Returns:

fitted class instance.

Return type:

PCATransformer

Raises:

ValueError – if n_components is invalid for data.

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Generate PCA features from the input pandas DataFrame (X) and add them to X as a new column or columns.

Parameters:: X (pd.DataFrame) – Data to transform.
Returns:: X – Input X with additional column or columns (self.feature_names_out) added. These contain the projection of the data onto the principal components.
Return type:: pd.DataFrame

class tubular.numeric.RatioTransformer(columns: ]], return_dtype: ]] = 'Float32', **kwargs: bool | None)[source]

Bases: BaseNumericTransformer

Transformer that performs a division operation between two columns.

This transformer divides one column of a DataFrame by another and stores the result in a new column.

columns

List of exactly two column names to operate on. The first column is the numerator, and the second column is the denominator.

Type:: ListOfTwoStrs

return_dtype

The dtype of the resulting column, either ‘Float32’ or ‘Float64’.

Type:: str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> transformer = RatioTransformer(columns=["a", "b"], return_dtype="Float32")
>>> transformer.columns
['a', 'b']
>>> transformer.return_dtype
'Float32'

```

FITS = False

get_feature_names_out() → list[str][source]

Get the names of the output features.

Returns:: List containing the name of the new column created by the transformation.
Return type:: list[str]

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:: JSON representation of the transformer, including init parameters.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> ratio_transformer = RatioTransformer(columns=["a", "b"], return_dtype="Float32")
>>> ratio_transformer.to_json()
{'tubular_version': ..., 'classname': 'RatioTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'return_dtype': 'Float32'}, 'fit': {'is_fitted_': True}}

```

Transform the DataFrame by applying the division operation between two columns.

Parameters:: X (DataFrame) – DataFrame containing the columns to operate on.
Returns:: Transformed DataFrame with the new column containing the division results.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = RatioTransformer(columns=["a", "b"], return_dtype="Float32")
>>> test_df = pl.DataFrame({"a": [100, 200, 300], "b": [80, 150, 200]})
>>> transformer.transform(test_df)
shape: (3, 3)
┌─────┬─────┬────────────────┐
│ a   ┆ b   ┆ a_divided_by_b │
│ --- ┆ --- ┆ ---            │
│ i64 ┆ i64 ┆ f32            │
╞═════╪═════╪════════════════╡
│ 100 ┆ 80  ┆ 1.25           │
│ 200 ┆ 150 ┆ 1.333333       │
│ 300 ┆ 200 ┆ 1.5            │
└─────┴─────┴────────────────┘

```

class tubular.numeric.ScalingTransformer(**kwargs)[source]

Bases: BaseNumericTransformer

Transformer to perform scaling of numeric columns.

Transformer can apply min max scaling, max absolute scaling or standardisation (subtract mean and divide by std). The transformer uses the appropriate sklearn.preprocessing scaler.
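Each of the three options is a simple per-column formula: (x - min) / (max - min) for min max scaling, x / max|x| for max absolute scaling, and (x - mean) / std for standardisation. Standardisation uses the population standard deviation, as sklearn's StandardScaler does. The stdlib sketch below uses keys that mirror scaler_options. The `scale` helper is hypothetical, and the zero-scale guard follows our understanding of how sklearn treats constant columns.

```python
import statistics


def scale(values, method):
    """Scale one column the way the named sklearn scaler would (hypothetical helper)."""
    if method == "min_max":
        low, high = min(values), max(values)
        shift, scale_by = low, high - low
    elif method == "max_abs":
        shift, scale_by = 0.0, max(abs(v) for v in values)
    elif method == "standard":
        # population standard deviation (ddof=0)
        shift, scale_by = statistics.fmean(values), statistics.pstdev(values)
    else:
        raise ValueError(f"unknown method {method!r}")
    # a zero scale is treated as 1 so constant columns do not divide by zero
    scale_by = scale_by or 1.0
    return [(v - shift) / scale_by for v in values]


print(scale([0, 5, 10], "min_max"))  # [0.0, 0.5, 1.0]
print(scale([-2, 1], "max_abs"))  # [-1.0, 0.5]
print(scale([1, 3], "standard"))  # [-1.0, 1.0]
```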

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = True

deprecated = True

fit(X: DataFrame, y: Series | None = None) → ScalingTransformer[source]

Fit scaler to input data.

Parameters:

X (pd.DataFrame) – Dataframe with columns to learn scaling values from.
y (None) – Required for pipeline.

Returns:

fitted class instance.

Return type:

ScalingTransformer

jsonable = False

lazyframe_compatible = False

polars_compatible = False

scaler_options: ClassVar[dict[str, MinMaxScaler | MaxAbsScaler | StandardScaler]] = {'max_abs': <class 'sklearn.preprocessing._data.MaxAbsScaler'>, 'min_max': <class 'sklearn.preprocessing._data.MinMaxScaler'>, 'standard': <class 'sklearn.preprocessing._data.StandardScaler'>}

transform(X: DataFrame) → DataFrame[source]

Transform input data X with fitted scaler.

Parameters:: X (pd.DataFrame) – Dataframe containing columns to be scaled.
Returns:: X – Input X with columns scaled.
Return type:: pd.DataFrame

class tubular.numeric.TwoColumnOperatorTransformer(**kwargs)[source]

Bases: DataFrameMethodTransformer, BaseNumericTransformer

Applies a pandas.DataFrame method to two columns (add, sub, mul, div, mod, pow).

Transformer assigns the output of the method to a new column. The method is applied in the form (column 1) operator (column 2), so order matters when the method does not commute. Other keyword arguments can be supplied to the transform method and will be passed to the pandas.DataFrame method being called.
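The "(column 1) operator (column 2)" rule can be sketched with the operator module, mapping each supported pandas method name to its binary operator. pandas' div is true division. The helper and the dict-of-lists data layout are hypothetical; pandas-specific keyword arguments are not modelled.

```python
import operator

# pandas method names mapped to the equivalent binary operator (div is true division)
OPERATORS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
    "mod": operator.mod,
    "pow": operator.pow,
}


def two_column_operator(data, method_name, columns, new_column_name):
    """Compute (column 1) <op> (column 2) row by row; order matters for sub, div, mod and pow."""
    first, second = columns
    op = OPERATORS[method_name]
    result = dict(data)
    result[new_column_name] = [op(x, y) for x, y in zip(data[first], data[second])]
    return result


print(two_column_operator({"a": [10, 10], "b": [5, 3]}, "sub", ["a", "b"], "a_sub_b"))
# {'a': [10, 10], 'b': [5, 3], 'a_sub_b': [5, 7]}
```

Swapping the order to ["b", "a"] gives [-5, -7], which is why the column order in `columns` matters.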

pd_method_name

The name of the pandas.DataFrame method to be called.

Type:: str

columns

list containing two string items: [column1_name, column2_name]. The first is operated upon by the chosen pandas method using the second.

Type:: list

column2_name

The name of the 2nd column in the operation.

Type:: str

new_column_name

The name of the new column that the output is assigned to.

Type:: str

pd_method_kwargs

Dictionary of method kwargs to be passed to pandas.DataFrame method.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]: Transform input data by applying the chosen method to the two specified columns.

Args:

X (pd.DataFrame): Data to transform.

Returns:

pd.DataFrame: Input X with an additional column.

tubular.strings module

Contains transformers that apply string functions.

class tubular.strings.ExtractStringComponentsTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], by: str, return_n_components: ]], **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer class to extract components from string columns, split by given character.

by

character to split on

Type:: str

return_n_components

number of components to return

Type:: int

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = ExtractStringComponentsTransformer(
...     columns=["a"], by="@", return_n_components=2
... )
>>> transformer
ExtractStringComponentsTransformer(by='@', columns=['a'], return_n_components=2)

>>> json_dump = transformer.to_json()
>>> pprint(json_dump)
{'classname': 'ExtractStringComponentsTransformer',
 'fit': {'is_fitted_': False},
 'init': {'by': '@',
          'columns': ['a'],
          'copy': False,
          'return_n_components': 2,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

>>> ExtractStringComponentsTransformer.from_json(json_dump)
ExtractStringComponentsTransformer(by='@', columns=['a'], return_n_components=2)

```

FITS = False

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = ExtractStringComponentsTransformer(
...     columns=["a"], by="@", return_n_components=2
... )

>>> transformer.get_feature_names_out()
['a_split_by_@_entry_0', 'a_split_by_@_entry_1']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = ExtractStringComponentsTransformer(
...     columns=["a"], by="@", return_n_components=2
... )

>>> pprint(transformer.to_json())
{'classname': 'ExtractStringComponentsTransformer',
 'fit': {'is_fitted_': False},
 'init': {'by': '@',
          'columns': ['a'],
          'copy': False,
          'return_n_components': 2,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

transform(X: DataFrame) → DataFrame

Extract components from string columns, split by given character.

Parameters:: X (DataFrame) – Data containing columns to extract components from.
Returns:: X – Transformed input X with string components extracted from columns.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": ["greg@gmail.com", "bob@apple.net"]})
>>> transformer = ExtractStringComponentsTransformer(
...     columns=["a"], by="@", return_n_components=2
... )
>>> transformer.transform(test_df)
shape: (2, 3)
┌────────────────┬──────────────────────┬──────────────────────┐
│ a              ┆ a_split_by_@_entry_0 ┆ a_split_by_@_entry_1 │
│ ---            ┆ ---                  ┆ ---                  │
│ str            ┆ str                  ┆ str                  │
╞════════════════╪══════════════════════╪══════════════════════╡
│ greg@gmail.com ┆ greg                 ┆ gmail.com            │
│ bob@apple.net  ┆ bob                  ┆ apple.net            │
└────────────────┴──────────────────────┴──────────────────────┘

```
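The naming scheme in the examples suggests the following plain-Python outline of the transform. It is a sketch under stated assumptions: splitting on every occurrence of `by` and filling missing components (or null inputs) with None are guesses rather than documented behaviour, and `extract_string_components` is a hypothetical helper.

```python
def extract_string_components(rows, column, by, return_n_components):
    """Split row[column] on `by` and store the first return_n_components pieces."""
    for row in rows:
        value = row[column]
        # Assumption: split on every occurrence of `by`; a null input gives no pieces
        parts = [] if value is None else value.split(by)
        for i in range(return_n_components):
            # Column names follow the get_feature_names_out() output shown above
            name = f"{column}_split_by_{by}_entry_{i}"
            # Assumption: components that do not exist are filled with None
            row[name] = parts[i] if i < len(parts) else None
    return rows
```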

class tubular.strings.LowerCaseTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer class to lower case of text columns.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = LowerCaseTransformer(
...     columns=["a"],
... )
>>> transformer
LowerCaseTransformer(columns=['a'])

>>> json_dump = transformer.to_json()
>>> pprint(json_dump)
{'classname': 'LowerCaseTransformer',
 'fit': {'is_fitted_': False},
 'init': {'columns': ['a'],
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

>>> LowerCaseTransformer.from_json(json_dump)
LowerCaseTransformer(columns=['a'])

```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

transform(X: DataFrame) → DataFrame

Lower case of text in given columns.

Parameters:: X (DataFrame) – Data containing columns to lowercase.
Returns:: X – Transformed input X with text lowercased in given columns.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": ["HeLlO", None, " HI"]})
>>> transformer = LowerCaseTransformer(columns="a")
>>> transformer.transform(test_df)
shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ str   │
╞═══════╡
│ hello │
│ null  │
│  hi   │
└───────┘

```

class tubular.strings.RemoveCharactersTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], characters: list[str], **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer class to remove characters from text columns.

characters

list of characters to remove from text columns.

Type:: list[str]

characters_formatted

characters attr formatted into regex string.

Type:: str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RemoveCharactersTransformer(columns=["a"], characters=[r"\d"])
>>> transformer
RemoveCharactersTransformer(characters=['\\d'], columns=['a'])

>>> json_dump = transformer.to_json()
>>> pprint(json_dump)
{'classname': 'RemoveCharactersTransformer',
 'fit': {'is_fitted_': False},
 'init': {'characters': ['\\d'],
          'columns': ['a'],
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

>>> RemoveCharactersTransformer.from_json(json_dump)
RemoveCharactersTransformer(characters=['\\d'], columns=['a'])

```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RemoveCharactersTransformer(columns=["a", "b"], characters=["a"])

>>> pprint(transformer.to_json())
{'classname': 'RemoveCharactersTransformer',
 'fit': {'is_fitted_': False},
 'init': {'characters': ['a'],
          'columns': ['a', 'b'],
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

transform(X: DataFrame) → DataFrame

Strip unwanted characters from specified columns.

Parameters:: X (DataFrame) – Data containing columns to strip.
Returns:: X – Transformed input X with characters stripped from specified columns.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [" 8hi!", None, "9999hello "]})
>>> transformer = RemoveCharactersTransformer(columns=["a"], characters=[r"\W", r"\s"])
>>> transformer.transform(test_df)
shape: (3, 1)
┌───────────┐
│ a         │
│ ---       │
│ str       │
╞═══════════╡
│ 8hi       │
│ null      │
│ 9999hello │
└───────────┘

```
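The characters_formatted attribute is described only as a regex string. One plausible construction, assumed here, is a single regex character class, which reproduces the example above. `format_characters` and `remove_characters` are hypothetical helpers; under this assumption, characters with special meaning inside a class (such as `]` or a leading `^`) would need escaping before being passed in.

```python
import re


def format_characters(characters):
    """Assumed format: wrap the characters in a single regex character class."""
    return "[" + "".join(characters) + "]"


def remove_characters(values, characters):
    """Delete every match of the character class; nulls pass through unchanged."""
    pattern = re.compile(format_characters(characters))
    return [None if v is None else pattern.sub("", v) for v in values]
```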

class tubular.strings.SeriesStrMethodTransformer(**kwargs)[source]

Bases: BaseTransformer

Transformer that applies a pandas.Series.str method.

Transformer assigns the output of the method to a new column. Other keyword arguments can be supplied to the transform method; these are passed to the pandas.Series.str method being called.

Be aware that it is possible to supply incompatible arguments to init that will only be identified when transform is run, because there are many combinations of method, input and output sizes. Additionally, some methods may only work as expected when called in transform with specific keyword arguments.
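The late failure described above follows from name-based dispatch: nothing checks the method name or its arguments until a value is processed. The plain-Python analogue below uses built-in str methods in place of pandas.Series.str; `apply_str_method` is hypothetical and only illustrates the pattern.

```python
def apply_str_method(values, pd_method_name, pd_method_kwargs):
    """Call the named str method on each non-null value (plain-Python analogue)."""
    results = []
    for value in values:
        if value is None:
            results.append(None)
            continue
        # Nothing validates the name or kwargs beforehand: errors surface here,
        # when the method is looked up and called on real data.
        method = getattr(value, pd_method_name)
        results.append(method(**pd_method_kwargs))
    return results
```

For example, `apply_str_method(["x"], "upper", {"sep": "-"})` is accepted as configuration but raises TypeError as soon as it runs, because `str.upper` takes no keyword arguments.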

new_column_name

The name of the column (or columns) to which the output of the pd.Series.str method is assigned in transform.

Type:: str

pd_method_name

The name of the pd.Series.str method to call.

Type:: str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Apply given pandas.Series.str method to given column.

Any keyword arguments set in the pd_method_kwargs attribute are passed on to the pd.Series.str method when it is called.

Parameters:: X (pd.DataFrame) – Data to transform.
Returns:: X – Input X with additional column (self.new_column_name) added. These contain the output of running the pd.Series.str method.
Return type:: pd.DataFrame

class tubular.strings.StringConcatenator(**kwargs)[source]

Bases: BaseTransformer

Transformer to combine data from specified columns, of mixed datatypes, into a new column containing one string.

Parameters:

columns (str or list of str) – Columns to concatenate.
new_column_name (str, default = "new_column") – New column name
separator (str, default = " ") – Separator for the new string value
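A rough sketch of the concatenation, assuming each value is converted with `str()` before joining; how tubular renders nulls is not stated here. `concatenate_columns` is a hypothetical helper whose defaults copy the documented new_column_name and separator defaults.

```python
def concatenate_columns(rows, columns, new_column_name="new_column", separator=" "):
    """Join str() of each listed column into one string column per row."""
    if isinstance(columns, str):
        columns = [columns]
    for row in rows:
        # Assumption: every value, whatever its type, is converted with str()
        row[new_column_name] = separator.join(str(row[c]) for c in columns)
    return rows
```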

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

deprecated = True

jsonable = False

lazyframe_compatible = False

polars_compatible = False

transform(X: DataFrame) → DataFrame[source]

Combine data from specified columns, of mixed datatypes, into a new column containing one string.

Parameters:: X (pd.DataFrame) – Data containing the columns to concatenate.
Returns:: X – Input X with a new column (new_column_name) containing the concatenated string.
Return type:: pd.DataFrame

class tubular.strings.StringContainsTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], reference: str, reference_as_column: bool = False, **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer class to indicate if given columns contain reference values.

reference

column or value to compare against, e.g. look for values of reference=’a’ in columns [‘b’, ‘c’].

Type:: str

reference_as_column

indicates whether reference represents a column (or a value). Note: reference_as_column=True is not supported for the pandas backend.

Type:: bool

characters_formatted

characters attr formatted into regex string.

Type:: str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = StringContainsTransformer(
...     columns=["a"], reference="b", reference_as_column=True
... )
>>> transformer
StringContainsTransformer(columns=['a'], reference='b',
                          reference_as_column=True)

>>> json_dump = transformer.to_json()
>>> pprint(json_dump)
{'classname': 'StringContainsTransformer',
 'fit': {'is_fitted_': False},
 'init': {'columns': ['a'],
          'copy': False,
          'reference': 'b',
          'reference_as_column': True,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

>>> StringContainsTransformer.from_json(json_dump)
StringContainsTransformer(columns=['a'], reference='b',
                          reference_as_column=True)

```

FITS = False

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = StringContainsTransformer(columns=["a", "b"], reference="c")

>>> transformer.get_feature_names_out()
['a_contains_c', 'b_contains_c']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = StringContainsTransformer(
...     columns=["a"], reference="b", reference_as_column=True
... )

>>> pprint(transformer.to_json())
{'classname': 'StringContainsTransformer',
 'fit': {'is_fitted_': False},
 'init': {'columns': ['a'],
          'copy': False,
          'reference': 'b',
          'reference_as_column': True,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

transform(X: DataFrame) → DataFrame

Indicate if provided columns contain reference values.

Parameters:: X (DataFrame) – Data containing columns to check.
Returns:: X – Transformed input X with a boolean column (<column>_contains_<reference>) added for each specified column, indicating whether it contains the reference.
Return type:: DataFrame
Raises:: TypeError – if called on a pandas DataFrame when reference_as_column=True.

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame(
...     {"a": ["cat", "dog", None, "mouse"], "b": ["cat", "rat", None, "mouse"]}
... )
>>> transformer = StringContainsTransformer(
...     columns=["a"], reference="b", reference_as_column=True
... )
>>> transformer.transform(test_df)
shape: (4, 3)
┌───────┬───────┬──────────────┐
│ a     ┆ b     ┆ a_contains_b │
│ ---   ┆ ---   ┆ ---          │
│ str   ┆ str   ┆ bool         │
╞═══════╪═══════╪══════════════╡
│ cat   ┆ cat   ┆ true         │
│ dog   ┆ rat   ┆ false        │
│ null  ┆ null  ┆ null         │
│ mouse ┆ mouse ┆ true         │
└───────┴───────┴──────────────┘

```
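A plain-Python outline of the containment check, assuming a literal substring test (the documentation does not say whether the reference is treated as a regular expression) and null propagation as in the example. `string_contains` is hypothetical; the output names follow get_feature_names_out().

```python
def string_contains(rows, columns, reference, reference_as_column=False):
    """Add '<column>_contains_<reference>' booleans to each row dict."""
    for row in rows:
        # The reference is either a literal value or the name of another column
        needle = row[reference] if reference_as_column else reference
        for col in columns:
            value = row[col]
            # Assumptions: plain substring match; a null on either side gives null
            if value is None or needle is None:
                row[f"{col}_contains_{reference}"] = None
            else:
                row[f"{col}_contains_{reference}"] = needle in value
    return rows
```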

Module contents

Initialise classes exposed by package.

class tubular.AggregateColumnsOverRowTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], aggregations: ]], drop_original: bool = False, **kwargs: bool)[source]

Bases: BaseAggregationTransformer

Aggregate provided columns over each row.

This transformer aggregates data within specified columns and can optionally drop the original columns post-transformation.

Attributes:

columns: Union[str, list[str]]: List of column names to apply the aggregation transformations to.
aggregations: list[str]: List of aggregation methods to apply.
drop_original: bool, optional: Whether to drop the original columns after transformation. Default is False.
built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: Indicates if transformer will work with polars frames
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> AggregateColumnsOverRowTransformer(
...     columns=["a", "b"],
...     aggregations=["min", "max"],
... )
AggregateColumnsOverRowTransformer(aggregations=['min', 'max'],
                                   columns=['a', 'b'])

```

FITS = False

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = AggregateColumnsOverRowTransformer(
...     columns=["a", "b"],
...     aggregations=["min", "max"],
... )

>>> transformer.get_feature_names_out()
['a_b_min', 'a_b_max']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

transform(X: DataFrame) → DataFrame

Transform the dataframe by aggregating provided columns over each row.

Parameters:

X (DataFrame) – DataFrame to transform by aggregating provided columns over each row

Returns:

DataFrame – Transformed DataFrame with aggregated columns.

Example:

```pycon
>>> import polars as pl
>>> transformer = AggregateColumnsOverRowTransformer(
...     columns=["a", "b"],
...     aggregations=["min", "max"],
... )
>>> test_df = pl.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
>>> transformer.transform(test_df)
shape: (2, 5)
┌─────┬─────┬─────┬─────────┬─────────┐
│ a   ┆ b   ┆ c   ┆ a_b_min ┆ a_b_max │
│ --- ┆ --- ┆ --- ┆ ---     ┆ ---     │
│ i64 ┆ i64 ┆ i64 ┆ i64     ┆ i64     │
╞═════╪═════╪═════╪═════════╪═════════╡
│ 1   ┆ 3   ┆ 5   ┆ 1       ┆ 3       │
│ 2   ┆ 4   ┆ 6   ┆ 2       ┆ 4       │
└─────┴─────┴─────┴─────────┴─────────┘
```
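The row-wise aggregation can be sketched without a dataframe library. The output naming follows get_feature_names_out(); the AGGREGATIONS table is an assumed subset (only min and max are documented here), and `aggregate_columns_over_row` is a hypothetical helper.

```python
# Assumed subset of supported aggregations; only "min" and "max" appear above.
AGGREGATIONS = {"min": min, "max": max, "sum": sum}


def aggregate_columns_over_row(rows, columns, aggregations, drop_original=False):
    """Add one '<col1>_<col2>..._<agg>' column per aggregation, computed per row."""
    prefix = "_".join(columns)
    for row in rows:
        values = [row[c] for c in columns]
        for agg in aggregations:
            row[f"{prefix}_{agg}"] = AGGREGATIONS[agg](values)
        if drop_original:
            for c in columns:
                del row[c]
    return rows
```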

class tubular.AggregateRowsOverColumnTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], aggregations: ]], key: str, drop_original: bool = False, **kwargs: bool)[source]

Bases: BaseAggregationTransformer

Aggregation transformer.

Aggregate rows over specified columns, where rows are grouped by provided key column.

Attributes:

columns: Union[str, list[str]]: List of column names to apply the aggregation transformations to.
aggregations: list[str]: List of aggregation methods to apply.
key: str: Column name to group by for aggregation.
drop_original: bool, optional: Whether to drop the original columns after transformation. Default is False.
built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: Indicates if transformer will work with polars frames
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> AggregateRowsOverColumnTransformer(
...     columns="a",
...     aggregations=["min", "max"],
...     key="b",
... )
AggregateRowsOverColumnTransformer(aggregations=['min', 'max'], columns=['a'],
                                   key='b')

```

FITS = False

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = AggregateRowsOverColumnTransformer(
...     columns="a",
...     aggregations=["min", "max"],
...     key="b",
... )

>>> transformer.get_feature_names_out()
['a_min', 'a_max']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, Any][source]

Dump transformer to json dict.

Returns:

dict[str, Any]:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Example:

```pycon
>>> transformer = AggregateRowsOverColumnTransformer(
...     columns="a",
...     key="c",
...     aggregations=["min", "max"],
... )
>>> transformer.to_json()  # doctest: +NORMALIZE_WHITESPACE
{'tubular_version': ...,
'classname': 'AggregateRowsOverColumnTransformer',
'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'aggregations': ['min', 'max'], 'drop_original': False, 'key': 'c'},
'fit': {'is_fitted_': True}}

```

transform(X: DataFrame) → DataFrame

Transform the dataframe by aggregating rows over specified columns.

Parameters:: X (DataFrame) – DataFrame to transform by aggregating specified columns.
Returns:: Transformed DataFrame with aggregated columns.
Return type:: DataFrame
Raises:: ValueError – If the key column is not found in the DataFrame.

Examples

```pycon
>>> import polars as pl

>>> transformer = AggregateRowsOverColumnTransformer(
...     columns="a",
...     aggregations=["min", "max"],
...     key="b",
... )

>>> test_df = pl.DataFrame({"a": [1, 2, 3], "b": [1, 1, 2], "c": [1, 2, 3]})

>>> transformer.transform(test_df)
shape: (3, 5)
┌─────┬─────┬─────┬───────┬───────┐
│ a   ┆ b   ┆ c   ┆ a_min ┆ a_max │
│ --- ┆ --- ┆ --- ┆ ---   ┆ ---   │
│ i64 ┆ i64 ┆ i64 ┆ i64   ┆ i64   │
╞═════╪═════╪═════╪═══════╪═══════╡
│ 1   ┆ 1   ┆ 1   ┆ 1     ┆ 2     │
│ 2   ┆ 1   ┆ 2   ┆ 1     ┆ 2     │
│ 3   ┆ 2   ┆ 3   ┆ 3     ┆ 3     │
└─────┴─────┴─────┴───────┴───────┘

```
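A group-then-broadcast outline of the same operation, assuming the aggregation keeps every input row and writes each group's result back onto its member rows, as the example output shows. `aggregate_rows_over_column` and the AGGREGATIONS table are hypothetical; the ValueError mirrors the documented missing-key check.

```python
from collections import defaultdict

# Assumed subset of supported aggregations; only "min" and "max" appear above.
AGGREGATIONS = {"min": min, "max": max, "sum": sum}


def aggregate_rows_over_column(rows, columns, aggregations, key):
    """Aggregate each column within groups of `key` and write the result to every row."""
    if isinstance(columns, str):
        columns = [columns]
    if rows and key not in rows[0]:
        raise ValueError(f"key column {key!r} not found")
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    for group_rows in groups.values():
        for col in columns:
            values = [r[col] for r in group_rows]
            for agg in aggregations:
                result = AGGREGATIONS[agg](values)
                # Broadcast the group-level result back onto each member row
                for r in group_rows:
                    r[f"{col}_{agg}"] = result
    return rows
```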

class tubular.ArbitraryImputer

Bases: BaseImputer

Transformer to impute null values with an arbitrary pre-defined value.

impute_value

Value to impute nulls with.

Type:: int or float or str or bool

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> arbitrary_imputer = ArbitraryImputer(columns=["a", "b"], impute_value=5)
>>> arbitrary_imputer
ArbitraryImputer(columns=['a', 'b'], impute_value=5)

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = arbitrary_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'ArbitraryImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'impute_value': 5}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 5, 'b': 5}}}

>>> ArbitraryImputer.from_json(json_dump)
ArbitraryImputer(columns=['a', 'b'], impute_value=5)

```

FITS = False

jsonable = True

lazyframe_compatible = True

polars_compatible = True

transform(X: DataFrame) → DataFrame

Impute missing values with the supplied impute_value.

Parameters:

X (DataFrame) – Data containing columns to impute.

Returns:

X (DataFrame) – Transformed input X with nulls imputed with the specified impute_value, for the specified columns.

Example:

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = ArbitraryImputer(columns=["a", "b"], impute_value=5)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 5   ┆ 5   │
│ 2   ┆ 4   │
└─────┴─────┘
```
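The to_json() output above records one entry per column in impute_values_, all equal to the single impute_value. A hypothetical two-step sketch of that behaviour, with `arbitrary_impute_values` building the per-column mapping and `impute_nulls` filling None values in plain row dictionaries:

```python
def arbitrary_impute_values(columns, impute_value):
    """Build the per-column mapping seen as 'impute_values_' in to_json()."""
    return {c: impute_value for c in columns}


def impute_nulls(rows, impute_values):
    """Replace None in each mapped column with that column's impute value."""
    for row in rows:
        for col, fill in impute_values.items():
            if row[col] is None:
                row[col] = fill
    return rows
```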

class tubular.BetweenDatesTransformer(columns: ]], new_column_name: str, drop_original: bool = False, lower_inclusive: bool = True, upper_inclusive: bool = True, **kwargs: bool)[source]

Bases: BaseGenericDateTransformer

Transformer to generate a boolean column indicating if one date is between two others.

If any row has column_lower greater than column_upper, the output column for that row will be null instead of raising a warning.

Attributes:

built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
column_lower: str: Name of the date column giving the lower bound. This attribute is not for use in any method; use ‘columns’ instead. It exists only to allow string representation of the transformer.
column_upper: str: Name of the date column giving the upper bound. This attribute is not for use in any method; use ‘columns’ instead. It exists only to allow string representation of the transformer.
column_between: str: Name of the column whose values are checked to fall between column_lower and column_upper. This attribute is not for use in any method; use ‘columns’ instead. It exists only to allow string representation of the transformer.
columns: list: Contains the names of the columns to compare in the order [column_lower, column_between, column_upper].
new_column_name: str: new_column_name argument passed when initialising the transformer.
lower_inclusive: bool: lower_inclusive argument passed when initialising the transformer.
upper_inclusive: bool: upper_inclusive argument passed when initialising the transformer.
drop_original: bool: indicates whether to drop original columns.
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> BetweenDatesTransformer(
...     columns=["a", "b", "c"],
...     new_column_name="b_between_a_c",
...     lower_inclusive=True,
...     upper_inclusive=True,
... )
BetweenDatesTransformer(columns=['a', 'b', 'c'],
                        new_column_name='b_between_a_c')

```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = BetweenDatesTransformer(
...     columns=["a", "b", "c"],
...     new_column_name="b_between_a_c",
...     lower_inclusive=True,
...     upper_inclusive=False,
... )
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'BetweenDatesTransformer', 'init': {'columns': ['a', 'b', 'c'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'b_between_a_c', 'drop_original': False, 'lower_inclusive': True, 'upper_inclusive': False}, 'fit': {'is_fitted_': True}}

```

transform(X: FrameT) → FrameT[source]

Create a column indicating whether the middle date is between the other two.

Rows where the lower bound is greater than the upper bound will produce null in the resulting output column for that row.

Parameters:

X (pd/pl/nw.DataFrame) – Data to transform.

Returns:

X (pd/pl/nw.DataFrame) – Input X with additional column (self.new_column_name) added. This column is boolean and indicates if the middle column is between the other 2.
Example:

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = BetweenDatesTransformer(
...     columns=["a", "b", "c"],
...     new_column_name="b_between_a_c",
...     lower_inclusive=True,
...     upper_inclusive=True,
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [
...             datetime.date(1990, 9, 27),
...             datetime.date(2005, 10, 7),
...             datetime.date(2010, 1, 1),
...         ],
...         "b": [
...             datetime.date(1991, 5, 22),
...             datetime.date(2001, 12, 10),
...             datetime.date(2009, 1, 1),
...         ],
...         "c": [
...             datetime.date(1993, 4, 20),
...             datetime.date(2007, 11, 8),
...             datetime.date(2008, 1, 1),
...         ],
...     },
... )
>>> transformer.transform(test_df)
shape: (3, 4)
┌────────────┬────────────┬────────────┬───────────────┐
│ a          ┆ b          ┆ c          ┆ b_between_a_c │
│ ---        ┆ ---        ┆ ---        ┆ ---           │
│ date       ┆ date       ┆ date       ┆ bool          │
╞════════════╪════════════╪════════════╪═══════════════╡
│ 1990-09-27 ┆ 1991-05-22 ┆ 1993-04-20 ┆ true          │
│ 2005-10-07 ┆ 2001-12-10 ┆ 2007-11-08 ┆ false         │
│ 2010-01-01 ┆ 2009-01-01 ┆ 2008-01-01 ┆ null          │
└────────────┴────────────┴────────────┴───────────────┘
```

Bases: BaseCappingTransformer

Transformer to cap numeric values at a minimum value, a maximum value, or both.

For max capping, any values above the cap value are set to the cap. Similarly, for min capping, any values below the cap are set to the cap. Only works for numeric columns.
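The capping rule can be sketched on a plain list of numbers. This is an illustration only, not tubular's implementation; the helper `cap_values` is hypothetical, and it assumes that a side without a cap is passed as `None`.

```python
from typing import Optional, Sequence


def cap_values(
    values: Sequence[float],
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> list:
    """Clip each value into [lower, upper]; a bound of None is not applied."""
    capped = []
    for value in values:
        if lower is not None and value < lower:
            value = lower  # min capping
        if upper is not None and value > upper:
            value = upper  # max capping
        capped.append(value)
    return capped


print(cap_values([1, 15, 18, 25], lower=10, upper=20))  # [10, 15, 18, 20]
print(cap_values([6, 2, 7, 1], lower=1, upper=3))  # [3, 2, 3, 1]
```

These two calls match columns a and b in the class example, which use capping_values {"a": [10, 20], "b": [1, 3]}.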

Attributes:

capping_values: dict[str, CappingValues] or None: Capping values to apply to each column, capping_values argument.
quantiles: dict[str, CappingValues] or None: Quantiles to set capping values at from input data. Will be empty after init, values populated when fit is run.
quantile_capping_values: dict[str, CappingValues] or None: Capping values learned from quantiles (if provided) to apply to each column.
weights_column: str or None: weights_column argument.
_replacement_values: dict[str, CappingValues]: Replacement values when capping is applied. Will be a copy of capping_values.
built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> import polars as pl

>>> transformer = CappingTransformer(
...     capping_values={"a": [10, 20], "b": [1, 3]},
... )

>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})

>>> transformer.transform(test_df)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 10  ┆ 3   ┆ 1   │
│ 15  ┆ 2   ┆ 2   │
│ 18  ┆ 3   ┆ 3   │
│ 20  ┆ 1   ┆ 4   │
└─────┴─────┴─────┘

>>> # transformer can also be dumped to json and reinitialised

>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'CappingTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'capping_values': {'a': [10, 20], 'b': [1, 3]}, 'quantiles': None, 'weights_column': None}, 'fit': {'is_fitted_': False}}

>>> CappingTransformer.from_json(json_dump)
CappingTransformer(capping_values={'a': [10, 20], 'b': [1, 3]})

```

FITS = True

Learn capping values from input data X.

Calculates the quantiles to cap at given the quantiles dictionary supplied when initialising the transformer. Saves learnt values in the capping_values attribute.
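To illustrate how a [lower, upper] quantile pair turns into capping values, the sketch below uses unweighted linear interpolation between sorted values (the same rule as NumPy's default `linear` method). This is an assumption made for illustration: tubular's fit also supports weights_column and may compute quantiles differently. Both helper names are hypothetical.

```python
import math
from typing import Sequence


def linear_quantile(values: Sequence[float], q: float) -> float:
    """Unweighted quantile using linear interpolation between sorted values."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower_index = math.floor(position)
    upper_index = math.ceil(position)
    fraction = position - lower_index
    return ordered[lower_index] + fraction * (ordered[upper_index] - ordered[lower_index])


def quantile_capping_values(values: Sequence[float], quantiles: Sequence[float]) -> list:
    # Convert a [lower_q, upper_q] pair into [lower_cap, upper_cap]
    return [linear_quantile(values, q) for q in quantiles]


print(quantile_capping_values([1, 15, 18, 25], [0.01, 0.99]))
```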

Parameters:

X (DataFrame) – A dataframe with required columns to be capped.
y (None) – Required for pipeline.

Returns:

fitted instance of class

Return type:

CappingTransformer

Example

```pycon
>>> import polars as pl

>>> transformer = CappingTransformer(
...     quantiles={"a": [0.01, 0.99], "b": [0.05, 0.95]},
... )

>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})

>>> transformer.fit(test_df)
CappingTransformer(quantiles={'a': [0.01, 0.99], 'b': [0.05, 0.95]})

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

class tubular.ColumnDtypeSetter(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], dtype: ]], **kwargs: bool)[source]

Bases: BaseTransformer

Transformer to cast the given columns in a dataframe to the specified dtype.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

deprecated

indicates if class has been deprecated

Type:: bool

FITS = False

deprecated = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = ColumnDtypeSetter(columns="a", dtype="Float32")
>>> pprint(transformer.to_json(), sort_dicts=True)
{'classname': 'ColumnDtypeSetter',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a'],
          'copy': False,
          'dtype': 'Float32',
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

Transform data.

Parameters:: X (DataFrame) – data to transform.
Returns:: transformed data
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2]})
>>> transformer = ColumnDtypeSetter(columns="a", dtype="Float32")
>>> transformer.transform(df)
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ f32 │
╞═════╡
│ 1.0 │
│ 2.0 │
└─────┘

```

class tubular.CompareTwoColumnsTransformer(columns: ]], condition: ]], **kwargs: bool | None)[source]

Bases: BaseTransformer

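Transformer to compare two columns using a condition and add the outcome as a new column.

This transformer evaluates a condition (for example ">") between the first and second columns and stores the boolean result in a new column named after the columns and the condition (for example "a>b").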
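The comparison and the naming of the output column can be sketched with plain Python and the standard `operator` module. The condition mapping below is hypothetical: the conditions tubular actually accepts are defined by its ConditionEnum and may differ. The dict-of-lists input stands in for a dataframe.

```python
import operator
from typing import Sequence

# Hypothetical set of supported conditions; tubular's accepted values are
# defined by its ConditionEnum and may differ from this mapping.
CONDITIONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def compare_two_columns(data: dict, columns: Sequence[str], condition: str) -> dict:
    """Add a boolean column named '<first><condition><second>' to a dict of lists."""
    first, second = columns
    compare = CONDITIONS[condition]
    new_column = f"{first}{condition}{second}"
    result = dict(data)  # shallow copy so the input dict is left unchanged
    result[new_column] = [compare(x, y) for x, y in zip(data[first], data[second])]
    return result


print(compare_two_columns({"a": [1, 2, 3], "b": [3, 2, 1]}, ["a", "b"], ">"))
# {'a': [1, 2, 3], 'b': [3, 2, 1], 'a>b': [False, False, True]}
```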

polars_compatible

Indicates whether transformer has been converted to polars/pandas agnostic narwhals framework.

Type:: bool

FITS

Indicates whether transform requires fit to be run first.

Type:: bool

jsonable

Indicates if transformer supports to/from_json methods.

Type:: bool

lazyframe_compatible

Indicates whether transformer works with lazyframes.

Type:: bool

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})
>>> transformer = CompareTwoColumnsTransformer(
...     columns=["a", "b"],
...     condition=">",
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 3)
┌─────┬─────┬───────┐
│ a   ┆ b   ┆ a>b   │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ i64 ┆ bool  │
╞═════╪═════╪═══════╡
│ 1   ┆ 3   ┆ false │
│ 2   ┆ 2   ┆ false │
│ 3   ┆ 1   ┆ true  │
└─────┴─────┴───────┘

```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:: JSON representation of the transformer, including init parameters.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> from tubular.functions.comparison import ConditionEnum
>>> transformer = CompareTwoColumnsTransformer(
...     columns=["a", "b"],
...     condition=ConditionEnum.GREATER_THAN.value,
... )
>>> json_dict = transformer.to_json()
>>> from pprint import pprint
>>> pprint(json_dict, sort_dicts=True)
{'classname': 'CompareTwoColumnsTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a', 'b'],
          'condition': '>',
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

Transform two columns based on a condition to generate an outcome.

Parameters:: X (DataFrame) – DataFrame containing the columns to be transformed.
Returns:: Transformed DataFrame with the new outcome column.
Return type:: DataFrame
Raises:: TypeError – If the columns are not of a numeric type.

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})
>>> transformer = CompareTwoColumnsTransformer(
...     columns=["a", "b"],
...     condition=">",
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 3)
┌─────┬─────┬───────┐
│ a   ┆ b   ┆ a>b   │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ i64 ┆ bool  │
╞═════╪═════╪═══════╡
│ 1   ┆ 3   ┆ false │
│ 2   ┆ 2   ┆ false │
│ 3   ┆ 1   ┆ true  │
└─────┴─────┴───────┘

```

class tubular.DateDifferenceTransformer(columns: ]], new_column_name: str, units: ]] = 'D', drop_original: bool = False, custom_days_divider: int | None = None, **kwargs: bool)[source]

Bases: BaseGenericDateTransformer

Transformer to calculate the difference between two date fields in the specified units.

Attributes:

built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> transformer = DateDifferenceTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
...     units="common_year",
... )
>>> transformer
DateDifferenceTransformer(columns=['a', 'b'], new_column_name='bla',
                          units='common_year')

>>> # transformer can also be dumped to json and reinitialised

>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'DateDifferenceTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'bla', 'drop_original': False, 'units': 'common_year', 'custom_days_divider': None}, 'fit': {'is_fitted_': True}}

>>> DateDifferenceTransformer.from_json(json_dump)
DateDifferenceTransformer(columns=['a', 'b'], new_column_name='bla',
                          units='common_year')

```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = DateDifferenceTransformer(columns=["a", "b"], new_column_name="a_diff_b")

>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DateDifferenceTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'a_diff_b', 'drop_original': False, 'units': 'D', 'custom_days_divider': None}, 'fit': {'is_fitted_': True}}

```

Calculate the difference between the given fields in the specified units.
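The worked transform example is consistent with subtracting the first column from the second and, for units="common_year", dividing the day count by 365. The sketch below reproduces that arithmetic in plain Python. The sign convention and the 365-day divisor are inferred from the example output rather than taken from tubular's source, and only two units are covered; the `date_difference` helper and `DAYS_PER_UNIT` table are hypothetical.

```python
import datetime

# Days per unit; only a hypothetical subset of the supported units is shown.
DAYS_PER_UNIT = {"D": 1, "common_year": 365}


def date_difference(first: datetime.date, second: datetime.date, units: str = "D") -> float:
    """Return (second - first) expressed in the requested units."""
    days = (second - first).days
    return days / DAYS_PER_UNIT[units]


diff = date_difference(datetime.date(1993, 9, 27), datetime.date(1991, 5, 22), "common_year")
print(round(diff, 6))  # -2.353425
```

Here the two dates are 859 days apart, and -859 / 365 rounds to the -2.353425 shown in the first row of the example.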

Parameters:: X (DataFrame) – Data containing self.columns
Returns:: dataframe with added date difference column
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime

>>> transformer = DateDifferenceTransformer(
...     columns=["a", "b"],
...     new_column_name="a_b_difference_years",
...     units="common_year",
... )

>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.date(1993, 9, 27), datetime.date(2005, 10, 7)],
...         "b": [datetime.date(1991, 5, 22), datetime.date(2001, 12, 10)],
...     },
... )

>>> transformer.transform(test_df)
shape: (2, 3)
┌────────────┬────────────┬──────────────────────┐
│ a          ┆ b          ┆ a_b_difference_years │
│ ---        ┆ ---        ┆ ---                  │
│ date       ┆ date       ┆ f64                  │
╞════════════╪════════════╪══════════════════════╡
│ 1993-09-27 ┆ 1991-05-22 ┆ -2.353425            │
│ 2005-10-07 ┆ 2001-12-10 ┆ -3.827397            │
└────────────┴────────────┴──────────────────────┘

```

class tubular.DatetimeComponentExtractor(columns: str | list[str], include: ]], **kwargs: str | bool)[source]

Bases: BaseDatetimeTransformer

Transformer to extract numeric datetime components.
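The output naming and the extracted values can be sketched with the standard `datetime` module. The column-major `<column>_<component>` order follows the get_feature_names_out example, and floats mirror the f32 output in the transform example. The helper names are hypothetical and this is not tubular's implementation.

```python
import datetime
from typing import Sequence

INCLUDE_OPTIONS = ["hour", "day", "month", "year"]


def feature_names(columns: Sequence[str], include: Sequence[str]) -> list:
    # One output column per (input column, component) pair, column-major order
    return [f"{column}_{component}" for column in columns for component in include]


def extract_components(value: datetime.datetime, include: Sequence[str]) -> dict:
    unknown = set(include) - set(INCLUDE_OPTIONS)
    if unknown:
        raise ValueError(f"unsupported components: {sorted(unknown)}")
    # datetime.datetime exposes hour, day, month and year as attributes;
    # values are returned as floats to mirror the f32 output in the example
    return {component: float(getattr(value, component)) for component in include}


print(feature_names(["a", "b"], ["hour", "day"]))
# ['a_hour', 'a_day', 'b_hour', 'b_day']
print(extract_components(datetime.datetime(1993, 9, 27, 14, 30), ["hour", "day"]))
# {'hour': 14.0, 'day': 27.0}
```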

Attributes:

columns: List[str]: List of columns for processing
include: list of str: Which numeric datetime components to extract
polars_compatible: bool: Indicates whether transformer has been converted to polars/pandas agnostic framework
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes
jsonable: bool: Indicates if transformer supports to/from_json methods
FITS: bool: Indicates whether transform requires fit to be run first

Example:

```pycon
>>> transformer = DatetimeComponentExtractor(
...     columns="a",
...     include=["hour", "day"],
... )
>>> transformer
DatetimeComponentExtractor(columns=['a'], include=['hour', 'day'])

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'DatetimeComponentExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['hour', 'day']}, 'fit': {'is_fitted_': True}}

>>> DatetimeComponentExtractor.from_json(json_dump)
DatetimeComponentExtractor(columns=['a'], include=['hour', 'day'])

```

FITS = False

INCLUDE_OPTIONS: ClassVar[list[str]] = ['hour', 'day', 'month', 'year']

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: List of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = DatetimeComponentExtractor(
...     columns=["a", "b"],
...     include=["hour", "day"],
... )

>>> transformer.get_feature_names_out()
['a_hour', 'a_day', 'b_hour', 'b_day']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, Any][source]

Convert transformer to JSON format.

Returns:: JSON representation of the transformer
Return type:: dict

Examples

```pycon
>>> transformer = DatetimeComponentExtractor(
...     columns="a",
...     include=["hour", "day"],
... )

>>> transformer.to_json()
{'tubular_version': '...', 'classname': 'DatetimeComponentExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['hour', 'day']}, 'fit': {'is_fitted_': True}}

```

Transform - Extracts numeric datetime components.

Parameters:: X (DataFrame) – Data with columns to extract info from.
Returns:: X – Transformed input X with added columns of extracted information.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime

>>> transformer = DatetimeComponentExtractor(
...     columns="a",
...     include=["hour", "day"],
... )

>>> test_df = pl.DataFrame(
...     {
...         "a": [
...             datetime.datetime(1993, 9, 27, 14, 30),
...             datetime.datetime(2005, 10, 7, 9, 45),
...         ],
...         "b": [
...             datetime.datetime(1991, 5, 22, 18, 0),
...             datetime.datetime(2001, 12, 10, 23, 59),
...         ],
...     },
... )

>>> transformer.transform(test_df)
shape: (2, 4)
┌─────────────────────┬─────────────────────┬────────┬───────┐
│ a                   ┆ b                   ┆ a_hour ┆ a_day │
│ ---                 ┆ ---                 ┆ ---    ┆ ---   │
│ datetime[μs]        ┆ datetime[μs]        ┆ f32    ┆ f32   │
╞═════════════════════╪═════════════════════╪════════╪═══════╡
│ 1993-09-27 14:30:00 ┆ 1991-05-22 18:00:00 ┆ 14.0   ┆ 27.0  │
│ 2005-10-07 09:45:00 ┆ 2001-12-10 23:59:00 ┆ 9.0    ┆ 7.0   │
└─────────────────────┴─────────────────────┴────────┴───────┘

```

class tubular.DatetimeInfoExtractor(columns: str | list[str], include: ]] | None = None, datetime_mappings: dict[~typing.Annotated[str, beartype.vale.Is[lambda s: ...]], dict[int, str]] | None = None, drop_original: bool | None = False, **kwargs: str | bool)[source]

Bases: BaseDatetimeTransformer

Transformer to extract various features from datetime variables.

Attributes:

columns: List[str]: List of columns for processing
include: list of str, default = ["timeofday", "timeofmonth", "timeofyear", "dayofweek"]: Which datetime categorical information to extract
datetime_mappings: dict, default = None: Optional argument to define custom mappings for datetime values.
drop_original: bool: indicates whether to drop provided columns post transform
built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> transformer = DatetimeInfoExtractor(
...     columns="a",
...     include="timeofday",
... )
>>> transformer
DatetimeInfoExtractor(columns=['a'], datetime_mappings={},
                      include=['timeofday'])

>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DatetimeInfoExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['timeofday'], 'datetime_mappings': {}}, 'fit': {'is_fitted_': True}}

```

FITS = False

INCLUDE_OPTIONS = ['timeofday', 'timeofmonth', 'timeofyear', 'dayofweek']

RANGE_TO_MAP = {'dayofweek': {1, 2, 3, 4, 5, 6, 7}, 'timeofday': {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}, 'timeofmonth': {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}, 'timeofyear': {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}}
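The datetime_mappings signature (dict[str, dict[int, str]]) suggests that each mapping sends the integers in the matching RANGE_TO_MAP set to labels. The sketch below builds a hypothetical timeofmonth mapping and checks that it covers every day of the month. The bin boundaries are an assumption, chosen only so that day 27 maps to "end" and day 7 to "start" as in the transform example. Whether tubular validates full coverage in exactly this way is also an assumption.

```python
# Day-of-month values that a timeofmonth mapping must cover (matches RANGE_TO_MAP above)
TIMEOFMONTH_RANGE = set(range(1, 32))


def hypothetical_timeofmonth_mapping() -> dict:
    # Assumed bins: days 1-10 -> start, 11-20 -> middle, 21-31 -> end
    return {
        day: "start" if day <= 10 else "middle" if day <= 20 else "end"
        for day in TIMEOFMONTH_RANGE
    }


def check_mapping_covers_range(mapping: dict, expected: set) -> None:
    # Raise if any value in the expected range has no label
    missing = expected - mapping.keys()
    if missing:
        raise ValueError(f"mapping is missing values: {sorted(missing)}")


mapping = hypothetical_timeofmonth_mapping()
check_mapping_covers_range(mapping, TIMEOFMONTH_RANGE)
print(mapping[27], mapping[7])  # end start
```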

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = DatetimeInfoExtractor(
...     columns=["a", "b"],
...     include=["timeofday", "timeofmonth"],
... )

>>> transformer.get_feature_names_out()
['a_timeofday', 'a_timeofmonth', 'b_timeofday', 'b_timeofmonth']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

>>> transformer = DatetimeInfoExtractor(columns="a")

>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DatetimeInfoExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['timeofday', 'timeofmonth', 'timeofyear', 'dayofweek'], 'datetime_mappings': {}}, 'fit': {'is_fitted_': True}}

Transform - Extracts new features from datetime variables.

Parameters:

X (DataFrame) – Data with columns to extract info from.

Returns:

X (DataFrame) – Transformed input X with added columns of extracted information.
Example

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = DatetimeInfoExtractor(
...     columns="a",
...     include="timeofmonth",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.datetime(1993, 9, 27), datetime.datetime(2005, 10, 7)],
...         "b": [datetime.datetime(1991, 5, 22), datetime.datetime(2001, 12, 10)],
...     },
... )
>>> transformer.transform(test_df)
shape: (2, 3)
┌─────────────────────┬─────────────────────┬───────────────┐
│ a                   ┆ b                   ┆ a_timeofmonth │
│ ---                 ┆ ---                 ┆ ---           │
│ datetime[μs]        ┆ datetime[μs]        ┆ enum          │
╞═════════════════════╪═════════════════════╪═══════════════╡
│ 1993-09-27 00:00:00 ┆ 1991-05-22 00:00:00 ┆ end           │
│ 2005-10-07 00:00:00 ┆ 2001-12-10 00:00:00 ┆ start         │
└─────────────────────┴─────────────────────┴───────────────┘
```

class tubular.DatetimeSinusoidCalculator(columns: str | list[str], method: ]], units: ]]], period: ]]] = 6.283185307179586, drop_original: bool = False, **kwargs: bool | str)[source]

Bases: BaseDatetimeTransformer

Calculate the sine or cosine of a datetime column in a given unit (e.g. hour).

Includes the option to scale the period of the sine or cosine to match the natural period of the unit (e.g. 24 for hours).
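The example outputs are consistent with evaluating sin or cos of 2*pi*value/period. For instance, September (month 9) with the default period of 2*pi gives sin(9), which is about 0.412118. The sketch below reproduces that formula and the output column naming for a single value. It is inferred from the examples and is not tubular's implementation; both helper names are hypothetical.

```python
import math


def sinusoid(value: float, method: str = "sin", period: float = 2 * math.pi) -> float:
    """Evaluate sin or cos so that the output repeats every `period` units."""
    functions = {"sin": math.sin, "cos": math.cos}
    return functions[method](2 * math.pi * value / period)


def output_name(method: str, period: float, units: str, column: str) -> str:
    # Matches names such as 'sin_6.283185307179586_month_a' in the examples
    return f"{method}_{period}_{units}_{column}"


print(round(sinusoid(9), 6))  # 0.412118 (month 9, default period)
print(round(sinusoid(3, period=12), 6))  # 1.0 (a quarter of a 12-month cycle)
print(output_name("sin", 2 * math.pi, "month", "a"))  # sin_6.283185307179586_month_a
```

Setting period=12 for months (or 24 for hours) makes one full cycle span the natural period of the unit.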

Attributes:

columns: str or list: Columns to take the sine or cosine of.
method: str or list: The function to be calculated; either sin, cos or a list containing both.
units: str or dict: Which time unit the calculation is to be carried out on. Will take any of ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘microsecond’. Can be a string or a dict containing key-value pairs of column name and units to be used for that column.
period: str, float or dict, default = 2*np.pi: The period of the output in the units specified above. Can be a string or a dict containing key-value pairs of column name and period to be used for that column.
built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )
DatetimeSinusoidCalculator(columns=['a'], method=['sin'], units='month')

```

FITS = False

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )

>>> transformer.get_feature_names_out()
['sin_6.283185307179586_month_a']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DatetimeSinusoidCalculator', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'method': ['sin'], 'units': 'month', 'period': 6.283185307179586}, 'fit': {'is_fitted_': True}}

```

Transform - creates column containing sine or cosine of another datetime column.

Which function is used is stored in the self.method attribute.

Parameters:

X (pd/pl/nw.DataFrame) – Data to transform.
return_native_override (Optional[bool]) – Option to override return_native attr in transformer, useful when calling parent methods

Returns:

X (pd/pl/nw.DataFrame) – Input X with additional columns added, named "<method>_<period>_<units>_<original_column>" (for example "sin_6.283185307179586_month_a").
Example

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.datetime(1993, 9, 27), datetime.datetime(2005, 10, 7)],
...         "b": [datetime.datetime(1991, 5, 22), datetime.datetime(2001, 12, 10)],
...     },
... )
>>> transformer.transform(test_df)
shape: (2, 3)
┌─────────────────────┬─────────────────────┬───────────────────────────────┐
│ a                   ┆ b                   ┆ sin_6.283185307179586_month_a │
│ ---                 ┆ ---                 ┆ ---                           │
│ datetime[μs]        ┆ datetime[μs]        ┆ f64                           │
╞═════════════════════╪═════════════════════╪═══════════════════════════════╡
│ 1993-09-27 00:00:00 ┆ 1991-05-22 00:00:00 ┆ 0.412118                      │
│ 2005-10-07 00:00:00 ┆ 2001-12-10 00:00:00 ┆ -0.544021                     │
└─────────────────────┴─────────────────────┴───────────────────────────────┘
```

class tubular.DifferenceTransformer(columns: ]], **kwargs: bool | None)[source]

Bases: BaseNumericTransformer

Transformer that performs subtraction operation between two columns.

This transformer allows performing subtraction between two columns in a DataFrame and stores the result in a new column.
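The subtraction and the `<first>_minus_<second>` naming used in the transform example can be sketched with plain Python lists. The dict-of-lists input and the `difference` helper are illustrative assumptions, not part of tubular.

```python
from typing import Sequence


def difference(data: dict, columns: Sequence[str]) -> dict:
    """Add '<first>_minus_<second>' holding first - second, row by row."""
    first, second = columns
    result = dict(data)  # shallow copy so the input dict is left unchanged
    result[f"{first}_minus_{second}"] = [x - y for x, y in zip(data[first], data[second])]
    return result


print(difference({"a": [100, 200, 300], "b": [80, 150, 200]}, ["a", "b"]))
# {'a': [100, 200, 300], 'b': [80, 150, 200], 'a_minus_b': [20, 50, 100]}
```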

columns

List of exactly two column names to operate on. The second column is subtracted from the first.

Type:: ListOfTwoStrs

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> transformer = DifferenceTransformer(columns=["a", "b"])
>>> transformer.columns
['a', 'b']

```

FITS = False

get_feature_names_out() → list[str][source]

Get the names of the output features.

Returns:: List containing the name of the new column created by the transformation.
Return type:: list[str]

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

Transform the DataFrame by applying the subtraction operation between two columns.

Parameters:: X (DataFrame) – DataFrame containing the columns to operate on.
Returns:: Transformed DataFrame with the new column containing the subtraction results.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = DifferenceTransformer(columns=["a", "b"])
>>> test_df = pl.DataFrame({"a": [100, 200, 300], "b": [80, 150, 200]})
>>> transformer.transform(test_df)
shape: (3, 3)
┌─────┬─────┬───────────┐
│ a   ┆ b   ┆ a_minus_b │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ i64       │
╞═════╪═════╪═══════════╡
│ 100 ┆ 80  ┆ 20        │
│ 200 ┆ 150 ┆ 50        │
│ 300 ┆ 200 ┆ 100       │
└─────┴─────┴───────────┘

```

class tubular.GroupRareLevelsTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]] | None = None, cut_off_percent: ]] = 0.01, weights_column: str | None = None, rare_level_name: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]] = 'rare', record_rare_levels: bool = True, unseen_levels_to_rare: bool = True, **kwargs: bool)[source]

Bases: BaseTransformer, WeightColumnMixin

Group together rare levels of nominal variables into a new rare level.

Rare levels are defined by a cut off percentage, which can either be based on the number of rows or sum of weights. Any levels below this cut off value will be grouped into the rare level.
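The fit and transform steps can be sketched on plain lists. This is not tubular's implementation, and the helper names are hypothetical. Two details are assumptions: a level whose share exactly equals the cut-off is treated as non-rare (>= is used), and with unseen_levels_to_rare=False a level absent from the training data passes through unchanged, as the attribute descriptions suggest.

```python
from collections import defaultdict
from typing import Optional, Sequence


def fit_non_rare_levels(
    levels: Sequence[str],
    cut_off_percent: float,
    weights: Optional[Sequence[float]] = None,
) -> set:
    """Return levels whose share of rows (or of total weight) reaches the cut-off."""
    if weights is None:
        weights = [1.0] * len(levels)  # unweighted: every row counts once
    totals = defaultdict(float)
    for level, weight in zip(levels, weights):
        totals[level] += weight
    grand_total = sum(totals.values())
    return {level for level, total in totals.items() if total / grand_total >= cut_off_percent}


def group_rare_levels(
    levels: Sequence[str],
    non_rare_levels: set,
    training_levels: set,
    rare_level_name: str = "rare",
    unseen_levels_to_rare: bool = True,
) -> list:
    grouped = []
    for level in levels:
        if level in non_rare_levels:
            grouped.append(level)
        elif not unseen_levels_to_rare and level not in training_levels:
            grouped.append(level)  # unseen level left unchanged
        else:
            grouped.append(rare_level_name)
    return grouped


train = ["x", "x", "y"]
non_rare = fit_non_rare_levels(train, cut_off_percent=0.5)
print(non_rare)  # {'x'}
print(group_rare_levels(["x", "y", "q"], non_rare, set(train), "rare_level", unseen_levels_to_rare=False))
# ['x', 'rare_level', 'q']
```

With cut_off_percent=0.5, "x" (2 of 3 rows) is kept and "y" (1 of 3) becomes rare_level, matching the transform example.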

cut_off_percent

Cut off percentage (either in terms of number of rows or sum of weight) for a given nominal level to be considered rare.

Type:: float

non_rare_levels

Created in fit. A dict of non-rare levels (i.e. levels with more than cut_off_percent weight or rows) that is used to identify rare levels in transform.

Type:: dict

rare_level_name

Must be of the same type as the values in columns. Label for the new nominal level that will be added to group together rare levels (as defined by cut_off_percent).

Type:: any

record_rare_levels

Should the ‘rare’ levels that will be grouped together be recorded? If not, they are lost after fit and only the ‘non-rare’ levels are retained.

Type:: bool

rare_levels_record

Only created (in fit) if record_rare_levels is True. This is a dict containing a list of levels that were grouped into ‘rare’ for each column the transformer was applied to.

Type:: dict

weights_column

Name of the weights column to use if cut_off_percent should be in terms of sum of weight rather than number of rows.

Type:: str

unseen_levels_to_rare

If True, unseen levels in new data are grouped into the rare level; if False, they are left unchanged.

Type:: bool

training_data_levels

Dictionary containing the set of values present in the training data for each column in self.columns. It only exists if unseen_levels_to_rare is set to False.

Type:: dict[set]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> GroupRareLevelsTransformer(
...     columns="a",
...     cut_off_percent=0.02,
...     rare_level_name="rare_level",
... )
GroupRareLevelsTransformer(columns=['a'], cut_off_percent=0.02,
                           rare_level_name='rare_level')

```

FITS = True

Record non-rare levels for categorical variables.

When transform is called, only levels recorded in non_rare_levels during fit remain unchanged; all other levels are grouped. If record_rare_levels is True, the rare levels are also recorded.

The label for the rare levels must be of the same type as the columns.

Parameters:

X (DataFrame) – Data to identify non-rare levels from.
y (Series or LazyFrame or None, default = None) – Optional argument only required for the transformer to work with sklearn pipelines.

Returns:

fitted class instance

Return type:

GroupRareLevelsTransformer

Examples

```pycon
>>> import polars as pl

>>> transformer = GroupRareLevelsTransformer(
...     columns="a",
...     cut_off_percent=0.02,
...     rare_level_name="rare_level",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": ["w", "z"]})

>>> transformer.fit(test_df)
GroupRareLevelsTransformer(columns=['a'], cut_off_percent=0.02,
                           rare_level_name='rare_level')

```

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> import tests.test_data as d

>>> df = d.create_df_8("pandas")

>>> x = GroupRareLevelsTransformer(
...     columns=["b", "c"], cut_off_percent=0.4, unseen_levels_to_rare=False
... )

>>> x.fit(df)
GroupRareLevelsTransformer(columns=['b', 'c'], cut_off_percent=0.4,
                           unseen_levels_to_rare=False)

>>> x.to_json()
{'tubular_version': ..., 'classname': 'GroupRareLevelsTransformer', 'init': {'columns': ['b', 'c'], 'copy': False, 'verbose': False, 'return_native': True, 'cut_off_percent': 0.4, 'weights_column': None, 'rare_level_name': 'rare', 'record_rare_levels': True, 'unseen_levels_to_rare': False}, 'fit': {'is_fitted_': True, 'non_rare_levels': {'b': ['w'], 'c': ['a']}, 'training_data_levels': {'b': ['w', 'x', 'y', 'z'], 'c': ['a', 'b', 'c']}, 'rare_levels_record': {'b': ['x', 'y', 'z'], 'c': ['b', 'c']}}}

```

Group rare levels together into a new ‘rare’ level.
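The grouping step can be sketched in plain Python. The helper `group_rare_levels` is hypothetical. Its handling of levels never seen during fit is inferred from the unseen_levels_to_rare parameter name rather than taken from tubular's source: such levels are grouped when the flag is True and passed through otherwise.

```python
def group_rare_levels(values, non_rare_levels, training_levels,
                      rare_level_name="rare", unseen_levels_to_rare=True):
    """Replace rare levels with rare_level_name (hypothetical helper)."""
    keep = set(non_rare_levels)
    seen = set(training_levels)
    out = []
    for value in values:
        if value in keep:
            out.append(value)  # non-rare level: left unchanged
        elif value not in seen and not unseen_levels_to_rare:
            out.append(value)  # assumption: unseen level passed through
        else:
            out.append(rare_level_name)  # rare (or unseen) level is grouped
    return out


print(group_rare_levels(["x", "x", "y"], ["x"], ["x", "y"], "rare_level"))
# ['x', 'x', 'rare_level']
```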

Parameters:: X (DataFrame) – Data with categorical variables to apply rare level grouping to.
Returns:: X – Transformed input X with rare levels grouped into a new rare level.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = GroupRareLevelsTransformer(
...     columns="a",
...     cut_off_percent=0.5,
...     rare_level_name="rare_level",
... )

>>> test_df = pl.DataFrame({"a": ["x", "x", "y"], "b": ["w", "z", "z"]})

>>> _ = transformer.fit(test_df)

>>> transformer.transform(test_df)
shape: (3, 2)
┌────────────┬─────┐
│ a          ┆ b   │
│ ---        ┆ --- │
│ str        ┆ str │
╞════════════╪═════╡
│ x          ┆ w   │
│ x          ┆ z   │
│ rare_level ┆ z   │
└────────────┴─────┘

```

class tubular.LowerCaseTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer class to lowercase text columns.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = LowerCaseTransformer(
...     columns=["a"],
... )
>>> transformer
LowerCaseTransformer(columns=['a'])

>>> json_dump = transformer.to_json()
>>> pprint(json_dump)
{'classname': 'LowerCaseTransformer',
 'fit': {'is_fitted_': False},
 'init': {'columns': ['a'],
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

>>> LowerCaseTransformer.from_json(json_dump)
LowerCaseTransformer(columns=['a'])

```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

Lowercase the text in the given columns.

Parameters:: X (DataFrame) – Data containing columns to lowercase.
Returns:: X – Transformed input X with text lowercased in given columns.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": ["HeLlO", None, " HI"]})
>>> transformer = LowerCaseTransformer(columns="a")
>>> transformer.transform(test_df)
shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ str   │
╞═══════╡
│ hello │
│ null  │
│  hi   │
└───────┘

```

class tubular.MappingTransformer(mappings: dict[str, dict[Any, Any]], return_dtypes: dict[str, RETURN_DTYPES] | None = None, **kwargs: bool | None)[source]

Bases: BaseMappingTransformer, BaseMappingTransformMixin

Transformer to map values in columns to other values e.g. to merge two levels into one.

Note that MappingTransformer does not require ‘self-mappings’ to be defined, i.e. if you want a value to map to itself, you can simply omit it from the mappings rather than mapping it to itself explicitly.

This transformer inherits from both BaseMappingTransformer and BaseMappingTransformMixin: BaseMappingTransformer performs the standard checks, while BaseMappingTransformMixin handles the actual mapping logic.

Parameters:

mappings (dict) – Dictionary of key (column to apply mapping to) and value (mapping dict for that column) pairs. For example, {'a': {1: 2, 3: 4}, 'b': {'a': 1, 'b': 2}} specifies a mapping for column a of 1->2, 3->4 and a mapping for column b of 'a'->1, 'b'->2.
return_dtypes (Optional[Dict[str, RETURN_DTYPES]]) – Dictionary of col:dtype for returned columns.
**kwargs – Arbitrary keyword arguments passed on to the BaseMappingTransformer.__init__ method.

mappings

Dictionary of mappings for each column individually. The dict passed to mappings in init is set to the mappings attribute.

Type:: dict

mappings_from_null

dict storing what null values will be mapped to. Generally best to use an imputer, but this functionality is useful for inverting pipelines.

Type:: dict[str, Any]

return_dtypes

Dictionary of col:dtype for returned columns

Type:: dict[str, RETURN_DTYPES]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> transformer = MappingTransformer(
...     mappings={"a": {"Y": 1, "N": 0}},
...     return_dtypes={"a": "Int8"},
... )
>>> transformer
MappingTransformer(mappings={'a': {'N': 0, 'Y': 1}},
                   return_dtypes={'a': 'Int8'})

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MappingTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'mappings': {'a': {'Y': 1, 'N': 0}}, 'return_dtypes': {'a': 'Int8'}}, 'fit': {'is_fitted_': True}}

>>> MappingTransformer.from_json(json_dump)
MappingTransformer(mappings={'a': {'N': 0, 'Y': 1}},
                   return_dtypes={'a': 'Int8'})

```

FITS = False

jsonable = True

lazyframe_compatible = True

polars_compatible = True

Transform the input data X according to the mappings in the mappings attribute dict.

This method calls BaseMappingTransformMixin.transform. Note that this transform method differs from some of the transform methods in the nominal module, even though they also use BaseMappingTransformMixin.transform: here, if a value does not exist in the mapping, it is left unchanged.
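The rule that unmapped values pass through unchanged can be illustrated with `dict.get(value, value)`, using a dict of column lists as a stand-in for a DataFrame. The helper `apply_mappings` is hypothetical and ignores return_dtypes. It only shows the lookup semantics, applied to the mappings example from the Parameters section.

```python
def apply_mappings(frame, mappings):
    """Apply per-column value mappings; values without a mapping are left unchanged.

    frame:    dict of column name -> list of values (stand-in for a DataFrame)
    mappings: dict of column name -> {old value: new value}
    """
    result = {}
    for column, values in frame.items():
        mapping = mappings.get(column)
        if mapping is None:
            result[column] = list(values)  # column not mapped at all
        else:
            # dict.get(v, v) returns v itself when no mapping exists for v
            result[column] = [mapping.get(v, v) for v in values]
    return result


frame = {"a": [1, 3, 5], "b": ["a", "b", "c"], "c": [0, 0, 0]}
mappings = {"a": {1: 2, 3: 4}, "b": {"a": 1, "b": 2}}
print(apply_mappings(frame, mappings))
# {'a': [2, 4, 5], 'b': [1, 2, 'c'], 'c': [0, 0, 0]}
```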

Parameters:: X (DataFrame) – Data with nominal columns to transform.
Returns:: X – Transformed input X with levels mapped according to mappings dict.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = MappingTransformer(
...     mappings={"a": {"Y": 1, "N": 0}},
...     return_dtypes={"a": "Int8"},
... )

>>> test_df = pl.DataFrame({"a": ["Y", "N"], "b": [3, 4]})

>>> transformer.transform(test_df)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i8  ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 0   ┆ 4   │
└─────┴─────┘

```

class tubular.MeanImputer(columns: str | list[str], weights_column: str | None = None, **kwargs: bool)[source]

Bases: WeightColumnMixin, BaseImputer

Transformer to impute missing values with the mean of the supplied columns.
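Fitting and transforming amount to computing a mean over the non-null values and filling the nulls with it. The sketch below handles a single column and uses None for missing values. The helper name is hypothetical. The optional `weights` list stands in for weights_column, on the assumption that rows with a null value contribute neither value nor weight.

```python
def weighted_mean_impute(values, weights=None):
    """Return (mean, filled values): nulls (None) are replaced by the weighted mean."""
    if weights is None:
        weights = [1] * len(values)
    # Assumption: rows with a null value contribute neither value nor weight.
    observed = [(v, w) for v, w in zip(values, weights) if v is not None]
    mean = sum(v * w for v, w in observed) / sum(w for _, w in observed)
    return mean, [mean if v is None else v for v in values]


print(weighted_mean_impute([1, None, 2]))  # (1.5, [1, 1.5, 2])
print(weighted_mean_impute([1, None, 3], weights=[3, 1, 1]))  # (1.5, [1, 1.5, 3])
```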

impute_values_

Created during fit method. Dictionary of float / int (mean) values of columns in the columns attribute. Keys of impute_values_ give the column names.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> import polars as pl
>>> mean_imputer = MeanImputer(
...     columns=["a", "b"],
... )
>>> mean_imputer
MeanImputer(columns=['a', 'b'])

>>> # once fit, transformer can also be dumped to json and reinitialised

>>> test_df = pl.DataFrame({"a": [0, None], "b": [None, 1]})

>>> _ = mean_imputer.fit(test_df)

>>> json_dump = mean_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MeanImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 0.0, 'b': 1.0}}}

>>> MeanImputer.from_json(json_dump)
MeanImputer(columns=['a', 'b'])

```

FITS = True

Calculate mean values to impute with from X.

Parameters:

X (DataFrame) – Data to “learn” the mean values from.
y (Series or LazyFrame or None, default = None) – Not required.

Returns:

fitted class instance.

Return type:

MeanImputer

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = MeanImputer(columns=["a", "b"])
>>> imputer = imputer.fit(test_df)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 3.0 │
│ 1.5 ┆ 3.5 │
│ 2.0 ┆ 4.0 │
└─────┴─────┘

```

jsonable = True

lazyframe_compatible = True

polars_compatible = True

Bases: BaseTransformer, WeightColumnMixin, DropOriginalMixin

Convert categorical variables to numeric by mapping each level to the mean response for that level.

For a continuous or binary response the categorical columns specified will have values replaced with the mean response for each category.

For an n > 1 level categorical response, up to n binary responses can be created, which in turn can be used to encode each categorical column specified. This will generate up to n * len(columns) new columns, with names of the form {column}_{response_level}. The original columns will be removed from the dataframe. This functionality is controlled using the ‘level’ parameter. Note that the above only works for an n > 1 level categorical response. Do not use the ‘level’ parameter for an n = 1 level numerical response; in this case, use the standard mean response transformer without the ‘level’ parameter.

If a categorical variable contains null values these will not be transformed.

The same weights and prior are applied to each response level in the multi-level case.
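The multi-level behaviour can be pictured as a one-vs-rest loop. For each response level, build a 0/1 response and mean-encode the column against it, naming the output {column}_{response_level}. The sketch below omits the prior, weights, null handling and unseen-level handling, and the helper name is hypothetical. With response levels cat, dog and rat it yields columns a_cat, a_dog and a_rat. The values are only illustrative of the idea.

```python
def one_vs_rest_encode(column, categories, response, levels="all"):
    """Mean-encode one column against each response level (no prior, weights or nulls)."""
    if levels == "all":
        levels = sorted(set(response))
    encoded = {}
    for level in levels:
        # Binary response for this level: 1.0 where the response equals it, else 0.0.
        binary = [1.0 if r == level else 0.0 for r in response]
        sums, counts = {}, {}
        for category, b in zip(categories, binary):
            sums[category] = sums.get(category, 0.0) + b
            counts[category] = counts.get(category, 0) + 1
        means = {category: sums[category] / counts[category] for category in sums}
        encoded[f"{column}_{level}"] = [means[c] for c in categories]
    return encoded


encoded = one_vs_rest_encode("a", ["x", "y", "x"], ["cat", "dog", "rat"])
print(list(encoded))  # ['a_cat', 'a_dog', 'a_rat']
print(encoded["a_cat"])  # [0.5, 0.0, 0.5]
```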

columns

Categorical columns to encode in the input data.

Type:: str or list

weights_column

Weights column to use when calculating the mean response.

Type:: str or None

prior

Regularisation parameter, can be thought of roughly as the size a category should be in order for its statistics to be considered reliable (hence default value of 0 means no regularisation).

Type:: int, default = 0

level

Parameter to control encoding against a multi-level categorical response. If None the response will be treated as binary or continuous, if ‘all’ all response levels will be encoded against and if it is a list of levels then only the levels specified will be encoded against.

Type:: str, int, float, list or None, default = None

response_levels

Only created in the multi-level case. Generated from level, list of all the response levels to encode against.

Type:: list

mappings

Created in fit. A nested Dict of {column names : column specific mapping dictionary} pairs. Column specific mapping dictionaries contain {initial value : mapped value} pairs.

Type:: dict

mapped_columns

Only created in the multi-level case. A list of the new columns produced by encoding the columns in self.columns against multiple response levels, of the form {column}_{level}.

Type:: list

transformer_dict

Only created in the multi-level case. A dictionary of the form level : transformer containing the mean response transformers for each level to be encoded against.

Type:: dict

unseen_levels_encoding_dict

Dict containing the values (based on chosen unseen_level_handling) derived from the encoded columns to use when handling unseen levels in data passed to transform method.

Type:: dict

return_type

What type to cast return column as. Defaults to float32.

Type:: Literal[‘float32’, ‘float64’]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> import polars as pl

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )
>>> transformer
MeanResponseTransformer(columns=['a'], prior=1, unseen_level_handling='mean')

>>> # once fit, transformer can also be dumped to json and reinitialised

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [0, 1]})

>>> _ = transformer.fit(test_df[["a"]], test_df["b"])

>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MeanResponseTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None, 'prior': 1, 'level': None, 'unseen_level_handling': 'mean', 'return_type': 'Float32', 'drop_original': True}, 'fit': {'is_fitted_': True, 'mappings': {'a': {'x': 0.25, 'y': 0.75}}, 'return_dtypes': {'a': 'Float32'}, 'column_to_encoded_columns': {'a': ['a']}, 'encoded_columns': ['a'], 'unseen_levels_encoding_dict': {'a': 0.5}}}
>>> MeanResponseTransformer.from_json(json_dump)
MeanResponseTransformer(columns=['a'], prior=1, unseen_level_handling='mean')

```

FITS = True

Identify mapping of categorical levels to mean response values.

If the user specified the weights_column arg when initialising the transformer, the weighted mean response will be calculated using that column.

In the multi-level case this method learns which response levels are present and are to be encoded against.
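The effect of prior can be sketched as shrinking each level's (weighted) mean toward the overall mean, with prior acting as a pseudo-count:

encoded(level) = (sum(w * y) + prior * global_mean) / (sum(w) + prior)

This formula is a reconstruction, not tubular's source. It does agree with the examples on this page: x → 0.25 and y → 0.75 with prior=1, and x → 0.0 and y → 1.0 with prior=0. The helper name is hypothetical, and the `weights` list stands in for weights_column.

```python
def mean_response_mapping(categories, response, weights=None, prior=0):
    """Map each category to its prior-regularised (optionally weighted) mean response.

    encoded(level) = (sum(w * y) + prior * global_mean) / (sum(w) + prior)
    """
    if weights is None:
        weights = [1.0] * len(response)
    global_mean = sum(w * y for w, y in zip(weights, response)) / sum(weights)
    weighted_sums, weight_totals = {}, {}
    for category, y, w in zip(categories, response, weights):
        weighted_sums[category] = weighted_sums.get(category, 0.0) + w * y
        weight_totals[category] = weight_totals.get(category, 0.0) + w
    return {
        category: (weighted_sums[category] + prior * global_mean)
        / (weight_totals[category] + prior)
        for category in weighted_sums
    }


print(mean_response_mapping(["x", "y"], [0, 1], prior=1))  # {'x': 0.25, 'y': 0.75}
print(mean_response_mapping(["x", "y"], [0, 1], prior=0))  # {'x': 0.0, 'y': 1.0}
```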

Parameters:

X (DataFrame) – Data with the categorical variable columns to transform.
y (Series or LazyFrame) – Response variable or target.

Returns:

fitted class instance

Return type:

MeanResponseTransformer

Raises:

ValueError – if y contains null values.

Examples

```pycon
>>> import polars as pl

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2], "target": [0, 1]})

>>> transformer.fit(test_df, test_df["target"])
MeanResponseTransformer(columns=['a'], prior=1, unseen_level_handling='mean')

```

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> import polars as pl

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )

>>> transformer.get_feature_names_out()
['a']

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     level=["x", "y"],
...     unseen_level_handling="mean",
... )

>>> transformer.get_feature_names_out()
['a_x', 'a_y']

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     level="all",
...     unseen_level_handling="mean",
... )

>>> transformer.get_feature_names_out()
Traceback (most recent call last):
...
sklearn.exceptions.NotFittedError: ...

>>> test_df = pl.DataFrame({"a": ["x", "y", "x"], "b": ["cat", "dog", "rat"]})

>>> _ = transformer.fit(test_df, test_df["b"])

>>> transformer.get_feature_names_out()
['a_cat', 'a_dog', 'a_rat']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict containing levels for attributes set at init and fit.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> import polars as pl

>>> transformer = MeanResponseTransformer(columns=["a"])

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [0, 1]})

>>> _ = transformer.fit(test_df[["a"]], test_df["b"])

>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'MeanResponseTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None, 'prior': 0, 'level': None, 'unseen_level_handling': None, 'return_type': 'Float32', 'drop_original': True}, 'fit': {'is_fitted_': True, 'mappings': {'a': {'x': 0.0, 'y': 1.0}}, 'return_dtypes': {'a': 'Float32'}, 'column_to_encoded_columns': {'a': ['a']}, 'encoded_columns': ['a']}}

```

Apply mean response encoding stored in the mappings attribute to columns.

Parameters:: X (DataFrame) – Data with nominal columns to transform.
Returns:: X – Transformed input X with levels mapped according to mappings dict.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> # example with no prior
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=0,
...     unseen_level_handling="mean",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2], "target": [0, 1]})

>>> _ = transformer.fit(test_df, test_df["target"])

>>> transformer.transform(test_df)
shape: (2, 3)
┌─────┬─────┬────────┐
│ a   ┆ b   ┆ target │
│ --- ┆ --- ┆ ---    │
│ f32 ┆ i64 ┆ i64    │
╞═════╪═════╪════════╡
│ 0.0 ┆ 1   ┆ 0      │
│ 1.0 ┆ 2   ┆ 1      │
└─────┴─────┴────────┘

>>> # example with prior
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2], "target": [0, 1]})

>>> _ = transformer.fit(test_df, test_df["target"])

>>> transformer.transform(test_df)
shape: (2, 3)
┌──────┬─────┬────────┐
│ a    ┆ b   ┆ target │
│ ---  ┆ --- ┆ ---    │
│ f32  ┆ i64 ┆ i64    │
╞══════╪═════╪════════╡
│ 0.25 ┆ 1   ┆ 0      │
│ 0.75 ┆ 2   ┆ 1      │
└──────┴─────┴────────┘

```

class tubular.MedianImputer(columns: str | list[str], weights_column: str | None = None, **kwargs: bool)[source]

Bases: BaseImputer, WeightColumnMixin

Transformer to impute missing values with the median of the supplied columns.
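A median over the non-null values is the core of this transformer. The sketch below uses one common weighted-median definition: sort the values, accumulate weight, and return the first value past half the total weight, averaging with the next value when the split lands exactly at half. With equal weights this reduces to the ordinary median, which matches the 1.5 and 3.5 in this class's example. tubular's weighted median may be defined differently, and the helper name is hypothetical.

```python
def weighted_median(values, weights=None):
    """Weighted median of the non-null values (one common definition; an assumption).

    With equal weights this reduces to the ordinary median, averaging the two
    middle values when the weight splits exactly in half.
    """
    if weights is None:
        weights = [1] * len(values)
    pairs = sorted((v, w) for v, w in zip(values, weights) if v is not None)
    half = sum(w for _, w in pairs) / 2
    cumulative = 0
    for i, (value, weight) in enumerate(pairs):
        cumulative += weight
        if cumulative > half:
            return value
        if cumulative == half and i + 1 < len(pairs):
            # Exactly half the weight is at or below value: average with the next one.
            return (value + pairs[i + 1][0]) / 2
    return pairs[-1][0]


print(weighted_median([1, None, 2]))  # 1.5
```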

impute_values_

Created during fit method. Dictionary of float / int (median) values of columns in the columns attribute. Keys of impute_values_ give the column names.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> import polars as pl
>>> median_imputer = MedianImputer(
...     columns=["a", "b"],
... )
>>> median_imputer
MedianImputer(columns=['a', 'b'])

>>> # once fit, transformer can also be dumped to json and reinitialised

>>> test_df = pl.DataFrame({"a": [0, None], "b": [None, 1]})

>>> _ = median_imputer.fit(test_df)

>>> json_dump = median_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MedianImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 0.0, 'b': 1.0}}}

>>> MedianImputer.from_json(json_dump)
MedianImputer(columns=['a', 'b'])

```

FITS = True

Calculate median values to impute with from X.

Parameters:

X (DataFrame) – Data to “learn” the median values from.
y (Series or LazyFrame or None, default = None) – Not required.

Returns:

fitted class instance.

Return type:

MedianImputer

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = MedianImputer(columns=["a", "b"])
>>> imputer = imputer.fit(test_df)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 3.0 │
│ 1.5 ┆ 3.5 │
│ 2.0 ┆ 4.0 │
└─────┴─────┘

```

jsonable = True

lazyframe_compatible = True

polars_compatible = True

class tubular.ModeImputer(columns: str | list[str], weights_column: str | None = None, **kwargs: bool)[source]

Bases: BaseImputer, WeightColumnMixin

Transformer to impute missing values with the mode of the supplied columns.

If mode is NaN, a warning will be raised.

impute_values_

Created during fit method. Dictionary of float / int (mode) values of columns in the columns attribute. Keys of impute_values_ give the column names.

Type:: dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> import polars as pl
>>> mode_imputer = ModeImputer(
...     columns=["a", "b"],
... )
>>> mode_imputer
ModeImputer(columns=['a', 'b'])

>>> # once fit, transformer can also be dumped to json and reinitialised

>>> test_df = pl.DataFrame({"a": [0, None], "b": [None, 1]})

>>> _ = mode_imputer.fit(test_df)

>>> json_dump = mode_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'ModeImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 0, 'b': 1}}}

>>> ModeImputer.from_json(json_dump)
ModeImputer(columns=['a', 'b'])

```

FITS = True

Calculate mode values to impute with from X.

In the event of a tie, the highest modal value will be returned.
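The tie-breaking rule can be sketched with collections.Counter: total the count (or weight) per non-null value, then take the largest value among those sharing the top total. The result agrees with this class's example, where a → 2 and b → 4. The helper name is hypothetical, and the `weights` list only stands in for weights_column.

```python
from collections import Counter


def mode_highest_on_tie(values, weights=None):
    """(Optionally weighted) mode of the non-null values; ties go to the highest value."""
    if weights is None:
        weights = [1] * len(values)
    totals = Counter()
    for value, weight in zip(values, weights):
        if value is not None:
            totals[value] += weight
    top = max(totals.values())
    # Among all values sharing the top count/weight, return the largest.
    return max(value for value, total in totals.items() if total == top)


print(mode_highest_on_tie([1, None, 2]))  # 2
```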

Parameters:

X (DataFrame) – Data to “learn” the mode values from.
y (Series or LazyFrame or None, default = None) – Not required.

Returns:

fitted class instance

Return type:

ModeImputer

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = ModeImputer(columns=["a", "b"])
>>> imputer = imputer.fit(test_df)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 2   ┆ 4   │
│ 2   ┆ 4   │
└─────┴─────┘

```

jsonable = True

lazyframe_compatible = True

polars_compatible = True

class tubular.NullIndicator(columns: ]] | str, **kwargs: bool | None)[source]

Bases: BaseTransformer

Class to create a binary indicator column for null values.
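Conceptually, each listed column gets a boolean companion column named {column}_nulls, matching the naming in this class's transform example. The sketch uses a dict of column lists as a stand-in for a DataFrame and treats only None as null. NaN handling in pandas is deliberately not modelled. The helper name is hypothetical.

```python
def add_null_indicators(frame, columns):
    """Return a copy of frame with a boolean {column}_nulls column per listed column.

    frame is a dict of column name -> list of values; only None counts as null here.
    """
    result = dict(frame)
    for column in columns:
        result[f"{column}_nulls"] = [value is None for value in frame[column]]
    return result


out = add_null_indicators({"a": [1, None, 2], "b": [3, None, 4]}, ["a", "b"])
print(out["a_nulls"], out["b_nulls"])  # [False, True, False] [False, True, False]
```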

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> null_indicator = NullIndicator(
...     columns=["a", "b"],
... )
>>> null_indicator
NullIndicator(columns=['a', 'b'])

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = null_indicator.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'NullIndicator', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True}, 'fit': {'is_fitted_': True}}

>>> NullIndicator.from_json(json_dump)
NullIndicator(columns=['a', 'b'])

```

FITS = False

jsonable = True

lazyframe_compatible = True

polars_compatible = True

Create new columns indicating the position of null values for each variable in self.columns.

Parameters:: X (DataFrame) – Data to add indicators to.
Returns:: dataframe with null indicator columns added
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> null_indicator = NullIndicator(columns=["a", "b"])
>>> null_indicator.transform(test_df)
shape: (3, 4)
┌──────┬──────┬─────────┬─────────┐
│ a    ┆ b    ┆ a_nulls ┆ b_nulls │
│ ---  ┆ ---  ┆ ---     ┆ ---     │
│ i64  ┆ i64  ┆ bool    ┆ bool    │
╞══════╪══════╪═════════╪═════════╡
│ 1    ┆ 3    ┆ false   ┆ false   │
│ null ┆ null ┆ true    ┆ true    │
│ 2    ┆ 4    ┆ false   ┆ false   │
└──────┴──────┴─────────┴─────────┘

```

class tubular.OneDKmeansTransformer(columns: str | ~typing.Annotated[list[str], beartype.vale.Is[lambda list_arg: ...]], new_column_name: str, n_init: str | int = 'auto', n_clusters: int = 8, drop_original: bool = False, kmeans_kwargs: dict[str, object] | None = None, **kwargs: bool)[source]

Bases: BaseNumericTransformer, DropOriginalMixin

Generates a new column based on the k-means algorithm.

The transformer runs the k-means algorithm with the given number of clusters, identifies the bin cut points from the results, and finally passes them to a cut function.
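The cluster-then-cut idea can be sketched in pure Python. Run a one-dimensional Lloyd's k-means, take each cluster's maximum as an upper bin edge, then label each value by the first edge it does not exceed. The clustering itself is presumably delegated to scikit-learn's KMeans, given the n_init and kmeans_kwargs parameters. This sketch uses its own deterministic initialisation and a hypothetical "cluster maximum" edge rule, so its edges and labels need not match the fitted bins in this class's to_json and transform examples. All helper names are hypothetical.

```python
from bisect import bisect_left


def kmeans_1d(values, n_clusters, max_iter=100):
    """Plain Lloyd's algorithm on one numeric column (sketch, not sklearn's KMeans)."""
    data = sorted(values)
    # Deterministic start: evenly spaced order statistics as initial centroids.
    step = (len(data) - 1) / max(n_clusters - 1, 1)
    centroids = [float(data[round(i * step)]) for i in range(n_clusters)]
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in data:
            # Assign each point to its nearest centroid (ties go to the lower index).
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Move each centroid to the mean of its points; keep empty clusters in place.
        updated = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return clusters


def bin_edges(clusters):
    """Hypothetical rule: the maximum of each non-empty cluster is an upper bin edge."""
    return sorted(max(c) for c in clusters if c)


def assign_bins(values, edges):
    """Label = index of the first edge >= x (right-closed bins).

    Values above the last edge get label len(edges).
    """
    return [bisect_left(edges, x) for x in values]


data = [1, 2, 3, 4]
edges = bin_edges(kmeans_1d(data, n_clusters=2))
print(edges, assign_bins(data, edges))  # [2, 4] [0, 0, 1, 1]
```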

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )
OneDKmeansTransformer(columns=['a'], kmeans_kwargs={'random_state': 42},
                      n_clusters=2, new_column_name='new')

```

FITS = True

fit(X: FrameT, y: IntoSeriesT | None = None) → OneDKmeansTransformer[source]

Fit transformer to input data.

Parameters:

X (pd/pl.DataFrame) – Dataframe with the columns to learn the k-means bins from.
y (None) – Not used; required for compatibility with sklearn pipelines.

Returns:

Fitted class instance.

Return type:

OneDKmeansTransformer

Raises:

ValueError – if columns in X contain missing values.

Examples

```pycon
>>> import polars as pl

>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )

>>> test_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

>>> transformer.fit(test_df)
OneDKmeansTransformer(columns=['a'], kmeans_kwargs={'random_state': 42},
                      n_clusters=2, new_column_name='new')

```

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="kmeans_column",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )

>>> transformer.get_feature_names_out()
['kmeans_column']

```

jsonable = True

lazyframe_compatible = False

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:: JSON representation of the transformer, including init parameters and fitted attributes.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> import polars as pl
>>> x = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )
>>> test_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
>>> x.fit(test_df)
OneDKmeansTransformer(columns=['a'], kmeans_kwargs={'random_state': 42},
                      n_clusters=2, new_column_name='new')
>>> x.to_json()
{'tubular_version': ..., 'classname': 'OneDKmeansTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'new', 'n_init': 'auto', 'n_clusters': 2, 'drop_original': False, 'kmeans_kwargs': {'random_state': 42}}, 'fit': {'is_fitted_': True, 'bins': [3, 4]}}

```

transform(X: FrameT) → FrameT[source]

Generate bins for the input pd/pl.DataFrame (X) based on the k-means results, and add the resulting column(s) to X.

Parameters:: X (pl/pd.DataFrame) – Data to transform.
Returns:: X – Input X with additional cluster column added.
Return type:: pl/pd.DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )

>>> test_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})

>>> _ = transformer.fit(test_df)
>>> transformer.transform(test_df)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ new │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 5   ┆ 0   │
│ 2   ┆ 6   ┆ 0   │
│ 3   ┆ 7   ┆ 0   │
│ 4   ┆ 8   ┆ 1   │
└─────┴─────┴─────┘

```

class tubular.OneHotEncodingTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]] | None = None, wanted_values: dict[str, ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]]] | None = None, separator: str = '_', drop_original: bool = False, **kwargs: bool)[source]

Bases: DropOriginalMixin, BaseTransformer

Transformer to convert categorical variables into dummy columns.

separator

Separator used in naming for dummy columns.

Type:: str

drop_original

Should original columns be dropped after creating dummy fields?

Type:: bool

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )
>>> transformer
OneHotEncodingTransformer(columns=['a'])

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": ["w", "z"]})

>>> _ = transformer.fit(test_df)

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'OneHotEncodingTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'wanted_values': None, 'separator': '_', 'drop_original': False}, 'fit': {'is_fitted_': True, 'categories_': {'a': ['x', 'y']}, 'new_feature_names_': {'a': ['a_x', 'a_y']}}}

>>> OneHotEncodingTransformer.from_json(json_dump)
OneHotEncodingTransformer(columns=['a'])

```

FITS = True

MAX_LEVELS = 100

Get list of levels for each column to be transformed.

This defines which dummy columns will be created in transform.

Parameters:

X (DataFrame) – Data to identify levels from.
y (None) – Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.

Returns:

fitted class instance

Return type:

OneHotEncodingTransformer

Raises:

ValueError – if a column has more than 100 levels.

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2]})

>>> transformer.fit(test_df)
OneHotEncodingTransformer(columns=['a'])

```
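As a rough illustration of what fit learns, the sketch below derives levels and dummy column names from a list of row dicts in plain Python. It is not tubular's implementation: the helper learn_levels is hypothetical, and it assumes nulls are skipped and levels are sorted (consistent with the ordering in the get_feature_names_out example).

```python
# Hypothetical sketch of the level-learning step of OneHotEncodingTransformer.fit;
# not tubular's actual code.
MAX_LEVELS = 100  # mirrors the documented class attribute


def learn_levels(rows, columns, separator="_", max_levels=MAX_LEVELS):
    """Return (categories, new_feature_names) learned from a list of row dicts."""
    categories = {}
    new_feature_names = {}
    for col in columns:
        # Assumption: nulls are skipped and levels are sorted.
        levels = sorted({row[col] for row in rows if row[col] is not None})
        if len(levels) > max_levels:
            raise ValueError(f"{col} has more than {max_levels} levels")
        categories[col] = levels
        new_feature_names[col] = [f"{col}{separator}{level}" for level in levels]
    return categories, new_feature_names


categories, names = learn_levels([{"a": "x"}, {"a": "y"}], ["a"])
print(categories)  # {'a': ['x', 'y']}
print(names)  # {'a': ['a_x', 'a_y']}
```

The names match the new_feature_names_ entry shown in the to_json output above.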

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
...     wanted_values={"a": ["cat", "dog"]},
... )

>>> transformer.get_feature_names_out()
['a_cat', 'a_dog']

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )

>>> transformer.get_feature_names_out()
Traceback (most recent call last):
...
sklearn.exceptions.NotFittedError: ...

>>> test_df = pl.DataFrame({"a": ["cat", "dog", "rat"]})

>>> _ = transformer.fit(test_df)

>>> transformer.get_feature_names_out()
['a_cat', 'a_dog', 'a_rat']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict whose 'init' and 'fit' entries hold the attributes set at init and fit respectively.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(columns=["a"])

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": ["w", "z"]})

>>> _ = transformer.fit(test_df)

>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'OneHotEncodingTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'wanted_values': None, 'separator': '_', 'drop_original': False}, 'fit': {'is_fitted_': True, 'categories_': {'a': ['x', 'y']}, 'new_feature_names_': {'a': ['a_x', 'a_y']}}}

```

Create new dummy columns from categorical fields.

Parameters:

X (DataFrame) – Data to apply one hot encoding to.
return_native_override (Optional[bool]) – controls whether transformer returns narwhals or native type. Option to override the return_native attr in the transformer, useful when calling parent methods.

Returns:

X_transformed – Transformed input X with dummy columns derived from categorical columns added. If drop_original = True then the original categorical columns that the dummies are created from will not be in the output X.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2]})

>>> _ = transformer.fit(test_df)

>>> transformer.transform(test_df)
shape: (2, 4)
┌─────┬─────┬───────┬───────┐
│ a   ┆ b   ┆ a_x   ┆ a_y   │
│ --- ┆ --- ┆ ---   ┆ ---   │
│ str ┆ i64 ┆ bool  ┆ bool  │
╞═════╪═════╪═══════╪═══════╡
│ x   ┆ 1   ┆ true  ┆ false │
│ y   ┆ 2   ┆ false ┆ true  │
└─────┴─────┴───────┴───────┘

```
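The dummy-column step of transform can be sketched the same way. In this hypothetical helper, one_hot, each learned level becomes a boolean column named column + separator + level; values not seen during fit (and nulls) simply give False in every dummy column, which is an assumption rather than documented tubular behaviour.

```python
# Hypothetical sketch of the dummy-column step in OneHotEncodingTransformer.transform;
# not tubular's actual code.


def one_hot(rows, categories, separator="_", drop_original=False):
    """Add one boolean column per learned level to each row dict."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col, levels in categories.items():
            for level in levels:
                # True where the row matches this level; unseen values give all False.
                new_row[f"{col}{separator}{level}"] = row[col] == level
            if drop_original:
                del new_row[col]
        out.append(new_row)
    return out


rows = [{"a": "x", "b": 1}, {"a": "y", "b": 2}]
print(one_hot(rows, {"a": ["x", "y"]}))
# [{'a': 'x', 'b': 1, 'a_x': True, 'a_y': False}, {'a': 'y', 'b': 2, 'a_x': False, 'a_y': True}]
```

Passing drop_original=True removes the source column, mirroring the drop_original attribute.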

class tubular.OutOfRangeNullTransformer(capping_values: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, quantiles: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, weights_column: str | None = None, dtype: ]] = 'Float64', **kwargs: bool)[source]

Bases: BaseCappingTransformer

Transformer to set values outside of a range to null.

This transformer sets the cut-off values in the same way as the CappingTransformer: either the user specifies them directly in the capping_values argument, or they are calculated in the fit method if the user supplies the quantiles argument.

Attributes:

capping_values: dict[str, CappingValues] or None: Capping values to apply to each column, the capping_values argument.
quantiles: dict[str, CappingValues] or None: Quantiles to set capping values at from input data. Will be empty after init, values populated when fit is run.
quantile_capping_values: dict[str, CappingValues] or None: Capping values learned from quantiles (if provided) to apply to each column.
weights_column: str or None: the weights_column argument.
_replacement_values: dict[str, CappingValues]: Replacement values when capping is applied. This will contain nulls for each column.
built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> import polars as pl

>>> transformer = OutOfRangeNullTransformer(
...     capping_values={"a": [10, 20], "b": [1, 3]},
... )
>>> transformer
OutOfRangeNullTransformer(capping_values={'a': [10, 20], 'b': [1, 3]})

>>> # transform method is inherited, so also demo that here

>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})

>>> transformer.transform(test_df)
shape: (4, 3)
┌──────┬──────┬─────┐
│ a    ┆ b    ┆ c   │
│ ---  ┆ ---  ┆ --- │
│ f64  ┆ f64  ┆ i64 │
╞══════╪══════╪═════╡
│ null ┆ null ┆ 1   │
│ 15.0 ┆ 2.0  ┆ 2   │
│ 18.0 ┆ null ┆ 3   │
│ null ┆ 1.0  ┆ 4   │
└──────┴──────┴─────┘

>>> # transformer can also be dumped to json and reinitialised

>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'OutOfRangeNullTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'capping_values': {'a': [10, 20], 'b': [1, 3]}, 'quantiles': None, 'weights_column': None}, 'fit': {'is_fitted_': False}}

>>> OutOfRangeNullTransformer.from_json(json_dump)
OutOfRangeNullTransformer(capping_values={'a': [10, 20], 'b': [1, 3]})

```
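The per-column rule applied by transform can be sketched in plain Python. The helper out_of_range_to_null is hypothetical; it assumes bounds are inclusive (consistent with b = 1 surviving the [1, 3] range above) and that a None bound means no limit on that side (an assumption).

```python
# Hypothetical sketch of OutOfRangeNullTransformer's per-column logic;
# not tubular's actual code.


def out_of_range_to_null(values, bounds):
    """Replace values outside [lower, upper] with None; a None bound means no limit."""
    lower, upper = bounds
    result = []
    for value in values:
        too_low = value is not None and lower is not None and value < lower
        too_high = value is not None and upper is not None and value > upper
        if value is None or too_low or too_high:
            result.append(None)
        else:
            result.append(float(value))  # output columns are Float64 in the example
    return result


print(out_of_range_to_null([1, 15, 18, 25], [10, 20]))  # [None, 15.0, 18.0, None]
print(out_of_range_to_null([6, 2, 7, 1], [1, 3]))  # [None, 2.0, None, 1.0]
```

Both outputs agree with columns a and b in the transform example above.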

FITS = True

Learn capping values from input data X.

Calculates the quantiles to cap at given the quantiles dictionary supplied when initialising the transformer. Saves learnt values in the capping_values attribute.

Parameters:

X (DataFrame) – A dataframe with required columns to be capped.
y (None) – Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.

Returns:

fitted instance of class

Return type:

OutOfRangeNullTransformer

Example

```pycon
>>> import polars as pl

>>> transformer = OutOfRangeNullTransformer(
...     quantiles={"a": [0.01, 0.99], "b": [0.05, 0.95]},
... )

>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})

>>> transformer.fit(test_df)
OutOfRangeNullTransformer(quantiles={'a': [0.01, 0.99], 'b': [0.05, 0.95]})

```
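fit turns each requested quantile into a bound. A minimal unweighted sketch using linear interpolation between sorted values is shown below; the helpers quantile and learn_bounds are hypothetical, and tubular's own calculation (which can take weights_column into account) may produce different values.

```python
# Hypothetical sketch of learning bounds from quantiles; tubular's own
# calculation (which can use weights_column) may give different numbers.


def quantile(values, q):
    """Unweighted quantile with linear interpolation between sorted values."""
    data = sorted(v for v in values if v is not None)
    position = q * (len(data) - 1)
    lower_index = int(position)  # position >= 0, so int() floors it
    upper_index = min(lower_index + 1, len(data) - 1)
    fraction = position - lower_index
    return data[lower_index] + (data[upper_index] - data[lower_index]) * fraction


def learn_bounds(values, quantiles):
    """Map each requested quantile to a bound, keeping None as 'no bound'."""
    return [None if q is None else quantile(values, q) for q in quantiles]


print(learn_bounds([1, 15, 18, 25], [0.01, 0.99]))
```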

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

static set_replacement_values(capping_values: dict[str, list[int | float | None]]) → dict[str, list[bool | None]][source]

Build the _replacement_values dict, which nulls out every capped value.

Keeps the keys of capping_values and replaces every non-None value in each list with None (null). Any None entries are replaced with False, as the example below shows.

Returns:: replacement values for OutOfRangeNullTransformer
Return type:: dict[str, list[bool | None]]

Examples

```pycon
>>> import polars as pl

>>> capping_values = {"a": [0.1, 0.2], "b": [None, 10]}

>>> OutOfRangeNullTransformer.set_replacement_values(capping_values)
{'a': [None, None], 'b': [False, None]}

```
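The doctest above pins down the mapping: every non-None bound becomes None (null) and every None bound becomes False. The hypothetical function below reproduces only that input-to-output mapping; the meaning tubular attaches to False is not documented in this section.

```python
# Hypothetical re-implementation of the mapping shown in the doctest above.


def set_replacement_values(capping_values):
    """Non-None bounds become None (null); None bounds become False."""
    return {
        column: [False if bound is None else None for bound in bounds]
        for column, bounds in capping_values.items()
    }


print(set_replacement_values({"a": [0.1, 0.2], "b": [None, 10]}))
# {'a': [None, None], 'b': [False, None]}
```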

class tubular.RatioTransformer(columns: ]], return_dtype: ]] = 'Float32', **kwargs: bool | None)[source]

Bases: BaseNumericTransformer

Transformer that divides one column by another.

The first column in columns is divided by the second, and the result is stored in a new column named <numerator>_divided_by_<denominator> (for example a_divided_by_b).

columns

List of exactly two column names to operate on. The first column is the numerator, and the second column is the denominator.

Type:: ListOfTwoStrs

return_dtype

The dtype of the resulting column, either ‘Float32’ or ‘Float64’.

Type:: str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> transformer = RatioTransformer(columns=["a", "b"], return_dtype="Float32")
>>> transformer.columns
['a', 'b']
>>> transformer.return_dtype
'Float32'

```

FITS = False

get_feature_names_out() → list[str][source]

Get the names of the output features.

Returns:: List containing the name of the new column created by the transformation.
Return type:: list[str]

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:: JSON representation of the transformer, including init parameters.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> ratio_transformer = RatioTransformer(columns=["a", "b"], return_dtype="Float32")
>>> ratio_transformer.to_json()
{'tubular_version': ..., 'classname': 'RatioTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'return_dtype': 'Float32'}, 'fit': {'is_fitted_': True}}

```

Transform the DataFrame by applying the division operation between two columns.

Parameters:: X (DataFrame) – DataFrame containing the columns to operate on.
Returns:: Transformed DataFrame with the new column containing the division results.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = RatioTransformer(columns=["a", "b"], return_dtype="Float32")
>>> test_df = pl.DataFrame({"a": [100, 200, 300], "b": [80, 150, 200]})
>>> transformer.transform(test_df)
shape: (3, 3)
┌─────┬─────┬────────────────┐
│ a   ┆ b   ┆ a_divided_by_b │
│ --- ┆ --- ┆ ---            │
│ i64 ┆ i64 ┆ f32            │
╞═════╪═════╪════════════════╡
│ 100 ┆ 80  ┆ 1.25           │
│ 200 ┆ 150 ┆ 1.333333       │
│ 300 ┆ 200 ┆ 1.5            │
└─────┴─────┴────────────────┘

```
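A plain-Python sketch of the division follows, including the output naming seen above and a rough emulation of return_dtype="Float32" via struct. The helpers ratio_column and to_float32 are hypothetical, and the sketch does not model how the library handles zero or null denominators.

```python
import struct

# Hypothetical sketch of RatioTransformer.transform on row dicts;
# not tubular's actual code.


def to_float32(value):
    """Round a Python float to the nearest float32, mimicking return_dtype='Float32'."""
    return struct.unpack("f", struct.pack("f", value))[0]


def ratio_column(rows, columns, return_dtype="Float32"):
    """Divide columns[0] by columns[1] and store it as '<num>_divided_by_<den>'."""
    numerator, denominator = columns
    name = f"{numerator}_divided_by_{denominator}"
    out = []
    for row in rows:
        # A zero denominator raises ZeroDivisionError here; the library may differ.
        value = row[numerator] / row[denominator]
        if return_dtype == "Float32":
            value = to_float32(value)
        out.append({**row, name: value})
    return out


result = ratio_column([{"a": 100, "b": 80}, {"a": 200, "b": 150}], ["a", "b"])
print([row["a_divided_by_b"] for row in result])
```

Rounding to float32 is why 200 / 150 is displayed as 1.333333 in the f32 column above.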

class tubular.RemoveCharactersTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], characters: list[str], **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer class to remove characters from text columns.

characters

list of characters to remove from text columns.

Type:: list[str]

characters_formatted

characters attr formatted into regex string.

Type:: str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:: bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RemoveCharactersTransformer(columns=["a"], characters=["\\d"])
>>> transformer
RemoveCharactersTransformer(characters=['\\d'], columns=['a'])

>>> json_dump = transformer.to_json()
>>> pprint(json_dump)
{'classname': 'RemoveCharactersTransformer',
 'fit': {'is_fitted_': False},
 'init': {'characters': ['\\d'],
          'columns': ['a'],
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

>>> RemoveCharactersTransformer.from_json(json_dump)
RemoveCharactersTransformer(characters=['\\d'], columns=['a'])

```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict whose 'init' and 'fit' entries hold the attributes set at init and fit respectively.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RemoveCharactersTransformer(columns=["a", "b"], characters=["a"])

>>> pprint(transformer.to_json())
{'classname': 'RemoveCharactersTransformer',
 'fit': {'is_fitted_': False},
 'init': {'characters': ['a'],
          'columns': ['a', 'b'],
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

Strip unwanted characters from specified columns.

Parameters:: X (DataFrame) – Data containing columns to strip.
Returns:: X – Transformed input X with characters stripped from specified columns.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [" 8hi!", None, "9999hello "]})
>>> transformer = RemoveCharactersTransformer(columns=["a"], characters=["\\W", "\\s"])
>>> transformer.transform(test_df)
shape: (3, 1)
┌───────────┐
│ a         │
│ ---       │
│ str       │
╞═══════════╡
│ 8hi       │
│ null      │
│ 9999hello │
└───────────┘

```
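Each entry of characters is used as a regular expression, which is why "\\W" and "\\s" strip punctuation and whitespace in the example above. The sketch below assumes the entries are joined into a single alternation with |; how tubular actually builds characters_formatted may differ, and the helper name remove_characters is hypothetical.

```python
import re

# Hypothetical sketch; assumes the characters are joined into one regex
# alternation, which may differ from how tubular builds characters_formatted.


def remove_characters(values, characters):
    """Delete every regex match of any entry in characters from each string."""
    pattern = "|".join(characters)
    return [None if value is None else re.sub(pattern, "", value) for value in values]


print(remove_characters([" 8hi!", None, "9999hello "], [r"\W", r"\s"]))
# ['8hi', None, '9999hello']
```

Under this sketch, a character with special regex meaning (such as .) must be escaped to be removed literally, for example with re.escape(".").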

class tubular.RenameColumnsTransformer(columns: ]] | str, new_column_names: dict[str, str], drop_original: bool = True, **kwargs: bool)[source]

Bases: BaseTransformer, DropOriginalMixin

Transformer to rename a given set of columns.

This can be useful for customising the automatically generated output names of other transformers, or, with drop_original = False, for creating several versions of a given column that follow separate paths of logic in a pipeline (the rename expression effectively creates a duplicate of the column).

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RenameColumnsTransformer(
...     columns="a", new_column_names={"a": "new_a"}
... )  # noqa: E501
>>> transformer
RenameColumnsTransformer(columns=['a'], new_column_names={'a': 'new_a'})

>>> # transformer can also be dumped to json and reinitialised

>>> json_dump = transformer.to_json()
>>> pprint(json_dump, sort_dicts=True)
{'classname': 'RenameColumnsTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a'],
          'copy': False,
          'drop_original': True,
          'new_column_names': {'a': 'new_a'},
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

>>> RenameColumnsTransformer.from_json(json_dump)
RenameColumnsTransformer(columns=['a'], new_column_names={'a': 'new_a'})

```

FITS = False

get_feature_names_out() → list[str][source]

List features modified/created by the transformer.

Returns:: list of features modified/created by the transformer
Return type:: list[str]

Examples

```pycon
>>> transformer = RenameColumnsTransformer(
...     columns=["a", "b"],
...     new_column_names={"a": "new_a", "b": "new_b"},
... )

>>> transformer.get_feature_names_out()
['new_a', 'new_b']

```

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict whose 'init' and 'fit' entries hold the attributes set at init and fit respectively.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RenameColumnsTransformer(
...     columns="a", new_column_names={"a": "new_a"}
... )  # noqa: E501
>>> pprint(transformer.to_json(), sort_dicts=True)
{'classname': 'RenameColumnsTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a'],
          'copy': False,
          'drop_original': True,
          'new_column_names': {'a': 'new_a'},
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

Rename columns by creating renamed copies; the originals are dropped if drop_original = True.

Parameters:: X (DataFrame) – Data containing the columns to rename.
Returns:: X – Transformed input X with the specified columns renamed.
Return type:: DataFrame
Raises:: ValueError – if any of the new_column_names values are already present in X.

Examples

```pycon
>>> import polars as pl

>>> transformer = RenameColumnsTransformer(
...     columns="a", new_column_names={"a": "new_a"}
... )  # noqa: E501

>>> test_df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

>>> transformer.transform(test_df)
shape: (3, 2)
┌─────┬───────┐
│ b   ┆ new_a │
│ --- ┆ ---   │
│ i64 ┆ i64   │
╞═════╪═══════╡
│ 4   ┆ 1     │
│ 5   ┆ 2     │
│ 6   ┆ 3     │
└─────┴───────┘

```
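The renaming logic, including the clash check described under Raises, can be sketched on row dicts. The helper rename_columns is hypothetical and checks clashes against the first row's keys only; it appends renamed columns at the end, matching the column order in the example above.

```python
# Hypothetical sketch of RenameColumnsTransformer.transform on row dicts;
# not tubular's actual code.


def rename_columns(rows, new_column_names, drop_original=True):
    """Copy each column to its new name, optionally dropping the original."""
    if rows:
        clashes = set(new_column_names.values()) & set(rows[0])
        if clashes:
            raise ValueError(f"new column names already present: {sorted(clashes)}")
    out = []
    for row in rows:
        new_row = dict(row)
        for old, new in new_column_names.items():
            new_row[new] = row[old]  # the new column is appended at the end
            if drop_original:
                del new_row[old]
        out.append(new_row)
    return out


print(rename_columns([{"a": 1, "b": 4}], {"a": "new_a"}))  # [{'b': 4, 'new_a': 1}]
print(rename_columns([{"a": 1}], {"a": "a_copy"}, drop_original=False))
# [{'a': 1, 'a_copy': 1}]
```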

class tubular.SetValueTransformer

Bases: BaseTransformer

Transformer to set value of column(s) to a given value.

This should be used if columns need to be set to a constant value.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:: bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:: bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:: bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:: bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:: bool

Examples

```pycon
>>> SetValueTransformer(columns="a", value=1)
SetValueTransformer(columns=['a'], value=1)

```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict whose 'init' and 'fit' entries hold the attributes set at init and fit respectively.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = SetValueTransformer(columns="a", value=1)
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'SetValueTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'value': 1}, 'fit': {'is_fitted_': True}}

```

Set columns to value.

Parameters:: X (DataFrame) – Data containing the columns to set.
Returns:: X – Transformed input X with columns set to value.
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = SetValueTransformer(columns="a", value=1)

>>> test_df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

>>> transformer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i32 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 1   ┆ 5   │
│ 1   ┆ 6   │
└─────┴─────┘

```

class tubular.ToDatetimeTransformer(columns: str | list[str], time_format: str | None = None, **kwargs: bool)[source]

Bases: BaseTransformer

Transformer to convert specified columns to datetime.

Parses the specified string columns to datetime, using time_format if it is supplied.

Attributes:

built_from_json: bool: indicates if transformer was reconstructed from json, which limits its supported functionality to .transform
polars_compatible: bool: class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework
jsonable: bool: class attribute, indicates if transformer supports to/from_json methods
FITS: bool: class attribute, indicates whether transform requires fit to be run first
lazyframe_compatible: bool: class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> transformer = ToDatetimeTransformer(
...     columns="a",
...     time_format="%d/%m/%Y",
... )
>>> transformer
ToDatetimeTransformer(columns=['a'], time_format='%d/%m/%Y')

>>> # version will vary for local vs CI, so use ... as generic match
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'ToDatetimeTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'time_format': '%d/%m/%Y'}, 'fit': {'is_fitted_': True}}

```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:: jsonified transformer. Nested dict whose 'init' and 'fit' entries hold the attributes set at init and fit respectively.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = ToDatetimeTransformer(columns="a", time_format="%d/%m/%Y")

>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'ToDatetimeTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'time_format': '%d/%m/%Y'}, 'fit': {'is_fitted_': True}}

```

Convert specified columns to datetime.

Parameters:: X (DataFrame) – Data with column to transform.
Returns:: dataframe with provided columns converted to datetime
Return type:: DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = ToDatetimeTransformer(
...     columns="a",
...     time_format="%d/%m/%Y",
... )

>>> test_df = pl.DataFrame({"a": ["01/02/2020", "10/12/1996"], "b": [1, 2]})

>>> transformer.transform(test_df)
shape: (2, 2)
┌─────────────────────┬─────┐
│ a                   ┆ b   │
│ ---                 ┆ --- │
│ datetime[μs]        ┆ i64 │
╞═════════════════════╪═════╡
│ 2020-02-01 00:00:00 ┆ 1   │
│ 1996-12-10 00:00:00 ┆ 2   │
└─────────────────────┴─────┘

```
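The same parsing can be reproduced with the standard library's datetime.strptime; the %d/%m/%Y directives are day-first in both, so "01/02/2020" is 1 February 2020. This is a hypothetical stdlib illustration on a Python list, not how tubular operates on dataframe columns, and it does not cover time_format=None.

```python
from datetime import datetime

# Hypothetical stdlib sketch of the parsing step; tubular works on
# dataframe columns rather than Python lists.


def to_datetime(values, time_format):
    """Parse each string with time_format, passing None through unchanged."""
    return [
        None if value is None else datetime.strptime(value, time_format)
        for value in values
    ]


print(to_datetime(["01/02/2020", "10/12/1996"], "%d/%m/%Y"))
# [datetime.datetime(2020, 2, 1, 0, 0), datetime.datetime(1996, 12, 10, 0, 0)]
```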

class tubular.WhenThenOtherwiseTransformer(columns: ]], when_column: str, then_column: str, **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer to apply conditional logic across multiple columns.

For each of the specified columns, rows where when_column is True are replaced with the corresponding value from then_column; all other rows keep their original value.

polars_compatible

Indicates whether transformer has been converted to polars/pandas agnostic narwhals framework.

Type:: bool

FITS

Indicates whether transform requires fit to be run first.

Type:: bool

jsonable

Indicates if transformer supports to/from_json methods.

Type:: bool

lazyframe_compatible

Indicates whether transformer works with lazyframes.

Type:: bool

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [4, 5, 6],
...         "condition_col": [True, False, True],
...         "update_col": [10, 20, 30],
...     }
... )
>>> transformer = WhenThenOtherwiseTransformer(
...     columns=["a", "b"], when_column="condition_col", then_column="update_col"
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 4)
┌─────┬─────┬───────────────┬────────────┐
│ a   ┆ b   ┆ condition_col ┆ update_col │
│ --- ┆ --- ┆ ---           ┆ ---        │
│ i64 ┆ i64 ┆ bool          ┆ i64        │
╞═════╪═════╪═══════════════╪════════════╡
│ 10  ┆ 10  ┆ true          ┆ 10         │
│ 2   ┆ 5   ┆ false         ┆ 20         │
│ 30  ┆ 30  ┆ true          ┆ 30         │
└─────┴─────┴───────────────┴────────────┘

```

FITS = False

get_transform_exprs() → list[Expr][source]

Get transform expressions.

Returns:: transform expressions for class
Return type:: list[nw.Expr]

jsonable = True

lazyframe_compatible = True

polars_compatible = True

to_json() → dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:: JSON representation of the transformer, including init parameters.
Return type:: dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = WhenThenOtherwiseTransformer(
...     columns=["a", "b"],
...     when_column="condition_col",
...     then_column="update_col",  # noqa: E501
... )
>>> pprint(transformer.to_json(), sort_dicts=True)
{'classname': 'WhenThenOtherwiseTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a', 'b'],
          'copy': False,
          'return_native': True,
          'then_column': 'update_col',
          'verbose': False,
          'when_column': 'condition_col'},
 'tubular_version': ...}

```

Apply conditional logic to transform specified columns.

Parameters:: X (DataFrame) – DataFrame containing the columns to be transformed.
Returns:: Transformed DataFrame with updated columns based on conditions.
Return type:: DataFrame
Raises:: TypeError – If the when_column is not of type Boolean or if columns have mismatched types.

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [4, 5, 6],
...         "condition_col": [True, False, True],
...         "update_col": [10, 20, 30],
...     }
... )
>>> transformer = WhenThenOtherwiseTransformer(
...     columns=["a", "b"],
...     when_column="condition_col",
...     then_column="update_col",
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 4)
┌─────┬─────┬───────────────┬────────────┐
│ a   ┆ b   ┆ condition_col ┆ update_col │
│ --- ┆ --- ┆ ---           ┆ ---        │
│ i64 ┆ i64 ┆ bool          ┆ i64        │
╞═════╪═════╪═══════════════╪════════════╡
│ 10  ┆ 10  ┆ true          ┆ 10         │
│ 2   ┆ 5   ┆ false         ┆ 20         │
│ 30  ┆ 30  ┆ true          ┆ 30         │
└─────┴─────┴───────────────┴────────────┘

```
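The row-wise rule can be sketched in plain Python. The helper when_then_otherwise is hypothetical; it assumes that a null in when_column behaves like False (the row keeps its original values), which this documentation does not state.

```python
# Hypothetical sketch of WhenThenOtherwiseTransformer.transform on row dicts;
# not tubular's actual code.


def when_then_otherwise(rows, columns, when_column, then_column):
    """Where when_column is True, overwrite each column with then_column's value."""
    out = []
    for row in rows:
        new_row = dict(row)
        if row[when_column] is True:  # False or None leaves the row unchanged
            for col in columns:
                new_row[col] = row[then_column]
        out.append(new_row)
    return out


rows = [
    {"a": 1, "b": 4, "condition_col": True, "update_col": 10},
    {"a": 2, "b": 5, "condition_col": False, "update_col": 20},
    {"a": 3, "b": 6, "condition_col": True, "update_col": 30},
]
for row in when_then_otherwise(rows, ["a", "b"], "condition_col", "update_col"):
    print(row["a"], row["b"])
# 10 10
# 2 5
# 30 30
```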
