tubular package

Submodules

tubular.base module

Contains transformers that other transformers in the package inherit from.

These transformers contain key checks to be applied in all cases.

class tubular.base.BaseTransformer(columns: ]] | str, copy: bool = False, verbose: bool = False, return_native: bool = True)

Bases: BaseEstimator, TransformerMixin

Base transformer class which all other transformers in the package inherit from.

Provides fit and transform methods (required by sklearn transformers), simple input checking and functionality to copy X prior to transform.

Attributes:

columns: list

A list of str values giving the columns in the input DataFrame that the transformer will be applied to. A single str passed at init is converted to a one-element list.

copy: bool

Whether X should be copied before transforms are applied. The copy argument is no longer used and will be deprecated in a future release.

verbose: bool

Whether to print statements showing which methods are being run.

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

return_native: bool, default = True

Controls whether transformer returns narwhals or native pandas/polars type

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> BaseTransformer(
...     columns="a",
... )
BaseTransformer(columns=['a'])
```

FITS = True
check_is_fitted(attribute: str) → None

Check if particular attributes are on the object.

This is useful to do before running transform to avoid trying to transform data without first running the fit method.

Wrapper for the sklearn.utils.validation.check_is_fitted function.

Parameters:

attribute (str) – Name of the attribute to check exists on self.

Example

```pycon
>>> transformer = BaseTransformer(
...     columns="a",
... )
>>> transformer.check_is_fitted("columns")

```

classname() → str

Return the name of the current class when called.

Returns:

name of class

Return type:

str

columns_check(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) → None

Check that the columns attribute is set and all values are present in X.

Parameters:

X (DataFrame) – Data to check columns are in.

Raises:

ValueError – If any of the columns are missing from the dataframe.

Examples

```pycon
>>> import polars as pl
>>> transformer = BaseTransformer(
...     columns="a",
... )
>>> df = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> transformer.columns_check(df)

```
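The two documented column-handling behaviours, wrapping a single str in a list (as in `BaseTransformer(columns="a")` giving `columns=['a']`) and raising ValueError when a column is missing, can be sketched in plain Python. This is a hypothetical illustration rather than tubular's implementation: `normalise_columns` and `check_columns_present` are invented names, and the dataframe is stood in for by its list of column names.

```python
from __future__ import annotations


def normalise_columns(columns: str | list[str]) -> list[str]:
    """Hypothetical helper: wrap a single column name in a list."""
    # A bare string becomes a one-element list; any other iterable is copied to a list.
    return [columns] if isinstance(columns, str) else list(columns)


def check_columns_present(columns: list[str], frame_columns: list[str]) -> None:
    """Hypothetical helper mirroring columns_check: columns must be set and present."""
    if not columns:
        raise ValueError("columns attribute is not set")
    missing = [col for col in columns if col not in frame_columns]
    if missing:
        raise ValueError(f"columns {missing} not found in X")


cols = normalise_columns("a")
check_columns_present(cols, ["a", "b"])  # passes silently when all columns are present
```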

fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) → BaseTransformer

Check data before fit.

Fit calls the columns_check method, which checks that the columns attribute is set and all values are present in X.

Parameters:
  • X (DataFrame) – Data to fit the transformer on.

  • y (None or Series or LazyFrame, default = None) – Optional argument only required for the transformer to work with sklearn pipelines.

Returns:

self

Return type:

BaseTransformer

Examples

```pycon
>>> import polars as pl
>>> transformer = BaseTransformer(
...     columns="a",
... )
>>> df = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> transformer.fit(df)
BaseTransformer(columns=['a'])
```

classmethod from_json(json: dict[str, Any]) → BaseTransformer

Rebuild transformer from json dict, ready for transform.

Parameters:

json (dict[str, dict[str, Any]]) – json-ified transformer

Returns:

reconstructed transformer class, ready for transform

Return type:

BaseTransformer

Raises:

RuntimeError – If the transformer does not have to/from json functionality enabled.

Examples

```pycon
>>> json_dict = {"init": {"columns": ["a", "b"]}, "fit": {}}
>>> BaseTransformer.from_json(json=json_dict)
BaseTransformer(columns=['a', 'b'])

```

get_feature_names_out() → list[str]

List features modified/created by the transformer.

Child classes will need to override this method if their behaviour is more complex than just returning the input columns.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = BaseTransformer(
...     columns="a",
... )
>>> transformer.get_feature_names_out()
['a']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
set_transform_request(*, return_native_override: bool | None | str = '$UNCHANGED$') → BaseTransformer

Configure whether metadata should be requested to be passed to the transform method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

return_native_override (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_native_override parameter in transform.

Returns:

self – The updated object.

Return type:

object

to_json() → dict[str, dict[str, Any]]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Raises:

RuntimeError – If the transformer does not have to/from json functionality enabled.

Examples

```pycon
>>> transformer = BaseTransformer(columns=["a", "b"])
>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'BaseTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True}, 'fit': {'is_fitted_': False}}

```
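The nested layout shown above, with init arguments and fitted attributes stored at separate 'init' and 'fit' levels, can be illustrated with a toy class. This is a minimal sketch under stated assumptions, not tubular's code: `ToyTransformer` and its `means_` attribute are hypothetical, the 'tubular_version' key is omitted, and from_json is assumed to re-call `__init__` with the 'init' level and then set each 'fit' entry as an attribute.

```python
from __future__ import annotations

import json
from typing import Any


class ToyTransformer:
    """Hypothetical stand-in showing an init/fit JSON round trip (not tubular's code)."""

    def __init__(self, columns: list[str], verbose: bool = False) -> None:
        self.columns = columns
        self.verbose = verbose
        self.is_fitted_ = False
        self.built_from_json = False

    def fit(self, data: dict[str, list[float]]) -> ToyTransformer:
        # Learn a simple per-column statistic so there is fitted state to serialise.
        self.means_ = {c: sum(data[c]) / len(data[c]) for c in self.columns}
        self.is_fitted_ = True
        return self

    def to_json(self) -> dict[str, Any]:
        # Init arguments and fitted attributes are stored at separate levels.
        fit_level: dict[str, Any] = {"is_fitted_": self.is_fitted_}
        if self.is_fitted_:
            fit_level["means_"] = self.means_
        return {
            "classname": type(self).__name__,
            "init": {"columns": self.columns, "verbose": self.verbose},
            "fit": fit_level,
        }

    @classmethod
    def from_json(cls, json_dict: dict[str, Any]) -> ToyTransformer:
        # Rebuild from init arguments, then restore fitted attributes directly.
        obj = cls(**json_dict["init"])
        for name, value in json_dict["fit"].items():
            setattr(obj, name, value)
        obj.built_from_json = True
        return obj


dumped = ToyTransformer(columns=["a"]).fit({"a": [1.0, 3.0]}).to_json()
restored = ToyTransformer.from_json(json.loads(json.dumps(dumped)))
```

Keeping the two levels apart means a rebuilt object never needs to see training data: its constructor receives only init arguments, and learnt values are restored verbatim.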

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, return_native_override: bool | None = None) → DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame

Check data before child transform.

Transform calls the columns_check method, which checks that the columns in the columns attribute are present in X.

Parameters:
  • X (DataFrame) – Data to transform with the transformer.

  • return_native_override (Optional[bool]) – option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Input X, copied if specified by user.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = BaseTransformer(
...     columns="a",
... )
>>> df = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> transformer.transform(df)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 2   ┆ 4   │
└─────┴─────┘

```

class tubular.base.DataFrameMethodTransformer(**kwargs)

Bases: DropOriginalMixin, BaseTransformer

Transformer that applies a pandas.DataFrame method.

Transformer assigns the output of the method to a new column or columns. Other keyword arguments, stored in the pd_method_kwargs attribute, are passed to the pandas.DataFrame method when it is called in transform.

Be aware that it is possible to supply incompatible arguments to init that will only be identified when transform is run. This is because there are many combinations of method, input and output sizes. Additionally, some methods may only work as expected when called in transform with specific keyword arguments.

new_column_names

The name of the column or columns to be assigned to the output of running the pandas method in transform.

Type:

str or list of str

pd_method_name

The name of the pandas.DataFrame method to call.

Type:

str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) → DataFrame

Transform input data.

Uses the given pandas.DataFrame method and assigns the output back to the column or columns in X.

Any keyword arguments set in the pd_method_kwargs attribute are passed onto the pandas DataFrame method when calling it.

Parameters:

X (pd.DataFrame) – Data to transform.

Returns:

X – Input X with additional column or columns (self.new_column_names) added. These contain the output of running the pandas DataFrame method.

Return type:

pd.DataFrame

tubular.base.register(cls: BaseTransformer) → BaseTransformer

Add transformer to registry dict.

Returns:

cls - transformer

Example:

```pycon
>>> @register
... class MyTransformer(BaseTransformer):
...     pass
...
>>> CLASS_REGISTRY["MyTransformer"]
<class 'tubular.base.MyTransformer'>
```
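The registration pattern can be sketched as a name-keyed dictionary plus a decorator that returns the class unchanged. The sketch below reuses the documented names `register` and `CLASS_REGISTRY` for readability, but it is standalone and hypothetical; `build_from_name` is an invented helper showing one assumed use, resolving a stored class name (such as the 'classname' key of a to_json dump) back to a class.

```python
from __future__ import annotations

from typing import Any

# Hypothetical standalone registry, keyed by class name.
CLASS_REGISTRY: dict[str, type] = {}


def register(cls: type) -> type:
    """Add cls to CLASS_REGISTRY under its class name and return it unchanged."""
    CLASS_REGISTRY[cls.__name__] = cls
    return cls


@register
class MyTransformer:
    def __init__(self, columns: list[str] | None = None) -> None:
        self.columns = columns


def build_from_name(classname: str, **init_kwargs: Any) -> Any:
    """Hypothetical helper: look up a registered class by name and instantiate it."""
    return CLASS_REGISTRY[classname](**init_kwargs)


rebuilt = build_from_name("MyTransformer", columns=["a"])
```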

tubular.capping module

Contains transformers that apply capping to numeric columns.

class tubular.capping.BaseCappingTransformer(capping_values: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, quantiles: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, weights_column: str | None = None, **kwargs: bool)

Bases: BaseNumericTransformer, WeightColumnMixin

Base class for capping transformers, contains functionality shared across capping transformer classes.

capping_values

Capping values to apply to each column, capping_values argument.

Type:

dict[str, CappingValues] or None

quantiles

Quantiles to set capping values at from input data. Will be empty after init, values populated when fit is run.

Type:

dict[str, CappingValues] or None

quantile_capping_values

Capping values learned from quantiles (if provided) to apply to each column.

Type:

dict[str, CappingValues] or None

weights_column

weights_column argument.

Type:

str or None

_replacement_values

Replacement values when capping is applied. Will be a copy of capping_values.

Type:

dict[str, CappingValues]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

FITS = True
check_capping_values_dict(capping_values_dict: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]], dict_name: str) → None

Check that the passed capping_values or quantiles dictionary is valid.

Parameters:
  • capping_values_dict (dict[str, CappingValues]) – dict of form {column_name: [lower_cap, upper_cap]}

  • dict_name (str) – ‘capping_values’ or ‘quantiles’

Raises:

ValueError – If capping values are invalid, e.g. lower_cap > upper_cap.

Examples

```pycon
>>> transformer = BaseCappingTransformer(
...     capping_values={"a": [10, 20], "b": [1, 3]},
... )
>>> transformer.check_capping_values_dict(transformer.capping_values, "capping_values")

```
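The kind of validation described here can be sketched as follows. Only the lower > upper rule is stated in the source; the other rules (exactly two entries, not both None, quantiles within [0, 1], the last taken from weighted_quantile's requirement that quantiles lie between 0 and 1) are assumptions for illustration, and `check_bounds_dict` is a hypothetical name, not tubular's method.

```python
from __future__ import annotations


def check_bounds_dict(bounds: dict[str, list[float | None]], dict_name: str) -> None:
    """Hypothetical validator for {column_name: [lower, upper]} dictionaries."""
    for column, pair in bounds.items():
        # Assumed rule: exactly one lower and one upper entry per column.
        if len(pair) != 2:
            raise ValueError(f"{dict_name}[{column!r}] should have 2 entries, got {len(pair)}")
        lower, upper = pair
        # Assumed rule: at least one side must be set.
        if lower is None and upper is None:
            raise ValueError(f"{dict_name}[{column!r}] has neither a lower nor an upper value")
        # Documented rule: a lower value must not exceed the upper value.
        if lower is not None and upper is not None and lower > upper:
            raise ValueError(f"{dict_name}[{column!r}]: lower value {lower} > upper value {upper}")
        # Assumed rule: quantiles must lie in [0, 1].
        if dict_name == "quantiles":
            for q in (lower, upper):
                if q is not None and not 0 <= q <= 1:
                    raise ValueError(f"{dict_name}[{column!r}]: quantile {q} outside [0, 1]")
```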

fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) → BaseCappingTransformer

Learn capping values from input data X.

Calculates the quantiles to cap at given the quantiles dictionary supplied when initialising the transformer. Saves learnt values in the quantile_capping_values and _replacement_values attributes.

Parameters:
  • X (DataFrame) – A dataframe with required columns to be capped.

  • y (Series or LazyFrame or None, default = None) – Optional argument only required for the transformer to work with sklearn pipelines.

Returns:

fitted instance of class

Return type:

BaseCappingTransformer

Examples

```pycon
>>> import polars as pl
>>> transformer = BaseCappingTransformer(
...     quantiles={"a": [0.01, 0.99], "b": [0.05, 0.95]},
... )
>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})
>>> test_target = pl.Series(name="target", values=[5, 6, 7, 8])
>>> transformer.fit(test_df, test_target)
BaseCappingTransformer(quantiles={'a': [0.01, 0.99], 'b': [0.05, 0.95]})

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() → dict

Return a JSON-serializable representation of the transformer.

Returns:

Dictionary containing all necessary attributes to recreate the transformer with from_json. Keys include ‘init’ (initialization parameters) and ‘fit’ (fitted values).

Return type:

dict

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, return_native_override: bool | None = None) → DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame

Apply capping to columns in X.

For each column, any values above the upper cap are set to the upper cap, and any values below the lower cap are set to the lower cap. Only works for numeric columns.

Parameters:
  • X (DataFrame) – Data to apply capping to.

  • return_native_override (Optional[bool]) – Option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Transformed input X with min and max capping applied to the specified columns.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = BaseCappingTransformer(
...     capping_values={"a": [10, 20], "b": [1, 3]},
... )
>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})
>>> transformer.transform(test_df)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 10  ┆ 3   ┆ 1   │
│ 15  ┆ 2   ┆ 2   │
│ 18  ┆ 3   ┆ 3   │
│ 20  ┆ 1   ┆ 4   │
└─────┴─────┴─────┘

```
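The capping rule can be sketched on plain Python lists. In this hypothetical sketch (`apply_caps` is an invented name, and tubular itself works on dataframes rather than lists), values below the lower cap are replaced by the first replacement value and values above the upper cap by the second; a None cap is assumed to mean no capping on that side, and nulls pass through. Passing the caps themselves as replacements mirrors CappingTransformer, whose _replacement_values is documented as a copy of capping_values, while passing nulls mirrors OutOfRangeNullTransformer.

```python
from __future__ import annotations


def apply_caps(
    data: dict[str, list[float | None]],
    capping_values: dict[str, list[float | None]],
    replacement_values: dict[str, list[float | None]],
) -> dict[str, list[float | None]]:
    """Hypothetical sketch: replace out-of-range values column by column."""
    out: dict[str, list[float | None]] = {}
    for col, values in data.items():
        if col not in capping_values:
            out[col] = list(values)  # columns without caps pass through unchanged
            continue
        lower, upper = capping_values[col]
        low_repl, high_repl = replacement_values[col]
        new_values: list[float | None] = []
        for v in values:
            if v is not None and lower is not None and v < lower:
                new_values.append(low_repl)
            elif v is not None and upper is not None and v > upper:
                new_values.append(high_repl)
            else:
                new_values.append(v)  # in range, uncapped side, or null
        out[col] = new_values
    return out


data = {"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]}
caps = {"a": [10, 20], "b": [1, 3]}
capped = apply_caps(data, caps, caps)  # CappingTransformer-style replacement
nulled = apply_caps(data, caps, {col: [None, None] for col in caps})  # null replacement
```

The bounds are inclusive here: a value equal to a cap (such as 1 in column b) is left as-is, which matches both documented example outputs.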

weighted_quantile(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, quantiles: list[int | float], values_column: str, weights_column: str) → list[int | float | None]

Calculate weighted quantiles.

This method is adapted from the “Completely vectorized numpy solution” answer from user Alleo (https://stackoverflow.com/users/498892/alleo) to the following stackoverflow question: https://stackoverflow.com/questions/21844024/weighted-percentile-using-numpy. This method is also licenced under the CC-BY-SA terms, as the original code sample posted to stackoverflow (pre February 1, 2016) was.

Method is similar to numpy.percentile, but supports weights. Supplied quantiles should be in the range [0, 1]. Method calculates cumulative % of weight for each observation, then interpolates between these observations to calculate the desired quantiles. Null values in the observations (values) and 0 weight observations are filtered out before calculating.

Parameters:
  • X (DataFrame) – Dataframe with relevant columns to calculate quantiles from.

  • quantiles (list[Number]) – Weighted quantiles to calculate. Must all be between 0 and 1.

  • values_column (str) – name of relevant values column in data

  • weights_column (str) – name of relevant weight column in data

Returns:

interp_quantiles – List containing computed quantiles.

Return type:

list[Number]

Examples

```pycon
>>> import polars as pl
>>> x = CappingTransformer(capping_values={"a": [2, 10]})
>>> df = pl.DataFrame({"a": [1, 2, 3], "weight": [1, 1, 1]})
>>> quantiles_to_compute = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
>>> computed_quantiles = x.weighted_quantile(
...     X=df, values_column="a", weights_column="weight", quantiles=quantiles_to_compute
... )
>>> [round(q, 1) for q in computed_quantiles]
[1.0, 1.0, 1.0, 1.0, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3.0]
>>> df = pl.DataFrame({"a": [1, 2, 3], "weight": [0, 1, 0]})
>>> computed_quantiles = x.weighted_quantile(
...     X=df, values_column="a", weights_column="weight", quantiles=quantiles_to_compute
... )
>>> [round(q, 1) for q in computed_quantiles]
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
>>> df = pl.DataFrame({"a": [1, 2, 3], "weight": [1, 1, 0]})
>>> computed_quantiles = x.weighted_quantile(
...     X=df, values_column="a", weights_column="weight", quantiles=quantiles_to_compute
... )
>>> [round(q, 1) for q in computed_quantiles]
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
>>> df = pl.DataFrame({"a": [1, 2, 3, 4, 5], "weight": [1, 1, 1, 1, 1]})
>>> computed_quantiles = x.weighted_quantile(
...     X=df, values_column="a", weights_column="weight", quantiles=quantiles_to_compute
... )
>>> [round(q, 1) for q in computed_quantiles]
[1.0, 1.0, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
>>> df = pl.DataFrame({"a": [1, 2, 3, 4, 5], "weight": [1, 0, 1, 0, 1]})
>>> computed_quantiles = x.weighted_quantile(
...     X=df, values_column="a", weights_column="weight", quantiles=[0, 0.5, 1.0]
... )
>>> [round(q, 1) for q in computed_quantiles]
[1.0, 2.0, 5.0]

```
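A pure-Python sketch of the calculation described above, assuming each observation's position is its cumulative weight divided by the total weight, and that values are interpolated linearly between positions and clamped at the ends as numpy.interp does. This assumption reproduces the outputs listed in the examples above; it is an illustration, not tubular's implementation, and `weighted_quantile_sketch` is a hypothetical name.

```python
from __future__ import annotations

from bisect import bisect_right
from itertools import accumulate


def weighted_quantile_sketch(
    values: list[float | None], weights: list[float], quantiles: list[float]
) -> list[float]:
    """Hypothetical pure-Python weighted quantiles (illustration, not tubular's code)."""
    # Drop null values and zero-weight observations, then sort by value.
    pairs = sorted((v, w) for v, w in zip(values, weights) if v is not None and w != 0)
    if not pairs:
        raise ValueError("no observations with a non-null value and non-zero weight")
    sorted_values = [v for v, _ in pairs]
    # Position of each observation: cumulative weight as a share of the total (last is 1.0).
    cumulative = list(accumulate(w for _, w in pairs))
    positions = [c / cumulative[-1] for c in cumulative]

    result = []
    for q in quantiles:
        if q <= positions[0]:
            result.append(float(sorted_values[0]))  # clamp below, as numpy.interp does
        elif q >= positions[-1]:
            result.append(float(sorted_values[-1]))  # clamp above
        else:
            i = bisect_right(positions, q)  # positions[i - 1] <= q < positions[i]
            x0, x1 = positions[i - 1], positions[i]
            y0, y1 = sorted_values[i - 1], sorted_values[i]
            result.append(y0 + (q - x0) * (y1 - y0) / (x1 - x0))
    return result
```

For values [1, 2, 3] with equal weights the positions are 1/3, 2/3 and 1, so the 0.5 quantile falls halfway between 1 and 2, giving 1.5.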

class tubular.capping.CappingTransformer(capping_values: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, quantiles: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, weights_column: str | None = None, **kwargs: bool)

Bases: BaseCappingTransformer

Transformer to cap numeric values at a minimum value, a maximum value, or both.

For max capping any values above the cap value will be set to the cap. Similarly for min capping any values below the cap will be set to the cap. Only works for numeric columns.

Attributes:

capping_values: dict[str, CappingValues] or None

Capping values to apply to each column, capping_values argument.

quantiles: dict[str, CappingValues] or None

Quantiles to set capping values at from input data. Will be empty after init, values populated when fit is run.

quantile_capping_values: dict[str, CappingValues] or None

Capping values learned from quantiles (if provided) to apply to each column.

weights_column: str or None

weights_column argument.

_replacement_values: dict[str, CappingValues]

Replacement values when capping is applied. Will be a copy of capping_values.

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> import polars as pl
>>> transformer = CappingTransformer(
...     capping_values={"a": [10, 20], "b": [1, 3]},
... )
>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})
>>> transformer.transform(test_df)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 10  ┆ 3   ┆ 1   │
│ 15  ┆ 2   ┆ 2   │
│ 18  ┆ 3   ┆ 3   │
│ 20  ┆ 1   ┆ 4   │
└─────┴─────┴─────┘
>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'CappingTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'capping_values': {'a': [10, 20], 'b': [1, 3]}, 'quantiles': None, 'weights_column': None}, 'fit': {'is_fitted_': False}}
>>> CappingTransformer.from_json(json_dump)
CappingTransformer(capping_values={'a': [10, 20], 'b': [1, 3]})

```

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) → CappingTransformer

Learn capping values from input data X.

Calculates the quantiles to cap at given the quantiles dictionary supplied when initialising the transformer. Saves learnt values in the capping_values attribute.

Parameters:
  • X (DataFrame) – A dataframe with required columns to be capped.

  • y (None) – Required for pipeline.

Returns:

fitted instance of class

Return type:

CappingTransformer

Example

```pycon
>>> import polars as pl
>>> transformer = CappingTransformer(
...     quantiles={"a": [0.01, 0.99], "b": [0.05, 0.95]},
... )
>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})
>>> transformer.fit(test_df)
CappingTransformer(quantiles={'a': [0.01, 0.99], 'b': [0.05, 0.95]})

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
class tubular.capping.OutOfRangeNullTransformer(capping_values: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, quantiles: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, weights_column: str | None = None, **kwargs: bool)

Bases: BaseCappingTransformer

Transformer to set values outside of a range to null.

This transformer sets the cut-off values in the same way as the CappingTransformer: either the user specifies them directly in the capping_values argument, or they are calculated in the fit method when the user supplies the quantiles argument.

Attributes:

capping_values: dict[str, CappingValues] or None

Capping values to apply to each column, capping_values argument.

quantiles: dict[str, CappingValues] or None

Quantiles to set capping values at from input data. Will be empty after init, values populated when fit is run.

quantile_capping_values: dict[str, CappingValues] or None

Capping values learned from quantiles (if provided) to apply to each column.

weights_column: str or None

weights_column argument.

_replacement_values: dict[str, CappingValues]

Replacement values when capping is applied. This will contain nulls for each column.

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> import polars as pl
>>> transformer = OutOfRangeNullTransformer(
...     capping_values={"a": [10, 20], "b": [1, 3]},
... )
>>> transformer
OutOfRangeNullTransformer(capping_values={'a': [10, 20], 'b': [1, 3]})

>>> # transform method is inherited, so also demo that here
>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})
>>> transformer.transform(test_df)
shape: (4, 3)
┌──────┬──────┬─────┐
│ a    ┆ b    ┆ c   │
│ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64 │
╞══════╪══════╪═════╡
│ null ┆ null ┆ 1   │
│ 15   ┆ 2    ┆ 2   │
│ 18   ┆ null ┆ 3   │
│ null ┆ 1    ┆ 4   │
└──────┴──────┴─────┘
>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'OutOfRangeNullTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'capping_values': {'a': [10, 20], 'b': [1, 3]}, 'quantiles': None, 'weights_column': None}, 'fit': {'is_fitted_': False}}
>>> OutOfRangeNullTransformer.from_json(json_dump)
OutOfRangeNullTransformer(capping_values={'a': [10, 20], 'b': [1, 3]})

```

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) → OutOfRangeNullTransformer

Learn capping values from input data X.

Calculates the quantiles to cap at given the quantiles dictionary supplied when initialising the transformer. Saves learnt values in the capping_values attribute.

Parameters:
  • X (DataFrame) – A dataframe with required columns to be capped.

  • y (None) – Required for pipeline.

Returns:

fitted instance of class

Return type:

OutOfRangeNullTransformer

Example

```pycon
>>> import polars as pl
>>> transformer = OutOfRangeNullTransformer(
...     quantiles={"a": [0.01, 0.99], "b": [0.05, 0.95]},
... )
>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})
>>> transformer.fit(test_df)
OutOfRangeNullTransformer(quantiles={'a': [0.01, 0.99], 'b': [0.05, 0.95]})

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
static set_replacement_values(capping_values: dict[str, list[int | float | None]]) → dict[str, list[bool | None]]

Set the _replacement_values to have all null values.

Keeps the keys of capping_values and sets every value in the lists that is not None to None (null). Any None values, i.e. caps that were not set, become False.

Returns:

replacement values for OutOfRangeNullTransformer

Return type:

dict[str, list[bool | None]]

Examples

```pycon
>>> import polars as pl
>>> capping_values = {"a": [0.1, 0.2], "b": [None, 10]}
>>> OutOfRangeNullTransformer.set_replacement_values(capping_values)
{'a': [None, None], 'b': [False, None]}

```

tubular.comparison module

Module for comparing and conditionally updating provided columns.

class tubular.comparison.CompareTwoColumnsTransformer(columns: ]], condition: ]], **kwargs: bool | None)

Bases: BaseTransformer

Transformer to compare two columns and generate outcomes based on conditions.

This transformer evaluates a condition between two columns and generates an outcome based on the result.

polars_compatible

Indicates whether transformer has been converted to polars/pandas agnostic narwhals framework.

Type:

bool

FITS

Indicates whether transform requires fit to be run first.

Type:

bool

jsonable

Indicates if transformer supports to/from_json methods.

Type:

bool

lazyframe_compatible

Indicates whether transformer works with lazyframes.

Type:

bool

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})
>>> transformer = CompareTwoColumnsTransformer(
...     columns=["a", "b"],
...     condition=">",
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 3)
┌─────┬─────┬───────┐
│ a   ┆ b   ┆ a>b   │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ i64 ┆ bool  │
╞═════╪═════╪═══════╡
│ 1   ┆ 3   ┆ false │
│ 2   ┆ 2   ┆ false │
│ 3   ┆ 1   ┆ true  │
└─────┴─────┴───────┘
```

FITS = False
jsonable = True
lazyframe_compatible = True
ops_map: ClassVar[dict[ConditionEnum, Any]] = {ConditionEnum.EQUAL_TO: <built-in function eq>, ConditionEnum.GREATER_THAN: <built-in function gt>, ConditionEnum.LESS_THAN: <built-in function lt>, ConditionEnum.NOT_EQUAL_TO: <built-in function ne>}
polars_compatible = True
to_json() → dict[str, dict[str, Any]]

Serialize the transformer to a JSON-compatible dictionary.

Returns:

JSON representation of the transformer, including init parameters.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = CompareTwoColumnsTransformer(
...     columns=["a", "b"],
...     condition=ConditionEnum.GREATER_THAN.value,
... )
>>> json_dict = transformer.to_json()
>>> from pprint import pprint
>>> pprint(json_dict, sort_dicts=True)
{'classname': 'CompareTwoColumnsTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a', 'b'],
          'condition': '>',
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) → DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame

Transform two columns based on a condition to generate an outcome.

Parameters:

X (DataFrame) – DataFrame containing the columns to be transformed.

Returns:

Transformed DataFrame with the new outcome column.

Return type:

DataFrame

Raises:

TypeError – If the columns are not of a numeric type.

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})
>>> transformer = CompareTwoColumnsTransformer(
...     columns=["a", "b"],
...     condition=">",
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 3)
┌─────┬─────┬───────┐
│ a   ┆ b   ┆ a>b   │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ i64 ┆ bool  │
╞═════╪═════╪═══════╡
│ 1   ┆ 3   ┆ false │
│ 2   ┆ 2   ┆ false │
│ 3   ┆ 1   ┆ true  │
└─────┴─────┴───────┘
```

class tubular.comparison.ConditionEnum(*values)

Bases: Enum

Enumeration of comparison conditions.

EQUAL_TO = '=='
GREATER_THAN = '>'
LESS_THAN = '<'
NOT_EQUAL_TO = '!='
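The ops_map class attribute pairs each ConditionEnum member with a function from Python's built-in operator module (eq, gt, lt, ne). The element-wise comparison this enables can be sketched on plain lists. `Condition`, `OPS` and `compare_columns` are hypothetical stand-ins, not tubular's API, and naming the output column by placing the condition string between the two column names is inferred from the `a>b` column in the examples.

```python
from __future__ import annotations

import operator
from enum import Enum
from typing import Any


class Condition(Enum):
    """Hypothetical mirror of ConditionEnum's members and values."""

    EQUAL_TO = "=="
    GREATER_THAN = ">"
    LESS_THAN = "<"
    NOT_EQUAL_TO = "!="


# Same pairing as ops_map: each condition maps to a function from the operator module.
OPS = {
    Condition.EQUAL_TO: operator.eq,
    Condition.GREATER_THAN: operator.gt,
    Condition.LESS_THAN: operator.lt,
    Condition.NOT_EQUAL_TO: operator.ne,
}


def compare_columns(
    data: dict[str, list[Any]], columns: list[str], condition: str
) -> dict[str, list[Any]]:
    """Add a boolean column comparing columns[0] with columns[1] row by row."""
    left, right = columns
    op = OPS[Condition(condition)]  # Condition(">") looks the member up by its value
    out = dict(data)  # shallow copy so the input dict is not modified
    out[f"{left}{condition}{right}"] = [op(a, b) for a, b in zip(data[left], data[right])]
    return out
```

Looking the member up by value means an unsupported condition string such as ">=" fails immediately with a ValueError from the Enum lookup.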
class tubular.comparison.EqualityChecker(**kwargs)

Bases: DropOriginalMixin, BaseTransformer

Transformer to check if two columns are equal.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = True
get_feature_names_out() → list[str]

Get list of features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> # returns the new column name rather than the input columns
>>> transformer = EqualityChecker(
...     columns=["a", "b"],
...     new_column_name="bla",
... )
>>> transformer.get_feature_names_out()
['bla']

```

jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: pd.DataFrame) → pd.DataFrame

Create a column which indicates equality between the given columns.

Parameters:

X (pd.DataFrame) – Data containing the columns to compare.

Returns:

X – Transformed input X with additional boolean column.

Return type:

pd.DataFrame

class tubular.comparison.WhenThenOtherwiseTransformer(columns: ]], when_column: str, then_column: str, **kwargs: bool | None)

Bases: BaseTransformer

Transformer to apply conditional logic across multiple columns.

This transformer updates the specified columns with the values from then_column wherever the boolean when_column is True, and leaves them unchanged otherwise.

polars_compatible

Indicates whether transformer has been converted to polars/pandas agnostic narwhals framework.

Type:

bool

FITS

Indicates whether transform requires fit to be run first.

Type:

bool

jsonable

Indicates if transformer supports to/from_json methods.

Type:

bool

lazyframe_compatible

Indicates whether transformer works with lazyframes.

Type:

bool

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [4, 5, 6],
...         "condition_col": [True, False, True],
...         "update_col": [10, 20, 30],
...     }
... )
>>> transformer = WhenThenOtherwiseTransformer(
...     columns=["a", "b"], when_column="condition_col", then_column="update_col"
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 4)
┌─────┬─────┬───────────────┬────────────┐
│ a   ┆ b   ┆ condition_col ┆ update_col │
│ --- ┆ --- ┆ ---           ┆ ---        │
│ i64 ┆ i64 ┆ bool          ┆ i64        │
╞═════╪═════╪═══════════════╪════════════╡
│ 10  ┆ 10  ┆ true          ┆ 10         │
│ 2   ┆ 5   ┆ false         ┆ 20         │
│ 30  ┆ 30  ┆ true          ┆ 30         │
└─────┴─────┴───────────────┴────────────┘
```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:

JSON representation of the transformer, including init parameters.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = WhenThenOtherwiseTransformer(
...     columns=["a", "b"],
...     when_column="condition_col",
...     then_column="update_col",
... )
>>> pprint(transformer.to_json(), sort_dicts=True)
{'classname': 'WhenThenOtherwiseTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a', 'b'],
          'copy': False,
          'return_native': True,
          'then_column': 'update_col',
          'verbose': False,
          'when_column': 'condition_col'},
 'tubular_version': ...}
```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Apply conditional logic to transform specified columns.

Parameters:

X (DataFrame) – DataFrame containing the columns to be transformed.

Returns:

Transformed DataFrame with updated columns based on conditions.

Return type:

DataFrame

Raises:

TypeError – If the when_column is not of type Boolean or if columns have mismatched types.

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [4, 5, 6],
...         "condition_col": [True, False, True],
...         "update_col": [10, 20, 30],
...     }
... )
>>> transformer = WhenThenOtherwiseTransformer(
...     columns=["a", "b"],
...     when_column="condition_col",
...     then_column="update_col",
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 4)
┌─────┬─────┬───────────────┬────────────┐
│ a   ┆ b   ┆ condition_col ┆ update_col │
│ --- ┆ --- ┆ ---           ┆ ---        │
│ i64 ┆ i64 ┆ bool          ┆ i64        │
╞═════╪═════╪═══════════════╪════════════╡
│ 10  ┆ 10  ┆ true          ┆ 10         │
│ 2   ┆ 5   ┆ false         ┆ 20         │
│ 30  ┆ 30  ┆ true          ┆ 30         │
└─────┴─────┴───────────────┴────────────┘
```

tubular.dates module

Contains transformers for working with date columns.

class tubular.dates.BaseDatetimeTransformer(columns: list[str] | str, new_column_name: str, drop_original: bool = False, **kwargs: bool | None)[source]

Bases: BaseGenericDateTransformer

Extends BaseGenericDateTransformer for scenarios where only datetime (not date) columns are accepted.

Attributes:

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> BaseDatetimeTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )
BaseDatetimeTransformer(columns=['a', 'b'], new_column_name='bla')
```

FITS = False
jsonable = False
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, return_native_override: bool | None = None) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Check types of selected columns in provided data.

Parameters:
  • X (DataFrame) – Data containing self.columns

  • return_native_override (Optional[bool]) – option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Validated data

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = BaseDatetimeTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.datetime(1993, 9, 27), datetime.datetime(2005, 10, 7)],
...         "b": [datetime.datetime(1991, 5, 22), datetime.datetime(2001, 12, 10)],
...     },
... )
>>> # base transform has no effect on data
>>> transformer.transform(test_df)
shape: (2, 2)
┌─────────────────────┬─────────────────────┐
│ a                   ┆ b                   │
│ ---                 ┆ ---                 │
│ datetime[μs]        ┆ datetime[μs]        │
╞═════════════════════╪═════════════════════╡
│ 1993-09-27 00:00:00 ┆ 1991-05-22 00:00:00 │
│ 2005-10-07 00:00:00 ┆ 2001-12-10 00:00:00 │
└─────────────────────┴─────────────────────┘
```

class tubular.dates.BaseGenericDateTransformer(columns: list[str] | str, new_column_name: str, drop_original: bool = False, **kwargs: bool | None)[source]

Bases: DropOriginalMixin, BaseTransformer

Extends BaseTransformer for datetime/date scenarios.

Attributes:

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

return_native: bool, default = True

Controls whether transformer returns narwhals or native pandas/polars type

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> BaseGenericDateTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )
BaseGenericDateTransformer(columns=['a', 'b'], new_column_name='bla')
```

FITS = False
check_columns_are_date_or_datetime(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, datetime_only: bool) None[source]

Check types of provided columns.

Columns must be datetime or date type, depending on the datetime_only flag. If a column does not meet the expected type criteria, a TypeError is raised.

Parameters:
  • X (DataFrame) – Data to validate

  • datetime_only (bool) – Indicates whether ONLY datetime types are accepted

Raises:
  • TypeError – if non date/datetime types are found.

  • TypeError – if mismatched date/datetime types are found; types should be consistent.

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = BaseGenericDateTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.date(1993, 9, 27), datetime.date(2005, 10, 7)],
...         "b": [datetime.date(1991, 5, 22), datetime.date(2001, 12, 10)],
...     },
... )
>>> transformer.check_columns_are_date_or_datetime(test_df, datetime_only=False)

```
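The type rules above can be sketched in plain Python. This is an illustration under assumptions: tubular checks column dtypes, whereas this sketch inspects the Python values in each column, and `column_kind` and `check_date_or_datetime` are hypothetical helpers. Note that `datetime.datetime` is a subclass of `datetime.date`, so datetimes must be tested first.

```python
import datetime


def column_kind(values: list) -> str:
    """Classify a column of Python values as 'datetime' or 'date', else raise TypeError."""
    if all(isinstance(v, datetime.datetime) for v in values):
        return "datetime"
    # datetime.datetime subclasses datetime.date, so it is excluded explicitly here
    if all(
        isinstance(v, datetime.date) and not isinstance(v, datetime.datetime)
        for v in values
    ):
        return "date"
    raise TypeError("non date/datetime values found")


def check_date_or_datetime(
    data: dict[str, list], columns: list[str], datetime_only: bool
) -> None:
    kinds = {col: column_kind(data[col]) for col in columns}
    if datetime_only and any(kind != "datetime" for kind in kinds.values()):
        raise TypeError(f"only datetime columns are accepted, got {kinds}")
    if len(set(kinds.values())) > 1:
        raise TypeError(f"mismatched date/datetime types, got {kinds}")


dates = {
    "a": [datetime.date(1993, 9, 27), datetime.date(2005, 10, 7)],
    "b": [datetime.date(1991, 5, 22), datetime.date(2001, 12, 10)],
}
check_date_or_datetime(dates, ["a", "b"], datetime_only=False)  # passes silently
```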

get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> # base classes just return inputs
>>> transformer = BaseGenericDateTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )
>>> transformer.get_feature_names_out()
['a', 'b']
>>> # other classes return new columns
>>> transformer = DateDifferenceTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )
>>> transformer.get_feature_names_out()
['bla']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = BaseGenericDateTransformer(columns=["a", "b"], new_column_name="bla")
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'BaseGenericDateTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'bla', 'drop_original': False}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, datetime_only: bool = False, return_native_override: bool | None = None) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Validate data before transforming.

Parameters:
  • X (DataFrame) – Data containing self.columns

  • datetime_only (bool) – Indicates whether ONLY datetime types are accepted

  • return_native_override (Optional[bool]) – option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Validated data

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = BaseGenericDateTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.date(1993, 9, 27), datetime.date(2005, 10, 7)],
...         "b": [datetime.date(1991, 5, 22), datetime.date(2001, 12, 10)],
...     },
... )
>>> # base transform has no effect on data
>>> transformer.transform(test_df)
shape: (2, 2)
┌────────────┬────────────┐
│ a          ┆ b          │
│ ---        ┆ ---        │
│ date       ┆ date       │
╞════════════╪════════════╡
│ 1993-09-27 ┆ 1991-05-22 │
│ 2005-10-07 ┆ 2001-12-10 │
└────────────┴────────────┘

```

class tubular.dates.BetweenDatesTransformer(columns: ]], new_column_name: str, drop_original: bool = False, lower_inclusive: bool = True, upper_inclusive: bool = True, **kwargs: bool)[source]

Bases: BaseGenericDateTransformer

Transformer to generate a boolean column indicating if one date is between two others.

If column_lower is greater than column_upper in any row, the output for that row is null; no warning is raised.

Attributes:

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

column_lower: str

Name of the lower-bound date column. This attribute is not for use in any method; use ‘columns’ instead. It exists only to allow a string representation of the transformer.

column_upper: str

Name of the upper-bound date column. This attribute is not for use in any method; use ‘columns’ instead. It exists only to allow a string representation of the transformer.

column_between: str

Name of the column to check for values falling between column_lower and column_upper. This attribute is not for use in any method; use ‘columns’ instead. It exists only to allow a string representation of the transformer.

columns: list

Contains the names of the columns to compare, in the order [column_lower, column_between, column_upper].

new_column_name: str

new_column_name argument passed when initialising the transformer.

lower_inclusive: bool

lower_inclusive argument passed when initialising the transformer.

upper_inclusive: bool

upper_inclusive argument passed when initialising the transformer.

drop_original: bool

indicates whether to drop original columns.

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> BetweenDatesTransformer(
...     columns=["a", "b", "c"],
...     new_column_name="b_between_a_c",
...     lower_inclusive=True,
...     upper_inclusive=True,
... )
BetweenDatesTransformer(columns=['a', 'b', 'c'],
                        new_column_name='b_between_a_c')
```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = BetweenDatesTransformer(
...     columns=["a", "b", "c"],
...     new_column_name="b_between_a_c",
...     lower_inclusive=True,
...     upper_inclusive=False,
... )
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'BetweenDatesTransformer', 'init': {'columns': ['a', 'b', 'c'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'b_between_a_c', 'drop_original': False, 'lower_inclusive': True, 'upper_inclusive': False}, 'fit': {'is_fitted_': True}}
```

transform(X: FrameT) FrameT[source]

Transform - creates column indicating if middle date is between the other two.

Rows where the lower bound is greater than the upper bound will produce null in the resulting output column for that row.
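The per-row logic can be sketched in plain Python, with the inclusive flags switching between `<=` and `<` and invalid bounds producing `None` (null). This is a sketch assuming the column order [column_lower, column_between, column_upper]; `is_between` is a hypothetical helper, not tubular's implementation.

```python
import datetime
from typing import Optional


def is_between(
    lower: datetime.date,
    between: datetime.date,
    upper: datetime.date,
    lower_inclusive: bool = True,
    upper_inclusive: bool = True,
) -> Optional[bool]:
    """Hypothetical row-level sketch of the between check."""
    if lower > upper:
        return None  # invalid bounds give a null result rather than a warning
    above_lower = lower <= between if lower_inclusive else lower < between
    below_upper = between <= upper if upper_inclusive else between < upper
    return above_lower and below_upper


d = datetime.date
print(is_between(d(1990, 9, 27), d(1991, 5, 22), d(1993, 4, 20)))  # True
print(is_between(d(2010, 1, 1), d(2009, 1, 1), d(2008, 1, 1)))  # None
```

The three rows of the polars example below give True, False and None respectively under this sketch.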

Parameters:

X (pd/pl/nw.DataFrame) – Data to transform.

Returns:

X – Input X with additional column (self.new_column_name) added. This column is boolean and indicates if the middle column is between the other 2.

Return type:

pd/pl/nw.DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = BetweenDatesTransformer(
...     columns=["a", "b", "c"],
...     new_column_name="b_between_a_c",
...     lower_inclusive=True,
...     upper_inclusive=True,
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [
...             datetime.date(1990, 9, 27),
...             datetime.date(2005, 10, 7),
...             datetime.date(2010, 1, 1),
...         ],
...         "b": [
...             datetime.date(1991, 5, 22),
...             datetime.date(2001, 12, 10),
...             datetime.date(2009, 1, 1),
...         ],
...         "c": [
...             datetime.date(1993, 4, 20),
...             datetime.date(2007, 11, 8),
...             datetime.date(2008, 1, 1),
...         ],
...     },
... )
>>> transformer.transform(test_df)
shape: (3, 4)
┌────────────┬────────────┬────────────┬───────────────┐
│ a          ┆ b          ┆ c          ┆ b_between_a_c │
│ ---        ┆ ---        ┆ ---        ┆ ---           │
│ date       ┆ date       ┆ date       ┆ bool          │
╞════════════╪════════════╪════════════╪═══════════════╡
│ 1990-09-27 ┆ 1991-05-22 ┆ 1993-04-20 ┆ true          │
│ 2005-10-07 ┆ 2001-12-10 ┆ 2007-11-08 ┆ false         │
│ 2010-01-01 ┆ 2009-01-01 ┆ 2008-01-01 ┆ null          │
└────────────┴────────────┴────────────┴───────────────┘
```

class tubular.dates.DateDiffLeapYearTransformer(**kwargs)[source]

Bases: BaseGenericDateTransformer

Transformer to calculate the number of years between two dates.

Warning: this transformer is deprecated; use DateDifferenceTransformer instead.

columns

List of 2 columns. First column will be subtracted from second.

Type:

List[str]

new_column_name

Name given to calculated datediff column. If None then {column_upper}_{column_lower}_datediff will be used.

Type:

str, default = None

drop_original

Indicator whether to drop old columns during transform method.

Type:

bool

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = True
transform(X: FrameT) FrameT[source]

Calculate year gap between the two provided columns.

The result is written to a new column named new_column_name, and the original date columns are optionally dropped.

Parameters:

X (pd/pl/nw.DataFrame) – Data containing self.columns

Returns:

X – Input data with the year-gap column added (original date columns optionally dropped)

Return type:

pd/pl/nw.DataFrame
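A common leap-year-safe way to count whole years between two dates compares calendar components rather than dividing a day count by 365. The sketch below assumes that convention and that the first column is subtracted from the second; it is not taken from tubular's source, and `whole_years_between` is a hypothetical name. Under this convention a date of 29 February only completes a year on 1 March in non-leap years.

```python
import datetime


def whole_years_between(lower: datetime.date, upper: datetime.date) -> int:
    """Count complete years from lower to upper; negative if upper precedes lower."""
    if upper < lower:
        return -whole_years_between(upper, lower)
    years = upper.year - lower.year
    # one fewer year if upper falls before the anniversary of lower in its own year
    if (upper.month, upper.day) < (lower.month, lower.day):
        years -= 1
    return years


print(whole_years_between(datetime.date(1991, 5, 22), datetime.date(1993, 9, 27)))  # 2
print(whole_years_between(datetime.date(2000, 2, 29), datetime.date(2001, 2, 28)))  # 0
```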

class tubular.dates.DateDifferenceTransformer(columns: ]], new_column_name: str, units: ]] = 'D', drop_original: bool = False, custom_days_divider: int | None = None, **kwargs: bool)[source]

Bases: BaseGenericDateTransformer

Transformer to calculate the difference between two date fields in the specified units.

Attributes:

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> transformer = DateDifferenceTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
...     units="common_year",
... )
>>> transformer
DateDifferenceTransformer(columns=['a', 'b'], new_column_name='bla',
                          units='common_year')
>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'DateDifferenceTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'bla', 'drop_original': False, 'units': 'common_year', 'custom_days_divider': None}, 'fit': {'is_fitted_': True}}
>>> DateDifferenceTransformer.from_json(json_dump)
DateDifferenceTransformer(columns=['a', 'b'], new_column_name='bla',
                          units='common_year')

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = DateDifferenceTransformer(columns=["a", "b"], new_column_name="a_diff_b")
>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DateDifferenceTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'a_diff_b', 'drop_original': False, 'units': 'D', 'custom_days_divider': None}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Calculate the difference between the given fields in the specified units.

Parameters:

X (DataFrame) – Data containing self.columns

Returns:

dataframe with added date difference column

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = DateDifferenceTransformer(
...     columns=["a", "b"],
...     new_column_name="a_b_difference_years",
...     units="common_year",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.date(1993, 9, 27), datetime.date(2005, 10, 7)],
...         "b": [datetime.date(1991, 5, 22), datetime.date(2001, 12, 10)],
...     },
... )
>>> transformer.transform(test_df)
shape: (2, 3)
┌────────────┬────────────┬──────────────────────┐
│ a          ┆ b          ┆ a_b_difference_years │
│ ---        ┆ ---        ┆ ---                  │
│ date       ┆ date       ┆ f64                  │
╞════════════╪════════════╪══════════════════════╡
│ 1993-09-27 ┆ 1991-05-22 ┆ -2.353425            │
│ 2005-10-07 ┆ 2001-12-10 ┆ -3.827397            │
└────────────┴────────────┴──────────────────────┘

```
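The example output above can be reproduced in plain Python: the time between the two dates (second minus first) is divided by the length of the requested unit, so 859 days / 365 ≈ 2.353425 common years. The unit lengths below are assumptions inferred from the names in DateDifferenceUnitsOptions, not values copied from tubular's source; "lunar_month" is left out because its length is not stated here, and `date_difference` is a hypothetical helper.

```python
import datetime
from typing import Optional

# Assumed length of each unit in seconds; "lunar_month" is omitted because its
# exact length is not stated in these docs.
SECONDS_PER_UNIT = {
    "D": 86_400,
    "h": 3_600,
    "m": 60,
    "s": 1,
    "week": 7 * 86_400,
    "fortnight": 14 * 86_400,
    "common_year": 365 * 86_400,
}


def date_difference(
    first: datetime.date,
    second: datetime.date,
    units: str = "D",
    custom_days_divider: Optional[int] = None,
) -> float:
    """Hypothetical sketch: second minus first, expressed in the requested units."""
    seconds = (second - first).total_seconds()
    if units == "custom_days":
        if custom_days_divider is None:
            raise ValueError("units='custom_days' needs custom_days_divider")
        return seconds / (custom_days_divider * 86_400)
    return seconds / SECONDS_PER_UNIT[units]


a = datetime.date(1993, 9, 27)
b = datetime.date(1991, 5, 22)
print(round(date_difference(a, b, units="common_year"), 6))  # -2.353425
```

Working in seconds keeps the hour, minute and second conversions exact integer ratios instead of dividing by fractions of a day.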

class tubular.dates.DateDifferenceUnitsOptions(*values)[source]

Bases: str, Enum

Options for return units in DateDifferenceTransformer.

COMMON_YEAR = 'common_year'
CUSTOM_DAYS = 'custom_days'
DAYS = 'D'
FORTNIGHT = 'fortnight'
HOURS = 'h'
LUNAR_MONTH = 'lunar_month'
MINUTES = 'm'
SECONDS = 's'
WEEK = 'week'
class tubular.dates.DatetimeComponentExtractor(columns: str | list[str], include: ]], **kwargs: str | bool)[source]

Bases: BaseDatetimeTransformer

Transformer to extract numeric datetime components.

Attributes:

columns: List[str]

List of columns for processing

include: list of str

Which numeric datetime components to extract

polars_compatible: bool

Indicates whether transformer has been converted to polars/pandas agnostic framework

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

jsonable: bool

Indicates if transformer supports to/from_json methods

FITS: bool

Indicates whether transform requires fit to be run first

Example:

```pycon
>>> transformer = DatetimeComponentExtractor(
...     columns="a",
...     include=["hour", "day"],
... )
>>> transformer
DatetimeComponentExtractor(columns=['a'], include=['hour', 'day'])
>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'DatetimeComponentExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['hour', 'day']}, 'fit': {'is_fitted_': True}}
>>> DatetimeComponentExtractor.from_json(json_dump)
DatetimeComponentExtractor(columns=['a'], include=['hour', 'day'])

```

FITS = False
INCLUDE_OPTIONS: ClassVar[list[str]] = ['hour', 'day', 'month', 'year']
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

List of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = DatetimeComponentExtractor(
...     columns=["a", "b"],
...     include=["hour", "day"],
... )
>>> transformer.get_feature_names_out()
['a_hour', 'a_day', 'b_hour', 'b_day']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, Any][source]

Convert transformer to JSON format.

Returns:

JSON representation of the transformer

Return type:

dict

Examples

```pycon
>>> transformer = DatetimeComponentExtractor(
...     columns="a",
...     include=["hour", "day"],
... )
>>> transformer.to_json()
{'tubular_version': '...', 'classname': 'DatetimeComponentExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['hour', 'day']}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform - Extracts numeric datetime components.

Parameters:

X (DataFrame) – Data with columns to extract info from.

Returns:

X – Transformed input X with added columns of extracted information.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = DatetimeComponentExtractor(
...     columns="a",
...     include=["hour", "day"],
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [
...             datetime.datetime(1993, 9, 27, 14, 30),
...             datetime.datetime(2005, 10, 7, 9, 45),
...         ],
...         "b": [
...             datetime.datetime(1991, 5, 22, 18, 0),
...             datetime.datetime(2001, 12, 10, 23, 59),
...         ],
...     },
... )
>>> transformer.transform(test_df)
shape: (2, 4)
┌─────────────────────┬─────────────────────┬────────┬───────┐
│ a                   ┆ b                   ┆ a_hour ┆ a_day │
│ ---                 ┆ ---                 ┆ ---    ┆ ---   │
│ datetime[μs]        ┆ datetime[μs]        ┆ f32    ┆ f32   │
╞═════════════════════╪═════════════════════╪════════╪═══════╡
│ 1993-09-27 14:30:00 ┆ 1991-05-22 18:00:00 ┆ 14.0   ┆ 27.0  │
│ 2005-10-07 09:45:00 ┆ 2001-12-10 23:59:00 ┆ 9.0    ┆ 7.0   │
└─────────────────────┴─────────────────────┴────────┴───────┘

```
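The extraction and the output naming ({column}_{component}, looping over columns first) can be sketched in plain Python. This sketch works on Python datetime values and returns Python floats, whereas the polars example above produces f32 columns; `extract_components` is a hypothetical helper, not tubular's implementation.

```python
import datetime

COMPONENTS = ("hour", "day", "month", "year")  # mirrors INCLUDE_OPTIONS


def extract_components(
    data: dict[str, list], columns: list[str], include: list[str]
) -> dict[str, list]:
    """Hypothetical sketch: add one float column per (column, component) pair."""
    unknown = set(include) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unsupported components: {sorted(unknown)}")
    result = dict(data)
    for col in columns:  # outer loop over columns gives a_hour, a_day, b_hour, ...
        for component in include:
            # "hour", "day", "month" and "year" are all attributes of datetime values
            result[f"{col}_{component}"] = [
                float(getattr(value, component)) for value in data[col]
            ]
    return result


data = {
    "a": [datetime.datetime(1993, 9, 27, 14, 30), datetime.datetime(2005, 10, 7, 9, 45)],
}
out = extract_components(data, ["a"], ["hour", "day"])
print(out["a_hour"], out["a_day"])  # [14.0, 9.0] [27.0, 7.0]
```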

class tubular.dates.DatetimeComponentOptions(*values)[source]

Bases: str, Enum

Contains options for DatetimeComponentExtractor.

DAY = 'day'
HOUR = 'hour'
MONTH = 'month'
YEAR = 'year'
class tubular.dates.DatetimeInfoExtractor(columns: str | list[str], include: ]] | None = None, datetime_mappings: dict[~typing.Annotated[str, beartype.vale.Is[lambda s: ...]], dict[int, str]] | None = None, drop_original: bool | None = False, **kwargs: str | bool)[source]

Bases: BaseDatetimeTransformer

Transformer to extract categorical features (such as time of day) from datetime columns.

Attributes:

columns: List[str]

List of columns for processing

include: list of str, default = [“timeofday”, “timeofmonth”, “timeofyear”, “dayofweek”]

Which datetime categorical information to extract

datetime_mappings: dict, default = None

Optional custom mappings of datetime values to labels, keyed by include option in the same form as DEFAULT_MAPPINGS.

drop_original: bool

indicates whether to drop provided columns post transform

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> transformer = DatetimeInfoExtractor(
...     columns="a",
...     include="timeofday",
... )
>>> transformer
DatetimeInfoExtractor(columns=['a'], datetime_mappings={},
                      include=['timeofday'])
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DatetimeInfoExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['timeofday'], 'datetime_mappings': {}}, 'fit': {'is_fitted_': True}}

```

DATETIME_ATTR: ClassVar[dict[str, str]] = {'dayofweek': 'weekday', 'timeofday': 'hour', 'timeofmonth': 'day', 'timeofyear': 'month'}
DEFAULT_MAPPINGS: ClassVar[dict[str, dict[int, str]]] = {'dayofweek': {1: 'monday', 2: 'tuesday', 3: 'wednesday', 4: 'thursday', 5: 'friday', 6: 'saturday', 7: 'sunday'}, 'timeofday': {0: 'night', 1: 'night', 2: 'night', 3: 'night', 4: 'night', 5: 'night', 6: 'morning', 7: 'morning', 8: 'morning', 9: 'morning', 10: 'morning', 11: 'morning', 12: 'afternoon', 13: 'afternoon', 14: 'afternoon', 15: 'afternoon', 16: 'afternoon', 17: 'afternoon', 18: 'evening', 19: 'evening', 20: 'evening', 21: 'evening', 22: 'evening', 23: 'evening'}, 'timeofmonth': {1: 'start', 2: 'start', 3: 'start', 4: 'start', 5: 'start', 6: 'start', 7: 'start', 8: 'start', 9: 'start', 10: 'start', 11: 'middle', 12: 'middle', 13: 'middle', 14: 'middle', 15: 'middle', 16: 'middle', 17: 'middle', 18: 'middle', 19: 'middle', 20: 'middle', 21: 'end', 22: 'end', 23: 'end', 24: 'end', 25: 'end', 26: 'end', 27: 'end', 28: 'end', 29: 'end', 30: 'end', 31: 'end'}, 'timeofyear': {1: 'winter', 2: 'winter', 3: 'spring', 4: 'spring', 5: 'spring', 6: 'summer', 7: 'summer', 8: 'summer', 9: 'autumn', 10: 'autumn', 11: 'autumn', 12: 'winter'}}
FITS = False
INCLUDE_OPTIONS: ClassVar[list[str]] = ['timeofday', 'timeofmonth', 'timeofyear', 'dayofweek']
RANGE_TO_MAP: ClassVar[dict[str, set[int]]] = {'dayofweek': {1, 2, 3, 4, 5, 6, 7}, 'timeofday': {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}, 'timeofmonth': {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}, 'timeofyear': {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}}
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = DatetimeInfoExtractor(
...     columns=["a", "b"],
...     include=["timeofday", "timeofmonth"],
... )
>>> transformer.get_feature_names_out()
['a_timeofday', 'a_timeofmonth', 'b_timeofday', 'b_timeofmonth']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = DatetimeInfoExtractor(columns="a")
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DatetimeInfoExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['timeofday', 'timeofmonth', 'timeofyear', 'dayofweek'], 'datetime_mappings': {}}, 'fit': {'is_fitted_': True}}
```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform - Extracts new features from datetime variables.

Parameters:

X (DataFrame) – Data with columns to extract info from.

Returns:

X – Transformed input X with added columns of extracted information.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = DatetimeInfoExtractor(
...     columns="a",
...     include="timeofmonth",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.datetime(1993, 9, 27), datetime.datetime(2005, 10, 7)],
...         "b": [datetime.datetime(1991, 5, 22), datetime.datetime(2001, 12, 10)],
...     },
... )
>>> transformer.transform(test_df)
shape: (2, 3)
┌─────────────────────┬─────────────────────┬───────────────┐
│ a                   ┆ b                   ┆ a_timeofmonth │
│ ---                 ┆ ---                 ┆ ---           │
│ datetime[μs]        ┆ datetime[μs]        ┆ enum          │
╞═════════════════════╪═════════════════════╪═══════════════╡
│ 1993-09-27 00:00:00 ┆ 1991-05-22 00:00:00 ┆ end           │
│ 2005-10-07 00:00:00 ┆ 2001-12-10 00:00:00 ┆ start         │
└─────────────────────┴─────────────────────┴───────────────┘
```

class tubular.dates.DatetimeInfoOptions(*values)[source]

Bases: str, Enum

Options for what is returned by DatetimeInfoExtractor.

DAY_OF_WEEK = 'dayofweek'
TIME_OF_DAY = 'timeofday'
TIME_OF_MONTH = 'timeofmonth'
TIME_OF_YEAR = 'timeofyear'
class tubular.dates.DatetimeSinusoidCalculator(columns: str | list[str], method: ]], units: ]]], period: ]]] = 6.283185307179586, drop_original: bool = False, **kwargs: bool | str)[source]

Bases: BaseDatetimeTransformer

Calculate the sine or cosine of a datetime column in a given unit (e.g. hour).

Includes the option to scale period of the sine or cosine to match the natural period of the unit (e.g. 24).
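A hedged sketch of the calculation, assuming the unit value is rescaled so that one full cycle spans `period` units: with the default period of 2*np.pi the raw value is passed straight to sin or cos, while period=12 with units="month" gives one cycle per year. It uses the standard-library math module instead of numpy or polars; `sinusoid` and `feature_name` are hypothetical helpers, and the exact scaling tubular applies is an assumption.

```python
import math


def sinusoid(value: float, method: str, period: float = 2 * math.pi) -> float:
    """Hypothetical sketch: sin/cos of a time component with one cycle per `period` units."""
    angle = value * (2 * math.pi / period)  # with the default period this is just `value`
    if method == "sin":
        return math.sin(angle)
    if method == "cos":
        return math.cos(angle)
    raise ValueError("method must be 'sin' or 'cos'")


def feature_name(method: str, period: float, units: str, column: str) -> str:
    # follows the pattern returned by get_feature_names_out, e.g. 'sin_6.283185307179586_month_a'
    return f"{method}_{period}_{units}_{column}"


print(round(sinusoid(3, "sin", period=12), 6))  # 1.0 (March on a 12-month cycle)
```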

Attributes:

columns: str or list

Columns to take the sine or cosine of.

method: str or list

The function to be calculated; either sin, cos or a list containing both.

units: str or dict

Which time unit the calculation is to be carried out on. Will take any of ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘microsecond’. Can be a string or a dict containing key-value pairs of column name and units to be used for that column.

period: str, float or dict, default = 2*np.pi

The period of the output in the units specified above. Can be a single value or a dict containing key-value pairs of column name and period to be used for that column.

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )
DatetimeSinusoidCalculator(columns=['a'], method=['sin'], units='month')

```

FITS = False
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )

>>> transformer.get_feature_names_out()
['sin_6.283185307179586_month_a']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DatetimeSinusoidCalculator', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'method': ['sin'], 'units': 'month', 'period': 6.283185307179586}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, return_native_override: bool | None = None) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform - creates column containing sine or cosine of another datetime column.

Which function is used is stored in the self.method attribute.

Parameters:
  • X (pd/pl/nw.DataFrame) – Data to transform.

  • return_native_override (Optional[bool]) – Option to override return_native attr in transformer, useful when calling parent methods

Returns:

X (pd/pl/nw.DataFrame) – Input X with additional columns added, named “<method>_<period>_<units>_<original_column>” (for example sin_6.283185307179586_month_a).

Example

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.datetime(1993, 9, 27), datetime.datetime(2005, 10, 7)],
...         "b": [datetime.datetime(1991, 5, 22), datetime.datetime(2001, 12, 10)],
...     },
... )
>>> transformer.transform(test_df)
shape: (2, 3)
┌─────────────────────┬─────────────────────┬───────────────────────────────┐
│ a                   ┆ b                   ┆ sin_6.283185307179586_month_a │
│ ---                 ┆ ---                 ┆ ---                           │
│ datetime[μs]        ┆ datetime[μs]        ┆ f64                           │
╞═════════════════════╪═════════════════════╪═══════════════════════════════╡
│ 1993-09-27 00:00:00 ┆ 1991-05-22 00:00:00 ┆ 0.412118                      │
│ 2005-10-07 00:00:00 ┆ 2001-12-10 00:00:00 ┆ -0.544021                     │
└─────────────────────┴─────────────────────┴───────────────────────────────┘
```

class tubular.dates.DatetimeSinusoidUnitsOptions(*values)[source]

Bases: str, Enum

Options for units argument of DatetimeSinusoidCalculator.

DAY = 'day'
HOUR = 'hour'
MICROSECOND = 'microsecond'
MINUTE = 'minute'
MONTH = 'month'
SECOND = 'second'
YEAR = 'year'
class tubular.dates.MethodOptions(*values)[source]

Bases: str, Enum

Options for method arg of DatetimeSinusoidCalculator.

COS = 'cos'
SIN = 'sin'
class tubular.dates.SeriesDtMethodTransformer(**kwargs)[source]

Bases: BaseDatetimeTransformer

Transformer that applies a pandas.Series.dt method.

Transformer assigns the output of the method to a new column. Additional keyword arguments can be supplied through pd_method_kwargs; these are passed to the pandas.Series.dt method when transform is called.

Be aware that it is possible to supply incompatible arguments to init that will only be identified when transform is run. This is because there are many combinations of method, input and output sizes. Additionally, some methods may only work as expected when called in transform with specific keyword arguments.
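The dispatch pattern described above (look a method up by name, forward stored keyword arguments, assign the result to a new column) can be sketched without pandas. The helper apply_named_method is hypothetical and works on plain datetime.datetime values rather than a pandas.Series; "replace" is a standard-library datetime method chosen purely for illustration, not a pandas.Series.dt method.

```python
import datetime
from typing import Any, Optional


def apply_named_method(
    values: list,
    method_name: str,
    method_kwargs: Optional[dict[str, Any]] = None,
) -> list:
    """Hypothetical helper: call ``method_name`` on each value, forwarding ``method_kwargs``.

    Nothing is validated up front: a misspelt method name raises AttributeError
    and incompatible keyword arguments raise TypeError only when this runs.
    """
    kwargs = method_kwargs or {}
    return [getattr(value, method_name)(**kwargs) for value in values]


data = {
    "a": [datetime.datetime(1993, 9, 27, 13, 45), datetime.datetime(2005, 10, 7, 8, 5)],
}
# Stand-ins for new_column_name, pd_method_name and pd_method_kwargs.
new_column_name, method_name, method_kwargs = "a_first_of_month", "replace", {"day": 1}
data[new_column_name] = apply_named_method(data["a"], method_name, method_kwargs)
print(data[new_column_name])
```

Deferring every check to call time is what makes the late failures mentioned above possible: nothing about the method name or its arguments is known to be valid until the call is made.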

column

Name of column to apply transformer to. This attribute is not for use in any method; use columns instead. It is here only as a fix to allow string representation of the transformer.

Type:

str

columns

Column name for transformation.

Type:

str

new_column_name

The name of the column or columns to be assigned to the output of running the pandas method in transform.

Type:

str

pd_method_name

The name of the pandas.Series.dt method to call.

Type:

str

pd_method_kwargs

Dictionary of keyword arguments to call the pd.Series.dt method with.

Type:

dict

drop_original

Indicates whether to drop self.column post transform

Type:

bool

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Transform specific column on input pandas.DataFrame (X) using the given pandas.Series.dt method.

Any keyword arguments set in the pd_method_kwargs attribute are passed to the pd.Series.dt method when it is called.

Parameters:

X (pd.DataFrame) – Data to transform.

Returns:

X – Input X with additional column (self.new_column_name) added. These contain the output of running the pd.Series.dt method.

Return type:

pd.DataFrame

class tubular.dates.ToDatetimeTransformer(columns: str | list[str], time_format: str | None = None, **kwargs: bool)[source]

Bases: BaseTransformer

Class to convert specified columns to datetime.

Class simply uses the pd.to_datetime method on the specified columns.
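The time_format argument uses strftime/strptime-style directives (%d day, %m month, %Y four-digit year), the same convention as the format argument of pd.to_datetime. As a quick sanity check of a format string before handing it to the transformer, Python's datetime.strptime can parse the strings from the transform example further down. The helper parse_all is hypothetical and is not what the transformer runs internally.

```python
import datetime

# time_format taken from the ToDatetimeTransformer examples
TIME_FORMAT = "%d/%m/%Y"


def parse_all(values: list[str], time_format: str) -> list[datetime.datetime]:
    """Hypothetical helper: parse each string with the format; raises ValueError on a mismatch."""
    return [datetime.datetime.strptime(value, time_format) for value in values]


parsed = parse_all(["01/02/2020", "10/12/1996"], TIME_FORMAT)
# Day-first format: "01/02/2020" is 1 February 2020, matching the transform example.
print(parsed)
```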

Attributes:

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> transformer = ToDatetimeTransformer(
...     columns="a",
...     time_format="%d/%m/%Y",
... )
>>> transformer
ToDatetimeTransformer(columns=['a'], time_format='%d/%m/%Y')

>>> # version will vary for local vs CI, so use ... as generic match
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'ToDatetimeTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'time_format': '%d/%m/%Y'}, 'fit': {'is_fitted_': True}}

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = ToDatetimeTransformer(columns="a", time_format="%d/%m/%Y")

>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'ToDatetimeTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'time_format': '%d/%m/%Y'}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Convert specified column to datetime using pd.to_datetime.

Parameters:

X (DataFrame) – Data with column to transform.

Returns:

dataframe with provided columns converted to datetime

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = ToDatetimeTransformer(
...     columns="a",
...     time_format="%d/%m/%Y",
... )
>>> test_df = pl.DataFrame({"a": ["01/02/2020", "10/12/1996"], "b": [1, 2]})
>>> transformer.transform(test_df)
shape: (2, 2)
┌─────────────────────┬─────┐
│ a                   ┆ b   │
│ ---                 ┆ --- │
│ datetime[μs]        ┆ i64 │
╞═════════════════════╪═════╡
│ 2020-02-01 00:00:00 ┆ 1   │
│ 1996-12-10 00:00:00 ┆ 2   │
└─────────────────────┴─────┘

```

tubular.imputers module

Contains transformers that deal with imputation of missing values.

class tubular.imputers.ArbitraryImputer(impute_value: int | float | str | bool, columns: str | list[str], **kwargs: bool | None)[source]

Bases: BaseImputer

Transformer to impute null values with an arbitrary pre-defined value.
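In plain Python, the idea reduces to replacing every null with one fixed value while leaving other entries alone. This minimal sketch uses None as a stand-in for null and a hypothetical helper, impute_arbitrary; the data mirror the transform example further down.

```python
from typing import Any, Optional


def impute_arbitrary(values: list[Optional[Any]], impute_value: Any) -> list[Any]:
    """Hypothetical helper: replace every None (null) with the same fixed value."""
    return [impute_value if value is None else value for value in values]


data = {"a": [1, None, 2], "b": [3, None, 4]}
imputed = {column: impute_arbitrary(values, 5) for column, values in data.items()}
print(imputed)  # {'a': [1, 5, 2], 'b': [3, 5, 4]}
```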

impute_value

Value to impute nulls with.

Type:

int or float or str or bool

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> arbitrary_imputer = ArbitraryImputer(columns=["a", "b"], impute_value=5)
>>> arbitrary_imputer
ArbitraryImputer(columns=['a', 'b'], impute_value=5)

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = arbitrary_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'ArbitraryImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'impute_value': 5}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 5, 'b': 5}}}
>>> ArbitraryImputer.from_json(json_dump)
ArbitraryImputer(columns=['a', 'b'], impute_value=5)

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Impute missing values with the supplied impute_value.

Parameters:

X (DataFrame) – Data containing columns to impute.

Returns:

X (DataFrame) – Transformed input X with nulls imputed with the specified impute_value, for the specified columns.

Example

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = ArbitraryImputer(columns=["a", "b"], impute_value=5)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 5   ┆ 5   │
│ 2   ┆ 4   │
└─────┴─────┘
```

class tubular.imputers.BaseImputer(columns: list[str] | str, copy: bool = False, verbose: bool = False, return_native: bool = True)[source]

Bases: BaseTransformer

Contains a transform method that fills nulls with values from self.impute_values_.

Other imputers in this module should inherit from this class.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> BaseImputer(columns=["a", "b"])
BaseImputer(columns=['a', 'b'])

```

FITS = False
jsonable = False
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Raises:

RuntimeError – if the class is not jsonable.

Examples

```pycon
>>> import polars as pl
>>> arbitrary_imputer = ArbitraryImputer(columns=["a", "b"], impute_value=1)

>>> # version will vary for local vs CI, so use ... as generic match
>>> arbitrary_imputer.to_json()
{'tubular_version': ..., 'classname': 'ArbitraryImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'impute_value': 1}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 1, 'b': 1}}}
>>> mean_imputer = MeanImputer(columns=["a", "b"])
>>> test_df = pl.DataFrame({"a": [1, None], "b": [None, 2]})
>>> _ = mean_imputer.fit(test_df)
>>> mean_imputer.to_json()
{'tubular_version': ..., 'classname': 'MeanImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 1.0, 'b': 2.0}}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, return_native_override: bool | None = None) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Impute missing values with values calculated from fit method.

Parameters:
  • X (DataFrame) – Data to impute.

  • return_native_override (Optional[bool]) – option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Transformed input X with nulls imputed with the values stored in impute_values_ for the specified columns.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl

>>> imputer = BaseImputer(columns=["a", "b"])
>>> imputer.impute_values_ = {"a": 2, "b": 3.5}
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1   ┆ 3.0 │
│ 2   ┆ 3.5 │
│ 2   ┆ 4.0 │
└─────┴─────┘

```
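The same column-by-column fill driven by an impute_values_-style dict can be sketched in plain Python. The helper fill_from_impute_values is hypothetical, and leaving columns that have no entry untouched is an assumption of the sketch. Unlike polars, plain lists do not unify dtypes, so column b ends up mixing ints with 3.5 instead of being cast to f64 as in the example above.

```python
from typing import Any, Optional


def fill_from_impute_values(
    data: dict[str, list[Optional[Any]]],
    impute_values: dict[str, Any],
) -> dict[str, list[Any]]:
    """Hypothetical helper: fill nulls per column from an impute_values_-style dict.

    Columns not listed in impute_values are returned unchanged (an assumption).
    """
    filled = {}
    for column, values in data.items():
        if column in impute_values:
            fill = impute_values[column]
            filled[column] = [fill if v is None else v for v in values]
        else:
            filled[column] = list(values)
    return filled


result = fill_from_impute_values(
    {"a": [1, None, 2], "b": [3, None, 4]},
    {"a": 2, "b": 3.5},
)
print(result)  # {'a': [1, 2, 2], 'b': [3, 3.5, 4]}
```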

class tubular.imputers.MeanImputer(columns: str | list[str], weights_column: str | None = None, **kwargs: bool)[source]

Bases: WeightColumnMixin, BaseImputer

Transformer to impute missing values with the mean of the supplied columns.

impute_values_

Created during fit method. Dictionary of float / int (mean) values of columns in the columns attribute. Keys of impute_values_ give the column names.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> import polars as pl
>>> mean_imputer = MeanImputer(
...     columns=["a", "b"],
... )
>>> mean_imputer
MeanImputer(columns=['a', 'b'])

>>> # once fit, transformer can also be dumped to json and reinitialised
>>> test_df = pl.DataFrame({"a": [0, None], "b": [None, 1]})
>>> _ = mean_imputer.fit(test_df)
>>> json_dump = mean_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MeanImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 0.0, 'b': 1.0}}}
>>> MeanImputer.from_json(json_dump)
MeanImputer(columns=['a', 'b'])

```

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) MeanImputer[source]

Calculate mean values to impute with from X.

Parameters:
  • X (DataFrame) – Data to “learn” the mean values from.

  • y (Series or LazyFrame or None, default = None) – Not required.

Returns:

fitted class instance.

Return type:

MeanImputer

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = MeanImputer(columns=["a", "b"])
>>> imputer = imputer.fit(test_df)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 3.0 │
│ 1.5 ┆ 3.5 │
│ 2.0 ┆ 4.0 │
└─────┴─────┘

```
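The value learned by fit is the mean of the non-null entries of each column. The sketch below is a plain-Python illustration with a hypothetical helper, mean_ignoring_nulls. How weights_column enters the calculation is an assumption here (a weighted mean, sum(w * x) / sum(w), over rows whose value is not null), not confirmed by the text above.

```python
from typing import Optional


def mean_ignoring_nulls(
    values: list[Optional[float]],
    weights: Optional[list[float]] = None,
) -> float:
    """Hypothetical helper: (weighted) mean over the non-null entries only.

    With weights=None every row counts once; otherwise the result is
    sum(w * x) / sum(w) over rows whose value is not None (assumed behaviour
    of weights_column).
    """
    if weights is None:
        weights = [1.0] * len(values)
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no non-null values with non-zero total weight")
    return sum(v * w for v, w in pairs) / total_weight


impute_values = {
    "a": mean_ignoring_nulls([1, None, 2]),
    "b": mean_ignoring_nulls([3, None, 4]),
}
print(impute_values)  # {'a': 1.5, 'b': 3.5}
print(mean_ignoring_nulls([1, None, 2], weights=[1.0, 5.0, 3.0]))  # 1.75
```

The weight of the null row (5.0) is ignored, which is why the weighted result is (1 * 1 + 2 * 3) / (1 + 3) = 1.75.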

jsonable = True
lazyframe_compatible = True
polars_compatible = True
class tubular.imputers.MedianImputer(columns: str | list[str], weights_column: str | None = None, **kwargs: bool)[source]

Bases: BaseImputer, WeightColumnMixin

Transformer to impute missing values with the median of the supplied columns.

impute_values_

Created during fit method. Dictionary of float / int (median) values of columns in the columns attribute. Keys of impute_values_ give the column names.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> import polars as pl
>>> median_imputer = MedianImputer(
...     columns=["a", "b"],
... )
>>> median_imputer
MedianImputer(columns=['a', 'b'])

>>> # once fit, transformer can also be dumped to json and reinitialised
>>> test_df = pl.DataFrame({"a": [0, None], "b": [None, 1]})
>>> _ = median_imputer.fit(test_df)
>>> json_dump = median_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MedianImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 0.0, 'b': 1.0}}}
>>> MedianImputer.from_json(json_dump)
MedianImputer(columns=['a', 'b'])

```

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) MedianImputer[source]

Calculate median values to impute with from X.

Parameters:
  • X (DataFrame) – Data to “learn” the median values from.

  • y (Series or LazyFrame or None, default = None) – Not required.

Returns:

fitted class instance.

Return type:

MedianImputer

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = MedianImputer(columns=["a", "b"])
>>> imputer = imputer.fit(test_df)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 3.0 │
│ 1.5 ┆ 3.5 │
│ 2.0 ┆ 4.0 │
└─────┴─────┘

```
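For the unweighted case, the fitted value is the median of the non-null entries. A minimal plain-Python sketch follows, using a hypothetical helper, median_ignoring_nulls. It does not cover weights_column, whose weighted-median definition is not described above.

```python
import statistics
from typing import Optional


def median_ignoring_nulls(values: list[Optional[float]]) -> float:
    """Hypothetical helper: unweighted median of the non-null entries."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column contains only nulls")
    # statistics.median averages the two middle values for an even count.
    return statistics.median(present)


print(median_ignoring_nulls([1, None, 2]))  # 1.5
print(median_ignoring_nulls([3, None, 4, 10]))  # 4
```

With two non-null values, as in the example above, the median and the mean coincide (1.5 and 3.5); they diverge once the data are skewed, as the second call shows.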

jsonable = True
lazyframe_compatible = True
polars_compatible = True
class tubular.imputers.ModeImputer(columns: str | list[str], weights_column: str | None = None, **kwargs: bool)[source]

Bases: BaseImputer, WeightColumnMixin

Transformer to impute missing values with the mode of the supplied columns.

If mode is NaN, a warning will be raised.

impute_values_

Created during fit method. Dictionary of float / int (mode) values of columns in the columns attribute. Keys of impute_values_ give the column names.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> import polars as pl
>>> mode_imputer = ModeImputer(
...     columns=["a", "b"],
... )
>>> mode_imputer
ModeImputer(columns=['a', 'b'])

>>> # once fit, transformer can also be dumped to json and reinitialised
>>> test_df = pl.DataFrame({"a": [0, None], "b": [None, 1]})
>>> _ = mode_imputer.fit(test_df)
>>> json_dump = mode_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'ModeImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 0, 'b': 1}}}
>>> ModeImputer.from_json(json_dump)
ModeImputer(columns=['a', 'b'])

```

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) ModeImputer[source]

Calculate mode values to impute with from X.

In the event of a tie, the highest modal value will be returned.

Parameters:
  • X (DataFrame) – Data to “learn” the mode values from.

  • y (Series or LazyFrame or None, default = None) – Not required.

Returns:

fitted class instance

Return type:

ModeImputer

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = ModeImputer(columns=["a", "b"])
>>> imputer = imputer.fit(test_df)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 2   ┆ 4   │
│ 2   ┆ 4   │
└─────┴─────┘

```
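The tie-break rule stated for fit ("the highest modal value will be returned") explains the example output: in [1, None, 2] both values occur once, so 2 is chosen. A minimal unweighted sketch, using a hypothetical helper named mode_ignoring_nulls, is shown below; weights_column is not modelled.

```python
from collections import Counter
from typing import Any


def mode_ignoring_nulls(values: list) -> Any:
    """Hypothetical helper: most frequent non-null value, highest value on ties."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        raise ValueError("column contains only nulls")
    top = max(counts.values())
    # Several values can share the top count; pick the largest, as fit documents.
    return max(v for v, c in counts.items() if c == top)


print(mode_ignoring_nulls([1, None, 2]))  # 2 (tie between 1 and 2)
print(mode_ignoring_nulls([1, 1, 2, None]))  # 1
```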

jsonable = True
lazyframe_compatible = True
polars_compatible = True
class tubular.imputers.NearestMeanResponseImputer(**kwargs)[source]

Bases: BaseImputer

Impute nulls with the value where the average target is most similar to that for the nulls.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = True
deprecated = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series) NearestMeanResponseImputer[source]

Calculate mean values to impute with.

Parameters:
  • X (FrameT) – Data to fit the transformer on.

  • y (nw.Series) – Response column used to determine the value to impute with. The average response for each level of every column is calculated. The level which has the closest average response to the average response of the unknown levels is selected as the imputation value.

Returns:

fitted class instance

Return type:

NearestMeanResponseImputer

Raises:

ValueError – if provided y contains nulls.
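The selection rule described for the y parameter can be sketched for one column in plain Python: average the response per level, average the response over the null rows, and pick the level whose average is closest. The helper nearest_mean_response_level is hypothetical; returning None when a column has no nulls, and keeping the first level on an exact tie, are choices of this sketch rather than documented behaviour.

```python
from collections import defaultdict
from typing import Hashable, Optional


def nearest_mean_response_level(x: list, y: list) -> Optional[Hashable]:
    """Hypothetical helper: level of x whose mean y is closest to the mean y of the nulls."""
    if any(response is None for response in y):
        raise ValueError("y contains nulls")  # mirrors the documented ValueError
    level_sums: dict = defaultdict(float)
    level_counts: dict = defaultdict(int)
    null_responses = []
    for level, response in zip(x, y):
        if level is None:
            null_responses.append(response)
        else:
            level_sums[level] += response
            level_counts[level] += 1
    if not null_responses:
        return None  # assumption: nothing to impute for this column
    if not level_sums:
        raise ValueError("no non-null levels to choose from")
    null_mean = sum(null_responses) / len(null_responses)
    level_means = {level: level_sums[level] / level_counts[level] for level in level_sums}
    # min() keeps the first level encountered if two are equally close.
    return min(level_means, key=lambda level: abs(level_means[level] - null_mean))


x = [1, 1, 2, 2, None, None]
y = [1.0, 2.0, 5.0, 6.0, 5.0, 5.0]
print(nearest_mean_response_level(x, y))  # 2 (mean 5.5 is closest to the null mean 5.0)
```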

jsonable = False
lazyframe_compatible = False
polars_compatible = True
class tubular.imputers.NullIndicator(columns: list[str] | str, **kwargs: bool | None)[source]

Bases: BaseTransformer

Class to create a binary indicator column for null values.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> null_indicator = NullIndicator(
...     columns=["a", "b"],
... )
>>> null_indicator
NullIndicator(columns=['a', 'b'])

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = null_indicator.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'NullIndicator', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True}, 'fit': {'is_fitted_': True}}
>>> NullIndicator.from_json(json_dump)
NullIndicator(columns=['a', 'b'])

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Create new columns indicating the position of null values for each variable in self.columns.

Parameters:

X (DataFrame) – Data to add indicators to.

Returns:

dataframe with null indicator columns added

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> indicator = NullIndicator(columns=["a", "b"])
>>> indicator.transform(test_df)
shape: (3, 4)
┌──────┬──────┬─────────┬─────────┐
│ a    ┆ b    ┆ a_nulls ┆ b_nulls │
│ ---  ┆ ---  ┆ ---     ┆ ---     │
│ i64  ┆ i64  ┆ bool    ┆ bool    │
╞══════╪══════╪═════════╪═════════╡
│ 1    ┆ 3    ┆ false   ┆ false   │
│ null ┆ null ┆ true    ┆ true    │
│ 2    ┆ 4    ┆ false   ┆ false   │
└──────┴──────┴─────────┴─────────┘

```
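The indicator logic is a per-row null test stored under a "<column>_nulls" name, as in the output above. A minimal plain-Python sketch with a hypothetical helper, add_null_indicators, using None as the stand-in for null:

```python
from typing import Any, Optional


def add_null_indicators(
    data: dict[str, list[Optional[Any]]],
    columns: list[str],
) -> dict[str, list[Any]]:
    """Hypothetical helper: add a boolean "<column>_nulls" entry for each listed column."""
    result = dict(data)  # original columns are kept unchanged
    for column in columns:
        result[f"{column}_nulls"] = [value is None for value in data[column]]
    return result


out = add_null_indicators({"a": [1, None, 2], "b": [3, None, 4]}, ["a", "b"])
print(out["a_nulls"])  # [False, True, False]
```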

tubular.mapping module

Contains transformers that apply different types of mappings to columns.

class tubular.mapping.BaseCrossColumnMappingTransformer(**kwargs)[source]

Bases: BaseMappingTransformer

Extension of BaseMappingTransformer for cross-column mapping transformers.

adjust_column

Column containing the values to be adjusted.

Type:

str

mappings

Dictionary of mappings for each column individually to be applied to the adjust_column. The dict passed to mappings in init is set to the mappings attribute.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Check that X is valid for transform and call the parent transform.

Parameters:

X (pd.DataFrame) – Data to apply adjustments to.

Returns:

X – Transformed data X with adjustments applied to specified columns.

Return type:

pd.DataFrame

Raises:

ValueError – if provided adjust_column is not in DataFrame.

class tubular.mapping.BaseCrossColumnNumericTransformer(**kwargs)[source]

Bases: BaseCrossColumnMappingTransformer

Extension of BaseCrossColumnMappingTransformer for cross-column numerical mapping transformers.

adjust_column

Column containing the values to be adjusted.

Type:

str

mappings

Dictionary of mappings for each column individually to be applied to the adjust_column. The dict passed to mappings in init is set to the mappings attribute.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Check that X is valid for transform and call the parent transform.

Parameters:

X (pd.DataFrame) – Data to apply adjustments to.

Returns:

X – Transformed data X with adjustments applied to specified columns.

Return type:

pd.DataFrame

Raises:

TypeError – if provided columns are non-numeric.

class tubular.mapping.BaseMappingTransformMixin(columns: list[str] | str, copy: bool = False, verbose: bool = False, return_native: bool = True)[source]

Bases: BaseTransformer

Mixin class providing a transform method that applies mappings to columns.

Transformer uses the mappings attribute which should be a dict of dicts/mappings for each required column.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

FITS = False
jsonable = False
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, return_native_override: bool | None = None) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Apply mapping defined in the mappings dict to each column in the columns attribute.

Parameters:
  • X (DataFrame) – Data with nominal columns to transform.

  • return_native_override (Optional[bool]) – option to override return_native attr in transformer, useful when calling parent methods

Returns:

  • X (DataFrame) – Transformed input X with levels mapped according to mappings dict.

No doctest is currently included for this method, as the mixin is not intended to be used independently (it should be inherited).
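Applying a dict of per-column mappings can be sketched in plain Python with a hypothetical helper, apply_mappings. The mappings value is the same shape as in the BaseMappingTransformer example ({"a": {"Y": 1, "N": 0}}). Leaving values with no entry in the mapping unchanged is an assumption of this sketch, not documented behaviour of the mixin.

```python
from typing import Any


def apply_mappings(
    data: dict[str, list[Any]],
    mappings: dict[str, dict[Any, Any]],
) -> dict[str, list[Any]]:
    """Hypothetical helper: map each value of every column named in ``mappings``.

    Assumption: values with no entry in the column's mapping are left unchanged.
    """
    result = dict(data)
    for column, mapping in mappings.items():
        result[column] = [mapping.get(value, value) for value in data[column]]
    return result


mapped = apply_mappings({"a": ["Y", "N", "Y", "maybe"]}, {"a": {"Y": 1, "N": 0}})
print(mapped["a"])  # [1, 0, 1, 'maybe']
```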

class tubular.mapping.BaseMappingTransformer(mappings: dict[str, dict[Any, Any]], return_dtypes: dict[str, RETURN_DTYPES] | None = None, **kwargs: bool | None)[source]

Bases: BaseTransformer

Extension of BaseTransformer for mapping transformers.

mappings

Dictionary of mappings for each column individually. The dict passed to mappings in init is set to the mappings attribute.

Type:

dict

mappings_from_null

dict storing what null values will be mapped to. An imputer is generally the better tool for this, but this functionality is useful for inverting pipelines.

Type:

dict[str, Any]

return_dtypes

Dictionary of col:dtype for returned columns

Type:

dict[str, RETURN_DTYPES]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> BaseMappingTransformer(
...     mappings={"a": {"Y": 1, "N": 0}},
...     return_dtypes={"a": "Int8"},
... )
BaseMappingTransformer(mappings={'a': {'N': 0, 'Y': 1}},
                       return_dtypes={'a': 'Int8'})

```

FITS = False
RETURN_DTYPES

alias of Literal[‘String’, ‘Object’, ‘Categorical’, ‘Boolean’, ‘Int8’, ‘Int16’, ‘Int32’, ‘Int64’, ‘Float32’, ‘Float64’]
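The alias lends itself to simple validation of a return_dtypes dict. The sketch below assumes only the standard library; the helper `invalid_return_dtypes` is hypothetical and not part of tubular's API.

```python
from typing import Dict, Literal, get_args

# Local copy of the RETURN_DTYPES alias documented above.
RETURN_DTYPES = Literal[
    "String", "Object", "Categorical", "Boolean",
    "Int8", "Int16", "Int32", "Int64", "Float32", "Float64",
]


def invalid_return_dtypes(return_dtypes: Dict[str, str]) -> Dict[str, str]:
    """Return the col:dtype pairs whose dtype is not an allowed RETURN_DTYPES value.

    Hypothetical helper for illustration only.
    """
    allowed = set(get_args(RETURN_DTYPES))
    return {col: dtype for col, dtype in return_dtypes.items() if dtype not in allowed}


# Dtype names are case-sensitive, so "int8" is rejected while "Int8" is accepted.
print(invalid_return_dtypes({"a": "Int8", "b": "int8"}))  # {'b': 'int8'}
```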

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> mapping_transformer = BaseMappingTransformer(mappings={"a": {"x": 1}})
>>> mapping_transformer.to_json()
{'tubular_version': ..., 'classname': 'BaseMappingTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'mappings': {'a': {'x': 1}}, 'return_dtypes': {'a': 'Int64'}}, 'fit': {'is_fitted_': True}}

```
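The nested layout returned by to_json (constructor arguments under "init", fitted attributes under "fit") can be illustrated with a toy class. `ToyMappingTransformer` is hypothetical and only mimics the shape of the output shown above; it is not tubular's implementation of to_json or from_json.

```python
import json
from typing import Any, Dict, Optional


class ToyMappingTransformer:
    """Hypothetical stand-in mimicking the init/fit layout of to_json (not tubular's code)."""

    def __init__(
        self,
        mappings: Dict[str, Dict[Any, Any]],
        return_dtypes: Optional[Dict[str, str]] = None,
    ) -> None:
        self.mappings = mappings
        self.return_dtypes = return_dtypes or {}
        self.built_from_json = False

    def to_json(self) -> Dict[str, Any]:
        # "init" holds constructor arguments; "fit" holds attributes set during fit.
        return {
            "classname": type(self).__name__,
            "init": {"mappings": self.mappings, "return_dtypes": self.return_dtypes},
            "fit": {"is_fitted_": True},
        }

    @classmethod
    def from_json(cls, json_dict: Dict[str, Any]) -> "ToyMappingTransformer":
        obj = cls(**json_dict["init"])
        # Restore fitted attributes directly instead of re-running fit.
        for attr, value in json_dict["fit"].items():
            setattr(obj, attr, value)
        obj.built_from_json = True
        return obj


dumped = json.loads(json.dumps(ToyMappingTransformer({"a": {"x": 1}}).to_json()))
rebuilt = ToyMappingTransformer.from_json(dumped)
print(rebuilt.mappings, rebuilt.built_from_json)  # {'a': {'x': 1}} True
```

JSON object keys are always strings, so a mapping keyed by integers (for example {1: 2}) would come back as {'1': 2} after json.dumps and json.loads; string keys, as used here, survive the round trip unchanged.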

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, return_native_override: bool | None = None) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Check that the mappings dict has been fitted.

Parameters:
  • X (DataFrame) – Data to apply mappings to.

  • return_native_override (Optional[bool]) – option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Input X, copied if specified by user.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = BaseMappingTransformer(
...     mappings={"a": {"Y": 1, "N": 0}},
...     return_dtypes={"a": "Int8"},
... )
>>> test_df = pl.DataFrame({"a": ["Y", "N"], "b": [3, 4]})
>>> # base class transform has no effect on data
>>> transformer.transform(test_df)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ Y   ┆ 3   │
│ N   ┆ 4   │
└─────┴─────┘

```

class tubular.mapping.CrossColumnAddTransformer(**kwargs)[source]

Bases: BaseCrossColumnNumericTransformer

Transformer to apply an additive adjustment to values in one column based on the values of another column.

adjust_column

Column containing the values to be adjusted.

Type:

str

mappings

Dictionary of additive adjustments for each column individually to be applied to the adjust_column. The dict passed to mappings in init is set to the mappings attribute.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Transform values in adjust_column using the adjustments provided in the mappings dictionary.

Parameters:

X (pd.DataFrame) – Data to apply adjustments to.

Returns:

X – Transformed data X with adjustments applied to specified columns.

Return type:

pd.DataFrame
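A plain-Python sketch of the additive adjustment, using a dict of lists as a stand-in for a dataframe. The function name is hypothetical, and giving rows whose level is absent from the mapping an adjustment of 0 (leaving them unchanged) is an assumption rather than documented behaviour.

```python
from typing import Any, Dict, List


def cross_column_add(
    data: Dict[str, List[Any]],
    adjust_column: str,
    mappings: Dict[str, Dict[Any, float]],
) -> Dict[str, List[Any]]:
    """Add mappings[col][level] to adjust_column row by row (hypothetical sketch)."""
    out = {name: list(values) for name, values in data.items()}  # copy, don't mutate input
    for col, adjustments in mappings.items():
        for i, level in enumerate(data[col]):
            # Assumption: levels missing from the mapping get an adjustment of 0.
            out[adjust_column][i] += adjustments.get(level, 0)
    return out


df = {"a": [1.0, 2.0, 3.0], "b": ["x", "y", "z"]}
print(cross_column_add(df, adjust_column="a", mappings={"b": {"x": 10, "y": 20}}))
# {'a': [11.0, 22.0, 3.0], 'b': ['x', 'y', 'z']}
```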

class tubular.mapping.CrossColumnMappingTransformer(**kwargs)[source]

Bases: BaseCrossColumnMappingTransformer

Transformer to adjust values in one column based on the values of another column.

adjust_column

Column containing the values to be adjusted.

Type:

str

mappings

Dictionary of mappings for each column individually to be applied to the adjust_column. The dict passed to mappings in init is set to the mappings attribute.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Transform values in adjust_column using the adjustments provided in the mappings dictionary.

Parameters:

X (pd.DataFrame) – Data to apply adjustments to.

Returns:

X – Transformed data X with adjustments applied to specified columns.

Return type:

pd.DataFrame
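A plain-Python sketch of the cross-column mapping, again with a dict of lists standing in for a dataframe: the value of one column decides what the adjust_column value is replaced with. The function name is hypothetical, and keeping the original value for rows whose level is not in the mapping is an assumption.

```python
from typing import Any, Dict, List


def cross_column_map(
    data: Dict[str, List[Any]],
    adjust_column: str,
    mappings: Dict[str, Dict[Any, Any]],
) -> Dict[str, List[Any]]:
    """Replace adjust_column values based on another column's level (hypothetical sketch)."""
    out = {name: list(values) for name, values in data.items()}
    for col, mapping in mappings.items():
        for i, level in enumerate(data[col]):
            # Assumption: rows whose level is not in the mapping keep their original value.
            if level in mapping:
                out[adjust_column][i] = mapping[level]
    return out


df = {"a": [1, 2, 3], "b": ["x", "y", "z"]}
print(cross_column_map(df, adjust_column="b", mappings={"a": {1: "one", 3: "three"}}))
# {'a': [1, 2, 3], 'b': ['one', 'y', 'three']}
```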

class tubular.mapping.CrossColumnMultiplyTransformer(**kwargs)[source]

Bases: BaseCrossColumnNumericTransformer

Transformer to apply a multiplicative adjustment to values in one column based on the values of another column.

adjust_column

Column containing the values to be adjusted.

Type:

str

mappings

Dictionary of multiplicative adjustments for each column individually to be applied to the adjust_column. The dict passed to mappings in init is set to the mappings attribute.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Transform values in adjust_column using the adjustments provided in the mappings dictionary.

Parameters:

X (pd.DataFrame) – Data to apply adjustments to.

Returns:

X – Transformed data X with adjustments applied to specified columns.

Return type:

pd.DataFrame
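The multiplicative case follows the same row-by-row pattern as the additive one, with 1 rather than 0 as the neutral adjustment. A hypothetical plain-Python sketch, assuming levels missing from the mapping leave the value unchanged:

```python
from typing import Any, Dict, List


def cross_column_multiply(
    data: Dict[str, List[Any]],
    adjust_column: str,
    mappings: Dict[str, Dict[Any, float]],
) -> Dict[str, List[Any]]:
    """Multiply adjust_column by mappings[col][level] row by row (hypothetical sketch)."""
    out = {name: list(values) for name, values in data.items()}
    for col, factors in mappings.items():
        for i, level in enumerate(data[col]):
            # Assumption: levels missing from the mapping get a factor of 1 (no change).
            out[adjust_column][i] *= factors.get(level, 1)
    return out


df = {"a": [1.0, 2.0, 3.0], "b": ["x", "y", "z"]}
print(cross_column_multiply(df, adjust_column="a", mappings={"b": {"x": 2, "z": 0.5}}))
# {'a': [2.0, 2.0, 1.5], 'b': ['x', 'y', 'z']}
```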

class tubular.mapping.MappingTransformer(mappings: dict[str, dict[Any, Any]], return_dtypes: dict[str, RETURN_DTYPES] | None = None, **kwargs: bool | None)[source]

Bases: BaseMappingTransformer, BaseMappingTransformMixin

Transformer to map values in columns to other values e.g. to merge two levels into one.

Note that MappingTransformer does not require ‘self-mappings’ to be defined, i.e. if you want to map a value to itself, you can omit that value from the mappings rather than mapping it to itself.

This transformer inherits from both BaseMappingTransformer and BaseMappingTransformMixin: BaseMappingTransformer performs standard checks, while BaseMappingTransformMixin handles the actual mapping logic.

Parameters:
  • mappings (dict) – Dictionary containing column mappings. Each key is a column to apply a mapping to, and each value is the mapping dict for that column. For example, the dict {‘a’: {1: 2, 3: 4}, ‘b’: {‘a’: 1, ‘b’: 2}} specifies a mapping for column a of 1->2, 3->4 and a mapping for column b of ‘a’->1, ‘b’->2.

  • return_dtypes (Optional[dict[str, RETURN_DTYPES]]) – Dictionary of col:dtype for returned columns

  • **kwargs – Arbitrary keyword arguments passed on to the BaseMappingTransformer.__init__ method.

mappings

Dictionary of mappings for each column individually. The dict passed to mappings in init is set to the mappings attribute.

Type:

dict

mappings_from_null

dict storing what null values will be mapped to. An imputer is generally the better tool for this, but this functionality is useful for inverting pipelines.

Type:

dict[str, Any]

return_dtypes

Dictionary of col:dtype for returned columns

Type:

dict[str, RETURN_DTYPES]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> transformer = MappingTransformer(
...     mappings={"a": {"Y": 1, "N": 0}},
...     return_dtypes={"a": "Int8"},
... )
>>> transformer
MappingTransformer(mappings={'a': {'N': 0, 'Y': 1}},
                   return_dtypes={'a': 'Int8'})

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MappingTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'mappings': {'a': {'Y': 1, 'N': 0}}, 'return_dtypes': {'a': 'Int8'}}, 'fit': {'is_fitted_': True}}
>>> MappingTransformer.from_json(json_dump)
MappingTransformer(mappings={'a': {'N': 0, 'Y': 1}},
                   return_dtypes={'a': 'Int8'})

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform the input data X according to the mappings in the mappings attribute dict.

This method calls BaseMappingTransformMixin.transform. Note that this transform method differs from some of the transform methods in the nominal module, even though they also use BaseMappingTransformMixin.transform: here, if a value does not exist in the mapping, it is left unchanged.

Parameters:

X (DataFrame) – Data with nominal columns to transform.

Returns:

X – Transformed input X with levels mapped according to mappings dict.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = MappingTransformer(
...     mappings={"a": {"Y": 1, "N": 0}},
...     return_dtypes={"a": "Int8"},
... )
>>> test_df = pl.DataFrame({"a": ["Y", "N"], "b": [3, 4]})
>>> transformer.transform(test_df)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i8  ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 0   ┆ 4   │
└─────┴─────┘

```
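The nested structure of the mappings parameter, together with the rule that unmapped values are left unchanged (which is why self-mappings can be omitted), can be sketched in plain Python with a dict of lists standing in for a dataframe. `apply_mappings` is a hypothetical name, and unlike the real transformer this sketch performs no return_dtypes casting.

```python
from typing import Any, Dict, List


def apply_mappings(
    data: Dict[str, List[Any]],
    mappings: Dict[str, Dict[Any, Any]],
) -> Dict[str, List[Any]]:
    """Map each column's values through its mapping dict (hypothetical sketch)."""
    out = {name: list(values) for name, values in data.items()}
    for col, mapping in mappings.items():
        # dict.get(value, value) keeps unmapped values, so self-mappings can be omitted.
        out[col] = [mapping.get(value, value) for value in data[col]]
    return out


# The example mapping from the parameter description: a: 1->2, 3->4; b: 'a'->1, 'b'->2.
mappings = {"a": {1: 2, 3: 4}, "b": {"a": 1, "b": 2}}
df = {"a": [1, 3, 5], "b": ["a", "b", "c"]}
print(apply_mappings(df, mappings))  # {'a': [2, 4, 5], 'b': [1, 2, 'c']}
```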

tubular.misc module

Contains legacy transformers for introducing fixed columns and changing dtypes.

class tubular.misc.ColumnDtypeSetter(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], dtype: ]], **kwargs: bool)[source]

Bases: BaseTransformer

Transformer to cast columns in a dataframe to a given dtype.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = ColumnDtypeSetter(columns="a", dtype="Float32")
>>> pprint(transformer.to_json(), sort_dicts=True)
{'classname': 'ColumnDtypeSetter',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a'],
          'copy': False,
          'dtype': 'Float32',
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform data.

Parameters:

X (DataFrame) – data to transform.

Returns:

transformed data

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2]})
>>> transformer = ColumnDtypeSetter(columns="a", dtype="Float32")
>>> transformer.transform(df)
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ f32 │
╞═════╡
│ 1.0 │
│ 2.0 │
└─────┘

```

class tubular.misc.RenameColumnsTransformer(columns: ]] | str, new_column_names: dict[str, str], drop_original: bool = True, **kwargs: bool)[source]

Bases: BaseTransformer, DropOriginalMixin

Transformer to rename a given set of columns.

This can be useful for customising the automatically generated output names from other transformers, or for creating several versions of a given column to undergo separate paths of logic in a pipeline (as the expression logic effectively creates duplicates of the column).

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RenameColumnsTransformer(
...     columns="a", new_column_names={"a": "new_a"}
... )  # noqa: E501
>>> transformer
RenameColumnsTransformer(columns=['a'], new_column_names={'a': 'new_a'})

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> pprint(json_dump, sort_dicts=True)
{'classname': 'RenameColumnsTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a'],
          'copy': False,
          'drop_original': True,
          'new_column_names': {'a': 'new_a'},
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
>>> RenameColumnsTransformer.from_json(json_dump)
RenameColumnsTransformer(columns=['a'], new_column_names={'a': 'new_a'})

```

FITS = False
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = RenameColumnsTransformer(
...     columns=["a", "b"],
...     new_column_names={"a": "new_a", "b": "new_b"},
... )
>>> transformer.get_feature_names_out()
['new_a', 'new_b']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RenameColumnsTransformer(
...     columns="a", new_column_names={"a": "new_a"}
... )  # noqa: E501
>>> pprint(transformer.to_json(), sort_dicts=True)
{'classname': 'RenameColumnsTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a'],
          'copy': False,
          'drop_original': True,
          'new_column_names': {'a': 'new_a'},
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Rename columns by creating renamed copies (the originals are dropped if drop_original is True).

Parameters:

X (DataFrame) – Data containing the columns to rename.

Returns:

X – Transformed input X with columns renamed according to new_column_names.

Return type:

DataFrame

Raises:

ValueError – if any new_column_names values are already present in X

Examples

```pycon
>>> import polars as pl
>>> transformer = RenameColumnsTransformer(
...     columns="a", new_column_names={"a": "new_a"}
... )  # noqa: E501
>>> test_df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> transformer.transform(test_df)
shape: (3, 2)
┌─────┬───────┐
│ b   ┆ new_a │
│ --- ┆ ---   │
│ i64 ┆ i64   │
╞═════╪═══════╡
│ 4   ┆ 1     │
│ 5   ┆ 2     │
│ 6   ┆ 3     │
└─────┴───────┘

```
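The renaming behaviour, including the drop_original switch and the ValueError for names that already exist, can be sketched in plain Python with a dict of lists standing in for a dataframe. `rename_columns` is hypothetical, and the assumption that drop_original=False keeps the original column next to its renamed copy follows from the description of creating several versions of a column.

```python
from typing import Any, Dict, List


def rename_columns(
    data: Dict[str, List[Any]],
    new_column_names: Dict[str, str],
    drop_original: bool = True,
) -> Dict[str, List[Any]]:
    """Rename columns, optionally keeping the originals as copies (hypothetical sketch)."""
    clashes = [new for new in new_column_names.values() if new in data]
    if clashes:
        # Mirrors the documented ValueError when new names are already present in X.
        raise ValueError(f"new column names already present in data: {clashes}")
    out = {name: list(values) for name, values in data.items()}
    for old, new in new_column_names.items():
        out[new] = list(data[old])
    if drop_original:
        for old in new_column_names:
            del out[old]
    return out


df = {"a": [1, 2, 3], "b": [4, 5, 6]}
print(rename_columns(df, {"a": "new_a"}))  # {'b': [4, 5, 6], 'new_a': [1, 2, 3]}
print(rename_columns(df, {"a": "new_a"}, drop_original=False))
# {'a': [1, 2, 3], 'b': [4, 5, 6], 'new_a': [1, 2, 3]}
```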

class tubular.misc.SetValueTransformer(columns: ]] | str, value: int | float | str | bool | None, **kwargs: bool)[source]

Bases: BaseTransformer

Transformer to set value of column(s) to a given value.

This should be used if columns need to be set to a constant value.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> SetValueTransformer(columns="a", value=1)
SetValueTransformer(columns=['a'], value=1)

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = SetValueTransformer(columns="a", value=1)
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'SetValueTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'value': 1}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Set columns to value.

Parameters:

X (DataFrame) – Data containing the columns to set.

Returns:

X – Transformed input X with columns set to value.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = SetValueTransformer(columns="a", value=1)
>>> test_df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> transformer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i32 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 1   ┆ 5   │
│ 1   ┆ 6   │
└─────┴─────┘

```

class tubular.misc.SimpleCastDtypes(*values)[source]

Bases: str, Enum

Allowed dtypes for ColumnDtypeSetter.

BOOLEAN = 'Boolean'
CATEGORICAL = 'Categorical'
FLOAT32 = 'Float32'
FLOAT64 = 'Float64'
INT16 = 'Int16'
INT32 = 'Int32'
INT64 = 'Int64'
INT8 = 'Int8'
STRING = 'String'
UINT16 = 'UInt16'
UINT32 = 'UInt32'
UINT64 = 'UInt64'
UINT8 = 'UInt8'
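Because SimpleCastDtypes mixes in str, a dtype string can be validated simply by looking it up by value. The sketch below uses a local copy of the enum; `validate_dtype` is a hypothetical helper, not tubular's own validation.

```python
from enum import Enum


class SimpleCastDtypes(str, Enum):
    """Local copy of the allowed dtypes listed above."""

    BOOLEAN = "Boolean"
    CATEGORICAL = "Categorical"
    FLOAT32 = "Float32"
    FLOAT64 = "Float64"
    INT16 = "Int16"
    INT32 = "Int32"
    INT64 = "Int64"
    INT8 = "Int8"
    STRING = "String"
    UINT16 = "UInt16"
    UINT32 = "UInt32"
    UINT64 = "UInt64"
    UINT8 = "UInt8"


def validate_dtype(dtype: str) -> SimpleCastDtypes:
    """Look up a dtype string by value (hypothetical helper, not tubular's API)."""
    try:
        return SimpleCastDtypes(dtype)
    except ValueError as err:
        allowed = [member.value for member in SimpleCastDtypes]
        raise ValueError(f"{dtype!r} is not one of {allowed}") from err


print(validate_dtype("Float32") is SimpleCastDtypes.FLOAT32)  # True
# Because the enum also subclasses str, members compare equal to their plain string values.
print(SimpleCastDtypes.FLOAT32 == "Float32")  # True
```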

tubular.mixins module

Contains mixin classes for use across transformers.

class tubular.mixins.CheckNumericMixin[source]

Bases: object

Mixin class with methods for numeric transformers.

check_numeric_columns(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, return_native: bool = True) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Check that the columns argument refers only to numeric columns.

Parameters:
  • X (DataFrame) – Data containing columns to check.

  • return_native (bool) – indicates whether to return nw or pd/pl dataframe

Returns:

validated dataframe

Return type:

DataFrame

Raises:

TypeError – if provided columns are non-numeric

classname() str[source]

Get name of the current class when called.

Returns:

name of class

Return type:

str

class tubular.mixins.DropOriginalMixin[source]

Bases: object

Mixin class to validate and apply ‘drop_original’ argument used by various transformers.

Deletes the transformer's input columns if the boolean drop_original argument is True.

classname() str[source]

Get name of the current class when called.

Returns:

name of class

Return type:

str

static drop_original_column(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, drop_original: bool, columns: list[str] | str | None, return_native: bool = True) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Drop input columns from X if drop_original set to True.

Parameters:
  • X (DataFrame) – Data with columns to drop.

  • drop_original (bool) – whether to drop the input columns from X after checks.

  • columns (list[str] | str | None) – Object containing columns to drop

  • return_native (bool) – controls whether mixin returns native or narwhals type

Returns:

X – Transformed input X with columns dropped.

Return type:

DataFrame

class tubular.mixins.WeightColumnMixin[source]

Bases: object

Mixin class with weights functionality.

check_weights_column(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, weights_column: str) None[source]

Validate weights column in dataframe.

Parameters:
  • X (DataFrame) – input data

  • weights_column (str) – name of weight column

Raises:
  • ValueError – if weights_column is missing from data

  • ValueError – if weights_column is non-numeric

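The two checks listed above can be sketched in plain Python with a dict of lists standing in for a dataframe. The function below is hypothetical, its error messages are not tubular's, and treating None as a missing (allowed) weight and rejecting booleans are assumptions.

```python
from typing import Any, Dict, List


def check_weights_column(data: Dict[str, List[Any]], weights_column: str) -> None:
    """Raise ValueError if the weights column is missing or non-numeric (hypothetical sketch)."""
    if weights_column not in data:
        raise ValueError(f"weights column {weights_column!r} is not present in data")
    for value in data[weights_column]:
        # Assumption: None counts as a missing weight and is skipped here.
        if value is None:
            continue
        # Assumption: bools are rejected even though bool subclasses int in Python.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"weights column {weights_column!r} is not numeric")


check_weights_column({"w": [1, 2.5, None]}, "w")  # passes silently
```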
classname() str[source]

Get the name of the current class when called.

Returns:

name of class

Return type:

str

static get_valid_weights_filter_expr(weights_column: str, verbose: bool = False) Expr[source]

Build an expression for filtering down to rows with valid weights.

Parameters:
  • weights_column (str) – name of weight column

  • verbose (bool) – control verbosity of method

Returns:

expression to be used for filtering down to valid weights rows

Return type:

nw.Expr

tubular.nominal module

Contains transformers that apply encodings to nominal columns.

class tubular.nominal.GroupRareLevelsTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]] | None = None, cut_off_percent: ]] = 0.01, weights_column: str | None = None, rare_level_name: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]] = 'rare', record_rare_levels: bool = True, unseen_levels_to_rare: bool = True, **kwargs: bool)[source]

Bases: BaseTransformer, WeightColumnMixin

Group together rare levels of nominal variables into a new rare level.

Rare levels are defined by a cut off percentage, which can either be based on the number of rows or sum of weights. Any levels below this cut off value will be grouped into the rare level.
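A plain-Python sketch of the fit and transform steps described above: levels are scored by their share of rows (or of total weight), and anything below the cut-off is replaced by the rare level name. The function names are hypothetical, treating a level exactly at the cut-off as non-rare is an assumption, and passing training_levels mimics unseen_levels_to_rare=False.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Optional, Sequence, Set


def fit_non_rare_levels(
    values: Sequence[Hashable],
    cut_off_percent: float,
    weights: Optional[Sequence[float]] = None,
) -> Set[Hashable]:
    """Return levels whose share of rows (or of total weight) reaches cut_off_percent."""
    if weights is None:
        weights = [1.0] * len(values)  # unweighted: every row counts once
    totals: Dict[Hashable, float] = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    # Assumption: a level sitting exactly at the cut-off is treated as non-rare.
    return {level for level, total in totals.items() if total / grand_total >= cut_off_percent}


def group_rare_levels(
    values: Sequence[Hashable],
    non_rare: Set[Hashable],
    rare_level_name: Any = "rare",
    training_levels: Optional[Set[Hashable]] = None,
) -> List[Any]:
    """Replace rare levels with rare_level_name.

    If training_levels is given, levels never seen in training are left unchanged
    (mimicking unseen_levels_to_rare=False) instead of being grouped.
    """
    out = []
    for value in values:
        if value in non_rare:
            out.append(value)
        elif training_levels is not None and value not in training_levels:
            out.append(value)
        else:
            out.append(rare_level_name)
    return out


column = ["x", "x", "y"]
non_rare = fit_non_rare_levels(column, cut_off_percent=0.5)
print(group_rare_levels(column, non_rare, "rare_level"))  # ['x', 'x', 'rare_level']
```

With weights, the same call can flip the result: weights of [1, 1, 3] give "y" a 0.6 share and "x" only 0.4, so "x" becomes the rare level.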

cut_off_percent

Cut off percentage (either in terms of number of rows or sum of weight) for a given nominal level to be considered rare.

Type:

float

non_rare_levels

Created in fit. A dict of non-rare levels (i.e. levels with more than cut_off_percent weight or rows) that is used to identify rare levels in transform.

Type:

dict

rare_level_name

Must be of the same type as the values in columns. Label for the new nominal level that will be added to group together rare levels (as defined by cut_off_percent).

Type:

any

record_rare_levels

Should the ‘rare’ levels that will be grouped together be recorded? If not, they will be lost after the fit and the only information remaining will be the ‘non-rare’ levels.

Type:

bool

rare_levels_record

Only created (in fit) if record_rare_levels is True. A dict containing, for each column the transformer was applied to, a list of the levels that were grouped into ‘rare’.

Type:

dict

weights_column

Name of the weights column to use if cut_off_percent should be in terms of sum of weight rather than number of rows.

Type:

str

unseen_levels_to_rare

If True, levels in new data that were unseen during fit are grouped into the rare level; if False, they are left unchanged.

Type:

bool

training_data_levels

Dictionary containing the set of values present in the training data for each column in self.columns. It only exists if unseen_levels_to_rare is set to False.

Type:

dict[set]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> GroupRareLevelsTransformer(
...     columns="a",
...     cut_off_percent=0.02,
...     rare_level_name="rare_level",
... )
GroupRareLevelsTransformer(columns=['a'], cut_off_percent=0.02,
                           rare_level_name='rare_level')

```

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) GroupRareLevelsTransformer[source]

Record non-rare levels for categorical variables.

When transform is called, only levels recorded in non_rare_levels during fit will remain unchanged; all other levels will be grouped. If record_rare_levels is True, the rare levels will also be recorded.

The label for the rare levels must be of the same type as the columns.

Parameters:
  • X (DataFrame) – Data to identify non-rare levels from.

  • y (Series or LazyFrame or None, default = None) – Optional argument only required for the transformer to work with sklearn pipelines.

Returns:

fitted class instance

Return type:

GroupRareLevelsTransformer

Examples

```pycon
>>> import polars as pl
>>> transformer = GroupRareLevelsTransformer(
...     columns="a",
...     cut_off_percent=0.02,
...     rare_level_name="rare_level",
... )
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": ["w", "z"]})
>>> transformer.fit(test_df)
GroupRareLevelsTransformer(columns=['a'], cut_off_percent=0.02,
                           rare_level_name='rare_level')

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> import tests.test_data as d
>>> df = d.create_df_8("pandas")
>>> x = GroupRareLevelsTransformer(
...     columns=["b", "c"], cut_off_percent=0.4, unseen_levels_to_rare=False
... )
>>> x.fit(df)
GroupRareLevelsTransformer(columns=['b', 'c'], cut_off_percent=0.4,
                           unseen_levels_to_rare=False)
>>> x.to_json()
{'tubular_version': ..., 'classname': 'GroupRareLevelsTransformer', 'init': {'columns': ['b', 'c'], 'copy': False, 'verbose': False, 'return_native': True, 'cut_off_percent': 0.4, 'weights_column': None, 'rare_level_name': 'rare', 'record_rare_levels': True, 'unseen_levels_to_rare': False}, 'fit': {'is_fitted_': True, 'non_rare_levels': {'b': ['w'], 'c': ['a']}, 'training_data_levels': {'b': ['w', 'x', 'y', 'z'], 'c': ['a', 'b', 'c']}, 'rare_levels_record': {'b': ['x', 'y', 'z'], 'c': ['b', 'c']}}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Group rare levels together into a new ‘rare’ level.

Parameters:

X (DataFrame) – Data with categorical variables to apply rare level grouping to.

Returns:

X – Transformed input X with rare levels grouped into a new rare level.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = GroupRareLevelsTransformer(
...     columns="a",
...     cut_off_percent=0.5,
...     rare_level_name="rare_level",
... )
>>> test_df = pl.DataFrame({"a": ["x", "x", "y"], "b": ["w", "z", "z"]})
>>> _ = transformer.fit(test_df)
>>> transformer.transform(test_df)
shape: (3, 2)
┌────────────┬─────┐
│ a          ┆ b   │
│ ---        ┆ --- │
│ str        ┆ str │
╞════════════╪═════╡
│ x          ┆ w   │
│ x          ┆ z   │
│ rare_level ┆ z   │
└────────────┴─────┘

```

class tubular.nominal.MeanResponseTransformer(columns: str | list[str] | None = None, weights_column: str | None = None, prior: ]] = 0, level: float | int | str | list | None = None, unseen_level_handling: float | int | Literal['mean', 'median', 'min', 'max'] | None = None, return_type: Literal['Float32', 'Float64'] = 'Float32', drop_original: bool = True, **kwargs: bool)[source]

Bases: BaseTransformer, WeightColumnMixin, DropOriginalMixin

Convert categorical variables to numeric by mapping each level to the mean response for that level.

For a continuous or binary response the categorical columns specified will have values replaced with the mean response for each category.
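A plain-Python sketch of mean-response encoding with the prior regularisation described under the prior attribute. The smoothing formula, (sum of w*y for the level + prior * global mean) / (sum of w for the level + prior), is an assumption, though it does reproduce the mappings shown in this class's example ({'x': 0.25, 'y': 0.75} for prior=1). The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Sequence


def mean_response_mapping(
    levels: Sequence[Hashable],
    response: Sequence[float],
    prior: float = 0.0,
    weights: Optional[Sequence[float]] = None,
) -> Dict[Hashable, float]:
    """Map each level to its (optionally weighted) mean response, shrunk towards the
    global mean by `prior` pseudo-observations (hypothetical sketch)."""
    if weights is None:
        weights = [1.0] * len(levels)
    global_mean = sum(w * y for w, y in zip(weights, response)) / sum(weights)
    level_weight: Dict[Hashable, float] = defaultdict(float)
    level_total: Dict[Hashable, float] = defaultdict(float)
    for level, y, w in zip(levels, response, weights):
        level_weight[level] += w
        level_total[level] += w * y
    return {
        level: (level_total[level] + prior * global_mean) / (level_weight[level] + prior)
        for level in level_weight
    }


print(mean_response_mapping(["x", "y"], [0, 1], prior=1))  # {'x': 0.25, 'y': 0.75}
```

With prior=0 the formula reduces to the plain per-level mean, matching the statement that the default applies no regularisation.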

For an n > 1 level categorical response, up to n binary responses can be created, which in turn can be used to encode each categorical column specified. This will generate up to n * len(columns) new columns, with names of the form {column}_{response_level}. The original columns will be removed from the dataframe. This functionality is controlled using the ‘level’ parameter. Note that this only works for an n > 1 level categorical response. Do not use the ‘level’ parameter for a numerical (n = 1 level) response; in that case, use the standard mean response transformer without the ‘level’ parameter.

If a categorical variable contains null values these will not be transformed.

The same weights and prior are applied to each response level in the multi-level case.
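The multi-level case amounts to building one binary response per requested level and encoding each column against each of them, which gives up to n * len(columns) output columns. A hypothetical sketch of those two steps; the function names and the column ordering are illustrative, not tubular's.

```python
from typing import Dict, Hashable, List, Sequence


def one_vs_rest_responses(
    response: Sequence[Hashable], levels: Sequence[Hashable]
) -> Dict[Hashable, List[int]]:
    """Build one binary (0/1) response per requested level of a categorical response."""
    return {level: [1 if r == level else 0 for r in response] for level in levels}


def encoded_column_names(columns: Sequence[str], levels: Sequence[Hashable]) -> List[str]:
    """Names of the form {column}_{response_level}, one per column/level pair."""
    return [f"{column}_{level}" for column in columns for level in levels]


response = ["cat", "dog", "cat", "bird"]
print(one_vs_rest_responses(response, ["cat", "dog"]))
# {'cat': [1, 0, 1, 0], 'dog': [0, 1, 0, 0]}
print(encoded_column_names(["a", "b"], ["cat", "dog"]))
# ['a_cat', 'a_dog', 'b_cat', 'b_dog']
```

Each binary response would then be encoded with the single-level mean response logic, using the same weights and prior for every level.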

columns

Categorical columns to encode in the input data.

Type:

str or list

weights_column

Weights column to use when calculating the mean response.

Type:

str or None

prior

Regularisation parameter, can be thought of roughly as the size a category should be in order for its statistics to be considered reliable (hence default value of 0 means no regularisation).

Type:

int, default = 0

level

Parameter to control encoding against a multi-level categorical response. If None the response will be treated as binary or continuous, if ‘all’ all response levels will be encoded against and if it is a list of levels then only the levels specified will be encoded against.

Type:

str, int, float, list or None, default = None

response_levels

Only created in the multi-level case. A list, generated from level, of all the response levels to encode against.

Type:

list

mappings

Created in fit. A nested Dict of {column names : column specific mapping dictionary} pairs. Column specific mapping dictionaries contain {initial value : mapped value} pairs.

Type:

dict

mapped_columns

Only created in the multi-level case. A list of the new columns produced by encoding the columns in self.columns against multiple response levels, of the form {column}_{level}.

Type:

list

transformer_dict

Only created in the multi-level case. A dictionary of the form level : transformer containing the mean response transformers for each level to be encoded against.

Type:

dict

unseen_levels_encoding_dict

Dict containing the values (based on chosen unseen_level_handling) derived from the encoded columns to use when handling unseen levels in data passed to transform method.

Type:

dict

return_type

What type to cast return column as. Defaults to float32.

Type:

Literal[‘float32’, ‘float64’]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> import polars as pl

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )
>>> transformer
MeanResponseTransformer(columns=['a'], prior=1, unseen_level_handling='mean')
>>> # once fit, transformer can also be dumped to json and reinitialised
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [0, 1]})
>>> _ = transformer.fit(test_df[["a"]], test_df["b"])
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MeanResponseTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None, 'prior': 1, 'level': None, 'unseen_level_handling': 'mean', 'return_type': 'Float32', 'drop_original': True}, 'fit': {'is_fitted_': True, 'mappings': {'a': {'x': 0.25, 'y': 0.75}}, 'return_dtypes': {'a': 'Float32'}, 'column_to_encoded_columns': {'a': ['a']}, 'encoded_columns': ['a'], 'unseen_levels_encoding_dict': {'a': 0.5}}}
>>> MeanResponseTransformer.from_json(json_dump)
MeanResponseTransformer(columns=['a'], prior=1, unseen_level_handling='mean')

```
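The prior shrinks each level's mean response towards the overall mean. One formula consistent with the mappings in the example above (with prior=1, levels x and y with responses 0 and 1 map to 0.25 and 0.75) is a pseudo-count blend of the level mean and the global mean. This is a sketch assuming that formula plus an optional per-row weight list; it is not taken from tubular's source, and the function name is hypothetical.

```python
from collections import defaultdict


def prior_mean_response(categories, response, prior=0, weights=None):
    """Assumed shrinkage: (sum(w*y) + prior * global_mean) / (sum(w) + prior)."""
    if weights is None:
        weights = [1.0] * len(response)  # unweighted case
    global_mean = sum(w * y for w, y in zip(weights, response)) / sum(weights)
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for category, y, w in zip(categories, response, weights):
        weighted_sum[category] += w * y
        weight_total[category] += w
    return {
        category: (weighted_sum[category] + prior * global_mean)
        / (weight_total[category] + prior)
        for category in weight_total
    }


print(prior_mean_response(["x", "y"], [0, 1], prior=1))  # {'x': 0.25, 'y': 0.75}
print(prior_mean_response(["x", "y"], [0, 1], prior=0))  # {'x': 0.0, 'y': 1.0}
```

Larger priors pull small categories more strongly towards the global mean, while a prior of 0 returns the raw level means.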

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame) MeanResponseTransformer[source]

Identify mapping of categorical levels to mean response values.

If the user specified the weights_column arg when initialising the transformer, the weighted mean response will be calculated using that column.

In the multi-level case this method learns which response levels are present and are to be encoded against.

Parameters:
  • X (DataFrame) – Data with categorical variable columns to transform, also containing the response_column column.

  • y (Series or LazyFrame) – Response variable or target.

Returns:

MeanResponseTransformer

Return type:

fitted class instance

Raises:

ValueError – if y contains null values.

Examples

```pycon
>>> import polars as pl

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2], "target": [0, 1]})
>>> transformer.fit(test_df, test_df["target"])
MeanResponseTransformer(columns=['a'], prior=1, unseen_level_handling='mean')

```

get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> import polars as pl

>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )
>>> transformer.get_feature_names_out()
['a']
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     level=["x", "y"],
...     unseen_level_handling="mean",
... )
>>> transformer.get_feature_names_out()
['a_x', 'a_y']
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     level="all",
...     unseen_level_handling="mean",
... )
>>> transformer.get_feature_names_out()
Traceback (most recent call last):
...
sklearn.exceptions.NotFittedError: ...
>>> test_df = pl.DataFrame({"a": ["x", "y", "x"], "b": ["cat", "dog", "rat"]})
>>> _ = transformer.fit(test_df, test_df["b"])
>>> transformer.get_feature_names_out()
['a_cat', 'a_dog', 'a_rat']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> import polars as pl

>>> transformer = MeanResponseTransformer(columns=["a"])
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [0, 1]})
>>> _ = transformer.fit(test_df[["a"]], test_df["b"])
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'MeanResponseTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None, 'prior': 0, 'level': None, 'unseen_level_handling': None, 'return_type': 'Float32', 'drop_original': True}, 'fit': {'is_fitted_': True, 'mappings': {'a': {'x': 0.0, 'y': 1.0}}, 'return_dtypes': {'a': 'Float32'}, 'column_to_encoded_columns': {'a': ['a']}, 'encoded_columns': ['a']}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Apply mean response encoding stored in the mappings attribute to columns.

Parameters:

X (DataFrame) – Data with nominal columns to transform.

Returns:

X – Transformed input X with levels mapped according to mappings dict.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> # example with no prior
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=0,
...     unseen_level_handling="mean",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2], "target": [0, 1]})
>>> _ = transformer.fit(test_df, test_df["target"])
>>> transformer.transform(test_df)
shape: (2, 3)
┌─────┬─────┬────────┐
│ a   ┆ b   ┆ target │
│ --- ┆ --- ┆ ---    │
│ f32 ┆ i64 ┆ i64    │
╞═════╪═════╪════════╡
│ 0.0 ┆ 1   ┆ 0      │
│ 1.0 ┆ 2   ┆ 1      │
└─────┴─────┴────────┘

>>> # example with prior
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )

>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2], "target": [0, 1]})
>>> _ = transformer.fit(test_df, test_df["target"])
>>> transformer.transform(test_df)
shape: (2, 3)
┌──────┬─────┬────────┐
│ a    ┆ b   ┆ target │
│ ---  ┆ --- ┆ ---    │
│ f32  ┆ i64 ┆ i64    │
╞══════╪═════╪════════╡
│ 0.25 ┆ 1   ┆ 0      │
│ 0.75 ┆ 2   ┆ 1      │
└──────┴─────┴────────┘

```

class tubular.nominal.NominalToIntegerTransformer(**kwargs)[source]

Bases: BaseMappingTransformMixin

Transformer to convert columns containing nominal values into integer values.

The nominal levels that are mapped to integers are not ordered in any way.
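A minimal sketch of building such a mapping, assuming levels are numbered in first-seen order starting from start_encoding. Since the levels are not ordered in any meaningful way, tubular's actual assignment may differ; the helper name is hypothetical.

```python
def nominal_to_integer_mapping(values, start_encoding=0):
    """Hypothetical helper: map each distinct level to an integer."""
    mapping = {}
    for value in values:
        if value not in mapping:
            # Next unused integer, counting up from start_encoding.
            mapping[value] = start_encoding + len(mapping)
    return mapping


mapping = nominal_to_integer_mapping(["b", "a", "b", "c"], start_encoding=1)
print(mapping)  # {'b': 1, 'a': 2, 'c': 3}
print([mapping[v] for v in ["c", "b"]])  # [3, 1]
```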

start_encoding

Value to start the encoding / mapping of nominal to integer from.

Type:

int

mappings

Created in fit. A dict of key (column names) value (mappings between levels and integers for given column) pairs.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = True
deprecated = True
fit(X: pd.DataFrame, y: pd.Series | None = None) pd.DataFrame[source]

Create mapping between nominal levels and integer values for categorical variables.

Parameters:
  • X (pd.DataFrame) – Data to fit the transformer on, this sets the nominal levels that can be mapped.

  • y (None or pd.DataFrame or pd.Series, default = None) – Optional argument only required for the transformer to work with sklearn pipelines.

Returns:

NominalToIntegerTransformer

Return type:

fitted class instance

Raises:

ValueError – if column has more levels than can be encoded as int8.

jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: pd.DataFrame) pd.DataFrame[source]

Apply integer encoding stored in the mappings attribute to columns.

Parameters:

X (pd.DataFrame) – Data with nominal columns to transform.

Returns:

X – Transformed input X with levels mapped according to mappings dict.

Return type:

pd.DataFrame

class tubular.nominal.OneHotEncodingTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]] | None = None, wanted_values: dict[str, ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]]] | None = None, separator: str = '_', drop_original: bool = False, **kwargs: bool)[source]

Bases: DropOriginalMixin, BaseTransformer

Transformer to convert categorical variables into dummy columns.
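A plain-Python sketch of the idea: collect the levels of a column at fit time, then create one boolean column per level named {column}{separator}{level}. Sorting the levels and the max_levels guard (mirroring MAX_LEVELS = 100 and the ValueError documented for fit) are assumptions of this sketch, and the function names are hypothetical.

```python
def fit_levels(values, max_levels=100):
    """Hypothetical fit step: the sorted distinct levels of a column."""
    levels = sorted(set(values))
    if len(levels) > max_levels:
        raise ValueError(f"column has {len(levels)} levels, more than {max_levels}")
    return levels


def one_hot(column_name, values, levels, separator="_"):
    """Hypothetical transform step: one boolean column per fitted level."""
    return {
        f"{column_name}{separator}{level}": [v == level for v in values]
        for level in levels
    }


levels = fit_levels(["x", "y"])
print(one_hot("a", ["x", "y"], levels))
# {'a_x': [True, False], 'a_y': [False, True]}
```

A level seen only at transform time gets no column here, because columns come solely from the levels collected at fit time.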

separator

Separator used in naming for dummy columns.

Type:

str

drop_original

Should original columns be dropped after creating dummy fields?

Type:

bool

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )
>>> transformer
OneHotEncodingTransformer(columns=['a'])
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": ["w", "z"]})
>>> _ = transformer.fit(test_df)
>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'OneHotEncodingTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'wanted_values': None, 'separator': '_', 'drop_original': False}, 'fit': {'is_fitted_': True, 'categories_': {'a': ['x', 'y']}, 'new_feature_names_': {'a': ['a_x', 'a_y']}}}
>>> OneHotEncodingTransformer.from_json(json_dump)
OneHotEncodingTransformer(columns=['a'])

```

FITS = True
MAX_LEVELS = 100
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) OneHotEncodingTransformer[source]

Get list of levels for each column to be transformed.

This defines which dummy columns will be created in transform.

Parameters:
  • X (DataFrame) – Data to identify levels from.

  • y (None) – Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.

Returns:

OneHotEncodingTransformer

Return type:

fitted class instance

Raises:

ValueError – if column has >100 levels.

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2]})
>>> transformer.fit(test_df)
OneHotEncodingTransformer(columns=['a'])

```

get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
...     wanted_values={"a": ["cat", "dog"]},
... )
>>> transformer.get_feature_names_out()
['a_cat', 'a_dog']
>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )
>>> transformer.get_feature_names_out()
Traceback (most recent call last):
...
sklearn.exceptions.NotFittedError: ...
>>> test_df = pl.DataFrame({"a": ["cat", "dog", "rat"]})
>>> _ = transformer.fit(test_df)
>>> transformer.get_feature_names_out()
['a_cat', 'a_dog', 'a_rat']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(columns=["a"])
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": ["w", "z"]})
>>> _ = transformer.fit(test_df)
>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'OneHotEncodingTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'wanted_values': None, 'separator': '_', 'drop_original': False}, 'fit': {'is_fitted_': True, 'categories_': {'a': ['x', 'y']}, 'new_feature_names_': {'a': ['a_x', 'a_y']}}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, return_native_override: bool | None = None) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Create new dummy columns from categorical fields.

Parameters:
  • X (DataFrame) – Data to apply one hot encoding to.

  • return_native_override (Optional[bool]) – Controls whether the transformer returns a narwhals or native type. Option to override the return_native attr in the transformer, useful when calling parent methods.

Returns:

X_transformed – Transformed input X with dummy columns derived from categorical columns added. If drop_original = True then the original categorical columns that the dummies are created from will not be in the output X.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2]})
>>> _ = transformer.fit(test_df)
>>> transformer.transform(test_df)
shape: (2, 4)
┌─────┬─────┬───────┬───────┐
│ a   ┆ b   ┆ a_x   ┆ a_y   │
│ --- ┆ --- ┆ ---   ┆ ---   │
│ str ┆ i64 ┆ bool  ┆ bool  │
╞═════╪═════╪═══════╪═══════╡
│ x   ┆ 1   ┆ true  ┆ false │
│ y   ┆ 2   ┆ false ┆ true  │
└─────┴─────┴───────┴───────┘

```

class tubular.nominal.OrdinalEncoderTransformer(**kwargs)[source]

Bases: BaseMappingTransformMixin, WeightColumnMixin

Encode categorical variables into ascending rank-ordered integer values.

Maps levels to the target-mean response for that level.

Values are sorted in ascending order only, i.e. the categorical level with the lowest target mean response is encoded as 1, the level with the second-lowest as 2, and so on.

If a categorical variable contains null values these will not be transformed.
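The ranking step can be sketched as follows: compute the mean response per level, then number the levels 1, 2, ... in ascending order of that mean. This sketch is unweighted, does not treat nulls specially, and breaks ties by level name; those choices are assumptions rather than documented tubular behaviour, and the function name is hypothetical.

```python
from collections import defaultdict


def ordinal_mapping(categories, response):
    """Hypothetical helper: rank levels by ascending mean response, starting at 1."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, value in zip(categories, response):
        totals[category] += value
        counts[category] += 1
    means = {category: totals[category] / counts[category] for category in totals}
    # Sort by mean, then by level name so ties get a deterministic order.
    ranked = sorted(means, key=lambda category: (means[category], category))
    return {category: rank for rank, category in enumerate(ranked, start=1)}


print(ordinal_mapping(["a", "b", "a", "c"], [1, 0, 3, 5]))  # {'b': 1, 'a': 2, 'c': 3}
```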

weights_column

Weights column to use when calculating the mean response.

Type:

str or None

mappings

Created in fit. Dict of key (column names) value (mapping of categorical levels to numeric, ordinal encoded response values) pairs.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = True
deprecated = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series) OrdinalEncoderTransformer[source]

Identify mapping of categorical levels to rank-ordered integer values by target-mean in ascending order.

If the user specified the weights_column arg when initialising the transformer, the weighted mean response will be calculated using that column.

Parameters:
  • X (DataFrame) – Data with categorical variable columns to transform and the response_column column specified when the object was initialised.

  • y (Series or LazyFrame) – Response column or target.

Returns:

OrdinalEncoderTransformer

Return type:

fitted class instance

Raises:

ValueError – if y contains nulls.

jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Apply ordinal encoding stored in the mappings attribute to columns.

This maps categorical levels to rank-ordered integer values by target-mean in ascending order.

Parameters:

X (DataFrame) – Data with categorical variable columns to transform.

Returns:

X – Transformed data with levels mapped to ordinal encoded values for categorical variables.

Return type:

DataFrame

tubular.numeric module

Contains transformers that apply numeric functions.

class tubular.numeric.BaseNumericTransformer(columns: list[str], **kwargs: dict[str, bool])[source]

Bases: BaseTransformer, CheckNumericMixin

Extends BaseTransformer for numeric scenarios.

columns

List of columns to be operated on

Type:

List[str]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> BaseNumericTransformer(
...     columns="a",
... )
BaseNumericTransformer(columns=['a'])

```

FITS = False
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | None = None) BaseNumericTransformer[source]

Validate data and attributes prior to the child objects fit logic.

Parameters:
  • X (DataFrame) – A dataframe containing the required columns

  • y (Series | None) – Required for pipeline.

Returns:

fitted class instance.

Return type:

BaseNumericTransformer

Examples

```pycon
>>> import polars as pl

>>> transformer = BaseNumericTransformer(
...     columns="a",
... )
>>> test_df = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> transformer.fit(test_df)
BaseNumericTransformer(columns=['a'])

```

jsonable = False
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, return_native_override: bool | None = None) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Validate data and attributes prior to the child objects transform logic.

Parameters:
  • X (DataFrame) – Data to transform.

  • return_native_override (Optional[bool]) – Option to override return_native attr in transformer, useful when calling parent methods

Returns:

X – Validated data

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = BaseNumericTransformer(
...     columns="a",
... )
>>> test_df = pl.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> # base class has no effect on data
>>> transformer.transform(test_df)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 2   ┆ 4   │
└─────┴─────┘

```

class tubular.numeric.CutTransformer(**kwargs)[source]

Bases: BaseNumericTransformer

Class to bin a column into discrete intervals.

Class simply uses the [pd.cut](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html) method on the specified column.
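For readers unfamiliar with pd.cut: with its default right=True, each value is assigned to the interval (edges[i], edges[i + 1]], and values outside the edges (including a value equal to the first edge, unless include_lowest is set) are left unassigned. A dependency-free sketch of that rule follows; it is neither tubular nor pandas code, and the function name is hypothetical.

```python
from bisect import bisect_left


def right_closed_bin(value, edges):
    """Index i such that edges[i] < value <= edges[i + 1], or None if outside."""
    position = bisect_left(edges, value)
    if position == 0 or position == len(edges):
        return None  # at or below the first edge, or above the last edge
    return position - 1


edges = [0, 3, 7, 10]
print([right_closed_bin(v, edges) for v in [0, 1, 3, 5, 10, 11]])
# [None, 0, 0, 1, 2, None]
```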

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Discretise specified column using pd.cut.

Parameters:

X (pd.DataFrame) – Data with column to transform.

Returns:

Dataframe with binned column

Return type:

pd.DataFrame

class tubular.numeric.DifferenceTransformer(columns: ]], **kwargs: bool | None)[source]

Bases: BaseNumericTransformer

Transformer that performs subtraction operation between two columns.

This transformer allows performing subtraction between two columns in a DataFrame and stores the result in a new column.

columns

List of exactly two column names to operate on. The second column is subtracted from the first.

Type:

ListOfTwoStrs

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> transformer = DifferenceTransformer(columns=["a", "b"])
>>> transformer.columns
['a', 'b']

```

FITS = False
get_feature_names_out() list[str][source]

Get the names of the output features.

Returns:

List containing the name of the new column created by the transformation.

Return type:

list[str]

jsonable = True
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform the DataFrame by applying the subtraction operation between two columns.

Parameters:

X (DataFrame) – DataFrame containing the columns to operate on.

Returns:

Transformed DataFrame with the new column containing the subtraction results.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = DifferenceTransformer(columns=["a", "b"])
>>> test_df = pl.DataFrame({"a": [100, 200, 300], "b": [80, 150, 200]})
>>> transformer.transform(test_df)
shape: (3, 3)
┌─────┬─────┬───────────┐
│ a   ┆ b   ┆ a_minus_b │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ i64       │
╞═════╪═════╪═══════════╡
│ 100 ┆ 80  ┆ 20        │
│ 200 ┆ 150 ┆ 50        │
│ 300 ┆ 200 ┆ 100       │
└─────┴─────┴───────────┘

```

class tubular.numeric.InteractionTransformer(**kwargs)[source]

Bases: BaseNumericTransformer

Generates interaction features.

Transformer generates a new column for all combinations of the selected columns up to the maximum degree provided. (For sklearn versions higher than 1.0.0, only interactions of a degree greater than or equal to the minimum degree are computed.) Each interaction column consists of the product of the specific combination of columns. For example, with 3 columns provided ["a", "b", "c"] and a max degree of 3, the total possible combinations are:

  • of degree 1: ["a", "b", "c"]
  • of degree 2: ["a b", "b c", "a c"]
  • of degree 3: ["a b c"]
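The combination logic above can be sketched row by row with itertools.combinations: each combination of a degree between min_degree and max_degree yields one product column, named by joining the column names with a space (as described for interaction_colname). This is a dependency-free illustration, not tubular's implementation, which works through sklearn's PolynomialFeatures combinations; the helper name is hypothetical.

```python
from itertools import combinations
from math import prod


def interaction_features(row, columns, min_degree, max_degree):
    """Hypothetical helper: products of every column combination in the degree range."""
    features = {}
    for degree in range(min_degree, max_degree + 1):
        for combo in combinations(columns, degree):
            # Name is the column names joined by a space, e.g. "a b".
            features[" ".join(combo)] = prod(row[c] for c in combo)
    return features


row = {"a": 2, "b": 3, "c": 5}
print(interaction_features(row, ["a", "b", "c"], min_degree=2, max_degree=3))
# {'a b': 6, 'a c': 10, 'b c': 15, 'a b c': 30}
```

The number of new columns is the sum of the binomial coefficients C(len(columns), d) over the chosen degrees, which is what nb_combinations counts.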

min_degree

minimum degree of interaction features to be considered

Type:

int

max_degree

maximum degree of interaction features to be considered

Type:

int

nb_features_to_interact

number of selected columns from which interactions should be computed. (=len(columns))

Type:

int

nb_combinations

number of new interaction features

Type:

int

interaction_colname

names of each new interaction feature. The name of an interaction feature is the combination of the original column names joined with a whitespace. The interaction feature of ["col1", "col2", "col3"] would be named "col1 col2 col3".

Type:

list

nb_feature_out

number of total columns of transformed dataset, including new interaction features

Type:

int

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
MIN_DEGREE_VALUE = 2
deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Generate interaction features using the “product” pandas.DataFrame method.

Parameters:

X (pd.DataFrame) – Data to transform.

Returns:

X – Input X with additional column or columns (self.interaction_colname) added. These contain the output of running the product pandas DataFrame method on identified combinations.

Return type:

pd.DataFrame

Raises:

TypeError – for invalid PolynomialFeatures._combinations arguments.

class tubular.numeric.LogTransformer(**kwargs)[source]

Bases: BaseNumericTransformer, DropOriginalMixin

Transformer to apply log transformation.

Transformer has the option to add 1 to the columns to log and drop the original columns.
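A sketch of the log step on a plain dict of columns, following the behaviour documented for transform (optional +1, optional drop of the originals, ValueError on negative values). The natural-log base, the {column}_{suffix} naming and the default suffix "log" are assumptions of this sketch, and the helper name is hypothetical.

```python
import math


def log_columns(data, columns, add_1=False, drop_original=False, suffix="log"):
    """Hypothetical helper operating on a dict of column name -> list of numbers."""
    result = dict(data)
    offset = 1 if add_1 else 0
    for column in columns:
        values = data[column]
        if any(v < 0 for v in values):
            raise ValueError(f"{column} contains negative values")
        # Natural log of each value (plus 1 when add_1 is True).
        result[f"{column}_{suffix}"] = [math.log(v + offset) for v in values]
        if drop_original:
            del result[column]
    return result


out = log_columns({"a": [0, 9], "b": [1, 2]}, ["a"], add_1=True, drop_original=True)
print(sorted(out))  # ['a_log', 'b']
```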

add_1

Should 1 be added to the columns before the log is taken?

Type:

bool

drop_original

Should original columns be dropped after the log columns are created?

Type:

bool

suffix

The suffix to add onto the end of column names for new columns.

Type:

str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = True
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Apply the log transform to the specified columns.

If the drop_original attribute is True then the original columns are dropped. If the add_1 attribute is True then the original columns + 1 are logged.

Parameters:

X (pd.DataFrame) – The dataframe to be transformed.

Returns:

X – The dataframe with the specified columns logged, optionally dropping the original columns if self.drop_original is True.

Return type:

pd.DataFrame

Raises:

ValueError – if provided columns contain negative values.

class tubular.numeric.OneDKmeansTransformer(columns: str | ~typing.Annotated[list[str], beartype.vale.Is[lambda list_arg: ...]], new_column_name: str, n_init: str | int = 'auto', n_clusters: int = 8, drop_original: bool = False, kmeans_kwargs: dict[str, object] | None = None, **kwargs: bool)[source]

Bases: BaseNumericTransformer, DropOriginalMixin

Generates a new column based on kmeans algorithm.

Transformer runs the kmeans algorithm based on the given number of clusters and then identifies the bins’ cuts based on the results. Finally, it passes them into a cut function.
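The steps described above can be sketched without sklearn: run a 1-D k-means (Lloyd's algorithm), take the largest value in each cluster as a bin's upper edge, and assign each value to the first bin whose edge is at least that value. This is a simplified sketch under stated assumptions. The deterministic initialisation, the use of cluster maxima as edges, the labelling rule and all function names are hypothetical; tubular's own clustering is configured through n_init, n_clusters and kmeans_kwargs and may behave differently.

```python
from bisect import bisect_left


def kmeans_1d(values, n_clusters, max_iter=100):
    """Lloyd's algorithm on 1-D data with a deterministic, evenly spaced start."""
    data = sorted(values)
    n = len(data)
    if n_clusters == 1:
        centroids = [data[n // 2]]
    else:
        # Hypothetical initialisation: starting centroids spread across the sorted data.
        centroids = [data[round(i * (n - 1) / (n_clusters - 1))] for i in range(n_clusters)]
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in data:
            # Assign each point to its nearest centroid (ties go to the lower index).
            distances = [abs(x - c) for c in centroids]
            clusters[distances.index(min(distances))].append(x)
        # Move each centroid to its cluster mean; keep it in place if the cluster is empty.
        new_centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters


def kmeans_bin_edges(values, n_clusters):
    """Assumed rule: upper edge of each bin = largest value in the corresponding cluster."""
    return sorted(max(c) for c in kmeans_1d(values, n_clusters) if c)


def assign_bins(values, edges):
    """Bin index = first edge >= value; values above the last edge go in the last bin."""
    return [min(bisect_left(edges, v), len(edges) - 1) for v in values]


edges = kmeans_bin_edges([1, 2, 3, 4], n_clusters=2)
print(edges)  # [2, 4]
print(assign_bins([1, 2, 3, 4], edges))  # [0, 0, 1, 1]
```

Because the edge and labelling rules here are assumptions, this sketch's output for [1, 2, 3, 4] need not match the tubular examples for this class.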

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )
OneDKmeansTransformer(columns=['a'], kmeans_kwargs={'random_state': 42},
                      n_clusters=2, new_column_name='new')

```

FITS = True
fit(X: FrameT, y: IntoSeriesT | None = None) OneDKmeansTransformer[source]

Fit transformer to input data.

Parameters:
  • X (pd/pl.DataFrame) – Dataframe with columns to learn bins from.

  • y (None) – Required for pipeline.

Returns:

Fitted class instance.

Return type:

OneDKmeansTransformer

Raises:

ValueError – if columns in X contain missing values.

Examples

```pycon
>>> import polars as pl

>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )
>>> test_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
>>> transformer.fit(test_df)
OneDKmeansTransformer(columns=['a'], kmeans_kwargs={'random_state': 42},
                      n_clusters=2, new_column_name='new')

```

get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="kmeans_column",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )

>>> transformer.get_feature_names_out()
['kmeans_column']

```

jsonable = True
lazyframe_compatible = False
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:

JSON representation of the transformer, including init parameters.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> import polars as pl
>>> x = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )
>>> test_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
>>> x.fit(test_df)
OneDKmeansTransformer(columns=['a'], kmeans_kwargs={'random_state': 42},
                      n_clusters=2, new_column_name='new')
>>> x.to_json()
{'tubular_version': ..., 'classname': 'OneDKmeansTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'new', 'n_init': 'auto', 'n_clusters': 2, 'drop_original': False, 'kmeans_kwargs': {'random_state': 42}}, 'fit': {'is_fitted_': True, 'bins': [3, 4]}}

```

transform(X: FrameT) FrameT[source]

Generate bins from the input pd/pl.DataFrame (X) based on the KMeans results and add the resulting column(s) to X.

Parameters:

X (pl/pd.DataFrame) – Data to transform.

Returns:

X – Input X with additional cluster column added.

Return type:

pl/pd.DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )
>>> test_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
>>> _ = transformer.fit(test_df)
>>> transformer.transform(test_df)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ new │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 5   ┆ 0   │
│ 2   ┆ 6   ┆ 0   │
│ 3   ┆ 7   ┆ 0   │
│ 4   ┆ 8   ┆ 1   │
└─────┴─────┴─────┘

```
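As a rough illustration, the fitted `bins` in the `to_json` example ([3, 4]) reproduce the cluster labels in the `transform` example if each entry is read as the upper edge of a cluster. The plain-Python sketch below assumes that reading, which is inferred from the two examples rather than taken from tubular's source; `assign_clusters` is a hypothetical helper, not part of the package.

```python
from bisect import bisect_left


def assign_clusters(values, bins):
    """Map each value to the index of the first bin edge >= value.

    Assumption (hypothetical helper): ``bins`` holds the sorted upper edge
    of each cluster. Values above the last edge are clipped into the last
    cluster.
    """
    last = len(bins) - 1
    return [min(bisect_left(bins, v), last) for v in values]


print(assign_clusters([1, 2, 3, 4], [3, 4]))  # [0, 0, 0, 1]
```

Under this reading a value equal to an edge stays in the lower cluster, which matches the row where `a` is 3 and `new` is 0.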

class tubular.numeric.PCATransformer(**kwargs)[source]

Bases: BaseNumericTransformer

Generates variables using Principal component analysis (PCA).

Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space.

It is based on the sklearn class sklearn.decomposition.PCA.

pca
Type:

PCA class from sklearn.decomposition

n_components_

The estimated number of components. When n_components is set to ‘mle’ or a number between 0 and 1 (with svd_solver == ‘full’) this number is estimated from input data. Otherwise it equals the parameter n_components, or the lesser value of n_features and n_samples if n_components is None.

Type:

int

feature_names_out

list of feature names representing the new dimensions.

Type:

list or None

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = True
deprecated = True
fit(X: DataFrame, y: Series | None = None) DataFrame[source]

Fit PCA to input data.

Parameters:
  • X (pd.DataFrame) – Dataframe with columns to fit the PCA on.

  • y (None) – Required for pipeline.

Returns:

fitted class instance.

Return type:

PCATransformer

Raises:

ValueError: – if n_components is invalid for data

jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Generate PCA features from the input pandas DataFrame (X) and add the resulting column(s) to X.

Parameters:

X (pd.DataFrame) – Data to transform.

Returns:

X – Input X with the additional PCA component column or columns (named in feature_names_out) added.

Return type:

pd.DataFrame

class tubular.numeric.RatioTransformer(columns: ]], return_dtype: ]] = 'Float32', **kwargs: bool | None)[source]

Bases: BaseNumericTransformer

Transformer that performs division operation between two columns.

This transformer allows performing division between two columns in a DataFrame and stores the result in a new column.

columns

List of exactly two column names to operate on. The first column is the numerator, and the second column is the denominator.

Type:

ListOfTwoStrs

return_dtype

The dtype of the resulting column, either ‘Float32’ or ‘Float64’.

Type:

str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> transformer = RatioTransformer(columns=["a", "b"], return_dtype="Float32")
>>> transformer.columns
['a', 'b']
>>> transformer.return_dtype
'Float32'

```

FITS = False
get_feature_names_out() list[str][source]

Get the names of the output features.

Returns:

List containing the name of the new column created by the transformation.

Return type:

list[str]

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:

JSON representation of the transformer, including init parameters.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> ratio_transformer = RatioTransformer(columns=["a", "b"], return_dtype="Float32")
>>> ratio_transformer.to_json()
{'tubular_version': ..., 'classname': 'RatioTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'return_dtype': 'Float32'}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform the DataFrame by applying the division operation between two columns.

Parameters:

X (DataFrame) – DataFrame containing the columns to operate on.

Returns:

Transformed DataFrame with the new column containing the division results.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = RatioTransformer(columns=["a", "b"], return_dtype="Float32")
>>> test_df = pl.DataFrame({"a": [100, 200, 300], "b": [80, 150, 200]})
>>> transformer.transform(test_df)
shape: (3, 3)
┌─────┬─────┬────────────────┐
│ a   ┆ b   ┆ a_divided_by_b │
│ --- ┆ --- ┆ ---            │
│ i64 ┆ i64 ┆ f32            │
╞═════╪═════╪════════════════╡
│ 100 ┆ 80  ┆ 1.25           │
│ 200 ┆ 150 ┆ 1.333333       │
│ 300 ┆ 200 ┆ 1.5            │
└─────┴─────┴────────────────┘

```
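A plain-Python sketch of the division performed by `transform`, using the output-name pattern from the example above (`a_divided_by_b`). How tubular treats nulls and zero denominators is not documented here: this sketch assumes nulls propagate and zero denominators follow IEEE 754 float rules (`inf`/`nan`). `divide_columns` is a hypothetical helper, and the `Float32` cast is not modelled.

```python
import math


def divide_columns(numerator, denominator):
    """Element-wise numerator / denominator for two equal-length columns."""
    if len(numerator) != len(denominator):
        raise ValueError("columns must have the same length")
    result = []
    for num, den in zip(numerator, denominator):
        if num is None or den is None:
            result.append(None)  # assumption: nulls propagate
        elif den == 0:
            # assumption: IEEE 754-style float division, x/0 -> +/-inf, 0/0 -> nan
            result.append(math.nan if num == 0 else math.copysign(math.inf, num))
        else:
            result.append(num / den)
    return result


frame = {"a": [100, 200, 300], "b": [80, 150, 200]}
numerator_col, denominator_col = "a", "b"  # first column is the numerator
frame[f"{numerator_col}_divided_by_{denominator_col}"] = divide_columns(
    frame[numerator_col], frame[denominator_col]
)
print(frame["a_divided_by_b"])  # [1.25, 1.3333333333333333, 1.5]
```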

class tubular.numeric.ScalingTransformer(**kwargs)[source]

Bases: BaseNumericTransformer

Transformer to perform scaling of numeric columns.

Transformer can apply min max scaling, max absolute scaling or standardisation (subtract mean and divide by std). The transformer uses the appropriate sklearn.preprocessing scaler.
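A minimal plain-Python sketch of what the three `scaler_options` compute, assuming sklearn's default settings (MinMaxScaler with `feature_range=(0, 1)`, StandardScaler using the mean and the population standard deviation). `fit_scaler` and `apply_scaler` are hypothetical helpers standing in for the transformer's `fit` and `transform`, which delegate to the sklearn scaler.

```python
import math


def fit_scaler(values, method):
    """Learn (shift, scale) for one column; method names follow scaler_options."""
    if method == "min_max":
        shift, scale = min(values), max(values) - min(values)
    elif method == "max_abs":
        shift, scale = 0.0, max(abs(v) for v in values)
    elif method == "standard":
        mean = sum(values) / len(values)
        # population variance (ddof=0), as StandardScaler uses
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        shift, scale = mean, math.sqrt(variance)
    else:
        raise ValueError(f"unknown scaler type: {method!r}")
    # a zero scale (constant column) is replaced by 1 to avoid dividing by zero
    return shift, (scale if scale != 0 else 1.0)


def apply_scaler(values, params):
    """Apply previously fitted (shift, scale) to possibly new values."""
    shift, scale = params
    return [(v - shift) / scale for v in values]


params = fit_scaler([0.0, 5.0, 10.0], "min_max")
print(apply_scaler([0.0, 5.0, 10.0], params))  # [0.0, 0.5, 1.0]
print(apply_scaler([20.0], params))  # [2.0]
```

Keeping fit and apply separate mirrors the transformer's split: parameters are learned once in `fit`, so values seen later in `transform` can fall outside the [0, 1] range.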

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = True
deprecated = True
fit(X: DataFrame, y: Series | None = None) ScalingTransformer[source]

Fit scaler to input data.

Parameters:
  • X (pd.DataFrame) – Dataframe with columns to learn scaling values from.

  • y (None) – Required for pipeline.

Returns:

fitted class instance.

Return type:

ScalingTransformer

jsonable = False
lazyframe_compatible = False
polars_compatible = False
scaler_options: ClassVar[dict[str, MinMaxScaler | MaxAbsScaler | StandardScaler]] = {'max_abs': <class 'sklearn.preprocessing._data.MaxAbsScaler'>, 'min_max': <class 'sklearn.preprocessing._data.MinMaxScaler'>, 'standard': <class 'sklearn.preprocessing._data.StandardScaler'>}
transform(X: DataFrame) DataFrame[source]

Transform input data X with fitted scaler.

Parameters:

X (pd.DataFrame) – Dataframe containing columns to be scaled.

Returns:

X – Input X with columns scaled.

Return type:

pd.DataFrame

class tubular.numeric.TwoColumnOperatorTransformer(**kwargs)[source]

Bases: DataFrameMethodTransformer, BaseNumericTransformer

Applies a pandas.DataFrame method to two columns (add, sub, mul, div, mod, pow).

Transformer assigns the output of the method to a new column. The method will be applied in the form (column 1) operator (column 2), so order matters if the operation does not commute. It is possible to supply other keyword arguments to the transform method, which will be passed to the pandas.DataFrame method being called.
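To make the ordering point concrete, the sketch below maps the six method names onto Python's `operator` module and applies them element-wise to two plain lists. The mapping (with `div` as true division, as `pandas.DataFrame.div` performs) and the helper `two_column_op` are illustrative assumptions; the transformer itself calls the named pandas.DataFrame method.

```python
import operator

# Illustrative mapping from the supported method names to Python operators.
OPERATORS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
    "mod": operator.mod,
    "pow": operator.pow,
}


def two_column_op(col1, col2, method_name):
    """Compute (column 1) operator (column 2) element-wise (hypothetical helper)."""
    op = OPERATORS[method_name]
    return [op(x, y) for x, y in zip(col1, col2)]


print(two_column_op([10, 20], [2, 5], "sub"))  # [8, 15]
print(two_column_op([2, 5], [10, 20], "sub"))  # [-8, -15]
```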

pd_method_name

The name of the pandas.DataFrame method to be called.

Type:

str

columns

list containing two string items: [column1_name, column2_name]. The first will be operated upon by the chosen pandas method using the second.

Type:

list

column2_name

The name of the 2nd column in the operation.

Type:

str

new_column_name

The name of the new column that the output is assigned to.

Type:

str

pd_method_kwargs

Dictionary of method kwargs to be passed to pandas.DataFrame method.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Transform input data by applying the chosen method to the two specified columns.

Args:

X (pd.DataFrame): Data to transform.

Returns:

pd.DataFrame: Input X with an additional column.

tubular.strings module

Contains transformers that apply string functions.

class tubular.strings.ExtractStringComponentsTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], by: str, return_n_components: ]], **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer class to extract components from string columns, split by a given character.

by

character to split on

Type:

str

return_n_components

number of components to return

Type:

int

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = ExtractStringComponentsTransformer(
...     columns=["a"], by="@", return_n_components=2
... )
>>> transformer
ExtractStringComponentsTransformer(by='@', columns=['a'], return_n_components=2)

>>> json_dump = transformer.to_json()
>>> pprint(json_dump)
{'classname': 'ExtractStringComponentsTransformer',
 'fit': {'is_fitted_': False},
 'init': {'by': '@',
          'columns': ['a'],
          'copy': False,
          'return_n_components': 2,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
>>> ExtractStringComponentsTransformer.from_json(json_dump)
ExtractStringComponentsTransformer(by='@', columns=['a'], return_n_components=2)

```

FITS = False
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = ExtractStringComponentsTransformer(
...     columns=["a"], by="@", return_n_components=2
... )

>>> transformer.get_feature_names_out()
['a_split_by_@_entry_0', 'a_split_by_@_entry_1']

```

get_transform_exprs() list[Expr][source]

Get transform expressions.

Returns:

transform expressions for class

Return type:

list[nw.Expr]

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = ExtractStringComponentsTransformer(
...     columns=["a"], by="@", return_n_components=2
... )

>>> pprint(transformer.to_json())
{'classname': 'ExtractStringComponentsTransformer',
 'fit': {'is_fitted_': False},
 'init': {'by': '@',
          'columns': ['a'],
          'copy': False,
          'return_n_components': 2,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Extract components from string columns, split by given character.

Parameters:

X (DataFrame) – Data containing columns to extract components from.

Returns:

X – Transformed input X with string components extracted from columns.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": ["greg@gmail.com", "bob@apple.net"]})
>>> transformer = ExtractStringComponentsTransformer(
...     columns=["a"], by="@", return_n_components=2
... )
>>> transformer.transform(test_df)
shape: (2, 3)
┌────────────────┬──────────────────────┬──────────────────────┐
│ a              ┆ a_split_by_@_entry_0 ┆ a_split_by_@_entry_1 │
│ ---            ┆ ---                  ┆ ---                  │
│ str            ┆ str                  ┆ str                  │
╞════════════════╪══════════════════════╪══════════════════════╡
│ greg@gmail.com ┆ greg                 ┆ gmail.com            │
│ bob@apple.net  ┆ bob                  ┆ apple.net            │
└────────────────┴──────────────────────┴──────────────────────┘

```
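A plain-Python sketch of the splitting step, reproducing the `<column>_split_by_<by>_entry_<i>` names from the examples. It keeps the first `return_n_components` parts; how tubular treats strings with fewer or more components, and null inputs, is an assumption here (missing parts become None, extra parts are dropped). `extract_components` is a hypothetical helper.

```python
def extract_components(values, column, by, return_n_components):
    """Split each string on `by` and keep the first `return_n_components` parts."""
    names = [f"{column}_split_by_{by}_entry_{i}" for i in range(return_n_components)]
    out = {name: [] for name in names}
    for value in values:
        parts = [] if value is None else value.split(by)
        for i, name in enumerate(names):
            # assumption: missing components (and null inputs) become None
            out[name].append(parts[i] if i < len(parts) else None)
    return out


components = extract_components(["greg@gmail.com", "bob@apple.net"], "a", "@", 2)
print(components)
# {'a_split_by_@_entry_0': ['greg', 'bob'], 'a_split_by_@_entry_1': ['gmail.com', 'apple.net']}
```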

class tubular.strings.LowerCaseTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer class to convert text columns to lower case.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = LowerCaseTransformer(
...     columns=["a"],
... )
>>> transformer
LowerCaseTransformer(columns=['a'])

>>> json_dump = transformer.to_json()
>>> pprint(json_dump)
{'classname': 'LowerCaseTransformer',
 'fit': {'is_fitted_': False},
 'init': {'columns': ['a'],
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
>>> LowerCaseTransformer.from_json(json_dump)
LowerCaseTransformer(columns=['a'])

```

FITS = False
get_transform_exprs() list[Expr][source]

Get transform expressions.

Returns:

transform expressions for class

Return type:

list[nw.Expr]

jsonable = True
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Convert text in the given columns to lower case.

Parameters:

X (DataFrame) – Data containing columns to lowercase.

Returns:

X – Transformed input X with text lowercased in given columns.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": ["HeLlO", None, " HI"]})
>>> transformer = LowerCaseTransformer(columns="a")
>>> transformer.transform(test_df)
shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ str   │
╞═══════╡
│ hello │
│ null  │
│  hi   │
└───────┘

```

class tubular.strings.RemoveCharactersTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], characters: list[str], **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer class to remove characters from text columns.

characters

list of characters to remove from text columns.

Type:

list[str]

characters_formatted

characters attr formatted into regex string.

Type:

str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RemoveCharactersTransformer(columns=["a"], characters=["\\d"])
>>> transformer
RemoveCharactersTransformer(characters=['\\d'], columns=['a'])

>>> json_dump = transformer.to_json()
>>> pprint(json_dump)
{'classname': 'RemoveCharactersTransformer',
 'fit': {'is_fitted_': False},
 'init': {'characters': ['\\d'],
          'columns': ['a'],
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
>>> RemoveCharactersTransformer.from_json(json_dump)
RemoveCharactersTransformer(characters=['\\d'], columns=['a'])

```

FITS = False
get_transform_exprs() list[Expr][source]

Get transform expressions.

Returns:

transform expressions for class

Return type:

list[nw.Expr]

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RemoveCharactersTransformer(columns=["a", "b"], characters=["a"])

>>> pprint(transformer.to_json())
{'classname': 'RemoveCharactersTransformer',
 'fit': {'is_fitted_': False},
 'init': {'characters': ['a'],
          'columns': ['a', 'b'],
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Strip unwanted characters from specified columns.

Parameters:

X (DataFrame) – Data containing columns to strip.

Returns:

X – Transformed input X with characters stripped from specified columns.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [" 8hi!", None, "9999hello "]})
>>> transformer = RemoveCharactersTransformer(columns=["a"], characters=["\\W", "\\s"])
>>> transformer.transform(test_df)
shape: (3, 1)
┌───────────┐
│ a         │
│ ---       │
│ str       │
╞═══════════╡
│ 8hi       │
│ null      │
│ 9999hello │
└───────────┘

```
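The `characters_formatted` attribute combines `characters` into a single regex string. One plausible construction, joining the patterns as a regex alternation, is sketched below in plain Python; the joining rule and `remove_characters` are assumptions, not tubular's code. Since the entries behave as regex patterns (as the `\d` example shows), a metacharacter such as `.` needs `re.escape` to be removed literally.

```python
import re


def remove_characters(values, characters):
    """Delete every match of any pattern in `characters` from each string."""
    # assumption: patterns are joined into one regex alternation
    pattern = re.compile("|".join(characters))
    # None (null) values are passed through unchanged
    return [None if value is None else pattern.sub("", value) for value in values]


print(remove_characters([" 8hi!", None, "9999hello "], [r"\W", r"\s"]))
# ['8hi', None, '9999hello']
print(remove_characters(["3.14"], [re.escape(".")]))
# ['314']
```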

class tubular.strings.SeriesStrMethodTransformer(**kwargs)[source]

Bases: BaseTransformer

Transformer that applies a pandas.Series.str method.

Transformer assigns the output of the method to a new column. It is possible to supply other keyword arguments to the transform method, which will be passed to the pandas.Series.str method being called.

Be aware that it is possible to supply incompatible arguments to init that will only be identified when transform is run. This is because there are many combinations of method, input and output sizes. Additionally, some methods may only work as expected when called in transform with specific keyword arguments.

new_column_name

The name of the column or columns to be assigned the output of running the pd.Series.str method in transform.

Type:

str

pd_method_name

The name of the pd.Series.str method to call.

Type:

str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Apply given pandas.Series.str method to given column.

Any keyword arguments set in the pd_method_kwargs attribute are passed on to the pd.Series.str method when calling it.

Parameters:

X (pd.DataFrame) – Data to transform.

Returns:

X – Input X with additional column (self.new_column_name) added. This contains the output of running the pd.Series.str method.

Return type:

pd.DataFrame

class tubular.strings.StringConcatenator(**kwargs)[source]

Bases: BaseTransformer

Transformer to combine data from specified columns, of mixed datatypes, into a new column containing one string.

Parameters:
  • columns (str or list of str) – Columns to concatenate.

  • new_column_name (str, default = "new_column") – New column name

  • separator (str, default = " ") – Separator for the new string value
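A minimal plain-Python sketch of the concatenation, using the default `new_column_name` and `separator` listed above. Values of mixed types are converted with `str()` before joining; how tubular renders missing values is not covered, and `concatenate_columns` is a hypothetical helper.

```python
def concatenate_columns(frame, columns, new_column_name="new_column", separator=" "):
    """Join the given columns row by row into one string column (sketch)."""
    out = dict(frame)  # shallow copy so the input mapping is left untouched
    rows = zip(*(frame[c] for c in columns))
    out[new_column_name] = [separator.join(str(v) for v in row) for row in rows]
    return out


result = concatenate_columns(
    {"a": [1, 2], "b": ["x", "y"], "c": [1.5, 2.0]}, ["a", "b", "c"]
)
print(result["new_column"])  # ['1 x 1.5', '2 y 2.0']
```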

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

deprecated = True
jsonable = False
lazyframe_compatible = False
polars_compatible = False
transform(X: DataFrame) DataFrame[source]

Combine data from specified columns, of mixed datatypes, into a new column containing one string.

Parameters:

X (pd.DataFrame) – Data containing the columns to concatenate.

Returns:

X – Input X with the concatenated string column (new_column_name) added.

Return type:

pd.DataFrame

class tubular.strings.StringContainsTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], reference: str, reference_as_column: bool = False, **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer class to indicate if given columns contain reference values.

reference

column or value to compare against, e.g. look for values of reference='a' in columns ['b', 'c'].

Type:

str

reference_as_column

indicates whether reference represents a column (or value). Note, reference_as_column=True is not supported for pandas backend.

Type:

bool

characters_formatted

characters attr formatted into regex string.

Type:

str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = StringContainsTransformer(
...     columns=["a"], reference="b", reference_as_column=True
... )
>>> transformer
StringContainsTransformer(columns=['a'], reference='b',
                          reference_as_column=True)

>>> json_dump = transformer.to_json()
>>> pprint(json_dump)
{'classname': 'StringContainsTransformer',
 'fit': {'is_fitted_': False},
 'init': {'columns': ['a'],
          'copy': False,
          'reference': 'b',
          'reference_as_column': True,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
>>> StringContainsTransformer.from_json(json_dump)
StringContainsTransformer(columns=['a'], reference='b',
                          reference_as_column=True)

```

FITS = False
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = StringContainsTransformer(columns=["a", "b"], reference="c")

>>> transformer.get_feature_names_out()
['a_contains_c', 'b_contains_c']

```

get_transform_exprs() list[Expr][source]

Get transform expressions.

Returns:

transform expressions for class

Return type:

list[nw.Expr]

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = StringContainsTransformer(
...     columns=["a"], reference="b", reference_as_column=True
... )

>>> pprint(transformer.to_json())
{'classname': 'StringContainsTransformer',
 'fit': {'is_fitted_': False},
 'init': {'columns': ['a'],
          'copy': False,
          'reference': 'b',
          'reference_as_column': True,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Indicate if provided columns contain reference values.

Parameters:

X (DataFrame) – Data containing the columns to check.

Returns:

X – Transformed input X with a boolean column added for each given column, indicating whether it contains the reference.

Return type:

DataFrame

Raises:

TypeError – if called on a pandas DataFrame when reference_as_column=True.

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame(
...     {"a": ["cat", "dog", None, "mouse"], "b": ["cat", "rat", None, "mouse"]}
... )
>>> transformer = StringContainsTransformer(
...     columns=["a"], reference="b", reference_as_column=True
... )
>>> transformer.transform(test_df)
shape: (4, 3)
┌───────┬───────┬──────────────┐
│ a     ┆ b     ┆ a_contains_b │
│ ---   ┆ ---   ┆ ---          │
│ str   ┆ str   ┆ bool         │
╞═══════╪═══════╪══════════════╡
│ cat   ┆ cat   ┆ true         │
│ dog   ┆ rat   ┆ false        │
│ null  ┆ null  ┆ null         │
│ mouse ┆ mouse ┆ true         │
└───────┴───────┴──────────────┘

```
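A plain-Python sketch of both modes, with `reference` as a fixed value or as a column name (`reference_as_column=True`), using the `<column>_contains_<reference>` output name and the null propagation seen in the examples. It assumes a literal substring test; whether tubular treats `reference` as a regex pattern is not stated here. `string_contains` is a hypothetical helper.

```python
def string_contains(frame, columns, reference, reference_as_column=False):
    """Flag whether each value contains the reference value or column (sketch)."""
    out = dict(frame)
    for col in columns:
        if reference_as_column:
            refs = frame[reference]  # compare row by row against another column
        else:
            refs = [reference] * len(frame[col])  # same literal for every row
        out[f"{col}_contains_{reference}"] = [
            # assumption: literal substring match; nulls propagate
            None if value is None or ref is None else ref in value
            for value, ref in zip(frame[col], refs)
        ]
    return out


frame = {"a": ["cat", "dog", None, "mouse"], "b": ["cat", "rat", None, "mouse"]}
print(string_contains(frame, ["a"], "b", reference_as_column=True)["a_contains_b"])
# [True, False, None, True]
```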

Module contents

Initialise classes exposed by the package.

class tubular.AggregateColumnsOverRowTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], aggregations: ]], drop_original: bool = False, **kwargs: bool)[source]

Bases: BaseAggregationTransformer

Aggregate provided columns over each row.

This transformer aggregates data within specified columns and can optionally drop the original columns post-transformation.
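A minimal plain-Python sketch of row-wise aggregation, using the `<col1>_<col2>_<aggregation>` naming shown in this class's `get_feature_names_out` example. Only `min` and `max` appear in the examples; `mean` is included purely for illustration and may not match tubular's supported set. `aggregate_columns_over_row` is a hypothetical helper.

```python
AGGREGATIONS = {
    "min": min,
    "max": max,
    "mean": lambda row: sum(row) / len(row),  # illustrative only
}


def aggregate_columns_over_row(frame, columns, aggregations, drop_original=False):
    """Aggregate the given columns across each row (sketch)."""
    out = dict(frame)
    rows = list(zip(*(frame[c] for c in columns)))
    prefix = "_".join(columns)  # e.g. "a_b"
    for agg in aggregations:
        out[f"{prefix}_{agg}"] = [AGGREGATIONS[agg](row) for row in rows]
    if drop_original:
        for c in columns:
            del out[c]
    return out


res = aggregate_columns_over_row(
    {"a": [1, 2], "b": [3, 4], "c": [5, 6]}, ["a", "b"], ["min", "max"]
)
print(res["a_b_min"], res["a_b_max"])  # [1, 2] [3, 4]
```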

Attributes:

columns: Union[str, list[str]]

List of column names to apply the aggregation transformations to.

aggregations: list[str]

List of aggregation methods to apply.

drop_original: bool, optional

Whether to drop the original columns after transformation. Default is False.

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

Indicates if transformer will work with polars frames

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> AggregateColumnsOverRowTransformer(
...     columns=["a", "b"],
...     aggregations=["min", "max"],
... )
AggregateColumnsOverRowTransformer(aggregations=['min', 'max'],
                                   columns=['a', 'b'])

```

FITS = False
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = AggregateColumnsOverRowTransformer(
...     columns=["a", "b"],
...     aggregations=["min", "max"],
... )

>>> transformer.get_feature_names_out()
['a_b_min', 'a_b_max']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform the dataframe by aggregating provided columns over each row.

Parameters:

X (DataFrame) – DataFrame to transform by aggregating provided columns over each row

Returns:

Transformed DataFrame with aggregated columns.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = AggregateColumnsOverRowTransformer(
...     columns=["a", "b"],
...     aggregations=["min", "max"],
... )
>>> test_df = pl.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
>>> transformer.transform(test_df)
shape: (2, 5)
┌─────┬─────┬─────┬─────────┬─────────┐
│ a   ┆ b   ┆ c   ┆ a_b_min ┆ a_b_max │
│ --- ┆ --- ┆ --- ┆ ---     ┆ ---     │
│ i64 ┆ i64 ┆ i64 ┆ i64     ┆ i64     │
╞═════╪═════╪═════╪═════════╪═════════╡
│ 1   ┆ 3   ┆ 5   ┆ 1       ┆ 3       │
│ 2   ┆ 4   ┆ 6   ┆ 2       ┆ 4       │
└─────┴─────┴─────┴─────────┴─────────┘
```

class tubular.AggregateRowsOverColumnTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], aggregations: ]], key: str, drop_original: bool = False, **kwargs: bool)[source]

Bases: BaseAggregationTransformer

Aggregation transformer.

Aggregate rows over specified columns, where rows are grouped by provided key column.
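A plain-Python sketch of grouped aggregation. It assumes, rather than confirms, that each group's result is broadcast back to every row in that group (as a window function would do), so the output columns (`a_min`, `a_max` in the `get_feature_names_out` example) sit alongside the original data. The `ValueError` for a missing key mirrors the documented behaviour of `transform`; `aggregate_rows_over_column` is a hypothetical helper.

```python
from collections import defaultdict

AGGREGATIONS = {"min": min, "max": max}


def aggregate_rows_over_column(frame, columns, aggregations, key):
    """Aggregate each column within groups of `key`, one result per row (sketch)."""
    if key not in frame:
        raise ValueError(f"key column {key!r} not found")
    out = dict(frame)
    for col in columns:
        groups = defaultdict(list)
        for k, v in zip(frame[key], frame[col]):
            groups[k].append(v)
        for agg in aggregations:
            per_group = {k: AGGREGATIONS[agg](vs) for k, vs in groups.items()}
            # assumption: broadcast each group's result back to its rows
            out[f"{col}_{agg}"] = [per_group[k] for k in frame[key]]
    return out


result = aggregate_rows_over_column(
    {"a": [1, 2, 3], "b": [1, 1, 2], "c": [1, 2, 3]}, ["a"], ["min", "max"], key="b"
)
print(result["a_min"], result["a_max"])  # [1, 1, 3] [2, 2, 3]
```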

Attributes:

columns: Union[str, list[str]]

List of column names to apply the aggregation transformations to.

aggregations: list[str]

List of aggregation methods to apply.

key: str

Column name to group by for aggregation.

drop_original: bool, optional

Whether to drop the original columns after transformation. Default is False.

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

Indicates if transformer will work with polars frames

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> AggregateRowsOverColumnTransformer(
...     columns="a",
...     aggregations=["min", "max"],
...     key="b",
... )
AggregateRowsOverColumnTransformer(aggregations=['min', 'max'], columns=['a'],
                                   key='b')
```

FITS = False
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = AggregateRowsOverColumnTransformer(
...     columns="a",
...     aggregations=["min", "max"],
...     key="b",
... )
>>> transformer.get_feature_names_out()
['a_min', 'a_max']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, Any][source]

Dump transformer to json dict.

Returns:

dict[str, Any]:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Example:

```pycon
>>> transformer = AggregateRowsOverColumnTransformer(
...     columns="a",
...     key="c",
...     aggregations=["min", "max"],
... )
>>> transformer.to_json()  # doctest: +NORMALIZE_WHITESPACE
{'tubular_version': ...,
 'classname': 'AggregateRowsOverColumnTransformer',
 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'aggregations': ['min', 'max'], 'drop_original': False, 'key': 'c'},
 'fit': {'is_fitted_': True}}
```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform the dataframe by aggregating rows over specified columns.

Parameters:

X (DataFrame) – DataFrame to transform by aggregating specified columns.

Returns:

Transformed DataFrame with aggregated columns.

Return type:

DataFrame

Raises:

ValueError – If the key column is not found in the DataFrame.

Examples

```pycon
>>> import polars as pl
>>> transformer = AggregateRowsOverColumnTransformer(
...     columns="a",
...     aggregations=["min", "max"],
...     key="b",
... )
>>> test_df = pl.DataFrame({"a": [1, 2, 3], "b": [1, 1, 2], "c": [1, 2, 3]})
>>> transformer.transform(test_df)
shape: (3, 5)
┌─────┬─────┬─────┬───────┬───────┐
│ a   ┆ b   ┆ c   ┆ a_min ┆ a_max │
│ --- ┆ --- ┆ --- ┆ ---   ┆ ---   │
│ i64 ┆ i64 ┆ i64 ┆ i64   ┆ i64   │
╞═════╪═════╪═════╪═══════╪═══════╡
│ 1   ┆ 1   ┆ 1   ┆ 1     ┆ 2     │
│ 2   ┆ 1   ┆ 2   ┆ 1     ┆ 2     │
│ 3   ┆ 2   ┆ 3   ┆ 3     ┆ 3     │
└─────┴─────┴─────┴───────┴───────┘

```
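Grouped aggregation that is broadcast back to every row behaves like a window function (comparable in spirit to a polars expression such as `pl.col("a").min().over("b")`). A minimal pure-Python sketch, assuming a dict-of-lists input; `aggregate_rows_over_column` is a hypothetical helper, not part of tubular, and it reproduces the `a_min`/`a_max` values from the example above.

```python
from collections import defaultdict

AGGREGATIONS = {"min": min, "max": max}


def aggregate_rows_over_column(
    data: dict[str, list],
    column: str,
    aggregations: list[str],
    key: str,
) -> dict[str, list]:
    """Hypothetical helper: aggregate `column` within each `key` group."""
    groups = defaultdict(list)
    for group, value in zip(data[key], data[column]):
        groups[group].append(value)
    result = dict(data)  # original columns are kept
    for agg in aggregations:
        per_group = {g: AGGREGATIONS[agg](vals) for g, vals in groups.items()}
        # Broadcast: every row receives the aggregate of its own group.
        result[f"{column}_{agg}"] = [per_group[g] for g in data[key]]
    return result


out = aggregate_rows_over_column(
    {"a": [1, 2, 3], "b": [1, 1, 2], "c": [1, 2, 3]},
    column="a",
    aggregations=["min", "max"],
    key="b",
)
print(out["a_min"], out["a_max"])  # [1, 1, 3] [2, 2, 3]
```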

class tubular.ArbitraryImputer(impute_value: int | float | str | bool, columns: str | list[str], **kwargs: bool | None)[source]

Bases: BaseImputer

Transformer to impute null values with an arbitrary pre-defined value.

impute_value

Value to impute nulls with.

Type:

int or float or str or bool

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> arbitrary_imputer = ArbitraryImputer(columns=["a", "b"], impute_value=5)
>>> arbitrary_imputer
ArbitraryImputer(columns=['a', 'b'], impute_value=5)
>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = arbitrary_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'ArbitraryImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'impute_value': 5}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 5, 'b': 5}}}
>>> ArbitraryImputer.from_json(json_dump)
ArbitraryImputer(columns=['a', 'b'], impute_value=5)

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Impute missing values with the supplied impute_value.

Parameters:

X (DataFrame) – Data containing columns to impute.

Returns:

X (DataFrame) – Transformed input X with nulls imputed with the specified impute_value, for the specified columns.

Example

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = ArbitraryImputer(columns=["a", "b"], impute_value=5)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 5   ┆ 5   │
│ 2   ┆ 4   │
└─────┴─────┘
```
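Imputation with an arbitrary value amounts to replacing every null in the selected columns with the same constant (the json dump in the class example records this as `impute_values_` of `5` for both columns). A hypothetical pure-Python sketch, treating `None` as null in a dict of lists; it is not tubular's implementation.

```python
def impute_arbitrary(data: dict[str, list], columns: list[str], impute_value) -> dict[str, list]:
    """Hypothetical helper: replace None in `columns` with `impute_value`."""
    return {
        name: (
            [impute_value if v is None else v for v in values]
            if name in columns
            else list(values)  # columns not selected are copied unchanged
        )
        for name, values in data.items()
    }


print(impute_arbitrary({"a": [1, None, 2], "b": [3, None, 4]}, ["a", "b"], 5))
# {'a': [1, 5, 2], 'b': [3, 5, 4]}
```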

class tubular.BetweenDatesTransformer(columns: ]], new_column_name: str, drop_original: bool = False, lower_inclusive: bool = True, upper_inclusive: bool = True, **kwargs: bool)[source]

Bases: BaseGenericDateTransformer

Transformer to generate a boolean column indicating if one date is between two others.

If any row has column_lower greater than column_upper, the output column for that row will be null instead of raising a warning.

Attributes:

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

column_lower: str

Name of the date column giving the lower bound. This attribute is not for use in any method; use 'columns' instead. It is present only as a fix to allow string representation of the transformer.

column_upper: str

Name of the date column giving the upper bound. This attribute is not for use in any method; use 'columns' instead. It is present only as a fix to allow string representation of the transformer.

column_between: str

Name of the column to check for values falling between column_lower and column_upper. This attribute is not for use in any method; use 'columns' instead. It is present only as a fix to allow string representation of the transformer.

columns: list

Contains the names of the columns to compare, in the order [column_lower, column_between, column_upper].

new_column_name: str

new_column_name argument passed when initialising the transformer.

lower_inclusive: bool

lower_inclusive argument passed when initialising the transformer.

upper_inclusive: bool

upper_inclusive argument passed when initialising the transformer.

drop_original: bool

indicates whether to drop original columns.

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> BetweenDatesTransformer(
...     columns=["a", "b", "c"],
...     new_column_name="b_between_a_c",
...     lower_inclusive=True,
...     upper_inclusive=True,
... )
BetweenDatesTransformer(columns=['a', 'b', 'c'],
                        new_column_name='b_between_a_c')
```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = BetweenDatesTransformer(
...     columns=["a", "b", "c"],
...     new_column_name="b_between_a_c",
...     lower_inclusive=True,
...     upper_inclusive=False,
... )
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'BetweenDatesTransformer', 'init': {'columns': ['a', 'b', 'c'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'b_between_a_c', 'drop_original': False, 'lower_inclusive': True, 'upper_inclusive': False}, 'fit': {'is_fitted_': True}}
```

transform(X: FrameT) FrameT[source]

Transform - creates column indicating if middle date is between the other two.

Rows where the lower bound is greater than the upper bound will produce null in the resulting output column for that row.

Parameters:

X (pd/pl/nw.DataFrame) – Data to transform.

Returns:

X (pd/pl/nw.DataFrame) – Input X with additional column (self.new_column_name) added. This column is boolean and indicates if the middle column is between the other 2.

Example

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = BetweenDatesTransformer(
...     columns=["a", "b", "c"],
...     new_column_name="b_between_a_c",
...     lower_inclusive=True,
...     upper_inclusive=True,
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [
...             datetime.date(1990, 9, 27),
...             datetime.date(2005, 10, 7),
...             datetime.date(2010, 1, 1),
...         ],
...         "b": [
...             datetime.date(1991, 5, 22),
...             datetime.date(2001, 12, 10),
...             datetime.date(2009, 1, 1),
...         ],
...         "c": [
...             datetime.date(1993, 4, 20),
...             datetime.date(2007, 11, 8),
...             datetime.date(2008, 1, 1),
...         ],
...     },
... )
>>> transformer.transform(test_df)
shape: (3, 4)
┌────────────┬────────────┬────────────┬───────────────┐
│ a          ┆ b          ┆ c          ┆ b_between_a_c │
│ ---        ┆ ---        ┆ ---        ┆ ---           │
│ date       ┆ date       ┆ date       ┆ bool          │
╞════════════╪════════════╪════════════╪═══════════════╡
│ 1990-09-27 ┆ 1991-05-22 ┆ 1993-04-20 ┆ true          │
│ 2005-10-07 ┆ 2001-12-10 ┆ 2007-11-08 ┆ false         │
│ 2010-01-01 ┆ 2009-01-01 ┆ 2008-01-01 ┆ null          │
└────────────┴────────────┴────────────┴───────────────┘
```
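The comparison logic, including the inclusivity flags and the null result for inverted bounds, can be sketched as follows. `is_between` is a hypothetical helper written from the description above, not tubular's code; it reproduces the three rows of the example.

```python
import datetime
from typing import Optional


def is_between(
    lower: datetime.date,
    between: datetime.date,
    upper: datetime.date,
    lower_inclusive: bool = True,
    upper_inclusive: bool = True,
) -> Optional[bool]:
    """Hypothetical helper: is `between` inside the [lower, upper] window?"""
    if lower > upper:
        return None  # inverted bounds give null rather than a warning
    above_lower = lower <= between if lower_inclusive else lower < between
    below_upper = between <= upper if upper_inclusive else between < upper
    return above_lower and below_upper


d = datetime.date
print(is_between(d(1990, 9, 27), d(1991, 5, 22), d(1993, 4, 20)))  # True
print(is_between(d(2005, 10, 7), d(2001, 12, 10), d(2007, 11, 8)))  # False
print(is_between(d(2010, 1, 1), d(2009, 1, 1), d(2008, 1, 1)))  # None
```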

class tubular.CappingTransformer(capping_values: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, quantiles: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, weights_column: str | None = None, **kwargs: bool)[source]

Bases: BaseCappingTransformer

Transformer to cap numeric values at a minimum value, a maximum value, or both.

For max capping, any values above the cap value will be set to the cap. Similarly, for min capping, any values below the cap will be set to the cap. Only works for numeric columns.

Attributes:

capping_values: dict[str, CappingValues] or None

Capping values to apply to each column, capping_values argument.

quantiles: dict[str, CappingValues] or None

Quantiles to set capping values at from input data. Will be empty after init, values populated when fit is run.

quantile_capping_values: dict[str, CappingValues] or None

Capping values learned from quantiles (if provided) to apply to each column.

weights_column: str or None

weights_column argument.

_replacement_values: dict[str, CappingValues]

Replacement values when capping is applied. Will be a copy of capping_values.

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> import polars as pl
>>> transformer = CappingTransformer(
...     capping_values={"a": [10, 20], "b": [1, 3]},
... )
>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})
>>> transformer.transform(test_df)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 10  ┆ 3   ┆ 1   │
│ 15  ┆ 2   ┆ 2   │
│ 18  ┆ 3   ┆ 3   │
│ 20  ┆ 1   ┆ 4   │
└─────┴─────┴─────┘
>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'CappingTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'capping_values': {'a': [10, 20], 'b': [1, 3]}, 'quantiles': None, 'weights_column': None}, 'fit': {'is_fitted_': False}}
>>> CappingTransformer.from_json(json_dump)
CappingTransformer(capping_values={'a': [10, 20], 'b': [1, 3]})

```
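Capping is a clip to `[lower, upper]`. A minimal sketch, assuming that a `None` bound disables capping on that side (consistent with the `list[int | float | None]` type in the signature) and that nulls pass through untouched; `cap` is a hypothetical helper, not part of tubular. It reproduces columns `a` and `b` from the example above.

```python
from typing import Optional, Sequence, Union

Number = Union[int, float]


def cap(values: Sequence[Optional[Number]], bounds: Sequence[Optional[Number]]) -> list:
    """Hypothetical helper: clip values to [lower, upper]; None disables a side."""
    lower, upper = bounds
    capped = []
    for v in values:
        if v is not None:  # assumption: nulls are left as they are
            if lower is not None and v < lower:
                v = lower
            if upper is not None and v > upper:
                v = upper
        capped.append(v)
    return capped


print(cap([1, 15, 18, 25], [10, 20]))  # [10, 15, 18, 20]
print(cap([6, 2, 7, 1], [1, 3]))  # [3, 2, 3, 1]
```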

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) CappingTransformer[source]

Learn capping values from input data X.

Calculates the quantiles to cap at given the quantiles dictionary supplied when initialising the transformer. Saves learnt values in the capping_values attribute.

Parameters:
  • X (DataFrame) – A dataframe with required columns to be capped.

  • y (None) – Required for pipeline.

Returns:

fitted instance of class

Return type:

CappingTransformer

Example

```pycon
>>> import polars as pl
>>> transformer = CappingTransformer(
...     quantiles={"a": [0.01, 0.99], "b": [0.05, 0.95]},
... )
>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})
>>> transformer.fit(test_df)
CappingTransformer(quantiles={'a': [0.01, 0.99], 'b': [0.05, 0.95]})

```
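For intuition on how quantiles turn into capping values, the sketch below computes an unweighted quantile with linear interpolation between order statistics (the same rule as numpy's default `linear` method). This is an assumption for illustration: tubular's exact quantile rule, especially when `weights_column` is set, may differ, and `linear_quantile` is a hypothetical helper.

```python
import math
from typing import Sequence


def linear_quantile(values: Sequence[float], q: float) -> float:
    """Hypothetical helper: quantile by linear interpolation (numpy's default rule)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q  # fractional index into the sorted data
    lo, hi = math.floor(position), math.ceil(position)
    fraction = position - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * fraction


a = [1, 15, 18, 25]
print([round(linear_quantile(a, q), 2) for q in (0.01, 0.99)])  # [1.42, 24.79]
```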

jsonable = True
lazyframe_compatible = True
polars_compatible = True
class tubular.ColumnDtypeSetter(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], dtype: ]], **kwargs: bool)[source]

Bases: BaseTransformer

Transformer to cast columns in a dataframe to a specified dtype.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

deprecated

indicates if class has been deprecated

Type:

bool

FITS = False
deprecated = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = ColumnDtypeSetter(columns="a", dtype="Float32")
>>> pprint(transformer.to_json(), sort_dicts=True)
{'classname': 'ColumnDtypeSetter',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a'],
          'copy': False,
          'dtype': 'Float32',
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform data.

Parameters:

X (DataFrame) – data to transform.

Returns:

transformed data

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2]})
>>> transformer = ColumnDtypeSetter(columns="a", dtype="Float32")
>>> transformer.transform(df)
shape: (2, 1)
┌─────┐
│ a   │
│ --- │
│ f32 │
╞═════╡
│ 1.0 │
│ 2.0 │
└─────┘
```

class tubular.CompareTwoColumnsTransformer(columns: ]], condition: ]], **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer to compare two columns and generate outcomes based on conditions.

This transformer evaluates a comparison condition between two columns and adds a boolean column holding the result for each row.

polars_compatible

Indicates whether transformer has been converted to polars/pandas agnostic narwhals framework.

Type:

bool

FITS

Indicates whether transform requires fit to be run first.

Type:

bool

jsonable

Indicates if transformer supports to/from_json methods.

Type:

bool

lazyframe_compatible

Indicates whether transformer works with lazyframes.

Type:

bool

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})
>>> transformer = CompareTwoColumnsTransformer(
...     columns=["a", "b"],
...     condition=">",
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 3)
┌─────┬─────┬───────┐
│ a   ┆ b   ┆ a>b   │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ i64 ┆ bool  │
╞═════╪═════╪═══════╡
│ 1   ┆ 3   ┆ false │
│ 2   ┆ 2   ┆ false │
│ 3   ┆ 1   ┆ true  │
└─────┴─────┴───────┘
```
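`ops_map` pairs each condition with a function from Python's `operator` module. A hypothetical pure-Python sketch of the same idea: only the `">"` string and the `a>b` output name are confirmed by the examples above; the `"=="`, `"<"` and `"!="` strings are assumptions about the other `ConditionEnum` values, and `compare_two_columns` is not part of tubular.

```python
import operator

# Assumed condition strings; only ">" is confirmed by the examples above.
OPS = {"==": operator.eq, ">": operator.gt, "<": operator.lt, "!=": operator.ne}


def compare_two_columns(data: dict[str, list], columns: list[str], condition: str) -> dict[str, list]:
    """Hypothetical helper: add a boolean column comparing two columns row by row."""
    left, right = columns
    op = OPS[condition]
    result = dict(data)
    # Output name joins the operands and condition, e.g. "a>b".
    result[f"{left}{condition}{right}"] = [op(x, y) for x, y in zip(data[left], data[right])]
    return result


print(compare_two_columns({"a": [1, 2, 3], "b": [3, 2, 1]}, ["a", "b"], ">")["a>b"])
# [False, False, True]
```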

FITS = False
jsonable = True
lazyframe_compatible = True
ops_map: ClassVar[dict[ConditionEnum, Any]] = {ConditionEnum.EQUAL_TO: <built-in function eq>, ConditionEnum.GREATER_THAN: <built-in function gt>, ConditionEnum.LESS_THAN: <built-in function lt>, ConditionEnum.NOT_EQUAL_TO: <built-in function ne>}
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:

JSON representation of the transformer, including init parameters.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = CompareTwoColumnsTransformer(
...     columns=["a", "b"],
...     condition=ConditionEnum.GREATER_THAN.value,
... )
>>> json_dict = transformer.to_json()
>>> from pprint import pprint
>>> pprint(json_dict, sort_dicts=True)
{'classname': 'CompareTwoColumnsTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a', 'b'],
          'condition': '>',
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform two columns based on a condition to generate an outcome.

Parameters:

X (DataFrame) – DataFrame containing the columns to be transformed.

Returns:

Transformed DataFrame with the new outcome column.

Return type:

DataFrame

Raises:

TypeError – If the columns are not of a numeric type.

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [3, 2, 1]})
>>> transformer = CompareTwoColumnsTransformer(
...     columns=["a", "b"],
...     condition=">",
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 3)
┌─────┬─────┬───────┐
│ a   ┆ b   ┆ a>b   │
│ --- ┆ --- ┆ ---   │
│ i64 ┆ i64 ┆ bool  │
╞═════╪═════╪═══════╡
│ 1   ┆ 3   ┆ false │
│ 2   ┆ 2   ┆ false │
│ 3   ┆ 1   ┆ true  │
└─────┴─────┴───────┘
```

class tubular.DateDifferenceTransformer(columns: ]], new_column_name: str, units: ]] = 'D', drop_original: bool = False, custom_days_divider: int | None = None, **kwargs: bool)[source]

Bases: BaseGenericDateTransformer

Transformer to calculate the difference between two date fields in the specified units.

Attributes:

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> transformer = DateDifferenceTransformer(
...     columns=["a", "b"],
...     new_column_name="bla",
...     units="common_year",
... )
>>> transformer
DateDifferenceTransformer(columns=['a', 'b'], new_column_name='bla',
                          units='common_year')
>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'DateDifferenceTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'bla', 'drop_original': False, 'units': 'common_year', 'custom_days_divider': None}, 'fit': {'is_fitted_': True}}
>>> DateDifferenceTransformer.from_json(json_dump)
DateDifferenceTransformer(columns=['a', 'b'], new_column_name='bla',
                          units='common_year')

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = DateDifferenceTransformer(columns=["a", "b"], new_column_name="a_diff_b")
>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DateDifferenceTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'a_diff_b', 'drop_original': False, 'units': 'D', 'custom_days_divider': None}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Calculate the difference between the given fields in the specified units.

Parameters:

X (DataFrame) – Data containing self.columns

Returns:

dataframe with added date difference column

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = DateDifferenceTransformer(
...     columns=["a", "b"],
...     new_column_name="a_b_difference_years",
...     units="common_year",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.date(1993, 9, 27), datetime.date(2005, 10, 7)],
...         "b": [datetime.date(1991, 5, 22), datetime.date(2001, 12, 10)],
...     },
... )
>>> transformer.transform(test_df)
shape: (2, 3)
┌────────────┬────────────┬──────────────────────┐
│ a          ┆ b          ┆ a_b_difference_years │
│ ---        ┆ ---        ┆ ---                  │
│ date       ┆ date       ┆ f64                  │
╞════════════╪════════════╪══════════════════════╡
│ 1993-09-27 ┆ 1991-05-22 ┆ -2.353425            │
│ 2005-10-07 ┆ 2001-12-10 ┆ -3.827397            │
└────────────┴────────────┴──────────────────────┘

```
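The unit conversion can be checked by hand. Taking `common_year` as a 365-day year and computing the second column minus the first reproduces the example values above (for the first row, 859 days / 365 ≈ 2.353425, negative because `b` precedes `a`). `date_difference` is a hypothetical helper covering only `"D"` and `"common_year"`; other units accepted by the transformer are not modelled.

```python
import datetime

# Days per unit; "common_year" is taken to be 365 days, which reproduces the
# example above. Other units the transformer accepts are not modelled.
DAYS_PER_UNIT = {"D": 1, "common_year": 365}


def date_difference(first: datetime.date, second: datetime.date, units: str = "D") -> float:
    """Hypothetical helper: `second - first` expressed in `units`."""
    days = (second - first).total_seconds() / 86_400
    return days / DAYS_PER_UNIT[units]


print(round(date_difference(datetime.date(1993, 9, 27), datetime.date(1991, 5, 22), "common_year"), 6))
# -2.353425
```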

class tubular.DatetimeComponentExtractor(columns: str | list[str], include: ]], **kwargs: str | bool)[source]

Bases: BaseDatetimeTransformer

Transformer to extract numeric datetime components.

Attributes:

columns: List[str]

List of columns for processing

include: list of str

Which numeric datetime components to extract

polars_compatible: bool

Indicates whether transformer has been converted to polars/pandas agnostic framework

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

jsonable: bool

Indicates if transformer supports to/from_json methods

FITS: bool

Indicates whether transform requires fit to be run first

Example:

```pycon
>>> transformer = DatetimeComponentExtractor(
...     columns="a",
...     include=["hour", "day"],
... )
>>> transformer
DatetimeComponentExtractor(columns=['a'], include=['hour', 'day'])
>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'DatetimeComponentExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['hour', 'day']}, 'fit': {'is_fitted_': True}}
>>> DatetimeComponentExtractor.from_json(json_dump)
DatetimeComponentExtractor(columns=['a'], include=['hour', 'day'])

```

FITS = False
INCLUDE_OPTIONS: ClassVar[list[str]] = ['hour', 'day', 'month', 'year']
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

List of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = DatetimeComponentExtractor(
...     columns=["a", "b"],
...     include=["hour", "day"],
... )
>>> transformer.get_feature_names_out()
['a_hour', 'a_day', 'b_hour', 'b_day']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, Any][source]

Convert transformer to JSON format.

Returns:

JSON representation of the transformer

Return type:

dict

Examples

```pycon
>>> transformer = DatetimeComponentExtractor(
...     columns="a",
...     include=["hour", "day"],
... )
>>> transformer.to_json()
{'tubular_version': '...', 'classname': 'DatetimeComponentExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['hour', 'day']}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform - Extracts numeric datetime components.

Parameters:

X (DataFrame) – Data with columns to extract info from.

Returns:

X – Transformed input X with added columns of extracted information.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = DatetimeComponentExtractor(
...     columns="a",
...     include=["hour", "day"],
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [
...             datetime.datetime(1993, 9, 27, 14, 30),
...             datetime.datetime(2005, 10, 7, 9, 45),
...         ],
...         "b": [
...             datetime.datetime(1991, 5, 22, 18, 0),
...             datetime.datetime(2001, 12, 10, 23, 59),
...         ],
...     },
... )
>>> transformer.transform(test_df)
shape: (2, 4)
┌─────────────────────┬─────────────────────┬────────┬───────┐
│ a                   ┆ b                   ┆ a_hour ┆ a_day │
│ ---                 ┆ ---                 ┆ ---    ┆ ---   │
│ datetime[μs]        ┆ datetime[μs]        ┆ f32    ┆ f32   │
╞═════════════════════╪═════════════════════╪════════╪═══════╡
│ 1993-09-27 14:30:00 ┆ 1991-05-22 18:00:00 ┆ 14.0   ┆ 27.0  │
│ 2005-10-07 09:45:00 ┆ 2001-12-10 23:59:00 ┆ 9.0    ┆ 7.0   │
└─────────────────────┴─────────────────────┴────────┴───────┘

```
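Component extraction maps directly onto attributes of Python's `datetime.datetime`. A hypothetical sketch, returning floats to match the `f32` output above and naming columns `<column>_<component>`; `extract_components` is not tubular's implementation.

```python
import datetime

INCLUDE_OPTIONS = ["hour", "day", "month", "year"]  # as listed in INCLUDE_OPTIONS


def extract_components(data: dict[str, list], column: str, include: list[str]) -> dict[str, list]:
    """Hypothetical helper: add one float column per requested component."""
    result = dict(data)
    for component in include:
        if component not in INCLUDE_OPTIONS:
            raise ValueError(f"unsupported component: {component!r}")
        # datetime.datetime exposes hour, day, month and year as attributes.
        result[f"{column}_{component}"] = [float(getattr(v, component)) for v in data[column]]
    return result


out = extract_components(
    {"a": [datetime.datetime(1993, 9, 27, 14, 30), datetime.datetime(2005, 10, 7, 9, 45)]},
    column="a",
    include=["hour", "day"],
)
print(out["a_hour"], out["a_day"])  # [14.0, 9.0] [27.0, 7.0]
```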

class tubular.DatetimeInfoExtractor(columns: str | list[str], include: ]] | None = None, datetime_mappings: dict[~typing.Annotated[str, beartype.vale.Is[lambda s: ...]], dict[int, str]] | None = None, drop_original: bool | None = False, **kwargs: str | bool)[source]

Bases: BaseDatetimeTransformer

Transformer to extract various categorical features from datetime variables.

Attributes:

columns: List[str]

List of columns for processing

include: list of str, default = ["timeofday", "timeofmonth", "timeofyear", "dayofweek"]

Which datetime categorical information to extract

datetime_mappings: dict, default = None

Optional argument to define custom mappings for datetime values.

drop_original: bool

indicates whether to drop provided columns post transform

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> transformer = DatetimeInfoExtractor(
...     columns="a",
...     include="timeofday",
... )
>>> transformer
DatetimeInfoExtractor(columns=['a'], datetime_mappings={},
                      include=['timeofday'])
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DatetimeInfoExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['timeofday'], 'datetime_mappings': {}}, 'fit': {'is_fitted_': True}}

```

DATETIME_ATTR: ClassVar[dict[str, str]] = {'dayofweek': 'weekday', 'timeofday': 'hour', 'timeofmonth': 'day', 'timeofyear': 'month'}
DEFAULT_MAPPINGS: ClassVar[dict[str, dict[int, str]]] = {'dayofweek': {1: 'monday', 2: 'tuesday', 3: 'wednesday', 4: 'thursday', 5: 'friday', 6: 'saturday', 7: 'sunday'}, 'timeofday': {0: 'night', 1: 'night', 2: 'night', 3: 'night', 4: 'night', 5: 'night', 6: 'morning', 7: 'morning', 8: 'morning', 9: 'morning', 10: 'morning', 11: 'morning', 12: 'afternoon', 13: 'afternoon', 14: 'afternoon', 15: 'afternoon', 16: 'afternoon', 17: 'afternoon', 18: 'evening', 19: 'evening', 20: 'evening', 21: 'evening', 22: 'evening', 23: 'evening'}, 'timeofmonth': {1: 'start', 2: 'start', 3: 'start', 4: 'start', 5: 'start', 6: 'start', 7: 'start', 8: 'start', 9: 'start', 10: 'start', 11: 'middle', 12: 'middle', 13: 'middle', 14: 'middle', 15: 'middle', 16: 'middle', 17: 'middle', 18: 'middle', 19: 'middle', 20: 'middle', 21: 'end', 22: 'end', 23: 'end', 24: 'end', 25: 'end', 26: 'end', 27: 'end', 28: 'end', 29: 'end', 30: 'end', 31: 'end'}, 'timeofyear': {1: 'winter', 2: 'winter', 3: 'spring', 4: 'spring', 5: 'spring', 6: 'summer', 7: 'summer', 8: 'summer', 9: 'autumn', 10: 'autumn', 11: 'autumn', 12: 'winter'}}
FITS = False
INCLUDE_OPTIONS: ClassVar[list[str]] = ['timeofday', 'timeofmonth', 'timeofyear', 'dayofweek']
RANGE_TO_MAP: ClassVar[dict[str, set[int]]] = {'dayofweek': {1, 2, 3, 4, 5, 6, 7}, 'timeofday': {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23}, 'timeofmonth': {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}, 'timeofyear': {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}}
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = DatetimeInfoExtractor(
...     columns=["a", "b"],
...     include=["timeofday", "timeofmonth"],
... )
>>> transformer.get_feature_names_out()
['a_timeofday', 'a_timeofmonth', 'b_timeofday', 'b_timeofmonth']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = DatetimeInfoExtractor(columns="a")
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DatetimeInfoExtractor', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'include': ['timeofday', 'timeofmonth', 'timeofyear', 'dayofweek'], 'datetime_mappings': {}}, 'fit': {'is_fitted_': True}}
```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform - Extracts new features from datetime variables.

Parameters:

X (DataFrame) – Data with columns to extract info from.

Returns:

X (DataFrame) – Transformed input X with added columns of extracted information.

Example

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = DatetimeInfoExtractor(
...     columns="a",
...     include="timeofmonth",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.datetime(1993, 9, 27), datetime.datetime(2005, 10, 7)],
...         "b": [datetime.datetime(1991, 5, 22), datetime.datetime(2001, 12, 10)],
...     },
... )
>>> transformer.transform(test_df)
shape: (2, 3)
┌─────────────────────┬─────────────────────┬───────────────┐
│ a                   ┆ b                   ┆ a_timeofmonth │
│ ---                 ┆ ---                 ┆ ---           │
│ datetime[μs]        ┆ datetime[μs]        ┆ enum          │
╞═════════════════════╪═════════════════════╪═══════════════╡
│ 1993-09-27 00:00:00 ┆ 1991-05-22 00:00:00 ┆ end           │
│ 2005-10-07 00:00:00 ┆ 2001-12-10 00:00:00 ┆ start         │
└─────────────────────┴─────────────────────┴───────────────┘
```

class tubular.DatetimeSinusoidCalculator(columns: str | list[str], method: ]], units: ]]], period: ]]] = 6.283185307179586, drop_original: bool = False, **kwargs: bool | str)[source]

Bases: BaseDatetimeTransformer

Calculate the sine or cosine of a datetime column in a given unit (e.g. hour).

Includes the option to scale the period of the sine or cosine to match the natural period of the unit (e.g. 24 for hours).
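The period scaling described above can be sketched in plain Python as f(2π · value / period), where f is sin or cos. This formula is an assumption rather than tubular's own code, but with the default period of 2π it reduces to sin(value), which reproduces the sin_6.283185307179586_month_a values in the transform example for this class (sin(9) for September and sin(10) for October, taking units="month" to mean the month number). The helper name sinusoid is hypothetical.

```python
import math


def sinusoid(value, method="sin", period=2 * math.pi):
    """Hypothetical helper (not part of tubular): sine or cosine of a datetime
    component such as the month number, assuming one full cycle spans
    `period` units, i.e. f(2 * pi * value / period)."""
    func = {"sin": math.sin, "cos": math.cos}[method]
    return func(2 * math.pi * value / period)


# The default period of 2*pi reduces to sin(value).
print(round(sinusoid(9), 6))   # 0.412118
print(round(sinusoid(10), 6))  # -0.544021
# period=12 with month numbers: month 9 is three quarters of the way round the cycle.
print(round(sinusoid(9, period=12), 6))  # -1.0
```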

Attributes:

columns: str or list

Columns to take the sine or cosine of.

method: str or list

The function to be calculated; either sin, cos or a list containing both.

units: str or dict

Which time unit the calculation is to be carried out on. Accepts any of ‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’, ‘second’, ‘microsecond’. Can be a string or a dict containing key-value pairs of column name and units to be used for that column.

period: str, float or dict, default = 2*np.pi

The period of the output in the units specified above. Can be a string or float, or a dict containing key-value pairs of column name and period to be used for that column.

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )
DatetimeSinusoidCalculator(columns=['a'], method=['sin'], units='month')

```

FITS = False
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )

>>> transformer.get_feature_names_out()
['sin_6.283185307179586_month_a']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'DatetimeSinusoidCalculator', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'dummy', 'drop_original': False, 'method': ['sin'], 'units': 'month', 'period': 6.283185307179586}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, return_native_override: bool | None = None) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform - creates column containing sine or cosine of another datetime column.

The function(s) to apply are taken from the self.method attribute.

Parameters:
  • X (pd/pl/nw.DataFrame) – Data to transform.

  • return_native_override (Optional[bool]) – Option to override return_native attr in transformer, useful when calling parent methods

Returns:

X (pd/pl/nw.DataFrame) – Input X with additional columns added, named “<method>_<period>_<units>_<original_column>” (e.g. sin_6.283185307179586_month_a).

Examples

```pycon
>>> import polars as pl
>>> import datetime
>>> transformer = DatetimeSinusoidCalculator(
...     columns="a",
...     method="sin",
...     units="month",
... )
>>> test_df = pl.DataFrame(
...     {
...         "a": [datetime.datetime(1993, 9, 27), datetime.datetime(2005, 10, 7)],
...         "b": [datetime.datetime(1991, 5, 22), datetime.datetime(2001, 12, 10)],
...     },
... )
>>> transformer.transform(test_df)
shape: (2, 3)
┌─────────────────────┬─────────────────────┬───────────────────────────────┐
│ a                   ┆ b                   ┆ sin_6.283185307179586_month_a │
│ ---                 ┆ ---                 ┆ ---                           │
│ datetime[μs]        ┆ datetime[μs]        ┆ f64                           │
╞═════════════════════╪═════════════════════╪═══════════════════════════════╡
│ 1993-09-27 00:00:00 ┆ 1991-05-22 00:00:00 ┆ 0.412118                      │
│ 2005-10-07 00:00:00 ┆ 2001-12-10 00:00:00 ┆ -0.544021                     │
└─────────────────────┴─────────────────────┴───────────────────────────────┘
```

class tubular.DifferenceTransformer(columns: ]], **kwargs: bool | None)[source]

Bases: BaseNumericTransformer

Transformer that performs a subtraction operation between two columns.

This transformer subtracts the second of two columns in a DataFrame from the first and stores the result in a new column (e.g. a_minus_b for columns a and b).

columns

List of exactly two column names to operate on. The second column is subtracted from the first.

Type:

ListOfTwoStrs

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> transformer = DifferenceTransformer(columns=["a", "b"])
>>> transformer.columns
['a', 'b']

```

FITS = False
get_feature_names_out() list[str][source]

Get the names of the output features.

Returns:

List containing the name of the new column created by the transformation.

Return type:

list[str]

jsonable = True
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform the DataFrame by applying the subtraction operation between two columns.

Parameters:

X (DataFrame) – DataFrame containing the columns to operate on.

Returns:

Transformed DataFrame with the new column containing the subtraction results.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = DifferenceTransformer(columns=["a", "b"])
>>> test_df = pl.DataFrame({"a": [100, 200, 300], "b": [80, 150, 200]})
>>> transformer.transform(test_df)
shape: (3, 3)
┌─────┬─────┬───────────┐
│ a   ┆ b   ┆ a_minus_b │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ i64       │
╞═════╪═════╪═══════════╡
│ 100 ┆ 80  ┆ 20        │
│ 200 ┆ 150 ┆ 50        │
│ 300 ┆ 200 ┆ 100       │
└─────┴─────┴───────────┘

```

class tubular.GroupRareLevelsTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]] | None = None, cut_off_percent: ]] = 0.01, weights_column: str | None = None, rare_level_name: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]] = 'rare', record_rare_levels: bool = True, unseen_levels_to_rare: bool = True, **kwargs: bool)[source]

Bases: BaseTransformer, WeightColumnMixin

Group together rare levels of nominal variables into a new rare level.

Rare levels are defined by a cut off percentage, which can either be based on the number of rows or sum of weights. Any levels below this cut off value will be grouped into the rare level.
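As a rough illustration of the cut-off rule described above, the plain-Python sketch below computes each level's share of rows (or of total weight), keeps the levels at or above the cut-off, and groups the rest. The helper names fit_non_rare_levels and group_rare are hypothetical and not part of tubular. The sketch ignores nulls and treats a share exactly equal to cut_off_percent as non-rare, which is an assumption about tie behaviour. With cut_off_percent=0.5 it reproduces the grouping in the transform example for this class.

```python
from collections import Counter


def fit_non_rare_levels(values, cut_off_percent, weights=None):
    """Hypothetical helper: return levels whose share of rows (or of total
    weight, if weights are given) is at least cut_off_percent."""
    if weights is None:
        weights = [1.0] * len(values)
    totals = Counter()
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {level for level, total in totals.items() if total / grand_total >= cut_off_percent}


def group_rare(values, non_rare, rare_level_name="rare", training_levels=None):
    """Hypothetical helper: replace rare levels with rare_level_name.

    training_levels=None mimics unseen_levels_to_rare=True (anything outside
    non_rare is grouped). Passing the set of training levels mimics
    unseen_levels_to_rare=False (unseen levels pass through unchanged).
    """
    return [
        rare_level_name
        if value not in non_rare and (training_levels is None or value in training_levels)
        else value
        for value in values
    ]


train = ["x", "x", "y"]
non_rare = fit_non_rare_levels(train, cut_off_percent=0.5)
print(non_rare)  # {'x'}
print(group_rare(train, non_rare, "rare_level"))  # ['x', 'x', 'rare_level']
print(group_rare(["x", "z"], non_rare, "rare_level", training_levels=set(train)))  # ['x', 'z']
```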

cut_off_percent

Cut off percentage (either in terms of number of rows or sum of weight) for a given nominal level to be considered rare.

Type:

float

non_rare_levels

Created in fit. A dict of non-rare levels (i.e. levels with more than cut_off_percent weight or rows) that is used to identify rare levels in transform.

Type:

dict

rare_level_name

Must be of the same type as the values in columns. Label for the new nominal level that will be added to group together rare levels (as defined by cut_off_percent).

Type:

any

record_rare_levels

Should the ‘rare’ levels that will be grouped together be recorded? If not, they will be lost after fit and the only information remaining will be the ‘non-rare’ levels.

Type:

bool

rare_levels_record

Only created (in fit) if record_rare_levels is True. A dict containing the list of levels that were grouped into ‘rare’ for each column the transformer was applied to.

Type:

dict

weights_column

Name of the weights column to use if cut_off_percent should be in terms of sum of weights rather than number of rows.

Type:

str

unseen_levels_to_rare

If True, levels in new data that were not seen during fit will be grouped into the rare level; if False, they will be left unchanged.

Type:

bool

training_data_levels

Dictionary containing the set of values present in the training data for each column in self.columns. Only created if unseen_levels_to_rare is set to False.

Type:

dict[set]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> GroupRareLevelsTransformer(
...     columns="a",
...     cut_off_percent=0.02,
...     rare_level_name="rare_level",
... )
GroupRareLevelsTransformer(columns=['a'], cut_off_percent=0.02,
                           rare_level_name='rare_level')

```

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) GroupRareLevelsTransformer[source]

Record non-rare levels for categorical variables.

When transform is called, only levels recorded in non_rare_levels during fit will remain unchanged; all other levels will be grouped. If record_rare_levels is True, the rare levels will also be recorded.

The label for the rare levels must be of the same type as the values in the columns.

Parameters:
  • X (DataFrame) – Data to identify non-rare levels from.

  • y (Series or LazyFrame or None, default = None) – Optional argument only required for the transformer to work with sklearn pipelines.

Returns:

fitted class instance

Return type:

GroupRareLevelsTransformer

Examples

```pycon
>>> import polars as pl
>>> transformer = GroupRareLevelsTransformer(
...     columns="a",
...     cut_off_percent=0.02,
...     rare_level_name="rare_level",
... )
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": ["w", "z"]})
>>> transformer.fit(test_df)
GroupRareLevelsTransformer(columns=['a'], cut_off_percent=0.02,
                           rare_level_name='rare_level')

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> import tests.test_data as d
>>> df = d.create_df_8("pandas")
>>> x = GroupRareLevelsTransformer(
...     columns=["b", "c"], cut_off_percent=0.4, unseen_levels_to_rare=False
... )
>>> x.fit(df)
GroupRareLevelsTransformer(columns=['b', 'c'], cut_off_percent=0.4,
                           unseen_levels_to_rare=False)
>>> x.to_json()
{'tubular_version': ..., 'classname': 'GroupRareLevelsTransformer', 'init': {'columns': ['b', 'c'], 'copy': False, 'verbose': False, 'return_native': True, 'cut_off_percent': 0.4, 'weights_column': None, 'rare_level_name': 'rare', 'record_rare_levels': True, 'unseen_levels_to_rare': False}, 'fit': {'is_fitted_': True, 'non_rare_levels': {'b': ['w'], 'c': ['a']}, 'training_data_levels': {'b': ['w', 'x', 'y', 'z'], 'c': ['a', 'b', 'c']}, 'rare_levels_record': {'b': ['x', 'y', 'z'], 'c': ['b', 'c']}}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Group rare levels together into a new ‘rare’ level.

Parameters:

X (DataFrame) – Data with categorical variables to apply rare level grouping to.

Returns:

X – Transformed input X with rare levels grouped into a new rare level.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = GroupRareLevelsTransformer(
...     columns="a",
...     cut_off_percent=0.5,
...     rare_level_name="rare_level",
... )
>>> test_df = pl.DataFrame({"a": ["x", "x", "y"], "b": ["w", "z", "z"]})
>>> _ = transformer.fit(test_df)
>>> transformer.transform(test_df)
shape: (3, 2)
┌────────────┬─────┐
│ a          ┆ b   │
│ ---        ┆ --- │
│ str        ┆ str │
╞════════════╪═════╡
│ x          ┆ w   │
│ x          ┆ z   │
│ rare_level ┆ z   │
└────────────┴─────┘

```

class tubular.LowerCaseTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer class to convert text columns to lower case.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = LowerCaseTransformer(
...     columns=["a"],
... )
>>> transformer
LowerCaseTransformer(columns=['a'])

>>> json_dump = transformer.to_json()
>>> pprint(json_dump)
{'classname': 'LowerCaseTransformer',
 'fit': {'is_fitted_': False},
 'init': {'columns': ['a'],
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
>>> LowerCaseTransformer.from_json(json_dump)
LowerCaseTransformer(columns=['a'])

```

FITS = False
get_transform_exprs() list[Expr][source]

Get transform expressions.

Returns:

transform expressions for the class

Return type:

list[nw.Expr]

jsonable = True
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Convert text in the given columns to lower case.

Parameters:

X (DataFrame) – Data containing columns to lowercase.

Returns:

X – Transformed input X with text lowercased in given columns.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": ["HeLlO", None, " HI"]})
>>> transformer = LowerCaseTransformer(columns="a")
>>> transformer.transform(test_df)
shape: (3, 1)
┌───────┐
│ a     │
│ ---   │
│ str   │
╞═══════╡
│ hello │
│ null  │
│  hi   │
└───────┘

```

class tubular.MappingTransformer(mappings: dict[str, dict[Any, Any]], return_dtypes: dict[str, RETURN_DTYPES] | None = None, **kwargs: bool | None)[source]

Bases: BaseMappingTransformer, BaseMappingTransformMixin

Transformer to map values in columns to other values e.g. to merge two levels into one.

Note that MappingTransformer does not require ‘self-mappings’ to be defined, i.e. if you want a value to map to itself, you can omit it from mappings rather than mapping it explicitly to itself.

This transformer inherits from both BaseMappingTransformer and BaseMappingTransformMixin: BaseMappingTransformer performs the standard checks, while BaseMappingTransformMixin handles the actual mapping logic.

Parameters:
  • mappings (dict) – Dictionary containing column mappings. Each key of mappings is a column name and each value is the mapping dict to apply to that column. For example, the dict {‘a’: {1: 2, 3: 4}, ‘b’: {‘a’: 1, ‘b’: 2}} specifies a mapping for column a of 1->2, 3->4 and a mapping for column b of ‘a’->1, ‘b’->2.

  • return_dtypes (Optional[Dict[str, RETURN_DTYPES]]) – Dictionary of col:dtype for returned columns.

  • **kwargs – Arbitrary keyword arguments passed on to the BaseMappingTransformer.__init__ method.

mappings

Dictionary of mappings for each column individually. The dict passed to mappings in init is set to the mappings attribute.

Type:

dict

mappings_from_null

dict storing what null values will be mapped to. Generally best to use an imputer, but this functionality is useful for inverting pipelines.

Type:

dict[str, Any]

return_dtypes

Dictionary of col:dtype for returned columns

Type:

dict[str, RETURN_DTYPES]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> transformer = MappingTransformer(
...     mappings={"a": {"Y": 1, "N": 0}},
...     return_dtypes={"a": "Int8"},
... )
>>> transformer
MappingTransformer(mappings={'a': {'N': 0, 'Y': 1}},
                   return_dtypes={'a': 'Int8'})

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MappingTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'mappings': {'a': {'Y': 1, 'N': 0}}, 'return_dtypes': {'a': 'Int8'}}, 'fit': {'is_fitted_': True}}
>>> MappingTransformer.from_json(json_dump)
MappingTransformer(mappings={'a': {'N': 0, 'Y': 1}},
                   return_dtypes={'a': 'Int8'})

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform the input data X according to the mappings in the mappings attribute dict.

This method calls BaseMappingTransformMixin.transform. Note that this transform method differs from some of the transform methods in the nominal module, even though they also use BaseMappingTransformMixin.transform: here, a value that does not exist in the mapping is left unchanged.
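The leave-unmapped-values-unchanged behaviour described above amounts to a dictionary lookup that falls back to the original value, which is also why self-mappings can be omitted. The plain-Python sketch below illustrates the idea only; apply_mapping is a hypothetical name, and the sketch does not model return_dtypes or mappings_from_null.

```python
def apply_mapping(values, mapping):
    """Hypothetical sketch: map each value through `mapping`, falling back to
    the original value when it has no entry (so None also passes through)."""
    return [mapping.get(value, value) for value in values]


mappings = {"a": {"Y": 1, "N": 0}}
print(apply_mapping(["Y", "N", "maybe"], mappings["a"]))  # [1, 0, 'maybe']
```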

Parameters:

X (DataFrame) – Data with nominal columns to transform.

Returns:

X – Transformed input X with levels mapped according to mappings dict.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = MappingTransformer(
...     mappings={"a": {"Y": 1, "N": 0}},
...     return_dtypes={"a": "Int8"},
... )
>>> test_df = pl.DataFrame({"a": ["Y", "N"], "b": [3, 4]})
>>> transformer.transform(test_df)
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i8  ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 0   ┆ 4   │
└─────┴─────┘

```

class tubular.MeanImputer(columns: str | list[str], weights_column: str | None = None, **kwargs: bool)[source]

Bases: WeightColumnMixin, BaseImputer

Transformer to impute missing values with the mean of the supplied columns.
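The fit/transform split can be sketched in plain Python: fit computes the mean of the non-null values and transform fills nulls with it. Here the mean is assumed to become a weighted mean when weights are supplied, mirroring weights_column; that is an assumption, and mean_impute is a hypothetical helper rather than tubular's implementation. It reproduces the 1.5 imputed for column a in the fit example for this class.

```python
def mean_impute(values, weights=None):
    """Hypothetical sketch: fill None with the (optionally weighted) mean of the
    non-null values. Returns (filled_values, mean)."""
    if weights is None:
        weights = [1.0] * len(values)
    # Only non-null values (and their weights) contribute to the mean.
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if not pairs or total_weight == 0:
        raise ValueError("cannot compute a mean: no non-null values with positive weight")
    mean = sum(v * w for v, w in pairs) / total_weight
    return [mean if v is None else v for v in values], mean


print(mean_impute([1, None, 2]))  # ([1, 1.5, 2], 1.5)
print(mean_impute([1, None, 3], weights=[3, 5, 1]))  # ([1, 1.5, 3], 1.5)
```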

impute_values_

Created during fit method. Dictionary of float / int (mean) values of columns in the columns attribute. Keys of impute_values_ give the column names.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> import polars as pl
>>> mean_imputer = MeanImputer(
...     columns=["a", "b"],
... )
>>> mean_imputer
MeanImputer(columns=['a', 'b'])

>>> # once fit, transformer can also be dumped to json and reinitialised
>>> test_df = pl.DataFrame({"a": [0, None], "b": [None, 1]})
>>> _ = mean_imputer.fit(test_df)
>>> json_dump = mean_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MeanImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 0.0, 'b': 1.0}}}
>>> MeanImputer.from_json(json_dump)
MeanImputer(columns=['a', 'b'])

```

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) MeanImputer[source]

Calculate mean values to impute with from X.

Parameters:
  • X (DataFrame) – Data to “learn” the mean values from.

  • y (Series or LazyFrame or None, default = None) – Not required.

Returns:

fitted class instance.

Return type:

MeanImputer

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = MeanImputer(columns=["a", "b"])
>>> imputer = imputer.fit(test_df)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 3.0 │
│ 1.5 ┆ 3.5 │
│ 2.0 ┆ 4.0 │
└─────┴─────┘

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
class tubular.MeanResponseTransformer(columns: str | list[str] | None = None, weights_column: str | None = None, prior: ]] = 0, level: float | int | str | list | None = None, unseen_level_handling: float | int | Literal['mean', 'median', 'min', 'max'] | None = None, return_type: Literal['Float32', 'Float64'] = 'Float32', drop_original: bool = True, **kwargs: bool)[source]

Bases: BaseTransformer, WeightColumnMixin, DropOriginalMixin

Convert categorical variables to numeric by mapping each level to the mean response for that level.

For a continuous or binary response the categorical columns specified will have values replaced with the mean response for each category.

For an n > 1 level categorical response, up to n binary responses can be created, which in turn can then be used to encode each categorical column specified. This will generate up to n * len(columns) new columns, with names of the form {column}_{response_level}. The original columns will be removed from the dataframe. This functionality is controlled using the ‘level’ parameter. Note that this only works for an n > 1 level categorical response. Do not use the ‘level’ parameter for a numerical response (n = 1); in that case, use the standard mean response transformer without the ‘level’ parameter.

If a categorical variable contains null values these will not be transformed.

The same weights and prior are applied to each response level in the multi-level case.
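The multi-level behaviour described above can be pictured as one binary (one-vs-rest) target per response level, each used to mean-encode the column. The plain-Python sketch below uses no prior and no weights; the function name is hypothetical and this is an illustration, not tubular's code. The output keys follow the {column}_{response_level} pattern, and with a = ["x", "y", "x"] and response ["cat", "dog", "rat"] they match the names in the get_feature_names_out example for this class.

```python
from collections import defaultdict


def one_vs_rest_mean_response(column_name, levels, response, response_levels):
    """Hypothetical sketch (no prior, no weights): for each response level,
    build a binary target and map each category to its mean."""
    encodings = {}
    for response_level in response_levels:
        binary = [1.0 if r == response_level else 0.0 for r in response]
        totals = defaultdict(float)
        counts = defaultdict(int)
        for level, target in zip(levels, binary):
            totals[level] += target
            counts[level] += 1
        encodings[f"{column_name}_{response_level}"] = {
            level: totals[level] / counts[level] for level in totals
        }
    return encodings


enc = one_vs_rest_mean_response("a", ["x", "y", "x"], ["cat", "dog", "rat"], ["cat", "dog", "rat"])
print(sorted(enc))   # ['a_cat', 'a_dog', 'a_rat']
print(enc["a_cat"])  # {'x': 0.5, 'y': 0.0}
```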

columns

Categorical columns to encode in the input data.

Type:

str or list

weights_column

Weights column to use when calculating the mean response.

Type:

str or None

prior

Regularisation parameter, can be thought of roughly as the size a category should be in order for its statistics to be considered reliable (hence default value of 0 means no regularisation).

Type:

int, default = 0

level

Parameter to control encoding against a multi-level categorical response. If None the response will be treated as binary or continuous, if ‘all’ all response levels will be encoded against and if it is a list of levels then only the levels specified will be encoded against.

Type:

str, int, float, list or None, default = None

response_levels

Only created in the multi-level case. Generated from level; a list of all the response levels to encode against.

Type:

list

mappings

Created in fit. A nested Dict of {column names : column specific mapping dictionary} pairs. Column specific mapping dictionaries contain {initial value : mapped value} pairs.

Type:

dict

mapped_columns

Only created in the multi-level case. A list of the new columns produced by encoding the columns in self.columns against multiple response levels, of the form {column}_{level}.

Type:

list

transformer_dict

Only created in the multi-level case. A dictionary of the form level : transformer containing the mean response transformers for each level to be encoded against.

Type:

dict

unseen_levels_encoding_dict

Dict containing the values (based on chosen unseen_level_handling) derived from the encoded columns to use when handling unseen levels in data passed to transform method.

Type:

dict

return_type

Type to cast the returned column(s) to. Defaults to ‘Float32’.

Type:

Literal[‘Float32’, ‘Float64’]

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> import polars as pl
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )
>>> transformer
MeanResponseTransformer(columns=['a'], prior=1, unseen_level_handling='mean')
>>> # once fit, transformer can also be dumped to json and reinitialised
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [0, 1]})
>>> _ = transformer.fit(test_df[["a"]], test_df["b"])
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MeanResponseTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None, 'prior': 1, 'level': None, 'unseen_level_handling': 'mean', 'return_type': 'Float32', 'drop_original': True}, 'fit': {'is_fitted_': True, 'mappings': {'a': {'x': 0.25, 'y': 0.75}}, 'return_dtypes': {'a': 'Float32'}, 'column_to_encoded_columns': {'a': ['a']}, 'encoded_columns': ['a'], 'unseen_levels_encoding_dict': {'a': 0.5}}}
>>> MeanResponseTransformer.from_json(json_dump)
MeanResponseTransformer(columns=['a'], prior=1, unseen_level_handling='mean')

```

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame) MeanResponseTransformer[source]

Identify mapping of categorical levels to mean response values.

If the user specified the weights_column arg when initialising the transformer, the weighted mean response will be calculated using that column.

In the multi-level case this method learns which response levels are present and are to be encoded against.

Parameters:
  • X (DataFrame) – Data with categorical variable columns to transform.

  • y (Series or LazyFrame) – Response variable or target.

Returns:

fitted class instance

Return type:

MeanResponseTransformer

Raises:

ValueError – if y contains null values.

Examples

```pycon
>>> import polars as pl
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2], "target": [0, 1]})
>>> transformer.fit(test_df, test_df["target"])
MeanResponseTransformer(columns=['a'], prior=1, unseen_level_handling='mean')

```

get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> import polars as pl
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )
>>> transformer.get_feature_names_out()
['a']
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     level=["x", "y"],
...     unseen_level_handling="mean",
... )
>>> transformer.get_feature_names_out()
['a_x', 'a_y']
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     level="all",
...     unseen_level_handling="mean",
... )
>>> transformer.get_feature_names_out()
Traceback (most recent call last):
...
sklearn.exceptions.NotFittedError: ...
>>> test_df = pl.DataFrame({"a": ["x", "y", "x"], "b": ["cat", "dog", "rat"]})
>>> _ = transformer.fit(test_df, test_df["b"])
>>> transformer.get_feature_names_out()
['a_cat', 'a_dog', 'a_rat']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> import polars as pl
>>> transformer = MeanResponseTransformer(columns=["a"])
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [0, 1]})
>>> _ = transformer.fit(test_df[["a"]], test_df["b"])
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'MeanResponseTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None, 'prior': 0, 'level': None, 'unseen_level_handling': None, 'return_type': 'Float32', 'drop_original': True}, 'fit': {'is_fitted_': True, 'mappings': {'a': {'x': 0.0, 'y': 1.0}}, 'return_dtypes': {'a': 'Float32'}, 'column_to_encoded_columns': {'a': ['a']}, 'encoded_columns': ['a']}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Apply mean response encoding stored in the mappings attribute to columns.

Parameters:

X (DataFrame) – Data with nominal columns to transform.

Returns:

X – Transformed input X with levels mapped according to mappings dict.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> # example with no prior
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=0,
...     unseen_level_handling="mean",
... )
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2], "target": [0, 1]})
>>> _ = transformer.fit(test_df, test_df["target"])
>>> transformer.transform(test_df)
shape: (2, 3)
┌─────┬─────┬────────┐
│ a   ┆ b   ┆ target │
│ --- ┆ --- ┆ ---    │
│ f32 ┆ i64 ┆ i64    │
╞═════╪═════╪════════╡
│ 0.0 ┆ 1   ┆ 0      │
│ 1.0 ┆ 2   ┆ 1      │
└─────┴─────┴────────┘

>>> # example with prior
>>> transformer = MeanResponseTransformer(
...     columns="a",
...     prior=1,
...     unseen_level_handling="mean",
... )
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2], "target": [0, 1]})
>>> _ = transformer.fit(test_df, test_df["target"])
>>> transformer.transform(test_df)
shape: (2, 3)
┌──────┬─────┬────────┐
│ a    ┆ b   ┆ target │
│ ---  ┆ --- ┆ ---    │
│ f32  ┆ i64 ┆ i64    │
╞══════╪═════╪════════╡
│ 0.25 ┆ 1   ┆ 0      │
│ 0.75 ┆ 2   ┆ 1      │
└──────┴─────┴────────┘

```
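The values in the prior example above (0.25 and 0.75 with prior=1, against 0.0 and 1.0 with no prior) are consistent with shrinking each level's mean towards the overall mean. In this reading, prior acts as a number of pseudo-observations at the overall mean: (sum of responses for level + prior × overall mean) / (count for level + prior). The sketch below implements that formula for the unweighted case. The function name is hypothetical, and the formula is inferred from the outputs shown rather than taken from the tubular source.

```python
from collections import defaultdict


def mean_response_with_prior(levels, response, prior=0.0):
    """Hypothetical sketch (unweighted): shrink each level's mean response
    towards the overall mean, weighting it as `prior` pseudo-observations."""
    if not response or len(levels) != len(response):
        raise ValueError("levels and response must be non-empty and the same length")
    overall_mean = sum(response) / len(response)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(levels, response):
        totals[level] += y
        counts[level] += 1
    return {
        level: (totals[level] + prior * overall_mean) / (counts[level] + prior)
        for level in totals
    }


print(mean_response_with_prior(["x", "y"], [0, 1], prior=0))  # {'x': 0.0, 'y': 1.0}
print(mean_response_with_prior(["x", "y"], [0, 1], prior=1))  # {'x': 0.25, 'y': 0.75}
```

With prior=0 the formula reduces to the plain per-level mean, which matches the statement that the default of 0 means no regularisation.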

class tubular.MedianImputer(columns: str | list[str], weights_column: str | None = None, **kwargs: bool)[source]

Bases: BaseImputer, WeightColumnMixin

Transformer to impute missing values with the median of the supplied columns.
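A minimal, unweighted sketch of median imputation using the standard library's statistics.median. median_impute is a hypothetical helper and weights_column is not modelled. For an even number of non-null values, statistics.median returns the midpoint of the two middle values. That matches the 1.5 and 3.5 shown in the fit example for this class.

```python
import statistics


def median_impute(values):
    """Hypothetical sketch (unweighted only): fill None with the median of the
    non-null values. Returns (filled_values, median)."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)  # raises StatisticsError if observed is empty
    return [median if v is None else v for v in values], median


print(median_impute([1, None, 2]))      # ([1, 1.5, 2], 1.5)
print(median_impute([3, None, 4, 10]))  # ([3, 4, 4, 10], 4)
```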

impute_values_

Created during fit method. Dictionary of float / int (median) values of columns in the columns attribute. Keys of impute_values_ give the column names.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> import polars as pl
>>> median_imputer = MedianImputer(
...     columns=["a", "b"],
... )
>>> median_imputer
MedianImputer(columns=['a', 'b'])

>>> # once fit, transformer can also be dumped to json and reinitialised
>>> test_df = pl.DataFrame({"a": [0, None], "b": [None, 1]})
>>> _ = median_imputer.fit(test_df)
>>> json_dump = median_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'MedianImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 0.0, 'b': 1.0}}}
>>> MedianImputer.from_json(json_dump)
MedianImputer(columns=['a', 'b'])

```

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) MedianImputer[source]

Calculate median values to impute with from X.

Parameters:
  • X (DataFrame) – Data to “learn” the median values from.

  • y (Series or LazyFrame or None, default = None) – Not required.

Returns:

fitted class instance.

Return type:

MedianImputer

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = MedianImputer(columns=["a", "b"])
>>> imputer = imputer.fit(test_df)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════╡
│ 1.0 ┆ 3.0 │
│ 1.5 ┆ 3.5 │
│ 2.0 ┆ 4.0 │
└─────┴─────┘

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
class tubular.ModeImputer(columns: str | list[str], weights_column: str | None = None, **kwargs: bool)[source]

Bases: BaseImputer, WeightColumnMixin

Transformer to impute missing values with the mode of the supplied columns.

If mode is NaN, a warning will be raised.

impute_values_

Created during fit method. Dictionary of float / int (mode) values of columns in the columns attribute. Keys of impute_values_ give the column names.

Type:

dict

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> import polars as pl
>>> mode_imputer = ModeImputer(
...     columns=["a", "b"],
... )
>>> mode_imputer
ModeImputer(columns=['a', 'b'])

>>> # once fit, transformer can also be dumped to json and reinitialised
>>> test_df = pl.DataFrame({"a": [0, None], "b": [None, 1]})
>>> _ = mode_imputer.fit(test_df)
>>> json_dump = mode_imputer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'ModeImputer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'weights_column': None}, 'fit': {'is_fitted_': True, 'impute_values_': {'a': 0, 'b': 1}}}
>>> ModeImputer.from_json(json_dump)
ModeImputer(columns=['a', 'b'])

```

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) ModeImputer[source]

Calculate mode values to impute with from X.

In the event of a tie, the highest modal value will be returned.
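The tie-breaking rule can be sketched with collections.Counter: count the non-null values, keep those sharing the top count, and return the largest. This is an illustrative sketch, not tubular's code; it ignores weights_column, and where tubular raises a warning for a NaN mode, this hypothetical helper simply returns None for an all-null column. On the fit example below it returns 2 for column a and 4 for column b, matching the imputed values.

```python
from collections import Counter


def mode_highest_on_tie(values: list) -> object:
    """Most frequent non-null value; on a tie, the largest of the tied values."""
    counts = Counter(v for v in values if v is not None)
    if not counts:
        # All values are null: no mode exists, so this sketch returns None.
        return None
    top_count = max(counts.values())
    # Among the values sharing the top count, keep the highest.
    return max(v for v, c in counts.items() if c == top_count)
```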

Parameters:
  • X (DataFrame) – Data to “learn” the mode values from.

  • y (Series or LazyFrame or None, default = None) – Not required.

Returns:

fitted class instance

Return type:

ModeImputer

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> imputer = ModeImputer(columns=["a", "b"])
>>> imputer = imputer.fit(test_df)
>>> imputer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
│ 2   ┆ 4   │
│ 2   ┆ 4   │
└─────┴─────┘

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
class tubular.NullIndicator(columns: list[str] | str, **kwargs: bool | None)[source]

Bases: BaseTransformer

Class to create a binary indicator column for null values.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> null_indicator = NullIndicator(
...     columns=["a", "b"],
... )
>>> null_indicator
NullIndicator(columns=['a', 'b'])

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = null_indicator.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'NullIndicator', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True}, 'fit': {'is_fitted_': True}}
>>> NullIndicator.from_json(json_dump)
NullIndicator(columns=['a', 'b'])

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Create new columns indicating the position of null values for each variable in self.columns.
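The indicator logic amounts to one boolean column per input column, named <column>_nulls as in the example below. A minimal sketch on a dict of Python lists, treating only None as null (pandas NaN handling is not modelled); add_null_indicators is a hypothetical name.

```python
def add_null_indicators(
    frame: dict[str, list], columns: list[str]
) -> dict[str, list]:
    """Return a copy of `frame` with a boolean `<col>_nulls` column per requested column."""
    out = dict(frame)  # shallow copy so the input mapping is not mutated
    for col in columns:
        out[f"{col}_nulls"] = [v is None for v in frame[col]]
    return out
```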

Parameters:

X (DataFrame) – Data to add indicators to.

Returns:

dataframe with null indicator columns added

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [1, None, 2], "b": [3, None, 4]})
>>> null_indicator = NullIndicator(columns=["a", "b"])
>>> null_indicator.transform(test_df)
shape: (3, 4)
┌──────┬──────┬─────────┬─────────┐
│ a    ┆ b    ┆ a_nulls ┆ b_nulls │
│ ---  ┆ ---  ┆ ---     ┆ ---     │
│ i64  ┆ i64  ┆ bool    ┆ bool    │
╞══════╪══════╪═════════╪═════════╡
│ 1    ┆ 3    ┆ false   ┆ false   │
│ null ┆ null ┆ true    ┆ true    │
│ 2    ┆ 4    ┆ false   ┆ false   │
└──────┴──────┴─────────┴─────────┘

```

class tubular.OneDKmeansTransformer(columns: str | ~typing.Annotated[list[str], beartype.vale.Is[lambda list_arg: ...]], new_column_name: str, n_init: str | int = 'auto', n_clusters: int = 8, drop_original: bool = False, kmeans_kwargs: dict[str, object] | None = None, **kwargs: bool)[source]

Bases: BaseNumericTransformer, DropOriginalMixin

Generates a new column based on the k-means algorithm.

The transformer runs k-means with the given number of clusters, identifies the bin cut points from the resulting clusters, and finally passes these cut points to a cut function.
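The fitting steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not tubular's implementation: the n_init and kmeans_kwargs parameters suggest scikit-learn's KMeans does the clustering, whereas this sketch runs a small deterministic Lloyd's iteration, so it may settle on different clusters. Taking each cluster's maximum as a bin's upper edge and treating bins as right-closed are also assumptions; given the documented fitted bins [3, 4], assign_bins reproduces the 0, 0, 0, 1 labels of the transform example. The helper names kmeans_1d, fit_bin_edges and assign_bins are hypothetical.

```python
from bisect import bisect_left


def kmeans_1d(values: list[float], n_clusters: int, max_iter: int = 100) -> list[list[float]]:
    """Lloyd's algorithm on 1-D data with a deterministic initialisation."""
    data = sorted(values)
    # Spread the initial centres evenly over the sorted data (illustrative choice).
    step = (len(data) - 1) / max(n_clusters - 1, 1)
    centres = [float(data[round(i * step)]) for i in range(n_clusters)]
    clusters: list[list[float]] = []
    for _ in range(max_iter):
        clusters = [[] for _ in centres]
        for v in data:
            distances = [abs(v - c) for c in centres]
            clusters[distances.index(min(distances))].append(v)  # ties -> lower index
        new_centres = [
            sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)
        ]
        if new_centres == centres:  # converged: assignments no longer change
            break
        centres = new_centres
    return [c for c in clusters if c]


def fit_bin_edges(values: list[float], n_clusters: int) -> list[float]:
    """Upper edge of each bin = largest value assigned to that cluster (assumption)."""
    return sorted(max(c) for c in kmeans_1d(values, n_clusters))


def assign_bins(values: list[float], edges: list[float]) -> list[int]:
    """Right-closed bins: v <= edges[0] -> 0, edges[0] < v <= edges[1] -> 1, ..."""
    # Values above the last edge are clamped into the top bin (an assumption).
    return [min(bisect_left(edges, v), len(edges) - 1) for v in values]
```

For example, fit_bin_edges([1, 2, 10, 11], 2) gives [2, 11], and assign_bins then labels those values 0, 0, 1, 1.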

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )
OneDKmeansTransformer(columns=['a'], kmeans_kwargs={'random_state': 42},
                      n_clusters=2, new_column_name='new')

```

FITS = True
fit(X: FrameT, y: IntoSeriesT | None = None) OneDKmeansTransformer[source]

Fit transformer to input data.

Parameters:
  • X (pd/pl.DataFrame) – Dataframe with columns to learn the k-means bin cut points from.

  • y (None) – Required for pipeline.

Returns:

Fitted class instance.

Return type:

OneDKmeansTransformer

Raises:

ValueError – if columns in X contain missing values.

Examples

```pycon
>>> import polars as pl
>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )
>>> test_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
>>> transformer.fit(test_df)
OneDKmeansTransformer(columns=['a'], kmeans_kwargs={'random_state': 42},
                      n_clusters=2, new_column_name='new')

```

get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="kmeans_column",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )

>>> transformer.get_feature_names_out()
['kmeans_column']

```

jsonable = True
lazyframe_compatible = False
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:

JSON representation of the transformer, including init parameters and fitted attributes (such as bins).

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> import polars as pl
>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )
>>> test_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
>>> transformer.fit(test_df)
OneDKmeansTransformer(columns=['a'], kmeans_kwargs={'random_state': 42},
                      n_clusters=2, new_column_name='new')
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'OneDKmeansTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'new_column_name': 'new', 'n_init': 'auto', 'n_clusters': 2, 'drop_original': False, 'kmeans_kwargs': {'random_state': 42}}, 'fit': {'is_fitted_': True, 'bins': [3, 4]}}
```

transform(X: FrameT) FrameT[source]

Assign each row of the input pd/pl.DataFrame X to a bin derived from the fitted k-means results, and add the bin labels to X as a new column.

Parameters:

X (pl/pd.DataFrame) – Data to transform.

Returns:

X – Input X with additional cluster column added.

Return type:

pl/pd.DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = OneDKmeansTransformer(
...     columns="a",
...     n_clusters=2,
...     new_column_name="new",
...     drop_original=False,
...     kmeans_kwargs={"random_state": 42},
... )
>>> test_df = pl.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
>>> _ = transformer.fit(test_df)
>>> transformer.transform(test_df)
shape: (4, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ new │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 5   ┆ 0   │
│ 2   ┆ 6   ┆ 0   │
│ 3   ┆ 7   ┆ 0   │
│ 4   ┆ 8   ┆ 1   │
└─────┴─────┴─────┘

```

class tubular.OneHotEncodingTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]] | None = None, wanted_values: dict[str, ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]]] | None = None, separator: str = '_', drop_original: bool = False, **kwargs: bool)[source]

Bases: DropOriginalMixin, BaseTransformer

Transformer to convert categorical variables into dummy columns.

separator

Separator used in naming for dummy columns.

Type:

str

drop_original

Should original columns be dropped after creating dummy fields?

Type:

bool

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> import polars as pl
>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )
>>> transformer
OneHotEncodingTransformer(columns=['a'])
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": ["w", "z"]})
>>> _ = transformer.fit(test_df)
>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'OneHotEncodingTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'wanted_values': None, 'separator': '_', 'drop_original': False}, 'fit': {'is_fitted_': True, 'categories_': {'a': ['x', 'y']}, 'new_feature_names_': {'a': ['a_x', 'a_y']}}}
>>> OneHotEncodingTransformer.from_json(json_dump)
OneHotEncodingTransformer(columns=['a'])

```

FITS = True
MAX_LEVELS = 100
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) OneHotEncodingTransformer[source]

Get list of levels for each column to be transformed.

This defines which dummy columns will be created in transform.
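The fit/transform split described above can be sketched on plain Python lists. This is a hypothetical illustration, not tubular's code: fit collects each column's distinct levels and enforces the MAX_LEVELS limit, and transform builds one boolean column per level, named with the separator as in the examples. Sorting the levels and skipping nulls are assumptions; wanted_values and drop_original are not modelled.

```python
MAX_LEVELS = 100  # mirrors the MAX_LEVELS class attribute documented above


def learn_levels(values: list, max_levels: int = MAX_LEVELS) -> list:
    """Collect the distinct non-null levels of a column (sorted, an assumption)."""
    levels = sorted({v for v in values if v is not None})
    if len(levels) > max_levels:
        raise ValueError(f"column has {len(levels)} levels, more than {max_levels}")
    return levels


def one_hot(
    column: str, values: list, levels: list, separator: str = "_"
) -> dict[str, list[bool]]:
    """Build one boolean dummy column per learned level, named <column><separator><level>."""
    return {
        f"{column}{separator}{level}": [v == level for v in values] for level in levels
    }
```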

Parameters:
  • X (DataFrame) – Data to identify levels from.

  • y (None) – Ignored. This parameter exists only for compatibility with sklearn.pipeline.Pipeline.

Returns:

fitted class instance

Return type:

OneHotEncodingTransformer

Raises:

ValueError – if a column has more than MAX_LEVELS (100) levels.

Examples

```pycon
>>> import polars as pl
>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2]})
>>> transformer.fit(test_df)
OneHotEncodingTransformer(columns=['a'])

```

get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> import polars as pl
>>> transformer = OneHotEncodingTransformer(
...     columns="a",
...     wanted_values={"a": ["cat", "dog"]},
... )
>>> transformer.get_feature_names_out()
['a_cat', 'a_dog']
>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )
>>> transformer.get_feature_names_out()
Traceback (most recent call last):
...
sklearn.exceptions.NotFittedError: ...
>>> test_df = pl.DataFrame({"a": ["cat", "dog", "rat"]})
>>> _ = transformer.fit(test_df)
>>> transformer.get_feature_names_out()
['a_cat', 'a_dog', 'a_rat']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> import polars as pl
>>> transformer = OneHotEncodingTransformer(columns=["a"])
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": ["w", "z"]})
>>> _ = transformer.fit(test_df)
>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'OneHotEncodingTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'wanted_values': None, 'separator': '_', 'drop_original': False}, 'fit': {'is_fitted_': True, 'categories_': {'a': ['x', 'y']}, 'new_feature_names_': {'a': ['a_x', 'a_y']}}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, return_native_override: bool | None = None) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Create new dummy columns from categorical fields.

Parameters:
  • X (DataFrame) – Data to apply one hot encoding to.

  • return_native_override (Optional[bool]) – Controls whether the transformer returns a narwhals or native type, overriding the return_native attribute set on the transformer. Useful when calling parent methods.

Returns:

X_transformed – Transformed input X with dummy columns derived from categorical columns added. If drop_original = True then the original categorical columns that the dummies are created from will not be in the output X.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = OneHotEncodingTransformer(
...     columns="a",
... )
>>> test_df = pl.DataFrame({"a": ["x", "y"], "b": [1, 2]})
>>> _ = transformer.fit(test_df)
>>> transformer.transform(test_df)
shape: (2, 4)
┌─────┬─────┬───────┬───────┐
│ a   ┆ b   ┆ a_x   ┆ a_y   │
│ --- ┆ --- ┆ ---   ┆ ---   │
│ str ┆ i64 ┆ bool  ┆ bool  │
╞═════╪═════╪═══════╪═══════╡
│ x   ┆ 1   ┆ true  ┆ false │
│ y   ┆ 2   ┆ false ┆ true  │
└─────┴─────┴───────┴───────┘

```

class tubular.OutOfRangeNullTransformer(capping_values: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, quantiles: dict[str, ~typing.Annotated[list[int | float | None], beartype.vale.Is[lambda list_arg: ...]]] | None = None, weights_column: str | None = None, **kwargs: bool)[source]

Bases: BaseCappingTransformer

Transformer to set values outside of a range to null.

This transformer sets the cut-off values in the same way as the CappingTransformer: either the user specifies them directly in the capping_values argument, or they are calculated in the fit method when the user supplies the quantiles argument.
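The nulling rule can be sketched on a plain list for a single column. In this hypothetical helper, values strictly below the lower bound or strictly above the upper bound become None, and a None bound is skipped; keeping values equal to a bound is inferred from the example below, where b = 1 survives with bounds [1, 3]. Learning bounds from quantiles and weights_column are not modelled.

```python
from typing import Optional


def null_out_of_range(
    values: list[Optional[float]], bounds: list[Optional[float]]
) -> list[Optional[float]]:
    """Replace values outside [lower, upper] with None; a None bound is not applied."""
    lower, upper = bounds
    out: list[Optional[float]] = []
    for v in values:
        too_low = v is not None and lower is not None and v < lower
        too_high = v is not None and upper is not None and v > upper
        out.append(None if (too_low or too_high) else v)
    return out
```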

Attributes:

capping_values: dict[str, CappingValues] or None

Capping values to apply to each column, capping_values argument.

quantiles: dict[str, CappingValues] or None

Quantiles to set capping values at from input data. Will be empty after init, values populated when fit is run.

quantile_capping_values: dict[str, CappingValues] or None

Capping values learned from quantiles (if provided) to apply to each column.

weights_column: str or None

weights_column argument.

_replacement_values: dict[str, CappingValues]

Replacement values when capping is applied. This will contain nulls for each column.

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatible: bool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> import polars as pl
>>> transformer = OutOfRangeNullTransformer(
...     capping_values={"a": [10, 20], "b": [1, 3]},
... )
>>> transformer
OutOfRangeNullTransformer(capping_values={'a': [10, 20], 'b': [1, 3]})

>>> # transform method is inherited, so it is also demonstrated here
>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})
>>> transformer.transform(test_df)
shape: (4, 3)
┌──────┬──────┬─────┐
│ a    ┆ b    ┆ c   │
│ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64 │
╞══════╪══════╪═════╡
│ null ┆ null ┆ 1   │
│ 15   ┆ 2    ┆ 2   │
│ 18   ┆ null ┆ 3   │
│ null ┆ 1    ┆ 4   │
└──────┴──────┴─────┘
>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'OutOfRangeNullTransformer', 'init': {'copy': False, 'verbose': False, 'return_native': True, 'capping_values': {'a': [10, 20], 'b': [1, 3]}, 'quantiles': None, 'weights_column': None}, 'fit': {'is_fitted_': False}}
>>> OutOfRangeNullTransformer.from_json(json_dump)
OutOfRangeNullTransformer(capping_values={'a': [10, 20], 'b': [1, 3]})

```

FITS = True
fit(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame, y: Series | Series | Series | LazyFrame | LazyFrame | None = None) OutOfRangeNullTransformer[source]

Learn capping values from input data X.

Calculates the quantiles to cap at given the quantiles dictionary supplied when initialising the transformer. Saves learnt values in the capping_values attribute.

Parameters:
  • X (DataFrame) – A dataframe with required columns to be capped.

  • y (None) – Required for pipeline.

Returns:

fitted instance of class

Return type:

OutOfRangeNullTransformer

Example

```pycon
>>> import polars as pl
>>> transformer = OutOfRangeNullTransformer(
...     quantiles={"a": [0.01, 0.99], "b": [0.05, 0.95]},
... )
>>> test_df = pl.DataFrame({"a": [1, 15, 18, 25], "b": [6, 2, 7, 1], "c": [1, 2, 3, 4]})
>>> transformer.fit(test_df)
OutOfRangeNullTransformer(quantiles={'a': [0.01, 0.99], 'b': [0.05, 0.95]})

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
static set_replacement_values(capping_values: dict[str, list[int | float | None]]) dict[str, list[bool | None]][source]

Set the _replacement_values to have all null values.

Keeps the keys of capping_values and, in each list, replaces every non-None value with None (null); any None values are replaced with False, as shown in the example below.

Returns:

replacement values for OutOfRangeNullTransformer

Return type:

dict[str, list[bool | None]]

Examples

```pycon
>>> import polars as pl
>>> capping_values = {"a": [0.1, 0.2], "b": [None, 10]}
>>> OutOfRangeNullTransformer.set_replacement_values(capping_values)
{'a': [None, None], 'b': [False, None]}

```
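The mapping shown in this example can be written as a single comprehension. This sketch only reproduces the documented input and output; replacement_values is a hypothetical name, and what the False placeholder means inside the capping logic is not stated here.

```python
from typing import Optional


def replacement_values(
    capping_values: dict[str, list[Optional[float]]],
) -> dict[str, list[Optional[bool]]]:
    """Set bounds become None (null); missing (None) bounds become False."""
    return {
        column: [False if bound is None else None for bound in bounds]
        for column, bounds in capping_values.items()
    }
```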

class tubular.RatioTransformer(columns: ListOfTwoStrs, return_dtype: str = 'Float32', **kwargs: bool | None)[source]

Bases: BaseNumericTransformer

Transformer that performs division operation between two columns.

This transformer divides the first of two columns in a DataFrame by the second and stores the result in a new column.
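A minimal sketch of the division on plain lists, using the a_divided_by_b naming seen in the transform example. The hypothetical helper ratio_column returns None when either input is None or the denominator is 0; that handling is an assumption, not documented behaviour, and the cast to return_dtype (Float32 or Float64) is not modelled.

```python
from typing import Optional


def ratio_column(
    numerator: str,
    denominator: str,
    num_values: list[Optional[float]],
    den_values: list[Optional[float]],
) -> tuple[str, list[Optional[float]]]:
    """Return the new column name and the element-wise numerator / denominator."""
    name = f"{numerator}_divided_by_{denominator}"
    values = [
        # Assumed handling: null inputs or a zero denominator give None.
        None if n is None or d is None or d == 0 else n / d
        for n, d in zip(num_values, den_values)
    ]
    return name, values
```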

columns

List of exactly two column names to operate on. The first column is the numerator, and the second column is the denominator.

Type:

ListOfTwoStrs

return_dtype

The dtype of the resulting column, either ‘Float32’ or ‘Float64’.

Type:

str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> transformer = RatioTransformer(columns=["a", "b"], return_dtype="Float32")
>>> transformer.columns
['a', 'b']
>>> transformer.return_dtype
'Float32'

```

FITS = False
get_feature_names_out() list[str][source]

Get the names of the output features.

Returns:

List containing the name of the new column created by the transformation.

Return type:

list[str]

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Serialize the transformer to a JSON-compatible dictionary.

Returns:

JSON representation of the transformer, including init parameters.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> ratio_transformer = RatioTransformer(columns=["a", "b"], return_dtype="Float32")
>>> ratio_transformer.to_json()
{'tubular_version': ..., 'classname': 'RatioTransformer', 'init': {'columns': ['a', 'b'], 'copy': False, 'verbose': False, 'return_native': True, 'return_dtype': 'Float32'}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Transform the DataFrame by applying the division operation between two columns.

Parameters:

X (DataFrame) – DataFrame containing the columns to operate on.

Returns:

Transformed DataFrame with the new column containing the division results.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> transformer = RatioTransformer(columns=["a", "b"], return_dtype="Float32")
>>> test_df = pl.DataFrame({"a": [100, 200, 300], "b": [80, 150, 200]})
>>> transformer.transform(test_df)
shape: (3, 3)
┌─────┬─────┬────────────────┐
│ a   ┆ b   ┆ a_divided_by_b │
│ --- ┆ --- ┆ ---            │
│ i64 ┆ i64 ┆ f32            │
╞═════╪═════╪════════════════╡
│ 100 ┆ 80  ┆ 1.25           │
│ 200 ┆ 150 ┆ 1.333333       │
│ 300 ┆ 200 ┆ 1.5            │
└─────┴─────┴────────────────┘

```

class tubular.RemoveCharactersTransformer(columns: str | ~typing.Annotated[list, beartype.vale.Is[lambda list_arg: ...]], characters: list[str], **kwargs: bool | None)[source]

Bases: BaseTransformer

Transformer class to remove characters from text columns.

characters

list of characters to remove from text columns.

Type:

list[str]

characters_formatted

characters attr formatted into regex string.

Type:

str

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

return_native

Controls whether transformer returns narwhals or native pandas/polars type

Type:

bool, default = True

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RemoveCharactersTransformer(columns=["a"], characters=[r"\d"])
>>> transformer
RemoveCharactersTransformer(characters=['\\d'], columns=['a'])

>>> json_dump = transformer.to_json()
>>> pprint(json_dump)
{'classname': 'RemoveCharactersTransformer',
 'fit': {'is_fitted_': False},
 'init': {'characters': ['\\d'],
          'columns': ['a'],
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
>>> RemoveCharactersTransformer.from_json(json_dump)
RemoveCharactersTransformer(characters=['\\d'], columns=['a'])

```

FITS = False
get_transform_exprs() list[Expr][source]

Get transform expressions.

Returns:

transform expressions for class

Return type:

list[nw.Expr]

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing the attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RemoveCharactersTransformer(columns=["a", "b"], characters=["a"])

>>> pprint(transformer.to_json())
{'classname': 'RemoveCharactersTransformer',
 'fit': {'is_fitted_': False},
 'init': {'characters': ['a'],
          'columns': ['a', 'b'],
          'copy': False,
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Strip unwanted characters from specified columns.

Parameters:

X (DataFrame) – Data containing columns to strip.

Returns:

X – Transformed input X with characters stripped from specified columns.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl
>>> test_df = pl.DataFrame({"a": [" 8hi!", None, "9999hello "]})
>>> transformer = RemoveCharactersTransformer(columns=["a"], characters=[r"\W", r"\s"])
>>> transformer.transform(test_df)
shape: (3, 1)
┌───────────┐
│ a         │
│ ---       │
│ str       │
╞═══════════╡
│ 8hi       │
│ null      │
│ 9999hello │
└───────────┘

```
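The characters entries are regex fragments (the example above uses r"\W" for non-word characters and r"\s" for whitespace), and characters_formatted combines them into one regex string. A minimal sketch using the standard re module, assuming the fragments are joined into an alternation with "|" (the exact string tubular builds is not shown here); remove_characters is a hypothetical helper.

```python
import re
from typing import Optional


def remove_characters(
    values: list[Optional[str]], characters: list[str]
) -> list[Optional[str]]:
    """Delete every match of any of the given regex fragments; None passes through."""
    pattern = re.compile("|".join(characters))  # e.g. [r"\W", r"\s"] -> r"\W|\s"
    return [None if v is None else pattern.sub("", v) for v in values]
```

Because the fragments are regex syntax, a literal metacharacter such as "." would need escaping (for example with re.escape) under this sketch.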

class tubular.RenameColumnsTransformer(columns: list[str] | str, new_column_names: dict[str, str], drop_original: bool = True, **kwargs: bool)[source]

Bases: BaseTransformer, DropOriginalMixin

Transformer to rename a given set of columns.

This can be useful for customising the names automatically generated by other transformers, or for creating several versions of a given column that then follow separate paths of logic in a pipeline (the expression logic effectively creates a duplicate of each column, so with drop_original=False the original is kept alongside its renamed copy).
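The copy-then-drop behaviour can be sketched on a dict of column lists. This is a hypothetical helper, not tubular's implementation: it raises ValueError when a new name already exists (matching the Raises entry of transform), copies each column under its new name, and drops the originals only when drop_original is True. With drop_original=True the column order matches the transform example (b, then new_a).

```python
def rename_columns(
    frame: dict[str, list],
    new_column_names: dict[str, str],
    drop_original: bool = True,
) -> dict[str, list]:
    """Copy columns under new names, optionally dropping the originals."""
    clashes = [new for new in new_column_names.values() if new in frame]
    if clashes:
        raise ValueError(f"new column names already present: {clashes}")
    out = dict(frame)  # shallow copy so the input mapping is not mutated
    for old, new in new_column_names.items():
        out[new] = frame[old]  # the "duplicate" under the new name
    if drop_original:
        for old in new_column_names:
            del out[old]
    return out
```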

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RenameColumnsTransformer(
...     columns="a", new_column_names={"a": "new_a"}
... )
>>> transformer
RenameColumnsTransformer(columns=['a'], new_column_names={'a': 'new_a'})

>>> # transformer can also be dumped to json and reinitialised
>>> json_dump = transformer.to_json()
>>> pprint(json_dump, sort_dicts=True)
{'classname': 'RenameColumnsTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a'],
          'copy': False,
          'drop_original': True,
          'new_column_names': {'a': 'new_a'},
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}
>>> RenameColumnsTransformer.from_json(json_dump)
RenameColumnsTransformer(columns=['a'], new_column_names={'a': 'new_a'})

```

FITS = False
get_feature_names_out() list[str][source]

List features modified/created by the transformer.

Returns:

list of features modified/created by the transformer

Return type:

list[str]

Examples

```pycon
>>> transformer = RenameColumnsTransformer(
...     columns=["a", "b"],
...     new_column_names={"a": "new_a", "b": "new_b"},
... )

>>> transformer.get_feature_names_out()
['new_a', 'new_b']

```

jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() dict[str, dict[str, Any]][source]

Dump transformer to json dict.

Returns:

jsonified transformer. Nested dict containing the attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = RenameColumnsTransformer(
...     columns="a", new_column_names={"a": "new_a"}
... )
>>> pprint(transformer.to_json(), sort_dicts=True)
{'classname': 'RenameColumnsTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a'],
          'copy': False,
          'drop_original': True,
          'new_column_names': {'a': 'new_a'},
          'return_native': True,
          'verbose': False},
 'tubular_version': ...}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame[source]

Rename the specified columns according to new_column_names.

Parameters:

X (DataFrame) – Data containing the columns to rename.

Returns:

X – Transformed input X with the specified columns renamed.

Return type:

DataFrame

Raises:

ValueError – if new_column_names values are already present in X.

Examples

```pycon
>>> import polars as pl

>>> transformer = RenameColumnsTransformer(
...     columns="a", new_column_names={"a": "new_a"}
... )  # noqa: E501
>>> test_df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> transformer.transform(test_df)
shape: (3, 2)
┌─────┬───────┐
│ b   ┆ new_a │
│ --- ┆ ---   │
│ i64 ┆ i64   │
╞═════╪═══════╡
│ 4   ┆ 1     │
│ 5   ┆ 2     │
│ 6   ┆ 3     │
└─────┴───────┘

```
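The renaming behaviour can be mimicked on a plain dict of lists, which makes the column order in the output above easy to follow: the renamed column is appended and the original is dropped. rename_columns is a hypothetical helper, not tubular's implementation; the ValueError mirrors the documented check, and the drop_original=False branch is an assumption based on the drop_original init parameter shown in the JSON dump.

```python
def rename_columns(data, new_column_names, drop_original=True):
    """Hypothetical sketch: rename columns of a dict-of-lists 'frame'."""
    # Mirror the documented check: new names must not already be in the data.
    clashes = [new for new in new_column_names.values() if new in data]
    if clashes:
        raise ValueError(f"new column names already present: {clashes}")

    out = dict(data)  # shallow copy so the input dict is left untouched
    for old, new in new_column_names.items():
        out[new] = list(out[old])  # add a copy under the new name (appended last)
        if drop_original:
            del out[old]  # assumption: drop_original=True removes the old column
    return out


frame = {"a": [1, 2, 3], "b": [4, 5, 6]}
print(rename_columns(frame, {"a": "new_a"}))
# {'b': [4, 5, 6], 'new_a': [1, 2, 3]}
```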

class tubular.SetValueTransformer(columns: ]] | str, value: int | float | str | bool | None, **kwargs: bool)

Bases: BaseTransformer

Transformer to set the value of column(s) to a given value.

Use this when columns need to be set to a constant value.

built_from_json

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

Type:

bool

polars_compatible

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

Type:

bool

jsonable

class attribute, indicates if transformer supports to/from_json methods

Type:

bool

FITS

class attribute, indicates whether transform requires fit to be run first

Type:

bool

lazyframe_compatible

class attribute, indicates whether transformer works with lazyframes

Type:

bool

Examples

```pycon
>>> SetValueTransformer(columns="a", value=1)
SetValueTransformer(columns=['a'], value=1)

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() → dict[str, dict[str, Any]]

Dump the transformer to a JSON-compatible dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = SetValueTransformer(columns="a", value=1)
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'SetValueTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'value': 1}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) → DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame

Set columns to value.

Parameters:

X (DataFrame) – Data containing the columns to set.

Returns:

X – Transformed input X with columns set to value.

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = SetValueTransformer(columns="a", value=1)
>>> test_df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> transformer.transform(test_df)
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i32 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 1   ┆ 5   │
│ 1   ┆ 6   │
└─────┴─────┘

```
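Conceptually, the transform broadcasts one literal down every row of each listed column and leaves the other columns alone. A dependency-free sketch on a dict of lists; set_value is a hypothetical helper, not part of tubular, and its str-to-list normalisation mirrors how columns="a" is shown as ['a'] in the repr above.

```python
def set_value(data, columns, value):
    """Hypothetical sketch: overwrite the listed columns with a constant."""
    if isinstance(columns, str):
        columns = [columns]  # a single name behaves like a one-item list
    out = dict(data)  # shallow copy; untouched columns are shared, not copied
    for col in columns:
        out[col] = [value] * len(out[col])  # one copy of value per row
    return out


frame = {"a": [1, 2, 3], "b": [4, 5, 6]}
print(set_value(frame, "a", 1))
# {'a': [1, 1, 1], 'b': [4, 5, 6]}
```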

class tubular.ToDatetimeTransformer(columns: str | list[str], time_format: str | None = None, **kwargs: bool)

Bases: BaseTransformer

Transformer to convert the specified columns to datetime.

Class simply uses the pd.to_datetime method on the specified columns.

Attributes:

built_from_json: bool

indicates if transformer was reconstructed from json, which limits its supported functionality to .transform

polars_compatiblebool

class attribute, indicates whether transformer has been converted to polars/pandas agnostic narwhals framework

jsonable: bool

class attribute, indicates if transformer supports to/from_json methods

FITS: bool

class attribute, indicates whether transform requires fit to be run first

lazyframe_compatible: bool

class attribute, indicates whether transformer works with lazyframes

Example:

```pycon
>>> transformer = ToDatetimeTransformer(
...     columns="a",
...     time_format="%d/%m/%Y",
... )
>>> transformer
ToDatetimeTransformer(columns=['a'], time_format='%d/%m/%Y')

>>> # version will vary for local vs CI, so use ... as generic match
>>> json_dump = transformer.to_json()
>>> json_dump
{'tubular_version': ..., 'classname': 'ToDatetimeTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'time_format': '%d/%m/%Y'}, 'fit': {'is_fitted_': True}}

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() → dict[str, dict[str, Any]]

Dump the transformer to a JSON-compatible dict.

Returns:

jsonified transformer. Nested dict containing levels for attributes set at init and fit.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> transformer = ToDatetimeTransformer(columns="a", time_format="%d/%m/%Y")

>>> # version will vary for local vs CI, so use ... as generic match
>>> transformer.to_json()
{'tubular_version': ..., 'classname': 'ToDatetimeTransformer', 'init': {'columns': ['a'], 'copy': False, 'verbose': False, 'return_native': True, 'time_format': '%d/%m/%Y'}, 'fit': {'is_fitted_': True}}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) → DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame

Convert the specified columns to datetime using pd.to_datetime.

Parameters:

X (DataFrame) – Data with column to transform.

Returns:

dataframe with provided columns converted to datetime

Return type:

DataFrame

Examples

```pycon
>>> import polars as pl

>>> transformer = ToDatetimeTransformer(
...     columns="a",
...     time_format="%d/%m/%Y",
... )
>>> test_df = pl.DataFrame({"a": ["01/02/2020", "10/12/1996"], "b": [1, 2]})
>>> transformer.transform(test_df)
shape: (2, 2)
┌─────────────────────┬─────┐
│ a                   ┆ b   │
│ ---                 ┆ --- │
│ datetime[μs]        ┆ i64 │
╞═════════════════════╪═════╡
│ 2020-02-01 00:00:00 ┆ 1   │
│ 1996-12-10 00:00:00 ┆ 2   │
└─────────────────────┴─────┘

```
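The time_format string uses strftime-style directives, so "%d/%m/%Y" reads the day first: "01/02/2020" is 1 February 2020, as in the output above. The same parse with Python's standard library, as an illustration only; parse_dates is a hypothetical helper, and tubular's own parsing may differ, for example in how it treats malformed strings.

```python
from datetime import datetime


def parse_dates(values, time_format):
    """Hypothetical sketch: parse date strings with an explicit format."""
    # strptime raises ValueError if a string does not match time_format.
    return [datetime.strptime(value, time_format) for value in values]


parsed = parse_dates(["01/02/2020", "10/12/1996"], "%d/%m/%Y")
print([d.isoformat() for d in parsed])
# ['2020-02-01T00:00:00', '1996-12-10T00:00:00']
```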

class tubular.WhenThenOtherwiseTransformer(columns: ]], when_column: str, then_column: str, **kwargs: bool | None)

Bases: BaseTransformer

Transformer to apply the same conditional update to multiple columns.

For each column in columns, rows where when_column is True take the value from then_column; all other rows keep their original value.

polars_compatible

Indicates whether transformer has been converted to polars/pandas agnostic narwhals framework.

Type:

bool

FITS

Indicates whether transform requires fit to be run first.

Type:

bool

jsonable

Indicates if transformer supports to/from_json methods.

Type:

bool

lazyframe_compatible

Indicates whether transformer works with lazyframes.

Type:

bool

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [4, 5, 6],
...         "condition_col": [True, False, True],
...         "update_col": [10, 20, 30],
...     }
... )
>>> transformer = WhenThenOtherwiseTransformer(
...     columns=["a", "b"], when_column="condition_col", then_column="update_col"
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 4)
┌─────┬─────┬───────────────┬────────────┐
│ a   ┆ b   ┆ condition_col ┆ update_col │
│ --- ┆ --- ┆ ---           ┆ ---        │
│ i64 ┆ i64 ┆ bool          ┆ i64        │
╞═════╪═════╪═══════════════╪════════════╡
│ 10  ┆ 10  ┆ true          ┆ 10         │
│ 2   ┆ 5   ┆ false         ┆ 20         │
│ 30  ┆ 30  ┆ true          ┆ 30         │
└─────┴─────┴───────────────┴────────────┘

```

FITS = False
jsonable = True
lazyframe_compatible = True
polars_compatible = True
to_json() → dict[str, dict[str, Any]]

Serialize the transformer to a JSON-compatible dictionary.

Returns:

JSON representation of the transformer, including init parameters.

Return type:

dict[str, dict[str, Any]]

Examples

```pycon
>>> from pprint import pprint
>>> transformer = WhenThenOtherwiseTransformer(
...     columns=["a", "b"],
...     when_column="condition_col",
...     then_column="update_col",  # noqa: E501
... )
>>> pprint(transformer.to_json(), sort_dicts=True)
{'classname': 'WhenThenOtherwiseTransformer',
 'fit': {'is_fitted_': True},
 'init': {'columns': ['a', 'b'],
          'copy': False,
          'return_native': True,
          'then_column': 'update_col',
          'verbose': False,
          'when_column': 'condition_col'},
 'tubular_version': ...}

```

transform(X: DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame) → DataFrame | DataFrame | LazyFrame | DataFrame | LazyFrame

Apply conditional logic to transform specified columns.

Parameters:

X (DataFrame) – DataFrame containing the columns to be transformed.

Returns:

Transformed DataFrame with updated columns based on conditions.

Return type:

DataFrame

Raises:

TypeError – If the when_column is not of type Boolean or if columns have mismatched types.

Examples

```pycon
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "a": [1, 2, 3],
...         "b": [4, 5, 6],
...         "condition_col": [True, False, True],
...         "update_col": [10, 20, 30],
...     }
... )
>>> transformer = WhenThenOtherwiseTransformer(
...     columns=["a", "b"],
...     when_column="condition_col",
...     then_column="update_col",
... )
>>> transformed_df = transformer.transform(df)
>>> print(transformed_df)
shape: (3, 4)
┌─────┬─────┬───────────────┬────────────┐
│ a   ┆ b   ┆ condition_col ┆ update_col │
│ --- ┆ --- ┆ ---           ┆ ---        │
│ i64 ┆ i64 ┆ bool          ┆ i64        │
╞═════╪═════╪═══════════════╪════════════╡
│ 10  ┆ 10  ┆ true          ┆ 10         │
│ 2   ┆ 5   ┆ false         ┆ 20         │
│ 30  ┆ 30  ┆ true          ┆ 30         │
└─────┴─────┴───────────────┴────────────┘

```
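Row by row, the rule amounts to "if when_column then then_column, otherwise the original value", applied independently to each listed column. The dependency-free sketch below reproduces the a and b values in the table above. when_then_otherwise is a hypothetical helper, not tubular's implementation: it mirrors the documented Boolean check on when_column but does not attempt the column type-matching check.

```python
def when_then_otherwise(data, columns, when_column, then_column):
    """Hypothetical sketch of the when/then/otherwise rule on a dict of lists."""
    condition = data[when_column]
    # Mirror the documented TypeError for a non-Boolean condition column.
    if not all(isinstance(flag, bool) for flag in condition):
        raise TypeError(f"{when_column} must contain booleans")

    out = dict(data)  # shallow copy; columns not listed are passed through
    for col in columns:
        # Take then_column's value where the condition is True, else keep the original.
        out[col] = [
            new if flag else old
            for old, new, flag in zip(data[col], data[then_column], condition)
        ]
    return out


df = {
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "condition_col": [True, False, True],
    "update_col": [10, 20, 30],
}
result = when_then_otherwise(df, ["a", "b"], "condition_col", "update_col")
print(result["a"], result["b"])
# [10, 2, 30] [10, 5, 30]
```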