Contributing

Thanks for your interest in contributing to this package! No contribution is too small! We hope it can be made even better through community contributions.

Requests and feedback

For any bugs, issues or feature requests, please open an issue on the project.

Requirements for contributions

We have some general requirements for all contributions, then specific requirements when adding completely new transformers to the package. These ensure consistency with the existing codebase.

Set up development environment

First clone the repository:

git clone https://github.com/lvgig/tubular.git
cd tubular

Then install tubular and its dependencies for development:

pip install . -r requirements-dev.txt

We use pre-commit for this project, which is configured to check that code is formatted with black and passes ruff checks. For a list of the ruff rules followed by this project, check .ruff.toml.

To configure pre-commit for your local repository, run the following:

pre-commit install

If working in a codespace, the dev requirements and pre-commit will be installed automatically in the dev container.

To build the documentation locally, you will also need the requirements listed in docs/requirements.txt.

General

  • Please try to keep each pull request to one change or feature only

  • Make sure to update the changelog with details of your change

Code formatting

We use black to format our code and follow PEP 8 conventions.

As mentioned above, we use pre-commit, which streamlines checking that code has been formatted correctly.

CI

Make sure that pull requests pass our CI. It includes checks that:

  • code is formatted with black

  • flake8 passes

  • the tests for the project pass, with a minimum of 80% branch coverage

  • bandit passes

Tests

We use pytest as our testing framework.

All existing tests must pass and new functionality must be tested. We aim for 100% coverage on new features that are added to the package.

There are some similarities across the tests for the different transformers in the package. Please refer to existing tests as they give great examples to work from and show what is expected to be covered in the tests.

We also make use of the test-aide package to make mocking easier and to help with generating data when parametrizing tests for the correct output of transformers’ transform methods.

We organise our tests with one script per transformer, then group the tests for a particular method into a test class.
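
As a rough illustration of that layout, the sketch below shows what a single test script might look like. The AddOne transformer and the file name tests/test_add_one.py are hypothetical and exist only for this example; a real test script would import the transformer under test from the package rather than define it inline.

```python
# Hypothetical contents of tests/test_add_one.py (illustrative only).
import pandas as pd


class AddOne:
    """Hypothetical stand-in transformer, defined inline to keep the sketch self-contained."""

    def __init__(self, columns):
        self.columns = columns

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is left untouched.
        X = X.copy()
        for col in self.columns:
            X[col] = X[col] + 1
        return X


class TestInit:
    """Tests for AddOne.__init__."""

    def test_columns_attribute_set(self):
        assert AddOne(columns=["a"]).columns == ["a"]


class TestTransform:
    """Tests for AddOne.transform."""

    def test_expected_output(self):
        df = pd.DataFrame({"a": [1, 2]})
        result = AddOne(columns=["a"]).transform(df)
        pd.testing.assert_frame_equal(result, pd.DataFrame({"a": [2, 3]}))

    def test_input_not_modified(self):
        df = pd.DataFrame({"a": [1, 2]})
        AddOne(columns=["a"]).transform(df)
        pd.testing.assert_frame_equal(df, pd.DataFrame({"a": [1, 2]}))
```

pytest collects classes whose names start with Test and methods whose names start with test_, so grouping by method in this way needs no extra configuration.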

Docstrings

We follow the numpy docstring style guide.

Docstrings must be updated to reflect any relevant changes, and must be added for new transformers.
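
For reference, a minimal sketch of the numpy docstring layout is shown below. The clip_value function is a hypothetical helper invented for this example and is not part of the package; only the section layout (Parameters, Returns, Raises) is the point.

```python
def clip_value(value, lower, upper):
    """Limit a value to the closed interval [lower, upper].

    Hypothetical helper used only to illustrate the docstring layout.

    Parameters
    ----------
    value : float
        Value to limit.
    lower : float
        Smallest value that can be returned.
    upper : float
        Largest value that can be returned.

    Returns
    -------
    float
        ``value`` if it lies within the interval, otherwise the nearest bound.

    Raises
    ------
    ValueError
        If ``lower`` is greater than ``upper``.

    """
    if lower > upper:
        raise ValueError("lower must not be greater than upper")
    # Clamp from above first, then from below.
    return max(lower, min(value, upper))
```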

New transformers

Transformers in the package are designed to work with pandas DataFrame objects.

To be consistent with scikit-learn, all transformers must implement at least a transform(X) method which applies the data transformation.

If information must be learnt from the data before applying the transform, then a fit(X, y=None) method is required. X is the input DataFrame and y is the response, which may not be required.

Optionally, a reverse_transform(X) method may also be appropriate if there is a way to apply the inverse of the transform method.
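
The three methods described above can be sketched roughly as follows. MeanCentrer is a hypothetical class written for this example: it is a standalone sketch that does not use any base classes from the package, and the attribute names (columns, means_) are assumptions rather than package conventions. Returning self from fit follows the scikit-learn convention.

```python
import pandas as pd


class MeanCentrer:
    """Hypothetical transformer that subtracts the column mean (illustration only).

    Parameters
    ----------
    columns : list of str
        Names of the columns to centre.

    """

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Learn the mean of each column; y is accepted but not used here.
        self.means_ = {col: X[col].mean() for col in self.columns}
        return self

    def transform(self, X):
        # Apply the transformation to a copy of the input DataFrame.
        X = X.copy()
        for col in self.columns:
            X[col] = X[col] - self.means_[col]
        return X

    def reverse_transform(self, X):
        # Undo transform by adding the learnt means back.
        X = X.copy()
        for col in self.columns:
            X[col] = X[col] + self.means_[col]
        return X


df = pd.DataFrame({"a": [1.0, 2.0, 6.0]})
centrer = MeanCentrer(columns=["a"]).fit(df)
centred = centrer.transform(df)
restored = centrer.reverse_transform(centred)
```

In this sketch, transform and reverse_transform both work on a copy, so the DataFrame passed in is left unchanged.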

List of contributors

For the full list of contributors see the contributors page.

Prior to the open source release of the package, there were contributions from many individuals in the LV GI Data Science team:

  • Richard Angell

  • Ned Webster

  • Dapeng Wang

  • David Silverstone

  • Shreena Patel

  • Angelos Charitidis

  • David Hopkinson

  • Liam Holmes

  • Sandeep Karkhanis

  • KarHor Yap

  • Alistair Rogers

  • Maria Navarro

  • Marek Allen

  • James Payne