.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples\plot_basic_stainer_example.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_plot_basic_stainer_example.py:


Basic Usage of Stainers (no DirtyDF)
====================================

This page shows some basic examples of using stainers to transform pandas dataframes directly.

.. GENERATED FROM PYTHON SOURCE LINES 8-13

.. code-block:: default

    import pandas as pd
    import numpy as np

    from ddf.stainer import ShuffleStainer, InflectionStainer


.. GENERATED FROM PYTHON SOURCE LINES 14-16

ShuffleStainer Example
^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 18-19

For the first example, let us use a basic dataset containing only 6 rows and 2 columns: an integer ID and an animal class.

.. GENERATED FROM PYTHON SOURCE LINES 19-23

.. code-block:: default

    df = pd.DataFrame([(0, 'Cat'), (1, 'Dog'), (2, 'Rabbit'), (3, 'Cat'), (4, 'Cat'), (5, 'Dog')],
                      columns=('id', 'class'))
    df


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

       id   class
    0   0     Cat
    1   1     Dog
    2   2  Rabbit
    3   3     Cat
    4   4     Cat
    5   5     Dog

.. GENERATED FROM PYTHON SOURCE LINES 24-26

We now apply a ShuffleStainer to shuffle the rows in this dataset. Note that we need to pass in a NumPy random generator, which the stainer uses for its random draws.

.. GENERATED FROM PYTHON SOURCE LINES 28-30

The stainer's ``transform`` method returns 3 objects: the transformed dataframe, a row map that maps the rows of the old dataframe to those of the new one, and a column map that maps the columns of the old dataframe to those of the new one.

.. GENERATED FROM PYTHON SOURCE LINES 30-37

.. code-block:: default

    shuffle_stainer = ShuffleStainer()
    rng = np.random.default_rng(42)

    new_df, row_map, col_map = shuffle_stainer.transform(df, rng)

    new_df


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

       id   class
    0   3     Cat
    1   2  Rabbit
    2   5     Dog
    3   4     Cat
    4   1     Dog
    5   0     Cat

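To make the row map concrete, here is a minimal sketch that shuffles rows with plain NumPy and pandas and builds a map in the ``{old_row: [new_row]}`` form that ``row_map`` takes in this example. It is an illustration under assumptions, not ``ShuffleStainer``'s actual implementation: the helper ``shuffle_rows`` is hypothetical, and its shuffled order will not necessarily match the one produced by the stainer.

.. code-block:: python

    import numpy as np
    import pandas as pd


    def shuffle_rows(df, rng):
        """Hypothetical helper: shuffle the rows of df and return (new_df, row_map)."""
        # perm[new_pos] is the original row position that ends up at new_pos
        perm = rng.permutation(len(df))
        new_df = df.iloc[perm].reset_index(drop=True)
        # Each original row maps to a one-element list holding its new position
        row_map = {int(old): [new_pos] for new_pos, old in enumerate(perm)}
        return new_df, row_map


    df = pd.DataFrame([(0, 'Cat'), (1, 'Dog'), (2, 'Rabbit'), (3, 'Cat'), (4, 'Cat'), (5, 'Dog')],
                      columns=('id', 'class'))
    new_df, row_map = shuffle_rows(df, np.random.default_rng(42))

    # The id of each original row travels with it to its new position
    for old, (new_pos,) in row_map.items():
        assert new_df.loc[new_pos, 'id'] == df.loc[old, 'id']

Because a shuffle only reorders rows, every original row appears exactly once in the new dataframe, so each list in the map holds a single index.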
.. GENERATED FROM PYTHON SOURCE LINES 38-40

We can also check the row map to see which rows of the old dataframe were mapped to which rows of the new one. (ShuffleStainer does not alter the columns, so the column map is simply an empty dictionary.)

.. GENERATED FROM PYTHON SOURCE LINES 40-43

.. code-block:: default

    row_map


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {3: [0], 2: [1], 5: [2], 4: [3], 1: [4], 0: [5]}


.. GENERATED FROM PYTHON SOURCE LINES 44-46

Each key is a row index (0-based) in the original dataframe, and its value lists the row indices it was mapped to in the new dataframe. For example, ``3: [0]`` means that row 3 of the original dataframe became row 0 of the new one. You can verify this with the ``id`` column, or by comparing against the original dataframe above.

.. GENERATED FROM PYTHON SOURCE LINES 48-50

Furthermore, you can use the stainer's ``get_history()`` method to get the name of the stainer, a description of how it transformed the dataframe, and the time taken for the transformation.

.. GENERATED FROM PYTHON SOURCE LINES 50-53

.. code-block:: default

    shuffle_stainer.get_history()


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    ('Shuffle', 'Order of rows randomized', 0.0019943714141845703)


.. GENERATED FROM PYTHON SOURCE LINES 54-56

InflectionStainer Example
^^^^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 58-61

For this next example, we use a randomly generated dataset of 100 rows and 3 columns: an integer ID and 2 animal class columns (the dataset has no real-world meaning; it is only for demonstration). In particular, we demonstrate how to use the InflectionStainer to generate string inflections of the animal categories.

.. GENERATED FROM PYTHON SOURCE LINES 61-68

.. code-block:: default

    rng = np.random.default_rng(42)  # reinitialize the random generator
    df2 = pd.DataFrame(zip(range(100),
                           rng.choice(['Cat', 'Dog', 'Rabbit'], 100),
                           rng.choice(['Cow', 'Sheep', 'Goat', 'Horse'], 100)),
                       columns=('id', 'class', 'class2'))
    df2.head()


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

       id   class  class2
    0   0     Cat   Horse
    1   1  Rabbit     Cow
    2   2     Dog   Horse
    3   3     Dog     Cow
    4   4     Dog   Horse

.. GENERATED FROM PYTHON SOURCE LINES 69-70

Here are the distributions of the animal classes.

.. GENERATED FROM PYTHON SOURCE LINES 70-71

.. code-block:: default

    df2['class'].value_counts()


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Rabbit    40
    Dog       33
    Cat       27
    Name: class, dtype: int64


.. GENERATED FROM PYTHON SOURCE LINES 72-74

.. code-block:: default

    df2['class2'].value_counts()


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Sheep    27
    Goat     26
    Cow      24
    Horse    23
    Name: class2, dtype: int64


.. GENERATED FROM PYTHON SOURCE LINES 75-77

We inflect on the 2 animal columns (indices 1 and 2), use only 3 inflection formats (original, lowercase, and pluralize), and skip inflections on the 'Dog' category in the first class column and on the 'Cow' and 'Sheep' categories in the second.

.. GENERATED FROM PYTHON SOURCE LINES 77-83

.. code-block:: default

    inflect_stainer = InflectionStainer(col_idx=[1, 2], num_format=3,
                                        formats=['original', 'lowercase', 'pluralize'],
                                        ignore_cats={1: ['Dog'], 2: ['Cow', 'Sheep']})

    new_df2, row_map2, col_map2 = inflect_stainer.transform(df2, rng)
    new_df2.head()


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

       id   class  class2
    0   0    Cats  Horses
    1   1  Rabbit     Cow
    2   2     Dog  Horses
    3   3     Dog     Cow
    4   4     Dog   Horse

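The following minimal sketch illustrates the idea behind this kind of transform using plain pandas and NumPy. It is not ``InflectionStainer``'s implementation: the helper ``inflect_column``, the ``FORMATS`` table, the naive pluralization (appending "s"), and the reading of ``num_format`` as "how many formats to sample per category" are all assumptions made for illustration.

.. code-block:: python

    import numpy as np
    import pandas as pd

    # Hypothetical format functions; the naive "append s" plural is an assumption
    # and does not handle irregular nouns (e.g. 'Sheep').
    FORMATS = {
        'original': lambda s: s,
        'lowercase': lambda s: s.lower(),
        'pluralize': lambda s: s + 's',
    }


    def inflect_column(series, rng, formats, num_format, ignore=()):
        """Hypothetical sketch: replace each non-ignored category with random inflections of itself."""
        out = series.copy()
        used = {}
        for cat in series.unique():
            if cat in ignore:
                continue  # ignored categories are left untouched
            # Assumption: num_format formats are sampled without replacement per category
            chosen = rng.choice(formats, size=num_format, replace=False)
            variants = [FORMATS[str(f)](cat) for f in chosen]
            mask = series == cat
            # Each occurrence independently draws one of the chosen variants
            out.loc[mask] = rng.choice(variants, size=int(mask.sum())).tolist()
            used[cat] = variants
        return out, used


    rng = np.random.default_rng(42)
    animals = pd.Series(rng.choice(['Cat', 'Dog', 'Rabbit'], 30).tolist())
    inflected, used = inflect_column(animals, rng, ['original', 'lowercase', 'pluralize'],
                                     num_format=3, ignore=['Dog'])
    print(used)

Drawing a variant per occurrence, rather than once per category, is what spreads a single category's count across several spellings.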
.. GENERATED FROM PYTHON SOURCE LINES 84-85

We can now look at the new distributions. The ignored categories ('Dog', 'Cow', and 'Sheep') keep their original counts, while each remaining category's count is split across its inflections (for example, the 40 'Rabbit' rows become 16 'rabbit', 15 'Rabbits', and 9 'Rabbit').

.. GENERATED FROM PYTHON SOURCE LINES 85-86

.. code-block:: default

    new_df2['class'].value_counts()


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Dog        33
    rabbit     16
    Rabbits    15
    cat        10
    Cat        10
    Rabbit      9
    Cats        7
    Name: class, dtype: int64


.. GENERATED FROM PYTHON SOURCE LINES 87-89

.. code-block:: default

    new_df2['class2'].value_counts()


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Sheep     27
    Cow       24
    Goats     12
    Horses    11
    Goat       8
    Horse      6
    horse      6
    goat       6
    Name: class2, dtype: int64


.. GENERATED FROM PYTHON SOURCE LINES 90-91

We can also check the description of the stainer's transform in its history (the 2nd element of the history tuple).

.. GENERATED FROM PYTHON SOURCE LINES 91-93

.. code-block:: default

    print(inflect_stainer.get_history()[1])


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Categorical inflections on: {'class': {'Cat': ['Cat', 'Cats', 'cat'], 'Rabbit': ['Rabbits', 'rabbit', 'Rabbit']}, 'class2': {'Horse': ['horse', 'Horses', 'Horse'], 'Goat': ['Goats', 'goat', 'Goat']}}


.. GENERATED FROM PYTHON SOURCE LINES 94-94

For more information on each stainer's use cases and input parameters, see their respective documentation.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.035 seconds)