.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "auto_examples\plot_user_defined_stainer_example.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_auto_examples_plot_user_defined_stainer_example.py: User-defined Custom Stainers ============================ This page shows an example of how to create your own user-defined custom stainers, which subclasses from the Stainer class. .. GENERATED FROM PYTHON SOURCE LINES 8-10 .. code-block:: default from ddf.stainer import Stainer .. GENERATED FROM PYTHON SOURCE LINES 11-13 When creating a new stainer, it needs to inherit from the Stainer class in ddf.stainer. .. GENERATED FROM PYTHON SOURCE LINES 15-20 The initialisation should include the name, row indices (if applicable) and column indices (if applicable). If any other relevant initialisations are required, they can be included as well. If the row or column indices do not apply to the stainer, an empty list can be provided to the superclass init. .. GENERATED FROM PYTHON SOURCE LINES 22-25 When defining the transform, the parameters to be included should be df (the dataframe to be transformed), rng (a RNG Generator), row_idx = None and col_idx = None. .. GENERATED FROM PYTHON SOURCE LINES 27-31 In the transform method, the self._init_transform(df, row_idx, col_idx) method can be called to accurately generate the row_idx and col_idx (This allows the Stainer to work correctly with DDF). The transform method should then implement the Stainer. .. GENERATED FROM PYTHON SOURCE LINES 33-35 To provide relevant statistics to the user, messages and timings can be added. These can be added via self.update_history(message, time) .. GENERATED FROM PYTHON SOURCE LINES 37-44 A row mapping and column mapping are also required. These represent a movement or creation of any row / col in the dataframe. It should be formatted as a dictionary where key = old_row/col_index, value = List of indices of where the corresponding row/col ended up in the new dataframe. For instance, if the index-2 row was duplicated and is now the index-2 and index-3 row, the row_map should contain the entry {2: [2, 3]}. If the rows/columns order were not altered, an empty dictionary should be returned. .. GENERATED FROM PYTHON SOURCE LINES 46-48 The transform function should return a tuple of the new dataframe, row mapping, and the column mapping. .. GENERATED FROM PYTHON SOURCE LINES 50-51 Refer to the sample code below for an example. .. GENERATED FROM PYTHON SOURCE LINES 51-77 .. code-block:: default class ShuffleStainer(Stainer): def __init__(self, name = "Shuffle"): super().__init__(name, [], []) # name, row_idx, col_idx def transform(self, df, rng, row_idx = None, col_idx = None): new_df, row_idx, col_idx = self._init_transform(df, row_idx, col_idx) start = time() """ ## Implementation of Shuffle ## Creates new_df Creates new_idx = Original row numbers, in order of the new row numbers """ pass """Creates the mapping""" row_map = {} for i in range(df.shape[0]): row_map[new_idx[i]] = [i] end = time() # Timer to be added into history self.update_history("Order of rows randomized", end - start) # message, time return new_df, row_map, {} # new dataframe, row map, column map .. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 0.001 seconds) .. _sphx_glr_download_auto_examples_plot_user_defined_stainer_example.py: .. only :: html .. container:: sphx-glr-footer :class: sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_user_defined_stainer_example.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_user_defined_stainer_example.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_