ddf.stainer.Stainer

class ddf.stainer.Stainer(name='Unnamed Stainer', row_idx=[], col_idx=[])

Parent class for all stainers. It provides the basic initialisation that every stainer inherits.

Note

This class is not meant to be used on its own; it is intended as the superclass of any custom stainer developed in the future.

name

Name of stainer.

Type

str

row_idx

Row indices that the stainer will operate on.

Type

int list

col_idx

Column indices that the stainer will operate on.

Type

int list

col_type

Column type that the stainer operates on. If the user does not pass in any col_idx, the stainer uses this to automatically select viable columns to operate on. Currently supports [“all”, “category”, “cat”, “datetime”, “date”, “time”, “numeric”, “int”, “float”].

Type

str
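The page does not specify how a col_type string is turned into a list of columns. As a minimal sketch, assuming the selection is done with pandas' DataFrame.select_dtypes, a helper could look like the one below. The helper name select_cols_by_type and the mapping from each col_type to dtypes (in particular treating “date” and “time” as datetime columns, and “category” as categorical or object columns) are hypothetical, not ddf's implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from the documented col_type names to select_dtypes selectors.
# This is an illustrative assumption, not the mapping ddf itself uses.
_DTYPE_SELECTORS = {
    "all": None,  # None means: no filtering, every column qualifies
    "category": ["category", object],
    "cat": ["category", object],
    "datetime": [np.datetime64],
    "date": [np.datetime64],
    "time": [np.datetime64],
    "numeric": [np.number],
    "int": [np.integer],
    "float": [np.floating],
}


def select_cols_by_type(df: pd.DataFrame, col_type: str) -> list[int]:
    """Return the positional indices of the columns in df that match col_type."""
    if col_type not in _DTYPE_SELECTORS:
        raise ValueError(f"Unsupported col_type: {col_type!r}")
    include = _DTYPE_SELECTORS[col_type]
    if include is None:
        return list(range(df.shape[1]))
    # select_dtypes returns matching columns by name; map them back to positions.
    selected = set(df.select_dtypes(include=include).columns)
    return [i for i, name in enumerate(df.columns) if name in selected]
```

Returning positions rather than names keeps the result in the same form as col_idx, which is documented as an int list.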

__init__(name='Unnamed Stainer', row_idx=[], col_idx=[])

The constructor for the Stainer class.

Parameters
  • name (str, optional) – Name of stainer. Default is “Unnamed Stainer”.

  • row_idx (int list, optional) – Row indices that the stainer will operate on. Default is an empty list.

  • col_idx (int list, optional) – Column indices that the stainer will operate on. Default is an empty list.

get_col_type()

Returns the column type that the stainer operates on.

Returns

Column type that the stainer operates on.

Return type

str

get_history()

Compiles history information for this stainer and returns it.

Returns

  • name (str) – Name of stainer.

  • msg (str) – Message for user.

  • time (float) – Time taken to execute the self.transform() method.

get_indices()

Returns the row indices and column indices.

Returns

  • row_idx (int list) – Row indices that the stainer operates on.

  • col_idx (int list) – Column indices that the stainer operates on.

transform(df, rng, row_idx, col_idx)

Applies staining on the given indices in the provided dataframe.

Note

In this base class, transform does not return anything; it simply raises an error. Custom stainers are expected to override transform with their own implementation.

Parameters
  • df (pd.DataFrame) – Dataframe to be transformed.

  • rng (np.random.BitGenerator) – PCG64 pseudo-random number generator.

  • row_idx (int list) – Row indices that the stainer will operate on. Takes priority over the stainer's row_idx attribute.

  • col_idx (int list) – Column indices that the stainer will operate on. Takes priority over the stainer's col_idx attribute.
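The priority rule for row_idx and col_idx can be sketched as a small resolution function. This is a minimal sketch under stated assumptions: an empty list is treated as “not specified”, and when neither the argument nor the stainer's own attribute is set, a caller-supplied fallback is used (for example every row, or the columns selected by col_type). The function resolve_indices is hypothetical.

```python
from typing import Callable, List, Sequence


def resolve_indices(
    arg_idx: Sequence[int],
    attr_idx: Sequence[int],
    fallback: Callable[[], List[int]],
) -> List[int]:
    """Pick the indices a stainer should operate on (hypothetical helper).

    Assumed priority: indices passed to transform() > indices stored on the
    stainer at construction > a fallback computed from the data.
    An empty sequence is treated as "not specified".
    """
    if arg_idx:
        return list(arg_idx)
    if attr_idx:
        return list(attr_idx)
    return fallback()
```

For rows, the fallback could be lambda: list(range(len(df))); for columns, it could call whatever col_type-based selection the stainer uses.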

Returns

  • new_df (pd.DataFrame) – Modified dataframe.

  • row_map ({int: int} dictionary) – Row mapping showing the relationship between the original and new row positions.

  • col_map ({int: int} dictionary) – Column mapping showing the relationship between the original and new column positions.
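As a sketch of what row_map and col_map might contain, the hypothetical function shuffle_rows below permutes a subset of rows and returns the same triple that transform is documented to return. It assumes rng is a NumPy Generator such as the one returned by np.random.default_rng (whose default bit generator is PCG64), and that row_map maps each original row position to its new position.

```python
import numpy as np
import pandas as pd


def shuffle_rows(df: pd.DataFrame, rng: np.random.Generator, row_idx):
    """Shuffle the rows at positions row_idx among themselves (hypothetical example).

    Returns (new_df, row_map, col_map), mirroring the documented transform output.
    """
    order = list(range(len(df)))  # order[new_pos] = original position
    targets = list(row_idx)
    shuffled = rng.permutation(targets)  # a random reordering of the target positions
    for pos, src in zip(targets, shuffled):
        order[pos] = int(src)
    new_df = df.iloc[order].reset_index(drop=True)
    # row_map[original position] = new position (the inverse of `order`)
    row_map = {orig: new for new, orig in enumerate(order)}
    # Columns are untouched, so col_map is the identity.
    col_map = {j: j for j in range(df.shape[1])}
    return new_df, row_map, col_map
```

Rows outside row_idx keep their position, so they map to themselves in row_map.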

Raises

Exception – Child class does not implement the transform method.

update_history(message='', time=0)

Used by the transform method to set the attributes required to display history information.

Parameters
  • message (str) – Message to be shown to the user about the transformation.

  • time (float) – Time taken to perform the transformation.
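Putting the pieces together, the sketch below shows how a custom stainer might subclass Stainer, time its own transform, and record the result with update_history. So that it runs without ddf installed, it defines a minimal stand-in Stainer that only mirrors the signatures documented on this page; its method bodies, the attributes it stores (message, time, the default col_type), and the example UppercaseStainer are hypothetical. With ddf installed, you would subclass ddf.stainer.Stainer instead. The example ignores rng because it is deterministic.

```python
import time

import pandas as pd


class Stainer:
    """Hypothetical stand-in mirroring the documented interface of ddf.stainer.Stainer."""

    def __init__(self, name="Unnamed Stainer", row_idx=[], col_idx=[]):
        self.name = name
        # Copy the lists so the shared default [] is never mutated.
        self.row_idx = list(row_idx)
        self.col_idx = list(col_idx)
        self.col_type = "all"  # assumed default
        self.message = ""
        self.time = 0.0

    def get_col_type(self):
        return self.col_type

    def get_indices(self):
        return self.row_idx, self.col_idx

    def update_history(self, message="", time=0):
        self.message = message
        self.time = time

    def get_history(self):
        return self.name, self.message, self.time

    def transform(self, df, rng, row_idx, col_idx):
        raise Exception("Stainer does not implement transform")


class UppercaseStainer(Stainer):
    """Hypothetical stainer: upper-cases string cells at the chosen positions."""

    def __init__(self, name="Uppercase", row_idx=[], col_idx=[]):
        super().__init__(name, row_idx, col_idx)

    def transform(self, df, rng, row_idx, col_idx):
        start = time.perf_counter()
        # Indices passed here take priority over those stored on the stainer.
        rows = row_idx or self.row_idx or list(range(len(df)))
        cols = col_idx or self.col_idx or list(range(df.shape[1]))
        new_df = df.copy()
        for j in cols:
            for i in rows:
                value = new_df.iat[i, j]
                if isinstance(value, str):
                    new_df.iat[i, j] = value.upper()
        # No rows or columns move, so both mappings are the identity.
        row_map = {i: i for i in range(len(df))}
        col_map = {j: j for j in range(df.shape[1])}
        self.update_history(
            f"Upper-cased string cells in {len(rows)} rows of columns {cols}",
            time.perf_counter() - start,
        )
        return new_df, row_map, col_map
```

After calling transform, get_history returns the stainer's name together with the message and elapsed time recorded by update_history.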

Methods

__init__([name, row_idx, col_idx])

The constructor for the Stainer class.

get_col_type()

Returns the column type that the stainer operates on.

get_history()

Compiles history information for this stainer and returns it.

get_indices()

Returns the row indices and column indices.

transform(df, rng, row_idx, col_idx)

Applies staining on the given indices in the provided dataframe.

update_history([message, time])

Used by the transform method to set the attributes required to display history information.