ddf.stainer.InflectionStainer

class ddf.stainer.InflectionStainer(col_idx=[], name='Inflection', ignore_cats=[], num_format=-1, formats=None)

Stainer to introduce random string inflections (e.g. capitalization, case format, pluralization) to given categorical columns.

Note

This stainer requires the inflection library, which can be installed with pip (pip install inflection): https://pypi.org/project/inflection/.

__init__(col_idx=[], name='Inflection', ignore_cats=[], num_format=-1, formats=None)

The constructor for the InflectionStainer class.

Parameters
  • name (str, optional) – Name of stainer. Default is “Inflection”.

  • col_idx (int list, optional) – Column indices that the stainer will operate on. Default is empty list.

  • ignore_cats (str list or {int: str list}, optional) – Category strings to be ignored by the stainer. If a string list is given, the listed categories are ignored in every column. If a dict is given, it maps each column index in col_idx to the list of category strings ignored in that particular column. Default is empty list.

  • num_format (int, optional) – Number of inflection formats present within each column. If num_format is greater than the number of available formats, or num_format == -1, all formats are used. Default is -1.

  • formats (str list or None, optional) – List of inflection formats to choose from. Valid options are: {‘original’, ‘uppercase’, ‘lowercase’, ‘capitalize’, ‘camelize’, ‘pluralize’, ‘singularize’, ‘dasherize’, ‘humanize’, ‘titleize’, ‘underscore’}. If None, all inflections are used. Default is None.
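The two accepted shapes of ignore_cats can be illustrated with a small stdlib sketch. The helper name ignored_categories is hypothetical and not part of ddf; it also assumes that, in the dict form, a column with no entry ignores nothing.

```python
from typing import Dict, List, Set, Union

# Either one list shared by all columns, or a per-column mapping.
IgnoreCats = Union[List[str], Dict[int, List[str]]]


def ignored_categories(ignore_cats: IgnoreCats, col: int) -> Set[str]:
    """Hypothetical helper: categories to leave untouched in column `col`."""
    if isinstance(ignore_cats, dict):
        # Dict form: look up this column's list.
        # Assumption: a column absent from the dict ignores nothing.
        return set(ignore_cats.get(col, []))
    # List form: the same categories are ignored in every column.
    return set(ignore_cats)


print(ignored_categories(["unknown"], 2))   # {'unknown'}
print(ignored_categories({0: ["N/A"]}, 0))  # {'N/A'}
print(ignored_categories({0: ["N/A"]}, 1))  # set()
```

The list form is the convenient choice when a placeholder such as "unknown" should survive in every column; the dict form is needed only when the protected categories differ per column.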

Raises

KeyError – A format given in formats is not one of the 11 supported formats.
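A rough stdlib sketch of how formats and num_format could combine into the set of formats used for one column, including the documented KeyError. The helper select_formats is hypothetical, it uses random.Random in place of the NumPy generator the stainer receives, and it assumes the subset is drawn uniformly without replacement; the real selection logic may differ.

```python
import random
from typing import List, Optional

# The 11 format names listed for the `formats` parameter.
ALL_FORMATS = [
    "original", "uppercase", "lowercase", "capitalize", "camelize",
    "pluralize", "singularize", "dasherize", "humanize", "titleize", "underscore",
]


def select_formats(formats: Optional[List[str]], num_format: int,
                   rng: random.Random) -> List[str]:
    """Hypothetical helper: pick the formats one column will use."""
    candidates = list(ALL_FORMATS) if formats is None else list(formats)
    for fmt in candidates:
        if fmt not in ALL_FORMATS:
            # Mirrors the documented KeyError for unsupported format names.
            raise KeyError(f"unsupported inflection format: {fmt!r}")
    if num_format == -1 or num_format > len(candidates):
        # Documented rule: fall back to every available format.
        return candidates
    # Assumption: a uniform draw without replacement.
    return rng.sample(candidates, num_format)


rng = random.Random(0)
print(len(select_formats(None, -1, rng)))                   # 11
print(select_formats(["uppercase", "lowercase"], 5, rng))   # ['uppercase', 'lowercase']
```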

get_col_type()

Returns the column type that the stainer operates on.

Returns

Column type that the stainer operates on.

Return type

string

get_history()

Compiles history information for this stainer and returns it.

Returns

  • name (str) – Name of stainer.

  • msg (str) – Message for user.

  • time (float) – Time taken to execute the self.transform() method.

get_indices()

Returns the row indices and column indices.

Returns

  • row_idx (int list) – Row indices that the stainer operates on.

  • col_idx (int list) – Column indices that the stainer operates on.

transform(df, rng, row_idx=None, col_idx=None)

Applies staining on the given indices in the provided dataframe.

Parameters
  • df (pd.DataFrame) – Dataframe to be transformed.

  • rng (np.random.BitGenerator) – PCG64 pseudo-random number generator.

  • row_idx (int list, optional) – Unused parameter as this stainer does not use row indices.

  • col_idx (int list, optional) – Column indices that the stainer will operate on. Takes priority over the class attribute col_idx.

Returns

  • new_df (pd.DataFrame) – Modified dataframe.

  • row_map (empty dictionary) – This stainer does not produce any row mappings.

  • col_map (empty dictionary) – This stainer does not produce any column mappings.
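The return contract of transform (a modified copy plus two empty mapping dicts) can be sketched without pandas. In this hypothetical stain_columns function the table is a column-major list of string lists, rng is a stdlib random.Random rather than the documented NumPy generator, and only the four formats expressible with built-in str methods are implemented (the real stainer uses the inflection library for the rest). It also assumes each cell independently draws one of the allowed formats.

```python
import random
from typing import AbstractSet, Dict, List, Tuple

# Only formats expressible with built-in str methods; camelize, pluralize,
# etc. would need the inflection library.
STR_FORMATS = {
    "original": lambda s: s,
    "uppercase": str.upper,
    "lowercase": str.lower,
    "capitalize": str.capitalize,
}


def stain_columns(
    columns: List[List[str]],
    rng: random.Random,
    col_idx: List[int],
    formats: List[str],
    ignore: AbstractSet[str] = frozenset(),
) -> Tuple[List[List[str]], Dict, Dict]:
    """Hypothetical sketch of the transform contract on a column-major table."""
    new_columns = [list(col) for col in columns]  # copy; input is not mutated
    for j in col_idx:
        for i, value in enumerate(new_columns[j]):
            if value in ignore:
                continue  # ignored categories keep their original spelling
            # Assumption: each cell independently draws one allowed format.
            fmt = rng.choice(formats)
            new_columns[j][i] = STR_FORMATS[fmt](value)
    # No rows or columns are added, removed, or reordered, so both maps are empty.
    return new_columns, {}, {}


table = [["red", "blue", "N/A"], ["small", "large", "small"]]
new, row_map, col_map = stain_columns(
    table, random.Random(42), col_idx=[0], formats=["uppercase"], ignore={"N/A"}
)
print(new)  # [['RED', 'BLUE', 'N/A'], ['small', 'large', 'small']]
```

Because the staining only rewrites cell values in place, row and column positions are unchanged, which is consistent with the empty row_map and col_map documented above.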

update_history(message='', time=0)

Used by the transform method to set the attributes required to display history information.

Parameters
  • message (str) – Message to be shown to the user about the transformation.

  • time (float) – Time taken to perform the transform.
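How update_history and get_history fit together can be shown with a hypothetical stand-in class, HistorySketch, which is not the ddf implementation. It assumes get_history returns a plain (name, msg, time) tuple in the documented order and that transform measures its own runtime with a wall-clock timer before recording it.

```python
from time import perf_counter
from typing import List, Tuple


class HistorySketch:
    """Hypothetical stand-in showing how update_history feeds get_history."""

    def __init__(self, name: str = "Inflection") -> None:
        self.name = name
        self.message = ""
        self.time = 0.0

    def update_history(self, message: str = "", time: float = 0) -> None:
        # Parameter names follow the documented signature (message, time).
        self.message = message
        self.time = time

    def get_history(self) -> Tuple[str, str, float]:
        # Assumed to be returned in the documented order: name, msg, time.
        return self.name, self.message, self.time

    def transform(self, values: List[str]) -> List[str]:
        start = perf_counter()
        result = [v.upper() for v in values]  # placeholder staining step
        self.update_history("Uppercased values", perf_counter() - start)
        return result


s = HistorySketch()
s.transform(["a", "b"])
name, msg, elapsed = s.get_history()
print(name, msg)  # Inflection Uppercased values
```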

Methods

__init__([col_idx, name, ignore_cats, …])

The constructor for the InflectionStainer class.

get_col_type()

Returns the column type that the stainer operates on.

get_history()

Compiles history information for this stainer and returns it.

get_indices()

Returns the row indices and column indices.

transform(df, rng[, row_idx, col_idx])

Applies staining on the given indices in the provided dataframe.

update_history([message, time])

Used by the transform method to set the attributes required to display history information.