ddf.stainer.InflectionStainer
class ddf.stainer.InflectionStainer(col_idx=[], name='Inflection', ignore_cats=[], num_format=-1, formats=None)
Stainer that introduces random string inflections (e.g. capitalization, case formatting, pluralization) into the given categorical columns.
Note
This stainer requires the inflection library, which can be installed with pip install inflection (https://pypi.org/project/inflection/).
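The core idea of the stainer can be illustrated with a minimal standard-library sketch. It is not the ddf implementation: stain_column is a hypothetical helper, a column is modeled as a plain Python list instead of a pandas column, only the four case-based formats that need no external library are used, and each cell is assumed to receive an independently chosen format.

```python
import random

# Case formats that need only built-in str methods (a subset of the 11
# formats the stainer supports; the others come from the inflection library).
CASE_FORMATS = {
    "original": lambda s: s,
    "uppercase": str.upper,
    "lowercase": str.lower,
    "capitalize": str.capitalize,
}


def stain_column(values, rng, ignore=(), formats=tuple(CASE_FORMATS)):
    """Hypothetical helper: return a new list in which every value not in
    `ignore` is rewritten with a randomly chosen format."""
    ignore = set(ignore)
    stained = []
    for value in values:
        if value in ignore:
            stained.append(value)  # ignored categories are left untouched
        else:
            fmt = rng.choice(formats)  # assumption: one draw per cell
            stained.append(CASE_FORMATS[fmt](value))
    return stained


rng = random.Random(42)
column = ["New York", "new york", "Boston", "NA"]
print(stain_column(column, rng, ignore=["NA"]))
```

Because the rng is seeded, the same column and seed always produce the same staining, which is the property the real stainer obtains from its PCG64 generator.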
__init__(col_idx=[], name='Inflection', ignore_cats=[], num_format=-1, formats=None)
The constructor for the InflectionStainer class.
- Parameters
name (str, optional) – Name of stainer. Default is “Inflection”.
col_idx (int list, optional) – Column indices that the stainer will operate on. Default is empty list.
ignore_cats (str list or {int: str list}, optional) – Category strings to be ignored by the stainer. If the input is a string list, the listed categories are ignored in every column. If the input is a dict, it maps each col_idx to the list of category strings ignored in that particular column. Default is empty list.
num_format (int, optional) – Number of inflection formats present within each column. If num_format is -1 or exceeds the number of available formats, all formats are used. Default is -1.
formats (str list or None, optional) – List of inflection formats to choose from. Choose from the following options: {‘original’, ‘uppercase’, ‘lowercase’, ‘capitalize’, ‘camelize’, ‘pluralize’, ‘singularize’, ‘dasherize’, ‘humanize’, ‘titleize’, ‘underscore’}. If None, all inflections are used.
- Raises
KeyError – A provided format is not one of the 11 supported formats listed above.
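The rules for formats, num_format and ignore_cats can be summarized in a short standard-library sketch. choose_formats and ignored_for_column are hypothetical helpers, not part of ddf. The sketch assumes the num_format subset is drawn once per column with a random.Random instance (the stainer itself uses a numpy PCG64 generator), and that a column missing from an ignore_cats dict ignores nothing.

```python
import random

# The 11 format names accepted by the `formats` parameter.
ALL_FORMATS = [
    "original", "uppercase", "lowercase", "capitalize", "camelize",
    "pluralize", "singularize", "dasherize", "humanize", "titleize",
    "underscore",
]


def choose_formats(formats, num_format, rng):
    """Hypothetical helper: pick the formats used for one column."""
    if formats is None:
        formats = ALL_FORMATS  # None means every inflection is allowed
    unknown = [f for f in formats if f not in ALL_FORMATS]
    if unknown:
        raise KeyError(f"unsupported format(s): {unknown}")
    # -1, or a request larger than the pool, means "use all formats".
    if num_format == -1 or num_format > len(formats):
        return list(formats)
    return rng.sample(list(formats), num_format)


def ignored_for_column(ignore_cats, col):
    """Hypothetical helper: categories left untouched in column `col`.

    A list applies to every column; a dict maps a column index to the
    categories ignored in that column only (assumed empty if absent).
    """
    if isinstance(ignore_cats, dict):
        return set(ignore_cats.get(col, []))
    return set(ignore_cats)


rng = random.Random(0)
print(choose_formats(["uppercase", "titleize"], -1, rng))
print(choose_formats(None, 3, rng))
print(ignored_for_column({0: ["NA"]}, 0), ignored_for_column(["NA"], 5))
```

Validating every requested format before sampling means a typo such as "titlecase" fails immediately with KeyError rather than silently shrinking the pool.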
get_col_type()
Returns the column type that the stainer operates on.
- Returns
Column type that the stainer operates on.
- Return type
string
get_history()
Compiles history information for this stainer and returns it.
- Returns
name (str) – Name of stainer.
msg (str) – Message for user.
time (float) – Time taken to execute the self.transform() method.
get_indices()
Returns the row indices and column indices.
- Returns
row_idx (int list) – Row indices that the stainer operates on.
col_idx (int list) – Column indices that the stainer operates on.
transform(df, rng, row_idx=None, col_idx=None)
Applies staining on the given indices in the provided dataframe.
- Parameters
df (pd.DataFrame) – Dataframe to be transformed.
rng (np.random.BitGenerator) – PCG64 pseudo-random number generator.
row_idx (int list, optional) – Unused parameter as this stainer does not use row indices.
col_idx (int list, optional) – Column indices that the stainer will operate on. Will take priority over the class attribute col_idx.
- Returns
new_df (pd.DataFrame) – Modified dataframe.
row_map (empty dictionary) – This stainer does not produce any row mappings.
col_map (empty dictionary) – This stainer does not produce any column mappings.
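The transform contract described above (col_idx argument overriding the instance attribute, row_idx ignored, and a (new_df, row_map, col_map) triple with empty maps) can be sketched without pandas. InflectionStainerSketch is a hypothetical stand-in, not the ddf class: a dict {column index: list of values} replaces pd.DataFrame, random.Random replaces the numpy PCG64 generator, a simple case change replaces the real inflections, and copying the input instead of mutating it is an assumption.

```python
import copy
import random


class InflectionStainerSketch:
    """Hypothetical stand-in illustrating only the transform contract."""

    def __init__(self, col_idx=None):
        self.col_idx = col_idx if col_idx is not None else []

    def transform(self, df, rng, row_idx=None, col_idx=None):
        # row_idx is accepted for API compatibility but never used.
        # A col_idx passed here takes priority over the instance attribute.
        targets = col_idx if col_idx is not None else self.col_idx
        new_df = copy.deepcopy(df)  # assumption: the input is left unmodified
        for c in targets:
            new_df[c] = [rng.choice([v.upper(), v.lower(), v]) for v in new_df[c]]
        # This sketch never adds, drops or reorders rows or columns,
        # so both mappings are empty.
        return new_df, {}, {}


df = {0: ["Apple", "Pear"], 1: ["x", "y"]}
stainer = InflectionStainerSketch(col_idx=[0])
new_df, row_map, col_map = stainer.transform(df, random.Random(0), col_idx=[1])
print(new_df, row_map, col_map)
```

Here column 0 is left alone because the col_idx=[1] passed to transform overrides the col_idx=[0] given at construction.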
update_history(message='', time=0)
Used by the transform method to set the attributes required to display history information.
- Parameters
message (str) – Message to be shown to the user about the transformation.
time (float) – Time taken to perform the transform.
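How update_history and get_history fit together can be shown with a minimal sketch. HistorySketch is hypothetical: it assumes get_history returns the documented name, msg and time values as a tuple, and that transform measures its own duration with time.perf_counter before calling update_history; the placeholder uppercasing step stands in for real staining.

```python
import time


class HistorySketch:
    """Hypothetical stand-in for the documented history methods."""

    def __init__(self, name="Inflection"):
        self.name = name
        self.message = ""
        self.time = 0.0

    def update_history(self, message="", time=0):
        # `time` mirrors the documented parameter name; it shadows the
        # time module only inside this method.
        self.message = message
        self.time = time

    def get_history(self):
        # Assumption: the three documented values are returned as a tuple.
        return self.name, self.message, self.time

    def transform(self, values):
        start = time.perf_counter()
        new_values = [v.upper() for v in values]  # placeholder staining step
        self.update_history(f"Inflected {len(values)} values.",
                            time.perf_counter() - start)
        return new_values


h = HistorySketch()
h.transform(["a", "b"])
print(h.get_history())
```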
Methods
__init__([col_idx, name, ignore_cats, …]) – The constructor for InflectionStainer class.
get_col_type() – Returns the column type that the stainer operates on.
get_history() – Compiles history information for this stainer and returns it.
get_indices() – Returns the row indices and column indices.
transform(df, rng[, row_idx, col_idx]) – Applies staining on the given indices in the provided dataframe.
update_history([message, time]) – Used by transform method to set attributes required to display history information.