ddf.stainer.DatetimeSplitStainer

class ddf.stainer.DatetimeSplitStainer(name='Datetime Split', col_idx=[], keep_time=True, prob=1.0)

Stainer that splits each given date / datetime column into 3 columns, representing day, month, and year.

If a given column’s name is ‘X’, then the generated columns are named ‘X_day’, ‘X_month’, and ‘X_year’. If keep_time is True, ‘X_hour’, ‘X_minute’, and ‘X_second’ are generated as well. Otherwise, only the date components are kept.

If a column is split, the original column will be dropped.

The format of ‘X_month’ is randomly chosen from [‘m’, ‘%B’, ‘%b’], and the format of ‘X_year’ is randomly chosen from [‘%Y’, ‘%y’].
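As an illustration of the naming and format rules described here, the following is a minimal plain-Python sketch, not the library's implementation. The helper `split_datetime_column` is hypothetical, the table is modelled as a dict of lists rather than a DataFrame, and the sketch assumes that ‘m’ denotes the plain month number.

```python
import random
from datetime import datetime

MONTH_FORMATS = ["m", "%B", "%b"]  # assumption: 'm' means the plain month number
YEAR_FORMATS = ["%Y", "%y"]


def split_datetime_column(table, name, keep_time=True, rng=None):
    """Hypothetical helper: replace table[name] with its date (and time) parts."""
    rng = rng or random.Random()
    # One month format and one year format are drawn per split column.
    month_fmt = rng.choice(MONTH_FORMATS)
    year_fmt = rng.choice(YEAR_FORMATS)
    values = table[name]

    new_cols = {
        f"{name}_day": [v.day for v in values],
        f"{name}_month": [
            v.month if month_fmt == "m" else v.strftime(month_fmt) for v in values
        ],
        f"{name}_year": [v.strftime(year_fmt) for v in values],
    }
    if keep_time:
        new_cols[f"{name}_hour"] = [v.hour for v in values]
        new_cols[f"{name}_minute"] = [v.minute for v in values]
        new_cols[f"{name}_second"] = [v.second for v in values]

    # Rebuild the table so the new columns take the place of the original,
    # which is dropped.
    result = {}
    for col, data in table.items():
        if col == name:
            result.update(new_cols)
        else:
            result[col] = data
    return result


table = {
    "id": [1, 2],
    "X": [datetime(2021, 3, 9, 14, 5, 30), datetime(1999, 12, 31, 23, 59, 59)],
}
dates_only = split_datetime_column(table, "X", keep_time=False, rng=random.Random(0))
with_time = split_datetime_column(table, "X", keep_time=True, rng=random.Random(0))
print(list(dates_only))
print(list(with_time))
```

With keep_time=False the sketch yields three new columns in place of ‘X’; with keep_time=True it yields six.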

__init__(name='Datetime Split', col_idx=[], keep_time=True, prob=1.0)

The constructor for the DatetimeSplitStainer class.

Parameters
  • name (str, optional) – Name of stainer. Default is “Datetime Split”.

  • col_idx (int list, optional) – Column indices that the stainer will operate on. Default is empty list.

  • keep_time (boolean, optional) – Whether the time component of the datetime should be kept. If True, 3 additional columns (hour, minute, and second) are created. Default is True.

  • prob (float [0, 1], optional) – Probability that the stainer splits a date column. Probabilities of split for each given date column are independent. Default is 1.
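One way to read the independence of the per-column split probability is sketched below. This is an assumption about the behaviour, not the library's code: the helper `choose_columns_to_split` is hypothetical, and Python's `random.Random` stands in for whatever generator the stainer actually uses.

```python
import random


def choose_columns_to_split(col_idx, prob, rng):
    """Hypothetical helper: draw once per column, so each decision is independent."""
    return [j for j in col_idx if rng.random() < prob]


rng = random.Random(42)
always = choose_columns_to_split([0, 3, 5], prob=1.0, rng=rng)
never = choose_columns_to_split([0, 3, 5], prob=0.0, rng=rng)
sometimes = choose_columns_to_split([0, 3, 5], prob=0.5, rng=rng)
print(always, never, sometimes)
```

Under this reading, prob=1.0 splits every listed column, prob=0.0 splits none, and intermediate values split a random subset.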

get_col_type()

Returns the column type that the stainer operates on.

Returns

Column type that the stainer operates on.

Return type

string

get_history()

Compiles history information for this stainer and returns it.

Returns

  • name (str) – Name of stainer.

  • msg (str) – Message for user.

  • time (float) – Time taken to execute the self.transform() method.

get_indices()

Returns the row indices and column indices.

Returns

  • row_idx (int list) – Row indices that the stainer operates on.

  • col_idx (int list) – Column indices that the stainer operates on.

transform(df, rng, row_idx=None, col_idx=None)

Applies staining on the given indices in the provided dataframe.

Parameters
  • df (pd.DataFrame) – Dataframe to be transformed.

  • rng (np.random.BitGenerator) – PCG64 pseudo-random number generator.

  • row_idx (int list, optional) – Unused parameter as this stainer does not use row indices.

  • col_idx (int list, optional) – Column indices that the stainer will operate on. Will take priority over the class attribute col_idx.

Returns

  • new_df (pd.DataFrame) – Modified dataframe.

  • row_map (empty dictionary) – This stainer does not produce any row mappings.

  • col_map (dictionary {int: int}) – Column mapping showing the relationship between the original and new column positions.

update_history(message='', time=0)

Used by the transform method to set the attributes required to display history information.

Parameters
  • message (str) – Message to be shown to the user about the transformation.

  • time (float) – Time taken to perform the transform.

Methods

  • __init__([name, col_idx, keep_time, prob]) – The constructor for the DatetimeSplitStainer class.

  • get_col_type() – Returns the column type that the stainer operates on.

  • get_history() – Compiles history information for this stainer and returns it.

  • get_indices() – Returns the row indices and column indices.

  • transform(df, rng[, row_idx, col_idx]) – Applies staining on the given indices in the provided dataframe.

  • update_history([message, time]) – Used by the transform method to set the attributes required to display history information.