ddf.stainer

Contains all the stainers that can be applied to dataframes. Every stainer should inherit from the Stainer class and override the transform method, which defines how the stainer changes the dataframe. The transform method returns the new dataframe, the row mapping, and the column mapping. Each mapping describes how index positions in the original dataframe correspond to index positions in the transformed dataframe.
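The subclassing contract described above can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the `Stainer` class is a hypothetical stand-in rather than the real parent class, the `transform(rows, rng)` signature is invented, a list of tuples stands in for a dataframe, and the mappings are represented as dictionaries from an original position to a list of new positions, which may differ from the library's actual representation.

```python
import random


class Stainer:
    """Hypothetical stand-in for the parent class (not the real ddf.stainer.Stainer)."""

    def __init__(self, name="Toy stainer", row_idx=None, col_idx=None):
        self.name = name
        self.row_idx = row_idx or []
        self.col_idx = col_idx or []

    def transform(self, rows, rng):
        # Subclasses decide how the table is changed.
        raise NotImplementedError("Subclasses must override transform().")


class ToyShuffleStainer(Stainer):
    """Toy stainer that randomly rearranges rows and reports where each row went."""

    def transform(self, rows, rng):
        n_cols = len(rows[0]) if rows else 0

        # order[new_position] == original_position
        order = list(range(len(rows)))
        rng.shuffle(order)
        new_rows = [rows[old] for old in order]

        # Assumed mapping shape: original position -> list of new positions.
        row_map = {old: [new] for new, old in enumerate(order)}
        # Columns are untouched, so each column maps to itself.
        col_map = {j: [j] for j in range(n_cols)}
        return new_rows, row_map, col_map


# A list of tuples stands in for a dataframe in this sketch.
rows = [("a", 1), ("b", 2), ("c", 3)]
stainer = ToyShuffleStainer(name="Toy shuffle")
stained, row_map, col_map = stainer.transform(rows, random.Random(0))

# Each original row can be located in the stained table through row_map.
for old, new_positions in row_map.items():
    print(old, "->", new_positions, stained[new_positions[0]])
```

Mapping to a list of positions rather than a single position is a design choice in this sketch: it leaves room for transformations where one original row or column ends up in several places, such as row duplication or column splitting.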

Classes

BinningStainer([name, col_idx, group_size, …])

Stainer that bins continuous columns into discrete groups (each group represents an interval [a,b)).

DateFormatStainer([name, col_idx, …])

Stainer to alter the format of dates for given date columns.

DatetimeFormatStainer([name, col_idx, …])

Stainer to alter the format of datetimes for given datetime columns.

DatetimeSplitStainer([name, col_idx, …])

Stainer that splits each given date / datetime column into 3 columns, representing day, month, and year.

FTransformStainer(deg[, name, col_idx, …])

Stainer that takes a numerical column and applies a transformation to it.

InflectionStainer([col_idx, name, …])

Stainer to introduce random string inflections (e.g.

LatlongFormatStainer(col_idx[, name, …])

Stainer to alter the format of latlong values for given latlong columns.

LatlongSplitStainer(col_idx[, name, prob])

Stainer that splits each given latlong column into 6 columns, representing degrees, minutes, and seconds for lat and long respectively.

NullifyStainer(deg[, name, row_idx, …])

Stainer that converts various values to missing data / values that represent missing values.

RowDuplicateStainer(deg[, max_rep, name, …])

Stainer to duplicate rows of a dataset.

ShuffleStainer([name])

Stainer to randomly rearrange the rows of the DataFrame.

Stainer([name, row_idx, col_idx])

Parent Stainer class that contains basic initialisations meant for all stainers to inherit from.