ddf.stainer.RowDuplicateStainer
-
class ddf.stainer.RowDuplicateStainer(deg, max_rep=2, name='Add Duplicates', row_idx=[])
Stainer to duplicate rows of a dataset.
-
__init__(deg, max_rep=2, name='Add Duplicates', row_idx=[])
Constructor for RowDuplicateStainer.
- Parameters
deg (float in (0, 1]) – Proportion of the specified rows to duplicate. For example, if 5 rows are specified and deg = 0.6, only 3 of them are duplicated.
max_rep (int, one of 2, 3, 4 or 5, optional) – Maximum number of times a row can appear after duplication. For example, with max_rep = 2 a duplicated row appears exactly twice: the original plus one copy. Capped at 5 to limit computational cost. Defaults to 2.
name (str, optional) – Name of stainer. Default is “Add Duplicates”
row_idx (int list, optional) – Row indices that the stainer will operate on. Default is empty list.
- Raises
ValueError – If deg is not in the range (0, 1].
ValueError – If max_rep is not in the range [2, 5].
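The constructor checks, together with the note on deg, can be sketched as follows. This is a minimal illustration, not ddf's source: the helper names validate_params and n_rows_to_duplicate are hypothetical, and rounding deg × (number of specified rows) to the nearest integer is an assumption about how the library turns the proportion into a row count.

```python
def validate_params(deg: float, max_rep: int) -> None:
    """Hypothetical check mirroring the documented ValueError conditions."""
    if not 0 < deg <= 1:
        raise ValueError("deg must be in the range (0, 1]")
    if max_rep not in (2, 3, 4, 5):
        raise ValueError("max_rep must be in the range [2, 5]")


def n_rows_to_duplicate(n_specified: int, deg: float) -> int:
    """Number of specified rows that get duplicated.

    Assumption: the proportion is rounded to the nearest integer;
    ddf may round differently.
    """
    return round(deg * n_specified)


validate_params(0.6, 2)             # valid arguments: returns None
print(n_rows_to_duplicate(5, 0.6))  # 3
```

With 5 specified rows and deg = 0.6 this reproduces the "only 3 rows" example from the parameter description.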
-
get_col_type()
Returns the column type that the stainer operates on.
- Returns
Column type that the stainer operates on.
- Return type
string
-
get_history()
Compiles history information for this stainer and returns it.
- Returns
name (str) – Name of stainer.
msg (str) – Message for user.
time (float) – Time taken to execute the self.transform() method.
-
get_indices()
Returns the row indices and column indices.
- Returns
row_idx (int list) – Row indices that the stainer operates on.
col_idx (int list) – Column indices that the stainer operates on.
-
transform(df, rng, row_idx=None, col_idx=None)
Applies staining on the given indices in the provided dataframe.
- Parameters
df (pd.DataFrame) – Dataframe to be transformed.
rng (np.random.BitGenerator) – PCG64 pseudo-random number generator.
row_idx (int list, optional) – Row indices that the stainer will operate on. Default is None.
col_idx (int list, optional) – Unused. Entire rows are duplicated, so every column is copied along with each new row.
- Returns
new_df (pd.DataFrame) – Modified dataframe.
row_map (dictionary) – Row mapping showing the relationship between the original and new row positions.
col_map (empty dictionary) – This stainer does not produce any column mappings.
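To make the three return values concrete, here is a minimal pandas sketch of row duplication. It is not ddf's implementation. The function name duplicate_rows, placing copies directly after their original row, treating an empty or missing row_idx as "all rows", drawing each copy count uniformly from 2 to max_rep, and the row_map layout (original position → list of new positions) are all assumptions. The rng is created with np.random.default_rng, whose default bit generator is PCG64.

```python
import numpy as np
import pandas as pd


def duplicate_rows(df, rng, row_idx=None, deg=0.5, max_rep=2):
    """Hypothetical sketch: duplicate a proportion `deg` of the rows in `row_idx`.

    Returns (new_df, row_map, col_map) in the same order as the documented transform().
    """
    # Assumption: None or an empty list means "operate on every row".
    if not row_idx:
        row_idx = list(range(len(df)))

    # Pick round(deg * len(row_idx)) distinct rows to duplicate.
    n_dup = round(deg * len(row_idx))
    chosen = set(rng.choice(row_idx, size=n_dup, replace=False).tolist())

    positions = []  # original positional index of each output row, in order
    row_map = {}    # original position -> list of positions in new_df
    for i in range(len(df)):
        # A chosen row appears between 2 and max_rep times; others appear once.
        copies = int(rng.integers(2, max_rep + 1)) if i in chosen else 1
        row_map[i] = list(range(len(positions), len(positions) + copies))
        positions.extend([i] * copies)

    # Copies sit right after their original; the index is renumbered 0..n-1.
    new_df = df.iloc[positions].reset_index(drop=True)
    return new_df, row_map, {}  # empty col_map: no column mappings are produced


df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "score": [10, 20, 30, 40, 50]})
rng = np.random.default_rng(42)
new_df, row_map, col_map = duplicate_rows(df, rng, row_idx=[0, 1, 2, 3, 4], deg=0.6)
print(len(new_df))  # 8: three of the five rows now appear twice
```

Because max_rep = 2 here, every selected row gets exactly one extra copy, so the output size is fixed even though which rows are chosen depends on the seed.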
-
update_history(message='', time=0)
Used by the transform method to set the attributes required to display history information.
- Parameters
message (str) – Message to be shown to the user about the transformation.
time (float) – Time taken to perform the transform
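The division of labour between update_history() and get_history() can be illustrated with a small stand-in class. This is a hypothetical sketch, not ddf's base class: the class name HistorySketch, its attribute names, the message text, and timing with time.perf_counter are assumptions. Only the method names, their parameters, and the (name, msg, time) return order come from this page.

```python
from time import perf_counter


class HistorySketch:
    """Hypothetical stand-in mirroring the documented history methods."""

    def __init__(self, name="Add Duplicates"):
        self.name = name
        self.message = ""
        self.time = 0.0

    def update_history(self, message="", time=0):
        # Called at the end of transform() to record what happened.
        self.message = message
        self.time = time

    def get_history(self):
        # Same order as the documented return values: name, msg, time.
        return self.name, self.message, self.time

    def transform(self, rows):
        start = perf_counter()
        new_rows = [row for row in rows for _ in range(2)]  # duplicate every row once
        self.update_history(f"Duplicated {len(rows)} rows", perf_counter() - start)
        return new_rows


stainer = HistorySketch()
stainer.transform([[1, "a"], [2, "b"], [3, "c"]])
name, msg, elapsed = stainer.get_history()
print(name, "|", msg)  # Add Duplicates | Duplicated 3 rows
```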
Methods
__init__(deg[, max_rep, name, row_idx]) – Constructor for RowDuplicateStainer.
get_col_type() – Returns the column type that the stainer operates on.
get_history() – Compiles history information for this stainer and returns it.
get_indices() – Returns the row indices and column indices.
transform(df, rng[, row_idx, col_idx]) – Applies staining on the given indices in the provided dataframe.
update_history([message, time]) – Used by the transform method to set the attributes required to display history information.