ddf.stainer.RowDuplicateStainer
-
class ddf.stainer.RowDuplicateStainer(deg, max_rep=2, name='Add Duplicates', row_idx=[])
Stainer to duplicate rows of a dataset.
-
__init__(deg, max_rep=2, name='Add Duplicates', row_idx=[])
Constructor for RowDuplicateStainer.
- Parameters
deg (float in (0, 1]) – Proportion of the given rows to duplicate. For example, if 5 rows are specified and deg = 0.6, only 3 rows are duplicated.
max_rep (int, one of 2/3/4/5, optional) – Maximum number of times a row can appear after duplication. For example, with max_rep = 2 the original row is duplicated once, giving 2 copies in total. Capped at 5 to limit computational cost. Defaults to 2.
name (str, optional) – Name of stainer. Defaults to “Add Duplicates”.
row_idx (int list, optional) – Row indices that the stainer will operate on. Defaults to an empty list.
- Raises
ValueError – deg is not in the range (0, 1].
ValueError – max_rep is not in the range [2, 5].
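The documented ranges and the "5 rows, deg = 0.6" example can be illustrated with a minimal, standard-library sketch. The helper names `validate_duplicate_params` and `rows_to_duplicate` are hypothetical and not part of ddf, and the use of `round` is an assumption: the documentation only gives the single worked example.

```python
def validate_duplicate_params(deg, max_rep=2):
    """Hypothetical helper mirroring the documented constructor checks."""
    if not 0 < deg <= 1:
        raise ValueError("deg must be in the range (0, 1]")
    if max_rep not in (2, 3, 4, 5):
        raise ValueError("max_rep must be in the range [2, 5]")


def rows_to_duplicate(n_rows, deg):
    """Number of rows selected for duplication.

    Assumption: the product is rounded to the nearest integer. The library
    may floor or ceil instead; the docs do not say.
    """
    return round(n_rows * deg)


validate_duplicate_params(0.6, max_rep=2)   # passes silently
n_selected = rows_to_duplicate(5, 0.6)      # the documented example
```

Values such as `deg=0`, `deg=1.5`, `max_rep=1` or `max_rep=6` fall outside the documented ranges and raise `ValueError` in this sketch.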
-
get_col_type()
Returns the column type that the stainer operates on.
- Returns
Column type that the stainer operates on.
- Return type
str
-
get_history()
Compiles history information for this stainer and returns it.
- Returns
name (str) – Name of stainer.
msg (str) – Message for user.
time (float) – Time taken to execute the self.transform() method.
-
get_indices()
Returns the row indices and column indices.
- Returns
row_idx (int list) – Row indices that the stainer operates on.
col_idx (int list) – Column indices that the stainer operates on.
-
transform(df, rng, row_idx=None, col_idx=None)
Applies staining on the given indices in the provided dataframe.
- Parameters
df (pd.DataFrame) – Dataframe to be transformed.
rng (np.random.BitGenerator) – PCG64 pseudo-random number generator.
row_idx (int list, optional) – Row indices that the stainer will operate on. Defaults to None.
col_idx (int list, optional) – Unused parameter. All columns are copied when duplicate rows are created.
- Returns
new_df (pd.DataFrame) – Modified dataframe.
row_map (dictionary) – Row mapping showing the relationship between the original and new row positions.
col_map (empty dictionary) – This stainer does not produce any column mappings.
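To make the three return values concrete, here is a toy, standard-library sketch of row duplication. It is not the ddf implementation: `duplicate_rows` is a hypothetical function, a plain list stands in for the DataFrame, `random.Random` stands in for the NumPy generator, and the choices to keep copies adjacent and to store `row_map` as `{old_position: [new_positions]}` are assumptions.

```python
import random


def duplicate_rows(rows, row_idx, deg, max_rep, rng):
    """Toy stand-in for transform(): returns (new_rows, row_map, col_map).

    rows    -- list of records, standing in for a DataFrame
    row_idx -- candidate row positions
    rng     -- random.Random, standing in for the NumPy generator
    """
    n_selected = round(len(row_idx) * deg)           # assumed rounding rule
    selected = set(rng.sample(row_idx, n_selected))  # rows that get duplicated

    new_rows, row_map = [], {}
    for old_pos, row in enumerate(rows):
        # Selected rows appear 2..max_rep times in total; others appear once.
        copies = rng.randint(2, max_rep) if old_pos in selected else 1
        row_map[old_pos] = list(range(len(new_rows), len(new_rows) + copies))
        new_rows.extend([row] * copies)
    return new_rows, row_map, {}                     # no column mapping


rows = ["a", "b", "c", "d", "e"]
new_rows, row_map, col_map = duplicate_rows(
    rows, [0, 1, 2, 3, 4], deg=0.6, max_rep=2, rng=random.Random(0)
)
```

With `deg=0.6` and `max_rep=2`, three of the five rows appear twice, so `new_rows` has eight entries and every position listed in `row_map[i]` holds a copy of `rows[i]`.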
-
update_history(message='', time=0)
Used by the transform method to set the attributes required to display history information.
- Parameters
message (str) – Message to be shown to the user about the transformation.
time (float) – Time taken to perform the transform
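The relationship between update_history and get_history can be sketched as follows. `HistorySketch` is a hypothetical stand-in class, not the ddf implementation; in particular, whether the library overwrites or accumulates history entries is an assumption here (this sketch overwrites).

```python
from time import perf_counter


class HistorySketch:
    """Hypothetical minimal stand-in for the documented history methods."""

    def __init__(self, name="Add Duplicates"):
        self.name = name
        self.message = ""
        self.elapsed = 0

    def update_history(self, message="", time=0):
        # Store what get_history() will later report.
        self.message = message
        self.elapsed = time

    def get_history(self):
        # Same order as the documented return values: name, msg, time.
        return self.name, self.message, self.elapsed


stainer = HistorySketch()
start = perf_counter()
duplicated = [row for row in ["a", "b"] for _ in range(2)]  # stand-in "transform" work
stainer.update_history(message="Duplicated 2 rows", time=perf_counter() - start)
name, msg, elapsed = stainer.get_history()
```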
Methods
__init__(deg[, max_rep, name, row_idx]) – Constructor for RowDuplicateStainer.
get_col_type() – Returns the column type that the stainer operates on.
get_history() – Compiles history information for this stainer and returns it.
get_indices() – Returns the row indices and column indices.
transform(df, rng[, row_idx, col_idx]) – Applies staining on the given indices in the provided dataframe.
update_history([message, time]) – Used by the transform method to set the attributes required to display history information.