ddf.DirtyDF.DirtyDF
-
class ddf.DirtyDF.DirtyDF(df, seed=None, copy=False)
Dirty DataFrame. Stores information about the dataframe to be stained, previous staining results, and the mapping of the rows and columns.
To be used in conjunction with the Stainer class to add and execute stainers.
-
__init__(df, seed=None, copy=False)
Constructor for DirtyDF.
- Parameters
df (pd.DataFrame) – Dataframe to be transformed.
seed (int, optional) – Controls the randomness of the staining process. For deterministic behaviour, fix seed to an integer. If unspecified, a random seed is chosen.
copy (boolean, optional) – Not for use by the user. Indicates whether a copy of a DirtyDF is being created. If True, the details are copied from the previous DDF.
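The effect of fixing seed can be illustrated with NumPy alone. This is a minimal sketch, assuming the seed is used to build a PCG64-backed generator (as the get_rng entry suggests); make_rng is a hypothetical helper and not part of DirtyDF.

```python
import numpy as np


def make_rng(seed):
    """Hypothetical helper: build a PCG64-backed Generator from an integer seed."""
    return np.random.Generator(np.random.PCG64(seed))


# Two generators built from the same seed produce the same row shuffle.
first = make_rng(42).permutation(10)
second = make_rng(42).permutation(10)
same_seed_matches = bool((first == second).all())
```

With the same integer seed, any random choice drawn from the generator (such as a row permutation) should repeat exactly across runs.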
-
add_stainers(stain, use_orig_row=True, use_orig_col=True)
Adds a stainer / list of stainers to the current list of stainers to be executed.
- Parameters
stain (Stainer or Stainer list) – Stainers to be added to the DDF for later execution.
use_orig_row (boolean, optional) – Indicates whether row indices in the stainer refer to the initial dataframe or to the dataframe at the time of execution. If True, indices from the initial dataframe are used. Defaults to True.
use_orig_col (boolean, optional) – Indicates whether column indices in the stainer refer to the initial dataframe or to the dataframe at the time of execution. If True, indices from the initial dataframe are used. Defaults to True.
- Returns
ddf – Returns new copy of DDF with the stainer added
- Return type
DirtyDF
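The difference between original and execution-time indices can be sketched as a lookup through a mapping of the {int: int list} shape documented for get_mapping. This is an illustration only, not DirtyDF's implementation: to_current_indices is a hypothetical helper, and skipping indices that have no entry is an assumption.

```python
def to_current_indices(orig_indices, mapping):
    """Translate indices of the initial dataframe into current positions.

    `mapping` has the {int: [int, ...]} shape described for get_mapping.
    Assumption: original indices with no entry are simply skipped.
    """
    current = []
    for i in orig_indices:
        current.extend(mapping.get(i, []))
    return current


# Row 3 of the initial dataframe now sits at position 2;
# row 5 appears twice (positions 0 and 7); row 9 has no entry.
row_mapping = {3: [2], 5: [0, 7]}
targets = to_current_indices([3, 5, 9], row_mapping)
```

Here targets lists the positions in the current dataframe that correspond to the requested original rows, which is the kind of translation use_orig_row=True implies.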
-
get_df()
Returns the dataframe.
- Returns
df – Current dataframe in DDF
- Return type
pd.DataFrame
-
get_map_from_history(index, axis=0)
Mapping of rows/cols for the specified stainer transformation that has been executed. A dictionary is returned showing which index each row/col held right before the specified transformation maps to after it. For instance, if row 3 was shuffled to row 8 by the first stainer and row 8 was then shuffled to row 2 by the second, index=0 returns {3: [8]} and index=1 returns {8: [2]}.
- Parameters
index (int) – Index of stainer sequence to query mapping. E.g. index=1 will query the mapping performed by the 2nd stainer operation.
axis (0 or 1, optional) – If 0, returns the row mapping. If 1, returns the col mapping. Defaults to 0.
- Returns
map – Mapping of row/col indices right before the specified transformation to the row/col indices right after it.
- Return type
{int : int list} dictionary
- Raises
Exception – If axis provided is not 0/1
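The lookup described above can be sketched with plain dictionaries. This is a minimal illustration under stated assumptions: the history layout (a list of (row_map, col_map) pairs, one per executed stainer) and the function map_from_history are hypothetical, not DirtyDF's internals.

```python
def map_from_history(history, index, axis=0):
    """Return the row (axis=0) or column (axis=1) map of one recorded step.

    `history` is a hypothetical list of (row_map, col_map) pairs, one per
    executed stainer, in execution order.
    """
    if axis not in (0, 1):
        raise Exception("axis must be 0 (rows) or 1 (columns)")
    return history[index][axis]


# Row 3 moves to row 8 in the first step, then row 8 moves to row 2.
history = [({3: [8]}, {}), ({8: [2]}, {})]
step0 = map_from_history(history, 0)
step1 = map_from_history(history, 1)
```

Each entry describes a single step in isolation, so step1 is keyed by the positions produced by step0 rather than by the original row numbers.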
-
get_mapping(axis=0)
Mapping of rows/cols from the original dataframe to the most recent dataframe. A dictionary is returned showing at which index each original row/col appears in the newest dataframe. For instance, if row 3 was shuffled to row 8 and row 8 was then shuffled to row 2, the function returns {3: [2]}.
- Parameters
axis (0 or 1, optional) – If 0, returns the row mapping. If 1, returns the col mapping. Defaults to 0.
- Returns
map – Mapping of original row/col indices to current dataframe’s row/col indices.
- Return type
{int : int list} dictionary
- Raises
Exception – If axis provided is not 0/1
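The relationship between per-step maps and the overall mapping can be sketched as a composition of dictionaries. This is an illustration, not DirtyDF's implementation: compose_maps is a hypothetical helper, and treating a position that is missing from a step map as dropped is an assumption.

```python
def compose_maps(step_maps):
    """Chain per-step {int: [int, ...]} maps into one original-to-current map.

    Assumption: a position that is absent from a step map is treated as
    dropped, so it contributes no entries downstream.
    """
    if not step_maps:
        return {}
    combined = {k: list(v) for k, v in step_maps[0].items()}
    for step in step_maps[1:]:
        combined = {
            orig: [new for mid in mids for new in step.get(mid, [])]
            for orig, mids in combined.items()
        }
    return combined


# Row 3 -> row 8 in the first step, then row 8 -> row 2 in the second.
overall = compose_maps([{3: [8]}, {8: [2]}])
```

Following the example in the text, chaining {3: [8]} with {8: [2]} yields the original-to-current entry for row 3.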
-
get_previous_map(axis=0)
Mapping of rows/cols for the most recent stainer transformation that has been executed. A dictionary is returned showing which index each row/col held right before the transformation maps to after it. For instance, if row 3 was shuffled to row 8 and row 8 was then shuffled to row 2, the function returns {8: [2]}.
- Parameters
axis (0 or 1, optional) – If 0, returns the row mapping. If 1, returns the col mapping. Defaults to 0.
- Returns
map – Mapping of row/col indices right before the most recent transformation to the row/col indices right after it.
- Return type
{int : int list} dictionary
- Raises
Exception – If axis provided is not 0/1
-
get_rng()
Returns the random number generator.
- Returns
rng – PCG64 pseudo-random number generator used for randomisation
- Return type
np.random.BitGenerator
-
get_seed()
Returns the seed number.
- Returns
seed – Integer seed used to create Generator for randomisation
- Return type
int
-
print_history()
Prints historical details of the stainers that have been executed.
-
reindex_stainers(new_order)
Reorders the stainers in a specified order.
- Parameters
new_order (int list) – Indices of the new order of stainers. If original was [A, B, C] and new_order = [1, 2, 0], the resulting order will be [C, A, B].
- Returns
ddf – Returns new copy of DDF with the stainers rearranged
- Return type
DirtyDF
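The [A, B, C] example above is consistent with reading new_order[i] as the new position of the stainer currently at position i. A minimal sketch of that reading follows; reorder is a hypothetical helper, and this interpretation is inferred from the example rather than confirmed by the library.

```python
def reorder(stainers, new_order):
    """Place stainers[i] at position new_order[i] (hypothetical reading)."""
    if sorted(new_order) != list(range(len(stainers))):
        raise ValueError("new_order must be a permutation of the stainer indices")
    result = [None] * len(stainers)
    for old_pos, new_pos in enumerate(new_order):
        result[new_pos] = stainers[old_pos]
    return result


# A goes to position 1, B to position 2, C to position 0.
reordered = reorder(["A", "B", "C"], [1, 2, 0])
```

Under the other common convention (result[i] = stainers[new_order[i]]) the same input would give [B, C, A], so it is worth checking the result with summarise_stainers after reordering.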
-
reset_rng()
Resets the random generator object.
-
run_all_stainers()
Applies the transformations of all stainers in order.
- Returns
ddf – Returns new DDF after all the stainers have been executed
- Return type
DirtyDF
-
run_stainer(idx=0)
Applies the transformation of the specified stainer.
- Parameters
idx (int, optional) – Index of stainer to execute. Defaults to 0 (first stainer added)
- Returns
ddf – Returns new DDF after the specified stainer has been executed
- Return type
DirtyDF
-
shuffle_stainers()
Randomly reorders the stainers.
- Returns
ddf – Returns new copy of DDF with the stainers rearranged
- Return type
DirtyDF
-
summarise_stainers()
Prints the names of stainers that have yet to be executed.
Methods
__init__(df[, seed, copy]) – Constructor for DirtyDF
add_stainers(stain[, use_orig_row, use_orig_col]) – Adds a stainer / list of stainers to the current list of stainers to be executed.
copy() – Creates a copy of the DDF
get_df() – Returns the dataframe
get_map_from_history(index[, axis]) – Mapping of rows/cols for the specified stainer transformation that has been executed.
get_mapping([axis]) – Mapping of rows/cols from the original dataframe to the most recent dataframe.
get_previous_map([axis]) – Mapping of rows/cols for the most recent stainer transformation that has been executed.
get_rng() – Returns the random number generator
get_seed() – Returns the seed number
print_history() – Prints historical details of the stainers that have been executed
reindex_stainers(new_order) – Reorders the stainers in a specified order
reset_rng() – Resets the random generator object
run_all_stainers() – Applies the transformations of all stainers in order
run_stainer([idx]) – Applies the transformation of the specified stainer
shuffle_stainers() – Randomly reorders the stainers
summarise_stainers() – Prints the names of stainers that have yet to be executed
-