edgaro.balancing package

edgaro.balancing.transformer module

class edgaro.balancing.transformer.Transformer(name_sufix: str = '_transformed', verbose: bool = False)

Bases: BaseTransformer, ABC

The abstract class to define balancing transformations for a single Dataset.

Parameters:
  • name_sufix (str, default='_transformed') – Suffix to be set for a transformed Dataset.

  • verbose (bool, default=False) – Print messages during calculations.

Variables:
  • name_sufix (str) – Suffix to be set for a transformed Dataset.

  • verbose (bool) – Print messages during calculations.

fit(dataset: Dataset) None

Fit the transformer.

Parameters:

dataset (Dataset) – The object to fit Transformer on.

property was_fitted: bool

Whether the Transformer has been fitted.

Return type:

bool

transform(dataset: Dataset) Dataset | DatasetArray

Transform the object.

Parameters:

dataset (Dataset) – The object to be transformed.

Returns:

The transformed object.

Return type:

Dataset, DatasetArray

abstract set_params(**params) None

Set params for Transformer.

Parameters:

params (dict) – The parameters to be set.

abstract get_params() Dict | List

Get parameters of Transformer.

Returns:

The parameters.

Return type:

Dict, list

set_dataset_suffixes(name_sufix: str | list) None

Set the suffix for a transformed Dataset.

Parameters:

name_sufix (str, list) – Suffix to be set for a transformed Dataset.
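
The interface above can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `SketchTransformer` and `KeepEverything` are stand-ins written for this example, a Dataset is modelled as a plain dict, and two behaviours are assumptions rather than documented facts (that `transform` refuses to run before `fit`, and that `name_sufix` is appended to the dataset name).

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class SketchTransformer(ABC):
    """Hypothetical stand-in for the documented interface; not edgaro code."""

    def __init__(self, name_sufix: str = "_transformed", verbose: bool = False) -> None:
        self.name_sufix = name_sufix
        self.verbose = verbose
        self._fitted = False

    @property
    def was_fitted(self) -> bool:
        return self._fitted

    def fit(self, dataset: Dict[str, Any]) -> None:
        self._fit(dataset)
        self._fitted = True

    def transform(self, dataset: Dict[str, Any]) -> Dict[str, Any]:
        # Assumption: transforming before fitting is treated as an error.
        if not self._fitted:
            raise RuntimeError("call fit() before transform()")
        result = self._transform(dataset)
        # Assumption: the suffix is appended to the dataset name.
        result["name"] = dataset["name"] + self.name_sufix
        return result

    def set_dataset_suffixes(self, name_sufix: str) -> None:
        self.name_sufix = name_sufix

    @abstractmethod
    def _fit(self, dataset: Dict[str, Any]) -> None: ...

    @abstractmethod
    def _transform(self, dataset: Dict[str, Any]) -> Dict[str, Any]: ...

    @abstractmethod
    def set_params(self, **params: Any) -> None: ...

    @abstractmethod
    def get_params(self) -> Dict[str, Any]: ...


class KeepEverything(SketchTransformer):
    """Trivial concrete subclass: returns a copy of the rows unchanged."""

    def __init__(self, name_sufix: str = "_transformed", verbose: bool = False) -> None:
        super().__init__(name_sufix=name_sufix, verbose=verbose)
        self._params: Dict[str, Any] = {}

    def _fit(self, dataset: Dict[str, Any]) -> None:
        pass  # nothing to learn in this toy example

    def _transform(self, dataset: Dict[str, Any]) -> Dict[str, Any]:
        return {"rows": list(dataset["rows"])}

    def set_params(self, **params: Any) -> None:
        self._params.update(params)

    def get_params(self) -> Dict[str, Any]:
        return dict(self._params)


t = KeepEverything()
data = {"name": "toy", "rows": [1, 2, 3]}
t.fit(data)
out = t.transform(data)
print(t.was_fitted, out["name"])
```

Only `set_params` and `get_params` are marked abstract in the documented class; the private `_fit` and `_transform` hooks are a design choice of this sketch to keep the fitted-state bookkeeping in one place.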

class edgaro.balancing.transformer.ImblearnProtocol(*args, **kwargs)

Bases: Protocol

A Protocol that defines the expected structure of a Transformer from the imblearn library.

fit(X, y) Any

fit_resample(X, y) Tuple[DataFrame, Series]

get_params() Dict

set_params(**params) Any

class edgaro.balancing.transformer.TransformerFromIMBLEARN(transformer: ImblearnProtocol, name_sufix: str = '_transformed', verbose: bool = False)

Bases: Transformer

Create a balancing Transformer from a transformer implemented in the imblearn library.

Parameters:
  • transformer (ImblearnProtocol) – Transformer from the imblearn library.

  • name_sufix (str, default='_transformed') – Suffix to be set for a transformed Dataset.

  • verbose (bool, default=False) – Print messages during calculations.
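
Because `transformer` is typed as a Protocol, any object that exposes the four listed methods should qualify structurally, with no inheritance required. The sketch below illustrates the idea with standard-library typing only. `SamplerLike` and `FirstNSampler` are hypothetical names invented for this example, and marking the protocol `runtime_checkable` is an assumption made so that `isinstance` can be demonstrated; the source does not say whether `ImblearnProtocol` supports runtime checks.

```python
from typing import Any, Dict, Protocol, Tuple, runtime_checkable


@runtime_checkable
class SamplerLike(Protocol):
    """Hypothetical mirror of the four methods listed for ImblearnProtocol."""

    def fit(self, X: Any, y: Any) -> Any: ...

    def fit_resample(self, X: Any, y: Any) -> Tuple[Any, Any]: ...

    def get_params(self) -> Dict[str, Any]: ...

    def set_params(self, **params: Any) -> Any: ...


class FirstNSampler:
    """Toy sampler that keeps the first `n` rows. Not an imblearn class."""

    def __init__(self, n: int = 2) -> None:
        self.n = n

    def fit(self, X, y):
        return self

    def fit_resample(self, X, y):
        return X[: self.n], y[: self.n]

    def get_params(self):
        return {"n": self.n}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self


sampler = FirstNSampler()
# isinstance() on a runtime-checkable Protocol only checks that the
# methods exist; it does not validate their signatures.
print(isinstance(sampler, SamplerLike))
print(isinstance(object(), SamplerLike))
```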

Examples

>>> from test.resources.objects import *
>>> from imblearn.under_sampling import RandomUnderSampler
>>> from edgaro.data.dataset import Dataset
>>> from edgaro.balancing.transformer import TransformerFromIMBLEARN
>>> dataset = Dataset(name_1, df_1, target_1)
>>> transformer = TransformerFromIMBLEARN(RandomUnderSampler(sampling_strategy=1, random_state=42))
>>> transformer.fit(dataset)
>>> transformer.transform(dataset)

transform(dataset: Dataset) Dataset

Transform the object.

Parameters:

dataset (Dataset) – The object to be transformed.

Returns:

The transformed object.

Return type:

Dataset

get_imblearn_transformer() ImblearnProtocol

Get the base transformer object from the imblearn library.

Return type:

ImblearnProtocol

set_params(**params) None

Set params for Transformer.

The function accepts the imbalance_ratio and IR parameters, which are translated to the sampling_strategy parameter of the imblearn transformer.

Parameters:

params (dict) – The parameters to be set.
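
The translation described above might look like the following sketch. It rests on a labelled assumption: the Imbalance Ratio is taken to be n_majority / n_minority, so the float `sampling_strategy` that imblearn expects for binary targets (n_minority / n_majority after resampling) is its reciprocal. `translate_ir` is a hypothetical helper, not a function from the library.

```python
from typing import Any, Dict


def translate_ir(params: Dict[str, Any]) -> Dict[str, Any]:
    """Map `IR` / `imbalance_ratio` to a `sampling_strategy` entry.

    Assumption: IR = n_majority / n_minority (so IR >= 1), which makes
    the float sampling_strategy equal to 1 / IR.
    """
    out = dict(params)  # leave the caller's dict untouched
    for key in ("imbalance_ratio", "IR"):
        if key in out:
            ir = out.pop(key)
            if ir < 1:
                raise ValueError("Imbalance Ratio is assumed to be >= 1")
            out["sampling_strategy"] = 1 / ir
    return out


print(translate_ir({"IR": 2}))
print(translate_ir({"imbalance_ratio": 1, "random_state": 42}))
```

Other keys pass through unchanged, so the result can be forwarded to the wrapped transformer's own `set_params`.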

get_params() Dict

Get parameters of Transformer.

Returns:

The parameters.

Return type:

Dict

set_dataset_suffixes(name_sufix: str) None

Set the suffix for a transformed Dataset.

Parameters:

name_sufix (str) – Suffix to be set for a transformed Dataset.

class edgaro.balancing.transformer.RandomUnderSampler(imbalance_ratio: float = 1, name_sufix: str = '_transformed', random_state: int | None = None, verbose: bool = False, *args, **kwargs)

Bases: TransformerFromIMBLEARN

Create Random Under Sampling transformer.

Parameters:
  • imbalance_ratio (float, default=1) – Imbalance Ratio after transformations.

  • name_sufix (str, default='_transformed') – Suffix to be set for a transformed Dataset.

  • verbose (bool, default=False) – Print messages during calculations.

  • random_state (int, optional) – Random state seed.

  • *args (tuple, optional) – Additional positional parameters for the Random Under Sampling transformer from imblearn.

  • **kwargs (dict, optional) – Additional keyword parameters for the Random Under Sampling transformer from imblearn.
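
To make the effect of `imbalance_ratio` and `random_state` concrete, here is a plain-Python sketch of random undersampling toward a target ratio. It is not the imblearn implementation. `random_under_sample` is a hypothetical helper, it handles binary targets only, and it assumes the Imbalance Ratio means n_majority / n_minority.

```python
import random
from collections import Counter
from typing import Any, List, Optional, Sequence, Tuple


def random_under_sample(
    rows: Sequence[Any],
    labels: Sequence[Any],
    imbalance_ratio: float = 1.0,
    random_state: Optional[int] = None,
) -> Tuple[List[Any], List[Any]]:
    """Drop random majority rows until n_majority / n_minority ~= imbalance_ratio."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("this sketch handles binary targets only")
    (majority, n_major), (_, n_minor) = counts.most_common(2)
    # Never keep more majority rows than actually exist.
    keep_major = min(n_major, round(imbalance_ratio * n_minor))
    rng = random.Random(random_state)  # seeded for reproducibility
    major_idx = [i for i, label in enumerate(labels) if label == majority]
    kept = set(rng.sample(major_idx, keep_major))
    # Preserve the original row order; minority rows are always kept.
    idx = [i for i, label in enumerate(labels) if label != majority or i in kept]
    return [rows[i] for i in idx], [labels[i] for i in idx]


rows = list(range(10))
labels = [0] * 8 + [1] * 2
X_res, y_res = random_under_sample(rows, labels, imbalance_ratio=2.0, random_state=42)
print(Counter(y_res))
```

With eight majority and two minority rows, a target ratio of 2 keeps four majority rows; the same seed yields the same selection on every run.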

set_params(**params) None

Set params for Transformer.

The function accepts the imbalance_ratio and IR parameters, which are translated to the sampling_strategy parameter of the imblearn transformer.

Parameters:

params (dict) – The parameters to be set.

class edgaro.balancing.transformer.RandomOverSampler(imbalance_ratio: float = 1, name_sufix: str = '_transformed', random_state: int | None = None, verbose: bool = False, *args, **kwargs)

Bases: TransformerFromIMBLEARN

Create Random Over Sampling transformer.

Parameters:
  • imbalance_ratio (float, default=1) – Imbalance Ratio after transformations.

  • name_sufix (str, default='_transformed') – Suffix to be set for a transformed Dataset.

  • verbose (bool, default=False) – Print messages during calculations.

  • random_state (int, optional) – Random state seed.

  • *args (tuple, optional) – Additional positional parameters for the Random Over Sampling transformer from imblearn.

  • **kwargs (dict, optional) – Additional keyword parameters for the Random Over Sampling transformer from imblearn.

set_params(**params) None

Set params for Transformer.

The function accepts the imbalance_ratio and IR parameters, which are translated to the sampling_strategy parameter of the imblearn transformer.

Parameters:

params (dict) – The parameters to be set.

class edgaro.balancing.transformer.SMOTE(imbalance_ratio: float = 1, name_sufix: str = '_transformed', random_state: int | None = None, columns_categorical: List[str] | None = None, verbose: bool = False, *args, **kwargs)

Bases: TransformerFromIMBLEARN

Create SMOTE/SMOTENC/SMOTEN transformer.

The method also works with categorical variables. It assumes that columns of type 'category', 'object', or 'int' are always categorical, so integer-coded numeric columns are treated as categorical too. Keep that in mind before using this Transformer!

Parameters:
  • imbalance_ratio (float, default=1) – Imbalance Ratio after transformations.

  • name_sufix (str, default='_transformed') – Suffix to be set for a transformed Dataset.

  • verbose (bool, default=False) – Print messages during calculations.

  • random_state (int, optional) – Random state seed.

  • *args (tuple, optional) – Additional positional parameters for the SMOTE/SMOTENC/SMOTEN transformer from imblearn.

  • **kwargs (dict, optional) – Additional keyword parameters for the SMOTE/SMOTENC/SMOTEN transformer from imblearn.
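
The dtype-based guess described for this class can be sketched as follows. The selection rule is an assumption about how the three imblearn variants might be chosen (SMOTE for purely numerical data, SMOTEN for purely categorical data, SMOTENC for a mix), and both helpers are hypothetical. Column dtypes are given as plain strings so the example needs no third-party packages.

```python
import re
from typing import Dict


def is_categorical(dtype_name: str) -> bool:
    """Apply the documented guess: 'category', 'object' and 'int' dtypes are categorical."""
    return dtype_name in ("category", "object") or re.fullmatch(r"int\d*", dtype_name) is not None


def pick_smote_variant(dtypes: Dict[str, str]) -> str:
    """Assumed rule: no categorical columns -> SMOTE, all -> SMOTEN, mixed -> SMOTENC."""
    categorical = [col for col, dtype in dtypes.items() if is_categorical(dtype)]
    if not categorical:
        return "SMOTE"
    if len(categorical) == len(dtypes):
        return "SMOTEN"
    return "SMOTENC"


print(pick_smote_variant({"height": "float64", "weight": "float64"}))
print(pick_smote_variant({"height": "float64", "age": "int64"}))
print(pick_smote_variant({"colour": "object", "size": "category"}))
```

Note how the integer `age` column pushes the second call to the mixed variant. Under the documented guess, a genuinely numeric integer column would presumably need to be cast to a float dtype beforehand to avoid being treated as categorical.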

set_params(**params) None

Set params for Transformer.

The function accepts the imbalance_ratio and IR parameters, which are translated to the sampling_strategy parameter of the imblearn transformer.

Parameters:

params (dict) – The parameters to be set.

edgaro.balancing.transformer_array module

class edgaro.balancing.transformer_array.TransformerArray(base_transformer: Transformer, parameters: List[List | Dict[str, Any]] | None = None, keep_original_dataset: bool = False, dataset_suffixes: str | List[str] = '_transformed', result_array_sufix: str = '_transformed_array', allow_dataset_array_sufix_change: bool = True, verbose: bool = False, set_suffixes: bool = True)

Bases: BaseTransformerArray

Apply a Transformer with more than one set of parameters and/or to each of the Dataset objects in a DatasetArray.

Note: If you use NestedAutomaticTransformer (or a child class) as a parameter to TransformerArray, it is advisable to pass set_suffixes=False to the TransformerArray object. Otherwise, the suffixes will be distorted.

Parameters:
  • base_transformer (Transformer) – The object defining the transformation procedure.

  • parameters (list[list | Dict[str, Any]], optional) – The list of parameters for base_transformer. If the object is used for a DatasetArray object, the parameter list should be nested. For details, see the Examples section.

  • keep_original_dataset (bool, default=False) – Keep the original Dataset after transformations or not.

  • dataset_suffixes (list, str, default='_transformed') – Suffixes to be set for the transformed objects.

  • result_array_sufix (str, default='_transformed_array') – Suffix of the main transformed DatasetArray object.

  • allow_dataset_array_sufix_change (bool, default=True) – Allow the passed value of result_array_sufix to be changed according to dataset_suffixes.

  • verbose (bool, default=False) – Print messages during calculations.

  • set_suffixes (bool, default=True) – Whether suffixes for sub-Transformers should be set.
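
One way to picture how a single base transformer fans out over the `parameters` list is the sketch below. It is an illustration under assumptions, not the library's code: `ToySampler` and `expand` are hypothetical, and copying the base object with `copy.deepcopy` before setting each parameter set is a guess at a reasonable design.

```python
import copy
from typing import Any, Dict, List, Union


class ToySampler:
    """Hypothetical transformer that only records the parameters it was given."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}

    def set_params(self, **params: Any) -> None:
        self.params.update(params)


def expand(base: ToySampler, parameters: List[Union[Dict[str, Any], list]]) -> list:
    """Return one configured copy of `base` per dict, preserving any nesting."""
    result: list = []
    for spec in parameters:
        if isinstance(spec, dict):
            clone = copy.deepcopy(base)  # keep the base object untouched
            clone.set_params(**spec)
            result.append(clone)
        else:
            # A nested list corresponds to one Dataset inside a DatasetArray.
            result.append(expand(base, spec))
    return result


base = ToySampler()
flat = expand(base, [{"sampling_strategy": 0.98}, {"sampling_strategy": 1}])
nested = expand(base, [[{"sampling_strategy": 0.98}, {"sampling_strategy": 1}] for _ in range(2)])
print(len(flat), len(nested), len(nested[0]))
```

The flat list matches the single-Dataset case and the nested list matches the DatasetArray case, with one inner list per Dataset.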

Examples

Example 1

>>> from test.resources.objects import *
>>> from edgaro.data.dataset import Dataset
>>> from edgaro.data.dataset_array import DatasetArray
>>> from edgaro.balancing.transformer import RandomUnderSampler
>>> from edgaro.balancing.transformer_array import TransformerArray
>>> df = Dataset(name_1, df_1, target_1)
>>> params = [{'sampling_strategy': 0.98}, {'sampling_strategy': 1}]
>>> transformer = RandomUnderSampler()
>>> array = TransformerArray(transformer, parameters=params)
>>> array.fit(df)
>>> array.transform(df)

Example 2

>>> from test.resources.objects import *
>>> from edgaro.data.dataset import Dataset
>>> from edgaro.data.dataset_array import DatasetArray
>>> from edgaro.balancing.transformer import RandomUnderSampler
>>> from edgaro.balancing.transformer_array import TransformerArray
>>> df = DatasetArray([Dataset(name_2, df_1, target_1), Dataset(name_1, df_1, target_1)])
>>> params = [ [{'sampling_strategy': 0.98}, {'sampling_strategy': 1}] for _ in range(len(df)) ]
>>> transformer = RandomUnderSampler()
>>> array = TransformerArray(transformer, parameters=params)
>>> array.fit(df)
>>> array.transform(df)

set_dataset_suffixes(name_sufix: str | List[str]) None

Set the suffixes for a transformed Dataset.

Parameters:

name_sufix (str, list) – Suffixes to be set for a transformed Dataset.

get_dataset_suffixes() str | List[str] | None

Get suffixes for transformed Dataset.

Returns:

Suffixes for a transformed Dataset.

Return type:

str, list, None

set_params(**params) None

Set params for Transformer.

Parameters:

params (dict) – The parameters to be set.

fit(dataset: Dataset | DatasetArray) None

Fit the transformer.

Parameters:

dataset (Dataset, DatasetArray) – The object to fit Transformer on.

transform(dataset: Dataset | DatasetArray) Dataset | DatasetArray

Transform the object.

Parameters:

dataset (Dataset, DatasetArray) – The object to be transformed.

Returns:

The transformed object.

Return type:

Dataset, DatasetArray

property transformers: List[Transformer | TransformerArray | List]

All the Transformer objects used by this object.

Return type:

list[Transformer, TransformerArray, list]

property base_transformer: Transformer

Base transformer used to create this object.

Return type:

Transformer

edgaro.balancing.nested_transformer module

class edgaro.balancing.nested_transformer.NestedAutomaticTransformer(base_transformers: List[Transformer], base_transformers_names: List[str], keep_original_dataset: bool = False, result_array_sufix: str = '_automatic_transformed_array', n_per_method: int = 5, random_state: int | None = None, IR_round_precision: int = 2, min_samples_to_modify: int | None = None, verbose: bool = False)

Bases: Transformer

Create NestedAutomaticTransformer.

This class behaves like a Transformer, but it transforms a single Dataset using several methods rather than one. You do not specify Imbalance Ratio values; they are calculated automatically. You pass only the n_per_method argument, and the class balances the Dataset to that number of Imbalance Ratio values.

Note: If you use NestedAutomaticTransformer (or a child class) as a parameter to TransformerArray, it is advisable to pass set_suffixes=False to the TransformerArray object. Otherwise, the suffixes will be distorted.

Parameters:
  • base_transformers (list(Transformer)) – List of base transformers.

  • base_transformers_names (list(str)) – List of names of base_transformers.

  • keep_original_dataset (bool, default=False) – Keep the original Dataset after transformations or not.

  • result_array_sufix (str, default='_automatic_transformed_array') – Suffix of the transformed DatasetArray.

  • n_per_method (int, default=5) – Number of intermediate Imbalance Ratio values.

  • random_state (int, optional, default=None) – Random state seed.

  • IR_round_precision (int, default=2) – Round precision of Imbalance Ratio when printing.

  • min_samples_to_modify (int, optional, default=None) – Minimal number of samples to modify to create an intermediate Imbalance Ratio. If the number of modified observations is less than this number, the n_per_method parameter will be modified.

  • verbose (bool, default=False) – Print messages during calculations.
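
How `n_per_method`, `min_samples_to_modify` and `IR_round_precision` might interact is sketched below. The source does not describe the actual calculation, so the whole scheme is an assumption: the Imbalance Ratio is taken as n_majority / n_minority, targets are spaced evenly from the original ratio down to 1 (the undersampling view, where each step removes the same number of majority rows), and the number of steps shrinks when a step would modify fewer rows than `min_samples_to_modify`. `intermediate_irs` is a hypothetical helper.

```python
from typing import List, Optional


def intermediate_irs(
    n_majority: int,
    n_minority: int,
    n_per_method: int = 5,
    min_samples_to_modify: Optional[int] = None,
    ir_round_precision: int = 2,
) -> List[float]:
    """Evenly spaced target Imbalance Ratios, ending at 1 (fully balanced)."""
    if n_minority <= 0 or n_majority < n_minority or n_per_method < 1:
        raise ValueError("expected n_majority >= n_minority > 0 and n_per_method >= 1")
    to_modify = n_majority - n_minority  # rows to change to reach IR = 1
    steps = n_per_method
    if min_samples_to_modify is not None and to_modify / steps < min_samples_to_modify:
        # Too few rows per step: use fewer, larger steps (at least one).
        steps = max(1, to_modify // min_samples_to_modify)
    start = n_majority / n_minority
    return [
        round(start - k * (start - 1) / steps, ir_round_precision)
        for k in range(1, steps + 1)
    ]


print(intermediate_irs(100, 20, n_per_method=4))
print(intermediate_irs(100, 20, n_per_method=4, min_samples_to_modify=30))
```

In the first call each step changes 20 rows and the targets are 4.0, 3.0, 2.0 and 1.0. In the second call 20 rows per step is below the threshold of 30, so only two steps remain (3.0 and 1.0).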

Variables:
  • keep_original_dataset (bool, default=False) – Keep the original Dataset after transformations or not.

  • n_per_method (int, default=5) – Number of intermediate Imbalance Ratio values.

  • random_state (int, optional, default=None) – Random state seed.

  • result_array_sufix (str, default='_automatic_transformed_array') – Suffix of the transformed DatasetArray.

  • IR_round_precision (int, default=2) – Round precision of Imbalance Ratio when printing.

  • min_samples_to_modify (int, optional, default=None) – Minimal number of samples to modify to create an intermediate Imbalance Ratio. If the number of modified observations is less than this number, the n_per_method parameter will be modified.

  • verbose (bool, default=False) – Print messages during calculations.

fit(dataset: Dataset) None

Fit the transformer.

Parameters:

dataset (Dataset) – The object to fit Transformer on.

transform(dataset: Dataset) DatasetArray

Transform the object.

Parameters:

dataset (Dataset) – The object to be transformed.

Returns:

The transformed object.

Return type:

DatasetArray

set_dataset_suffixes(name_sufix: str | List[str | List[str]]) None

Set the suffixes for a transformed Dataset.

Parameters:

name_sufix (str, list) – Suffixes to be set for a transformed Dataset.

get_dataset_suffixes() str | List[str] | None

Get suffixes for transformed Dataset.

Returns:

Suffixes for a transformed Dataset.

Return type:

str, list, None

set_params(**params) None

Set params for Transformer.

Parameters:

params (dict) – The parameters to be set.

get_params() List[Dict | List]

Get parameters of Transformer.

Returns:

The parameters.

Return type:

list[Dict, list]

property was_fitted: bool

Whether the Transformer has been fitted.

Return type:

bool

property transformers: List[Transformer | TransformerArray | List]

All the Transformer objects used by this object.

Return type:

list[Transformer, TransformerArray, list]

property base_transformer: Transformer | TransformerArray | List

Base transformers used to create this object.

Return type:

list[Transformer, TransformerArray, list]

class edgaro.balancing.nested_transformer.BasicAutomaticTransformer(keep_original_dataset: bool = False, result_array_sufix: str = '_automatic_transformed_array', n_per_method: int = 5, random_state: int | None = None, IR_round_precision: int = 2, min_samples_to_modify: int | None = None, verbose: bool = False)

Bases: NestedAutomaticTransformer

Create BasicAutomaticTransformer.

This object combines the three most popular methods: Random Under Sampler, Random Over Sampler, and SMOTE.

This class can be used for both continuous (numerical) and categorical data.

Parameters:
  • keep_original_dataset (bool, default=False) – Keep the original Dataset after transformations or not.

  • result_array_sufix (str, default='_automatic_transformed_array') – Suffix of the transformed DatasetArray.

  • n_per_method (int, default=5) – Number of intermediate Imbalance Ratio values.

  • random_state (int, optional, default=None) – Random state seed.

  • IR_round_precision (int, default=2) – Round precision of Imbalance Ratio when printing.

  • min_samples_to_modify (int, optional, default=None) – Minimal number of samples to modify to create an intermediate Imbalance Ratio. If the number of modified observations is less than this number, the n_per_method parameter will be modified.

  • verbose (bool, default=False) – Print messages during calculations.

class edgaro.balancing.nested_transformer.ExtensionAutomaticTransformer(keep_original_dataset: bool = False, result_array_sufix: str = '_automatic_transformed_array', n_per_method: int = 5, random_state: int | None = None, IR_round_precision: int = 2, min_samples_to_modify: int | None = None, verbose: bool = False)

Bases: NestedAutomaticTransformer

Create ExtensionAutomaticTransformer.

This object combines three more complex methods implemented in imblearn: one for oversampling (BorderlineSMOTE), one for undersampling (NearMiss), and one hybrid method (SMOTETomek).

This class can be used only for continuous (numerical) data.

Parameters:
  • keep_original_dataset (bool, default=False) – Keep the original Dataset after transformations or not.

  • result_array_sufix (str, default='_automatic_transformed_array') – Suffix of the transformed DatasetArray.

  • n_per_method (int, default=5) – Number of intermediate Imbalance Ratio values.

  • random_state (int, optional, default=None) – Random state seed.

  • IR_round_precision (int, default=2) – Round precision of Imbalance Ratio when printing.

  • min_samples_to_modify (int, optional, default=None) – Minimal number of samples to modify to create an intermediate Imbalance Ratio. If the number of modified observations is less than this number, the n_per_method parameter will be modified.

  • verbose (bool, default=False) – Print messages during calculations.

class edgaro.balancing.nested_transformer.AutomaticTransformer(keep_original_dataset: bool = False, result_array_sufix: str = '_automatic_transformed_array', n_per_method: int = 5, random_state: int | None = None, IR_round_precision: int = 2, min_samples_to_modify: int | None = None, verbose: bool = False)

Bases: NestedAutomaticTransformer

Create AutomaticTransformer.

This object contains six methods implemented in imblearn. They cover oversampling (RandomOverSampling, BorderlineSMOTE), undersampling (RandomUnderSampling, NearMiss), and a hybrid approach (SMOTETomek).

This class can be used only for continuous (numerical) data.

Parameters:
  • keep_original_dataset (bool, default=False) – Keep the original Dataset after transformations or not.

  • result_array_sufix (str, default='_automatic_transformed_array') – Suffix of the transformed DatasetArray.

  • n_per_method (int, default=5) – Number of intermediate Imbalance Ratio values.

  • random_state (int, optional, default=None) – Random state seed.

  • IR_round_precision (int, default=2) – Round precision of Imbalance Ratio when printing.

  • min_samples_to_modify (int, optional, default=None) – Minimal number of samples to modify to create an intermediate Imbalance Ratio. If the number of modified observations is less than this number, the n_per_method parameter will be modified.

  • verbose (bool, default=False) – Print messages during calculations.