edgaro.balancing package
edgaro.balancing.transformer module
- class edgaro.balancing.transformer.Transformer(name_sufix: str = '_transformed', verbose: bool = False)
Bases:
BaseTransformer, ABC
The abstract class to define balancing transformations for a single Dataset.
- Parameters:
name_sufix (str, default='_transformed') – Suffix appended to the name of a transformed Dataset.
verbose (bool, default=False) – Print messages during calculations.
- Variables:
name_sufix (str) – Suffix appended to the name of a transformed Dataset.
verbose (bool) – Print messages during calculations.
- fit(dataset: Dataset) None
Fit the transformer.
- Parameters:
dataset (Dataset) – The object to fit Transformer on.
- property was_fitted: bool
Whether the Transformer has been fitted.
- Return type:
bool
- transform(dataset: Dataset) Dataset | DatasetArray
Transform the object.
- Parameters:
dataset (Dataset) – The object to be transformed.
- Returns:
The transformed object.
- Return type:
Dataset | DatasetArray
- abstract set_params(**params) None
Set params for Transformer.
- Parameters:
params (dict) – The parameters to be set.
- abstract get_params() Dict | List
Get parameters of Transformer.
- Returns:
The parameters.
- Return type:
Dict, list
- set_dataset_suffixes(name_sufix: str | list) None
Set the suffix appended to the name of a transformed Dataset.
- Parameters:
name_sufix (str, list) – Suffix appended to the name of a transformed Dataset.
- class edgaro.balancing.transformer.ImblearnProtocol(*args, **kwargs)
Bases:
Protocol
A Protocol that defines the expected structure of a transformer from the imblearn library.
- fit(X, y) Any
- fit_resample(X, y) Tuple[DataFrame, Series]
- get_params() Dict
- set_params(**params) Any
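The four methods above describe a structural interface: any object that provides them can be wrapped, whether or not it inherits from a common base class. The following is a minimal sketch of that idea using only the standard library. `ResamplerLike`, `IdentitySampler` and `NotASampler` are hypothetical names for illustration, not part of edgaro or imblearn, and the return types are loosened to `Any` to avoid a pandas dependency.

```python
from typing import Any, Dict, Protocol, Tuple, runtime_checkable


@runtime_checkable
class ResamplerLike(Protocol):
    """Hypothetical stand-in mirroring the four methods listed above."""

    def fit(self, X: Any, y: Any) -> Any: ...
    def fit_resample(self, X: Any, y: Any) -> Tuple[Any, Any]: ...
    def get_params(self) -> Dict: ...
    def set_params(self, **params: Any) -> Any: ...


class IdentitySampler:
    """Toy sampler that returns the data unchanged; it does not inherit from the protocol."""

    def __init__(self) -> None:
        self._params: Dict[str, Any] = {}

    def fit(self, X, y):
        return self

    def fit_resample(self, X, y):
        return X, y

    def get_params(self):
        return dict(self._params)

    def set_params(self, **params):
        self._params.update(params)
        return self


class NotASampler:
    """Provides only fit(), so it does not satisfy the protocol."""

    def fit(self, X, y):
        return self


# isinstance() only checks that the method names exist, not their signatures.
conforms = isinstance(IdentitySampler(), ResamplerLike)
rejected = isinstance(NotASampler(), ResamplerLike)
```

A static type checker would additionally compare the signatures, which the runtime check does not do.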
- class edgaro.balancing.transformer.TransformerFromIMBLEARN(transformer: ImblearnProtocol, name_sufix: str = '_transformed', verbose: bool = False)
Bases:
Transformer
Create a balancing Transformer from a transformer implemented in the imblearn library.
- Parameters:
transformer (ImblearnProtocol) – Transformer from the imblearn library.
name_sufix (str, default='_transformed') – Suffix appended to the name of a transformed Dataset.
verbose (bool, default=False) – Print messages during calculations.
Examples
>>> from test.resources.objects import *
>>> from imblearn.under_sampling import RandomUnderSampler
>>> dataset = Dataset(name_1, df_1, target_1)
>>> transformator = TransformerFromIMBLEARN(RandomUnderSampler(sampling_strategy=1, random_state=42))
>>> transformator.fit(dataset)
>>> transformator.transform(dataset)
- transform(dataset: Dataset) Dataset
Transform the object.
- Parameters:
dataset (Dataset) – The object to be transformed.
- Returns:
The transformed object.
- Return type:
Dataset
- get_imblearn_transformer() ImblearnProtocol
Get the base transformer object from the imblearn library.
- Return type:
ImblearnProtocol
- set_params(**params) None
Set params for Transformer.
The method also accepts the imbalance_ratio and IR parameters, which are converted to the sampling_strategy parameter of the imblearn transformer.
- Parameters:
params (dict) – The parameters to be set.
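As an illustration of the parameter translation described above, here is a minimal sketch. It assumes that the Imbalance Ratio is the majority class size divided by the minority class size, and that a float sampling_strategy in imblearn is the inverse ratio (minority over majority after resampling). The function `translate_params` is hypothetical and may not match edgaro's actual implementation.

```python
from typing import Any, Dict


def translate_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Map 'imbalance_ratio' / 'IR' onto 'sampling_strategy' (assumed rule: 1 / IR)."""
    translated = dict(params)  # leave the caller's dict untouched
    for alias in ("imbalance_ratio", "IR"):
        if alias in translated:
            ratio = translated.pop(alias)
            if ratio < 1:
                raise ValueError("Imbalance Ratio is assumed to be >= 1")
            translated["sampling_strategy"] = 1 / ratio
    return translated


halved = translate_params({"IR": 2})
balanced = translate_params({"imbalance_ratio": 1, "random_state": 42})
```

Under these assumptions, `IR=2` corresponds to `sampling_strategy=0.5`, and any other keys pass through unchanged.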
- get_params() Dict
Get parameters of Transformer.
- Returns:
The parameters.
- Return type:
Dict
- set_dataset_suffixes(name_sufix: str) None
Set the suffix appended to the name of a transformed Dataset.
- Parameters:
name_sufix (str) – Suffix appended to the name of a transformed Dataset.
- class edgaro.balancing.transformer.RandomUnderSampler(imbalance_ratio: float = 1, name_sufix: str = '_transformed', random_state: int | None = None, verbose: bool = False, *args, **kwargs)
Bases:
TransformerFromIMBLEARN
Create a Random Under Sampling transformer.
- Parameters:
imbalance_ratio (float, default=1) – Imbalance Ratio after transformations.
name_sufix (str, default='_transformed') – Suffix appended to the name of a transformed Dataset.
verbose (bool, default=False) – Print messages during calculations.
random_state (int, optional) – Random state seed.
*args (tuple, optional) – Additional positional arguments for the Random Under Sampling transformer from imblearn.
**kwargs (dict, optional) – Additional keyword arguments for the Random Under Sampling transformer from imblearn.
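To make the effect of imbalance_ratio and random_state concrete, the sketch below performs random under-sampling on a plain list of binary labels using only the standard library. It is not how edgaro or imblearn implement it; `random_under_sample` is a hypothetical helper, and it assumes that the Imbalance Ratio is the majority count divided by the minority count.

```python
import random
from collections import Counter
from typing import List, Optional


def random_under_sample(labels: List[int], imbalance_ratio: float = 1,
                        random_state: Optional[int] = None) -> List[int]:
    """Return the indices kept after dropping random majority-class rows.

    Assumes binary labels and IR = majority count / minority count.
    """
    counts = Counter(labels)
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    # Never keep more majority rows than exist in the data.
    target_major = min(n_major, int(n_minor * imbalance_ratio))
    rng = random.Random(random_state)
    major_idx = [i for i, y in enumerate(labels) if y == majority]
    minor_idx = [i for i, y in enumerate(labels) if y == minority]
    # Sample without replacement; every minority row is kept.
    kept = rng.sample(major_idx, target_major) + minor_idx
    return sorted(kept)


labels = [0] * 90 + [1] * 10
kept = random_under_sample(labels, imbalance_ratio=2, random_state=42)
kept_counts = Counter(labels[i] for i in kept)  # 20 majority rows, 10 minority rows
```

Fixing random_state makes the selection repeatable, which matters when results for different Imbalance Ratio values are compared.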
- set_params(**params) None
Set params for Transformer.
The method also accepts the imbalance_ratio and IR parameters, which are converted to the sampling_strategy parameter of the imblearn transformer.
- Parameters:
params (dict) – The parameters to be set.
- class edgaro.balancing.transformer.RandomOverSampler(imbalance_ratio: float = 1, name_sufix: str = '_transformed', random_state: int | None = None, verbose: bool = False, *args, **kwargs)
Bases:
TransformerFromIMBLEARN
Create a Random Over Sampling transformer.
- Parameters:
imbalance_ratio (float, default=1) – Imbalance Ratio after transformations.
name_sufix (str, default='_transformed') – Suffix appended to the name of a transformed Dataset.
verbose (bool, default=False) – Print messages during calculations.
random_state (int, optional) – Random state seed.
*args (tuple, optional) – Additional positional arguments for the Random Over Sampling transformer from imblearn.
**kwargs (dict, optional) – Additional keyword arguments for the Random Over Sampling transformer from imblearn.
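Random over-sampling moves in the opposite direction: the majority class is left alone and minority rows are duplicated until the target Imbalance Ratio is reached. The standard-library sketch below shows the idea on binary labels. `random_over_sample` is a hypothetical helper, not edgaro's or imblearn's code, and it assumes that the Imbalance Ratio is the majority count divided by the minority count.

```python
import random
from collections import Counter
from typing import List, Optional


def random_over_sample(labels: List[int], imbalance_ratio: float = 1,
                       random_state: Optional[int] = None) -> List[int]:
    """Return row indices after duplicating random minority-class rows.

    Assumes binary labels and IR = majority count / minority count.
    """
    counts = Counter(labels)
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    # Never shrink the minority class below its current size.
    target_minor = max(n_minor, int(n_major / imbalance_ratio))
    rng = random.Random(random_state)
    minor_idx = [i for i, y in enumerate(labels) if y == minority]
    # Draw with replacement, so the same minority row may be repeated.
    extra = rng.choices(minor_idx, k=target_minor - n_minor)
    return list(range(len(labels))) + extra


labels = [0] * 90 + [1] * 10
indices = random_over_sample(labels, imbalance_ratio=3, random_state=0)
new_counts = Counter(labels[i] for i in indices)  # 90 majority rows, 30 minority rows
```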
- set_params(**params) None
Set params for Transformer.
The method also accepts the imbalance_ratio and IR parameters, which are converted to the sampling_strategy parameter of the imblearn transformer.
- Parameters:
params (dict) – The parameters to be set.
- class edgaro.balancing.transformer.SMOTE(imbalance_ratio: float = 1, name_sufix: str = '_transformed', random_state: int | None = None, columns_categorical: List[str] | None = None, verbose: bool = False, *args, **kwargs)
Bases:
TransformerFromIMBLEARN
Create a SMOTE/SMOTENC/SMOTEN transformer.
The method also works with categorical variables. It assumes that columns of type 'category', 'object', or 'int' are always categorical. Keep that in mind before using this Transformer!
- Parameters:
imbalance_ratio (float, default=1) – Imbalance Ratio after transformations.
name_sufix (str, default='_transformed') – Suffix appended to the name of a transformed Dataset.
verbose (bool, default=False) – Print messages during calculations.
random_state (int, optional) – Random state seed.
*args (tuple, optional) – Additional positional arguments for the SMOTE/SMOTENC/SMOTEN transformer from imblearn.
**kwargs (dict, optional) – Additional keyword arguments for the SMOTE/SMOTENC/SMOTEN transformer from imblearn.
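The dtype rule stated above has a practical consequence: an integer column that holds a count would be treated as categorical. The sketch below illustrates one plausible selection rule on a mapping from column names to dtype names. The rule is an assumption for illustration (no categorical columns gives SMOTE, only categorical columns gives SMOTEN, a mix gives SMOTENC), and `choose_smote_variant` is a hypothetical helper rather than edgaro's actual logic.

```python
from typing import Dict, List, Tuple

# dtype names treated as categorical, following the rule stated above
CATEGORICAL_DTYPES = ("category", "object", "int")


def is_categorical(dtype_name: str) -> bool:
    # Strip a trailing bit width, so "int64" and "int32" both reduce to "int".
    return dtype_name.rstrip("0123456789") in CATEGORICAL_DTYPES


def choose_smote_variant(dtypes: Dict[str, str]) -> Tuple[str, List[str]]:
    """Pick a variant name from a {column: dtype name} mapping.

    Assumed rule: no categorical columns -> SMOTE, only categorical -> SMOTEN,
    a mix of both -> SMOTENC.
    """
    categorical = [col for col, dtype in dtypes.items() if is_categorical(dtype)]
    if not categorical:
        return "SMOTE", categorical
    if len(categorical) == len(dtypes):
        return "SMOTEN", categorical
    return "SMOTENC", categorical


numeric_only = choose_smote_variant({"age": "float64", "income": "float64"})
mixed = choose_smote_variant({"age": "float64", "n_children": "int64"})
```

In the second call the count column `n_children` is flagged as categorical only because of its integer dtype. If such a column should be interpolated as a number, casting it to a float dtype before creating the Dataset is one way to avoid that.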
- set_params(**params) None
Set params for Transformer.
The method also accepts the imbalance_ratio and IR parameters, which are converted to the sampling_strategy parameter of the imblearn transformer.
- Parameters:
params (dict) – The parameters to be set.
edgaro.balancing.transformer_array module
- class edgaro.balancing.transformer_array.TransformerArray(base_transformer: Transformer, parameters: List[List | Dict[str, Any]] | None = None, keep_original_dataset: bool = False, dataset_suffixes: str | List[str] = '_transformed', result_array_sufix: str = '_transformed_array', allow_dataset_array_sufix_change: bool = True, verbose: bool = False, set_suffixes: bool = True)
Bases:
BaseTransformerArray
Apply a Transformer with more than one set of parameters and/or to each Dataset object in a DatasetArray.
Note: If you use NestedAutomaticTransformer (or a child class) as a parameter to TransformerArray, it is advisable to pass set_suffixes=False to the TransformerArray object. Otherwise, the suffixes will be distorted.
- Parameters:
base_transformer (Transformer) – The object defining the transformation procedure.
parameters (list[list | Dict[str, Any]], optional) – The list of parameters for base_transformer. If the object is used with a DatasetArray, the parameter list should be nested. For details, see the Examples section.
keep_original_dataset (bool, default=False) – Keep the original Dataset after transformations or not.
dataset_suffixes (list, str, default='_transformed') – Suffixes to be set on transformed objects.
result_array_sufix (str, default='_transformed_array') – Suffix of the main transformed DatasetArray object.
allow_dataset_array_sufix_change (bool, default=True) – Allow the passed value of result_array_sufix to be changed according to dataset_suffixes.
verbose (bool, default=False) – Print messages during calculations.
set_suffixes (bool, default=True) – Whether suffixes for sub-Transformers should be set.
Examples
Example 1
>>> from test.resources.objects import *
>>> from edgaro.data.dataset import Dataset
>>> from edgaro.data.dataset_array import DatasetArray
>>> from edgaro.balancing.transformer import RandomUnderSampler
>>> from edgaro.balancing.transformer_array import TransformerArray
>>> df = Dataset(name_1, df_1, target_1)
>>> params = [{'sampling_strategy': 0.98}, {'sampling_strategy': 1}]
>>> transformer = RandomUnderSampler()
>>> array = TransformerArray(transformer, parameters=params)
>>> array.fit(df)
>>> array.transform(df)
Example 2
>>> from test.resources.objects import *
>>> from edgaro.data.dataset import Dataset
>>> from edgaro.data.dataset_array import DatasetArray
>>> from edgaro.balancing.transformer import RandomUnderSampler
>>> from edgaro.balancing.transformer_array import TransformerArray
>>> df = DatasetArray([Dataset(name_2, df_1, target_1), Dataset(name_1, df_1, target_1)])
>>> params = [[{'sampling_strategy': 0.98}, {'sampling_strategy': 1}] for _ in range(len(df))]
>>> transformer = RandomUnderSampler()
>>> array = TransformerArray(transformer, parameters=params)
>>> array.fit(df)
>>> array.transform(df)
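The two examples differ only in the shape of the parameters argument: a flat list for a single Dataset, and one inner list per Dataset for a DatasetArray. The standard-library sketch below spells out which (dataset, parameter set) pairs each shape describes. `plan_runs` and the dataset names are hypothetical, and the sketch does not reproduce how TransformerArray works internally.

```python
from typing import Any, Dict, List, Tuple, Union

Params = Dict[str, Any]


def plan_runs(dataset_names: Union[str, List[str]],
              parameters: Union[List[Params], List[List[Params]]]
              ) -> List[Tuple[str, Params]]:
    """List the (dataset name, parameter set) pairs a parameter list describes."""
    if isinstance(dataset_names, str):
        # Single Dataset: a flat list, one entry per transformation.
        return [(dataset_names, p) for p in parameters]
    if len(parameters) != len(dataset_names):
        raise ValueError("Expected one inner parameter list per Dataset")
    # DatasetArray: a nested list, one inner list per Dataset.
    return [(name, p) for name, inner in zip(dataset_names, parameters) for p in inner]


flat = [{"sampling_strategy": 0.98}, {"sampling_strategy": 1}]
single = plan_runs("data_1", flat)
nested = plan_runs(["data_2", "data_1"], [flat for _ in range(2)])
```

With two Datasets and two parameter sets each, the nested form describes four transformations in total.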
- set_dataset_suffixes(name_sufix: str | List[str]) None
Set the suffixes appended to the names of transformed Datasets.
- Parameters:
name_sufix (str, list) – Suffixes appended to the names of transformed Datasets.
- get_dataset_suffixes() str | List[str] | None
Get suffixes for transformed Dataset.
- Returns:
Suffixes for a transformed Dataset.
- Return type:
str, list
- set_params(**params) None
Set params for Transformer.
- Parameters:
params (dict) – The parameters to be set.
- fit(dataset: Dataset | DatasetArray) None
Fit the transformer.
- Parameters:
dataset (Dataset, DatasetArray) – The object to fit Transformer on.
- transform(dataset: Dataset | DatasetArray) Dataset | DatasetArray
Transform the object.
- Parameters:
dataset (Dataset, DatasetArray) – The object to be transformed.
- Returns:
The transformed object.
- Return type:
Dataset | DatasetArray
- property transformers: List[Transformer | TransformerArray | List]
All the Transformer objects used by this object.
- Return type:
list[Transformer, TransformerArray, list]
- property base_transformer: Transformer
The base transformer used to create this object.
- Return type:
Transformer
edgaro.balancing.nested_transformer module
- class edgaro.balancing.nested_transformer.NestedAutomaticTransformer(base_transformers: List[Transformer], base_transformers_names: List[str], keep_original_dataset: bool = False, result_array_sufix: str = '_automatic_transformed_array', n_per_method: int = 5, random_state: int | None = None, IR_round_precision: int = 2, min_samples_to_modify: int | None = None, verbose: bool = False)
Bases:
Transformer
Create NestedAutomaticTransformer.
This class behaves like a Transformer, but transforms a single Dataset using several methods instead of one. You do not specify Imbalance Ratio values; they are calculated automatically. You pass only the n_per_method argument, and the class balances the Dataset to that number of Imbalance Ratio values.
Note: If you use NestedAutomaticTransformer (or a child class) as a parameter to TransformerArray, it is advisable to pass set_suffixes=False to the TransformerArray object. Otherwise, the suffixes will be distorted.
- Parameters:
base_transformers (list(Transformer)) – List of base transformers.
base_transformers_names (list(str)) – List of names of base_transformers.
keep_original_dataset (bool, default=False) – Keep the original Dataset after transformations or not.
result_array_sufix (str, default='_automatic_transformed_array') – Suffix of the transformed DatasetArray.
n_per_method (int, default=5) – Number of intermediate Imbalance Ratio values.
random_state (int, optional, default=None) – Random state seed.
IR_round_precision (int, default=2) – Round precision of Imbalance Ratio when printing.
min_samples_to_modify (int, optional, default=None) – Minimal number of samples to modify to create an intermediate Imbalance Ratio. If the number of modified observations is less than this number, the n_per_method parameter will be modified.
verbose (bool, default=False) – Print messages during calculations.
- Variables:
keep_original_dataset (bool, default=False) – Keep the original Dataset after transformations or not.
n_per_method (int, default=5) – Number of intermediate Imbalance Ratio values.
random_state (int, optional, default=None) – Random state seed.
result_array_sufix (str, default='_automatic_transformed_array') – Suffix of the transformed DatasetArray.
IR_round_precision (int, default=2) – Round precision of Imbalance Ratio when printing.
min_samples_to_modify (int, optional, default=None) – Minimal number of samples to modify to create an intermediate Imbalance Ratio. If the number of modified observations is less than this number, the n_per_method parameter will be modified.
verbose (bool, default=False) – Print messages during calculations.
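To give a feel for what "intermediate Imbalance Ratio values" can mean, the sketch below generates n_per_method targets between a dataset's original Imbalance Ratio and a fully balanced ratio of 1. Even spacing is an assumption made for this illustration; the spacing that edgaro actually uses, and the adjustment driven by min_samples_to_modify, are not modelled. `intermediate_imbalance_ratios` is a hypothetical helper.

```python
from typing import List


def intermediate_imbalance_ratios(original_ir: float, n_per_method: int = 5,
                                  precision: int = 2) -> List[float]:
    """Evenly spaced IR targets from just below original_ir down to 1.

    Even spacing is an assumption made for this sketch; 'precision' only
    rounds the values for readability.
    """
    if original_ir < 1 or n_per_method < 1:
        raise ValueError("Need original_ir >= 1 and n_per_method >= 1")
    step = (original_ir - 1) / n_per_method
    return [round(original_ir - step * k, precision)
            for k in range(1, n_per_method + 1)]


targets = intermediate_imbalance_ratios(original_ir=9.0, n_per_method=4)
# targets == [7.0, 5.0, 3.0, 1.0]
```

Each base transformer would then be run once per target, which is why the result of transform is a DatasetArray rather than a single Dataset.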
- fit(dataset: Dataset) None
Fit the transformer.
- Parameters:
dataset (Dataset) – The object to fit Transformer on.
- transform(dataset: Dataset) DatasetArray
Transform the object.
- Parameters:
dataset (Dataset) – The object to be transformed.
- Returns:
The transformed object.
- Return type:
DatasetArray
- set_dataset_suffixes(name_sufix: str | List[str | List[str]]) None
Set the suffixes appended to the names of transformed Datasets.
- Parameters:
name_sufix (str, list) – Suffixes appended to the names of transformed Datasets.
- get_dataset_suffixes() str | List[str] | None
Get suffixes for transformed Dataset.
- Returns:
Suffixes for a transformed Dataset.
- Return type:
str, list
- set_params(**params) None
Set params for Transformer.
- Parameters:
params (dict) – The parameters to be set.
- get_params() List[Dict | List]
Get parameters of Transformer.
- Returns:
The parameters.
- Return type:
list[Dict, list]
- property was_fitted: bool
Whether the Transformer has been fitted.
- Return type:
bool
- property transformers: List[Transformer | TransformerArray | List]
All the Transformer objects used by this object.
- Return type:
list[Transformer, TransformerArray, list]
- property base_transformer: Transformer | TransformerArray | List
The base transformers used to create this object.
- Return type:
list[Transformer, TransformerArray, list]
- class edgaro.balancing.nested_transformer.BasicAutomaticTransformer(keep_original_dataset: bool = False, result_array_sufix: str = '_automatic_transformed_array', n_per_method: int = 5, random_state: int | None = None, IR_round_precision: int = 2, min_samples_to_modify: int | None = None, verbose: bool = False)
Bases:
NestedAutomaticTransformer
Create BasicAutomaticTransformer.
This object contains the three most popular methods: Random Under Sampler, Random Over Sampler, and SMOTE.
This class can be used for both continuous (numerical) and categorical data.
- Parameters:
keep_original_dataset (bool, default=False) – Keep the original Dataset after transformations or not.
result_array_sufix (str, default='_automatic_transformed_array') – Suffix of the transformed DatasetArray.
n_per_method (int, default=5) – Number of intermediate Imbalance Ratio values.
random_state (int, optional, default=None) – Random state seed.
IR_round_precision (int, default=2) – Round precision of Imbalance Ratio when printing.
min_samples_to_modify (int, optional, default=None) – Minimal number of samples to modify to create an intermediate Imbalance Ratio. If the number of modified observations is less than this number, the n_per_method parameter will be modified.
verbose (bool, default=False) – Print messages during calculations.
- class edgaro.balancing.nested_transformer.ExtensionAutomaticTransformer(keep_original_dataset: bool = False, result_array_sufix: str = '_automatic_transformed_array', n_per_method: int = 5, random_state: int | None = None, IR_round_precision: int = 2, min_samples_to_modify: int | None = None, verbose: bool = False)
Bases:
NestedAutomaticTransformer
Create ExtensionAutomaticTransformer.
This object contains three more complex methods implemented in imblearn. They cover oversampling (BorderlineSMOTE), undersampling (NearMiss), and a hybrid method (SMOTETomek).
This class can be used only for continuous (numerical) data.
- Parameters:
keep_original_dataset (bool, default=False) – Keep the original Dataset after transformations or not.
result_array_sufix (str, default='_automatic_transformed_array') – Suffix of the transformed DatasetArray.
n_per_method (int, default=5) – Number of intermediate Imbalance Ratio values.
random_state (int, optional, default=None) – Random state seed.
IR_round_precision (int, default=2) – Round precision of Imbalance Ratio when printing.
min_samples_to_modify (int, optional, default=None) – Minimal number of samples to modify to create an intermediate Imbalance Ratio. If the number of modified observations is less than this number, the n_per_method parameter will be modified.
verbose (bool, default=False) – Print messages during calculations.
- class edgaro.balancing.nested_transformer.AutomaticTransformer(keep_original_dataset: bool = False, result_array_sufix: str = '_automatic_transformed_array', n_per_method: int = 5, random_state: int | None = None, IR_round_precision: int = 2, min_samples_to_modify: int | None = None, verbose: bool = False)
Bases:
NestedAutomaticTransformer
Create AutomaticTransformer.
This object contains six methods implemented in imblearn. These include oversampling methods (RandomOverSampling, BorderlineSMOTE), undersampling methods (RandomUnderSampling, NearMiss), and a hybrid method (SMOTETomek).
This class can be used only for continuous (numerical) data.
- Parameters:
keep_original_dataset (bool, default=False) – Keep the original Dataset after transformations or not.
result_array_sufix (str, default='_automatic_transformed_array') – Suffix of the transformed DatasetArray.
n_per_method (int, default=5) – Number of intermediate Imbalance Ratio values.
random_state (int, optional, default=None) – Random state seed.
IR_round_precision (int, default=2) – Round precision of Imbalance Ratio when printing.
min_samples_to_modify (int, optional, default=None) – Minimal number of samples to modify to create an intermediate Imbalance Ratio. If the number of modified observations is less than this number, the n_per_method parameter will be modified.
verbose (bool, default=False) – Print messages during calculations.