The class defines an Explainer for a Model object. It calculates PDP [1] or ALE [2] curves.
Parameters:
model (Model) – A Model object to calculate explanations on.
N (int, optional, default=None) – Number of observations that will be sampled from the test Dataset before the calculation of profiles
(PDP/ALE curves). None means all data.
explanation_type ({'PDP', 'ALE', 'VI'}, default='PDP') – An explanation type to be calculated (PDP - Partial Dependence Profile, ALE - Accumulated Local Effects,
VI - Variable Importance).
verbose (bool, default=False) – Print messages during calculations.
processes (int, default=1) – Number of processes for the calculation of explanations.
If -1, it is replaced with the number of available CPU cores.
random_state (int, optional, default=None) – Random state seed.
B (int, optional, default=10) – Number of permutation rounds to perform on each variable. Applicable only if explanation_type='VI'.
performance_metric_name (str, default='balanced_accuracy') – Name of the performance metric.
performance_metric (callable, default=balanced_accuracy_score) – The performance metric function.
Variables:
model (Model) – A Model object to calculate explanations on.
name (str) – A name of the Explainer, by default it is a Model name.
N (int, optional, default=None) – Number of observations that will be sampled from the test Dataset before the calculation of profiles
(PDP/ALE curves). None means all data.
explanation_type ({'PDP', 'ALE', 'VI'}) – An explanation type to be calculated.
verbose (bool) – Print messages during calculations.
explainer (dx.Explainer, optional) – An explainer object from dalex package.
processes (int) – Number of processes for the calculation of explanations.
If -1, it is replaced with the number of available CPU cores.
random_state (int, optional) – Random state seed.
B (int, optional) – Number of permutation rounds to perform on each variable. Applicable only if explanation_type='VI'.
performance_metric_name (str) – Name of the performance metric.
performance_metric (callable) – The performance metric function.
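As a rough illustration of what a PDP curve represents, the sketch below averages the predictions of a toy model while one variable is forced to each value of a grid. It is not the library's implementation: partial_dependence, toy_predict, and the dictionary-based rows are hypothetical names introduced only for this example, and the n and random_state arguments merely mirror the roles of N and random_state described above.

```python
import random


def partial_dependence(predict, rows, variable, grid, n=None, random_state=None):
    """Average prediction over the sample for each grid value of `variable`."""
    rng = random.Random(random_state)
    # Optionally subsample the observations; None means all data.
    sample = rows if n is None or n >= len(rows) else rng.sample(rows, n)
    profile = []
    for value in grid:
        # Force the variable to the grid value in every sampled row.
        preds = [predict({**row, variable: value}) for row in sample]
        profile.append(sum(preds) / len(preds))
    return profile


# Hypothetical toy model: the prediction is linear in two features.
def toy_predict(row):
    return 2.0 * row["x1"] + row["x2"]


rows = [{"x1": 0.0, "x2": 1.0}, {"x1": 1.0, "x2": 3.0}, {"x1": 2.0, "x2": 5.0}]
pdp = partial_dependence(toy_predict, rows, "x1", grid=[0.0, 1.0, 2.0])
print(pdp)  # [3.0, 5.0, 7.0]
```

The resulting curve rises by 2 per unit of x1, which matches the coefficient of x1 in the toy model.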
Create a class to calculate PDP [1], ALE [2] curves or Variable Importance for Model and ModelArray objects.
Parameters:
models (Model, ModelArray) – A Model/ModelArray object to calculate the curves on.
N (int, optional, default=None) – Number of observations that will be sampled from the test Dataset before the calculation of profiles
(PDP/ALE curves). None means all data.
explanation_type ({'PDP', 'ALE', 'VI'}, default='PDP') – An explanation type to be calculated (PDP - Partial Dependence Profile, ALE - Accumulated Local Effects,
VI - Variable Importance).
verbose (bool, default=False) – Print messages during calculations.
processes (int, default=1) – Number of processes for the calculation of explanations.
If -1, it is replaced with the number of available CPU cores.
random_state (int, optional, default=None) – Random state seed.
B (int, optional, default=10) – Number of permutation rounds to perform on each variable. Applicable only if explanation_type='VI'.
performance_metric_name (str) – Name of the performance metric.
performance_metric (callable) – The performance metric function.
Variables:
models (Model, ModelArray) – A Model/ModelArray object to calculate the curves for.
name (str) – A name of the ExplainerArray, by default it is a Model/ModelArray name.
sub_calculators (list[Explainer, ExplainerArray], optional) – A list of calculators for nested Datasets/DatasetArrays.
N (int, optional) – Number of observations that will be sampled from the test Dataset before the calculation of profiles
(PDP/ALE curves). None means all data.
explanation_type ({'PDP', 'ALE', 'VI'}, default='PDP') – An explanation type to be calculated.
verbose (bool) – Print messages during calculations.
processes (int) – Number of processes for the calculation of explanations.
If -1, it is replaced with the number of available CPU cores.
random_state (int, optional) – Random state seed.
B (int, optional) – Number of permutation rounds to perform on each variable. Applicable only if explanation_type='VI'.
performance_metric_name (str) – Name of the performance metric.
performance_metric (callable) – The performance metric function.
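The role of B in Variable Importance can be sketched with a generic permutation importance routine: a variable is shuffled B times and the average drop in the performance metric is reported. This is a minimal sketch under stated assumptions, not the code used by the library. The helper names (permutation_importance, accuracy, toy_predict) are hypothetical, and plain accuracy stands in for the performance metric.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of correct predictions (stand-in for the performance metric)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, labels, variable, metric, b=10, random_state=None):
    """Mean drop in `metric` after shuffling `variable`, averaged over `b` rounds."""
    rng = random.Random(random_state)
    baseline = metric(labels, [predict(r) for r in rows])
    drops = []
    for _ in range(b):
        # Shuffle one column while leaving the other columns untouched.
        values = [r[variable] for r in rows]
        rng.shuffle(values)
        permuted = [{**r, variable: v} for r, v in zip(rows, values)]
        drops.append(baseline - metric(labels, [predict(r) for r in permuted]))
    return sum(drops) / len(drops)


# Hypothetical toy model that only looks at x1.
def toy_predict(row):
    return 1 if row["x1"] > 0.5 else 0


rows = [{"x1": i / 9, "x2": (i * 7) % 10} for i in range(10)]
labels = [toy_predict(r) for r in rows]

vi_x1 = permutation_importance(toy_predict, rows, labels, "x1", accuracy, b=10, random_state=0)
vi_x2 = permutation_importance(toy_predict, rows, labels, "x2", accuracy, b=10, random_state=0)
print(vi_x1, vi_x2)
```

Because the toy model ignores x2, shuffling it never changes the predictions and its importance is exactly zero, while the importance of x1 cannot be negative here.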
variable (str, optional, default=None) – Variable for which the plot should be generated. If None, the first column is plotted.
figsize (tuple(int, int), optional, default=(8, 8)) – Size of a figure.
add_plot (list[ModelProfileExplanation], optional, default=None) – List of other ModelProfileExplanation objects that also contain the variable and should be plotted.
ax (matplotlib.axes.Axes, optional, default=None) – The parameter should be passed if the plot is to be created in a certain Axis. In that situation, figsize
parameter is ignored.
show_legend (bool, default=True) – The parameter indicates whether the legend should be plotted.
y_lim (tuple(float, float), optional, default=None) – Limits of the Y axis.
metric_precision (int, default=5) – Number of digits to which the metric value (multiplied by 10^5) is rounded.
centered (bool, default=False) – If True, the plots will be centered to start at 0.
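One plausible reading of centered=True is that each curve is shifted vertically so that its first point lies at 0. The sketch below shows that transformation. The function name center_curve is hypothetical, and shifting by the first value is an assumption about how the centering is done.

```python
def center_curve(y_values):
    """Shift a curve vertically so that it starts at 0 (assumed meaning of `centered`)."""
    start = y_values[0]
    return [y - start for y in y_values]


centered_curve = center_curve([0.25, 0.5, 1.0])
print(centered_curve)  # [0.0, 0.25, 0.75]
```

Centering leaves the shape of the curve unchanged, which makes curves with different baseline levels easier to compare on one plot.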
The function calculates a metric for comparing the curves for the given variable(s).
Currently, there is only one comparison metric, called SDD (Standard Deviation of Distances). It is the variance of
the distances between the curve in this object and the curves in other, taken at intermediate points.
If other contains more than one object and return_raw_per_variable=False, the mean variance
is returned (ASDD, Averaged SDD).
Parameters:
other (list[ModelProfileExplanation]) – List of ModelProfileExplanation objects to compare the curve against.
variable (str, list[str], optional, default=None) – Variable name or list of variable names for which the metric is calculated. If None, the metric is calculated for
all the columns in this object.
return_raw_per_variable (bool, default=False) – If True, raw values for each variable are returned.
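The SDD and ASDD definitions above can be sketched as follows. The sketch follows the description literally and uses the population variance of the pointwise distances; whether the library uses the variance, its square root, or a sample estimate is not stated here, so treat that choice as an assumption. The function names sdd and asdd and the example curves are hypothetical, and both curves are assumed to be evaluated on the same grid of intermediate points.

```python
from statistics import mean, pvariance


def sdd(base_curve, other_curve):
    """Variance of the pointwise distances between two curves on a shared grid."""
    distances = [a - b for a, b in zip(base_curve, other_curve)]
    return pvariance(distances)


def asdd(base_curve, other_curves):
    """Mean of the SDD values against several other curves."""
    return mean(sdd(base_curve, curve) for curve in other_curves)


base = [0.0, 1.0, 2.0, 3.0]
shifted = [1.0, 2.0, 3.0, 4.0]  # same shape, constant offset
tilted = [0.0, 2.0, 4.0, 6.0]   # different slope

print(sdd(base, shifted))            # 0.0
print(sdd(base, tilted))             # 1.25
print(asdd(base, [shifted, tilted])) # 0.625
```

A constant vertical offset gives a value of 0, so a metric of this kind responds to differences in the shape of the curves rather than in their level.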
The function plots the Variable Importance profile.
Parameters:
variable (str, list[str], optional, default=None) – Variable for which the VI should be plotted. If None, all columns are plotted.
figsize (tuple(int, int), optional, default=(8, 8)) – Size of a figure.
add_plot (list[ModelPartsExplanation], optional, default=None) – List of other ModelPartsExplanation objects that also contain the variable and should be plotted.
max_variables (int, optional, default=None) – Maximal number of variables from the current object to be taken into account.
ax (matplotlib.axes.Axes, optional, default=None) – The parameter should be passed if the plot is to be created in a certain Axis. In that situation, figsize
parameter is ignored.
show_legend (bool, default=True) – The parameter indicates whether the legend should be plotted.
x_lim (tuple(float, float), optional, default=None) – Limits of the X axis.
metric_precision (int, default=5) – Number of digits to which the metric value is rounded.
The function calculates the metric to compare model parts of two or more models.
Currently, there is only one metric based on the Wilcoxon statistical test [3]. The metric value is the
p-value of this test, where the inputs are variable importance values.
The idea is based on the approach presented in the article [4].
Parameters:
other (list[ModelPartsExplanation]) – List of ModelPartsExplanation objects to compare against.
variable (str, list[str], optional, default=None) – Variable name or list of variable names for which the metric is calculated. If None, the metric is calculated for
all the columns in this object.
max_variables (int, optional, default=None) – Maximal number of variables from the current object to be taken into account.
return_raw (bool, default=True) – If True, the p-values are returned for each model. Otherwise, the mean value is returned.
The function plots the PDP/ALE curves for given variables using all available Curves in the object.
Parameters:
index_base (int, str, default=-1) – Index of a curve to be a base for comparisons.
variables (list[str], optional, default=None) – Variables for which the plot should be generated. If None, plots for all variables are generated if all the
available ModelProfileExplanation objects have exactly the same set of column names.
n_col (int, default=3) – Number of columns in the final plot.
figsize (tuple(int, int), optional, default=None) – The size of a figure. If None, the figure size is calculated as (8 * n_col, 8 * n_rows).
model_filter (str, optional, default=None) – A regular expression to filter the names of the ModelProfileExplanation objects for comparing.
centered (bool, default=False) – If True, the plots will be centered to start at 0.
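The default figure size rule can be illustrated with a short calculation. The formula (8 * n_col, 8 * n_rows) comes from the description of figsize; deriving n_rows as the number of variables divided by n_col, rounded up, is an assumption made for this sketch, and default_figsize is a hypothetical helper name.

```python
import math


def default_figsize(n_variables, n_col=3):
    """Figure size for a grid of subplots, one subplot per variable."""
    # Assumed: enough rows to fit all variables in n_col columns.
    n_rows = math.ceil(n_variables / n_col)
    return (8 * n_col, 8 * n_rows)


print(default_figsize(7))           # (24, 24)
print(default_figsize(4, n_col=2))  # (16, 16)
```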
variable (list[str], optional, default=None) – List of variable names for which the metric is calculated. If None, the metric is calculated for
all the columns in this object.
index_base (int, str, default=-1) – Index of a curve to be a base for comparisons.
return_raw (bool, default=True) – If True, the metrics for each of the model are returned. Otherwise, the mean of the values is returned.
return_raw_per_variable (bool, default=True) – If True, the metrics for each of the variables are returned. Otherwise, the mean of the values is returned.
model_filter (str, optional, default=None) – A regular expression to filter the names of the ModelProfileExplanation objects for comparing.
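Filtering explanation names with a regular expression, as model_filter does, might look like the sketch below. The function filter_names and the example names are hypothetical. Matching anywhere in the name with re.search is an assumption; the library may anchor the pattern differently.

```python
import re


def filter_names(names, model_filter=None):
    """Keep the names that match the regular expression; None keeps everything."""
    if model_filter is None:
        return list(names)
    pattern = re.compile(model_filter)
    return [name for name in names if pattern.search(name)]


# Hypothetical explanation names.
names = ["RandomForest_SMOTE", "RandomForest_ROS", "XGBoost_SMOTE"]

print(filter_names(names, "SMOTE$"))         # ['RandomForest_SMOTE', 'XGBoost_SMOTE']
print(filter_names(names, "^RandomForest"))  # ['RandomForest_SMOTE', 'RandomForest_ROS']
```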
The function plots boxplots of comparison metrics of curves in the object.
Parameters:
variables (list[str], optional, default=None) – Variables for which the plot should be generated. If None, plots for all variables are generated if all the
available ModelProfileExplanation objects have exactly the same set of column names.
figsize (tuple(int, int), optional, default=None) – The size of a figure.
model_filters (list[str], optional, default=None) – List of regular expressions to filter the names of the ModelProfileExplanation objects for comparing.
Each element in the list creates a new boxplot. If None, one boxplot of all results is plotted.
filter_labels (list[str], optional, default=None) – Labels of model filters.
index_base (int, str, default=-1) – Index of a curve to be a base for comparisons.
return_df (bool, default=False) – If True, the method returns a dataframe on which a plot is created.
The function creates a performance gain analysis plot, which compares ASDD values with
differences in performance metric values.
Parameters:
variables (list[str], optional, default=None) – Variables for which the plot should be generated. If None, plots for all variables are generated if all the
available ModelProfileExplanation objects have exactly the same set of column names.
figsize (tuple(int, int), optional, default=None) – The size of a figure.
model_filters (list[str], optional, default=None) – List of regular expressions to filter the names of the ModelProfileExplanation objects for comparing.
Each element in the list creates a new boxplot. If None, one boxplot of all results is plotted.
filter_labels (list[str], optional, default=None) – Labels of model filters.
index_base (int, str, default=-1) – Index of a curve to be a base for comparisons.
return_df (bool, default=False) – If True, the method returns a dataframe on which a plot is created.
percent (bool, default=False) – If True, the percentage change will be plotted instead of the difference.
ax (matplotlib.axes.Axes, optional) – Axes to plot on.
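The distinction controlled by percent can be shown with a small calculation. This is only a sketch: performance_gain is a hypothetical helper, and both the direction of the difference (compared model minus base model) and the use of the base score as the denominator are assumptions.

```python
def performance_gain(base_score, new_score, percent=False):
    """Absolute difference in the metric, or percentage change relative to the base."""
    diff = new_score - base_score
    if percent:
        return 100.0 * diff / base_score
    return diff


print(performance_gain(0.5, 0.75))                # 0.25
print(performance_gain(0.5, 0.75, percent=True))  # 50.0
```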
The function plots the Variable Importance profile using all ModelPartsExplanation objects.
Parameters:
variable (str, list[str], optional, default=None) – Variable for which the VI should be plotted. If None, all columns are plotted.
figsize (tuple(int, int), optional, default=(8, 8)) – Size of a figure.
max_variables (int, optional, default=None) – Maximal number of variables from the current object to be taken into account.
ax (matplotlib.axes.Axes, optional, default=None) – The parameter should be passed if the plot is to be created in a certain Axis. In that situation, figsize
parameter is ignored.
show_legend (bool, default=True) – The parameter indicates whether the legend should be plotted.
x_lim (tuple(float, float), optional, default=None) – Limits of the X axis.
metric_precision (int, default=5) – Number of digits to which the metric value is rounded.
index_base (int, str, default=-1) – Index of an explanation to be a base for comparisons.
The function compares variable importance in the array.
Parameters:
variable (str, list[str], optional, default=None) – Variable name or list of variable names for which the metric is calculated. If None, the metric is calculated for
all the columns in this object.
max_variables (int, optional, default=None) – Maximal number of variables from the current object to be taken into account.
return_raw (bool, default=True) – If True, the p-values are returned for each model. Otherwise, the mean value is returned.
index_base (int, str, default=-1) – Index of an explanation to be a base for comparisons.
model_filter (str, optional, default=None) – A regular expression to filter the names of the ModelPartsExplanation objects for comparing.
The function plots boxplots of comparison metrics of VI in the object if significance_level is None.
Otherwise, the results of the statistical test are plotted as barplots according to the significance_level.
Parameters:
variables (str, list[str], optional, default=None) – Variables for which the VI should be plotted. If None, all columns are plotted.
figsize (tuple(int, int), optional, default=(8, 8)) – Size of a figure.
model_filters (list[str], optional, default=None) – List of regular expressions to filter the names of the ModelPartsExplanation objects for comparing.
Each element in the list creates a new boxplot. If None, one boxplot / barplot of all results is plotted.
filter_labels (list[str], optional, default=None) – Labels of model filters.
index_base (int, str, default=-1) – Index of an explanation to be a base for comparisons.
max_variables (int, optional, default=None) – Maximal number of variables from the current object to be taken into account.
significance_level (float, optional, default=None) – A significance level of the statistical test (metric).
fdr_correction (bool, default=True) – Add p-value correction for false discovery rate. Note that it is used only if significance_level is not None.
return_df (bool, default=False) – If True, the method returns a dataframe on which a plot is created.
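To show why a false discovery rate correction matters when several p-values are compared against a significance level, the sketch below applies the Benjamini-Hochberg procedure to a few made-up p-values. The description above does not say which FDR method is used, so Benjamini-Hochberg is an assumption, and benjamini_hochberg is a hypothetical helper written for this example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


significance_level = 0.05
p_values = [0.01, 0.04, 0.03, 0.20]  # made-up values for illustration

adjusted = benjamini_hochberg(p_values)
raw_significant = [p <= significance_level for p in p_values]
fdr_significant = [p <= significance_level for p in adjusted]

print(raw_significant)  # [True, True, True, False]
print(fdr_significant)  # [True, False, False, False]
```

In this example three raw p-values fall below 0.05, but only one remains significant after the correction.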