Metrics#

Metrics for the regression setting.

class RegressionMetrics(**data)[source]#

Model for regression metrics.

We will use this model to validate the benchmark results.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.
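As a hypothetical construction sketch (the field names below are inferred from the get_regression_metrics example further down and may not be exhaustive), an instance can be built directly from keyword arguments:

    # Hypothetical sketch; field names taken from the example output below.
    metrics = RegressionMetrics(
        mean_absolute_error=0.0,
        mape=0.0,
        mean_squared_error=0.0,
        r2=1.0,
        max_error=0.0,
        top_5_in_top_5=5,
        top_10_in_top_10=10,
        top_100_in_top_100=100,
        top_500_in_top_500=500,
    )
    # Values that cannot be parsed into the expected types raise ValidationError.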

get_regression_metrics(predictions, labels)[source]#

Get regression metrics.

Parameters:
  • predictions (ArrayLike) – predictions for one objective

  • labels (ArrayLike) – true labels for one objective

Returns:

regression metrics

Return type:

RegressionMetrics

Examples

>>> predictions = [0.1, 0.2, 0.3, 0.4, 0.5]
>>> labels = [0.1, 0.2, 0.3, 0.4, 0.5]
>>> get_regression_metrics(predictions, labels)
RegressionMetrics(**{'mean_absolute_error': 0.0,
'mape': 0.0,
'mean_squared_error': 0.0,
'r2': 1.0,
'max_error': 0.0,
'top_5_in_top_5': 5,
'top_10_in_top_10': 10,
'top_100_in_top_100': 100,
'top_500_in_top_500': 500})
top_n_in_top_k(predictions, labels, k, n, maximize=True)[source]#

Find how many of the top n predictions are in the top k labels.

Parameters:
  • predictions (ArrayLike) – predictions for one objective

  • labels (ArrayLike) – true labels for one objective

  • k (int) – number of top labels to consider

  • n (int) – number of top predictions to consider

  • maximize (bool) – Set to True if larger is better. Defaults to True.

Examples

>>> predictions = [0.1, 0.2, 0.3, 0.4, 0.5]
>>> labels = [0.1, 0.2, 0.3, 0.4, 0.5]
>>> top_n_in_top_k(predictions, labels, k=2, n=2)
2
Returns:

number of top n predictions in top k labels

Return type:

int
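Conceptually, the metric ranks both arrays and counts the overlap of the resulting index sets. A minimal sketch of that idea (not the library's implementation), assuming numpy:

    import numpy as np

    def top_n_in_top_k_sketch(predictions, labels, k, n, maximize=True):
        # Put the "best" entries first: largest values if maximize, else smallest.
        sign = -1 if maximize else 1
        top_pred = set(np.argsort(sign * np.asarray(predictions))[:n])
        top_lab = set(np.argsort(sign * np.asarray(labels))[:k])
        # Count how many of the top-n predicted indices are also among the top-k labels.
        return len(top_pred & top_lab)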

Helpers for adversarial validation

class AdverserialValidator(x_a, x_b, modeltype='rf', k=5)[source]#

Helper for adversarial validation.

Adversarial validation is a method for estimating how different two datasets are. Most commonly, it is used to check whether the train and test sets come from the same distribution. It has found widespread use in data science competitions [KaggleBook] and, more recently, also in some AutoML systems.

The basic idea is simple: train a classifier to distinguish the two datasets. If it can learn to do so, the datasets differ; if it cannot, they are indistinguishable. This approach also lets us investigate which features contribute most to the difference. To reduce data drift problems, one might remove those features [Uber].

Here, we use simple ensemble classifiers such as random forests or extra trees to compute the k-fold cross-validated area under the receiver-operating characteristic curve.
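The procedure amounts to only a few lines of scikit-learn. A minimal illustration of the idea (not the class's actual implementation), assuming a random forest and ROC AUC scoring:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def adversarial_auc_sketch(x_a, x_b, k=5):
        # Label the first dataset 0 and the second 1, then train a classifier
        # to tell them apart.
        X = np.vstack([x_a, x_b])
        y = np.concatenate([np.zeros(len(x_a)), np.ones(len(x_b))])
        clf = RandomForestClassifier()
        # AUC near 0.5: indistinguishable; AUC near 1.0: clearly different.
        return cross_val_score(clf, X, y, cv=k, scoring="roc_auc")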

Example

>>> import numpy as np
>>> from mofdscribe.metrics.adverserial import AdverserialValidator
>>> x_a = np.array([[1, 2, 3], [4, 5, 6]])
>>> x_b = np.array([[1, 2, 3], [4, 5, 6]])
>>> validator = AdverserialValidator(x_a, x_b)
>>> validator.score().mean()
0.5

References

[Uber]

Pan, J.; Pham, V.; Dorairaj, M.; Chen, H.; Lee, J.-Y. arXiv:2004.03045, 2020.

[KaggleBook]

Banachewicz, K.; Massaron, L. The Kaggle Book: Data Analysis and Machine Learning for Competitive Data Science; Packt Publishing, 2022.

Initialize an AdverserialValidator instance.

Parameters:
  • x_a (Union[ArrayLike, pd.DataFrame]) – Data for the first dataset (e.g. training set).

  • x_b (Union[ArrayLike, pd.DataFrame]) – Data for the second dataset (e.g. test set).

  • modeltype (str) – Classifier to train. Defaults to “rf”.

  • k (int) – Number of folds in k-fold cross-validation. Defaults to 5.

Raises:

ValueError – If the chosen modeltype is not supported.

score()[source]#

Compute the area under the receiver-operating characteristic curve.

A score close to 0.5 means that the classifier cannot distinguish the two datasets, i.e. they are similar; a score close to 1.0 indicates clear differences.

Returns:

Areas under the receiver-operating characteristic curve, one per cross-validation fold.

Return type:

np.array

get_feature_importance()[source]#

Identify the features distinguishing the two datasets.

Uses the default impurity-based feature importance.

Returns:

Feature importance scores.

Return type:

np.array
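A hypothetical follow-up, in the spirit of the [Uber] reference above, is to rank features by importance and consider dropping the strongest drift drivers (x_train, x_test and the cutoff of 5 are illustrative):

    import numpy as np

    validator = AdverserialValidator(x_train, x_test)
    importances = validator.get_feature_importance()
    # Indices of the five features that most strongly separate the two datasets.
    most_distinguishing = np.argsort(importances)[::-1][:5]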