fit

pycafee.sample.outliers.ModifiedZScore.fit(self, x_exp, which=None, critical=None, details=None)

This function applies the Modified Z-score test for outlier detection [1].

Parameters
x_exp : numpy array

One-dimensional numpy array with at least 3 observations.

details : str, optional

The details parameter determines how much information about the hypothesis test is returned.

  • If details = "short" (or None, i.e., the default), a simplified version of the test result is returned.

  • If details = "binary", the conclusion will be 1 (data has an outlier) or 0 (data has no outlier).

which : str, optional

The value that should be evaluated as a possible outlier.

  • If it is None (default), the candidate is automatically inferred as the observation farthest from the mean.

  • If it is "max", the highest value is checked as a possible outlier.

  • If it is "min", the lowest value is checked as a possible outlier.

critical : int or float, optional

The critical value of the test (default is 3.5).

Returns
result : tuple with
statistic : float

The test statistic.

critical : float

The critical value.

outlier : float or int

The value checked as a possible outlier.

conclusion : str or int

The test conclusion (e.g., possible outlier / no outliers).

Notes

The Modified Z-score test for outlier detection compares the statistic computed for a possible outlier with a pre-established critical value. The test statistic is calculated using the following equation:

\[M_i = \frac{0.6745(|x_i-\widetilde{x}|)}{MAD}\]

where \(x_i\) is the possible outlier, \(\widetilde{x}\) is the sample median, and \(MAD\) is the median of the absolute deviations about the median, which is obtained with the following equation:

\[MAD = median_i\left\{|x_i-\widetilde{x}|\right\}\]
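
As a worked illustration (arithmetic only, using the ten-value sample from the Examples section): the sample median is \(\widetilde{x} = 4.9\), the absolute deviations about it have median \(MAD = 0.2\), and the observation farthest from the mean (4.86) is \(x_i = 5.4\). Substituting these values gives

\[M_i = \frac{0.6745(|5.4-4.9|)}{0.2} = \frac{0.33725}{0.2} = 1.68625\]

which agrees with the statistic reported in the Examples section up to floating-point rounding.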

By default, the critical value is 3.5 [1]. The conclusion of the test is based on the comparison between the critical value and the statistic of the test:

if statistic <= critical:
    Data does not have an outlier
else:
    Data has an outlier
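
The calculation described in these Notes can be sketched with plain NumPy. This is a minimal illustration and not pycafee's implementation: the function name `modified_z_score` and its return layout are hypothetical, the candidate for `which=None` is taken as the observation farthest from the mean (as described under Parameters), and a statistic strictly greater than the critical value is treated as an outlier, which is an assumption about the boundary case.

```python
import numpy as np


def modified_z_score(x, which=None, critical=3.5):
    """Hypothetical helper: Modified Z-score for a single candidate value."""
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size < 3:
        raise ValueError("x must be one-dimensional with at least 3 values")

    # Choose the candidate value according to `which`.
    if which == "max":
        candidate = x.max()
    elif which == "min":
        candidate = x.min()
    elif which is None:
        # Observation farthest from the sample mean.
        candidate = x[np.argmax(np.abs(x - x.mean()))]
    else:
        raise ValueError("which must be None, 'max' or 'min'")

    median = np.median(x)
    # MAD: median of the absolute deviations about the median.
    mad = np.median(np.abs(x - median))
    if mad == 0:
        raise ValueError("MAD is zero; the statistic is undefined")

    statistic = 0.6745 * abs(candidate - median) / mad
    # Assumed decision rule: outlier only if the statistic exceeds critical.
    has_outlier = bool(statistic > critical)
    return statistic, candidate, has_outlier


x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
statistic, candidate, has_outlier = modified_z_score(x)
print(statistic, candidate, has_outlier)
```

For this sample the candidate is 5.4 and the statistic is approximately 1.686, below the default critical value of 3.5, so no outlier is flagged.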

References

[1] IGLEWICZ, B.; HOAGLIN, D. C. The ASQC Basic References in Quality Control: Statistical Techniques Volume 16: How to Detect and Handle Outliers. Milwaukee: BookCrafters, Inc, 1993.

Examples

>>> from pycafee.sample.outliers import ModifiedZScore
>>> import numpy as np
>>> x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
>>> test = ModifiedZScore()
>>> result, conclusion = test.fit(x)
>>> print(result)
ModifiedZScoreResult(Statistic=1.6862500000000022, critical=3.5, outlier=5.4)
>>> print(conclusion)
The dataset has no outliers