fit

pycafee.sample.outliers.ZScore.fit(self, x_exp, which=None, critical=None, details=None)

This method applies the Z-score test for outlier detection.

Parameters
x_exp : numpy array

One-dimensional numpy array with at least 3 observations.

details : str, optional

The details parameter controls how much information about the test result is returned.

  • If details = "short" (or None, the default), a simplified version of the test result is returned.

  • If details = "binary", the conclusion is 1 (data has an outlier) or 0 (data has no outlier).

which : str, optional

The value that should be evaluated as a possible outlier.

  • If it is None (default), the candidate is automatically chosen as the observation farthest from the mean.

  • If it is "max", the highest value is checked as a possible outlier.

  • If it is "min", the lowest value is checked as a possible outlier.

critical : int or float, optional

The critical value of the test (default is 3).
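To see what a given critical value means in terms of coverage, the two-sided probability that a standard normal variable falls within k standard deviations is P(|Z| < k) = erf(k/√2). The sketch below computes it with Python's standard library only; `normal_coverage` is a hypothetical helper, not part of pycafee.

```python
from math import erf, sqrt


def normal_coverage(k):
    """Two-sided coverage P(|Z| < k) for a standard normal variable Z."""
    return erf(k / sqrt(2))


# Coverage for two common critical values.
for k in (2, 3):
    print(k, f"{normal_coverage(k):.4f}")
# 2 0.9545
# 3 0.9973
```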

Returns
result : tuple with
statistic : float

The test statistic.

critical : float

The critical value.

outlier : float or int

The value checked as a possible outlier.

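conclusion : str or int

The test conclusion (e.g., "Possible outlier" or "The dataset has no outliers"), or 1/0 when details = "binary". As the example below shows, fit returns it as a second value alongside result.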

Notes

The Z-score test for outlier detection compares the distance between a possible outlier and the sample mean, measured in sample standard deviations, with a critical value derived from the standard normal distribution. The test statistic is calculated using the following equation:

\[Z_i = \frac{|x_i - \overline{x}|}{s}\]

where \(x_i\) is the possible outlier, \(\overline{x}\) is the sample mean and \(s\) is the sample standard deviation.
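The statistic can be reproduced directly with NumPy. This is a minimal sketch, not the pycafee code; `z_statistic` is a hypothetical helper. Using the sample standard deviation (`ddof=1`) reproduces the statistic reported in the Examples section for the same data.

```python
import numpy as np


def z_statistic(x, candidate):
    """Z_i = |x_i - mean| / s, where s is the sample standard deviation (ddof=1)."""
    x = np.asarray(x, dtype=float)
    return abs(candidate - x.mean()) / x.std(ddof=1)


x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
# Mean is 4.86 and s is about 0.2914, so Z for 5.4 is 0.54 / 0.2914.
print(f"{z_statistic(x, 5.4):.4f}")  # 1.8534
```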

By default, the critical value is 3 [1]; for the standard normal distribution, about 99.7% of the values lie within three standard deviations of the mean. The conclusion of the test is based on the comparison between the critical value and the test statistic:

if critical <= statistic:
    Data has a possible outlier
else:
    Data does not have an outlier
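The whole procedure (choosing the candidate through `which`, computing the statistic and applying the decision rule) can be sketched in a few lines. This is an illustration under stated assumptions, not the pycafee implementation: `zscore_test` is a hypothetical function, and its plain-tuple return layout and conclusion strings are assumptions loosely modeled on the descriptions above.

```python
import numpy as np


def zscore_test(x, which=None, critical=3, details="short"):
    """Hypothetical sketch of the Z-score outlier test (not pycafee's code)."""
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size < 3:
        raise ValueError("x must be a 1-D array with at least 3 observations")
    # Choose the candidate outlier according to `which`.
    if which is None:
        candidate = x[np.argmax(np.abs(x - x.mean()))]  # farthest from the mean
    elif which == "max":
        candidate = x.max()
    elif which == "min":
        candidate = x.min()
    else:
        raise ValueError("which must be None, 'max' or 'min'")
    statistic = abs(candidate - x.mean()) / x.std(ddof=1)
    # Decision rule: a statistic at or above the critical value flags an outlier.
    has_outlier = critical <= statistic
    if details == "binary":
        conclusion = int(has_outlier)  # 1 = outlier, 0 = no outlier
    else:
        # Assumed wording; the real library may phrase this differently.
        conclusion = "Possible outlier" if has_outlier else "The dataset has no outliers"
    return statistic, critical, candidate, conclusion


x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
statistic, critical, outlier, conclusion = zscore_test(x)
print(outlier, conclusion)  # 5.4 The dataset has no outliers
print(zscore_test(x, critical=1.5, details="binary")[3])  # 1
```

Lowering `critical` makes the test more permissive: with `critical=1.5` the same candidate (Z ≈ 1.85) is flagged.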

References

[1] OSBORNE, J. W.; OVERBAY, A. The power of outliers (and why researchers should ALWAYS check for them). Practical Assessment, Research, and Evaluation, v. 9, n. 6, p. 1–8, 2004.

Examples

>>> from pycafee.sample.outliers import ZScore
>>> import numpy as np
>>> x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
>>> test = ZScore()
>>> result, conclusion = test.fit(x)
>>> print(result)
ZScoreResult(Statistic=1.8533964859229188, critical=3, outlier=5.4)
>>> print(conclusion)
The dataset has no outliers