fit
- pycafee.sample.outliers.ZScore.fit(self, x_exp, which=None, critical=None, details=None)
This function applies the Z-score test for outlier detection.
- Parameters
- x_exp
numpy array One-dimensional numpy array with at least 3 sample values.
- details
str, optional The details parameter determines the amount of information presented about the hypothesis test.
If details = "short" (or None, i.e., the default), a simplified version of the test result is returned.
If details = "binary", the conclusion will be 1 (data has outlier) or 0 (data has no outlier).
- which
str, optional The value that should be evaluated as a possible outlier.
If it is None (default), the outlier is automatically inferred as the farthest observation from the mean.
If it is "max", the highest value is checked as a possible outlier.
If it is "min", the lowest value is checked as a possible outlier.
- critical
int or float, optional The critical value of the test (default is 3).
- Returns
- result
tuple with
- statistic
float The test statistic.
- critical
float The critical value.
- outlier
float or int The value checked as a possible outlier.
- conclusion
str or int The test conclusion (e.g., Possible outlier / no outliers).
Notes
The ZScore test for outlier detection compares a possible outlier with the critical value for the standard Normal distribution. The test statistic is calculated using the following equation:
\[Z_i = \frac{|x_i - \overline{x}|}{s}\]
where \(x_i\) is the possible outlier, \(\overline{x}\) is the sample mean, and \(s\) is the sample standard deviation.
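The statistic can be reproduced by hand as a check on the formula. The following is a minimal sketch using only the standard library, not pycafee's own implementation; the helper name `z_score` is hypothetical, and it assumes the sample standard deviation uses the n - 1 denominator, which agrees with the statistic printed in the Examples section.

```python
import statistics


def z_score(sample, candidate):
    """Return |candidate - mean| / s for one candidate value (hypothetical helper)."""
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return abs(candidate - mean) / s


# Same data as the Examples section; the highest value is the candidate.
x = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9]
z = z_score(x, max(x))
print(round(z, 4))  # 1.8534
```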
By default, the critical value is 3 [1], which corresponds to a confidence level of approximately 99.8%. The conclusion of the test is based on the comparison between the critical value and the statistic of the test:
if critical >= statistic:
    Data does not have an outlier
else:
    Data has an outlier
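The candidate selection controlled by which and the comparison against the critical value can be sketched together. This is an illustration under stated assumptions, not the library's code: `zscore_check` is a hypothetical helper, and it reports a possible outlier only when the statistic is strictly greater than the critical value, which is consistent with the Examples section, where a statistic of about 1.85 against a critical value of 3 reports no outliers. How pycafee treats exact equality is not confirmed here.

```python
import statistics


def zscore_check(sample, which=None, critical=3):
    """Pick a candidate value and compare its Z-score with the critical value.

    Hypothetical helper for illustration; returns (statistic, candidate, has_outlier).
    """
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    if which == "max":
        candidate = max(sample)
    elif which == "min":
        candidate = min(sample)
    else:
        # which=None: take the observation farthest from the mean
        candidate = max(sample, key=lambda v: abs(v - mean))
    statistic = abs(candidate - mean) / s
    # Assumption: equality counts as "no outlier"
    has_outlier = statistic > critical
    return statistic, candidate, has_outlier


x = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9]
statistic, candidate, has_outlier = zscore_check(x)
print(candidate, has_outlier)  # 5.4 False
```

Passing a smaller critical value, for example `zscore_check(x, critical=1.5)`, flags 5.4 as a possible outlier, since its statistic of about 1.85 exceeds 1.5.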
References
- 1
OSBORNE, J. W.; OVERBAY, A. The power of outliers (and why researchers should ALWAYS check for them). Practical Assessment, Research, and Evaluation, v. 9, n. 6, p. 1–8, 2004.
Examples
>>> from pycafee.sample.outliers import ZScore
>>> import numpy as np
>>> x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
>>> test = ZScore()
>>> result, conclusion = test.fit(x)
>>> print(result)
ZScoreResult(Statistic=1.8533964859229188, critical=3, outlier=5.4)
>>> print(conclusion)
The dataset has no outliers