fit
- pycafee.sample.outliers.ModifiedZScore.fit(self, x_exp, which=None, critical=None, details=None)
This function applies the Modified Z-score test for outlier detection [1].
- Parameters
- x_exp
numpy array One-dimensional numpy array with at least 3 observations.
- details
str, optional The details parameter determines the amount of information presented about the hypothesis test.
If details = "short" (or None, i.e., the default), a simplified version of the test result is returned.
If details = "binary", the conclusion will be 1 (data has outlier) or 0 (data has no outlier).
- which
str, optional The value that should be evaluated as a possible outlier.
If it is None (default), the possible outlier is automatically inferred as the observation farthest from the mean.
If it is "max", the highest value is checked as a possible outlier.
If it is "min", the lowest value is checked as a possible outlier.
- critical
int or float, optional The critical value of the test (default is 3.5).
- Returns
- result
tuple with
- statistic
float The test statistic.
- critical
float The critical value.
- outlier
float or int The value checked as a possible outlier.
- conclusion
str or int The test conclusion (e.g., possible outlier / no outliers).
See also
Notes
The Modified Z-score test for outlier detection compares the test statistic of a possible outlier with a pre-established critical value. The test statistic is calculated using the following equation:
\[M_i = \frac{0.6745(|x_i-\widetilde{x}|)}{MAD}\]
where \(x_i\) is the possible outlier, \(\widetilde{x}\) is the sample median and \(MAD\) is the median of the absolute deviations about the median, which is obtained with the following equation:
\[MAD = median_i\left\{|x_i-\widetilde{x}|\right\}\]
By default, the critical value is 3.5 [1]. The conclusion of the test is based on the comparison between the critical value and the statistic of the test:
if statistic <= critical:
    Data does not have an outlier
else:
    Data has an outlier
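The calculation described above can be sketched with plain NumPy. This is an illustrative re-implementation, not pycafee's actual code: the function name `modified_z_score`, its return layout, and the handling of the boundary case (the candidate is flagged only when the statistic is strictly greater than the critical value) are assumptions made for this sketch. It does not guard against a MAD of zero.

```python
import numpy as np


def modified_z_score(x_exp, which=None, critical=3.5):
    """Hypothetical helper mirroring the formulas in the Notes (not pycafee code)."""
    x = np.asarray(x_exp, dtype=float)

    # Choose the candidate outlier according to `which`.
    if which == "max":
        candidate = x.max()
    elif which == "min":
        candidate = x.min()
    else:
        # Default: the observation farthest from the sample mean.
        candidate = x[np.argmax(np.abs(x - x.mean()))]

    median = np.median(x)
    # MAD: median of the absolute deviations about the median.
    mad = np.median(np.abs(x - median))
    # Note: this sketch assumes mad > 0.
    statistic = 0.6745 * abs(candidate - median) / mad

    # Assumption: flag the candidate only when the statistic exceeds the critical value.
    has_outlier = statistic > critical
    return statistic, critical, candidate, has_outlier


x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
statistic, critical, candidate, has_outlier = modified_z_score(x)
print(statistic, critical, candidate, has_outlier)
```

For this dataset the median is 4.9 and the MAD is 0.2, so the candidate 5.4 gives a statistic of about 1.686, which is below 3.5 and therefore not flagged.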
References
- [1]
IGLEWICZ, B.; HOAGLIN, D. C. The ASQC Basic References in Quality Control: Statistical Techniques Volume 16: How to Detect and Handle Outliers. Milwaukee: BookCrafters, Inc, 1993.
Examples
>>> from pycafee.sample.outliers import ModifiedZScore
>>> import numpy as np
>>> x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
>>> test = ModifiedZScore()
>>> result, conclusion = test.fit(x)
>>> print(result)
ModifiedZScoreResult(Statistic=1.6862500000000022, critical=3.5, outlier=5.4)
>>> print(conclusion)
The dataset has no outliers