fit
- pycafee.normalitycheck.andersondarling.AndersonDarling.fit(self, x_exp, alfa=None, comparison=None, details=None)
This function is a wrapper around scipy.stats.anderson() [1] and statsmodels.stats.diagnostic.normal_ad() [2] that performs the Anderson-Darling Normality test [3], with some additional conveniences.
The method used to perform the test depends on the comparison parameter, but both methods are fixed to compare a sample with the Normal distribution. Hence, we use:
>>> scipy.stats.anderson(x, dist='norm')
and
>>> statsmodels.stats.diagnostic.normal_ad(x)
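Both calls compute the same Anderson-Darling statistic, which is why the examples in this page report an identical Statistic for either comparison mode. A rough, dependency-free sketch of that quantity follows. The helper name anderson_darling_statistic is hypothetical and not part of pycafee, and the sketch assumes the mean and the standard deviation (with ddof=1) are estimated from the sample itself.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_statistic(x):
    """Return the A^2 statistic for normality (hypothetical helper, illustration only)."""
    data = sorted(x)
    n = len(data)
    if n < 4:
        raise ValueError("at least 4 observations are required")
    loc = mean(data)
    scale = stdev(data)  # sample standard deviation (ddof=1); assumed estimator
    std_normal = NormalDist()
    # Normal CDF evaluated at each standardized, ordered observation
    cdf = [std_normal.cdf((value - loc) / scale) for value in data]
    total = 0.0
    for i in range(n):
        # pair the i-th smallest value with the i-th largest value
        weight = 2 * (i + 1) - 1
        total += weight * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
    return -n - total / n


sample = [1.90642, 2.22488, 2.10288, 1.69742, 1.52229,
          3.15435, 2.61826, 1.98492, 1.42738, 1.99568]
a2 = anderson_darling_statistic(sample)
```

For this ten-value sample, the result should agree with the Statistic reported in the 1% significance example (about 0.3417), up to floating-point rounding.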
- Parameters
- x_exp
numpy array One-dimensional numpy array with at least 4 sample data.
- alfa
float, optional The level of significance (ɑ). Default is None, which results in 0.05 (ɑ = 5%).
- comparison
str, optional This parameter determines how the comparison is performed to evaluate the Normality test and which API to use.
If comparison = "critical" (or None), the test is evaluated by comparing the critical value (at the ɑ significance level) with the test statistic, using the SciPy method [1].
If comparison = "p-value", the test is evaluated by comparing the p-value with the adopted significance level (ɑ), using the statsmodels method [2].
Both approaches should lead to the same conclusion.
- details
str, optional The details parameter determines the amount of information presented about the hypothesis test.
If details = "short" (or None), a simplified version of the test result is returned.
If details = "full", a detailed version of the hypothesis test result is returned.
If details = "binary", the conclusion will be 1 (\(H_0\) is rejected) or 0 (\(H_0\) is not rejected).
- Returns
- result
tuple with
- statistic
float The test statistic.
- critical
float or None The tabulated value for alpha equal to 1%, 5%, 10%, 15% or 20%. Other values will return None.
- p_value
float or None The p-value for the hypothesis test.
- conclusion
str The test conclusion (e.g., Normal / not Normal).
Notes
The critical values [2] include samples with sizes between 4 and 20, in addition to the values for 25 and 30 samples, for ɑ equal to 1%, 5%, 10%, 15% or 20%.
For data with a sample size between 21 and 24 (20 < n_rep < 25), the critical value returned is the value for 25 observations;
For data with a sample size between 26 and 29 (25 < n_rep < 30), the critical value returned is the value for 30 observations;
For data with a sample size higher than 30 (n_rep > 30), the critical value returned is the approximation proposed by the authors.
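The sample-size rules above can be sketched as a small lookup helper. The function name tabulated_sample_size is hypothetical and only illustrates which table row is used; it is not part of pycafee, and returning None for n_rep > 30 is an assumption standing in for the approximation branch.

```python
def tabulated_sample_size(n_rep):
    """Return the sample size whose tabulated critical value is used.

    Returns None when n_rep > 30, where the approximation replaces the table.
    Hypothetical helper for illustration only.
    """
    if n_rep < 4:
        raise ValueError("the test requires at least 4 observations")
    if n_rep <= 20:
        return n_rep  # sizes 4..20 are tabulated directly
    if n_rep <= 25:
        return 25     # 21..24 fall back to the row for 25 observations
    if n_rep <= 30:
        return 30     # 26..29 fall back to the row for 30 observations
    return None       # n_rep > 30: the approximation is used instead


rows = [tabulated_sample_size(n) for n in (4, 20, 22, 25, 27, 30, 31)]
```

With these inputs, sizes 22 and 27 map to the rows for 25 and 30 observations respectively, while 31 falls through to the approximation branch.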
The Anderson-Darling Normality test has the following hypotheses:
\(H_0:\) data come from a Normal distribution.
\(H_1:\) data do not come from a Normal distribution.
By default (comparison = "critical"), the conclusion is based on the comparison between the critical value (at the ɑ significance level) and the statistic of the test. The results are calculated using scipy.stats.anderson(), and the p-value will be None. In summary:
    if critical >= statistic:
        Data is Normal
    else:
        Data is not Normal
The other option (comparison = "p-value") draws the conclusion by comparing the p-value with ɑ. The results are calculated using statsmodels.stats.diagnostic.normal_ad(), and the critical value will be None. In summary:
    if p-value >= ɑ:
        Data is Normal
    else:
        Data is not Normal
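The two decision rules above can be condensed into one small function. This is only a sketch: the name conclude and its signature are hypothetical, and it returns the binary code described for details = "binary" (1 when \(H_0\) is rejected, 0 otherwise) rather than pycafee's actual messages.

```python
def conclude(statistic=None, critical=None, p_value=None,
             alpha=0.05, comparison="critical"):
    """Return 0 if normality is not rejected and 1 if it is rejected.

    Hypothetical helper that mirrors the two documented decision rules.
    """
    if comparison == "critical":
        if statistic is None or critical is None:
            raise ValueError("statistic and critical are required")
        return 0 if critical >= statistic else 1
    if comparison == "p-value":
        if p_value is None:
            raise ValueError("p_value is required")
        return 0 if p_value >= alpha else 1
    raise ValueError("comparison must be 'critical' or 'p-value'")


# Values copied from the examples on this page
by_critical = conclude(statistic=0.25343395875111696, critical=0.759)
by_p_value = conclude(p_value=9.992220237231388e-07, alpha=0.1,
                      comparison="p-value")
```

Here by_critical corresponds to the default example (Normal data), and by_p_value corresponds to the non-Normal example evaluated through its p-value.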
References
- 1
SCIPY. scipy_stats_anderson. Available at: www.scipy.org. Accessed on: 10 May 2022.
- 2
STATSMODELS. statsmodels_stats_diagnostic_normal_ad. Available at: www.statsmodels.org. Accessed on: 10 May 2022.
- 3
STEPHENS, M. A. EDF Statistics for Goodness of Fit and Some Comparisons. Journal of the American Statistical Association, v. 69, n. 347, p. 730–737, 1974. DOI: 10.2307/2286009.
Examples
Applying the test with default values
>>> from pycafee.normalitycheck.andersondarling import AndersonDarling
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> andersondarling_test = AndersonDarling()
>>> result, conclusion = andersondarling_test.fit(x)
>>> print(result)
AndersonDarlingResult(Statistic=0.25343395875111696, Critical=0.759, p_value=None, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.
Applying the test using the p-value to draw the conclusion
>>> from pycafee.normalitycheck.andersondarling import AndersonDarling
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> andersondarling_test = AndersonDarling()
>>> result, conclusion = andersondarling_test.fit(x, comparison="p-value")
>>> print(result)
AndersonDarlingResult(Statistic=0.25343395875111696, Critical=None, p_value=0.7268427515457196, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.
Applying the test at the 1% significance level
>>> from pycafee.normalitycheck.andersondarling import AndersonDarling
>>> import numpy as np
>>> x = np.array([1.90642, 2.22488, 2.10288, 1.69742, 1.52229, 3.15435, 2.61826, 1.98492, 1.42738, 1.99568])
>>> andersondarling_test = AndersonDarling()
>>> result, conclusion = andersondarling_test.fit(x, alfa=0.01)
>>> print(result)
AndersonDarlingResult(Statistic=0.3416856332675007, Critical=0.95, p_value=None, Alpha=0.01)
>>> print(conclusion)
Data is Normal at a 99.0% of confidence level.
Applying the test with a detailed conclusion
>>> from pycafee.normalitycheck.andersondarling import AndersonDarling
>>> import numpy as np
>>> x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
>>> andersondarling_test = AndersonDarling()
>>> result, conclusion = andersondarling_test.fit(x, alfa=0.10, details="full")
>>> print(result)
AndersonDarlingResult(Statistic=0.22687861079050364, Critical=0.57, p_value=None, Alpha=0.1)
>>> print(conclusion)
Since the critical value (0.57) >= statistic (0.226), we have NO evidence to reject the hypothesis of data normality, according to the AndersonDarling test at a 90.0% of confidence level.
Applying the test to non-Normal data
>>> from pycafee.normalitycheck.andersondarling import AndersonDarling
>>> import numpy as np
>>> x = np.array([0.8, 1, 1.1, 1.15, 1.15, 1.2, 1.2, 1.2, 1.2, 1.6, 1.8, 2, 2.2, 3, 5, 8.2, 8.4, 8.6, 9])
>>> andersondarling_test = AndersonDarling()
>>> result, conclusion = andersondarling_test.fit(x, alfa=0.10, details="full", comparison='p-value')
>>> print(result)
AndersonDarlingResult(Statistic=2.5532223880710276, Critical=None, p_value=9.992220237231388e-07, Alpha=0.1)
>>> print(conclusion)
Since p-value (0.0) < alpha (0.1), we HAVE evidence to reject the hypothesis of data normality, according to the AndersonDarling test at a 90.0% of confidence level.