fit

pycafee.normalitycheck.andersondarling.AndersonDarling.fit(self, x_exp, alfa=None, comparison=None, details=None)

This function is a wrapper around scipy.stats.anderson() [1] and statsmodels.stats.diagnostic.normal_ad() [2] that performs the Anderson-Darling Normality test [3] and adds some conveniences.

The function used to perform the test depends on the comparison parameter, but both are fixed to compare the sample against the Normal distribution. Hence, we use:

>>> scipy.stats.anderson(x, dist='norm')

and

>>> statsmodels.stats.diagnostic.normal_ad(x)

Parameters

x_exp : numpy array

One-dimensional numpy array with at least 4 observations.

alfa : float, optional

The significance level (ɑ). Default is None, which results in 0.05 (ɑ = 5%).

comparison : str, optional

This parameter determines how the comparison that decides the Normality test is performed, and which API is used.

  • If comparison = "critical" (or None), the comparison test is performed by comparing the critical value (with ɑ significance level) with the test statistic, using the SciPy method [1].

  • If comparison = "p-value", the comparison test is performed by comparing the p-value with the adopted significance level (ɑ), using the statsmodels method [2].

Both results should lead to the same conclusion.
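The critical-value branch can be sketched directly against SciPy. The helper below is hypothetical (critical_for_alpha is not part of pycafee); it assumes that scipy.stats.anderson() returns critical_values and significance_level as parallel arrays, with the levels expressed in percent.

```python
import numpy as np
from scipy import stats


def critical_for_alpha(x, alfa=0.05):
    """Return (statistic, critical) for the Anderson-Darling Normality test.

    Hypothetical helper for illustration only. ``critical`` is None when
    SciPy does not tabulate the requested significance level.
    """
    result = stats.anderson(x, dist="norm")
    # SciPy reports significance levels in percent (e.g. 5.0 for alfa = 0.05).
    levels = np.asarray(result.significance_level, dtype=float)
    matches = np.flatnonzero(np.isclose(levels, alfa * 100))
    if matches.size == 0:
        return float(result.statistic), None
    return float(result.statistic), float(result.critical_values[matches[0]])


x = np.array([1.90642, 2.22488, 2.10288, 1.69742, 1.52229,
              3.15435, 2.61826, 1.98492, 1.42738, 1.99568])
statistic, critical = critical_for_alpha(x, alfa=0.01)
if critical is None:
    print("No tabulated critical value for this significance level")
elif critical >= statistic:
    print("Data is Normal")
else:
    print("Data is not Normal")
```

Matching the level with np.isclose rather than == avoids floating-point surprises when alfa is converted to a percentage.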

details : str, optional

The details parameter determines the amount of information presented about the hypothesis test.

  • If details = "short" (or None), a simplified version of the test result is returned.

  • If details = "full", a detailed version of the hypothesis test result is returned.

  • If details = "binary", the conclusion will be 1 (\(H_0\) is rejected) or 0 (\(H_0\) is not rejected).

Returns

result : tuple with

statistic : float

The test statistic.

critical : float or None

The tabulated value for alpha equal to 1%, 5%, 10%, 15% or 20%. Other values will return None.

p_value : float or None

The p-value for the hypothesis test.

conclusion : str

The test conclusion (e.g., Normal / not Normal).

Notes

The critical values [2] are tabulated for sample sizes between 4 and 20, and for 25 and 30 observations, at ɑ equal to 1%, 5%, 10%, 15% or 20%.

  • For data with sample size between 21 and 24 (20 < n_rep < 25), the critical value returned is the value for 25 observations;

  • For data with sample size between 26 and 29 (25 < n_rep < 30), the critical value returned is the value for 30 observations;

  • For data with a sample size larger than 30 (n_rep > 30), the critical value returned is the approximation proposed by the authors.
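The selection rule in the list above can be sketched as a small lookup. The function name tabulated_size is hypothetical and not part of pycafee; it only reports which tabulated sample size would be used, and returns None where the approximation applies, since the approximation itself is not given here.

```python
def tabulated_size(n_rep):
    """Map a sample size to the table row described in the Notes.

    Hypothetical helper: returns the sample size whose tabulated critical
    value would be used, or None when the approximation applies (n_rep > 30).
    """
    if n_rep < 4:
        raise ValueError("at least 4 observations are required")
    if n_rep <= 20:
        return n_rep  # sizes 4..20 are tabulated directly
    if n_rep <= 25:
        return 25     # sizes 21..24 use the value for 25 observations
    if n_rep <= 30:
        return 30     # sizes 26..29 use the value for 30 observations
    return None       # larger samples use the approximation


for n in (4, 20, 22, 25, 27, 30, 31):
    print(n, tabulated_size(n))
```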

The Anderson-Darling Normality test has the following hypotheses:

\(H_0:\) the data come from a Normal distribution.

\(H_1:\) the data do not come from a Normal distribution.

By default (comparison = "critical"), the conclusion is based on the comparison between the critical value (at the ɑ significance level) and the test statistic. The results are calculated using scipy.stats.anderson(), and the p-value will be None. In summary:

if critical >= statistic:
    conclusion = "Data is Normal"
else:
    conclusion = "Data is not Normal"

The other option (comparison = "p-value") draws the conclusion by comparing the p-value with ɑ. The results are calculated using statsmodels.stats.diagnostic.normal_ad(), and the critical value will be None. In summary:

if p_value >= alfa:
    conclusion = "Data is Normal"
else:
    conclusion = "Data is not Normal"
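The two decision rules can be combined into one runnable sketch. The function conclude and its signature are assumptions made for illustration, not the pycafee implementation; the binary flag follows the convention of the details parameter (1 when \(H_0\) is rejected, 0 otherwise).

```python
def conclude(alfa, statistic=None, critical=None, p_value=None):
    """Return (conclusion, binary) following the two rules in the Notes.

    Hypothetical helper: binary is 1 when H0 is rejected and 0 otherwise.
    """
    if p_value is not None:
        is_normal = p_value >= alfa        # rule for comparison="p-value"
    elif statistic is not None and critical is not None:
        is_normal = critical >= statistic  # rule for comparison="critical"
    else:
        raise ValueError("provide p_value, or both statistic and critical")
    conclusion = "Data is Normal" if is_normal else "Data is not Normal"
    return conclusion, 0 if is_normal else 1


# Values taken from the Examples section of this page.
print(conclude(0.05, statistic=0.25343395875111696, critical=0.759))
# ('Data is Normal', 0)
print(conclude(0.10, p_value=9.992220237231388e-07))
# ('Data is not Normal', 1)
```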

References

[1] SCIPY. scipy_stats_anderson. Available at: www.scipy.org. Accessed on: 10 May 2022.

[2] STATSMODELS. statsmodels_stats_diagnostic_normal_ad. Available at: www.statsmodels.org. Accessed on: 10 May 2022.

[3] STEPHENS, M. A. EDF Statistics for Goodness of Fit and Some Comparisons. Journal of the American Statistical Association, v. 69, n. 347, p. 730–737, 1974. DOI: 10.2307/2286009.

Examples

Applying the test with default values

>>> from pycafee.normalitycheck.andersondarling import AndersonDarling
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> andersondarling_test = AndersonDarling()
>>> result, conclusion = andersondarling_test.fit(x)
>>> print(result)
AndersonDarlingResult(Statistic=0.25343395875111696, Critical=0.759, p_value=None, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.

Applying the test using the p-value to make the conclusion

>>> from pycafee.normalitycheck.andersondarling import AndersonDarling
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> andersondarling_test = AndersonDarling()
>>> result, conclusion = andersondarling_test.fit(x, comparison="p-value")
>>> print(result)
AndersonDarlingResult(Statistic=0.25343395875111696, Critical=None, p_value=0.7268427515457196, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.

Applying the test at the 1% significance level

>>> from pycafee.normalitycheck.andersondarling import AndersonDarling
>>> import numpy as np
>>> x = np.array([1.90642, 2.22488, 2.10288, 1.69742, 1.52229, 3.15435, 2.61826, 1.98492, 1.42738, 1.99568])
>>> andersondarling_test = AndersonDarling()
>>> result, conclusion = andersondarling_test.fit(x, alfa=0.01)
>>> print(result)
AndersonDarlingResult(Statistic=0.3416856332675007, Critical=0.95, p_value=None, Alpha=0.01)
>>> print(conclusion)
Data is Normal at a 99.0% of confidence level.

Applying the test with a detailed conclusion

>>> from pycafee.normalitycheck.andersondarling import AndersonDarling
>>> import numpy as np
>>> x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
>>> andersondarling_test = AndersonDarling()
>>> result, conclusion = andersondarling_test.fit(x, alfa=0.10, details="full")
>>> print(result)
AndersonDarlingResult(Statistic=0.22687861079050364, Critical=0.57, p_value=None, Alpha=0.1)
>>> print(conclusion)
Since the critical value (0.57) >= statistic (0.226), we have NO evidence to reject the hypothesis of data normality, according to the AndersonDarling test at a 90.0% of confidence level.

Applying the test to non-Normal data

>>> from pycafee.normalitycheck.andersondarling import AndersonDarling
>>> import numpy as np
>>> x = np.array([0.8, 1, 1.1, 1.15, 1.15, 1.2, 1.2, 1.2, 1.2, 1.6, 1.8, 2, 2.2, 3, 5, 8.2, 8.4, 8.6, 9])
>>> andersondarling_test = AndersonDarling()
>>> result, conclusion = andersondarling_test.fit(x, alfa=0.10, details="full", comparison='p-value')
>>> print(result)
AndersonDarlingResult(Statistic=2.5532223880710276, Critical=None, p_value=9.992220237231388e-07, Alpha=0.1)
>>> print(conclusion)
Since p-value (0.0) < alpha (0.1), we HAVE evidence to reject the hypothesis of data normality, according to the AndersonDarling test at a 90.0% of confidence level.