fit

pycafee.normalitycheck.kolmogorovsmirnov.KolmogorovSmirnov.fit(self, x_exp, alfa=None, comparison=None, details=None)

This function is a wrapper around scipy.stats.kstest() [1] that performs the Kolmogorov Smirnov normality test, with some added conveniences.

The main difference between this method and the original one is that this wrapper only allows the comparison of a sample with the Normal distribution, using cdf="norm", with the data being treated as sample data (the mean and the standard deviation are estimated from the sample, using ddof=1), i.e.:

>>> scipy.stats.kstest(x_exp, cdf='norm', args=(x_exp.mean(), x_exp.std(ddof=1)), N=x_exp.size)

Thus, the results obtained here are reliable only for sample data.
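
As a minimal sketch of the call above, assuming only NumPy and SciPy are installed, the reference Normal distribution can be parameterized with the sample mean and the sample standard deviation. The variable names are illustrative and are not part of pycafee; the N argument is left out because scipy.stats.kstest only uses it when the first argument is a string or a callable.

```python
import numpy as np
from scipy import stats

# Illustrative sample (the same values used in one of the examples on this page)
x_exp = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])

# Parameters of the reference Normal distribution are estimated from the sample
mean = x_exp.mean()
std = x_exp.std(ddof=1)  # ddof=1 gives the sample standard deviation

# Two-sided Kolmogorov Smirnov test against Normal(mean, std)
statistic, p_value = stats.kstest(x_exp, cdf="norm", args=(mean, std))
print(f"statistic={statistic:.4f}, p_value={p_value:.4f}")
```

The statistic obtained this way should agree with the value reported for the same data in the Examples section.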

Parameters

x_exp : numpy array

One-dimensional numpy array with at least 3 observations.

alfa : float, optional

The level of significance (ɑ). Default is None, which results in 0.05 (ɑ = 5%).

comparison : str, optional

This parameter determines which comparison is used to reach the conclusion of the Normality test.

If comparison = "critical" (or None, i.e., the default), the test statistic is compared with the critical value (at the ɑ significance level).
If comparison = "p-value", the p-value is compared with the adopted significance level (ɑ).

Both results should lead to the same conclusion.

details : str, optional

The details parameter determines the amount of information presented about the hypothesis test.

If details = "short" (or None, i.e., the default), a simplified version of the test result is returned.
If details = "full", a detailed version of the hypothesis test result is returned.
If details = "binary", the conclusion will be 1 (\(H_0\) is rejected) or 0 (\(H_0\) is not rejected).

Returns

result : tuple with

statistic (float): The test statistic.
critical (float or None): The critical value for alpha equal to 1%, 5%, 10%, 15% or 20%. Other values will return None.
p_value (float): The p-value for the hypothesis test.

conclusion : str

The test conclusion (e.g., Normal / not Normal).

See also

pycafee.normalitycheck.abdimolin.AbdiMolin.fit
pycafee.normalitycheck.andersondarling.AndersonDarling.fit
pycafee.normalitycheck.lilliefors.Lilliefors.fit
pycafee.normalitycheck.shapirowilk.ShapiroWilk.fit

Notes

The tabulated values [2] cover sample sizes between 2 and 35, for ɑ equal to 1%, 5%, 10%, 15% or 20%. For sample sizes greater than 35, the critical value returned is an approximation.
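
As an illustration of such an approximation, the sketch below uses the large-sample formula given by Massey [2], in which the critical value is a coefficient divided by the square root of the sample size. The helper name approx_critical and the coefficient table are assumptions made for this sketch and are not taken from the pycafee source, although the result for a sample of size 100 at ɑ = 5% matches the critical value shown in the Examples section.

```python
import math

# Large-sample coefficients from Massey (1951); assumed here for illustration,
# not copied from the pycafee implementation
MASSEY_COEFFICIENTS = {0.01: 1.63, 0.05: 1.36, 0.10: 1.22, 0.15: 1.14, 0.20: 1.07}


def approx_critical(n, alfa=0.05):
    """Hypothetical helper: approximate KS critical value for n > 35."""
    if n <= 35:
        raise ValueError("use the tabulated values for n <= 35")
    if alfa not in MASSEY_COEFFICIENTS:
        raise ValueError("alfa must be 0.01, 0.05, 0.10, 0.15 or 0.20")
    return MASSEY_COEFFICIENTS[alfa] / math.sqrt(n)


print(round(approx_critical(100, alfa=0.05), 3))
```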

The Kolmogorov Smirnov Normality test has the following hypotheses:

\(H_0:\) data comes from Normal distribution.

\(H_1:\) data does not come from Normal distribution.
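
The statistic used to test these hypotheses is the largest vertical distance between the empirical distribution function of the sample and the Normal cumulative distribution function. The sketch below computes it by hand and compares it with scipy.stats.kstest, assuming NumPy and SciPy are available; ks_statistic is a hypothetical helper written for this illustration, not a pycafee function.

```python
import numpy as np
from scipy import stats


def ks_statistic(x):
    """Hypothetical helper: two-sided KS distance to a Normal fitted to x."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Normal CDF at the ordered data, with parameters estimated from the sample
    cdf = stats.norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))
    # Distances just after and just before each step of the empirical CDF
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return max(d_plus, d_minus)


x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
d_manual = ks_statistic(x)
d_scipy = stats.kstest(x, cdf="norm", args=(x.mean(), x.std(ddof=1))).statistic
print(np.isclose(d_manual, d_scipy))
```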

By default (comparison = "critical"), the conclusion is based on the comparison between the critical value (at the ɑ significance level) and the test statistic:

if critical >= statistic:
    Data is Normal
else:
    Data is not Normal

The other option (comparison = "p-value") makes the conclusion comparing the p-value with ɑ:

if p-value >= ɑ:
    Data is Normal
else:
    Data is not Normal

If comparison = "critical" and ɑ is not 0.01, 0.05, 0.10, 0.15 or 0.20, the function will raise ValueError.
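
The two decision rules and the ValueError condition described above can be sketched in plain Python as follows. The function ks_conclusion and its signature are hypothetical and only mirror the rules stated in these notes; they are not the pycafee implementation. The numbers passed in are rounded values from the non-Normal example in the Examples section.

```python
ALLOWED_ALFA = (0.01, 0.05, 0.10, 0.15, 0.20)


def ks_conclusion(statistic, p_value, critical=None, alfa=0.05, comparison="critical"):
    """Hypothetical helper mirroring the decision rules described in the notes."""
    if comparison == "critical":
        # A critical value only exists for the tabulated significance levels
        if alfa not in ALLOWED_ALFA or critical is None:
            raise ValueError("no critical value available for this alfa")
        is_normal = critical >= statistic
    elif comparison == "p-value":
        is_normal = p_value >= alfa
    else:
        raise ValueError("comparison must be 'critical' or 'p-value'")
    return "Data is Normal" if is_normal else "Data is not Normal"


# Rounded values from the non-Normal example on this page
print(ks_conclusion(0.3072, 0.0433, critical=0.301, alfa=0.05))
print(ks_conclusion(0.3072, 0.0433, alfa=0.05, comparison="p-value"))
```

For this sample both rules reject normality, which illustrates the statement that the two comparisons should lead to the same conclusion.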

References

1: SCIPY. scipy.stats.kstest. Available at: www.scipy.org. Accessed on: 10 May 2022.
2: MASSEY, F. J., Jr. The Kolmogorov-Smirnov Test for Goodness of Fit. Journal of the American Statistical Association, v. 46, n. 253, p. 68–78, 1951. DOI: 10.2307/2280095.

Examples

Applying the test with default values

>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x)
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.05177647360597687, Critical=0.136, p_value=0.9514328623966609, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.

Applying the test using the p-value to draw the conclusion

>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x, comparison='p-value')
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.05177647360597687, Critical=0.136, p_value=0.9514328623966609, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.

Applying the test at the 1% significance level

>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import numpy as np
>>> x = np.array([1.90642, 2.22488, 2.10288, 1.69742, 1.52229, 3.15435, 2.61826, 1.98492, 1.42738, 1.99568])
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x, alfa=0.01)
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.17709753067016487, Critical=0.49, p_value=0.9123891112746063, Alpha=0.01)
>>> print(conclusion)
Data is Normal at a 99.0% of confidence level.

Applying the test with a detailed conclusion

>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import numpy as np
>>> x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x, alfa=0.10, details="full")
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.15459867079959644, Critical=0.368, p_value=0.9706128123504146, Alpha=0.1)
>>> print(conclusion)
Since the critical value (0.368) >= statistic (0.154), we have NO evidence to reject the hypothesis of data normality, according to the Kolmogorov Smirnov test at a 90.0% of confidence level.

Applying the test to non-Normal data

>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import numpy as np
>>> x = np.array([0.8, 1, 1.1, 1.15, 1.15, 1.2, 1.2, 1.2, 1.2, 1.6, 1.8, 2, 2.2, 3, 5, 8.2, 8.4, 8.6, 9])
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x, alfa=0.05, comparison="p-value", details="full")
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.3072356484569813, Critical=0.301, p_value=0.04334566682403149, Alpha=0.05)
>>> print(conclusion)
Since p-value (0.043) < alpha (0.05), we HAVE evidence to reject the hypothesis of data normality, according to the Kolmogorov Smirnov test at a 95.0% of confidence level.
