fit

pycafee.normalitycheck.kolmogorovsmirnov.KolmogorovSmirnov.fit(self, x_exp, alfa=None, comparison=None, details=None)

This method is a wrapper around scipy.stats.kstest() [1] that performs the Kolmogorov-Smirnov normality test, with some added conveniences.

The main difference from the original function is that this wrapper only compares the sample with the Normal distribution (cdf="norm"), treating the data as sample data: the mean and the sample standard deviation (ddof=1) are estimated from the data, i.e.:

>>> scipy.stats.kstest(x_exp, cdf='norm', args=(x_exp.mean(), x_exp.std(ddof=1)), N = x_exp.size)

Thus, the results obtained here are reliable only for sample data.
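For reference, the statistic and p-value part of the call above can be reproduced with SciPy alone. The sketch below assumes only NumPy and SciPy are installed; the helper name `ks_normal_sample` is hypothetical and not part of pycafee. It estimates the mean and the sample standard deviation (ddof=1) from the data and passes them to scipy.stats.kstest. The critical value and the textual conclusion that pycafee adds are not computed here.

```python
import numpy as np
from scipy import stats


def ks_normal_sample(x_exp):
    """Hypothetical helper (not pycafee's API): KS test of x_exp against a
    Normal distribution whose parameters are estimated from the sample."""
    x_exp = np.asarray(x_exp, dtype=float)
    # Sample estimates: ddof=1 gives the sample (not population) std.
    mean = x_exp.mean()
    std = x_exp.std(ddof=1)
    # N only matters when rvs is a string or callable, so it is omitted
    # here because x_exp is an array of observations.
    result = stats.kstest(x_exp, "norm", args=(mean, std))
    return result.statistic, result.pvalue


# Same data as the first example below.
x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
statistic, p_value = ks_normal_sample(x)
print(statistic, p_value)
```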

Parameters
x_exp : numpy array

One-dimensional numpy array with at least 3 sample values.

alfa : float, optional

The significance level (ɑ). Default is None, which results in 0.05 (ɑ = 5%).

comparison : str, optional

This parameter determines which comparison is used to reach the conclusion of the Normality test.

  • If comparison = "critical" (or None, i.e., the default), the critical value (at the ɑ significance level) is compared with the calculated test statistic.

  • If comparison = "p-value", the p-value is compared with the adopted significance level (ɑ).

Both approaches should lead to the same conclusion.

details : str, optional

The details parameter determines the amount of information presented about the hypothesis test.

  • If details = "short" (or None, i.e., the default), a simplified version of the test result is returned.

  • If details = "full", a detailed version of the hypothesis test result is returned.

  • If details = "binary", the conclusion will be 1 (\(H_0\) is rejected) or 0 (\(H_0\) is not rejected).

Returns
result : tuple with
statistic : float

The test statistic.

critical : float or None

The critical value, available only for ɑ equal to 1%, 5%, 10%, 15% or 20%; for other values, None is returned.

p_value : float

The p-value for the hypothesis test.

conclusion : str

The test conclusion (e.g., Normal / not Normal).

Notes

The tabulated critical values [2] cover sample sizes from 2 to 35, for ɑ equal to 1%, 5%, 10%, 15% or 20%. For sample sizes larger than 35, the returned critical value is an approximation.
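As an illustration of the large-sample approximation, Massey's table [2] gives critical values of the form c(ɑ)/√n for n > 35, with c = 1.07, 1.14, 1.22, 1.36 and 1.63 for ɑ = 0.20, 0.15, 0.10, 0.05 and 0.01. The sketch below assumes that pycafee uses these same coefficients; this is an assumption, although it is consistent with the critical value 0.136 reported for n = 100 and ɑ = 0.05 in the Examples. The function name `approx_critical_value` is hypothetical.

```python
import math

# Asymptotic coefficients c(alfa) for n > 35, as listed by Massey (1951).
# Assumption: pycafee's approximation uses these same coefficients.
_MASSEY_COEFFICIENTS = {
    0.20: 1.07,
    0.15: 1.14,
    0.10: 1.22,
    0.05: 1.36,
    0.01: 1.63,
}


def approx_critical_value(n, alfa=0.05):
    """Hypothetical helper: approximate KS critical value c(alfa) / sqrt(n).

    Returns None for significance levels that are not tabulated,
    mirroring the behaviour described for the `critical` return value.
    """
    if n <= 35:
        raise ValueError("Use the tabulated values for n <= 35.")
    coefficient = _MASSEY_COEFFICIENTS.get(alfa)
    if coefficient is None:
        return None
    return coefficient / math.sqrt(n)


print(round(approx_critical_value(100, 0.05), 3))  # 0.136
print(approx_critical_value(100, 0.02))            # None
```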

The Kolmogorov-Smirnov normality test has the following hypotheses:

\(H_0:\) the data come from a Normal distribution.

\(H_1:\) the data do not come from a Normal distribution.

By default (comparison = "critical"), the conclusion is based on comparing the critical value (at the ɑ significance level) with the test statistic:

if critical >= statistic:
    conclusion = "Data is Normal"
else:
    conclusion = "Data is not Normal"

The other option (comparison = "p-value") reaches the conclusion by comparing the p-value with ɑ:

if p_value >= alfa:
    conclusion = "Data is Normal"
else:
    conclusion = "Data is not Normal"

If comparison = "critical" and ɑ is not 0.01, 0.05, 0.10, 0.15 or 0.20, the function will raise ValueError.
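The two decision rules, together with the ValueError condition, can be condensed into a small function. This is a minimal sketch, not pycafee's implementation; the name `ks_decision` is hypothetical. It returns 1 when \(H_0\) is rejected and 0 otherwise, following the convention described for details="binary".

```python
def ks_decision(statistic, critical, p_value, alfa=0.05, comparison="critical"):
    """Hypothetical helper: 1 if H0 (normality) is rejected, else 0."""
    if comparison in (None, "critical"):
        # Critical values are only tabulated for these significance levels.
        if alfa not in (0.01, 0.05, 0.10, 0.15, 0.20):
            raise ValueError("alfa must be 0.01, 0.05, 0.10, 0.15 or 0.20")
        return 0 if critical >= statistic else 1
    if comparison == "p-value":
        return 0 if p_value >= alfa else 1
    raise ValueError('comparison must be "critical", "p-value" or None')


# Values from the non-Normal example: both comparisons reject H0.
print(ks_decision(0.3072356484569813, 0.301, 0.04334566682403149))  # 1
print(ks_decision(0.3072356484569813, 0.301, 0.04334566682403149,
                  comparison="p-value"))  # 1
```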

References

[1]

SCIPY. scipy.stats.kstest. Available at: www.scipy.org. Accessed on: 10 May 2022.

[2]

MASSEY JR., F. J. The Kolmogorov-Smirnov Test for Goodness of Fit. Journal of the American Statistical Association, v. 46, n. 253, p. 68–78, 1951. DOI: 10.2307/2280095.

Examples

Applying the test with default values

>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x)
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.05177647360597687, Critical=0.136, p_value=0.9514328623966609, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.

Applying the test using the p-value to make the conclusion

>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x, comparison='p-value')
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.05177647360597687, Critical=0.136, p_value=0.9514328623966609, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.

Applying the test at a 1% significance level

>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import numpy as np
>>> x = np.array([1.90642, 2.22488, 2.10288, 1.69742, 1.52229, 3.15435, 2.61826, 1.98492, 1.42738, 1.99568])
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x, alfa=0.01)
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.17709753067016487, Critical=0.49, p_value=0.9123891112746063, Alpha=0.01)
>>> print(conclusion)
Data is Normal at a 99.0% of confidence level.

Applying the test with a detailed conclusion

>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import numpy as np
>>> x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x, alfa=0.10, details="full")
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.15459867079959644, Critical=0.368, p_value=0.9706128123504146, Alpha=0.1)
>>> print(conclusion)
Since the critical value (0.368) >= statistic (0.154), we have NO evidence to reject the hypothesis of data normality, according to the Kolmogorov Smirnov test at a 90.0% of confidence level.

Applying the test to non-Normal data

>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import numpy as np
>>> x = np.array([0.8, 1, 1.1, 1.15, 1.15, 1.2, 1.2, 1.2, 1.2, 1.6, 1.8, 2, 2.2, 3, 5, 8.2, 8.4, 8.6, 9])
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x, alfa=0.05, comparison="p-value", details="full")
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.3072356484569813, Critical=0.301, p_value=0.04334566682403149, Alpha=0.05)
>>> print(conclusion)
Since p-value (0.043) < alpha (0.05), we HAVE evidence to reject the hypothesis of data normality, according to the Kolmogorov Smirnov test at a 95.0% of confidence level.