fit
- pycafee.normalitycheck.kolmogorovsmirnov.KolmogorovSmirnov.fit(self, x_exp, alfa=None, comparison=None, details=None)
This function is a wrapper around scipy.stats.kstest() [1] that performs the Kolmogorov-Smirnov normality test with some added conveniences.
The main difference from the original function is that this wrapper only compares a sample against the Normal distribution, using cdf="norm", with the data being treated as sample data, e.g.:
>>> scipy.stats.kstest(x_exp, cdf='norm', args=(x_exp.mean(), x_exp.std(ddof=1)), N=x_exp.size)
Thus, the results obtained here are reliable only for sample data.
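To make the wrapped call concrete, here is a minimal standard-library sketch of the two-sided statistic such a call computes when the Normal parameters are estimated from the sample (mean and standard deviation with ddof=1). The helper name ks_normal_statistic is hypothetical; it is not part of pycafee or SciPy, and it leaves out the p-value entirely.

```python
from statistics import NormalDist, mean, stdev


def ks_normal_statistic(sample):
    """Two-sided KS statistic of `sample` against a fitted Normal distribution.

    Hypothetical helper for illustration only (not a pycafee function).
    """
    data = sorted(sample)
    n = len(data)
    # Parameters are estimated from the data itself; stdev() divides by n - 1,
    # which corresponds to ddof=1 in the NumPy call shown above.
    fitted = NormalDist(mu=mean(data), sigma=stdev(data))
    # Largest gap where the empirical CDF lies above the fitted CDF ...
    d_plus = max((i + 1) / n - fitted.cdf(x) for i, x in enumerate(data))
    # ... and where it lies below it.
    d_minus = max(fitted.cdf(x) - i / n for i, x in enumerate(data))
    return max(d_plus, d_minus)


x = [1.90642, 2.22488, 2.10288, 1.69742, 1.52229,
     3.15435, 2.61826, 1.98492, 1.42738, 1.99568]
print(round(ks_normal_statistic(x), 4))
```

For this sample the sketch gives roughly 0.1771, in line with the Statistic reported for the same data in the Examples section.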
- Parameters
- x_exp
numpy array. One-dimensional numpy array with at least 3 sample values.
- alfa
float, optional. The level of significance (ɑ). Default is None, which results in 0.05 (ɑ = 5%).
- comparison
str, optional. Determines how the comparison for the normality test is performed.
If comparison = "critical" (or None, i.e., the default), the comparison is made between the critical value (at the ɑ significance level) and the calculated value of the test statistic.
If comparison = "p-value", the comparison is made between the p-value and the adopted significance level (ɑ).
Both approaches should lead to the same conclusion.
- details
str, optional. Determines the amount of information presented about the hypothesis test.
If details = "short" (or None, i.e., the default), a simplified version of the test result is returned.
If details = "full", a detailed version of the hypothesis test result is returned.
If details = "binary", the conclusion will be 1 (\(H_0\) is rejected) or 0 (\(H_0\) is not rejected).
- Returns
- result
tuple with
- statistic
float. The test statistic.
- critical
float or None. The critical value for alpha equal to 1%, 5%, 10%, 15% or 20%. Other values will return None.
- p_value
float. The p-value for the hypothesis test.
- conclusion
str. The test conclusion (e.g., Normal / not Normal).
Notes
The tabulated values [2] cover sample sizes between 2 and 35, for ɑ equal to 1%, 5%, 10%, 15% or 20%. For samples larger than 35, the critical value returned is an approximation.
The Kolmogorov-Smirnov normality test has the following hypotheses:
\(H_0:\) data comes from Normal distribution.
\(H_1:\) data does not come from Normal distribution.
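As a side note on the critical values: these notes state that for samples larger than 35 the returned critical value is an approximation, without giving the formula. One common choice, assumed here rather than taken from the pycafee source, is the asymptotic row of Massey's table [2], where the critical value is a coefficient divided by the square root of the sample size. The function name and the dictionary below are hypothetical.

```python
import math

# Asymptotic coefficients for n > 35 from Massey's table [2].
# Assumption: pycafee may compute its approximation differently.
ASYMPTOTIC_COEFFICIENTS = {
    0.01: 1.63,
    0.05: 1.36,
    0.10: 1.22,
    0.15: 1.14,
    0.20: 1.07,
}


def approximate_critical_value(n, alfa=0.05):
    """Approximate KS critical value for a sample of size n > 35."""
    if n <= 35:
        raise ValueError("use the tabulated values for n <= 35")
    coefficient = ASYMPTOTIC_COEFFICIENTS.get(alfa)
    if coefficient is None:
        # No coefficient for this significance level.
        return None
    return coefficient / math.sqrt(n)


print(round(approximate_critical_value(100, 0.05), 3))  # 0.136
```

The value 0.136 for n = 100 at ɑ = 0.05 agrees with the Critical field shown in the first example of the Examples section, which is consistent with (but does not prove) this being the approximation in use.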
By default (comparison = "critical"), the conclusion is based on the comparison between the critical value (at the ɑ significance level) and the test statistic:
if critical >= statistic:
    Data is Normal
else:
    Data is not Normal
The other option (comparison = "p-value") bases the conclusion on the comparison between the p-value and ɑ:
if p-value >= ɑ:
    Data is Normal
else:
    Data is not Normal
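The two decision rules can be written as a small runnable sketch. The function decide and the constant SUPPORTED_ALFA are hypothetical names used only for illustration; the way the ValueError is raised follows the behaviour these notes describe, not the actual pycafee implementation.

```python
SUPPORTED_ALFA = (0.01, 0.05, 0.10, 0.15, 0.20)


def decide(statistic, p_value, critical=None, alfa=0.05, comparison="critical"):
    """Return True when the hypothesis of normality is not rejected."""
    if comparison == "critical":
        # A critical value is only tabulated for the supported alfa levels.
        if alfa not in SUPPORTED_ALFA or critical is None:
            raise ValueError("no critical value available for this alfa")
        return critical >= statistic
    if comparison == "p-value":
        return p_value >= alfa
    raise ValueError(f"unknown comparison: {comparison!r}")


# Rounded values taken from the Examples section (non-Normal data, n = 19).
print(decide(0.3072, 0.0433, critical=0.301))        # False
print(decide(0.3072, 0.0433, comparison="p-value"))  # False
# Rounded values from the 1% example (n = 10).
print(decide(0.1771, 0.9124, critical=0.49, alfa=0.01))  # True
```

In both non-Normal calls the two comparison modes agree, which is what the parameter description leads one to expect.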
If comparison = "critical" and ɑ is not 0.01, 0.05, 0.10, 0.15 or 0.20, the function will raise ValueError.
References
- 1
SCIPY. scipy.stats.kstest. Available at: www.scipy.org. Accessed on: 10 May 2022.
- 2
MASSEY, F. J., Jr. The Kolmogorov-Smirnov Test for Goodness of Fit. Journal of the American Statistical Association, v. 46, n. 253, p. 68–78, 1951. DOI: 10.2307/2280095.
Examples
Applying the test with default values
>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x)
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.05177647360597687, Critical=0.136, p_value=0.9514328623966609, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.
Applying the test using the p-value to make the conclusion
>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x, comparison='p-value')
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.05177647360597687, Critical=0.136, p_value=0.9514328623966609, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.
Applying the test at a 1% significance level
>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import numpy as np
>>> x = np.array([1.90642, 2.22488, 2.10288, 1.69742, 1.52229, 3.15435, 2.61826, 1.98492, 1.42738, 1.99568])
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x, alfa=0.01)
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.17709753067016487, Critical=0.49, p_value=0.9123891112746063, Alpha=0.01)
>>> print(conclusion)
Data is Normal at a 99.0% of confidence level.
Applying the test with a detailed conclusion
>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import numpy as np
>>> x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x, alfa=0.10, details="full")
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.15459867079959644, Critical=0.368, p_value=0.9706128123504146, Alpha=0.1)
>>> print(conclusion)
Since the critical value (0.368) >= statistic (0.154), we have NO evidence to reject the hypothesis of data normality, according to the Kolmogorov Smirnov test at a 90.0% of confidence level.
Applying the test to non-Normal data
>>> from pycafee.normalitycheck.kolmogorovsmirnov import KolmogorovSmirnov
>>> import numpy as np
>>> x = np.array([0.8, 1, 1.1, 1.15, 1.15, 1.2, 1.2, 1.2, 1.2, 1.6, 1.8, 2, 2.2, 3, 5, 8.2, 8.4, 8.6, 9])
>>> ks_test = KolmogorovSmirnov()
>>> result, conclusion = ks_test.fit(x, alfa=0.05, comparison="p-value", details="full")
>>> print(result)
KolmogorovSmirnovResult(Statistic=0.3072356484569813, Critical=0.301, p_value=0.04334566682403149, Alpha=0.05)
>>> print(conclusion)
Since p-value (0.043) < alpha (0.05), we HAVE evidence to reject the hypothesis of data normality, according to the Kolmogorov Smirnov test at a 95.0% of confidence level.