fit

pycafee.normalitycheck.lilliefors.Lilliefors.fit(self, x_exp, alfa=None, comparison=None, details=None)

This function is a wrapper around statsmodels.stats.diagnostic.lilliefors() [1] that performs the Lilliefors Normality test with some added conveniences.

The main difference between this method and the original one is that this wrapper only compares a sample with the Normal distribution, using dist="norm". Also, the p-value is always estimated from the table, using pvalmethod="table". Hence:

>>> statsmodels.stats.diagnostic.lilliefors(x_exp, dist="norm", pvalmethod="table")
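
As a rough illustration of what the wrapped test measures, the statistic is the largest gap between the empirical distribution of the sample and a Normal distribution whose mean and standard deviation are estimated from that same sample. The sketch below uses only the Python standard library. The name `lilliefors_statistic` is a hypothetical helper and is not part of pycafee or statsmodels, and the sketch does not estimate the p-value.

```python
from statistics import NormalDist, mean, stdev


def lilliefors_statistic(sample):
    """Hypothetical helper: Kolmogorov-Smirnov distance to a fitted Normal."""
    data = sorted(sample)
    n = len(data)
    # The Normal parameters are estimated from the sample itself
    # (stdev uses the n - 1 denominator).
    fitted = NormalDist(mean(data), stdev(data))
    cdf = [fitted.cdf(value) for value in data]
    # Largest gap above and below the empirical step function.
    d_plus = max((i + 1) / n - f for i, f in enumerate(cdf))
    d_minus = max(f - i / n for i, f in enumerate(cdf))
    return max(d_plus, d_minus)


x = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9]
print(round(lilliefors_statistic(x), 4))
```

For this sample, the printed value should agree with the Statistic reported in the "detailed conclusion" example on this page (about 0.1546).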
Parameters
x_exp : numpy array

One-dimensional numpy array with at least 4 observations.

alfa : float, optional

The level of significance (ɑ). Default is None, which results in 0.05 (ɑ = 5%).

comparison : str, optional

This parameter determines how the conclusion of the Normality test is reached.

  • If comparison = "critical" (or None), the comparison test is performed by comparing the critical value (with ɑ significance level) with the test statistic.

  • If comparison = "p-value", the comparison test is performed by comparing the p-value with the adopted significance level (ɑ).

Both approaches should lead to the same conclusion.

details : str, optional

The details parameter determines the amount of information presented about the hypothesis test.

  • If details = "short" (or None), a simplified version of the test result is returned.

  • If details = "full", a detailed version of the hypothesis test result is returned.

  • If details = "binary", the conclusion will be 1 (\(H_0\) is rejected) or 0 (\(H_0\) is not rejected).

Returns
result : tuple with
statistic : float

The test statistic.

critical : float or None

The tabulated value for ɑ equal to 1%, 5%, 10%, 15% or 20%. Other values will return None.

p_value : float

The p-value for the hypothesis test.

conclusion : str or int

The test conclusion (e.g., Normal / not Normal).

Notes

The critical values [2] cover sample sizes from 4 to 20, plus the values for 25 and 30 observations, for ɑ equal to 1%, 5%, 10%, 15% or 20%.

  • For data with sample size between 21 and 24 (20 < n_rep < 25), the critical value returned is the value for 25 observations;

  • For data with sample size between 26 and 29 (25 < n_rep < 30), the critical value returned is the value for 30 observations;

  • For data with a sample size higher than 30 (n_rep > 30), the critical value returned is the approximation proposed by the authors.
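
The row selection described above can be sketched as follows. This is only an illustration of the rule as stated in these notes. The name `critical_value_row` is hypothetical and is not part of pycafee, and the large-sample approximation itself is deliberately left out.

```python
def critical_value_row(n_rep):
    """Hypothetical helper: which tabulated sample size applies to n_rep.

    Returns None when n_rep > 30, where the large-sample approximation
    is used instead of a tabulated row.
    """
    if n_rep < 4:
        raise ValueError("at least 4 observations are required")
    if n_rep <= 20:
        return n_rep  # sizes 4 to 20 have their own row
    if n_rep <= 25:
        return 25     # sizes 21 to 24 borrow the row for 25
    if n_rep <= 30:
        return 30     # sizes 26 to 29 borrow the row for 30
    return None       # no tabulated row: approximation applies


for size in (10, 22, 25, 27, 31):
    print(size, critical_value_row(size))
```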

The Lilliefors Normality test evaluates the following hypotheses:

\(H_0:\) the data come from a Normal distribution.

\(H_1:\) the data do not come from a Normal distribution.

By default (comparison="critical"), the conclusion is based on the comparison between the critical value (at the ɑ significance level) and the test statistic. In summary:

if critical >= statistic:
    Data is Normal
else:
    Data is not Normal

The other option (comparison="p-value") bases the conclusion on the comparison between the p-value and ɑ:

if p-value >= ɑ:
    Data is Normal
else:
    Data is not Normal

If comparison="critical" and ɑ is not 0.01, 0.05, 0.10, 0.15 or 0.20, the function raises a ValueError, since no critical value is tabulated for other significance levels.
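
The two decision rules and the ValueError condition can be put together in a minimal sketch. The name `is_normal` and the tuple `SUPPORTED_ALFA` are hypothetical and are not part of pycafee, and the numeric inputs in the usage lines are rounded from the examples on this page.

```python
SUPPORTED_ALFA = (0.01, 0.05, 0.10, 0.15, 0.20)


def is_normal(statistic, critical, p_value, alfa=0.05, comparison="critical"):
    """Hypothetical helper: True when H0 (Normality) is not rejected."""
    if comparison == "critical":
        # A critical value is only tabulated for the supported levels.
        if alfa not in SUPPORTED_ALFA or critical is None:
            raise ValueError("no tabulated critical value for this alfa")
        return critical >= statistic
    if comparison == "p-value":
        return p_value >= alfa
    raise ValueError("comparison must be 'critical' or 'p-value'")


# Values rounded from the first and last examples on this page.
print(is_normal(0.0518, 0.0866, 0.7370))
print(is_normal(0.3072, 0.195, 0.001, comparison="p-value"))
```

With these inputs the first call reports that Normality is not rejected and the second reports that it is rejected, matching the conclusions shown in the examples.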

References

[1] STATSMODELS. statsmodels.stats.diagnostic.lilliefors. Available at: www.statsmodels.org. Accessed on: 10 May 2022.

[2] Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown. Journal of the American Statistical Association, 62(318), 399-402. DOI: 10.1080/01621459.1967.10482916.

Examples

Applying the test with default values

>>> from pycafee.normalitycheck.lilliefors import Lilliefors
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> li_test = Lilliefors()
>>> result, conclusion = li_test.fit(x)
>>> print(result)
LillieforsResult(Statistic=0.05177647360597687, Critical=0.0866, p_value=0.7370142762533124, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.

Applying the test using the p-value to make the conclusion

>>> from pycafee.normalitycheck.lilliefors import Lilliefors
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> li_test = Lilliefors()
>>> result, conclusion = li_test.fit(x, comparison='p-value')
>>> print(result)
LillieforsResult(Statistic=0.05177647360597687, Critical=0.0866, p_value=0.7370142762533124, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.

Applying the test at the 1% significance level

>>> from pycafee.normalitycheck.lilliefors import Lilliefors
>>> import numpy as np
>>> x = np.array([1.90642, 2.22488, 2.10288, 1.69742, 1.52229, 3.15435, 2.61826, 1.98492, 1.42738, 1.99568])
>>> li_test = Lilliefors()
>>> result, conclusion = li_test.fit(x, alfa=0.01)
>>> print(result)
LillieforsResult(Statistic=0.17709753067016487, Critical=0.294, p_value=0.4976450090923252, Alpha=0.01)
>>> print(conclusion)
Data is Normal at a 99.0% of confidence level.

Applying the test with a detailed conclusion

>>> from pycafee.normalitycheck.lilliefors import Lilliefors
>>> import numpy as np
>>> x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
>>> li_test = Lilliefors()
>>> result, conclusion = li_test.fit(x, alfa=0.10, details="full")
>>> print(result)
LillieforsResult(Statistic=0.15459867079959644, Critical=0.239, p_value=0.7104644322958894, Alpha=0.1)
>>> print(conclusion)
Since the critical value (0.239) >= statistic (0.154), we have NO evidence to reject the hypothesis of data normality, according to the Lilliefors test at a 90.0% of confidence level.

Applying the test to non-Normal data

>>> from pycafee.normalitycheck.lilliefors import Lilliefors
>>> import numpy as np
>>> x = np.array([0.8, 1, 1.1, 1.15, 1.15, 1.2, 1.2, 1.2, 1.2, 1.6, 1.8, 2, 2.2, 3, 5, 8.2, 8.4, 8.6, 9])
>>> li_test = Lilliefors()
>>> result, conclusion = li_test.fit(x, alfa=0.05, comparison="p-value", details="full")
>>> print(result)
LillieforsResult(Statistic=0.3072356484569813, Critical=0.195, p_value=0.0009999999999998899, Alpha=0.05)
>>> print(conclusion)
Since p-value (0.0) < alpha (0.05), we HAVE evidence to reject the hypothesis of data normality, according to the Lilliefors test at a 95.0% of confidence level.