fit

pycafee.normalitycheck.shapirowilk.ShapiroWilk.fit(self, x_exp, alfa=None, comparison=None, details=None)

This function is a wrapper around scipy.stats.shapiro() [1] that performs the Shapiro-Wilk [2] Normality test and adds some conveniences, such as a comparison against tabulated critical values and a written conclusion.

The test is performed using:

>>> scipy.stats.shapiro(x_exp)
Parameters
x_exp : numpy array

One-dimensional numpy array with at least 3 observations.

alfa : float, optional

The significance level (ɑ). Default is None, which results in 0.05 (ɑ = 5%).

comparison : str, optional

This parameter determines which comparison is used to reach the conclusion of the Normality test.

  • If comparison = "critical" (or None, i.e., the default), the calculated test statistic is compared with the critical value at the ɑ significance level.

  • If comparison = "p-value", the p-value is compared with the adopted significance level (ɑ).

Both comparisons should lead to the same conclusion.

details : str, optional

The details parameter determines how much information about the hypothesis test is returned.

  • If details = "short" (or None, i.e., the default), a simplified version of the test result is returned.

  • If details = "full", a detailed version of the hypothesis test result is returned.

  • If details = "binary", the conclusion will be 1 (\(H_0\) is rejected) or 0 (\(H_0\) is not rejected).

Returns
result : tuple with
statistic : float

The test statistic.

critical : float or None

The critical value for ɑ equal to 1%, 2%, 5%, 10% or 50%. For other values of ɑ, None is returned.

p_value : float

The p-value for the hypothesis test.

conclusion : str

The test conclusion (e.g., Normal / not Normal).

Notes

The tabulated values [2] cover sample sizes between 3 and 50, for ɑ equal to 1%, 2%, 5%, 10% or 50%. For samples with more than 50 observations, the critical value returned is the one for n_rep = 50.
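
The fallback for large samples can be sketched as follows. This is only an illustration: lookup_critical and PARTIAL_TABLE are hypothetical names, not part of pycafee, and the table holds only the four critical values that appear in the Examples section of this page rather than the full table from [2].

```python
# Partial excerpt of the tabulated critical values, keyed by (n, alfa).
# Only the entries reported in the Examples of this page are included;
# the full table in Shapiro and Wilk (1965) covers n = 3..50.
PARTIAL_TABLE = {
    (10, 0.01): 0.781,
    (10, 0.10): 0.869,
    (19, 0.05): 0.901,
    (50, 0.05): 0.947,
}

MAX_TABULATED_N = 50


def lookup_critical(n, alfa):
    """Hypothetical helper: return the tabulated critical value, or None.

    Sample sizes above 50 fall back to the row for n = 50. None is
    returned when the (n, alfa) pair is missing from this partial excerpt.
    """
    if n < 3:
        raise ValueError("The Shapiro-Wilk test needs at least 3 observations.")
    effective_n = min(n, MAX_TABULATED_N)
    return PARTIAL_TABLE.get((effective_n, alfa))


# A sample with 100 observations uses the critical value tabulated for n = 50.
critical_n100 = lookup_critical(100, 0.05)
critical_n50 = lookup_critical(50, 0.05)
```

This agrees with the first example on this page, where a sample of size 100 tested at ɑ = 0.05 reports Critical=0.947.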

The Shapiro-Wilk Normality test evaluates the following hypotheses:

\(H_0:\) data comes from a Normal distribution.

\(H_1:\) data does not come from a Normal distribution.

By default (comparison = "critical"), the conclusion is based on the comparison between the critical value (at the ɑ significance level) and the test statistic:

if critical <= statistic:
    conclusion = "Data is Normal"
else:
    conclusion = "Data is not Normal"

Note that this comparison runs in the opposite direction to the one used in most Normality tests: here, Normality is rejected when the test statistic is smaller than the critical value.

The other option (comparison = "p-value") bases the conclusion on the comparison between the p-value and ɑ:

if p_value >= alfa:
    conclusion = "Data is Normal"
else:
    conclusion = "Data is not Normal"
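
Both decision rules can be written as small functions and checked against the figures reported in the Examples section of this page. The function names below are hypothetical and chosen only for illustration; the sketch calls neither pycafee nor SciPy, and the inputs are rounded copies of the documented outputs.

```python
def normal_by_critical(statistic, critical):
    """Return True when H0 (Normality) is not rejected by the critical-value rule."""
    # For Shapiro-Wilk, small values of the statistic indicate a departure
    # from Normality, so H0 is kept when the statistic is at or above
    # the critical value.
    return critical <= statistic


def normal_by_p_value(p_value, alfa):
    """Return True when H0 (Normality) is not rejected by the p-value rule."""
    return p_value >= alfa


# (statistic, critical, p_value, alfa), rounded from the Examples section.
reported = [
    (0.9899, 0.947, 0.6552, 0.05),
    (0.9267, 0.781, 0.4162, 0.01),
    (0.9698, 0.869, 0.8891, 0.10),
    (0.7013, 0.901, 5.758e-05, 0.05),
]

# Check that the two rules give the same answer for every reported case.
agreement = [
    normal_by_critical(stat, crit) == normal_by_p_value(p, alfa)
    for stat, crit, p, alfa in reported
]
conclusions = [normal_by_critical(stat, crit) for stat, crit, _, _ in reported]
```

For these four cases the two rules agree, and only the last sample is classified as not Normal.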

If comparison = "critical" and ɑ is not 0.01, 0.02, 0.05, 0.10 or 0.50, the function will raise ValueError.
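
A minimal sketch of that validation is shown below. check_alfa and TABULATED_ALFAS are hypothetical names, and the error message is invented for illustration; the actual check inside pycafee may be implemented differently.

```python
# Significance levels for which a critical value is tabulated (see above).
TABULATED_ALFAS = (0.01, 0.02, 0.05, 0.10, 0.50)


def check_alfa(alfa=None, comparison=None):
    """Hypothetical validation mirroring the documented behaviour of fit()."""
    # None falls back to the documented defaults.
    alfa = 0.05 if alfa is None else alfa
    comparison = "critical" if comparison is None else comparison
    # The critical-value comparison is only possible for tabulated levels.
    if comparison == "critical" and alfa not in TABULATED_ALFAS:
        raise ValueError(
            f"No tabulated critical value for alfa = {alfa}; "
            f"use one of {TABULATED_ALFAS} or comparison='p-value'."
        )
    return alfa, comparison


# alfa = 0.03 is not tabulated, so the default comparison is rejected.
try:
    check_alfa(alfa=0.03)
    raised = False
except ValueError:
    raised = True
```

Under this sketch, the same ɑ = 0.03 passes when comparison = "p-value", because no critical value is needed for that comparison.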

References

[1] SCIPY. scipy.stats.shapiro. Available at: www.scipy.org. Accessed on: 10 May 2022.

[2] SHAPIRO, S. S.; WILK, M. B. An Analysis of Variance Test for Normality (Complete Samples). Biometrika, v. 52, n. 3, p. 591–611, 1965. DOI: 10.2307/2333709.

Examples

Applying the test with default values

>>> from pycafee.normalitycheck.shapirowilk import ShapiroWilk
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> sw_test = ShapiroWilk()
>>> result, conclusion = sw_test.fit(x)
>>> print(result)
ShapiroWilkResult(Statistic=0.9898831844329834, Critical=0.947, p_value=0.6551515460014343, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.

Applying the test using the p-value to make the conclusion

>>> from pycafee.normalitycheck.shapirowilk import ShapiroWilk
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> sw_test = ShapiroWilk()
>>> result, conclusion = sw_test.fit(x, comparison='p-value')
>>> print(result)
ShapiroWilkResult(Statistic=0.9898831844329834, Critical=0.947, p_value=0.6551515460014343, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.

Applying the test at the 1% significance level

>>> from pycafee.normalitycheck.shapirowilk import ShapiroWilk
>>> import numpy as np
>>> x = np.array([1.90642, 2.22488, 2.10288, 1.69742, 1.52229, 3.15435, 2.61826, 1.98492, 1.42738, 1.99568])
>>> sw_test = ShapiroWilk()
>>> result, conclusion = sw_test.fit(x, alfa=0.01)
>>> print(result)
ShapiroWilkResult(Statistic=0.9266945719718933, Critical=0.781, p_value=0.41617822647094727, Alpha=0.01)
>>> print(conclusion)
Data is Normal at a 99.0% of confidence level.

Applying the test with a detailed conclusion

>>> from pycafee.normalitycheck.shapirowilk import ShapiroWilk
>>> import numpy as np
>>> x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
>>> sw_test = ShapiroWilk()
>>> result, conclusion = sw_test.fit(x, alfa=0.10, details="full")
>>> print(result)
ShapiroWilkResult(Statistic=0.9698116779327393, Critical=0.869, p_value=0.8890941739082336, Alpha=0.1)
>>> print(conclusion)
Since the critical value (0.869) >= statistic (0.969), we have NO evidence to reject the hypothesis of data normality, according to the Shapiro Wilk test at a 90.0% of confidence level.

Applying the test to data that is not Normal

>>> from pycafee.normalitycheck.shapirowilk import ShapiroWilk
>>> import numpy as np
>>> x = np.array([0.8, 1, 1.1, 1.15, 1.15, 1.2, 1.2, 1.2, 1.2, 1.6, 1.8, 2, 2.2, 3, 5, 8.2, 8.4, 8.6, 9])
>>> sw_test = ShapiroWilk()
>>> result, conclusion = sw_test.fit(x, alfa=0.05, comparison="p-value", details="full")
>>> print(result)
ShapiroWilkResult(Statistic=0.7012777924537659, Critical=0.901, p_value=5.757619874202646e-05, Alpha=0.05)
>>> print(conclusion)
Since p-value (0.0) < alpha (0.05), we HAVE evidence to reject the hypothesis of data normality, according to the Shapiro Wilk test at a 95.0% of confidence level.