fit
- pycafee.normalitycheck.shapirowilk.ShapiroWilk.fit(self, x_exp, alfa=None, comparison=None, details=None)
This function is a wrapper around scipy.stats.shapiro() [1] that performs the Shapiro-Wilk [2] normality test, with some added conveniences. The test is performed using:
>>> scipy.stats.shapiro(x_exp)
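For reference, the underlying SciPy call can be run on its own. This minimal sketch reuses the sample from the "detailed conclusion" example on this page. `scipy.stats.shapiro` returns only the W statistic and the p-value; the tabulated critical value and the textual conclusion are what `fit` adds on top.

```python
import numpy as np
from scipy import stats

# Sample taken from the "detailed conclusion" example on this page
x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])

# scipy.stats.shapiro returns a (statistic, pvalue) result that unpacks like a tuple
statistic, p_value = stats.shapiro(x)

# W lies in (0, 1]; values close to 1 are consistent with Normality
print(f"W = {statistic:.4f}, p-value = {p_value:.4f}")
```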
- Parameters
- x_exp
numpy array. A one-dimensional numpy array with at least 3 sample values.
- alfa
float, optional. The level of significance (ɑ). Default is None, which results in 0.05 (ɑ = 5%).
- comparison
str, optional. Determines how the comparison for the Normality test is performed.
If comparison = "critical" (or None, i.e., the default), the critical value (at the ɑ significance level) is compared with the calculated value of the test statistic. If comparison = "p-value", the p-value is compared with the adopted significance level (ɑ).
Both approaches should lead to the same conclusion.
- details
str, optional. Determines the amount of information presented about the hypothesis test.
If details = "short" (or None, i.e., the default), a simplified version of the test result is returned. If details = "full", a detailed version of the hypothesis test result is returned. If details = "binary", the conclusion is 1 (\(H_0\) is rejected) or 0 (\(H_0\) is not rejected).
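The parameter defaults above can be summarized in a small helper. This is a hypothetical sketch (`resolve_fit_args` is an illustrative name, not pycafee's source code): it maps None to the documented defaults and checks the documented input requirement. Rejecting bad input or unknown option strings with ValueError is an assumption of this sketch.

```python
import numpy as np


def resolve_fit_args(x_exp, alfa=None, comparison=None, details=None):
    """Hypothetical helper that applies the documented defaults of fit().

    Illustration only, not pycafee's source code.
    """
    x_exp = np.asarray(x_exp, dtype=float)
    # Documented requirement: one-dimensional, at least 3 sample values
    # (raising ValueError here is this sketch's choice)
    if x_exp.ndim != 1 or x_exp.size < 3:
        raise ValueError("x_exp must be a 1-D array with at least 3 values")
    # None maps to the documented defaults
    alfa = 0.05 if alfa is None else alfa
    comparison = "critical" if comparison is None else comparison
    details = "short" if details is None else details
    # Assumption: unknown option strings are rejected with ValueError
    if comparison not in ("critical", "p-value"):
        raise ValueError("comparison must be 'critical' or 'p-value'")
    if details not in ("short", "full", "binary"):
        raise ValueError("details must be 'short', 'full' or 'binary'")
    return x_exp, alfa, comparison, details


x, alfa, comparison, details = resolve_fit_args([1.90642, 2.22488, 2.10288])
print(alfa, comparison, details)  # 0.05 critical short
```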
- Returns
- result
tuple with the following fields (the printed result also shows Alpha, the significance level used):
- statistic
float. The test statistic.
- critical
float or None. The critical value for alpha equal to 1%, 2%, 5%, 10% or 50%. Other values return None.
- p_value
float. The p-value for the hypothesis test.
- conclusion
str. The test conclusion (e.g., Normal / not Normal). With details = "binary", it is 1 or 0 instead.
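The printed output in the Examples section suggests that `result` behaves like a named tuple with fields `Statistic`, `Critical`, `p_value` and `Alpha`. The sketch below mimics that shape with `collections.namedtuple` to show how the two return values can be consumed; it is a stand-in built from the printed output, not pycafee's own class.

```python
from collections import namedtuple

# Stand-in for pycafee's result type, built from the field names in the
# printed output of the Examples section; the real class may differ.
ShapiroWilkResult = namedtuple(
    "ShapiroWilkResult", ["Statistic", "Critical", "p_value", "Alpha"]
)

# Values printed by the first example (default alfa, comparison and details)
result = ShapiroWilkResult(
    Statistic=0.9898831844329834,
    Critical=0.947,
    p_value=0.6551515460014343,
    Alpha=0.05,
)
conclusion = "Data is Normal at a 95.0% of confidence level."

# Fields can be read by name or unpacked positionally
statistic, critical, p_value, alpha = result
print(result.p_value >= result.Alpha)  # True
print(conclusion)
```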
Notes
The tabulated values [2] cover sample sizes between 3 and 50, for ɑ equal to 1%, 2%, 5%, 10% or 50%. For data with a sample size larger than 50, the critical value returned is the value for n_rep = 50.
The Shapiro-Wilk Normality test has the following premise:
\(H_0:\) the data come from a Normal distribution.
\(H_1:\) the data do not come from a Normal distribution.
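The critical-value lookup described in these Notes can be sketched as a dictionary keyed by (sample size, ɑ), with the sample size capped at 50. The table below is only an excerpt: it holds just the entries that appear in the Examples on this page, so it returns None for many combinations the full table [2] does cover. The names `W_TABLE` and `critical_value` are illustrative, not pycafee's.

```python
# Excerpt of the Shapiro-Wilk (1965) table of W percentage points,
# limited to the entries printed in the Examples section of this page.
W_TABLE = {
    (10, 0.01): 0.781,
    (10, 0.10): 0.869,
    (19, 0.05): 0.901,
    (50, 0.05): 0.947,
}


def critical_value(n, alfa):
    """Return the tabulated W critical value, or None if it is not tabulated."""
    if n < 3:
        raise ValueError("the table starts at n = 3")
    n_rep = min(n, 50)  # samples larger than 50 reuse the n = 50 row
    return W_TABLE.get((n_rep, alfa))


print(critical_value(100, 0.05))  # 0.947 (n capped at 50)
print(critical_value(10, 0.20))   # None (level not tabulated)
```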
By default (comparison = "critical"), the conclusion is based on the comparison between the critical value (at the ɑ significance level) and the statistic of the test:
    if critical <= statistic:
        Data is Normal
    else:
        Data is not Normal
Note that this comparison runs in the opposite direction to most Normality tests: for Shapiro-Wilk, small values of the statistic indicate departure from Normality, so \(H_0\) is rejected when the statistic falls below the critical value.
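A minimal sketch of the critical-value rule, applied to (statistic, critical) pairs printed in the Examples section. The function name `is_normal_critical` is illustrative and not part of pycafee.

```python
def is_normal_critical(statistic, critical):
    """Critical-value rule: Normal when critical <= statistic."""
    # Small W is evidence against Normality, so H0 is rejected only in the
    # lower tail, i.e. when the statistic falls below the critical value.
    return critical <= statistic


# (statistic, critical) pairs taken from the Examples section
print(is_normal_critical(0.9698116779327393, 0.869))  # True  -> Normal
print(is_normal_critical(0.7012777924537659, 0.901))  # False -> not Normal
```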
The other option (comparison = "p-value") bases the conclusion on comparing the p-value with ɑ:
    if p-value >= ɑ:
        Data is Normal
    else:
        Data is not Normal
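The p-value rule can be sketched the same way. The loop below is a consistency check, not library code: it applies both rules to the values printed in the Examples section and shows that they agree there, as the description of the comparison parameter says they should. The name `is_normal_p_value` is illustrative.

```python
def is_normal_p_value(p_value, alfa):
    """p-value rule: Normal when p-value >= alfa."""
    return p_value >= alfa


# (statistic, critical, p_value, alfa) taken from the Examples section
examples = [
    (0.9898831844329834, 0.947, 0.6551515460014343, 0.05),
    (0.9266945719718933, 0.781, 0.41617822647094727, 0.01),
    (0.9698116779327393, 0.869, 0.8890941739082336, 0.10),
    (0.7012777924537659, 0.901, 5.757619874202646e-05, 0.05),
]

for statistic, critical, p_value, alfa in examples:
    by_p = is_normal_p_value(p_value, alfa)
    by_critical = critical <= statistic  # critical-value rule, for comparison
    print(f"p-value rule: {by_p}, critical rule: {by_critical}")
```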
If comparison = "critical" and ɑ is not 0.01, 0.02, 0.05, 0.10 or 0.50, the function raises ValueError.
References
- [1] SCIPY. scipy.stats.shapiro. Available at: www.scipy.org. Accessed on: 10 May 2022.
- [2] SHAPIRO, S. S.; WILK, M. B. An Analysis of Variance Test for Normality (Complete Samples). Biometrika, v. 52, n. 3, p. 591–611, 1965. DOI: 10.2307/2333709.
Examples
Applying the test with default values
>>> from pycafee.normalitycheck.shapirowilk import ShapiroWilk
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> sw_test = ShapiroWilk()
>>> result, conclusion = sw_test.fit(x)
>>> print(result)
ShapiroWilkResult(Statistic=0.9898831844329834, Critical=0.947, p_value=0.6551515460014343, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.
Applying the test using the p-value to make the conclusion
>>> from pycafee.normalitycheck.shapirowilk import ShapiroWilk
>>> import scipy.stats as stats
>>> x = stats.norm.rvs(loc=5, scale=3, size=100, random_state=42)
>>> sw_test = ShapiroWilk()
>>> result, conclusion = sw_test.fit(x, comparison="p-value")
>>> print(result)
ShapiroWilkResult(Statistic=0.9898831844329834, Critical=0.947, p_value=0.6551515460014343, Alpha=0.05)
>>> print(conclusion)
Data is Normal at a 95.0% of confidence level.
Applying the test at the 1% significance level
>>> from pycafee.normalitycheck.shapirowilk import ShapiroWilk
>>> import numpy as np
>>> x = np.array([1.90642, 2.22488, 2.10288, 1.69742, 1.52229, 3.15435, 2.61826, 1.98492, 1.42738, 1.99568])
>>> sw_test = ShapiroWilk()
>>> result, conclusion = sw_test.fit(x, alfa=0.01)
>>> print(result)
ShapiroWilkResult(Statistic=0.9266945719718933, Critical=0.781, p_value=0.41617822647094727, Alpha=0.01)
>>> print(conclusion)
Data is Normal at a 99.0% of confidence level.
Applying the test with a detailed conclusion
>>> from pycafee.normalitycheck.shapirowilk import ShapiroWilk
>>> import numpy as np
>>> x = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9])
>>> sw_test = ShapiroWilk()
>>> result, conclusion = sw_test.fit(x, alfa=0.10, details="full")
>>> print(result)
ShapiroWilkResult(Statistic=0.9698116779327393, Critical=0.869, p_value=0.8890941739082336, Alpha=0.1)
>>> print(conclusion)
Since the critical value (0.869) >= statistic (0.969), we have NO evidence to reject the hypothesis of data normality, according to the Shapiro Wilk test at a 90.0% of confidence level.
Applying the test to non-Normal data
>>> from pycafee.normalitycheck.shapirowilk import ShapiroWilk
>>> import numpy as np
>>> x = np.array([0.8, 1, 1.1, 1.15, 1.15, 1.2, 1.2, 1.2, 1.2, 1.6, 1.8, 2, 2.2, 3, 5, 8.2, 8.4, 8.6, 9])
>>> sw_test = ShapiroWilk()
>>> result, conclusion = sw_test.fit(x, alfa=0.05, comparison="p-value", details="full")
>>> print(result)
ShapiroWilkResult(Statistic=0.7012777924537659, Critical=0.901, p_value=5.757619874202646e-05, Alpha=0.05)
>>> print(conclusion)
Since p-value (0.0) < alpha (0.05), we HAVE evidence to reject the hypothesis of data normality, according to the Shapiro Wilk test at a 95.0% of confidence level.