How to Interpret Statistical Analysis Results
Written by: STATISTICA News
Statistical tests examine a variety of relationships in data, but they share some common elements. A typical test states a null and alternative hypothesis, calculates a test statistic, and reports an associated p-value, from which the analyst draws a conclusion. This process is followed for simple tests as well as complex ones, so once you have a basic understanding of statistical hypothesis testing, the concepts generalize to all tests.
Stating the Hypothesis
Statistical tests start with a null and alternative hypothesis. These hypotheses are statements about the population from which the sample was drawn. The sample data are used to support either the null or alternative hypothesis. A given test has one or more standard null and alternative hypotheses. For example, a one sample t-test has three possible hypotheses:

H0 : μ = μ0 vs. Ha : μ ≠ μ0 (1)

H0 : μ = μ0 vs. Ha : μ > μ0 (2)

H0 : μ = μ0 vs. Ha : μ < μ0 (3)
where μ represents the population mean and μ0 is the hypothesized mean. The first (1) is a two-sided alternative, where the researcher is looking for any significant difference between the population mean and the hypothesized mean. The second and third are one-sided alternatives, where the researcher hypothesizes that the true mean is either greater than (2) or less than (3) the hypothesized mean.
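The one sample t-test and its three alternatives can be sketched in Python with SciPy, which selects the alternative hypothesis via the `alternative` parameter of `ttest_1samp` (available since SciPy 1.6). The sample data and hypothesized mean μ0 = 50 below are made-up illustration values, not from the article:

```python
# Sketch of the three one-sample t-test hypotheses using SciPy.
# Data and mu0 are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=52, scale=5, size=30)  # sample drawn near 52
mu0 = 50  # hypothesized population mean

# (1) two-sided alternative: Ha: mu != mu0
t_two, p_two = stats.ttest_1samp(sample, mu0, alternative="two-sided")
# (2) one-sided alternative: Ha: mu > mu0
t_gt, p_gt = stats.ttest_1samp(sample, mu0, alternative="greater")
# (3) one-sided alternative: Ha: mu < mu0
t_lt, p_lt = stats.ttest_1samp(sample, mu0, alternative="less")

print(p_two, p_gt, p_lt)
```

Note that the two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller one-sided p-value, which is a useful sanity check on one-sided tests.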
In a test for normality of data, the null hypothesis is: H0 : X ~ N(μ,σ) versus the alternative that the data are not normally distributed.
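A normality test of this form can be sketched with the Shapiro-Wilk test in SciPy (the article does not name a specific normality test, so Shapiro-Wilk is an assumed choice here, and the data are simulated):

```python
# Sketch of a normality test (Shapiro-Wilk); H0: data are normally distributed.
# Both data sets are simulated for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_data = rng.normal(size=100)       # drawn from a normal distribution
skewed_data = rng.exponential(size=100)  # clearly non-normal

stat_n, p_n = stats.shapiro(normal_data)
stat_s, p_s = stats.shapiro(skewed_data)

# Large p-value: fail to reject normality; small p-value: reject normality.
print(p_n, p_s)
```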
Calculating the Test Statistic and p-Value
Test statistics are used to decide between the null and alternative hypotheses. Depending on the test, the statistic follows one of a variety of statistical distributions (t, F, chi-square, and others), which makes test statistics harder to interpret directly: the critical value for deciding between the null and alternative hypothesis varies by test.
A p-value is the probability of obtaining a sample data set at least as extreme as the observed data, given that the null hypothesis is true. While not technically accurate, it is much easier to think of the p-value as a measure of support for the null hypothesis. Before the analysis, a threshold is chosen, called alpha, or the level of significance. If the calculated p-value is less than the threshold, typically 0.05, then the null hypothesis is rejected in favor of the alternative. Said another way, the test is statistically significant. In STATISTICA, statistically significant p-values are reported in red.
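The decision rule itself is simple enough to write down directly. A minimal sketch, with a hypothetical helper function:

```python
# Sketch of the decision rule: compare the test's p-value to alpha.
# decide() is a hypothetical helper, not part of any library.
def decide(p_value, alpha=0.05):
    """Return the hypothesis-test decision for a given p-value."""
    if p_value < alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not significant)"

print(decide(0.03))    # p-value below alpha
print(decide(0.0527))  # p-value above alpha
```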
Conclusion and Interpretation of Results
The p-value computed by the test leads you to reject or fail to reject the null hypothesis. (When the p-value is reported in red, reject the null hypothesis.) This conclusion should then be interpreted in terms of your project. A good interpretation will not mention hypotheses or test statistics. The interpretation will simply state the conclusion in the context of the problem.
Fail to Reject H0
When a test fails to reject the null hypothesis, it means that insufficient evidence exists to support the alternative hypothesis. Some examples of this include:
- A significant difference does not exist between the population means of A and B.
- The correlation between A and B is not significantly different from 0.
- The distribution of the data is not significantly different from Normal.
- The regression parameter does not explain a significant amount of the variability in y. (The regression parameter is not significantly different from 0.)
The conclusion is not to accept the null hypothesis. The insignificant result may occur because the null hypothesis is true. It may also occur because random chance, or too small a sample, made a real difference impossible to detect.
When a test does reject the null hypothesis, it does so in favor of the alternative hypothesis. The reject H0 conclusions for the same tests given above are:
- A significant difference exists between the population means of A and B. (Or, the population mean of A is significantly greater than the population mean of B.)
- The correlation between A and B is significantly different from 0.
- The distribution of the data is significantly different from Normal.
- The regression parameter does explain a significant amount of the variability in y. (The regression parameter is significantly different from 0.)
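One of the conclusions above, that a correlation differs significantly from 0, can be sketched with Pearson's r in SciPy. The data are simulated and deliberately constructed to be correlated:

```python
# Sketch of testing whether a correlation differs significantly from 0.
# H0: the population correlation between a and b is 0. Data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(size=50)
b = 0.8 * a + rng.normal(scale=0.5, size=50)  # built to correlate with a

r, p = stats.pearsonr(a, b)
print(r, p)  # a small p-value here leads to rejecting H0
```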
In a pain relief study, researchers are studying the effects of the pain relief medicine, aspirin, compared to a placebo. Pain relief scores were recorded for two groups of people who were given either aspirin or the placebo. Greater pain relief scores indicate better pain relief. The hypothesis to test is that pain relief will be different for patients given the aspirin compared to the placebo. Let’s write these as statistical hypotheses.
H0 : μaspirin = μplacebo
Ha : μaspirin ≠ μplacebo
The null hypothesis states that average pain relief for patients given aspirin is equal to the average pain relief for patients given a placebo. The alternative hypothesis (which is what the research team believes to be true) states that the average pain relief for patients given aspirin is not equal to relief from the placebo.
Looking at the table of output, the sample mean pain relief for patients given aspirin is 59.1, which is greater than the average placebo pain relief of 56.3. These sample statistics are used to compute a test statistic to make inferences about the populations they represent. That test statistic is computed to be 2.09 and follows a Student's t distribution with N-2=16 degrees of freedom. The p-value for this test is 0.0527.
To make a conclusion, the p-value is compared to the alpha level of significance. In this case, alpha = 0.05. The p-value = 0.0527 > 0.05 = alpha. The test fails to reject the null hypothesis. The test does not show a significant difference between the mean pain relief for patients given aspirin and those given a placebo.
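The reported p-value can be recovered from the test statistic and degrees of freedom alone, using the tail probability of the t distribution. A quick check in Python with SciPy, which should land near the reported 0.0527:

```python
# Recover the two-sided p-value from the reported statistic and df.
# t = 2.09 and df = N - 2 = 16 come from the study's output table.
from scipy import stats

t_stat, df = 2.09, 16
p_value = 2 * stats.t.sf(t_stat, df)  # two-sided tail probability
print(round(p_value, 4))  # just above the 0.05 cutoff
```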
Conclusion: A significant difference does not exist between the population average pain relief for patients given aspirin vs. those given a placebo.
The conclusion is not that the means are equal, but that they are not significantly different. It is possible that a difference in average pain relief does exist between the two groups. One possible reason for the insignificant result is that the experiment did not collect enough data: with additional observations, the statistical power of the test would improve. Another possibility is that random chance led to a sample with greater variability, or a different mean, than is typical of the population.
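The link between sample size and power can be illustrated with a small Monte Carlo sketch: fix a true difference between the groups, simulate many experiments, and count how often the t-test rejects H0. The effect size and standard deviation below are made-up illustration values, not estimates from the aspirin study:

```python
# Monte Carlo sketch of statistical power: with a fixed true difference,
# larger samples detect it more often. All parameters are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def estimated_power(n_per_group, true_diff=3.0, sd=4.0, alpha=0.05, reps=2000):
    """Fraction of simulated two-sample experiments whose t-test rejects H0."""
    hits = 0
    for _ in range(reps):
        a = rng.normal(true_diff, sd, n_per_group)  # e.g. treatment group
        b = rng.normal(0.0, sd, n_per_group)        # e.g. placebo group
        _, p = stats.ttest_ind(a, b)
        hits += p < alpha
    return hits / reps

small = estimated_power(9)   # roughly the study's group size
large = estimated_power(40)  # a bigger experiment
print(small, large)
```

With the same true difference, the larger experiment rejects H0 far more often, which is exactly the sense in which additional data points improve power.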