The Problem
We have samples from two independent (unrelated) groups of individuals and one numerical or ordinal variable of interest. We are interested in whether the mean or distribution of the variable is the same in the two groups. For example, we may wish to compare the weights in two groups of children, each child being randomly allocated to receive either a dietary supplement or placebo.
The Unpaired (Two-Sample) t-Test
Assumptions
In the population, the variable is Normally distributed in each group and the variances of the two groups are the same. In addition, we have reasonable sample sizes so that we can check the assumptions of Normality and equal variances.
Rationale
We consider the difference in the means of the two groups. Under the null hypothesis that the population means in the two groups are the same, this difference will equal zero. Therefore, we use a test statistic that is based on the difference in the two sample means, and on the value of the difference in population means under the null hypothesis (i.e. zero). This test statistic, often referred to as t, follows the t-distribution.
Notation
Our two samples are of size $n_1$ and $n_2$. Their means are $\bar{x}_1$ and $\bar{x}_2$; their standard deviations are $s_1$ and $s_2$.
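If $s$ is an estimate of the pooled standard deviation of the two groups,

$$s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

then the test statistic is given by $t$, where

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

which follows the t-distribution with $(n_1 + n_2 - 2)$ degrees of freedom.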
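The pooled standard deviation and test statistic described above can be computed directly from the raw data. The following is a minimal sketch using only Python's standard library; the function name `pooled_t_test` is hypothetical, and the example values are made-up numbers chosen so the arithmetic is easy to follow.

```python
import math
from statistics import mean, stdev


def pooled_t_test(x1, x2):
    """Unpaired t statistic assuming equal variances in the two groups.

    Returns (t, df, s), where s is the pooled standard deviation.
    """
    n1, n2 = len(x1), len(x2)
    s1, s2 = stdev(x1), stdev(x2)  # sample SDs (divisor n - 1)
    df = n1 + n2 - 2
    # Pooled SD: combines the two sample variances, weighted by their df
    s = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    # Difference in sample means minus the null value of the difference (zero)
    t = (mean(x1) - mean(x2) - 0) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, df, s


# Illustrative (invented) data: group means 3 and 5, both SDs sqrt(2.5)
t, df, s = pooled_t_test([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(round(t, 3), df)
```

Here the standard error of the difference is exactly 1, so $t = -2.0$ on 8 degrees of freedom; $t$ would then be referred to the t-distribution tables.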
Refer t to Appendix A2. When the sample sizes in the two groups are large, the t-distribution approximates a Normal distribution, and then we reject the null hypothesis at the 5% level if the absolute value (i.e. ignoring the sign) of t is greater than 1.96.
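For large samples, the Normal approximation above can be applied without tables. This sketch (hypothetical function names) uses the identity $P(|Z| > |t|) = \operatorname{erfc}(|t|/\sqrt{2})$ for a standard Normal $Z$; it is only reasonable when both groups are large, and for small samples the exact t-distribution should be used instead.

```python
import math


def normal_approx_p(t):
    """Two-tailed P-value treating t as a standard Normal deviate.

    Only appropriate when both sample sizes are large.
    """
    # For Z ~ N(0, 1): P(|Z| > |t|) = erfc(|t| / sqrt(2))
    return math.erfc(abs(t) / math.sqrt(2))


def reject_at_5_percent(t):
    # Large-sample rule: reject H0 at the 5% level if |t| > 1.96
    return abs(t) > 1.96


print(normal_approx_p(1.96))     # close to 0.05
print(reject_at_5_percent(-2.5)) # sign is ignored
```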
Interpret the P-value and calculate a confidence interval for the difference in the two means. The 95% confidence interval, assuming equal variances, is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t_{0.05}\, s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where $t_{0.05}$ is the percentage point of the t-distribution with $(n_1 + n_2 - 2)$ degrees of freedom which gives a two-tailed probability of 0.05.
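The confidence interval can be sketched in the same way. Because the standard library has no t-distribution, this hypothetical helper takes the critical value `t_crit` as an argument, to be looked up in tables for $(n_1 + n_2 - 2)$ degrees of freedom; the data are the same invented values as before.

```python
import math
from statistics import mean, stdev


def pooled_ci(x1, x2, t_crit):
    """95% CI for mean(x1) - mean(x2), assuming equal variances.

    t_crit: two-tailed 5% point of t with n1 + n2 - 2 df, taken from
    tables (passed in so this sketch needs only the standard library).
    """
    n1, n2 = len(x1), len(x2)
    s1, s2 = stdev(x1), stdev(x2)
    # Pooled standard deviation, as for the test statistic
    s = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    diff = mean(x1) - mean(x2)
    half_width = t_crit * s * math.sqrt(1 / n1 + 1 / n2)
    return diff - half_width, diff + half_width


# 8 df: two-tailed 5% point of t is about 2.306
lower, upper = pooled_ci([1, 2, 3, 4, 5], [3, 4, 5, 6, 7], 2.306)
print(round(lower, 3), round(upper, 3))
```

The interval here, roughly $-4.306$ to $0.306$, includes zero, which is consistent with $|t| = 2.0$ falling below the 8-df critical value.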
Interpretation of the Confidence Interval
The upper and lower limits of the confidence interval can be used to assess whether the difference between the two mean values is clinically important. For example, if the upper and/or lower limit is close to zero, this indicates that the true difference may be very small and clinically meaningless, even if the test is statistically significant.
If the Assumptions are not Satisfied
When the sample sizes are reasonably large, the t-test is fairly robust (Chapter 35) to departures from Normality. However, it is less robust to unequal variances. There is a modification of the unpaired t-test that allows for unequal variances (often called Welch's t-test), and results from it are often provided in computer output. However, if there are concerns that the assumptions are not satisfied, then the data can either be transformed (Chapter 9) to achieve approximate Normality and/or equal variances, or a non-parametric test such as the Wilcoxon rank sum test can be used.
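The modification for unequal variances is commonly known as Welch's t-test. As a sketch (hypothetical function name `welch_t`), it replaces the pooled standard error with one built from each group's own variance, and uses the Welch-Satterthwaite approximation for the degrees of freedom, which is usually not a whole number.

```python
import math
from statistics import mean, variance


def welch_t(x1, x2):
    """Unpaired t statistic that does not assume equal variances.

    Returns (t, df), with Welch-Satterthwaite approximate df.
    """
    n1, n2 = len(x1), len(x2)
    # Squared standard error of each sample mean, from its own variance
    v1, v2 = variance(x1) / n1, variance(x2) / n2
    t = (mean(x1) - mean(x2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Invented data with very different spreads
print(welch_t([1, 2, 3], [10, 20, 30, 40]))
```

When the two sample sizes and variances are equal, this gives the same $t$ and the same $(n_1 + n_2 - 2)$ degrees of freedom as the pooled test; otherwise the degrees of freedom lie between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$.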
The Wilcoxon Rank Sum (Two-Sample) Test
Rationale
The Wilcoxon rank sum test makes no distributional assumptions and is the non-parametric equivalent to the unpaired t-test. The test is based on the sum of the ranks of the values in each of the two groups; these should be comparable after allowing for differences in sample size if the groups have similar distributions. An equivalent test, known as the Mann–Whitney U test, gives identical results although it is slightly more complicated to carry out by hand.
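The rank sum itself is straightforward to compute. This sketch (hypothetical function names) pools the two samples, ranks them with tied values given the average of the ranks they span, and sums the ranks of the first group, $T_1$. It also returns the equivalent Mann–Whitney statistic via the standard relation $U = T_1 - n_1(n_1 + 1)/2$. It stops at the statistic; obtaining a P-value from tables or a Normal approximation is not shown.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x1, x2):
    """Return (T1, U): rank sum of x1 in the pooled sample and Mann-Whitney U."""
    ranks = average_ranks(list(x1) + list(x2))
    n1 = len(x1)
    t1 = sum(ranks[:n1])
    u = t1 - n1 * (n1 + 1) / 2
    return t1, u


# Same invented data as the t-test examples
print(wilcoxon_rank_sum([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))
```

With these values the pooled ranks sum to 55, of which the first group contributes $T_1 = 19.5$ and the second 35.5, giving $U = 4.5$.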