The Problem
We have samples from a number of independent groups. We have a single numerical or ordinal variable and are interested in whether the average value of the variable differs between the groups, e.g. whether the average platelet count varies in groups of women with different ethnic backgrounds. Although we could perform tests to compare the averages in each pair of groups, the high Type I error rate, resulting from the large number of comparisons, means that we may draw incorrect conclusions (Chapter 18). Therefore, we carry out a single global test to determine whether the averages differ between any of the groups.
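To see how quickly the Type I error rate grows with the number of pairwise comparisons, the following is a minimal sketch. It assumes that the pairwise tests are independent, which is only an approximation because each group appears in several comparisons, and that each test uses a significance level of 0.05. The function name is hypothetical.

```python
from math import comb


def familywise_error_rate(k, alpha=0.05):
    """Return the number of pairwise tests among k groups and the
    approximate chance of at least one false positive among them.

    Assumes the tests are independent (an approximation).
    """
    m = comb(k, 2)                   # number of distinct pairs of groups
    fwer = 1 - (1 - alpha) ** m      # P(at least one Type I error)
    return m, fwer


for k in (3, 4, 6):
    m, fwer = familywise_error_rate(k)
    print(f"{k} groups: {m} pairwise tests, error rate about {fwer:.2f}")
```

Even with a modest number of groups, the chance of at least one spurious significant result is well above the nominal 5%, which is the motivation for a single global test.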
One-Way Analysis of Variance
Assumptions
The groups are defined by the levels of a single factor (e.g. different ethnic backgrounds). In the population of interest, the variable is Normally distributed in each group and the variance in every group is the same. We have a reasonable sample size so that we can check these assumptions.
Rationale
The one-way analysis of variance separates the total variability in the data into that which can be attributed to differences between the individuals from the different groups (the between-group variation) and to the random variation between the individuals within each group (the within-group variation, sometimes called unexplained or residual variation). These components of variation are measured using variances, hence the name analysis of variance (ANOVA). Under the null hypothesis that the group means are the same, the between-group variance will be similar to the within-group variance. If, however, there are differences between the groups, then the between-group variance will be larger than the within-group variance. The test is based on the ratio of these two variances.
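The partition described above can be written out as a standard identity. The symbols here are assumptions introduced for illustration: x_{ij} is the j-th observation in group i, \bar{x}_i is the mean of group i, \bar{x} is the mean of all n observations, n_i is the size of group i and k is the number of groups.

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^2}_{\text{total}}
  = \underbrace{\sum_{i=1}^{k} n_i\left(\bar{x}_i-\bar{x}\right)^2}_{\text{between-group}}
  + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2}_{\text{within-group}}

% Dividing each sum of squares by its degrees of freedom gives the two variances,
% and their ratio is the test statistic:
F = \frac{\text{between-group sum of squares}/(k-1)}{\text{within-group sum of squares}/(n-k)}
```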
Notation
We have k independent samples, each derived from a different group. The sample sizes, means and standard deviations in each group are n_i, x̄_i and s_i, respectively (i = 1, 2, … , k). The total sample size is n = n_1 + n_2 + … + n_k.
The test statistic for ANOVA is a ratio, F, of the between-group variance to the within-group variance. This F-statistic follows the F-distribution (Chapter 8) with (k − 1, n − k) degrees of freedom in the numerator and denominator, respectively.
The calculations involved in ANOVA are complex and are not shown here. Most computer packages will output the values directly in an ANOVA table, which usually includes the F-ratio and P-value (see Example 1).
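Although the calculations are normally left to software, a minimal sketch of how the main entries of an ANOVA table could be obtained may help when reading one. The function name, the dictionary keys and the data are hypothetical, and only the Python standard library is used, so the P-value itself is not computed.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA table.

    `groups` is a list of lists, one list of observations per group.
    """
    k = len(groups)                         # number of groups
    n = sum(len(g) for g in groups)         # total sample size
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: group means around the overall mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (residual) sum of squares: values around their group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within       # residual mean square

    return {
        "df": (df_between, df_within),
        "ss": (ss_between, ss_within),
        "ms": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


# Hypothetical data for three groups, for illustration only
example_groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
table = one_way_anova(example_groups)
print(table)
```

The P-value is the area in the upper tail of the F-distribution with these degrees of freedom; if SciPy is available, `scipy.stats.f.sf(F, k - 1, n - k)` should return it.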
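Refer the F-ratio to Appendix A5. Because differences between the groups can only inflate the between-group variance relative to the within-group variance, only large values of F provide evidence against the null hypothesis, so we look at the one-sided P-values.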
If we obtain a significant result at this initial stage, we may consider performing specific pairwise post hoc comparisons. We can use one of a number of special tests devised for this purpose (e.g. Duncan’s, Scheffé’s) or we can use the unpaired t-test (Chapter 21) adjusted for multiple hypothesis testing (Chapter 18). We can also calculate a confidence interval for each individual group mean (Chapter 11). Note that we use a pooled estimate of the variance of the values from all groups when calculating confidence intervals and performing t-tests. Most packages refer to this estimate of the variance as the residual variance or residual mean square; it is found in the ANOVA table.
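As a rough sketch of how the pooled (residual) variance enters these calculations, the code below computes a confidence interval for one group mean and the t-statistic for every pair of groups. Several things here are assumptions rather than part of the method as described: the function names and data are hypothetical, the Bonferroni correction is used as one possible adjustment for multiple testing, and the critical value 2.447 is the two-sided 5% point of the t-distribution with n − k = 6 degrees of freedom, taken from a standard table.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def residual_mean_square(groups):
    """Pooled estimate of the variance, on n - k degrees of freedom."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_within / (n - k)


def group_mean_ci(group, ms_resid, t_crit):
    """Confidence interval for one group mean using the pooled variance.

    t_crit must be looked up for n - k degrees of freedom.
    """
    half_width = t_crit * sqrt(ms_resid / len(group))
    return mean(group) - half_width, mean(group) + half_width


def pairwise_t(groups, ms_resid):
    """t-statistic for each pair of groups, using the pooled variance."""
    stats = {}
    for (i, a), (j, b) in combinations(enumerate(groups), 2):
        se = sqrt(ms_resid * (1 / len(a) + 1 / len(b)))
        stats[(i, j)] = (mean(a) - mean(b)) / se
    return stats


# Hypothetical data for three groups, for illustration only
example_groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
ms_resid = residual_mean_square(example_groups)

print(group_mean_ci(example_groups[0], ms_resid, t_crit=2.447))

t_stats = pairwise_t(example_groups, ms_resid)
# Bonferroni (assumed adjustment): test each pair at alpha / number of pairs
adjusted_alpha = 0.05 / len(t_stats)
print(t_stats, adjusted_alpha)
```

Each t-statistic is referred to the t-distribution with n − k degrees of freedom, not the degrees of freedom from the two groups alone, because the variance estimate draws on all groups.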