Numerical data: more than two groups


The Problem


We have samples from a number of independent groups. We have a single numerical or ordinal variable and are interested in whether the average value of the variable differs between the groups, e.g. whether the average platelet count varies in groups of women with different ethnic backgrounds. Although we could perform tests to compare the averages in each pair of groups, the high Type I error rate, resulting from the large number of comparisons, means that we may draw incorrect conclusions (Chapter 18). Therefore, we carry out a single global test to determine whether the averages differ between any of the groups.
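To illustrate why many pairwise comparisons inflate the Type I error rate, the following minimal sketch computes the chance of at least one false positive when every pair of groups is tested at the 5% level. It assumes, as a simplification, that the pairwise tests are independent (in practice they share groups, so the figure is only approximate); the function name is hypothetical.

```python
from math import comb


def familywise_error_rate(k_groups: int, alpha: float = 0.05) -> float:
    """Approximate chance of at least one false positive across all
    pairwise tests, ASSUMING the tests are independent."""
    n_comparisons = comb(k_groups, 2)  # number of distinct pairs of groups
    return 1 - (1 - alpha) ** n_comparisons


for k in (3, 4, 5):
    # Columns: number of groups, number of pairwise tests, approximate error rate
    print(k, comb(k, 2), round(familywise_error_rate(k), 3))
```

Under this assumption, three groups (3 comparisons) give a rate of about 0.14, and five groups (10 comparisons) give about 0.40, well above the nominal 0.05.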


One-Way Analysis of Variance


Assumptions


The groups are defined by the levels of a single factor (e.g. different ethnic backgrounds). In the population of interest, the variable is Normally distributed in each group and the variance in every group is the same. We have a reasonable sample size so that we can check these assumptions.


Rationale


The one-way analysis of variance separates the total variability in the data into that which can be attributed to differences between the individuals from the different groups (the between-group variation) and to the random variation between the individuals within each group (the within-group variation, sometimes called unexplained or residual variation). These components of variation are measured using variances, hence the name analysis of variance (ANOVA). Under the null hypothesis that the group means are the same, the between-group variance will be similar to the within-group variance. If, however, there are differences between the groups, then the between-group variance will be larger than the within-group variance. The test is based on the ratio of these two variances.
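The separation of total variability into between-group and within-group components can be checked numerically. The sketch below uses the standard sums-of-squares definitions, which the text does not spell out, and a small set of made-up values (not data from this chapter); the function and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical illustrative values only, not data from the text.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}


def sums_of_squares(groups):
    """Return (total, between-group, within-group) sums of squares."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Total: every observation measured against the overall mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Between-group: each group mean against the overall mean, weighted by group size
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-group: every observation against its own group mean
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    return ss_total, ss_between, ss_within


ss_total, ss_between, ss_within = sums_of_squares(groups)
print(ss_total, ss_between, ss_within)
```

For these values the between-group and within-group sums of squares add up to the total, which is the partition the analysis of variance relies on.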


Notation


We have k independent samples, each derived from a different group. The sample sizes, means and standard deviations in each group are $n_i$, $\bar{x}_i$ and $s_i$, respectively (i = 1, 2, … , k). The total sample size is $n = n_1 + n_2 + \dots + n_k$.
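Using this notation, the ratio of between-group to within-group variance described in the rationale can be sketched from the group sample sizes, means and standard deviations alone. The formulas are the standard ones for one-way ANOVA, not taken from the text; the function name and the example values are hypothetical.

```python
def f_ratio_from_summaries(sizes, means, sds):
    """Return the F-ratio and its (numerator, denominator) degrees of
    freedom from per-group sample sizes, means and standard deviations."""
    k = len(sizes)
    n = sum(sizes)
    # Overall mean is the size-weighted average of the group means
    grand_mean = sum(n_i * m_i for n_i, m_i in zip(sizes, means)) / n
    # Between-group variance (mean square) on k - 1 degrees of freedom
    between_ms = sum(
        n_i * (m_i - grand_mean) ** 2 for n_i, m_i in zip(sizes, means)
    ) / (k - 1)
    # Within-group (residual) variance: pooled group variances on n - k df
    within_ms = sum(
        (n_i - 1) * s_i ** 2 for n_i, s_i in zip(sizes, sds)
    ) / (n - k)
    return between_ms / within_ms, (k - 1, n - k)


# Hypothetical summaries for three groups of three individuals each
f_ratio, df = f_ratio_from_summaries([3, 3, 3], [5.0, 7.0, 9.0], [1.0, 1.0, 1.0])
print(f_ratio, df)
```

The sketch stops at the F-ratio; obtaining the P-value requires the F-distribution, from tables or a statistical package.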









1 Define the null and alternative hypotheses under study
H0: all group means in the population are equal

H1: at least one group mean in the population differs from the others.

2 Collect relevant data from samples of individuals

3 Calculate the value of the test statistic specific to H0
The test statistic for ANOVA is a ratio, F, of the between-group variance to the within-group variance. This F-statistic follows the F-distribution (Chapter 8) with (k − 1, n − k) degrees of freedom in the numerator and denominator, respectively.
The calculations involved in ANOVA are complex and are not shown here. Most computer packages will output the values directly in an ANOVA table, which usually includes the F-ratio and P-value (see Example 1).

4 Compare the value of the test statistic to values from a known probability distribution
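Refer the F-ratio to Appendix A5. Because differences between the group means can only inflate the between-group variance relative to the within-group variance, large values of F provide evidence against the null hypothesis, so we look at the one-sided (upper tail) P-values.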

5 Interpret the P-value and results
If we obtain a significant result at this initial stage, we may consider performing specific pairwise post hoc comparisons. We can use one of a number of special tests devised for this purpose (e.g. Duncan’s, Scheffé’s) or we can use the unpaired t-test (Chapter 21) adjusted for multiple hypothesis testing (Chapter 18). We can also calculate a confidence interval for each individual group mean (Chapter 11). Note that we use a pooled estimate of the variance of the values from all groups when calculating confidence intervals and performing t-tests. Most packages refer to this estimate of the variance as the residual variance or residual mean square; it is found in the ANOVA table.
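As an illustration of post hoc comparisons based on the pooled variance, the sketch below computes an unpaired t-statistic for every pair of groups using the residual mean square from the ANOVA table, together with a Bonferroni-adjusted significance level. Bonferroni is assumed here as one possible adjustment for multiple testing; the text does not say which adjustment is intended. Function names and example values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_t_statistics(sizes, means, residual_ms):
    """t-statistic for each pair of groups, using the residual mean
    square (pooled variance) in place of a two-group variance estimate."""
    results = {}
    for i, j in combinations(range(len(sizes)), 2):
        # Standard error of the difference between two group means
        se = sqrt(residual_ms * (1 / sizes[i] + 1 / sizes[j]))
        results[(i, j)] = (means[i] - means[j]) / se
    return results


def bonferroni_level(k_groups, alpha=0.05):
    """Per-comparison significance level (Bonferroni adjustment assumed)."""
    n_comparisons = k_groups * (k_groups - 1) // 2
    return alpha / n_comparisons


# Hypothetical summaries: three groups of three, residual mean square 1.0
t_stats = pairwise_t_statistics([3, 3, 3], [5.0, 7.0, 9.0], 1.0)
for pair, t in t_stats.items():
    print(pair, round(t, 3))
print(round(bonferroni_level(3), 4))
```

Each t-statistic would be referred to the t-distribution with n − k degrees of freedom (the residual degrees of freedom), and its P-value compared with the adjusted level.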

May 9, 2017 | Posted in GENERAL & FAMILY MEDICINE

Full access? Get Clinical Tree

Get Clinical Tree app for offline access