where t_α is the critical ratio to determine the probability of a false-positive (α) error if the null hypothesis is rejected; d̄ is the mean difference that was observed in the outcome variable; s_d is the standard deviation of the before-after differences; and N is the sample size.

To solve for N, several rearrangements and substitutions of terms in the equation must be made. First, everything can be squared and the equation rearranged so that N is in the numerator and s² is the variance of the distribution of d:

$$t_\alpha^2 = \frac{\bar{d}^2}{s^2 / N} = \frac{\bar{d}^2 \cdot N}{s^2}$$

Next, the terms can be rearranged so that the equation for N in a paired (before and after) study becomes:

$$N = \frac{t_\alpha^2 \cdot s^2}{\bar{d}^2}$$

Now the t in the formula must be replaced with z. This provides a solution to a circular problem: To know the value of t, the degrees of freedom (df) must be known. The df depends on N, however, which is what the investigator is initially trying to calculate. Because the value of z is not dependent on df, and because z is close to t when the sample size is large, z can be used instead of t. The formula becomes:

$$N = \frac{z_\alpha^2 \cdot s^2}{\bar{d}^2}$$
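As a minimal sketch, the paired-study formula with z in place of t (N = z_α² · s² / d̄²) can be computed as follows. The function name and the example inputs (a standard deviation of 15 mm Hg and a mean difference of 10 mm Hg) are hypothetical and chosen only for illustration; the sketch considers alpha error only and assumes a two-tailed test.

```python
import math
from statistics import NormalDist


def paired_sample_size(alpha: float, variance: float, mean_diff: float) -> int:
    """Return N = z_alpha^2 * s^2 / d_bar^2, rounded up (alpha error only)."""
    # Two-tailed critical z value for the chosen alpha (about 1.96 for 0.05).
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    n = (z_alpha ** 2) * variance / (mean_diff ** 2)
    # A fractional participant is not possible, so round up.
    return math.ceil(n)


# Hypothetical inputs: s = 15 mm Hg (so s^2 = 225) and d_bar = 10 mm Hg.
n_required = paired_sample_size(alpha=0.05, variance=15 ** 2, mean_diff=10)
print(n_required)
```

Rounding up rather than to the nearest integer is a conservative choice, so that the planned sample is never smaller than the formula requires.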

In theory, using z instead of t might produce a slight underestimate of the sample size needed. In practice, however, using z seems to work well, and its use is customary. The previous formula is for a study using the paired t-test, in which each participant serves as his or her own control. For a study using Student’s t-test, such as a randomized controlled trial (RCT) with an experimental group and a control group, it would be necessary to calculate N for each group. The previous formula considers only the problem of alpha error; to minimize the possibility of beta error, a z term for beta error must be introduced as well. Before these topics are discussed, however, the answers to the three questions posed earlier should be explored more fully in light of the information provided by the formula for the calculation of N.

1. The larger the variance (s²), the larger the sample size must be because the variance is in the numerator of the formula for N. This makes sense intuitively because with a large variance (and large standard error), a larger N is needed to compensate for the greater uncertainty of the estimate.

2. To have considerable confidence that a mean difference shown in a study is real, the analysis must produce a small p value for the observed mean difference, which implies that the value for t_α or z_α was large. Because z_α is in the numerator of the sample size formula, the larger z_α is, the larger the N (the sample size) that is needed. For a two-tailed test, a p value of 0.05 (the alpha level chosen) would require a z_α of 1.96, which, when squared as in the formula, would equal 3.84. To be even more confident, the investigator might set alpha at 0.01. This would require a z_α of 2.58, which equals 6.66 when squared, 73% greater than when alpha is set at 0.05. To decrease the probability of being wrong from 5% to 1% would require the sample size to be almost doubled.

3. If the investigator wanted to detect with confidence a very small difference between the mean values of two study groups (i.e., a small d̄), a very large N would be needed because the difference (squared) is in the denominator. The smaller the denominator is, the larger the ratio is, and the larger the N must be. A precise estimate and a large sample size are needed to detect a small difference.
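The three points above can be checked numerically with a short sketch. It uses the alpha-only formula N = z_α² · s² / d̄²; the baseline variance (225) and difference (10) are hypothetical values, and the helper name is illustrative. Because the z values are computed exactly rather than rounded to 1.96 and 2.58, the ratio for point 2 differs slightly in the third decimal place from the figure obtained with rounded values.

```python
from statistics import NormalDist


def n_alpha_only(z_alpha: float, variance: float, mean_diff: float) -> float:
    """Unrounded N = z_alpha^2 * s^2 / d_bar^2."""
    return z_alpha ** 2 * variance / mean_diff ** 2


z_05 = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed, about 1.96
z_01 = NormalDist().inv_cdf(1 - 0.01 / 2)  # two-tailed, about 2.58

# Hypothetical baseline: s^2 = 225, d_bar = 10, alpha = 0.05.
base = n_alpha_only(z_05, variance=225.0, mean_diff=10.0)

# 1. Doubling the variance doubles N.
ratio_variance = n_alpha_only(z_05, 450.0, 10.0) / base
# 2. Moving alpha from 0.05 to 0.01 multiplies N by (z_01 / z_05)^2.
ratio_alpha = n_alpha_only(z_01, 225.0, 10.0) / base
# 3. Halving the difference to be detected quadruples N.
ratio_diff = n_alpha_only(z_05, 225.0, 5.0) / base

print(round(ratio_variance, 2), round(ratio_alpha, 2), round(ratio_diff, 2))
```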

Whether a small difference is considered clinically important often depends on the topic of research. Studies showing that a new treatment for hypertension reduces the systolic blood pressure by 2 to 3 mm Hg would be considered clinically trivial. Studies showing that a new treatment for pancreatic cancer improves the survival rate by 10% (0.1) would be considered a major advance. Clinical judgment is involved in determining the minimum difference that should be considered clinically important.

B Beta (False-Negative) Error

If a difference is examined with a t-test and is statistically significant at the prestated level of alpha (e.g., 0.05), beta error is not an issue. What if a reported finding seems to be clinically important, but it is not “statistically significant” in that study? Here the question of a possible false-negative (beta) error becomes important. Beta error may have occurred because the sample size was too small. When planning a study, investigators want to minimize the likelihood of both beta (false-negative) error and alpha (false-positive) error, and readers of the literature should be on the lookout for this problem as well. The relationship between the results of a study and the true status can be seen in a “truth table” (Table 12-1). Table 12-1 closely parallels the relationship between a test result and the disease status (compare with Table 7-1).

Table 12-1 “Truth Table” Showing Relationship between Study Results and True Status

A seminal article illustrated the need to be concerned about beta error: in most of 71 negative RCTs of new therapies published in prominent medical journals, the sample sizes were too small “to provide reasonable assurance that a clinically meaningful ‘difference’ (i.e., therapeutic effect) would not be missed.”¹ In the study, “reasonable assurance” was 90%. In 94% of these negative studies, the sample size was too small to detect a 25% improvement in outcome with reasonable (90%) assurance. In 75% of the studies, the sample size was too small to detect a 50% improvement in outcome with the same level of assurance. Evidence indicates that this problem has persisted over time.²

A study with a large beta error has a low sensitivity for detecting a true difference because, as discussed in Chapter 7:

$$\text{Sensitivity} + \text{False-negative (beta) error} = 1.00$$

When investigators speak of a research study rather than a clinical test, however, they usually use the term “statistical power” instead of “sensitivity.” With this substitution in terms:

$$\text{Statistical power} + \text{Beta error} = 1.00$$

which means that statistical power is equal to (1 − beta error). When calculating a sample size, if the investigators accept a 20% possibility of missing a true finding (beta error = 0.2), the study should have a statistical power of 0.8, or 80%. That means the investigators are 80% confident that they would be able to detect a true mean difference of the size they specify with the sample size they determine. The best way to incorporate beta error into a study is to include it beforehand in the determination of sample size. Incorporating the statistical term for beta error (z_β) in the sample size calculation is simple but likely to increase the sample size considerably.
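The relationship between power, beta error, and z_β can be sketched as follows. Two assumptions are made that this passage does not state: z_β is taken as a one-tailed z value, and the paired-study formula with beta is taken to be the commonly used form N = (z_α + z_β)² · s² / d̄². The function names and the inputs (s² = 225, d̄ = 10) are hypothetical.

```python
import math
from statistics import NormalDist


def z_beta_for_power(power: float) -> float:
    """One-tailed z value for beta = 1 - power (about 0.84 for 80% power)."""
    return NormalDist().inv_cdf(power)


def paired_n_with_beta(alpha: float, power: float,
                       variance: float, mean_diff: float) -> int:
    """Assumed form: N = (z_alpha + z_beta)^2 * s^2 / d_bar^2, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed alpha
    z_beta = z_beta_for_power(power)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mean_diff ** 2)


beta = 0.2
power = 1 - beta  # statistical power = 1 - beta error

# Hypothetical inputs: s^2 = 225, d_bar = 10, alpha = 0.05.
n_with_beta = paired_n_with_beta(0.05, power, variance=225.0, mean_diff=10.0)
print(round(z_beta_for_power(power), 2), n_with_beta)
```

With these illustrative inputs, adding 80% power roughly doubles the sample size compared with a calculation that considers alpha error alone.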

C Steps in Calculation of Sample Size

The first step in calculating sample size is to choose the appropriate formula to use, based on the type of study and the type of error to be considered. Four common formulas for calculating sample size are discussed in this chapter and listed in Table 12-2, and their use is illustrated in Boxes 12-1 through 12-4.³ The second step in calculating sample size requires that the investigators specify the following values: the variance expected (s²); the z_α value for the level of alpha desired; the smallest clinically important difference (d̄); and, usually, beta (measured as z_β). All values except the variance must come from clinical and research judgment, although the estimated variance should be based on knowledge of data. If the outcome variable being studied is continuous, such as blood pressure, the estimate of variance can be obtained from the literature or from a small pilot study.
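As a rough sketch of the second step, the variance can be estimated from pilot data and combined with the values chosen by judgment. The pilot differences below are invented for illustration, as is the smallest clinically important difference of 5 mm Hg; the calculation uses the alpha-only paired formula N = z_α² · s² / d̄² and assumes a two-tailed alpha of 0.05.

```python
import math
from statistics import NormalDist, variance

# Hypothetical before-after differences (mm Hg) from a small pilot study.
pilot_differences = [12.0, -3.0, 8.0, 20.0, 5.0, -10.0, 15.0, 9.0]

# Sample variance of the differences (N - 1 in the denominator).
s_squared = variance(pilot_differences)

# Values chosen by clinical and research judgment (assumed here).
alpha = 0.05
smallest_important_diff = 5.0  # d_bar, in mm Hg

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
n_planned = math.ceil(z_alpha ** 2 * s_squared / smallest_important_diff ** 2)
print(round(s_squared, 1), n_planned)
```

A variance estimated from so few pilot observations is itself imprecise, so a planned N obtained this way is best treated as approximate.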