Applying Statistics to Trial Design: Sample Size, Randomization, and Control for Multiple Hypotheses

12 Applying Statistics to Trial Design

Sample Size, Randomization, and Control for Multiple Hypotheses

I Sample Size

The determination of sample size is critical in planning clinical research because sample size is usually the most important factor determining the time and funding necessary to perform the research. The sample size has a profound impact on the likelihood of finding statistical significance.

Members of committees responsible for evaluating and funding clinical studies look closely at the assumptions used to estimate the number of study participants needed and at the way in which calculations of sample size were performed. Part of their task when reviewing the sample size is to determine whether the proposed research is realistic (e.g., whether enough participants are included in the intervention and control groups in a randomized clinical trial, or in the groups of cases and controls in a case-control study). In research reported in the literature, inadequate sample size may explain why apparently useful clinical results are not statistically significant.

Investigators probably consult statisticians more often to learn the sample size needed for a study than for any other reason. Sample size calculations can be confusing, even for people who can do ordinary statistical analyses without trouble. As a test of intuition regarding sample size, try to answer the following three questions:

If your intuition suggested that all these requirements would create the need for a large sample size, you would be correct. If intuition did not suggest the correct answers, review these questions again after reading the following information about how the basic formulas for sample size are derived.

Other factors affecting the number of participants required for a study include whether the:

A Derivation of Basic Sample Size Formula

To derive the basic formula for calculating the sample size, it is easiest to start with the formula for the paired t-test (see Chapter 10):

$$t_\alpha = \frac{\bar{d}}{s_d/\sqrt{N}}$$

where $t_\alpha$ is the critical ratio to determine the probability of a false-positive (α) error if the null hypothesis is rejected; $\bar{d}$ is the mean difference that was observed in the outcome variable; $s_d$ is the standard deviation of the before-after differences; and $N$ is the sample size.
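As a minimal illustration of the paired t statistic defined above, the Python sketch below computes $\bar{d}$, $s_d$, $N$, and $t$ from matched before and after readings using only the standard library. The function name `paired_t` and the blood pressure readings are hypothetical, made up for illustration; the differences are taken as before minus after so that a reduction comes out positive.

```python
import math
import statistics


def paired_t(before: list[float], after: list[float]) -> tuple[float, float, float, int]:
    """Return (t, d_bar, s_d, N) for matched before/after observations."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched pairs")
    # Before-minus-after difference for each participant (hypothetical convention).
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)      # mean of the differences
    s_d = statistics.stdev(diffs)       # sample SD of the differences (n - 1 denominator)
    t = d_bar / (s_d / math.sqrt(n))    # t = d_bar / (s_d / sqrt(N))
    return t, d_bar, s_d, n


# Hypothetical systolic pressures (mm Hg) for five participants, illustration only.
before = [150, 142, 160, 155, 148]
after = [144, 140, 151, 150, 146]
t, d_bar, s_d, n = paired_t(before, after)
print(f"d_bar={d_bar:.2f}, s_d={s_d:.3f}, N={n}, t={t:.3f}")
```

Each participant serves as his or her own control here, which is why only the differences, not the raw readings, enter the statistic.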

To solve for N, several rearrangements and substitutions of terms in the equation must be made. First, both sides of the equation can be squared and the result rearranged so that N is in the numerator, where $s^2$ (that is, $s_d^2$) is the variance of the distribution of d:

$$t_\alpha^2 = \frac{\bar{d}^2}{s^2/N} = \frac{N\,\bar{d}^2}{s^2}$$

Next, the terms can be rearranged so that the equation for N in a paired (before and after) study becomes:

$$N = \frac{t_\alpha^2\, s^2}{\bar{d}^2}$$

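One way to convince yourself that the rearrangement is algebraically sound is a quick numeric check: compute $t$ from a set of paired differences, then plug $t$, $s^2$, and $\bar{d}$ back into $N = t^2 s^2 / \bar{d}^2$ and confirm that the original number of pairs comes back. The differences below are hypothetical, chosen only for illustration.

```python
import math
import statistics

# Hypothetical before-minus-after differences, illustration only.
diffs = [6, 2, 9, 5, 2]
n = len(diffs)
d_bar = statistics.mean(diffs)
s2 = statistics.variance(diffs)      # s^2: sample variance of the differences
t = d_bar / math.sqrt(s2 / n)        # t = d_bar / sqrt(s^2 / N)

# Squaring gives t^2 = N * d_bar^2 / s^2, so N = t^2 * s^2 / d_bar^2.
n_recovered = t ** 2 * s2 / d_bar ** 2
print(n, round(n_recovered, 9))
```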
Now the t in the formula must be replaced with z. This provides a solution to a circular problem: To know the value of t, the degrees of freedom (df) must be known. The df depends on N, however, which is what the investigator is initially trying to calculate. Because the value of z is not dependent on df, and because z is close to t when the sample size is large, z can be used instead of t. The formula becomes:

$$N = \frac{z_\alpha^2\, s^2}{\bar{d}^2}$$

In theory, using z instead of t might produce a slight underestimate of the sample size needed. In practice, however, using z seems to work well, and its use is customary. The previous formula is for a study using the paired t-test, in which each participant serves as his or her own control. For a study using Student’s t-test, such as a randomized controlled trial (RCT) with an experimental group and a control group, it would be necessary to calculate N for each group. The previous formula considers only the problem of alpha error; to minimize the possibility of beta error, a z term for beta error must be introduced as well. Before these topics are discussed, however, the answers to the three questions posed earlier should be explored more fully in light of the information provided by the formula for the calculation of N.
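The alpha-only formula for a paired study, $N = z_\alpha^2 s^2 / \bar{d}^2$, can be sketched in a few lines of Python. This sketch assumes a two-sided test, so $z_\alpha$ is taken as the $(1 - \alpha/2)$ quantile of the standard normal distribution (about 1.96 for α = 0.05), obtained from the standard library's `statistics.NormalDist`. The function name `paired_n_alpha_only` is hypothetical, and the result is rounded up because participants come in whole numbers.

```python
import math
from statistics import NormalDist


def paired_n_alpha_only(s: float, d_bar: float, alpha: float = 0.05) -> int:
    """N = z_alpha^2 * s^2 / d_bar^2, rounded up to a whole participant.

    Assumes a two-sided alpha, so z_alpha is the (1 - alpha/2) normal quantile.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return math.ceil((z_alpha * s / d_bar) ** 2)


# SD of the differences = 10 units, smallest important mean difference = 5 units.
print(paired_n_alpha_only(s=10, d_bar=5))
```

With these inputs the unrounded value is about 15.4, so 16 pairs are needed. Because this formula ignores beta error, it is only a starting point.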

Whether a small difference is considered clinically important often depends on the topic of research. Studies showing that a new treatment for hypertension reduces the systolic blood pressure by 2 to 3 mm Hg would be considered clinically trivial. Studies showing that a new treatment for pancreatic cancer improves the survival rate by 10% (0.1) would be considered a major advance. Clinical judgment is involved in determining the minimum difference that should be considered clinically important.
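The formula for N makes the effect of each input explicit: N is proportional to the variance $s^2$ and to $z_\alpha^2$, and inversely proportional to $\bar{d}^2$. The self-contained sketch below varies one input at a time from a hypothetical baseline (s = 10, $\bar{d}$ = 5, two-sided α = 0.05, all assumed values) to show that a more variable outcome, a stricter alpha, or a smaller difference to detect each raises the required sample size.

```python
import math
from statistics import NormalDist


def paired_n_alpha_only(s: float, d_bar: float, alpha: float = 0.05) -> int:
    """Alpha-only paired formula, two-sided alpha assumed, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z_alpha * s / d_bar) ** 2)


baseline = paired_n_alpha_only(s=10, d_bar=5)
more_variable = paired_n_alpha_only(s=20, d_bar=5)          # SD doubled
stricter_alpha = paired_n_alpha_only(s=10, d_bar=5, alpha=0.01)
smaller_diff = paired_n_alpha_only(s=10, d_bar=2.5)         # difference halved

print(baseline, more_variable, stricter_alpha, smaller_diff)
```

Doubling the SD or halving the difference each roughly quadruples N (from 16 to 62 pairs here), which is why the choice of the smallest clinically important difference deserves careful clinical judgment.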

B Beta (False-Negative) Error

If a difference is examined with a t-test, and it is statistically significant at the prestated level of alpha (e.g., 0.05), beta error is not an issue. What if a reported finding seems to be clinically important, but it is not “statistically significant” in that study? Here the question of a possible false-negative (beta) error becomes important. Beta error may have occurred because the sample size was too small. When planning a study, investigators want to minimize the likelihood of both beta (false-negative) error and alpha (false-positive) error, and readers of the literature should be on the lookout for beta error as well. The relationship between the results of a study and the true status can be seen in a “truth table” (Table 12-1). The similarity of Table 12-1 to the relationship between a test result and the disease status is obvious (compare with Table 7-1).

A seminal article illustrated the need to be concerned about beta error: in most of 71 negative RCTs of new therapies published in prominent medical journals, the sample sizes were too small “to provide reasonable assurance that a clinically meaningful ‘difference’ (i.e., therapeutic effect) would not be missed.”1 In that analysis, “reasonable assurance” was defined as 90%. In 94% of these negative studies, the sample size was too small to detect a 25% improvement in outcome with reasonable (90%) assurance. In 75% of the studies, the sample size was too small to detect a 50% improvement in outcome with the same level of assurance. Evidence indicates that this problem has persisted over time.2

A study with a large beta error has a low sensitivity for detecting a true difference because, as discussed in Chapter 7:

$$\text{Sensitivity} = 1 - \text{false-negative (beta) error}$$

When investigators speak of a research study versus a clinical test, however, they usually use the term “statistical power” instead of “sensitivity.” With this substitution in terms:

$$\text{Statistical power} = 1 - \text{beta error}$$

which means that statistical power is equal to (1 − beta error). When calculating a sample size, if the investigators accept a 20% possibility of missing a true finding (beta error = 0.2), the study should have a statistical power of 0.8, or 80%. That means that, if a true mean difference of the size the investigators specify exists, a study with the sample size they determine has an 80% probability of detecting it. The best way to incorporate beta error into a study is to include it beforehand in the determination of sample size. Incorporating the statistical term for beta error ($z_\beta$) in the sample size calculation is simple but likely to increase the sample size considerably.
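To see how much the beta term adds, the sketch below uses the common normal-approximation form for a paired design, $N = (z_\alpha + z_\beta)^2 s^2 / \bar{d}^2$, which we assume here as the way $z_\beta$ enters; $z_\beta$ is the standard normal quantile at the desired power (1 − beta). The helper names `paired_n` and `approx_power` and the inputs (s = 10, $\bar{d}$ = 5, two-sided α = 0.05) are hypothetical. `approx_power` inverts the same approximation, ignoring the negligible opposite tail, to check the power actually achieved by the rounded-up N.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def paired_n(s: float, d_bar: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """N = (z_alpha + z_beta)^2 * s^2 / d_bar^2, two-sided alpha, rounded up."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)          # power = 1 - beta
    return math.ceil(((z_alpha + z_beta) * s / d_bar) ** 2)


def approx_power(n: int, s: float, d_bar: float, alpha: float = 0.05) -> float:
    """Normal approximation to power for N pairs (opposite tail ignored)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(math.sqrt(n) * d_bar / s - z_alpha)


n_alpha_only = paired_n(s=10, d_bar=5, power=0.50)   # z_beta = 0: alpha-only formula
n_80 = paired_n(s=10, d_bar=5, power=0.80)
print(n_alpha_only, n_80, round(approx_power(n_80, s=10, d_bar=5), 3))
```

Setting the power to 0.5 makes $z_\beta$ = 0 and reproduces the alpha-only answer (16 pairs), which suggests that a sample size based on alpha alone gives only about an even chance of detecting a true difference of exactly the specified size. Asking for 80% power roughly doubles N, to 32 pairs, under these assumed inputs.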

C Steps in Calculation of Sample Size

The first step in calculating sample size is to choose the appropriate formula to use, based on the type of study and the type of error to be considered. Four common formulas for calculating sample size are discussed in this chapter and listed in Table 12-2, and their use is illustrated in Boxes 12-1 through 12-4.3 The second step in calculating sample size requires that the investigators specify the following values: the expected variance ($s^2$); the $z_\alpha$ value for the desired level of alpha; the smallest clinically important difference ($\bar{d}$); and, usually, beta (measured as $z_\beta$). All values except the variance must come from clinical and research judgment; the estimated variance, by contrast, should be based on knowledge of data. If the outcome variable being studied is continuous, such as blood pressure, the estimate of variance can be obtained from the literature or from a small pilot study.
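The two steps can be strung together in a short sketch. It assumes the paired (before and after) formula has been chosen in step 1, using the normal-approximation form with both $z_\alpha$ (two-sided) and $z_\beta$; the other formulas in Table 12-2 are not reproduced here. In step 2, the variance is estimated from a small pilot study while alpha, power, and $\bar{d}$ are judgment calls. The pilot values and the chosen $\bar{d}$ are hypothetical numbers invented for illustration, not data from any study.

```python
import math
import statistics
from statistics import NormalDist

# Step 2 inputs. Pilot differences are hypothetical, illustration only (mm Hg).
pilot_differences = [4.0, -1.0, 7.0, 3.0, 0.0, 5.0, 2.0, 6.0]
s2 = statistics.variance(pilot_differences)  # variance estimated from data
alpha = 0.05                                 # judgment: two-sided alpha
power = 0.80                                 # judgment: 1 - beta
d_bar = 2.0                                  # judgment: smallest important difference

std_normal = NormalDist()
z_alpha = std_normal.inv_cdf(1 - alpha / 2)
z_beta = std_normal.inv_cdf(power)

# Step 1 choice assumed here: paired formula N = (z_alpha + z_beta)^2 * s^2 / d_bar^2.
n = math.ceil((z_alpha + z_beta) ** 2 * s2 / d_bar ** 2)
print(f"estimated s^2 = {s2:.2f}; required N = {n}")
```

Because a pilot study is small, its variance estimate is itself uncertain, so investigators may reasonably check how N changes under a somewhat larger assumed variance.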

Aug 27, 2016 | Posted in PUBLIC HEALTH AND EPIDEMIOLOGY