Forcing Cosmetic Credibility
Unequal Group Sizes in Restricted Trials
Alternatives in Non–Double-Blinded Trials
Mixed Randomisation
Full Disclosure in the Protocol?
We cringe at the pervasive notion that a randomised trial needs to yield equal sample sizes in the comparison groups. Unfortunately, that conceptual misunderstanding can lead to bias by investigators who force equality, especially if by nonscientific means. In simple, unrestricted, randomised trials (analogous to repeated coin-tossing), the sizes of groups should indicate random variation. In other words, some discrepancy between the numbers in the comparison groups would be expected. The appeal of equal group sizes in a simple randomised controlled trial is cosmetic, not scientific. Moreover, other randomisation schemes, termed restricted randomisation, force equality by departing from simple randomisation. Forcing equal group sizes, however, potentially harms the unpredictability of treatment assignments, especially when using permuted-block randomisation in non–double-blinded trials. Diminished unpredictability can allow bias to creep into a trial. Overall, investigators underuse simple randomisation and overuse fixed-block randomisation. For non–double-blinded trials larger than 200 participants, investigators should use simple randomisation more often and accept moderate disparities in group sizes. Such unpredictability reflects the essence of randomness. We endorse the generation of mildly unequal group sizes and encourage an appreciation of such inequalities. For non–double-blinded randomised controlled trials with a sample size of less than 200 overall or within any principal stratum or subgroup, urn randomisation enhances unpredictability compared with blocking. A simpler alternative, our mixed randomisation approach, attains unpredictability within the context of the currently understood simple randomisation and permuted-block methods. Simple randomisation contributes the unpredictability whereas permuted-block randomisation contributes the balance, but avoids the perfect balance that can result in selection bias.
A tantalising phone call begins, ‘I have just read a report of a randomised trial, and I found problems!’ All too often, however, the discussion proceeds with ‘Look at that difference in sample sizes in the groups; they are not equal. I am suspicious of this trial.’ Or in planning a trial, ‘What can we do to end up with equal sample sizes?’ Indeed, large disparities in sample sizes not explained by chance should cause concern, but many researchers look askance at a trial with any disparity. We cringe at this seemingly ubiquitous notion that a randomised trial needs to yield equal sample sizes. Somehow such a notion seems embedded in many a medical researcher’s psyche.
Such conceptual misunderstanding deters prevention of bias in trials. Exactly equal sample sizes in a randomised controlled trial contribute little to statistical power and potentially harm unpredictability, especially in non–double-blinded trials that use permuted-block randomisation. Unpredictability reflects the essence of randomisation because those involved cannot predict the next treatment assignment. With predictability comes bias.
Greater predictability emanates from randomisation schemes that depart from simple, unrestricted randomisation. Such departures are termed restricted randomisation schemes. They constrain treatment assignment schedules to yield similar or, most frequently, equal group sizes throughout the trial, assuming the most common desired allocation ratio of 1:1. The restricted randomisation schemes all sacrifice unpredictability, but that increased predictability primarily surfaces in non–double-blinded trials that use permuted blocks (Panel 13.1).
Predictability in clinical trials breeds bias. If trial investigators identify or predict upcoming allocation assignments, they can instil selection bias. In assessment of eligibility, they could exclude a participant destined for, in their opinion, the wrong group. Moreover, various manoeuvres allow them to channel participants with a better prognosis to the experimental group and those with a poorer prognosis to the control group, or vice versa. Irrespective of the reasons for doing so, experimenters bias the comparison. Clinicians might revere predictability in caring for patients, but they must understand that predictability spawns bias in clinical trials.
Trial investigators can guess the next assignments by subverting the allocation concealment mechanism (e.g., by holding translucent envelopes to a light bulb). However, proper allocation concealment usually prevents this subversion. Alternatively, with permuted-block randomisation, trial investigators can sometimes predict the next assignments by noting a pattern of past assignments. For example, in a non–double-blinded trial with a block size of four, if a trial investigator notes that the sample size in the two groups equilibrates after every four participants, then many future assignments can be predicted. For example, if the sequence ABA materialises in a block of four, B would necessarily be the next assignment, or if the sequence BB materialises, AA would be the next two assignments.
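The deduction described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `forced_assignments` and `certain_fraction` are hypothetical, and the code assumes a two-arm trial with 1:1 allocation, a fixed block size already known to the observer, and a complete log of past assignments.

```python
from itertools import permutations


def forced_assignments(observed, block_size=4, arms=("A", "B")):
    """Return the assignments that are certain for the rest of the current block.

    Assumes 1:1 allocation, a known fixed block size, and that `observed`
    is a string of every past assignment, in order, from the start of the trial.
    """
    per_arm = block_size // len(arms)
    # Keep only the assignments that belong to the block currently being filled.
    in_block = observed[len(observed) - len(observed) % block_size:]
    remaining = {arm: per_arm - in_block.count(arm) for arm in arms}
    slots_left = block_size - len(in_block)
    open_arms = [arm for arm, n in remaining.items() if n > 0]
    # A prediction is certain only when a single arm can still receive participants.
    if len(open_arms) == 1:
        return open_arms[0] * slots_left
    return ""


def certain_fraction(block_size=4):
    """Share of assignments in a two-arm block that are predictable with certainty."""
    half = block_size // 2
    blocks = set(permutations("A" * half + "B" * half))
    certain = sum(
        bool(forced_assignments("".join(block[:i]), block_size))
        for block in blocks
        for i in range(block_size)
    )
    return certain / (len(blocks) * block_size)


print(forced_assignments("ABA"))
print(forced_assignments("BB"))
print(certain_fraction(4))
```

Under these assumptions, `forced_assignments("ABA")` gives `"B"` and `forced_assignments("BB")` gives `"AA"`, matching the examples in the text. Enumerating the six possible blocks of four shows that one third of all assignments can be predicted with certainty by an observer who knows the block size.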
In non–double-blinded trials, all intervention allocations become known after assignment, even with proper allocation concealment. Thus if a pattern to the allocation sequence exists, the trial investigator can discern it and predict some future assignments. However, if no pattern exists, or if the pattern is indiscernible, the allocation sequence is unpredictable. Therefore knowledge of past assignments would not help in prediction of future assignments. Unpredictability is essential in non–double-blinded randomised trials.
Proper allocation concealment before assignment and proper blinding of all involved in the trial after assignment shield knowledge of past assignments and thereby prevent prediction of future assignments. Proper blinding thus diminishes the need for unpredictability. Even in supposedly blinded trials, however, blinding after assignment is not always successful. For instance, if trial investigators perceive quickly developing, clinically obvious side effects that reveal the intervention assigned, blinding might not prevent predictions.
Trialists rely on the security of unpredictability. In the past, we suggested cultivation of a tolerance for groups of unequal sample sizes in simple randomised trials. We now suggest cultivation of a tolerance for groups of unequal sizes in restricted randomisation trials as well.
Forcing Cosmetic Credibility
Studies reported as randomised yield equal sample sizes in the comparison groups more frequently than expected. In simple, unrestricted randomised controlled trials (analogous to repeated coin-tossing), the relative sizes of comparison groups should indicate random variation. In other words, some discrepancy between the numbers in the comparison groups would be expected. However, analyses of reports of trials in general and specialist medical journals showed that researchers too frequently reported equal sample sizes of the comparison groups (defined as exactly equal or as equal as possible in view of an odd total sample size). In the specialist journals, the disparity of sample sizes in the comparison groups deviated from that expected (p < 0.001), and 54% of the simple randomised (unrestricted) trials reported equal group sizes. This proportion was higher than that in blocked trials (36%), even though blocked trials aspire to equality. Moreover, results of a similar analysis of the dermatology literature showed that an even higher 71% of simple randomised trials reported essentially equal group sizes.
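To see how unusual equal groups are under simple randomisation, the probability can be computed directly from the binomial distribution. The sketch below assumes two arms, a 1:1 allocation ratio, and independent assignments. `prob_balanced` is a hypothetical helper name, and "balanced" follows the definition above (exactly equal, or differing by one when the total is odd).

```python
from math import comb


def prob_balanced(n):
    """Probability that simple 1:1 randomisation of n participants yields
    groups that are exactly equal (n even) or differ by one (n odd)."""
    if n % 2 == 0:
        return comb(n, n // 2) / 2 ** n
    # For odd n, either arm may be the larger one, hence the factor of 2.
    return 2 * comb(n, n // 2) / 2 ** n


for total in (20, 50, 100, 200):
    print(total, round(prob_balanced(total), 3))
```

For a total of 100 participants, the chance of an exact 50:50 split is about 8%, which gives a sense of scale for the 54% and 71% figures reported above.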
Why would investigators seek equal or similar sample sizes in comparison groups? We feel many investigators strive for equal sample sizes as an end in itself. The lure of the so-called cosmetic credibility of equal sizes seems apparent. Sadly, that cosmetic credibility also appeals to readers. Striving for equal sample sizes with simple randomisation, however, reflects a methodological non sequitur.
The high proportion of equal group sizes noted previously represents pronounced aberrations from chance occurrences and suggests nonrandom manipulations of assignments to force equality. Other logical explanations seem plausible, but probably do not account for the degree of aberration witnessed. Such tinkering with assignments creates difficulties by directly instilling selection bias into trials. We hope to remove some of those difficulties by dispelling the misunderstanding behind the drive for exactly equal sizes.
Beyond the issue of nonrandom manipulations of assignments, however, we will concentrate on the potential bias introduced by balancing group sizes with valid restricted randomisation methods, primarily permuted-block randomisation, that produce equal group sizes throughout the trial. Unfortunately, methods used to ensure equal sample sizes can facilitate correct future predictions of treatment assignments, allowing bias to infiltrate.
Unequal Group Sizes in Restricted Trials
The method of restricted randomisation is used to balance sample sizes. That balance usually enhances statistical power and addresses any time trends that might exist in treatment efficacy and outcome measurement during the course of a trial. Moreover, restricted randomisation within strata becomes essential for investigators to attain the benefits of stratification. Thus reasonable scientific justification lends support to restriction.
For restriction to be effective, however, it need not yield exactly equal sample sizes. The power of a trial is not sensitive to slight deviations from equality of the sample sizes. Thus restricted approaches that produce similar sizes would yield power, time trend, and stratification benefits much the same as those restricted randomisation approaches that produce equal sizes.
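The insensitivity of power to slight imbalance can be illustrated with a small calculation. This sketch rests on a stated assumption that is not from the text: a comparison of two means with a common variance, for which the variance of the estimated difference is proportional to 1/n1 + 1/n2. The helper name `relative_efficiency` is hypothetical.

```python
def relative_efficiency(n1, n2):
    """Efficiency of an n1:n2 split relative to an equal split of the same total.

    Assumes a two-arm comparison of means with a common variance, so the
    variance of the estimated difference is proportional to 1/n1 + 1/n2.
    """
    total = n1 + n2
    equal_split_variance = 4 / total      # 1/(N/2) + 1/(N/2)
    actual_variance = 1 / n1 + 1 / n2
    return equal_split_variance / actual_variance


for n1, n2 in ((100, 100), (110, 90), (120, 80), (140, 60)):
    print(f"{n1}:{n2}", round(relative_efficiency(n1, n2), 3))
```

Under this assumption a 110:90 split retains 99% of the efficiency of a 100:100 split, and a 120:80 split retains 96%. Only marked imbalance erodes efficiency appreciably.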
Equal sample sizes, however, can have negative consequences. The predominant restricted randomisation method is random permuted blocks (blocking). Such an approach effectively attains the goals of equal sample sizes in the comparison groups overall (and, if stratified, within strata). Moreover, the method generates equal sample sizes after every block. With that attribute, however, comes the disadvantage of predictability.
Predictability becomes a particular weakness in a non–double-blinded trial. We define a double-blinded trial as one in which the treatment is hidden from participants, investigators, and outcome assessors. In virtually all non–double-blinded trials, some investigators become aware of the treatment. Thus even with adequate allocation concealment, treatment assignments become known after assignment. With that information, trial investigators can unravel the fixed block size (presumably the organisers initially shielded all block size information from them) and then anticipate when equality of the sample sizes will arise (see Panel 13.1). A sequence can be discerned from the pattern of past assignments, and some future assignments can then be accurately anticipated. Hence selection bias could seep in, irrespective of the effectiveness of allocation concealment. The same difficulty might arise, to a lesser degree, in a double-blinded trial in which obvious, perceptible side effects materialise quickly.
Although empirical evidence indicates that selection bias exists in randomised trials, do those who implement trials actually try to anticipate future assignments? We have many anecdotal reports of such anticipation, and some researchers actually conducted a study. In questioning clinicians and research nurses, 16% admitted to trying to predict treatment allocations. As expected, they did it by keeping a log of all the previous assignments in the trial. Furthermore, we suspect that the 16% who admit this process represent a minimal estimate. The actual percentage that do it is probably higher.
Randomised controlled trials become prone to unravelling of block sizes when the block size remains fixed throughout the trial, especially if the block size is small (e.g., six or fewer participants). Hence if investigators use blocked randomisation, they should randomly vary the block size to lower the chances of an assignment schedule being inferred by those responsible for recruitment and assignment.
Random block sizes, however, are no panacea. Even with random variation of block sizes, blocking still generates equal sample sizes many times throughout a trial. Indeed, an analysis based on a modification of a model that measures the inherent predictability of intervention assignments with certainty indicates that random block sizes, at best, decrease but do not eliminate the potential for selection bias. Permuted-block randomisation, even with random block sizes, presents trial recruiters with opportunities to anticipate some assignments.
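A minimal sketch of permuted-block randomisation with randomly varied block sizes shows why balance still recurs. The candidate block sizes (4, 6, 8), the seed handling, and the function names `random_block_schedule` and `balance_points` are illustrative assumptions, not a recommended configuration.

```python
import random


def random_block_schedule(n, block_sizes=(4, 6, 8), seed=None):
    """Two-arm 1:1 allocation schedule built from permuted blocks whose
    sizes are chosen at random from `block_sizes` (all assumed even)."""
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n:
        size = rng.choice(block_sizes)
        block = ["A"] * (size // 2) + ["B"] * (size // 2)
        rng.shuffle(block)
        schedule.extend(block)
    # The final block may be cut short when n is reached.
    return schedule[:n]


def balance_points(schedule):
    """Count how many times the two groups are exactly equal in size."""
    diff, points = 0, 0
    for arm in schedule:
        diff += 1 if arm == "A" else -1
        points += diff == 0
    return points


schedule = random_block_schedule(96, seed=2024)
print("".join(schedule[:24]))
print("times groups were exactly equal:", balance_points(schedule))
```

Every completed block returns the groups to exact equality, so a schedule of 96 assignments with blocks of at most eight passes through at least 12 balance points. An observer tracking the running totals cannot know the current block size in advance, but once one group leads by half the largest block size the next assignments are determined.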
Alternatives in Non–Double-Blinded Trials
For non–double-blinded randomised controlled trials with an overall sample size of more than 200 (an average sample size of 100 in two groups) and within each planned subgroup or stratum, we recommend simple randomisation. It provides perfect unpredictability, thereby eliminating that aspect of selection bias due to the generation of the allocation sequence. Moreover, simple randomisation also provides the least probability for chance bias of all the generation procedures, and it enables valid use of virtually all standard statistical software. With sample sizes greater than 200, simple randomisation normally yields only mild disparities in sample sizes between groups. The cut-off of 200, however, is merely an overall guideline. Individual investigators might want to judge their particular acceptable levels of disparity. Another caveat centres on potential interim analyses done on sample sizes of less than 200 (i.e., before investigators reach total sample size). Greater relative disparities in treatment group sizes could materialise in those instances, although we feel those costs are more than offset by the gains in unpredictability from simple randomisation.
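The size of the disparities to expect from simple randomisation can be checked with an exact binomial calculation. The sketch assumes two arms and a 1:1 ratio. The 60:40 threshold used in the example is an illustrative choice rather than a value from the text, and `prob_larger_group_at_most` is a hypothetical helper name.

```python
from math import comb


def prob_larger_group_at_most(n, max_larger):
    """Probability that, after simple 1:1 randomisation of n participants,
    the larger group contains no more than `max_larger` participants."""
    favourable = sum(comb(n, k) for k in range(n - max_larger, max_larger + 1))
    return favourable / 2 ** n


# Chance that the split is no worse than 60:40 at several trial sizes.
for total in (20, 50, 100, 200):
    limit = (total * 6) // 10
    print(total, round(prob_larger_group_at_most(total, limit), 4))
```

With 200 participants, the probability that the split is no more extreme than 120:80 exceeds 99%, whereas with 20 participants a split no more extreme than 12:8 occurs only about 74% of the time. This is consistent with treating 200 as a rough guideline rather than a strict threshold.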
For non–double-blinded randomised controlled trials with a sample size of less than 200 overall or within any principal stratum or subgroup of a stratified trial, we recommend a restricted randomisation procedure. The urn design functions especially well to promote balance without forcing it. It tends to balance more in the important early stages of a trial and then approach simple randomisation as the trial size increases. This attribute becomes useful with uncertain overall trial sizes, or more likely, uncertain stratum sizes in a stratified trial. It also proves useful in trials that might be ended due to sequential monitoring of treatment effects. Urn designs usually have adequate balancing properties while still being less susceptible to selection bias than permuted-block designs (see Chapter 12).
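The behaviour described above can be sketched with the urn design as it is commonly formulated, UD(α, β): the urn starts with α balls for each arm, and after each assignment β balls of the opposite arm are added. The text does not specify parameters, so the defaults α = 1 and β = 1 are assumptions, as are the function names `prob_next_a` and `urn_schedule`.

```python
import random


def prob_next_a(n_a, n_b, alpha=1, beta=1):
    """Probability that the next participant receives A under UD(alpha, beta),
    given n_a and n_b participants already assigned to A and B."""
    total = 2 * alpha + beta * (n_a + n_b)
    if total == 0:
        return 0.5  # empty urn (alpha = 0): first assignment is a fair coin toss
    # Each earlier B assignment added beta balls labelled A, and vice versa.
    return (alpha + beta * n_b) / total


def urn_schedule(n, alpha=1, beta=1, seed=None):
    """Generate a two-arm allocation schedule with the urn design."""
    rng = random.Random(seed)
    n_a = n_b = 0
    schedule = []
    for _ in range(n):
        if rng.random() < prob_next_a(n_a, n_b, alpha, beta):
            schedule.append("A")
            n_a += 1
        else:
            schedule.append("B")
            n_b += 1
    return schedule


print(prob_next_a(3, 1))      # early imbalance: strong pull back toward B
print(prob_next_a(103, 101))  # same imbalance later: close to a fair coin
```

With these assumed parameters, an imbalance of two participants after four assignments lowers the probability of A to one third, whereas the same imbalance after 204 assignments leaves it near one half. No assignment is ever certain once both arms have balls in the urn, which is why the design promotes balance without forcing it.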
With these desirable properties come caveats. Some statisticians recommend use of permutation tests with urn randomisation designs. Permutation tests are assumption-free statistical tests of the equality of treatments. Unfortunately, they usually are not available for urn designs in standard statistical software. That adds analytical complexity for researchers and statisticians. However, if no major time trends on the outcome variables exist, use of standard statistical analyses from widely available software on trials that use urn randomisation would normally yield similar results to permutation tests. Moreover, with standard statistical analyses, investigators can easily obtain confidence intervals for common measures of effect.
Of interest, a number of more complex designs have performed well. The Big Stick Design, the biased-coin design with imbalance tolerance, and the Ehrenfest urn design all perform better than blocking at achieving balance between groups with less predictability. The maximal procedure is also less predictable than blocking while preserving balance. Yet, these well-performing designs, along with the aforementioned urn randomisation design, appear infrequently, if at all, in reports. For example, in a review of randomisation methods used in trials published in four high-impact-factor general medical journals, almost 90% used a restricted method, and 90% of those used blocking with all the remainder using minimisation. No authors reported using any of these well-performing, complex designs.
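As one illustration of these designs, the Big Stick Design is usually described as follows: each assignment is made by a fair coin toss unless the imbalance between groups has reached a prespecified maximum tolerated imbalance, in which case the next participant goes to the smaller group. The sketch below follows that description. The maximum imbalance of 3 and the function name `big_stick_schedule` are illustrative assumptions, not values from the text.

```python
import random


def big_stick_schedule(n, max_imbalance=3, seed=None):
    """Two-arm schedule under the Big Stick Design: fair coin tosses,
    overridden only when the imbalance reaches `max_imbalance`."""
    rng = random.Random(seed)
    diff = 0  # number assigned to A minus number assigned to B
    schedule = []
    for _ in range(n):
        if diff >= max_imbalance:
            arm = "B"  # A is ahead by the maximum: force B
        elif diff <= -max_imbalance:
            arm = "A"  # B is ahead by the maximum: force A
        else:
            arm = rng.choice("AB")
        schedule.append(arm)
        diff += 1 if arm == "A" else -1
    return schedule


schedule = big_stick_schedule(40, seed=7)
print("".join(schedule))
print("final split:", schedule.count("A"), "vs", schedule.count("B"))
```

In this sketch an assignment is deterministic only at the boundary, so most assignments remain coin tosses while the difference in group sizes never exceeds the chosen limit.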
Perhaps an impediment to widespread usage pertains to the conceptual complexities of urn randomisation and these other designs; they are more difficult to understand than simple or permuted-block randomisation. ‘It is not clear why trialists still predominantly favour permuted blocks over other designs, although simplicity may be a significant factor’. Whatever the reasons, these well-performing but more complicated designs languish in obscurity.