Study Designs and Protocols

Systematic review: Systematic location, appraisal, and synthesis of evidence from scientific studies.

Experimental studies

Randomized controlled trial: Subjects are randomly allocated to groups, either to the intervention/treatment being studied or to the control/placebo (using a random mechanism such as a coin toss, a random number table, or computer-generated random numbers), and the outcomes are compared.

Pseudo-randomized controlled trial: Subjects are allocated to groups for intervention/treatment or control/placebo using a nonrandom method (such as alternate allocation, allocation by days of the week, or odd–even study numbers), and the outcomes are compared.

Clustered randomized trial: Subjects are randomized to intervention or control in groups (e.g., families, communities, hospitals).

Comparative (nonrandomized and observational) studies

Concurrent control or cohort: Outcomes are compared for a group receiving the treatment/intervention being studied, concurrently with control subjects receiving the comparison treatment/intervention (e.g., usual or no care).

Case–control: Subjects with the outcome or disease and an appropriate group of controls without the outcome or disease are selected, and information is obtained about previous exposure to the treatment/intervention or other factor being studied.

Historical control: Outcomes for a prospectively collected group of subjects exposed to the new treatment/intervention are compared with either a previously published series or previously treated subjects at the same institutions.

Interrupted time series: Trends in the outcome or disease are compared over multiple time points before and after the introduction of the treatment/intervention or other factor being studied.

Other observational studies

Case series: A single group of subjects is exposed to the treatment/intervention.

Posttest: Only outcomes after the intervention are recorded in the case series, so no comparisons can be made.

Pretest/posttest: Outcomes are measured in subjects before and after exposure to the treatment/intervention for comparison (also called a “before-and-after” study).
Therefore, the main designs (also see appendix for more detail) for direct comparisons of surgical procedures are:
Historical comparisons (e.g., a surgeon moves from the established technique to the new one, comparing data obtained from both time periods and techniques)
Concurrent nonrandomized comparisons (e.g., surgeons decide which patients are allocated to the established technique and which to the new technique)
Randomized controlled trials (RCTs; random allocation of patients to each type of surgery)
In RCTs, patients are allocated by a random, concealed process, so that any differences seen can be attributed to the treatments alone rather than to bias or chance. Although systematic reviews often include all types of studies, there can be important differences between them. Consequently, each type of study is analyzed separately (randomized, nonrandomized, and case-series studies), with more weight given to findings from RCTs (Audige et al. 2004).

The results of randomized and nonrandomized studies may or may not differ, but the systematic causes of any differences are not clear (Kunz et al. 2004; Deeks et al. 2003). Deeks et al. (2003) identified two components of bias introduced by nonrandom allocation: time trends in the characteristics (case mix) of participants in historical comparisons, and increased variation from haphazard differences in case mix between groups in studies with both historical and concurrent controls. Sometimes the sources of selection bias are clear. For example, in the ASERNIP–S review of laparoscopic ventral hernia repair (Pham et al. 2004), the lower complication rates found for the laparoscopic procedure in some nonrandomized studies were likely due to smaller, more straightforward hernias being allocated to the laparoscopic rather than the open procedure.

In systematic reviews (SRs), data collected from published studies are analyzed collectively to investigate trends or statistical differences between two or more treatment methods across a number of studies. Differences in inclusion or exclusion criteria, study populations, or the precise surgical technique(s) used can confound the review results. Therefore, when studies are assessed for inclusion in an SR, their inclusion criteria, study populations, and interventions should be the same, or as close as possible. RCTs are preferred, but nonrandomized studies can be used; when nonrandomized studies are included, however, it is best to analyze and discuss them separately from RCTs and to interpret their findings with some caution (Audige et al. 2004).
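To make this stratified approach concrete, the sketch below pools effect estimates within each design stratum using fixed-effect inverse-variance weighting, so that randomized and nonrandomized results are summarized separately rather than mixed. The study data, the effect scale, and the pooling rule are illustrative assumptions for this example only, not figures from any of the reviews cited above.

```python
# A minimal sketch, assuming hypothetical study data (design label,
# effect estimate, standard error). Each design stratum is pooled
# separately with fixed-effect inverse-variance weights.
from collections import defaultdict
from math import sqrt

studies = [
    ("RCT", 0.12, 0.05),            # hypothetical randomized trials
    ("RCT", 0.08, 0.07),
    ("nonrandomized", 0.25, 0.06),  # hypothetical comparative studies
    ("nonrandomized", 0.30, 0.09),
]

by_design = defaultdict(list)
for design, effect, se in studies:
    by_design[design].append((effect, se))

for design, results in by_design.items():
    # Inverse-variance weights: more precise studies count more.
    weights = [1.0 / se ** 2 for _, se in results]
    pooled = sum(w * e for w, (e, _) in zip(weights, results)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    print(f"{design}: pooled effect {pooled:.3f} (SE {pooled_se:.3f})")
```

Reporting the strata side by side, rather than as a single combined estimate, mirrors the recommendation above to keep randomized and nonrandomized findings separate and to interpret the latter with caution.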
Levels of Evidence
In order to assess the relative statistical “strength” of the data, levels of evidence can be designated, for example (NHMRC 2009):
Level I: Evidence obtained from a systematic review of all relevant randomized controlled trials.

Level II: Evidence obtained from at least one properly designed randomized controlled trial.

Level III-1: Evidence obtained from well-designed pseudo-randomized controlled trials (alternate allocation or some other method).

Level III-2: Evidence obtained from comparative studies with concurrent controls: nonrandomized experimental trials, cohort studies, case–control studies, or interrupted time series with a control group.

Level III-3: Evidence obtained from comparative studies without concurrent controls: historical control studies, two or more single-arm studies, or interrupted time series without a parallel control group.

Level IV: Evidence obtained from case series, with either posttest or pretest/posttest outcomes.
Levels I and II represent the strongest available evidence, depending on the precise situation or question being evaluated (NHMRC 2009). In many situations, however, only lower levels of evidence may be available.
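As a minimal illustration of the hierarchy above, the sketch below maps study-design labels to NHMRC (2009) levels. The design labels and the evidence_level() helper are hypothetical names chosen for this example; only the level designations themselves come from the scheme above.

```python
# A minimal sketch, assuming illustrative design labels; the level
# designations follow the NHMRC (2009) scheme, but this dictionary and
# helper are hypothetical, not a published API.
NHMRC_LEVELS = {
    "systematic review of RCTs": "I",
    "randomized controlled trial": "II",
    "pseudo-randomized controlled trial": "III-1",
    "comparative study, concurrent controls": "III-2",
    "comparative study, no concurrent controls": "III-3",
    "case series": "IV",
}

def evidence_level(design: str) -> str:
    """Return the NHMRC level for a study-design label."""
    try:
        return NHMRC_LEVELS[design]
    except KeyError:
        raise ValueError(f"unrecognized study design: {design!r}")

print(evidence_level("case series"))  # IV
```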
Interpreting the Evidence
In broad terms, even “good,” statistically robust evidence may not be applicable or usable for decision making in particular clinical situations or contexts. Careful interpretation of research is crucial because, as in health care generally, the differences between new and established procedures may be very small; the “noise” from various kinds of bias may then be larger than any real differences, leading to spurious findings and conclusions. A further complexity is that even when benefits are shown, potential harms are also likely, and this trade-off or balance-sheet approach is implicit in examining both efficacy and safety.

This approach is reflected in the ASERNIP–S classification scheme, which gives each systematic review an overall rating of good, average, or poor evidence and separately classifies the procedure as at least as safe or efficacious as the comparator, less safe or efficacious than the comparator, or of undetermined safety or efficacy.

However, a distinction needs to be drawn between classifying the evidence and formulating recommendations for action. The international Grading of Recommendations Assessment, Development, and Evaluation (GRADE) Working Group has outlined the general circumstances for recommending that an intervention should be administered, should probably be administered, should probably not be administered, or should not be administered (Guyatt et al. 2008). This requires looking first at the quality of the evidence and then assessing how strongly a recommendation should be formulated:
The quality of the evidence indicates the extent to which one can be confident that an estimate of an effect is correct. The strength of a recommendation indicates the extent to which one can be confident that adherence to a recommendation will do more good than harm (Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) Working Group 2004).
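Purely as an illustration of how evidence quality and the benefit/harm trade-off combine, the sketch below derives one of the four GRADE recommendation categories named above from those two inputs. The decision rule and the recommendation() helper are illustrative assumptions; GRADE itself relies on structured judgments, not a fixed algorithm.

```python
# A minimal sketch, assuming a simplified two-input decision rule. The
# four output categories follow Guyatt et al. (2008); the rule itself is
# an illustrative assumption, not the GRADE process.
def recommendation(evidence_quality: str, net_benefit: str) -> str:
    """evidence_quality: 'high' | 'moderate' | 'low';
    net_benefit: 'clear benefit' | 'uncertain' | 'clear harm'."""
    if net_benefit == "clear benefit":
        return ("should be administered" if evidence_quality == "high"
                else "should probably be administered")
    if net_benefit == "clear harm":
        return ("should not be administered" if evidence_quality == "high"
                else "should probably not be administered")
    # With an uncertain trade-off, lean against intervening.
    return "should probably not be administered"

print(recommendation("high", "clear benefit"))  # should be administered
print(recommendation("low", "clear benefit"))   # should probably be administered
```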