Evaluation of Surgical Safety and Efficacy

Study design

  • Systematic review: Systematic location, appraisal, and synthesis of evidence from scientific studies

Experimental studies

  • Randomized controlled trial: Subjects are randomly allocated to groups either for the intervention/treatment being studied or control/placebo (using a random mechanism, such as coin toss, random number table, or computer-generated random numbers) and the outcomes are compared

  • Pseudo-randomized controlled trial: Subjects are allocated to groups for intervention/treatment or control/placebo using a nonrandom method (such as alternate allocation, allocation by days of the week, or odd–even study numbers) and the outcomes are compared

  • Clustered randomized trial: Subjects are randomized to intervention or control in groups (e.g., families, communities, hospitals)

Comparative (nonrandomized and observational) studies

  • Concurrent control or cohort: Outcomes are compared for a group receiving the treatment/intervention being studied, concurrently with control subjects receiving the comparison treatment/intervention (e.g., usual or no care)

  • Case control: Subjects with the outcome or disease and an appropriate group of controls without the outcome or disease are selected, and information is obtained about the previous exposure to the treatment/intervention or other factor being studied

  • Historical control: Outcomes for a prospectively collected group of subjects exposed to the new treatment/intervention are compared with either a previously published series or previously treated subjects at the same institutions

  • Interrupted time series: Trends in the outcome or disease are compared over multiple time points before and after the introduction of the treatment/intervention or other factor being studied

Other observational studies

  • Case series: A single group of subjects is exposed to the treatment/intervention

  • Case series, posttest: Only outcomes after the intervention are recorded in the case series, so no comparisons can be made

  • Case series, pretest/posttest: Outcomes are measured in subjects before and after exposure to the treatment/intervention for comparison (also called a “before-and-after” study)

Data from NHMRC

Note: Further discussion is given in the handbook How to Use the Evidence: Assessment and Application of Scientific Evidence (NHMRC 2000 and 2009)

Therefore, the main designs (also see appendix for more detail) for direct comparisons of surgical procedures are:

  • Historical comparisons (e.g., a surgeon moves from the established technique to the new one, comparing data obtained from both time periods and techniques)

  • Concurrent nonrandomized comparisons (e.g., surgeons decide which patients are allocated to the established technique and which to the new technique)

  • Randomized controlled trials (RCTs; random allocation of patients to each type of surgery)
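The allocation methods that distinguish these designs can be sketched in a few lines of Python. This is a minimal illustration only, not a trial-grade randomization service: the function names, the arm labels "new" and "established", and the patient and hospital identifiers are all hypothetical, and the sketch does not address allocation concealment, stratification, or balanced group sizes.

```python
import random

# Hypothetical arm labels, chosen for illustration only.
ARMS = ("new", "established")


def randomize_patients(patient_ids, seed=None):
    """Simple random allocation using computer-generated random numbers.

    Each patient is assigned independently, so group sizes may be unequal.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(ARMS) for pid in patient_ids}


def alternate_allocation(patient_ids):
    """Pseudo-randomized allocation by alternation.

    The next assignment is fully predictable, which is why this method
    is not regarded as true randomization.
    """
    return {pid: ARMS[i % 2] for i, pid in enumerate(patient_ids)}


def randomize_clusters(cluster_of, seed=None):
    """Cluster randomization: every member of a cluster gets the same arm.

    `cluster_of` maps each patient id to a cluster id (e.g., a hospital).
    """
    rng = random.Random(seed)
    clusters = sorted(set(cluster_of.values()))
    arm_of_cluster = {c: rng.choice(ARMS) for c in clusters}
    return {pid: arm_of_cluster[c] for pid, c in cluster_of.items()}
```

For example, `alternate_allocation(["p1", "p2", "p3"])` always sends "p1" and "p3" to the first arm, so anyone who knows the enrollment order can foresee the next assignment. `randomize_patients` gives no such foreknowledge unless the seed is disclosed.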

In RCTs, patients are allocated by a random, concealed process, so that differences in outcome can be attributed to the treatments themselves rather than to bias in how patients were selected for each group. Although all types of studies are often included in systematic reviews, there can be important differences between them. Consequently, each type of study (randomized, nonrandomized, and case series) is analyzed separately, with more weight given to findings from RCTs (Audige et al. 2004).

The results of randomized and nonrandomized studies may or may not differ, and no systematic cause for any differences has been clearly identified (Kunz et al. 2004; Deeks et al. 2003). Deeks et al. (2003) identified two components of bias introduced by nonrandom allocation: time trends in the characteristics (case mix) of participants in historical comparisons, and increased variation from haphazard differences in case mix between groups in studies with both historical and concurrent controls. Sometimes the source of selection bias is clear. For example, in the ASERNIP–S review of laparoscopic ventral hernia repair (Pham et al. 2004), the lower complication rates found for the laparoscopic procedure in some nonrandomized studies were likely due to smaller and more straightforward hernias being allocated to the laparoscopic rather than the open procedure.

In systematic reviews (SRs), data collected from published studies are analyzed collectively to investigate trends or statistical differences between two or more treatment methods across a number of studies. Differences in inclusion or exclusion criteria, study populations, or the precise surgical technique(s) used can confound the review results. Therefore, when studies are assessed for inclusion in an SR, their inclusion criteria, study populations, and interventions should be the same, or as close as possible. RCTs are preferred, but nonrandomized studies can be used. When nonrandomized studies are included, however, it is best to analyze and discuss them separately from RCTs and to interpret their findings with some caution (Audige et al. 2004).
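The case-mix effect described for the hernia review can be illustrated with simple arithmetic. The sketch below uses entirely hypothetical numbers (they are not taken from Pham et al. 2004 or any other study). It assumes the complication risk depends only on hernia complexity and is identical for both procedures, then shows how a difference in case mix alone produces different crude complication rates.

```python
def crude_rate(share_simple, risk_simple, risk_complex):
    """Overall complication rate for a group with the given case mix."""
    return share_simple * risk_simple + (1 - share_simple) * risk_complex


# Hypothetical risks, assumed identical for both procedures.
RISK_SIMPLE = 0.05   # complication risk for small, straightforward hernias
RISK_COMPLEX = 0.20  # complication risk for large or complex hernias

# Hypothetical case mix: surgeons send most simple hernias to laparoscopy.
lap_rate = crude_rate(0.8, RISK_SIMPLE, RISK_COMPLEX)
open_rate = crude_rate(0.3, RISK_SIMPLE, RISK_COMPLEX)

print(f"laparoscopic: {lap_rate:.3f}, open: {open_rate:.3f}")
# prints "laparoscopic: 0.080, open: 0.155"
```

Under these assumptions the laparoscopic group appears to have roughly half the complication rate of the open group, even though the procedure has no effect on risk within either stratum. Random allocation would be expected to give both groups a similar case mix and so remove this apparent difference.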

Levels of Evidence

In order to assess the relative statistical “strength” of the data, levels of evidence can be designated, for example (NHMRC 2009):


  • Level I: Evidence obtained from a systematic review of all relevant randomized controlled trials

  • Level II: Evidence obtained from at least one properly designed randomized controlled trial

  • Level III-1: Evidence obtained from well-designed pseudo-randomized controlled trials (alternate allocation or some other method)

  • Level III-2: Evidence obtained from comparative studies with concurrent controls: nonrandomized experimental trials, cohort studies, case–control studies, or interrupted time series with a control group

  • Level III-3: Evidence obtained from comparative studies without concurrent controls: historical control studies, two or more single-arm studies, or interrupted time series without a parallel control group

  • Level IV: Evidence obtained from case series, with either posttest or pretest/posttest outcomes


Level I and II evidence represents the strongest available evidence, depending on the precise situation or question being evaluated (NHMRC 2009). In many situations, however, only lesser levels of evidence may be available.
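When tabulating the studies in a review, the hierarchy can be held as a simple lookup. The sketch below assumes that the six entries listed above correspond, in order, to the NHMRC designations I, II, III-1, III-2, III-3, and IV. The short design names used as dictionary keys and the helper `strongest_level` are hypothetical conveniences, not NHMRC terminology.

```python
# Assumed mapping from a (shortened, hypothetical) design name to its level.
LEVEL_OF_DESIGN = {
    "systematic review of RCTs": "I",
    "randomized controlled trial": "II",
    "pseudo-randomized controlled trial": "III-1",
    "comparative study with concurrent controls": "III-2",
    "comparative study without concurrent controls": "III-3",
    "case series": "IV",
}

# Levels ordered from strongest to weakest.
LEVEL_ORDER = ["I", "II", "III-1", "III-2", "III-3", "IV"]


def strongest_level(designs):
    """Return the strongest evidence level among the given study designs."""
    levels = [LEVEL_OF_DESIGN[d] for d in designs]
    return min(levels, key=LEVEL_ORDER.index)
```

For instance, `strongest_level(["case series", "pseudo-randomized controlled trial"])` returns `"III-1"`, a situation in which only lesser levels of evidence are available.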

Interpreting the Evidence

Even “good” or “strong,” statistically robust evidence may not be applicable to decision-making in a particular clinical situation or context. Careful interpretation of research is crucial because, as in health care generally, the differences between new and established procedures may be very small. The “noise” from bias of various kinds may be larger than any real difference, leading to spurious findings and conclusions. A further complexity is that even when benefits are shown, there are also likely to be potential harms, and this trade-off, or balance sheet, approach is implicit in looking at both efficacy and safety.

This approach is reflected in the ASERNIP–S classification scheme for each systematic review, which rates the overall evidence as good, average, or poor and separately classifies safety and efficacy as at least as safe or efficacious as the comparator, less safe or efficacious than the comparator, or unable to be determined.

However, a distinction needs to be drawn between classifying the evidence and formulating recommendations for action. The international Grading of Recommendations Assessment, Development, and Evaluation (GRADE) Working Group has outlined the general circumstances for recommending that an intervention should be administered, should probably be administered, should probably not be administered, or should not be administered (Guyatt et al. 2008). This requires looking first at the quality of the evidence and then assessing how strongly a recommendation should be formulated:

The quality of the evidence indicates the extent to which one can be confident that an estimate of an effect is correct. The strength of a recommendation indicates the extent to which one can be confident that adherence to a recommendation will do more good than harm (Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) Working Group 2004).


Apr 9, 2017 | Posted by in GENERAL SURGERY | Comments Off on Evaluation of Surgical Safety and Efficacy
