This book examines medical tests and procedures and how they interface with pharmacologic interventions. In general, testing can include:
Questions asked of patients about their chief concerns during the history
Physical exam maneuvers performed on the patient
Laboratory tests, imaging studies, and procedures performed to elicit further information
With each diagnostic intervention, a positive or negative observation is elicited and fed into the diagnostic process. On a case-by-case basis, clinicians apply a test's power to discriminate the presence vs absence of disease to their pre-test sense of the probability that a disease process is present. For example, before applying a D-dimer or computed tomography (CT) angiogram to a patient with pleuritic shortness of breath in the emergency department or on the wards, clinicians generate a pre-test probability of pulmonary embolism (PE). Clinical reasoning then directs what individualized testing to apply to further increase or decrease the probability of PE in that patient. Clinicians work toward the highest or lowest post-test probability to rule in or rule out a diagnosis, respectively.
Tests and their accuracy characteristics, therefore, are inextricably intertwined with clinical reasoning. From the start of a patient assessment, the spinning wheels of clinical reasoning move in real time with every visual observation, concern-based question, hypothesis-driven bedside exam, laboratory test, and image ordered. Through history and exam, the initial pre-testing sense of diagnostic probabilities is made and tests are then applied.
The rest of this introductory chapter will discuss the knowledge of a test’s accuracy measures and how that knowledge is fundamental to making correct diagnoses.
No doubt the reader has come across a number of definitions or ways to think about a test’s sensitivity. These may include the following:
A test’s ability to identify a disease state
The probability of testing positive if the patient has the disease state
True positive (TP)/(TP + false negative [FN])
An accuracy measure of a test best suited to rule out disease
Each of the above conceptual definitions of sensitivity is useful. Most of the understanding of sensitivity can be achieved by laying out the famous 2 × 2 table. Table 3.1 crosses the test in question's positive and negative findings in the left column with a comparator gold standard's positive and negative findings across the top row.
| Test in Question | Gold Standard + | Gold Standard – |
| --- | --- | --- |
| + | a [true positive (TP)] | b [false positive (FP)] |
| – | c [false negative (FN)] | d [true negative (TN)] |
Using the 2 × 2 visually, one has TP for the test in question in the upper-left quadrant (a), false positive (FP) for the test in the upper-right quadrant (b), FN in the lower-left quadrant (c), and true negative (TN) in the lower-right quadrant (d). For the sensitivity calculation, this comes to sensitivity = a/(a + c).
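The cell arithmetic can be sketched in a few lines of code. This is a minimal illustration using the cell labels from Table 3.1 (a = TP, c = FN); the counts are hypothetical, chosen only to show the calculation.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity = a / (a + c) = TP / (TP + FN)."""
    return tp / (tp + fn)

# Hypothetical 2x2 counts: of 100 diseased patients, 97 test positive.
print(sensitivity(tp=97, fn=3))  # 0.97
```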
Notice where the statement that sensitivity best rules out disease comes from. If a test has a sensitivity of 97%, only 3 of every 100 patients with the disease will test falsely negative. Because such a test misses so few diseased patients, a negative result argues strongly against the presence of disease. A negative test, when the test is highly sensitive like this one, nearly assures absence of disease.
Examples would include a Lyme ELISA for Lyme disease, an antinuclear antibody (ANA) for lupus, or the addition of nuclear perfusion imaging to stress testing for coronary ischemia.
The issue that clinicians must deal with, however, is that highly sensitive tests often suffer from specificity that is not nearly as good. Such a test's ability to discriminate true disease from its absence is compromised by over-labeling of disease, that is, a tendency to ascribe disease to too many patients. This is further examined through specificity.
The following defines the accuracy measure of a clinical test called specificity. The reader has likely come across one or all of the following:
A test’s ability to identify the absence of a disease state
The probability of testing negative if the patient does not have a disease state
TN/(TN + FP)
An accuracy measure of a test best suited to rule in disease
Again, each of these conceptual definitions is correct. Looking at Table 3.1, which crosses the test in question's positive and negative findings in the left column with a gold standard's positive and negative findings across the top row, specificity is derived as d/(d + b).
Notice again where the statement that specificity rules in disease comes from. If a test has a specificity of 95%, only 5 of every 100 patients without the disease will test falsely positive. Because so few disease-free patients test positive, a positive result strongly supports the presence of disease. A positive test, when the test is highly specific like this one, nearly assures presence of disease.
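Specificity follows the same pattern from the table cells (d = TN, b = FP); again, the counts below are hypothetical and serve only to illustrate the arithmetic.

```python
def specificity(tn: int, fp: int) -> float:
    """Specificity = d / (d + b) = TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical 2x2 counts: of 100 disease-free patients, 95 test negative.
print(specificity(tn=95, fp=5))  # 0.95
```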
Examples of specificity's ability to discriminate the presence of disease include dsDNA or anti-Smith antibodies in lupus, anti-cyclic citrullinated peptide antibodies in rheumatoid arthritis, a drop arm test for full-thickness rotator cuff tear, or 4/5 affirmative answers on the POUND criteria (pulsatile quality, one-day duration [4–72 hours], unilateral location, nausea, disabling intensity) for migraine.
Again, clinicians have to consider that highly specific tests can suffer from suboptimal sensitivity. They can over-label an absence of disease and contribute to missed diagnoses.
The definitions of sensitivity and specificity, their strengths and weaknesses in discriminating disease from absence of disease, and their typical reciprocal relationship form the first level of clinical reasoning built on initial clinical impressions. The next level of accuracy-measure understanding that contributes to the diagnostic thought process comprises positive predictive value (PPV) and negative predictive value (NPV).
Positive Predictive Value
With each patient response to a question in development of the chief concern, each part of the exam, and each laboratory test or image, a positive finding is associated with a PPV for a target diagnosis being considered.
The reader may have heard of these two definitions of PPV:
The probability a patient testing positive has disease
TP/(TP + FP), or TP/all positive tests
Each of these conceptual definitions is useful, and by referring to Table 3.1, PPV can be derived as a/(a + b).
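As a quick numeric sketch of the a/(a + b) derivation, with illustrative counts only:

```python
def ppv(tp: int, fp: int) -> float:
    """PPV = a / (a + b) = TP / all positive tests."""
    return tp / (tp + fp)

# Hypothetical counts: 90 true positives among 100 total positive tests.
print(ppv(tp=90, fp=10))  # 0.9
```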
Negative Predictive Value
Similarly, every time a patient answers a historical question in the negative, an exam maneuver is unremarkable, or a laboratory test or image is negative, such negative findings are associated with an NPV for a target diagnosis being considered.
The reader may have come across the following two definitions of NPV:
The probability a patient testing negative does not have disease
TN/(TN + FN), or TN/all negative tests
These conceptual definitions are correct, and by referring to Table 3.1, NPV can be derived as d/(d + c).
For clinicians on a quest for testing information that helps confirm or refute a pre-test sense of diagnostic probability, predictive values are inherently problematic. They have an essential flaw. Consider applying an ANA to diagnose lupus in a large group of younger patients with synovitis and rash (common manifestations of lupus at an age when lupus tends to present). There will be a certain PPV and NPV for diagnosing lupus achieved by the application of the ANA. Now consider applying that same ANA test to a large group of older patients with morning stiffness and bony spurs on hand X-rays (an uncommon presentation of lupus, at an age less likely to be presenting with lupus for the first time). ANA is a highly sensitive test but will inevitably produce some false positivity in the latter group. That false positivity will compromise the test's PPV in this subgroup of patients.
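The effect on PPV of testing these two different populations can be quantified with a sketch. The sensitivity (0.95) and specificity (0.80) below are illustrative ANA-like values, not published figures, and the prevalences are hypothetical.

```python
def predictive_values(sens: float, spec: float, prevalence: float, n: int = 10_000):
    """Build a 2x2 table for a cohort of size n and return (PPV, NPV)."""
    diseased = prevalence * n
    healthy = n - diseased
    tp = sens * diseased   # a
    fn = diseased - tp     # c
    tn = spec * healthy    # d
    fp = healthy - tn      # b
    return tp / (tp + fp), tn / (tn + fn)

# Younger patients with synovitis and rash: higher lupus prevalence (assumed 30%).
ppv_high, _ = predictive_values(0.95, 0.80, prevalence=0.30)
# Older patients with degenerative findings: lower lupus prevalence (assumed 2%).
ppv_low, _ = predictive_values(0.95, 0.80, prevalence=0.02)
print(round(ppv_high, 2), round(ppv_low, 2))  # 0.67 0.09
```

Identical test characteristics, yet the PPV collapses in the low-prevalence group: most positives there are false positives.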
What is the difference between these two tested populations? As with a population of patients with diabetes with symptoms of coronary disease vs a population with low pre-test probability of coronary disease, the difference is prevalence of disease. Predictive value performance, both positive and negative, will be different for populations tested based on prevalence. The answer to this shortcoming, for the clinician preferring a prevalence-free accuracy measure that links pre- to post-test probability of disease, is the concept of a likelihood ratio (LR). Before moving to LR, just a word on the overall accuracy of any test.
Accuracy of a Test
The reader has likely heard this term used in a myriad of ways, such as:
A test’s ability to reliably identify both presence and absence of disease
A test’s probability of relaying a correct discrimination of positive from negative disease
(TP + TN)/(TP + FP + TN + FN) or true tests/all tests done
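The overall accuracy formula above can be sketched the same way, with hypothetical counts:

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Overall accuracy = (TP + TN) / all tests done."""
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical 2x2 counts: 175 correct results out of 200 tests.
print(accuracy(tp=90, fp=15, fn=10, tn=85))  # 0.875
```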
Likelihood Ratios
LRs can be defined as follows: they are an accuracy measure of a question series, exam technique, laboratory result, or image that links pre-test probability to post-test probability independent of the prevalence of the target condition.
The reader will be reminded that LRs can be expressed in the following terms:
LR+ = sensitivity/(1 – specificity)
LR– = (1 – sensitivity)/specificity
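These two formulas translate directly into code. The sensitivity and specificity values below are illustrative, not taken from any particular test.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple:
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_positive = sens / (1 - spec)
    lr_negative = (1 - sens) / spec
    return lr_positive, lr_negative

# Hypothetical test with sensitivity 0.97 and specificity 0.90.
lr_pos, lr_neg = likelihood_ratios(0.97, 0.90)
print(round(lr_pos, 1), round(lr_neg, 2))  # 9.7 0.03
```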
When presented with the raw 2 × 2 table data in an article presenting a new diagnostic technique, readers can calculate these LRs from the TP/FP rates and TN/FN rates. It is important to keep in mind some basic points about the effect of LRs on clinical thinking.
LRs for a positive test that are >10 are significant in that the test’s discriminatory power can have an impressive effect on pre- to post-test thinking.
LRs for a negative test that are <0.1 are similarly significant in that the test’s discriminatory power can have an impressive effect on pre- to post-test thinking.
LRs for a positive test of 2, 5, and 10 increase the considered pre-test probability of disease by 15%, 30%, and 45%, respectively. Similarly, LRs for a negative test of 0.5, 0.2, and 0.1 decrease the considered pre-test probability of disease by 15%, 30%, and 45%, respectively. In a schematic, this is represented in Fig. 3.1 .
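The exact relationship that the ±15%/30%/45% rule of thumb approximates is the odds form of Bayes' theorem: pre-test probability is converted to odds, multiplied by the LR, and converted back to a probability.

```python
def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Apply an LR to a pre-test probability via the odds form of Bayes' theorem."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# A 30% pre-test probability with LR+ = 10: the rule of thumb estimates
# roughly 75%, while the exact calculation gives about 81%.
print(round(post_test_probability(0.30, 10), 2))   # 0.81
# LR- = 0.1 on the same pre-test probability nearly rules the diagnosis out.
print(round(post_test_probability(0.30, 0.1), 2))  # 0.04
```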