Biostatistics for Healthcare Epidemiology and Infection Control


Elizabeth A. Tolley



Investigators face challenges during all phases of planning and implementing research protocols. Clinical and experimental researchers possess the necessary expertise for the medical and scientific aspects of their investigations, and most have some knowledge of elementary statistical methods. Some researchers find elementary statistics adequate for their purposes and need only an occasional consultation with a biostatistician. However, recent trends in clinical research, especially in healthcare epidemiology and infection control, indicate increasing complexity that demands a higher level of statistical expertise. These trends are likely to continue for the foreseeable future, a situation that may leave a researcher feeling overwhelmed by all of the tasks to be handled in addition to mastery of subject matter. This chapter discusses the challenges and dilemmas related to statistical issues that the researcher faces during the various phases of planning and implementing a research protocol.

Statistics is the science of collecting, analyzing, interpreting, and presenting data. Descriptive statistical methods involve data reduction and summarizing many observations in a few representative numbers. Biostatistics is the application of statistical methods to biologic, biomedical, or health science problems. Data are numeric observations or measurements that result from a random phenomenon or process (1,2). A random process cannot be controlled, and the data collected can never be reproduced exactly. Data from a random process always contain some natural variation. To identify reasons for observed differences among groups of observations, the researcher must sort out the special causes that lead to systematic variation and separate these from the natural variation that is always present. Consequently, decisions will be uncertain. Before making a decision, the researcher uses statistical inference to objectively evaluate data and quantify the level of uncertainty. In addition, the researcher uses statistical models to represent data in terms of special causes and natural variation; these models aid the researcher in making inferences and decisions based on the data.

The numeric observations are in the form of variables, also called random variables. Certain statistical techniques apply to each type of random variable (1,2,3,4,5,6,7,8,9). Measurement variables may be continuous, if the number of possible values is very large, or discrete, if only a few values (generally <10) are possible. Some measurement variables are actually computed variables, for example, Acute Physiology and Chronic Health Evaluation III (APACHE III) scores. A ranked variable is a measurement variable, the values of which have been placed in ascending or descending order and replaced by their ranks. Attributes must be translated into numbers (e.g., frequencies of occurrence or numbers of infected patients). Attributes are sometimes called categorical variables. If an attribute can be only present or absent, the term dichotomous variable is frequently used.

In today’s clinical studies, even the most focused research protocol can yield enormous amounts of information. The typical clinical setting contains a multitude of measuring devices that can provide exquisitely detailed measurements. Many measurements are collected because of availability rather than need. As a consequence, when a study is concluded, an investigator can be faced with the task of sorting through a huge amount of data. Certain measurements or variables are relevant to and necessary for carrying out the specific objectives of a study. An investigator determines what type of data to collect based primarily on specialized knowledge.

Two concepts have especially important implications for investigators. Accuracy is the closeness of the measure to the true value; lack of accuracy has to do with bias (1,2,3,9,10). Before recommending a study or grant for approval and/or funding, most reviewers insist that an investigator show how the results will be unbiased. Thus, the investigator’s responsibility includes demonstrating the experimental validity of the study. Precision is the closeness of repeated measurements to each other (2,3,9). Importantly, precision has no bearing on closeness to the true value. In fact, precision without accuracy can be a problem when an investigator is trying to make statistical inferences.

Most clinical studies involve samples that are chosen from a population, instead of the entire population (2,3,4,8,11,12,13). The term population refers to the reference or study population. A random sample is a group chosen from a population such that each member of the population has a nonzero probability of being chosen, independent of any other member being chosen. A simple random sample is the same as a random sample, except that each member of the population has the same nonzero probability of being chosen. Parameters of the reference population are usually unknown and unknowable. The investigator uses statistics from samples to estimate the parameters of the reference population. Because the sample is smaller than the population, information obtained from the sample is partial, and the investigator uses this information to infer something about the population. Most statistics used in healthcare epidemiology and infection control require the investigator to make the assumptions that (a) the reference population is infinitely large and well defined and (b) the sample behaves like a simple random sample. In practice, the population may not be well defined or infinite. Likewise, the sample may not be random; for clinical studies, samples are often composed of those patients who have been admitted to a particular hospital over a specified period because of certain underlying diagnoses and who have undergone various medical and surgical procedures.


DESCRIPTIVE STATISTICS

In published reports, healthcare epidemiologists summarize patient characteristics with descriptive statistics (1,2,3,4,5,6,7,8,9,11,12,13). Typically, a list of patient characteristics includes measures of central tendency and dispersion for continuous variables.

During the research process, the clinical investigator may start exploratory data analysis by obtaining descriptive statistics of important variables. These descriptive statistics have a variety of other practical uses. For example, a potentially important determinant of disease, such as age, may vary only slightly for those patients included in the study; consequently, the clinical investigator may decide not to consider this variable as a potential risk factor in this study. In addition, the researcher may note which variables have highly skewed distributions and, thus, might yield spurious results during data analysis. Finally, unusually high or low values can be identified and verified, if necessary. The following sections describe descriptive statistics for continuous variables.


Measures of Location or Central Tendency

Location refers to where on an axis a particular group of data is located relative to a norm or another group. Measures of central tendency or central location are used to obtain a number that represents the middle of a group of data.

Mean Mean usually refers to the arithmetic mean or average. The mean is probably the most commonly used measure of location. However, the investigator should be aware that the mean is sensitive to extreme values—both very high and very low values. Other means exist but are used less frequently; the geometric mean is an example. An investigator computes a geometric mean by first taking the logarithm of a group of numbers, computing the mean of the transformed values, and then obtaining the antilog of that mean. Blood pH values are logarithms; however, in practice, after calculating the mean of pH values, no one takes the antilog to obtain the mean hydrogen ion concentration. The Greek letter µ is used to represent the population mean. The sample mean X̄ is an unbiased estimator of µ regardless of the shape of the distribution. If the underlying distribution is normal, then the sample mean is the unbiased estimator with the smallest variance.
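As an illustration (not part of the original text), here is a minimal Python sketch of the log-then-antilog computation just described; the bacterial-count values are invented:

```python
import math

def geometric_mean(values):
    """Geometric mean: mean of the logs, then the antilog."""
    log_values = [math.log10(v) for v in values]   # step 1: take logarithms
    mean_log = sum(log_values) / len(log_values)   # step 2: average the logs
    return 10 ** mean_log                          # step 3: take the antilog

# Hypothetical bacterial counts spanning several orders of magnitude
counts = [10, 100, 1000]
print(geometric_mean(counts))  # 100.0, whereas the arithmetic mean is 370
```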

Median The median is the 50% point or 50th percentile and, as such, is insensitive to extreme values. If an odd number of observations is ranked from smallest to largest, the median is the middle observation. If an even number of observations is similarly ranked, the median is the average of the (n/2)th and (n/2 + 1)th observations, where n is the sample size. For example, if the sample size is 20, after ranking, the median is the average of the 10th and 11th observations. For symmetric distributions, the mean and the median coincide. There is no standard symbol for the median of a population or a sample; however, M can be used to denote either the population parameter or the sample statistic (4).

Mode The mode, or the value with the highest frequency, is a measure of concentration. Distributions may have more than one mode. Distributions with two modes are called bimodal. Trimodal refers to distributions with three modes. For symmetric distributions, the mean, median, and mode have the same value. No standard symbol exists for the mode of a population or a sample.


Measures of Dispersion or Spread

Range The range is the distance between the highest (largest) and the lowest (smallest) value. In healthcare epidemiology, investigators often refer to the interquartile range, which is the distance between the 25th and 75th percentiles. Researchers should report ranges with medians; in this way, information on both location and dispersion can be conveyed to others. For a sample, the range is symbolized by R.

Variance The variance is a measure of dispersion that is often used in calculations. Another name for the variance is the mean square. For populations, the variance is called sigma squared and symbolized σ²; for samples, the variance is represented by s². Because of the availability of inexpensive calculators and spreadsheets with statistical functions, only definitional formulas for the variance of a population and a sample are given, where n is the sample size from a population with N members, and N is much greater than n. For the population, the variance is computed as

σ² = Σ(Xi − µ)²/N

where Xi is the value of the random variable X, measured on each member of the population; i is a unique identifier of each member of the population; µ is the population mean for the variable X; Σ signifies summing the squared deviations of the individual values from the mean over all members; and N is the number of members in the population. For the sample, the variance is computed as

s² = Σ(Xi − X̄)²/(n − 1)

where Xi is the value of the random variable X, measured on each observation in the sample; i is a unique identifier of each observation in the sample; X̄ is the sample mean for the variable X; Σ signifies summing the squared deviations over all observations; and n is the number of observations in the sample.
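A short Python sketch (added for illustration, with invented ages) makes the two definitional formulas concrete; the divisors N and n − 1 are the point of the comparison:

```python
def population_variance(values):
    """Definitional formula: squared deviations from mu, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Definitional formula: squared deviations from the sample mean,
    divided by n - 1 (the unbiased estimator)."""
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)

ages = [34, 41, 46, 52, 67]
print(population_variance(ages))  # treats the five ages as the whole population
print(sample_variance(ages))      # treats them as a sample from a larger population
```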

Standard Deviation The standard deviation is the square root of the variance and is sometimes called the root mean square. The standard deviation is a measure of the average distance of observations from the mean. If the standard deviation is small, the observations are crowded near the mean; if the standard deviation is large, there is substantial spread in the data. For populations, the standard deviation is symbolized with the Greek letter σ; for samples, the standard deviation is represented by s. Each standard deviation corresponds to a particular mean and should be reported with it. Occasionally, an investigator must approximate the standard deviation of a future sample. The expected range (i.e., the largest value that one expects to record from a future sample minus the smallest value) divided by 4 provides an approximation when no other information is available.
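As a quick illustration of the range/4 rule of thumb (the expected scores below are hypothetical planning values):

```python
# Suppose the investigator expects future APACHE III scores between 20 and 120.
expected_max, expected_min = 120, 20
approx_sd = (expected_max - expected_min) / 4  # range/4 rule of thumb
print(approx_sd)  # 25.0, a planning value when no other information is available
```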


Other Descriptive Measures

Measures of Skewness Measures of skewness and kurtosis may be computed to evaluate how a distribution deviates from a normal distribution. Most clinical investigators do not routinely need these measures. In practice, the investigator may plot the distribution of the data to evaluate the presence of outliers, those observations with values much larger or smaller than the rest of the sample. A distribution that has a few to a moderate number of high values and a mean that is greater than the median is generally referred to as right or positively skewed. Conversely, a distribution that has a few to a moderate number of low values and a mean that is smaller than the median is generally referred to as left or negatively skewed. In summary, the direction in which the tail of the distribution points characterizes the direction of skew.

Kurtosis Kurtosis refers to how flat or peaked the distribution is relative to the normal distribution. If a distribution is flatter than the normal distribution, it is called platykurtotic. On the other hand, if a distribution is more peaked than the normal distribution, it is called leptokurtotic. For kurtotic distributions, the mean and the median coincide, but the standard deviation is either larger or smaller, respectively, than it would have been if the observations were sampled from a normal distribution.

Coefficient of Variation The coefficient of variation allows the researcher to compare two or more standard deviations, because the standard deviation has been standardized by the mean. The population coefficient of variation is (σ/µ)100%, and the sample coefficient of variation is (s/X̄)100%. For most biologic data, the standard deviation increases as the mean increases. Therefore, the coefficient of variation of a particular variable tends to be rather stable over a wide range of values. For experimental studies, the coefficient of variation is an indicator of the reproducibility of the observations. The clinical investigator may use the coefficient of variation to compare variables that may be potential confounders or effect modifiers. For one group of subjects, the spread of different variables may be compared using the coefficient of variation. For two or more groups of subjects, the coefficient of variation may be used to compare the groups with respect to the spread of a particular variable.
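A brief Python sketch (invented measurements) shows why the coefficient of variation permits comparison across variables recorded in different units:

```python
import statistics

def coefficient_of_variation(values):
    """Sample CV: (s / xbar) * 100%."""
    return statistics.stdev(values) / statistics.mean(values) * 100

heart_rate = [72, 85, 90, 78, 88]        # beats per minute
creatinine = [0.8, 1.1, 0.9, 1.4, 1.0]   # mg/dL
print(coefficient_of_variation(heart_rate))  # CVs are unitless, so the two
print(coefficient_of_variation(creatinine))  # spreads can be compared directly
```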


PROBABILITY

Many patient characteristics are dichotomous attributes, which are either present or absent, such as fever. Some characteristics have the form of categorical variables with only a few possible states. For example, the investigator may categorize patients according to the presence of a rapidly fatal disease, an ultimately fatal disease, or a nonfatal disease. In some statistical texts, authors apply the term discrete variable to a characteristic or attribute with two or more states. In published reports, healthcare epidemiologists summarize these types of patient characteristics by indicating the proportion of the total group with each characteristic of interest.

During the research process, the clinical investigator often begins exploratory data analysis by considering the relationships between pairs of categorical variables. The following sections contain important rules and definitions that the clinical investigator must master before undertaking a complex study. Dichotomous variables are emphasized, because many clinically important risk factors are dichotomous variables.



Applications Relevant to Epidemiology

Epidemiologists measure morbidity in terms of prevalence and incidence. Several applications of probability to epidemiology require the investigator to recognize the distinction between these two measures. Prevalence is the proportion of individuals who have the disease out of all individuals in the population (i.e., those who are at risk for the disease); it can be defined as the probability that an individual has the disease, regardless of the time elapsed since diagnosis. In contrast, incidence is the rate at which new cases occur among individuals who were disease free: the number of new cases that have occurred over a specified time divided by the number of individuals who were disease free (i.e., at risk for the disease) at the beginning of the period. Therefore, incidence can be defined as the probability that a disease-free individual will develop the disease over a specified period.
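A small worked sketch (invented surveillance counts) makes the distinction concrete:

```python
# Hypothetical ICU surveillance data
population = 500       # patients in the unit on the survey date
existing_cases = 25    # patients with the infection on that date
prevalence = existing_cases / population  # 0.05

disease_free_at_start = 475  # at risk at the start of the month
new_cases_in_month = 19      # new infections during the month
incidence = new_cases_in_month / disease_free_at_start  # 0.04 per month
print(prevalence, incidence)
```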

Relative Risk or Risk Ratio The relative risk (RR) is the ratio of the incidence of a disease among exposed persons to the incidence of the disease among unexposed persons (1,3,5,8,12,13,14,15,16,17,18,19,20,21,22). Often, epidemiologists use the term risk ratio interchangeably with relative risk. Values for RR are positive and range theoretically from zero to infinity; however, in practice, the denominator probability (i.e., the incidence of disease in the unexposed) determines the upper limit for RR. For example, if the incidence of disease in the unexposed is 0.4, then the upper limit for RR is 2.5. This restriction limits the direct comparability of RRs across locations or studies.

If the probability of disease is equally likely for those exposed and those not exposed, the RR equals 1.0. Whenever the RR equals 1.0, exposure and disease are independent. If the probability of disease is higher for those exposed than for those not exposed, the RR is >1.0 and exposure is a risk factor. If the probability of disease is lower for those exposed than for those not exposed, the RR is <1.0 and exposure is a protective factor. As the RR of disease increases or decreases from 1.0, there is evidence that the two events, exposure and disease, are associated or dependent. Using the information in a tabled display of pneumonia (P) by ventilator use (V), the infection control officer can obtain two conditional probabilities: Pr(P|V) = .25 and Pr(P|V̄) = .167. Thus, the RR is 1.497. In this situation, the officer would conclude that, according to these data, a patient on a ventilator is about 1.5 times as likely to develop pneumonia as a patient who is not on a ventilator.
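A minimal sketch of the RR computation; the cell counts below are hypothetical but chosen to reproduce the conditional probabilities quoted above:

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """RR = incidence among exposed / incidence among unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical counts consistent with Pr(P|V) = .25 and Pr(P|V-bar) ≈ .167
print(relative_risk(25, 100, 20, 120))  # 1.5, matching the RR of about 1.497 quoted above
```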

Odds Ratio When incidence is not known, RR cannot be obtained. However, the RR can be approximated by the odds ratio (OR) (1,5,8,12,13,14,15,16,17,19,20,21,22). If the proportion of diseased persons (i.e., prevalence) is small (i.e., <0.1), then the OR is usually a reasonably good approximation of the RR. Therefore, the investigator is responsible for carefully evaluating the OR as an approximation of the RR. In making this evaluation, the investigator must consider whether the disease is chronic or acute. Approximation of the RR is biased when only prevalent cases are used in the analysis. When the duration of disease is short (because of either rapid fatality or cure), the numbers of incident and prevalent cases are very nearly the same, and very little bias in approximating the RR from prevalent cases is likely. However, when duration is long, bias can be a problem. For example, when serum cholesterol is used to predict death from heart disease, the OR from prevalent cases is lower than the RR from incident cases. This downward bias occurs because the individuals with the highest cholesterol values have a high fatality rate and thereby escape detection as prevalent cases. In addition, the investigator should be aware that for a particular sample, the OR will have a more extreme value than the RR: if the estimates of the OR and RR based on the sample are >1.0, the estimated OR will be larger than the estimated RR; conversely, if both are <1.0, the estimated OR will be smaller than the estimated RR.

Both RRs and ORs are very useful statistics and have many applications for observational and quasi-experimental studies. Although the clinical investigator often makes the same inferences from an OR as from an RR, these statistics are not interchangeable. Therefore, investigators should be very strict in stipulating whether an estimate is an RR or an approximation based on an OR. Furthermore, it is incumbent on the investigator to demonstrate the validity of any implicit assumption that the approximation based on an OR is a good approximation of RR. Failure to do so can have dangerous consequences involving misinterpretation of published reports and erroneous clinical decisions about patient care.

From the first table, the RR may be computed as a ratio with p1/(p1 + p3) in the numerator and p2/(p2 + p4) in the denominator. If the number of patients with pneumonia is small, p1 will contribute very little to the quantity (p1 + p3); likewise, p2 will contribute very little to the quantity (p2 + p4). The OR equals a ratio with p1/p3 in the numerator and p2/p4 in the denominator. Statistically, the OR can always be computed, whether or not the incidence data needed for the RR are available, and as p1 and p2 become smaller, the OR becomes a better approximation of the RR. Like the RR, the OR ranges theoretically from zero to infinity. However, the OR has a property that can make it a more useful statistic than the RR: the OR is independent of the denominator probability (i.e., an OR of 2.0 has the same meaning regardless of the population or sample on which it was based). The OR is the odds of having the disease with the factor present relative to the odds of having the disease with the factor absent. The OR may be calculated from a 2 × 2 table as the ratio of cross-products (multiplying diagonally): OR = (p1p4)/(p2p3).
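A sketch of the cross-product computation; the four cell counts are hypothetical and occupy the same positions as p1 through p4 in the text (the counts can stand in for the probabilities because the sample size cancels):

```python
def odds_ratio(n1, n2, n3, n4):
    """Cross-product ratio: (p1 * p4) / (p2 * p3), computed from cell counts."""
    return (n1 * n4) / (n2 * n3)

# Hypothetical counts: n1 = diseased exposed, n2 = diseased unexposed,
# n3 = disease-free exposed, n4 = disease-free unexposed
print(odds_ratio(30, 20, 70, 80))  # (30 × 80)/(20 × 70) ≈ 1.71
```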

Sensitivity, Specificity, and Predictive Value The healthcare epidemiologist can use joint, conditional, and total probabilities for quantifying commonly used laboratory tests (5,8,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26). The total or marginal probability of disease may be represented as Pr(D); this probability is an estimate of disease state prevalence in a population. Prevalence can be thought of as the underlying probability of disease state in a particular population. Likewise, Pr(D̄) can be thought of as the underlying probability of not having the disease state; it is not necessarily the probability of wellness or health.

In terms of conditional probability, the probability of a positive test result given that a patient has the disease—that is, Pr(T|D)—refers to test sensitivity. Similarly, the probability of a negative test result given that a patient does not have the disease—that is, Pr(T̄|D̄)—refers to test specificity. The sensitivity and specificity of a test are independent of prevalence.


The healthcare epidemiologist can display the various possible combinations of disease states and test results in a 2 × 2 table.

                        Positive Test Result    Negative Test Result    Marginal Probability
Disease present         Pr(D) × Pr(T|D)         Pr(D) × Pr(T̄|D)         Pr(D)
Disease absent          Pr(D̄) × Pr(T|D̄)         Pr(D̄) × Pr(T̄|D̄)         Pr(D̄)
Marginal probability    Pr(T)                   Pr(T̄)                   1.0


In contrast, the predictive values of a positive test result (PV+) and a negative test result (PV−) depend on prevalence. In terms of conditional probability, the probability of a patient having the disease given that the test result is positive—that is, Pr(D|T)—refers to the positive predictive value of the test (PV+). Similarly, the probability of a patient not having the disease given that the test result is negative—that is, Pr(D̄|T̄)—refers to the negative predictive value of the test (PV−).

                        Positive Test Result    Negative Test Result    Marginal Probability
Disease present         Pr(T) × Pr(D|T)         Pr(T̄) × Pr(D|T̄)         Pr(D)
Disease absent          Pr(T) × Pr(D̄|T)         Pr(T̄) × Pr(D̄|T̄)         Pr(D̄)
Marginal probability    Pr(T)                   Pr(T̄)                   1.0


Alternatively, the healthcare epidemiologist may interpret this table in terms of joint probabilities. From this perspective, the epidemiologist considers the probability of an average (or random) patient having a test result that is considered true positive (TP), true negative (TN), false positive (FP), or false negative (FN). Specifically, the probability of a TP test result is a joint probability—that is, Pr(T and D). The other three outcomes may be expressed similarly as joint probabilities. The probability of obtaining a TN result is the joint probability of testing negative and not having the disease. The probability of obtaining an FP result is the probability that a patient selected at random will test positive but not have the disease. Finally, the probability of obtaining an FN result is the probability of a patient selected at random testing negative but having the disease. In practice, these probabilities are often expressed as percentages. These probabilities may be displayed as follows.

                        Test Results
                        Positive                Negative                Total Probability
Disease present         Pr(TP) = Pr(T and D)    Pr(FN) = Pr(T̄ and D)    Pr(D)
Disease absent          Pr(FP) = Pr(T and D̄)    Pr(TN) = Pr(T̄ and D̄)    Pr(D̄)
Total probability       Pr(T)                   Pr(T̄)                   1.0


Prevalence is the sum of the probability of a TP result and the probability of an FN result. Similarly, the probability of testing positive is the sum of the probability of a TP result and the probability of an FP result. The other two marginal probabilities can be obtained in the same way.
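These relationships can be verified with a short Python sketch that recovers the predictive values from sensitivity, specificity, and prevalence (the three input values are hypothetical):

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PV+ and PV- from the joint probabilities in the table above."""
    tp = prevalence * sensitivity                # Pr(T and D)
    fn = prevalence * (1 - sensitivity)          # Pr(T-bar and D)
    fp = (1 - prevalence) * (1 - specificity)    # Pr(T and D-bar)
    tn = (1 - prevalence) * specificity          # Pr(T-bar and D-bar)
    pv_pos = tp / (tp + fp)                      # Pr(D | T)
    pv_neg = tn / (tn + fn)                      # Pr(D-bar | T-bar)
    return pv_pos, pv_neg

# Hypothetical test: 90% sensitive, 95% specific, 5% prevalence
print(predictive_values(0.90, 0.95, 0.05))  # PV+ ≈ 0.49, PV- ≈ 0.994
```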

Bayes’ Theorem In more complex situations, the healthcare epidemiologist encounters more than two possible clinical signs or symptoms (symbolized as Ti, where i indicates the alternative clinical signs and symptoms) and more than two possible disease states (symbolized as Dj, where j indicates the alternative disease states). The 2 × 2 tables can be expanded into i columns and j rows, representing clinical findings and disease states, respectively. Bayes’ theorem or rule allows the healthcare epidemiologist to obtain the conditional probability of a particular disease given a particular clinical finding (1,3,5,8,12,15,16,18,25). Bayes’ theorem states that the conditional probability of D1 given T1 equals the joint probability of T1 and D1 divided by the sum of the joint probabilities of T1 and each Dj:

Pr(D1|T1) = [Pr(T1|D1) × Pr(D1)] / Σj [Pr(T1|Dj) × Pr(Dj)]
where (a) Pr(Dj) represents the known probabilities of disease states in a specified population and the sum of all Pr(Dj) values equals 1.0 and (b) the various Dj values are mutually exclusive (i.e., a patient cannot have more than one disease). When healthcare epidemiologists need to choose the most likely explanation for their clinical findings, they often use Bayes’ rule to assess the conditional probabilities of several disease states in light of their particular clinical findings. In published literature, epidemiologists may use conditional probabilities to discuss the merits of several alternative explanations. Clinicians may use Bayes’ rule to evaluate a number of diagnostic possibilities. They realize that although no test is absolutely accurate, positive test results do tend to increase the probability that a particular disease is present. The conditional probability of disease given certain clinical findings provides a number that quantifies the amount of confidence that can be placed in stating that a particular disease is present. Differential diagnosis, decision theory, and decision making involve applications of Bayes’ rule.
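A sketch of Bayes’ rule over several mutually exclusive disease states; the three diseases and all probabilities below are hypothetical:

```python
def bayes_posterior(priors, likelihoods):
    """Pr(Dj | T1) for each disease state j, given priors Pr(Dj)
    and likelihoods Pr(T1 | Dj); the priors must sum to 1.0."""
    joints = [p * l for p, l in zip(priors, likelihoods)]  # Pr(T1 and Dj)
    total = sum(joints)                                    # Pr(T1)
    return [j / total for j in joints]

priors = [0.70, 0.20, 0.10]        # Pr(D1), Pr(D2), Pr(D3) in this population
likelihoods = [0.10, 0.60, 0.90]   # Pr(T1 | Dj): probability of the finding
print(bayes_posterior(priors, likelihoods))  # ≈ [0.25, 0.43, 0.32]
```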


HYPOTHESIS TESTING

Hypothesis testing does have a place in analysis of data related to healthcare epidemiology and infection control. One-sample tests can be used to determine whether the sample is different from the reference population. Clinical investigators often use two-sample tests during exploratory data analysis to identify potentially important risk factors. The following sections address general definitions and rules for hypothesis testing for one- and two-sample tests for categorical and continuous variables using parametric and nonparametric methods.



Hypothesis Tests for Categorical Data

A random variable is a numeric quantity that has different values, depending on natural variability. A discrete or categorical random variable is a variable for which there exists a discrete set of values, each having a nonzero probability. Many data from biologic and medical investigations have a common underlying structure.

Cumulative incidence and prevalence of a disease are distributed binomially (1,8,12). Variables that follow a binomial frequency distribution are characterized by the following criteria: (a) a sample is taken of n independent trials; (b) each trial may have two possible outcomes (e.g., success/failure, present/absent, alive/dead); and (c) the probabilities of the outcomes are a constant p for success and (1 − p) = q for failure on every trial. Usually, a healthcare epidemiologist is not concerned with the order in which the failures occurred; instead, the epidemiologist is interested in the number of failures and the probability that a number as extreme or more extreme occurred, given that H0 is true.
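A sketch of an exact binomial tail computation for that kind of question (all counts are hypothetical):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical: 40 catheterized patients, historical infection rate 10%,
# 8 infections observed; probability of 8 or more under H0: pi = 0.10
n, p0, observed = 40, 0.10, 8
upper_tail = sum(binomial_pmf(k, n, p0) for k in range(observed, n + 1))
print(upper_tail)  # ≈ 0.04, the exact one-sided tail probability
```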

Generally, an incidence density variable follows a binomial distribution. For variables such as incidence density, the Poisson distribution is often an accurate approximation of the binomial distribution. The Poisson distribution is a discrete frequency distribution of the number of occurrences of rare events (1,8,12). For the Poisson distribution, the theoretical number of trials is infinite and the number of possible events is also very large. Incidence density studies often involve one or more cohorts of disease-free individuals. A failure is defined as the occurrence of the disease of interest in a previously disease-free individual. The probability of k events (i.e., failures) occurring in a period of time T is defined for a Poisson random variable. Thus, the Poisson distribution depends on two parameters: the length of the interval, T, and the underlying λ, which represents the expected number of events per unit of time. Time may also be defined as a combination of time and level of exposure (e.g., pack-years of smoking or patient-days in the ICU). The mean and the variance of a Poisson distribution are the same. For variables that follow a binomial distribution, when n is large and p is small, the mean and variance will be similar; thus, the Poisson may be used as an approximation of the binomial.
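A sketch comparing binomial probabilities with their Poisson approximation when n is large and p is small (the numbers are hypothetical):

```python
from math import comb, exp, factorial

n, p = 2000, 0.001   # e.g., 2,000 patient-days and a rare event
lam = n * p          # Poisson parameter: expected number of events

for k in range(5):
    binom_prob = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson_prob = exp(-lam) * lam**k / factorial(k)
    print(k, round(binom_prob, 5), round(poisson_prob, 5))
# The two columns agree closely, illustrating why the Poisson is a
# convenient approximation for incidence density data.
```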

The following two sections describe statistical methods for one- and two-sample tests on binomial proportions or rates (1,3,4,5,6,7,8,15,18,25,27). Throughout these sections, unless otherwise stated, the significance level is .05; power is 0.80; and all tests are two-sided. In power and sample size formulas, a z-score for the 97.5th percentile is used for a two-sided test with a significance level of .05: z0.975 is 1.96. When power of 0.80 is used to determine sample size, a z-score for the 80th percentile is used: z0.80 is 0.842.

These sections, describing one- and two-sample tests for binomial proportions or rates, are not designed as casual reading material; instead, they provide a concise reference of commonly used statistical methods. The only
formulas included are those for the test statistics. Most clinical investigators use statistical packages for obtaining sample size estimates or power calculations. For appropriate formulas, the reader is referred to various biostatistical textbooks, for example, Rosner (8) or Sokal and Rohlf (2). For a binomial probability, π refers to the population parameter and p refers to the sample statistic, which approximates the parameter. Each section follows the same format, which is outlined in the following.

Step 1. Set up H0 and H1.

The investigator uses the research question to form H0 and H1. Generally, H1 reflects the result that the investigator expects to find (i.e., that there is a special cause that differentiates the study group from the norm). For a one-sample hypothesis test, H0 states that the proportion of events or rate of occurrence (π) in the study group is the same as some specified or norm value, π0. The investigator obtains this value, π0, from some source other than the current study. Typically, the investigator obtains π0 from theoretically derived values or uses nationally or locally compiled values. In the one-sample situation, H1 states that the proportion of events or rate of occurrence (π) in the group being studied differs from the specified value, π0. The investigator estimates π from a sample as p. If the estimated value is sufficiently close to the specified value, π0, the investigator decides in favor of H0 (i.e., that the data are consistent with H0 being true). If the data fail to support H0, the conclusion is that the data are not consistent with H0 being true; therefore, the investigator rejects H0, concluding that the rate or proportion must be some other value (i.e., higher or lower than π0).

For a two-sample hypothesis test, H0 states that the proportion of events or rate of occurrence (π1) from the first group is the same as that (π2) from the second group. For a clinical trial, the groups might reflect those receiving and not receiving the treatment. For an observational study, the groups might reflect those subjects with and without the attribute of interest. Interpretations of failing to reject and rejecting H0 are similar to those described for the one-sample situation.

Step 2. Choose α, power, and the difference between π and π0 (or π1 and π2) that is clinically meaningful. Another term for the difference between π and π0 (or π1 and π2) is effect size. Frequently, investigators overlook this step. For example, the healthcare epidemiologist may not have the opportunity to conduct a formal power analysis before data collection begins. However, whenever the effect size estimated from the sample is clinically meaningful but the results are consistent with H0, the investigator should determine power retrospectively. This analysis allows the investigator to determine how much larger the sample would have to be to reject H0, given the results of the study. Even when statistical significance is achieved, a retrospective power analysis can indicate how cautiously the results should be interpreted.

Step 3. Using an available computer package, determine sample size, n. Sample size is extremely sensitive to the effect size chosen by the investigator.

Step 4. Obtain data.

Step 5. Compute test statistic in terms of parameters under H0. Obtain the p value associated with the test statistic, assuming H0 is correct. The interpretation of the p value is valid only in terms of H0 and H1. By choosing to make a hypothesis test, the investigator restates the research question and must decide between H0 and H1 based on how consistent or inconsistent the data are with H0. The term consistent connotes having sufficient empirical support for the investigator to decide that the unknown true state of nature is likely to be H0 instead of H1. Conversely, the term inconsistent connotes having sufficient empirical support for the investigator to decide that the unknown true state of nature is likely not to be H0 but rather H1. Therefore, the p value is the probability of obtaining a result that is at least as extreme as this result, which the investigator has obtained from these data, given that H0 is true. Stated another way, the investigator rejects H0 when the results from the study could be called unusual if H0 were correct. The consensus among statisticians and scientists is that, if the p value is .05 or smaller, the investigator should reject H0 and decide that H1 is correct. A p value of .05 indicates that this result would occur no more often than 1 in 20 times if H0 were true.

Step 6. Decide whether to reject or fail to reject H0. Compare the p value to α.

One-Sample Tests for a Binomial Proportion or Rate

Normal Approximation Method The normal approximation method based on a z-test was selected because the computation of this test statistic more closely parallels the estimation of confidence limits than any of the other methods. If the normal approximation to the binomial distribution is valid (i.e., npq > 5), a two-sided hypothesis test is conducted as follows:

Step 1. Set up H0 and H1.

H0: π = π0 versus H1: π ≠ π0

Step 2. Choose α, power, and the difference between π and π0 that is clinically meaningful.

Step 3. Using an available computer package, determine sample size, n. Sample size is extremely sensitive to the difference between π and π0 and to how close these are to 0 or 1.0. When no information is available, a pilot study can be conducted to get some idea of differences that can be obtained in a particular clinical situation.

Step 4. Obtain data.

Step 5. Compute test statistic zs in terms of parameters under H0, where zs is a z-score from the standard normal distribution, and obtain the p value as twice the probability associated with zs assuming that H0 is correct. If the significance level is .05, z0.975 is 1.96. With the wide availability of computer-based packages that contain statistical functions, many clinical investigators can obtain the p value.

zs = (p − π0) / √(π0(1 − π0)/n)

where p is the estimate from the sample of the parameter π. One should note that zs² = χs²; the squared z-score obtained from the data (i.e., zs²) equals a chi-square test statistic with 1 degree of freedom obtained from the same data (i.e., χs²). Most computer packages report the chi-square test statistic with 1 degree of freedom (i.e., χs²) along with the associated p value. If the significance level is .05, the critical value of χ² with 1 degree of freedom is 3.84, which equals 1.96². If the normal approximation to the binomial is not valid, p values may be obtained by the exact method.
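To make step 5 concrete, here is a sketch of the normal-approximation test using only the Python standard library (the surveillance counts are hypothetical):

```python
from math import sqrt, erf

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def one_sample_z_test(successes, n, pi0):
    """z statistic and two-sided p value for H0: pi = pi0."""
    p = successes / n
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value

# Hypothetical: 18 infections among 150 patients versus a norm of 8%
z, p_value = one_sample_z_test(18, 150, 0.08)
print(z, p_value)  # z ≈ 1.81, p ≈ .071; npq = 150(.08)(.92) ≈ 11 > 5, so valid
```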

Step 6. Decide whether to reject or fail to reject H0. Compare the p value to α.

One-Sided Hypothesis Tests If the hypothesis test is one-sided (i.e., H1: π > π0), calculate power and estimate sample size by substituting 1 − α for 1 − α/2 in the previous formulas (e.g., z0.95 is 1.645). In addition, the p value is not multiplied by 2. It is always easier to reject a one-sided test than a similar two-sided test, because the effectively larger α increases power by reducing β.

Two-Sample Tests for Binomial Proportions or Rates When the random variable under study is classified into discrete categories, hypothesis testing and methods of inference should reflect the data structure. For the two-sample situation, there are two typical study designs: independent and paired samples. Before formulating the hypothesis, the investigator must determine whether the samples are independent or not. Two samples are independent when the data points in one sample are unrelated to the data points in the second sample. Samples that are not independent are paired. Paired samples may represent two sets of measurements on the same individuals. Alternatively, paired samples may represent measurements on different individuals chosen or matched such that each member of the pair is very similar to the other. Statistical analysis of data from clinical studies is valid only in the context of the study design; inferences are only valid in the context of research questions.

When a healthcare epidemiologist investigates the relationship between two dichotomous variables, the observations are tabulated in 2 × 2 tables according to attributes. For example, suppose the epidemiologist classifies observations according to the following two attributes:

Attribute 1: A, Ā

Attribute 2: B, B̄

The results will be classified into four groups that include all possible combinations of attributes 1 and 2: (A and B), (Ā and B), (A and B̄), and (Ā and B̄). After tabulation, data can be presented in the following format, where a, b, c, and d are the frequencies at which the four groups occur in the sample.

            B         B̄         Total
A           a         b         a + b
Ā           c         d         c + d
Total       a + c     b + d     n


The results of studies with either independent or paired designs may be tabulated according to the frequencies into the same four groups. Thus, this table can be obtained in different ways.

Two-Sample Tests for Independent Samples Both the table and the test statistic are the same regardless of whether the data are obtained from an observational study or a clinical trial. However, the research questions, hypotheses, and statistical tests may be different depending on the type of study. Consequently, the analyses also depend on study design.

Step 1. Set up H0 and H1. In many observational studies, the investigator can only control the total number of subjects; the research question involves whether the two sets of attributes are independent of each other. The statistical test is called a test of independence or association. In observational studies, the concept of independent samples stems from the notion that for a given attribute, such as pneumonia, the patients with pneumonia are unrelated to those without pneumonia. The null and alternative hypotheses may be written as follows:

H0: π = π0 for all four groups versus H1: π ≠ π0 for at least one group,

where the null and alternative hypotheses are stated in terms of joint probabilities, that is, the observed proportion equals the expected proportion. The general approach is discussed in the earlier section on probability. For example, the investigator may record the observed joint probabilities of (a) developing pneumonia and being on the ventilator, (b) not developing pneumonia and being on the ventilator, (c) developing pneumonia and not being on the ventilator, and (d) not developing pneumonia and not being on the ventilator. The expected joint probabilities are those that would have occurred under the assumption of independence. The statistical test for association involves determining the probability of finding the observed joint probabilities if the attributes were independent.
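A sketch of the test of independence for a 2 × 2 table (the pneumonia-by-ventilator counts are hypothetical; this is the familiar chi-square computation, shown without continuity correction):

```python
from math import sqrt, erf

def chi_square_2x2(a, b, c, d):
    """Test of independence, with cells laid out as in the table above:
    a = (A and B), b = (A and B-bar), c = (A-bar and B), d = (A-bar and B-bar)."""
    n = a + b + c + d
    # Expected counts under independence: row total * column total / n
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    observed = [a, b, c, d]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, Pr(chi-square > x) = 2 * (1 - Phi(sqrt(x)))
    p_value = 2 * (1 - 0.5 * (1 + erf(sqrt(chi2) / sqrt(2))))
    return chi2, p_value

# Hypothetical counts: pneumonia (B) by ventilator status (A)
print(chi_square_2x2(25, 75, 20, 100))  # chi2 ≈ 2.33, p ≈ .13
```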

For clinical trials, the general research question for studies with independent samples is whether the proportion of B (and image) is the same for A and image (i.e., the proportion of patients who die is the same for those with the drug [treated] as for those without the drug [control subjects]). Usually, the investigator determines not only the total number of subjects but also the number of subjects in each group. The statistical test is called a test of homogeneity of two proportions. For example, a clinical trial of a drug that may reduce the death rate associated with ventilator-associated pneumonia may be conducted. In this example, the investigator first estimates the observed conditional probabilities of death depending on whether the subject is in the treated or the control group. Next, the investigator estimates the observed marginal probabilities of death and survival using the addition rule. Using these observed marginal probabilities, the investigator then estimates the expected conditional probabilities of death independent of whether the subject is in the treated or the control group. These expected (or theoretical) conditional probabilities are based on the assumption that the death rate is the same in both groups (i.e., that H0 is true). The statistical test involves determining the probability of finding the observed conditional probabilities if the probability of death were the same in both groups. The null and alternative hypotheses may be stated as follows:

H0: πB|A − πB|Ā = 0 versus H1: πB|A − πB|Ā ≠ 0.

Step 2. Choose α, power, and the difference between πB|A and πB|Ā that is clinically meaningful.


Step 3. For clinical trials, using an available computer package, determine the sample size for each group, n1 and n2. Sample size is very sensitive to the difference between πB|A and πB|Ā. This difference, also called the effect size, should be the difference that is biologically or clinically meaningful in the opinion of the researcher. When no information is available, a pilot study can be conducted to get some idea of differences that can be obtained in a particular clinical situation. Although the algebra is not difficult, the formula for determining the sample size is quite complex; the reader is referred to the formula in Sokal and Rohlf (2) or Fleiss et al. (15), which minimizes the chances of underestimating the sample size required to detect the absolute value of the difference |πB|A − πB|Ā| at given levels of significance and power. The formula in Rosner (8) is used in most statistical packages and yields sample size estimates that are generally about 5% smaller than those based on the Sokal and Rohlf or Fleiss formula. Computation of sample size can be tedious. For step 3, the investigator may wish to consult a biostatistician. Computer software is available for making some computations; however, the investigator should review documentation to determine which formulas are used and choose a software package that does not typically underestimate sample size. This precaution is especially important if sample sizes are less than 50 per group.
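For orientation only, here is a sketch of the basic normal-approximation formula for the sample size per group (no continuity correction; the hypothetical effect size is invented). As the text notes, formulas with the correction give somewhat larger, more conservative estimates:

```python
from math import sqrt, ceil

def n_per_group(p1, p2):
    """Normal-approximation sample size per group for comparing two
    independent proportions (two-sided alpha = .05, power = 0.80)."""
    z_alpha, z_beta = 1.96, 0.842  # z(0.975) and z(0.80), as given in the text
    pbar = (p1 + p2) / 2
    qbar = 1 - pbar
    numerator = (z_alpha * sqrt(2 * pbar * qbar)
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Hypothetical effect size: death rates of 30% (control) vs. 15% (treated)
print(n_per_group(0.30, 0.15))  # 121 per group
```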
