Random Distribution of Cases in Time and Space

19.1 Introduction


Random distribution in time and space: Epidemiologists are often called upon to evaluate whether the number of cases observed in a population is greater than expected and, if so, whether the increase is beyond what could be expected by chance. Random fluctuations in the number of cases in a population are to be expected, and statistical analysis provides a way to judge whether an observed excess goes beyond them.


Don’t be fooled by randomness: Figure 19.1 is a computer-generated image of 50 dots randomly distributed on a grid of 50 squares. The average number of dots per square is 1, with the following distribution: 18 of the squares contain no dots, 19 contain 1 dot, 9 contain 2 dots, 3 contain 3 dots, and 1 contains 4 dots. Now imagine that each square on this grid represents a community, and each dot represents a case of some relatively rare disease. (The communities are of equal size and age distribution.) If we were to select the community with the 4 cases, we would be correct in saying that the occurrence was 4 times the expected rate. However, this is merely an artifact of the random distribution of cases. Therefore, in assessing whether an occurrence is greater than expected, we must factor in expected geographic and temporal random fluctuations. The question then becomes: At what point can we declare a distribution of cases to be nonrandom?
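The experiment behind Figure 19.1 is easy to repeat. The sketch below is a minimal illustration, not the program used to draw the figure: it drops 50 dots uniformly at random into 50 squares and tallies how many squares hold 0, 1, 2, ... dots. The function name `simulate_grid` and the seeds are illustrative choices; each seed gives a different pattern, but a square with 3 or 4 dots turns up routinely even though every square has the same expectation of 1.

```python
import random
from collections import Counter


def simulate_grid(n_dots=50, n_squares=50, seed=None):
    """Scatter n_dots at random over n_squares and return a dict mapping
    'dots in a square' -> 'number of squares holding that many dots'."""
    rng = random.Random(seed)
    # Each dot lands in one square chosen uniformly at random.
    hits = Counter(rng.randrange(n_squares) for _ in range(n_dots))
    # Squares that were never chosen contain zero dots (Counter returns 0).
    dots_per_square = [hits[i] for i in range(n_squares)]
    return dict(sorted(Counter(dots_per_square).items()))


for seed in range(3):
    dist = simulate_grid(seed=seed)
    print(f"run {seed}: {dist}  (busiest square holds {max(dist)} dots)")
```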



Figure 19.1 Random distribution of 50 dots on a grid with 50 squares.


19.2 The Poisson distribution


The Poisson distribution is a probability distribution that is well suited for describing the random occurrence of rare events over time. The Poisson formula is


$$\Pr(A = a) = \frac{e^{-\mu}\,\mu^{a}}{a!} \qquad (19.1)$$


Where:


A ≡ the variable number of cases over a given amount of person-time


Pr(A = a) ≡ the probability of observing exactly a cases


e ≡ the universal constant that forms the base of natural logarithms (e ≈ 2.718 28)


μ ≡ the expected number of cases in the population


a! ≡ the mathematical operation “a factorial” = a(a − 1)(a − 2) … (1). For example, 4! = (4)(3)(2)(1) = 24. By definition, 0! = 1.
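Formula 19.1 translates directly into code. The following is a minimal sketch using only Python's standard `math` module; the function name `poisson_pmf` is our own. Summing the probabilities over enough values of a should give 1, which serves as a quick check on the implementation.

```python
import math


def poisson_pmf(a: int, mu: float) -> float:
    """Probability of exactly a cases when mu cases are expected (Formula 19.1)."""
    if not isinstance(a, int) or a < 0:
        raise ValueError("a must be a non-negative integer")
    if mu < 0:
        raise ValueError("mu (the expected number of cases) cannot be negative")
    # e^(-mu) * mu^a / a!   (math.factorial(0) == 1, matching 0! = 1)
    return math.exp(-mu) * mu ** a / math.factorial(a)


# The probabilities of 0, 1, 2, ... cases must add to 1;
# 30 terms are far more than enough when mu = 1.
total = sum(poisson_pmf(a, 1.0) for a in range(30))
print(round(poisson_pmf(0, 1.0), 4), round(total, 6))
```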


Use of the Poisson formula


To illustrate use of the Poisson formula, let us consider the probability of observing no cases (A = 0) in a population in which one case is expected (μ = 1). Accordingly,


$$\Pr(A = 0) = \frac{e^{-1}\,1^{0}}{0!} = e^{-1} = 0.3679$$


The probability of observing one case when one case is expected is


$$\Pr(A = 1) = \frac{e^{-1}\,1^{1}}{1!} = e^{-1} = 0.3679$$


The probability of observing two cases is


$$\Pr(A = 2) = \frac{e^{-1}\,1^{2}}{2!} = \frac{e^{-1}}{2} = 0.1839$$


The probability of observing three cases is


$$\Pr(A = 3) = \frac{e^{-1}\,1^{3}}{3!} = \frac{e^{-1}}{6} = 0.0613$$


The probability of observing four cases is


$$\Pr(A = 4) = \frac{e^{-1}\,1^{4}}{4!} = \frac{e^{-1}}{24} = 0.0153$$


and so on. Figure 19.2 displays a bar chart of this distribution.
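The calculations above, and a crude text version of the bar chart, can be reproduced with a few lines of Python. This is an illustrative sketch: the `#` bars (one per percentage point) are only a stand-in for Figure 19.2, and the final line reports the small remaining probability of 5 or more cases.

```python
import math


def poisson_pmf(a, mu):
    # Formula 19.1: e^(-mu) * mu^a / a!
    return math.exp(-mu) * mu ** a / math.factorial(a)


mu = 1.0
probs = {a: poisson_pmf(a, mu) for a in range(5)}
for a, p in probs.items():
    # One '#' per percentage point gives a rough text bar chart.
    print(f"A = {a}: {p:.4f} {'#' * round(100 * p)}")
# Whatever probability is left over belongs to A >= 5 (the right tail).
print(f"A >= 5: {1 - sum(probs.values()):.4f}")
```

The printed values for A = 0 through 4 agree with the four-decimal results worked out by hand.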



Figure 19.2 Poisson distribution, μ = 1.


Calculating the expected number of cases


Use of the Poisson formula requires knowledge of the expected number of cases in a population. In epidemiologic investigations, this information comes from knowledge of the population size (n) and expected rate (λ0):


$$\mu = n\lambda_0 \qquad (19.2)$$







Illustrative Example 19.1 Poisson distribution (rare cancer)

Suppose a rare form of cancer has an expected rate of 1 per million person-years in a population of a given age distribution. In a city of, say, 100 000 people (with the given age distribution), we expect to see μ = (100 000)(1 × 10⁻⁶ per year) = 0.1 cases per year. The Poisson distribution for μ = 0.1 is calculated as follows:


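$$\Pr(A = 0) = \frac{e^{-0.1}(0.1)^{0}}{0!} = 0.9048$$
$$\Pr(A = 1) = \frac{e^{-0.1}(0.1)^{1}}{1!} = 0.0905$$
$$\Pr(A = 2) = \frac{e^{-0.1}(0.1)^{2}}{2!} = 0.0045$$
$$\Pr(A = 3) = \frac{e^{-0.1}(0.1)^{3}}{3!} = 0.0002$$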


(and so on).


Figure 19.3 displays these probabilities in the form of a bar chart. From this distribution we can say that the city will experience zero cases in most years (90.5% of the time), one case 9.0% of the time, and two or more cases about 0.5% of the time (the right tail of the distribution). Under this model, we would therefore expect to see two or more cases in a single year about once every 200 years, on average.
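The whole of Illustrative Example 19.1 can be chained together: Formula 19.2 gives μ, Formula 19.1 gives the probabilities, and the right tail Pr(A ≥ 2) is 1 minus the probabilities of 0 and 1 case. This is a minimal sketch; the helper `prob_at_least` is our own name, and the "return period" line simply uses the fact that an event with yearly probability p occurs, on average, once every 1/p years.

```python
import math


def poisson_pmf(a, mu):
    # Formula 19.1
    return math.exp(-mu) * mu ** a / math.factorial(a)


def prob_at_least(k, mu):
    """Right-tail probability Pr(A >= k) = 1 - sum of Pr(A = a) for a < k."""
    return 1.0 - sum(poisson_pmf(a, mu) for a in range(k))


n = 100_000       # city population, giving 100 000 person-years per calendar year
rate = 1e-6       # expected rate: 1 case per million person-years
mu = n * rate     # Formula 19.2: about 0.1 expected cases per year

p0 = poisson_pmf(0, mu)
p1 = poisson_pmf(1, mu)
tail = prob_at_least(2, mu)
print(f"mu = {mu:.1f}")
print(f"Pr(A = 0) = {p0:.4f}, Pr(A = 1) = {p1:.4f}, Pr(A >= 2) = {tail:.4f}")
# An event with yearly probability p recurs on average every 1/p years.
print(f"Two or more cases about once every {1 / tail:.0f} years")
```

The last line prints roughly 214 years (1/0.0047), which the text rounds to "about once every 200 years."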







Figure 19.3 Poisson distribution for Illustrative Example 19.1, μ = 0.1.


Epidemiologic computing: WinPEPI (Abramson, 2011) will compare an observed number of cases with an expected number of cases under a Poisson model. Use WinPEPI → Describe → H. Compare SMR or indirectly standardized rate, and look for the one-sided Fisher’s exact results to replicate the results in Illustrative Example 19.1.
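For readers without WinPEPI, the core of such a comparison can be sketched in Python as an exact one-sided Poisson test: the P value is the probability of seeing at least the observed number of cases when μ is the expected number. We assume, but have not confirmed, that this corresponds to the one-sided exact result WinPEPI reports; the function name `poisson_upper_tail` is hypothetical. With 2 cases observed and 0.1 expected, it returns the 0.0047 right-tail probability from Illustrative Example 19.1.

```python
import math


def poisson_upper_tail(observed, expected):
    """One-sided exact P value: Pr(A >= observed) when A ~ Poisson(expected).

    Computed as 1 minus the lower terms, which is adequate for the small
    counts typical of cluster questions.
    """
    below = sum(math.exp(-expected) * expected ** a / math.factorial(a)
                for a in range(observed))
    return 1.0 - below


# Scenario from Illustrative Example 19.1: 2 cases observed, 0.1 expected.
observed, expected = 2, 0.1
smr = observed / expected  # ratio of observed to expected cases
print(f"observed/expected = {smr:.0f}, one-sided P = {poisson_upper_tail(observed, expected):.4f}")
```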


Post hoc identification of clusters


The term clustering in epidemiology is usually reserved for describing unusually high accumulations of rare diseases in a circumscribed time and space. By understanding the Poisson distribution as a description of random occurrences in time and space, we can appreciate that a certain amount of clustering is to be expected—somewhere there will be a clustering of cases.


In 1989, state health departments in the United States received approximately 1500 requests to investigate cancer clusters (Greenberg and Wartenberg, 1991). Many of these requests for investigations turned out to be normal occurrences or artifacts of inflated reporting. Those that did represent true increases in occurrence were often difficult to evaluate. Therefore, in 1989, the Centers for Disease Control and Prevention convened a national conference to discuss the study of cancer clusters. The conference clarified the following difficulties surrounding such investigations (Rothman, 1990):



  • Perceived clusters often include different types of cancers, thus reducing the likelihood that they resulted from a common exposure.
  • Many reported clusters include too few cases to reach reliable statistical conclusions.
  • Regional boundaries of clusters are rarely demarcated, making it difficult to determine the size of the population at risk that gave rise to the cases in question.
  • Regional boundaries of clusters may have been arbitrarily altered to make the cluster seem more substantial or inclusive.
  • Conclusions about the perceived clusters may not be reliable because of differences in the sensitivity of statistical mapping techniques used for their detection.
  • Causal exposures are often unspecified and, when specified, are often insufficiently intense to explain the perceived cluster.
  • Chance can never really be ruled out as an “after the fact” explanation for a cluster—even when the statistical chances of an observation are small, rare events are inevitable if enough possibilities are considered.
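The last point can be made concrete with the grid of Figure 19.1. The sketch below compares the chance that one community chosen in advance has 4 or more cases (μ = 1) with the chance that at least one of 50 communities does. It treats the communities as independent Poisson counts, which is an approximation (in the figure the total is fixed at 50 dots), and the function names are our own.

```python
import math


def prob_at_least(k, mu):
    """Pr(A >= k) for A ~ Poisson(mu)."""
    return 1.0 - sum(math.exp(-mu) * mu ** a / math.factorial(a) for a in range(k))


def prob_any_cluster(k, mu, n_areas):
    """Chance that at least one of n_areas independent areas, each expecting
    mu cases, shows k or more cases by chance alone."""
    return 1.0 - (1.0 - prob_at_least(k, mu)) ** n_areas


single = prob_at_least(4, 1.0)
anywhere = prob_any_cluster(4, 1.0, 50)
print(f"One community named in advance with >= 4 cases: {single:.3f}")
print(f"At least one such community among 50:           {anywhere:.2f}")
```

A "cluster" of 4 cases that would be unusual in a single pre-specified community (about a 2% chance) shows up somewhere on the grid more often than not (about 62%).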

Despite difficulties encountered in studying clusters, most public health agencies agree that it is good public relations to respond to community concerns about cancer clusters. If a true increase in cancer frequency does exist, citizens can take appropriate action. If a true increase in cancer frequency does not exist, the worries of citizens can be alleviated. Cluster investigations also provide the opportunity for public health agencies to demonstrate their responsiveness to public concerns and to educate the public (Bender et al., 1990). Therefore, many states have adopted standardized protocols for investigating perceived clusters that are reported by citizens. Typically, this entails talking with the person who reported the cluster, verifying diagnoses of cases, reviewing important exposure information, and determining the extent to which the increased occurrence is “statistically unusual.” At each step of the investigation, findings are reported to the public and the need for more extensive and costly research is evaluated.


19.3 Goodness of fit of the Poisson distribution


Fitting the Poisson distribution


Because a single cluster is so difficult to interpret, a more meaningful way to determine whether a distribution of cases is nonrandom is to collect data over multiple years and/or locations and then compare the observed distribution of case occurrences with what is expected under a Poisson random model. When the Poisson model fits the observed distribution, the hypothesis of randomness is corroborated. When the Poisson distribution does not fit the observed frequency distribution, the hypothesis of nonrandom occurrence is supported.







Illustrative Example 19.2 Goodness of fit to Poisson distribution (“horse kicks”)

Data: Table 19.1 lists the number of fatal horse kicks in Prussian army units for the 20 years between 1875 and 1894. The unit of observation in this analysis is the “army corps-year.” There are 200 such units of observation (n = 200), that is, 10 corps each observed for 20 years.


Poisson model: Because the value of the Poisson parameter μ is unknown, we estimate it with the sample mean (ā):


$$\bar{a} = \frac{\sum_a a\,f_a}{\sum_a f_a} \qquad (19.3)$$


where f_a represents the frequency of observing a cases and Σ f_a = n. For the “horse kick” data,


$$\bar{a} = \frac{\sum_a a\,f_a}{n} = \frac{122}{200} = 0.610$$


We thus assume μ = 0.610, with Poisson probabilities calculated as:


$$\Pr(A = 0) = \frac{e^{-0.610}(0.610)^{0}}{0!} = 0.5434$$


$$\Pr(A = 1) = \frac{e^{-0.610}(0.610)^{1}}{1!} = 0.3314$$


and so on. Table 19.2 lists Poisson probabilities for the random model.
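The estimation step can be sketched in Python. Table 19.1 is not reproduced in this section, so the frequencies below are an assumption: they are the classic von Bortkiewicz horse-kick counts (109, 65, 22, 3, and 1 corps-years with 0, 1, 2, 3, and 4 deaths), which reproduce the n = 200 and mean of 0.610 given in the text. The code applies Formula 19.3 and then Formula 19.1.

```python
import math

# ASSUMED frequencies (classic von Bortkiewicz data), standing in for Table 19.1:
# number of deaths per corps-year -> number of corps-years
freq = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

n = sum(freq.values())                          # total corps-years
mean = sum(a * f for a, f in freq.items()) / n  # Formula 19.3


def poisson_pmf(a, mu):
    # Formula 19.1
    return math.exp(-mu) * mu ** a / math.factorial(a)


probs = {a: poisson_pmf(a, mean) for a in freq}
print(f"n = {n}, mean = {mean:.3f}")
for a, p in probs.items():
    print(f"Pr(A = {a}) = {p:.4f}")
```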


Expected frequencies: The next task is to determine the frequency distribution predicted by the Poisson model. The expected frequency of observing a cases in a given time period is


$$\hat{f}_a = n \cdot \Pr(A = a) \qquad (19.4)$$


where Pr(A = a) represents the probability of observing a cases under the Poisson model and n represents the total number of observations (n = Σ f_a). For example, the expected frequency of 0 fatalities in the “horse-kick” illustrative example is f̂_0 = [0.5434][200] = 108.67. Table 19.3 lists the expected frequencies in column 3, and Figure 19.4 plots the observed and expected frequencies side by side. On inspection, the Poisson model fits the data well.
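Formula 19.4 is a one-line multiplication per class. In the sketch below, the observed counts are again the assumed von Bortkiewicz frequencies, and we add a final "5+" class holding whatever expected frequency is left, so that the expected frequencies sum to n just as the observed ones do. The class layout is our own choice for illustration.

```python
import math

freq = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}  # ASSUMED Table 19.1 counts
n = sum(freq.values())
mu = sum(a * f for a, f in freq.items()) / n  # 0.610


def poisson_pmf(a, mu):
    # Formula 19.1
    return math.exp(-mu) * mu ** a / math.factorial(a)


# Formula 19.4: expected frequency = n * Pr(A = a)
expected = {a: n * poisson_pmf(a, mu) for a in freq}
# Lump the remaining probability (A >= 5) so expected frequencies total n.
expected["5+"] = n - sum(expected.values())
for a, e in expected.items():
    print(f"{a!s:>3}: observed {freq.get(a, 0):>3}, expected {e:7.2f}")
```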


Goodness of fit test: The fit of the observed to expected frequencies can be tested formally. The null hypothesis H0 is that events are randomly distributed as predicted by the Poisson model. The alternative hypothesis H1 is that events are not randomly distributed. The null hypothesis may be false because cases are either more uniformly distributed than expected or more tightly clustered (Figure 19.5).


Before the data are submitted to the goodness-of-fit test, classes with expected frequencies of less than 1.0 are merged, because the test requires that every expected frequency be at least 1 (Cochran, 1954). In the horse-kick data, we group the categories of three or more fatal horse kicks into a single class to comply with this requirement (Table 19.4).


Log-likelihood goodness-of-fit statistic G: The test can be performed with a standard chi-square statistic or a G log-likelihood ratio statistic. The two tests yield the same conclusion when n is large. However, there is some advantage to the G log-likelihood statistic when the sample is small (Rao and Chakravarti, 1956). The log-likelihood G statistic is


$$G = 2\sum_{a} f_a \ln\!\left(\frac{f_a}{\hat{f}_a}\right) \qquad (19.5)$$


where f_a represents the observed frequency in class a and f̂_a represents the expected frequency of class a. Under the null hypothesis, this statistic has an approximate chi-square distribution with k − 2 degrees of freedom, where k represents the number of classes submitted to the test (one degree of freedom is lost because the expected frequencies must sum to n, and another because μ was estimated from the data). For the data in Table 19.3, there are four classes (0, 1, 2, 3+), so k = 4 and df = 4 − 2 = 2. Table 19.4 shows calculation of the G statistic for Illustrative Example 19.2. In this instance, G = 0.33 with 2 degrees of freedom. The p value is 0.85. Thus, we conclude that the Poisson distribution is a reasonable fit and that the occurrence of fatal horse kicks over time is consistent with a random process.
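The G test can be sketched with the standard library alone. Python's `math` module has no chi-square distribution function, so this sketch uses the exact identity for even degrees of freedom, Pr(χ²_df > x) = Σ_{i < df/2} e^(−x/2)(x/2)^i / i!, which covers the df = 2 case here (a library such as SciPy would be the usual choice for general df). The grouped observed counts (109, 65, 22, 4) come from the assumed von Bortkiewicz frequencies; μ = 0.610 is the estimate from the text.

```python
import math


def g_statistic(observed, expected):
    """Formula 19.5: G = 2 * sum f_a * ln(f_a / expected_a).
    Classes with an observed count of 0 contribute nothing (f ln f -> 0)."""
    return 2.0 * sum(f * math.log(f / e) for f, e in zip(observed, expected) if f > 0)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; valid only for positive even df."""
    if df <= 0 or df % 2:
        raise ValueError("this shortcut needs a positive even df")
    h = x / 2.0
    return sum(math.exp(-h) * h ** i / math.factorial(i) for i in range(df // 2))


observed = [109, 65, 22, 4]  # ASSUMED grouped counts for classes 0, 1, 2, 3+
n = sum(observed)
mu = 0.610                   # sample mean estimated in the text
expected = [n * math.exp(-mu) * mu ** a / math.factorial(a) for a in range(3)]
expected.append(n - sum(expected))  # the 3+ class takes the remainder

G = g_statistic(observed, expected)
df = len(observed) - 2  # k classes, minus 1 for the total, minus 1 for estimating mu
p = chi2_sf_even_df(G, df)
print(f"G = {G:.2f}, df = {df}, P = {p:.2f}")
```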


Epidemiologic computing: WinPEPI will perform these operations with WinPEPI → Describe → C. Appraise a frequency table → 3. Values that a Poisson distribution would produce.






Oct 31, 2017 | Posted by in PUBLIC HEALTH AND EPIDEMIOLOGY | Comments Off on Random Distribution of Cases in Time and Space
