ATTRIBUTES OF THE NORMAL DISTRIBUTION

There is a mathematical equation for the normal distribution that describes the relationship between the points on the abscissa and the ordinate. A collection of values with a normal distribution will have a mean μ and standard deviation σ. These descriptors determine the peak of the curve and its spread. There are many different bell curves for different values of μ and σ. For a particular curve, however, the height of the curve y at any given point depends on the value of x. Box 9-1 shows the formula for normal distributions, but it is not necessary to memorize it.

BOX 9-1

The formula for a normal distribution is:

$$ y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

Every value on the right side of the equation except x is known: μ and σ are determined from the data, and π is a constant. So, for any value of x along the abscissa, we can solve for y, which is the frequency at which that value occurs.
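As a minimal sketch, the standard normal density formula, y = 1/(σ√(2π)) · e^(−(x − μ)²/(2σ²)), can be evaluated directly in Python. The function name `normal_density` and the values μ = 100 and σ = 15 are illustrative assumptions, not taken from the text.

```python
import math


def normal_density(x, mu, sigma):
    """Height y of the normal curve at x, for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Hypothetical parameters chosen only for illustration.
mu, sigma = 100.0, 15.0
for x in (70, 85, 100, 115, 130):
    print(x, round(normal_density(x, mu, sigma), 5))
```

The printed heights are largest at x = μ and equal at points the same distance on either side of μ, which is the symmetry the rest of the section relies on.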

In a normal distribution, the mode is the peak of the hill: the value that occurs most frequently. It is also the mean μ, the average of all the values in the distribution. And because the data points are distributed symmetrically on either side, the same point represents the median, which is the 50th percentile. When μ changes but σ remains the same, the curve shifts along the abscissa but its shape remains the same, as in Figure 9-2.

FIGURE 9-2 The effect of different values of μ with the same value of σ.
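The shift can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the two means (0 and 3) and the shared σ of 1 are hypothetical values chosen for illustration. It compares curve heights at the same offsets from each curve's own mean.

```python
from statistics import NormalDist

# Hypothetical curves: same sigma, different means.
sigma = 1.0
curve_a = NormalDist(mu=0.0, sigma=sigma)
curve_b = NormalDist(mu=3.0, sigma=sigma)

# Heights at the same distances from each curve's own mean.
offsets = [-2.0, -1.0, 0.0, 1.0, 2.0]
heights_a = [curve_a.pdf(curve_a.mean + d) for d in offsets]
heights_b = [curve_b.pdf(curve_b.mean + d) for d in offsets]

for d, ya, yb in zip(offsets, heights_a, heights_b):
    print(f"offset {d:+.1f}: {ya:.4f} vs {yb:.4f}")

# Mode, mean, and median coincide for a normal distribution.
print(curve_b.mode, curve_b.mean, curve_b.median)
```

The paired heights match, so the second curve is the first one moved three units to the right.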

Figure 9-3 shows what happens to the overall shape of the curve for different values of σ. When the standard deviation is smaller, the average distance from the mean is less, so more data points will have a value closer to the mean. This results in a narrower curve with a higher peak. The mean stays the same, but the curve is drawn upward.

FIGURE 9-3 In a normal distribution the mode = mean = median = 50th percentile = μ. If the standard deviation σ is small, the curve is narrow and tall. As σ gets larger, the curve gets shallower and wider.
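To see the trade-off between width and height in numbers, the sketch below computes the peak height (the curve's value at x = μ) for three hypothetical values of σ. The values 0.5, 1, and 2 are assumptions chosen for illustration.

```python
from statistics import NormalDist

mu = 0.0
sigmas = [0.5, 1.0, 2.0]  # hypothetical standard deviations

# The peak of each curve sits at x = mu.
peaks = [NormalDist(mu, s).pdf(mu) for s in sigmas]

for s, peak in zip(sigmas, peaks):
    print(f"sigma = {s}: peak height = {peak:.4f}")
```

Halving σ doubles the peak height. The total area under every normal curve is 1, so a narrower curve has to be taller.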

We see that the normal distribution is actually a family of bell-shaped curves, one for each pair of values of μ and σ. We will ultimately use the curve as a frequency distribution to plot outcomes on the abscissa, so we can see the probability of their occurrence on the ordinate. It is to our advantage to somehow standardize all of these curves so we only need to refer to one. We can do this by performing some simple mathematical alterations to each data point. This is possible because the members of the family of normal distributions are related by the way the data points are arranged under portions of the curve.
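The passage does not yet name the alteration it has in mind. One common choice, assumed here, is the z-score: subtract the mean from each data point and divide by the standard deviation. The sketch below applies it to a hypothetical list of scores; the function name `standardize` and the data are illustrative only.

```python
import statistics


def standardize(values):
    """Convert each value to a z-score: (value - mean) / standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# Hypothetical data, for illustration only.
scores = [62, 68, 70, 71, 75, 80]
z_scores = standardize(scores)

print([round(z, 2) for z in z_scores])
print(round(statistics.fmean(z_scores), 6), round(statistics.pstdev(z_scores), 6))
```

After the transformation the values have mean 0 and standard deviation 1, whatever μ and σ were originally, so a single reference curve can serve for every data set.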

One of the distinctions of the normal distribution is related to its symmetry. One half of the data points lie to the left of μ and the remaining half lie to the right, no matter how spread out they are. When we plot all the data points, we also find that about 68% of all of the observations fall within one standard deviation (σ) of the mean. When we go out two standard deviations on either side of the mean (more precisely 1.96 standard deviations, but often approximated as 2), about 95% of all observations will be encompassed in this area. The same distance (or deviation) from the mean contains roughly the same number of data points on either side. In fact, one method to check for normality of data is to compute the percentile points in a collection of values for a variable and see if roughly 68% and 95% of the values fall within one and two standard deviations, respectively.
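The informal normality check described above can be sketched as follows. The helper `fraction_within` is a hypothetical name, and the sample is simulated (10,000 draws with an assumed mean of 50 and standard deviation of 10) rather than real data.

```python
import random
import statistics


def fraction_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mu) <= k * sigma)
    return inside / len(values)


# Simulated, normally distributed sample (assumed parameters; fixed seed).
rng = random.Random(42)
sample = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

within_one = fraction_within(sample, 1.0)
within_two = fraction_within(sample, 2.0)

print(f"within 1 SD: {within_one:.1%}")
print(f"within 2 SD: {within_two:.1%}")
```

For data that are roughly normal, the two fractions should land near 68% and 95%. Fractions far from those values suggest the data are skewed or otherwise non-normal, although this check is a rough screen and not a formal test.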

Figure 9-4 illustrates this principle. The population mean μ is also the median, and marks the 50th percentile. One half of the data points lie on either side of μ. When going out one standard deviation (σ) on either side, the markers are at the 16th and 84th percentiles. This means 68% of the data points lie between these two markers (84 − 16 = 68). Likewise, the markers for 95% of the data points are the 2.5th and 97.5th percentiles (97.5 − 2.5 = 95). When the points in the two tails of the graph outside these markers are excluded, 95% of the original data points remain between them.
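The percentile markers can be recovered from the cumulative distribution of a normal curve. The sketch below uses `NormalDist` from Python's standard `statistics` module with mean 0 and standard deviation 1. That choice is an assumption made for convenience, since the percentiles are the same for any μ and σ.

```python
from statistics import NormalDist

z = NormalDist()  # mean 0, standard deviation 1

# Percentile reached one standard deviation below and above the mean.
pct_minus_one = 100 * z.cdf(-1.0)  # roughly the 16th percentile
pct_plus_one = 100 * z.cdf(1.0)    # roughly the 84th percentile

# Distance from the mean, in standard deviations, of the 2.5th and 97.5th percentiles.
lower_cut = z.inv_cdf(0.025)  # about -1.96
upper_cut = z.inv_cdf(0.975)  # about +1.96

print(f"-1 SD: {pct_minus_one:.2f}th percentile, +1 SD: {pct_plus_one:.2f}th percentile")
print(f"middle 95% lies between {lower_cut:.2f} and {upper_cut:.2f} SD")
```

The difference between the first two percentiles is about 68, and the cut points for the middle 95% sit about 1.96 standard deviations from the mean, matching the markers in the figure.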