Summarizing Data
It is very difficult to have any ‘feeling’ for a set of numerical measurements unless we can summarize the data in a meaningful way. A diagram (Chapter 4) is often a useful starting point. We can also condense the information by providing measures that describe the important characteristics of the data. In particular, if we have some perception of what constitutes a representative value, and if we know how widely scattered the observations are around it, then we can formulate an image of the data. The average is a general term for a measure of location; it describes a typical measurement. We devote this chapter to averages, the most common being the mean and median (Table 5.1). We introduce measures that describe the scatter or spread of the observations in Chapter 6.
Type of average | Advantages | Disadvantages |
Mean | | |
Median | | |
Mode | | |
Geometric mean | | |
Weighted mean | | |
The Arithmetic Mean
The arithmetic mean, often simply called the mean, of a set of values is calculated by adding up all the values and dividing this sum by the number of values in the set.
It is useful to be able to summarize this verbal description by an algebraic formula. Using mathematical notation, we write our set of n observations of a variable, x, as x1, x2, x3, …, xn. For example, x might represent an individual’s height (cm), so that x1 represents the height of the first individual, and xi the height of the ith individual, etc. We can write the formula for the arithmetic mean of the observations, written $\bar{x}$ and pronounced ‘x bar’, as
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
Using mathematical notation, we can shorten this to
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where Σ (the Greek uppercase ‘sigma’) means ‘the sum of’, and the sub- and superscripts on the Σ indicate that we sum the values from i = 1 to i = n. This is often further abbreviated to
$$\bar{x} = \frac{\sum x_i}{n}$$
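The calculation of the arithmetic mean can be illustrated with a minimal Python sketch. The function name `arithmetic_mean` and the five heights are hypothetical, chosen only for illustration; they are not data from this chapter.

```python
def arithmetic_mean(values):
    """Add up all the values and divide the sum by the number of values."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty set of values is undefined")
    return sum(values) / n


# Hypothetical heights (cm) of five individuals; illustrative values only.
heights_cm = [162.0, 170.5, 168.0, 175.5, 159.0]

mean_height = arithmetic_mean(heights_cm)
print(mean_height)  # 835.0 / 5 = 167.0
```

In practice, Python's standard library offers `statistics.mean`, which performs the same calculation.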
The Median
If we arrange our data in order of magnitude, starting with the smallest value and ending with the largest value, then the median is the middle value of this ordered set. The median divides the ordered values into two halves, with an equal number of values both above and below it.
It is easy to calculate the median if the number of observations, n, is odd. It is the (n + 1)/2th observation in the ordered set. So, for example, if n = 11, then the median is the (11 + 1)/2 = 12/2 = 6th observation in the ordered set. If n is even then, strictly, there is no median. However, we usually calculate it as the arithmetic mean of the two middle observations in the ordered set [i.e. the n/2th and the (n/2 + 1)th]. So, for example, if n = 20, the median is the arithmetic mean of the 20/2 = 10th and the (20/2 + 1) = (10 + 1) = 11th observations in the ordered set.
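The rule for odd and even n can be sketched in Python as follows. The function name `median` and the example data are hypothetical; note that Python lists are indexed from 0, so the kth observation in the ordered set sits at index k − 1.

```python
def median(values):
    """Return the middle value of the ordered set of values."""
    ordered = sorted(values)  # arrange in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set of values is undefined")
    if n % 2 == 1:
        # Odd n: the (n + 1)/2th observation (index shifted by 1).
        return ordered[(n + 1) // 2 - 1]
    # Even n: arithmetic mean of the n/2th and (n/2 + 1)th observations.
    lower_middle = ordered[n // 2 - 1]
    upper_middle = ordered[n // 2]
    return (lower_middle + upper_middle) / 2


# Illustrative data: the integers 1..11 (n = 11) and 1..20 (n = 20).
odd_example = list(range(11, 0, -1))   # deliberately unsorted input
even_example = list(range(1, 21))

print(median(odd_example))   # 6th ordered observation: 6
print(median(even_example))  # mean of the 10th and 11th: 10.5
```

The standard library function `statistics.median` follows the same convention for even n.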
The median is similar to the mean if the data are symmetrical (Fig. 5.1), less than the mean if the data are skewed to the right (Fig. 5.2), and greater than the mean if the data are skewed to the left (Fig. 4.1d).
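A small Python sketch may help to show how the mean and median compare under symmetry and skewness. The three data sets are made up for illustration and are not those plotted in the figures; the code uses `statistics.mean` and `statistics.median` from the standard library.

```python
import statistics

# Hypothetical data sets, chosen only to illustrate the three situations.
symmetrical = [1, 2, 3, 4, 5]
skewed_right = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]          # long tail of large values
skewed_left = [2, 11, 15, 16, 17, 17, 17, 18, 18, 19]   # long tail of small values

for name, data in [
    ("symmetrical", symmetrical),
    ("skewed to the right", skewed_right),
    ("skewed to the left", skewed_left),
]:
    mean_value = statistics.mean(data)
    median_value = statistics.median(data)
    print(f"{name}: mean = {mean_value}, median = {median_value}")
```

For these values, the mean and median coincide for the symmetrical set, the median (3) is less than the mean (5) for the right-skewed set, and the median (17) is greater than the mean (15) for the left-skewed set.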