Describing data: the ‘average’




Summarizing Data


It is very difficult to have any ‘feeling’ for a set of numerical measurements unless we can summarize the data in a meaningful way. A diagram (Chapter 4) is often a useful starting point. We can also condense the information by providing measures that describe the important characteristics of the data. In particular, if we have some perception of what constitutes a representative value, and if we know how widely scattered the observations are around it, then we can formulate an image of the data. The average is a general term for a measure of location; it describes a typical measurement. We devote this chapter to averages, the most common being the mean and median (Table 5.1). We introduce measures that describe the scatter or spread of the observations in Chapter 6.


Table 5.1 Advantages and Disadvantages of Averages.

Mean
  Advantages
  • Uses all the data values
  • Algebraically defined and so mathematically manageable
  • Known sampling distribution (Chapter 9)
  Disadvantages
  • Distorted by outliers
  • Distorted by skewed data

Median
  Advantages
  • Not distorted by outliers
  • Not distorted by skewed data
  Disadvantages
  • Ignores most of the information
  • Not algebraically defined
  • Complicated sampling distribution

Mode
  Advantages
  • Easily determined for categorical data
  Disadvantages
  • Ignores most of the information
  • Not algebraically defined
  • Unknown sampling distribution

Geometric mean
  Advantages
  • Before back-transformation, it has the same advantages as the mean
  • Appropriate for right-skewed data
  Disadvantages
  • Only appropriate if the log transformation produces a symmetrical distribution

Weighted mean
  Advantages
  • Same advantages as the mean
  • Ascribes relative importance to each observation
  • Algebraically defined
  Disadvantages
  • Weights must be known or estimated
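Table 5.1 lists the geometric mean and the weighted mean, which are not defined in this part of the chapter. The following is a minimal sketch that assumes their standard definitions: the geometric mean as the arithmetic mean of the log-transformed values, back-transformed with the exponential, and the weighted mean as the sum of each value times its weight, divided by the sum of the weights. The function names and the data are hypothetical and chosen only for illustration.

```python
import math


def geometric_mean(values):
    """Back-transform (exp) of the arithmetic mean of the natural logs."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("needs a non-empty set of positive values")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)


def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical right-skewed values; their arithmetic mean is 37,
# while the geometric mean is close to 10.
right_skewed = [1, 10, 100]
print(geometric_mean(right_skewed))

# Hypothetical values 10 and 20, with the first given three times the weight.
print(weighted_mean([10, 20], [3, 1]))  # 12.5
```

With equal weights, `weighted_mean` reduces to the ordinary arithmetic mean.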

The Arithmetic Mean


The arithmetic mean, often simply called the mean, of a set of values is calculated by adding up all the values and dividing this sum by the number of values in the set.


It is useful to be able to summarize this verbal description by an algebraic formula. Using mathematical notation, we write our set of n observations of a variable, x, as x1, x2, x3, …, xn. For example, x might represent an individual’s height (cm), so that x1 represents the height of the first individual, and xi the height of the ith individual, etc. We can write the formula for the arithmetic mean of the observations, written x̄ and pronounced ‘x bar’, as


$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$


Using mathematical notation, we can shorten this to


$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$


where Σ (the Greek uppercase ‘sigma’) means ‘the sum of’, and the sub- and superscripts on the Σ indicate that we sum the values from i = 1 to i = n. This is often further abbreviated to


$$\bar{x} = \frac{\sum x_i}{n}$$
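As a minimal sketch of the definition above, the following Python adds up the values and divides the sum by the number of values. The function name `arithmetic_mean` and the heights are hypothetical; the numbers are illustrative and not taken from any study.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty set is undefined")
    return sum(values) / n


# Hypothetical heights (cm) of five individuals: x1, x2, ..., x5.
heights_cm = [150.0, 160.0, 165.0, 170.0, 180.0]
mean_height = arithmetic_mean(heights_cm)
print(mean_height)  # 165.0
```

Python's standard library offers `statistics.mean`, which performs the same calculation.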


The Median


If we arrange our data in order of magnitude, starting with the smallest value and ending with the largest value, then the median is the middle value of this ordered set. The median divides the ordered values into two halves, with an equal number of values both above and below it.


It is easy to calculate the median if the number of observations, n, is odd. It is the (n + 1)/2th observation in the ordered set. So, for example, if n = 11, then the median is the (11 + 1)/2 = 12/2 = 6th observation in the ordered set. If n is even then, strictly, there is no median. However, we usually calculate it as the arithmetic mean of the two middle observations in the ordered set [i.e. the n/2th and the (n/2 + 1)th]. So, for example, if n = 20, the median is the arithmetic mean of the 20/2 = 10th and the (20/2 + 1) = (10 + 1) = 11th observations in the ordered set.
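The positional rule above can be sketched in Python as follows. The function name `median_by_position` and both samples are hypothetical; note that the text counts observations from 1, whereas Python lists are indexed from 0, so each position is shifted by one.

```python
def median_by_position(values):
    """Median found from positions in the ordered set."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set is undefined")
    if n % 2 == 1:
        # Odd n: the (n + 1)/2th observation, counting from 1.
        return ordered[(n + 1) // 2 - 1]
    # Even n: arithmetic mean of the n/2th and (n/2 + 1)th observations.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


odd_sample = [7, 3, 9, 1, 5, 11, 2, 8, 4, 10, 6]  # n = 11
even_sample = list(range(1, 21))                  # n = 20

print(median_by_position(odd_sample))   # 6, the 6th ordered observation
print(median_by_position(even_sample))  # 10.5, the mean of the 10th and 11th
```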


The median is similar to the mean if the data are symmetrical (Fig. 5.1), less than the mean if the data are skewed to the right (Fig. 5.2), and greater than the mean if the data are skewed to the left (Fig. 4.1d).
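To illustrate the relationship between the median and the mean under skewness, here is a small sketch using Python's standard `statistics` module. Both samples are made up for illustration: the first has a long right tail, and the second is its mirror image, so its long tail lies on the left.

```python
import statistics

# Hypothetical right-skewed sample: mostly small values and one large value.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 20]

# Mirror image of the same sample, which moves the long tail to the left.
left_skewed = [21 - x for x in right_skewed]

for label, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    mean_value = statistics.mean(data)
    median_value = statistics.median(data)
    print(f"{label}: mean = {mean_value:.2f}, median = {median_value}")
```

In the right-skewed sample the single large value pulls the mean above the median; in the mirrored sample the single small value pulls the mean below it.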



Figure 5.1 The mean, median and geometric mean age of the women in the study described in Chapter 2 at the time of the baby’s birth. As the distribution of age appears reasonably symmetrical, the three measures of the ‘average’ all give similar values, as indicated by the dotted lines.




May 9, 2017 | Posted in GENERAL & FAMILY MEDICINE
