Summarizing Data

It is very difficult to have any ‘feeling’ for a set of numerical measurements unless we can summarize the data in a meaningful way. A diagram (Chapter 4) is often a useful starting point. We can also condense the information by providing measures that describe the important characteristics of the data. In particular, if we have some perception of what constitutes a representative value, and if we know how widely scattered the observations are around it, then we can formulate an image of the data. The average is a general term for a measure of location; it describes a typical measurement. We devote this chapter to averages, the most common being the mean and median (Table 5.1). We introduce measures that describe the scatter or spread of the observations in Chapter 6.

Table 5.1 Advantages and Disadvantages of Averages.

Type of average	Advantages	Disadvantages
Mean	Uses all the data values; algebraically defined and so mathematically manageable; known sampling distribution (Chapter 9)	Distorted by outliers; distorted by skewed data
Median	Not distorted by outliers; not distorted by skewed data	Ignores most of the information; not algebraically defined; complicated sampling distribution
Mode	Easily determined for categorical data	Ignores most of the information; not algebraically defined; unknown sampling distribution
Geometric mean	Before back-transformation, has the same advantages as the mean; appropriate for right-skewed data	Only appropriate if the log transformation produces a symmetrical distribution
Weighted mean	Same advantages as the mean; ascribes relative importance to each observation; algebraically defined	Weights must be known or estimated
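
The trade-offs in Table 5.1 can be seen on a small example. The Python sketch below uses the standard-library statistics module for the mean, median and mode. It computes the geometric mean as the antilog of the mean of the logged values, which assumes that all values are positive. It computes the weighted mean as Σwᵢxᵢ / Σwᵢ. The data, the weights and the helper names geometric_mean and weighted_mean are hypothetical and were chosen only for illustration.

```python
import math
import statistics

# Hypothetical sample, for illustration only (not from the study in the text).
values = [2.0, 3.0, 3.0, 5.0, 8.0, 40.0]
# Hypothetical weights giving the relative importance of each observation.
weights = [1.0, 2.0, 2.0, 1.0, 1.0, 1.0]


def geometric_mean(xs):
    """Mean of the natural logs, back-transformed with exp(); needs all xs > 0."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def weighted_mean(xs, ws):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)


print("mean:          ", statistics.mean(values))
print("median:        ", statistics.median(values))
print("mode:          ", statistics.mode(values))
print("geometric mean:", geometric_mean(values))
print("weighted mean: ", weighted_mean(values, weights))
```

The single large value (40) pulls the mean well above the median. This is the distortion by outliers listed in the table. The geometric mean falls between the two.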

The Arithmetic Mean

The arithmetic mean, often simply called the mean, of a set of values is calculated by adding up all the values and dividing this sum by the number of values in the set.
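
This verbal definition translates directly into code. Below is a minimal Python sketch. The function name arithmetic_mean and the heights are hypothetical. In practice, statistics.mean from the Python standard library does the same job.

```python
def arithmetic_mean(xs):
    """Add up all the values and divide the sum by the number of values."""
    if not xs:
        raise ValueError("the mean of an empty set of values is undefined")
    total = 0.0
    for x in xs:  # x_1, x_2, ..., x_n
        total += x
    return total / len(xs)


# Hypothetical heights (cm), for illustration only.
heights = [162.0, 170.5, 158.0, 175.5, 169.0]
print(arithmetic_mean(heights))  # prints 167.0
```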

It is useful to be able to summarize this verbal description by an algebraic formula. Using mathematical notation, we write our set of n observations of a variable, x, as x₁, x₂, x₃, …, x_n. For example, x might represent an individual’s height (cm), so that x₁ represents the height of the first individual, x_i the height of the ith individual, and so on. We can write the formula for the arithmetic mean of the observations, written x̄ and pronounced ‘x bar’, as

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

Using the summation sign, we can shorten this to

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where Σ (the Greek uppercase ‘sigma’) means ‘the sum of’, and the sub- and superscripts on the Σ indicate that we sum the values from i = 1 to i = n. This is often further abbreviated to

$$\bar{x} = \frac{\sum x_i}{n}$$

The Median

If we arrange our data in order of magnitude, starting with the smallest value and ending with the largest value, then the median is the middle value of this ordered set. The median divides the ordered values into two halves, with an equal number of values both above and below it.

It is easy to calculate the median if the number of observations, n, is odd: it is the (n + 1)/2th observation in the ordered set. So, for example, if n = 11, then the median is the (11 + 1)/2 = 12/2 = 6th observation in the ordered set. If n is even, then, strictly, there is no single middle observation. However, we usually calculate the median as the arithmetic mean of the two middle observations in the ordered set [i.e. the n/2th and the (n/2 + 1)th]. So, for example, if n = 20, the median is the arithmetic mean of the 20/2 = 10th and the (20/2 + 1) = (10 + 1) = 11th observations in the ordered set.
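
The odd/even rule can be sketched in Python as follows. Positions in the text are counted from 1, whereas Python lists are indexed from 0, which is why the code subtracts 1. The function name median is hypothetical. The standard library's statistics.median uses the same convention of averaging the two middle values when n is even.

```python
def median(xs):
    """Median using the (n + 1)/2 rule for odd n, and the mean of the
    n/2th and (n/2 + 1)th ordered values for even n (1-based positions)."""
    if not xs:
        raise ValueError("the median of an empty set of values is undefined")
    ordered = sorted(xs)  # smallest to largest
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2          # 1-based position of the middle value
        return ordered[position - 1]     # convert to a 0-based index
    lower = n // 2                       # the n/2th value (1-based)
    upper = n // 2 + 1                   # the (n/2 + 1)th value (1-based)
    return (ordered[lower - 1] + ordered[upper - 1]) / 2


print(median(list(range(1, 12))))  # n = 11: the 6th ordered value, prints 6
print(median(list(range(1, 21))))  # n = 20: mean of 10th and 11th, prints 10.5
```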

The median is similar to the mean if the data are symmetrical (Fig. 5.1), less than the mean if the data are skewed to the right (Fig. 5.2), and greater than the mean if the data are skewed to the left (Fig. 4.1d).
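
Below is a small numerical check of these relationships. It applies Python's statistics module to three hypothetical data sets that were invented for illustration and do not come from the study in Chapter 2.

```python
import statistics

# Hypothetical data sets, invented for illustration only.
datasets = {
    "symmetric": [4, 5, 6, 6, 7, 8],
    "right-skewed": [1, 2, 2, 3, 3, 4, 20],      # long tail of large values
    "left-skewed": [1, 15, 16, 17, 17, 18, 19],  # long tail of small values
}

for name, xs in datasets.items():
    mean = statistics.mean(xs)
    median = statistics.median(xs)
    if median < mean:
        relation = "median < mean"
    elif median > mean:
        relation = "median > mean"
    else:
        relation = "median = mean"
    print(f"{name:>12}: mean = {mean:.2f}, median = {median:.2f} ({relation})")
```

For the right-skewed set, the single large value raises the mean to 5, while the median stays at 3.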

Figure 5.1 The mean, median and geometric mean age of the women in the study described in Chapter 2 at the time of the baby’s birth. As the distribution of age appears reasonably symmetrical, the three measures of the ‘average’ all give similar values, as indicated by the dotted lines.

Tags: Medical Statistics at a Glance

May 9, 2017 | Posted by admin in GENERAL & FAMILY MEDICINE | Comments Off

Basicmedical Key

Describing data: the ‘average’