The rates and measures that we explored in Chapter 2 provide a variety of ways to describe the health of a population and thus also enable us to compare patterns of health and disease between populations and over time. This allows us to answer the core questions relating to disease burden that are the essential first step in setting health planning and service priorities. As we discussed in Chapter 1, this descriptive epidemiology, concerned as it is with ‘person, place and time’, attempts to answer the questions ‘Who?’, ‘What?’, ‘Where?’ and ‘When?’. This can include anything from a description of disease in a single person (a case report) or a special survey conducted to measure the prevalence of a particular health issue in a specific population, to reports from national surveys and data collection systems showing how rates of disease or other health-related factors vary in different geographical areas or over time (time trends).
Although descriptive data may be collected specifically to answer a defined question, they often come from governments, health care providers and statistical agencies that routinely collect vast amounts of information. Summary data – often the various forms of rate which you met in Chapter 2 – can be accessed from published reports and, increasingly, from online databanks. In some cases it is also possible to obtain the individual-level information from which the rates are calculated. These descriptive data are essential to identify health problems and for health planning and, although they cannot usually answer the question ‘Why?’, they may provide the first ideas about causality and thus generate hypotheses that can then be tested in more formal ‘analytic’ studies that we will discuss in Chapter 4. As you will come to see in later chapters, descriptive studies also play a critical and often under-appreciated role in monitoring the effects of large-scale interventions.
In this chapter we will look in more detail at some of the most common types of descriptive data and where they come from. However, before embarking on a data hunt, we first need to decide exactly what it is we want to know, and this can pose a challenge; to make good use of the most relevant descriptive data, it is critical to formulate our question as precisely as possible. If we want to know about youth suicide, are we interested in the suicide rate, the number of hospitalisations for attempted suicide, or the proportion of teenagers who have considered suicide? Mortality data are probably readily available from a number of sources, but the accuracy of the underlying certification of this cause of death may be problematic. Hospital admission data may also be accessible, but might not capture suicide attempts that are dealt with in the emergency room and not admitted. Furthermore, separating individuals from events can be tricky – are a lot of youths making a single suicide attempt each, or are there a smaller number who have made multiple attempts? The resulting policy implications are quite different. In contrast, to find out what proportion of youths have suicidal thoughts we would probably need to conduct a special survey, as this information is unlikely to be captured in routine statistics.
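The distinction between events and individuals can be made concrete with a small sketch. The records below are entirely hypothetical (invented patient identifiers and dates), and the function name `summarise_admissions` is an illustrative assumption rather than part of any real hospital system; the point is only that the same admission data yield different answers depending on whether admissions or people are counted.

```python
from collections import Counter

# Hypothetical admission records: (patient_id, admission_date).
# These values are invented purely for illustration.
admissions = [
    ("P01", "2015-02-03"),
    ("P02", "2015-03-11"),
    ("P01", "2015-06-20"),
    ("P03", "2015-07-02"),
    ("P01", "2015-11-15"),
]


def summarise_admissions(records):
    """Return (number of events, number of individuals, individuals with repeats)."""
    # Count how many admissions each patient identifier contributes.
    per_person = Counter(patient_id for patient_id, _ in records)
    n_events = sum(per_person.values())
    n_individuals = len(per_person)
    n_repeat = sum(1 for count in per_person.values() if count > 1)
    return n_events, n_individuals, n_repeat


n_events, n_individuals, n_repeat = summarise_admissions(admissions)
print(f"{n_events} admissions among {n_individuals} individuals; "
      f"{n_repeat} had more than one admission")
```

In this made-up example, five admissions arise from only three individuals, one of whom accounts for three of the events. A count of admissions alone would not reveal that pattern.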
Case reports and case series
The identification of a new or recurring health problem often begins with a case report or case series. These are detailed descriptions, usually by a doctor or group of doctors, of one or more cases of a disease that are unusual for some reason. This might be because the disease has not been seen before or the cases may have occurred either in individuals who would not normally be expected to develop that disease, or in an area where the disease had not previously been reported or was thought to have been controlled. The cases might also be reported in conjunction with a previous exposure to something that, it is speculated, may have caused the disease.
The selective nature of these reports and the limited amount of information they contain mean that they provide little evidence of causality and cannot say much about patterns of disease occurrence. However, they can help identify potential health problems such as the outbreaks of Ebola, severe acute respiratory syndrome (SARS), bird flu and swine flu that the world experienced during the last decade (we will discuss these further in Chapter 13). They may also stimulate interest in an area, leading to more detailed studies, and in this regard some have been seminal in advancing knowledge (Box 3.1). However, if we want to know how big the problem is or even if the occurrence is really anything out of the ordinary, we need more comprehensive information about the frequency of occurrence of the event of interest in the population.
The classic description of a series of infants born with congenital cataracts, some with additional cardiac abnormalities, in Australia in 1941. This led a Sydney doctor to postulate a causal link between a severe epidemic of rubella (German measles) that had occurred six to nine months before the children were born and the subsequent abnormalities (Gregg, 1941). It is now well known that if a woman develops rubella during pregnancy it may affect her unborn baby.
A case report published in the UK in 1961 described the development of a pulmonary embolism in a 40-year-old pre-menopausal woman, five weeks after she had started using an oral contraceptive (OC) to treat endometriosis (Jordan and Anand, 1961). Because pulmonary embolism is rare in women of that age, the authors suggested that it might have been caused by the OC, particularly as it was a novel exposure at that time. A report of one case could not provide conclusive evidence that it was the OC rather than some other characteristic of the patient that led to the embolism – but it did pave the way for more detailed studies. These have consistently shown that there is an association between the use of OCs and the risk of this condition.
A report of a series of five cases of Pneumocystis carinii pneumonia that occurred in young, previously healthy, homosexual men in three Los Angeles hospitals in a six-month period during 1980–81 (CDC, 1981). Until then, this disease had been seen almost exclusively in the elderly, the severely malnourished and those on anti-cancer chemotherapy whose immune systems were suppressed. This cluster of cases in young men suggested that the men were suffering from a previously unknown disease, possibly related to sexual behaviour. We now know this as AIDS.
Vital statistics and mortality data
As you saw in Chapter 2, most of the measures we use in descriptive epidemiology relate the number of events of interest that occurred to the number of people in the population – for example, the number of new cases of HIV per 100,000 people in a given country in a given year. In this section we will look at some of the sources of routine data that provide information about the size of a population and key vital statistics such as birth and death rates before we move on to consider other more specialised sources of data that provide information about other health events. We will note some of their advantages and disadvantages, give examples of the uses to which they can be put, and provide links to some of the most useful sources. Table 3.1 summarises some of the more common mortality and morbidity data collection and reporting systems.
Data collection or reporting system | Source of raw data | Summary data published | Individual-level data sometimes available (a) |
---|---|---|---|
Census | Census forms (self-reported); completion required by law | Population estimates, overall and in subgroups | – |
Civil registration or vital statistics systems (national) | Birth, marriage and death certificates; often required by law | Fertility and mortality rates | Date and cause of death (through a National Death Index or Register) |
Health and demographic surveillance systems (regional) | Regular surveys of the same population | Vital statistics and a variety of other data | – |
Disease registries (e.g. cancer registries, injury registers) | Pathology reports, testing laboratories, hospital and medical records; sometimes required by law | Incidence, mortality and survival rates, prevalence | Diagnosis, date, disease characteristics and demographics (b); mortality data may also be available |
Notifiable disease systems (e.g. AIDS, SARS, TB, other infectious diseases) | Laboratories, medical practitioners and hospitals | Numbers of cases, incidence rates | Diagnosis, date, disease characteristics and demographics |
Hospital administrative systems | Hospital discharge sheets and databases, medical records | – | Diagnosis, date, medications prescribed, investigations and procedures performed, costs and demographics |
Other administrative health systems, e.g. prescribing and insurance databases | Prescriptions, investigations and medical procedures performed | Health service use and costs | Date, medications prescribed, investigations and procedures performed, costs |
Demographic and Health Surveys (morbidity, risk factors, needs, service use, etc.) | Special surveys, sometimes national, often repeated at regular intervals with a different sample of the population each time | Special reports | De-identified grouped data sometimes available |
Special surveillance systems (c) | e.g. ‘sentinel’ primary care practices or disease registers (UK GP database), MONICA (international CHD) | Varied | Varied |
(a) With appropriate consent/approvals.
(b) Basic demographic information such as age, sex, and last known address.
(c) See Chapter 12 for a more detailed discussion of surveillance systems.
Events such as births, marriages and deaths are collectively known as vital statistics, from the Latin vita meaning life.
Census data
A census is a regular procedure for systematically counting and collecting information about everyone in a given population. It is this emphasis on ‘everyone’ that differentiates it from a survey which would normally only collect data for a sample of people. Early records of national censuses include the biblical account of the census conducted in Israel around the time of Jesus’ birth and the Domesday Book compiled by William the Conqueror in England in 1086. In both cases the goal of the census was to facilitate the collection of taxes. In 1749, Sweden became the first European country to establish a regular population census. Census data provide information about the number of people in the population and their age and sex as well as information about where people have come from, where they live, family structure, education and employment. The United Nations recommends that countries conduct a census at least every 10 years and provides guidelines regarding the information that should be collected in a census in order to standardise practice (United Nations, 2008). Census data usually provide the best estimates of the number of people in the population, both overall and by key characteristics such as age, country of birth, area of residence and level of education, and they are usually readily available in summary form through the relevant national statistics office.
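Because census counts typically supply the denominator for measures such as ‘new cases per 100,000 people in a given year’, the arithmetic can be sketched in a few lines. The figures below are hypothetical and the function name `rate_per_100k` is an assumption introduced for illustration; this is a minimal sketch, not a description of how any statistics office computes its published rates.

```python
def rate_per_100k(events, population):
    """Events per 100,000 people in the population over the stated period."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * 100_000


# Hypothetical figures, for illustration only.
new_cases = 1_250               # events counted in one year (numerator)
census_population = 5_000_000   # census-based population estimate (denominator)

# Rounded to one decimal place for reporting.
print(round(rate_per_100k(new_cases, census_population), 1))
```

The same function could be applied within subgroups (for example by age band or area of residence) wherever the census provides a matching denominator for the event counts.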
Demography, the study of the characteristics of human populations such as population size, growth and distribution and vital statistics such as births and deaths, has many parallels to descriptive epidemiology.
Civil registration systems
While censuses provide valuable snapshots of a population at isolated points in time, they inevitably miss events that occur between census years. Civil registration refers to the ongoing compulsory recording of the occurrence and characteristics of events such as births, marriages and deaths within a population. In most countries, the registration of these events is a legal requirement and the resulting birth, marriage and death certificates are legal documents. So, for example, when someone dies, a medical practitioner must complete a medical certificate that usually includes basic demographic information about the individual, including name, date of birth, ethnicity and gender, as well as the date and cause(s) of death. This goes to the Registrar who registers the death and issues a legal death certificate. Information about the cause of death is coded according to the WHO International Classification of Diseases or ICD (World Health Organization, 2015) and used to compile national mortality statistics. Again, Sweden was one of the first countries to establish a nationwide population register and, as a result, Statistics Sweden has national statistics spanning a period of more than 250 years (e.g. see Figure 3.1). Elsewhere, the General Register Office of England and Wales has records of births, marriages and deaths dating back to 1837 with data from Australia, New Zealand, the USA and Canada available from the late nineteenth century. These data form the basis of many of the mortality-based measures that you met in Chapter 2 and historical information is often made available to individuals for genealogy research. Access to more recent records is tightly controlled but sometimes possible for approved medical research (e.g. see National Death Registers, below). 
Complete coverage, accuracy and timeliness are critical for quality vital statistics and good population statistics are essential to measure and track health indicators such as the Millennium Development Goals that you met in Chapter 2.
Describe the changes in life expectancy by age and sex over time shown in Figure 3.1 and comment on the patterns.
In the late eighteenth century, average life expectancy in Sweden was only about 35 years for men and women, although then, as now, women could expect to live slightly longer than men. This young average age was a consequence of the very high mortality rates in babies and children at the time and improvements in this area led to much of the large gains in average life expectancy, particularly between 1850 and 1950. However, even in 1800, if an individual survived their childhood years and made it to their 65th birthday they could then expect to live to about 75. Now, most deaths in a country like Sweden occur over the age of 65; thus, someone who reaches that age can still only expect to live until about 85, a much more modest improvement.
National death registers
Recognising the enormous value of the information, particularly the mortality data, collected by their registries of births, marriages and deaths, many countries with a comprehensive civil registration system now also operate a national death register or index to facilitate health research. These electronic registers hold information about the name, date of birth and sex of every individual who has died since the register began, as well as their date, place and cause of death and they allow bona fide researchers conducting scientifically and ethically approved studies to obtain death information for individuals in their studies. In some countries it is also possible to get approval to ‘link’ these data to other health data sets; we will discuss this further in Chapter 4.
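As a rough illustration of what obtaining death information for study participants might involve, the sketch below matches a small study list against a death register using name, date of birth and sex. Everything here is hypothetical: the names, dates, record layout and the helpers `linkage_key` and `link_deaths` are assumptions made for the example, and the cause codes are included only as placeholders. Real registers have their own approval processes and matching procedures, which often need to allow for misspellings and missing details rather than relying on exact matches as this sketch does.

```python
# Hypothetical death register, keyed on (name, date of birth, sex).
# All entries are invented for illustration.
death_register = {
    ("ANNA LIND", "1931-04-12", "F"): {"date_of_death": "2009-08-30", "cause": "I21"},
    ("ERIK BERG", "1940-11-02", "M"): {"date_of_death": "2012-01-17", "cause": "C25"},
}

# Hypothetical study participants to be checked against the register.
study_participants = [
    {"id": 1, "name": "Anna Lind", "dob": "1931-04-12", "sex": "F"},
    {"id": 2, "name": "Karl Holm", "dob": "1952-06-23", "sex": "M"},
]


def linkage_key(person):
    """Build an exact-match key: upper-cased name with tidy spacing, DOB, sex."""
    name = " ".join(person["name"].upper().split())
    return (name, person["dob"], person["sex"])


def link_deaths(participants, register):
    """Map each participant id to a death record, or None if no match is found."""
    return {p["id"]: register.get(linkage_key(p)) for p in participants}


linked = link_deaths(study_participants, death_register)
for participant_id, record in linked.items():
    status = "no death record found" if record is None else f"died {record['date_of_death']}"
    print(participant_id, status)
```

A participant with no matching record is not necessarily alive; they may simply have been recorded differently in the register, which is one reason exact matching is only a starting point.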
Verbal autopsy
In many lower-income countries the vital registration systems are less well-developed than in high-income countries and, although the fact of death is registered, information about cause of death may not be available. In these areas an alternative method used to capture information about causes of death, particularly among children, is the verbal autopsy. These ‘autopsies’ are conducted by a structured interview with the family members about the circumstances of their relative’s death. This information can then be used to classify the cause of death according to defined rules and criteria. For example, until recently up to 40% of the 400,000 deaths each year in Thailand were classified to poorly defined conditions, and there were concerns regarding the accuracy when specific causes were assigned. To obtain more reliable information about the patterns of mortality in Thailand, researchers conducted almost 10,000 verbal autopsies and compared the results to those obtained from the vital registration system. This showed that for some conditions mortality rates were at least double those estimated from vital registration data, while life expectancy was approximately two years lower (Porapakkham et al., 2010). An even larger project, the Million Death Study (MDS, Centre for Global Health Research, 2015) is ongoing in India where, despite the introduction of laws mandating birth and death registration in 1969, some states still have low rates of death registration. The MDS, which is monitoring almost 14 million people in 2.4 million nationally representative households from 1998 to 2014, uses verbal autopsies to assign a probable cause to any deaths that occur. The resulting data help identify areas with excess mortality so that action can be taken to reduce preventable deaths. 
While initially used primarily in the research setting, there is now a push to incorporate verbal autopsies into the routine death registration process in countries with less-developed vital registration systems, and WHO and other groups are working to develop a standard instrument for verbal autopsy.
Health and demographic surveillance systems
In many sub-Saharan African countries and some countries in Asia, the civil registration and vital statistics systems are incomplete or non-existent. In the absence of a comprehensive national system, Health and Demographic Surveillance Systems (HDSS) have been established to monitor vital events within a defined region. Some of these systems have been in existence for several decades, for example the Niakhar HDSS which was first established in a rural area of Senegal in 1962 and now includes 30 villages with a combined population of approximately 43,000 (Delaunay et al., 2013). As well as collecting standard vital statistics, the HDSS often collect additional information about locally relevant health issues, such as the vaccination status of children and cases of vaccine-preventable diseases or other diseases such as malaria. Unlike the Demographic and Health Surveys that we will discuss below, the key feature of a HDSS is that it follows the same group of people over time.
In 1998, the International Network for the Demographic Evaluation of Populations and their Health (INDEPTH; www.indepth-network.org) was established to bring together the existing HDSS sites and encourage new sites to join (Sankoh and Byass, 2012). In 2014 there were 49 member centres from 22 countries including 36 from 15 countries in sub-Saharan Africa. Use of the verbal autopsy is common in the HDSS regions and the INDEPTH Network has been closely involved with WHO in developing standardised forms for this.
Challenges in using mortality data
As we noted above, death registration is a legal requirement in most countries. The registration of a death therefore establishes the fact that someone has died with virtual certainty. Unfortunately, the information is less reliable when the cause of death is of interest, rather than the simple fact that a death has occurred. This can be a consequence either of misdiagnosis (e.g. if a doctor does not know a person’s full medical history) or of mis-specification on the form. The sample certificate shown in Figure 3.2 shows the challenge of getting the sequence and content right. Look at the instructions on completing the ‘cause of death’ section: it will often not be easy, and those dying at older ages tend to have a number of coexisting diseases. How should the practitioner sequence the diagnoses of an overweight woman who has had diabetes for 20 years and high blood pressure for 10 years and who dies of pneumonia 1 year after suffering a stroke? Such a scenario is not uncommon, so we can be left with considerable uncertainty about the actual cause of death even on inspection of the original form. Indeed, in research studies where people are followed up for mortality, considerable extra effort often needs to be made in collecting clinical and pathology records in order to ensure accuracy in assigning cause of death. This level of effort is not feasible for routine vital statistics collections (it is far too expensive), so reports of mortality rates based on death certificates need to be used circumspectly. Generally only a single cause is extracted from the death certificate for each person who has died: the one thought to underlie any subsequent conditions. Multiple cause of death coding has recently been introduced in some countries but, while this may alleviate the problem of coding multiple conditions, it introduces another – the question of how to report and use this extra information.
Figure 3.3 shows diabetes mortality rates over time in the USA. What explanations can you think of for the sudden change that occurred between 1948 and 1949? Which do you think is most likely?
We saw in Chapter 1 (Figures 1.7 and 1.8) that US death rates for a number of causes have been declining over time, but none as dramatically as seen here in Figure 3.3, where the mortality rate for diabetes appeared to halve between 1948 and 1949 before plateauing at the new level. This could be due to a spectacular new treatment (but insulin is still the mainstay, as it was in the 1940s), or to fewer cases of diabetes occurring (but no good means of preventing diabetes had been identified). So we are forced to consider artefacts in the data as a possible explanation. Here the dramatic shift in diabetes mortality was due to a coding change in the International Classification of Diseases (ICD), such that, when diabetes and coronary heart disease occurred together, diabetes was no longer listed as the underlying cause.
Not surprisingly, some diseases are recorded more accurately on death certificates than others. One that is rapidly fatal is likely to be clear-cut, whereas with a long-term disease there is more chance of another illness occurring and being recorded on the death certificate instead. For example, many people like the woman described above would not have diabetes recorded anywhere on their death certificates. Similarly, diseases that are easily diagnosed tend to be more accurately recorded than those that require more complex diagnostic procedures; in the absence of an autopsy (and they are now uncommon), death from a motor vehicle accident would clearly be easier to recognise than one from pancreatic cancer. In an Australian study it was found that the overall accuracy of death certificates was only 77% compared with autopsy records, although cancers were accurately reported in 90% of cases (Maclaine et al., 1992). A similarly high concordance for cancers was found in a UK study linking death certificates and hospital records, but chronic diseases such as diabetes and hypertension were correctly listed as an underlying cause only about half of the time (Goldacre, 1993). More recent studies have continued to report considerable levels of discrepancy between the cause of death listed on a death certificate and that assigned based on an independent review of the medical records (Rampatige et al., 2014).
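Accuracy figures of the kind quoted above come from comparing, record by record, the cause on the death certificate with the cause assigned by a reference source such as an autopsy or a review of the medical records. The sketch below shows one simple way such agreement could be tabulated. The paired records are invented, the broad cause labels are placeholders, and the function name `concordance` is an assumption; the cited studies may well have used different definitions of agreement.

```python
from collections import defaultdict

# Hypothetical paired records:
# (cause on death certificate, cause from autopsy or record review).
pairs = [
    ("cancer", "cancer"),
    ("cancer", "cancer"),
    ("heart disease", "diabetes"),
    ("diabetes", "diabetes"),
    ("pneumonia", "heart disease"),
    ("heart disease", "heart disease"),
]


def concordance(pairs):
    """Return overall agreement and agreement for each reference cause."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for certificate_cause, reference_cause in pairs:
        total[reference_cause] += 1
        if certificate_cause == reference_cause:
            agree[reference_cause] += 1
    overall = sum(agree.values()) / len(pairs)
    # Proportion of reference-confirmed deaths from each cause that the
    # certificate recorded under the same cause.
    by_cause = {cause: agree[cause] / total[cause] for cause in total}
    return overall, by_cause


overall, by_cause = concordance(pairs)
print(f"overall agreement: {overall:.0%}")
for cause, proportion in sorted(by_cause.items()):
    print(f"{cause}: {proportion:.0%}")
```

Reporting agreement separately by cause, as well as overall, is what allows a statement such as ‘cancers were recorded more accurately than chronic diseases’ to be made.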
Cause of death coding is particularly challenging for suicide as it may be hard to differentiate between intentional self-harm and accident. For example, while the reported number of suicides in Australia had fallen since 1997, the number of deaths coded as accidents involving asphyxia or firearms, methods suggestive of suicide, had increased. Overall, the authors of one study suggested that suicide cases were undercounted by between 11% and 16% (De Leo et al., 2010).