Data Collection in Healthcare Epidemiology
Stephen B. Kritchevsky
Ronald I. Shorr
Data for healthcare epidemiology come from three sources: direct ascertainment of information from subjects using questionnaires or direct observation; review of medical records; and electronic sources, such as billing records, laboratory records, and medication administration records. Although we provide an overview of each of these data sources, we emphasize the development of questionnaires. After data from any of these sources are collected, they are entered and organized (usually in a database) and then analyzed, typically with a statistical package. We offer suggestions on the preparation and formatting of data to facilitate the transfer from data collection to data analysis.
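For illustration, the following minimal Python sketch (our example; the variable and file names are hypothetical, not from any particular study) shows one common way to organize collected data in a one-row-per-subject, one-column-per-variable layout that transfers cleanly to a statistical package:

```python
# A minimal sketch of a "one row per subject, one column per variable"
# layout. Field names and values are hypothetical.
import pandas as pd

records = [
    {"subject_id": 1, "age": 67, "smoker": 0, "onset_date": "2023-03-04"},
    {"subject_id": 2, "age": 54, "smoker": 1, "onset_date": None},
]
df = pd.DataFrame(records)
df["onset_date"] = pd.to_datetime(df["onset_date"])  # store dates as dates, not text

# Export to a plain format that SAS, Stata, R, or SPSS can all read.
df.to_csv("study_data.csv", index=False)
```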
QUESTIONNAIRES
Questionnaire Development
Questionnaires are often the most labor-intensive form of data collection but are required in situations where surveillance using electronic data sources or medical record review is inadequate.
After deciding what data need to be collected, the investigator has to decide how to collect it. This means developing the form(s) to guide data collection, identifying the data sources (e.g., individuals, proxies, medical records, direct observation), identifying data gatherers, and deciding on a mode of data collection. Decisions in each of these areas affect the others, and investigators should understand the trade-offs between them in order to make good decisions when planning an investigation.
Writing the Questions The first step in developing a data collection form is writing the questions to elicit the information required by the study. Good questions are clear and unambiguous and match the verbal skills of prospective participants. Poorly worded questions result in answers that are unreliable or uninterpretable. Although writing good questions is something of an art, there are several common problems that can result in bad questions.
Choosing Verbiage If respondents do not understand the words used in questions, they will not be able to answer them (or worse, they will answer them anyway). In general, the words used in the questions should be ones used by respondents in their usual conversation (e.g., use “help” rather than “assist” and “enough” rather than “sufficient”). Avoid medical jargon and abbreviations that may not be commonly understood. Be aware of regional or cultural differences in the meanings of words and the names of diseases (e.g., diabetes may be called “the sugar” by some respondents). Avoid using loaded words (i.e., those carrying excessively negative connotations).
Consider these questions:
Should smoking be banned in the hospital?
Should smoking be allowed in the hospital?
The word banned is loaded, and some answers to the question may be a reaction to the word itself rather than to the content of the question.
Ambiguous Questions One of the most difficult tasks in writing a question is asking it in such a way that the respondent has the same concept in mind when answering it as the investigator did when asking it. The investigator wanting to identify current cigarette smokers might ask: “Do you smoke?” Cigar and pipe smokers will answer this affirmatively, contrary to the investigator’s intention. There may also be those who have very recently quit smoking (and may soon begin the habit again). They would answer no, but the investigator might want them classified as smokers for the purposes of the study. “Have you smoked two or more packs of cigarettes in the past 2 months?” is a better version of the question. Cigarettes are specifically named, and the amount of consumption and the period are specified. In some cases, visual aids such as pictures of products or models may be helpful in orienting the respondent.
In an outbreak investigation of central line infections, hospital personnel might be asked, “Did you see patients on [a particular ward]?” This question has two ambiguous referents. Does “see” mean “care for,” or is the question intended to detect less formal contact as well? Also, does the investigator mean if the respondent has ever seen patients on a ward, or just during the epidemic period? A better phrasing might be, “Did you provide care for any patients on [the ward] since March of this year?”
Causes must precede effects in time. Therefore, when assessing the relationship between a behavior that may change over time and disease occurrence, it is important that the questions refer to the period prior to the onset of disease symptoms. Failure to make this clear can lead to biased results if the behavior changes in the face of symptoms. Both the failure to elicit exposure information from the appropriate period and the inclusion of irrelevant exposure information can lead to bias.
Hypotheticals Avoid hypothetical questions. Consider a question that might be asked of nurses in an infection control project: “Is it important to wear gloves when placing an IV?” The question is problematic, because it may refer to either what is important in a hypothetical sense or what is personally important to the respondent. The responses will be a mixture of these two interpretations, with the investigator having no way of distinguishing the two.
Asking More than One Question Each question should try to elicit only one piece of information. Consider the question, “Have you experienced nausea, vomiting, night sweats, or loss of appetite?” This set of symptoms may be useful in arriving at a diagnosis, but in an epidemiologic investigation, it may be important to document each symptom individually for later use in applying a consistent case definition. Furthermore, respondents may focus on the last symptom named. A respondent may have had night sweats but answer, “No, my appetite’s fine.” A checklist is often used to systematically identify symptoms of potential interest.
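To make the checklist idea concrete, the following Python sketch (ours; the symptom list and the two-symptom rule are hypothetical examples, not from any real study) records each symptom as its own yes/no field so that a consistent case definition can be applied later:

```python
# Illustrative sketch: each symptom is its own binary field, so the case
# definition can be applied uniformly across respondents. The symptom list
# and the two-symptom rule below are hypothetical.
SYMPTOMS = ["nausea", "vomiting", "night_sweats", "appetite_loss"]

def meets_case_definition(responses: dict) -> bool:
    """Example rule: a 'case' reports at least two of the listed symptoms."""
    return sum(responses.get(s, 0) for s in SYMPTOMS) >= 2

respondent = {"nausea": 0, "vomiting": 0, "night_sweats": 1, "appetite_loss": 0}
print(meets_case_definition(respondent))  # False: only one symptom reported
```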
Assumptions in the Questions Answers to questions that make tacit assumptions can be difficult to interpret. Consider the example from Kelsey et al. (1): “Do you bring up phlegm when you cough?” The question assumes that a cough is present. A negative response might mean that no phlegm is produced or that the respondent does not have a cough.
Vague Questions and Answers Avoid the use of words such as “regularly,” “frequently,” and “often,” both in questions and as response options. Different respondents will interpret these words differently. The potential for qualitative responses to introduce unwanted variability was vividly demonstrated by Bryant and Norman (2), who asked 16 physicians to assign a numerical probability to qualitative terms such as “probable,” “normally,” and “always.” The numerical probability assigned to the word “probable” ranged from 30% to 95%. The probability assigned to “normally” ranged from 40% to 100%, and that to “always” ranged from 70% to 100%. Whenever possible, try to elicit a quantitative response.
Threatening Questions Care needs to be taken when asking questions of a somewhat embarrassing nature. Embarrassing questions concern respondent behaviors that may be illegal or socially undesirable, or areas of life that may threaten the respondent’s self-esteem. Research has indicated that the self-reported frequency of potentially embarrassing behaviors can be increased if long, open-ended questions are asked. Open-ended questions are those to which categories of set responses are not supplied by the investigator (as opposed to closed questions, in which the respondents select answers from a list of supplied alternatives). Bradburn and Sudman (3) contrasted various question styles ranging from short questions with fixed response categories to very long questions with open-ended responses. They also contrasted using the respondent’s familiar term for the behavior versus a term supplied by an interviewer. Respondents were randomized in a 2 × 2 × 2 factorial design into one of eight different question formats (i.e., long vs. short question, open vs. closed response format, and familiar vs. standard wording). One short question (with standard wording and closed response format) read: “In the past year, how often did you become intoxicated while drinking any kind of beverage?” The respondents picked a response from a list of eight alternatives. The long form (with respondent’s wording and open response format) read:
Sometimes people drink a little too much beer, wine, or whiskey so that they act different from usual. What word do you think we should use to describe people when they get that way, so that you will know what we mean and feel comfortable talking about it?
Occasionally, people drink on an empty stomach or drink a little too much and become [respondent’s word]. In the past year, how often have you become [respondent’s word] while drinking any kind of alcoholic beverage?
The respondents were given no response categories but asked to supply their own best estimate. The question format did not seem to affect the percentage of people reporting that they had engaged in an activity. It did, however, strongly influence the self-reported frequency of the activity. Those responding to long questions with open-ended responses using familiar terms reported significantly higher frequencies of the behavior of interest. The mean annual consumption of cans of beer calculated using responses from the long, open format with familiar wording was 320 cans; that calculated using the short, closed format with standard wording was 131 cans. Large differences in responses attributable to question format were seen for questions dealing with the frequency of sexual activity as well. Most of the difference was attributable to the use of an open-ended response format and longer questions. The effect of using familiar wording was weaker but was associated with consistently higher reported frequencies of potentially embarrassing behaviors.
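As an illustration of the design just described (not the authors’ actual allocation procedure), the following Python sketch randomizes each respondent to one of the eight cells of a 2 × 2 × 2 factorial:

```python
# Sketch of a 2 x 2 x 2 factorial assignment: each respondent is randomized
# to one of eight question formats. Factor names are ours, chosen to mirror
# the description in the text.
import itertools
import random

FACTORS = {
    "length": ["short", "long"],
    "response": ["closed", "open"],
    "wording": ["standard", "familiar"],
}
CELLS = list(itertools.product(*FACTORS.values()))  # 8 format combinations

def assign_format(rng: random.Random) -> dict:
    length, response, wording = rng.choice(CELLS)
    return {"length": length, "response": response, "wording": wording}

rng = random.Random(42)  # fixed seed so the assignment is reproducible
print(assign_format(rng))  # e.g. {'length': 'long', 'response': 'open', ...}
```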
Asking Questions about Events in the Past Asking individuals about the occurrence and/or frequency of specific events in the past is a special measurement challenge. An investigator whose study depends on the validity of human recall must be particularly attuned to the shortcomings of human memory. Respondents to epidemiologic questionnaires are often asked to perform one of three memory tasks: (a) recall whether a particular event occurred to the individual, (b) recall when the event occurred, or (c) recall how frequently it occurred. Research has shown that it takes some time for people to access their memory for the occurrence of events. Longer questions seem to be useful in giving respondents more time to recall events and may increase the percentage of events recalled (4). Nevertheless, people frequently forget specific events in their past. As a rule, an event is harder to remember if (a) it occurred a long time ago, (b) it is one of a series of similar events, or (c) the respondent attaches little significance to it (4).
People also frequently misplace remembered events in time. There is a tendency to judge events that are harder to recall as less recent and, conversely, there is a tendency to date events about which a lot of detail is recalled as more recent. This problem is termed “telescoping” in survey research (4). Consider the question: “Have you been to a doctor in the past 12 months?” Respondents frequently answer affirmatively if the visit was 15 months ago. People may remember the event better than its date and import the event into the time interval of interest.
Aspects of the design and administration of questionnaires can improve both remembering and dating events. Questions starting with recent events and working backward in time can improve recall. Also, providing date cues can help. One common technique is to provide the respondent with a calendar. Before asking about events of interest, the respondent identifies personally relevant dates such as birthdays and holidays. Then, the respondent is walked back through time and assigns dates to the occurrence of the events of interest with respect to the personal landmarks. As reported by Means et al. (5), a sample of George Washington University Health Plan enrollees was asked to try to recall all health plan visits in the past year. All study participants had at least four visits in the past year. Before using the landmarking technique, participants were able to recall 41% of the health plan visits recorded in the medical record. After the landmarking, 63% of health plan visits were remembered. In a separate study group using only the landmarking technique, 57% of visits were recalled. The use of landmarking also led to an improvement in dating accuracy.
The frequency of a behavior is often of epidemiologic interest insofar as it may serve to quantify the amount and/or rate of an exposure. Humans tend to rely on two strategies for recalling the frequency of events (4). The first is simply trying to remember every instance of a behavior over a period. The second is referred to as the event decomposition method: people first estimate a rate at which a behavior is performed and then apply it over the period of interest. For example, if a respondent is asked how many times she went to a restaurant in the past 2 months, she may figure that she goes to a restaurant once a week, and therefore, she ate at a restaurant eight times in 2 months. In general, the decomposition method seems to lead to more accurate estimates than the recall of individual events. Investigators planning studies to measure the frequency of exposure may wish to structure questionnaires to explicitly elicit these frequency estimates.
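The two recall strategies can be made concrete with a small sketch (ours; the function names are illustrative) using the restaurant example above:

```python
# Sketch of the two recall strategies: enumerating individual remembered
# events versus the decomposition method (estimate a typical rate, then
# apply it to the recall period). Numbers mirror the restaurant example.

def frequency_by_enumeration(recalled_events: list) -> int:
    """Count only the specific events the respondent can bring to mind."""
    return len(recalled_events)

def frequency_by_decomposition(rate_per_week: float, weeks: float) -> float:
    """Estimate a typical weekly rate and apply it over the recall period."""
    return rate_per_week * weeks

# Respondent goes to a restaurant about once a week; recall period is 2 months.
print(frequency_by_decomposition(rate_per_week=1, weeks=8))  # 8 visits
```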
PRETESTING THE DATA COLLECTION INSTRUMENT
Prior to full-scale data collection, it is useful to pretest all study procedures, including any data collection documents. The pretest may include a number of steps. An expert in the field should review the data collection forms; this expert should be able to identify any content omissions. Review by nonexpert colleagues can be useful to give overall impressions, to identify troublesome questions, and to determine whether the skip patterns flow logically. In the next phase of pretesting, test the data collection procedures under study conditions on a number of potential study subjects (frequently 20-30). In this phase, one can identify questions that do not work and determine whether the needed information is indeed available from the intended data source. If the data collection form is being used to elicit information from respondents, debrief your pretest subjects to discover what they had in mind while they were answering and how some questions might be asked better. A pretest also provides the opportunity to ascertain preferences among varying question wordings and answer formats.
Schlesselman (6) describes an example indicative of the kind of problems that a pretest can identify. In a study involving analgesic use, the following series of questions was tested:
Q. HAVE YOU EVER HAD FREQUENT HEADACHES?
Yes
No
Q. HAVE YOU EVER HAD VERY SEVERE HEADACHES?
Yes
No
Q. HAVE YOU HAD HEADACHES ONCE A WEEK OR MORE DURING THE PAST MONTH?
Yes
No
The third question was used as a filter for a series of questions relating to analgesic use for headache. The purpose of the questions was to identify individuals who were likely to be frequent analgesic users for headache relief. Schlesselman states, “The third question was included under the assumption that recall is better for the most recent period, and that a person with a history of recurrent headaches in the past would retain this pattern in the present.” In pretesting, however, it was found that there were many patients who had frequent headaches but for whom the past month was atypical. Thus, contrary to the intention of the investigator, a number of study participants were skipping the series of headache-analgesic questions. In light of the pretest, the third question was modified to:
Q. HAVE YOU EVER HAD HEADACHES ONCE A WEEK OR MORE FOR AT LEAST ONE MONTH?
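To show how such a filter question gates a skip pattern in practice, here is a small illustrative Python sketch (ours; the follow-up analgesic item is hypothetical):

```python
# Sketch of the skip-pattern logic implied by the revised filter question:
# only respondents who answer "yes" see the headache-analgesic series.
# The follow-up question below is a hypothetical example.

def administer_headache_section(ask) -> dict:
    """`ask` is any callable that presents a question and returns the answer."""
    answers = {}
    answers["frequent_headaches_ever"] = ask(
        "Have you ever had headaches once a week or more for at least one month?"
    )
    if answers["frequent_headaches_ever"] == "yes":
        # Filter passed: continue with the analgesic-use series.
        answers["analgesic_use"] = ask(
            "When you have these headaches, do you take any pain medication?"
        )
    return answers

# Stub interviewer that answers "yes" to everything, for demonstration.
print(administer_headache_section(lambda question: "yes"))
```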
OPTIONS FOR ADMINISTRATION
The primary options for administering a questionnaire are respondent self-administered and interviewer-administered. Self-administered questionnaires are usually either given in a supervised setting or mailed to the respondent; however, an increasing number of questionnaires are being administered using e-mail or the Internet. Interviewer-administered questionnaires can be administered either in person or over the phone. Each method has its advantages and drawbacks. Self-administered questionnaires are usually less expensive to administer but need to be simpler and shorter than interviewer-administered questionnaires. Also, when a portion of the study population is of low literacy, the use of self-administered forms results in unacceptable losses of information. Internet-based surveys can be more complex but require that respondents have access to and are comfortable with computers and the Internet. Interviewer-administered questionnaires can be more complicated and longer, and the literacy of the respondent is not an issue. Also, the use of an interviewer permits the probing of the respondent for clarifications and elaborations. The major drawback of using an interviewer is the cost.
Differing modes of administration have their advantages and disadvantages. Mailed questionnaires are relatively inexpensive to administer, but response rates tend to be low (typically 40-60%). Response rates can be increased by a number of techniques such as hand-addressing the envelopes, using certified mail, using postage stamps instead of metered mail, and offering the respondent an incentive. Collecting data over the phone is more expensive than by mail, but the response rates are higher (frequently 75-85%). Completion rates for telephone interviews can be increased by sending an introductory letter to the home introducing the study. Using the phone as the sole mode of contact may introduce subtle biases into a study. The portion of the study population that does not own a phone is systematically different from the portion that does. Also, the ability to contact certain segments of a population may differ. For example, young, single, smoking males are harder to contact by phone than some other segments of the population. Internet surveys appear to be a reasonable substitute for mailed surveys, provided that respondents have Internet access. Initial response rates can be low but can be increased with reminder letters or e-mails (7).
Face-to-face interviews have the highest completion rates (up to 90%), but they are also the most expensive to conduct. In face-to-face situations, visual aids and more elaborate questioning techniques can be used, providing the opportunity to improve the quality of the collected data.
MEDICAL RECORDS
Collecting data from recorded information is a part of nearly all epidemiologic studies conducted in a hospital setting. Recorded data sources include diagnostic reports, physician notes, prescription records, and culture reports. In addition to routinely collected medical data, administratively collected data are also available from billing records, insurance claim files, etc. The advantages of recorded data are clear: they provide a concurrent source of information concerning the study subject’s medical experience. However, the limitations of routinely recorded data should also be borne in mind. Data are put in the medical record by a number of different individuals who are not standardized in their recording habits, and they certainly do not record information with a particular epidemiologic study in mind. Two studies illustrate the problems with the medical record as a tool for epidemiologic research.