The validity of a study depends on many things: hypotheses based on sound science, a study design that tests these hypotheses without bias, and appropriate, well-defined methods for measuring the study variables. It also depends on the quality of the data that are collected and analyzed. The results of a study are only as good as the information that goes into it, and even the most advanced technology and statistical methods cannot make up for poor-quality data. During the implementation phase of the study, data quality must be maintained by using appropriate methods to collect and record the study data so that they are accurate and complete. These procedures should be developed when the study is designed. Resources for data collection must be included in the study budget, and time must be allowed in the study schedule to ensure that the data being collected are of high quality.
29.1 Creating a Study ID
Even before you collect any data, you need a method for assigning a Study ID to each participant. The Study ID is a unique number that identifies a specific participant in the study. It has no meaning except as a link to the person’s identity for this specific study. To preserve the participant’s confidentiality, the Study ID should not be derived from any number associated with the participant, such as age, date of birth, or social security number, and it should not be used in other contexts (such as a medical record number or encounter number) to identify the participant.
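A minimal sketch of random Study ID assignment, assuming a six-digit number with a hypothetical “S” prefix. Because the numbers are drawn at random without replacement, each ID is unique and carries no information about the participant or the order of enrollment. The function name and ID format are illustrative only, not a standard.

```python
import secrets


def make_study_ids(n: int, digits: int = 6, prefix: str = "S") -> list[str]:
    """Return n unique, randomly drawn Study IDs such as 'S048213'.

    The numbers come from the operating system's secure random source and
    are drawn without replacement, so no two participants share an ID and
    no ID is derived from age, birth date, or any record number.
    """
    rng = secrets.SystemRandom()
    # sample() raises ValueError if n exceeds the number of possible IDs.
    numbers = rng.sample(range(10 ** digits), n)
    # Zero-pad so every ID has the same length (e.g. 48213 -> 'S048213').
    return [f"{prefix}{k:0{digits}d}" for k in numbers]


if __name__ == "__main__":
    pool = make_study_ids(5)
    print(pool)  # five distinct IDs; the values differ on every run
```

One way to use this is to generate the pool once, store it with the restricted linkage file, and give the next unused ID to each person as he or she is screened.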
The Study ID should be used for all data collection, even when the information is collected during screening and the individual may never become part of the study. If some of the measurements are made by raters outside the study group, such as those running blood tests or interpreting MRI scans, the participant should be identified to them only by the Study ID. Sometimes you may use information from a written report that was generated outside your group rather than as part of your protocol. In this case, someone must add the Study ID to the report and remove or black out the participant’s name and any other personal identifiers.
The link between the Study ID and the person’s identity should be kept in an encrypted computer file or in a locked cabinet or drawer with limited access, following your institution’s standards for protecting confidential personal identifiers. The same procedures apply to recording and keeping other identifying information, such as social security number or address. Computer records of personal information should be created by a single person, encrypted, password protected, and handled according to any other procedures required at your facility. All other records, whether paper copies or computer files, should exclude personal information and use only the Study ID to identify the participant. If you must keep a source document that carries the participant’s name, such as a consent form, it too must be kept in a locked cabinet following the procedures specified at your institution.
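The separation of identifying information from study data can be sketched as follows, assuming each intake record is a Python dictionary. The identifier field names (`name`, `date_of_birth`, `ssn`, `address`, `mrn`) and the sample values are hypothetical placeholders for whatever your forms collect. The sketch only splits the data; encrypting and restricting access to the linkage file must still be done with the tools your institution requires.

```python
# Hypothetical identifier fields; list every field your forms use that
# could identify a person.
IDENTIFIERS = {"name", "date_of_birth", "ssn", "address", "mrn"}


def split_record(record: dict, study_id: str) -> tuple[dict, dict]:
    """Split an intake record into (linkage entry, de-identified data).

    The linkage entry holds the identifiers and belongs in the restricted,
    encrypted file; the data record carries only the Study ID.
    """
    linkage = {"study_id": study_id}
    data = {"study_id": study_id}
    for field, value in record.items():
        target = linkage if field in IDENTIFIERS else data
        target[field] = value
    return linkage, data


def deidentify_report(report: dict, study_id_by_mrn: dict) -> dict:
    """Label an outside report with the Study ID and drop its identifiers.

    Meant to be run by the one person allowed to see the linkage file.
    """
    study_id = study_id_by_mrn[report["mrn"]]  # KeyError if not enrolled
    clean = {k: v for k, v in report.items() if k not in IDENTIFIERS}
    clean["study_id"] = study_id
    return clean


if __name__ == "__main__":
    intake = {"name": "Jane Doe", "date_of_birth": "1970-02-01",
              "mrn": "A12345", "sex": "F", "smoker": 1}
    linkage, data = split_record(intake, "S048213")
    print(data)  # {'study_id': 'S048213', 'sex': 'F', 'smoker': 1}
    lab = {"mrn": "A12345", "name": "Jane Doe", "hba1c": 6.1}
    print(deidentify_report(lab, {"A12345": "S048213"}))
    # {'hba1c': 6.1, 'study_id': 'S048213'}
```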
29.2 Methods for Data Collection
Sometimes the data you need already exist and are accessible either in written form, as in a chart, or in a computer file. When you write the protocol, make sure that you will have access to these data and establish the terms of that access. One requirement will almost always be that you protect the confidentiality of the participant, as discussed in Section 29.1. If the data already exist, the major issues are data abstraction (for paper records) and record verification, which we address in Chapter 30. Here we focus on the collection of primary data.
29.2.1 Questionnaires
The general term “instrument” covers paper questionnaires, computer entry by the participant, and structured interviews conducted by an interviewer. Instruments collect data in a structured manner, usually with precoded response options for most questions. Even when a participant enters data directly into a computer and the data are never collected on paper, the data entry format is still based on a paper form.
Questionnaires may be completed by the study participant or by a study team member interacting with the participant; when a team member is involved, we refer to that person as an “interviewer.” Historically, the questionnaire was a paper form completed in pencil and then transcribed into a computer or a typed record, and many studies still use paper forms. Forms may also be completed on a touch screen or another electronic device, including a smartphone. Using electronic devices requires additional resources for the devices themselves (especially if they are provided to the participant) and for the software to collect the information. Some public domain software for this purpose already exists for different computer platforms, and we expect its use on tablets and smartphones to grow. Some participants enjoy computer interaction because they like or are intrigued by computers, but others are intimidated by them and may need guidance, instruction, or training.
The advantage of direct entry is that the data entered by the participant require no transcription into a computer file. The disadvantage is that there is no paper original against which the data can later be checked or corrected, so inconsistencies cannot be resolved after the visit. For this reason, the software should check the data as they are entered to ensure that they are valid and, as far as possible, consistent across the different questions asked during the visit. If possible, this validation should also check consistency with the participant’s responses at earlier visits. Data checking is discussed in Section 30.5.
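A minimal sketch of entry-time checks, assuming each visit’s responses arrive as a dictionary of numeric codes. The variable names, the 18-to-100 age range, and the specific consistency rules are hypothetical examples; a real study would derive them from its protocol and codebook. Entry software would display the returned messages and ask for corrections before moving on.

```python
from typing import Optional


def validate_visit(responses: dict, previous: Optional[dict] = None) -> list[str]:
    """Return problems found in one visit's responses (empty list means OK).

    All variable names, ranges, and rules are hypothetical examples.
    """
    problems = []

    # Range check on a required item.
    age = responses.get("age")
    if age is None:
        problems.append("age is missing")
    elif not 18 <= age <= 100:
        problems.append("age must be between 18 and 100")

    # Consistency across questions within the same visit.
    cigarettes = responses.get("cigarettes_per_day") or 0
    if responses.get("ever_smoked") == 0 and cigarettes > 0:
        problems.append("cigarettes_per_day is above 0 but ever_smoked is No")

    # Consistency with the same participant's earlier visit, if available.
    if previous is not None:
        if previous.get("ever_smoked") == 1 and responses.get("ever_smoked") == 0:
            problems.append("ever_smoked was Yes at an earlier visit but is now No")
        earlier_age = previous.get("age")
        if age is not None and earlier_age is not None and age < earlier_age:
            problems.append("age is lower than at the earlier visit")

    return problems


if __name__ == "__main__":
    print(validate_visit({"age": 45, "ever_smoked": 0, "cigarettes_per_day": 5}))
    # ['cigarettes_per_day is above 0 but ever_smoked is No']
```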
Example 29A
When a new patient arrives at a medical office, he is generally asked to complete a questionnaire covering demographic data and some medical history, including current medications. Other questionnaires about symptoms and problems are often completed at the same time.
The information collected in Example 29A is mainly objective or measurable and limited to the questions on the form, although a single question, such as current medications, may have multiple responses. You may use an instrument like this to collect basic information when evaluating a potential participant for inclusion in the study. Very often you can use your institution’s intake form and simply add the extra information you need for this first step, which saves time. An instrument that has been in use for a long time should already have had its inconsistencies corrected, so only your additions will be untested.
Other instruments may collect objective or subjective responses, or both. For subjective data, additional precautions are needed to ensure that the interviewer collects data that are as unbiased as possible; these are discussed extensively in Section 29.2.2. The study may, and almost always should, use data collection methods (questionnaires, structured clinical evaluations) that have been developed and validated by others, if such instruments exist. For example, the Hamilton Rating Scale for Depression, the Beck Depression Inventory, or the Center for Epidemiologic Studies Depression Scale may be used to measure the extent of depression and related symptoms. A large number of validated scales exist to measure many psychological states and traits, and we recommend using a published, generally accepted scale to measure subjective information.
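Scoring a published scale often means summing item codes after reversing items that are worded in the opposite direction. The sketch below assumes responses are integer codes listed in item order; the four-item scale, the 0-to-3 code range, and the reversed item number in the usage example are placeholders, so take the real values from the scale’s published scoring instructions.

```python
from typing import Optional, Sequence


def score_scale(
    responses: Sequence[Optional[int]],
    min_code: int,
    max_code: int,
    reverse_items: set[int],
) -> Optional[int]:
    """Sum item codes, reversing items whose 1-based numbers are in reverse_items.

    Returns None if any item is missing or out of range. Published scales
    differ in how missing items may be handled, so follow the scale's manual
    rather than this simple rule.
    """
    total = 0
    for item_number, code in enumerate(responses, start=1):
        if code is None or not min_code <= code <= max_code:
            return None
        if item_number in reverse_items:
            # With codes 0-3, a 0 becomes 3 and a 3 becomes 0.
            code = min_code + max_code - code
        total += code
    return total


if __name__ == "__main__":
    # Placeholder four-item scale coded 0-3 with item 2 reverse-worded.
    print(score_scale([3, 0, 2, 1], 0, 3, {2}))  # 9
```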
If a question has several choices but only one can be selected, this must be stated clearly and prominently. For paper forms, we believe it is better to ask the participant to circle a choice than to write its number in a box next to the question. We have found that even when asked to enter the number of the choice, participants often circle it instead and do not always enter the number; moreover, a handwritten number can be hard to read, and the circled answer and the entered number may differ. When this happens and there is no way to determine which response is correct, the item must be treated as missing data. This problem should not arise with direct computer entry, where the option is usually selected from a drop-down list.
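The coding rule above can be written as a small function for the data entry step. This sketch assumes the person keying the form records which options were circled and what number, if any, was written in the box; the function and parameter names are hypothetical.

```python
from typing import Collection, Optional


def resolve_single_choice(
    circled: Collection[int],
    written: Optional[int],
    valid_codes: Collection[int],
) -> Optional[int]:
    """Return the code for a single-choice paper item, or None if it must be treated as missing."""
    circled_codes = set(circled)
    if len(circled_codes) > 1:
        return None  # more than one option circled
    candidates = set(circled_codes)
    if written is not None:
        candidates.add(written)
    if len(candidates) != 1:
        return None  # no answer at all, or circled and written answers disagree
    code = candidates.pop()
    return code if code in valid_codes else None


if __name__ == "__main__":
    options = {1, 2, 3, 4}
    print(resolve_single_choice([2], None, options))  # 2: circled only
    print(resolve_single_choice([2], 2, options))     # 2: both agree
    print(resolve_single_choice([2], 3, options))     # None: conflict
```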
In designing a questionnaire, remember that it serves two purposes: it poses questions and records the responses, and it provides input to a database. It must therefore be designed with both in mind: easy for respondents to understand and complete, and structured for ease of data entry, especially when the data are entered directly by the participant or captured by scanning a paper form.
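One way to design a form with the database in mind is to keep a codebook that gives each question a variable name and fixed numeric codes for its response options, and to write every completed form as one row. The sketch below uses Python’s standard `csv` module; the variable names, codes, and the missing-value code of -9 are hypothetical.

```python
import csv
import io

# Hypothetical codebook: database variable -> allowed codes and their labels.
CODEBOOK = {
    "q1_sex": {1: "Male", 2: "Female"},
    "q2_smoker": {0: "No", 1: "Yes"},
    "q3_health": {1: "Excellent", 2: "Good", 3: "Fair", 4: "Poor"},
}
MISSING = -9  # hypothetical code for a blank or unusable answer


def form_to_row(study_id: str, answers: dict) -> dict:
    """Map one completed form to a database row keyed by Study ID."""
    row = {"study_id": study_id}
    for variable, codes in CODEBOOK.items():
        code = answers.get(variable)
        # Anything that is not one of the printed codes is coded as missing.
        row[variable] = code if code in codes else MISSING
    return row


def rows_to_csv(rows: list) -> str:
    """Write rows as CSV text with one column per codebook variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["study_id", *CODEBOOK])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    row = form_to_row("S048213", {"q1_sex": 2, "q2_smoker": 5})
    print(rows_to_csv([row]))
    # study_id,q1_sex,q2_smoker,q3_health
    # S048213,2,-9,-9
```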
The characteristics of the study participants should be considered. If the questionnaire is too long for the participants, they may simply not complete it, or they may answer without thinking. Although there is no right answer to how many questions are too many, our basic rule of thumb is “when in doubt, throw it out.” Include a question only if you are sure that you need it. It is better to be missing an interesting secondary variable than to be missing the most important data for the study. Participants may also be reluctant to return for follow-up visits if repeating the questionnaire is part of the protocol, and losing participants completely is even worse than missing some potentially interesting information.
The language of the questionnaire should be geared to the participants. If the study participants will be drawn from a population with a low education level, the wording of the questions should be geared to that level. Sometimes a pictorial scale, in which the participant simply marks where she feels she is on a line running from none to a maximum, may be easier for the participant. For small children, and possibly some adults, a more elaborate pictorial scale may be used.
The Wong-Baker FACES scale for assessing pain in children consists of six cartoon faces with expressions ranging from a happy face (no hurt) to a crying, frowning face (hurts worst). The child picks the face that best reflects how he or she feels, and the response is coded from 0 to 10 in steps of two (0, 2, 4, 6, 8, 10). For adults, either a numeric scale from 0 to 10 or a 10-cm line, called a visual analog scale, is typically used as one measure of pain.
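The coding rules for these pain scales can be sketched as two small functions. This assumes the faces are numbered 0 to 5 from “no hurt” to “hurts worst” and that the visual analog scale is scored by measuring the distance, in millimeters, from the “no pain” end of the line to the participant’s mark; the function names are hypothetical. The `line_length_mm` parameter rescales the score in case the printed line is not exactly 10 cm.

```python
def code_faces(face_index: int) -> int:
    """Code a Wong-Baker FACES choice numbered 0 (no hurt) to 5 (hurts worst) as 0, 2, ..., 10."""
    if face_index not in range(6):
        raise ValueError("face_index must be between 0 and 5")
    return 2 * face_index


def code_vas(mark_mm: float, line_length_mm: float = 100.0) -> float:
    """Score a visual analog scale mark on 0-10, to one decimal place.

    mark_mm is the measured distance from the 'no pain' end of the line.
    """
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError("the mark must lie on the line")
    return round(10 * mark_mm / line_length_mm, 1)


if __name__ == "__main__":
    print(code_faces(3))  # 6
    print(code_vas(63))   # 6.3
```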