
CHAPTER 4 Samples




Populations may be based on theoretical concepts, but the individual members that make up a population are very real. We need a way to study concretely a population that has been identified. As we have learned, populations can be very large; some have so many members that we cannot count them all. They are also very fluid, with members continuously entering and leaving. It is often impossible to study every individual member, but this does not mean that studying populations is futile. Instead, we rely on sampling.


A sample is a group of individuals that represents the population. If we choose the sample correctly, the results we obtain from its members will be very close to the results we would have gotten had we studied the whole population; that is, the value of the attribute measured in the sample will approximate the value of that attribute in the entire population.


For instance, if we wanted to know the distribution of religions in the United States, we could poll every person to find out their religious affiliation. This is, of course, an impossible task. If we choose a sample of people correctly, however, the proportions of different religions that we see in the sample will closely reflect the proportions of different religions in the population.



SAMPLE SELECTION


Ideally, we choose a sample so that the individuals who make up the population are equally represented; that is, every member of the population has an equal chance of being chosen. This is called random sampling. A second requirement is independence: the choice of one member does not influence the chance of choosing any other. When sampling follows both of these rules, the laws of chance apply, so when we study the sample we can quantify how close our observation is likely to be to the real result we would observe had we studied the entire population. The numbers would not be exactly the same, however. For instance, the proportion of Baptists may be 10% in our sample whereas the real value in the population may be 12%. (These data are used as an example and are not based on actual studies.)
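To make the idea concrete, here is a minimal sketch of random sampling in Python. It assumes a hypothetical population of 10,000 people in which exactly 12% are labeled "Baptist" (fictitious numbers mirroring the example above); the function name `sample_proportion` is illustrative, not from the text. The standard library's `random.sample` draws without replacement, so every possible group of `n` members is equally likely to be the one chosen.

```python
import random

def sample_proportion(population, n, category, seed=None):
    """Draw a simple random sample of size n (without replacement) and
    return the proportion of sampled members equal to `category`."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # every member has the same chance of selection
    return sum(1 for person in sample if person == category) / n

# Hypothetical population of 10,000 people, 12% labeled "Baptist"
# (fictitious data, as in the text's example).
population = ["Baptist"] * 1_200 + ["Other"] * 8_800
true_p = population.count("Baptist") / len(population)  # the value for the whole population

estimate = sample_proportion(population, n=500, category="Baptist", seed=1)
print(f"population value: {true_p:.3f}, sample estimate: {estimate:.3f}")
```

Running it with different seeds gives slightly different sample estimates, all scattered around the population value, which is exactly the "close but not identical" behavior the paragraph describes.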



If each individual in a population theoretically has an equal chance of being chosen for the sample, how many different samples of a given size are possible? If the population is large (as most are), the number of possible combinations that could make up the sample is enormous. Even in a small population of 20, there are 15,504 possible combinations for a sample of size 5! There is a formula to calculate this,* but the lesson to appreciate here is not how many different combinations of individuals could be chosen as the sample, but that in a random, independent process each combination is equally likely to be the one chosen.
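The usual formula for this count is the binomial coefficient, "population size choose sample size," computed as population_size! / (sample_size! × (population_size − sample_size)!). The sketch below checks the 15,504 figure three ways: with the factorial formula, with the standard library's `math.comb` (Python 3.8+), and by brute-force enumeration of every possible group of 5 drawn from 20. The variable names are illustrative.

```python
import math
from itertools import combinations

population_size, sample_size = 20, 5  # values from the example in the text

# Factorial formula for the binomial coefficient
by_formula = math.factorial(population_size) // (
    math.factorial(sample_size) * math.factorial(population_size - sample_size)
)

# Same count via the standard library helper
by_comb = math.comb(population_size, sample_size)

# Brute force: enumerate every distinct group of 5 members out of 20
by_enumeration = sum(1 for _ in combinations(range(population_size), sample_size))

print(by_formula, by_comb, by_enumeration)  # 15504 15504 15504
```

Enumeration is only practical for tiny populations; for realistic sizes the count grows so quickly that the formula is the only sensible option.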


It may seem that we are compromising our scientific methods by employing these sampling shortcuts to make life easy. When we use samples, we do not have access to the complete collection of information that we would ideally use if we studied the entire population, and we will not get an exact answer. We understand, however, the need to reach conclusions based on incomplete information. It is thus quite acceptable to study a population by using a sample as long as we accept a modicum of uncertainty in interpreting the results. We realize that by using samples we are actually estimating the result we would have gotten had we studied the entire population.


This is a vital point in biostatistics. It is important to consciously distinguish between populations and the samples that represent them. We refer to the attributes that characterize a population as parameters, in contrast to those observed in the sample, which are called statistics. In practice, we use sample statistics to estimate population parameters. In the above example regarding religion, the sample statistic of 10% Baptist is an estimate of the true population parameter of 12% Baptist (again, these are fictitious data).




ESTIMATES AND UNCERTAINTY


Since the sample statistic is only an estimate of the population parameter, it could be off in either direction: the statistic could be an overestimate, an underestimate, or right on target. We cannot know with certainty which of these is the case, but we are willing to accept a margin of error. Sample statistics are therefore often reported as a value plus or minus a certain amount. This range of values is called the confidence interval. Confidence intervals are like comfort zones that attempt to encompass the true population parameter. A narrow confidence interval means that our sample statistic is likely to be a close estimate of the population parameter.



The confidence interval is a range of values, derived from the sample, that has a given probability of encompassing the true value we seek. It reflects the margin of error inherent in using sample statistics to estimate the true value of the population parameter. We will see how it is calculated; for now, the important thing to know is that there is a stated probability (for example, 95%) that the boundaries of the confidence interval contain the true population parameter.
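The chapter defers the calculation, but as a preview, here is a minimal sketch of one common approach for a proportion: the normal-approximation (often called Wald) interval, p̂ ± z × √(p̂(1 − p̂)/n), where z = 1.96 gives roughly 95% confidence. This is an assumption about which method to use, not necessarily the one the chapter develops later. The numbers are fictitious: 50 Baptists in a sample of 500 (10%), and the function name `wald_interval` is hypothetical.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Approximate confidence interval for a population proportion using the
    normal approximation: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    z = 1.96 corresponds to roughly 95% confidence."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Fictitious example: 50 Baptists observed in a sample of 500 people (10%)
low, high = wald_interval(50, 500)
print(f"95% interval: ({low:.3f}, {high:.3f})")  # 95% interval: (0.074, 0.126)
```

Here the sample statistic of 10% would be reported as about 10% ± 2.6 percentage points. A larger sample shrinks the margin (it scales with 1/√n), which is why bigger samples give narrower intervals.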


It is customary in statistics to accept a 5% degree of uncertainty, which corresponds to a 95% confidence interval. If the experiment were repeated over and over using different random samples of the same size, we would get a variety of results: each sample would give a different estimate of the true parameter, and each estimate would have its own confidence interval.
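This repeated-sampling idea can be simulated directly. The sketch below assumes a very large population with a fictitious true proportion of 12%, models each sampled person as independently belonging to the category with that probability, builds a normal-approximation 95% interval from each sample of 500, and reports the fraction of intervals that contain the true value. The function name `coverage` and all parameter values are illustrative assumptions.

```python
import math
import random

def coverage(true_p, n, trials, z=1.96, seed=0):
    """Repeat the 'experiment' `trials` times: draw a random sample of size n
    (each member independently in the category with probability true_p),
    build a normal-approximation interval, and return the fraction of
    intervals that contain true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials

# Fictitious population value of 12%, samples of 500, 1,000 repetitions
rate = coverage(true_p=0.12, n=500, trials=1000)
print(f"{rate:.1%} of the intervals contained the true value")
```

The printed rate should land near 95%: most intervals capture the true parameter, while roughly 5% miss it, which is the "5% degree of uncertainty" in practice. The approximation is a little less accurate for small samples or proportions near 0 or 1.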


Figure 4-1


Jun 18, 2016 | Posted by in BIOCHEMISTRY | Comments Off on Samples
