Again, the confidence interval is a range of likely values for the difference in means.
Since the interval contains zero (no difference), we do not have sufficient evidence to conclude that there is a difference.
The previous section dealt with confidence intervals for the difference in means between two independent groups. There is an alternative study design in which two comparison groups are dependent, matched or paired. Consider the following scenarios:
A goal of these studies might be to compare the mean scores measured before and after the intervention, or to compare the mean scores obtained with the two conditions in a crossover study. Yet another scenario is one in which matched samples are used. For example, we might be interested in the difference in an outcome between twins or between siblings. Once again we have two samples, and the goal is to compare the two means.
However, the samples are related or dependent. In the first scenario, before and after measurements are taken in the same individual. In the last scenario, measures are taken in pairs of individuals from the same family. When the samples are dependent, we cannot use the techniques in the previous section to compare means. Because the samples are dependent, statistical techniques that account for the dependency must be used.
These techniques focus on difference scores, i.e., the difference between the two measurements taken on each participant or within each matched pair. This distinction between independent and dependent samples emphasizes the importance of appropriately identifying the unit of analysis, i.e., the participant or the matched pair in the designs described here.
Again, the first step is to compute descriptive statistics. We compute the sample size (which in this case is the number of distinct participants or distinct pairs) and the mean and standard deviation of the difference scores, and we denote these summary statistics as n, d̄ and s_d, respectively. The appropriate formula for the confidence interval for the mean difference depends on the sample size.
The formulas are shown in Table 6. When samples are matched or paired, difference scores are computed for each participant or between members of a matched pair; n is the number of participants or pairs, d̄ is the mean of the difference scores, and s_d is the standard deviation of the difference scores.
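As a rough illustration of the interval described above, the sketch below computes d̄ ± critical value × s_d / √n from summary statistics. It assumes the usual large-sample form with a standard normal critical value; for a small sample the caller would pass a t critical value with n − 1 degrees of freedom instead. The function name, its parameters and the numbers in the example are hypothetical and are not taken from Table 6.

```python
from math import sqrt
from statistics import NormalDist


def paired_mean_diff_ci(n, d_bar, s_d, confidence=0.95, critical_value=None):
    """Confidence interval for a mean difference from paired data.

    n      : number of participants or matched pairs
    d_bar  : mean of the difference scores
    s_d    : standard deviation of the difference scores
    critical_value : optional t value for small samples (assumption:
                     looked up by the caller with n - 1 degrees of freedom)
    """
    if n < 2:
        raise ValueError("need at least two difference scores")
    if critical_value is None:
        # Large-sample case: standard normal quantile (about 1.96 for 95%).
        critical_value = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = critical_value * s_d / sqrt(n)
    return d_bar - margin, d_bar + margin


# Hypothetical summary statistics, chosen only for illustration.
lower, upper = paired_mean_diff_ci(n=100, d_bar=-2.0, s_d=10.0)
print(round(lower, 2), round(upper, 2))
```

If the resulting interval excludes zero, the data suggest a real mean difference; if it contains zero, they do not.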
In the Framingham Offspring Study, participants attend clinical examinations approximately every four years. Suppose we want to compare systolic blood pressures between examinations (i.e., changes over 4 years). Since the data in the two samples (examinations 6 and 7) are matched, we compute difference scores by subtracting the blood pressure measured at examination 7 from that measured at examination 6, or vice versa.
Notice that several participants' systolic blood pressures decreased over 4 years. We now estimate the mean difference in blood pressures over 4 years. This is similar to a one sample problem with a continuous outcome except that we are now using the difference scores. The calculations use three columns: the difference, the difference minus the mean difference, and the squared (difference minus mean difference).
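The column-by-column calculation described above can be sketched in a few lines. The blood pressure values below are hypothetical and are not the Framingham data; the sketch subtracts examination 6 from examination 7, so a positive difference means an increase over time.

```python
from statistics import mean, stdev

# Hypothetical systolic blood pressures (mm Hg) for five participants.
exam6 = [120, 135, 128, 142, 118]
exam7 = [126, 130, 133, 150, 121]

# One difference score per participant (examination 7 minus examination 6).
diffs = [later - earlier for earlier, later in zip(exam6, exam7)]

n = len(diffs)
d_bar = mean(diffs)

# Sum of the squared (difference - mean difference) column.
sum_sq_dev = sum((d - d_bar) ** 2 for d in diffs)

# Sample standard deviation of the difference scores.
s_d = (sum_sq_dev / (n - 1)) ** 0.5

print(diffs, d_bar, round(s_d, 2))
```

The hand-computed s_d agrees with `statistics.stdev(diffs)`, which uses the same n − 1 denominator.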
The null or no effect value of the CI for the mean difference is zero.
Crossover trials are a special type of randomized trial in which each subject receives both of the two treatments (e.g., a new drug and a placebo). Participants are usually randomly assigned to receive their first treatment and then the other treatment. In many cases there is a "wash-out period" between the two treatments. Outcomes are measured after each treatment in each participant. When the outcome is continuous, the assessment of a treatment effect in a crossover trial is performed using the techniques described here.
A crossover trial is conducted to evaluate the effectiveness of a new drug designed to reduce symptoms of depression in adults over 65 years of age following a stroke. Symptoms of depression are measured on a scale on which higher scores indicate more frequent and severe symptoms of depression. Patients who suffered a stroke were eligible for the trial. The trial was run as a crossover trial in which each patient received both the new drug and a placebo. Patients were blind to the treatment assignment and to the order in which they received the treatments. After each treatment, depressive symptoms were measured in each patient.
The difference in depressive symptoms was measured in each patient by subtracting the depressive symptom score after taking the placebo from the depressive symptom score after taking the new drug. The data for the participants who completed the trial are summarized below. Since the sample size is large, we can use the formula that employs the Z-score. Because we computed the differences by subtracting the scores after taking the placebo from the scores after taking the new drug, and because higher scores are indicative of worse or more severe depressive symptoms, negative differences reflect improvement (i.e., lower depressive symptom scores after taking the new drug than after taking the placebo).
It is common to compare two independent groups with respect to the presence or absence of a dichotomous characteristic or attribute. When the outcome is dichotomous, the analysis involves comparing the proportions of successes between the two groups. There are several ways of comparing proportions in two independent groups. Generally the reference group (e.g., the unexposed group) is placed in the denominator of a ratio measure. The risk ratio is a good measure of the strength of an effect, while the risk difference is a better measure of the public health impact, because it compares the difference in absolute risk and therefore provides an indication of how many people might benefit from an intervention.
An odds ratio is the measure of association used in case-control studies. It is the ratio of the odds of disease in those with a risk factor compared to the odds of disease in those without the risk factor. When the outcome of interest is relatively uncommon, the odds ratio is a good estimate of the risk ratio. The odds are defined as the ratio of the number of successes to the number of failures.
All of these measures (risk difference, risk ratio, odds ratio) are used as measures of association by epidemiologists, and these three measures are considered in more detail in the module on Measures of Association in the core course in epidemiology. Confidence interval estimates for the risk difference, the relative risk and the odds ratio are described below. A risk difference (RD) or prevalence difference is a difference in proportions. The point estimate is the difference in sample proportions, as shown by the following equation: RD = p̂₁ − p̂₂.
The sample proportions are computed by taking the ratio of the number of "successes" (or health events, x) to the sample size (n) in each group: p̂₁ = x₁/n₁ and p̂₂ = x₂/n₂. The formula for the confidence interval for the difference in proportions, or the risk difference, is as follows: (p̂₁ − p̂₂) ± z × sqrt( p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂ ).
Note that this formula is appropriate for large samples (at least 5 successes and at least 5 failures in each sample). If there are fewer than 5 successes (events of interest) or failures (non-events) in either comparison group, then exact methods must be used to estimate the difference in population proportions.
The following table contains data on prevalent cardiovascular disease (CVD) among participants who were currently non-smokers and those who were current smokers at the time of the fifth examination in the Framingham Offspring Study.
When constructing confidence intervals for the risk difference, the convention is to call the exposed or treated group 1 and the unexposed or untreated group 2. Here smoking status defines the comparison groups, and we will call the current smokers group 1 and the non-smokers group 2.
A confidence interval for the difference in prevalent CVD (or prevalence difference) between smokers and non-smokers is given below. In this example, we have far more than 5 successes (cases of prevalent CVD) and failures (persons free of CVD) in each comparison group, so the large-sample formula can be used. The null value for the risk difference is zero.
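The large-sample interval for a risk difference can be sketched as follows. This is a minimal illustration, not the calculation from the Framingham table: the function name and the counts are hypothetical, group 1 is the exposed group and group 2 the unexposed group, and the check on successes and failures mirrors the rule of at least 5 in each group.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(x1, n1, x2, n2, confidence=0.95):
    """Point estimate and large-sample CI for p1 - p2.

    x1, n1 : events and sample size in group 1 (exposed or treated)
    x2, n2 : events and sample size in group 2 (unexposed or untreated)
    """
    for x, n in ((x1, n1), (x2, n2)):
        if x < 5 or n - x < 5:
            raise ValueError(
                "fewer than 5 successes or failures in a group; "
                "exact methods are needed"
            )
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    rd = p1 - p2
    return rd, rd - z * se, rd + z * se


# Hypothetical counts: 30 of 100 exposed and 20 of 100 unexposed have the outcome.
rd, rd_lower, rd_upper = risk_difference_ci(x1=30, n1=100, x2=20, n2=100)
print(round(rd, 3), round(rd_lower, 3), round(rd_upper, 3))
```

With these made-up counts the interval includes zero, the null value, so the difference would not be judged statistically significant.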
A randomized trial is conducted among subjects to evaluate the effectiveness of a newly developed pain reliever designed to reduce pain in patients following joint replacement surgery. The trial compares the new pain reliever to the pain reliever currently used (the "standard of care"). Patients are randomly assigned to receive either the new pain reliever or the standard pain reliever following surgery. The patients are blind to the treatment assignment. Before receiving the assigned treatment, patients are asked to rate their pain on a scale on which high scores indicate more pain.
Each patient is then given the assigned treatment and after 30 minutes is again asked to rate their pain on the same scale. The primary outcome is a reduction in pain of 3 or more scale points (defined by clinicians as a clinically meaningful reduction).
The risk difference quantifies the absolute difference in risk or prevalence, whereas the relative risk is, as the name indicates, a relative measure. Both measures are useful, but they give different perspectives on the information.
By convention we typically regard the unexposed (or least exposed) group as the comparison group, and the proportion of successes or the risk for the unexposed comparison group is the denominator for the ratio. The relative risk is a ratio and does not follow a normal distribution, regardless of the sample sizes in the comparison groups. However, the natural log (Ln) of the sample RR is approximately normally distributed and is used to produce the confidence interval for the relative risk.
Therefore, computing the confidence interval for a risk ratio is a two-step procedure. First, a confidence interval is generated for Ln(RR), and then the antilog of the upper and lower limits of the confidence interval for Ln(RR) is computed to give the upper and lower limits of the confidence interval for the RR.
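The two-step procedure just described can be sketched as follows. The sketch assumes the common large-sample standard error for Ln(RR), sqrt((n₁ − x₁)/(x₁n₁) + (n₂ − x₂)/(x₂n₂)); the function name and the counts are hypothetical and are not drawn from the examples in this module.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(x1, n1, x2, n2, confidence=0.95):
    """Point estimate and CI for RR = (x1/n1) / (x2/n2).

    Group 2 is the unexposed comparison group (the denominator).
    """
    rr = (x1 / n1) / (x2 / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Step 1: confidence interval on the log scale.
    se_ln = sqrt((n1 - x1) / (x1 * n1) + (n2 - x2) / (x2 * n2))
    ln_lower = log(rr) - z * se_ln
    ln_upper = log(rr) + z * se_ln
    # Step 2: take the antilog of each limit.
    return rr, exp(ln_lower), exp(ln_upper)


# Hypothetical counts: 30 of 100 exposed and 20 of 100 unexposed have the outcome.
rr, rr_lower, rr_upper = relative_risk_ci(x1=30, n1=100, x2=20, n2=100)
print(round(rr, 2), round(rr_lower, 2), round(rr_upper, 2))
```

Because the limits are exponentiated, the interval is not symmetric around the point estimate, and it is compared against a null value of 1 rather than 0.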
Note that the null value of the confidence interval for the relative risk is one.
The outcome of interest was all-cause mortality. Those assigned to the treatment group exercised 3 times a week for 8 weeks, then twice a week for 1 year. Exercise training was associated with lower mortality (9 versus 20 for those with training versus those without). Therefore, exercisers had 0. In order to generate the confidence interval for the relative risk, we take the antilog (exp) of the lower and upper limits. The null value is 1.
Consider again the randomized trial that evaluated the effectiveness of a newly developed pain reliever for patients following joint replacement surgery.
Using the data in the table below, compute the point estimate for the relative risk for achieving pain relief, comparing those receiving the new drug to those receiving the standard pain reliever.
In case-control studies it is not possible to estimate a relative risk, because the denominators of the exposure groups are not known with a case-control sampling strategy. Nevertheless, one can compute an odds ratio, which is a similar relative measure of effect. Consider the following hypothetical study of the association between pesticide exposure and breast cancer in a population of 6, people. If data were available on all subjects in the population, the distribution of disease and exposure might look like this:
If we had such data on all subjects, we would know the total number of exposed and non-exposed subjects, and within each exposure group we would know the number of diseased and non-diseased people, so we could calculate the risk ratio. However, suppose the investigators planned to determine exposure status by having blood samples analyzed for DDT concentrations, but they only had enough funding for a small pilot study with about 80 subjects in total.
The problem, of course, is that the outcome is rare, and if they took a random sample of 80 subjects, there might not be any diseased people in the sample. To get around this problem, case-control studies use an alternative sampling strategy: the investigators find an adequate sample of cases from the source population, and determine the distribution of exposure among these "cases".
The investigators then take a sample of non-diseased people in order to estimate the exposure distribution in the total population. As a result, in the hypothetical scenario for DDT and breast cancer the investigators might try to enroll all of the available cases and 67 non-diseased subjects, i.e., about 80 subjects in total. After the blood samples were analyzed, the results might look like this:
With this sampling approach we can no longer compute the probability of disease in each exposure group, because we just took a sample of the non-diseased subjects, so we no longer have the denominators in the last column.
In other words, we don't know the exposure distribution for the entire source population. However, the small control sample of non-diseased subjects gives us a way to estimate the exposure distribution in the source population. So, we can't compute the probability of disease in each exposure group, but we can compute the odds of disease in the exposed subjects and the odds of disease in the unexposed subjects.
The probability that an event will occur is the fraction of times you expect to see that event in many trials. Probabilities always range between 0 and 1. The odds are defined as the probability that the event will occur divided by the probability that the event will not occur. If the probability of an event occurring is Y, then the probability of the event not occurring is 1-Y. Example: If the probability of an event is 0. This could be expressed as follows: odds = Y / (1 - Y).
With the case-control design we cannot compute the probability of disease in each of the exposure groups; therefore, we cannot compute the relative risk.
However, we can compute the odds of disease in each of the exposure groups, and we can compare these by computing the odds ratio. In the hypothetical pesticide study the odds ratio is. Notice that this odds ratio is very close to the RR that would have been obtained if the entire source population had been analyzed. The explanation for this is that if the outcome being studied is fairly uncommon, then the odds of disease in an exposure group will be similar to the probability of disease in the exposure group.
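To make the explanation above concrete, the sketch below compares the risk ratio and the odds ratio in two hypothetical cohorts, one with a rare outcome and one with a common outcome. The counts are invented for illustration and are not the pesticide study data; the function name and the cell labels a, b, c and d are assumptions of this sketch.

```python
def risk_ratio_and_odds_ratio(a, b, c, d):
    """Return (risk ratio, odds ratio) from a 2x2 table.

    a : exposed, diseased        b : exposed, not diseased
    c : unexposed, diseased      d : unexposed, not diseased
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    odds_exposed = a / b        # successes divided by failures
    odds_unexposed = c / d
    return risk_exposed / risk_unexposed, odds_exposed / odds_unexposed


# Rare outcome: risks of 1% and 0.5%.
rare_rr, rare_or = risk_ratio_and_odds_ratio(10, 990, 5, 995)

# Common outcome: risks of 40% and 20%.
common_rr, common_or = risk_ratio_and_odds_ratio(400, 600, 200, 800)

print(round(rare_rr, 3), round(rare_or, 3))
print(round(common_rr, 3), round(common_or, 3))
```

Both tables have a risk ratio of 2, but the odds ratio stays close to 2 only when the outcome is rare; with the common outcome it drifts well above the risk ratio.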
Consequently, the odds ratio provides a relative measure of effect for case-control studies, and it provides an estimate of the risk ratio in the source population, provided that the outcome of interest is uncommon. We emphasized that in case-control studies the only measure of association that can be calculated is the odds ratio.
However, in cohort-type studies, which are defined by following exposure groups to compare the incidence of an outcome, one can calculate both a risk ratio and an odds ratio. As with a risk ratio, the convention is to place the odds in the unexposed group in the denominator. In addition, like a risk ratio, odds ratios do not follow a normal distribution, so we use the log transformation to promote normality. As a result, the procedure for computing a confidence interval for an odds ratio is a two-step procedure in which we first generate a confidence interval for Ln(OR) and then take the antilog of the upper and lower limits of the confidence interval for Ln(OR) to determine the upper and lower limits of the confidence interval for the OR.
The two steps are as follows. First, compute the confidence interval for Ln(OR) as Ln(OR) ± z × sqrt(1/a + 1/b + 1/c + 1/d), where a, b, c and d are the four cell counts of the 2×2 table of exposure by outcome. Second, take the antilog (exp) of the lower and upper limits. The null, or no difference, value of the confidence interval for the odds ratio is one.
We again reconsider the previous examples and produce estimates of odds ratios and compare these to our estimates of risk differences and relative risks. This gives the following interval 0. Interpretation: The odds of breast cancer in women with high DDT exposure are 6. The null value is 1, and because this confidence interval does not include 1, the result indicates a statistically significant difference in the odds of breast cancer in women with high versus low DDT exposure.
Therefore, odds ratios are generally interpreted as if they were risk ratios. Note also that, while this result is considered statistically significant, the confidence interval is very broad, because the sample size is small. As a result, the point estimate is imprecise. Notice also that the confidence interval is asymmetric, i.e., the point estimate does not lie at the exact center of the interval. Remember that we used a log transformation to compute the confidence interval, because the odds ratio is not normally distributed.
Therefore, the confidence interval is asymmetric, because we used the log transformation to compute Ln OR and then took the antilog to compute the lower and upper limits of the confidence interval for the odds ratio. Remember that in a true case-control study one can calculate an odds ratio, but not a risk ratio.
However, one can calculate a risk difference (RD), a risk ratio (RR), or an odds ratio (OR) in cohort studies and randomized clinical trials. Consider again the data in the table below from the randomized trial assessing the effectiveness of a newly developed pain reliever as compared to the standard of care.
Remember that a previous quiz question in this module asked you to calculate a point estimate for the difference in proportions of patients reporting a clinically meaningful reduction in pain between pain relievers as 0. Because this confidence interval did not include 1, we concluded once again that this difference was statistically significant. When the study design allows for the calculation of a relative risk, it is the preferred measure as it is far more interpretable than an odds ratio.
The odds ratio is extremely important, however, as it is the only measure of effect that can be computed in a case-control study design.
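The two-step interval for an odds ratio, and the asymmetry it produces, can be sketched as follows. The sketch assumes the usual large-sample standard error for Ln(OR), the square root of the sum of the reciprocals of the four cell counts. The function name, the cell labels and the counts are hypothetical and are not the DDT or pain reliever data.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, confidence=0.95):
    """Point estimate and CI for an odds ratio from a 2x2 table.

    a : exposed, with outcome       b : exposed, without outcome
    c : unexposed, with outcome     d : unexposed, without outcome
    The odds in the unexposed group form the denominator.
    """
    odds_ratio = (a / b) / (c / d)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Step 1: confidence interval for Ln(OR).
    se_ln = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ln_lower = log(odds_ratio) - z * se_ln
    ln_upper = log(odds_ratio) + z * se_ln
    # Step 2: antilog of each limit.
    return odds_ratio, exp(ln_lower), exp(ln_upper)


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed have the outcome.
or_hat, or_lower, or_upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(round(or_hat, 2), round(or_lower, 2), round(or_upper, 2))

# The upper limit sits farther from the point estimate than the lower limit.
print(round(or_upper - or_hat, 2), round(or_hat - or_lower, 2))
```

With these made-up counts the interval just includes 1, so the association would not be called statistically significant, and the distance from the estimate to the upper limit is more than twice the distance to the lower limit.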
So for example, let's say-- And if you're actually trying to do this, I would recommend doing at least data points, or 1,, and later on we'll talk about how you can think about whether you've measured enough or how confident you can be. But let's just say you're a little bit lazy, and you just sample five men.
And so you get their five heights. Let's say one is 6. Let's say one is 5. Let's say one ends up being 5. Another one is 6. Another is 5. Now, if these are the ones that you happen to sample, what would you get for the mean of this sample? Well let's get our calculator out.
And we get 6. The sum is And then we want to divide by the number of data points we have. So we have five data points. So let's divide So here, our sample mean-- and I'm going to denote it with an x with a bar over it, is-- and I already forgot the number-- 5. This is our sample mean, or, if we want to make it clear, sample arithmetic mean.
And when we're taking this calculation based on a sample, and somehow we're trying to estimate it for the entire population, we call this right over here, we call it a statistic. Now, you might be saying, well, what notation do we use if, somehow, we are able to measure it for the population? Let's say we can't even measure it for the population, but we at least want to denote what the population mean is.
Well if you want to do that, the population mean is usually denoted by the Greek letter mu. And so in a lot of statistics, it's calculating a sample mean in an attempt to estimate this thing that you might not know, the population mean. And these calculations on the entire population, sometimes you might be able to do it. Oftentimes, you will not be able to do it. These are called parameters. So what you're going to find in much of statistics, it's all about calculating statistics for a sample, finding these sample statistics in order to estimate parameters for an entire population.
Now the last thing I want to do is introduce you to some of the notation that you might see in a statistics textbook that looks very math-y and very difficult. But hopefully, after the next few minutes, you'll appreciate that it's really just doing exactly what we did here-- adding up the numbers and dividing by the number of numbers you add. If you had to do the population mean, it's the exact same thing.
It's just many, many more numbers in this context. You have to add up million numbers and divide by million. So how do mathematicians talk about an operation like that-- adding up a bunch of numbers and then dividing by the number of numbers? Let's first think about the sample mean, because that's where we actually did the calculation. So a mathematician might call each of these data points-- let's say they'll call this first one right over here x sub 1. They'll call this one x sub 2. They'll call this one x sub 3.
They'll call this one-- when I say sub, I'm really saying subscript 1, subscript 2, subscript 3. They could call this x subscript 4. They could call this x subscript 5. And so if you had n of these you would just keep going. And so to take the sum of all of these, they would denote it as let me write it right over here. So they will say that the sample mean is equal to the sum of all my x sub i's-- so the way you can conceptualize it, these i's will change. In this case, the i started at 1. The i's are going to start at 1 until the size of our actual sample.
So all the way until n. In this case n was equal to 5. So this is literally saying this is equal to x sub 1 plus x sub 2 plus x sub 3, all the way to the nth one. Once again, in this case, we only had five. Now, are we done? Is this what the sample mean is? Well, no, we aren't done. We don't just add up all of the data points.
We then have to divide by the number of data points there are. So this might look like very fancy notation, but it's really just saying, add up your data points and divide by the number of data points you have. And this capital Greek letter sigma literally means sum. Sum all of the x i's, from x sub 1 all the way to x sub n, and then divide by the number of data points you have.
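If it helps to see that sigma notation as something you could actually run, here is a small sketch. The loop plays the role of the capital sigma: it walks i from the first data point to the nth and keeps a running sum, and then we divide by n. The five heights are made up for illustration, they are not the ones sampled above, and the function name is just a label for this sketch.

```python
def sample_mean(xs):
    """Add up the data points x_1 ... x_n and divide by n."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one data point")
    total = 0.0
    # Python counts from 0, so index i here corresponds to x sub (i + 1).
    for i in range(n):
        total += xs[i]
    return total / n


# Hypothetical heights in feet for a sample of five men.
heights = [6.0, 5.5, 5.75, 6.25, 5.5]
x_bar = sample_mean(heights)
print(x_bar)
```

The same function would give the population mean if you handed it every value in the population; only the list gets longer.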
Now let's think about how we would denote the same thing but, instead of for the sample mean, doing it for the population mean.