PSY 85 Langston University Evaluate Statistical Analyses Discussion
This week, you have learned about t-tests and ANOVA and will apply what you have learned in the evaluation of several research scenarios. For this task, you will read each research scenario and answer the questions regarding each one.
Research Scenario 1
A researcher is interested in the effects of a new weight loss supplement. Participants in this double-blind study are randomly assigned to either a group that receives the new supplement or a group that receives a placebo. Participants are weighed before starting the program. After 6 weeks of taking either the new supplement or the placebo, participants return to the lab to be weighed.
- Provide the appropriate null and alternative hypotheses.
- Determine which type of analysis would be appropriate to answer this research question. Be specific and support your answer using the textbook or other course materials.
- Name the independent and dependent variables used in the analysis. What are the levels of the independent variable?
- Indicate the levels of measurement for each variable.
- Describe the Type I error for this study.
- Describe the Type II error for this study.
Research Scenario 2
A researcher is interested in whether certain memory strategies help people to remember information. This researcher recruits students from a local college and then randomly assigns them to one of three groups: the visualization group, the mnemonic technique group, and the rote repetition group. Participants in each group receive an hour of instruction regarding how to use the particular technique to remember lists of words. After the instruction, all participants are presented with a list of 60 words that they are instructed to remember. The words are presented one at a time on a computer screen. After the last word is presented, all participants are instructed to recall as many words as possible by writing them on a blank sheet of paper. All participants are given 10 minutes to recall the words.
- Provide the appropriate null and alternative hypotheses.
- Indicate which type of analysis would be appropriate to answer this research question. Be specific and support your answer using the textbook or other course materials.
- Name the independent and dependent variables used in the analysis. What are the levels of the independent variable?
- Indicate the levels of measurement for each variable.
- Describe the Type I error for this study.
- Describe the Type II error for this study.
Research Scenario 3
A local manufacturing company is interested in determining whether their employees are as happy with their jobs as other employees. The manufacturing company asked the workers, who volunteered to participate, to rate their happiness at work on a scale from 1 to 10 where 1 was not at all happy and 10 was extremely happy. The manufacturing company found that the mean happiness rating for their employees is 7.3. In the general population of workers in the United States, the mean happiness rating is 6.
- Provide the appropriate null and alternative hypotheses.
- Determine which type of analysis would be appropriate to answer this research question. Be specific. Please support your answer using the textbook or other course materials.
- Name the variables used in the analysis.
- What are the levels of measurement for each variable?
- Describe the Type I error for this study.
- Describe the Type II error for this study.
Length: 1-2 pages
Please review the following:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC31165…
https://support.minitab.com/en-us/minitab-express/…
Please use the correct title page (attached). Font larger than 12-points should not be used anywhere in the paper. Do not use underlining. The paper should be written in the correct APA essay format. The paper should demonstrate thoughtful consideration of the ideas and concepts presented in the course by providing new thoughts and insights relating directly to this topic. Your response should reflect scholarly writing and current APA standards.
One-Way Analysis of Variance
In: Statistics for the Social Sciences
By: R. Mark Sirkin
Pub. Date: 2011
Access Date: February 28, 2022
Publishing Company: SAGE Publications, Inc.
City: Thousand Oaks
Print ISBN: 9781412905466
Online ISBN: 9781412985987
DOI: https://dx.doi.org/10.4135/9781412985987
Print pages: 317-358
© 2006 SAGE Publications, Inc. All Rights Reserved.
This PDF has been generated from SAGE Research Methods. Please note that the pagination of the
online version will vary from the pagination of the print book.
SAGE
SAGE Research Methods
2006 SAGE Publications, Ltd. All Rights Reserved.
One-Way Analysis of Variance
▾ PROLOGUE ▾
This chapter extends the kinds of comparisons made in the last chapter to more than two groups. So
this time, our juvenile criminals might be broken into three groups: one of those that had detention, one of
those with probation alone, and one group that received probation and community service. In our marriage
counseling example from before, suppose our couples are assigned randomly to three groups: individual
counseling, group counseling (with more than one couple participating), and no counseling.
We are not limited to three groups. Maybe we are doing a study for a pharmaceutical company. We randomly
assign participants with the same illness to five groups as follows: one getting ½ mg per day, one getting 1
mg per day, one getting 1½ mg per day, one getting a placebo (a “sugar pill” with no therapeutic value), and
one group getting no medication at all. We then compare the mean recovery time for each of the groups.
Introduction
Analysis of variance (ANOVA) is a statistical cousin to the t test. Like the t test, it is a technique for
comparing sample means, but unlike the t test, ANOVA can be used to compare more than two means.
Analysis of variance is very versatile. It is particularly friendly to experimental applications, where we may
be comparing the means of several treatment groups and a control group. Consequently, psychologists rely
heavily on this procedure. ANOVA is also useful in nonexperimental situations in the same way that the t
test is. Interestingly, though, ANOVA has been less widespread in nonexperimental research than many other
statistical procedures, despite its great potential.
One-way analysis of variance (ANOVA) A technique for comparing sample means that
can be used to compare more than two means.
With ANOVA, because several sample means are usually being compared, once a null hypothesis has been
rejected, we need a follow-on, or post hoc, procedure. This is because although ANOVA examines all sample
means at once, it is possible that some pairs of means may not be significantly different from one another,
even though when all means are taken together in their entirety, the null hypothesis may be rejected. Thus,
the process is a bit like remote sensing (i.e., aerial photography). ANOVA gives us a high-altitude picture,
and if we can reject the null hypothesis, we swoop down for a closer look. The post hoc test provides the
low-altitude shot.
Post hoc procedure A follow-on procedure that is used once a null hypothesis has been
rejected.
At the end of the chapter, we will briefly look at some other variations of the ANOVA technique.
How Analysis of Variance Is Used
Analysis of variance is designed, in its nonexperimental application, for two variables; one is interval level of
measurement (usually the dependent variable), and the other is grouped data of any level of measurement.
Suppose we study a random sample of 8 people. We determine for each of them whether they are from urban
or rural areas as well as their score on a “pro-life” index that ranges from 0 (most pro-choice) to 200 (most
pro-life) on the issue of abortion. We get the following results:
Clearly, the means of the two groups differ in our sample of 8 subjects. May we assume that they differ in the
population as a whole? Assuming no directionality, our hypotheses would be as follows:
H0: μUrban = μRural
H1: μUrban ≠ μRural
We could handle this problem using the two-sample t test, or we could perform a one-way analysis
of variance (one-way ANOVA). Unfortunately, analysis of variance does not allow for a directionality
assumption (although ANOVA’s cousin, the two-sample t test, does). Thus, we use the nondirectional H1.
One-way analysis of variance (one-way ANOVA) Analysis of variance with one
dependent variable and one independent variable.
From our sample data, we will calculate a statistic called F, or the F ratio (named for Fisher, who originally
helped develop it). As we did with other tests, we will compare our obtained F to Fcritical at the .05 level,
and if our F exceeds Fcritical, we will reject the null hypothesis.
F or F ratio The statistic generated by the analysis of variance procedure.
Analysis of Variance in Experimental Situations
Our first example (abortion stance by area) came from a nonexperimental context, a survey. In many social
sciences, experiments have been relatively rare, but in the behavioral sciences, laboratory sciences, and
medicine, they are quite common. Where experimental designs are common, ANOVA is the most common
statistical technique used.
Let us use an example from training and development, a growing subfield of communications that deals with
the training of adults, usually in a job-related setting. Suppose an advertising firm is seeking to improve its
operation. A training program is being established for employees of this firm to give the employees skills
needed to properly advise and assist clients. The firm wishes to develop the most effective training program
possible. The trainers are interested in comparing the relative efficiency of day, night, and weekend programs.
Let us assume that the same instructor will present identical material in each of three sections. One section
will meet 1 hour per day, Monday through Friday, at the same time for one week. The second section will
meet at night under similar circumstances. The third section will meet on a Saturday for a single long session,
including 5 hours of instruction plus time for breaks and meals. There will be 10 people in each section. At
the end of the instruction, each person taking the class will rate his or her overall satisfaction with the course
on a scale of 0 (dissatisfied) through 5 (completely satisfied).
The 30 “students” are the participants or subjects, as they used to be called, in the experiment. They are
all employees of the same firm and are selected to take the course for professional purposes. There is no
random sample being selected here. However, if the trainer suspects that satisfaction will differ among the
three classes—day, night, and Saturday—he or she may test that hypothesis.
Participants or subjects The people being studied in an experiment.
To do so, a table of random numbers or some computerized random number generator can be used to
randomly assign the 30 subjects to three groups of 10 people each. This randomization process tends to
control for other social or behavioral characteristics. Each group will, for instance, have about the same
proportion of men and women as the original group of 30. Each group will have about the same mean age.
Each group will have about the same proportion of disgruntled employees who feel forced by their bosses
to take the course. (We keep saying “about the same” since there will be some differences, slight we hope,
equivalent to sampling error, resulting from chance in the randomization process.) Sampling error aside, if
we randomly assign subjects, any difference in mean satisfaction scores among the three groups will be the
result of the time frame for the course (day, night, Saturday) rather than other factors. Thus, for the variable
satisfaction:
H0: μDay = μNight = μSaturday
H1: There exists at least one inequality that negates H0.
Note that analysis of variance will take as many groups (or categories) as we have, whereas the two-sample
t test is limited to two groups (or categories). Thus, for the above problem, we are only able to use one-way
analysis of variance.
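The random assignment of the 30 trainees to three sections of 10, described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name assign_to_groups, the trainee IDs 1 through 30, and the seed value are all hypothetical and do not come from the text.

```python
import random

def assign_to_groups(subject_ids, group_names, seed=None):
    """Randomly split subjects into equal-sized groups (hypothetical helper)."""
    subjects = list(subject_ids)
    if len(subjects) % len(group_names) != 0:
        raise ValueError("subjects must divide evenly among the groups")
    rng = random.Random(seed)   # a fixed seed makes the assignment reproducible
    rng.shuffle(subjects)       # every ordering of the subjects is equally likely
    size = len(subjects) // len(group_names)
    return {
        name: sorted(subjects[i * size:(i + 1) * size])
        for i, name in enumerate(group_names)
    }

# Trainee IDs 1..30 are placeholders for the 30 employees.
groups = assign_to_groups(range(1, 31), ["day", "night", "saturday"], seed=2024)
for name, members in groups.items():
    print(name, members)
```

Because each trainee has the same chance of landing in any section, characteristics such as age or attitude toward the course should be spread about evenly across the three groups, apart from chance differences.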
With more than two categories, it becomes difficult to write H1 with symbols. In effect, H1 says that in the
population, there exists at least one inequality that negates the null hypothesis. Any of the following would
negate H0.
It is not necessary that all three population means be unequal, although that could be the case:
Suppose we obtain the following results:
Clearly, the category means of 4.1, 2.6, and 3.8 are different from each other. Are they different enough to
conclude that they were not the result of sampling error—in this case, due to the randomization process
whereby we assigned the subjects to the three categories? If we can reject the null hypothesis, we may
conclude that “in the population,” that is, for people in general, student satisfaction levels differ by the time
and format of the class offered regardless of the instructor, course content, or anything else.
We calculate F and compare it to Fcritical at the appropriate degrees of freedom. If the F we obtained exceeds
Fcritical at the .05 level, we reject H0. We then compare our F to Fcritical at other levels to form a probability
statement.
F: AN INTUITIVE APPROACH
We may think of F as a measure of how well the categories of the independent variable explain the variation
in the scores of the dependent variable. If the categories of the independent variable are totally useless in
explaining the variation of scores of the dependent variable, then F will equal zero. As the categories of the
independent variable begin to explain or account for some of the variation in the dependent variable, F begins
to get larger. The better the independent variable explains variation in the dependent variable, the greater is
the relationship between the variables and the larger F becomes. When the categories of the independent
variable explain nearly all the variation in the dependent variable, F grows extremely large, approaching
infinity as a limit.
As an illustration, imagine a simplified version of the course satisfaction problem. For simplicity’s sake, we will
limit the independent variable to two categories, a day class and a night class. Suppose there are six people
in each class. The following satisfaction scores emerge.
If we were to ignore the existence of the two categories and just calculate the mean of all 12 satisfaction
scores, the mean we would get—called the grand mean—would be 48/12 or 4.0, the same as the two
category means. Since mean satisfaction is the same (4.0) whether or not we know in which category a
subject belongs, the categories do not help us predict a subject’s score on the dependent variable. Thus,
the two variables are unrelated. There is no difference between day and night classes in terms of course
satisfaction. F will be zero.
Grand mean The mean of all scores.
Now imagine a slight variation in which one more person in the day class has a satisfaction score of 5 and
one more person in the night class has a satisfaction score of 3. While the grand mean is unchanged (4.0),
the two category means now differ.
Once we learn to calculate F we will see that for this problem, F = 1.248, up from 0 in Case 1. Whereas
in Case 1, each category had three scores of 5 and three scores of 3, in Case 2, the day class is slightly
more satisfied than the night class: four scores of 5, two scores of 3 in the day class with a mean of 4.33 as
opposed to two scores of 5, and four scores of 3 in the night class with a lower mean of 3.67. The two classes
now differ somewhat in terms of satisfaction.
We now add one more score of 5 to the day class, replacing a score of 3, and in the night class, we replace
a score of 5 with a 3.
The differentiation between day and night classes, in terms of satisfaction, has grown even more pronounced.
Day students are clearly more satisfied than night students. The category means are more widely spread
about the grand mean than in Case 2. Though we could be wrong, there is now a five-to-one chance that if
someone tells us he or she is in the day class, we would be accurate in predicting the satisfaction score as
being 5. (Of the six day students, five have scores of 5; only one does not.) If someone is in the night class,
we predict (also with odds of five to one) that the satisfaction score is 3.
Finally, we change the final score of 3 in the day class to 5 and change the final 5 in the night class to 3.
In this case, the categories of class type explain all the differences in satisfaction scores. Knowing what class
one is in gives us perfect predictive ability in terms of satisfaction. If in the day class, a person’s satisfaction
score is 5; if in the night class, the satisfaction score is 3. The two variables, satisfaction and class meeting
time, are perfectly related.
Notice that in Case 1, all the variations of the scores from the grand mean were actually within each of the two
categories. The category means did not vary at all from the grand mean. As we progressed through Cases 2
and 3, more and more of the deviations or variations of scores from the grand mean could be explained by
the category means. Finally, in Case 4, there were no deviations of scores within the categories. All scores fell
at the means of their respective categories. All deviations of scores from the grand mean could be accounted
for by the deviations of their respective category means about the grand mean. To see this more clearly, note
that algebraically, we can break the distance between any score and the grand mean into two components:
(a) the distance from that score to its respective category mean, plus (b) the distance from that category mean
to the grand mean.
(Score − Grand Mean) = (Score − Category Mean) + (Category Mean − Grand Mean)
In Case 1, every score is either a 5 or a 3, both category means are 4.00, and the grand mean is 4.00. Thus,
for a score of 5 in the day class,
(5 − 4) = (5 − 4) + (4 − 4)
1 = 1 + 0
The same would hold for a score of 5 in the night class. For a score of 3 in either class,
(3 − 4) = (3 − 4) + (4 − 4)
−1 = −1 + 0
Notice that in Case 4, it is (Score – Category Mean) that always reduces to zero. Since all scores in the day
section are 5,
(5 − 4) = (5 − 5) + (5 − 4)
1 = 0 + 1
Since all scores in the night section are 3,
(3 − 4) = (3 − 3) + (3 − 4)
−1 = 0 + (−1)
In Case 1, (Category Mean – Grand Mean) is always zero; in Case 4, (Score – Category Mean) is always
zero. These are the two extreme cases. In Case 1, the categories explain no variation in the dependent
variable, and in Case 4, the categories explain all the variation in the dependent variable. Cases 2 and 3 fall
in the middle. In Case 2, one third of the distance between a score and the grand mean can be accounted
for by the distance between the category mean and the grand mean; two thirds of the distance is within the
category, between the score and its category mean. In Case 3, two thirds of the distance between a score
and the grand mean can be accounted for by the distance between the category mean and the grand mean;
one third of the distance is within the category, between the score and its category mean.
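The progression from Case 1 to Case 4 can be checked numerically. The sketch below rebuilds the four sets of day and night scores from the descriptions above and computes F for each; the function name f_ratio is hypothetical, and the formulas it uses (the ratio of the between-groups to the within-groups mean square) are the ones defined later in the chapter. Computed without intermediate rounding, Case 2 gives F = 1.25; the 1.248 quoted in the text presumably reflects rounded intermediate values.

```python
def f_ratio(groups):
    """F for a one-way layout, from deviations about the means."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    # Between: squared distance of each category mean from the grand mean,
    # counted once per score in that category.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within: squared distance of each score from its own category mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    if ss_within == 0:
        # No variation left inside the categories: F has no finite value.
        return float("inf") if ss_between > 0 else float("nan")
    return (ss_between / df_between) / (ss_within / df_within)

# (day class, night class) scores as described in the text
cases = {
    "Case 1": ([5, 5, 5, 3, 3, 3], [5, 5, 5, 3, 3, 3]),
    "Case 2": ([5, 5, 5, 5, 3, 3], [5, 5, 3, 3, 3, 3]),
    "Case 3": ([5, 5, 5, 5, 5, 3], [5, 3, 3, 3, 3, 3]),
    "Case 4": ([5, 5, 5, 5, 5, 5], [3, 3, 3, 3, 3, 3]),
}

for label, groups in cases.items():
    print(label, f_ratio(groups))
```

F starts at zero when the categories explain nothing, grows as the category means move apart, and has no finite value in Case 4, where nothing is left unexplained within the categories.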
ANOVA Terminology
The logic of analysis of variance is based on partitioning the distance to the grand mean into distances
explained by the category means and distances unexplained by the category means, in a manner somewhat
analogous to what we have been doing. However, ANOVA squares distances to get rid of negative numbers,
and it works with sums of these squared distances. It also makes use of its own computational formulas.
Thus, before learning the technique for calculating F we need to define some terminology.
Since we will ultimately be using variance estimates (squared sigma-hats), let us return for a moment to the
definitional formula for a population variance estimate.
σ̂² = ∑(x − x̄)² / (n − 1)
The denominator for this estimate, n − 1, is its degrees of freedom or df. The numerator, ∑(x − x̄)², can be
read as the sum of the squared deviations of the values of x from the mean. Each (x − x̄) is the deviation, the
distance of that value of x from the mean. These deviations are squared to get rid of negative signs, and the
squared deviations are summed. In analysis of variance, “the sum of the squared deviations of the values of
x from the mean” is shortened to the sum of squares and is indicated by the letters SS.
Sum of squares The sum of the squared deviations of the values of x from the mean.
When we divide SS by n for a sample variance or by df = n − 1 for a variance estimate, we are finding a kind of
average amount of squared deviation for each value of x: “the mean squared deviation of a score (a value of
x) from the mean of all scores,” which is shortened to the mean square and is indicated by the letters MS. A
variance or variance estimate is the average, or mean, amount of the sum of squares per unit. Our σ̂² formula
is then symbolized as
σ̂² = SS / df = MS
Mean square The mean squared deviation of a score (a value of x) from the mean of all
scores.
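As a small illustration of these two terms, the sketch below computes SS and MS for a single set of scores and checks that MS, taken with df = n − 1, matches the variance estimate returned by Python's standard statistics.variance function. The six scores are the day-class scores from Case 2 earlier in the chapter; the function name sum_of_squares is hypothetical.

```python
import statistics

def sum_of_squares(values):
    """SS: the sum of squared deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

scores = [5, 5, 5, 5, 3, 3]   # day-class scores from Case 2

ss = sum_of_squares(scores)   # numerator of the variance estimate
df = len(scores) - 1          # degrees of freedom, n - 1
ms = ss / df                  # mean square = SS / df

print("SS =", round(ss, 4))
print("df =", df)
print("MS =", round(ms, 4))
print("statistics.variance =", round(statistics.variance(scores), 4))
```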
What ANOVA does is first to calculate, using the computational formula, the total sum of squares (SSTotal
or SST), meaning the total of the squared deviations of scores about the grand mean.
Total sum of squares The total of the squared deviations of scores about the grand mean.
The total sum of squares (SST) is then partitioned (divided) into two components. The first component is the
between-groups sum of squares (SSBetween or SSB), the portion of the total sum of squares that can be
accounted for by the variations of the category means about the grand mean. That is, SSB is the portion of
SST that can be accounted for (explained by) the categories.
Between-groups sum of squares The portion of the total sum of squares that can be
accounted for by the variations of the category means about the grand mean.
The second component is the within-groups sum of squares (SSWithin or SSW), the portion of the total
sum of squares left unexplained by the variations of the category means about the grand mean. This, then, is
the sum of squares within the categories or the squared deviations of scores about their respective category
means. It is sometimes called the error sum of squares.
Within-groups sum of squares or error sum of squares The portion of the total sum of
squares left unexplained by the variations of the category means about the grand mean.
In short,
SSB = the portion of SST accounted for by the categories of the independent variable.
SSW = the portion of SST not accounted for by the categories of the independent variable.
These are additive:
SST = SSB + SSW
We use SSB and SSW to form two separate population variance estimates. The first of these variance
estimates, the between-groups mean square (MSBetween or MSB), is a variance estimate based on the
between-groups sum of squares. MSB estimates the population variance accounted for by the variation of the
category means about the grand mean—the population variance accounted for by the groups or categories of
the independent variable. To find MSB, we divide SSB by the between-groups degrees of freedom (dfB or
dfBetween). Since here we are talking not about the number of respondents but about the number of groups
or categories, dfB equals the number of categories (or groups) minus 1.
Between-groups mean square A variance estimate based on the between-groups sum
of squares.
Between-groups degrees of freedom The degrees of freedom based on the number of
groups studied or categories of the independent variable.
dfB = number of categories − 1
so
MSB = SSB / dfB
The second variance estimate is the within-groups mean square (MSWithin or MSW), a population variance
estimate based on what the categories of the independent variable do not explain: the variation of scores
within the groups. We take SSW and divide by the within-groups degrees of freedom (dfWithin or dfW). The
within-groups degrees of freedom is found by subtracting the between-groups degrees of freedom from the
degrees of freedom belonging to σ̂² (our original total variance estimate), which is n − 1. Thus,
dfW = (n − 1) − dfB
Within-groups mean square A population variance estimate based on what the
categories of the independent variable do not explain—the variation of scores within the
groups.
Within-groups degrees of freedom That portion of the total degrees of freedom not
accounted for by the number of groups studied.
so
MSW = SSW / dfW
Note that like the sums of squares, the degrees of freedom are additive, so that
dfTotal = dfB + dfW
Although the SSs and df’s are additive, the MSs are not. The mean square total, σ̂², does not equal MSB +
MSW.
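The additivity claims can be verified on a small data set. The sketch below uses the Case 2 day and night scores from earlier in the chapter and the definitional (deviation-based) sums of squares; the function name partition and the dictionary keys are hypothetical.

```python
def partition(groups):
    """Split the total sum of squares and degrees of freedom into
    between-groups and within-groups parts."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]

    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_total = len(scores) - 1
    df_between = len(groups) - 1
    df_within = df_total - df_between

    return {
        "SST": ss_total, "SSB": ss_between, "SSW": ss_within,
        "dfTotal": df_total, "dfB": df_between, "dfW": df_within,
        "MSTotal": ss_total / df_total,
        "MSB": ss_between / df_between,
        "MSW": ss_within / df_within,
    }

day = [5, 5, 5, 5, 3, 3]
night = [5, 5, 3, 3, 3, 3]
result = partition([day, night])

print("SST vs SSB + SSW:", result["SST"], result["SSB"] + result["SSW"])
print("dfTotal vs dfB + dfW:", result["dfTotal"], result["dfB"] + result["dfW"])
print("MSTotal vs MSB + MSW:", result["MSTotal"], result["MSB"] + result["MSW"])
```

For these scores the sums of squares and the degrees of freedom add up exactly, while the total mean square (about 1.09) is nowhere near MSB + MSW (2.4).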
Finally, we find the F ratio:
F = MSB / MSW
The F that we obtain is compared to Fcritical (see Table 10.1). Note that F uses two degrees of freedom:
dfBetween, which is found along the top row, and dfWithin, which we locate in the column on the left-hand side.
(Here, n1 and n2 mean degrees of freedom: n1 = dfB and n2 = dfW.) There are three pages to this table:
one for the .05 level, one for the .01 level, and one for the .001 level. If Fobtained exceeds Fcritical at the .05
level, we reject the null hypothesis and go on to the page with Fcritical at the .01 level. If Fobtained exceeds
Fcritical at the .01 level, we go on to compare it to Fcritical at the .001 level. This is exactly what we did with
earlier tests. Probabilities are reported the same way. If ANOVA is done on a computer using a program such
as SAS or SPSS, the exact probability will be listed, and thus it will not be necessary to look up a critical value in
the F table.
The ANOVA Procedure
Before we actually work an F problem through, look at Box 10.1, where the computational steps and all
appropriate formulas are given. To calculate F, we must find the following: n for each category, nTotal, ∑x for
each category, ∑xtotal, and ∑x²total, which we get by finding ∑x² for each category and adding them up. Note
that all the examples used so far in this chapter have equal category sizes; this ideal is not necessary and
not always possible. Note, too, that though we need not find the category means for the sample to calculate
F using these formulas, we do so anyway to better understand the problem we are working. Applying these
steps to the first problem presented in this chapter,
Table 10.1 Critical Values of F for p = .05
BOX 10.1 Procedure for Calculating One-Way Analysis of Variance
1. Calculate the total sum of squares (sum of squared deviations from the grand mean), where
SSTotal = ∑x²total − (∑xtotal)² / nTotal
2. Calculate the between-group sum of squares (sum of squared deviations of category means from the grand mean), where
SSBetween = (∑xcat1)² / ncat1 + (∑xcat2)² / ncat2 +…+ (∑xcatk)² / ncatk − (∑xtotal)² / nTotal
Note: +…+ means continue repeating the (∑xcat)² / ncat term for the subsequent categories, if any.
3. Calculate the within-group sum of squares, where
SSWithin = SSTotal − SSBetween
4. Calculate the between and within mean squares, where
MSBetween = SSBetween / dfBetween
MSWithin = SSWithin / dfWithin
5. Calculate the F ratio, where
F = MSBetween / MSWithin
6. Use the table of F values in your textbook to test the F for significance.
Note: Numerator df = dfBetween = no. of categories − 1; Denominator df = dfWithin = (nTotal − 1) − (no. of categories − 1)
7. If F is significant and the number of categories is greater than two, perform a post hoc procedure such as Scheffé’s test.
8. If F is significant and this is nonexperimental research, measure association with the correlation ratio or ri. (These will be presented in Chapter 13.)
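The computational steps of Box 10.1 can be followed in code. The sketch below works only from n, ∑x, and ∑x² for each category, as the box does. The function name one_way_anova is hypothetical, and the three small sets of scores are invented for illustration (they are not the chapter's data); the category sizes are deliberately unequal to show that equal sizes are not required.

```python
def one_way_anova(groups):
    """One-way ANOVA from the sums and sums of squares of each category."""
    n_total = sum(len(g) for g in groups)
    sum_x_total = sum(sum(g) for g in groups)
    sum_x2_total = sum(x * x for g in groups for x in g)

    # The term (sum of all x) squared over n_total appears in both SST and SSB.
    correction = sum_x_total ** 2 / n_total

    ss_total = sum_x2_total - correction                              # Step 1
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction  # Step 2
    ss_within = ss_total - ss_between                                 # Step 3

    df_between = len(groups) - 1
    df_within = (n_total - 1) - df_between

    ms_between = ss_between / df_between                              # Step 4
    ms_within = ss_within / df_within
    f = ms_between / ms_within                                        # Step 5

    return {"SST": ss_total, "SSB": ss_between, "SSW": ss_within,
            "dfB": df_between, "dfW": df_within,
            "MSB": ms_between, "MSW": ms_within, "F": f}

# Hypothetical satisfaction scores, invented for this example only.
day = [5, 4, 4, 3]
night = [2, 3, 2, 1]
saturday = [4, 3, 5]

table = one_way_anova([day, night, saturday])
for key, value in table.items():
    print(key, round(value, 3))
```

The resulting F would then be compared with Fcritical at dfB and dfW (Step 6), using either the printed table or software that reports an exact probability.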
Following the steps in Box 10.1:
We find the total sum of squares.
We calculate the between-group sum of squares. Note that the last expression in this formula,
(∑xtotal)² / nTotal, was already calculated in Step 1.
We find the within-group sum of squares.
We calculate the mean squares.
We calculate the F ratio.
We find the degrees of freedom.
dfB = 2 − 1 = 1
dfW = (8 − 1) − 1 = 6
We then find Fcritical at the .05 level from the first table in Table 10.1, going along the top row to dfB (1) and
dropping down that column until it intersects the row for dfW (6):
Fcritical, df 1 and 6 = 5.99
Since Fobtained is 10.80 and greater than 5.99, we reject H0. Using the second table of Table 10.1, we repeat
the procedure to find Fcritical at the .01 level, which is 13.74 and greater than 10.80. Since p is less than .05
but greater than .01, we state our probability of falsely rejecting a true null hypothesis as p < .05.
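The stepwise comparison that produces the probability statement can be sketched as follows. This is an illustration only: the function name probability_statement is hypothetical, and the list of critical values contains just the two figures quoted above for df = 1 and 6. The .001 value is left out because it is not given in the text.

```python
def probability_statement(f_obtained, critical_values):
    """Walk through critical values from the least to the most stringent
    level and report the strictest level that F exceeds."""
    reached = None
    for level, f_critical in critical_values:
        if f_obtained <= f_critical:
            break                 # stop at the first level F fails to exceed
        reached = level
    if reached is None:
        return "fail to reject H0"
    return f"p < {reached}"

# Critical values for df = 1 and 6, as quoted in the text.
critical_1_6 = [(".05", 5.99), (".01", 13.74)]

print(probability_statement(10.80, critical_1_6))   # the chapter's obtained F
print(probability_statement(3.00, critical_1_6))    # a hypothetical smaller F
```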
Before moving on, let us return to Table 10.1 for a moment. From time to time, we will calculate a degrees-of-freedom figure that does not appear in the table, such as dfB = 7 or dfW = 31. In such a case, we use
the adjacent row or column that has the higher value of F. Thus, for dfB = 7, we would see which value was
higher, F at df = 6 or F at df = 8. For example, if dfB = 7 and dfW = 6, we would have a choice between
Fcritical, df 6 and 6 (4.28), or Fcritical, df 8 and 6 (4.15). We use the larger of the two (4.28) as our critical
value. If dfB = 1 and dfW = 31, we have a choice between Fcritical, df 1 and 30 (4.17), or Fcritical, df 1 and
40 (4.08). Again, we select the larger of the two (4.17) as our critical value. If we are very close to rejecting
H0 using this procedure but do not quite make it, our best bet is to use a computer program that reports the
exact probability.
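The rule of taking the larger of the two adjacent critical values can be expressed as a small lookup. The sketch below is a simplified illustration: the dictionary holds only the four .05-level entries quoted in the paragraph above rather than the full table, and the function names adjacent and critical_value are hypothetical.

```python
# (dfB, dfW) -> Fcritical at the .05 level; only the entries quoted in the text.
F_CRITICAL_05 = {
    (6, 6): 4.28,
    (8, 6): 4.15,
    (1, 30): 4.17,
    (1, 40): 4.08,
}

def adjacent(tabled, target):
    """Nearest tabled df below and above the target, where they exist."""
    lower = max((v for v in tabled if v < target), default=None)
    upper = min((v for v in tabled if v > target), default=None)
    return [v for v in (lower, upper) if v is not None]

def critical_value(table, df_b, df_w):
    """Exact entry if tabled; otherwise the larger adjacent critical value."""
    if (df_b, df_w) in table:
        return table[(df_b, df_w)]
    columns = adjacent({b for b, w in table if w == df_w}, df_b)
    rows = adjacent({w for b, w in table if b == df_b}, df_w)
    candidates = [table[(b, df_w)] for b in columns]
    candidates += [table[(df_b, w)] for w in rows]
    if not candidates:
        raise KeyError(f"no adjacent entries for df {df_b} and {df_w}")
    return max(candidates)   # the larger value makes rejection harder

print(critical_value(F_CRITICAL_05, 7, 6))    # between df 6 and df 8
print(critical_value(F_CRITICAL_05, 1, 31))   # between df 30 and df 40
```

Choosing the larger value is the cautious option: it never makes the null hypothesis easier to reject than the exact critical value would.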
We go back once again to our ANOVA problem for which we have now rejected H0 with a probability of error
< .05. We may wish to summarize our findings in what is called an ANOVA source table.
ANOVA source table A table summarizing the results of the main steps in the ANOVA
procedure.
Note that we usually do not report dfTotal (in this case, n − 1 = 8 − 1 = 7) or MSTotal (which is σ̂²), since
neither was necessary for finding F.
COMPARING F WITH t
At this juncture, let us pause to compare F to the two-sample t tests presented previously. Although they are
treated in most statistics texts as separate procedures, they are in fact mathematically related. In the case of
a two-category independent variable such as the problem just completed, it turns out that if the two-sample t
test, assuming equal population variances, had been calculated on the same data, then F = t². In our problem,
calculating the t value would yield 3.287, and squaring that yields 10.80, the same as our F.
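The relationship F = t² can be confirmed on any two-group data set. The sketch below uses the Case 2 day and night scores from earlier in the chapter, since the scores behind the urban and rural example are not reproduced here. It computes the equal-variance (pooled) two-sample t and the one-way F and compares them; the function names pooled_t and f_two_groups are hypothetical.

```python
import math

def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_variance = (ss_a + ss_b) / (len(a) + len(b) - 2)
    standard_error = math.sqrt(pooled_variance * (1 / len(a) + 1 / len(b)))
    return (mean_a - mean_b) / standard_error

def f_two_groups(a, b):
    """One-way ANOVA F ratio for the same two groups."""
    scores = a + b
    grand_mean = sum(scores) / len(scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in (a, b))
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in (a, b) for x in g)
    df_between = 1                   # two categories minus 1
    df_within = len(scores) - 2
    return (ss_between / df_between) / (ss_within / df_within)

day = [5, 5, 5, 5, 3, 3]
night = [5, 5, 3, 3, 3, 3]

t = pooled_t(day, night)
f = f_two_groups(day, night)
print("t =", round(t, 4), " t squared =", round(t ** 2, 4), " F =", round(f, 4))
```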
If we may do either t or F when comparing two groups, which is preferable? Generally, the two-sample t test
is, for two reasons. First, unlike ANOVA, it is possible to make a directionality assumption in a t test's H1. If
we had advance reasons to believe rural residents were more pro-life than urban ones, we could have used
the directional critical values of t and reduced our probability of error by one half.
Second, ANOVA as presented here assumes equal population variances. If there is reason to assume
unequal population variances, a t test formula must be used. In fact, it is somewhat ironic that the assumption
of equal population variances plays such a major role in the t test procedure since in most actual applications
of ANOVA, the researchers merely make the equal variance assumption and do F without testing the
assumption. The reason is that statisticians consider F to be robust, a term in statistics meaning accurate
even when underlying assumptions (such as equal population variances) are violated. This is particularly true
when all category sizes are the same. Despite this observation about F, if evidence suggests very unequal
population variances, this procedure should be avoided.
Robust Accurate even when underlying assumptions (such as equal population
variances) are violated.
Finally, note that just as was the case with the t test, ANOVA assumes that the populations from which the
categories are drawn are normally distributed along the dependent variable. In our samples, if category sizes
are sufficiently large, we may relax the normality assumption. In this respect, ANOVA is the same as the t
test.
Analysis of Variance with Experimental Data
Let us now turn our attention to the training of advertising consultants problem. Recall that H0: μDay = μNight
= μSaturday and that H1: there exists at least one inequality that negates H0. We had 10 trainees each in
three separate classes.
Following the steps in Box 10.1:
Post Hoc Testing
In rejecting the null hypothesis, we conclude H1: in the population there exists at least one inequality that
negates H0. Though we can conclude that, in the population, student satisfaction differs according to the
time that the class is offered, we cannot be sure exactly where that difference lies, since a single
inequality is enough to negate H0. The possibilities here are as follows:
In the context of this specific problem, μD = μN ≠ μS is probably illogical since x̄D is closer to x̄S than to
x̄N, although sampling error could conceivably have yielded these means from a population where μD = μN ≠
μS is really true. Also, in the context of sampling, we might have a situation where we cannot reject the null
hypothesis either between μN and μS or between μS and μD but we can reject it between μN and μD.
Why not run a series of two-sample t tests between each pair of means and see which null hypotheses could
be rejected? The answer is that since the t test was predicated on the assumption that a single comparison
of means was to be made, if we make more than one comparison, the probability of a Type I or alpha error
increases with each comparison.
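A rough sense of how quickly the alpha error grows can be had from a short calculation. The sketch below assumes the comparisons are independent, which pairwise t tests on shared groups are not, so the figure is only an approximation; the function name is hypothetical.

```python
from math import comb


def familywise_error(alpha, comparisons):
    """Chance of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** comparisons


pairs = comb(3, 2)  # pairwise comparisons among three class means
print(pairs, round(familywise_error(0.05, pairs), 4))
```

With three class means there are three pairs to compare, and under this simplifying assumption the chance of at least one false rejection is roughly .14 rather than .05.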
However, there are a number of tests, known as post hoc tests of multiple comparisons, that control for
such inflated alpha levels and enable us to narrow our conclusion regarding exactly where these population
inequalities are. One such test is Scheffe's test (pronounced as in French: shef-FAY). There are other more
powerful control procedures, but we present this one because of its flexibility and robustness. It can be applied
even when the groups being compared have different sizes (some tests assume equal ns), and it is less
sensitive to departures from normality and any assumptions of equal population variances than are some
other tests.
Post hoc tests of multiple comparisons Tests that enable us to narrow our conclusion
to specifically where these population inequalities are to be found.
Scheffé's test A test that finds the critical difference between any two sample means that
is necessary to reject the null hypothesis that their corresponding population means are
equal.
Scheffe's test finds the critical difference between any two sample means that is necessary to reject the null
hypothesis that their corresponding population means are equal. If μD ≠ μN, how big must the difference be
between x̄D and x̄N? This difference, Scheffe's critical value, may be calculated between each pair of
means, and the actual sample mean differences are compared to the critical values. If |x̄i − x̄j| for any two
categories, i and j, exceeds Scheffe's critical value, we may reject H0 and conclude μi ≠ μj.
Scheffe's critical value The value in this test needed to reject the null hypothesis.
We begin by presenting the ANOVA source table for the problem just completed.
For any two categories, i and j, the following formula generates Scheffe's critical value.
From the source table, we see that dfB = 2 and MSW = 1.44. The critical value of F at the .05 level from Table 10.1 is
Fcritical, .05, df = 2 and 27 = 3.35. Finally, since all of our category ns are equal to 10, ni = nj = 10. Thus, one critical
value will apply to all three mean comparisons. Had our category sizes been unequal, we would have had to
calculate a separate critical value for each pair of sample means. Plugging into our formula,
In other words, the absolute value of any pair of sample mean differences must equal or exceed 1.389 in
order to reject H0. Examining our sample x̄s, we see the following:
Thus, although our overall F was significant, we have traced that fact to the single explanation of an inequality
between the day and night class population means. We cannot conclude that the population mean for the
Saturday group differs from either the day or the night classes.
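The arithmetic behind the critical value of 1.389 can be reproduced in a few lines. The formula itself did not survive in this copy of the text, so the form used below, the square root of dfB × Fcritical × MSW × (1/ni + 1/nj), is an assumption; it is the usual statement of Scheffe's critical difference, and it does reproduce the 1.389 quoted above. The function name is hypothetical.

```python
import math


def scheffe_critical_value(df_between, f_critical, ms_within, n_i, n_j):
    """Smallest absolute mean difference that rejects H0: mu_i = mu_j.

    Assumed form: sqrt(dfB * Fcritical * MSW * (1/n_i + 1/n_j)).
    """
    return math.sqrt(
        df_between * f_critical * ms_within * (1 / n_i + 1 / n_j)
    )


# Values taken from the passage: dfB = 2, Fcritical = 3.35, MSW = 1.44,
# and 10 trainees in each class.
critical = scheffe_critical_value(2, 3.35, 1.44, 10, 10)
print(round(critical, 3))
```

Because ni and nj enter the formula separately, unequal category sizes simply produce a different critical value for each pair, as the text notes.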
Suppose, however, that |x̄N − x̄S| had been larger than 1.389, and we could also have rejected H0 for the night and Saturday classes. Our
conclusion would be modified: The significant F resulted from the difference between the night class's scores,
on one hand, and the combined day and Saturday scores, on the other. Since the day and Saturday scores
are not significantly different, we might conclude that, since the Saturday classes also met during the daytime,
it is the day versus night difference that counts, regardless of which day or days of the week that the day class
is held.
As noted earlier, there are many post hoc and a priori tests other than Scheffe's. These include Duncan's
multiple-range test, the Student-Newman-Keuls multiple-range test, the least significant difference test,
Tukey's honestly significant difference test, the Bonferroni procedure, and others. Because of space
considerations, only the calculation of Scheffe's test is presented here. Some of the other tests require
extensive calculations or additional tables of critical values. However, if you have access to a computer, use
of the Bonferroni test is generally preferred over Scheffe's test because it is easier to reject the null hypothesis
for each pair of differences. Be aware, though, that there are circumstances where Scheffe's test or Tukey's
test may be a better one to use.2 There is considerable debate over which test is most appropriate for specific
research situations. Consult an advanced research design text to learn more about them.
Computer Applications
The SPSS ANOVA printouts resemble the source tables you have seen in this chapter. In general, SPSS's subprograms
ONEWAY and ANOVA use the same terminology used in this chapter. Table 10.2 shows the SPSS data list
for the problem we just completed. VAR00001 is the type of class with day coded as 1, night coded as 2, and
Saturday coded as 3. VAR00002 is the satisfaction rating.
To run the one-way analysis of variance, click on the menu bar as follows:
In the Dialog box, find VAR00002 on the left, click on it, and use the upper arrow button to place it in the
dependent list. Now, on the left, click on VAR00001 and move it into the factor list.
Table 10.2
Now click on the post hoc button. From the list of tests, we will select the Scheffé test. Click on this and then
click continue. Now click the options button and click on descriptive. We do not have to run the descriptive
statistics for this problem, but it is generally a useful option to run. Click continue and when the Dialog box
reappears, click ok.
In Table 10.3, the output for this run is reproduced. Note that first are the descriptive statistics that we had
opted to include. This is followed by the ANOVA source table.
The Scheffé results are found in Table 10.4. Significant mean differences are highlighted with an asterisk.
Below that, two homogeneous subsets are identified. Subset 1 contains the means for Groups 2 and 3,
indicating no population differences in satisfaction when night students are compared to Saturday students.
Subset 2 indicates no population differences between day and Saturday students. Note the absence of
a subset containing the means of Group 1 and Group 2. This parallels our earlier finding that there is a
significant difference between day and night attendees.
Table 10.3 Oneway
The SAS
Click as before:
Enter the data just as in Table 10.2, with A being the category (either 1, 2, or 3) and B being the satisfaction
rating. Then click
(Shortcut: The second icon from the right at the top of the page, p = 05, also clicks you into one-way ANOVA.)
Table 10.4 Post Hoc Tests
In the Dialog box, move B to the Dependent Variable box and A to the Independent Variable box. At the
bottom of the Dialog box is a button labeled means. Click it, and a new Dialog box will open.
It is here that you will select the post hoc test or tests you wish to run. Click on A in the box labeled Main Effects
(on the left-hand side of the screen). A menu of post hoc tests will appear. You may pick whatever test you
want. In this example, you would want to highlight Scheffe's Multiple Comparison Method. Then click on the
add button just above the effects/methods box, and the Scheffe test will be listed in that box. If Scheffe is all
you want, click the ok button.
To add additional tests, do not click ok. Instead, click on the down arrow icon to the right of the box under
Comparison Method, to be found near the top of the screen. The menu of tests will appear. Click on the test
to be added, and the menu will disappear with the test you just selected listed in the box below Comparison
Method. Again click on A on the Main Effects box and then click on the Add button. The new test is now found
in the effects/methods box under the Scheffe test. To add more tests, go back to the down arrow icon and
click it. Repeat the procedure by highlighting the third test you want done and follow the same procedure as
above.
When done, click ok to go back to the first Dialog box, and click ok again to run the ANOVA. The output will
appear on the screen as in Tables 10.5 (ANOVA) and 10.6 (Scheffe).
Table 10.5 SAS Analysis of Variance Output
Table 10.6 SAS Post Hoc Test (Scheffe's Test)
Excel
As with the t test in Excel, the data are entered in columns, with column A being the day class, column B
being the night class, and column C being the Saturday class (see Figure 10.1). Then click
Highlight the input range as before. In this case, $A$1:$C$10 should appear in the box, and make sure the
Grouped by Columns button is indicated. Click ok. The results also appear in Figure 10.1. No post hoc tests
are available with this routine.
TWO-WAY ANALYSIS OF VARIANCE
We have only scratched the surface of ANOVA, covering topics most germane to social scientists who will
generally use nonexperimental techniques.
Figure 10.1 Excel Data and Printout for ANOVA
ANOVA: Single Factor
In behavioral science applications, where experiments are more common, a wide variety of elaborate
advanced analysis of variance techniques are used by researchers.
In two-way analysis of variance, we extend our model to include a second independent variable. Suppose
in our training example, we also wanted to see if subjects’ satisfaction with the course could be explained by
the nature of their specialization within the firm. Suppose half of the subjects were employees whose missions
emphasized advertising strategies and issues, whereas the other half of the subjects were primarily media
consultants who were less issue oriented and more concerned with communications skills and use of mass
media. Do the two groups respond similarly to the scheduling of the course?
Two-way analysis of variance Analysis of variance that includes a second independent
variable.
Suppose the following pattern emerges:
Assuming that F is significant when comparing our six means, we see the same trend as before: Day
and Saturday classes are preferred to night classes for both advertising strategists and media specialists.
However, we note the consistently higher satisfaction of the strategists. Their satisfaction is higher than that of their
media colleagues, regardless of the time of the class. We would need further research to determine why these
differences exist; perhaps media people need a different kind of training program.
On the other hand, suppose the following pattern of scores had emerged.
Here, the strategists are unaffected by class scheduling. The sample means are the same, 3.6, for each class.
We could assume no difference in the populations for the advertising strategists.
For the media specialists, assuming F is significant, not only are there differences in satisfaction from class to
class, but the day session is most popular. Also, both day and Saturday mean scores for the media specialists
are greater than those for advertising strategists. Only the night class is unpopular among the media people.
Such a situation suggests many possible follow-up studies.
Also, there are situations where the relationship between the dependent variable and one of the independent
variables is a function of the levels of the other independent variable. These are known as interaction
effects, and two-way ANOVA also measures such effects.
Interaction effects Situations where the relationship between the dependent variable
and one of the independent variables is a function of the levels of the other independent
variable.
Social scientists have recently been doing greater numbers of experiments than in the past. For example, one
now finds structured simulations of decision-making processes under conditions varied by the experimenter
for different groups. Accordingly, two-way ANOVA techniques may soon become as important in social
science as they are in so many other fields.
Conclusion
We have now seen analysis of variance used in both experimental and nonexperimental contexts and
discussed Scheffe's test as well as several other procedures related to ANOVA. In Chapter 14, we will
demonstrate another context in which this procedure is applied, namely, as part of the regression procedure.
At that point, we shall have completed the process of weaving together the two statistical strands—descriptive
and inferential—that have run through this text.
Chapter 10: Summary of Major Formulas
Exercises
Exercise 10.1
Here is one of the example problems from Chapter 9. Perform ANOVA. Explain why your conclusion differs
from the one reached with the two-sample t test.
Exercise 10.2
A scale measuring support for increased gun control legislation (0 = no support to 5 = most support) is
administered to random samples of urban, suburban, and rural voters. Do the three population means differ
in terms of support? If so, do Scheffe's test. What do you conclude?
Exercise 10.3
The same attitude scale used in the previous exercise is applied to random samples of urban police officers,
white-collar workers, and blue-collar workers. Do ANOVA, and if the null hypothesis can be rejected, do
Scheffe's test. What do you conclude?
Exercise 10.4
For a random sample of Democrats in the U.S. House of Representatives, liberalism scores were compared
by region of the country. Find F and, if statistically significant, do Scheffe's test. What do you conclude?
Exercise 10.5
For a random sample of 25 physicians, scores measuring support for a national health care insurance
program were compared by medical specialization. Complete the resulting source table. What are your
conclusions?
Exercise 10.6
From an experiment measuring the cognitive learning of students with learning disabilities by various teaching
strategies, ANOVA was run. Complete the source table and state your conclusions.
Exercise 10.7
The same study as in Exercise 10.6 was done with students without learning disabilities. Complete and
interpret the source table.
Exercise 10.8
An ANOVA was run comparing political rights by GDP/capita (High, Medium, Low, Very Low). What
conclusions do you reach?
Exercise 10.9
In a certain study, scores earned on a graduate school admissions test were compared between those who
had no formal preparation and those taking courses designed to prepare students for the exam. Interpret the
printout with regard to statistical significance.
DEPENDENT VARIABLE: SCORE
Exercise 10.10
Here is a study of pilot reaction times under two different instrument panel configurations. Interpret the
printout.
Exercise 10.11
Using the computer, enter and run the following ANOVA data:
A.
The pro-life scale by urban vs. rural. Compare your results to those presented earlier in this
chapter (pp. 333–337). Are they the same?
B.
The satisfaction score by type of class data (pp. 339–340 and p. 342). Compare the ANOVA
results. Also, in addition to running Scheffe's test, run Bonferroni's and a few other available post
hoc tests. Do any of the results differ?
C.
The data presented in Exercise 10.2. Compare the results to those you calculated earlier.
D.
The data presented in Exercise 10.3. Also compare to your earlier findings.
Notes
1. If you have read the previous chapter, you are already familiar with the use of the F table. However, note
that here we have tables for the .01 and .001 levels as well as for the .05 level. In the last chapter, we used
only the .05 level table.
2. J. Neter, W. Wasserman, and M. Kutner, Applied Linear Statistical Models: Regression, Analysis of
Variance, and Experimental Designs (Homewood, IL: Irwin, 1985), p. 584.
KEY CONCEPTS
http://dx.doi.org/10.4135/9781412985987.n10
Probability Distributions and One-Sample z
and t Tests
In: Statistics for the Social Sciences
By: R. Mark Sirkin
Pub. Date: 2011
Access Date: February 28, 2022
Publishing Company: SAGE Publications, Inc.
City: Thousand Oaks
Print ISBN: 9781412905466
Online ISBN: 9781412985987
DOI: https://dx.doi.org/10.4135/9781412985987
Print pages: 225-270
© 2006 SAGE Publications, Inc. All Rights Reserved.
This PDF has been generated from SAGE Research Methods. Please note that the pagination of the
online version will vary from the pagination of the print book.
Probability Distributions and One-Sample z and t Tests
PROLOGUE
This chapter is essentially an extension of the previous chapter, except that in addition to presenting several
new topics, we go back to explain the theory that underlies the z formula. What are we actually doing when
we calculate z?
It isn't absolutely necessary to understand the underlying theory simply to work these formulas, any more than
it is necessary to understand the physics and chemistry of the cooking process in order to cook a meal. Now
that you are hungry, let's get back to statistics. You (or your computer) can always calculate z, t, F, and so
on. Still, it is useful to be able to visualize and understand what is actually happening when you do these tests.
Introduction
In the previous chapter, we discussed tests of significance and used the one-sample z formula to illustrate
the entire procedure.
In this chapter, we turn our attention to the origin of that formula and explain what is taking place when we use
it. It is possible to use any statistical formula without such an understanding and simply plug in the numbers
as we did in Chapter 7. But if you can visualize what is going on, your understanding will be enhanced since
essentially the same process takes place no matter what test of significance is being performed.
The z test of significance is based on a frequency distribution known as a normal distribution and is applied
to a specific normal curve called the sampling distribution of sample means. When we perform this test, we
are actually taking the given sample statistics and population parameters and locating them on the sampling
distribution of sample means. In fact, all tests of significance do the same thing, even though their sampling
distributions differ from one another.
At the end of this chapter, we will discuss the one-sample t test, and in subsequent chapters, the other
commonly used tests of significance will be presented. We will begin by discussing an even simpler z formula
than the one in the previous chapter and introducing the concept of a normal distribution.
Normal Distributions
Normal distributions are a family of frequency distributions that, when graphed, often resemble bells.
Generally, they are represented in the form found in Figure 8.1. Such a curve has three major characteristics:
(a) it is unimodal, (b) it is symmetric, and (c) it is asymptotic to the x-axis. This last characteristic, which
becomes very important later on, means that the tails of the curve get closer and closer to the x-axis but
never reach it. Consequently, no matter how far you get from the mean on the x-axis, there will always be a
tail continuing beyond that point. The tail never ends, at least not in the mathematical model.
Normal distributions A family of frequency distributions that, when graphed, often
resemble bells.
The reason for using the term normal distribution is that certain characteristics such as human height, weight,
or intelligence graph in frequency distributions approximating this bell-shaped pattern. The term normal is a
bit misleading, however. Not everything in nature is normally distributed, so nonnormal distributions are not
really abnormal.
Figure 8.1 The Mathematical Model of the Normal Distribution
Figure 8.2 IQ as a Normal Distribution
Let us use measures of intelligence—intelligence quotients or IQs—to illustrate the normal distribution (see
Figure 8.2). IQs are designed to range from 0 to 200 with a mean of 100. Standard deviations vary by the
age of the subjects but are usually around 13 or 14; for computational ease we will use 10. Note that, unlike
the mathematical model in Figure 8.1, in Figure 8.2 the tails do end—at IQs of 0 and 200, respectively. This
indicates that there are natural upper and lower limits in the actual measurement tool being utilized. For
example, say it is a test with 200 questions and someone's IQ is the number of correct answers. Two geniuses
each score 200, yet, if one of the two is really twice as smart as the other and the IQ test had 400 questions
instead of 200, the one genius would score 400 whereas the other would only get 200.
Suppose Sandra takes this IQ test, and her score, which we designate as x, is 115. We would like to know
what proportion of people would likely have higher IQs than Sandra's and what proportion would have lower
IQs. (Sorry, this procedure cannot tell us how many will have exactly Sandra's IQ.) It turns out that the area
under the curve corresponds to the proportion of people with a particular characteristic. The total area under
the curve (1.00 proportion) accounts for all (100%) people. Since a normal curve is symmetric, .50 proportion
(50%) of the area under the curve falls below the mean, and .50 proportion falls above the mean. Thus, half
of all people should have IQs below 100, and half should have IQs above it. This proportion of the area also
pertains to the probability of randomly selecting a person with a particular characteristic. Since .50 proportion
of the area of the curve is below the mean, there is also a .50 probability of randomly selecting a person
whose IQ is below 100. Likewise, there is a .50 probability of randomly selecting someone whose IQ is greater
than 100.
In Figure 8.3, we have added Sandra's IQ, x = 115. The proportion of people with an IQ greater than 115 is
the shaded area under the curve in the right tail, from x = 115 to x = 200. The proportion of people with IQs
below 115 is represented by the remaining unshaded area under the curve, from the left of x = 115 to x =
0. Note that the unshaded area has two components, the .50 proportion of IQs less than 100 plus the area
under the curve from 100 to 115.
To find these areas, we use a table of areas under the normal curve that applies to all normal distributions.
To use the table, we begin by calculating what is called a standard score, which is universally designated
by the letter z. (Its relationship to our z test of significance will be explained later.) To calculate z, we recast
the distance from the mean (100) to the value of x we are studying (Sandra's IQ of 115), expressed as
standard deviation units. The distance from the mean of Sandra's IQ is x − µ = 115 − 100 = 15; her IQ is 15
points greater than the mean. To convert that distance into standard deviation units, we divide by the size of
the standard deviation, σ = 10, which was given to us. Since 15/10 = 1.5, we know that Sandra's IQ is 1.5
standard deviation units from the mean. Expressing the whole process in a single equation, we get
z = (x − µ)/σ = (115 − 100)/10 = 15/10 = 1.5
Figure 8.3
Standard score A score universally designated by the letter z, in which that score is
expressed in standard deviation units from the mean.
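The standard score calculation is simple enough to express as a one-line function. The sketch below is illustrative only (the function name is hypothetical) and uses the mean of 100 and standard deviation of 10 adopted in the text.

```python
def standard_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd


print(standard_score(115, 100, 10))  # Sandra's IQ of 115
```

A score below the mean gives a negative z, which tells us only that the score lies to the left of the mean.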
We now go to Table 8.1 and see that each page has three blocks of figures, and, in turn, each block has three
columns: A, B, C. Column A lists a value of z. Column B shows the area under the curve from the mean out to
that specified value of z (note the graphs above each column). Column C shows the area in the tail beyond z.
Note that the area in column B plus the area in column C always add to .5000. Also note that as z gets larger,
the area in column B gets larger, and the area in column C gets smaller.
The bigger the z, the smaller the tail.
We find our z of 1.5 at the bottom of the center block on the second page of Table 8.1. Note that at z = 1.5,
the number in column B is .4332, and the number in column C is .0668. The area in the tail, corresponding
to the proportion of people with IQs greater than 115, is the number in column C, .0668—only 6.68% have an
IQ higher than Sandra's. To find the proportion with IQs less than 115, we take the area in column B and add
to it .50, the proportion with IQs below the mean: .4332 + .5000 = .9332.
Thus, .9332 proportion of people has IQs below 115. Sandra's pretty bright!
If Sandra is bright, George is not; his IQ is only 80. Let us find the proportions of area above and below 80
(see Figure 8.4). First we find z: z = (x − µ)/σ = (80 − 100)/10 = −20/10 = −2.00.
Here, z is negative since x is less than µ. In Table 8.1, we will look for the absolute value of our z (2.00) but
use the graphs at the bottom of the page to see that now the shaded areas are to the left of the mean. In the
center of the left-hand block of the third page of the table, you will find z = 2.00. The column B area is .4772,
and the column C area is .0228. Accordingly, only .0228 proportion of people has IQs below George's. To find
the other proportion, add the column B figure to the .50 whose IQs exceed the mean: .4772 + .5000 = .9772
proportion. Poor George! Nearly 98% of all IQs exceed his.
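The column B and column C areas read from Table 8.1 can also be computed directly. The following sketch uses `statistics.NormalDist` from the Python standard library in place of the printed table; the function name `table_areas` is hypothetical.

```python
from statistics import NormalDist


def table_areas(z):
    """Return (column B, column C) for a z score.

    Column B is the area from the mean out to |z|; column C is the
    area in the tail beyond |z|. The two always sum to .5000.
    """
    tail = 1 - NormalDist().cdf(abs(z))
    return 0.5 - tail, tail


# Sandra: z = 1.5, so the proportion below her is .50 + column B.
b, c = table_areas(1.5)
print(round(b, 4), round(c, 4), round(0.5 + b, 4))

# George: z = -2.00, so the proportion above him is .50 + column B.
b, c = table_areas(-2.0)
print(round(b, 4), round(c, 4), round(0.5 + b, 4))
```

As with the printed table, the function works from the absolute value of z; whether column C is the area above or below the score depends on the sign of z.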
Table 8.1 Proportions of Area Under Standard Normal Curve
Now we can solve a mystery: the source of the critical values of z used in the previous chapter. Although
a much more detailed table of areas under the normal curve is needed to find all the critical values of z
presented in Chapter 7, we can find approximate values using Table 8.1.
Figure 8.4
We simply specify a particular tail area, find it in column C, and read the corresponding z score from column
A.
For example, to find the z value for the one-tailed .05 level, we see that in column C, the two closest
approximations are .0505 (z = 1.64) and .0495 (z = 1.65). Actually, the mean of the two z values, 1.645, is the
true critical value, but when we round to two decimal places, we get 1.65. Likewise, the closest tail area to
.01 is actually .0099, and its z is 2.33. For the .001 level, we find .0010 occurring three times, where z is 3.08,
3.09, and 3.10. If this table had used more decimal places, we would see that 3.09 would be our best-fitting
value of z.
For a two-tailed test, we need two tails whose areas, added together, equal the probability level desired. We
take one half of the probability level as the area to locate in column C. For instance, at p = .05, we need
two equal tails whose areas add to .05, so .05/2 = .025. Finding .0250 in column C, we see that z = 1.96.
(Unfortunately, Table 8.1 is not complete enough for us to find the other two-tailed values of z.)
Figures 8.5 and 8.6 summarize the relationships between z values and probabilities for one and two tails,
respectively.
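The table lookups described above can be cross-checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for a more detailed table of areas under the normal curve; the function names are illustrative and do not come from the text.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def critical_z_one_tailed(alpha):
    """z score that leaves a single tail of area `alpha` beyond it."""
    return STANDARD_NORMAL.inv_cdf(1.0 - alpha)

def critical_z_two_tailed(alpha):
    """z score that leaves alpha / 2 in each of the two tails."""
    return STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)

for level in (0.05, 0.01, 0.001):
    one_tail = critical_z_one_tailed(level)
    two_tail = critical_z_two_tailed(level)
    print(f"{level}: one-tailed {one_tail:.3f}, two-tailed {two_tail:.3f}")
```

Rounded to two decimal places, the one-tailed results agree with the 1.65, 2.33, and 3.09 read from Table 8.1, and the two-tailed result at the .05 level agrees with 1.96.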
THE ONE-SAMPLE z TEST FOR STATISTICAL SIGNIFICANCE
The formula we used in the previous chapter to test for statistical significance,
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$, is
really a reworking of the formula we have been using in this chapter, except that it is applied to a specific type
of frequency distribution known as the sampling distribution of sample means. This sampling distribution
is the frequency distribution that would be obtained from calculating the means of all theoretically possible
samples of a designated size that could be drawn from a given population. To illustrate this definition, let us
imagine a population of 5 people with scores ranging from 1 to 5. Using a sample size of n = 3, what are the
different combinations of scores that we would obtain in selecting all possible samples of 3 people out of the
original 5? If we consider the order of selection of the people (e.g., 5-4-3 is one sample, 4-5-3 is another,
and 3-4-5 yet another), there are actually 60 possible samples (5 × 4 × 3) that could be drawn from this population. If we
disregard the order of selection, there are only 10 possible combinations of scores.
Figure 8.5 Critical Values of z—One Tail
Figure 8.6 Critical Values of z—Two Tails
Sampling distribution of sample means The frequency distribution that would be
obtained from calculating the means of all theoretically possible samples of a designated
size that could be drawn from a given population.
We will work with these 10 samples of 3 people with the 3 scores shown in the columns. For each of these
samples, we can calculate the mean.
The frequency distribution of all these possible sample means is as follows:
We graph this frequency distribution, which will approximate the sampling distribution of sample means, as a
histogram, as shown in Figure 8.7.
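The counting and averaging described above can be reproduced with a short enumeration. This is a sketch under one assumption that the text only implies: the 5 people hold the scores 1, 2, 3, 4, and 5, one each. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations

population = [1, 2, 3, 4, 5]   # assumed: one person at each score from 1 to 5
n = 3                          # sample size used in the text

ordered_samples = list(permutations(population, n))     # order of selection counts
unordered_samples = list(combinations(population, n))   # order of selection ignored

# Keep the sample means as exact fractions so equal means are tallied together.
sample_means = [Fraction(sum(sample), n) for sample in unordered_samples]
frequency = Counter(sample_means)

print(len(ordered_samples), len(unordered_samples))
for mean in sorted(frequency):
    print(f"mean {float(mean):.2f}  frequency {frequency[mean]}")
```

Under that assumption, the tally is symmetric about 3, which is also the population mean.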
Note that the histogram in Figure 8.7 has a pattern that begins to resemble a normal curve in the sense that it
is unimodal and symmetric. In fact, if either the population from which the samples are drawn is itself normally
distributed along the variable x or the samples drawn
from that population are sufficiently large, the sampling distribution of sample means will also be a normal
distribution. This characteristic will prove to be very useful to us.
Figure 8.7 The Sampling Distribution of Sample Means Obtained From 10 Samples of Size n = 3 From
a Specified Population
The formal statement of these characteristics comes from what is known as the central limit theorem and a
related theorem known as the law of large numbers.
The Central Limit Theorem
According to the central limit theorem, if repeated random samples of size n are drawn from a population
that is normally distributed along some variable x, having a mean µ and a standard deviation σ, then the
sampling distribution of all theoretically possible sample means will be a normal distribution having a mean µ
and a standard deviation $\sigma / \sqrt{n}$.¹
Central limit theorem If repeated random samples of size n are drawn from a population
that is normally distributed along some variable x, having a mean μ and a standard
deviation σ, then the sampling distribution of all theoretically possible sample means will
be a normal distribution having a mean μ and a standard deviation $\sigma / \sqrt{n}$.
If, for a particular population, some variable (x) is normally distributed and we draw a series of samples of a
predetermined size (n) from that population, the central limit theorem tells us that
The sampling distribution of sample means will be a normal distribution.
The mean of the sampling distribution of sample means, the mean of all the sample means
(designated $\mu_{\bar{x}}$), will be equal to µ, the mean of the population from which the samples were
originally drawn.
The standard deviation of the sampling distribution of sample means will be equal to
$\sigma / \sqrt{n}$, the standard deviation of the population from which the samples were drawn divided by the
square root of the size of the samples that we were drawing. This standard deviation of our
sampling distribution, $\sigma / \sqrt{n}$, is also called the standard error of the mean or, more often, just
the standard error, and is sometimes designated with the symbol $\sigma_{\bar{x}}$.
Standard error of the mean or the standard error The standard deviation of the
sampling distribution, designated with the symbol $\sigma_{\bar{x}}$.
To illustrate: Suppose your state or province mandates a series of competency examinations in reading, math,
and so on, to be taken by all schoolchildren in selected grades. Assume that on the math competency exam
given to all ninth-graders, the mean, µ, is 70, and the standard deviation, σ, is 20. We wish to compare these
results to those we would have found if we had studied random samples of ninth-graders rather than the
whole population. How would the sample means be distributed? Assume we select random samples of size
n = 100.
According to the central limit theorem, the sampling distribution of sample means would be a normal
curve with a mean $\mu_{\bar{x}} = \mu = 70$
and a standard deviation (the standard error of the mean)
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 20 / \sqrt{100} = 2.0$. This is graphed in Figure 8.8.
The implications of Figure 8.8 are immense since they suggest that the overwhelming majority of theoretically
possible sample means are going to fall very close to the original population's mean. There are tails to the
curve in Figure 8.8 above and below the mean, and they are asymptotic to the x-axis, but they are so tiny that
they are barely perceivable.
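In fact, we know the following about normal distributions: 68.27% of the area under the normal curve (thus
68.27% of all sample means) falls between $\mu - 1\sigma_{\bar{x}}$ and $\mu + 1\sigma_{\bar{x}}$, that is, one standard deviation above and
below the mean. Since in this particular example, the standard deviation of the sampling distribution (the
standard error of the mean) is 2.0, we are observing that 68.27% of all sample means will lie between 68
and 72. We also know that 95.45% of the area under the curve (95.45% of the sample means) falls between
$\mu - 2\sigma_{\bar{x}}$ and $\mu + 2\sigma_{\bar{x}}$, and
99.73% of the area falls between $\mu - 3\sigma_{\bar{x}}$ and $\mu + 3\sigma_{\bar{x}}$.
In this particular example:
68.27% of sample means fall between 70 − 2 = 68 and 70 + 2 = 72.
95.45% of sample means fall between 70 − 4 = 66 and 70 + 4 = 74.
99.73% of sample means fall between 70 − 6 = 64 and 70 + 6 = 76.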
Figure 8.8 The Actual Appearance of a Sampling Distribution of Sample Means for Samples n = 100
Drawn From a Population Where µ = 70 and σ = 20
With 99.73% of sample means falling between 64 and 76, the remaining 0.27% of all sample means must
fall below 64 or above 76. Out of 1000 samples, only 2 or 3 would have means below 64 or above 76—an
extremely improbable, but statistically possible, event.
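The standard error and the three ranges in this example can be verified with a few lines of code. This is a minimal sketch using the values from the competency exam example (µ = 70, σ = 20, n = 100); `share_within` is a hypothetical helper name, and the normal curve comes from Python's standard library rather than from Table 8.1.

```python
import math
from statistics import NormalDist

mu, sigma, n = 70.0, 20.0, 100           # values from the competency exam example
standard_error = sigma / math.sqrt(n)    # sigma / sqrt(n)

# Sampling distribution of sample means predicted by the central limit theorem.
sampling_distribution = NormalDist(mu=mu, sigma=standard_error)

def share_within(k):
    """Proportion of sample means within k standard errors of mu."""
    low = mu - k * standard_error
    high = mu + k * standard_error
    return sampling_distribution.cdf(high) - sampling_distribution.cdf(low)

for k in (1, 2, 3):
    low, high = mu - k * standard_error, mu + k * standard_error
    print(f"{low:.0f} to {high:.0f}: {100 * share_within(k):.2f}% of sample means")
```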
The central limit theorem becomes most useful to us when we are given, as before, µ and σ for a population
and data about one random sample presumably drawn from that population. In that case, we are asking how
likely it is that, from the given population, we could draw a random sample whose $\bar{x}$ differs from µ by as much
as we observe. If the likelihood is low, we might better conclude that the $\bar{x}$
reflects a population with a mean
other than the µ of the population from which we initially assumed that the sample was drawn.
This brings us to the kind of problem presented here and in the previous chapter. Suppose in our competency
exam example, we have a random sample of 100 ninth-graders who had been enrolled in a 6-week-long
course to prepare for this examination. This sample's mean score is 73. Thus, H0: µall = µcourse. Assuming
advance data on which to make a directionality assumption, we could write H1: µcourse > µall.
We calculate z using the formula from Chapter 7 and compare zobtained to the critical values of z:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{73 - 70}{20 / \sqrt{100}} = \frac{3}{2} = 1.50$
Since 1.50 < 1.65, we cannot reject H0. The course appears to have been unsuccessful.
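The calculation for the preparatory course example can be written out as a short function. This is a sketch only: `one_sample_z` is an illustrative name, and the critical value 1.65 is the rounded one-tailed .05 value used in the text.

```python
import math

def one_sample_z(sample_mean, mu, sigma, n):
    """z = (sample mean - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

z_obtained = one_sample_z(sample_mean=73, mu=70, sigma=20, n=100)
z_critical = 1.65   # one-tailed, .05 level, as rounded in the text

# Directional H1 (course mean is higher), so only the upper tail is examined.
reject_h0 = z_obtained > z_critical
print(z_obtained, reject_h0)
```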
With this formula, we are finding our sample's $\bar{x}$
on the x-axis of the sampling distribution of sample means,
finding the distance from that $\bar{x}$
to the mean of the sampling distribution, and converting that distance into
standard deviation units (standard scores) based on the standard deviation of the sampling distribution. To
see how this works, let us start by converting our simple z formula from symbols to words.
$z = \frac{\text{value of the variable} - \text{mean of the frequency distribution}}{\text{standard deviation of the frequency distribution}}$
Remember that the frequency distribution is the sampling distribution of sample means. Now note the
following:
The value of the variable whose distance from the mean (of the sampling distribution) we wish to
find is the $\bar{x}$ for those taking the preparatory course.
The mean of our frequency distribution (the sampling distribution), $\mu_{\bar{x}}$, according to the central limit
theorem, equals µ for all the ninth-graders.
The standard deviation of our frequency distribution, which for a sampling distribution is called the
standard error, according to the central limit theorem equals σ for all the ninth-graders divided by
the square root of our sample size: $\sigma / \sqrt{n}$.
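Substituting this information from the central limit theorem for the words in our equation, we get
$z = \frac{\bar{x}_{\text{course}} - \mu_{\text{all}}}{\sigma_{\text{all}} / \sqrt{n}}$
Simplifying,
$z = \frac{(\bar{x}_{\text{course}} - \mu_{\text{all}}) \sqrt{n}}{\sigma_{\text{all}}}$
Or, in general terms,
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$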
Thus, the formula for the one-sample z test of significance is really a recasting of the basic formula
z = (x − µ)/σ to apply to the sampling distribution. The central limit theorem enables us to find a z value on the
sampling distribution from data pertaining to the population and the sample. This is illustrated in Figure 8.9,
which shows our sampling distribution (not drawn to actual scale) and its components.
We now see why a directional H1 is dubbed a one-tailed H1—we only make use of one tail on the
sampling distribution. Without a directionality assumption, we would move out from the mean of the sampling
distribution toward both the left and the right, examine the size of both of the tails by comparing the absolute
value of zobtained to zcritical, and pay the price of needing a larger zobtained than is needed when using
only one tail.
Figure 8.9 The Sampling Distribution of Sample Means for Samples n = 100 Based on Competency
Exam Data (Hypothetical)
Review
Before we proceed, let us review the fact that in using the central limit theorem, we are working with three
separate frequency distributions: the population, the sample, and the sampling distribution of sample means.
We are given information about the first two distributions. The central limit theorem then enables us to take
data from those two distributions and make use of the properties of the sampling distribution. We know the
following:
The frequency distribution for variable x for some population. We assume that this distribution is
normal. We know its mean µ and its standard deviation σ.
The frequency distribution of a particular random sample that we have drawn. We know its size n
and its mean $\bar{x}$. The variable x in our sample is the same variable x in our population.
The sampling distribution of sample means. (You never see this distribution; you just make use
of it!) There exists a separate sampling distribution of sample means for each possible sample
size (each n). For any given n, this represents the frequency distribution of all possible sample
means from all possible samples drawn randomly from that population whose mean and standard
deviation along variable x are μ and σ, respectively.
For the specific sample that we have drawn, our sample mean $\bar{x}$ will be one point (one value of $\bar{x}$) on that
sampling distribution. The central limit theorem enables us to find the distance from the sample's mean to the
population's mean, expressed as standard errors or standard deviations of the sampling distribution. Since
the sampling distribution is a normal curve, we may determine the probability of our sample's $\bar{x}$ reflecting a
population whose mean is µ and, based on that probability, either retain or reject our null hypothesis.
The Normality Assumption
Note that the central limit theorem assumes that the population we are studying is normally distributed along
variable x. This is called the normality assumption. If it is true, the sampling distribution of sample means
will be a normal distribution, and we may make use of the z formula to test for statistical significance. (Note
that nothing requires that our sample be normally distributed.) What if we know that the population is not
normally distributed, or more realistically, what if we have no basis for making a normality assumption about
the population in the first place? Even in such cases, if our sample's size is large enough, we may still be able
to make use of the central limit theorem due to the law of large numbers.
Normality assumption The assumption that the population being studied is normally
distributed along variable x.
The law of large numbers states that if the size of the sample, n, is sufficiently large (no less than 30;
preferably no less than 50), then the central limit theorem will apply even if the population is not normally
distributed along variable x. Thus, if n is large enough, the population distribution need not be normal and
could, in fact, be anything: skewed, bimodal, trimodal, anything. When n is large enough, we relax the
normality assumption for our population, but the sampling distribution of sample means will still be a normal
curve, and the central limit theorem will still apply.
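The claim that sample means from a non-normal population still behave like a normal curve when n is large can be checked by simulation. The sketch below makes several assumptions that are not in the text: the population is exponential (strongly skewed, with mean 1 and standard deviation 1), n = 50, and 5,000 repeated samples are drawn with a fixed random seed.

```python
import math
import random
import statistics

random.seed(0)   # fixed seed so the run is repeatable

n = 50               # sample size; the text treats 50 or more as comfortably large
num_samples = 5000   # number of repeated samples (assumed for illustration)

def draw_sample_mean():
    """Mean of one random sample from a skewed (exponential) population."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

sample_means = [draw_sample_mean() for _ in range(num_samples)]

mean_of_means = statistics.fmean(sample_means)   # should sit near mu = 1
sd_of_means = statistics.pstdev(sample_means)    # should sit near sigma / sqrt(n)
predicted_se = 1.0 / math.sqrt(n)

# Share of sample means within 1.96 predicted standard errors of mu.
within = sum(abs(m - 1.0) <= 1.96 * predicted_se for m in sample_means) / num_samples

print(round(mean_of_means, 3), round(sd_of_means, 3), round(predicted_se, 3), round(within, 3))
```

If the sampling distribution is close to normal, the last figure should land near .95 even though the population itself is far from normal.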
Law of large numbers A law that states that if the size of the sample, n, is sufficiently
large (no less than 30; preferably no less than 50), then the central limit theorem will apply
even if the population is not normally distributed along variable x.
How large must n be to relax the normality assumption? The figures given in the above theorem are rather
arbitrary; other sources give other cutoffs. In fact, in some texts of statistics for psychology (which often only
requires small samples or small experimental and control groups), the minimum sample size is as low as 15,
but that is probably too low. Perhaps we ought to put it this way:
In social science survey research, our sample sizes are generally large enough to make use of the law of
large numbers. This is particularly fortunate, since in actual research all too often, the issue of the normality
assumption is not adequately addressed.
Let us look at an example. At a small liberal arts college, an index of support for civil liberties, ranging from
0 (least supportive) to 10 (most supportive), was pilot tested on the entire student body, yielding a mean of
7.5 and a standard deviation of 1.5. A random sample of 100 students who had been the direct victims or
close relatives of victims of serious crimes was also given the test, and their mean score was 7.2. May we
conclude that for all similar victims, the support score for civil liberties differs in general from the population of
all students at that college?
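Our hypotheses are H0: µvictims = µall students and H1: µvictims ≠ µall students.
Since n = 100, we may relax the normality assumption for the population. We have all necessary data for a
one-sample z test:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{7.2 - 7.5}{1.5 / \sqrt{100}} = \frac{-0.3}{0.15} = -2.00$
We compare the absolute value of z to the two-tailed zcritical values: 2.00 exceeds 1.96 (.05 level) but not 2.58 (.01 level).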
We conclude, therefore, that the civil liberties support score for all serious crime victims at this college is lower
than the average for the college as a whole (p < .05). (The sampling distribution is shown in Figure 8.10.)
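For a nondirectional test like this one, software typically reports a two-tailed probability directly rather than a comparison against tabled critical values. The following sketch uses the civil liberties figures from the text and Python's standard-library normal distribution; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma = 7.5, 1.5         # entire student body
sample_mean, n = 7.2, 100    # sample of crime victims and close relatives

z_obtained = (sample_mean - mu) / (sigma / math.sqrt(n))

# Two-tailed probability: area beyond |z| in both tails of the normal curve.
p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z_obtained)))

print(round(z_obtained, 2), round(p_two_tailed, 4))
```

The resulting probability falls between .01 and .05, consistent with reporting p < .05.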
THE ONE-SAMPLE t TEST
We know that to do the one-sample z test, we need to know or be able to hypothesize two population
parameters, μ and σ. What could we do in the unlikely event that we know μ but not σ? Initially, the sample
standard deviation s was assumed to be a good estimate of σ, so s was substituted when σ was unknown.
Once the sample mean $\bar{x}$ had been calculated, s was generated using the definitional formula
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
or one of several possible computational formulas.
Figure 8.10 The Sampling Distribution of Sample Means for Civil Liberties Support Scores
However, it was discovered that, particularly when the sample size n was small, calculating z with s produced
inaccurate conclusions. A British quality control expert² working for a Dublin brewery discovered that by
calculating a different estimate of σ from sample data, a better test of significance could be developed. This
new best “unbiased” estimate of σ, which we designate $\hat{\sigma}$
(read as “sigma-hat,” because sigma is wearing a
hat), is created when we substitute n − 1 for n in the standard deviation formula:
$\hat{\sigma} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Sigma-hat ($\hat{\sigma}$) An estimate of sigma.
This new test of significance is called the t test to differentiate it from the z test; note that the formulas are the
same except that $\hat{\sigma}$ is substituted for σ:
$t = \frac{\bar{x} - \mu}{\hat{\sigma} / \sqrt{n}}$
When n is large, the substitution of $\hat{\sigma}$ for s makes very little difference, but as n gets smaller, $\hat{\sigma}$ and s diverge,
causing a likewise divergence between t (using $\hat{\sigma}$) and z (using s to estimate σ).
t test A test of significance similar to the z test but used when the population's standard
deviation is unknown.
The sampling distributions of t and z also differ. In the case of the z test, the sampling distribution of sample
means is a normal curve. Since the value of each sample mean can be expressed as a z score (indicating the
distance $\bar{x}$ is from µ in terms of standard errors), the sampling distribution of sample means is the same as
the distribution of all the z scores from all the theoretically possible sample means that make up the sampling
distribution. Thus, the sampling distribution of z (all the zs from those sample means) is also a normal curve.
If we take the same means in our sampling distribution and calculate t scores instead, the sampling
distribution of t (all the ts from those sample means) is a normal distribution only when the sample sizes are
above 120. As the sample sizes fall below 120 (give or take), the sampling distribution begins to be flatter
than a normal curve (say platykurtic, if you want to impress your friends). When the curve is flatter than a
normal curve at its peak, the tails are also larger than those of a normal curve. (The effect is similar to pushing
a balloon down from its top, thus displacing the air to the sides as we press.) As n gets smaller, the peak
of the sampling distribution gets flatter, and its tails get larger. The important consequence is that as n gets
smaller, we must go ever-greater distances away from the mean to get a tail area equal to .05 proportion of
the area under the curve.
Figure 8.11 shows the changes in the critical values of t (.05 level, one-tailed) as n decreases. At n = 121,
the sampling distribution is nearly a normal curve, and tcritical is 1.658, only slightly larger than zcritical (.05
level, one-tailed), which is 1.65 (actually 1.645 before rounding). In fact, as n increases above 121, the critical
values of z and t get ever closer to each other. As n gets extremely large, approaching infinity as a limit, the
critical values of z and t become the same. However, as n drops below 121, the tcritical value gets larger. In
other words, we have to go farther out to get a tail with .05 of the area under the curve in it. By the time n =
21, tcritical has gone from 1.658 to 1.725, and at n = 6, tcritical has risen to 2.015.
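The critical values quoted above can be reproduced without a printed table. The sketch below is one possible approach and not the method used to build Table 8.2: it integrates the density of the t distribution numerically (Simpson's rule) and then searches by bisection for the t whose upper tail area is .05. All function names are illustrative.

```python
import math

def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_constant = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                    - 0.5 * math.log(df * math.pi))
    return math.exp(log_constant - (df + 1) / 2 * math.log1p(t * t / df))

def upper_tail_area(t, df, steps=2000):
    """Area under the t curve beyond t (t > 0), via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3

def t_critical_one_tailed(alpha, df):
    """Smallest t whose upper tail area is no larger than alpha (bisection)."""
    low, high = 0.0, 50.0
    for _ in range(60):
        mid = (low + high) / 2
        if upper_tail_area(mid, df) > alpha:
            low = mid    # tail still too large: move farther from the mean
        else:
            high = mid
    return (low + high) / 2

for n in (121, 21, 6):
    df = n - 1
    print(n, df, round(t_critical_one_tailed(0.05, df), 3))
```

The three results should match the 1.658, 1.725, and 2.015 given in the text to within rounding.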
Figure 8.11 Changes in the Sampling Distribution of t as Sample Size Decreases
In comparing the critical values of t to those of z, bear in mind that the sampling distribution of z is always
a normal distribution, and its critical values remain constant, independent of sample size. By contrast, the
critical values of t depend on sample size. At best, when n is large, the critical values of t are almost as
small as those of z. But as n decreases, the critical values of t get larger, making it harder to reject the null
hypothesis. Thus, if we know σ and can therefore do a one-sample z test, we always do the z test, not the
t test. We do the one-sample t test only if σ is unknown and we must estimate it with $\hat{\sigma}$. In fact, when n
gets large (say 30 or more), many statisticians advocate the use of the z test, with s substituting for σ in the
formula, rather than the use of the t test. But with a smaller n where σ is unknown, we must always do the t
test and retain the normality assumption for the population.
Degrees of Freedom
Note that in Figure 8.11 under each of the three reported ns—121, 21, and 6—is another number labeled df,
which is one less than n—120, 20, 5. As we learned in Chapter 7, the df stands for degrees of freedom, a
number we generate to make use of a table of critical t values. In the case of the one-sample t test, df = n − 1.
Degrees of freedom A number that is generated to make use of a table of critical values.
We need to find the degrees of freedom in order to find the critical values of t against which we compare our
obtained t. As noted, the sampling distribution of t changes from a normal curve as n decreases, and thus the
critical values change as well. As we see in Figure 8.11, at 120 degrees of freedom (n = 121) we need a t of
1.658 to have one tail on the sampling distribution with a .05 area. By the time degrees of freedom drops to
5, we need a t of 2.015.
Tables of critical values for all tests of significance beyond the z test require that we first calculate a degrees-of-freedom figure to make use of the table. Why find df? Why not base the tables on n as we did the sampling
distributions in Figure 8.11? The simplest answer to the question is that there are several formulas that
generate t scores, not just the one presented in this chapter. Likewise, for each of the different t formulas,
there is a separate degrees-of-freedom formula. The formula df= n − 1 is used only for the one-sample t
test presented here. In the next chapter, we will discuss some of the other t formulas, each having its own
degrees-of-freedom formula, but all making use of a common table of critical values of t. Without degrees of
freedom, we would need a separate table of critical values for each separate formula.
There is a mathematical meaning to the concept of degrees of freedom, having to do with how many numbers
are free to vary in a formula. For instance, if x1 + x2 + x3 = 10 and you let any two of the scores vary (say
we make x1 = 2 and x2 = 5), then the remaining value of x is fixed. Since 2 + 5 = 7 and 7 + x3 = 10, once x1
and x2 are determined, x3 can take on only one value. In this case, x3 = 3. So three unknowns adding up to
a fixed sum has two degrees of freedom. Only two of the unknowns are free to vary. At the level of applied
statistics that we cover in this book, it is not really necessary to know the definition of degrees of freedom
to make use of the concept. So we will simply move on, referring the curious to more advanced texts. For
our purposes, degrees of freedom are simply numbers that we must calculate to make use of critical values
tables for t and the other tests of significance to be encountered later.
THE t TABLE
The table of critical values of t, found in Table 8.2 and also in the Appendix, is simple to use. At the top are
levels of significance for a one-tailed test (a directional H1), and below it are the corresponding levels for a
two-tailed test. Thus, tcritical one-tailed at the .10 level is the same as tcritical two-tailed at the .20 level. The
one-tailed probability levels are always one half of the corresponding two-tailed levels.
Since we always begin by comparing tobtained to tcritical at the .05 level, we first isolate the appropriate .05
column for whichever H1 (one-tailed or two-tailed) we are using. Then we go down the df column on the far
left until we come to the number that we found in the df formula. Noting the values highlighted earlier in Figure
8.11, if df is 120, we go all the way down the df column until we find 120. We then move across the row until
we are under the .05 level for a one-tailed test. At the intersection of the 120 row and the .05 column, we find
the critical value of t, 1.658. Likewise, in the same .05 column, we find the tcritical of 1.725 in the row for 20
degrees of freedom and 2.015 in the row for 5 degrees of freedom.
Table 8.2 Distribution of t
Under the df = 120 row, we note the symbol for infinity (an eight that has gone down for the count). In this
case, “infinity” is any df above 120. Here, the sampling distribution has become (or is in the act of becoming)
a more perfect normal curve. Note that at this point, there is no difference between the critical values of t and
those of z.
If you cannot find the df that you need in the table, go to the nearest critical value that makes it harder to
reject H0. In the case of Table 8.2, move up to the next lower df. For instance, if the df is 35, a number not
presented in the table, go up to 30 df and use those critical values. Thus, if the t obtained in a one-tailed test
at 35 df were 1.7, you would compare it to the .05 critical value at 30 degrees of freedom, 1.697. Since 1.7 is
greater than 1.697, you would reject H0. What if the obtained t were 1.690? That would be less (barely) than
1.697, and you could not reject H0 using this table. However, you would be right in assuming that had you
known tcritical at 35 degrees of freedom, there would be a good chance that it would be equal to or less than
your t of 1.690. In this case, consult a book of tables for statisticians, which would have a more complete t
table than the one used here.³
If the obtained t exceeds tcritical at the .05 level, you then compare it to the critical values to the right of
the .05 column. Following the same procedure used for the z test, you make your probability statement by
seeing how many critical values are less than the obtained t. The only difference is that in the t table, there
are critical values for levels other than .05, .01, and .001. Suppose at 60 df, we obtain a t value of 3.0 using a
nondirectional H1. Going down the .05 level column, for the two-tailed test, we see at the 60 df row a critical
value of 2.000. We can reject H0. We then compare our 3.0 obtained t to the critical values to the right of the
2.000 we exceeded. We exceed the 2.390 (.02 level) and the 2.660 (.01 level) but not the 3.460 critical value
at the .001 level. Thus, we report p < .01. Had this been a one-tailed test, we would be reporting p < .005.
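The procedure of moving rightward across a row of the t table can be expressed as a small lookup. This is a sketch that hard-codes only the four two-tailed critical values at 60 df quoted in the text; the function name and the dictionary layout are illustrative assumptions.

```python
# Two-tailed critical values of t at 60 df, as quoted in the text.
CRITICAL_T_60_DF = {0.05: 2.000, 0.02: 2.390, 0.01: 2.660, 0.001: 3.460}

def probability_statement(t_obtained, critical_values, one_tailed=False):
    """Smallest tabled level whose critical value the obtained t exceeds.

    Returns None when even the .05 critical value is not exceeded.
    For a one-tailed test, this sketch assumes the difference is in the
    predicted direction and halves the two-tailed level.
    """
    exceeded = [level for level, critical in critical_values.items()
                if abs(t_obtained) > critical]
    if not exceeded:
        return None
    level = min(exceeded)
    return level / 2 if one_tailed else level

print(probability_statement(3.0, CRITICAL_T_60_DF))                    # two-tailed
print(probability_statement(3.0, CRITICAL_T_60_DF, one_tailed=True))   # one-tailed
```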
AN ALTERNATIVE t FORMULA
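We have been using the following formulas:
$t = \frac{\bar{x} - \mu}{\hat{\sigma} / \sqrt{n}}$
where
$\hat{\sigma} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Suppose you did not have access to $\hat{\sigma}$ but did know the original sample standard deviation of
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$.
Rather than recalculating, you may make use of the s in a modified t formula:
$t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}}$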
Again, remember that if you get the standard deviation from either a computer printout or a calculator with
a standard deviation function built in, consult the appropriate manual to find out how that standard deviation
was calculated to determine whether you have an s or a $\hat{\sigma}$. Then pick the appropriate t formula to use.
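The distinction between the two standard deviations, and the equivalence of the two t formulas, can be seen in a short sketch. The scores and the hypothesized mean below are invented purely for illustration. In Python's standard library, `statistics.pstdev` divides by n (the s of this chapter) and `statistics.stdev` divides by n − 1 (sigma-hat).

```python
import math
import statistics

scores = [72, 65, 80, 77, 69, 74, 71, 78]   # hypothetical sample, illustration only
mu = 70                                      # hypothesized population mean (assumed)

n = len(scores)
x_bar = statistics.fmean(scores)

s = statistics.pstdev(scores)           # divides by n
sigma_hat = statistics.stdev(scores)    # divides by n - 1

t_from_sigma_hat = (x_bar - mu) / (sigma_hat / math.sqrt(n))
t_from_s = (x_bar - mu) / (s / math.sqrt(n - 1))

print(round(s, 4), round(sigma_hat, 4))
print(round(t_from_sigma_hat, 4), round(t_from_s, 4))
```

Both formulas give the same t because sigma-hat divided by the square root of n equals s divided by the square root of n − 1.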
A z TEST FOR PROPORTIONS
The formula for the z test for sample means may be modified to test the difference in proportions in a sample
compared to the equivalent difference in proportions in a population.
z test for proportions A z test designed to test whether the difference between
proportions in a sample reflects the difference in the population.
For instance, suppose that in some small community, minorities (people of African or
Hispanic origin) make up 20% (.20 proportion) of the population. The new school superintendent suspects
that minorities are underrepresented among the 100 teachers in her public school system since there are only
15 minority faculty, 15% or a .15 proportion. For such a problem,
$z = \frac{P_s - P_p}{\sqrt{P_p Q_p / n}}$
where
Ps = the proportion of minorities in the sample = .15,
Pp = the proportion of minorities in the population = .20,
Qp = the proportion of nonminorities in the population = 1 – Pp = 1 – .20 = .80,
n = the size of the sample or group being studied = 100.
Here,
$z = \frac{.15 - .20}{\sqrt{(.20)(.80) / 100}} = \frac{-.05}{.04} = -1.25$
Using the directional zcritical at the .05 level of 1.65, we cannot reject H0 since 1.25 < 1.65. We cannot
conclude that minorities are underrepresented among the teachers.
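The superintendent's problem can be worked through in code as follows. This is a minimal sketch using the proportions from the text; `z_for_proportion` is an illustrative name, and 1.65 is the rounded directional critical value at the .05 level.

```python
import math

def z_for_proportion(p_sample, p_population, n):
    """z = (Ps - Pp) / sqrt(Pp * Qp / n), where Qp = 1 - Pp."""
    q_population = 1.0 - p_population
    standard_error = math.sqrt(p_population * q_population / n)
    return (p_sample - p_population) / standard_error

z_obtained = z_for_proportion(p_sample=0.15, p_population=0.20, n=100)

# Directional H1 (underrepresentation), so the obtained z is expected to be negative.
reject_h0 = z_obtained < -1.65
print(round(z_obtained, 2), reject_h0)
```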
Interval Estimation
We have already discussed the fact that if we did not know σ, our best estimate of it from sample data would
be $\hat{\sigma}$. Likewise, our best estimate of µ would be $\bar{x}$. Suppose we wanted to estimate μ from $\bar{x}$. We know from
the sampling distribution of sample means that not all sample means will be exactly equal to μ, even though
our one $\bar{x}$ is the best estimate of that parameter. With interval estimation, we establish an interval of scores
called a confidence interval, and we state with a certain level of confidence that the μ will fall within the limits
of the interval we created.
Interval estimation: An interval of scores that is established, within which a population's
mean (or another parameter) is likely to fall, when that parameter is being estimated from
sample data.
Confidence interval (for means and proportions): An estimated interval within which we
are "confident," based on sampling theory, that the parameter we are trying to estimate
will fall.
For instance, we can see from our sampling distribution that with no directionality assumption, 95% of all
sample means lie within ±1.96 standard errors of µ. Likewise, 99% lie within ±2.58 standard errors of µ.
The number of standard errors corresponds to the two-tailed critical values of z at the .05 and .01 levels,
respectively. Also, 99.9% of all sample means lie within ±3.29 standard errors of µ, and 3.29 is the
critical z at the .001 level. Suppose we would be satisfied to find the interval within which 95% of all sample
means would fall. We build an interval around $\bar{x}$ and assume that µ will fall within that interval. We
call this the 95% confidence interval, our level of confidence corresponding to the percentage of all means
falling within the interval. Thus, we are 95% confident that µ will lie in the interval between
$\bar{x} - 1.96\sigma_{\bar{x}}$ and $\bar{x} + 1.96\sigma_{\bar{x}}$.
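The critical values 1.96, 2.58, and 3.29 need not be taken on faith. As a rough sketch, they can be recovered from the standard normal distribution with Python's standard library; the helper name `two_tailed_z_critical` is ours, not the text's.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def two_tailed_z_critical(confidence):
    """Return z such that the central `confidence` share of a normal
    distribution lies within plus or minus z standard errors of the mean."""
    alpha = 1 - confidence
    # Half of alpha sits in each tail, so look up the 1 - alpha/2 quantile.
    return standard_normal.inv_cdf(1 - alpha / 2)

for confidence in (0.95, 0.99, 0.999):
    z = two_tailed_z_critical(confidence)
    print(f"{confidence:.1%} confidence -> z critical = {z:.2f}")
```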
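Remembering that we already know that $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, we find our confidence interval by the following
formula:
$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
Suppose $\bar{x}$ = 55, σ = 10, and n = 64. The upper limit of our interval would be
$$55 + 1.96\left(\frac{10}{\sqrt{64}}\right) = 55 + 1.96(1.25) = 55 + 2.45 = 57.45$$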
Our lower limit would be
$$55 - 1.96\left(\frac{10}{\sqrt{64}}\right) = 55 - 2.45 = 52.55$$
Thus, the 95% confidence interval for estimating µ is 52.55 to 57.45. Because 95% of all sample means
fall within 1.96 standard errors of µ, we are 95% confident that µ will be between 52.55 and 57.45.
Suppose we wanted a greater level of confidence, 99%. The price we would pay for it would be a wider
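confidence interval. For our upper limit,
$$55 + 2.58\left(\frac{10}{\sqrt{64}}\right) = 55 + 2.58(1.25) \approx 55 + 3.23 = 58.23$$
and for our lower limit
$$55 - 2.58\left(\frac{10}{\sqrt{64}}\right) \approx 55 - 3.23 = 51.77$$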
We are 99% confident that µ falls between 51.77 and 58.23.
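Both intervals can be reproduced with a short Python sketch. The function name `z_confidence_interval` is an illustrative choice of ours, and the critical values 1.96 and 2.58 are the ones used in the text.

```python
import math

def z_confidence_interval(sample_mean, sigma, n, z_critical):
    """Confidence interval for a mean when sigma is known (illustrative helper)."""
    standard_error = sigma / math.sqrt(n)
    margin = z_critical * standard_error
    return sample_mean - margin, sample_mean + margin

# Example from the text: sample mean 55, sigma 10, n 64.
ci_95 = z_confidence_interval(55, 10, 64, z_critical=1.96)
ci_99 = z_confidence_interval(55, 10, 64, z_critical=2.58)

print("95% CI:", ci_95)
print("99% CI:", ci_99)
```

The unrounded 99% margin is 3.225, so the limits come out as 51.775 and 58.225; the text's 51.77 and 58.23 follow from rounding the margin to 3.23 before adding and subtracting.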
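If σ is unknown, which is generally the case, we may follow exactly the same procedure with the t test using
either of the following formulas:
$$\bar{x} \pm t\,\frac{\hat{s}}{\sqrt{n}} \qquad \text{or} \qquad \bar{x} \pm t\,\frac{s}{\sqrt{n-1}}$$
where $\hat{s}$ is the sample standard deviation computed with n − 1 in the denominator and s is the sample
standard deviation computed with n in the denominator.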
The nondirectional critical value of t at df = n − 1 at the .05 level would be used for a 95% confidence interval, the
critical value of t at the .01 level would be used for a 99% confidence interval, and so on.
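As a hedged illustration, the sketch below builds a 95% confidence interval with t in two ways: dividing the n − 1 standard deviation by √n, and dividing the n-denominator standard deviation by √(n − 1). We assume these are the two forms intended; they yield the same margin. The ten scores are hypothetical data invented for this example, and 2.262 is the standard table value of the nondirectional critical t at df = 9 and the .05 level.

```python
import math
import statistics

# Hypothetical sample of n = 10 scores (not from the text).
scores = [52, 61, 48, 57, 55, 63, 50, 58, 54, 52]
n = len(scores)
sample_mean = statistics.mean(scores)

# Nondirectional critical t at df = n - 1 = 9, .05 level (from a t table).
t_critical = 2.262

# Form 1: standard deviation with n - 1 in the denominator, divided by sqrt(n).
s_hat = statistics.stdev(scores)
margin_a = t_critical * s_hat / math.sqrt(n)

# Form 2: standard deviation with n in the denominator, divided by sqrt(n - 1).
s = statistics.pstdev(scores)
margin_b = t_critical * s / math.sqrt(n - 1)

lower, upper = sample_mean - margin_a, sample_mean + margin_a
print(f"margins: {margin_a:.4f} and {margin_b:.4f}")
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

The two margins agree apart from floating-point rounding, which is why either formula may be used.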
Confidence Intervals for Proportions
Imagine that you are the campaign manager of a presidential candidate in a two-person race. A telephone
survey of 900 voters gives your candidate 53% of the vote. How well does that percentage
reflect the electorate? You seek to construct a 95% confidence interval around the .53 proportion that
your candidate received in the sample. The formula we use is
$$P_s \pm 1.96\sqrt{\frac{P_p Q_p}{n}}$$
The 1.96 is the appropriate critical value of z, in this case at the .05 level, since we chose a 95% confidence
interval.
Ps = your candidate's proportion of support in the sample = .53.
Pp = your candidate's proportion of support in the population, which we estimate with Ps. Thus,
Ps = Pp = .53.
Qp = the opponent's proportion of support in the population, which we estimate from the sample
by subtracting Ps from 1. Thus Qp = 1 – Pp = 1 – Ps = 1 – .53 = .47.
n = the number of cases, which must equal or exceed 5/min(Ps, 1 – Ps), that is, 5 divided by
whichever is smaller, Ps or 1 – Ps.
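Thus, the upper limit of our 95% confidence interval is
$$.53 + 1.96\sqrt{\frac{(.53)(.47)}{900}} = .53 + 1.96(.0166) \approx .53 + .03 = .56$$
The lower limit would be
$$.53 - 1.96\sqrt{\frac{(.53)(.47)}{900}} \approx .53 - .03 = .50$$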
So our confidence interval ranges from .50 to .56. In percentages, this is 50% to 56%, a range of 6
percentage points, so we report that according to our poll, our candidate has 53% support, with a margin of
error of plus or minus 3 percentage points. Our candidate could receive as little as 50% or as much as 56%. If we
had chosen a 99% confidence interval, our confidence interval would be wider and so would the margin of
error reported.
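The polling example can be worked through in a few lines of Python. This is a minimal sketch: the function name `proportion_confidence_interval` is ours, the population proportion is estimated by the sample proportion as described above, and the sample-size check mirrors the 5/min(Ps, 1 – Ps) condition.

```python
import math

def proportion_confidence_interval(p_sample, n, z_critical=1.96):
    """Confidence interval for a proportion, estimating Pp with Ps.

    Returns (lower, upper, margin). Illustrative helper, not a library call.
    """
    # Minimum-n condition stated in the text.
    if n < 5 / min(p_sample, 1 - p_sample):
        raise ValueError("n is too small for this approximation")
    margin = z_critical * math.sqrt(p_sample * (1 - p_sample) / n)
    return p_sample - margin, p_sample + margin, margin

lower, upper, margin = proportion_confidence_interval(0.53, 900)
print(f"95% CI: {lower:.2f} to {upper:.2f}, margin of error: {margin:.2f}")
```

Passing `z_critical=2.58` instead would give the wider 99% interval mentioned above.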
When n is small or if we want to be particularly sure of our estimate, it is safer to make a more conservative
estimate of Pp than to use Ps. Here, we assume that each candidate has half of the vote. Thus, Pp = .50
and Qp = .50. This will yield a wider confidence interval than any other estimate of Pp would generate. By
widening the interval, we minimize the risk in making our estimate. Suppose Ps were .53, but n = 150 instead
of 900. We estimate both Pp and Qp as .50. For the upper limit,
$$.53 + 1.96\sqrt{\frac{(.50)(.50)}{150}} = .53 + 1.96(.0408) \approx .53 + .08 = .61$$
And our lower limit would be
$$.53 - 1.96\sqrt{\frac{(.50)(.50)}{150}} \approx .53 - .08 = .45$$
Here, our 95% confidence interval ranges from .45 to .61, and we have an 8 percentage point margin of error.
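To see why .50 is the conservative choice, we can compare the product Pp × Qp across possible values of Pp. The sketch below, with names of our own choosing, first recomputes the margin for n = 150 and then scans proportions from .01 to .99 to locate the value that makes the product, and hence the interval, largest.

```python
import math

def conservative_margin(n, z_critical=1.96):
    """Margin of error assuming Pp = Qp = .50 (illustrative helper)."""
    return z_critical * math.sqrt(0.50 * 0.50 / n)

margin_150 = conservative_margin(150)
interval_150 = (0.53 - margin_150, 0.53 + margin_150)
print(f"margin: {margin_150:.2f}, interval: "
      f"{interval_150[0]:.2f} to {interval_150[1]:.2f}")

# Pp * Qp for Pp = .01, .02, ..., .99; the largest product gives the widest interval.
products = {p / 100: (p / 100) * (1 - p / 100) for p in range(1, 100)}
widest_p = max(products, key=products.get)
print(f"Pp * Qp is largest at Pp = {widest_p}")
```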
More on Probability
Suppose we have developed a scale to be used in a survey. This scale measures the extent to which the
respondent is aware of and knowledgeable about HIV and AIDS. Assume that the scale ranges from a low of
0 to a high of 100 and that scores are normally distributed with a mean of 50 and a standard deviation of 15.
We may thus apply the z formula to this distribution to determine the proportion of cases falling within a
specified range of scores. Let us use this scale to extend our discussion of probability.
We begin by outlining some new notation and defining it.
P(A) = the probability of outcome A occurring.
P(A or B) = the probability of either outcome A or outcome B occurring.
P(A and B) = the probability of both outcomes A and B occurring jointly.
P(A | B) = the probability of outcome A occurring given that outcome B has already occurred
(conditional probability).
Conditional probability: The probability of outcome A occurring given that outcome B has
already occurred.
Let us illustrate using our AIDS awareness scale. Suppose outcome A is selecting an
individual with an AIDS awareness score of 70 or above. Since x = 70, µ = 50, and σ = 15, we apply the z
formula and find z = (70 − 50)/15 = 1.33. Looking at Table 8.1, column C, we find a probability of .0918.
Thus, P(A) = .0918. Let outcome B be selecting someone with an AIDS awareness score of 40 or below.
Plugging into the z formula, we obtain a z of −0.66 and find a probability of .2546 from Table 8.1. Thus, P(B)
= .2546.
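These two probabilities can also be obtained without a table. The following is a sketch using Python's standard library; the variable names are ours. Because it works with unrounded z scores, it should give roughly .091 and .252 rather than the table values of .0918 and .2546, which rest on z taken to two decimal places (1.33 and −0.66).

```python
from statistics import NormalDist

# AIDS awareness scale: normal with mean 50 and standard deviation 15.
awareness = NormalDist(mu=50, sigma=15)

# z scores for the two cutoffs, unrounded.
z_a = (70 - 50) / 15
z_b = (40 - 50) / 15

# Outcome A: a score of 70 or above (upper tail).
p_a = 1 - awareness.cdf(70)

# Outcome B: a score of 40 or below (lower tail).
p_b = awareness.cdf(40)

print(f"z_A = {z_a:.2f}, P(A) = {p_a:.4f}")
print(f"z_B = {z_b:.2f}, P(B) = {p_b:.4f}")
```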
The Addition Rule
Suppose we would like to know the probability of selecting someone whose AIDS awareness score is either
70 or above or 40 or below, P(A or B). Now outcomes A and B are known as mutually exclusive outcomes.
If one has an AIDS awareness score of 70 or above, one cannot also have an AIDS awareness score of 40 or below.
When outcomes are mutually exclusive, a rule known as the addition rule tells us that
$$P(A \text{ or } B) = P(A) + P(B)$$
In this case,
$$P(A \text{ or } B) = .0918 + .2546 = .3464$$
Addition rule: A rule by which when outcomes are mutually exclusive, the probability of
either outcome occurring is the sum of the probabilities of each outcome occurring.
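The addition rule can be illustrated with a brief Python sketch, again with variable names of our own. It adds the table values from the text, then repeats the sum with unrounded probabilities and checks it against the complement, the probability of a score strictly between 40 and 70.

```python
from statistics import NormalDist

# Sum of the table values used in the text.
table_sum = 0.0918 + 0.2546
print(f"P(A or B) from the table values: {table_sum:.4f}")

# The same rule with unrounded probabilities.
awareness = NormalDist(mu=50, sigma=15)
p_a = 1 - awareness.cdf(70)   # score of 70 or above
p_b = awareness.cdf(40)       # score of 40 or below
p_a_or_b = p_a + p_b          # addition rule for mutually exclusive outcomes

# A and B together cover everything except scores between 40 and 70.
p_between = awareness.cdf(70) - awareness.cdf(40)
print(f"P(A or B) = {p_a_or_b:.4f}, 1 - P(between) = {1 - p_between:.4f}")
```

The agreement between the two printed values in the last line reflects the fact that the two tails do not overlap, which is the condition the addition rule requires.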
I...