Cal State Sample Proportion Questions
CHAPTER 18: Comparing Two Proportions
Lecture PowerPoint Slides
In Chapter 18, we cover …
Two-sample problems: proportions
The sampling distribution of a difference between proportions
Large-sample confidence intervals for comparing proportions
Significance tests for comparing proportions
Plus four confidence intervals for comparing proportions*
Two-sample problems
Comparing two populations or two treatments is one of the most common situations encountered in statistical practice. We call such situations two-sample problems.
TWO-SAMPLE PROBLEMS
The goal of inference is to compare the responses to two treatments or to compare the characteristics of two populations.
We have a separate sample from each treatment or each population.
Two-sample problems: proportions
Suppose we want to compare the proportions of
individuals with a certain characteristic in Population 1
and Population 2. Let’s call these parameters of interest
p1 and p2. The ideal strategy is to take a separate
random sample from each population and to compare
the sample proportions with that characteristic.
What if we want to compare the effectiveness of
Treatment 1 and Treatment 2 in a completely
randomized experiment? This time, the parameters 𝑝1
and 𝑝2 that we want to compare are the true proportions
of successful outcomes for each treatment. We use the
proportions of successes in the two treatment groups to
make the comparison. Here is our notation:
Population    Population Proportion    Sample Size    Sample Proportion
1             $p_1$                    $n_1$          $\hat{p}_1$
2             $p_2$                    $n_2$          $\hat{p}_2$
Sampling distribution of a difference between proportions
The Sampling Distribution of the Difference Between Proportions
Choose an SRS of size $n_1$ from Population 1 with proportion of successes $p_1$ and an independent SRS of size $n_2$ from Population 2 with proportion of successes $p_2$.
When the samples are large, the distribution of $\hat{p}_1 - \hat{p}_2$ is approximately Normal.
The mean of the sampling distribution is $p_1 - p_2$. That is, the difference between sample proportions is an unbiased estimator of the difference between population proportions.
The standard deviation of the distribution is
$$\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
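The three facts above can be checked by simulation. The following is a minimal Python sketch and not part of the original slides; the population proportions 0.6 and 0.4, the sample sizes of 100, the number of repetitions, and the function names are all assumptions chosen only for illustration.

```python
import math
import random
import statistics


def sd_of_difference(p1, n1, p2, n2):
    """Theoretical standard deviation of p-hat1 - p-hat2 for independent SRSs."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def simulate_differences(p1, n1, p2, n2, reps, seed=0):
    """Draw `reps` pairs of independent samples; return the observed differences."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Count successes in each sample, one simulated individual at a time.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs


# Hypothetical population values, used only for illustration.
P1, N1, P2, N2 = 0.6, 100, 0.4, 100

diffs = simulate_differences(P1, N1, P2, N2, reps=2000)
theory_sd = sd_of_difference(P1, N1, P2, N2)

print("simulated mean:", round(statistics.mean(diffs), 3), "theory:", round(P1 - P2, 3))
print("simulated SD:  ", round(statistics.stdev(diffs), 3), "theory:", round(theory_sd, 3))
```

If the sampling distribution behaves as described, the simulated mean should land close to $p_1 - p_2$ and the simulated standard deviation close to the formula's value. A histogram of `diffs` would be one way to inspect the approximate Normal shape.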
Large-sample confidence intervals for comparing proportions
To obtain a confidence interval, replace the population proportions $p_1$ and $p_2$ in the standard deviation by the sample proportions. The result is the standard error (the estimated standard deviation) of the statistic $\hat{p}_1 - \hat{p}_2$:
$$\mathrm{SE} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
The confidence interval has the same form we met in Chapter 16:
estimate ± $z^*$ SE
LARGE-SAMPLE CONFIDENCE INTERVAL FOR COMPARING PROPORTIONS
Draw an SRS of size $n_1$ from a large population having proportion $p_1$ of successes and draw an independent SRS of size $n_2$ from another large population having proportion $p_2$ of successes. When $n_1$ and $n_2$ are large, an approximate level C confidence interval for $p_1 - p_2$ is:
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \, \mathrm{SE}$$
where the standard error is
$$\mathrm{SE} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$
and $z^*$ is the critical value for the standard Normal density curve with area C between $-z^*$ and $z^*$.
Example
STATE: “Would you date a person of a different race?”
Researchers collected data from Match.com: a random sample of
100 black males and a random sample of 100 black females were
selected from the dating site, with 75 of the black males indicating
their willingness to date white females and 56 of the black
females indicating their willingness to date white males. How
large is the difference between the proportions of black males
and females who would be willing to date whites?
PLAN: Take black males to be Population 1 and black females to
be Population 2. The population proportions who are willing to
date whites are 𝑝1 for black males and 𝑝2 for black females. We
want to give a confidence interval for the difference, 𝑝1 − 𝑝2 .
SOLVE: We will give a 95% confidence interval for 𝑝1 − 𝑝2 , the
difference between the proportions of black males and black
females who would be willing to date someone who is white. Look
at the counts of successes and failures in the two samples.
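The counts are 75 successes and 25 failures for the males and 56 successes and 44 failures for the females. All four of these are greater than 10, so the large-sample method is appropriate:
$\hat{p}_1 = 75/100 = 0.75$ (men)
$\hat{p}_2 = 56/100 = 0.56$ (women)
The standard error is:
$$\mathrm{SE} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = \sqrt{\frac{(0.75)(0.25)}{100} + \frac{(0.56)(0.44)}{100}} = \sqrt{0.004339} = 0.0659$$
The 95% confidence interval is
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \, \mathrm{SE} = (0.75 - 0.56) \pm (1.96)(0.0659) = 0.19 \pm 0.13 = 0.06 \text{ to } 0.32$$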
CONCLUDE: We are 95% confident that the percent of black males
willing to date whites is between 6 and 32 percentage points higher
than the percent of black females who are willing to date whites on
comparable Internet dating sites. Even with sample sizes of 100 in
each group, the resulting confidence interval 0.06 to 0.32 is quite
wide.
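The arithmetic in this example can be reproduced in a few lines. What follows is a minimal Python sketch and not the slides' own software; the function name `two_prop_ci`, its return layout, and the "at least 10" threshold used for the counts check are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample confidence interval for p1 - p2.

    x1, x2 are counts of successes; n1, n2 are sample sizes.
    Returns (difference, standard error, lower bound, upper bound).
    """
    # Assumed large-sample check: successes and failures in both samples
    # should each number at least 10.
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("counts too small for the large-sample interval")

    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

    # Critical value z* with area `conf` between -z* and z*.
    z_star = NormalDist().inv_cdf((1 + conf) / 2)

    diff = p1_hat - p2_hat
    return diff, se, diff - z_star * se, diff + z_star * se


# Counts from the example: 75 of 100 males, 56 of 100 females.
diff, se, lo, hi = two_prop_ci(75, 100, 56, 100)
print(f"SE = {se:.4f}")
print(f"95% CI: {lo:.2f} to {hi:.2f}")  # 0.06 to 0.32, as in the example
```

The sketch uses the exact critical value from `NormalDist().inv_cdf` rather than the rounded 1.96, which does not change the interval at two decimal places.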
Significance test for comparing proportions
𝐻0 : 𝑝1 = 𝑝2 (the same as 𝐻0 : 𝑝1 − 𝑝2 = 0)
To do a hypothesis test, standardize the difference between the sample proportions $\hat{p}_1 - \hat{p}_2$ to get a z statistic.
If $H_0: p_1 = p_2$ is true, the two parameters are the same. We call their common value p. But now we need a way to estimate p, so it makes sense to combine the data from the two samples. This pooled (or combined) sample proportion is:
$$\hat{p} = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}}$$
SIGNIFICANCE TEST FOR COMPARING TWO PROPORTIONS
Draw an SRS of size $n_1$ from a large population having proportion $p_1$ of successes and draw an independent SRS of size $n_2$ from another large population having proportion $p_2$ of successes. To test the hypothesis $H_0: p_1 = p_2$, first find the pooled proportion $\hat{p}$ of successes in both samples combined. Then compute the z statistic
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
In terms of a variable Z having the standard Normal distribution, the P-value for a test of $H_0$ against:
$H_a: p_1 > p_2$ is $P(Z \ge z)$
$H_a: p_1 < p_2$ is $P(Z \le z)$
$H_a: p_1 \ne p_2$ is $2 \times P(Z \ge |z|)$
Example
STATE: A political
event was remembered by about
31% of those surveyed, despite the fact that it never
occurred. The event was viewed by and. The event
was falsely remembered as having occurred by 212 of the 616 surveyed participants who categorized themselves as progressive and by 7 of the 49 participants who categorized themselves as conservative. How strong is the evidence that a larger proportion of progressives than conservatives have a false memory of this event?
PLAN: Call the population proportions 𝑝1 for
progressives and 𝑝2 for conservatives. Our
hypotheses:
𝐻0 : 𝑝1 = 𝑝2
𝐻𝑎 : 𝑝1 > 𝑝2
SOLVE: The sample proportions who falsely remembered the event are:
$\hat{p}_1 = 212/616 = 0.344$ (progressives)
$\hat{p}_2 = 7/49 = 0.143$ (conservatives)
$\hat{p} = \frac{212 + 7}{616 + 49} = \frac{219}{665} = 0.329$ (combined)
So, our z statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.344 - 0.143}{\sqrt{(0.329)(1 - 0.329)\left(\frac{1}{616} + \frac{1}{49}\right)}} = \frac{0.201}{0.0697} = 2.88$$
Software tells us that P = 0.00199.
CONCLUDE: There is strong evidence (P < 0.0025) that, among this
survey’s audience, progressives are more likely than conservatives
to have a false memory.
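The test in this example can be sketched in Python as follows. This is an illustrative sketch, not the software the slides refer to; the function name `two_prop_z_test` and the `alternative` argument with its three labels are assumptions introduced here.

```python
import math
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="greater"):
    """Pooled two-proportion z test of H0: p1 = p2.

    x1, x2 are counts of successes; n1, n2 are sample sizes.
    `alternative` is "greater" (p1 > p2), "less" (p1 < p2), or "two-sided".
    Returns (z statistic, P-value).
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2

    # Pooled proportion: all successes over all individuals.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se

    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Counts from the example: 212 of 616 progressives, 7 of 49 conservatives.
z, p_value = two_prop_z_test(212, 616, 7, 49, alternative="greater")
print(f"z = {z:.2f}, P = {p_value:.4f}")
```

Because the sketch carries unrounded proportions through the calculation, its z statistic comes out near 2.89 rather than the 2.88 obtained from the rounded values above; the P-value stays near 0.002 and the conclusion is unchanged.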
Plus four confidence intervals for comparing proportions*
Once again, adding imaginary observations greatly improves the accuracy of the confidence interval.
PLUS FOUR CONFIDENCE INTERVAL FOR COMPARING PROPORTIONS
Draw independent SRSs from two large populations with population proportions of successes $p_1$ and $p_2$. To get the plus four confidence interval for the difference $p_1 - p_2$, add four imaginary observations, one success and one failure in each of the two samples. Then use the large-sample confidence interval with the new sample sizes (actual sample sizes + 2) and counts of successes (actual counts + 1).
Use this interval when the sample size is at least 5 in each group, with any counts of successes and failures.