# Cal State Sample Proportion Questions

CHAPTER 18: Comparing Two Proportions

Lecture PowerPoint Slides

In chapter 18, we cover …

Two-sample problems: proportions

The sampling distribution of a difference between proportions

Large-sample confidence intervals for comparing proportions

Significance tests for comparing proportions

Plus four confidence intervals for comparing proportions*

Two-sample problems

Comparing two populations or two treatments is one of the most common situations encountered in statistical practice. We call such situations two-sample problems.

TWO-SAMPLE PROBLEMS

The goal of inference is to compare the responses to two treatments or to compare the characteristics of two populations.

We have a separate sample from each treatment or each population.

Two-sample problems: proportions

Suppose we want to compare the proportions of individuals with a certain characteristic in Population 1 and Population 2. Let’s call these parameters of interest $p_1$ and $p_2$. The ideal strategy is to take a separate random sample from each population and to compare the sample proportions with that characteristic.

What if we want to compare the effectiveness of Treatment 1 and Treatment 2 in a completely randomized experiment? This time, the parameters $p_1$ and $p_2$ that we want to compare are the true proportions of successful outcomes for each treatment. We use the proportions of successes in the two treatment groups to make the comparison. Here is our notation:

| Population | Population Proportion | Sample Size | Sample Proportion |
|---|---|---|---|
| 1 | $p_1$ | $n_1$ | $\hat{p}_1$ |
| 2 | $p_2$ | $n_2$ | $\hat{p}_2$ |

Sampling distribution of a difference between proportions

The Sampling Distribution of the Difference Between Proportions

Choose an SRS of size $n_1$ from Population 1 with proportion of successes $p_1$ and an independent SRS of size $n_2$ from Population 2 with proportion of successes $p_2$.

When the samples are large, the distribution of $\hat{p}_1 - \hat{p}_2$ is approximately Normal.

The mean of the sampling distribution is $p_1 - p_2$. That is, the difference between sample proportions is an unbiased estimator of the difference between population proportions.

The standard deviation of the distribution is

$$\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
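A minimal Python sketch of this standard deviation formula follows. The function name `sd_diff_proportions` and the input values (both proportions 0.5, both sample sizes 100) are hypothetical, chosen only for illustration.

```python
import math


def sd_diff_proportions(p1: float, n1: int, p2: float, n2: int) -> float:
    """Standard deviation of (p1-hat minus p2-hat) for two independent SRSs."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical population proportions and sample sizes (illustration only).
sd = sd_diff_proportions(0.5, 100, 0.5, 100)
print(round(sd, 4))
```

Because each variance term has the sample size in its denominator, larger samples from either population shrink the standard deviation of the difference.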

Large-sample confidence intervals for comparing proportions

To obtain a confidence interval, replace the population proportions $p_1$ and $p_2$ in the standard deviation by the sample proportions. The result is the standard error, the estimated standard deviation of the statistic $\hat{p}_1 - \hat{p}_2$:

$$\text{SE} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

The confidence interval has the same form we met in Chapter 16:

$$\text{estimate} \pm z^* \, \text{SE}$$


LARGE-SAMPLE CONFIDENCE INTERVAL FOR COMPARING PROPORTIONS

Draw an SRS of size $n_1$ from a large population having proportion $p_1$ of successes and draw an independent SRS of size $n_2$ from another large population having proportion $p_2$ of successes. When $n_1$ and $n_2$ are large, an approximate level C confidence interval for $p_1 - p_2$ is:

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \, \text{SE}$$

where the standard error, SE, is calculated as above, and $z^*$ is the critical value for the standard Normal density curve with area C between $-z^*$ and $z^*$.

Example

STATE: “Would you date a person of a different race?” Researchers collected data from Match.com: a random sample of 100 black males and a random sample of 100 black females were selected from the dating site, with 75 of the black males indicating their willingness to date white females and 56 of the black females indicating their willingness to date white males. How large is the difference between the proportions of black males and females who would be willing to date whites?

PLAN: Take black males to be Population 1 and black females to be Population 2. The population proportions who are willing to date whites are $p_1$ for black males and $p_2$ for black females. We want to give a confidence interval for the difference, $p_1 - p_2$.

SOLVE: We will give a 95% confidence interval for $p_1 - p_2$, the difference between the proportions of black males and black females who would be willing to date someone who is white. Look at the counts of successes and failures in the two samples: 75 and 25 for the men, 56 and 44 for the women. All four of these are greater than 10, so the large-sample method is appropriate:

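$$\hat{p}_1 = \frac{75}{100} = 0.75 \text{ (men)}, \qquad \hat{p}_2 = \frac{56}{100} = 0.56 \text{ (women)}$$

The standard error is:

$$\text{SE} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = \sqrt{\frac{(0.75)(0.25)}{100} + \frac{(0.56)(0.44)}{100}} = \sqrt{0.004339} = 0.0659$$

The 95% confidence interval is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \, \text{SE} = (0.75 - 0.56) \pm (1.96)(0.0659) = 0.19 \pm 0.13 = 0.06 \text{ to } 0.32$$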

CONCLUDE: We are 95% confident that the percent of black males willing to date whites is between 6 and 32 percentage points higher than the percent of black females who are willing to date whites on comparable Internet dating sites. Even with sample sizes of 100 in each group, the resulting confidence interval 0.06 to 0.32 is quite wide.
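The arithmetic in this example can be checked with a short Python sketch. The function name `two_prop_ci` is hypothetical; `NormalDist` from Python's standard `statistics` module supplies the critical value $z^*$ (about 1.96 for 95% confidence). This is one possible way to organize the calculation, not a prescribed method.

```python
import math
from statistics import NormalDist


def two_prop_ci(x1: int, n1: int, x2: int, n2: int, level: float = 0.95):
    """Large-sample confidence interval for p1 - p2 from success counts x1, x2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Standard error: sample proportions replace the unknown p1 and p2.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # Critical value with area `level` between -z* and z*.
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


# Counts from the example: 75 of 100 men and 56 of 100 women.
low, high = two_prop_ci(75, 100, 56, 100)
print(f"{low:.2f} to {high:.2f}")
```

The sketch keeps full precision in the intermediate steps, so the endpoints agree with the hand calculation only after rounding to two decimal places.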

Significance test for comparing proportions

$H_0: p_1 = p_2$ (the same as $H_0: p_1 - p_2 = 0$)

To do a hypothesis test, standardize the difference between the sample proportions $\hat{p}_1 - \hat{p}_2$ to get a z statistic.

If $H_0: p_1 = p_2$ is true, the two parameters are the same. We call their common value $p$. But now we need a way to estimate $p$, so it makes sense to combine the data from the two samples. This pooled (or combined) sample proportion is:

$$\hat{p} = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}}$$
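To see how pooling affects the standard error, substitute the common value $p$ for both $p_1$ and $p_2$ in the standard deviation of $\hat{p}_1 - \hat{p}_2$ and factor. The following is a sketch of that algebra, with the pooled $\hat{p}$ then standing in for the unknown $p$:

```latex
\sqrt{\frac{p(1 - p)}{n_1} + \frac{p(1 - p)}{n_2}}
  = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
  \;\approx\;
  \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
```

Dividing $\hat{p}_1 - \hat{p}_2$ by this quantity standardizes the difference under $H_0$.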


SIGNIFICANCE TEST FOR COMPARING TWO PROPORTIONS

Draw an SRS of size $n_1$ from a large population having proportion $p_1$ of successes and draw an independent SRS of size $n_2$ from another large population having proportion $p_2$ of successes. To test the hypothesis $H_0: p_1 = p_2$, first find the pooled proportion $\hat{p}$ of successes in both samples combined. Then compute the z statistic

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

In terms of a variable Z having the standard Normal distribution, the P-value for a test of $H_0$ against:

$H_a: p_1 > p_2$ is $P(Z \ge z)$

$H_a: p_1 < p_2$ is $P(Z \le z)$

$H_a: p_1 \ne p_2$ is $2 \times P(Z \ge |z|)$

Example

STATE: A political event was remembered by about 31% of those surveyed, despite the fact that it never occurred. The event was viewed by and. The event was falsely remembered as having occurred by 212 of the 616 surveyed participants who categorized themselves as progressive and by 7 of the 49 participants who categorized themselves as conservative. How strong is the evidence that a larger proportion of progressives than conservatives have a false memory of this event?

PLAN: Call the population proportions $p_1$ for progressives and $p_2$ for conservatives. Our hypotheses:

$$H_0: p_1 = p_2$$

$$H_a: p_1 > p_2$$


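SOLVE: The sample proportions who falsely remembered the event are:

$$\hat{p}_1 = \frac{212}{616} = 0.344 \text{ (progressives)}$$

$$\hat{p}_2 = \frac{7}{49} = 0.143 \text{ (conservatives)}$$

$$\hat{p} = \frac{212 + 7}{616 + 49} = \frac{219}{665} = 0.329 \text{ (combined)}$$

So, our z-statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.344 - 0.143}{\sqrt{(0.329)(1 - 0.329)\left(\frac{1}{616} + \frac{1}{49}\right)}} = \frac{0.201}{0.0697} = 2.88$$

Software tells us that P = 0.00199.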

CONCLUDE: There is strong evidence (P < 0.0025) that, among this
survey’s audience, progressives are more likely than conservatives
to have a false memory.
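The pooled z test used in this example can be sketched in Python as follows. The function name `two_prop_z_test` and its `alternative` argument are hypothetical names introduced for illustration; the Normal probabilities come from `NormalDist` in the standard `statistics` module. Because the code does not round the proportions before dividing, its z and P-value can differ slightly in the last digit from the hand calculation, which uses values rounded to three decimals.

```python
import math
from statistics import NormalDist


def two_prop_z_test(x1: int, n1: int, x2: int, n2: int,
                    alternative: str = "greater"):
    """Pooled two-proportion z test of H0: p1 = p2; returns (z, P-value)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled proportion: successes in both samples over individuals in both.
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    std_normal = NormalDist()
    if alternative == "greater":      # Ha: p1 > p2
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: p1 < p2
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # Ha: p1 != p2
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Counts from the example: 212 of 616 progressives, 7 of 49 conservatives.
z, p_value = two_prop_z_test(212, 616, 7, 49)
print(f"z = {z:.2f}, P = {p_value:.5f}")
```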

Plus four confidence intervals for comparing proportions*

Once again, adding imaginary observations greatly improves the accuracy of the confidence interval.

PLUS FOUR CONFIDENCE INTERVAL FOR COMPARING PROPORTIONS

Draw independent SRSs from two large populations with population proportions of successes $p_1$ and $p_2$. To get the plus four confidence interval for the difference $p_1 - p_2$, add four imaginary observations, one success and one failure in each of the two samples. Then use the large-sample confidence interval with the new sample sizes (actual sample sizes + 2) and counts of successes (actual counts + 1). Use this interval when the sample size is at least 5 in each group, with any counts of successes and failures.
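A minimal Python sketch of the plus four interval follows. The function name `plus_four_ci` and the small counts in the usage lines (8 successes in 12 and 3 successes in 10) are hypothetical, chosen only to show the adjustment on samples too small for the large-sample method.

```python
import math
from statistics import NormalDist


def plus_four_ci(x1: int, n1: int, x2: int, n2: int, level: float = 0.95):
    """Plus four confidence interval for p1 - p2 from success counts x1, x2."""
    if n1 < 5 or n2 < 5:
        raise ValueError("each sample size should be at least 5")
    # Add one imaginary success and one imaginary failure to each sample.
    p1_tilde = (x1 + 1) / (n1 + 2)
    p2_tilde = (x2 + 1) / (n2 + 2)
    # Large-sample standard error computed from the adjusted counts and sizes.
    se = math.sqrt(p1_tilde * (1 - p1_tilde) / (n1 + 2)
                   + p2_tilde * (1 - p2_tilde) / (n2 + 2))
    z_star = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1_tilde - p2_tilde
    return diff - z_star * se, diff + z_star * se


# Hypothetical small samples (illustration only).
low, high = plus_four_ci(8, 12, 3, 10)
print(f"{low:.3f} to {high:.3f}")
```

The interval is centered at the difference of the adjusted proportions, not at the difference of the raw sample proportions.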