# STA 238 University of Toronto W10 The Goodness of Fit & Bootstrapping Discussion

1. Hi class. I have a question about the goodness of fit and bootstrapping. Assume there are two data samples, X and Y. Would it increase the power of the goodness-of-fit test if we “increase” a second data set with some proper bootstrapping?

2. I have one question about this week’s topic. When the test level is held constant and the sample size increases, the sampling error and the denominator value decrease. Will the power of the test (one minus the probability of a Type II error) increase? Why?

3. We discussed how the goodness-of-fit test uses categories, where the expected count in each category must be at least 5 and all categories must have the same probability. However, does choosing to have more categories, while maintaining our assumptions, make our test more accurate? If so, how can we know the maximum number of categories we can have to get the most accurate/precise goodness-of-fit test?

STA 238
Probability, Statistics and Data Analysis II
Professor K. H. Wong
Week 10: Mar 21/22
March 19, 2022
Learning Outcomes
By the end of this lecture, we will cover…
Hypothesis testing and confidence intervals to compare between two
populations using:
Population means (quantitative data)
Population proportions (binary and categorical data)
Adjusting the sampling distribution and test statistic based on
available information about the populations under study:
Equal and known variances
Different and known variances
Equal and unknown variances
Different and unknown variances
Some Probability Distribution Review
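Recall from probability that if $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent, then the random variable for the difference $X - Y$ is also normally distributed. In fact,
$$X - Y \sim N\left(\mu_X - \mu_Y,\ \sigma_X^2 + \sigma_Y^2\right)$$
Thus, if we have normally distributed data OR large enough samples from two distinct populations for CLT to apply for each sample mean, then we have the following (here $\overset{\cdot}{\sim}$ means "approximately distributed as"):
Data from one population $X_1, X_2, \ldots, X_n$ such that
$$\bar{X}_n \overset{\cdot}{\sim} \text{ OR } \sim N\left(\mu_X,\ \frac{\sigma_X^2}{n}\right)$$
Data from another population $Y_1, Y_2, \ldots, Y_m$ such that
$$\bar{Y}_m \overset{\cdot}{\sim} \text{ OR } \sim N\left(\mu_Y,\ \frac{\sigma_Y^2}{m}\right)$$
Then
$$\bar{X}_n - \bar{Y}_m \overset{\cdot}{\sim} \text{ OR } \sim N\left(\mu_X - \mu_Y,\ \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}\right)$$
Note if $\sigma_X^2 = \sigma_Y^2 = \sigma^2$, then the variance can be simplified to $\sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)$.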
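The variance of the difference in sample means can be checked numerically. The following is a minimal simulation sketch, not part of the lecture: the function name, the parameter values (means 5 and 3, standard deviations 2 and 3, sample sizes 20 and 30) and the seed are all hypothetical choices made for illustration.

```python
import random
import statistics

def simulate_mean_differences(mu_x, sigma_x, n, mu_y, sigma_y, m, reps, seed=0):
    """Draw `reps` pairs of independent normal samples and return Xbar - Ybar for each pair."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    diffs = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu_x, sigma_x) for _ in range(n))
        ybar = statistics.fmean(rng.gauss(mu_y, sigma_y) for _ in range(m))
        diffs.append(xbar - ybar)
    return diffs

# Hypothetical populations: X ~ N(5, 2^2) with n = 20, Y ~ N(3, 3^2) with m = 30.
diffs = simulate_mean_differences(5.0, 2.0, 20, 3.0, 3.0, 30, reps=10000)

# Theory: Var(Xbar - Ybar) = sigma_X^2 / n + sigma_Y^2 / m.
theoretical_var = 2.0**2 / 20 + 3.0**2 / 30

print("simulated mean of differences:", statistics.fmean(diffs))
print("simulated variance:", statistics.variance(diffs))
print("theoretical variance:", theoretical_var)
```

The simulated mean should land near the difference in population means, and the simulated variance near the theoretical value, up to simulation noise.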
Comparing Populations
All the confidence intervals and hypothesis tests covered involved studying one
population only (univariate analysis). What if we want to answer the following
questions in a data-driven way?
Are males more likely to be colourblind than females?
Are females more likely to be paid less than their male counterparts?
Are people native to Finland taller than people native to Thailand?
These can be addressed through hypothesis testing! While we cannot make a
concrete conclusion, we can investigate the strength of evidence supporting one
trend over the other. If we’re interested in how large these differences are (if they
exist), we can use confidence intervals to estimate these differences to some
degree of certainty.
Comparing Populations – Cases
Depending on the amount of information available to use, we can have any one of
the following four cases when comparing between populations and investigating
mean differences between them:
Both populations have equal variances, and the variance is known:

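$$\bar{X}_n - \bar{Y}_m \overset{\cdot}{\sim} N\left(\mu_X - \mu_Y,\ \sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)\right) \quad\text{and}\quad \frac{\bar{X}_n - \bar{Y}_m - (\mu_X - \mu_Y)}{\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}} \overset{\cdot}{\sim} N(0, 1)$$
Both populations have different variances, and the variances are known:
$$\bar{X}_n - \bar{Y}_m \overset{\cdot}{\sim} N\left(\mu_X - \mu_Y,\ \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}\right) \quad\text{and}\quad \frac{\bar{X}_n - \bar{Y}_m - (\mu_X - \mu_Y)}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}} \overset{\cdot}{\sim} N(0, 1)$$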
Comparing Populations – Cases where Variance Unknown
Both populations are normally distributed with equal variance, and the variance is unknown: Estimate $\sigma^2$ with pooled variance
$$S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}, \quad\text{and:}\quad \frac{\bar{X}_n - \bar{Y}_m - (\mu_X - \mu_Y)}{S_p\sqrt{\frac{1}{n} + \frac{1}{m}}} \sim T_{n+m-2}$$
Both populations are normally distributed with different variances, and the variances are unknown:
$$\frac{\bar{X}_n - \bar{Y}_m - (\mu_X - \mu_Y)}{\sqrt{\frac{S_X^2}{n} + \frac{S_Y^2}{m}}} \sim T_\gamma$$
where $\gamma$, the degrees of freedom, is estimated using Welch’s degrees of freedom method.
Welch’s Degree of Freedom Estimation
Welch’s Degrees of Freedom
The degrees of freedom $\gamma$ for the T-distribution used in comparing population means where:
$\sigma_1^2$ and $\sigma_2^2$ are different
$\sigma_1^2$ and $\sigma_2^2$ are unknown and estimated with the sample variances
is estimated with Welch’s Degrees of Freedom:
$$\gamma \doteq \frac{\left[s_1^2/n_1 + s_2^2/n_2\right]^2}{\dfrac{\left[s_1^2/n_1\right]^2}{n_1 - 1} + \dfrac{\left[s_2^2/n_2\right]^2}{n_2 - 1}}$$
Note: The degrees of freedom should always be rounded down.
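The formula above can be sketched in a few lines of Python. This is an illustrative helper rather than anything from the lecture: the function name is an assumption, and the inputs (sample standard deviations 2.0 and 3.0 with sample sizes 10 and 15) are hypothetical numbers chosen only to exercise the formula.

```python
import math

def welch_degrees_of_freedom(s1, n1, s2, n2):
    """Return (rounded-down df, unrounded gamma) from Welch's approximation.

    s1, s2 are sample standard deviations; n1, n2 are sample sizes.
    """
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    gamma = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return math.floor(gamma), gamma  # always round the degrees of freedom down

# Hypothetical inputs, for illustration only.
df, gamma = welch_degrees_of_freedom(2.0, 10, 3.0, 15)
print(f"gamma = {gamma:.4f}, degrees of freedom used = {df}")
```

Rounding down gives a T-distribution with slightly heavier tails, which is the conservative direction.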
Example
MMSA ex. 10.7 The authors of “Waiting for the Web: How Screen Color Affects Time Perception” compared subjects’ time perception based on the background color of a website being downloaded. Subjects were randomly assigned to see a blue or yellow background for identical websites. Is there evidence in the data that the colour affects time perception? The data collected was plotted using side-by-side boxplots, with the following summary statistics:

Perceived Quickness Rating

| Background | n | mean | sd |
|---|---|---|---|
| Blue | 25 | 3.67 | 1.07 |
| Yellow | 24 | 3.04 | 1.07 |
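One way to carry out this test from the summary statistics is sketched below. It assumes the equal-variance (pooled) case, which the slide does not state explicitly but which the follow-up slide on unequal variances suggests, and a two-sided alternative since the question asks whether colour "affects" perception. The function names are hypothetical, and the p-value is obtained by numerically integrating the T density with Simpson's rule so that only the standard library is needed.

```python
import math

def pooled_t_statistic(mean_x, sd_x, n, mean_y, sd_y, m):
    """Return (t, df) for the equal-variance two-sample t-test of H0: mu_X = mu_Y."""
    sp2 = ((n - 1) * sd_x**2 + (m - 1) * sd_y**2) / (n + m - 2)  # pooled variance
    se = math.sqrt(sp2) * math.sqrt(1 / n + 1 / m)
    return (mean_x - mean_y) / se, n + m - 2

def t_pdf(x, df):
    """Density of the T distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)

# Summary statistics from the table: blue first, yellow second.
t_stat, df = pooled_t_statistic(3.67, 1.07, 25, 3.04, 1.07, 24)
p_value = two_sided_p_value(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, two-sided p-value = {p_value:.4f}")
```

The p-value is then compared with the chosen significance level to decide whether the data give evidence of a difference in mean perceived quickness.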
Example – Fewer Assumptions
MMSA ex. 10.7 Generally, the more conservative test is one that includes fewer
unverified assumptions about the populations under study. When in doubt, it’s generally
better to assume unequal variances between the two populations. Let’s repeat the test
under this assumption:

Perceived Quickness Rating

| Background | n | mean | sd |
|---|---|---|---|
| Blue | 25 | 3.67 | 1.07 |
| Yellow | 24 | 3.04 | 1.07 |
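The effect of dropping the equal-variance assumption can be seen by computing both versions side by side. The sketch below uses hypothetical function names and the summary statistics from the table. Because the two sample standard deviations happen to be identical, the two test statistics coincide and only the degrees of freedom change.

```python
import math

def pooled_test(mean_x, sd_x, n, mean_y, sd_y, m):
    """Equal-variance case: return (t, df) with df = n + m - 2."""
    sp2 = ((n - 1) * sd_x**2 + (m - 1) * sd_y**2) / (n + m - 2)
    t = (mean_x - mean_y) / (math.sqrt(sp2) * math.sqrt(1 / n + 1 / m))
    return t, n + m - 2

def welch_test(mean_x, sd_x, n, mean_y, sd_y, m):
    """Unequal-variance case: return (t, df) with Welch's df rounded down."""
    vx, vy = sd_x**2 / n, sd_y**2 / m
    t = (mean_x - mean_y) / math.sqrt(vx + vy)
    gamma = (vx + vy) ** 2 / (vx**2 / (n - 1) + vy**2 / (m - 1))
    return t, math.floor(gamma)

# Blue: n = 25, mean 3.67, sd 1.07. Yellow: n = 24, mean 3.04, sd 1.07.
t_pooled, df_pooled = pooled_test(3.67, 1.07, 25, 3.04, 1.07, 24)
t_welch, df_welch = welch_test(3.67, 1.07, 25, 3.04, 1.07, 24)

print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch}")
```

With fewer degrees of freedom the reference T-distribution has heavier tails, so the Welch version is never less conservative than the pooled one for the same statistic.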
Comparing Two Population Proportions
Similarly, we have learned from the Central Limit Theorem that, for a categorical (binary) variable, the sample proportion of a particular outcome has an approximately normal distribution when sample sizes are large enough:
$$\hat{p} \overset{\cdot}{\sim} N\left(p,\ \frac{p(1-p)}{n}\right)$$
For large samples, we can compare between two proportions from independent samples using the sampling distribution:
$$\hat{p}_1 - \hat{p}_2 \overset{\cdot}{\sim} N\left(p_1 - p_2,\ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right)$$
Knowing the distribution, we can proceed as usual to…
Construct (1 − α)100% confidence intervals for the true difference in
proportions p1 − p2
Carry out statistical tests where the null hypothesis is H0 : p1 = p2 or
H0 : p1 − p2 = 0
***Statistical tests require an adjustment to p̂***
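The slide does not spell out the adjustment. A common choice, assumed here, is to replace both sample proportions in the standard error with the pooled proportion, since under H0 the two populations share a single p. The sketch below uses a hypothetical function name and made-up counts (45 of 500 versus 30 of 500) purely for illustration.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled proportion in the standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # single estimate of the common p under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts, for illustration only.
z, p_value = two_proportion_z_test(45, 500, 30, 500)
print(f"z = {z:.4f}, two-sided p-value = {p_value:.4f}")
```

Confidence intervals do not assume p1 = p2, so they keep the unpooled standard error from the sampling distribution above.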
Example
MMSA Ex. 10.13 Context: 1954 Salk polio vaccine experiment and analysis of
resulting data. Part of the experiment focused on the efficacy of the vaccine in
combating paralytic polio. Because it was thought that without a control group of
children, there would be no sound basis for assessment of the vaccine, it was decided to
administer the vaccine to one group and a placebo injection to a control group.
For ethical reasons and also because it was thought that the knowledge of vaccine
administration might have an effect on treatment and diagnosis, the experiment was
conducted in a double-blind manner. The experiment sought to determine whether the
vaccine would affect the chances that a child contracts paralytic polio, which at the time
was estimated to be 0.0003 (incidence of 30 per 100,000). The following data was
collected:

| Group | n | cases |
|---|---|---|
| Placebo | 201,229 | 110 |
| Vaccine | 200,745 | 33 |
Example
MMSA Ex. 10.13 Find the 95% confidence interval that estimates the improvement in combating paralytic polio of vaccines.

| Group | n | cases |
|---|---|---|
| Placebo | 201,229 | 110 |
| Vaccine | 200,745 | 33 |
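A sketch of the interval computation follows, assuming "improvement" means the placebo incidence minus the vaccine incidence and using the large-sample normal interval with the unpooled standard error. The function name is hypothetical; the counts come from the table.

```python
import math
from statistics import NormalDist

def difference_in_proportions_ci(x1, n1, x2, n2, level=0.95):
    """Large-sample confidence interval for p1 - p2 from two independent samples."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Unpooled standard error: no assumption that p1 = p2.
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se

# Group 1 is placebo, group 2 is vaccine, so the interval is for p_placebo - p_vaccine.
lower, upper = difference_in_proportions_ci(110, 201229, 33, 200745)
print(f"95% CI for p_placebo - p_vaccine: ({lower:.6f}, {upper:.6f})")
print(f"per 100,000 children: ({lower * 1e5:.1f}, {upper * 1e5:.1f})")
```

Expressing the bounds per 100,000 children makes them directly comparable with the incidence of 30 per 100,000 quoted in the context.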
Discussion Board Rubric
Quality of contribution

1 point
Student has substantial contribution with detailed explanations and/or clearly outlined / described process in approaching a problem, and/or
Student was involved in follow-up discussions and worked collaboratively with their peers to develop a better understanding of the concepts involved.

0 15 points
Contribution to the discussion that is dismissive and/or lacking in detail, provides no elaboration nor further discussion that fosters a collaborative learning environment.
Examples: Responses such as ‘you just need to integrate this and solve for it’ or ‘this is what I did’ or posting solutions with minimal explanations of thought process or justifications.

0 points
Student has not contributed to the weekly discussion
Contributions are off-topic or irrelevant to …
The … only consists of a solution with no explanation or justification of steps/process.
January 10, 2022