STAT 3640 WMU Determinants of Unemployment Rate in The US Project

Project Description:

* Choose any dataset of interest (raw data, not summarized data), come up with a research question, and analyze it using a descriptive or inferential method (e.g. a hypothesis test or regression). You can also start with a research question and then collect or find an appropriate dataset. Check the assumptions of the method you chose, and interpret the results for a person who does not have much statistical background.

• Note that you need to conduct the analysis on a dataset, not on a summarized table or figure from another resource. You can collect your own data, or you can find data online using the links in the next section.

Finding Dataset:

You will need to use raw data for the final project, not summarized data as you used in the mini-projects. You can collect your own dataset, or you can find one online. Below are some websites you can use to find datasets.

• Google Datasets:

• Dataset Search on Google:

• Kaggle Datasets:

https://www.kaggle.com/datasets

• Appen AI Resource Center:

https://appen.com/resources/datasets/

• UCI Machine Learning Repository:

https://archive.ics.uci.edu/ml/index.php

By:
Riley Lukomski
1. Winning percentage of home team
2. Average amount of points the home team is favored to win by
3. Cover percentage of home teams
NFL Attendance Situation
Background
• Due to Covid-19, NFL teams are restricting the number of fans
Number of teams allowing 25% capacity: 12
Number of teams allowing friends and family to attend the game: 6
Number of teams allowing 0 fans in the building: 14
Winning Percentage of the Home Team
Question 1
• Two-proportion z test at 95% confidence (α = .05)
• Null Hypothesis: p1 - p2 = 0
• Alternative Hypothesis: p1 - p2 > 0
p1 = proportion of home games won by NFL teams during the 2017-2019 seasons
p2 = proportion of home games won by NFL teams during the 2020 season

Year         Winning % at Home
2017-2019    56.1265%
2020         50.2790%

n1 = 759, n2 = 177
• Test Statistic: z = 1.2716
• p-value: normalcdf(1.2716, infinity, 0, 1) = .1018
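The normalcdf call above is an upper-tail area under the standard normal curve. As a rough sketch of the same calculation in Python (the function name upper_tail_normal is made up for this illustration, and the z value is simply taken from the slide rather than recomputed from game data):

```python
import math

def upper_tail_normal(z: float) -> float:
    """Return P(Z > z) for a standard normal Z.

    This mirrors the calculator call normalcdf(z, infinity, 0, 1).
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))

z_reported = 1.2716  # test statistic as reported on the slide (not recomputed here)
alpha = 0.05         # significance level matching "95% confidence"

p_value = upper_tail_normal(z_reported)
reject_null = p_value < alpha

print(f"one-sided p-value = {p_value:.4f}")
print("reject H0" if reject_null else "fail to reject H0")
```

With the reported z, the p-value rounds to the .1018 shown on the slide, so the decision is to fail to reject the null hypothesis.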
Conclusion and Interpretation
• Decision at 95% confidence: .1018 > .05
• We fail to reject the null hypothesis
• Because the p-value is greater than .05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that the proportion of NFL games won by the home team is lower when the number of fans is restricted.
Average Number of Points the Home Team Is Favored By
Question 2
• Two-sample t test for a difference in means at 95% confidence (α = .05)
• Null Hypothesis: m1 - m2 = 0
• Alternative Hypothesis: m1 - m2 > 0
m1 = the mean number of points home NFL teams were favored by during the 2017-2019 seasons
m2 = the mean number of points home NFL teams have been favored by during the 2020 season

Year         Average Points Home Team Was Favored By
2017-2019    2.0775
2020         1.1921

n1 = 759, n2 = 177
Observed difference in sample means: .885264
• Test statistic: t = 1.7104
• P-value: tcdf(1.7104, infinity, 934) = .0438
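The 934 degrees of freedom equal n1 + n2 - 2, which suggests a pooled-variance t test (that reading is an assumption; the slides do not say so explicitly). Below is a minimal sketch of the tcdf step using only the Python standard library. The helper names t_pdf and upper_tail_t are invented for this example, the tail area is approximated with Simpson's rule rather than computed in closed form, and the t value is taken from the slide, not recomputed from the point spreads.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail_t(t: float, df: float, steps: int = 2000) -> float:
    """Approximate P(T > t) for t >= 0, like tcdf(t, infinity, df).

    Integrates the density over [0, t] with Simpson's rule (steps must be even)
    and subtracts the result from 0.5, the area to the right of zero.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n1, n2 = 759, 177
df = n1 + n2 - 2      # 934, matching the tcdf call on the slide
t_reported = 1.7104   # test statistic as reported on the slide

p_value = upper_tail_t(t_reported, df)
print(f"df = {df}, one-sided p-value = {p_value:.4f}")
```

The result rounds to the .0438 reported above, which is below .05.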
Conclusion and Interpretation
• Decision at 95% confidence: .0438 < .05
• Reject the null hypothesis
• Because the p-value is less than .05, we reject the null hypothesis. We have sufficient evidence to say that the mean number of points by which NFL home teams are favored is lower when capacity is restricted.
How Often Home Teams Cover the Spread
Question 3
• Two-proportion z test at 95% confidence (α = .05)
• Null hypothesis: p1 - p2 = 0
• Alternative hypothesis: p1 - p2 > 0
p1 = proportion of NFL teams that cover the spread at home during the 2017-2019 seasons
p2 = proportion of NFL teams that cover the spread at home during the 2020 season

Year         Percentage of Time the Home Team Covers the Spread
2017-2019    45.9016%
2020         47.7273%

n1 = 732, n2 = 176
• Test statistic: z = -.4361483618
• P-value: normalcdf(-.4361483618, infinity, 0, 1) = .6686
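For readers who want to see where a two-proportion z statistic comes from, here is a small sketch of the pooled version of the test. The cover counts are not given on the slides; they are back-calculated here from the reported percentages and sample sizes, so treat them as an assumption. The function name two_proportion_z is also invented for this example.

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic for H0: p1 - p2 = 0."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

n1, n2 = 732, 176
# Assumed counts, recovered by rounding percentage * sample size.
covers_1 = round(0.459016 * n1)  # home covers, 2017-2019
covers_2 = round(0.477273 * n2)  # home covers, 2020

z = two_proportion_z(covers_1, n1, covers_2, n2)
p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail, since Ha is p1 - p2 > 0

print(f"z = {z:.4f}, one-sided p-value = {p_value:.4f}")
```

Under these assumed counts the sketch gives z of about -.436 and a p-value of about .6686, in line with the values reported above. The sample proportion for 2020 is actually higher than for 2017-2019, which is why the upper-tail p-value is larger than .5.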
Conclusion and Interpretation
• Decision at the 95% confidence level: .6686 > .05
• We fail to reject the null hypothesis
• Because the p-value is greater than .05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that NFL home teams playing with a restricted number of fans cover the point spread less often than home teams playing with an unrestricted number of fans.
Questions?
The End
HELLO!
Alicia Hartranft
When comparing the prices of name brand
products between Walmart and Meijer, are the
prices of Meijer’s products more than
Walmart’s?
Data Collection:
o Data was collected from both the Walmart and the Meijer on West Main St. in Kalamazoo, MI
o Prices of 30 name-brand products were recorded at Walmart, then at Meijer
Name Brand Item                                 Walmart Price (x_2)   Meijer Price (x_1)   Difference (x_1 - x_2)
Sabra Roasted Red Pepper Hummus                 3.34                  3.99                 0.65
Club Original Crackers                          2.50                  2.79                 0.29
“Sparkling Ice” Sparkling Flavored Water        1.00                  1.00                 0
Family Size Cocoa Puffs Cereal                  2.98                  3.29                 0.31
Hillshire Farm 1LB Oven Roasted Turkey Breast   4.98                  5.49                 0.51
Reynolds Wrap Heavy Duty Aluminum Foil          3.48                  3.66                 0.18
Digiorno Rising Crust Pepperoni Pizza           5.00                  5.49                 0.49
Tidy Cats Cat Litter                            8.78                  8.97                 0.19
Tropicana No Pulp Orange Juice                  3.28                  3.69                 0.41
CeraVe Lotion                                   11.62                 12.19                0.57
Total                                           129.16                142.06               12.90
Mean                                            x̄_2 = 4.305           x̄_1 = 4.735          d̄ = 0.43
Standard Deviation                              s_2 = 2.873           s_1 = 3.046          s_d = 0.321
METHOD: Paired t test
o H_0: μ_d = 0
o H_a: μ_d > 0
o n = 30 different name-brand products
o x_2 = the prices of the 30 different name-brand products from Walmart
o x_1 = the prices of the 30 different name-brand products from Meijer
o μ_d = μ_1 - μ_2 = the true mean difference in price for the same name-brand products (Meijer price minus Walmart price)
Test statistic: t = 7.326
p-value: p = 2.267 × 10^-8 = 0.00000002267
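As a rough check on the paired t statistic, the sketch below recomputes it from the rounded summary values in the price table (mean difference 0.43, standard deviation of the differences 0.321, n = 30). The variable names are chosen for this illustration. Because the inputs are rounded, the result lands near, but not exactly on, the reported 7.326, which was presumably computed from the unrounded data.

```python
import math

n = 30         # number of paired products
d_bar = 0.43   # mean of the differences (Meijer minus Walmart), rounded
s_d = 0.321    # standard deviation of the differences, rounded

standard_error = s_d / math.sqrt(n)  # standard error of the mean difference
t_stat = d_bar / standard_error      # paired t statistic for H0: mu_d = 0
df = n - 1                           # degrees of freedom for a paired t test

print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

A t statistic this large with 29 degrees of freedom corresponds to a very small one-sided p-value, consistent with the value reported on the slide.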
Conclusions:
o Decision: at the 95% confidence level (α = 0.05), p-value = 2.267 × 10^-8 < 0.05, so we reject the null hypothesis
o Conclusion: Based on this sample data, there is sufficient evidence to conclude that name-brand products cost more, on average, at Meijer than the same products do at Walmart.
Thank you! 😉

