STAT 3640 WMU Determinants of Unemployment Rate in The US Project
Project Description:
• You choose any dataset of interest (raw data, not summarized data), come up with a research question, and analyze it using a descriptive method or an inferential method (e.g. hypothesis test or regression). You can also start by coming up with a research question and then collect or find an appropriate dataset. Please check the assumptions of the method you chose and interpret the results for a person who does not have much statistical background.
• Note that you need to conduct the analysis using a dataset, not a summarized table or figure from another resource. You can collect your own data or you can find data online using the links in the next section.
Finding Dataset:
You will need to use raw data for the final project, not summarized data as you used in the mini-projects. You can collect your own dataset, or you can find a dataset online. Below are some websites you can use to find datasets.
• Google Datasets:
https://research.google/tools/datasets/
• Dataset Search on Google:
https://datasetsearch.research.google.com/
• Kaggle Datasets:
https://www.kaggle.com/datasets
• Appen AI Resource Center:
https://appen.com/resources/datasets/
• UCI Machine Learning Repository:
https://archive.ics.uci.edu/ml/index.php
Project Outline:
Your project should include the following sections:
I. Abstract: Summarize the entire project.
II. Introduction: Describe the dataset (the variables of the dataset) and your research questions. Include a literature review (if there is any). Provide the background for why you are concerned with this research question.
III. Method: Explain the analysis method of your choice and why you chose it.
IV. Analysis: Apply the method(s) to the dataset.
   i. Assumption Check
   ii. Results: Provide the results and the interpretation of the results.
V. Conclusions/Discussions: Finalize your findings and discuss their implications.
VI. Bibliography: List all the references you used during your project, if any.
Proposal: (important; due Apr. 1)
Your proposal should include, in a maximum of two pages:
1) Your name
2) Research question (e.g. How do age, education level, and desire for more children affect the use of contraceptives? Are their coefficients statistically significant?)
3) Description of the variables in your chosen dataset (variable name, description, range/unit) along with the reference for your data (where you obtained the data), OR, if you decide to collect your own dataset, your plan and the sample questionnaire
4) Analysis plan (specify which method to use)
If you work in a group, one member can make the submission and specify who is in the group. Please refrain from going over two pages. You need to submit your proposal on Elearning.
Presentation:
A 5-minute presentation per person is required. The presentation will be graded based on the following.
1) Content of Slides (Presents information in a logical, interesting sequence that an audience can follow. The research question is specified along with the motivation for it, an appropriate method is chosen and explained, and results are correctly interpreted.)
2) Appearance of Slides (Slides look professional and are readable, with appropriate use of tables and figures; slides are not too sparse or cluttered; effective use of bullet points, etc.)
3) Interpretation (Results are interpreted correctly and effectively, and conclusions are drawn in the context of the problem.)
4) Delivery (Uses a clear voice and correct, precise pronunciation of terms, and presents information in a logical, interesting sequence that an audience can follow. Voice not too loud or soft, not just reading from slides, appropriate transitions, flow, etc.)
5) Use of Time (The time allowance was satisfied.)
Report:
Your report should be neatly typed and easy to understand. You may follow the project outline above. Please include figures to corroborate your findings. Provide appropriate captions for tables and figures. The report will be graded based on the following.
1) Objectives (The research question is clearly stated and the motivation for the question is specified.)
2) Method (An appropriate method is chosen for the research question, assumptions are checked, and the method is well applied.)
3) Interpretation (Results are interpreted correctly and effectively.)
4) Figures/Tables (All figures/tables are appropriately displayed and related to the content.)
5) Format (Has all necessary parts and documents. Is the student using correct spelling, grammar, punctuation, paragraph transitions, etc.? Are table and figure captions correctly formatted?)
How has the reduction of fans allowed at NFL games affected home team advantage?
By:
Riley Lukomski
1. Winning percentage of home team
2. Average amount of points the home team is favored to win by
3. Cover percentage of home teams
NFL Attendance Situation
Background
• Due to Covid-19, NFL teams are restricting the number of fans allowed into the stadium.
• Number of teams allowing 25% capacity: 12
• Number of teams allowing friends and family to attend the game: 6
• Number of teams allowing 0 fans in the building: 14
Winning Percentage of the Home Team
Question 1
• Two-proportion z test at 95% confidence
• Null hypothesis: p1 - p2 = 0
• Alternative hypothesis: p1 - p2 > 0
p1 = proportion of home games won by NFL teams during the 2017-2019 seasons
p2 = proportion of home games won by NFL teams during the 2020 season
Year         Winning % at Home
2017-2019    56.1265%
2020         50.2790%
n1 = 759, n2 = 177
• Test statistic: z = 1.2716
• p-value: normalcdf(1.2716, infinity, 0, 1) = .1018
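The normalcdf step above can also be reproduced outside a calculator. The following is a minimal Python sketch, assuming only the test statistic z = 1.2716 reported on the slide; the function name normal_upper_tail is illustrative and not part of the original presentation.

```python
import math


def normal_upper_tail(z):
    """Return P(Z > z) for a standard normal variable Z."""
    # 0.5 * erfc(z / sqrt(2)) equals 1 - Phi(z), the upper-tail area.
    return 0.5 * math.erfc(z / math.sqrt(2))


z_reported = 1.2716  # test statistic as reported on the slide
p_value = normal_upper_tail(z_reported)
print(f"one-sided p-value = {p_value:.4f}")
```

The upper tail is used because the alternative hypothesis is p1 - p2 > 0, which matches the calculator call with a lower bound of z and an upper bound of infinity.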
Conclusion and Interpretation (Question 1)
• Decision at 95% confidence: .1018 > .05
• We fail to reject the null hypothesis.
• Because the p-value is greater than .05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that the proportion of NFL games won by the home team is lower when the number of fans is restricted.
Average Number of Points the Home Team Is Favored By
Question 2
• Two-sample t test for a difference in means at 95% confidence
• Null hypothesis: μ1 - μ2 = 0
• Alternative hypothesis: μ1 - μ2 > 0
μ1 = mean number of points home NFL teams were favored by during the 2017-2019 seasons
μ2 = mean number of points home NFL teams were favored by during the 2020 season
Year         Average Points Home Team Was Favored By
2017-2019    2.0775
2020         1.1921
n1 = 759, n2 = 177
Observed difference in sample means = .885264
• Test statistic: t = 1.7104
• p-value: tcdf(1.7104, infinity, 934) = .0438
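The tcdf step above can be approximated without a calculator. The sketch below assumes only the reported test statistic (1.7104) and the 934 degrees of freedom used in the tcdf call. It integrates the Student t density with Simpson's rule as a stand-in for a statistics library; the function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t_stat, df, steps=2000):
    """Approximate P(T > t_stat) using Simpson's rule on [0, |t_stat|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t_stat|
    # The density is symmetric, so half of the total area lies above 0.
    return 0.5 - area if t_stat >= 0 else 0.5 + area


p_value = t_upper_tail(1.7104, 934)
print(f"one-sided p-value = {p_value:.4f}")
```

With this many degrees of freedom the t distribution is very close to the standard normal, so a normal upper-tail area would give nearly the same answer.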
Conclusion and Interpretation (Question 2)
• Decision at 95% confidence: .0438 < .05
• Reject the null hypothesis.
• Because the p-value is less than .05, we reject the null hypothesis. We have sufficient evidence to say that the mean number of points that home teams are favored by in the NFL is lower when capacity is restricted.
How Often Home Teams Cover the Spread
Question 3
• Two-proportion z test at 95% confidence
• Null hypothesis: p1 - p2 = 0
• Alternative hypothesis: p1 - p2 > 0
p1 = proportion of NFL teams that cover the spread at home during the 2017-2019 seasons
p2 = proportion of NFL teams that cover the spread at home during the 2020 season
n1 = 732, n2 = 176
• Test statistic: z = -.4361483618
• p-value: normalcdf(-.4361483618, infinity, 0, 1) = .6686
Year         Percentage of Time the Home Team Covers the Spread
2017-2019    45.9016%
2020         47.7273%
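The two-proportion z test for this question can be sketched in Python as follows. The slides report only percentages and sample sizes, so the counts used here (336 covers in 732 games and 84 covers in 176 games) are an assumption, inferred by multiplying each percentage by its sample size. The function names are illustrative, and the pooled standard error is the usual textbook choice rather than something the slides state.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_err


def normal_upper_tail(z):
    """Return P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Assumed counts, inferred from 45.9016% of 732 and 47.7273% of 176.
z = two_proportion_z(336, 732, 84, 176)
p_value = normal_upper_tail(z)
print(f"z = {z:.4f}, one-sided p-value = {p_value:.4f}")
```

Because the alternative hypothesis is p1 - p2 > 0 and the observed difference is negative, the upper-tail area is larger than one half.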
Conclusion and Interpretation (Question 3)
• Decision at the 95% confidence level: .6686 > .05
• We fail to reject the null hypothesis.
• Because the p-value is greater than .05, we fail to reject the null hypothesis. We do not have sufficient evidence to say that NFL home teams that play with a restricted number of fans cover the point spread less often than teams that play with an unrestricted number of fans.
Questions?
The End
HELLO!
Alicia Hartranft
When comparing the prices of name brand products between Walmart and Meijer, are Meijer's prices higher than Walmart's?
Data Collection:
o Data was collected from both the Walmart and the Meijer on West Main St. in Kalamazoo, MI
o Prices of 30 name brand products were recorded at Walmart, then at Meijer
Name Brand Item                                  Walmart Price (x2)   Meijer Price (x1)   Difference (x1 - x2)
Sabra Roasted Red Pepper Hummus                  3.34                 3.99                0.65
Club Original Crackers                           2.50                 2.79                0.29
"Sparkling Ice" Sparkling Flavored Water         1.00                 1.00                0.00
Family Size Cocoa Puffs Cereal                   2.98                 3.29                0.31
Hillshire Farm 1LB Oven Roasted Turkey Breast    4.98                 5.49                0.51
Reynolds Wrap Heavy Duty Aluminum Foil           3.48                 3.66                0.18
Digiorno Rising Crust Pepperoni Pizza            5.00                 5.49                0.49
Tidy Cats Cat Litter                             8.78                 8.97                0.19
Tropicana No Pulp Orange Juice                   3.28                 3.69                0.41
CeraVe Lotion                                    11.62                12.19               0.57
Column Total                                     129.16               142.06              12.90
Mean                                             x̄2 = 4.305           x̄1 = 4.735          d̄ = 0.43
Standard Deviation                               s2 = 2.873           s1 = 3.046          sd = 0.321
METHOD: Paired t test
H0: μd = 0
Ha: μd > 0
o n = 30 different name brand products
o x2 = the prices of the 30 different name brand products from Walmart
o x1 = the prices of the 30 different name brand products from Meijer
o μd = μ1 - μ2 = the true mean difference in prices for the same name brand products (Meijer price minus Walmart price)
Test statistic: t = 7.326
p-value: p = 2.267 × 10^-8 = 0.00000002267
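The paired t statistic can be checked from the summary values in the price table. This is a minimal sketch, assuming the rounded summary figures from the slides (mean difference 0.43, standard deviation of the differences 0.321, n = 30); the function name is illustrative.

```python
import math


def paired_t_statistic(mean_diff, sd_diff, n):
    """t statistic for H0: mu_d = 0 from summary statistics of the differences."""
    std_err = sd_diff / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / std_err


n_items = 30
t_stat = paired_t_statistic(0.43, 0.321, n_items)
degrees_of_freedom = n_items - 1  # a paired test uses n - 1 degrees of freedom
print(f"t = {t_stat:.3f} with {degrees_of_freedom} degrees of freedom")
```

The rounded inputs give a value of roughly 7.34, slightly above the reported 7.326; the small gap is presumably due to rounding in the summary statistics.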
Conclusions:
o Decision: at the 95% confidence level (α = 0.05), p-value = 2.267 × 10^-8 < 0.05, so we reject the null hypothesis
o Conclusion: Based on this sample data, there is sufficient evidence to conclude that the prices of name brand products at Meijer are higher than the prices of the same products at Walmart.
Thank you!
😉