Data Science Based on R Studio Discussion
Lab: Airplane!
The dataset
The dataset “Airplane_Crashes_and_Fatalities_Since_1908.csv” includes airplane crash data for all incidents
since 1908. It was downloaded from here: https://www.kaggle.com/cgurkan/airplane-crash-data-since-1908
The dataset will be loaded into a data frame called airplane.
filename <- [...] %>%
  as.integer()
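As a minimal sketch of this kind of step, the snippet below pulls the hour out of clock-time strings and converts it to an integer. The "HH:MM" format and the toy values are assumptions for illustration, not taken from the lab's data, and plain base R calls are used in place of a pipe so that the snippet runs without extra packages.

```r
# Hypothetical stand-in for a Time column; the "HH:MM" format is an assumption.
time_strings <- c("17:18", "06:30", "23:05", NA)

# Take the first two characters (the hour) and convert them to integers.
# Missing times stay missing.
hour <- as.integer(substr(time_strings, 1, 2))

print(hour)
```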
Having done this, we can create a new categorical variable, time of day, which is either day or night.
airplane$dayornight <- [...] >= 6 & airplane$Time < [...]
[...] "1985-01-01") %>%
  select(Company,
         dayornight,
         Time,
         Aboard,
         Fatalities,
         very_deadly)
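The steps described here (deriving day or night from the hour, flagging very deadly crashes, and keeping only recent rows and selected columns) can be sketched as follows. This is an illustration under stated assumptions rather than the lab's exact code: the toy data frame, the Date column, the 18:00 cutoff for night, and the direction of the date comparison are all hypothetical, while the lower bound of 6 and the "more than 10 people aboard died" rule come from the lab text. Base R is used in place of dplyr so that the snippet is self-contained.

```r
# Toy data frame (hypothetical values, not the Kaggle data).
toy <- data.frame(
  Company    = c("Boeing", "Boeing", "Douglas", "Boeing"),
  Date       = as.Date(c("1990-05-01", "1979-03-12", "2001-11-30", "1995-07-04")),
  Time       = c(10L, 23L, NA, 3L),
  Aboard     = c(274L, 11L, 153L, 40L),
  Fatalities = c(0L, 11L, 1L, 40L)
)

# Day if the hour is in [6, 18), night otherwise; the 18 is an assumed cutoff.
# ifelse() propagates NA, so an unknown hour gives an unknown category.
toy$dayornight <- ifelse(toy$Time >= 6 & toy$Time < 18, "day", "night")

# "Very deadly" means more than 10 people aboard died.
toy$very_deadly <- toy$Fatalities > 10

# Keep rows after an assumed cutoff date and only the columns of interest.
toy_subset <- subset(
  toy,
  Date > as.Date("1985-01-01"),
  select = c(Company, dayornight, Time, Aboard, Fatalities, very_deadly)
)

print(toy_subset)
```

Because the comparison on a missing hour evaluates to NA, no special handling is needed to keep unknown times out of both categories.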
We can now look at what we have.
head(airplane_subset)
##   Company dayornight Time Aboard Fatalities very_deadly
## 1  Boeing        day   10    274          0       FALSE
## 2  Boeing        day    9    148        148        TRUE
## 3  Boeing       <NA>   NA      3          3       FALSE
## 4  Boeing      night   23     11         11        TRUE
## 5  Boeing        day   11     89          0       FALSE
## 6  Boeing       <NA>   NA    153          1       FALSE
Notice that wherever the Time (hour) is missing/unknown, the day/night variable is also unknown. We can
see that this happens 18 times.
length(which(is.na(airplane_subset$dayornight)))
## [1] 18
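A quick way to check this kind of claim, sketched here on hypothetical toy vectors rather than the lab's data: the missing entries in the two columns should line up exactly, and sum(is.na(...)) gives the same count as length(which(is.na(...))). The 18:00 cutoff is again an assumption.

```r
# Hypothetical hours with two unknown values.
hour <- c(10L, NA, 23L, NA, 11L)
dayornight <- ifelse(hour >= 6 & hour < 18, "day", "night")  # 18 is an assumed cutoff

# The category is missing exactly where the hour is missing.
same_missing <- identical(is.na(hour), is.na(dayornight))

# Two equivalent ways to count the missing values.
n_missing_which <- length(which(is.na(dayornight)))
n_missing_sum <- sum(is.na(dayornight))

print(c(same_missing, n_missing_which == n_missing_sum))
```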
Two-way tables: Does the company matter?
Now, let’s look at a two-way table of the plane company and whether the crash was “very deadly” according
to the given criterion (more than 10 people aboard died). We can use table() as follows. Note that we
store this table for later use.
companyByDeadly
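As a minimal sketch of how table() builds and stores a two-way table, the snippet below cross-tabulates two columns of a hypothetical toy data frame. The data values and the name toy_table are assumptions for illustration only.

```r
# Toy data (hypothetical values, not the Kaggle data).
toy <- data.frame(
  Company     = c("Boeing", "Boeing", "Douglas", "Douglas", "Boeing"),
  very_deadly = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Rows are companies, columns are the very_deadly flag.
toy_table <- table(toy$Company, toy$very_deadly)

print(toy_table)

# A stored table can be indexed by level name later on.
boeing_deadly <- toy_table["Boeing", "TRUE"]
print(boeing_deadly)
```

Storing the table in a variable means later steps can reuse the counts without recomputing them from the data frame.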