There are so many factors that play a role in a passengers comfortability on an airplane everyday. Whether people are taking a flight for business or vacation, millions of people fly everyday and have created opinions on how one should behave during a flight. Variables such as a baby crying, the middle armrest getting taken, or not having enough leg room after the person in front of you reclines their seat, have impacted the majority of passengers during a flight allowing them to relate this topic and each other. Now sit back and imagine you are going to take a flight where not everything goes as you had hoped.
#install.packages("fivethirtyeight")
library(fivethirtyeight)
glimpse(flying)
## Rows: 1,040
## Columns: 27
## $ respondent_id <dbl> 3436139758, 3434278696, 3434275578, 3434268208, …
## $ gender <chr> NA, "Male", "Male", "Male", "Male", "Male", "Mal…
## $ age <ord> NA, 30-44, 30-44, 30-44, 30-44, 30-44, 30-44, 30…
## $ height <ord> NA, 6'3", 5'8", 5'11", 5'7", 5'9", 6'2", 6'0", 6…
## $ children_under_18 <lgl> NA, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE,…
## $ household_income <ord> NA, NA, "$100,000 - $149,999", "$0 - $24,999", "…
## $ education <ord> NA, Graduate degree, Bachelor degree, Bachelor d…
## $ location <chr> NA, "Pacific", "Pacific", "Pacific", "Pacific", …
## $ frequency <ord> Once a year or less, Once a year or less, Once a…
## $ recline_frequency <ord> NA, About half the time, Usually, Always, About …
## $ recline_obligation <lgl> NA, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE,…
## $ recline_rude <ord> NA, Somewhat, No, No, No, No, Somewhat, No, No, …
## $ recline_eliminate <lgl> NA, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRU…
## $ switch_seats_friends <ord> NA, No, No, Somewhat, No, Somewhat, Somewhat, No…
## $ switch_seats_family <ord> NA, No, No, No, No, No, No, No, NA, Very, No, NA…
## $ wake_up_bathroom <ord> NA, No, No, No, Somewhat, Somewhat, No, No, NA, …
## $ wake_up_walk <ord> NA, No, Somewhat, Somewhat, Somewhat, Very, No, …
## $ baby <ord> NA, No, Somewhat, Somewhat, Somewhat, Very, No, …
## $ unruly_child <ord> NA, No, Very, Very, Very, Very, Somewhat, Very, …
## $ two_arm_rests <chr> NA, "The arm rests should be shared", "Whoever p…
## $ middle_arm_rest <chr> NA, "The arm rests should be shared", "The arm r…
## $ shade <chr> NA, "Everyone in the row should have some say", …
## $ unsold_seat <ord> NA, No, No, No, No, Somewhat, No, No, No, Very, …
## $ talk_stranger <ord> NA, No, No, No, No, No, Somewhat, No, No, Very, …
## $ get_up <ord> NA, Twice, Three times, Three times, Twice, Once…
## $ electronics <lgl> NA, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FAL…
## $ smoked <lgl> NA, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
FiveThirtyEight is the source of this data set. This dataset was published on September 5, 2014. They have conducted a surveymonkey that contains various questions regarding one’s experience throughout a flight. They have received 1,040 responses all throughout various regions within the United States (Mountain, West South Central,South Atlantic, West North Central, New England, Pacific, Middle Atlantic, East North Central, and East South Central) which has created a dynamic set of opinions. The data is trustworthy as it has been directly gathered from the passengers that have flown on airplanes. Although the dataset is from first hand sources, some answers may have been untruthful as it could have not been filled out to the best of their ability. Andrei Scheinkman collected the original data which was then published in an article by Walt Hickey. The data is pre-processed and is accesible from the “fivethirtyeight” library.
It is also avaialble on the gihub site : https://github.com/fivethirtyeight/data/tree/master/flying-etiquette-survey This set has not been altered since 2014.
flying_new <- flying %>%
drop_na()
flying_proj <- flying_new %>%
select(gender, age, height, household_income, education, location, frequency, recline_frequency, recline_frequency, recline_obligation, recline_rude, switch_seats_family, wake_up_bathroom, baby, unruly_child, two_arm_rests, middle_arm_rest, shade, unsold_seat, talk_stranger, get_up)
glimpse(flying_proj)
## Rows: 582
## Columns: 20
## $ gender <chr> "Male", "Male", "Male", "Male", "Male", "Male", "…
## $ age <ord> 30-44, 30-44, 30-44, 30-44, 30-44, 30-44, 30-44, …
## $ height <ord> 5'8", 5'11", 5'7", 5'9", 6'0", 5'6", 6'0", 5'8", …
## $ household_income <ord> "$100,000 - $149,999", "$0 - $24,999", "$50,000 -…
## $ education <ord> Bachelor degree, Bachelor degree, Bachelor degree…
## $ location <chr> "Pacific", "Pacific", "Pacific", "East North Cent…
## $ frequency <ord> Once a year or less, Once a year or less, Once a …
## $ recline_frequency <ord> Usually, Always, About half the time, Usually, On…
## $ recline_obligation <lgl> TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE…
## $ recline_rude <ord> No, No, No, No, No, Very, No, No, Very, No, No, N…
## $ switch_seats_family <ord> No, No, No, No, No, Very, No, No, No, No, No, Som…
## $ wake_up_bathroom <ord> No, No, Somewhat, Somewhat, No, Very, Somewhat, S…
## $ baby <ord> Somewhat, Somewhat, Somewhat, Very, Somewhat, Ver…
## $ unruly_child <ord> Very, Very, Very, Very, Very, Very, Very, No, No,…
## $ two_arm_rests <chr> "Whoever puts their arm on the arm rest first", "…
## $ middle_arm_rest <chr> "The arm rests should be shared", "The arm rests …
## $ shade <chr> "The person in the window seat should have exclus…
## $ unsold_seat <ord> No, No, No, Somewhat, No, Very, Very, No, No, No,…
## $ talk_stranger <ord> No, No, No, No, No, Very, No, No, No, No, No, No,…
## $ get_up <ord> Three times, Three times, Twice, Once, Four times…
flying_proj %>%
count(gender)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 2 x 4
## gender n p percent
## <chr> <int> <dbl> <dbl>
## 1 Female 303 0.521 52.1
## 2 Male 279 0.479 47.9
flying_proj %>%
count(age)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 4 x 4
## age n p percent
## <ord> <int> <dbl> <dbl>
## 1 18-29 128 0.220 22.0
## 2 30-44 153 0.263 26.3
## 3 45-60 161 0.277 27.7
## 4 > 60 140 0.241 24.0
flying_proj %>%
count(height)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 19 x 4
## height n p percent
## <ord> <int> <dbl> <dbl>
## 1 "Under 5 ft." 10 0.0172 1.72
## 2 "5'0\"" 10 0.0172 1.72
## 3 "5'1\"" 11 0.0189 1.89
## 4 "5'2\"" 32 0.0550 5.5
## 5 "5'3\"" 26 0.0447 4.47
## 6 "5'4\"" 53 0.0911 9.11
## 7 "5'5\"" 55 0.0945 9.45
## 8 "5'6\"" 49 0.0842 8.42
## 9 "5'7\"" 55 0.0945 9.45
## 10 "5'8\"" 47 0.0808 8.08
## 11 "5'9\"" 46 0.0790 7.9
## 12 "5'10\"" 54 0.0928 9.28
## 13 "5'11\"" 38 0.0653 6.53
## 14 "6'0\"" 36 0.0619 6.19
## 15 "6'1\"" 18 0.0309 3.09
## 16 "6'2\"" 18 0.0309 3.09
## 17 "6'3\"" 9 0.0155 1.55
## 18 "6'4\"" 10 0.0172 1.72
## 19 "6'5\"" 5 0.00859 0.86
flying_proj %>%
count(household_income)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 4 x 4
## household_income n p percent
## <ord> <int> <dbl> <dbl>
## 1 $0 - $24,999 60 0.103 10.3
## 2 $25,000 - $49,999 124 0.213 21.3
## 3 $50,000 - $99,999 254 0.436 43.6
## 4 $100,000 - $149,999 144 0.247 24.7
flying_proj %>%
count(education)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 5 x 4
## education n p percent
## <ord> <int> <dbl> <dbl>
## 1 Less than high school degree 3 0.00515 0.52
## 2 High school degree 47 0.0808 8.08
## 3 Some college or Associate degree 173 0.297 29.7
## 4 Bachelor degree 205 0.352 35.2
## 5 Graduate degree 154 0.265 26.5
flying_proj %>%
count(location)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 9 x 4
## location n p percent
## <chr> <int> <dbl> <dbl>
## 1 East North Central 85 0.146 14.6
## 2 East South Central 20 0.0344 3.44
## 3 Middle Atlantic 71 0.122 12.2
## 4 Mountain 37 0.0636 6.36
## 5 New England 38 0.0653 6.53
## 6 Pacific 125 0.215 21.5
## 7 South Atlantic 98 0.168 16.8
## 8 West North Central 52 0.0893 8.93
## 9 West South Central 56 0.0962 9.62
flying_proj %>%
count(frequency)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 5 x 4
## frequency n p percent
## <ord> <int> <dbl> <dbl>
## 1 Once a year or less 449 0.771 77.2
## 2 Once a month or less 117 0.201 20.1
## 3 A few times per month 14 0.0241 2.41
## 4 A few times per week 1 0.00172 0.17
## 5 Every day 1 0.00172 0.17
flying_proj %>%
count(recline_rude)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 3 x 4
## recline_rude n p percent
## <ord> <int> <dbl> <dbl>
## 1 No 353 0.607 60.6
## 2 Somewhat 182 0.313 31.3
## 3 Very 47 0.0808 8.08
flying_proj %>%
count(baby)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 3 x 4
## baby n p percent
## <ord> <int> <dbl> <dbl>
## 1 No 405 0.696 69.6
## 2 Somewhat 130 0.223 22.3
## 3 Very 47 0.0808 8.08
flying_proj %>%
count(unruly_child)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 3 x 4
## unruly_child n p percent
## <ord> <int> <dbl> <dbl>
## 1 No 101 0.174 17.4
## 2 Somewhat 245 0.421 42.1
## 3 Very 236 0.405 40.6
flying_proj %>%
count(middle_arm_rest)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 5 x 4
## middle_arm_rest n p percent
## <chr> <int> <dbl> <dbl>
## 1 Other (please specify) 28 0.0481 4.81
## 2 The arm rests should be shared 401 0.689 68.9
## 3 The person by the window 21 0.0361 3.61
## 4 The person in aisle 46 0.0790 7.9
## 5 Whoever puts their arm on the arm rest first 86 0.148 14.8
flying_proj %>%
count(shade)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 2 x 4
## shade n p percent
## <chr> <int> <dbl> <dbl>
## 1 Everyone in the row should have some say 332 0.570 57.0
## 2 The person in the window seat should have exclusive contr… 250 0.430 43.0
flying_proj %>%
count(unsold_seat)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 3 x 4
## unsold_seat n p percent
## <ord> <int> <dbl> <dbl>
## 1 No 469 0.806 80.6
## 2 Somewhat 87 0.149 15.0
## 3 Very 26 0.0447 4.47
flying_proj %>%
count(get_up)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
## # A tibble: 6 x 4
## get_up n p percent
## <ord> <int> <dbl> <dbl>
## 1 It is not okay to get up during flight 7 0.0120 1.2
## 2 Once 44 0.0756 7.56
## 3 Twice 195 0.335 33.5
## 4 Three times 209 0.359 35.9
## 5 Four times 66 0.113 11.3
## 6 More than five times times 61 0.105 10.5
To summarize and gather a basic overview of the variables within our dataset, we decided to compute the percentage to allow us to analyze the trends and compare within each variable. We will later utilize these variables to create graphs and visualizations within our report.
library(ggridges)
library(tidyverse)
flying_map <- flying_proj %>%
group_by(frequency)%>%
count(location, frequency)%>%
mutate(p = n/sum(n), percent = round(100 * p, 2))
ggplot(flying_map) +
geom_segment(aes(x = location,xend = location,y = 0,yend = percent, color= frequency),size=2) +
geom_point(aes(x=location, y=percent,color= frequency), size= 2.5)+
coord_flip()+
theme_minimal_vgrid(
font_family = 'Fira Sans Condensed',
font_size = 10 ) +
scale_y_continuous(
limits = c(0, 100))+
facet_wrap(~frequency, ncol=1)+
panel_border()+
labs(title = ' US travel frequency by region',
y= " Passenger Percentage",
x = 'US Region',
color = 'Frequency',
font_family = 'Fira Sans Condensed')