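# Load packages: tidyverse (readr, dplyr, tidyr, ggplot2), ggrepel for geom_text_repel(), cowplot for theme_half_open()
library(tidyverse)
library(ggrepel)
library(cowplot)
# Read in the data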
jobs_gender <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-05/jobs_gender.csv")
earnings_female <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-05/earnings_female.csv")
employed_gender <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-05/employed_gender.csv")
We use one source to explore the question “How has the US gender wage gap changed over time for different occupations and age groups?”: the Tidy Tuesday weekly project from March 5, 2019, Women in the Workforce: https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-03-05
The occupation-level data on specific jobs are derived from the U.S. Census Bureau table “Full-Time, Year-Round Workers and Median Earnings: 2000 and 2013-2019”: https://www.census.gov/data/tables/time-series/demo/industry-occupation/median-earnings.html.
The historical data, by contrast, come from the U.S. Bureau of Labor Statistics (BLS). The first historical data set, Women’s earnings 1979-2011 (https://www.bls.gov/opub/ted/2012/ted_20121123.htm), shows the earnings gap between women and men for most age groups. The second historical data set, also from the BLS, reports the percentage of employed women working full time since 1968: https://www.bls.gov/opub/ted/2017/percentage-of-employed-women-working-full-time-little-changed-over-past-5-decades.htm.
The data originate from government websites, which supports their validity. However, Tidy Tuesday re-processed and cleaned them. The data sets also cover different periods: some variables run only through 2011, whereas others have data as recent as 2019. Any data can be biased depending on the incentive to collect it. We do not expect this data to be biased, given the credibility of the U.S. Census Bureau and the BLS. Even so, it is important to stay aware of that possibility and to consider possible sources of bias.
The data dictionaries for employed_gender, earnings_female, and jobs_gender are in the Appendix.
# Full-time employment: share of employed women and men working full time
employedfulltime_gender <- employed_gender %>%
  select(year, full_time_female, full_time_male) %>%
  rename(Female = full_time_female, Male = full_time_male) %>%
  # reshape to long format: one row per year and gender
  pivot_longer(Female:Male, names_to = "gender", values_to = "fullTime")

# Straight arrows from each gap label to the female and male lines (1968 and 2016)
gap_arrows <- data.frame(
  x = c(1969, 1969, 2017, 2017), xend = c(1969, 1969, 2017, 2017),
  y = c(85, 83.5, 82.5, 81), yend = c(91.5, 76, 87, 76))

employedfulltime_gender %>%
  ggplot(aes(x = year, y = fullTime, color = gender)) +
  geom_line(linewidth = .5) +
  # label each line at its last year instead of using a legend
  geom_text_repel(data = employedfulltime_gender %>% filter(year == max(year)),
                  aes(label = gender), hjust = 0, nudge_x = 1, direction = "y",
                  size = 4.5, segment.color = NA) +
  geom_point(size = .5) +
  # dashed linear trend line for each gender
  geom_smooth(se = FALSE, linetype = 'dashed', method = "lm", formula = y ~ x) +
  geom_segment(data = gap_arrows, aes(x = x, xend = xend, y = y, yend = yend),
               inherit.aes = FALSE, color = 'black', linewidth = 0.5,
               arrow = arrow(length = unit(0.01, "npc"), type = "closed")) +
  annotate(geom = 'text', x = 1975, y = 84,
           label = 'Difference of 17.1 points in 1968', size = 3, color = 'black') +
  annotate(geom = 'text', x = 2017, y = 82,
           label = 'Difference of 12.5 points in 2016', size = 3, color = 'black') +
  scale_x_continuous(breaks = seq(1968, 2016, 4),
                     expand = expansion(add = c(0, 11))) +
  # a single y scale: a second scale_y_continuous() would silently replace the first
  scale_y_continuous(breaks = seq(0, 100, 2),
                     labels = scales::percent_format(scale = 1, accuracy = 1)) +
  scale_color_manual(values = c('#D95F02', '#1B9E77')) +
  theme_half_open(font_size = 11) +
  theme(legend.position = 'none') +
  labs(x = "Year", y = "Percent of Employed Working Full Time",
       title = "Trends of Genders Working Full Time")
# Gap in 1968: male = 92.2, female = 75.1, difference = 17.1 percentage points
# Gap in 2016: male = 87.6, female = 75.1, difference = 12.5 percentage points
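The differences noted in the comments above are hard-coded into the annotation labels. The sketch below shows one way to compute the gaps and the label text from the data instead. It assumes a data frame with the same `year`, `full_time_female`, and `full_time_male` columns as `employed_gender`. The two rows contain only the 1968 and 2016 values quoted in those comments, and the helper name `fulltime_gap_labels()` is hypothetical.

```r
# Hypothetical helper: compute the male-minus-female gap in full-time share
# (in percentage points) and build an annotation label for each year.
fulltime_gap_labels <- function(df) {
  df$gap <- round(df$full_time_male - df$full_time_female, 1)
  df$label <- sprintf("Difference of %.1f points in %d", df$gap, df$year)
  df
}

# Only the 1968 and 2016 values quoted in the comments above
shares <- data.frame(
  year = c(1968L, 2016L),
  full_time_female = c(75.1, 75.1),
  full_time_male = c(92.2, 87.6)
)

gaps <- fulltime_gap_labels(shares)
print(gaps[, c("year", "gap", "label")])
```

With the full `employed_gender` data, the same helper could be applied to `filter(employed_gender, year %in% c(1968, 2016))`. This would keep the plot labels in sync with the data.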
The chart above, “Trends of Genders Working Full Time”, shows how the gap between the percentage of employed men and the percentage of employed women who work full time has changed over time. This is a gap in full-time employment, not in salary. The gap was slightly smaller in the early 2000s than in the 1960s to 1980s, but a clear gap remained in 2016.
Although the gap between the two genders has narrowed, women did not drive the narrowing. More employed women did not take up full-time work. Instead, fewer employed men worked full time: the male share fell from 92.2% in 1968 to 87.6% in 2016, while the female share was 75.1% in both years.
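A short decomposition checks the claim that the narrowing comes from men rather than women. The change in the gap equals the change in the male share minus the change in the female share. The following is a minimal base R sketch that uses only the 1968 and 2016 full-time shares quoted in this section.

```r
# Full-time shares (% of employed) quoted in this section
female <- c(`1968` = 75.1, `2016` = 75.1)
male   <- c(`1968` = 92.2, `2016` = 87.6)

# Change in each group's share between 1968 and 2016, in percentage points
change_female <- unname(female["2016"] - female["1968"])
change_male   <- unname(male["2016"] - male["1968"])

# Change in the male-minus-female gap; by construction this equals
# change_male - change_female
gap <- male - female
change_gap <- unname(gap["2016"] - gap["1968"])

cat(sprintf("Female change: %.1f, male change: %.1f, gap change: %.1f\n",
            change_female, change_male, change_gap))
```

The female change is zero, so the entire 4.6-point narrowing of the gap comes from the drop in the male share.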
The chart also shows a noticeable drop, around 2008, in the percentage of both employed men and employed women working full time. This drop coincides with the 2008 financial crisis.
After analyzing this chart, we decided to examine whether occupation or age group has a larger effect on the gap.
Beyond the difference in the percentage of each gender working full time, there is also a gender difference in salary. The following chart, “Age Trends of Female Salary as a Percentage of Male Salary”, shows female salaries as a percentage of male salaries. For all ages combined, female salary as a percentage of male salary has increased since 1979, from 62.3% in 1979 to 80.9% in 2011. This is an increase of 18.6 percentage points, yet women are still not paid the same as men. As of 2011, a gap of 19.1 percentage points remains.
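The quantity tracked here is itself a percentage, so the change from 1979 to 2011 can be read in two ways. It is an absolute change in percentage points, and it is also a relative change from the 1979 level. The small sketch below computes both, plus the remaining gap to parity, using only the two values quoted above.

```r
# Female salary as a percentage of male salary (values quoted in the text)
pct_1979 <- 62.3
pct_2011 <- 80.9

# Absolute change, in percentage points
pp_change <- pct_2011 - pct_1979

# Relative change, as a percent of the 1979 level
rel_change <- 100 * (pct_2011 - pct_1979) / pct_1979

# Remaining gap to parity (100%) in 2011, in percentage points
remaining_gap <- 100 - pct_2011

cat(sprintf("Change: %.1f points (%.1f%% relative); remaining gap: %.1f points\n",
            pp_change, rel_change, remaining_gap))
# prints: Change: 18.6 points (29.9% relative); remaining gap: 19.1 points
```

Reporting the change in percentage points avoids confusing the 18.6-point rise with a relative increase, which is closer to 30%.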
# Separate the age-group series from the overall total so the total can be drawn more prominently
agegroupearnings <- earnings_female %>%
  filter(group != "Total, 16 years and older")
totalearnings <- earnings_female %>%
  filter(group == "Total, 16 years and older")

ggplot() +
  # faded lines for each age group
  geom_line(data = agegroupearnings, aes(x = Year, y = percent, color = group),
            alpha = 0.45, linewidth = .5) +
  # fully opaque, thicker line for the total (alpha must lie between 0 and 1)
  geom_line(data = totalearnings, aes(x = Year, y = percent, color = group),
            alpha = 1, linewidth = .7) +
  # label each series at its last year instead of using a legend
  geom_text_repel(data = earnings_female %>% filter(Year == max(Year)),
                  aes(x = Year, y = percent, label = group, color = group),
                  hjust = 0, nudge_x = 1, direction = "y", size = 4.3,
                  segment.color = NA) +
  scale_x_continuous(breaks = seq(1980, 2011, 5),
                     expand = expansion(add = c(0, 15.5))) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  scale_color_brewer(palette = "Dark2") +
  theme_half_open(font_size = 11) +
  theme(legend.position = 'none') +
  labs(x = "Year", y = "Female salary as percent of male salary",
       title = "Age Trends of Female Salary as a Percentage of Male Salary")