Degrees of Inequality: The Story of Education in America

Educational attainment by demographic in the United States from 2001 to 2024

Author

Leshauna Hartman, Maya Schmidt

Published

December 7, 2025

Introduction

Educational attainment is said to play a vital role in shaping an individual’s career opportunities and economic outcomes. Higher levels of education are often associated with higher wages and salaries, greater access to employment and a lower risk of unemployment.

This project explores patterns and trends in educational attainment across demographic groups in the United States from 2001 to 2024, highlighting changes, if any, before and after major economic and social disruptions, including the Great Recession of 2008 and the COVID-19 pandemic. By examining how attainment has shifted across race, gender and age groups during this period, the analysis offers insight into whether educational opportunities have become more equitable over time or whether persistent disparities remain.

The significance of this project goes beyond academic interest. Identifying and understanding trends can help policymakers make informed decisions to address disparities and close educational gaps, ensuring equitable access to higher education. Moreover, identifying which demographic groups have made educational gains, and which have been left behind, can help address systemic inequities in the American education system.

Research Question

How has educational attainment in the United States changed between 2001 and 2024 for different demographics ages 25 and over?

Sub-questions:

To fully address this question, this analysis examines several further interconnected sub-questions:

How do different racial groups compare in terms of educational attainment levels over time?
How do men and women differ in educational attainment trends?
How do different age groups vary in educational attainment?
How have educational attainment levels changed from 2001 to 2024?

Data Sources

The data for this project is sourced from the U.S. Census Bureau found here.

Datasets sourced here provide counts and percentages of individuals aged 25 years and older, broken down by educational attainment level and various demographic categories, spanning the years 2001 to 2024.

The data publication format varied slightly across the time period. Datasets from 2010 to 2024 were published as integrated annual reports, with the exception of 2023, while data from 2003 to 2009 were organized into separate files by race and ethnicity. The data for 2001 and 2002 were organized in separate tables by demographic within one spreadsheet. The U.S. Census Bureau is widely recognized as a principal federal agency responsible for producing data about the American people and economy. The Census Bureau aims to provide information that is accurate and unbiased, using reliable data sources and data products that are carefully prepared and reviewed (U.S. Census Bureau, 2021).

The datasets are derived from an official agency for labor market and demographic statistics. The data is collected through national surveys and undergoes careful review and preparation, enhancing its reliability and validity.

Data Dictionary

Two data dictionary tables are provided below, listing the variable names and their descriptions. These descriptions were adapted from two pages of the U.S. Census Bureau’s website. The first can be seen here and the second here.

Code

data_dictionary <- tibble(
  Variable = c("Detailed years of school", 
               "All races/ All people",
               "Male(s)",
               "Female(s)",
               "25 to 34 years old",
               "35 to 54 years old",
               "55 years and older",
               "White",
               "Non-Hispanic White",
               "Black",
               "Asian",
               "Hispanic (of any race)"), 
  Description = c("Detailed years of school", 
                  "Total number of people surveyed", 
                  "Respondents who identify as male",
                  "Respondents who identify as female",
                  "Respondents aged 25 to 34 years",
                  "Respondents aged 35 to 54 years",
                  "Respondents aged 55 and older",
                  "A person having origins in any of the original peoples of Europe, the Middle East, or North Africa.  It includes people who indicate their race as “White” or report responses such as German, Irish, English, Italian, Lebanese, and Egyptian. The category also includes groups such as Polish, French, Iranian, Slavic, Cajun, Chaldean, etc", 
                  "A person having origins in any of the original peoples of Europe, the Middle East, or North Africa, and who does not identify as Hispanic or Latino",
                  "A person having origins in any of the Black racial groups of Africa.  It includes people who indicate their race as “Black or African American” or report responses such as African American, Jamaican, Haitian, Nigerian, Ethiopian, or Somali. The category also includes groups such as Ghanaian, South African, Barbadian, Kenyan, Liberian, Bahamian, etc.",
                   "A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent including, for example, India, China, the Philippine Islands, Japan, Korea, or Vietnam.  It includes people who indicate their race as “Asian Indian,” “Chinese,” “Filipino,” “Korean,” “Japanese,” “Vietnamese,” and “Other Asian” or provide other detailed Asian responses such as Pakistani, Cambodian, Hmong, Thai, Bengali, Mien, etc.",
                  "A person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin regardless of race."
                             )
)

For the years 2010-2024, every column except Detailed years of school had two further subheadings: Number and Percent.

Code

data_dictionary2 <- tibble(
  Variable = c("Number", 
               "Percent"), 
  Description = c("The count of respondents (in thousands) within that specific demographic group who have completed a particular level of education", 
                  "The proportion of a demographic group's total population that has completed a particular level of education"
                             ),
  Variable_Type = c("double",
                    "character (2010-2017, 2021, 2022, 2024); double (2018-2020)")
)

Data Analysis

Data Cleaning

To prepare the data for analysis, yearly datasets from 2001 to 2024 (excluding 2023) were imported. The 2023 data was not included because it was no longer available on the source website. Columns were renamed for clarity and readability. Placeholder values, such as “Z” or “-”, were replaced with “0”, based on the dataset documentation indicating that these values represent zero counts. Note that some datasets, such as 2020, contained values listed as “.”, with no clear description of whether these represent zero or missing data. After the placeholders were replaced, rows with missing values were removed and each dataset was reshaped from wide to long format using the pivot_longer function.
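The placeholder step can be illustrated on a small made-up table. This is only a sketch: the values are invented, and it uses %in% and an explicit as.character() conversion, which is one possible way to write the replacement rather than the exact code used for each year.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy table with invented values: "Z" and "-" stand in for zero counts,
# and the "Elementary" row mimics a section header that holds only NA
raw <- tibble(
  years_of_school = c("Total", "Elementary", "1st-4th grade"),
  total_number    = c(100, NA, 5),
  total_percent   = c("100", NA, "Z"),
  asian_number    = c("10", NA, "-")
)

cleaned <- raw %>%
  # Convert every column to character first so all columns share one type,
  # then swap the placeholders for "0" (%in% treats NA as "no match")
  mutate(across(everything(),
                ~ replace(as.character(.x), .x %in% c("Z", "-"), "0"))) %>%
  # Drop rows that still contain missing values (the header row here)
  drop_na()

cleaned
```

Because every column is character after this step, the number and percent columns can later be stacked into a single column and converted to numeric in one place.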

Specifically for the years 2010 to 2024, the demographic metrics were split into two categories, number (representing counts) and percent (representing percentages), using the names_pattern argument of pivot_longer. After the counts were converted to numeric and the data pivoted back to a wide format, a new variable, attainment, was created to categorize educational attainment levels uniformly across the years. Each dataset then had a column added that listed its respective year. After each year was cleaned, the datasets were merged into a single dataframe spanning 2010-2024. The percent variable was then removed from the combined dataset because the original datasets contained errors in those values.
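To make the long-then-wide reshape concrete, here is a minimal sketch on an invented two-row table. The column names follow the naming scheme used in this report, but the values are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Invented values; backticks are needed because the names start with a digit
wide <- tibble(
  years_of_school    = c("Total", "Bachelor's degree"),
  male_number        = c("50", "12"),
  male_percent       = c("100", "24"),
  `25_to_34_number`  = c("20", "6"),
  `25_to_34_percent` = c("100", "30")
)

# Split each column name into a demographic part and a number/percent part
long <- wide %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    names_pattern = "(.+)_(number|percent)",
    values_to = "count"
  )

# Convert to numeric, then spread number and percent back into two columns
tidy <- long %>%
  mutate(count = as.numeric(count)) %>%
  pivot_wider(names_from = value, values_from = count)

tidy
```

The greedy first group in the pattern keeps multi-part names such as 25_to_34 together, so only the final suffix decides whether a value lands in number or percent.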

For the earlier period (2001-2009), the data required a different cleaning approach because the Census Bureau published separate files for each race and ethnic category. Each file followed a similar structure but required careful extraction of rows. Sex categories were assigned by row-number indexing, and the rows were filtered to retain only the age ranges that match those in the more recent years. The indentation markers present in some years (“.” and “..”) were removed to clean the age group labels. These datasets were then merged with the 2010-2024 data. Finally, the educational attainment categories were consolidated through a new variable, attainment_level, that combines related degree types (for example, GED and High School Diploma were combined into “HS” to represent High School, and Associate’s and Bachelor’s degrees into “Undergraduate Degree”).
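The label cleanup and the consolidation into attainment_level might look roughly like the sketch below. The rows, the age_group column name and the fall-through rule for other levels are assumptions made for illustration; only the two groupings named above ("HS" and "Undergraduate Degree") come from the description.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical rows in the style of an older file; labels and counts are invented
older <- tibble(
  age_group  = c(".25 to 34 years", "..35 to 54 years", "55 years and older"),
  attainment = c("GED", "Associate's Degree", "Master's Degree"),
  number     = c(10, 20, 30)
)

older_clean <- older %>%
  mutate(
    # Strip leading "." or ".." indentation markers from the age labels
    age_group = str_remove(age_group, "^\\.+"),
    # Combine related credentials into broader levels
    attainment_level = case_when(
      attainment %in% c("GED", "High School Diploma") ~ "HS",
      attainment %in% c("Associate's Degree", "Bachelor's Degree") ~ "Undergraduate Degree",
      TRUE ~ attainment  # assumption: other levels are kept unchanged
    )
  )

older_clean
```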

Race and ethnicity categories were reduced to only include Asian, White and Black populations. This decision was made to ensure consistency across all years, and to avoid overlapping data present in categories such as “White” and “Non-Hispanic White”, which could lead to double-counting individuals in the analysis.
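As a small illustration of this restriction, the sketch below filters an invented table down to the three retained groups. The demographic labels mirror the column-name prefixes used in this report; the counts are made up.

```r
library(dplyr)
library(tibble)

# Invented counts; "white" and "non_hispanic_white" overlap, as do
# "hispanic" and the race groups, so keeping all of them would double count
combined <- tibble(
  demographic = c("white", "non_hispanic_white", "black", "asian", "hispanic", "male"),
  number      = c(60, 50, 12, 6, 18, 48)
)

# Keep only the three race groups used in the race comparison
race_only <- combined %>%
  filter(demographic %in% c("asian", "white", "black"))

race_only
```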

Previewing first few rows of original data

Code

# Load educational data, skipping the first 4 header rows
educational_data_2024 <- read_excel(here::here('data_raw', '2024_Educational_Data.xlsx'), skip = 4)

Code

# Preview first few rows of dataset 
head(educational_data_2024)

#> # A tibble: 6 × 23
#>   ...1      Number...2 Percent...3 Number...4 Percent...5 Number...6 Percent...7
#>   <chr>          <dbl> <chr>            <dbl> <chr>            <dbl> <chr>      
#> 1 Total         229800 100             111700 100             118100 100        
#> 2 Elementa…         NA <NA>                NA <NA>                NA <NA>       
#> 3 Less tha…        718 0.3                279 0.3                439 0.4        
#> 4 1st-4th …       1407 0.6                665 0.6                742 0.6        
#> 5 5th-6th …       2826 1.2               1481 1.3               1344 1.10000000…
#> 6 7th-8th …       3092 1.3               1592 1.4               1500 1.3        
#> # ℹ 16 more variables: Number...8 <dbl>, Percent...9 <chr>, Number...10 <dbl>,
#> #   Percent...11 <chr>, Number...12 <dbl>, Percent...13 <chr>,
#> #   Number...14 <dbl>, Percent...15 <chr>, Number...16 <dbl>,
#> #   Percent...17 <chr>, Number...18 <dbl>, Percent...19 <chr>,
#> #   Number...20 <dbl>, Percent...21 <chr>, Number...22 <dbl>,
#> #   Percent...23 <chr>

Code

# Rename columns for clarity and consistency 
educational_data_2024 <- educational_data_2024 %>% set_names(c("years_of_school","total_number","total_percent","male_number", "male_percent", "female_number", "female_percent", "25_to_34_number","25_to_34_percent", "35_to_54_number","35_to_54_percent", "55_plus_number","55_plus_percent", "white_number","white_percent", "non_hispanic_white_number","non_hispanic_white_percent", "black_number","black_percent", "asian_number","asian_percent", "hispanic_number","hispanic_percent"))

Code

# Replace placeholder values
# Drop missing values
educational_2024_clean <- educational_data_2024 %>%
  mutate(across(everything(), ~replace(., . =="Z", "0"))) %>% 
  drop_na()

Code

# Reshape to long format
educational_2024_tidy <- educational_2024_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  )

Code

# Convert count values to numeric 
# Reshape back to wide format
educational_2024_tidy <- educational_2024_tidy %>%
  mutate(count = as.numeric(count)) %>%
  pivot_wider(
  names_from = value,
  values_from = count
)

Code

# Create a new variable to categorize educational attainment levels
# Add year

educational_2024_tidy <- educational_2024_tidy %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2024)

Previewing first few rows of cleaned data

Code

head(educational_2024_tidy)

#> # A tibble: 6 × 6
#>   years_of_school attainment demographic number percent  year
#>   <chr>           <chr>      <chr>        <dbl>   <dbl> <dbl>
#> 1 Total           Other      total       229800     100  2024
#> 2 Total           Other      male        111700     100  2024
#> 3 Total           Other      female      118100     100  2024
#> 4 Total           Other      25_to_34     44880     100  2024
#> 5 Total           Other      35_to_54     84430     100  2024
#> 6 Total           Other      55_plus     100500     100  2024

Code

# Load educational data, skipping the first 4 header rows
educational_data_2022 <- read_excel(here::here('data_raw', '2022_Educational_Data.xlsx'), skip = 4)

Previewing first few rows of original data

Code

# Preview the first few rows of the dataset
head(educational_data_2022)

#> # A tibble: 6 × 23
#>   ...1      Number...2 Percent...3 Number...4 Percent...5 Number...6 Percent...7
#>   <chr>          <dbl> <chr>            <dbl> <chr>            <dbl> <chr>      
#> 1 Total         226274 100             109979 100             116296 100        
#> 2 Elementa…         NA <NA>                NA <NA>                NA <NA>       
#> 3 Less tha…        727 0.3                368 0.3                358 0.3        
#> 4 1st-4th …       1476 0.7                782 0.7                694 0.6        
#> 5 5th-6th …       2732 1.2               1364 1.2               1368 1.2        
#> 6 7th-8th …       3001 1.3               1526 1.4               1475 1.3        
#> # ℹ 16 more variables: Number...8 <dbl>, Percent...9 <chr>, Number...10 <dbl>,
#> #   Percent...11 <chr>, Number...12 <dbl>, Percent...13 <chr>,
#> #   Number...14 <dbl>, Percent...15 <chr>, Number...16 <dbl>,
#> #   Percent...17 <chr>, Number...18 <dbl>, Percent...19 <chr>,
#> #   Number...20 <chr>, Percent...21 <chr>, Number...22 <dbl>,
#> #   Percent...23 <chr>

Code

# Rename columns for clarity and consistency

educational_data_2022 <- educational_data_2022 %>% set_names(c("years_of_school","total_number","total_percent","male_number", "male_percent", "female_number", "female_percent", "25_to_34_number","25_to_34_percent", "35_to_54_number","35_to_54_percent", "55_plus_number","55_plus_percent", "white_number","white_percent", "non_hispanic_white_number","non_hispanic_white_percent", "black_number","black_percent", "asian_number","asian_percent", "hispanic_number","hispanic_percent"))

Code

# Replace placeholder values
# Drop missing values

educational_2022_clean <- educational_data_2022 %>%
  mutate(across(everything(), ~replace(., . =="Z", "0"))) %>% 
  drop_na()

Code

# Reshape to long format
educational_2022_tidy <- educational_2022_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  )

Code

# Convert count values to numeric 
# Reshape back to wide format
# Create a new variable to categorize educational attainment levels
# Add year 

educational_2022_tidy <- educational_2022_tidy %>%
  mutate(count = as.numeric(count)) %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2022)

Previewing first few rows of cleaned data

Code

head(educational_2022_tidy)

#> # A tibble: 6 × 6
#>   years_of_school attainment demographic number percent  year
#>   <chr>           <chr>      <chr>        <dbl>   <dbl> <dbl>
#> 1 Total           Other      total       226274     100  2022
#> 2 Total           Other      male        109979     100  2022
#> 3 Total           Other      female      116296     100  2022
#> 4 Total           Other      25_to_34     44583     100  2022
#> 5 Total           Other      35_to_54     83321     100  2022
#> 6 Total           Other      55_plus      98371     100  2022

Code

# Load educational data, skipping the first 4 header rows
educational_data_2021 <- read_excel(here::here('data_raw', '2021_Educational_Data.xlsx'), skip = 4)

Previewing first few rows of original data

Code

# Preview the first few rows of the dataset
head(educational_data_2021)

#> # A tibble: 6 × 23
#>   ...1      Number...2 Percent...3 Number...4 Percent...5 Number...6 Percent...7
#>   <chr>          <dbl> <chr>            <dbl> <chr>            <dbl> <chr>      
#> 1 Total         224580 100             108327 100             116253 100        
#> 2 Elementa…         NA <NA>                NA <NA>                NA <NA>       
#> 3 Less tha…        732 0.3                365 0.3                368 0.3        
#> 4 1st-4th …       1341 0.6                720 0.7                622 0.5        
#> 5 5th-6th …       2787 1.2               1377 1.3               1410 1.2        
#> 6 7th-8th …       3076 1.4               1548 1.4               1528 1.3        
#> # ℹ 16 more variables: Number...8 <dbl>, Percent...9 <chr>, Number...10 <dbl>,
#> #   Percent...11 <chr>, Number...12 <dbl>, Percent...13 <chr>,
#> #   Number...14 <dbl>, Percent...15 <chr>, Number...16 <dbl>,
#> #   Percent...17 <chr>, Number...18 <dbl>, Percent...19 <chr>,
#> #   Number...20 <chr>, Percent...21 <chr>, Number...22 <dbl>,
#> #   Percent...23 <chr>

Code

# Rename columns for clarity and consistency

educational_data_2021 <- educational_data_2021 %>% set_names(c("years_of_school","total_number","total_percent","male_number", "male_percent", "female_number", "female_percent", "25_to_34_number","25_to_34_percent", "35_to_54_number","35_to_54_percent", "55_plus_number","55_plus_percent", "white_number","white_percent", "non_hispanic_white_number","non_hispanic_white_percent", "black_number","black_percent", "asian_number","asian_percent", "hispanic_number","hispanic_percent"))

Code

# Replace placeholder values
# Drop missing values
educational_2021_clean <- educational_data_2021 %>%
  mutate(across(everything(), ~replace(., . =="Z", "0"))) %>% 
  drop_na()

Code

# Reshape to long format
educational_2021_tidy <- educational_2021_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  )

Code

# Convert count values to numeric 
# Reshape back to wide format
# Create a new variable to categorize educational attainment levels
# Add year

educational_2021_tidy <- educational_2021_tidy %>%
  mutate(count = as.numeric(count)) %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2021)

Previewing first few rows of cleaned data

Code

head(educational_2021_tidy)

#> # A tibble: 6 × 6
#>   years_of_school attainment demographic number percent  year
#>   <chr>           <chr>      <chr>        <dbl>   <dbl> <dbl>
#> 1 Total           Other      total       224580     100  2021
#> 2 Total           Other      male        108327     100  2021
#> 3 Total           Other      female      116253     100  2021
#> 4 Total           Other      25_to_34     45284     100  2021
#> 5 Total           Other      35_to_54     81684     100  2021
#> 6 Total           Other      55_plus      97613     100  2021

The same cleaning process was applied to the remaining years, 2010 to 2020.
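Since the same steps repeat for every year, they could in principle be wrapped in a single helper. The function below, clean_year, is a hypothetical sketch and not part of the original analysis. It leaves out the attainment classification for brevity and is shown on invented data.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical helper: replace placeholders, drop empty rows, reshape, add year
clean_year <- function(raw, survey_year, placeholder = "-") {
  raw %>%
    mutate(across(everything(),
                  ~ replace(as.character(.x), .x %in% placeholder, "0"))) %>%
    drop_na() %>%
    pivot_longer(
      cols = -years_of_school,
      names_to = c("demographic", "value"),
      names_pattern = "(.+)_(number|percent)",
      values_to = "count"
    ) %>%
    mutate(count = as.numeric(count)) %>%
    pivot_wider(names_from = value, values_from = count) %>%
    mutate(year = survey_year)
}

# Invented table standing in for one year's renamed sheet
toy_2019 <- tibble(
  years_of_school = c("Total", "Graduate school", "Master's degree"),
  total_number    = c(200, NA, 20),
  total_percent   = c("100", NA, "10"),
  asian_number    = c(12, NA, 0),
  asian_percent   = c("100", NA, "-")
)

cleaned_2019 <- clean_year(toy_2019, 2019)
cleaned_2019
```

The placeholder argument is there because the years differ in which symbol marks a zero count ("Z" in some files, "-" in others).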

Code

# Load educational data, skipping the unnecessary header rows

educational_data_2020 <- read_excel(here::here('data_raw', '2020_Educational_Data.xlsx'), skip = 5)
educational_data_2019 <- read_excel(here::here('data_raw', '2019_Educational_Data.xlsx'), skip = 5)
educational_data_2018 <- read_excel(here::here('data_raw', '2018_Educational_Data.xlsx'), skip = 5)
educational_data_2017 <- read_excel(here::here('data_raw', '2017_Educational_Data.xlsx'), skip = 5)
educational_data_2016 <- read_excel(here::here('data_raw', '2016_Educational_Data.xlsx'), skip = 5)
educational_data_2015 <- read_excel(here::here('data_raw', '2015_Educational_Data.xlsx'), skip = 5)
educational_data_2014 <- read_excel(here::here('data_raw', '2014_Educational_Data.xlsx'), skip = 5)
educational_data_2013 <- read_excel(here::here('data_raw', '2013_Educational_Data.xlsx'), skip = 5)
educational_data_2012 <- read_excel(here::here('data_raw', '2012_Educational_Data.xlsx'), skip = 6)
educational_data_2011 <- read_excel(here::here('data_raw', '2011_Educational_Data.xlsx'), skip = 6)
educational_data_2010 <- read_excel(here::here('data_raw', '2010_Educational_Data.xlsx'), skip = 5)

Code

# Rename columns for clarity and consistency

# Shared column names, defined once and reused for every year
col_names <- c("years_of_school", "total_number", "total_percent",
               "male_number", "male_percent", "female_number", "female_percent",
               "25_to_34_number", "25_to_34_percent",
               "35_to_54_number", "35_to_54_percent",
               "55_plus_number", "55_plus_percent",
               "white_number", "white_percent",
               "non_hispanic_white_number", "non_hispanic_white_percent",
               "black_number", "black_percent",
               "asian_number", "asian_percent",
               "hispanic_number", "hispanic_percent")

educational_data_2020 <- educational_data_2020 %>% set_names(col_names)
educational_data_2019 <- educational_data_2019 %>% set_names(col_names)
educational_data_2018 <- educational_data_2018 %>% set_names(col_names)
educational_data_2017 <- educational_data_2017 %>% set_names(col_names)
educational_data_2016 <- educational_data_2016 %>% set_names(col_names)
educational_data_2015 <- educational_data_2015 %>% set_names(col_names)
educational_data_2014 <- educational_data_2014 %>% set_names(col_names)
educational_data_2013 <- educational_data_2013 %>% set_names(col_names)
educational_data_2012 <- educational_data_2012 %>% set_names(col_names)
educational_data_2011 <- educational_data_2011 %>% set_names(col_names)
educational_data_2010 <- educational_data_2010 %>% set_names(col_names)

Code

# Replace placeholder values
# Drop missing values

educational_2020_clean <- educational_data_2020 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2019_clean <- educational_data_2019 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2018_clean <- educational_data_2018 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2017_clean <- educational_data_2017 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2016_clean <- educational_data_2016 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2015_clean <- educational_data_2015 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2014_clean <- educational_data_2014 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2013_clean <- educational_data_2013 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2012_clean <- educational_data_2012 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2011_clean <- educational_data_2011 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2010_clean <- educational_data_2010 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()

Code

# Reshape to long format
educational_2020_tidy <- educational_2020_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2019_tidy <- educational_2019_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  )
educational_2018_tidy <- educational_2018_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  )
educational_2017_tidy <- educational_2017_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2016_tidy <- educational_2016_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2015_tidy <- educational_2015_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2014_tidy <- educational_2014_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2013_tidy <- educational_2013_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2012_tidy <- educational_2012_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2011_tidy <- educational_2011_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2010_tidy <- educational_2010_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  )
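
The reshaping applied to each yearly table relies on `names_pattern` to split column names such as `male_number` into a demographic part and a measure part. The following is a minimal sketch on a made-up two-row table; `toy_clean` and all of its values are hypothetical and are not taken from the Census files.

```r
library(dplyr)
library(tidyr)

# Hypothetical table shaped like the cleaned yearly tables (invented values)
toy_clean <- tibble(
  years_of_school    = c("Total", "High school diploma"),
  male_number        = c(100, 30),
  male_percent       = c(100, 30),
  `25_to_34_number`  = c(40, 10),
  `25_to_34_percent` = c(100, 25)
)

# Same pivot_longer() call as used for the yearly tables
toy_tidy <- toy_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"
  )

toy_tidy
```

Because `(.+)` is greedy, a name like `25_to_34_number` is split only at the underscore directly before `number` or `percent`, so the demographic label keeps its internal underscores.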

Code

# Reshape back to wide format (one column each for number and percent)
# Create a new variable to categorize educational attainment levels
# Add year
# (number and percent are converted to numeric in a later chunk)

educational_2020_tidy <- educational_2020_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2020)
educational_2019_tidy <- educational_2019_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2019)
educational_2018_tidy <- educational_2018_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2018)
educational_2017_tidy <- educational_2017_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2017)
educational_2016_tidy <- educational_2016_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2016)
educational_2015_tidy <- educational_2015_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2015)
educational_2014_tidy <- educational_2014_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2014)
educational_2013_tidy <- educational_2013_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2013)
educational_2012_tidy <- educational_2012_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2012)
educational_2011_tidy <- educational_2011_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2011)
educational_2010_tidy <- educational_2010_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2010)
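
The same widen, label and add-year steps are repeated for every year from 2010 to 2020. One way this could be condensed is a small helper function, sketched below. The function name `widen_and_label()` and the toy input are hypothetical, and the sketch assumes the broader patterns (`"associate"`, `"Bachelor"`) used for 2010 to 2016 are acceptable for every year, which would need to be checked against the actual labels.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical helper, not part of the original analysis
widen_and_label <- function(tidy_df, yr) {
  tidy_df %>%
    pivot_wider(names_from = value, values_from = count) %>%
    mutate(
      # case_when() returns the first matching branch, so order matters
      attainment = case_when(
        str_detect(years_of_school, "no diploma") ~ "Less than High School",
        str_detect(years_of_school, "GED") ~ "GED",
        str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
        str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
        str_detect(years_of_school, "associate") ~ "Associate's Degree",
        str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
        str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
        str_detect(years_of_school, "Master's") ~ "Master's Degree",
        str_detect(years_of_school, "Professional") ~ "Professional Degree",
        str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
        TRUE ~ "Other"
      ),
      year = yr
    ) %>%
    select(years_of_school, attainment, demographic, number, percent, year)
}

# Invented long-format input with one demographic and three attainment rows
toy_tidy <- tibble(
  years_of_school = rep(c("Total", "High school diploma", "Bachelor's degree"), each = 2),
  demographic     = "male",
  value           = rep(c("number", "percent"), times = 3),
  count           = c(100, 100, 30, 30, 20, 20)
)

toy_2020 <- widen_and_label(toy_tidy, 2020)
toy_2020
```

With such a helper, each year would need a single call, for example `educational_2020_tidy <- widen_and_label(educational_2020_tidy, 2020)`.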

Code

# Check structure and datatypes of most recent years to ensure consistency
glimpse(educational_2021_tidy)

#> Rows: 407
#> Columns: 6
#> $ years_of_school <chr> "Total", "Total", "Total", "Total", "Total", "Total", …
#> $ attainment      <chr> "Other", "Other", "Other", "Other", "Other", "Other", …
#> $ demographic     <chr> "total", "male", "female", "25_to_34", "35_to_54", "55…
#> $ number          <dbl> 224580, 108327, 116253, 45284, 81684, 97613, 175206, 1…
#> $ percent         <dbl> 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0…
#> $ year            <dbl> 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, …

Code

glimpse(educational_2022_tidy)

#> Rows: 407
#> Columns: 6
#> $ years_of_school <chr> "Total", "Total", "Total", "Total", "Total", "Total", …
#> $ attainment      <chr> "Other", "Other", "Other", "Other", "Other", "Other", …
#> $ demographic     <chr> "total", "male", "female", "25_to_34", "35_to_54", "55…
#> $ number          <dbl> 226274, 109979, 116296, 44583, 83321, 98371, 175898, 1…
#> $ percent         <dbl> 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0…
#> $ year            <dbl> 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, …

Code

glimpse(educational_2024_tidy)

#> Rows: 407
#> Columns: 6
#> $ years_of_school <chr> "Total", "Total", "Total", "Total", "Total", "Total", …
#> $ attainment      <chr> "Other", "Other", "Other", "Other", "Other", "Other", …
#> $ demographic     <chr> "total", "male", "female", "25_to_34", "35_to_54", "55…
#> $ number          <dbl> 229800, 111700, 118100, 44880, 84430, 100500, 177100, …
#> $ percent         <dbl> 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0…
#> $ year            <dbl> 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, …

Code

educational_2020_tidy <- educational_2020_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2019_tidy <- educational_2019_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2018_tidy <- educational_2018_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2017_tidy <- educational_2017_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2016_tidy <- educational_2016_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2015_tidy <- educational_2015_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2014_tidy <- educational_2014_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2013_tidy <- educational_2013_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2012_tidy <- educational_2012_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2011_tidy <- educational_2011_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2010_tidy <- educational_2010_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
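
The type conversion above is identical for every year. As a hedged sketch, the tables could be held in a named list and converted in one pass with `across()`; the list `yearly` and its values below are invented for illustration only.

```r
library(dplyr)

# Hypothetical stand-ins for two yearly tables, with counts stored as text
yearly <- list(
  `2019` = tibble(demographic = "male", number = "106000", percent = "100.0"),
  `2020` = tibble(demographic = "male", number = "107000", percent = "100.0")
)

# Convert both measure columns to numeric in every table
yearly_numeric <- lapply(yearly, function(df) {
  df %>% mutate(across(c(number, percent), as.numeric))
})

str(yearly_numeric[["2019"]])
```

Keeping the tables in a list also makes the later `bind_rows()` step a single call on that list.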

Code

# Combine all cleaned and tidy datasets
educational_data_combined <- bind_rows(
  educational_2010_tidy,
  educational_2011_tidy,
  educational_2012_tidy,
  educational_2013_tidy,
  educational_2014_tidy,
  educational_2015_tidy,
  educational_2016_tidy,
  educational_2017_tidy,
  educational_2018_tidy,
  educational_2019_tidy,
  educational_2020_tidy,
  educational_2021_tidy,
  educational_2022_tidy,
  educational_2024_tidy) %>%
  select(-percent) %>%
  filter(!is.na(number))
#view(educational_data_combined)

Code

# Grouping educational attainment categories

educational_data_combined <- educational_data_combined %>% 
  mutate(attainment_level = case_when(
    attainment == "Less than High School" ~ "Less than HS",
    attainment %in% c("GED", "High School Diploma") ~ "HS",
    attainment == "Some College, No Degree" ~ "Some College, No Degree",
    attainment %in% c("Associate's Degree", "Bachelor's Degree") ~ "Undergraduate Degree",
    attainment %in% c("Professional Degree", "Master's Degree") ~ "Graduate Degree",
    attainment == "Doctoral Degree" ~ "Doctoral Degree", 
    # Catch-all: rows labeled "Other" upstream (such as the "Total" rows) land here
    TRUE ~ "Total")
  )

educational_data_combined <- educational_data_combined %>%
  group_by(year, attainment_level, demographic) %>%
  summarise(number = sum(number, na.rm = TRUE), .groups = "drop")

Preview of the first few rows of the combined data:

Code

educational_data_combined

#> # A tibble: 1,023 × 4
#>     year attainment_level demographic        number
#>    <dbl> <chr>            <chr>               <dbl>
#>  1  2010 Doctoral Degree  25_to_34             373.
#>  2  2010 Doctoral Degree  35_to_54            1237.
#>  3  2010 Doctoral Degree  55_plus             1169.
#>  4  2010 Doctoral Degree  asian                334.
#>  5  2010 Doctoral Degree  black                123.
#>  6  2010 Doctoral Degree  female              1015.
#>  7  2010 Doctoral Degree  hispanic             146.
#>  8  2010 Doctoral Degree  male                1763.
#>  9  2010 Doctoral Degree  non_hispanic_white  2141.
#> 10  2010 Doctoral Degree  total               2779.
#> # ℹ 1,013 more rows
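
In the grouping step, the catch-all `TRUE ~ "Total"` captures every row whose `attainment` is "Other", not only the genuine total rows. A quick check, run on the combined data before grouping while `years_of_school` is still present, could confirm that nothing else falls through. The sketch below uses a hypothetical table `toy_labeled` with invented values; whether such rows exist in the real data cannot be determined from the output shown here.

```r
library(dplyr)

# Hypothetical labeled data: the third row stands in for a label
# that none of the str_detect() patterns recognized
toy_labeled <- tibble(
  years_of_school = c("Total", "High school diploma", "Some unrecognized label"),
  attainment      = c("Other", "High School Diploma", "Other"),
  demographic     = "total",
  number          = c(100, 30, 5)
)

# Rows labeled "Other" that are not genuine totals
unmatched <- toy_labeled %>%
  filter(attainment == "Other", years_of_school != "Total") %>%
  distinct(years_of_school)

unmatched
```

An empty result on the real data would indicate that the "Total" group contains only total rows.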

Code

#loading and cleaning 2009_Asian
educational_data_2009_Asian <- read_excel(here::here('data_raw', '2009_Educational_Attainment_Asian.xls'), skip = 7)

#set names
educational_data_2009_asian <- educational_data_2009_Asian %>%
  set_names(c(
    "sexes",
    "age_group_big",
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes coding: 2 = both sexes, 1 = female, 0 = male (assigned by row position in the raw sheet)
educational_data_2009_asian <- educational_data_2009_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44~ 1
  )) %>%
  select(-c(age_group_big)) %>%
  filter(!is.na(age_group))

#fix data types
educational_data_2009_asian <- educational_data_2009_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))

#pivot and check
educational_data_2009_asian <- educational_data_2009_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#view(educational_data_2009_asian)
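
The cleaning steps for the 2009 race tables follow one pattern: replace the "-" placeholder with zero, convert to numeric, then pivot to long format. The sketch below shows that pattern on a tiny invented table and uses a negative selection in `pivot_longer()` instead of listing every attainment column. `toy_raw` and its values are illustrative assumptions only.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of a 2009 table after set_names()
toy_raw <- tibble(
  sexes     = c(2, 2),
  age_group = c("18 to 24 years", "25 to 29 years"),
  total     = c("1137", "950"),
  none      = c("1", "-")
)

toy_long <- toy_raw %>%
  # Replace the "-" placeholder with "0", then convert to numeric
  mutate(across(-c(sexes, age_group), ~ as.numeric(replace(.x, .x == "-", "0")))) %>%
  # Pivot every column except the two identifier columns
  pivot_longer(
    cols = -c(sexes, age_group),
    names_to = "educational_attainment",
    values_to = "count"
  )

toy_long
```

Selecting by exclusion keeps the call short, at the cost of silently including any unexpected extra column, so the explicit list used above is the more defensive choice.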

Code

#cleaning and loading 2009_Black
educational_data_2009_Black <- read_excel(here::here('data_raw', '2009_Educational_Attainment_Black.xls'), skip = 7)

educational_data_2009_black <- educational_data_2009_Black %>%
  set_names(c(
    "sexes",
    "age_group_big",
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes coding: 2 = both sexes, 1 = female, 0 = male (assigned by row position in the raw sheet)
educational_data_2009_black <- educational_data_2009_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44~ 1
  )) %>%
  select(-c(age_group_big)) %>%
  filter(!is.na(age_group))


educational_data_2009_black <- educational_data_2009_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

head(educational_data_2009_black)

#> # A tibble: 6 × 18
#>   sexes age_group    total  none `1st-4th_grade` `5th-6th_grade` `7th-8th_grade`
#>   <dbl> <chr>        <dbl> <dbl>           <dbl>           <dbl>           <dbl>
#> 1     2 18 to 24 ye…  4196    18              11               6              33
#> 2     2 25 to 29 ye…  2882     4               6               7              21
#> 3     2 30 to 34 ye…  2500     4               5              11              13
#> 4     2 35 to 39 ye…  2561     2              13              13              17
#> 5     2 40 to 44 ye…  2609     5               8               5              23
#> 6     2 45 to 49 ye…  2763     6               9               6              36
#> # ℹ 11 more variables: `9th_grade` <dbl>, `10th_grade` <dbl>,
#> #   `11th_grade` <dbl>, HS_graduate <dbl>, some_college <dbl>,
#> #   associates_degree_occupational <dbl>, associates_degree_academic <dbl>,
#> #   bachelors_degree <dbl>, masters_degree <dbl>, professional_degree <dbl>,
#> #   doctoral_degree <dbl>

Code

educational_data_2009_black <- educational_data_2009_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

#repeating everything for 2009_Hispanic
educational_data_2009_Hispanic <- read_excel(here::here('data_raw', '2009_Educational_Attainment_Hispanic.xls'), skip = 7)


educational_data_2009_hispanic <- educational_data_2009_Hispanic %>%
  set_names(c(
    "sexes",
    "age_group_big",
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes coding: 2 = both sexes, 1 = female, 0 = male (assigned by row position in the raw sheet)
educational_data_2009_hispanic <- educational_data_2009_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44~ 1
  )) %>%
  select(-c(age_group_big)) %>%
  filter(!is.na(age_group))

educational_data_2009_hispanic <- educational_data_2009_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

head(educational_data_2009_hispanic)

#> # A tibble: 6 × 18
#>   sexes age_group    total  none `1st-4th_grade` `5th-6th_grade` `7th-8th_grade`
#>   <dbl> <chr>        <dbl> <dbl>           <dbl>           <dbl>           <dbl>
#> 1     2 18 to 24 ye…  5072    23              34             162             121
#> 2     2 25 to 29 ye…  4260    14              96             296             184
#> 3     2 30 to 34 ye…  3867    44             100             353             187
#> 4     2 35 to 39 ye…  3768    26             124             400             167
#> 5     2 40 to 44 ye…  3260    38             135             301             161
#> 6     2 45 to 49 ye…  2835    41             134             279             156
#> # ℹ 11 more variables: `9th_grade` <dbl>, `10th_grade` <dbl>,
#> #   `11th_grade` <dbl>, HS_graduate <dbl>, some_college <dbl>,
#> #   associates_degree_occupational <dbl>, associates_degree_academic <dbl>,
#> #   bachelors_degree <dbl>, masters_degree <dbl>, professional_degree <dbl>,
#> #   doctoral_degree <dbl>

Code

educational_data_2009_hispanic <- educational_data_2009_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

#repeating everything for 2009_Non_Hispanic_White
educational_data_2009_non_hispanic_white <- read_excel(here::here('data_raw', '2009_Educational_Attainment_Non_Hispanic_White.xls'), skip = 7)

educational_data_2009_non_hispanic_white <- educational_data_2009_non_hispanic_white %>%
  set_names(c(
    "sexes",
    "age_group_big",
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes coding: 2 = both sexes, 1 = female, 0 = male (assigned by row position in the raw sheet)
educational_data_2009_non_hispanic_white <- educational_data_2009_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44~ 1
  )) %>%
  select(-c(age_group_big)) %>%
  filter(!is.na(age_group))
head(educational_data_2009_non_hispanic_white)

#> # A tibble: 6 × 18
#>   sexes age_group    total none  `1st-4th_grade` `5th-6th_grade` `7th-8th_grade`
#>   <dbl> <chr>        <chr> <chr> <chr>           <chr>           <chr>          
#> 1     2 18 to 24 ye… 17717 22    8               17              110            
#> 2     2 25 to 29 ye… 12727 7     4               6               75             
#> 3     2 30 to 34 ye… 11464 20    5               6               60             
#> 4     2 35 to 39 ye… 12569 21    7               18              90             
#> 5     2 40 to 44 ye… 13646 11    12              14              122            
#> 6     2 45 to 49 ye… 15740 9     11              21              162            
#> # ℹ 11 more variables: `9th_grade` <chr>, `10th_grade` <chr>,
#> #   `11th_grade` <chr>, HS_graduate <chr>, some_college <chr>,
#> #   associates_degree_occupational <chr>, associates_degree_academic <chr>,
#> #   bachelors_degree <chr>, masters_degree <chr>, professional_degree <chr>,
#> #   doctoral_degree <chr>

Code

educational_data_2009_non_hispanic_white <- educational_data_2009_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

head(educational_data_2009_non_hispanic_white)

#> # A tibble: 6 × 18
#>   sexes age_group    total  none `1st-4th_grade` `5th-6th_grade` `7th-8th_grade`
#>   <dbl> <chr>        <dbl> <dbl>           <dbl>           <dbl>           <dbl>
#> 1     2 18 to 24 ye… 17717    22               8              17             110
#> 2     2 25 to 29 ye… 12727     7               4               6              75
#> 3     2 30 to 34 ye… 11464    20               5               6              60
#> 4     2 35 to 39 ye… 12569    21               7              18              90
#> 5     2 40 to 44 ye… 13646    11              12              14             122
#> 6     2 45 to 49 ye… 15740     9              11              21             162
#> # ℹ 11 more variables: `9th_grade` <dbl>, `10th_grade` <dbl>,
#> #   `11th_grade` <dbl>, HS_graduate <dbl>, some_college <dbl>,
#> #   associates_degree_occupational <dbl>, associates_degree_academic <dbl>,
#> #   bachelors_degree <dbl>, masters_degree <dbl>, professional_degree <dbl>,
#> #   doctoral_degree <dbl>

Code

educational_data_2009_non_hispanic_white <- educational_data_2009_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

#repeating everything for 2009_White
educational_data_2009_white <- read_excel(here::here('data_raw', '2009_Educational_Attainment_White.xls'), skip = 7)

educational_data_2009_white <- educational_data_2009_white %>%
  set_names(c(
    "sexes",
    "age_group_big",
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes coding: 2 = both sexes, 1 = female, 0 = male (assigned by row position in the raw sheet)
educational_data_2009_white <- educational_data_2009_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44~ 1
  )) %>%
  select(-c(age_group_big)) %>%
  filter(!is.na(age_group))

educational_data_2009_white <- educational_data_2009_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2009_white <- educational_data_2009_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

#joining 2009 tables
#adding race column

educational_data_2009_asian <- educational_data_2009_asian %>%
  mutate(race = "asian")

educational_data_2009_black <- educational_data_2009_black %>%
  mutate(race = "black")

educational_data_2009_hispanic <- educational_data_2009_hispanic %>%
  mutate(race = "hispanic")

educational_data_2009_non_hispanic_white <- educational_data_2009_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2009_white <- educational_data_2009_white %>%
  mutate(race = "white")


# The five tables share no rows (race differs in each), so stacking them
# gives the same result as chaining full joins on all common columns
test_join_2009 <- bind_rows(
  educational_data_2009_asian,
  educational_data_2009_black,
  educational_data_2009_hispanic,
  educational_data_2009_non_hispanic_white,
  educational_data_2009_white
) %>%
  mutate(
    year = 2009
  )

head(test_join_2009)

#> # A tibble: 6 × 6
#>   sexes age_group      educational_attainment count race   year
#>   <dbl> <chr>          <chr>                  <dbl> <chr> <dbl>
#> 1     2 18 to 24 years total                   1137 asian  2009
#> 2     2 18 to 24 years none                       1 asian  2009
#> 3     2 18 to 24 years 1st-4th_grade              5 asian  2009
#> 4     2 18 to 24 years 5th-6th_grade              2 asian  2009
#> 5     2 18 to 24 years 7th-8th_grade              7 asian  2009
#> 6     2 18 to 24 years 9th_grade                 11 asian  2009
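
Adding a `race` column to each table and then combining them can also be done in one step with the `.id` argument of `bind_rows()`. The sketch below is a minimal illustration with single-row stand-in tables; the object names `toy_asian` and `toy_black` are hypothetical.

```r
library(dplyr)

# Single-row stand-ins for two of the long-format 2009 tables
toy_asian <- tibble(
  sexes = 2, age_group = "18 to 24 years",
  educational_attainment = "total", count = 1137
)
toy_black <- tibble(
  sexes = 2, age_group = "18 to 24 years",
  educational_attainment = "total", count = 4196
)

# The list names become the values of the new race column
toy_2009 <- bind_rows(
  list(asian = toy_asian, black = toy_black),
  .id = "race"
) %>%
  mutate(year = 2009)

toy_2009
```

Note that `.id` places `race` as the first column, so the column order differs from the table shown above.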

Code

#repeat for 2008 white
educational_data_2008_white <- read_excel(here::here('data_raw', '2008_Educational_Attainment_White.xls'), skip = 6)
educational_data_2008_white

#> # A tibble: 51 × 17
#>    `Both Sexes`       ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 ...10 ...11
#>    <chr>             <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 .18 years and o… 182714   609  1535  3162  4079  3333  4089  8134 56676 35759
#>  2 ..18 to 24 years  22056    36    45   206   205   367   609  2755  6377  8146
#>  3 .25 years and o… 160658   573  1490  2956  3874  2966  3479  5378 50299 27614
#>  4 ..25 to 29 years  16501    26    90   261   246   348   385   696  4609  3139
#>  5 ..30 to 34 years  14902    37   101   317   232   342   262   537  4099  2542
#>  6 ..35 to 39 years  16365    32   108   354   255   340   293   466  4387  2822
#>  7 ..40 to 44 years  17111    56   183   299   277   323   325   507  5156  2811
#>  8 ..45 to 49 years  18427    59   126   329   284   238   315   609  5896  3230
#>  9 ..50 to 54 years  17495    80   128   269   270   214   272   488  5584  2956
#> 10 ..55 to 59 years  15292    43   141   213   257   204   239   366  4591  2762
#> # ℹ 41 more rows
#> # ℹ 6 more variables: ...12 <dbl>, ...13 <dbl>, ...14 <dbl>, ...15 <dbl>,
#> #   ...16 <dbl>, ...17 <dbl>

Code

educational_data_2008_white <- educational_data_2008_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Replace "-" placeholders with "0", then label each row block by sex
# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2008_white <- educational_data_2008_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2008_white <- educational_data_2008_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2008_white <- educational_data_2008_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
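
The rename, label, convert, and pivot steps above are repeated unchanged for every race and year; only the input table and the row ranges differ. As a minimal sketch (not the code used in this project), they could be wrapped in one function. The name `clean_attainment()`, its arguments, and the toy input are hypothetical, and the example uses three columns instead of the full seventeen to stay short.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(purrr)
library(tibble)

# Hypothetical helper: `raw` is a table as returned by read_excel(),
# `col_names` the names to assign, and `both`, `male`, `female` the row
# ranges that hold each sex block (these differ between years).
clean_attainment <- function(raw, col_names, both, male, female) {
  raw %>%
    set_names(col_names) %>%
    # sexes: 2 = both sexes, 1 = female, 0 = male
    mutate(sexes = case_when(
      row_number() %in% both ~ 2,
      row_number() %in% male ~ 0,
      row_number() %in% female ~ 1
    )) %>%
    filter(!is.na(age_group)) %>%
    mutate(
      # Strip the leading dots used for indentation in the spreadsheet
      age_group = str_remove(age_group, "^\\.+\\s*"),
      # Treat "-" as zero, then convert every count column to numeric
      across(
        -c(age_group, sexes),
        ~ as.numeric(if_else(.x %in% "-", "0", as.character(.x)))
      )
    ) %>%
    pivot_longer(
      cols = -c(age_group, sexes),
      names_to = "educational_attainment",
      values_to = "count"
    )
}

# Made-up input that mimics the raw layout: dotted labels, "-" for zero,
# and an empty spacer row between blocks
toy_raw <- tibble(
  a = c(".18 years and over", "..18 to 24 years", NA,
        ".18 years and over", ".18 years and over"),
  b = c("10", "4", NA, "6", "4"),
  c = c("-", "1", NA, "2", "-")
)

toy_clean <- clean_attainment(
  toy_raw,
  col_names = c("age_group", "total", "none"),
  both = 1:2, male = 4, female = 5
)
toy_clean
```

Passing the row ranges as arguments keeps the year-specific layout differences visible at the call site while the shared steps live in one place.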

# Repeat for 2008: Non-Hispanic White
educational_data_2008_non_hispanic_white <- read_excel(here::here('data_raw', '2008_Educational_Attainment_Non_Hispanic_White.xls'), skip = 6)


educational_data_2008_non_hispanic_white <- educational_data_2008_non_hispanic_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2008_non_hispanic_white <- educational_data_2008_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2008_non_hispanic_white <- educational_data_2008_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2008_non_hispanic_white <- educational_data_2008_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Repeat for 2008: Black
educational_data_2008_black <- read_excel(here::here('data_raw', '2008_Educational_Attainment_Black.xls'), skip = 6)


educational_data_2008_black <- educational_data_2008_black %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2008_black <- educational_data_2008_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2008_black <- educational_data_2008_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2008_black <- educational_data_2008_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Repeat for 2008: Asian
educational_data_2008_asian <- read_excel(here::here('data_raw', '2008_Educational_Attainment_Asian.xls'), skip = 6)

# Set names
educational_data_2008_asian <- educational_data_2008_asian %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Gender
# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2008_asian <- educational_data_2008_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

# Fix data types
educational_data_2008_asian <- educational_data_2008_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))

# Pivot
educational_data_2008_asian <- educational_data_2008_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Repeat for 2008: Hispanic
educational_data_2008_hispanic <- read_excel(here::here('data_raw', '2008_Educational_Attainment_Hispanic.xls'), skip = 6)

# Set names
educational_data_2008_hispanic <- educational_data_2008_hispanic %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2008_hispanic <- educational_data_2008_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2008_hispanic <- educational_data_2008_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2008_hispanic <- educational_data_2008_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Joining 2008 tables
# Adding race column

educational_data_2008_asian <- educational_data_2008_asian %>%
  mutate(race = "asian")

educational_data_2008_black <- educational_data_2008_black %>%
  mutate(race = "black")

educational_data_2008_hispanic <- educational_data_2008_hispanic %>%
  mutate(race = "hispanic")

educational_data_2008_non_hispanic_white <- educational_data_2008_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2008_white <- educational_data_2008_white %>%
  mutate(race = "white")

# All five tables share the same columns and differ only in `race`,
# so they can be stacked row-wise.
test_join_2008 <- bind_rows(
  educational_data_2008_asian,
  educational_data_2008_black,
  educational_data_2008_hispanic,
  educational_data_2008_non_hispanic_white,
  educational_data_2008_white
) %>%
  filter(!is.na(sexes)) %>%
  mutate(year = 2008)

head(test_join_2008)

#> # A tibble: 6 × 6
#>   age_group         sexes educational_attainment count race   year
#>   <chr>             <dbl> <chr>                  <dbl> <chr> <dbl>
#> 1 18 years and over     2 total                  10277 asian  2008
#> 2 18 years and over     2 none                     111 asian  2008
#> 3 18 years and over     2 1st-4th_grade             98 asian  2008
#> 4 18 years and over     2 5th-6th_grade            214 asian  2008
#> 5 18 years and over     2 7th-8th_grade            170 asian  2008
#> 6 18 years and over     2 9th_grade                100 asian  2008
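
Because every group keeps a `total` row next to the individual attainment levels, the combined table can be checked for internal consistency. The sketch below is one possible check, not part of the original pipeline: it compares the reported total with the sum of the levels for each group. The function name `check_totals()` and the toy values are made up for illustration. Published counts may be rounded, so a small difference does not necessarily indicate a cleaning error.

```r
library(dplyr)
library(tibble)

# Toy rows in the same long layout as test_join_2008 (values are made up)
toy_long <- tribble(
  ~age_group,       ~sexes, ~educational_attainment, ~count, ~race,   ~year,
  "25 to 29 years", 2,      "total",                 10,     "asian", 2008,
  "25 to 29 years", 2,      "HS_graduate",           6,      "asian", 2008,
  "25 to 29 years", 2,      "bachelors_degree",      4,      "asian", 2008,
  "25 to 29 years", 2,      "total",                 9,      "black", 2008,
  "25 to 29 years", 2,      "HS_graduate",           5,      "black", 2008,
  "25 to 29 years", 2,      "bachelors_degree",      3,      "black", 2008
)

# Hypothetical helper: reported total minus the sum of the attainment levels
check_totals <- function(long) {
  long %>%
    group_by(year, race, sexes, age_group) %>%
    summarise(
      reported_total = sum(count[educational_attainment == "total"]),
      summed_levels  = sum(count[educational_attainment != "total"]),
      .groups = "drop"
    ) %>%
    mutate(difference = reported_total - summed_levels)
}

# Groups whose levels do not add up to the reported total
mismatches <- check_totals(toy_long) %>%
  filter(difference != 0)
mismatches
```

In the toy data only the `black` group is flagged, because its two levels sum to 8 against a reported total of 9.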

# Loading and cleaning 2007 Data
# Asian
educational_data_2007_asian <- read_excel(here::here('data_raw', '2007_Educational_Data_asian.xlsx'), skip = 6)

# Set names
educational_data_2007_asian <- educational_data_2007_asian %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Gender
# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2007_asian <- educational_data_2007_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

# Fix data types
educational_data_2007_asian <- educational_data_2007_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))

# Pivot
educational_data_2007_asian <- educational_data_2007_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Repeat for other races/ethnicities
# Black
educational_data_2007_black <- read_excel(here::here('data_raw', '2007_Educational_Data_black.xlsx'), skip = 6)

educational_data_2007_black <- educational_data_2007_black %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2007_black <- educational_data_2007_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2007_black <- educational_data_2007_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2007_black <- educational_data_2007_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Hispanic
educational_data_2007_hispanic <- read_excel(here::here('data_raw', '2007_Educational_Data_hispanic.xlsx'), skip = 6)

# Set names
educational_data_2007_hispanic <- educational_data_2007_hispanic %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2007_hispanic <- educational_data_2007_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2007_hispanic <- educational_data_2007_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2007_hispanic <- educational_data_2007_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Non_Hispanic_White

educational_data_2007_non_hispanic_white <- read_excel(here::here('data_raw', '2007_Educational_Data_non_hispanic_white.xlsx'), skip = 6)

educational_data_2007_non_hispanic_white <- educational_data_2007_non_hispanic_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2007_non_hispanic_white <- educational_data_2007_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2007_non_hispanic_white <- educational_data_2007_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2007_non_hispanic_white <- educational_data_2007_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# White
educational_data_2007_white <- read_excel(here::here('data_raw', '2007_Educational_Data_white.xlsx'), skip = 6)

educational_data_2007_white <- educational_data_2007_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2007_white <- educational_data_2007_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2007_white <- educational_data_2007_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2007_white <- educational_data_2007_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Joining 2007 tables
# Adding race column

educational_data_2007_asian <- educational_data_2007_asian %>%
  mutate(race = "asian")

educational_data_2007_black <- educational_data_2007_black %>%
  mutate(race = "black")

educational_data_2007_hispanic <- educational_data_2007_hispanic %>%
  mutate(race = "hispanic")

educational_data_2007_non_hispanic_white <- educational_data_2007_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2007_white <- educational_data_2007_white %>%
  mutate(race = "white")

# All five tables share the same columns and differ only in `race`,
# so they can be stacked row-wise.
test_join_2007 <- bind_rows(
  educational_data_2007_asian,
  educational_data_2007_black,
  educational_data_2007_hispanic,
  educational_data_2007_non_hispanic_white,
  educational_data_2007_white
) %>%
  mutate(year = 2007)

head(test_join_2007)

#> # A tibble: 6 × 6
#>   age_group         sexes educational_attainment count race   year
#>   <chr>             <dbl> <chr>                  <dbl> <chr> <dbl>
#> 1 18 years and over     2 total                  10221 asian  2007
#> 2 18 years and over     2 none                     151 asian  2007
#> 3 18 years and over     2 1st-4th_grade            129 asian  2007
#> 4 18 years and over     2 5th-6th_grade            191 asian  2007
#> 5 18 years and over     2 7th-8th_grade            189 asian  2007
#> 6 18 years and over     2 9th_grade                116 asian  2007
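
The 2008 join drops rows with a missing `sexes` value, while the 2007 join does not. A quick count of unlabelled rows can show whether that filter is needed for a given year. The following is a sketch on made-up toy data; the object names `toy_join`, `unlabelled`, and `toy_join_labelled` are hypothetical.

```r
library(dplyr)
library(tibble)

# Toy table in the joined layout; the last row stands in for a footnote or
# other row that fell outside the row ranges used to assign `sexes`
toy_join <- tribble(
  ~age_group,          ~sexes, ~educational_attainment, ~count, ~race,
  "18 years and over", 2,      "total",                 100,    "asian",
  "18 years and over", 0,      "total",                 48,     "asian",
  "18 years and over", 1,      "total",                 52,     "asian",
  "Footnote text",     NA,     "total",                 NA,     "asian"
)

# Count rows per race that never received a sex label
unlabelled <- toy_join %>%
  filter(is.na(sexes)) %>%
  count(race, name = "rows_without_sex")
unlabelled

# Keep only labelled rows, mirroring the filter used in the 2008 join
toy_join_labelled <- toy_join %>%
  filter(!is.na(sexes))
```

If the count comes back empty for a year, the extra filter is harmless but unnecessary; if not, the unlabelled rows would otherwise flow into later summaries.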

# Loading and cleaning 2006 Data
# Asian
educational_data_2006_asian <- read_excel(here::here('data_raw', '2006_Educational_Data_asian.xlsx'), skip = 6)

# Set names
educational_data_2006_asian <- educational_data_2006_asian %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Gender
# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2006_asian <- educational_data_2006_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:19 ~ 2,
    row_number() %in% 32:51 ~ 0,
    row_number() %in% 64:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

# Fix data types
educational_data_2006_asian <- educational_data_2006_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))
# Pivot
educational_data_2006_asian <- educational_data_2006_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
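
The row ranges used to assign `sexes` change with the spreadsheet layout (1:14, 15:29, 30:44 for 2007 and 2008; 1:19, 32:51, 64:83 for 2006), so each new year needs its ranges worked out by hand. One alternative, sketched below, is to derive the label from section header rows. This assumes the first column contains the labels "Both Sexes", "Male", and "Female" as separate rows; the printed 2008 table suggests such labels exist, but the exact layout of each file is not confirmed here, and the toy data is made up.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up table with section header rows in the first column
toy_raw <- tibble(
  age_group = c("Both Sexes", "18 years and over", "18 to 24 years",
                "Male",       "18 years and over", "18 to 24 years",
                "Female",     "18 years and over", "18 to 24 years"),
  total = c(NA, 100, 20, NA, 48, 10, NA, 52, 10)
)

# sexes: 2 = both sexes, 1 = female, 0 = male
sex_codes <- c("Both Sexes" = 2, "Male" = 0, "Female" = 1)

toy_labelled <- toy_raw %>%
  # Header rows get their code; every other row is NA for now
  mutate(sexes = unname(sex_codes[age_group])) %>%
  # Carry each header's code down to the rows beneath it
  fill(sexes, .direction = "down") %>%
  # Drop the header rows themselves
  filter(!age_group %in% names(sex_codes))
toy_labelled
```

With this approach the same code would apply regardless of how many age groups a given year reports, as long as the header labels are spelled consistently.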

# Repeat for other races/ethnicities
# Black
educational_data_2006_black <- read_excel(here::here('data_raw', '2006_Educational_Data_black.xlsx'), skip = 6)

educational_data_2006_black <- educational_data_2006_black %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2006_black <- educational_data_2006_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:19 ~ 2,
    row_number() %in% 32:51 ~ 0,
    row_number() %in% 64:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2006_black <- educational_data_2006_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2006_black <- educational_data_2006_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Hispanic
educational_data_2006_hispanic <- read_excel(here::here('data_raw', '2006_Educational_Data_hispanic.xlsx'), skip = 6)

educational_data_2006_hispanic <- educational_data_2006_hispanic %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2006_hispanic <- educational_data_2006_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:19 ~ 2,
    row_number() %in% 32:51 ~ 0,
    row_number() %in% 64:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2006_hispanic <- educational_data_2006_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2006_hispanic <- educational_data_2006_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Non_Hispanic_White

educational_data_2006_non_hispanic_white <- read_excel(here::here('data_raw', '2006_Educational_Data_non_hispanic_white.xlsx'), skip = 6)

educational_data_2006_non_hispanic_white <- educational_data_2006_non_hispanic_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2006_non_hispanic_white <- educational_data_2006_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:19 ~ 2,
    row_number() %in% 32:51 ~ 0,
    row_number() %in% 64:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2006_non_hispanic_white <- educational_data_2006_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2006_non_hispanic_white <- educational_data_2006_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# White
educational_data_2006_white <- read_excel(here::here('data_raw', '2006_Educational_Data_white.xlsx'), skip = 6)

educational_data_2006_white <- educational_data_2006_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2006_white <- educational_data_2006_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:19 ~ 2,
    row_number() %in% 32:51 ~ 0,
    row_number() %in% 64:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2006_white <- educational_data_2006_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2006_white <- educational_data_2006_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Joining 2006 tables
# Adding race column

educational_data_2006_asian <- educational_data_2006_asian %>%
  mutate(race = "asian")

educational_data_2006_black <- educational_data_2006_black %>%
  mutate(race = "black")

educational_data_2006_hispanic <- educational_data_2006_hispanic %>%
  mutate(race = "hispanic")

educational_data_2006_non_hispanic_white <- educational_data_2006_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2006_white <- educational_data_2006_white %>%
  mutate(race = "white")

# All five tables share the same columns and differ only in `race`,
# so they can be stacked row-wise.
test_join_2006 <- bind_rows(
  educational_data_2006_asian,
  educational_data_2006_black,
  educational_data_2006_hispanic,
  educational_data_2006_non_hispanic_white,
  educational_data_2006_white
) %>%
  mutate(year = 2006)

head(test_join_2006)

#> # A tibble: 6 × 6
#>   age_group      sexes educational_attainment count race   year
#>   <chr>          <dbl> <chr>                  <dbl> <chr> <dbl>
#> 1 15 to 17 years     2 total                    473 asian  2006
#> 2 15 to 17 years     2 none                       0 asian  2006
#> 3 15 to 17 years     2 1st-4th_grade              0 asian  2006
#> 4 15 to 17 years     2 5th-6th_grade              2 asian  2006
#> 5 15 to 17 years     2 7th-8th_grade             56 asian  2006
#> 6 15 to 17 years     2 9th_grade                152 asian  2006
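
For 2005 through 2007 the file names follow one pattern, `<year>_Educational_Data_<race>.xlsx`, so the per-race loading and the final stacking could be driven by a single loop. The sketch below shows the idea under stated assumptions: `race_paths()` and `fake_read_and_clean()` are hypothetical names, the reader is a stand-in that returns a made-up one-row table instead of calling `read_excel()`, and `file.path()` is used in place of `here::here()` only to keep the example free of project dependencies.

```r
library(dplyr)
library(purrr)
library(tibble)

races <- c("asian", "black", "hispanic", "non_hispanic_white", "white")

# Build one path per race, named by race, following the 2005-2007 naming
race_paths <- function(year, races) {
  paths <- file.path(
    "data_raw",
    sprintf("%d_Educational_Data_%s.xlsx", year, races)
  )
  set_names(paths, races)
}

paths_2006 <- race_paths(2006, races)

# Stand-in for "read the spreadsheet, then clean it"; values are made up
fake_read_and_clean <- function(path) {
  tibble(
    age_group = "25 to 29 years",
    sexes = 2,
    educational_attainment = "total",
    count = 1
  )
}

# One cleaned table per race, stacked with the list names stored in `race`
combined_2006 <- paths_2006 %>%
  map(fake_read_and_clean) %>%
  bind_rows(.id = "race") %>%
  mutate(year = 2006)
combined_2006
```

Using the list names as the `race` column removes the separate `mutate(race = ...)` step for each table, at the cost of relying on consistent file naming within a year.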

# Loading and cleaning 2005 Data
# Asian
educational_data_2005_asian <- read_excel(here::here('data_raw', '2005_Educational_Data_asian.xlsx'), skip = 5)

# Set names
educational_data_2005_asian <- educational_data_2005_asian %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Gender
# sexes: 2 = both sexes, 1 = female, 0 = male
educational_data_2005_asian <- educational_data_2005_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 34:54 ~ 0,
    row_number() %in% 67:87 ~ 1
  )) %>%
  filter(!is.na(age_group))

# Fix data types
educational_data_2005_asian <- educational_data_2005_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))
# Pivot
educational_data_2005_asian <- educational_data_2005_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#view(educational_data_2005_asian)
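
The chunks for each race and year repeat the same four steps: set names, code the sex blocks, convert to numeric, and pivot. A minimal sketch of how those steps might be wrapped in a single function is shown here. The function name `clean_attainment_table()`, its arguments, and the toy input are assumptions for illustration and are not part of the original analysis; the toy table has only three columns and made-up numbers so that it runs without the Excel files.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: wraps the rename / sex-coding / numeric / pivot steps.
# both_rows, male_rows and female_rows are the row ranges of each block.
clean_attainment_table <- function(raw, col_names, both_rows, male_rows, female_rows) {
  raw %>%
    setNames(col_names) %>%
    mutate(across(everything(), ~ replace(., . == "-", "0"))) %>%
    mutate(sexes = case_when(
      row_number() %in% both_rows ~ 2,
      row_number() %in% male_rows ~ 0,
      row_number() %in% female_rows ~ 1
    )) %>%
    filter(!is.na(age_group)) %>%
    mutate(across(-c(age_group, sexes), as.numeric)) %>%
    pivot_longer(
      cols = -c(age_group, sexes),
      names_to = "educational_attainment",
      values_to = "count"
    )
}

# Toy input (made-up numbers): two age rows per block, blank spacer rows between blocks
toy_raw <- tibble(
  a = c("25 to 29 years", "30 to 34 years", NA,
        "25 to 29 years", "30 to 34 years", NA,
        "25 to 29 years", "30 to 34 years"),
  b = c("10", "12", NA, "4", "-", NA, "6", "12"),
  c = c("3", "-", NA, "1", "2", NA, "2", "5")
)

toy_long <- clean_attainment_table(
  toy_raw,
  col_names = c("age_group", "total", "none"),
  both_rows = 1:2, male_rows = 4:5, female_rows = 7:8
)
```

With a helper of this kind, each race and year would need only one call with its own row ranges, which keeps the row offsets visible in one place.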

Code

# Repeat for other races/ethnicities
# Black
educational_data_2005_black <- read_excel(here::here('data_raw', '2005_Educational_Data_black.xlsx'), skip = 5)

educational_data_2005_black <- educational_data_2005_black %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2005_black <- educational_data_2005_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 34:54 ~ 0,
    row_number() %in% 67:87 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2005_black <- educational_data_2005_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2005_black <- educational_data_2005_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# Hispanic
educational_data_2005_hispanic <- read_excel(here::here('data_raw', '2005_Educational_Data_hispanic.xlsx'), skip = 5)

educational_data_2005_hispanic <- educational_data_2005_hispanic %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2005_hispanic <- educational_data_2005_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 34:54 ~ 0,
    row_number() %in% 67:87 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2005_hispanic <- educational_data_2005_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2005_hispanic <- educational_data_2005_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# Non_Hispanic_White

educational_data_2005_non_hispanic_white <- read_excel(here::here('data_raw', '2005_Educational_Data_non_hispanic_white.xlsx'), skip = 5)

educational_data_2005_non_hispanic_white <- educational_data_2005_non_hispanic_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2005_non_hispanic_white <- educational_data_2005_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 34:54 ~ 0,
    row_number() %in% 67:87 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2005_non_hispanic_white <- educational_data_2005_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2005_non_hispanic_white <- educational_data_2005_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# White
educational_data_2005_white <- read_excel(here::here('data_raw', '2005_Educational_Data_white.xlsx'), skip = 5)

educational_data_2005_white <- educational_data_2005_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2005_white <- educational_data_2005_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 34:54 ~ 0,
    row_number() %in% 67:87 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2005_white <- educational_data_2005_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2005_white <- educational_data_2005_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# Joining 2005 tables
# Adding race column

educational_data_2005_asian <- educational_data_2005_asian %>%
  mutate(race = "asian")

educational_data_2005_black <- educational_data_2005_black %>%
  mutate(race = "black")

educational_data_2005_hispanic <- educational_data_2005_hispanic %>%
  mutate(race = "hispanic")

educational_data_2005_non_hispanic_white <- educational_data_2005_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2005_white <- educational_data_2005_white %>%
  mutate(race = "white")

# Stack the five race tables; bind_rows() appends rows without any key matching
test_join_2005 <- bind_rows(
  educational_data_2005_asian,
  educational_data_2005_black,
  educational_data_2005_hispanic,
  educational_data_2005_non_hispanic_white,
  educational_data_2005_white
) %>%
  mutate(year = 2005)

head(test_join_2005)

#> # A tibble: 6 × 6
#>   age_group         sexes educational_attainment count race   year
#>   <chr>             <dbl> <chr>                  <dbl> <chr> <dbl>
#> 1 15 years and over     2 total                   9873 asian  2005
#> 2 15 years and over     2 none                     124 asian  2005
#> 3 15 years and over     2 1st-4th_grade            119 asian  2005
#> 4 15 years and over     2 5th-6th_grade            216 asian  2005
#> 5 15 years and over     2 7th-8th_grade            256 asian  2005
#> 6 15 years and over     2 9th_grade                239 asian  2005

Code

# Loading and cleaning 2004 Data
# Asian
educational_data_2004_asian <- read_excel(here::here('data_raw', '2004_Educational_Data_asian.xlsx'), skip = 5)

# Set names
educational_data_2004_asian <- educational_data_2004_asian %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Gender
# 2 = both, 1 = female, 0 = male
educational_data_2004_asian <- educational_data_2004_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

# Fix data types
educational_data_2004_asian <- educational_data_2004_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))
# Pivot
educational_data_2004_asian <- educational_data_2004_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#view(educational_data_2004_asian)

Code

# Repeat for other races/ethnicities
# Black
educational_data_2004_black <- read_excel(here::here('data_raw', '2004_Educational_Data_black.xlsx'), skip = 5)

educational_data_2004_black <- educational_data_2004_black %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2004_black <- educational_data_2004_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2004_black <- educational_data_2004_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2004_black <- educational_data_2004_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# Hispanic
educational_data_2004_hispanic <- read_excel(here::here('data_raw', '2004_Educational_Data_hispanic.xlsx'), skip = 5)

educational_data_2004_hispanic <- educational_data_2004_hispanic %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2004_hispanic <- educational_data_2004_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2004_hispanic <- educational_data_2004_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2004_hispanic <- educational_data_2004_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# Non_Hispanic_White

educational_data_2004_non_hispanic_white <- read_excel(here::here('data_raw', '2004_Educational_Data_non_hispanic_white.xlsx'), skip = 5)

educational_data_2004_non_hispanic_white <- educational_data_2004_non_hispanic_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2004_non_hispanic_white <- educational_data_2004_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2004_non_hispanic_white <- educational_data_2004_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2004_non_hispanic_white <- educational_data_2004_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# White
educational_data_2004_white <- read_excel(here::here('data_raw', '2004_Educational_Data_white.xlsx'), skip = 5)

educational_data_2004_white <- educational_data_2004_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2004_white <- educational_data_2004_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2004_white <- educational_data_2004_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2004_white <- educational_data_2004_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# Joining 2004 tables
# Adding race column

educational_data_2004_asian <- educational_data_2004_asian %>%
  mutate(race = "asian")

educational_data_2004_black <- educational_data_2004_black %>%
  mutate(race = "black")

educational_data_2004_hispanic <- educational_data_2004_hispanic %>%
  mutate(race = "hispanic")

educational_data_2004_non_hispanic_white <- educational_data_2004_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2004_white <- educational_data_2004_white %>%
  mutate(race = "white")

# Stack the five race tables; bind_rows() appends rows without any key matching
test_join_2004 <- bind_rows(
  educational_data_2004_asian,
  educational_data_2004_black,
  educational_data_2004_hispanic,
  educational_data_2004_non_hispanic_white,
  educational_data_2004_white
) %>%
  mutate(year = 2004)

head(test_join_2004)

#> # A tibble: 6 × 6
#>   age_group         sexes educational_attainment count race   year
#>   <chr>             <dbl> <chr>                  <dbl> <chr> <dbl>
#> 1 15 years and over     2 total                   9592 asian  2004
#> 2 15 years and over     2 none                     135 asian  2004
#> 3 15 years and over     2 1st-4th_grade            109 asian  2004
#> 4 15 years and over     2 5th-6th_grade            211 asian  2004
#> 5 15 years and over     2 7th-8th_grade            258 asian  2004
#> 6 15 years and over     2 9th_grade                265 asian  2004

Code

# Loading and cleaning 2003 Data
# Asian
educational_data_2003_asian <- read_excel(here::here('data_raw', '2003_Educational_Data_asian.xlsx'), skip = 5)

# Set names
educational_data_2003_asian <- educational_data_2003_asian %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Gender
# 2 = both, 1 = female, 0 = male
educational_data_2003_asian <- educational_data_2003_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

# Fix data types
educational_data_2003_asian <- educational_data_2003_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))
# Pivot
educational_data_2003_asian <- educational_data_2003_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#view(educational_data_2003_asian)

Code

# Repeat for other races/ethnicities
# Black
educational_data_2003_black <- read_excel(here::here('data_raw', '2003_Educational_Data_black.xlsx'), skip = 5)

educational_data_2003_black <- educational_data_2003_black %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2003_black <- educational_data_2003_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2003_black <- educational_data_2003_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2003_black <- educational_data_2003_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# Hispanic
educational_data_2003_hispanic <- read_excel(here::here('data_raw', '2003_Educational_Data_hispanic.xlsx'), skip = 5)

educational_data_2003_hispanic <- educational_data_2003_hispanic %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2003_hispanic <- educational_data_2003_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2003_hispanic <- educational_data_2003_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2003_hispanic <- educational_data_2003_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# Non_Hispanic_White

educational_data_2003_non_hispanic_white <- read_excel(here::here('data_raw', '2003_Educational_Data_non_hispanic_white.xlsx'), skip = 5)

educational_data_2003_non_hispanic_white <- educational_data_2003_non_hispanic_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2003_non_hispanic_white <- educational_data_2003_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2003_non_hispanic_white <- educational_data_2003_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2003_non_hispanic_white <- educational_data_2003_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# White
educational_data_2003_white <- read_excel(here::here('data_raw', '2003_Educational_Data_white.xlsx'), skip = 5)

educational_data_2003_white <- educational_data_2003_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2003_white <- educational_data_2003_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2003_white <- educational_data_2003_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2003_white <- educational_data_2003_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# Joining 2003 tables
# Adding race column

educational_data_2003_asian <- educational_data_2003_asian %>%
  mutate(race = "asian")

educational_data_2003_black <- educational_data_2003_black %>%
  mutate(race = "black")

educational_data_2003_hispanic <- educational_data_2003_hispanic %>%
  mutate(race = "hispanic")

educational_data_2003_non_hispanic_white <- educational_data_2003_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2003_white <- educational_data_2003_white %>%
  mutate(race = "white")

# Stack the five race tables; bind_rows() appends rows without any key matching
test_join_2003 <- bind_rows(
  educational_data_2003_asian,
  educational_data_2003_black,
  educational_data_2003_hispanic,
  educational_data_2003_non_hispanic_white,
  educational_data_2003_white
) %>%
  mutate(year = 2003)

head(test_join_2003)

#> # A tibble: 6 × 6
#>   age_group         sexes educational_attainment count race   year
#>   <chr>             <dbl> <chr>                  <dbl> <chr> <dbl>
#> 1 15 years and over     2 total                   9328 asian  2003
#> 2 15 years and over     2 none                     110 asian  2003
#> 3 15 years and over     2 1st-4th_grade            113 asian  2003
#> 4 15 years and over     2 5th-6th_grade            210 asian  2003
#> 5 15 years and over     2 7th-8th_grade            225 asian  2003
#> 6 15 years and over     2 9th_grade                250 asian  2003

Code

educational_data_2002 <- read_excel(here::here('data_raw', '2002_Educational_Attainment.xls'), skip = 5)

Code

# split and reattach: all races, both sexes, 2002
split_row_1 <- 15
split_row_2 <- 34
split_row_3 <- 48
allraces_bothsexes_2002 <- educational_data_2002 %>% slice(1:split_row_1)
allraces_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_2:split_row_3)

# renaming the columns for the first half of the allraces/sexes 2002 data
allraces_bothsexes_2002 <- allraces_bothsexes_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

# renaming the columns for the second half of the allraces/sexes 2002 data

allraces_bothsexes_2002_2 <- allraces_bothsexes_2002_2 %>%
  # the two trailing columns get unique placeholder names, since duplicated
  # column names can cause errors in later tidyverse verbs
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",
    "drop_2"
  )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

Code

# re-merged: wide
allraces_bothsexes_2002_2 <- allraces_bothsexes_2002_2 %>%
  mutate(
    race = "all_races",
    sexes = "both_sexes"
  )

allraces_bothsexes_2002 <- allraces_bothsexes_2002 %>%
  mutate(
    race = "all_races",
    sexes = "both_sexes"
  )
# reattach the two halves on their shared key columns
allraces_bothsexes_2002_merged <- full_join(
  allraces_bothsexes_2002,
  allraces_bothsexes_2002_2,
  by = c("age_group", "race", "sexes")
)

# convert every count column to numeric and tag the year
allraces_bothsexes_2002_merged <- allraces_bothsexes_2002_merged %>%
  mutate(
    across(-c(age_group, race, sexes), as.numeric),
    year = 2002
  )
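
Because the two halves of the 2002 table are reattached on `age_group`, any difference in the age labels between the halves (stray whitespace, for instance) would produce extra rows filled with missing values rather than an error. A minimal sketch of a pre-merge check using `anti_join()` follows; the toy halves, their counts, and the object names are assumptions for illustration only.

```r
library(dplyr)

# Toy halves (made-up counts) keyed on age_group, like the 2002 tables
first_half <- tibble(
  age_group = c("25 to 29 years", "30 to 34 years"),
  HS_graduate = c("40", "50")
)
second_half <- tibble(
  age_group = c("25 to 29 years", "30 to 34 years"),
  bachelors_degree = c("30", "35")
)

# Age labels present in one half but missing from the other
unmatched_first <- anti_join(first_half, second_half, by = "age_group")
unmatched_second <- anti_join(second_half, first_half, by = "age_group")

# When both checks return zero rows, the merge keeps one row per age label
merged <- full_join(first_half, second_half, by = "age_group")
```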

Code

# collapsing age
allraces_bothsexes_2002_merged <- allraces_bothsexes_2002_merged %>%
  filter(age_group != "15 years and over") %>%
  filter(age_group != "15 to 17 years") %>%
  filter(age_group != "18 to 19 years") %>%
  filter(age_group != "20 to 24 years") %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

# pivot longer
allraces_bothsexes_2002_merged_2 <- allraces_bothsexes_2002_merged %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

# collapsing attainment
allraces_bothsexes_2002_merged_3 <- allraces_bothsexes_2002_merged_2 %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped result
  )
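
The collapse-and-sum step above is applied unchanged to every demographic table. A minimal sketch of how it might be wrapped in one function is shown here, assuming the long table has the columns `sexes`, `age_group`, `educational_attainment`, `race`, `year` and `count`. The names `attainment_lookup` and `collapse_attainment()` are hypothetical, and the toy data are invented for illustration rather than taken from the Census tables.

```r
library(dplyr)

# Hypothetical lookup reproducing the category mapping used in this section.
attainment_lookup <- c(
  "none"                           = "None",
  "1st-4th_grade"                  = "Less than HS",
  "5th-6th_grade"                  = "Less than HS",
  "7th-8th_grade"                  = "Less than HS",
  "9th_grade"                      = "Less than HS",
  "10th_grade"                     = "Less than HS",
  "11th_grade"                     = "Less than HS",
  "HS_graduate"                    = "HS",
  "some_college"                   = "Some College, No Degree",
  "associates_degree_occupational" = "Undergraduate Degree",
  "associates_degree_academic"     = "Undergraduate Degree",
  "bachelors_degree"               = "Undergraduate Degree",
  "masters_degree"                 = "Graduate Degree",
  "professional_degree"            = "Graduate Degree",
  "doctoral_degree"                = "Doctoral Degree"
)

# Hypothetical helper: recode the detailed levels, then sum counts per group.
collapse_attainment <- function(long_data) {
  long_data %>%
    mutate(
      educational_attainment = unname(attainment_lookup[educational_attainment])
    ) %>%
    group_by(sexes, age_group, educational_attainment, race, year) %>%
    summarize(count = sum(count, na.rm = TRUE), .groups = "drop")
}

# Invented toy rows, only to show the shape of the input and output.
toy <- tibble(
  sexes = "both_sexes", age_group = "25_to_34", race = "all_races", year = 2002,
  educational_attainment = c("9th_grade", "10th_grade", "bachelors_degree"),
  count = c(10, 5, 40)
)
result <- collapse_attainment(toy)
result
```

In the toy example the two grade rows are combined into a single "Less than HS" row, while the bachelor's row becomes "Undergraduate Degree".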

Code

# Load and rename
split_row_4 <- 65
split_row_5 <- 81
allraces_male_2002 <- educational_data_2002 %>% slice(split_row_4:split_row_5)

# Rows 99:114 span 16 rows, while the other blocks in this section span 17
# (e.g. 164:180); the start row is worth confirming against the raw table.
split_row_6 <- 99
split_row_7 <- 114

allraces_male_2002_2 <- educational_data_2002 %>% slice(split_row_6:split_row_7)

# Rename the columns for the first half of the allraces/male 2002 data
allraces_male_2002 <- allraces_male_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))
# Rename the columns for the second half of the allraces/male 2002 data
allraces_male_2002_2 <- allraces_male_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )
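
The slice, rename and join pattern is the same for each race and sex block; only the row ranges and labels change. The sketch below shows one way the pattern might be factored into a function. It assumes the raw sheet has ten columns and that each block is split into two row ranges, as in the code above. The function `extract_block()`, its arguments, the names `drop_1` and `drop_2`, and the toy data are all hypothetical.

```r
library(dplyr)

first_half_names <- c(
  "age_group", "total", "none", "1st-4th_grade", "5th-6th_grade",
  "7th-8th_grade", "9th_grade", "10th_grade", "11th_grade", "HS_graduate"
)
second_half_names <- c(
  "age_group", "some_college", "associates_degree_occupational",
  "associates_degree_academic", "bachelors_degree", "masters_degree",
  "professional_degree", "doctoral_degree", "drop_1", "drop_2"
)

# Hypothetical helper: pull both halves of one block and join them by age band.
extract_block <- function(raw, first_rows, second_rows, race_label, sex_label) {
  first_half <- raw %>%
    slice(first_rows) %>%
    setNames(first_half_names)

  second_half <- raw %>%
    slice(second_rows) %>%
    setNames(second_half_names) %>%
    select(-drop_1, -drop_2)   # the last two columns are unused

  full_join(first_half, second_half, by = "age_group") %>%
    mutate(race = race_label, sexes = sex_label)
}

# Invented toy sheet: rows 1-2 play the first half, rows 3-4 the second half.
raw_toy <- as.data.frame(
  matrix(
    c("25 to 29 years", "30 to 34 years", "25 to 29 years", "30 to 34 years",
      rep(c("1", "2", "3", "4"), times = 9)),
    nrow = 4
  ),
  stringsAsFactors = FALSE
)

block_toy <- extract_block(raw_toy, 1:2, 3:4, "all_races", "male")
block_toy
```

Joining explicitly on `age_group` before adding the labels avoids relying on the implicit join keys, and giving the two unused columns distinct names avoids duplicate column names.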

Code

allraces_male_2002_2 <- allraces_male_2002_2 %>%
  mutate(
    race = "all_races",
    sexes = "male"
  )

allraces_male_2002 <- allraces_male_2002 %>%
  mutate(
    race = "all_races",
    sexes = "male"
  )
allraces_male_2002_merged <- full_join(allraces_male_2002, allraces_male_2002_2)

allraces_male_2002_merged <- allraces_male_2002_merged %>% slice(-(1:6))

allraces_male_2002_merged <- allraces_male_2002_merged %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )
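
The column-by-column `as.numeric()` conversion above could also be written with `across()`. This is a sketch under the assumption that every column other than `age_group`, `race` and `sexes` holds counts stored as text. The helper name `to_numeric_counts()` and the toy values are hypothetical.

```r
library(dplyr)

# Hypothetical helper: convert all count columns to numeric and tag the year.
to_numeric_counts <- function(wide_data, survey_year) {
  wide_data %>%
    mutate(
      across(-c(age_group, race, sexes), as.numeric),  # every non-label column
      year = survey_year
    )
}

# Invented toy row with counts stored as character strings.
wide_toy <- tibble(
  age_group = "25 to 29 years",
  total = "100",
  none = "2",
  HS_graduate = "30",
  race = "all_races",
  sexes = "male"
)

numeric_toy <- to_numeric_counts(wide_toy, 2002)
numeric_toy
```

As with the explicit version, any value that cannot be parsed as a number becomes `NA` with a coercion warning.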

Code

allraces_male_2002_merged <- allraces_male_2002_merged %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

allraces_male_2002_merged_2 <- allraces_male_2002_merged %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
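
Because the list of attainment columns is identical in every `pivot_longer()` call, the step could instead select everything except the identifier columns. The sketch below assumes the wide table contains only `age_group`, `total`, `race`, `sexes`, `year` and the attainment columns. The helper name `pivot_attainment_longer()` and the toy values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: reshape every attainment column into name/count pairs.
pivot_attainment_longer <- function(wide_data) {
  wide_data %>%
    pivot_longer(
      cols = -c(age_group, total, race, sexes, year),  # keep identifiers wide
      names_to = "educational_attainment",
      values_to = "count"
    )
}

# Invented toy row with three attainment columns.
wide_toy <- tibble(
  age_group = "25_to_34", total = 60,
  none = 1, HS_graduate = 19, bachelors_degree = 40,
  race = "all_races", sexes = "male", year = 2002
)

long_toy <- pivot_attainment_longer(wide_toy)
long_toy
```

The trade-off is that any unexpected extra column would also be pivoted, so the explicit column list used above is the safer choice when the input layout is uncertain.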

Code

allraces_male_2002_merged_3 <- allraces_male_2002_merged_2 %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped result
  )

Code

split_row_8 <- 131
split_row_9 <- 147
allraces_female_2002 <- educational_data_2002 %>% slice(split_row_8:split_row_9)

split_row_10 <- 164
split_row_11 <- 180
allraces_female_2002_2 <- educational_data_2002 %>% slice(split_row_10:split_row_11)

# Rename the columns for the first half of the allraces/female 2002 data
allraces_female_2002 <- allraces_female_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))
# Rename the columns for the second half of the allraces/female 2002 data
allraces_female_2002_2 <- allraces_female_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

Code

allraces_female_2002_2 <- allraces_female_2002_2 %>%
  mutate(
    race = "all_races",
    sexes = "female"
  )

allraces_female_2002 <- allraces_female_2002 %>%
  mutate(
    race = "all_races",
    sexes = "female"
  )
allraces_female_2002_wide <- full_join(allraces_female_2002, allraces_female_2002_2)

allraces_female_2002_wide <- allraces_female_2002_wide %>% slice(-(1:6))

allraces_female_2002_wide <- allraces_female_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

Code

allraces_female_2002_wide <- allraces_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

allraces_female_2002_long <- allraces_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

allraces_female_2002_long <- allraces_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped result
  )

Code

# Non-Hispanic White

split_row_12 <- 197
split_row_13 <- 213
nonhispanicwhite_bothsexes_2002 <- educational_data_2002 %>% slice(split_row_12:split_row_13)

split_row_14 <- 230
split_row_15 <- 246
nonhispanicwhite_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_14:split_row_15)

# Rename the columns for the first half of the nonhispanicwhite/bothsexes 2002 data
nonhispanicwhite_bothsexes_2002 <- nonhispanicwhite_bothsexes_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))
# Rename the columns for the second half of the nonhispanicwhite/bothsexes 2002 data
nonhispanicwhite_bothsexes_2002_2 <- nonhispanicwhite_bothsexes_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

Code

nonhispanicwhite_bothsexes_2002 <- nonhispanicwhite_bothsexes_2002 %>%
  mutate(
    race = "non_hispanic_white",
    sexes = "both_sexes"
  )

nonhispanicwhite_bothsexes_2002_2 <- nonhispanicwhite_bothsexes_2002_2 %>%
  mutate(
    race = "non_hispanic_white",
    sexes = "both_sexes"
  )
nonhispanicwhite_bothsexes_2002_merged <- full_join(nonhispanicwhite_bothsexes_2002, nonhispanicwhite_bothsexes_2002_2)

nonhispanicwhite_bothsexes_2002_merged <- nonhispanicwhite_bothsexes_2002_merged %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

Code

nonhispanicwhite_bothsexes_2002_merged <- nonhispanicwhite_bothsexes_2002_merged %>% slice(-(1:6))

nonhispanicwhite_bothsexes_2002_wide <- nonhispanicwhite_bothsexes_2002_merged %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

nonhispanicwhite_bothsexes_2002_long <- nonhispanicwhite_bothsexes_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

nonhispanicwhite_bothsexes_2002_long <- nonhispanicwhite_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped result
  )

Code

split_row_16 <- 263
split_row_17 <- 279
nonhispanicwhite_male_2002 <- educational_data_2002 %>% slice(split_row_16:split_row_17)

split_row_18 <- 296
split_row_19 <- 312
nonhispanicwhite_male_2002_2 <- educational_data_2002 %>% slice(split_row_18:split_row_19)

# Rename the columns for the first half of the nonhispanicwhite/male 2002 data
nonhispanicwhite_male_2002 <- nonhispanicwhite_male_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))
# Rename the columns for the second half of the nonhispanicwhite/male 2002 data
nonhispanicwhite_male_2002_2 <- nonhispanicwhite_male_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

nonhispanicwhite_male_2002_2 <- nonhispanicwhite_male_2002_2 %>% slice(-(1:6))
nonhispanicwhite_male_2002 <- nonhispanicwhite_male_2002 %>% slice(-(1:6))

Code

nonhispanicwhite_male_2002 <- nonhispanicwhite_male_2002 %>%
  mutate(
    race = "non_hispanic_white",
    sexes = "male"
  )

nonhispanicwhite_male_2002_2 <- nonhispanicwhite_male_2002_2 %>%
  mutate(
    race = "non_hispanic_white",
    sexes = "male"
  )
nonhispanicwhite_male_2002_wide <- full_join(nonhispanicwhite_male_2002, nonhispanicwhite_male_2002_2)

nonhispanicwhite_male_2002_wide <- nonhispanicwhite_male_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

Code

nonhispanicwhite_male_2002_wide <- nonhispanicwhite_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

nonhispanicwhite_male_2002_long <- nonhispanicwhite_male_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

nonhispanicwhite_male_2002_long <- nonhispanicwhite_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped result
  )

Code

split_row_20 <- 329
split_row_21 <- 345

nonhispanicwhite_female_2002 <- educational_data_2002 %>% slice(split_row_20:split_row_21)

split_row_22 <- 362
split_row_23 <- 378

nonhispanicwhite_female_2002_2 <- educational_data_2002 %>% slice(split_row_22:split_row_23)

# Rename the columns for the first half of the nonhispanicwhite/female 2002 data
nonhispanicwhite_female_2002 <- nonhispanicwhite_female_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))
# Rename the columns for the second half of the nonhispanicwhite/female 2002 data
nonhispanicwhite_female_2002_2 <- nonhispanicwhite_female_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

nonhispanicwhite_female_2002_2 <- nonhispanicwhite_female_2002_2 %>% slice(-(1:6))
nonhispanicwhite_female_2002 <- nonhispanicwhite_female_2002 %>% slice(-(1:6))

Code

nonhispanicwhite_female_2002 <- nonhispanicwhite_female_2002 %>%
  mutate(
    race = "non_hispanic_white",
    sexes = "female"
  )

nonhispanicwhite_female_2002_2 <- nonhispanicwhite_female_2002_2 %>%
  mutate(
    race = "non_hispanic_white",
    sexes = "female"
  )
nonhispanicwhite_female_2002_wide <- full_join(nonhispanicwhite_female_2002, nonhispanicwhite_female_2002_2)

nonhispanicwhite_female_2002_wide <- nonhispanicwhite_female_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

Code

nonhispanicwhite_female_2002_wide <- nonhispanicwhite_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

nonhispanicwhite_female_2002_long <- nonhispanicwhite_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

nonhispanicwhite_female_2002_long <- nonhispanicwhite_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped result
  )

Code

# Non-Hispanic Black

split_row_24 <- 395
split_row_25 <- 411

nonhispanicblack_bothsexes_2002 <- educational_data_2002 %>% slice(split_row_24:split_row_25)

split_row_26 <- 428
split_row_27 <- 444
nonhispanicblack_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_26:split_row_27)

# Rename the columns for the first half of the nonhispanicblack/bothsexes 2002 data
nonhispanicblack_bothsexes_2002 <- nonhispanicblack_bothsexes_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

# Rename the columns for the second half of the nonhispanicblack/bothsexes 2002 data
nonhispanicblack_bothsexes_2002_2 <- nonhispanicblack_bothsexes_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

nonhispanicblack_bothsexes_2002_2 <- nonhispanicblack_bothsexes_2002_2 %>% slice(-(1:6))
nonhispanicblack_bothsexes_2002 <- nonhispanicblack_bothsexes_2002 %>% slice(-(1:6))

Code

nonhispanicblack_bothsexes_2002 <- nonhispanicblack_bothsexes_2002 %>%
  mutate(
    race = "non_hispanic_black",
    sexes = "both_sexes"
  )

nonhispanicblack_bothsexes_2002_2 <- nonhispanicblack_bothsexes_2002_2 %>%
  mutate(
    race = "non_hispanic_black",
    sexes = "both_sexes"
  )
nonhispanicblack_bothsexes_2002_wide <- full_join(nonhispanicblack_bothsexes_2002, nonhispanicblack_bothsexes_2002_2)

nonhispanicblack_bothsexes_2002_wide <- nonhispanicblack_bothsexes_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

Code

nonhispanicblack_bothsexes_2002_wide <- nonhispanicblack_bothsexes_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

nonhispanicblack_bothsexes_2002_long <- nonhispanicblack_bothsexes_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

nonhispanicblack_bothsexes_2002_long <- nonhispanicblack_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped tibble
  )
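
To make the collapse step concrete, here is a small worked sketch on made-up counts (not Census figures). It uses a shortened version of the recode, written with `%in%` for brevity, and the object names `toy_long` and `toy_summary` are hypothetical. It shows the detailed categories being relabelled and then summed into one row per collapsed level, with missing counts ignored.

```r
library(dplyr)

# Toy long-format rows with invented counts
toy_long <- tibble(
  sexes = "both_sexes",
  age_group = "25_to_34",
  race = "non_hispanic_black",
  year = 2002,
  educational_attainment = c("9th_grade", "10th_grade", "HS_graduate",
                             "masters_degree", "professional_degree"),
  count = c(10, 20, 100, 30, NA)
)

toy_summary <- toy_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment %in% c("9th_grade", "10th_grade") ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment %in% c("masters_degree", "professional_degree") ~ "Graduate Degree"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")

toy_summary
```

The five input rows reduce to three: the two grade rows add up under "Less than HS", and the missing professional-degree count is skipped because of `na.rm = TRUE`.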

split_row_28 <- 461
split_row_29 <- 477
nonhispanicblack_male_2002 <- educational_data_2002 %>% slice(split_row_28:split_row_29)

split_row_30 <- 494
split_row_31 <- 510
nonhispanicblack_male_2002_2 <- educational_data_2002 %>% slice(split_row_30:split_row_31)

nonhispanicblack_male_2002 <- nonhispanicblack_male_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

nonhispanicblack_male_2002_2 <- nonhispanicblack_male_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

nonhispanicblack_male_2002_2 <- nonhispanicblack_male_2002_2 %>% slice(-(1:6))
nonhispanicblack_male_2002 <- nonhispanicblack_male_2002 %>% slice(-(1:6))

nonhispanicblack_male_2002 <- nonhispanicblack_male_2002 %>%
  mutate(
    race = "non_hispanic_black",
    sexes = "male"
  )

nonhispanicblack_male_2002_2 <- nonhispanicblack_male_2002_2 %>%
  mutate(
    race = "non_hispanic_black",
    sexes = "male"
  )
nonhispanicblack_male_2002_wide <- full_join(
  nonhispanicblack_male_2002,
  nonhispanicblack_male_2002_2,
  by = c("age_group", "race", "sexes")  # state the join keys explicitly
)

nonhispanicblack_male_2002_wide <- nonhispanicblack_male_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

nonhispanicblack_male_2002_wide <- nonhispanicblack_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

nonhispanicblack_male_2002_long <- nonhispanicblack_male_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

nonhispanicblack_male_2002_long <- nonhispanicblack_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped tibble
  )

split_row_32 <- 527
split_row_33 <- 543
nonhispanicblack_female_2002 <- educational_data_2002 %>% slice(split_row_32:split_row_33)

split_row_34 <- 560
split_row_35 <- 576
nonhispanicblack_female_2002_2 <- educational_data_2002 %>% slice(split_row_34:split_row_35)

nonhispanicblack_female_2002 <- nonhispanicblack_female_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

nonhispanicblack_female_2002_2 <- nonhispanicblack_female_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

nonhispanicblack_female_2002 <- nonhispanicblack_female_2002 %>%
  mutate(
    race = "non_hispanic_black",
    sexes = "female"
  )

nonhispanicblack_female_2002_2 <- nonhispanicblack_female_2002_2 %>%
  mutate(
    race = "non_hispanic_black",
    sexes = "female"
  )


nonhispanicblack_female_2002_wide <- full_join(
  nonhispanicblack_female_2002,
  nonhispanicblack_female_2002_2,
  by = c("age_group", "race", "sexes")  # state the join keys explicitly
)

nonhispanicblack_female_2002_wide <- nonhispanicblack_female_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

nonhispanicblack_female_2002_wide <- nonhispanicblack_female_2002_wide %>% slice(-(1:6))

nonhispanicblack_female_2002_wide <- nonhispanicblack_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

nonhispanicblack_female_2002_long <- nonhispanicblack_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

nonhispanicblack_female_2002_long <- nonhispanicblack_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped tibble
  )
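
Each demographic block in this section repeats the same slice, rename, join, and convert steps, with only the row ranges and labels changing. The following is a minimal sketch of how those steps might be wrapped in one function. It is not the code used to produce the results here: `build_wide_2002()`, its arguments, `first_half_names`, `second_half_names`, the unique `drop_1`/`drop_2` names, and the toy input are all assumptions for illustration. It also places `race` and `sexes` after the numeric columns, which differs from the column order the original blocks produce.

```r
library(dplyr)
library(purrr)  # set_names()

# Column names for the two halves of each table block
first_half_names <- c(
  "age_group", "total", "none", "1st-4th_grade", "5th-6th_grade",
  "7th-8th_grade", "9th_grade", "10th_grade", "11th_grade", "HS_graduate"
)
second_half_names <- c(
  "age_group", "some_college", "associates_degree_occupational",
  "associates_degree_academic", "bachelors_degree", "masters_degree",
  "professional_degree", "doctoral_degree", "drop_1", "drop_2"
)

# Hypothetical helper: build one wide table for a race/sex group.
# rows_first and rows_second are the row ranges of the two table halves;
# n_header is the number of leading rows to discard from each half.
build_wide_2002 <- function(raw, rows_first, rows_second,
                            race_label, sex_label, n_header = 6) {
  first_half <- raw %>%
    slice(rows_first) %>%
    set_names(first_half_names) %>%
    slice(-seq_len(n_header))

  second_half <- raw %>%
    slice(rows_second) %>%
    set_names(second_half_names) %>%
    select(-drop_1, -drop_2) %>%
    slice(-seq_len(n_header))

  full_join(first_half, second_half, by = "age_group") %>%
    mutate(
      across(-age_group, as.numeric),  # convert every count column at once
      race = race_label,
      sexes = sex_label,
      year = 2002
    )
}

# Toy stand-in for the raw sheet: 16 rows x 10 character columns (invented values)
toy_raw <- as.data.frame(matrix(as.character(1:160), nrow = 16, ncol = 10))
toy_raw[[1]] <- rep(c(rep("header", 6), "25 to 29 years", "30 to 34 years"), 2)

toy_wide <- build_wide_2002(toy_raw, 1:8, 9:16, "asian", "female")
toy_wide
```

With a helper of this kind, a call such as `build_wide_2002(educational_data_2002, 593:609, 626:642, "asian", "both_sexes")` would stand in for the Asian, both-sexes steps that follow, assuming the raw sheet has the ten-column layout those steps rely on.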

# Asian

split_row_36 <- 593
split_row_37 <- 609
asian_bothsexes_2002 <- educational_data_2002 %>% slice(split_row_36:split_row_37)

split_row_38 <- 626
split_row_39 <- 642
asian_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_38:split_row_39)

asian_bothsexes_2002 <- asian_bothsexes_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

asian_bothsexes_2002_2 <- asian_bothsexes_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

asian_bothsexes_2002 <- asian_bothsexes_2002 %>%
  mutate(
    race = "asian",
    sexes = "both_sexes"
  )

asian_bothsexes_2002_2 <- asian_bothsexes_2002_2 %>%
  mutate(
    race = "asian",
    sexes = "both_sexes"
  )

asian_bothsexes_2002_wide <- full_join(
  asian_bothsexes_2002,
  asian_bothsexes_2002_2,
  by = c("age_group", "race", "sexes")  # state the join keys explicitly
)

asian_bothsexes_2002_wide <- asian_bothsexes_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

asian_bothsexes_2002_wide <- asian_bothsexes_2002_wide %>% slice(-(1:6))

asian_bothsexes_2002_wide <- asian_bothsexes_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

asian_bothsexes_2002_long <- asian_bothsexes_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

asian_bothsexes_2002_long <- asian_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped tibble
  )

split_row_40 <- 659
split_row_41 <- 675
asian_male_2002 <- educational_data_2002 %>% slice(split_row_40:split_row_41)

split_row_42 <- 692
split_row_43 <- 708
asian_male_2002_2 <- educational_data_2002 %>% slice(split_row_42:split_row_43)

asian_male_2002 <- asian_male_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

asian_male_2002_2 <- asian_male_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

asian_male_2002 <- asian_male_2002 %>%
  mutate(
    race = "asian",
    sexes = "male"
  )

asian_male_2002_2 <- asian_male_2002_2 %>%
  mutate(
    race = "asian",
    sexes = "male"
  )

asian_male_2002_wide <- full_join(
  asian_male_2002,
  asian_male_2002_2,
  by = c("age_group", "race", "sexes")  # state the join keys explicitly
)

asian_male_2002_wide <- asian_male_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

asian_male_2002_wide <- asian_male_2002_wide %>% slice(-(1:6))

asian_male_2002_wide <- asian_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

asian_male_2002_long <- asian_male_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

asian_male_2002_long <- asian_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped tibble
  )

split_row_44 <- 725
split_row_45 <- 741
asian_female_2002 <- educational_data_2002 %>% slice(split_row_44:split_row_45)

split_row_46 <- 758
split_row_47 <- 774
asian_female_2002_2 <- educational_data_2002 %>% slice(split_row_46:split_row_47)

asian_female_2002 <- asian_female_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

asian_female_2002_2 <- asian_female_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

asian_female_2002 <- asian_female_2002 %>%
  mutate(
    race = "asian",
    sexes = "female"
  )

asian_female_2002_2 <- asian_female_2002_2 %>%
  mutate(
    race = "asian",
    sexes = "female"
  )

asian_female_2002_wide <- full_join(
  asian_female_2002,
  asian_female_2002_2,
  by = c("age_group", "race", "sexes")  # state the join keys explicitly
)

asian_female_2002_wide <- asian_female_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

asian_female_2002_wide <- asian_female_2002_wide %>% slice(-(1:6))

asian_female_2002_wide <- asian_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

asian_female_2002_long <- asian_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

asian_female_2002_long <- asian_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped tibble
  )

# Hispanic

split_row_48 <- 791
split_row_49 <- 807
hispanic_bothsexes_2002 <- educational_data_2002 %>% slice(split_row_48:split_row_49)

split_row_50 <- 824
split_row_51 <- 840
hispanic_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_50:split_row_51)

hispanic_bothsexes_2002 <- hispanic_bothsexes_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

hispanic_bothsexes_2002_2 <- hispanic_bothsexes_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

hispanic_bothsexes_2002 <- hispanic_bothsexes_2002 %>%
  mutate(
    race = "hispanic",
    sexes = "both_sexes"
  )

hispanic_bothsexes_2002_2 <- hispanic_bothsexes_2002_2 %>%
  mutate(
    race = "hispanic",
    sexes = "both_sexes"
  )

hispanic_bothsexes_2002_wide <- full_join(
  hispanic_bothsexes_2002,
  hispanic_bothsexes_2002_2,
  by = c("age_group", "race", "sexes")  # state the join keys explicitly
)

hispanic_bothsexes_2002_wide <- hispanic_bothsexes_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

hispanic_bothsexes_2002_wide <- hispanic_bothsexes_2002_wide %>% slice(-(1:6))

hispanic_bothsexes_2002_wide <- hispanic_bothsexes_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

hispanic_bothsexes_2002_long <- hispanic_bothsexes_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

hispanic_bothsexes_2002_long <- hispanic_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = "drop"                   # return an ungrouped tibble
  )

split_row_52 <- 857
split_row_53 <- 873
hispanic_male_2002 <- educational_data_2002 %>% slice(split_row_52:split_row_53)

split_row_54 <- 890
split_row_55 <- 906
hispanic_male_2002_2 <- educational_data_2002 %>% slice(split_row_54:split_row_55)

hispanic_male_2002 <- hispanic_male_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

hispanic_male_2002_2 <- hispanic_male_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # unused trailing column, removed by select() below
    "drop_2"   # unused trailing column, removed by select() below
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

Code

hispanic_male_2002 <- hispanic_male_2002 %>%
  mutate(
    race = "hispanic",
    sexes = "male"
  )

hispanic_male_2002_2 <- hispanic_male_2002_2 %>%
  mutate(
    race = "hispanic",
    sexes = "male"
  )

# Join the two halves on their shared identifier columns
hispanic_male_2002_wide <- full_join(
  hispanic_male_2002,
  hispanic_male_2002_2,
  by = c("age_group", "race", "sexes")
)

hispanic_male_2002_wide <- hispanic_male_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

hispanic_male_2002_wide <- hispanic_male_2002_wide %>% slice(-(1:6))
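
The column-by-column `as.numeric()` conversion above can also be written more compactly with `across()`. The sketch below is only an illustration: `toy_wide` is a small made-up tibble rather than the Census table, and it assumes that every column other than `age_group`, `race` and `sexes` should be numeric.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for one joined wide table; all values are made up
toy_wide <- tibble(
  age_group = c("25 to 29 years", "30 to 34 years"),
  total = c("100", "120"),
  none = c("1", "2"),
  HS_graduate = c("40", "50"),
  race = "hispanic",
  sexes = "male",
  bachelors_degree = c("20", "25")
)

toy_numeric <- toy_wide %>%
  mutate(
    # Convert every column except the three identifiers to numeric
    across(-c(age_group, race, sexes), as.numeric),
    year = 2002
  )
```

Selecting by exclusion matters here because `race` and `sexes` sit between the two halves of the joined table, so a column range such as `none:doctoral_degree` would also try to convert them.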

Code

# Collapse the detailed age bands into three broader groups
hispanic_male_2002_wide <- hispanic_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))
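
The same age-band collapse could be expressed once as a named lookup vector and reused for every group. This is a minimal sketch rather than the method used in the analysis; `age_lookup` and `toy_ages` are hypothetical names, and bands missing from the lookup become `NA`, just as unmatched values do in `case_when()`.

```r
library(dplyr)
library(tibble)

# Named lookup vector: detailed age band -> broad age group
age_lookup <- c(
  "25 to 29 years" = "25_to_34",
  "30 to 34 years" = "25_to_34",
  "35 to 39 years" = "35_to_54",
  "40 to 44 years" = "35_to_54",
  "45 to 49 years" = "35_to_54",
  "50 to 54 years" = "35_to_54",
  "55 to 59 years" = "55_plus",
  "60 to 64 years" = "55_plus",
  "65 to 69 years" = "55_plus",
  "70 to 74 years" = "55_plus",
  "75 years and over" = "55_plus"
)

# Made-up input, including one band that is not in the lookup
toy_ages <- tibble(
  age_group = c("25 to 29 years", "50 to 54 years",
                "75 years and over", "20 to 24 years")
)

toy_ages <- toy_ages %>%
  mutate(age_group = unname(age_lookup[age_group]))
```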

Code

hispanic_male_2002_long <- hispanic_male_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

hispanic_male_2002_long <- hispanic_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # Sum the counts within each collapsed category
    .groups = "drop"                   # Return an ungrouped result
  )
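
To make the effect of the recode-then-sum step concrete, the sketch below applies the same category mapping to a few made-up rows. `collapse_attainment()` is a hypothetical helper that groups the branches with `%in%`, and the counts in `toy_long` are invented for illustration.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: same mapping as the case_when() above, grouped with %in%
collapse_attainment <- function(x) {
  case_when(
    x == "none" ~ "None",
    x %in% c("1st-4th_grade", "5th-6th_grade", "7th-8th_grade",
             "9th_grade", "10th_grade", "11th_grade") ~ "Less than HS",
    x == "HS_graduate" ~ "HS",
    x == "some_college" ~ "Some College, No Degree",
    x %in% c("associates_degree_occupational", "associates_degree_academic",
             "bachelors_degree") ~ "Undergraduate Degree",
    x %in% c("masters_degree", "professional_degree") ~ "Graduate Degree",
    x == "doctoral_degree" ~ "Doctoral Degree"
  )
}

# Made-up long-format rows for a single group
toy_long <- tibble(
  sexes = "male",
  age_group = "25_to_34",
  race = "hispanic",
  year = 2002,
  educational_attainment = c("9th_grade", "10th_grade", "HS_graduate",
                             "bachelors_degree", "masters_degree"),
  count = c(10, 15, 40, 20, 5)
)

# The two "Less than HS" rows are merged into one row with their counts added
toy_summary <- toy_long %>%
  mutate(educational_attainment = collapse_attainment(educational_attainment)) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")
```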

Code

# Hispanic females, 2002: first and second row blocks of the table
split_row_56 <- 923
split_row_57 <- 939
hispanic_female_2002 <- educational_data_2002 %>% slice(split_row_56:split_row_57)

split_row_58 <- 956
split_row_59 <- 972
hispanic_female_2002_2 <- educational_data_2002 %>% slice(split_row_58:split_row_59)

hispanic_female_2002 <- hispanic_female_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

hispanic_female_2002_2 <- hispanic_female_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # unused trailing column, removed by select() below
    "drop_2"   # unused trailing column, removed by select() below
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

Code

hispanic_female_2002 <- hispanic_female_2002 %>%
  mutate(
    race = "hispanic",
    sexes = "female"
  )

hispanic_female_2002_2 <- hispanic_female_2002_2 %>%
  mutate(
    race = "hispanic",
    sexes = "female"
  )

# Join the two halves on their shared identifier columns
hispanic_female_2002_wide <- full_join(
  hispanic_female_2002,
  hispanic_female_2002_2,
  by = c("age_group", "race", "sexes")
)

hispanic_female_2002_wide <- hispanic_female_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

hispanic_female_2002_wide <- hispanic_female_2002_wide %>% slice(-(1:6))

Code

# Collapse the detailed age bands into three broader groups
hispanic_female_2002_wide <- hispanic_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

hispanic_female_2002_long <- hispanic_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

hispanic_female_2002_long <- hispanic_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # Sum the counts within each collapsed category
    .groups = "drop"                   # Return an ungrouped result
  )
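
Because every race and sex combination repeats the same slice, rename and join steps, they could be wrapped in one function. The sketch below is an assumption-laden illustration, not the code used in this project: `build_wide()`, `lower_cols`, `upper_cols` and the toy input are all hypothetical. Unlike the chunks above, it joins on `age_group` alone and adds `race`, `sexes` and `year` after the join, which gives the same columns.

```r
library(dplyr)
library(purrr)
library(tibble)

# Column names for the two row blocks, as used throughout this section
lower_cols <- c(
  "age_group", "total", "none", "1st-4th_grade", "5th-6th_grade",
  "7th-8th_grade", "9th_grade", "10th_grade", "11th_grade", "HS_graduate"
)
upper_cols <- c(
  "age_group", "some_college", "associates_degree_occupational",
  "associates_degree_academic", "bachelors_degree", "masters_degree",
  "professional_degree", "doctoral_degree", "drop_1", "drop_2"
)

# Hypothetical helper (not part of the original analysis): slice both row
# blocks, name the columns, join them, and convert the counts to numeric.
build_wide <- function(raw, rows_lower, rows_upper,
                       race_label, sex_label, year_value) {
  lower <- raw %>%
    slice(rows_lower) %>%
    set_names(lower_cols)
  upper <- raw %>%
    slice(rows_upper) %>%
    set_names(upper_cols) %>%
    select(-drop_1, -drop_2)

  full_join(lower, upper, by = "age_group") %>%
    mutate(
      across(-age_group, as.numeric),
      race = race_label,
      sexes = sex_label,
      year = year_value
    )
}

# Made-up 10-column raw table: rows 1-2 play the first block, rows 3-4 the second
toy_matrix <- matrix(as.character(1:40), nrow = 4, ncol = 10)
colnames(toy_matrix) <- paste0("X", 1:10)
toy_raw <- as_tibble(toy_matrix)
toy_raw$X1 <- c("25 to 29 years", "30 to 34 years",
                "25 to 29 years", "30 to 34 years")

toy_wide <- build_wide(toy_raw, 1:2, 3:4, "hispanic", "male", 2002)
```

Under these assumptions, a call such as `build_wide(educational_data_2002, 857:873, 890:906, "hispanic", "male", 2002)` followed by `slice(-(1:6))` would correspond to the Hispanic male chunks above.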

Code

# White

# White, both sexes, 2002: first and second row blocks of the table
split_row_60 <- 989
split_row_61 <- 1005
white_bothsexes_2002 <- educational_data_2002 %>% slice(split_row_60:split_row_61)

split_row_62 <- 1022
split_row_63 <- 1038
white_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_62:split_row_63)

white_bothsexes_2002 <- white_bothsexes_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

white_bothsexes_2002_2 <- white_bothsexes_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # unused trailing column, removed by select() below
    "drop_2"   # unused trailing column, removed by select() below
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

Code

white_bothsexes_2002 <- white_bothsexes_2002 %>%
  mutate(
    race = "white",
    sexes = "both_sexes"
  )

white_bothsexes_2002_2 <- white_bothsexes_2002_2 %>%
  mutate(
    race = "white",
    sexes = "both_sexes"
  )

# Join the two halves on their shared identifier columns
white_bothsexes_2002_wide <- full_join(
  white_bothsexes_2002,
  white_bothsexes_2002_2,
  by = c("age_group", "race", "sexes")
)

white_bothsexes_2002_wide <- white_bothsexes_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

white_bothsexes_2002_wide <- white_bothsexes_2002_wide %>% slice(-(1:6))

Code

# Collapse the detailed age bands into three broader groups
white_bothsexes_2002_wide <- white_bothsexes_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

white_bothsexes_2002_long <- white_bothsexes_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

white_bothsexes_2002_long <- white_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # Sum the counts within each collapsed category
    .groups = "drop"                   # Return an ungrouped result
  )

Code

# White males, 2002: first and second row blocks of the table
split_row_64 <- 1055
split_row_65 <- 1071
white_male_2002 <- educational_data_2002 %>% slice(split_row_64:split_row_65)

split_row_66 <- 1088
split_row_67 <- 1104
white_male_2002_2 <- educational_data_2002 %>% slice(split_row_66:split_row_67)

white_male_2002 <- white_male_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

white_male_2002_2 <- white_male_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # unused trailing column, removed by select() below
    "drop_2"   # unused trailing column, removed by select() below
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

Code

white_male_2002 <- white_male_2002 %>%
  mutate(
    race = "white",
    sexes = "male"
  )

white_male_2002_2 <- white_male_2002_2 %>%
  mutate(
    race = "white",
    sexes = "male"
  )

# Join the two halves on their shared identifier columns
white_male_2002_wide <- full_join(
  white_male_2002,
  white_male_2002_2,
  by = c("age_group", "race", "sexes")
)

white_male_2002_wide <- white_male_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

white_male_2002_wide <- white_male_2002_wide %>% slice(-(1:6))

Code

# Collapse the detailed age bands into three broader groups
white_male_2002_wide <- white_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

white_male_2002_long <- white_male_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

white_male_2002_long <- white_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # Sum the counts within each collapsed category
    .groups = "drop"                   # Return an ungrouped result
  )

Code

# White females, 2002: first and second row blocks of the table
split_row_68 <- 1121
split_row_69 <- 1137
white_female_2002 <- educational_data_2002 %>% slice(split_row_68:split_row_69)

split_row_70 <- 1154
split_row_71 <- 1170
white_female_2002_2 <- educational_data_2002 %>% slice(split_row_70:split_row_71)

white_female_2002 <- white_female_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

white_female_2002_2 <- white_female_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # unused trailing column, removed by select() below
    "drop_2"   # unused trailing column, removed by select() below
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

Code

white_female_2002 <- white_female_2002 %>%
  mutate(
    race = "white",
    sexes = "female"
  )

white_female_2002_2 <- white_female_2002_2 %>%
  mutate(
    race = "white",
    sexes = "female"
  )

# Join the two halves on their shared identifier columns
white_female_2002_wide <- full_join(
  white_female_2002,
  white_female_2002_2,
  by = c("age_group", "race", "sexes")
)

white_female_2002_wide <- white_female_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

white_female_2002_wide <- white_female_2002_wide %>% slice(-(1:6))

Code

# Collapse the detailed age bands into three broader groups
white_female_2002_wide <- white_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

white_female_2002_long <- white_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

white_female_2002_long <- white_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # Sum the counts within each collapsed category
    .groups = "drop"                   # Return an ungrouped result
  )

Code

# Black, both sexes, 2002: first and second row blocks of the table
split_row_72 <- 1187
split_row_73 <- 1203
justblack_bothsexes_2002 <- educational_data_2002 %>% slice(split_row_72:split_row_73)

split_row_74 <- 1220
split_row_75 <- 1236
justblack_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_74:split_row_75)

justblack_bothsexes_2002 <- justblack_bothsexes_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

justblack_bothsexes_2002_2 <- justblack_bothsexes_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # unused trailing column, removed by select() below
    "drop_2"   # unused trailing column, removed by select() below
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

Code

# Black

justblack_bothsexes_2002 <- justblack_bothsexes_2002 %>%
  mutate(
    race = "black",
    sexes = "both_sexes"
  )

justblack_bothsexes_2002_2 <- justblack_bothsexes_2002_2 %>%
  mutate(
    race = "black",
    sexes = "both_sexes"
  )

# Join the two halves on their shared identifier columns
justblack_bothsexes_2002_wide <- full_join(
  justblack_bothsexes_2002,
  justblack_bothsexes_2002_2,
  by = c("age_group", "race", "sexes")
)

justblack_bothsexes_2002_wide <- justblack_bothsexes_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

justblack_bothsexes_2002_wide <- justblack_bothsexes_2002_wide %>% slice(-(1:6))

Code

# Collapse the detailed age bands into three broader groups
justblack_bothsexes_2002_wide <- justblack_bothsexes_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

justblack_bothsexes_2002_long <- justblack_bothsexes_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

justblack_bothsexes_2002_long <- justblack_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # Sum the counts within each collapsed category
    .groups = "drop"                   # Return an ungrouped result
  )

Code

# Black males, 2002: first and second row blocks of the table
split_row_76 <- 1253
split_row_77 <- 1269
justblack_male_2002 <- educational_data_2002 %>% slice(split_row_76:split_row_77)

split_row_78 <- 1286
split_row_79 <- 1302
justblack_male_2002_2 <- educational_data_2002 %>% slice(split_row_78:split_row_79)

justblack_male_2002 <- justblack_male_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

justblack_male_2002_2 <- justblack_male_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # unused trailing column, removed by select() below
    "drop_2"   # unused trailing column, removed by select() below
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )
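
The hard-coded split rows from `split_row_52` through `split_row_79` follow a regular pattern: each block spans 17 rows, and successive blocks start 33 rows apart. The sketch below regenerates those 14 ranges arithmetically. `block_starts`, `block_ends` and `block_ranges` are hypothetical names, and whether the same spacing holds elsewhere in the file or in other years is not established here.

```r
# Start rows of the 14 row blocks sliced in this part of the 2002 cleaning
block_starts <- seq(from = 857, by = 33, length.out = 14)

# Each block covers 17 rows, so the end row is the start row plus 16
block_ends <- block_starts + 16

# One row per block: 857-873, 890-906, and so on up to 1286-1302
block_ranges <- data.frame(start = block_starts, end = block_ends)
```

Comparing such a generated table against the hand-entered values is one way to catch a mistyped row number before slicing.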

Code

justblack_male_2002 <- justblack_male_2002 %>%
  mutate(
    race = "black",
    sexes = "male"
  )

justblack_male_2002_2 <- justblack_male_2002_2 %>%
  mutate(
    race = "black",
    sexes = "male"
  )

# Join the two halves on their shared identifier columns
justblack_male_2002_wide <- full_join(
  justblack_male_2002,
  justblack_male_2002_2,
  by = c("age_group", "race", "sexes")
)

justblack_male_2002_wide <- justblack_male_2002_wide %>%
  mutate(
    # Convert every count column to numeric in one step
    across(-c(age_group, race, sexes), as.numeric),
    year = 2002
  )

justblack_male_2002_wide <- justblack_male_2002_wide %>% slice(-(1:6))

Code

justblack_male_2002_wide <- justblack_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

Code

justblack_male_2002_long <- justblack_male_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

justblack_male_2002_long <- justblack_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  # Sum the counts within each collapsed category, then drop the grouping
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")

Code

# Row ranges for the Black, female tables in the 2002 sheet
split_row_80 <- 1319
split_row_81 <- 1335
justblack_female_2002 <- educational_data_2002 %>% slice(split_row_80:split_row_81)

split_row_82 <- 1352
split_row_83 <- 1368
justblack_female_2002_2 <- educational_data_2002 %>% slice(split_row_82:split_row_83)

justblack_female_2002 <- justblack_female_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

justblack_female_2002_2 <- justblack_female_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # unused trailing columns get unique names before being dropped
    "drop_2"
  )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

Code

justblack_female_2002 <- justblack_female_2002 %>%
  mutate(
    race = "black",
    sexes = "female"
  )

justblack_female_2002_2 <- justblack_female_2002_2 %>%
  mutate(
    race = "black",
    sexes = "female"
  )

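# Join the two halves of the table on their shared key columns
justblack_female_2002_wide <- full_join(
  justblack_female_2002, justblack_female_2002_2,
  by = c("age_group", "race", "sexes")
)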

justblack_female_2002_wide <- justblack_female_2002_wide %>%
  mutate(
    # Convert every count column to numeric in one step
    across(-c(age_group, race, sexes), as.numeric),
    year = 2002
  )

justblack_female_2002_wide <- justblack_female_2002_wide %>% slice(-(1:6))

Code

justblack_female_2002_wide <- justblack_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))
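
The age-group recoding above can also be written as a lookup, which keeps the mapping in one place when the same step is repeated for many slices. The following is only a sketch on made-up data: `age_lookup` and `toy` are hypothetical names that do not appear in the original analysis, and the counts are invented.

```r
library(dplyr)

# Hypothetical lookup vector: names are the raw Census age labels,
# values are the collapsed groups used in this report.
age_lookup <- c(
  "25 to 29 years" = "25_to_34",
  "30 to 34 years" = "25_to_34",
  "35 to 39 years" = "35_to_54",
  "40 to 44 years" = "35_to_54",
  "45 to 49 years" = "35_to_54",
  "50 to 54 years" = "35_to_54",
  "55 to 59 years" = "55_plus",
  "60 to 64 years" = "55_plus",
  "65 to 69 years" = "55_plus",
  "70 to 74 years" = "55_plus",
  "75 years and over" = "55_plus"
)

# Toy data standing in for one cleaned slice (counts are made up).
toy <- tibble(
  age_group = c("25 to 29 years", "30 to 34 years",
                "55 to 59 years", "18 to 24 years"),
  count = c(10, 20, 5, 99)
)

recoded <- toy %>%
  # Labels missing from the lookup (here, under 25) become NA
  mutate(age_group = unname(age_lookup[age_group])) %>%
  filter(!is.na(age_group)) %>%
  group_by(age_group) %>%
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")

recoded
```

As with `case_when()`, any label not listed in the lookup ends up as `NA`, so the under-25 row is dropped by the filter.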

Code

justblack_female_2002_long <- justblack_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

Code

justblack_female_2002_long <- justblack_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  # Sum the counts within each collapsed category, then drop the grouping
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")

Code

# Joining 2002

#all races
joined_2002 <- full_join(allraces_bothsexes_2002_merged_3, allraces_male_2002_merged_3)
joined_2002 <- full_join(joined_2002, allraces_female_2002_long)
#non hispanic white
joined_2002 <- full_join(joined_2002, nonhispanicwhite_bothsexes_2002_long)
joined_2002 <- full_join(joined_2002, nonhispanicwhite_male_2002_long)
joined_2002 <- full_join(joined_2002, nonhispanicwhite_female_2002_long)

#asian
joined_2002 <- full_join(joined_2002, asian_bothsexes_2002_long)
joined_2002 <- full_join(joined_2002, asian_female_2002_long)
joined_2002 <- full_join(joined_2002, asian_male_2002_long)

#hispanic
joined_2002 <- full_join(joined_2002, hispanic_bothsexes_2002_long)
joined_2002 <- full_join(joined_2002, hispanic_male_2002_long)
joined_2002 <- full_join(joined_2002, hispanic_female_2002_long)

#white
joined_2002 <- full_join(joined_2002, white_bothsexes_2002_long)
joined_2002 <- full_join(joined_2002, white_male_2002_long)
joined_2002 <- full_join(joined_2002, white_female_2002_long)

#non-hispanic black
joined_2002 <- full_join(joined_2002, nonhispanicblack_bothsexes_2002_long)
joined_2002 <- full_join(joined_2002, nonhispanicblack_male_2002_long)
joined_2002 <- full_join(joined_2002, nonhispanicblack_female_2002_long)


#black (joined w/ non-hispanic black)
joined_2002 <- full_join(joined_2002, justblack_bothsexes_2002_long)
joined_2002 <- full_join(joined_2002, justblack_female_2002_long)
joined_2002 <- full_join(joined_2002, justblack_male_2002_long)

test_join_2002 <- joined_2002 %>%
  filter(!is.na(age_group)) %>%
  filter(!is.na(educational_attainment)) %>%
  mutate(sexes = case_when(
    sexes == "both_sexes" ~ 2,
    sexes == "female" ~ 1,
    sexes == "male" ~ 0
  ))

Code

# Load 2001 data
educational_data_2001 <- read_excel(
  here::here("data_raw", "2001_Educational_Data.xlsx"),
  skip = 5
) %>%
  # Replace placeholder cells ("-" and ".") with "0" before numeric conversion
  mutate(across(everything(), ~replace(., . %in% c("-", "."), "0")))

# Asian, both sexes

part1_asian_both <- educational_data_2001 %>%
  slice(595:616) %>%
  select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade","7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))

# Some college and degree columns
part2_asian_both <- educational_data_2001 %>%
  slice(628:649) %>%
  select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational", "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))

asian_2001 <- full_join(part1_asian_both, part2_asian_both, by="age_group") %>%
  mutate(race="asian", sexes="both_sexes") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational,associates_degree_academic,bachelors_degree,masters_degree,professional_degree,doctoral_degree), names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# Asian, male
part1_asian_male <- educational_data_2001 %>%
  slice(661:682) %>% select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade",
              "7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))
part2_asian_male <- educational_data_2001 %>%
  slice(694:715) %>% select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational",
              "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))
asian_male_2001 <- full_join(part1_asian_male, part2_asian_male, by="age_group") %>%
  mutate(race="asian", sexes="male") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational, associates_degree_academic,bachelors_degree,masters_degree,
                      professional_degree,doctoral_degree),
               names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# Asian, female
part1_asian_female <- educational_data_2001 %>%
  slice(727:748) %>% select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade",
              "7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))
part2_asian_female <- educational_data_2001 %>%
  slice(760:781) %>% select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational",
              "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))
asian_female_2001 <- full_join(part1_asian_female, part2_asian_female, by="age_group") %>%
  mutate(race="asian", sexes="female") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational, associates_degree_academic,bachelors_degree,masters_degree,
                      professional_degree,doctoral_degree),
               names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# White, both sexes

part1_white_both <- educational_data_2001 %>%
  slice(991:1012) %>%
  select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade","7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))

# Some college and degree columns
part2_white_both <- educational_data_2001 %>%
  slice(1024:1045) %>%
  select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational", "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))

white_2001 <- full_join(part1_white_both, part2_white_both, by="age_group") %>%
  mutate(race="white", sexes="both_sexes") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational,associates_degree_academic,bachelors_degree,masters_degree,professional_degree,doctoral_degree), names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# White, male
part1_white_male <- educational_data_2001 %>%
  slice(1057:1078) %>% select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade",
              "7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))
part2_white_male <- educational_data_2001 %>%
  slice(1090:1111) %>% select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational",
              "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))
white_male_2001 <- full_join(part1_white_male, part2_white_male, by="age_group") %>%
  mutate(race="white", sexes="male") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational, associates_degree_academic,bachelors_degree,masters_degree,
                      professional_degree,doctoral_degree),
               names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# White, female

part1_white_female <- educational_data_2001 %>%
  slice(1123:1144) %>% select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade",
              "7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))
part2_white_female <- educational_data_2001 %>%
  slice(1156:1177) %>% select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational",
              "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))
white_female_2001 <- full_join(part1_white_female, part2_white_female, by="age_group") %>%
  mutate(race="white", sexes="female") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational, associates_degree_academic,bachelors_degree,masters_degree,
                      professional_degree,doctoral_degree),
               names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# Black, both sexes

part1_black_both <- educational_data_2001 %>%
  slice(1189:1210) %>%
  select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade","7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))

# Some college and degree columns
part2_black_both <- educational_data_2001 %>%
  slice(1222:1243) %>%
  select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational", "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))

black_2001 <- full_join(part1_black_both, part2_black_both, by="age_group") %>%
  mutate(race="black", sexes="both_sexes") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational,associates_degree_academic,bachelors_degree,masters_degree,professional_degree,doctoral_degree), names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# Black, male

part1_black_male <- educational_data_2001 %>%
  slice(1255:1276) %>% select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade", "7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))
part2_black_male <- educational_data_2001 %>%
  slice(1288:1309) %>% select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational", "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))
black_male_2001 <- full_join(part1_black_male, part2_black_male, by="age_group") %>%
  mutate(race="black", sexes="male") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational, associates_degree_academic,bachelors_degree,masters_degree,
                      professional_degree,doctoral_degree),
               names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# Black, female

part1_black_female <- educational_data_2001 %>%
  slice(1321:1342) %>% select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade", "7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))
part2_black_female <- educational_data_2001 %>%
  slice(1354:1375) %>% select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational", "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))
black_female_2001 <- full_join(part1_black_female, part2_black_female, by="age_group") %>%
  mutate(race="black", sexes="female") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational, associates_degree_academic,bachelors_degree,masters_degree,
                      professional_degree,doctoral_degree),
               names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)


# Joining 2001 data

test_join_2001 <- bind_rows(
  asian_2001, asian_male_2001, asian_female_2001, white_2001, white_male_2001, white_female_2001,
black_2001, black_male_2001, black_female_2001
) %>%
  mutate(age_group = case_when(
    age_group %in% c("25 to 29 years", "30 to 34 years") ~ "25_to_34",
    age_group %in% c("35 to 39 years", "40 to 44 years", "45 to 49 years", "50 to 54 years") ~ "35_to_54",
    age_group %in% c("55 to 59 years", "60 to 64 years", "65 to 69 years", "70 to 74 years", "75 years and over") ~ "55_plus"
  )) %>%
  filter(!is.na(age_group)) %>%  # Remove any age groups under 25
  # Convert sexes to numeric
  mutate(sexes = case_when(
    sexes == "both_sexes" ~ 2,
    sexes == "female" ~ 1,
    sexes == "male" ~ 0
  )) %>%
  mutate(educational_attainment = case_when(
    educational_attainment %in% c("none","1st-4th_grade","5th-6th_grade", "7th-8th_grade","9th_grade","10th_grade","11th_grade") ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment %in% c("associates_degree_occupational","associates_degree_academic","bachelors_degree") ~ "Undergraduate Degree",
    educational_attainment %in% c("masters_degree","professional_degree") ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "total" ~ "Total",
    TRUE ~ educational_attainment
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(count = sum(count, na.rm=TRUE), .groups="drop")

view(test_join_2001)

Code

join_all_2001_2009 <- bind_rows(
  test_join_2001,
  test_join_2002,
  test_join_2003,
  test_join_2004,
  test_join_2005,
  test_join_2006,
  test_join_2007,
  test_join_2008,
  test_join_2009
)
#convert sexes back into strings
join_all_2001_2009 <- join_all_2001_2009 %>%
  mutate(sexes = as.integer(sexes)) %>%
  mutate(sexes = case_when(
    sexes == 2 ~ "both_sexes",
    sexes == 1 ~ "female",
    sexes == 0 ~ "male"
   ))


#collapse duplicate groups
join_all_2001_2009 <- join_all_2001_2009 %>%
  mutate(age_group = case_when(
    age_group %in% c("25 to 29 years", "30 to 34 years", "25_to_34") ~ "25_to_34",
    age_group %in% c("35 to 39 years", "40 to 44 years", "45 to 49 years", "50 to 54 years", "35_to_54") ~ "35_to_54",
    age_group %in% c("55 to 59 years", "60 to 64 years", "65 to 69 years", "70 to 74 years", "75 years and over", "55_plus") ~ "55_plus"
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "HS" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "Some College, No Degree" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "Undergraduate Degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "Graduate Degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "Doctoral Degree" ~ "Doctoral Degree",
    educational_attainment == "total" ~ "Total",
    educational_attainment == "Total" ~ "Total",
    educational_attainment == "none" ~"Less than HS",
    educational_attainment == "None" ~"Less than HS",
    educational_attainment == "Less than HS" ~"Less than HS"
  ))%>%
  mutate(sexes = case_when(
    sexes == "both_sexes" ~ "Total",
    TRUE ~ sexes
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(count = sum(count, na.rm = TRUE), .groups = 'drop')

# Reshape to match 2010-2024 format (one demographic column)
data_2001_2009_formatted <- join_all_2001_2009 %>%
  # Sex breakdown (sum across all ages and races)
  filter(sexes %in% c("male", "female")) %>%
  group_by(sexes, educational_attainment, year) %>%
  summarise(number = sum(count, na.rm = TRUE), .groups = "drop") %>%
  rename(demographic = sexes) %>%
  
  # Age breakdown (sum across all races, using Total for sexes)
  bind_rows(
    join_all_2001_2009 %>%
      filter(sexes == "Total") %>%
      group_by(age_group, educational_attainment, year) %>%
      summarise(number = sum(count, na.rm = TRUE), .groups = "drop") %>%
      rename(demographic = age_group)
  ) %>%
  
  # Race breakdown (sum across all ages, using Total for sexes, exclude duplicates)
  bind_rows(
    join_all_2001_2009 %>%
      filter(sexes == "Total", !race %in% c("all_races", "non_hispanic_black")) %>%
      group_by(race, educational_attainment, year) %>%
      summarise(number = sum(count, na.rm = TRUE), .groups = "drop") %>%
      rename(demographic = race)
  ) %>%
  
  # Total demographic (male + female combined, sum across all ages and races)
  bind_rows(
    join_all_2001_2009 %>%
      filter(sexes %in% c("male", "female")) %>%
      group_by(educational_attainment, year) %>%
      summarise(number = sum(count, na.rm = TRUE), .groups = "drop") %>%
      mutate(demographic = "total")
  ) %>%
  
  # Create attainment and years_of_school columns
  mutate(
    attainment = case_when(
      educational_attainment == "Total" ~ "Other",
      educational_attainment == "Less than HS" ~ "Less than High School",
      educational_attainment == "HS" ~ "High School Diploma",
      educational_attainment == "Some College, No Degree" ~ "Some College, No Degree",
      educational_attainment == "Undergraduate Degree" ~ "Bachelor's Degree",
      educational_attainment == "Graduate Degree" ~ "Master's Degree",
      educational_attainment == "Doctoral Degree" ~ "Doctoral Degree",
      TRUE ~ "Other"
    ),
    
    years_of_school = case_when(
      educational_attainment == "Total" ~ "Total",
      educational_attainment == "Less than HS" ~ "Less than high school graduate",
      educational_attainment == "HS" ~ "High school graduate",
      educational_attainment == "Some College, No Degree" ~ "Some college, no degree",
      educational_attainment == "Undergraduate Degree" ~ "Bachelor's degree",
      educational_attainment == "Graduate Degree" ~ "Graduate degree",
      educational_attainment == "Doctoral Degree" ~ "Doctoral degree",
      TRUE ~ educational_attainment
    )
  ) %>%
  
  rename(attainment_level = educational_attainment) %>%
  select(years_of_school, attainment, demographic, number, year, attainment_level)

Code

# Combine 2001-2009 with 2010-2024
educational_data_combined <- bind_rows(
  data_2001_2009_formatted,
  educational_data_combined) %>% 
  mutate(number = as.integer(number)) %>% 
  filter(!demographic %in% c("non_hispanic_white", "hispanic")) %>%
  select(year, attainment_level, demographic, number)

view(educational_data_combined)

Saving cleaned data

The cleaned, final dataset is saved with the readr package to the data_processed folder as final_data.csv.

Code

# Saving cleaned data
write_csv(educational_data_combined, here::here('data_processed','final_data.csv'))
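
One way to confirm an export like this is to read the file back and compare it with what was written. The sketch below uses a small made-up data frame and a temporary file rather than the project data; `toy`, `path` and `reloaded` are illustrative names only.

```r
library(readr)

# Toy stand-in for the cleaned data (values are made up).
toy <- data.frame(
  year = c(2001L, 2024L),
  attainment_level = c("HS", "HS"),
  demographic = c("male", "female"),
  number = c(100L, 120L)
)

# Write to a temporary file, not the project's data_processed folder
path <- tempfile(fileext = ".csv")
write_csv(toy, path)

# Read the file back and check that shape and column names survived
reloaded <- read_csv(path, show_col_types = FALSE)
nrow(reloaded) == nrow(toy)
identical(names(reloaded), names(toy))
```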

Data Visualizations

Data Summary

Code

summary(educational_data_combined)

#>       year      attainment_level   demographic            number      
#>  Min.   :2001   Length:1395        Length:1395        Min.   :    96  
#>  1st Qu.:2006   Class :character   Class :character   1st Qu.:  3927  
#>  Median :2012   Mode  :character   Mode  :character   Median : 14247  
#>  Mean   :2012                                         Mean   : 28751  
#>  3rd Qu.:2018                                         3rd Qu.: 31158  
#>  Max.   :2024                                         Max.   :357668

Code

# Named summary_table to avoid masking base R's table()
summary_table <- tibble(
  Variable = "number", 
  Mean = round(mean(educational_data_combined$number, na.rm = TRUE), 3),
  Median = round(median(educational_data_combined$number, na.rm = TRUE), 3),
  Std_Deviation = round(sd(educational_data_combined$number, na.rm = TRUE), 3),
  IQR = round(IQR(educational_data_combined$number, na.rm = TRUE), 3),
  Range = round(max(educational_data_combined$number, na.rm = TRUE) - 
                min(educational_data_combined$number, na.rm = TRUE), 3)
)

kable(summary_table)

Variable	Mean	Median	Std_Deviation	IQR	Range
number	28750.82	14247	43800.15	27231	357572

Code

educational_data_combined %>%
  filter(demographic %in% c("male", "female")) %>%
  filter(attainment_level != "Total") %>%  # exclude "Total" rows so people are not counted twice
  ggplot(aes(x = demographic, y = number/1000, fill = demographic)) +
  geom_col(alpha = 0.7) +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.05))) +
  labs(
    title = "Distribution of Male and Female Respondents",
    subtitle = "Female population slightly larger across all education levels, 2001 - 2024",
    caption = "Source: U.S. Census Bureau",
    x = "Gender",
    y = "Count (in millions)"
  ) +
  theme_classic() +
  scale_fill_viridis_d()+
  theme(legend.position = "none",
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 13))

The bar chart above compares total counts (in millions) of females and males across all education levels and years (2001-2024). It shows that the female population in this dataset is slightly larger than the male population.
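
The comparison behind the chart can also be checked numerically. The following is a minimal sketch on invented numbers, assuming (as the plot's division by 1000 suggests) that `number` is recorded in thousands; `toy` and `sex_totals` are hypothetical names.

```r
library(dplyr)

# Toy data in the same shape as the final dataset (values are made up).
toy <- tibble(
  year = c(2001, 2001, 2024, 2024),
  attainment_level = "HS",
  demographic = c("male", "female", "male", "female"),
  number = c(1000, 1100, 1500, 1600)  # assumed to be in thousands
)

# Sum counts per sex and convert thousands to millions
sex_totals <- toy %>%
  filter(demographic %in% c("male", "female")) %>%
  group_by(demographic) %>%
  summarise(total_millions = sum(number) / 1000, .groups = "drop")

sex_totals
```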

Visualizations & Analysis: Overview

2001 vs 2024 Overview

Code

educational_data_combined %>%
    filter(year %in% c(2001, 2024)) %>%
    filter(attainment_level != "Total") %>%
    group_by(year, attainment_level) %>%
    summarise(number = sum(number), .groups = "drop") %>%
    mutate(
    attainment_level = fct_reorder2(attainment_level, year, desc(number)),
    year = as.factor(year),
    label = paste0(attainment_level, " (", round(number/1000), ")"),
    label_left = ifelse(year == 2001, label, NA),
    label_right = ifelse(year == 2024, label, NA)
  ) %>%
  ggplot(aes(x = year, y = number/1000, group = attainment_level, color = attainment_level)) +
  geom_line(linewidth = 1.2, alpha = 0.7) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 2) +
  geom_text_repel(aes(label = label_left), size = 5,
    hjust = 1, nudge_x = -0.02,
    direction = "y", segment.color = "grey"
  ) +
  
  # Right-side labels (2024)
  geom_text_repel(aes(label = label_right), size = 5,
    hjust = 0, nudge_x = 0.02, nudge_y = 0.75,
    direction = "y", segment.color = "grey"
  ) + 
    scale_x_discrete(position = 'top', expand = expansion(mult = c(1, 1))) +
    scale_y_continuous(limits = c(0, 350)) +
  labs(
    title = "2001 vs 2024: The Big Picture",
    subtitle = "More Americans are Earning Degrees",
    x = NULL,
    y = "Count (millions)",
    caption = "Source: U.S. Census Bureau"
  ) +
  theme_minimal_grid() +
  theme(panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        legend.position = 'none',
        plot.caption = element_text(size = 13))+
  scale_color_viridis_d()

Between 2001 and 2024, the United States saw a general rise in educational attainment. The slope chart shows the extent of the change: undergraduate degree holders grew from 184 million to 316 million, while individuals without a high school diploma fell from 113 million to 78 million, the only category to decline. Graduate and doctoral degree holders also rose, from 54 million to 116 million and from 9 million to 20 million, respectively. Because the chart sums counts across every demographic breakdown in the dataset (sex, age group, race and overall totals), these figures count the same people more than once and are best read as relative changes rather than population sizes.
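
To put the changes on a common scale, the figures quoted above can be converted to percent changes. This is a small worked sketch using only the rounded values from the paragraph, so the results are approximate; the `changes` data frame is an illustrative name, not part of the original analysis.

```r
# Rounded counts (in millions) taken from the paragraph above
changes <- data.frame(
  attainment_level = c("Undergraduate Degree", "Less than HS",
                       "Graduate Degree", "Doctoral Degree"),
  millions_2001 = c(184, 113, 54, 9),
  millions_2024 = c(316, 78, 116, 20)
)

# Percent change relative to the 2001 value, rounded to one decimal place
changes$pct_change <- round(
  100 * (changes$millions_2024 - changes$millions_2001) / changes$millions_2001,
  1
)

changes
```

On these rounded inputs, graduate and doctoral degrees more than double, while the "Less than HS" group shrinks by roughly a third.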

But how did they change over time?

Code

label_2003 <- "2003 CPS question redesign:\nShift to 'highest degree attained'"
label_2010 <- "2010 ACS transition:\nNew primary data source"

attainment_plot <- educational_data_combined %>%
    filter(attainment_level != "Total") %>%
    group_by(year, attainment_level) %>%
    summarise(total_number = sum(number), .groups = "drop") %>%
    ggplot(aes(x = year, y = total_number/1000, color = attainment_level, group = attainment_level)) +
    geom_line(linewidth = 1) +
    geom_text_repel(data = . %>% filter(year == max(year)),
            aes(label = attainment_level),
            hjust = -0.1, size = 5, nudge_x = 0, direction = "y", show.legend = FALSE) +
    geom_vline(xintercept = 2008, linetype = "dashed", color = "black") +
    annotate("text", x = 2008, y = 680, label = "2008 Great Recession", 
           vjust = 0, hjust = 0, size = 5, fontface = "bold") +
    geom_vline(xintercept = 2020, linetype = "dashed", color = "black") +
    annotate("text", x = 2020, y = 680, label = "COVID-19 Pandemic", 
           vjust = 0, hjust = 0, size = 5, fontface = "bold") +
  # Annotation for 2003
    geom_curve(data = data.frame(x = 2004, xend = 2002.5, y = 600, yend = 500),
    mapping = aes(x = x, xend = xend, y = y, yend = yend), inherit.aes = FALSE,
    color = 'grey40', linewidth = 0.5, curvature = -0.1, arrow = arrow(length = unit(0.015, "npc"), type = "closed")) +
    geom_label(data = data.frame(x = 2001.5, y = 600, label = label_2003),
    mapping = aes(x = x, y = y, label = label), inherit.aes = FALSE, hjust = 0, lineheight = 0.9, size = 4) +
  # Annotation for 2010
    geom_curve(data = data.frame(x = 2011, xend = 2009.5, y = 480, yend = 330),
    mapping = aes(x = x, xend = xend, y = y, yend = yend),inherit.aes = FALSE,
    color = 'grey40', linewidth = 0.7, curvature = -0.35, arrow = arrow(length = unit(0.01, "npc"), type = "closed")) +
    geom_label(data = data.frame(x = 2009.5, y = 480, label = label_2010),
    mapping = aes(x = x, y = y, label = label), inherit.aes = FALSE,
    hjust = 0, lineheight = 0.9, size = 4) +
    scale_x_continuous(
    breaks = seq(2001, 2025, 5),
    expand = expansion(add = c(1, 8))) +
    scale_y_continuous(limits = c(0, 685)) +
    theme_half_open(font_size = 18) +
    theme(legend.position = 'none',
          panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 13)) +
    labs(x = 'Year',
       y = 'Count (millions)',
       title = 'Educational Attainment Trends, 2001 - 2024',
       subtitle = "Methodology changes in 2003 and 2010 explain sharp drops",
       caption = "Source: U.S. Census Bureau")

attainment_plot

The sharp decreases from 2002 to 2003 and from 2009 to 2010 reflect changes in the Census Bureau’s methodology for collecting and reporting educational attainment data. In 2003, the Current Population Survey (CPS) revised its questions, which changed the totals relative to earlier years. The Census Bureau’s Educational Attainment in the United States: 2003 report explains that the CPS shifted from measuring “years of schooling completed” to asking for the highest grade or degree completed (Stroops, 2021). In 2010, data collection and reporting transitioned from the CPS to the American Community Survey (ACS). The Census Bureau began emphasizing the ACS as the primary source for educational attainment statistics, noting that for 2009 and earlier, data from the Annual Social and Economic Supplement (ASEC) to the CPS were used (Ryan & Siebens, 2021).

Code

percent_data <- educational_data_combined %>%
  filter(year %in% c(2001, 2024), attainment_level != "Total") %>%
  group_by(attainment_level, year) %>%
  summarise(total_number = sum(number), .groups = "drop") %>%
    pivot_wider(names_from = year, values_from = total_number) %>%
    mutate(percent_change = (`2024` - `2001`) / `2001` * 100)

ggplot(percent_data, aes(x = reorder(attainment_level, percent_change), 
                        y = percent_change, 
                        fill = percent_change > 0)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = paste0(round(percent_change, 1), "%"),
                hjust = ifelse(abs(percent_change) < 5, 
                               ifelse(percent_change > 0, -0.1, 1.1), 
                               ifelse(percent_change > 0, 1.1, -0.1))), 
            color = "white", size = 4.5) +
    geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  annotate("text", x = 0, y = 0, label = "2001 baseline", 
           vjust = 0, hjust = 0, size = 5, fontface = "bold") +
  coord_flip() +
  scale_fill_manual(values = c("TRUE" = "steelblue", "FALSE" = "firebrick")) +
  labs(title = "Percentage Change in Educational Attainment (2001–2024)",
       subtitle = "Graduate and Doctoral Degrees more than doubled",
       x = "Attainment Level",
       y = "Percentage Change",
       caption = "Source: U.S. Census Bureau") +
  theme_minimal_vgrid(font_size = 15) +
    theme(panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        legend.position = 'none',
        plot.caption = element_text(size = 13))

Examining percentage changes reveals which educational levels grew the most. Graduate and doctoral degrees more than doubled, increasing 114.6% and 139.1% respectively, the highest growth rates of any category. Undergraduate degree holders increased by 71.6%, while high school graduates grew just 9.9%. Adults without a high school diploma declined by roughly 31%, indicating that more individuals hold at least a GED or high school diploma. The small percentage change in the “Some College, No Degree” category suggests that more individuals are completing degrees rather than stopping partway.
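As a rough check on the arithmetic, the percentage change is (2024 value - 2001 value) / 2001 value * 100. The sketch below applies it in base R to the rounded counts quoted earlier in this report; the helper `pct_change()` and the column names `y2001` and `y2024` are illustrative and not part of the original code. Because the inputs are rounded to whole millions, the results can differ slightly from the figures computed on the unrounded data, and small categories are especially sensitive to that rounding.

```r
# Percent change relative to a baseline value: (new - old) / old * 100
pct_change <- function(old, new) {
  (new - old) / old * 100
}

# Rounded counts (in millions) as quoted in the text of this report
counts <- data.frame(
  attainment_level = c("Undergraduate Degree", "Less than HS", "Graduate Degree"),
  y2001 = c(184, 113, 54),
  y2024 = c(316, 78, 116)
)

# Positive values are growth, negative values are decline
counts$percent_change <- round(pct_change(counts$y2001, counts$y2024), 1)
print(counts)
```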

Visualizations and Analysis by Demographic

Educational Attainment Within Racial Groups

Code

#devtools::install_github("liamgilbey/ggwaffle")
library(ggwaffle)

The following charts examine how educational attainment evolved within each racial group. They show the changing proportions of educational achievement from 2001 to 2024, revealing both progress and persistent challenges.

Code

#install.packages("waffle")
library(waffle)
library(grid)
white_data <- educational_data_combined %>%
  filter(number != ".", demographic == "white", year %in% c(2001, 2024), attainment_level != "Total") %>%
  mutate(year = as.integer(year),
         attainment_level = factor(attainment_level,
                                   levels = c("Less than HS", "HS", "Some College, No Degree",
                                              "Undergraduate Degree", "Graduate Degree", "Doctoral Degree"))) %>%
  group_by(year, attainment_level) %>%
  summarise(number = sum(as.numeric(number)), .groups = "drop") %>%
  group_by(year) %>%
  mutate(percent = number / sum(number) * 100,
         squares = round(percent))   # one square per percentage point; rounding can make the total differ from 100

# Identify largest 
white_labels <- white_data %>%
  group_by(year) %>%
  slice_max(order_by = percent, n = 1) %>% 
  mutate(label = paste0(attainment_level, ":\n ", round(percent,1), "%"), size = 4)

# Plot
ggplot(white_data, aes(fill = attainment_level, values = squares)) +
  geom_waffle(color = "white", size = 0.5, n_rows = 10, flip = TRUE) +
  facet_wrap(~year, ncol = 2, strip.position = "top") +
  scale_fill_viridis_d(drop = FALSE) +
  theme_minimal() +
theme(panel.spacing = unit(4, "lines"),
      strip.text = element_text(size = 18, face = "bold"),
      panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 13)) +
  labs(
    title = "White Americans Shift from High School to College Degrees",
    subtitle = "High School dropout rate cut in half while graduate degrees rise",
    caption = "Each square = 1% of population \n\nSource: U.S. Census Bureau",
    fill = "Degree"
  )

At the start of the 21st century, a notable share of White Americans had left school without a diploma. In 2001, a little over 15% had not obtained a high school diploma or equivalent. By 2024, this share had fallen to about 8%, nearly cut in half. Visually, the shares of high school, some college and undergraduate degrees appear about the same, though the balance shifts slightly between partial college and graduate degrees. Doctoral attainment, though still a small slice, doubles.

Code

#install.packages("waffle")
library(waffle)
library(grid)
asian_data <- educational_data_combined %>%
  filter(number != ".", demographic == "asian", year %in% c(2001, 2024), attainment_level != "Total") %>%
  mutate(year = as.integer(year),
         attainment_level = factor(attainment_level,
                                   levels = c("Less than HS", "HS", "Some College, No Degree",
                                              "Undergraduate Degree", "Graduate Degree", "Doctoral Degree"))) %>%
  group_by(year, attainment_level) %>%
  summarise(number = sum(as.numeric(number)), .groups = "drop") %>%
  group_by(year) %>%
  mutate(percent = number / sum(number) * 100,
         squares = round(percent))   # one square per percentage point; rounding can make the total differ from 100

# Identify largest 
asian_labels <- asian_data %>%
  group_by(year) %>%
  slice_max(order_by = percent, n = 1) %>% 
  mutate(label = paste0(attainment_level, ":\n ", round(percent,1), "%"), size = 4)

# Plot
ggplot(asian_data, aes(fill = attainment_level, values = squares)) +
  geom_waffle(color = "white", size = 0.5, n_rows = 10, flip = TRUE) +
  facet_wrap(~year, ncol = 2, strip.position = "top") +
  scale_fill_viridis_d(drop = FALSE) +
  theme_minimal() +
theme(panel.spacing = unit(4, "lines"),
      strip.text = element_text(size = 18, face = "bold"),
      panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 13)) +
  labs(
    title = "Asian Population Sees Staggering Improvement",
    subtitle = "Graduate and Doctoral degrees nearly double",
    caption = "Each square = 1% of population \n\nSource: U.S. Census Bureau",
    fill = "Degree"
  )

For Asian Americans, the story reveals a striking improvement in the proportion of those who attained at least a high school diploma. Looking at the darkest purple section at the bottom of the graph, about 12% had not finished high school in 2001. By 2024, that share had decreased to 8%.

Another area with visually significant growth is the lightest green, which corresponds to individuals who obtained graduate degrees. In 2001, just over one row is light green, roughly 12%. By 2024, it stretches just over two rows, representing about 23%. In just over two decades, the proportion of Asians with graduate degrees almost doubled.

Finally, the yellow section at the top, doctoral attainment, also shows growth. Though always a small subset of the population, its share climbed from just under 3% in 2001 to about 5% in 2024.

Code

#install.packages("waffle")
library(waffle)
library(grid)
black_data <- educational_data_combined %>%
  filter(number != ".", demographic == "black", year %in% c(2001, 2024), attainment_level != "Total") %>%
  mutate(year = as.integer(year),
         attainment_level = factor(attainment_level,
                                   levels = c("Less than HS", "HS", "Some College, No Degree",
                                              "Undergraduate Degree", "Graduate Degree", "Doctoral Degree"))) %>%
  group_by(year, attainment_level) %>%
  summarise(number = sum(as.numeric(number)), .groups = "drop") %>%
  group_by(year) %>%
  mutate(percent = number / sum(number) * 100,
         squares = round(percent))   # one square per percentage point; rounding can make the total differ from 100

# Plot
ggplot(black_data, aes(fill = attainment_level, values = squares)) +
  geom_waffle(color = "white", size = 0.5, n_rows = 10, flip = TRUE) +
  facet_wrap(~year, ncol = 2, strip.position = "top") +
  scale_fill_viridis_d(drop = FALSE) +
  theme_minimal() +
theme(panel.spacing = unit(4, "lines"),
      strip.text = element_text(size = 18, face = "bold"),
      panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 13)) +
  labs(
    title = "Black Americans Make Major Educational Gains",
    subtitle = "Dropout rate falls, but college completion still trails other groups",
    caption = "Each square = ~1% of population \n\nSource: U.S. Census Bureau",
    fill = "Degree"
  )

The graph above has slightly more than 100 squares because each percentage is rounded separately. However, each square still represents about one percent.
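The rounding effect can be illustrated with a small sketch. The shares below are hypothetical values chosen for illustration, not figures from the Census tables. The first step rounds each share on its own, as `round(percent)` does in the code above. The second step shows one possible alternative, a largest-remainder allocation, which always yields exactly 100 squares. The helper `largest_remainder()` is an assumption of this sketch and is not used anywhere in this report; it also assumes the shares sum to the target total.

```r
# Hypothetical shares (percent) for six attainment levels; they sum to 100
shares <- c(20.6, 33.7, 17.6, 14.6, 9.8, 3.7)

# Rounding each share independently does not guarantee a total of 100
squares_rounded <- round(shares)
sum(squares_rounded)

# Largest-remainder allocation: take the floor of every share, then give the
# leftover squares to the shares with the largest fractional parts
largest_remainder <- function(x, total = 100) {
  base <- floor(x)
  leftover <- total - sum(base)
  extra <- order(x - base, decreasing = TRUE)[seq_len(leftover)]
  base[extra] <- base[extra] + 1
  base
}

squares_exact <- largest_remainder(shares)
sum(squares_exact)
```

Under this allocation no category moves by a full square or more from its true share, at the cost of a slightly more involved preprocessing step.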

From 2001 to 2024, high school completion improved significantly. The proportion of Black individuals with less than a high school diploma dropped from 21% to just under 9%, more than halving. High school attainment itself accounts for about one-third of the Black population in both years. Undergraduate attainment inches upward from about 2% in 2001 to about 3% in 2024. Graduate degree attainment, however, more than doubles, rising from approximately 4% to 10%, as does doctoral attainment, which rises from 1-2% to around 4%.

Educational Attainment Trends Across Racial Groups (2024)

Having seen how each racial group progressed over time, we can now compare them directly. The 2024 chart reveals differences in educational attainment across racial categories.

Code

educational_percent <- educational_data_combined %>%
  filter(number != ".", year == 2024, attainment_level != "Total",
         demographic %in% c("white","black","asian")) %>%
  mutate(number = as.numeric(number)) %>%
  group_by(demographic, attainment_level) %>%
  summarise(total = sum(number), .groups = "drop") %>%
  group_by(demographic) %>%
  mutate(percent = round(total / sum(total) * 100, 2)) %>%
  ungroup() %>%
  group_by(attainment_level) %>%
  mutate(is_highest = percent == max(percent)) %>%
  ungroup() %>%
  mutate(fill_color = case_when(
    is_highest & demographic == "asian" ~ "asian",
    is_highest & demographic == "black" ~ "black",
    is_highest & demographic == "white" ~ "white",
    TRUE ~ "other"
  ),
  attainment_level = factor(attainment_level,
    levels = c("Doctoral Degree",
               "Graduate Degree",
               "Undergraduate Degree",
               "Some College, No Degree",
               "HS",
               "Less than HS")))

ggplot(educational_percent, aes(x = demographic, y = percent, fill = fill_color)) +
  geom_col(width = 0.7) +
  geom_text(
    data = educational_percent %>% filter(is_highest),
    aes(label = paste0(demographic, "\n", percent, "%")),
    vjust = -0.5,
    size = 4.5
  ) +
  facet_wrap(vars(attainment_level), nrow = 2) +   
  scale_fill_manual(
    values = c("asian" = "#440154", "black" = "#31688e", "white" = "#fde725", "other" = "grey80"),
    breaks = c("asian", "black", "white"),
    labels = c("Asian", "Black", "White")
  ) +
  scale_y_continuous(limits = c(0, 48), expand = expansion(mult = c(0, 0.05))) +
  theme_minimal_hgrid(font_size = 16) +
  theme(legend.position = "none",
panel.spacing = unit(4, "lines"),
panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 13)) +
  labs(
    title = "2024: Asian Americans Outpace Other Groups in Advanced Level Degrees",
    subtitle = "Black Americans most likely to stop at high school",
    y = " Percentage (%) ",
    x = " ",
    caption = "Source: U.S. Census Bureau"
  )

By 2024, the contrasts across racial groups are striking. Asian populations show strong gains in advanced degrees, while the Black population continues to face steeper hurdles.

Looking closely at the Black population in 2024, most of the story unfolds at lower educational attainment levels. The Black population shows the highest concentration in the “High School” category, at approximately 34%, and in “Some College, No Degree”, at about 17%; both shares are higher than those of the White and Asian populations. This may point to systemic barriers that prevent degree completion even when higher education is attempted. These patterns suggest that college completion, which is considered a gateway to middle-class economic opportunity, remains a bigger challenge for Black individuals than for other racial groups, even when they successfully finish high school.

Educational Attainment Trends by Gender

Code

library(scales)
library(patchwork)

plot_data <- educational_data_combined %>%
  filter(attainment_level != "Total", demographic %in% c("male", "female"), year %in% c(2001, 2024)) %>%
  group_by(year, demographic) %>%
  mutate(total_by_gender = sum(number, na.rm = TRUE), percentage = (number / total_by_gender) * 100) %>%
  ungroup() %>%
  select(year, demographic, attainment_level, percentage)

# Calculating the differences (female % - male %)

differences <- plot_data %>%
  pivot_wider(names_from = demographic, values_from = percentage) %>%
  mutate(difference = female - male, diff_label = paste0(ifelse(difference > 0, "+", ""), round(difference, 1), "%"))

attainment_order <- c("Less than HS", "HS", "Some College, No Degree", 
                      "Undergraduate Degree", "Graduate Degree", "Doctoral Degree")

plot_data <- plot_data %>%
  mutate(attainment_level = factor(attainment_level, levels = attainment_order))

differences <- differences %>%
  mutate(attainment_level = factor(attainment_level, levels = attainment_order))
plot_data_2001 <- plot_data %>% filter(year == 2001)
plot_data_2024 <- plot_data %>% filter(year == 2024)
diff_2001 <- differences %>% filter(year == 2001)
diff_2024 <- differences %>% filter(year == 2024)

# 2001 plot
plot_2001 <- ggplot(plot_data_2001, aes(x = percentage, y = attainment_level)) +
  geom_segment(data = diff_2001, aes(x = male, xend = female, y = attainment_level, yend = attainment_level), color = "gray60", linewidth = 2) +
geom_point(aes(color = demographic), size = 5) +
    scale_color_viridis_d() +
  scale_x_continuous(labels = label_percent(scale = 1), breaks = seq(0, 50, 10), limits = c(0, 40)) +
labs(title = "2001", x = "Percentage", y = NULL) +
theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold", size = 16, hjust = 0.5),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
axis.text.y = element_text(size = 12),
axis.text.x = element_text(size = 10),
axis.title.x = element_text(size = 11),
plot.margin = margin(10, 5, 10, 10),
legend.position = "bottom")

# 2001 difference section
diff_panel_2001 <- diff_2001 %>%
  ggplot(aes(x = 1, y = attainment_level)) +
  geom_text(aes(label = diff_label), size = 4, fontface = "bold") +
  labs(title = "Percentage \nDifference", x = NULL, y = NULL) +
  theme_void(base_size = 10) +
  theme(plot.title = element_text(face = "bold", size = 12, hjust = 0.5, margin = margin(b = 10)),
    plot.background = element_rect(fill = "gray90", color = NA),
    plot.margin = margin(10, 10, 10, 5))

# 2024 plot 
plot_2024 <- ggplot(plot_data_2024, aes(x = percentage, y = attainment_level)) +
  geom_segment(data = diff_2024, aes(x = male, xend = female, y = attainment_level, yend = attainment_level), color = "gray60", linewidth = 2) +
  geom_point(aes(color = demographic), size = 5) +
    scale_color_viridis_d() +
  scale_x_continuous(labels = label_percent(scale = 1), breaks = seq(0, 50, 10), limits = c(0, 40)) +
  labs(title = "2024", x = "Percentage", y = NULL) +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold", size = 16, hjust = 0.5), panel.grid.major.y = element_blank(), legend.position = "none", panel.grid.minor = element_blank(), axis.text.y = element_blank(), axis.text.x = element_text(size = 10), axis.title.x = element_text(size = 11), plot.margin = margin(10, 5, 10, 10))

# 2024 difference section
diff_panel_2024 <- diff_2024 %>%
  ggplot(aes(x = 1, y = attainment_level)) +
  geom_text(aes(label = diff_label), size = 4, fontface = "bold") +
  labs(title = "Percentage \nDifference", x = NULL, y = NULL) +
  theme_void(base_size = 10) +
  theme(plot.title = element_text(face = "bold", size = 12, hjust = 0.5, margin = margin(b = 10)), plot.background = element_rect(fill = "gray90", color = NA), plot.margin = margin(10, 10, 10, 5))

# Combining
final_plot <- (plot_2001 | diff_panel_2001 | plot_2024 | diff_panel_2024) +
    scale_color_viridis_d() +
    plot_layout(widths = c(3, 1, 3, 1)) +
    plot_annotation(
    title = "Women Overtake Men in Graduate Degree Attainment",
    subtitle = "By 2024, women surpass men in graduate degree attainment",
    caption = "Source: U.S. Census Bureau",
    theme = theme(
      plot.title = element_text(face = "bold", size = 20, hjust = 0.5),
      plot.subtitle = element_text(hjust = 0.5, size = 14),
      plot.caption = element_text(hjust = 1, size = 13)))
  

final_plot

Gender patterns in educational attainment shifted, and in part reversed, between 2001 and 2024. In 2001, men held advantages at the highest educational levels, leading women in graduate and doctoral degrees by 1.4 and 1.2 percentage points respectively. By 2024, women had reversed the graduate gap and now surpass men in graduate degree attainment, while the doctoral gap has nearly closed. Women’s undergraduate advantage also grew from 0.2 to 2.8 percentage points.
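The gaps quoted here are differences between two percentages (female share minus male share), so they are best read as percentage points. The sketch below shows the calculation in base R on a small hypothetical table; the counts are invented for illustration and are not Census figures, and the object names `toy` and `gap` are assumptions of this sketch. The column names `demographic`, `attainment_level` and `number` mirror those used in the report's code.

```r
# Hypothetical counts (in thousands); NOT taken from the Census tables
toy <- data.frame(
  demographic = rep(c("female", "male"), each = 3),
  attainment_level = rep(c("HS", "Undergraduate Degree", "Graduate Degree"), times = 2),
  number = c(300, 500, 200, 350, 450, 200)
)

# Share of each attainment level within its own gender group
toy$percentage <- toy$number / ave(toy$number, toy$demographic, FUN = sum) * 100

# Put female and male shares side by side, matched on attainment level
female <- toy[toy$demographic == "female", c("attainment_level", "percentage")]
male <- toy[toy$demographic == "male", c("attainment_level", "percentage")]
gap <- merge(female, male, by = "attainment_level", suffixes = c("_female", "_male"))

# Difference in percentage points: positive means women lead
gap$difference_pp <- gap$percentage_female - gap$percentage_male
print(gap)
```

A positive `difference_pp` means the level makes up a larger share of women than of men; it does not by itself say which group has more degree holders in absolute terms.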

Educational Attainment Trends by Age

Code

educational_data_combined %>%
  filter(demographic %in% c("25_to_34", "35_to_54", "55_plus")) %>%
  filter(attainment_level != "Other",
         attainment_level != "Total") %>%
  mutate(attainment_level = factor(
    attainment_level, 
    levels = c("Doctoral Degree",
               "Graduate Degree",
               "Undergraduate Degree",
               "Some College, No Degree",
               "HS",
               "Less than HS")
    )) %>%
  mutate(demographic = factor(demographic,
    levels = c("25_to_34", "35_to_54", "55_plus"),
    labels = c("25 to 34", "35 to 54", "55 Plus")
  )) %>%
  group_by(demographic, attainment_level) %>%
  summarise(total = sum(number)) %>%
  ggplot(aes(x = demographic, y = total/1000, fill = demographic)) +
  geom_col(position = "dodge") +
  facet_wrap(~attainment_level)+
  coord_flip()+
  labs(
    title = "Midlife Americans Lead in College Degree Attainment",
    subtitle = "Advanced degrees peak in midlife; adults 55 years and older less likely to pursue higher education",
    x = "Age group ",
    y = "Count (millions)",
    fill = "Age",
    caption = "\nSource: U.S. Census Bureau") +
  theme_minimal_hgrid() +
  scale_fill_viridis_d() +
  # Single theme() call; the original repeated axis.text.x and panel.grid.minor twice
  theme(axis.text.x = element_text(hjust = 1),
        panel.grid.minor = element_blank(),
        panel.spacing = unit(4, "lines"),
        plot.title = element_text(size = 20, face = "bold"),
        plot.subtitle = element_text(size = 17),
        axis.title.x = element_blank(),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        legend.position = "none",
        plot.caption = element_text(size = 13)
  )

The distribution of attainment across age groups is shaped by cohort size and generational timing. The 35-54 age group shows the highest counts for undergraduate and graduate degrees, reflecting both its large size and the expansion of higher education during its formative years.

Doctoral degree holders are concentrated among those aged 35 and over, with similar counts in the 35 to 54 and 55 and over age groups. This reflects the time doctoral degrees take to complete and the accumulation of degree holders who completed their doctorates in previous decades.

The 25-34 age group shows smaller numbers across all education levels, possibly because many are still pursuing degrees. By midlife, the drive to pursue higher education often peaks, and it is during this stage that individuals are most likely to complete undergraduate or graduate degrees.

Evaluation of Expectations

Proposal Expectations

Our datasets contain the following variables: race, sex, age group, detailed years of school (representing educational degree attainment) and year. For this project, our primary focus will be on race, educational attainment, and year, which serves as our temporal variable.

Our original proposal expected that the educational attainment variable would be multimodal, with peaks at different attainment levels and fewer individuals holding highly advanced degrees. We also expected that, due to systemic racism, a higher percentage of White respondents would have higher educational attainment.

Our expectations about the variables inspired our research questions about how educational attainment differs across demographic groups. These expectations connected directly to the project’s main goal: exploring how education has changed over time in the U.S., with room for highlighting whether disparities exist across age, race, and gender groups.

Evaluation

Based on the visualizations created, we found support for these expectations. The aggregated bar charts and faceted comparisons by sex and race show that educational attainment is not evenly distributed across these demographic groups.

The largest populations cluster around high school diplomas and undergraduate degrees, while doctoral degrees represent progressively smaller shares. This reflects typical educational pathways: most Americans complete high school, many pursue college, and fewer continue to advanced degrees.

Specifically, when looking at race, White respondents consistently show higher counts across most educational levels compared to Black respondents. However, we discovered that Asians consistently outpace both White and Black populations in advanced educational attainment. These patterns reveal educational disparities that go beyond a simple White/non-White divide and may reflect other factors, such as cultural emphasis on education or differential access to resources.

Conclusion

This project explored educational attainment trends across different demographic groups from 2001 to 2024, using a series of visualizations to highlight disparities and shifts over time. Between 2001 and 2024, the United States experienced overall growth in educational attainment: undergraduate degree holders increased from 184 million to 316 million (71.6% growth), while adults without high school diplomas declined from 113 million to 78 million, the only category to see a decrease. Graduate and doctoral degrees more than doubled, with growth rates of approximately 115% and 139% respectively, indicating that more individuals are pursuing education beyond the undergraduate degree level.

Our analysis confirmed initial expectations about educational inequality among demographic groups while revealing important complexities. The high proportion of Black Americans with “Some College, No Degree” suggests issues such as systemic financial and institutional barriers preventing degree completion. Gender patterns show a surprising reversal: women now surpass men in graduate degree attainment and have nearly closed the doctoral gap.

The use of slope charts, waffle charts, faceted comparisons and dumbbell plots provided clear visual evidence of patterns and disparities. These findings underscore inequalities in the education system and offer a foundation for future inquiry into the social and economic impacts of educational attainment.

Attribution

Leshauna Hartman: Introduction, Research Questions, Data Sources, Data Dictionary & README, Data Cleaning, Data Visualizations, Evaluation of Expectations, Conclusion.

Maya Schmidt: Introduction, Research Questions, Data Cleaning, Data Visualizations, Evaluation of Expectations, Conclusion.

References

Ryan, C. L., & Siebens, J. (2021, October 8). Educational attainment in the United States: 2009. Census.gov. https://www.census.gov/library/publications/2012/demo/p20-566.html

Stroops, N. (2021, October 8). Educational attainment in the United States: 2003. Census.gov. https://www.census.gov/library/publications/2004/demo/p20-550.html

U.S. Census Bureau. (n.d.). Educational attainment tables. Census.gov. https://www.census.gov/topics/education/educational-attainment/data/tables.html?text-list-e9c3fe7baa%3Atab=2024#text-list-e9c3fe7baa

Appendix

Below are the data dictionary tables and all of the code used in this report.

Code

datatable(data_dictionary, caption = "Data Dictionary 1",
          options = list(pageLength = 5, lengthMenu = c(3, 4, 6, 12)))

Code

datatable(data_dictionary2, caption = "Data Dictionary 2",
          options = list(pageLength = 5,lengthMenu = c(1, 2, 3, 4)))

Code

knitr::opts_chunk$set(
    warning = FALSE,
    message = FALSE,
    fig.path = "figs/", # Folder where rendered plots are saved
    fig.width = 12, # Default plot width
    fig.height = 7, # Default plot height
    fig.retina = 3, # For better plot resolution
    comment = "#>"
)

# Load necessary libraries
library(tidyverse)
library(readxl)
library(here)
library(janitor)
library(dplyr)
# install.packages("unheadr")
library(unheadr)
library(cowplot)
library(forcats)
library(ggrepel)
library(ggwaffle)
library(waffle)
library(ggplot2)
library(knitr)
library(DT)
#quarto add mcanouil/quarto-elevator
data_dictionary <- tibble(
  Variable = c("Detailed years of school", 
               "All races/ All people",
               "Male(s)",
               "Female(s)",
               "25 to 34 years old",
               "35 to 54 years old",
               "55 years and older",
               "White",
               "Non-Hispanic White",
               "Black",
               "Asian",
               "Hispanic (of any race)"), 
  Description = c("Detailed years of school", 
                  "Total number of people surveyed", 
                  "Respondents who identify as male",
                  "Respondents who identify as female",
                  "Respondents aged 25 to 34 years",
                  "Respondents aged 35 to 54 years",
                  "Respondents aged 55 and older",
                  "A person having origins in any of the original peoples of Europe, the Middle East, or North Africa.  It includes people who indicate their race as “White” or report responses such as German, Irish, English, Italian, Lebanese, and Egyptian. The category also includes groups such as Polish, French, Iranian, Slavic, Cajun, Chaldean, etc", 
                  "A person having origins in any of the original peoples of Europe, the Middle East, or North Africa, and who does not identify as Hispanic or Latino",
                  "A person having origins in any of the Black racial groups of Africa.  It includes people who indicate their race as “Black or African American” or report responses such as African American, Jamaican, Haitian, Nigerian, Ethiopian, or Somali. The category also includes groups such as Ghanaian, South African, Barbadian, Kenyan, Liberian, Bahamian, etc.",
                   "A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent including, for example, India, China, the Philippine Islands, Japan, Korea, or Vietnam.  It includes people who indicate their race as “Asian Indian,” “Chinese,” “Filipino,” “Korean,” “Japanese,” “Vietnamese,” and “Other Asian” or provide other detailed Asian responses such as Pakistani, Cambodian, Hmong, Thai, Bengali, Mien, etc.",
                  "A person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin regardless of race."
                             )
)
data_dictionary2 <- tibble(
  Variable = c("Number", 
               "Percent"), 
  Description = c("The count of respondents (in thousands) within that specific demographic group who have completed a particular level of education", 
                  "The proportion of a demographic group's total population that has completed a particular level of education"
                             ),
  Variable_Type = c("double",
                    "character (2010-2017, 2021, 2022, 2024); double (2018-2020)")
)

# Load educational data, skipping the first 4 header rows
educational_data_2024 <- read_excel(here::here('data_raw', '2024_Educational_Data.xlsx'), skip = 4)
# Preview first few rows of dataset 
head(educational_data_2024)
# Rename columns for clarity and consistency 
educational_data_2024 <- educational_data_2024 %>% set_names(c("years_of_school","total_number","total_percent","male_number", "male_percent", "female_number", "female_percent", "25_to_34_number","25_to_34_percent", "35_to_54_number","35_to_54_percent", "55_plus_number","55_plus_percent", "white_number","white_percent", "non_hispanic_white_number","non_hispanic_white_percent", "black_number","black_percent", "asian_number","asian_percent", "hispanic_number","hispanic_percent"))

# Replace placeholder values
# Drop missing values
educational_2024_clean <- educational_data_2024 %>%
  mutate(across(everything(), ~replace(., . =="Z", "0"))) %>% 
  drop_na() 
# Reshape to long format
educational_2024_tidy <- educational_2024_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  )
# Convert count values to numeric 
# Reshape back to wide format
educational_2024_tidy <- educational_2024_tidy %>%
  mutate(count = as.numeric(count)) %>%
  pivot_wider(
  names_from = value,
  values_from = count
)
# Create a new variable to categorize educational attainment levels
# Add year

educational_2024_tidy <- educational_2024_tidy %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2024)
head(educational_2024_tidy)
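# Illustration (not part of the original analysis): a minimal sketch of the
# long-then-wide reshape used above, run on a made-up two-row table. The
# objects `toy`, `toy_long` and `toy_wide` and all of their values are
# hypothetical.
library(dplyr)
library(tidyr)
toy <- tibble::tibble(
  years_of_school    = c("High school diploma", "Bachelor's degree"),
  male_number        = c("100", "50"),
  male_percent       = c("40.0", "20.0"),
  `25_to_34_number`  = c("30", "25"),
  `25_to_34_percent` = c("35.0", "30.0")
)
# names_pattern splits each column name at its final "_number" or "_percent";
# the greedy (.+) keeps multi-part demographics such as "25_to_34" intact
toy_long <- toy %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"
  )
# Converting to numeric and widening again gives one row per school level and
# demographic, with separate number and percent columns
toy_wide <- toy_long %>%
  mutate(count = as.numeric(count)) %>%
  pivot_wider(names_from = value, values_from = count)
toy_wide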
# Load educational data, skipping the first 4 header rows
educational_data_2022 <- read_excel(here::here('data_raw', '2022_Educational_Data.xlsx'), skip = 4)
# Preview the first few rows of the dataset
head(educational_data_2022)
# Rename columns for clarity and consistency

educational_data_2022 <- educational_data_2022 %>% set_names(c("years_of_school","total_number","total_percent","male_number", "male_percent", "female_number", "female_percent", "25_to_34_number","25_to_34_percent", "35_to_54_number","35_to_54_percent", "55_plus_number","55_plus_percent", "white_number","white_percent", "non_hispanic_white_number","non_hispanic_white_percent", "black_number","black_percent", "asian_number","asian_percent", "hispanic_number","hispanic_percent"))
# Replace placeholder values
# Drop missing values

educational_2022_clean <- educational_data_2022 %>%
  mutate(across(everything(), ~replace(., . =="Z", "0"))) %>% 
  drop_na()
# Reshape to long format
educational_2022_tidy <- educational_2022_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  )
# Convert count values to numeric 
# Reshape back to wide format
# Create a new variable to categorize educational attainment levels
# Add year 

educational_2022_tidy <- educational_2022_tidy %>%
  mutate(count = as.numeric(count)) %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2022)
head(educational_2022_tidy)
# Load educational data, skipping the first 4 header rows
educational_data_2021 <- read_excel(here::here('data_raw', '2021_Educational_Data.xlsx'), skip = 4)
# Preview the first few rows of the dataset
head(educational_data_2021)
# Rename columns for clarity and consistency

educational_data_2021 <- educational_data_2021 %>% set_names(c("years_of_school","total_number","total_percent","male_number", "male_percent", "female_number", "female_percent", "25_to_34_number","25_to_34_percent", "35_to_54_number","35_to_54_percent", "55_plus_number","55_plus_percent", "white_number","white_percent", "non_hispanic_white_number","non_hispanic_white_percent", "black_number","black_percent", "asian_number","asian_percent", "hispanic_number","hispanic_percent"))
# Replace placeholder values
# Drop missing values
educational_2021_clean <- educational_data_2021 %>%
  mutate(across(everything(), ~replace(., . =="Z", "0"))) %>% 
  drop_na()
# Reshape to long format
educational_2021_tidy <- educational_2021_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  )
# Convert count values to numeric 
# Reshape back to wide format
# Create a new variable to categorize educational attainment levels
# Add year

educational_2021_tidy <- educational_2021_tidy %>%
  mutate(count = as.numeric(count)) %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2021)
head(educational_2021_tidy)
# Load educational data, skipping the unnecessary header rows

educational_data_2020 <- read_excel(here::here('data_raw', '2020_Educational_Data.xlsx'), skip = 5)
educational_data_2019 <- read_excel(here::here('data_raw', '2019_Educational_Data.xlsx'), skip = 5)
educational_data_2018 <- read_excel(here::here('data_raw', '2018_Educational_Data.xlsx'), skip = 5)
educational_data_2017 <- read_excel(here::here('data_raw', '2017_Educational_Data.xlsx'), skip = 5)
educational_data_2016 <- read_excel(here::here('data_raw', '2016_Educational_Data.xlsx'), skip = 5)
educational_data_2015 <- read_excel(here::here('data_raw', '2015_Educational_Data.xlsx'), skip = 5)
educational_data_2014 <- read_excel(here::here('data_raw', '2014_Educational_Data.xlsx'), skip = 5)
educational_data_2013 <- read_excel(here::here('data_raw', '2013_Educational_Data.xlsx'), skip = 5)
educational_data_2012 <- read_excel(here::here('data_raw', '2012_Educational_Data.xlsx'), skip = 6)
educational_data_2011 <- read_excel(here::here('data_raw', '2011_Educational_Data.xlsx'), skip = 6)
educational_data_2010 <- read_excel(here::here('data_raw', '2010_Educational_Data.xlsx'), skip = 5)
# Rename columns for clarity and consistency

# The same 23 column names apply to every year from 2010 to 2020, so define them once
education_column_names <- c(
  "years_of_school", "total_number", "total_percent",
  "male_number", "male_percent", "female_number", "female_percent",
  "25_to_34_number", "25_to_34_percent", "35_to_54_number", "35_to_54_percent",
  "55_plus_number", "55_plus_percent", "white_number", "white_percent",
  "non_hispanic_white_number", "non_hispanic_white_percent",
  "black_number", "black_percent", "asian_number", "asian_percent",
  "hispanic_number", "hispanic_percent"
)
educational_data_2020 <- educational_data_2020 %>% set_names(education_column_names)
educational_data_2019 <- educational_data_2019 %>% set_names(education_column_names)
educational_data_2018 <- educational_data_2018 %>% set_names(education_column_names)
educational_data_2017 <- educational_data_2017 %>% set_names(education_column_names)
educational_data_2016 <- educational_data_2016 %>% set_names(education_column_names)
educational_data_2015 <- educational_data_2015 %>% set_names(education_column_names)
educational_data_2014 <- educational_data_2014 %>% set_names(education_column_names)
educational_data_2013 <- educational_data_2013 %>% set_names(education_column_names)
educational_data_2012 <- educational_data_2012 %>% set_names(education_column_names)
educational_data_2011 <- educational_data_2011 %>% set_names(education_column_names)
educational_data_2010 <- educational_data_2010 %>% set_names(education_column_names)
# Replace placeholder values
# Drop missing values

educational_2020_clean <- educational_data_2020 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2019_clean <- educational_data_2019 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2018_clean <- educational_data_2018 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2017_clean <- educational_data_2017 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2016_clean <- educational_data_2016 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2015_clean <- educational_data_2015 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2014_clean <- educational_data_2014 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2013_clean <- educational_data_2013 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2012_clean <- educational_data_2012 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2011_clean <- educational_data_2011 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
educational_2010_clean <- educational_data_2010 %>%
  mutate(across(everything(), ~replace(., . =="-", "0"))) %>% 
  drop_na()
# Reshape to long format
educational_2020_tidy <- educational_2020_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2019_tidy <- educational_2019_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  )
educational_2018_tidy <- educational_2018_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  )
educational_2017_tidy <- educational_2017_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2016_tidy <- educational_2016_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2015_tidy <- educational_2015_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2014_tidy <- educational_2014_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2013_tidy <- educational_2013_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2012_tidy <- educational_2012_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2011_tidy <- educational_2011_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  ) 
educational_2010_tidy <- educational_2010_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"  
  )
# Reshape back to wide format
# Create a new variable to categorize educational attainment levels
# Add year (number and percent are converted to numeric in a later step)

educational_2020_tidy <- educational_2020_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2020)
educational_2019_tidy <- educational_2019_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2019)
educational_2018_tidy <- educational_2018_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2018)
educational_2017_tidy <- educational_2017_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2017)
educational_2016_tidy <- educational_2016_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2016)
educational_2015_tidy <- educational_2015_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2015)
educational_2014_tidy <- educational_2014_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2014)
educational_2013_tidy <- educational_2013_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2013)
educational_2012_tidy <- educational_2012_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2012)
educational_2011_tidy <- educational_2011_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2011)
educational_2010_tidy <- educational_2010_tidy %>%
  pivot_wider(
  names_from = value,
  values_from = count
) %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2010)
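# Illustration (not part of the original analysis): a minimal sketch of how the
# str_detect()/case_when() classification above behaves, using made-up labels.
# The helper `classify_attainment` and the sample labels are hypothetical, and
# only a few of the rules above are reproduced here.
library(dplyr)
library(stringr)
classify_attainment <- function(label) {
  case_when(
    str_detect(label, "no diploma") ~ "Less than High School",
    str_detect(label, "High school diploma") ~ "High School Diploma",
    str_detect(label, "Bachelor") ~ "Bachelor's Degree",
    TRUE ~ "Other"
  )
}
sample_labels <- c("12th grade, no diploma", "High school diploma",
                   "Bachelor's degree", "bachelor's degree", "Total")
# case_when() returns the first rule that matches, and str_detect() is case
# sensitive, so the lowercase "bachelor's degree" falls through to "Other"
classified <- tibble::tibble(
  label = sample_labels,
  attainment = classify_attainment(sample_labels)
)
# Listing the "Other" rows is a quick way to see which labels no rule caught
filter(classified, attainment == "Other")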
# Check structure and datatypes of most recent years to ensure consistency
glimpse(educational_2021_tidy)
glimpse(educational_2022_tidy)
glimpse(educational_2024_tidy)
# Convert number and percent to numeric for 2010-2020, matching the 2021-2024 tables
educational_2020_tidy <- educational_2020_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2019_tidy <- educational_2019_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2018_tidy <- educational_2018_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2017_tidy <- educational_2017_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2016_tidy <- educational_2016_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2015_tidy <- educational_2015_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2014_tidy <- educational_2014_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2013_tidy <- educational_2013_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2012_tidy <- educational_2012_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2011_tidy <- educational_2011_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
educational_2010_tidy <- educational_2010_tidy %>%
  mutate(
    number = as.numeric(number),
    percent = as.numeric(percent)
  )
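# Illustration (not part of the original analysis): a minimal sketch of one
# likely reason for the conversions above. dplyr::bind_rows() cannot combine a
# character column with a double column of the same name. The tibbles
# `chr_year` and `num_year` and their values are hypothetical.
library(dplyr)
chr_year <- tibble::tibble(demographic = "male", number = "100", year = 2019)
num_year <- tibble::tibble(demographic = "male", number = 105, year = 2021)
# Binding mismatched types raises an error, captured here instead of stopping
mismatch <- tryCatch(bind_rows(chr_year, num_year), error = function(e) "error")
# Once the character column is converted, both are double and the bind succeeds
combined <- bind_rows(mutate(chr_year, number = as.numeric(number)), num_year)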

# Combine all cleaned and tidy datasets
educational_data_combined <- bind_rows(
  educational_2010_tidy,
  educational_2011_tidy,
  educational_2012_tidy,
  educational_2013_tidy,
  educational_2014_tidy,
  educational_2015_tidy,
  educational_2016_tidy,
  educational_2017_tidy,
  educational_2018_tidy,
  educational_2019_tidy,
  educational_2020_tidy,
  educational_2021_tidy,
  educational_2022_tidy,
  educational_2024_tidy) %>%
  select(-percent) %>%
  filter(!is.na(number))
#view(educational_data_combined)
# Grouping educational attainment categories

educational_data_combined <- educational_data_combined %>% 
  mutate(attainment_level = case_when(
    attainment == "Less than High School" ~ "Less than HS",
    attainment %in% c("GED", "High School Diploma") ~ "HS",
    attainment == "Some College, No Degree" ~ "Some College, No Degree",
    attainment %in% c("Associate's Degree", "Bachelor's Degree") ~ "Undergraduate Degree",
    attainment %in% c("Professional Degree", "Master's Degree") ~ "Graduate Degree",
    attainment == "Doctoral Degree" ~ "Doctoral Degree", 
    TRUE ~ "Total")
  )

educational_data_combined <- educational_data_combined %>%
  group_by(year, attainment_level, demographic) %>%
  summarise(number = sum(number, na.rm = TRUE), .groups = "drop")
educational_data_combined
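
# The `TRUE ~ "Total"` fallback in the recode above also absorbs any label that
# was not listed, such as a misspelling. A small sketch of a check that could
# be run before the recode follows. `demo_attainment`, `known_labels`, and
# `unexpected_labels` are hypothetical names, the toy labels are made up, and
# it is an assumption that the summary rows are labelled "Total".
library(dplyr)
library(tibble)

demo_attainment <- tibble(
  attainment = c("GED", "Total", "Bachelors Degree", "Doctoral Degree")
)

known_labels <- c(
  "Less than High School", "GED", "High School Diploma",
  "Some College, No Degree", "Associate's Degree", "Bachelor's Degree",
  "Professional Degree", "Master's Degree", "Doctoral Degree", "Total"
)

# Labels that would be silently recoded to "Total"
unexpected_labels <- demo_attainment %>%
  filter(!attainment %in% known_labels) %>%
  distinct(attainment) %>%
  pull(attainment)
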
#loading and cleaning 2009_Asian
educational_data_2009_Asian <- read_excel(here::here('data_raw', '2009_Educational_Attainment_Asian.xls'), skip = 7)

#set names
educational_data_2009_asian <- educational_data_2009_Asian %>%
  set_names(c(
    "sexes",
    "age_group_big",
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2009_asian <- educational_data_2009_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  select(-c(age_group_big)) %>%
  filter(!is.na(age_group))

#fix data types
educational_data_2009_asian <- educational_data_2009_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))

#pivot and check
educational_data_2009_asian <- educational_data_2009_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#view(educational_data_2009_asian)

#cleaning and loading 2009_Black
educational_data_2009_Black <- read_excel(here::here('data_raw', '2009_Educational_Attainment_Black.xls'), skip = 7)

educational_data_2009_black <- educational_data_2009_Black %>%
  set_names(c(
    "sexes",
    "age_group_big",
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2009_black <- educational_data_2009_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  select(-c(age_group_big)) %>%
  filter(!is.na(age_group))


educational_data_2009_black <- educational_data_2009_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

head(educational_data_2009_black)

educational_data_2009_black <- educational_data_2009_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )


#repeating everything for 2009_Hispanic
educational_data_2009_Hispanic <- read_excel(here::here('data_raw', '2009_Educational_Attainment_Hispanic.xls'), skip = 7)


educational_data_2009_hispanic <- educational_data_2009_Hispanic %>%
  set_names(c(
    "sexes",
    "age_group_big",
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2009_hispanic <- educational_data_2009_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  select(-c(age_group_big)) %>%
  filter(!is.na(age_group))

educational_data_2009_hispanic <- educational_data_2009_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

head(educational_data_2009_hispanic)

educational_data_2009_hispanic <- educational_data_2009_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#repeating everything for 2009_Non_Hispanic_White
educational_data_2009_non_hispanic_white <- read_excel(here::here('data_raw', '2009_Educational_Attainment_Non_Hispanic_White.xls'), skip = 7)

educational_data_2009_non_hispanic_white <- educational_data_2009_non_hispanic_white %>%
  set_names(c(
    "sexes",
    "age_group_big",
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2009_non_hispanic_white <- educational_data_2009_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  select(-c(age_group_big)) %>%
  filter(!is.na(age_group))
head(educational_data_2009_non_hispanic_white)


educational_data_2009_non_hispanic_white <- educational_data_2009_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

head(educational_data_2009_non_hispanic_white)

educational_data_2009_non_hispanic_white <- educational_data_2009_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#repeating everything for 2009_White
educational_data_2009_white <- read_excel(here::here('data_raw', '2009_Educational_Attainment_White.xls'), skip = 7)

educational_data_2009_white <- educational_data_2009_white %>%
  set_names(c(
    "sexes",
    "age_group_big",
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2009_white <- educational_data_2009_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  select(-c(age_group_big)) %>%
  filter(!is.na(age_group))

educational_data_2009_white <- educational_data_2009_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2009_white <- educational_data_2009_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )


#joining 2009 tables
#adding race column

educational_data_2009_asian <- educational_data_2009_asian %>%
  mutate(race = "asian")

educational_data_2009_black <- educational_data_2009_black %>%
  mutate(race = "black")

educational_data_2009_hispanic <- educational_data_2009_hispanic %>%
  mutate(race = "hispanic")

educational_data_2009_non_hispanic_white <- educational_data_2009_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2009_white <- educational_data_2009_white %>%
  mutate(race = "white")


# Stack the five race tables (same columns, distinct `race` values)
test_join_2009 <- bind_rows(
  educational_data_2009_asian,
  educational_data_2009_black,
  educational_data_2009_hispanic,
  educational_data_2009_non_hispanic_white,
  educational_data_2009_white
) %>%
  mutate(
    year = 2009
  )

head(test_join_2009)
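
# The five 2009 blocks above differ only in the input file and the race label.
# A sketch of how the shared steps could be wrapped in one function follows.
# `clean_attainment_table()`, its arguments, and the toy input are hypothetical;
# the toy table has two count columns instead of the full layout, and the
# row-based `sexes` coding is left out to keep the example short.
library(dplyr)
library(tidyr)
library(tibble)

clean_attainment_table <- function(raw, race_label, survey_year) {
  raw %>%
    # Treat the "-" placeholder as zero before converting to numeric
    mutate(across(everything(), ~ replace(., . == "-", "0"))) %>%
    # Drop spacer rows that carry no age group
    filter(!is.na(age_group)) %>%
    mutate(across(-age_group, as.numeric)) %>%
    pivot_longer(
      cols = -age_group,
      names_to = "educational_attainment",
      values_to = "count"
    ) %>%
    mutate(race = race_label, year = survey_year)
}

demo_raw <- tibble(
  age_group = c("25 to 29 years", NA, "30 to 34 years"),
  total = c("100", "7", "80"),
  HS_graduate = c("40", "3", "-")
)

demo_clean <- clean_attainment_table(demo_raw, "asian", 2009)
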
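# Repeat for 2008 White (the 2008 sheets are read with skip = 6 and have no
# separate sex or age_group_big columns, so only 17 names are set)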
educational_data_2008_white <- read_excel(here::here('data_raw', '2008_Educational_Attainment_White.xls'), skip = 6)
educational_data_2008_white

educational_data_2008_white <- educational_data_2008_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2008_white <- educational_data_2008_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2008_white <- educational_data_2008_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2008_white <- educational_data_2008_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
#repeat for 2008 non-hispanic white
educational_data_2008_non_hispanic_white <- read_excel(here::here('data_raw', '2008_Educational_Attainment_Non_Hispanic_White.xls'), skip = 6)


educational_data_2008_non_hispanic_white <- educational_data_2008_non_hispanic_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2008_non_hispanic_white <- educational_data_2008_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2008_non_hispanic_white <- educational_data_2008_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2008_non_hispanic_white <- educational_data_2008_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
#repeat for 2008 black
educational_data_2008_black <- read_excel(here::here('data_raw', '2008_Educational_Attainment_Black.xls'), skip = 6)


educational_data_2008_black <- educational_data_2008_black %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2008_black <- educational_data_2008_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2008_black <- educational_data_2008_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2008_black <- educational_data_2008_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#repeat for 2008 asian
educational_data_2008_asian <- read_excel(here::here('data_raw', '2008_Educational_Attainment_Asian.xls'), skip = 6)

# Set names
educational_data_2008_asian <- educational_data_2008_asian %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Gender
# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2008_asian <- educational_data_2008_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

# Fix data types
educational_data_2008_asian <- educational_data_2008_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))

# Pivot
educational_data_2008_asian <- educational_data_2008_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#repeat for 2008 hispanic
educational_data_2008_hispanic <- read_excel(here::here('data_raw', '2008_Educational_Attainment_Hispanic.xls'), skip = 6)

# Set names
educational_data_2008_hispanic <- educational_data_2008_hispanic %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2008_hispanic <- educational_data_2008_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2008_hispanic <- educational_data_2008_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2008_hispanic <- educational_data_2008_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#joining 2008 tables

#adding race column

educational_data_2008_asian <- educational_data_2008_asian %>%
  mutate(race = "asian")

educational_data_2008_black <- educational_data_2008_black %>%
  mutate(race = "black")

educational_data_2008_hispanic <- educational_data_2008_hispanic %>%
  mutate(race = "hispanic")

educational_data_2008_non_hispanic_white <- educational_data_2008_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2008_white <- educational_data_2008_white %>%
  mutate(race = "white")

# Stack the five race tables (same columns, distinct `race` values)
test_join_2008 <- bind_rows(
  educational_data_2008_asian,
  educational_data_2008_black,
  educational_data_2008_hispanic,
  educational_data_2008_non_hispanic_white,
  educational_data_2008_white
) %>%
  filter(!is.na(sexes)) %>%
  mutate(
    year = 2008
  )

head(test_join_2008)
# Loading and cleaning 2007 Data
# Asian
educational_data_2007_asian <- read_excel(here::here('data_raw', '2007_Educational_Data_asian.xlsx'), skip = 6)

# Set names
educational_data_2007_asian <- educational_data_2007_asian %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Gender
# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2007_asian <- educational_data_2007_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

# Fix data types
educational_data_2007_asian <- educational_data_2007_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))

# Pivot
educational_data_2007_asian <- educational_data_2007_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
# Repeat for other races/ethnicities
# Black
educational_data_2007_black <- read_excel(here::here('data_raw', '2007_Educational_Data_black.xlsx'), skip = 6)

educational_data_2007_black <- educational_data_2007_black %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2007_black <- educational_data_2007_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2007_black <- educational_data_2007_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2007_black <- educational_data_2007_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
# Hispanic
educational_data_2007_hispanic <- read_excel(here::here('data_raw', '2007_Educational_Data_hispanic.xlsx'), skip = 6)

# Set names
educational_data_2007_hispanic <- educational_data_2007_hispanic %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2007_hispanic <- educational_data_2007_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2007_hispanic <- educational_data_2007_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2007_hispanic <- educational_data_2007_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Non_Hispanic_White

educational_data_2007_non_hispanic_white <- read_excel(here::here('data_raw', '2007_Educational_Data_non_hispanic_white.xlsx'), skip = 6)

educational_data_2007_non_hispanic_white <- educational_data_2007_non_hispanic_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2007_non_hispanic_white <- educational_data_2007_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2007_non_hispanic_white <- educational_data_2007_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2007_non_hispanic_white <- educational_data_2007_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# White
educational_data_2007_white <- read_excel(here::here('data_raw', '2007_Educational_Data_white.xlsx'), skip = 6)

educational_data_2007_white <- educational_data_2007_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2007_white <- educational_data_2007_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:14 ~ 2,
    row_number() %in% 15:29 ~ 0,
    row_number() %in% 30:44 ~ 1
  )) %>%
  filter(!is.na(age_group)) %>%
  mutate(age_group = str_remove_all(age_group, "^\\.+\\s*"))

educational_data_2007_white <- educational_data_2007_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2007_white <- educational_data_2007_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Joining 2007 tables
# Adding race column

educational_data_2007_asian <- educational_data_2007_asian %>%
  mutate(race = "asian")

educational_data_2007_black <- educational_data_2007_black %>%
  mutate(race = "black")

educational_data_2007_hispanic <- educational_data_2007_hispanic %>%
  mutate(race = "hispanic")

educational_data_2007_non_hispanic_white <- educational_data_2007_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2007_white <- educational_data_2007_white %>%
  mutate(race = "white")

# Stack the five race tables (same columns, distinct `race` values)
test_join_2007 <- bind_rows(
  educational_data_2007_asian,
  educational_data_2007_black,
  educational_data_2007_hispanic,
  educational_data_2007_non_hispanic_white,
  educational_data_2007_white
) %>%
  mutate(
    year = 2007
  )

head(test_join_2007)
# Loading and cleaning 2006 Data
# Asian
educational_data_2006_asian <- read_excel(here::here('data_raw', '2006_Educational_Data_asian.xlsx'), skip = 6)

# Set names
educational_data_2006_asian <- educational_data_2006_asian %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Gender
# sexes coding: 2 = both sexes, 1 = female, 0 = male
educational_data_2006_asian <- educational_data_2006_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:19 ~ 2,
    row_number() %in% 32:51 ~ 0,
    row_number() %in% 64:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

# Fix data types
educational_data_2006_asian <- educational_data_2006_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))
# Pivot
educational_data_2006_asian <- educational_data_2006_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
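
# The sex codes above depend on fixed row ranges, and those ranges change
# between years (1:14 / 15:29 / 30:44 for 2007-2009, 1:19 / 32:51 / 64:83 for
# 2006). If the sheets carry section header rows, the code could instead be
# derived from those rows and filled downward. This is only a sketch: the
# header labels ("Both Sexes", "Male", "Female") and the toy table are
# assumptions about the layout, and `demo_sheet`, `section_headers`, and
# `demo_sexed` are hypothetical names.
library(dplyr)
library(tidyr)
library(tibble)

demo_sheet <- tibble(
  age_group = c(
    "Both Sexes", "25 to 29 years",
    "Male", "25 to 29 years",
    "Female", "25 to 29 years"
  ),
  total = c(NA, 100, NA, 48, NA, 52)
)

section_headers <- c("Both Sexes", "Male", "Female")

demo_sexed <- demo_sheet %>%
  # Code the header rows; all other rows stay NA for now
  mutate(sexes = case_when(
    age_group == "Both Sexes" ~ 2,
    age_group == "Male" ~ 0,
    age_group == "Female" ~ 1
  )) %>%
  # Carry each header's code down to the rows beneath it
  fill(sexes, .direction = "down") %>%
  # Drop the header rows themselves
  filter(!age_group %in% section_headers)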

# Repeat for other races/ethnicities
# Black
educational_data_2006_black <- read_excel(here::here('data_raw', '2006_Educational_Data_black.xlsx'), skip = 6)

educational_data_2006_black <- educational_data_2006_black %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2006_black <- educational_data_2006_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:19 ~ 2,
    row_number() %in% 32:51 ~ 0,
    row_number() %in% 64:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2006_black <- educational_data_2006_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2006_black <- educational_data_2006_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
# Hispanic
educational_data_2006_hispanic <- read_excel(here::here('data_raw', '2006_Educational_Data_hispanic.xlsx'), skip = 6)

educational_data_2006_hispanic <- educational_data_2006_hispanic %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2006_hispanic <- educational_data_2006_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:19 ~ 2,
    row_number() %in% 32:51 ~ 0,
    row_number() %in% 64:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2006_hispanic <- educational_data_2006_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2006_hispanic <- educational_data_2006_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Non_Hispanic_White

educational_data_2006_non_hispanic_white <- read_excel(here::here('data_raw', '2006_Educational_Data_non_hispanic_white.xlsx'), skip = 6)

educational_data_2006_non_hispanic_white <- educational_data_2006_non_hispanic_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2006_non_hispanic_white <- educational_data_2006_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:19 ~ 2,
    row_number() %in% 32:51 ~ 0,
    row_number() %in% 64:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2006_non_hispanic_white <- educational_data_2006_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2006_non_hispanic_white <- educational_data_2006_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# White
educational_data_2006_white <- read_excel(here::here('data_raw', '2006_Educational_Data_white.xlsx'), skip = 6)

educational_data_2006_white <- educational_data_2006_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2006_white <- educational_data_2006_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:19 ~ 2,
    row_number() %in% 32:51 ~ 0,
    row_number() %in% 64:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2006_white <- educational_data_2006_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2006_white <- educational_data_2006_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Joining 2006 tables
# Adding race column

educational_data_2006_asian <- educational_data_2006_asian %>%
  mutate(race = "asian")

educational_data_2006_black <- educational_data_2006_black %>%
  mutate(race = "black")

educational_data_2006_hispanic <- educational_data_2006_hispanic %>%
  mutate(race = "hispanic")

educational_data_2006_non_hispanic_white <- educational_data_2006_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2006_white <- educational_data_2006_white %>%
  mutate(race = "white")

test_join_2006 <- educational_data_2006_asian %>%
  full_join(educational_data_2006_black) %>%
  full_join(educational_data_2006_hispanic) %>%
  full_join(educational_data_2006_non_hispanic_white) %>%
  full_join(educational_data_2006_white) %>%
  mutate(
    year = 2006
  )

head(test_join_2006)
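
# Illustrative sketch, not part of the original pipeline: the naming, sex
# tagging, type conversion and pivot steps above are repeated for every race
# and year. A helper such as the hypothetical clean_attainment() below could
# hold them in one place. The function name, its arguments and the toy input
# are assumptions; the row ranges still have to be read off each spreadsheet.
library(dplyr)
library(tidyr)
library(purrr)

attainment_cols <- c(
  "age_group", "total", "none", "1st-4th_grade", "5th-6th_grade",
  "7th-8th_grade", "9th_grade", "10th_grade", "11th_grade", "HS_graduate",
  "some_college", "associates_degree_occupational",
  "associates_degree_academic", "bachelors_degree", "masters_degree",
  "professional_degree", "doctoral_degree"
)

clean_attainment <- function(raw, both_rows, male_rows, female_rows,
                             race_label, survey_year) {
  raw %>%
    set_names(attainment_cols) %>%
    # Replace "-" placeholders with "0" so the counts convert to numeric
    mutate(across(everything(), ~ replace(as.character(.x), .x == "-", "0"))) %>%
    # 2 = both, 1 = female, 0 = male (same coding as the code above)
    mutate(sexes = case_when(
      row_number() %in% both_rows ~ 2,
      row_number() %in% male_rows ~ 0,
      row_number() %in% female_rows ~ 1
    )) %>%
    filter(!is.na(age_group)) %>%
    mutate(across(-c(age_group, sexes), as.numeric)) %>%
    pivot_longer(
      cols = -c(age_group, sexes),
      names_to = "educational_attainment",
      values_to = "count"
    ) %>%
    mutate(race = race_label, year = survey_year)
}

# Toy input: three rows of 17 text columns standing in for one sheet
toy_raw <- as.data.frame(matrix("1", nrow = 3, ncol = 17), stringsAsFactors = FALSE)
toy_raw[[1]] <- "25 years and over"
toy_raw[2, 3] <- "-"

toy_clean <- clean_attainment(
  toy_raw,
  both_rows = 1, male_rows = 2, female_rows = 3,
  race_label = "asian", survey_year = 2006
)
head(toy_clean)
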
# Loading and cleaning 2005 Data
# Asian
educational_data_2005_asian <- read_excel(here::here('data_raw', '2005_Educational_Data_asian.xlsx'), skip = 5)

# Set names
educational_data_2005_asian <- educational_data_2005_asian %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Gender
# 2 = both, 1 = female, 0 = male
educational_data_2005_asian <- educational_data_2005_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 34:54 ~ 0,
    row_number() %in% 67:87 ~ 1
  )) %>%
  filter(!is.na(age_group))

# Fix data types
educational_data_2005_asian <- educational_data_2005_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))
# Pivot
educational_data_2005_asian <- educational_data_2005_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#view(educational_data_2005_asian)
# Repeat for other races/ethnicities
# Black
educational_data_2005_black <- read_excel(here::here('data_raw', '2005_Educational_Data_black.xlsx'), skip = 5)

educational_data_2005_black <- educational_data_2005_black %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2005_black <- educational_data_2005_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 34:54 ~ 0,
    row_number() %in% 67:87 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2005_black <- educational_data_2005_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2005_black <- educational_data_2005_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
# Hispanic
educational_data_2005_hispanic <- read_excel(here::here('data_raw', '2005_Educational_Data_hispanic.xlsx'), skip = 5)

educational_data_2005_hispanic <- educational_data_2005_hispanic %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2005_hispanic <- educational_data_2005_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 34:54 ~ 0,
    row_number() %in% 67:87 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2005_hispanic <- educational_data_2005_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2005_hispanic <- educational_data_2005_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Non_Hispanic_White

educational_data_2005_non_hispanic_white <- read_excel(here::here('data_raw', '2005_Educational_Data_non_hispanic_white.xlsx'), skip = 5)

educational_data_2005_non_hispanic_white <- educational_data_2005_non_hispanic_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2005_non_hispanic_white <- educational_data_2005_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 34:54 ~ 0,
    row_number() %in% 67:87 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2005_non_hispanic_white <- educational_data_2005_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2005_non_hispanic_white <- educational_data_2005_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# White
educational_data_2005_white <- read_excel(here::here('data_raw', '2005_Educational_Data_white.xlsx'), skip = 5)

educational_data_2005_white <- educational_data_2005_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2005_white <- educational_data_2005_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 34:54 ~ 0,
    row_number() %in% 67:87 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2005_white <- educational_data_2005_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2005_white <- educational_data_2005_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Joining 2005 tables
# Adding race column

educational_data_2005_asian <- educational_data_2005_asian %>%
  mutate(race = "asian")

educational_data_2005_black <- educational_data_2005_black %>%
  mutate(race = "black")

educational_data_2005_hispanic <- educational_data_2005_hispanic %>%
  mutate(race = "hispanic")

educational_data_2005_non_hispanic_white <- educational_data_2005_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2005_white <- educational_data_2005_white %>%
  mutate(race = "white")

test_join_2005 <- educational_data_2005_asian %>%
  full_join(educational_data_2005_black) %>%
  full_join(educational_data_2005_hispanic) %>%
  full_join(educational_data_2005_non_hispanic_white) %>%
  full_join(educational_data_2005_white) %>%
  mutate(
    year = 2005
  )

head(test_join_2005)
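
# Illustrative sketch, not part of the original pipeline: `total` is pivoted
# into educational_attainment alongside the individual levels, so summing
# `count` over every value of that column would double-count people whenever
# the total row is included. The check below compares the reported total with
# the sum of the other levels. The toy table holds made-up values.
library(dplyr)

toy_long <- tibble(
  age_group = "25 years and over",
  sexes = 2,
  educational_attainment = c("total", "HS_graduate", "bachelors_degree"),
  count = c(100, 60, 40),
  race = "asian",
  year = 2005
)

toy_check <- toy_long %>%
  group_by(year, race, sexes, age_group) %>%
  summarise(
    reported_total = sum(count[educational_attainment == "total"]),
    summed_levels  = sum(count[educational_attainment != "total"]),
    .groups = "drop"
  ) %>%
  # A non-zero gap flags a group whose levels do not add up to its total
  mutate(gap = reported_total - summed_levels)

toy_check
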
# Loading and cleaning 2004 Data
# Asian
educational_data_2004_asian <- read_excel(here::here('data_raw', '2004_Educational_Data_asian.xlsx'), skip = 5)

# Set names
educational_data_2004_asian <- educational_data_2004_asian %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Gender
# 2 = both, 1 = female, 0 = male
educational_data_2004_asian <- educational_data_2004_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

# Fix data types
educational_data_2004_asian <- educational_data_2004_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))
# Pivot
educational_data_2004_asian <- educational_data_2004_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#view(educational_data_2004_asian)
# Repeat for other races/ethnicities
# Black
educational_data_2004_black <- read_excel(here::here('data_raw', '2004_Educational_Data_black.xlsx'), skip = 5)

educational_data_2004_black <- educational_data_2004_black %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2004_black <- educational_data_2004_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2004_black <- educational_data_2004_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2004_black <- educational_data_2004_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
# Hispanic
educational_data_2004_hispanic <- read_excel(here::here('data_raw', '2004_Educational_Data_hispanic.xlsx'), skip = 5)

educational_data_2004_hispanic <- educational_data_2004_hispanic %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2004_hispanic <- educational_data_2004_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2004_hispanic <- educational_data_2004_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2004_hispanic <- educational_data_2004_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Non_Hispanic_White

educational_data_2004_non_hispanic_white <- read_excel(here::here('data_raw', '2004_Educational_Data_non_hispanic_white.xlsx'), skip = 5)

educational_data_2004_non_hispanic_white <- educational_data_2004_non_hispanic_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2004_non_hispanic_white <- educational_data_2004_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2004_non_hispanic_white <- educational_data_2004_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2004_non_hispanic_white <- educational_data_2004_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# White
educational_data_2004_white <- read_excel(here::here('data_raw', '2004_Educational_Data_white.xlsx'), skip = 5)

educational_data_2004_white <- educational_data_2004_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2004_white <- educational_data_2004_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2004_white <- educational_data_2004_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2004_white <- educational_data_2004_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Joining 2004 tables
# Adding race column

educational_data_2004_asian <- educational_data_2004_asian %>%
  mutate(race = "asian")

educational_data_2004_black <- educational_data_2004_black %>%
  mutate(race = "black")

educational_data_2004_hispanic <- educational_data_2004_hispanic %>%
  mutate(race = "hispanic")

educational_data_2004_non_hispanic_white <- educational_data_2004_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2004_white <- educational_data_2004_white %>%
  mutate(race = "white")

test_join_2004 <- educational_data_2004_asian %>%
  full_join(educational_data_2004_black) %>%
  full_join(educational_data_2004_hispanic) %>%
  full_join(educational_data_2004_non_hispanic_white) %>%
  full_join(educational_data_2004_white) %>%
  mutate(
    year = 2004
  )

head(test_join_2004)
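
# Illustrative sketch, not part of the original pipeline: case_when() has no
# fallback branch, so any row outside the three hard-coded ranges gets
# sexes = NA, and filter(!is.na(age_group)) keeps it whenever age_group is
# filled in. Counting rows per sexes value is a quick way to spot such rows.
# The toy table is made up for illustration.
library(dplyr)

toy_tagged <- tibble(
  age_group = c("25 to 29 years", "30 to 34 years", "Male", "25 to 29 years"),
  sexes = c(2, 2, NA, 0)
)

# Rows per sex code, including any that were never tagged
toy_tagged %>% count(sexes)

# Rows that the hard-coded ranges missed
toy_untagged <- toy_tagged %>% filter(is.na(sexes))
toy_untagged
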
# Loading and cleaning 2003 Data
# Asian
educational_data_2003_asian <- read_excel(here::here('data_raw', '2003_Educational_Data_asian.xlsx'), skip = 5)

# Set names
educational_data_2003_asian <- educational_data_2003_asian %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# Gender
# 2 = both, 1 = female, 0 = male
educational_data_2003_asian <- educational_data_2003_asian %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

# Fix data types
educational_data_2003_asian <- educational_data_2003_asian %>%
  mutate(across(-c(age_group, sexes), as.numeric))
# Pivot
educational_data_2003_asian <- educational_data_2003_asian %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

#view(educational_data_2003_asian)
# Repeat for other races/ethnicities
# Black
educational_data_2003_black <- read_excel(here::here('data_raw', '2003_Educational_Data_black.xlsx'), skip = 5)

educational_data_2003_black <- educational_data_2003_black %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2003_black <- educational_data_2003_black %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2003_black <- educational_data_2003_black %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2003_black <- educational_data_2003_black %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
# Hispanic
educational_data_2003_hispanic <- read_excel(here::here('data_raw', '2003_Educational_Data_hispanic.xlsx'), skip = 5)

educational_data_2003_hispanic <- educational_data_2003_hispanic %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))
# 2 = both, 1 = female, 0 = male
educational_data_2003_hispanic <- educational_data_2003_hispanic %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2003_hispanic <- educational_data_2003_hispanic %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2003_hispanic <- educational_data_2003_hispanic %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Non_Hispanic_White

educational_data_2003_non_hispanic_white <- read_excel(here::here('data_raw', '2003_Educational_Data_non_hispanic_white.xlsx'), skip = 5)

educational_data_2003_non_hispanic_white <- educational_data_2003_non_hispanic_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# 2 = both, 1 = female, 0 = male
educational_data_2003_non_hispanic_white <- educational_data_2003_non_hispanic_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2003_non_hispanic_white <- educational_data_2003_non_hispanic_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2003_non_hispanic_white <- educational_data_2003_non_hispanic_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# White
educational_data_2003_white <- read_excel(here::here('data_raw', '2003_Educational_Data_white.xlsx'), skip = 5)

educational_data_2003_white <- educational_data_2003_white %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  ))

# sexes coding: 2 = both sexes, 1 = female, 0 = male (assigned by row position in the sheet)
educational_data_2003_white <- educational_data_2003_white %>%
  mutate(across(everything(), ~replace(., . == "-", "0"))) %>%
  mutate(sexes = case_when(
    row_number() %in% 1:21 ~ 2,
    row_number() %in% 32:52 ~ 0,
    row_number() %in% 63:83 ~ 1
  )) %>%
  filter(!is.na(age_group))

educational_data_2003_white <- educational_data_2003_white %>%
  mutate(across(-c(age_group, sexes), as.numeric))

educational_data_2003_white <- educational_data_2003_white %>%
  pivot_longer(
    cols = c(total, 
             none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

# Joining 2003 tables
# Adding race column

educational_data_2003_asian <- educational_data_2003_asian %>%
  mutate(race = "asian")

educational_data_2003_black <- educational_data_2003_black %>%
  mutate(race = "black")

educational_data_2003_hispanic <- educational_data_2003_hispanic %>%
  mutate(race = "hispanic")

educational_data_2003_non_hispanic_white <- educational_data_2003_non_hispanic_white %>%
  mutate(race = "non_hispanic_white")

educational_data_2003_white <- educational_data_2003_white %>%
  mutate(race = "white")

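# Assuming the five tables share the same columns and differ only in `race`,
# stacking them with bind_rows() yields the same rows as chained full_join()
# calls, without the implicit "Joining with `by = ...`" messages.
test_join_2003 <- bind_rows(
  educational_data_2003_asian,
  educational_data_2003_black,
  educational_data_2003_hispanic,
  educational_data_2003_non_hispanic_white,
  educational_data_2003_white
) %>%
  mutate(
    year = 2003
  )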

head(test_join_2003)

# 2002 data: each table in the sheet is stored as two stacked halves
educational_data_2002 <- read_excel(here::here('data_raw', '2002_Educational_Attainment.xls'), skip = 5)

# split and reattach: all races, both sexes, 2002
split_row_1 <- 15
split_row_2 <- 34
split_row_3 <- 48
allraces_bothsexes_2002 <- educational_data_2002 %>% slice(1:split_row_1)
allraces_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_2:split_row_3)

#renaming the columns for the first half of the allraces/sexes 2002 data
allraces_bothsexes_2002 <- allraces_bothsexes_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

#renaming the columns for the second half of the allraces/sexes 2002 data

allraces_bothsexes_2002_2 <- allraces_bothsexes_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

# re-merged: wide
allraces_bothsexes_2002_2 <- allraces_bothsexes_2002_2 %>%
  mutate(
    race = "all_races",
    sexes = "both_sexes"
  )

allraces_bothsexes_2002 <- allraces_bothsexes_2002 %>%
  mutate(
    race = "all_races",
    sexes = "both_sexes"
  )
allraces_bothsexes_2002_merged <- full_join(allraces_bothsexes_2002, allraces_bothsexes_2002_2)

# convert every count column (everything except the identifiers) to numeric
allraces_bothsexes_2002_merged <- allraces_bothsexes_2002_merged %>%
  mutate(across(-c(age_group, race, sexes), as.numeric),
         year = 2002
  )
# collapsing age
allraces_bothsexes_2002_merged <- allraces_bothsexes_2002_merged %>%
  filter(age_group != "15 years and over") %>%
  filter(age_group != "15 to 17 years") %>%
  filter(age_group != "18 to 19 years") %>%
  filter(age_group != "20 to 24 years") %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 
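
# --- Illustrative sketch (an addition, not part of the original pipeline) ---
# The age-collapsing case_when() above is repeated for every 2002 table. One
# possible alternative is a named lookup vector. `age_lookup_sketch` and
# `collapse_age_sketch` are hypothetical names, and the labels are assumed to
# match the spreadsheet text exactly.
age_lookup_sketch <- c(
  "25 to 29 years"    = "25_to_34",
  "30 to 34 years"    = "25_to_34",
  "35 to 39 years"    = "35_to_54",
  "40 to 44 years"    = "35_to_54",
  "45 to 49 years"    = "35_to_54",
  "50 to 54 years"    = "35_to_54",
  "55 to 59 years"    = "55_plus",
  "60 to 64 years"    = "55_plus",
  "65 to 69 years"    = "55_plus",
  "70 to 74 years"    = "55_plus",
  "75 years and over" = "55_plus"
)

collapse_age_sketch <- function(age_group) {
  # Labels missing from the lookup become NA, as with the case_when() above.
  unname(age_lookup_sketch[as.character(age_group)])
}

# Example call: an under-25 label is not in the lookup, so it maps to NA.
collapse_age_sketch(c("25 to 29 years", "75 years and over", "15 to 17 years"))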
# pivot longer
allraces_bothsexes_2002_merged_2 <- allraces_bothsexes_2002_merged %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
# collapsing attainment
allraces_bothsexes_2002_merged_3 <- allraces_bothsexes_2002_merged_2 %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = 'drop'                   # return an ungrouped tibble
  )
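
# --- Illustrative sketch (an addition, not part of the original pipeline) ---
# The attainment recoding and summing above is repeated for each 2002 table.
# A possible reusable version is sketched here. `attainment_lookup_sketch` and
# `collapse_attainment_sketch` are hypothetical names; the input is assumed to
# be a long table with the columns sexes, age_group, educational_attainment,
# race, year and count.
library(dplyr)

attainment_lookup_sketch <- c(
  "none"                           = "None",
  "1st-4th_grade"                  = "Less than HS",
  "5th-6th_grade"                  = "Less than HS",
  "7th-8th_grade"                  = "Less than HS",
  "9th_grade"                      = "Less than HS",
  "10th_grade"                     = "Less than HS",
  "11th_grade"                     = "Less than HS",
  "HS_graduate"                    = "HS",
  "some_college"                   = "Some College, No Degree",
  "associates_degree_occupational" = "Undergraduate Degree",
  "associates_degree_academic"     = "Undergraduate Degree",
  "bachelors_degree"               = "Undergraduate Degree",
  "masters_degree"                 = "Graduate Degree",
  "professional_degree"            = "Graduate Degree",
  "doctoral_degree"                = "Doctoral Degree"
)

collapse_attainment_sketch <- function(long_data) {
  long_data %>%
    # Recode each detailed level to its collapsed category.
    mutate(educational_attainment =
             unname(attainment_lookup_sketch[educational_attainment])) %>%
    # Sum counts within each collapsed category.
    group_by(sexes, age_group, educational_attainment, race, year) %>%
    summarize(count = sum(count, na.rm = TRUE), .groups = "drop")
}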
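# All races, male, 2002: slice the two halves of the table
split_row_4 <- 65
split_row_5 <- 81
allraces_male_2002 <- educational_data_2002 %>% slice(split_row_4:split_row_5)

# Note: rows 99:114 give 16 rows while the first half (65:81) has 17, and the
# other 2002 blocks offset the second half by 33 rows; worth checking against
# the spreadsheet.
split_row_6 <- 99
split_row_7 <- 114

allraces_male_2002_2 <- educational_data_2002 %>% slice(split_row_6:split_row_7)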

#renaming the columns for the first half of the allraces/male 2002 data
allraces_male_2002 <- allraces_male_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))
#renaming the columns for the second half of the allraces/male 2002 data

allraces_male_2002_2 <- allraces_male_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )
allraces_male_2002_2 <- allraces_male_2002_2 %>%
  mutate(
    race = "all_races",
    sexes = "male"
  )

allraces_male_2002 <- allraces_male_2002 %>%
  mutate(
    race = "all_races",
    sexes = "male"
  )
allraces_male_2002_merged <- full_join(allraces_male_2002, allraces_male_2002_2)

allraces_male_2002_merged <- allraces_male_2002_merged %>% slice(-(1:6))

# convert every count column (everything except the identifiers) to numeric
allraces_male_2002_merged <- allraces_male_2002_merged %>%
  mutate(across(-c(age_group, race, sexes), as.numeric),
         year = 2002
  )


allraces_male_2002_merged <- allraces_male_2002_merged %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 

allraces_male_2002_merged_2 <- allraces_male_2002_merged %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )


allraces_male_2002_merged_3 <- allraces_male_2002_merged_2 %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = 'drop'                   # return an ungrouped tibble
  )
# All races, female, 2002: slice the two halves of the table
split_row_8 <- 131
split_row_9 <- 147
allraces_female_2002 <- educational_data_2002 %>% slice(split_row_8:split_row_9)

split_row_10 <- 164
split_row_11 <- 180
allraces_female_2002_2 <- educational_data_2002 %>% slice(split_row_10:split_row_11)

#renaming the columns for the first half of the allraces/female 2002 data
allraces_female_2002 <- allraces_female_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))
#renaming the columns for the second half of the allraces/female 2002 data

allraces_female_2002_2 <- allraces_female_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

allraces_female_2002_2 <- allraces_female_2002_2 %>%
  mutate(
    race = "all_races",
    sexes = "female"
  )

allraces_female_2002 <- allraces_female_2002 %>%
  mutate(
    race = "all_races",
    sexes = "female"
  )
allraces_female_2002_wide <- full_join(allraces_female_2002, allraces_female_2002_2)

allraces_female_2002_wide <- allraces_female_2002_wide %>% slice(-(1:6))

# convert every count column (everything except the identifiers) to numeric
allraces_female_2002_wide <- allraces_female_2002_wide %>%
  mutate(across(-c(age_group, race, sexes), as.numeric),
         year = 2002
  )

allraces_female_2002_wide <- allraces_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 

allraces_female_2002_long <- allraces_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

allraces_female_2002_long <- allraces_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = 'drop'                   # return an ungrouped tibble
  )


# Non-Hispanic White

# Non-Hispanic White, both sexes, 2002: slice the two halves of the table
split_row_12 <- 197
split_row_13 <- 213
nonhispanicwhite_bothsexes_2002 <- educational_data_2002 %>% slice(split_row_12:split_row_13)

split_row_14 <- 230
split_row_15 <- 246
nonhispanicwhite_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_14:split_row_15)

#renaming the columns for the first half of the nonhispanicwhite/bothsexes 2002 data
nonhispanicwhite_bothsexes_2002 <- nonhispanicwhite_bothsexes_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))
#renaming the columns for the second half of the nonhispanicwhite/bothsexes 2002 data

nonhispanicwhite_bothsexes_2002_2 <- nonhispanicwhite_bothsexes_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

nonhispanicwhite_bothsexes_2002 <- nonhispanicwhite_bothsexes_2002 %>%
  mutate(
    race = "non_hispanic_white",
    sexes = "both_sexes"
  )

nonhispanicwhite_bothsexes_2002_2 <- nonhispanicwhite_bothsexes_2002_2 %>%
  mutate(
    race = "non_hispanic_white",
    sexes = "both_sexes"
  )
nonhispanicwhite_bothsexes_2002_merged <- full_join(nonhispanicwhite_bothsexes_2002, nonhispanicwhite_bothsexes_2002_2)

# convert every count column (everything except the identifiers) to numeric
nonhispanicwhite_bothsexes_2002_merged <- nonhispanicwhite_bothsexes_2002_merged %>%
  mutate(across(-c(age_group, race, sexes), as.numeric),
         year = 2002
  )

nonhispanicwhite_bothsexes_2002_merged <- nonhispanicwhite_bothsexes_2002_merged %>% slice(-(1:6))

nonhispanicwhite_bothsexes_2002_wide<- nonhispanicwhite_bothsexes_2002_merged %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 

nonhispanicwhite_bothsexes_2002_long <- nonhispanicwhite_bothsexes_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

nonhispanicwhite_bothsexes_2002_long <- nonhispanicwhite_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = 'drop'                   # return an ungrouped tibble
  )

# Non-Hispanic White, male, 2002: slice the two halves of the table
split_row_16 <- 263
split_row_17 <- 279
nonhispanicwhite_male_2002 <- educational_data_2002 %>% slice(split_row_16:split_row_17)

split_row_18 <- 296
split_row_19 <- 312
nonhispanicwhite_male_2002_2 <- educational_data_2002 %>% slice(split_row_18:split_row_19)

# renaming the columns for the first half of the nonhispanicwhite/male 2002 data
nonhispanicwhite_male_2002 <- nonhispanicwhite_male_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))
# renaming the columns for the second half of the nonhispanicwhite/male 2002 data

nonhispanicwhite_male_2002_2 <- nonhispanicwhite_male_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

nonhispanicwhite_male_2002_2 <- nonhispanicwhite_male_2002_2 %>% slice(-(1:6))
nonhispanicwhite_male_2002 <- nonhispanicwhite_male_2002 %>% slice(-(1:6))

nonhispanicwhite_male_2002 <- nonhispanicwhite_male_2002 %>%
  mutate(
    race = "non_hispanic_white",
    sexes = "male"
  )

nonhispanicwhite_male_2002_2 <- nonhispanicwhite_male_2002_2 %>%
  mutate(
    race = "non_hispanic_white",
    sexes = "male"
  )
nonhispanicwhite_male_2002_wide <- full_join(nonhispanicwhite_male_2002, nonhispanicwhite_male_2002_2)

# convert every count column (everything except the identifiers) to numeric
nonhispanicwhite_male_2002_wide <- nonhispanicwhite_male_2002_wide %>%
  mutate(across(-c(age_group, race, sexes), as.numeric),
         year = 2002
  )

nonhispanicwhite_male_2002_wide<- nonhispanicwhite_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 
nonhispanicwhite_male_2002_long <- nonhispanicwhite_male_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

nonhispanicwhite_male_2002_long <- nonhispanicwhite_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # sum the counts within each collapsed category
    .groups = 'drop'                   # return an ungrouped tibble
  )

# Non-Hispanic White, female, 2002: slice the two halves of the table
split_row_20 <- 329
split_row_21 <- 345

nonhispanicwhite_female_2002 <- educational_data_2002 %>% slice(split_row_20:split_row_21)

split_row_22 <- 362
split_row_23 <- 378

nonhispanicwhite_female_2002_2 <- educational_data_2002 %>% slice(split_row_22:split_row_23)

# renaming the columns for the first half of the nonhispanicwhite/female 2002 data
nonhispanicwhite_female_2002 <- nonhispanicwhite_female_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))
# renaming the columns for the second half of the nonhispanicwhite/female 2002 data

nonhispanicwhite_female_2002_2 <- nonhispanicwhite_female_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

nonhispanicwhite_female_2002_2 <- nonhispanicwhite_female_2002_2 %>% slice(-(1:6))
nonhispanicwhite_female_2002 <- nonhispanicwhite_female_2002 %>% slice(-(1:6))

nonhispanicwhite_female_2002 <- nonhispanicwhite_female_2002 %>%
  mutate(
    race = "non_hispanic_white",
    sexes = "female"
  )

nonhispanicwhite_female_2002_2 <- nonhispanicwhite_female_2002_2 %>%
  mutate(
    race = "non_hispanic_white",
    sexes = "female"
  )
nonhispanicwhite_female_2002_wide <- full_join(nonhispanicwhite_female_2002, nonhispanicwhite_female_2002_2)

# convert every count column (everything except the identifiers) to numeric
nonhispanicwhite_female_2002_wide <- nonhispanicwhite_female_2002_wide %>%
  mutate(across(-c(age_group, race, sexes), as.numeric),
         year = 2002
  )

nonhispanicwhite_female_2002_wide<- nonhispanicwhite_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

nonhispanicwhite_female_2002_long <- nonhispanicwhite_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

nonhispanicwhite_female_2002_long <- nonhispanicwhite_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  # Sum the counts within each collapsed category and return an ungrouped result
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")
# Non-Hispanic Black

# Row ranges for the two halves of the non-Hispanic Black, both-sexes panel
split_row_24 <- 395
split_row_25 <- 411

nonhispanicblack_bothsexes_2002 <- educational_data_2002 %>% slice(split_row_24:split_row_25)

split_row_26 <- 428
split_row_27 <- 444
nonhispanicblack_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_26:split_row_27)

nonhispanicblack_bothsexes_2002 <- nonhispanicblack_bothsexes_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

nonhispanicblack_bothsexes_2002_2 <- nonhispanicblack_bothsexes_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

nonhispanicblack_bothsexes_2002_2 <- nonhispanicblack_bothsexes_2002_2 %>% slice(-(1:6))
nonhispanicblack_bothsexes_2002 <- nonhispanicblack_bothsexes_2002 %>% slice(-(1:6))

nonhispanicblack_bothsexes_2002 <- nonhispanicblack_bothsexes_2002 %>%
  mutate(
    race = "non_hispanic_black",
    sexes = "both_sexes"
  )

nonhispanicblack_bothsexes_2002_2 <- nonhispanicblack_bothsexes_2002_2 %>%
  mutate(
    race = "non_hispanic_black",
    sexes = "both_sexes"
  )
nonhispanicblack_bothsexes_2002_wide <- full_join(
  nonhispanicblack_bothsexes_2002,
  nonhispanicblack_bothsexes_2002_2,
  by = c("age_group", "race", "sexes")  # the only columns shared by both halves
)

nonhispanicblack_bothsexes_2002_wide <- nonhispanicblack_bothsexes_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

nonhispanicblack_bothsexes_2002_wide<- nonhispanicblack_bothsexes_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 

nonhispanicblack_bothsexes_2002_long <- nonhispanicblack_bothsexes_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

nonhispanicblack_bothsexes_2002_long <- nonhispanicblack_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  # Sum the counts within each collapsed category and return an ungrouped result
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")

# Row ranges for the two halves of the non-Hispanic Black male panel
split_row_28 <- 461
split_row_29 <- 477
nonhispanicblack_male_2002 <- educational_data_2002 %>% slice(split_row_28:split_row_29)

split_row_30 <- 494
split_row_31 <- 510
nonhispanicblack_male_2002_2 <- educational_data_2002 %>% slice(split_row_30:split_row_31)

nonhispanicblack_male_2002 <- nonhispanicblack_male_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

nonhispanicblack_male_2002_2 <- nonhispanicblack_male_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

nonhispanicblack_male_2002_2 <- nonhispanicblack_male_2002_2 %>% slice(-(1:6))
nonhispanicblack_male_2002 <- nonhispanicblack_male_2002 %>% slice(-(1:6))

nonhispanicblack_male_2002 <- nonhispanicblack_male_2002 %>%
  mutate(
    race = "non_hispanic_black",
    sexes = "male"
  )

nonhispanicblack_male_2002_2 <- nonhispanicblack_male_2002_2 %>%
  mutate(
    race = "non_hispanic_black",
    sexes = "male"
  )
nonhispanicblack_male_2002_wide <- full_join(
  nonhispanicblack_male_2002,
  nonhispanicblack_male_2002_2,
  by = c("age_group", "race", "sexes")  # the only columns shared by both halves
)

nonhispanicblack_male_2002_wide <- nonhispanicblack_male_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

nonhispanicblack_male_2002_wide<- nonhispanicblack_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 

nonhispanicblack_male_2002_long <- nonhispanicblack_male_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

nonhispanicblack_male_2002_long <- nonhispanicblack_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  # Sum the counts within each collapsed category and return an ungrouped result
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")

# Row ranges for the two halves of the non-Hispanic Black female panel
split_row_32 <- 527
split_row_33 <- 543
nonhispanicblack_female_2002 <- educational_data_2002 %>% slice(split_row_32:split_row_33)

split_row_34 <- 560
split_row_35 <- 576
nonhispanicblack_female_2002_2 <- educational_data_2002 %>% slice(split_row_34:split_row_35)

nonhispanicblack_female_2002 <- nonhispanicblack_female_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

nonhispanicblack_female_2002_2 <- nonhispanicblack_female_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

nonhispanicblack_female_2002 <- nonhispanicblack_female_2002 %>%
  mutate(
    race = "non_hispanic_black",
    sexes = "female"
  )

nonhispanicblack_female_2002_2 <- nonhispanicblack_female_2002_2 %>%
  mutate(
    race = "non_hispanic_black",
    sexes = "female"
  )


# Drop the six leading rows from each half before joining (as in the blocks above),
# so that only the age-group rows recoded below are matched on age_group.
nonhispanicblack_female_2002 <- nonhispanicblack_female_2002 %>% slice(-(1:6))
nonhispanicblack_female_2002_2 <- nonhispanicblack_female_2002_2 %>% slice(-(1:6))

nonhispanicblack_female_2002_wide <- full_join(
  nonhispanicblack_female_2002,
  nonhispanicblack_female_2002_2,
  by = c("age_group", "race", "sexes")  # the only columns shared by both halves
)

nonhispanicblack_female_2002_wide <- nonhispanicblack_female_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

nonhispanicblack_female_2002_wide<- nonhispanicblack_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

nonhispanicblack_female_2002_long <- nonhispanicblack_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

nonhispanicblack_female_2002_long <- nonhispanicblack_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  # Sum the counts within each collapsed category and return an ungrouped result
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")
# Asian

# Row ranges for the two halves of the Asian, both-sexes panel
split_row_36 <- 593
split_row_37 <- 609
asian_bothsexes_2002 <- educational_data_2002 %>% slice(split_row_36:split_row_37)

split_row_38 <- 626
split_row_39 <- 642
asian_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_38:split_row_39)

asian_bothsexes_2002 <- asian_bothsexes_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

asian_bothsexes_2002_2 <- asian_bothsexes_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

asian_bothsexes_2002 <- asian_bothsexes_2002 %>%
  mutate(
    race = "asian",
    sexes = "both_sexes"
  )

asian_bothsexes_2002_2 <- asian_bothsexes_2002_2 %>%
  mutate(
    race = "asian",
    sexes = "both_sexes"
  )

# Drop the six leading rows from each half before joining (as in the earlier blocks),
# so that only the age-group rows recoded below are matched on age_group.
asian_bothsexes_2002 <- asian_bothsexes_2002 %>% slice(-(1:6))
asian_bothsexes_2002_2 <- asian_bothsexes_2002_2 %>% slice(-(1:6))

asian_bothsexes_2002_wide <- full_join(
  asian_bothsexes_2002,
  asian_bothsexes_2002_2,
  by = c("age_group", "race", "sexes")  # the only columns shared by both halves
)

asian_bothsexes_2002_wide <- asian_bothsexes_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

asian_bothsexes_2002_wide<- asian_bothsexes_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 

asian_bothsexes_2002_long <- asian_bothsexes_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

asian_bothsexes_2002_long <- asian_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  # Sum the counts within each collapsed category and return an ungrouped result
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")

# Row ranges for the two halves of the Asian male panel
split_row_40 <- 659
split_row_41 <- 675
asian_male_2002 <- educational_data_2002 %>% slice(split_row_40:split_row_41)

split_row_42 <- 692
split_row_43 <- 708

asian_male_2002_2 <- educational_data_2002 %>% slice(split_row_42:split_row_43)

asian_male_2002 <- asian_male_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

asian_male_2002_2 <- asian_male_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

asian_male_2002 <- asian_male_2002 %>%
  mutate(
    race = "asian",
    sexes = "male"
  )

asian_male_2002_2 <- asian_male_2002_2 %>%
  mutate(
    race = "asian",
    sexes = "male"
  )

# Drop the six leading rows from each half before joining (as in the earlier blocks),
# so that only the age-group rows recoded below are matched on age_group.
asian_male_2002 <- asian_male_2002 %>% slice(-(1:6))
asian_male_2002_2 <- asian_male_2002_2 %>% slice(-(1:6))

asian_male_2002_wide <- full_join(
  asian_male_2002,
  asian_male_2002_2,
  by = c("age_group", "race", "sexes")  # the only columns shared by both halves
)

asian_male_2002_wide <- asian_male_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

asian_male_2002_wide<- asian_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

asian_male_2002_long <- asian_male_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
asian_male_2002_long <- asian_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  # Sum the counts within each collapsed category and return an ungrouped result
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")

# Row ranges for the two halves of the Asian female panel
split_row_44 <- 725
split_row_45 <- 741
asian_female_2002 <- educational_data_2002 %>% slice(split_row_44:split_row_45)

split_row_46 <- 758
split_row_47 <- 774
asian_female_2002_2 <- educational_data_2002 %>% slice(split_row_46:split_row_47)

asian_female_2002 <- asian_female_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

asian_female_2002_2 <- asian_female_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop",
    "drop"
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

asian_female_2002 <- asian_female_2002 %>%
  mutate(
    race = "asian",
    sexes = "female"
  )

asian_female_2002_2 <- asian_female_2002_2 %>%
  mutate(
    race = "asian",
    sexes = "female"
  )

# Drop the six leading rows from each half before joining (as in the earlier blocks),
# so that only the age-group rows recoded below are matched on age_group.
asian_female_2002 <- asian_female_2002 %>% slice(-(1:6))
asian_female_2002_2 <- asian_female_2002_2 %>% slice(-(1:6))

asian_female_2002_wide <- full_join(
  asian_female_2002,
  asian_female_2002_2,
  by = c("age_group", "race", "sexes")  # the only columns shared by both halves
)

asian_female_2002_wide <- asian_female_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

asian_female_2002_wide<- asian_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

asian_female_2002_long <- asian_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )


asian_female_2002_long <- asian_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # Sum the counts within each group
    .groups = "drop"                   # Return an ungrouped result
  )

# Hispanic

split_row_48 <- 791
split_row_49 <- 807
hispanic_bothsexes_2002 <- educational_data_2002 %>% slice(split_row_48:split_row_49)

split_row_50 <- 824
split_row_51 <- 840
hispanic_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_50:split_row_51)

hispanic_bothsexes_2002 <- hispanic_bothsexes_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

hispanic_bothsexes_2002_2 <- hispanic_bothsexes_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # Unused trailing columns; unique names avoid duplicate-name problems
    "drop_2"
  )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

hispanic_bothsexes_2002 <- hispanic_bothsexes_2002 %>%
  mutate(
    race = "hispanic",
    sexes = "both_sexes"
  )

hispanic_bothsexes_2002_2 <- hispanic_bothsexes_2002_2 %>%
  mutate(
    race = "hispanic",
    sexes = "both_sexes"
  )

hispanic_bothsexes_2002_wide <- full_join(hispanic_bothsexes_2002, hispanic_bothsexes_2002_2,
                                          by = c("age_group", "race", "sexes"))

hispanic_bothsexes_2002_wide <- hispanic_bothsexes_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

hispanic_bothsexes_2002_wide <- hispanic_bothsexes_2002_wide %>% slice(-(1:6))

hispanic_bothsexes_2002_wide <- hispanic_bothsexes_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 
hispanic_bothsexes_2002_long <- hispanic_bothsexes_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
hispanic_bothsexes_2002_long <- hispanic_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # Sum the counts within each group
    .groups = "drop"                   # Return an ungrouped result
  )

split_row_52 <- 857
split_row_53 <- 873
hispanic_male_2002 <- educational_data_2002 %>% slice(split_row_52:split_row_53)

split_row_54 <- 890
split_row_55 <- 906
hispanic_male_2002_2 <- educational_data_2002 %>% slice(split_row_54:split_row_55)

hispanic_male_2002 <- hispanic_male_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

hispanic_male_2002_2 <- hispanic_male_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # Unused trailing columns; unique names avoid duplicate-name problems
    "drop_2"
  )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )

hispanic_male_2002 <- hispanic_male_2002 %>%
  mutate(
    race = "hispanic",
    sexes = "male"
  )

hispanic_male_2002_2 <- hispanic_male_2002_2 %>%
  mutate(
    race = "hispanic",
    sexes = "male"
  )

hispanic_male_2002_wide <- full_join(hispanic_male_2002, hispanic_male_2002_2,
                                     by = c("age_group", "race", "sexes"))

hispanic_male_2002_wide <- hispanic_male_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

hispanic_male_2002_wide <- hispanic_male_2002_wide %>% slice(-(1:6))

hispanic_male_2002_wide <- hispanic_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 
hispanic_male_2002_long <- hispanic_male_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
hispanic_male_2002_long <- hispanic_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # Sum the counts within each group
    .groups = "drop"                   # Return an ungrouped result
  )

split_row_56 <- 923
split_row_57 <- 939
hispanic_female_2002 <- educational_data_2002 %>% slice(split_row_56:split_row_57)

split_row_58 <- 956
split_row_59 <- 972
hispanic_female_2002_2 <- educational_data_2002 %>% slice(split_row_58:split_row_59)

hispanic_female_2002 <- hispanic_female_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

hispanic_female_2002_2 <- hispanic_female_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # Unused trailing columns; unique names avoid duplicate-name problems
    "drop_2"
  )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )
hispanic_female_2002 <- hispanic_female_2002 %>%
  mutate(
    race = "hispanic",
    sexes = "female"
  )

hispanic_female_2002_2 <- hispanic_female_2002_2 %>%
  mutate(
    race = "hispanic",
    sexes = "female"
  )

hispanic_female_2002_wide <- full_join(hispanic_female_2002, hispanic_female_2002_2,
                                       by = c("age_group", "race", "sexes"))

hispanic_female_2002_wide <- hispanic_female_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

hispanic_female_2002_wide <- hispanic_female_2002_wide %>% slice(-(1:6))

hispanic_female_2002_wide <- hispanic_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 
hispanic_female_2002_long <- hispanic_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

hispanic_female_2002_long <- hispanic_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # Sum the counts within each group
    .groups = "drop"                   # Return an ungrouped result
  )
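
# Optional sketch, not part of the original analysis: every race/sex block in
# this script repeats the same steps (slice two row ranges, name the columns,
# join the two halves, convert to numeric, drop the six leading rows). A helper
# along these lines could express those steps once. The function name
# read_attainment_block() and its arguments are hypothetical, and the sketch
# assumes both halves have the 10-column layout used throughout this script.
library(dplyr)
library(purrr)

read_attainment_block <- function(data, rows_1, rows_2,
                                  race_label, sex_label, year_label) {
  # First half of the table: age group, total, and attainment through HS
  first_half <- data %>%
    slice(rows_1) %>%
    set_names(c("age_group", "total", "none", "1st-4th_grade", "5th-6th_grade",
                "7th-8th_grade", "9th_grade", "10th_grade", "11th_grade",
                "HS_graduate"))

  # Second half: college-level attainment plus two unused trailing columns
  second_half <- data %>%
    slice(rows_2) %>%
    set_names(c("age_group", "some_college", "associates_degree_occupational",
                "associates_degree_academic", "bachelors_degree",
                "masters_degree", "professional_degree", "doctoral_degree",
                "drop_1", "drop_2")) %>%
    select(-drop_1, -drop_2)

  # Join the halves, convert every count column to numeric, add the labels,
  # and drop the six leading rows, as the blocks in this script do
  full_join(first_half, second_half, by = "age_group") %>%
    mutate(across(-age_group, as.numeric),
           race = race_label, sexes = sex_label, year = year_label) %>%
    slice(-(1:6))
}

# Hypothetical usage mirroring the Hispanic male block above. Under the stated
# assumptions it should yield the same rows and values, with race, sexes, and
# year as the last columns:
# hispanic_male_2002_wide <- read_attainment_block(
#   educational_data_2002, 857:873, 890:906, "hispanic", "male", 2002
# )
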
# White

split_row_60 <- 989
split_row_61 <- 1005
white_bothsexes_2002 <- educational_data_2002 %>% slice(split_row_60:split_row_61)

split_row_62 <- 1022
split_row_63 <- 1038
white_bothsexes_2002_2 <- educational_data_2002 %>% slice(split_row_62:split_row_63)

white_bothsexes_2002 <- white_bothsexes_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

white_bothsexes_2002_2 <- white_bothsexes_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # Unused trailing columns; unique names avoid duplicate-name problems
    "drop_2"
  )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )
white_bothsexes_2002 <- white_bothsexes_2002 %>%
  mutate(
    race = "white",
    sexes = "both_sexes"
  )

white_bothsexes_2002_2 <- white_bothsexes_2002_2 %>%
  mutate(
    race = "white",
    sexes = "both_sexes"
  )

white_bothsexes_2002_wide <- full_join(white_bothsexes_2002, white_bothsexes_2002_2,
                                       by = c("age_group", "race", "sexes"))

white_bothsexes_2002_wide <- white_bothsexes_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

white_bothsexes_2002_wide <- white_bothsexes_2002_wide %>% slice(-(1:6))


white_bothsexes_2002_wide <- white_bothsexes_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 

white_bothsexes_2002_long <- white_bothsexes_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

white_bothsexes_2002_long <- white_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # Sum the counts within each group
    .groups = "drop"                   # Return an ungrouped result
  )

split_row_64 <- 1055
split_row_65 <- 1071
white_male_2002 <- educational_data_2002 %>% slice(split_row_64:split_row_65)

split_row_66 <- 1088
split_row_67 <- 1104
white_male_2002_2 <- educational_data_2002 %>% slice(split_row_66:split_row_67)

white_male_2002 <- white_male_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

white_male_2002_2 <- white_male_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # Unused trailing columns; unique names avoid duplicate-name problems
    "drop_2"
  )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )
white_male_2002 <- white_male_2002 %>%
  mutate(
    race = "white",
    sexes = "male"
  )

white_male_2002_2 <- white_male_2002_2 %>%
  mutate(
    race = "white",
    sexes = "male"
  )

white_male_2002_wide <- full_join(white_male_2002, white_male_2002_2,
                                  by = c("age_group", "race", "sexes"))

white_male_2002_wide <- white_male_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

white_male_2002_wide <- white_male_2002_wide %>% slice(-(1:6))

white_male_2002_wide <- white_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 

white_male_2002_long <- white_male_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
white_male_2002_long <- white_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE),  # Sum the counts within each group
    .groups = "drop"                   # Return an ungrouped result
  )

split_row_68 <- 1121
split_row_69 <- 1137
white_female_2002 <- educational_data_2002 %>% slice(split_row_68:split_row_69)

split_row_70 <- 1154
split_row_71 <- 1170
white_female_2002_2 <- educational_data_2002 %>% slice(split_row_70:split_row_71)

white_female_2002 <- white_female_2002 %>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

white_female_2002_2 <- white_female_2002_2 %>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # Unused trailing columns; unique names avoid duplicate-name problems
    "drop_2"
  )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )
white_female_2002 <- white_female_2002 %>%
  mutate(
    race = "white",
    sexes = "female"
  )

white_female_2002_2 <- white_female_2002_2 %>%
  mutate(
    race = "white",
    sexes = "female"
  )

white_female_2002_wide <- full_join(white_female_2002, white_female_2002_2,
                                    by = c("age_group", "race", "sexes"))

white_female_2002_wide <- white_female_2002_wide %>%
  mutate(total = as.numeric(total),
         none = as.numeric(none),
         `1st-4th_grade` = as.numeric(`1st-4th_grade`),
         `5th-6th_grade` = as.numeric(`5th-6th_grade`),
         `7th-8th_grade` = as.numeric(`7th-8th_grade`),
         `9th_grade` = as.numeric(`9th_grade`),
         `10th_grade` = as.numeric(`10th_grade`),
         `11th_grade` = as.numeric(`11th_grade`),
         `HS_graduate` = as.numeric(`HS_graduate`),
         `some_college` = as.numeric(`some_college`),
         `associates_degree_occupational` = as.numeric(`associates_degree_occupational`),
         `associates_degree_academic` = as.numeric(`associates_degree_academic`),
         `bachelors_degree` = as.numeric(`bachelors_degree`),
         `masters_degree` = as.numeric(`masters_degree`),
         professional_degree = as.numeric(professional_degree),
         doctoral_degree = as.numeric(doctoral_degree),
         year = 2002
  )

white_female_2002_wide <- white_female_2002_wide %>% slice(-(1:6))

white_female_2002_wide <- white_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 
white_female_2002_long <- white_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
white_female_2002_long <- white_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "count" ~ "Count",
    educational_attainment == "none" ~"None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(count = sum(count, na.rm = TRUE),  # Sum the counts
    .groups = 'drop'
    )# Remove grouping
split_row_72 = 1187
split_row_73 = 1203
justblack_bothsexes_2002 <-educational_data_2002 %>% slice(split_row_72:split_row_73)

split_row_74 = 1220
split_row_75 = 1236
justblack_bothsexes_2002_2 <-educational_data_2002 %>% slice(split_row_74:split_row_75)

justblack_bothsexes_2002 <- justblack_bothsexes_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

justblack_bothsexes_2002_2 <- justblack_bothsexes_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # unused trailing columns, given unique names
    "drop_2"   # so that select() below is unambiguous
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )
## Black 

justblack_bothsexes_2002 <- justblack_bothsexes_2002 %>%
  mutate(
    race = "black",
    sexes = "both_sexes"
  )

justblack_bothsexes_2002_2 <- justblack_bothsexes_2002_2 %>%
  mutate(
    race = "black",
    sexes = "both_sexes"
  )

# Join the two halves of the table on their shared key columns
justblack_bothsexes_2002_wide <- full_join(
  justblack_bothsexes_2002,
  justblack_bothsexes_2002_2,
  by = c("age_group", "race", "sexes")
)

# Convert every count column to numeric and tag the survey year
justblack_bothsexes_2002_wide <- justblack_bothsexes_2002_wide %>%
  mutate(
    across(-c(age_group, race, sexes), as.numeric),
    year = 2002
  )

# Drop the first six rows of the slice before recoding the age groups
justblack_bothsexes_2002_wide <- justblack_bothsexes_2002_wide %>% slice(-(1:6))
justblack_bothsexes_2002_wide <- justblack_bothsexes_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  )) 
justblack_bothsexes_2002_long <- justblack_bothsexes_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )
justblack_bothsexes_2002_long <- justblack_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "count" ~ "Count",
    educational_attainment == "none" ~"None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  # Sum the counts within each collapsed category, then drop the grouping
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")

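# Black, male: first half of the table ("total" through "HS_graduate")
split_row_76 <- 1253
split_row_77 <- 1269
justblack_male_2002 <- educational_data_2002 %>% slice(split_row_76:split_row_77)

# Second half of the table ("some_college" through "doctoral_degree")
split_row_78 <- 1286
split_row_79 <- 1302
justblack_male_2002_2 <- educational_data_2002 %>% slice(split_row_78:split_row_79)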


justblack_male_2002 <- justblack_male_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

justblack_male_2002_2 <- justblack_male_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # unused trailing columns, given unique names
    "drop_2"   # so that select() below is unambiguous
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )
justblack_male_2002 <- justblack_male_2002 %>%
  mutate(
    race = "black",
    sexes = "male"
  )

justblack_male_2002_2 <- justblack_male_2002_2 %>%
  mutate(
    race = "black",
    sexes = "male"
  )

# Join the two halves of the table on their shared key columns
justblack_male_2002_wide <- full_join(
  justblack_male_2002,
  justblack_male_2002_2,
  by = c("age_group", "race", "sexes")
)

# Convert every count column to numeric and tag the survey year
justblack_male_2002_wide <- justblack_male_2002_wide %>%
  mutate(
    across(-c(age_group, race, sexes), as.numeric),
    year = 2002
  )

# Drop the first six rows of the slice before recoding the age groups
justblack_male_2002_wide <- justblack_male_2002_wide %>% slice(-(1:6))
justblack_male_2002_wide <- justblack_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))
justblack_male_2002_long <- justblack_male_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

justblack_male_2002_long <- justblack_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "count" ~ "Count",
    educational_attainment == "none" ~"None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  # Sum the counts within each collapsed category, then drop the grouping
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")

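# Black, female: first half of the table ("total" through "HS_graduate")
split_row_80 <- 1319
split_row_81 <- 1335
justblack_female_2002 <- educational_data_2002 %>% slice(split_row_80:split_row_81)

# Second half of the table ("some_college" through "doctoral_degree")
split_row_82 <- 1352
split_row_83 <- 1368
justblack_female_2002_2 <- educational_data_2002 %>% slice(split_row_82:split_row_83)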

justblack_female_2002 <- justblack_female_2002%>%
  set_names(c(
    "age_group",
    "total",
    "none",
    "1st-4th_grade",
    "5th-6th_grade",
    "7th-8th_grade",
    "9th_grade",
    "10th_grade",
    "11th_grade",
    "HS_graduate"
  ))

justblack_female_2002_2 <- justblack_female_2002_2%>%
  set_names(c(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree",
    "drop_1",  # unused trailing columns, given unique names
    "drop_2"   # so that select() below is unambiguous
    )) %>%
  select(
    "age_group",
    "some_college",
    "associates_degree_occupational",
    "associates_degree_academic",
    "bachelors_degree",
    "masters_degree",
    "professional_degree",
    "doctoral_degree"
  )


justblack_female_2002 <- justblack_female_2002 %>%
  mutate(
    race = "black",
    sexes = "female"
  )

justblack_female_2002_2 <- justblack_female_2002_2 %>%
  mutate(
    race = "black",
    sexes = "female"
  )

# Join the two halves of the table on their shared key columns
justblack_female_2002_wide <- full_join(
  justblack_female_2002,
  justblack_female_2002_2,
  by = c("age_group", "race", "sexes")
)

# Convert every count column to numeric and tag the survey year
justblack_female_2002_wide <- justblack_female_2002_wide %>%
  mutate(
    across(-c(age_group, race, sexes), as.numeric),
    year = 2002
  )

# Drop the first six rows of the slice before recoding the age groups
justblack_female_2002_wide <- justblack_female_2002_wide %>% slice(-(1:6))

justblack_female_2002_wide <- justblack_female_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

justblack_female_2002_long <- justblack_female_2002_wide %>%
  pivot_longer(
    cols = c(none, 
             `1st-4th_grade`,
             `5th-6th_grade`,
             `7th-8th_grade`,
             `9th_grade`,
             `10th_grade`,
             `11th_grade`,
             `HS_graduate`,
             some_college,
             associates_degree_occupational,
             associates_degree_academic,
             bachelors_degree,
             masters_degree,
             professional_degree,
             doctoral_degree
             ),
    names_to = "educational_attainment",
    values_to = "count"
  )

justblack_female_2002_long <- justblack_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "count" ~ "Count",
    educational_attainment == "none" ~"None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  # Sum the counts within each collapsed category, then drop the grouping
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")
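
# Illustrative sketch, not part of the original pipeline: the Black both-sexes,
# male and female blocks above repeat the same slice / rename / join / pivot
# steps. Assuming every block shares that layout (a 10-column first half and a
# second half whose last two columns are unused), the steps could be wrapped in
# one function. `read_attainment_block()`, its arguments, the two name vectors
# and the toy table are hypothetical names introduced only for this example.
library(dplyr)
library(tidyr)
library(purrr)

first_half_names <- c(
  "age_group", "total", "none", "1st-4th_grade", "5th-6th_grade",
  "7th-8th_grade", "9th_grade", "10th_grade", "11th_grade", "HS_graduate"
)
second_half_names <- c(
  "age_group", "some_college", "associates_degree_occupational",
  "associates_degree_academic", "bachelors_degree", "masters_degree",
  "professional_degree", "doctoral_degree", "drop_1", "drop_2"
)

read_attainment_block <- function(data, rows_1, rows_2, race_label, sex_label, survey_year) {
  # First half of the table: "total" through "HS_graduate"
  half_1 <- data %>%
    slice(rows_1) %>%
    set_names(first_half_names)
  # Second half: "some_college" through "doctoral_degree", minus unused columns
  half_2 <- data %>%
    slice(rows_2) %>%
    set_names(second_half_names) %>%
    select(-drop_1, -drop_2)

  full_join(half_1, half_2, by = "age_group") %>%
    mutate(
      across(-age_group, as.numeric),
      race = race_label,
      sexes = sex_label,
      year = survey_year
    ) %>%
    pivot_longer(
      cols = -c(age_group, total, race, sexes, year),
      names_to = "educational_attainment",
      values_to = "count"
    )
}

# Toy input: row 1 stands in for the first half of a table, row 2 for the second
toy_sheet <- as.data.frame(rbind(
  c("25 to 29 years", "20", "1", "1", "1", "1", "1", "1", "1", "3"),
  c("25 to 29 years", "4", "2", "1", "1", "1", "1", "0", "", "")
), stringsAsFactors = FALSE)

toy_long <- read_attainment_block(
  toy_sheet,
  rows_1 = 1, rows_2 = 2,
  race_label = "black", sex_label = "female", survey_year = 2002
)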
# Joining 2002

#all races
joined_2002 <- full_join(allraces_bothsexes_2002_merged_3, allraces_male_2002_merged_3)
joined_2002 <- full_join(joined_2002, allraces_female_2002_long)
#non hispanic white
joined_2002 <- full_join(joined_2002, nonhispanicwhite_bothsexes_2002_long)
joined_2002 <- full_join(joined_2002, nonhispanicwhite_male_2002_long)
joined_2002 <- full_join(joined_2002, nonhispanicwhite_female_2002_long)

#asian
joined_2002 <- full_join(joined_2002, asian_bothsexes_2002_long)
joined_2002 <- full_join(joined_2002, asian_female_2002_long)
joined_2002 <- full_join(joined_2002, asian_male_2002_long)

#hispanic
joined_2002 <- full_join(joined_2002, hispanic_bothsexes_2002_long)
joined_2002 <- full_join(joined_2002, hispanic_male_2002_long)
joined_2002 <- full_join(joined_2002, hispanic_female_2002_long)

#white
joined_2002 <- full_join(joined_2002, white_bothsexes_2002_long)
joined_2002 <- full_join(joined_2002, white_male_2002_long)
joined_2002 <- full_join(joined_2002, white_female_2002_long)

#non hispanic black
joined_2002 <- full_join(joined_2002, nonhispanicblack_bothsexes_2002_long)
joined_2002 <- full_join(joined_2002, nonhispanicblack_male_2002_long)
joined_2002 <- full_join(joined_2002, nonhispanicblack_female_2002_long)


#black (joined w/ non-hispanic black)
joined_2002 <- full_join(joined_2002, justblack_bothsexes_2002_long)
joined_2002 <- full_join(joined_2002, justblack_female_2002_long)
joined_2002 <- full_join(joined_2002, justblack_male_2002_long)

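# Drop rows without an age group or attainment level, then encode sexes
# numerically (converted back to labels once all years are combined)
test_join_2002 <- joined_2002 %>%
  filter(!is.na(age_group)) %>%
  filter(!is.na(educational_attainment)) %>%
  mutate(sexes = case_when(
    sexes == "both_sexes" ~ 2,
    sexes == "female" ~ 1,
    sexes == "male" ~ 0
  ))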
# Load 2001 data
educational_data_2001 <- read_excel(
  here::here("data_raw", "2001_Educational_Data.xlsx"),
  skip = 5
) %>%
  # Replace the placeholder cells "-" and "." with "0"
  mutate(across(everything(), ~replace(., . %in% c("-", "."), "0")))

# Asian, both sexes

part1_asian_both <- educational_data_2001 %>%
  slice(595:616) %>%
  select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade","7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))

# Some college through doctoral degree
part2_asian_both <- educational_data_2001 %>%
  slice(628:649) %>%
  select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational", "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))

asian_2001 <- full_join(part1_asian_both, part2_asian_both, by="age_group") %>%
  mutate(race="asian", sexes="both_sexes") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational,associates_degree_academic,bachelors_degree,masters_degree,professional_degree,doctoral_degree), names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# Asian, male
part1_asian_male <- educational_data_2001 %>%
  slice(661:682) %>% select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade",
              "7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))
part2_asian_male <- educational_data_2001 %>%
  slice(694:715) %>% select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational",
              "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))
asian_male_2001 <- full_join(part1_asian_male, part2_asian_male, by="age_group") %>%
  mutate(race="asian", sexes="male") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational, associates_degree_academic,bachelors_degree,masters_degree,
                      professional_degree,doctoral_degree),
               names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# Asian, female
part1_asian_female <- educational_data_2001 %>%
  slice(727:748) %>% select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade",
              "7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))
part2_asian_female <- educational_data_2001 %>%
  slice(760:781) %>% select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational",
              "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))
asian_female_2001 <- full_join(part1_asian_female, part2_asian_female, by="age_group") %>%
  mutate(race="asian", sexes="female") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational, associates_degree_academic,bachelors_degree,masters_degree,
                      professional_degree,doctoral_degree),
               names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# White, both sexes

part1_white_both <- educational_data_2001 %>%
  slice(991:1012) %>%
  select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade","7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))

# Some college through doctoral degree
part2_white_both <- educational_data_2001 %>%
  slice(1024:1045) %>%
  select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational", "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))

white_2001 <- full_join(part1_white_both, part2_white_both, by="age_group") %>%
  mutate(race="white", sexes="both_sexes") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational,associates_degree_academic,bachelors_degree,masters_degree,professional_degree,doctoral_degree), names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# White, male
part1_white_male <- educational_data_2001 %>%
  slice(1057:1078) %>% select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade",
              "7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))
part2_white_male <- educational_data_2001 %>%
  slice(1090:1111) %>% select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational",
              "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))
white_male_2001 <- full_join(part1_white_male, part2_white_male, by="age_group") %>%
  mutate(race="white", sexes="male") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational, associates_degree_academic,bachelors_degree,masters_degree,
                      professional_degree,doctoral_degree),
               names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# White, female

part1_white_female <- educational_data_2001 %>%
  slice(1123:1144) %>% select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade",
              "7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))
part2_white_female <- educational_data_2001 %>%
  slice(1156:1177) %>% select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational",
              "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))
white_female_2001 <- full_join(part1_white_female, part2_white_female, by="age_group") %>%
  mutate(race="white", sexes="female") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational, associates_degree_academic,bachelors_degree,masters_degree,
                      professional_degree,doctoral_degree),
               names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# Black, both sexes

part1_black_both <- educational_data_2001 %>%
  slice(1189:1210) %>%
  select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade","7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))

# Some college through doctoral degree
part2_black_both <- educational_data_2001 %>%
  slice(1222:1243) %>%
  select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational", "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))

black_2001 <- full_join(part1_black_both, part2_black_both, by="age_group") %>%
  mutate(race="black", sexes="both_sexes") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational,associates_degree_academic,bachelors_degree,masters_degree,professional_degree,doctoral_degree), names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# Black, male

part1_black_male <- educational_data_2001 %>%
  slice(1255:1276) %>% select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade", "7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))
part2_black_male <- educational_data_2001 %>%
  slice(1288:1309) %>% select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational", "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))
black_male_2001 <- full_join(part1_black_male, part2_black_male, by="age_group") %>%
  mutate(race="black", sexes="male") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational, associates_degree_academic,bachelors_degree,masters_degree,
                      professional_degree,doctoral_degree),
               names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)

# Black, female

part1_black_female <- educational_data_2001 %>%
  slice(1321:1342) %>% select(1:10) %>%
  set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade", "7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate"))
part2_black_female <- educational_data_2001 %>%
  slice(1354:1375) %>% select(1:8) %>%
  set_names(c("age_group","some_college","associates_degree_occupational", "associates_degree_academic","bachelors_degree","masters_degree",
              "professional_degree","doctoral_degree"))
black_female_2001 <- full_join(part1_black_female, part2_black_female, by="age_group") %>%
  mutate(race="black", sexes="female") %>%
  mutate(across(-c(age_group,race,sexes), as.numeric)) %>%
  pivot_longer(cols=c(total,none,`1st-4th_grade`,`5th-6th_grade`,`7th-8th_grade`,`9th_grade`,`10th_grade`,`11th_grade`,HS_graduate, some_college,associates_degree_occupational, associates_degree_academic,bachelors_degree,masters_degree,
                      professional_degree,doctoral_degree),
               names_to="educational_attainment", values_to="count") %>%
  mutate(year=2001)


# Joining 2001 data

test_join_2001 <- bind_rows(
  asian_2001, asian_male_2001, asian_female_2001, white_2001, white_male_2001, white_female_2001,
black_2001, black_male_2001, black_female_2001
) %>%
  mutate(age_group = case_when(
    age_group %in% c("25 to 29 years", "30 to 34 years") ~ "25_to_34",
    age_group %in% c("35 to 39 years", "40 to 44 years", "45 to 49 years", "50 to 54 years") ~ "35_to_54",
    age_group %in% c("55 to 59 years", "60 to 64 years", "65 to 69 years", "70 to 74 years", "75 years and over") ~ "55_plus"
  )) %>%
  filter(!is.na(age_group)) %>%  # Remove any age groups under 25
  # Convert sexes to numeric
  mutate(sexes = case_when(
    sexes == "both_sexes" ~ 2,
    sexes == "female" ~ 1,
    sexes == "male" ~ 0
  )) %>%
  mutate(educational_attainment = case_when(
    educational_attainment %in% c("none","1st-4th_grade","5th-6th_grade", "7th-8th_grade","9th_grade","10th_grade","11th_grade") ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment %in% c("associates_degree_occupational","associates_degree_academic","bachelors_degree") ~ "Undergraduate Degree",
    educational_attainment %in% c("masters_degree","professional_degree") ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "total" ~ "Total",
    TRUE ~ educational_attainment
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(count = sum(count, na.rm=TRUE), .groups="drop")

view(test_join_2001)


join_all_2001_2009 <- bind_rows(
  test_join_2001,
  test_join_2002,
  test_join_2003,
  test_join_2004,
  test_join_2005,
  test_join_2006,
  test_join_2007,
  test_join_2008,
  test_join_2009
)
#convert sexes back into strings
join_all_2001_2009 <- join_all_2001_2009 %>%
  mutate(sexes = as.integer(sexes)) %>%
  mutate(sexes = case_when(
    sexes == 2 ~ "both_sexes",
    sexes == 1 ~ "female",
    sexes == 0 ~ "male"
   ))


#collapse duplicate groups
join_all_2001_2009 <- join_all_2001_2009 %>%
  mutate(age_group = case_when(
    age_group %in% c("25 to 29 years", "30 to 34 years", "25_to_34") ~ "25_to_34",
    age_group %in% c("35 to 39 years", "40 to 44 years", "45 to 49 years", "50 to 54 years", "35_to_54") ~ "35_to_54",
    age_group %in% c("55 to 59 years", "60 to 64 years", "65 to 69 years", "70 to 74 years", "75 years and over", "55_plus") ~ "55_plus"
  )) %>%
  filter(!is.na(age_group)) %>%
  # Map raw and already-recoded labels onto one set of categories;
  # any label not listed here becomes NA
  mutate(educational_attainment = case_when(
    educational_attainment %in% c("none", "None", "1st-4th_grade", "5th-6th_grade",
                                  "7th-8th_grade", "9th_grade", "10th_grade",
                                  "11th_grade", "Less than HS") ~ "Less than HS",
    educational_attainment %in% c("HS_graduate", "HS") ~ "HS",
    educational_attainment %in% c("some_college", "Some College, No Degree") ~ "Some College, No Degree",
    educational_attainment %in% c("associates_degree_occupational", "associates_degree_academic",
                                  "bachelors_degree", "Undergraduate Degree") ~ "Undergraduate Degree",
    educational_attainment %in% c("masters_degree", "professional_degree", "Graduate Degree") ~ "Graduate Degree",
    educational_attainment %in% c("doctoral_degree", "Doctoral Degree") ~ "Doctoral Degree",
    educational_attainment %in% c("total", "Total") ~ "Total"
  )) %>%
  # Relabel the combined-sex rows as "Total"
  mutate(sexes = case_when(
    sexes == "both_sexes" ~ "Total",
    TRUE ~ sexes
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")

# Reshape to match 2010-2024 format (one demographic column)
data_2001_2009_formatted <- join_all_2001_2009 %>%
  # Sex breakdown (male and female rows only, summed across all ages and all race rows)
  filter(sexes %in% c("male", "female")) %>%
  group_by(sexes, educational_attainment, year) %>%
  summarise(number = sum(count, na.rm = TRUE), .groups = "drop") %>%
  rename(demographic = sexes) %>%
  
  # Age breakdown (sum across all races, using Total for sexes)
  bind_rows(
    join_all_2001_2009 %>%
      filter(sexes == "Total") %>%
      group_by(age_group, educational_attainment, year) %>%
      summarise(number = sum(count, na.rm = TRUE), .groups = "drop") %>%
      rename(demographic = age_group)
  ) %>%
  
  # Race breakdown (sum across all ages, using Total for sexes, exclude duplicates)
  bind_rows(
    join_all_2001_2009 %>%
      filter(sexes == "Total", !race %in% c("all_races", "non_hispanic_black")) %>%
      group_by(race, educational_attainment, year) %>%
      summarise(number = sum(count, na.rm = TRUE), .groups = "drop") %>%
      rename(demographic = race)
  ) %>%
  
  # Total demographic (male + female combined, sum across all ages and races)
  bind_rows(
    join_all_2001_2009 %>%
      filter(sexes %in% c("male", "female")) %>%
      group_by(educational_attainment, year) %>%
      summarise(number = sum(count, na.rm = TRUE), .groups = "drop") %>%
      mutate(demographic = "total")
  ) %>%
  
  # Create attainment and years_of_school columns
  mutate(
    attainment = case_when(
      educational_attainment == "Total" ~ "Other",
      educational_attainment == "Less than HS" ~ "Less than High School",
      educational_attainment == "HS" ~ "High School Diploma",
      educational_attainment == "Some College, No Degree" ~ "Some College, No Degree",
      educational_attainment == "Undergraduate Degree" ~ "Bachelor's Degree",
      educational_attainment == "Graduate Degree" ~ "Master's Degree",
      educational_attainment == "Doctoral Degree" ~ "Doctoral Degree",
      TRUE ~ "Other"
    ),
    
    years_of_school = case_when(
      educational_attainment == "Total" ~ "Total",
      educational_attainment == "Less than HS" ~ "Less than high school graduate",
      educational_attainment == "HS" ~ "High school graduate",
      educational_attainment == "Some College, No Degree" ~ "Some college, no degree",
      educational_attainment == "Undergraduate Degree" ~ "Bachelor's degree",
      educational_attainment == "Graduate Degree" ~ "Graduate degree",
      educational_attainment == "Doctoral Degree" ~ "Doctoral degree",
      TRUE ~ educational_attainment
    )
  ) %>%
  
  rename(attainment_level = educational_attainment) %>%
  select(years_of_school, attainment, demographic, number, year, attainment_level)
# Combine 2001-2009 with 2010-2024
educational_data_combined <- bind_rows(
  data_2001_2009_formatted,
  educational_data_combined) %>% 
  mutate(number = as.integer(number)) %>% 
  filter(!demographic %in% c("non_hispanic_white", "hispanic")) %>%
  select(year, attainment_level, demographic, number)

# Open the viewer only in interactive sessions so rendering is not interrupted
if (interactive()) view(educational_data_combined)
# Saving cleaned data
write_csv(educational_data_combined, here::here('data_processed','final_data.csv'))
summary(educational_data_combined)
table <- tibble(
  Variable = "number", 
  Mean = round(mean(educational_data_combined$number, na.rm = TRUE), 3),
  Median = round(median(educational_data_combined$number, na.rm = TRUE), 3),
  Std_Deviation = round(sd(educational_data_combined$number, na.rm = TRUE), 3),
  IQR = round(IQR(educational_data_combined$number, na.rm = TRUE), 3),
  Range = round(max(educational_data_combined$number, na.rm = TRUE) - 
                min(educational_data_combined$number, na.rm = TRUE), 3)
)

kable(table)
educational_data_combined %>%
  filter(demographic %in% c("male", "female"))  %>%
  ggplot(aes(x = demographic, y = number/1000, fill = demographic)) +
  geom_col(alpha = 0.7) +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.05))) +
  labs(
    title = "Distribution of Male and Female Respondents",
    subtitle = "Female population slightly larger across all education levels, 2001 - 2024",
    caption = "Source: U.S. Census Bureau",
    x = "Gender",
    y = "Count (in millions)"
  ) +
  theme_classic() +
  scale_fill_viridis_d()+
  theme(legend.position = "none",
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 13))
educational_data_combined %>%
    filter(year %in% c(2001, 2024)) %>%
    filter(attainment_level != "Total") %>%
    group_by(year, attainment_level) %>%
    summarise(number = sum(number), .groups = "drop") %>%
    mutate(
    attainment_level = fct_reorder2(attainment_level, year, desc(number)),
    year = as.factor(year),
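    # Build labels such as "HS (120)" without stray spaces inside the parentheses
    label = paste0(attainment_level, " (", round(number/1000), ")"),
    label_left = ifelse(year == 2001, label, NA_character_),
    label_right = ifelse(year == 2024, label, NA_character_)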
  ) %>%
  ggplot(aes(x = year, y = number/1000, group = attainment_level, color = attainment_level)) +
  geom_line(linewidth = 1.2, alpha = 0.7) +
  geom_point(size = 2) +
  geom_text_repel(aes(label = label_left), size = 5,
    hjust = 1, nudge_x = -0.02,
    direction = "y", segment.color = "grey"
  ) +
  
  # Right-side labels (2024)
  geom_text_repel(aes(label = label_right), size = 5,
    hjust = 0, nudge_x = 0.02, nudge_y = 0.75,
    direction = "y", segment.color = "grey"
  ) + 
    scale_x_discrete(position = 'top', expand = expansion(mult = c(1, 1))) +
    scale_y_continuous(limits = c(0, 350)) +
  labs(
    title = "2001 vs 2024: The Big Picture",
    subtitle = "More Americans are Earning Degrees",
    x = NULL,
    y = "Count (millions)",
    caption = "Source: U.S. Census Bureau"
  ) +
  theme_minimal_grid() +
  theme(panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        legend.position = 'none',
        plot.caption = element_text(size = 13))+
  scale_color_viridis_d()
label_2003 <- "2003 CPS question redesign:\nShift to 'highest degree attained'"
label_2010 <- "2010 ACS transition:\nNew primary data source"

attainment_plot <- educational_data_combined %>%
    filter(attainment_level != "Total") %>%
    group_by(year, attainment_level) %>%
    summarise(total_number = sum(number), .groups = "drop") %>%
    ggplot(aes(x = year, y = total_number/1000, color = attainment_level, group = attainment_level)) +
    geom_line(linewidth = 1) +
    geom_text_repel(data = . %>% filter(year == max(year)),
            aes(label = attainment_level),
            hjust = -0.1, size = 5, nudge_x = 0, direction = "y", show.legend = FALSE) +
    geom_vline(xintercept = 2008, linetype = "dashed", color = "black") +
    annotate("text", x = 2008, y = 680, label = "2008 Great Recession", 
           vjust = 0, hjust = 0, size = 5, fontface = "bold") +
    geom_vline(xintercept = 2020, linetype = "dashed", color = "black") +
    annotate("text", x = 2020, y = 680, label = "COVID-19 Pandemic", 
           vjust = 0, hjust = 0, size = 5, fontface = "bold") +
  # Annotation for 2003
    geom_curve(data = data.frame(x = 2004, xend = 2002.5, y = 600, yend = 500),
      mapping = aes(x = x, xend = xend, y = y, yend = yend), inherit.aes = FALSE,
      color = 'grey40', linewidth = 0.5, curvature = -0.1,
      arrow = arrow(length = unit(0.015, "npc"), type = "closed")) +
    geom_label(data = data.frame(x = 2001.5, y = 600, label = label_2003),
      mapping = aes(x = x, y = y, label = label), inherit.aes = FALSE,
      hjust = 0, lineheight = 0.9, size = 4) +
  # Annotation for 2010
    geom_curve(data = data.frame(x = 2011, xend = 2009.5, y = 480, yend = 330),
      mapping = aes(x = x, xend = xend, y = y, yend = yend), inherit.aes = FALSE,
      color = 'grey40', linewidth = 0.7, curvature = -0.35,
      arrow = arrow(length = unit(0.01, "npc"), type = "closed")) +
    geom_label(data = data.frame(x = 2009.5, y = 480, label = label_2010),
      mapping = aes(x = x, y = y, label = label), inherit.aes = FALSE,
      hjust = 0, lineheight = 0.9, size = 4) +
    scale_x_continuous(
    breaks = seq(2001, 2025, 5),
    expand = expansion(add = c(1, 8))) +
    scale_y_continuous(limits = c(0, 685)) +
    theme_half_open(font_size = 18) +
    theme(legend.position = 'none',
          panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 13)) +
    labs(x = 'Year',
       y = 'Count (millions)',
       title = 'Educational Attainment Trends, 2001 - 2024',
       subtitle = "Methodology changes in 2003 and 2010 explain sharp drops",
       caption = "Source: U.S. Census Bureau")

attainment_plot

percent_data <- educational_data_combined %>%
  filter(year %in% c(2001, 2024), attainment_level != "Total") %>%
  group_by(attainment_level, year) %>%
  summarise(total_number = sum(number), .groups = "drop") %>%
    pivot_wider(names_from = year, values_from = total_number) %>%
    mutate(percent_change = (`2024` - `2001`) / `2001` * 100)

ggplot(percent_data, aes(x = reorder(attainment_level, percent_change), 
                        y = percent_change, 
                        fill = percent_change > 0)) +
  geom_col(width = 0.7, show.legend = FALSE) +
  geom_text(aes(label = paste0(round(percent_change, 1), "%"),
                hjust = ifelse(abs(percent_change) < 5, 
                               ifelse(percent_change > 0, -0.1, 1.1), 
                               ifelse(percent_change > 0, 1.1, -0.1))), 
            color = "white", size = 4.5) +
    geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  annotate("text", x = 0, y = 0, label = "2001 baseline", 
           vjust = 0, hjust = 0, size = 5, fontface = "bold") +
  coord_flip() +
  scale_fill_manual(values = c("TRUE" = "steelblue", "FALSE" = "firebrick")) +
  labs(title = "Percentage Change in Educational Attainment (2001–2024)",
       subtitle = "Graduate and Doctoral Degrees more than doubled",
       x = "Attainment Level",
       y = "Percentage Change",
       caption = "Source: U.S. Census Bureau") +
  theme_minimal_vgrid(font_size = 15) +
    theme(panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        legend.position = 'none',
        plot.caption = element_text(size = 13))

# geom_waffle() with `values`, `n_rows`, and `flip` is the waffle package's version,
# so ggwaffle (which exports a conflicting geom_waffle) is not loaded here
#install.packages("waffle")
library(waffle)
library(grid)
white_data <- educational_data_combined %>%
  filter(number != ".", demographic == "white", year %in% c(2001, 2024), attainment_level != "Total") %>%
  mutate(year = as.integer(year),
         attainment_level = factor(attainment_level,
                                   levels = c("Less than HS", "HS", "Some College, No Degree",
                                              "Undergraduate Degree", "Graduate Degree", "Doctoral Degree"))) %>%
  group_by(year, attainment_level) %>%
  summarise(number = sum(as.numeric(number)), .groups = "drop") %>%
  group_by(year) %>%
  mutate(percent = number / sum(number) * 100,
         squares = round(percent))   # normalize to 100 squares

# Identify largest 
white_labels <- white_data %>%
  group_by(year) %>%
  slice_max(order_by = percent, n = 1) %>% 
  mutate(label = paste0(attainment_level, ":\n ", round(percent,1), "%"), size = 4)

# Plot
ggplot(white_data, aes(fill = attainment_level, values = squares)) +
  geom_waffle(color = "white", size = 0.5, n_rows = 10, flip = TRUE) +
  facet_wrap(~year, ncol = 2, strip.position = "top") +
  scale_fill_viridis_d(drop = FALSE) +
  theme_minimal() +
  theme(panel.spacing = unit(4, "lines"),
        strip.text = element_text(size = 18, face = "bold"),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 13)) +
  labs(
    title = "White Americans Shift from High School to College Degrees",
    subtitle = "High School dropout rate cut in half while graduate degrees rise",
    caption = "Each square = 1% of population \n\nSource: U.S. Census Bureau",
    fill = "Degree"
  )
asian_data <- educational_data_combined %>%
  filter(number != ".", demographic == "asian", year %in% c(2001, 2024), attainment_level != "Total") %>%
  mutate(year = as.integer(year),
         attainment_level = factor(attainment_level,
                                   levels = c("Less than HS", "HS", "Some College, No Degree",
                                              "Undergraduate Degree", "Graduate Degree", "Doctoral Degree"))) %>%
  group_by(year, attainment_level) %>%
  summarise(number = sum(as.numeric(number)), .groups = "drop") %>%
  group_by(year) %>%
  mutate(percent = number / sum(number) * 100,
         squares = round(percent))   # normalize to 100 squares

# Identify largest 
asian_labels <- asian_data %>%
  group_by(year) %>%
  slice_max(order_by = percent, n = 1) %>% 
  mutate(label = paste0(attainment_level, ":\n ", round(percent,1), "%"), size = 4)

# Plot
ggplot(asian_data, aes(fill = attainment_level, values = squares)) +
  geom_waffle(color = "white", size = 0.5, n_rows = 10, flip = TRUE) +
  facet_wrap(~year, ncol = 2, strip.position = "top") +
  scale_fill_viridis_d(drop = FALSE) +
  theme_minimal() +
  theme(panel.spacing = unit(4, "lines"),
        strip.text = element_text(size = 18, face = "bold"),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 13)) +
  labs(
    title = "Asian Population Sees Staggering Improvement",
    subtitle = "Graduate and Doctoral degrees nearly double",
    caption = "Each square = 1% of population \n\nSource: U.S. Census Bureau",
    fill = "Degree"
  )
black_data <- educational_data_combined %>%
  filter(number != ".", demographic == "black", year %in% c(2001, 2024), attainment_level != "Total") %>%
  mutate(year = as.integer(year),
         attainment_level = factor(attainment_level,
                                   levels = c("Less than HS", "HS", "Some College, No Degree",
                                              "Undergraduate Degree", "Graduate Degree", "Doctoral Degree"))) %>%
  group_by(year, attainment_level) %>%
  summarise(number = sum(as.numeric(number)), .groups = "drop") %>%
  group_by(year) %>%
  mutate(percent = number / sum(number) * 100,
         squares = round(percent))   # normalize to 100 squares

# Plot
ggplot(black_data, aes(fill = attainment_level, values = squares)) +
  geom_waffle(color = "white", size = 0.5, n_rows = 10, flip = TRUE) +
  facet_wrap(~year, ncol = 2, strip.position = "top") +
  scale_fill_viridis_d(drop = FALSE) +
  theme_minimal() +
  theme(panel.spacing = unit(4, "lines"),
        strip.text = element_text(size = 18, face = "bold"),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 13)) +
  labs(
    title = "Black Americans Make Major Educational Gains",
    subtitle = "Dropout rate falls, but college completion still trails other groups",
    caption = "Each square = ~1% of population \n\nSource: U.S. Census Bureau",
    fill = "Degree"
  )
educational_percent <- educational_data_combined %>%
  filter(number != ".", year == 2024, attainment_level != "Total",
         demographic %in% c("white","black","asian")) %>%
  mutate(number = as.numeric(number)) %>%
  group_by(demographic, attainment_level) %>%
  summarise(total = sum(number), .groups = "drop") %>%
  group_by(demographic) %>%
  mutate(percent = round(total / sum(total) * 100, 2)) %>%
  ungroup() %>%
  group_by(attainment_level) %>%
  mutate(is_highest = percent == max(percent)) %>%
  ungroup() %>%
  mutate(fill_color = case_when(
    is_highest & demographic == "asian" ~ "asian",
    is_highest & demographic == "black" ~ "black",
    is_highest & demographic == "white" ~ "white",
    TRUE ~ "other"
  ),
  attainment_level = factor(attainment_level,
    levels = c("Doctoral Degree",
               "Graduate Degree",
               "Undergraduate Degree",
               "Some College, No Degree",
               "HS",
               "Less than HS")))

ggplot(educational_percent, aes(x = demographic, y = percent, fill = fill_color)) +
  geom_col(width = 0.7) +
  geom_text(
    data = educational_percent %>% filter(is_highest),
    aes(label = paste0(demographic, "\n", percent, "%")),
    vjust = -0.5,
    size = 4.5
  ) +
  facet_wrap(vars(attainment_level), nrow = 2) +   
  scale_fill_manual(
    values = c("asian" = "#440154", "black" = "#31688e", "white" = "#fde725", "other" = "grey80"),
    breaks = c("asian", "black", "white"),
    labels = c("Asian", "Black", "White")
  ) +
  scale_y_continuous(limits = c(0, 48), expand = expansion(mult = c(0, 0.05))) +
  theme_minimal_hgrid(font_size = 16) +
  theme(legend.position = "none",
        panel.spacing = unit(4, "lines"),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(size = 22, face = "bold"),
        plot.subtitle = element_text(size = 16),
        axis.title.x = element_text(size = 18),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        plot.caption = element_text(size = 13)) +
  labs(
    title = "2024: Asian Americans Outpace Other Groups in Advanced Degrees",
    subtitle = "Black Americans most likely to stop at high school",
    y = "Percentage (%)",
    x = NULL,
    caption = "Source: U.S. Census Bureau"
  )

library(scales)
library(patchwork)

plot_data <- educational_data_combined %>%
  filter(attainment_level != "Total", demographic %in% c("male", "female"), year %in% c(2001, 2024)) %>%
  group_by(year, demographic) %>%
  mutate(total_by_gender = sum(number, na.rm = TRUE), percentage = (number / total_by_gender) * 100) %>%
  ungroup() %>%
  select(year, demographic, attainment_level, percentage)

# Calculating the differences (female % - male %)

differences <- plot_data %>%
  pivot_wider(names_from = demographic, values_from = percentage) %>%
  mutate(difference = female - male, diff_label = paste0(ifelse(difference > 0, "+", ""), round(difference, 1), "%"))

attainment_order <- c("Less than HS", "HS", "Some College, No Degree", 
                      "Undergraduate Degree", "Graduate Degree", "Doctoral Degree")

plot_data <- plot_data %>%
  mutate(attainment_level = factor(attainment_level, levels = attainment_order))

differences <- differences %>%
  mutate(attainment_level = factor(attainment_level, levels = attainment_order))
plot_data_2001 <- plot_data %>% filter(year == 2001)
plot_data_2024 <- plot_data %>% filter(year == 2024)
diff_2001 <- differences %>% filter(year == 2001)
diff_2024 <- differences %>% filter(year == 2024)

# 2001 plot
plot_2001 <- ggplot(plot_data_2001, aes(x = percentage, y = attainment_level)) +
  geom_segment(data = diff_2001, aes(x = male, xend = female, y = attainment_level, yend = attainment_level), color = "gray60", linewidth = 2) +
  geom_point(aes(color = demographic), size = 5) +
  scale_color_viridis_d() +
  scale_x_continuous(labels = label_percent(scale = 1), breaks = seq(0, 50, 10), limits = c(0, 40)) +
  labs(title = "2001", x = "Percentage", y = NULL) +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold", size = 16, hjust = 0.5),
        panel.grid.major.y = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text.y = element_text(size = 12),
        axis.text.x = element_text(size = 10),
        axis.title.x = element_text(size = 11),
        plot.margin = margin(10, 5, 10, 10),
        legend.position = "bottom")

# 2001 difference section
diff_panel_2001 <- diff_2001 %>%
  ggplot(aes(x = 1, y = attainment_level)) +
  geom_text(aes(label = diff_label), size = 4, fontface = "bold") +
  labs(title = "Percentage \nDifference", x = NULL, y = NULL) +
  theme_void(base_size = 10) +
  theme(plot.title = element_text(face = "bold", size = 12, hjust = 0.5, margin = margin(b = 10)),
    plot.background = element_rect(fill = "gray90", color = NA),
    plot.margin = margin(10, 10, 10, 5))

# 2024 plot 
plot_2024 <- ggplot(plot_data_2024, aes(x = percentage, y = attainment_level)) +
  geom_segment(data = diff_2024, aes(x = male, xend = female, y = attainment_level, yend = attainment_level), color = "gray60", linewidth = 2) +
  geom_point(aes(color = demographic), size = 5) +
    scale_color_viridis_d() +
  scale_x_continuous(labels = label_percent(scale = 1), breaks = seq(0, 50, 10), limits = c(0, 40)) +
  labs(title = "2024", x = "Percentage", y = NULL) +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold", size = 16, hjust = 0.5),
        panel.grid.major.y = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text.y = element_blank(),
        axis.text.x = element_text(size = 10),
        axis.title.x = element_text(size = 11),
        plot.margin = margin(10, 5, 10, 10),
        legend.position = "none")

# 2024 difference section
diff_panel_2024 <- diff_2024 %>%
  ggplot(aes(x = 1, y = attainment_level)) +
  geom_text(aes(label = diff_label), size = 4, fontface = "bold") +
  labs(title = "Percentage \nDifference", x = NULL, y = NULL) +
  theme_void(base_size = 10) +
  theme(plot.title = element_text(face = "bold", size = 12, hjust = 0.5, margin = margin(b = 10)),
    plot.background = element_rect(fill = "gray90", color = NA),
    plot.margin = margin(10, 10, 10, 5))

# Combining
# Each panel already sets its own colour scale, so none is added to the combined layout
final_plot <- (plot_2001 | diff_panel_2001 | plot_2024 | diff_panel_2024) +
    plot_layout(widths = c(3, 1, 3, 1)) +
    plot_annotation(
    title = "Women Overtake Men in Graduate Degree Attainment",
    subtitle = "By 2024, women surpass men in graduate degree attainment",
    caption = "Source: U.S. Census Bureau",
    theme = theme(
      plot.title = element_text(face = "bold", size = 20, hjust = 0.5),
      plot.subtitle = element_text(hjust = 0.5, size = 14),
      plot.caption = element_text(hjust = 1, size = 13)))
  

final_plot
educational_data_combined %>%
  filter(demographic %in% c("25_to_34", "35_to_54", "55_plus")) %>%
  filter(attainment_level != "Other",
         attainment_level != "Total") %>%
  mutate(attainment_level = factor(
    attainment_level, 
    levels = c("Doctoral Degree",
               "Graduate Degree",
               "Undergraduate Degree",
               "Some College, No Degree",
               "HS",
               "Less than HS")
    )) %>%
  mutate(demographic = factor(demographic,
    levels = c("25_to_34", "35_to_54", "55_plus"),
    labels = c("25 to 34", "35 to 54", "55 Plus")
  )) %>%
  group_by(demographic, attainment_level) %>%
  summarise(total = sum(number), .groups = "drop") %>%
  ggplot(aes(x = demographic, y = total/1000, fill = demographic)) +
  geom_col(position = "dodge") +
  facet_wrap(~attainment_level)+
  coord_flip()+
  labs(
    title = "Midlife Americans Lead in College Degree Attainment",
    subtitle = "Advanced degrees peak in midlife; adults 55 years and older less likely to pursue higher education",
    x = "Age group",
    y = "Count (millions)",
    fill = "Age",
    caption = "\nSource: U.S. Census Bureau") +
  theme_minimal_hgrid() +
  scale_fill_viridis_d() +
  theme(axis.text.x = element_text(hjust = 1),
        panel.grid.minor = element_blank(),
        panel.spacing = unit(4, "lines"),
        plot.title = element_text(size = 20, face = "bold"),
        plot.subtitle = element_text(size = 17),
        axis.title.x = element_blank(),
        axis.title.y = element_text(size = 18),
        axis.text = element_text(size = 15),
        legend.position = "none",
        plot.caption = element_text(size = 13)
  )
datatable(data_dictionary, caption = "Data Dictionary 1",
          options = list(pageLength = 5, lengthMenu = c(3, 4, 6, 12)))
datatable(data_dictionary2, caption = "Data Dictionary 2",
          options = list(pageLength = 5, lengthMenu = c(1, 2, 3, 4)))