Educational attainment by demographic in the United States from 2001 to 2024
Author
Leshauna Hartman, Maya Schmidt
Published
December 7, 2025
Introduction
Educational attainment is said to play a vital role in shaping an individual’s career opportunities and economic outcomes. Higher levels of education are often associated with higher wages and salaries, greater access to employment and a lower risk of unemployment.
This project explores patterns and trends in educational attainment across demographic groups in the United States from 2001 to 2024, highlighting changes, if any, before and after major economic and social disruptions, including the Great Recession of 2008 and the COVID-19 pandemic. By examining how educational attainment has shifted across race, gender and age groups during this period, the analysis can show whether educational opportunities have become more equitable over time or whether persistent disparities remain.
The significance of this project goes beyond academic interest. Identifying and understanding trends can help policymakers make informed decisions to address disparities and close educational gaps, ensuring equitable access to higher education. Moreover, identifying which demographic groups have made educational gains, and which have been left behind, can help address systemic inequities in the American education system.
Research Question
How has educational attainment in the United States changed between 2001 and 2024 for different demographic groups aged 25 and over?
Sub-questions:
To fully address this question, this analysis examines several further interconnected sub-questions:
How do different racial groups compare in terms of educational attainment levels over time?
How do men and women differ in educational attainment trends?
How do different age groups vary in educational attainment?
How have educational attainment levels changed from 2001 to 2024?
Data Sources
The data for this project is sourced from the U.S. Census Bureau found here.
Datasets sourced here provide counts and percentages of individuals aged 25 years and older, broken down by educational attainment level and various demographic categories, spanning the years 2001 to 2024.
The data publication format varied slightly across the time period. Datasets from 2010 to 2024 (excluding 2023) were published as integrated annual reports, while data from 2003 to 2009 were organized into separate files by race and ethnicity. The data for 2001 and 2002 were organized in separate tables by demographic within one spreadsheet. The U.S. Census Bureau is widely recognized as a principal federal agency responsible for producing data about the American people and economy. The Census Bureau provides information that is accurate and unbiased, achieving its objective by using reliable data sources and data products that are prepared and carefully reviewed (U.S. Census Bureau, 2021).
The datasets are derived from an official agency for labor market and demographic statistics. The data is collected through national surveys and undergoes careful review and preparation, enhancing its reliability and validity.
Data Dictionary
Two data dictionary tables are provided below that list the variable names and their descriptions. These descriptions were adapted from two parts of the U.S. Census Bureau’s website. The first can be seen here and the second here.
data_dictionary <- tibble(
  Variable = c(
    "Detailed years of school", "All races/ All people", "Male(s)", "Female(s)",
    "25 to 34 years old", "35 to 54 years old", "55 years and older",
    "White", "Non-Hispanic White", "Black", "Asian", "Hispanic (of any race)"
  ),
  Description = c(
    "Detailed years of school",
    "Total number of people surveyed",
    "Respondents who identify as male",
    "Respondents who identify as female",
    "Respondents aged 25 to 34 years",
    "Respondents aged 35 to 54 years",
    "Respondents aged 55 and older",
    "A person having origins in any of the original peoples of Europe, the Middle East, or North Africa. It includes people who indicate their race as “White” or report responses such as German, Irish, English, Italian, Lebanese, and Egyptian. The category also includes groups such as Polish, French, Iranian, Slavic, Cajun, Chaldean, etc",
    "A person having origins in any of the original peoples of Europe, the Middle East, or North Africa, and who does not identify as Hispanic or Latino",
    "A person having origins in any of the Black racial groups of Africa. It includes people who indicate their race as “Black or African American” or report responses such as African American, Jamaican, Haitian, Nigerian, Ethiopian, or Somali. The category also includes groups such as Ghanaian, South African, Barbadian, Kenyan, Liberian, Bahamian, etc.",
    "A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent including, for example, India, China, the Philippine Islands, Japan, Korea, or Vietnam. It includes people who indicate their race as “Asian Indian,” “Chinese,” “Filipino,” “Korean,” “Japanese,” “Vietnamese,” and “Other Asian” or provide other detailed Asian responses such as Pakistani, Cambodian, Hmong, Thai, Bengali, Mien, etc.",
    "A person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin regardless of race."
  )
)
For the years 2010-2024, all columns except Detailed years of school had further subheadings, number and percent.
data_dictionary2 <- tibble(
  Variable = c("Number", "Percent"),
  Description = c(
    "The count of respondents (in thousands) within that specific demographic group who have completed a particular level of education",
    "The proportion of a demographic group's total population that has completed a particular level of education"
  ),
  Variable_Type = c(
    "double",
    "character (2010 -2017, 2021, 2022, 2024), Double (2018 - 2020)"
  )
)
Data Analysis
Data Cleaning
To prepare the data for analysis, yearly datasets from 2001 to 2024 (excluding 2023) were imported. The 2023 data was not included because it was no longer available on the source website. Columns were renamed for clarity and readability. Placeholder values such as “Z” or “-” were replaced with “0”, based on the dataset documentation indicating that these values represent zero counts. Note that some datasets, such as 2020, contain values listed as “.” with no clear description of whether they represent zero or missing data. After placeholders were replaced, missing values were removed and each dataset was reshaped from wide to long format using the pivot_longer function.
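The placeholder handling described above can be sketched on a toy table. This is a minimal illustration, not the project's actual cleaning code: the data frame `raw`, its column names and its values are made up, and it assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Toy wide table (hypothetical values); counts arrive as text because of placeholders
raw <- tibble(
  years_of_school = c("Total", "None", "Doctorate degree"),
  male_number     = c("100", "Z", "5"),
  female_number   = c("120", "-", NA)
)

cleaned <- raw %>%
  # Treat the documented placeholders "Z" and "-" as zero counts
  mutate(across(-years_of_school, ~ if_else(.x %in% c("Z", "-"), "0", .x))) %>%
  # Drop rows that still contain missing values
  drop_na() %>%
  # One row per (education level, original column) pair
  pivot_longer(
    cols = -years_of_school,
    names_to = "demographic",
    values_to = "count"
  )

cleaned
```

The ambiguous “.” values are deliberately left untouched in this sketch, since the documentation does not say whether they mean zero or missing.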
Specifically for the years 2010 to 2024, the demographic metrics were split into two categories, number (representing counts) and percent (representing percentages), using the names_pattern argument of pivot_longer. After the counts were converted to numeric and the data were pivoted back to a wide format, a new variable, attainment, was created to categorize educational attainment levels uniformly across the years. Each dataset then had a column added that listed its respective year. After each year was cleaned, the datasets were merged into a single dataframe spanning 2010-2024. The percent variable was then removed from the combined dataset because the original datasets contained errors in its values.
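Since the published percent column was dropped, shares can be recomputed from the counts where they are needed. The sketch below is one possible way to do that, assuming each demographic has a “Total” row; the table `counts` and the column `percent_recomputed` are hypothetical and the numbers are invented.

```r
library(dplyr)

# Toy counts (made-up values), one "Total" row per demographic
counts <- tibble(
  years_of_school = c("Total", "Bachelor's degree", "Total", "Bachelor's degree"),
  demographic     = c("male", "male", "female", "female"),
  number          = c(200, 50, 400, 120)
)

shares <- counts %>%
  group_by(demographic) %>%
  # Divide each count by the "Total" count of the same demographic
  mutate(percent_recomputed = 100 * number / number[years_of_school == "Total"]) %>%
  ungroup()

shares
```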
For the earlier period (2001-2009), the data required a different cleaning approach because the Census Bureau published separate files for each race and ethnic category. Each file followed a similar structure but required careful extraction of rows. These datasets used row number indexing to assign sex categories and careful filtering to retain only the age ranges that match those in the more recent years. The indentation markers in some of the years (“.” and “..”) were removed to clean the age group labels. These datasets were then merged with the 2010-2024 data. Finally, the educational attainment categories were consolidated through the creation of a new variable, attainment_level, that combined related degree types (for example, GED and High School Diploma were combined into “HS” to represent High School, and Associate’s and Bachelor’s degrees into “Undergraduate Degree”).
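The row-index and indentation-marker steps can be illustrated on a small stand-in table. This is a sketch under stated assumptions: the table `raw`, the block size `rows_per_block` and the text labels used for `sexes` are hypothetical (the project's own data stores sex as a numeric code), and only dplyr and stringr calls are used.

```r
library(dplyr)
library(stringr)

# Hypothetical extract: three stacked blocks (both sexes, male, female), two rows each
raw <- tibble(
  age_group = c(".25 years and over", "..25 to 29 years",
                ".25 years and over", "..25 to 29 years",
                ".25 years and over", "..25 to 29 years"),
  total = c(100, 20, 48, 9, 52, 11)
)

rows_per_block <- 2  # assumption for this toy table

cleaned <- raw %>%
  mutate(
    # Remove the leading "." / ".." indentation markers
    age_group = str_remove(age_group, "^\\.+"),
    # Derive the block a row belongs to from its row number
    block = (row_number() - 1) %/% rows_per_block + 1,
    # Block 1 = both sexes, 2 = male, 3 = female (hypothetical coding)
    sexes = c("both", "male", "female")[block]
  ) %>%
  # Keep only the detailed age range needed downstream
  filter(age_group == "25 to 29 years")

cleaned
```

Deriving the block from the row number keeps the assignment in one place, but it depends on every block having the same number of rows, which would need to be checked against each source file.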
Race and ethnicity categories were reduced to only include Asian, White and Black populations. This decision was made to ensure consistency across all years, and to avoid overlapping data present in categories such as “White” and “Non-Hispanic White”, which could lead to double-counting individuals in the analysis.
# Reshape to long format
educational_2024_tidy <- educational_2024_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"
  )

# Convert count values to numeric
# Reshape back to wide format
educational_2024_tidy <- educational_2024_tidy %>%
  mutate(count = as.numeric(count)) %>%
  pivot_wider(names_from = value, values_from = count)

# Create a new variable to categorize educational attainment levels
# Add year
educational_2024_tidy <- educational_2024_tidy %>%
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  mutate(year = 2024)
Previewing first few rows of cleaned data
head(educational_2024_tidy)
#> # A tibble: 6 × 6
#> years_of_school attainment demographic number percent year
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Total Other total 229800 100 2024
#> 2 Total Other male 111700 100 2024
#> 3 Total Other female 118100 100 2024
#> 4 Total Other 25_to_34 44880 100 2024
#> 5 Total Other 35_to_54 84430 100 2024
#> 6 Total Other 55_plus 100500 100 2024
# Load educational data, skipping the first 4 header rows
educational_data_2022 <- read_excel(here::here('data_raw', '2022_Educational_Data.xlsx'), skip = 4)
Previewing first few rows of original data
# Preview the first few rows of the dataset
head(educational_data_2022)

# Reshape to long format
educational_2022_tidy <- educational_2022_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"
  )

educational_2022_tidy <- educational_2022_tidy %>%
  # Convert count values to numeric
  mutate(count = as.numeric(count)) %>%
  # Reshape back to wide format
  pivot_wider(names_from = value, values_from = count) %>%
  # Create a new variable to categorize educational attainment levels
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  # Add year
  mutate(year = 2022)
Previewing first few rows of cleaned data
head(educational_2022_tidy)
#> # A tibble: 6 × 6
#> years_of_school attainment demographic number percent year
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Total Other total 226274 100 2022
#> 2 Total Other male 109979 100 2022
#> 3 Total Other female 116296 100 2022
#> 4 Total Other 25_to_34 44583 100 2022
#> 5 Total Other 35_to_54 83321 100 2022
#> 6 Total Other 55_plus 98371 100 2022
# Load educational data, skipping the first 4 header rows
educational_data_2021 <- read_excel(here::here('data_raw', '2021_Educational_Data.xlsx'), skip = 4)
Previewing first few rows of original data
# Preview the first few rows of the dataset
head(educational_data_2021)

# Reshape to long format
educational_2021_tidy <- educational_2021_clean %>%
  pivot_longer(
    cols = -years_of_school,
    names_to = c("demographic", "value"),
    values_to = "count",
    names_pattern = "(.+)_(number|percent)"
  )

educational_2021_tidy <- educational_2021_tidy %>%
  # Convert count values to numeric
  mutate(count = as.numeric(count)) %>%
  # Reshape back to wide format
  pivot_wider(names_from = value, values_from = count) %>%
  # Create a new variable to categorize educational attainment levels
  mutate(attainment = case_when(
    str_detect(years_of_school, "no diploma") ~ "Less than High School",
    str_detect(years_of_school, "GED") ~ "GED",
    str_detect(years_of_school, "High school diploma") ~ "High School Diploma",
    str_detect(years_of_school, "no degree") ~ "Some College, No Degree",
    str_detect(years_of_school, "associate's") ~ "Associate's Degree",
    str_detect(years_of_school, "Bachelor's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "no master's") ~ "Bachelor's Degree",
    str_detect(years_of_school, "Master's") ~ "Master's Degree",
    str_detect(years_of_school, "Professional") ~ "Professional Degree",
    str_detect(years_of_school, "Doctorate") ~ "Doctoral Degree",
    TRUE ~ "Other"
  )) %>%
  select(years_of_school, attainment, demographic, number, percent) %>%
  # Add year
  mutate(year = 2021)
Previewing first few rows of cleaned data
head(educational_2021_tidy)
#> # A tibble: 6 × 6
#> years_of_school attainment demographic number percent year
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Total Other total 224580 100 2021
#> 2 Total Other male 108327 100 2021
#> 3 Total Other female 116253 100 2021
#> 4 Total Other 25_to_34 45284 100 2021
#> 5 Total Other 35_to_54 81684 100 2021
#> 6 Total Other 55_plus 97613 100 2021
The same cleaning process was applied to the remaining years, 2010 to 2020.
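Because the same steps repeat for every year, they could be wrapped in a single helper and applied to each yearly table. The sketch below is one way this might look rather than the code used in the project: `tidy_year()`, `toy_2019` and `toy_2020` are hypothetical names, the toy values are invented, and the attainment recoding step is omitted for brevity.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical helper: reshape one cleaned wide table and tag it with its year
tidy_year <- function(clean_df, yr) {
  clean_df %>%
    pivot_longer(
      cols = -years_of_school,
      names_to = c("demographic", "value"),
      values_to = "count",
      names_pattern = "(.+)_(number|percent)"
    ) %>%
    mutate(count = as.numeric(count)) %>%
    pivot_wider(names_from = value, values_from = count) %>%
    mutate(year = yr)
}

# Toy inputs standing in for two cleaned yearly tables (values are made up)
toy_2019 <- tibble(years_of_school = "Total", male_number = "10", male_percent = "100")
toy_2020 <- tibble(years_of_school = "Total", male_number = "12", male_percent = "100")

# Apply the helper to each table with its year, then stack the results
combined <- map2(list(toy_2019, toy_2020), c(2019, 2020), tidy_year) %>%
  bind_rows()

combined
```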
#> # A tibble: 6 × 6
#> sexes age_group educational_attainment count race year
#> <dbl> <chr> <chr> <dbl> <chr> <dbl>
#> 1 2 18 to 24 years total 1137 asian 2009
#> 2 2 18 to 24 years none 1 asian 2009
#> 3 2 18 to 24 years 1st-4th_grade 5 asian 2009
#> 4 2 18 to 24 years 5th-6th_grade 2 asian 2009
#> 5 2 18 to 24 years 7th-8th_grade 7 asian 2009
#> 6 2 18 to 24 years 9th_grade 11 asian 2009
# repeat for 2008 white
educational_data_2008_white <- read_excel(here::here('data_raw', '2008_Educational_Attainment_White.xls'), skip = 6)
educational_data_2008_white
#> # A tibble: 51 × 17
#> `Both Sexes` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 .18 years and o… 182714 609 1535 3162 4079 3333 4089 8134 56676 35759
#> 2 ..18 to 24 years 22056 36 45 206 205 367 609 2755 6377 8146
#> 3 .25 years and o… 160658 573 1490 2956 3874 2966 3479 5378 50299 27614
#> 4 ..25 to 29 years 16501 26 90 261 246 348 385 696 4609 3139
#> 5 ..30 to 34 years 14902 37 101 317 232 342 262 537 4099 2542
#> 6 ..35 to 39 years 16365 32 108 354 255 340 293 466 4387 2822
#> 7 ..40 to 44 years 17111 56 183 299 277 323 325 507 5156 2811
#> 8 ..45 to 49 years 18427 59 126 329 284 238 315 609 5896 3230
#> 9 ..50 to 54 years 17495 80 128 269 270 214 272 488 5584 2956
#> 10 ..55 to 59 years 15292 43 141 213 257 204 239 366 4591 2762
#> # ℹ 41 more rows
#> # ℹ 6 more variables: ...12 <dbl>, ...13 <dbl>, ...14 <dbl>, ...15 <dbl>,
#> # ...16 <dbl>, ...17 <dbl>
#> # A tibble: 6 × 6
#> age_group sexes educational_attainment count race year
#> <chr> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 18 years and over 2 total 10277 asian 2008
#> 2 18 years and over 2 none 111 asian 2008
#> 3 18 years and over 2 1st-4th_grade 98 asian 2008
#> 4 18 years and over 2 5th-6th_grade 214 asian 2008
#> 5 18 years and over 2 7th-8th_grade 170 asian 2008
#> 6 18 years and over 2 9th_grade 100 asian 2008
#> # A tibble: 6 × 6
#> age_group sexes educational_attainment count race year
#> <chr> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 18 years and over 2 total 10221 asian 2007
#> 2 18 years and over 2 none 151 asian 2007
#> 3 18 years and over 2 1st-4th_grade 129 asian 2007
#> 4 18 years and over 2 5th-6th_grade 191 asian 2007
#> 5 18 years and over 2 7th-8th_grade 189 asian 2007
#> 6 18 years and over 2 9th_grade 116 asian 2007
#> # A tibble: 6 × 6
#> age_group sexes educational_attainment count race year
#> <chr> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 15 to 17 years 2 total 473 asian 2006
#> 2 15 to 17 years 2 none 0 asian 2006
#> 3 15 to 17 years 2 1st-4th_grade 0 asian 2006
#> 4 15 to 17 years 2 5th-6th_grade 2 asian 2006
#> 5 15 to 17 years 2 7th-8th_grade 56 asian 2006
#> 6 15 to 17 years 2 9th_grade 152 asian 2006
#> # A tibble: 6 × 6
#> age_group sexes educational_attainment count race year
#> <chr> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 15 years and over 2 total 9873 asian 2005
#> 2 15 years and over 2 none 124 asian 2005
#> 3 15 years and over 2 1st-4th_grade 119 asian 2005
#> 4 15 years and over 2 5th-6th_grade 216 asian 2005
#> 5 15 years and over 2 7th-8th_grade 256 asian 2005
#> 6 15 years and over 2 9th_grade 239 asian 2005
#> # A tibble: 6 × 6
#> age_group sexes educational_attainment count race year
#> <chr> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 15 years and over 2 total 9592 asian 2004
#> 2 15 years and over 2 none 135 asian 2004
#> 3 15 years and over 2 1st-4th_grade 109 asian 2004
#> 4 15 years and over 2 5th-6th_grade 211 asian 2004
#> 5 15 years and over 2 7th-8th_grade 258 asian 2004
#> 6 15 years and over 2 9th_grade 265 asian 2004
#> # A tibble: 6 × 6
#> age_group sexes educational_attainment count race year
#> <chr> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 15 years and over 2 total 9328 asian 2003
#> 2 15 years and over 2 none 110 asian 2003
#> 3 15 years and over 2 1st-4th_grade 113 asian 2003
#> 4 15 years and over 2 5th-6th_grade 210 asian 2003
#> 5 15 years and over 2 7th-8th_grade 225 asian 2003
#> 6 15 years and over 2 9th_grade 250 asian 2003
# split and reattach: all races, both sexes, 2002
split_row_1 <- 15
split_row_2 <- 34
split_row_3 <- 48
allraces_bothsexes_2002 <- educational_data_2002 %>%
  slice(1:split_row_1)
allraces_bothsexes_2002_2 <- educational_data_2002 %>%
  slice(split_row_2:split_row_3)

# renaming the columns for the first half of the allraces/sexes 2002 data
allraces_bothsexes_2002 <- allraces_bothsexes_2002 %>%
  set_names(c(
    "age_group", "total", "none", "1st-4th_grade", "5th-6th_grade", "7th-8th_grade",
    "9th_grade", "10th_grade", "11th_grade", "HS_graduate"
  ))

# renaming the columns for the second half of the allraces/sexes 2002 data
allraces_bothsexes_2002_2 <- allraces_bothsexes_2002_2 %>%
  set_names(c(
    "age_group", "some_college", "associates_degree_occupational", "associates_degree_academic",
    "bachelors_degree", "masters_degree", "professional_degree", "doctoral_degree",
    "drop", "drop"
  )) %>%
  select(
    "age_group", "some_college", "associates_degree_occupational", "associates_degree_academic",
    "bachelors_degree", "masters_degree", "professional_degree", "doctoral_degree"
  )

# collapsing age
allraces_bothsexes_2002_merged <- allraces_bothsexes_2002_merged %>%
  filter(age_group != "15 years and over") %>%
  filter(age_group != "15 to 17 years") %>%
  filter(age_group != "18 to 19 years") %>%
  filter(age_group != "20 to 24 years") %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

# collapsing attainment
allraces_bothsexes_2002_merged_3 <- allraces_bothsexes_2002_merged_2 %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "count" ~ "Count",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE), # Sum the counts
    .groups = 'drop'                  # Remove grouping
  )
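The effect of collapsing detailed levels and then summing can be seen on a toy table. This sketch uses a named lookup vector in place of the longer case_when call, which is a swapped-in technique for brevity and not the method used in the project; the names `toy`, `lookup` and `collapsed` and all counts are hypothetical.

```r
library(dplyr)

# Toy long-format counts for one age group (made-up values)
toy <- tibble(
  age_group = "25_to_34",
  educational_attainment = c("9th_grade", "10th_grade", "HS_graduate",
                             "associates_degree_academic", "bachelors_degree"),
  count = c(3, 4, 20, 6, 15)
)

# Detailed label -> consolidated label
lookup <- c(
  "9th_grade"                  = "Less than HS",
  "10th_grade"                 = "Less than HS",
  "HS_graduate"                = "HS",
  "associates_degree_academic" = "Undergraduate Degree",
  "bachelors_degree"           = "Undergraduate Degree"
)

collapsed <- toy %>%
  # Replace each detailed label with its consolidated label
  mutate(educational_attainment = unname(lookup[educational_attainment])) %>%
  # Rows that now share a label are summed into one row
  group_by(age_group, educational_attainment) %>%
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop")

collapsed
```

As with a case_when call that has no default branch, any label missing from the lookup becomes NA, so it is worth checking for NA values after recoding.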
# load and rename
split_row_4 <- 65
split_row_5 <- 81
allraces_male_2002 <- educational_data_2002 %>%
  slice(split_row_4:split_row_5)
split_row_6 <- 99
split_row_7 <- 114
allraces_male_2002_2 <- educational_data_2002 %>%
  slice(split_row_6:split_row_7)

# renaming the columns for the first half of the allraces/male 2002 data
allraces_male_2002 <- allraces_male_2002 %>%
  set_names(c(
    "age_group", "total", "none", "1st-4th_grade", "5th-6th_grade", "7th-8th_grade",
    "9th_grade", "10th_grade", "11th_grade", "HS_graduate"
  ))

# renaming the columns for the second half of the allraces/male 2002 data
allraces_male_2002_2 <- allraces_male_2002_2 %>%
  set_names(c(
    "age_group", "some_college", "associates_degree_occupational", "associates_degree_academic",
    "bachelors_degree", "masters_degree", "professional_degree", "doctoral_degree",
    "drop", "drop"
  )) %>%
  select(
    "age_group", "some_college", "associates_degree_occupational", "associates_degree_academic",
    "bachelors_degree", "masters_degree", "professional_degree", "doctoral_degree"
  )

allraces_male_2002_merged_3 <- allraces_male_2002_merged_2 %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "count" ~ "Count",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE), # Sum the counts
    .groups = 'drop'                  # Remove grouping
  )
split_row_8 <- 131
split_row_9 <- 147
allraces_female_2002 <- educational_data_2002 %>%
  slice(split_row_8:split_row_9)
split_row_10 <- 164
split_row_11 <- 180
allraces_female_2002_2 <- educational_data_2002 %>%
  slice(split_row_10:split_row_11)

# renaming the columns for the first half of the allraces/female 2002 data
allraces_female_2002 <- allraces_female_2002 %>%
  set_names(c(
    "age_group", "total", "none", "1st-4th_grade", "5th-6th_grade", "7th-8th_grade",
    "9th_grade", "10th_grade", "11th_grade", "HS_graduate"
  ))

# renaming the columns for the second half of the allraces/female 2002 data
allraces_female_2002_2 <- allraces_female_2002_2 %>%
  set_names(c(
    "age_group", "some_college", "associates_degree_occupational", "associates_degree_academic",
    "bachelors_degree", "masters_degree", "professional_degree", "doctoral_degree",
    "drop", "drop"
  )) %>%
  select(
    "age_group", "some_college", "associates_degree_occupational", "associates_degree_academic",
    "bachelors_degree", "masters_degree", "professional_degree", "doctoral_degree"
  )

allraces_female_2002_long <- allraces_female_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "count" ~ "Count",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE), # Sum the counts
    .groups = 'drop'                  # Remove grouping
  )
# Non-Hispanic White
split_row_12 <- 197
split_row_13 <- 213
nonhispanicwhite_bothsexes_2002 <- educational_data_2002 %>%
  slice(split_row_12:split_row_13)
split_row_14 <- 230
split_row_15 <- 246
nonhispanicwhite_bothsexes_2002_2 <- educational_data_2002 %>%
  slice(split_row_14:split_row_15)

# renaming the columns for the first half of the nonhispanicwhite/bothsexes 2002 data
nonhispanicwhite_bothsexes_2002 <- nonhispanicwhite_bothsexes_2002 %>%
  set_names(c(
    "age_group", "total", "none", "1st-4th_grade", "5th-6th_grade", "7th-8th_grade",
    "9th_grade", "10th_grade", "11th_grade", "HS_graduate"
  ))

# renaming the columns for the second half of the nonhispanicwhite/bothsexes 2002 data
nonhispanicwhite_bothsexes_2002_2 <- nonhispanicwhite_bothsexes_2002_2 %>%
  set_names(c(
    "age_group", "some_college", "associates_degree_occupational", "associates_degree_academic",
    "bachelors_degree", "masters_degree", "professional_degree", "doctoral_degree",
    "drop", "drop"
  )) %>%
  select(
    "age_group", "some_college", "associates_degree_occupational", "associates_degree_academic",
    "bachelors_degree", "masters_degree", "professional_degree", "doctoral_degree"
  )

nonhispanicwhite_bothsexes_2002_long <- nonhispanicwhite_bothsexes_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "count" ~ "Count",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE), # Sum the counts
    .groups = 'drop'                  # Remove grouping
  )
split_row_16 <- 263
split_row_17 <- 279
nonhispanicwhite_male_2002 <- educational_data_2002 %>%
  slice(split_row_16:split_row_17)
split_row_18 <- 296
split_row_19 <- 312
nonhispanicwhite_male_2002_2 <- educational_data_2002 %>%
  slice(split_row_18:split_row_19)

# renaming the columns for the first half of the nonhispanicwhite/male 2002 data
nonhispanicwhite_male_2002 <- nonhispanicwhite_male_2002 %>%
  set_names(c(
    "age_group", "total", "none", "1st-4th_grade", "5th-6th_grade", "7th-8th_grade",
    "9th_grade", "10th_grade", "11th_grade", "HS_graduate"
  ))

# renaming the columns for the second half of the nonhispanicwhite/male 2002 data
nonhispanicwhite_male_2002_2 <- nonhispanicwhite_male_2002_2 %>%
  set_names(c(
    "age_group", "some_college", "associates_degree_occupational", "associates_degree_academic",
    "bachelors_degree", "masters_degree", "professional_degree", "doctoral_degree",
    "drop", "drop"
  )) %>%
  select(
    "age_group", "some_college", "associates_degree_occupational", "associates_degree_academic",
    "bachelors_degree", "masters_degree", "professional_degree", "doctoral_degree"
  )

# drop the first six rows of each half
nonhispanicwhite_male_2002_2 <- nonhispanicwhite_male_2002_2 %>%
  slice(-(1:6))
nonhispanicwhite_male_2002 <- nonhispanicwhite_male_2002 %>%
  slice(-(1:6))

nonhispanicwhite_male_2002_wide <- nonhispanicwhite_male_2002_wide %>%
  mutate(age_group = case_when(
    age_group == "25 to 29 years" ~ "25_to_34",
    age_group == "30 to 34 years" ~ "25_to_34",
    age_group == "35 to 39 years" ~ "35_to_54",
    age_group == "40 to 44 years" ~ "35_to_54",
    age_group == "45 to 49 years" ~ "35_to_54",
    age_group == "50 to 54 years" ~ "35_to_54",
    age_group == "55 to 59 years" ~ "55_plus",
    age_group == "60 to 64 years" ~ "55_plus",
    age_group == "65 to 69 years" ~ "55_plus",
    age_group == "70 to 74 years" ~ "55_plus",
    age_group == "75 years and over" ~ "55_plus"
  ))

nonhispanicwhite_male_2002_long <- nonhispanicwhite_male_2002_long %>%
  mutate(educational_attainment = case_when(
    educational_attainment == "1st-4th_grade" ~ "Less than HS",
    educational_attainment == "5th-6th_grade" ~ "Less than HS",
    educational_attainment == "7th-8th_grade" ~ "Less than HS",
    educational_attainment == "9th_grade" ~ "Less than HS",
    educational_attainment == "10th_grade" ~ "Less than HS",
    educational_attainment == "11th_grade" ~ "Less than HS",
    educational_attainment == "HS_graduate" ~ "HS",
    educational_attainment == "some_college" ~ "Some College, No Degree",
    educational_attainment == "associates_degree_occupational" ~ "Undergraduate Degree",
    educational_attainment == "associates_degree_academic" ~ "Undergraduate Degree",
    educational_attainment == "bachelors_degree" ~ "Undergraduate Degree",
    educational_attainment == "masters_degree" ~ "Graduate Degree",
    educational_attainment == "professional_degree" ~ "Graduate Degree",
    educational_attainment == "doctoral_degree" ~ "Doctoral Degree",
    educational_attainment == "count" ~ "Count",
    educational_attainment == "none" ~ "None"
  )) %>%
  group_by(sexes, age_group, educational_attainment, race, year) %>%
  summarize(
    count = sum(count, na.rm = TRUE), # Sum the counts
    .groups = 'drop'                  # Remove grouping
  )
Code
split_row_20 =329split_row_21 =345nonhispanicwhite_female_2002 <-educational_data_2002 %>%slice(split_row_20:split_row_21)split_row_22 =362split_row_23 =378nonhispanicwhite_female_2002_2 <-educational_data_2002 %>%slice(split_row_22:split_row_23)#renaming the columns for the first half of the nonhispanicwhite/bothsexes 2002 datanonhispanicwhite_female_2002 <- nonhispanicwhite_female_2002%>%set_names(c("age_group","total","none","1st-4th_grade","5th-6th_grade","7th-8th_grade","9th_grade","10th_grade","11th_grade","HS_graduate" ))#renaming the columns for the second half of the nonhispanicwhite/bothsexes 2002 datanonhispanicwhite_female_2002_2 <- nonhispanicwhite_female_2002_2%>%set_names(c("age_group","some_college","associates_degree_occupational","associates_degree_academic","bachelors_degree","masters_degree","professional_degree","doctoral_degree","drop","drop" )) %>%select("age_group","some_college","associates_degree_occupational","associates_degree_academic","bachelors_degree","masters_degree","professional_degree","doctoral_degree" )nonhispanicwhite_female_2002_2 <- nonhispanicwhite_female_2002_2 %>%slice(-(1:6))nonhispanicwhite_female_2002 <- nonhispanicwhite_female_2002 %>%slice(-(1:6))
# Collapse the five-year age bands into the three age groups used in the analysis.
# Labels that are not listed become NA, as with a case_when() that has no default.
recode_age_group <- function(df) {
  df %>%
    mutate(age_group = case_when(
      age_group %in% c("25 to 29 years", "30 to 34 years") ~ "25_to_34",
      age_group %in% c("35 to 39 years", "40 to 44 years",
                       "45 to 49 years", "50 to 54 years") ~ "35_to_54",
      age_group %in% c("55 to 59 years", "60 to 64 years", "65 to 69 years",
                       "70 to 74 years", "75 years and over") ~ "55_plus"
    ))
}

nonhispanicwhite_female_2002_wide <- recode_age_group(nonhispanicwhite_female_2002_wide)
nonhispanicblack_bothsexes_2002_wide <- recode_age_group(nonhispanicblack_bothsexes_2002_wide)
nonhispanicblack_male_2002_wide <- recode_age_group(nonhispanicblack_male_2002_wide)
nonhispanicblack_female_2002_wide <- recode_age_group(nonhispanicblack_female_2002_wide)
asian_bothsexes_2002_wide <- recode_age_group(asian_bothsexes_2002_wide)
asian_male_2002_wide <- recode_age_group(asian_male_2002_wide)
asian_female_2002_wide <- recode_age_group(asian_female_2002_wide)
hispanic_bothsexes_2002_wide <- recode_age_group(hispanic_bothsexes_2002_wide)
hispanic_male_2002_wide <- recode_age_group(hispanic_male_2002_wide)
hispanic_female_2002_wide <- recode_age_group(hispanic_female_2002_wide)
white_bothsexes_2002_wide <- recode_age_group(white_bothsexes_2002_wide)
white_male_2002_wide <- recode_age_group(white_male_2002_wide)
white_female_2002_wide <- recode_age_group(white_female_2002_wide)
justblack_bothsexes_2002_wide <- recode_age_group(justblack_bothsexes_2002_wide)
justblack_male_2002_wide <- recode_age_group(justblack_male_2002_wide)
justblack_female_2002_wide <- recode_age_group(justblack_female_2002_wide)
#> year attainment_level demographic number
#> Min. :2001 Length:1395 Length:1395 Min. : 96
#> 1st Qu.:2006 Class :character Class :character 1st Qu.: 3927
#> Median :2012 Mode :character Mode :character Median : 14247
#> Mean :2012 Mean : 28751
#> 3rd Qu.:2018 3rd Qu.: 31158
#> Max. :2024 Max. :357668
educational_data_combined %>%
  filter(demographic %in% c("male", "female")) %>%
  ggplot(aes(x = demographic, y = number / 1000, fill = demographic)) +
  geom_col(alpha = 0.7) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
  labs(
    title = "Distribution of Male and Female Respondents",
    subtitle = "Female population slightly larger across all education levels, 2001 - 2024",
    caption = "Source: U.S Census Bureau",
    x = "Gender",
    y = "Count (in millions)"
  ) +
  theme_classic() +
  scale_fill_viridis_d() +
  theme(
    legend.position = "none",
    panel.grid = element_blank(),
    axis.ticks = element_blank(),
    plot.title = element_text(size = 22, face = "bold"),
    plot.subtitle = element_text(size = 16),
    axis.title.x = element_text(size = 18),
    axis.title.y = element_text(size = 18),
    axis.text = element_text(size = 15),
    plot.caption = element_text(size = 13)
  )
The bar chart above compares total counts (in millions) for females and males across all education levels and years (2001-2024). It shows a slightly larger female population than male population in this dataset.
Between 2001 and 2024, the United States saw a general rise in educational attainment. The slope chart reveals the extent of the change: undergraduate degree holders grew from 184 million to 316 million, while individuals without a high school diploma fell from 113 million to 78 million, making it the only category to decline. Similarly, graduate degree holders climbed from 54 million to 116 million, and doctoral degree holders from 9 million to 20 million.
label_2003 <- "2003 CPS question redesign:\nShift to 'highest degree attained'"
label_2010 <- "2010 ACS transition:\nNew primary data source"

attainment_plot <- educational_data_combined %>%
  filter(attainment_level != "Total") %>%
  group_by(year, attainment_level) %>%
  summarise(total_number = sum(number), .groups = "drop") %>%
  ggplot(aes(x = year, y = total_number / 1000, color = attainment_level, group = attainment_level)) +
  geom_line(size = 1) +
  # Label each line at its final year
  geom_text_repel(
    data = . %>% filter(year == max(year)),
    aes(label = attainment_level),
    hjust = -0.1, size = 5, nudge_x = 0, direction = "y", show.legend = FALSE
  ) +
  geom_vline(xintercept = 2008, linetype = "dashed", color = "black") +
  annotate("text", x = 2008, y = 680, label = "2008 Great Recession",
           vjust = 0, hjust = 0, size = 5, fontface = "bold") +
  geom_vline(xintercept = 2020, linetype = "dashed", color = "black") +
  annotate("text", x = 2020, y = 680, label = "COVID-19 Pandemic",
           vjust = 0, hjust = 0, size = 5, fontface = "bold") +
  # Annotation for 2003
  geom_curve(
    data = data.frame(x = 2004, xend = 2002.5, y = 600, yend = 500),
    mapping = aes(x = x, xend = xend, y = y, yend = yend), inherit.aes = FALSE,
    color = 'grey40', size = 0.5, curvature = -0.1,
    arrow = arrow(length = unit(0.015, "npc"), type = "closed")
  ) +
  geom_label(
    data = data.frame(x = 2001.5, y = 600, label = label_2003),
    mapping = aes(x = x, y = y, label = label), inherit.aes = FALSE,
    hjust = 0, lineheight = 0.9, size = 4
  ) +
  # Annotation for 2010
  geom_curve(
    data = data.frame(x = 2011, xend = 2009.5, y = 480, yend = 330),
    mapping = aes(x = x, xend = xend, y = y, yend = yend), inherit.aes = FALSE,
    color = 'grey40', size = 0.7, curvature = -0.35,
    arrow = arrow(length = unit(0.01, "npc"), type = "closed")
  ) +
  geom_label(
    data = data.frame(x = 2009.5, y = 480, label = label_2010),
    mapping = aes(x = x, y = y, label = label), inherit.aes = FALSE,
    hjust = 0, lineheight = 0.9, size = 4
  ) +
  scale_x_continuous(breaks = seq(2001, 2025, 5), expand = expansion(add = c(1, 8))) +
  scale_y_continuous(limits = c(0, 685)) +
  theme_half_open(font_size = 18) +
  theme(
    legend.position = 'none',
    panel.grid = element_blank(),
    axis.ticks = element_blank(),
    plot.title = element_text(size = 22, face = "bold"),
    plot.subtitle = element_text(size = 16),
    axis.title.x = element_text(size = 18),
    axis.title.y = element_text(size = 18),
    axis.text = element_text(size = 15),
    plot.caption = element_text(size = 13)
  ) +
  labs(
    x = 'Year',
    y = 'Count (millions)',
    title = 'Educational Attainment Trends, 2001 - 2024',
    subtitle = "Methodology changes in 2003 and 2010 explain sharp drops",
    caption = "Source: U.S Census Bureau"
  )

attainment_plot
The sharp decreases in 2002-2003 and 2009-2010 reflect changes in the Census Bureau’s methodology, that is, in how educational attainment data are collected and reported. In 2003, the Current Population Survey (CPS) revised its questions, which changed the totals relative to earlier years. The Census Bureau’s Educational Attainment in the United States: 2003 report explains that the CPS shifted from measuring “years of schooling completed” to asking for the highest grade or degree completed (Stroops, 2021). In 2010, data collection and reporting transitioned from the CPS to the American Community Survey (ACS). The Census Bureau began emphasizing the ACS as the primary source for educational attainment statistics, stating that for 2009 and earlier, data from the Annual Social and Economic Supplement (ASEC) to the CPS were used (Ryan & Siebens, 2021).
Examining percentage changes reveals which educational levels had the largest growth. Graduate and doctoral degrees more than doubled, increasing 114.6% and 139.1% respectively, the highest growth rates of any category. Undergraduate degree holders increased by 71.6%, while high school graduates grew just 9.9%. Adults without a high school diploma declined by roughly 31%, indicating that more individuals hold at least a GED or high school diploma. The small percentage change in the “Some College, No Degree” category suggests that more individuals are completing degrees rather than stopping partway.
The following charts examine how educational attainment evolved within each racial group. They show the changing proportions of educational achievement from 2001 to 2024, revealing both progress and persistent challenges.
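As a rough illustration of how these growth rates are obtained, the sketch below computes percent change from the rounded counts (in millions) quoted earlier in this report. The helper name `percent_change` and the small `counts` table are assumptions made for this example, not objects from the report's own code. Because the inputs are rounded, the results only approximate the percentages reported above, which presumably come from the unrounded counts.

```r
# Percent change from a start value to an end value.
percent_change <- function(start, end) {
  (end - start) / start * 100
}

# Rounded counts in millions as quoted in the text (2001 and 2024).
# Illustrative table only; the report's figures use unrounded counts.
counts <- data.frame(
  attainment_level = c("Undergraduate Degree", "Less than HS", "Graduate Degree"),
  start_2001 = c(184, 113, 54),
  end_2024 = c(316, 78, 116)
)

counts$pct_change <- round(percent_change(counts$start_2001, counts$end_2024), 1)
print(counts)
```

A negative value marks a category that shrank, which here applies only to “Less than HS”.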
# install.packages("waffle")
library(waffle)
library(grid)

white_data <- educational_data_combined %>%
  filter(number != ".", demographic == "white", year %in% c(2001, 2024), attainment_level != "Total") %>%
  mutate(
    year = as.integer(year),
    attainment_level = factor(
      attainment_level,
      levels = c("Less than HS", "HS", "Some College, No Degree",
                 "Undergraduate Degree", "Graduate Degree", "Doctoral Degree")
    )
  ) %>%
  group_by(year, attainment_level) %>%
  summarise(number = sum(as.numeric(number)), .groups = "drop") %>%
  group_by(year) %>%
  mutate(
    percent = number / sum(number) * 100,
    squares = round(percent) # normalize to 100 squares
  )

# Identify largest
white_labels <- white_data %>%
  group_by(year) %>%
  slice_max(order_by = percent, n = 1) %>%
  mutate(label = paste0(attainment_level, ":\n ", round(percent, 1), "%"), size = 4)

# Plot
ggplot(white_data, aes(fill = attainment_level, values = squares)) +
  geom_waffle(color = "white", size = 0.5, n_rows = 10, flip = TRUE) +
  facet_wrap(~year, ncol = 2, strip.position = "top") +
  scale_fill_viridis_d(drop = FALSE) +
  theme_minimal() +
  theme(
    panel.spacing = unit(4, "lines"),
    strip.text = element_text(size = 18, face = "bold"),
    panel.grid = element_blank(),
    axis.ticks = element_blank(),
    plot.title = element_text(size = 22, face = "bold"),
    plot.subtitle = element_text(size = 16),
    axis.title.x = element_text(size = 18),
    axis.title.y = element_text(size = 18),
    axis.text = element_text(size = 15),
    plot.caption = element_text(size = 13)
  ) +
  labs(
    title = "White Americans Shift from High School to College Degrees",
    subtitle = "High School dropout rate cut in half while graduate degrees rise",
    caption = "Each square = 1% of population \n\nSource: U.S Census Bureau",
    fill = "Degree"
  )
At the start of the 21st century, a notable share of White Americans left school without a diploma. In 2001, a little over 15% had not obtained a high school diploma or equivalent. By 2024, this share had fallen to about 8%, nearly cut in half. Visually, the shares of high school, some college and undergraduate degrees appear about the same, though the balance shifts slightly between partial college and graduate degrees. Doctoral attainment, though still a small slice, doubles.
# install.packages("waffle")
library(waffle)
library(grid)

asian_data <- educational_data_combined %>%
  filter(number != ".", demographic == "asian", year %in% c(2001, 2024), attainment_level != "Total") %>%
  mutate(
    year = as.integer(year),
    attainment_level = factor(
      attainment_level,
      levels = c("Less than HS", "HS", "Some College, No Degree",
                 "Undergraduate Degree", "Graduate Degree", "Doctoral Degree")
    )
  ) %>%
  group_by(year, attainment_level) %>%
  summarise(number = sum(as.numeric(number)), .groups = "drop") %>%
  group_by(year) %>%
  mutate(
    percent = number / sum(number) * 100,
    squares = round(percent) # normalize to 100 squares
  )

# Identify largest
asian_labels <- asian_data %>%
  group_by(year) %>%
  slice_max(order_by = percent, n = 1) %>%
  mutate(label = paste0(attainment_level, ":\n ", round(percent, 1), "%"), size = 4)

# Plot
ggplot(asian_data, aes(fill = attainment_level, values = squares)) +
  geom_waffle(color = "white", size = 0.5, n_rows = 10, flip = TRUE) +
  facet_wrap(~year, ncol = 2, strip.position = "top") +
  scale_fill_viridis_d(drop = FALSE) +
  theme_minimal() +
  theme(
    panel.spacing = unit(4, "lines"),
    strip.text = element_text(size = 18, face = "bold"),
    panel.grid = element_blank(),
    axis.ticks = element_blank(),
    plot.title = element_text(size = 22, face = "bold"),
    plot.subtitle = element_text(size = 16),
    axis.title.x = element_text(size = 18),
    axis.title.y = element_text(size = 18),
    axis.text = element_text(size = 15),
    plot.caption = element_text(size = 13)
  ) +
  labs(
    title = "Asian Population Sees Staggering Improvement",
    subtitle = "Graduate and Doctoral degrees nearly double",
    caption = "Each square = 1% of population \n\nSource: U.S Census Bureau",
    fill = "Degree"
  )
For Asians, the story reveals a rather staggering improvement in the proportion of those who attained at least a high school diploma. Looking at the darkest purple section at the bottom of the graph, about 12% had not finished high school in 2001. By 2024, that share had decreased to 8%.
Another area with visually significant growth is the lightest green, which corresponds to individuals who obtained graduate degrees. In 2001, just over one row is light green, roughly 12%. By 2024, it stretches just over two rows, representing about 23%. In just over two decades, the proportion of Asians with graduate degrees almost doubled.
Finally, the yellow section at the top, doctoral attainment, also shows growth. Though always a small subset of the population, the share climbed from just under 3% in 2001 to about 5% in 2024.
# install.packages("waffle")
library(waffle)
library(grid)

black_data <- educational_data_combined %>%
  filter(number != ".", demographic == "black", year %in% c(2001, 2024), attainment_level != "Total") %>%
  mutate(
    year = as.integer(year),
    attainment_level = factor(
      attainment_level,
      levels = c("Less than HS", "HS", "Some College, No Degree",
                 "Undergraduate Degree", "Graduate Degree", "Doctoral Degree")
    )
  ) %>%
  group_by(year, attainment_level) %>%
  summarise(number = sum(as.numeric(number)), .groups = "drop") %>%
  group_by(year) %>%
  mutate(
    percent = number / sum(number) * 100,
    squares = round(percent) # normalize to 100 squares
  )

# Plot
ggplot(black_data, aes(fill = attainment_level, values = squares)) +
  geom_waffle(color = "white", size = 0.5, n_rows = 10, flip = TRUE) +
  facet_wrap(~year, ncol = 2, strip.position = "top") +
  scale_fill_viridis_d(drop = FALSE) +
  theme_minimal() +
  theme(
    panel.spacing = unit(4, "lines"),
    strip.text = element_text(size = 18, face = "bold"),
    panel.grid = element_blank(),
    axis.ticks = element_blank(),
    plot.title = element_text(size = 22, face = "bold"),
    plot.subtitle = element_text(size = 16),
    axis.title.x = element_text(size = 18),
    axis.title.y = element_text(size = 18),
    axis.text = element_text(size = 15),
    plot.caption = element_text(size = 13)
  ) +
  labs(
    title = "Black Americans Make Major Educational Gains",
    subtitle = "Dropout rate falls, but college completion still trails other groups",
    caption = "Each square = ~1% of population \n\nSource: U.S Census Bureau",
    fill = "Degree"
  )
The graph above has slightly more than 100 squares because each category's percentage is rounded independently. However, each square still represents about one percent.
From 2001 to 2024, there is a significant improvement in high school graduation. The proportion of Black individuals with less than a high school diploma dropped from 21% to just under 9%, more than halving. High school attainment itself accounts for about one-third of the Black population in both years. Undergraduate attainment inches upward from about 2% in 2001 to about 3% in 2024. Graduate degree attainment, however, more than doubles, rising from approximately 4% to 10%, as does doctoral attainment, which rises from 1-2% to around 4%.
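One common way to avoid a square count that drifts away from 100 is largest-remainder rounding: take the floor of every share, then hand the leftover squares to the categories with the largest fractional parts. The sketch below is not the method used in this report, which applies `round(percent)` directly. The function name `largest_remainder` and the example shares are made up for illustration.

```r
# Largest-remainder rounding: integer counts that always sum to `total`.
largest_remainder <- function(percent, total = 100) {
  scaled <- percent / sum(percent) * total  # rescale so shares sum to `total`
  squares <- floor(scaled)                  # start from the rounded-down counts
  leftover <- total - sum(squares)          # squares still to hand out
  if (leftover > 0) {
    # give one extra square to the categories with the largest fractional part
    extra <- order(scaled - squares, decreasing = TRUE)[seq_len(leftover)]
    squares[extra] <- squares[extra] + 1
  }
  squares
}

# Hypothetical shares for six attainment levels (not taken from the data)
shares <- c(20.4, 30.4, 15.4, 20.4, 10.4, 3.0)
squares <- largest_remainder(shares)
sum(squares)
```

In the waffle code, a column built this way could stand in for `squares = round(percent)` within each year, at the cost of a square occasionally differing by one from the conventionally rounded value.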
Educational Attainment Trends Across Racial Groups (2024)
Having seen how each racial group progressed over time, we can now compare them directly. The 2024 chart reveals differences in educational attainment across racial categories.
educational_percent <- educational_data_combined %>%
  filter(number != ".", year == 2024, attainment_level != "Total",
         demographic %in% c("white", "black", "asian")) %>%
  mutate(number = as.numeric(number)) %>%
  group_by(demographic, attainment_level) %>%
  summarise(total = sum(number), .groups = "drop") %>%
  # share of each attainment level within a racial group
  group_by(demographic) %>%
  mutate(percent = round(total / sum(total) * 100, 2)) %>%
  ungroup() %>%
  # flag the group with the highest share at each attainment level
  group_by(attainment_level) %>%
  mutate(is_highest = percent == max(percent)) %>%
  ungroup() %>%
  mutate(
    fill_color = case_when(
      is_highest & demographic == "asian" ~ "asian",
      is_highest & demographic == "black" ~ "black",
      is_highest & demographic == "white" ~ "white",
      TRUE ~ "other"
    ),
    attainment_level = factor(
      attainment_level,
      levels = c("Doctoral Degree", "Graduate Degree", "Undergraduate Degree",
                 "Some College, No Degree", "HS", "Less than HS")
    )
  )

ggplot(educational_percent, aes(x = demographic, y = percent, fill = fill_color)) +
  geom_col(width = 0.7) +
  geom_text(
    data = educational_percent %>% filter(is_highest),
    aes(label = paste0(demographic, "\n", percent, "%")),
    vjust = -0.5,
    size = 4.5
  ) +
  facet_wrap(vars(attainment_level), nrow = 2) +
  scale_fill_manual(
    values = c("asian" = "#440154", "black" = "#31688e", "white" = "#fde725", "other" = "grey80"),
    breaks = c("asian", "black", "white"),
    labels = c("Asian", "Black", "White")
  ) +
  scale_y_continuous(limits = c(0, 48), expand = expansion(mult = c(0, 0.05))) +
  theme_minimal_hgrid(font_size = 16) +
  theme(
    legend.position = "none",
    panel.spacing = unit(4, "lines"),
    panel.grid = element_blank(),
    axis.ticks = element_blank(),
    plot.title = element_text(size = 22, face = "bold"),
    plot.subtitle = element_text(size = 16),
    axis.title.x = element_text(size = 18),
    axis.title.y = element_text(size = 18),
    axis.text = element_text(size = 15),
    plot.caption = element_text(size = 13)
  ) +
  labs(
    title = "2024: Asian Americans Outpace Other Groups in Advanced Level Degrees",
    subtitle = "Black Americans most likely to stop at high school",
    y = " Percentage (%) ",
    x = " ",
    caption = "Source: U.S Census Bureau"
  )
By 2024, the contrasts across racial groups are striking. Asian populations show strong gains in advanced degrees, while the Black population continues to face steeper hurdles.
Looking closely at the Black population in 2024, most of the story unfolds at lower educational attainment levels. The Black population shows the highest concentration in the “High School” category at approximately 34%, and in “Some College, No Degree” at about 17%, both higher than in the White and Asian populations. This may point to systemic barriers that prevent degree completion even when higher education is attempted. These patterns suggest that college completion, which is considered a gateway to middle-class economic opportunity, remains a bigger challenge for Black individuals than for other racial groups, even when they successfully finish high school.
Gender patterns in educational attainment partly reversed between 2001 and 2024. In 2001, men held advantages at the highest educational levels, leading women in graduate and doctoral degrees by 1.4 and 1.2 percentage points respectively. By 2024, women had not only closed the graduate degree gap but reversed it, now surpassing men in graduate degree attainment, and had nearly closed the doctoral gap. Women’s undergraduate advantage has also grown from 0.2 to 2.8 percentage points.
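The gender gaps quoted here are differences between two shares, so they are most naturally read as percentage-point gaps. A minimal base R sketch of that calculation follows. The `shares` table and the function `gap_by_year` are hypothetical, and the share values are invented so that the 2001 gap matches the 1.4-point male lead mentioned above. They are not taken from the Census data.

```r
# Hypothetical graduate-degree shares (% of each sex), for illustration only
shares <- data.frame(
  sex = c("male", "female", "male", "female"),
  year = c(2001, 2001, 2024, 2024),
  share = c(6.0, 4.6, 9.0, 10.5)
)

# Female share minus male share, in percentage points, for each year.
# Negative values mean men lead; positive values mean women lead.
gap_by_year <- function(df) {
  female <- df[df$sex == "female", ]
  male <- df[df$sex == "male", ]
  merged <- merge(female, male, by = "year", suffixes = c("_female", "_male"))
  data.frame(
    year = merged$year,
    gap_pp = merged$share_female - merged$share_male
  )
}

gaps <- gap_by_year(shares)
print(gaps)
```

A sign change in `gap_pp` between the two years is what a reversal of the gap looks like in this representation.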
Educational Attainment Trends by Age
educational_data_combined %>%
  filter(demographic %in% c("25_to_34", "35_to_54", "55_plus")) %>%
  filter(attainment_level != "Other", attainment_level != "Total") %>%
  mutate(attainment_level = factor(
    attainment_level,
    levels = c("Doctoral Degree", "Graduate Degree", "Undergraduate Degree",
               "Some College, No Degree", "HS", "Less than HS")
  )) %>%
  mutate(demographic = factor(
    demographic,
    levels = c("25_to_34", "35_to_54", "55_plus"),
    labels = c("25 to 34", "35 to 54", "55 Plus")
  )) %>%
  group_by(demographic, attainment_level) %>%
  summarise(total = sum(number)) %>%
  ggplot(aes(x = demographic, y = total / 1000, fill = demographic)) +
  geom_col(position = "dodge") +
  facet_wrap(~attainment_level) +
  coord_flip() +
  labs(
    title = "Midlife Americans Lead in College Degree Attainment",
    subtitle = "Advanced degrees peak in midlife; adults 55 years and older less likely to pursue higher education",
    x = "Age group ",
    y = "Count (millions)",
    fill = "Age",
    caption = "\nSource: U.S. Census Bureau"
  ) +
  theme_minimal_hgrid() +
  theme(
    axis.text.x = element_text(hjust = 1),
    panel.grid.minor = element_blank(),
    panel.spacing = unit(4, "lines")
  ) +
  scale_fill_viridis_d() +
  theme(
    axis.text.x = element_text(hjust = 1),
    panel.grid.minor = element_blank(),
    plot.title = element_text(size = 20, face = "bold"),
    plot.subtitle = element_text(size = 17),
    axis.title.x = element_blank(),
    axis.title.y = element_text(size = 18),
    axis.text = element_text(size = 15),
    legend.position = "none",
    plot.caption = element_text(size = 13)
  )
Following the hierarchy of educational attainment, the distribution of attainment across age groups is shaped by cohort size and generational timing. The 35-54 age group shows the highest counts for undergraduate and graduate degrees, reflecting both its large size and the expansion of higher education during its formative years.
Doctoral degree holders are concentrated among those aged 35 and over, with similar counts in the 35 to 54 and 55 and over age groups. This reflects the timeline for doctoral degree completion and the accumulation of degree holders who completed their doctorates in previous decades.
The 25-34 age group shows smaller numbers across all education levels, possibly because many are still pursuing degrees. By midlife, the drive to pursue higher education often peaks, and it is during this stage that individuals are most likely to have completed undergraduate or graduate degrees.
Evaluation of Expectations
Proposal Expectations
Our datasets contain the following variables: race, sex, age group, detailed years of school (representing educational degree attainment) and year. For this project, our primary focus will be on race, educational attainment, and year, which serves as our temporal variable.
Our original proposal expectations were that the educational attainment variable would be multimodal, with peaks at different attainment levels and fewer individuals holding highly advanced degrees. We also expected that, due to systemic racism, we would observe a higher percentage of White respondents with higher educational attainment.
Our expectations about the variables inspired our research questions about how educational attainment differs across demographic groups. These expectations connected directly to the project’s main goal: exploring how education has changed over time in the U.S., with room for highlighting whether disparities exist across age, race, and gender groups.
Evaluation
Based on the visualizations created, we found support for these expectations. The aggregated bar charts and faceted comparisons by sex and race show that educational attainment is not evenly distributed across the aforementioned demographic groups.
The largest populations cluster around high school diplomas and undergraduate degrees, while doctoral degrees represent progressively smaller shares. This reflects typical educational pathways: most Americans complete high school, many pursue college, and fewer continue to advanced degrees.
Specifically, when looking at race, we observed that White respondents consistently show higher counts across most educational levels compared to Black respondents. However, we discovered that Asians consistently outpace both White and Black populations in advanced educational attainment. These patterns reveal educational disparities that are more than a simple White/non-White divide, and may reflect other factors such as cultural emphasis on education or differential access to resources.
Conclusion
This project explored educational attainment trends across different demographic groups from 2001 to 2024, using a series of visualizations to highlight disparities and shifts over time. Between 2001 and 2024, the United States experienced overall growth in educational attainment: undergraduate degree holders increased from 184 million to 316 million (71.6% growth), while adults without high school diplomas declined from 113 million to 78 million, the only category to see a decrease. Graduate and doctoral degrees more than doubled, with growth rates of approximately 115% and 139% respectively, indicating that more individuals are pursuing education beyond the undergraduate degree level.
Our analysis confirmed initial expectations about educational inequality among demographic groups while revealing important complexities. The high proportion of Black Americans with “Some College, No Degree” suggests issues such as systemic financial and institutional barriers preventing degree completion. Gender patterns show a surprising reversal: women now surpass men in graduate degree attainment and have nearly closed the doctoral gap.
The use of slope charts, waffle charts, faceted comparisons and dumbbell plots provided clear visual evidence of patterns and disparities. These findings underscore inequalities in the education system and offer a foundation for future inquiry into the social and economic impacts of educational attainment.
Attribution
Leshauna Hartman: Introduction, Research Questions, Data Sources, Data Dictionary & README, Data Cleaning, Data Visualizations, Evaluation of Expectations, Conclusion.
Maya Schmidt: Introduction, Research Questions, Data Cleaning, Data Visualizations, Evaluation of Expectations, Conclusion.
References
Ryan, C. L., & Siebens, J. (2021, October 8). Educational attainment in the United States: 2009. Census.gov. https://www.census.gov/library/publications/2012/demo/p20-566.html
Stroops, N. (2021, October 8). Educational attainment in the United States: 2003. Census.gov. https://www.census.gov/library/publications/2004/demo/p20-550.html
U.S. Census Bureau. (n.d.). Educational attainment tables. Census.gov. https://www.census.gov/topics/education/educational-attainment/data/tables.html?text-list-e9c3fe7baa%3Atab=2024#text-list-e9c3fe7baa
Appendix
Below are the data dictionary table and all of the code used in this report.