Research Problem and Significance

Data Sources

The first data source I used was the Virginia Department of Elections. The Virginia Department of Elections is an agency of the Commonwealth of Virginia, and it has released summaries of voting statistics for each Presidential election since 2008. The data come from the ballots cast at designated polling places on election day. I downloaded the original November 2016 General Election data directly from the website as a CSV file. I filtered the dataset so that it would include only the data for “President and Vice President Votes”. Then I selected only the variables that I felt I would need for my visualizations: the full name of the candidate voted for, the party of that candidate, and the county where the vote was cast. The data from the Virginia Department of Elections was also extracted into an Excel file and then copied and pasted into the data, because I had difficulty wrangling the percentage numbers into a form usable in data visualizations.
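
The filtering and column selection described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code used for this report. The column names (`CandidateName`, `Party`, `LocalityName`, `OfficeTitle`, `TotalVotes`) and the sample rows are hypothetical placeholders, and the real export may be laid out differently.

```python
import csv
import io

# Hypothetical sample rows; the real export's columns and values may differ.
SAMPLE_CSV = """CandidateName,Party,LocalityName,OfficeTitle,TotalVotes
Candidate A,Democratic,County A,President and Vice President,600
Candidate B,Republican,County A,President and Vice President,400
Candidate C,Democratic,County A,Member House of Representatives,550
"""

# Columns kept for the visualizations (names are assumptions).
KEEP = ("CandidateName", "Party", "LocalityName", "TotalVotes")


def presidential_rows(csv_text, office="President and Vice President"):
    """Keep only rows for the given office, and only the columns in KEEP."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {col: row[col] for col in KEEP}
        for row in reader
        if row["OfficeTitle"] == office
    ]


rows = presidential_rows(SAMPLE_CSV)
for row in rows:
    print(row)
```

Note that `csv.DictReader` returns every value as a string, so vote counts would still need to be converted to numbers before any arithmetic.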

The second data source I used was the U.S. Census Bureau. The U.S. Census Bureau is a federal agency that is responsible for collecting and producing data about the American people. It collects the data from census forms that are mailed out to the American public and submitted to the agency. I navigated to the section of its website that features county population estimates by demographic characteristics. From there, I downloaded the original Virginia CSV file and noticed that its variable names were not immediately recognizable, so I checked the data variable descriptions and renamed them. I also combined the male and female columns for each race, so that my scope would be focused solely on race and not sex. I filtered the year to 6, which is the 2016 population estimate, and the age group to greater than 4, which covers ages 20 and older. Lastly, I selected only the variables I thought I would need for my visualizations: the name of the county, the total population of the county, and the population estimates for each racial demographic.
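
As a rough sketch of the wrangling steps above (filter to year code 6, keep age-group codes greater than 4, and merge the male and female columns for each race), the following Python example works on a few made-up rows. The column names (`CTYNAME`, `YEAR`, `AGEGRP`, `TOT_POP`, `WA_MALE`, and so on) are assumptions modeled on the abbreviated names mentioned above, only two races are shown for brevity, and the final step of summing the remaining age groups per county is an assumption that the text does not spell out.

```python
from collections import defaultdict

# Assumed mapping from abbreviated column prefixes to readable names.
RACE_COLUMNS = {"WA": "Caucasian", "BA": "AfricanAmerican"}

# Hypothetical rows: one row per county, year code, and age-group code.
rows = [
    {"CTYNAME": "County A", "YEAR": 6, "AGEGRP": 4, "TOT_POP": 50,
     "WA_MALE": 15, "WA_FEMALE": 15, "BA_MALE": 10, "BA_FEMALE": 10},
    {"CTYNAME": "County A", "YEAR": 6, "AGEGRP": 5, "TOT_POP": 100,
     "WA_MALE": 30, "WA_FEMALE": 30, "BA_MALE": 20, "BA_FEMALE": 20},
    {"CTYNAME": "County A", "YEAR": 6, "AGEGRP": 6, "TOT_POP": 80,
     "WA_MALE": 25, "WA_FEMALE": 25, "BA_MALE": 15, "BA_FEMALE": 15},
    {"CTYNAME": "County A", "YEAR": 5, "AGEGRP": 5, "TOT_POP": 90,
     "WA_MALE": 30, "WA_FEMALE": 30, "BA_MALE": 15, "BA_FEMALE": 15},
]


def summarize(rows, year=6, min_agegrp=5):
    """Filter by year and age group, then total each race per county."""
    totals = defaultdict(lambda: defaultdict(int))
    for r in rows:
        if r["YEAR"] != year or r["AGEGRP"] < min_agegrp:
            continue  # drop other years and the under-20 age groups
        county = totals[r["CTYNAME"]]
        county["TotalPopulation"] += r["TOT_POP"]
        for prefix, name in RACE_COLUMNS.items():
            # Combine male and female counts so only race remains.
            county[name] += r[prefix + "_MALE"] + r[prefix + "_FEMALE"]
    return {name: dict(values) for name, values in totals.items()}


summary = summarize(rows)
print(summary)
```

If the rows are not summed across age groups, each county appears once per age group, and population figures will reflect a single age band rather than the whole adult population.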

The largest concern I currently have with my data sources is the data from the Census Bureau. I looked through many different data sources, including the tidycensus package, different census datasets, and other online sources, but the data I currently have was the only source with the necessary breakdown of racial demographics by Virginia county. However, it is divided into five-year age groups (0-4 years old, 5-9 years old, etc.), so voting-age residents cannot be separated out of the 15-19 group. Therefore, 18 and 19 year olds are not included in the demographics data, while they are included in the election data.

##   CountyName        TotalPopulation MinorityPercentage
##  Length:133         Min.   :  104   Min.   : 0.6897   
##  Class :character   1st Qu.:  808   1st Qu.:10.7527   
##  Mode  :character   Median : 1639   Median :25.0591   
##                     Mean   : 4459   Mean   :27.1095   
##                     3rd Qu.: 3746   3rd Qu.:40.2127   
##                     Max.   :63331   Max.   :84.4271
##   CountyName        TotalPopulation MinorityPercentage WinningMargin    
##  Length:133         Min.   :  104   Min.   : 0.6897    Min.   :-0.6754  
##  Class :character   1st Qu.:  808   1st Qu.:10.7527    1st Qu.:-0.4260  
##  Mode  :character   Median : 1639   Median :25.0591    Median :-0.2635  
##                     Mean   : 4459   Mean   :27.1095    Mean   :-0.2495  
##                     3rd Qu.: 3746   3rd Qu.:40.2127    3rd Qu.:-0.1146  
##                     Max.   :63331   Max.   :84.4271    Max.   : 0.5919  
##                                                        NA's   :38

Proposal Expectations

I was able to find the minority percentage by dividing the number of all non-Caucasians by the total population of each county. However, I believe there is an error in how I calculate the results of the 2016 Presidential election. For some reason, when I calculate the number of Democratic votes and the number of Republican votes from the 2016 election data, the numbers are equal for every county, so the margin of victory in each county is currently 0. In order to create my graphs while I try to fix this error, I added a variable with the margin of victory for each county, taken from the election data.
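
The two calculations described above can be written out as a short Python sketch. The function names, the sample tallies, and the exact margin definition (Democratic votes minus Republican votes, divided by total votes) are assumptions for illustration and not the report's code. The tally helper keys on both county and party so that each party's total stays separate; whether that relates to the equal-totals problem described above is not something the text establishes.

```python
from collections import defaultdict


def minority_percentage(total_population, caucasian):
    """Percentage of a county's population that is not Caucasian."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return 100.0 * (total_population - caucasian) / total_population


def votes_by_county_and_party(rows):
    """Sum votes per (county, party) pair so party totals stay separate."""
    totals = defaultdict(int)
    for county, party, votes in rows:
        totals[(county, party)] += votes
    return dict(totals)


def winning_margin(dem_votes, rep_votes, total_votes):
    """Signed margin as a fraction: positive favors the Democratic party."""
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    return (dem_votes - rep_votes) / total_votes


# Hypothetical tallies, not real election results.
ballots = [
    ("County A", "Democratic", 600),
    ("County A", "Republican", 400),
    ("County B", "Democratic", 200),
    ("County B", "Republican", 300),
]

tallies = votes_by_county_and_party(ballots)
margins = {}
for county in ("County A", "County B"):
    dem = tallies[(county, "Democratic")]
    rep = tallies[(county, "Republican")]
    margins[county] = winning_margin(dem, rep, dem + rep)

print(margins)
print(minority_percentage(200, 150))
```

In this sketch the denominator is the two-party total; using all ballots cast, including third-party votes, would give slightly smaller margins.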

I then created a joint dataset with the county name, total population, minority percentage, and winning margin. From this dataset, I found that the mean minority percentage across counties in Virginia was about 23%, while the mean winning margin was about -25%, that is, in favor of the Republicans. This told me that Republicans either won by more in each county on average or won in more counties. Considering that the Democratic presidential candidate won Virginia in the 2016 election, I concluded that counties with larger populations (and more voting power) were the ones that led the Democratic Party to victory in the state. This population trend can also be seen in the summary of the dataset, where the median county population is 1525 people while the mean is 3734 people. In other words, although there are many counties with smaller populations, there are large outlier counties pulling the mean population up to well over twice the median.
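
The reasoning above, that an average of county margins can favor one party while the statewide total favors the other, can be checked with a small Python example. The county names and vote counts below are invented for illustration, and the two-party margin formula is an assumption.

```python
from statistics import mean, median

# Hypothetical counties: (name, Democratic votes, Republican votes).
counties = [
    ("Small A", 300, 700),
    ("Small B", 350, 650),
    ("Small C", 400, 600),
    ("Large D", 6000, 4000),
]


def margin(dem, rep):
    """Two-party margin as a fraction; positive favors the Democrat."""
    return (dem - rep) / (dem + rep)


# Every county counts equally here, regardless of size.
county_margins = [margin(dem, rep) for _, dem, rep in counties]
unweighted_mean = mean(county_margins)

# Pooling the votes weights each county by how many people voted.
total_dem = sum(dem for _, dem, _ in counties)
total_rep = sum(rep for _, _, rep in counties)
statewide = margin(total_dem, total_rep)

# One large county pulls the mean size far above the median.
sizes = [dem + rep for _, dem, rep in counties]

print("mean of county margins:", unweighted_mean)
print("statewide margin:", statewide)
print("median size:", median(sizes), "mean size:", mean(sizes))
```

Here three of four counties lean Republican and the mean county margin is negative, yet the pooled statewide margin is positive because the single large county contributes most of the votes.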

Data Visualizations

The first chart I created plots each county as a point, with the minority percentage of the county on the y axis and the winning margin (with a Democratic win being positive and a Republican win being negative) on the x axis. From this chart I was able to identify a general upward trend, with a larger minority percentage associated with a winning margin for the Democratic nominee. Although this does not confirm my original hypothesis that more racial diversity leads to a smaller margin of victory, we can see that more racial diversity is associated with a more liberal lean in a county. However, when looking specifically at the right side of the graph, where a Democratic win is indicated, there does not seem to be a strong correlation between an increasing minority percentage and the winning margin. There could be other factors affecting the winning margin, such as income and age.
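
One way to put a number on the trend described above is a Pearson correlation coefficient, computed once for all counties and once for only the counties with a positive (Democratic) margin. The following Python sketch uses made-up points chosen to mimic the pattern in the chart; it is not the report's data, and the report itself relies on visual inspection rather than this calculation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (spread_x * spread_y)


# Hypothetical (minority percentage, winning margin) pairs.
points = [
    (5, -0.60), (10, -0.45), (20, -0.30), (30, -0.10),
    (40, 0.20), (50, 0.15), (60, 0.30), (70, 0.10), (80, 0.25),
]

overall = pearson([p for p, _ in points], [m for _, m in points])

# Restrict to counties the Democratic nominee won (positive margin).
dem_points = [(p, m) for p, m in points if m > 0]
dem_only = pearson([p for p, _ in dem_points], [m for _, m in dem_points])

print("all counties:", round(overall, 3))
print("Democratic-won counties only:", round(dem_only, 3))
```

With these invented points the overall coefficient is strongly positive while the coefficient within the Democratic-won subset is close to zero, which is the kind of contrast the paragraph describes.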

The second chart I created shows the total population of a county on the x axis and the winning margin (with a Democratic win being positive and a Republican win being negative) on the y axis. Although this chart does not show a steep positive correlation, it does show that the Democratic nominee won most of the counties with significantly larger populations. It also confirms what we saw earlier in the data summary: there are outlier counties that heavily skew the population mean in Virginia, and these counties were won by the Democratic candidate. Although there are many more red “dots”, signifying that the Republican candidate won many more counties, the outlier counties are what have been driving Virginia to turn from a “red” to a “purple” to a “slightly blue” state so rapidly.

Although there are 95 counties in Virginia, Fairfax County makes up about 13% of Virginia’s population, giving it very large sway over Virginia elections. To look more closely at this outlier county (in terms of population), I took information from the Fairfax County Government website to examine the winning margins of each district in Fairfax County. From this graph, we can see that there is a significant winning margin in favor of the Democratic nominee in all of the districts. Even if the Republican nominee were to overcome the 14% margin in the district with the lowest winning margin (Springfield), it seems virtually impossible to flip all of the other districts of this large county in favor of the Republican nominee.


From my research, I was able to conclude that there is a positive correlation between minority percentage and winning margin, as well as between county population and winning margin (both in favor of the Democratic Party). Virginia is still considered a swing state, and although the past three Presidential elections swung in favor of the Democratic nominee, there is still opportunity for the state to swing back, as an overwhelming majority of counties are still won by the Republican Party. However, if the total population and minority population of Virginia keep growing (as they have), these trends suggest that the Democratic Party could well win Virginia again in 2020.

One bright side for Republicans is that, although large counties are the ones skewing the state to be more “blue”, a Republican candidate who strategically targeted some of the larger counties that had smaller margins of victory in 2016 could produce a different outcome in 2020. Although Fairfax County seems to be a bit of a reach for the Republicans, a county like Chesterfield County would be perfect to target. Chesterfield is the third largest county in Virginia, and it had a winning margin of only about 2%. If the Republican Party targets larger counties like Chesterfield where there is not a large margin of victory, who knows what could happen in the 2020 Presidential election.
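
The targeting idea above, looking for large counties that were decided by a small margin, can be expressed as a simple filter and sort. The Python sketch below uses invented county names, populations, and margins, and the thresholds (`max_margin`, `min_population`) are arbitrary assumptions chosen only to illustrate the approach.

```python
# Hypothetical counties; margins are fractions, positive = Democratic win.
counties = [
    {"name": "County A", "population": 1_000_000, "margin": 0.36},
    {"name": "County B", "population": 300_000, "margin": 0.02},
    {"name": "County C", "population": 150_000, "margin": -0.04},
    {"name": "County D", "population": 20_000, "margin": 0.01},
    {"name": "County E", "population": 400_000, "margin": 0.20},
]


def close_large_counties(counties, max_margin=0.05, min_population=100_000):
    """Return large counties decided by a small margin, biggest first."""
    candidates = [
        c for c in counties
        if abs(c["margin"]) <= max_margin
        and c["population"] >= min_population
    ]
    return sorted(candidates, key=lambda c: c["population"], reverse=True)


targets = close_large_counties(counties)
for county in targets:
    print(county["name"], county["population"], county["margin"])
```

Using the absolute value of the margin keeps close counties won by either party in the list, since both are places where a small shift in votes could change the result.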


Election Data

Name         Type     Description
Party        String   The party of the candidate that was voted for.
CountyName   String   The county that the vote was placed in.
TotalVotes   String   The total votes for the specific party in the specific county.

Demographic Data

Name                           Type      Description
County Name                    String    The name of the county.
Total Population               Numeric   The total population of the county.
Caucasian                      Numeric   The population estimate of Caucasians in the county.
African American               Numeric   The population estimate of Blacks/African Americans in the county.
Native American                Numeric   The population estimate of Native Americans/Alaskan natives in the county.
Asian                          Numeric   The population estimate of Asians in the county.
Hawaiian or Pacific Islander   Numeric   The population estimate of Hawaiians/Pacific Islanders in the county.
Mixed                          Numeric   The population estimate of those of 2 or more races in the county.

Winning Margins Data

Name             Type      Description
County Name      String    The name of the county.
Winning Margin   Numeric   The percentage of the winning margin, with positive being in favor of the Democratic party.