Research Question and Background

The Olympic Games have been a longstanding tradition that brings the world together to watch the best athletes compete in the world’s favorite sports. Every 4 years, the summer Olympics host hundreds of countries and each country has a goal of bringing home either a bronze, silver or gold medal. In the recent century, the Summer Olympics have been dominated by a select few countries who always go home with the highest medal counts. Do those countries just have more talented athletes, or is there an underlying factor that gives them an advantage before the games even start? This research project will answer the question, how does a country’s GDP affect overall medals earned at the Summer Olympics?

Discuss Data Sources

https://www.kaggle.com/the-guardian/olympic-games?select=summer.csv

The different medal counts for each country were found using the Olympic Sports and Medal data set which was found on kaggle.com. The original data set was created by the IOC Research and Reference Service. It was then published by The Guardian and was last updated 4 years ago. This data set consists of every winner from each Olympic games, their sport, and their country. It also includes the silver and bronze medalists as well.

https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD?downloadformat=excel

For the GDP, a raw data set was used from worldbank.org. The source appears reliable as the World Bank is specialized in providing this type of data sets. The data is updated yearly to be kept up to date. It is important to note that this data set excludes countries that are currently still countries.

Data Cleaning

For our GDP dataset, we had to gather the data in order to get it into long format. After this process, it was possible to join the GDP dataset to the medals dataset. Because the two datasets had a column of ISO country codes, the join was made with the codes and the year. Another detail worth mentioning is that the GDP dataset began in 1960. Even though the medal dataset included years prior to this year, it was modified to only include the years from 1960 and onward. Another step taken to clean the data, entries with no GDP info were eliminated as a result some countries such as the Soviet Union were not included in this report.

Top 10 Medal Counts

To further explore the relationship between GDP and Olympic medals earned, the graphs below highlight the total medals won for each country from every Olympics since 1960. The graph is split up by bronze, gold and silver.

The United States has an obvious lead in gold, silver and bronze compared to the rest of the countries. Unlike other countries, the US has more gold medals earned than silver and bronze. 8 of the other countries have more bronze than the other medals. This could be more common because it is easier to win a bronze medal than getting gold.

Exploring Correlation of GDP and Medal Count

In order to explore if there is a relationship between a country’s medals earned and GDP, a scatter plot was created. In order to better display the data, the log was taken from both axes.

After graphing the data, there appears to be a positive correlation between GDP and total medal count. In order to quantify the correlation, the Pearson r-value was calculated and displayed. The outliers are evident. These points represent countries that won a few medals in one summer Olympics. Our methodology allows a gold metal to carry the same weight as a bronze which may be able to explain some of the skewness. For this reason, countries that earned less than five medals were filtered out. Dispute the evident outliers, the r-value of 0.72 shows a fairly strong positive correlation between the two factors. After viewing the results, it must be noted that countries such as the Soviet Union, West Germany, and East Germany are not represented. Although all three countries performed well in the Olympics, GDP data was not collected from those countries by the world bank. In an attempt to keep the GDP data reliable, the group did not add outside data for these countries. However, based on sources it is evident that these countries would have similar characteristics as the United States. All three countries had higher GDP’s and Olympics when compared to other countries.

World’s Current Superpower Countries

The following two charts show the GDP and medal count vs year for the United States, Russia, and China. In the following charts Russia only has data from 1996 due to it being a part of the USSR prior. China also only has data available from 1984. Looking at both charts it is clear that there is a general increase in both medal count and GDP over time. This conclusion is more clear in the United States over Russia and China due to the few data points available for latter two.

Conclusion and Limitations

Based on the graphs and data sets above, it is evident that the countries with the highest GDP tend to earn more Olympic medals than the rest of the world. After analyzing data sources, it has been common for super power countries such as the United States, China, and Russia to be among the top performers in Olympic competitions. After viewing the superpower countries, it was evident that there was not enough GDP information on Russia to find a correlation. However, a correlation was more evident when focusing on the United States with both charts hinting a positive trend between GDP and medal count. China’s lack of GDP info may have been a result of a change of name in the 1980’s. Although there was a correlation found, it must be noted that correlation does not imply causation. This is evident with the outlier countries as seen in the scatter plot of all the countries. Clearly, if a country has a high GDP but does not invest money in Olympic preparatory facilities for their athletes then it becomes more difficult for those athletes to perform well. Our data did have limitations. For this project, the focus remained on the summer Olympics which may not represent all the countries well. For example, countries that are more successful in winter sports may not be as successful in the summer Olympics.

knitr::opts_chunk$set(
    warning = FALSE,
    message = FALSE,
    comment = "#>",
    fig.path = "figs/", # Folder where rendered plots are saved
    fig.width = 7.252, # Default plot width
    fig.height = 4, # Default plot height
    fig.retina = 3 # For better plot resolution
)

library(tidyverse)
library(here)
library(xlsx)
library("readxl")
library(extrafont)
library(cowplot)
library(knitr)
library(viridis)
library(plotly)
library(ggrepel)
library(gganimate)

GDP <- read_excel(here('data', 'CountryGDP.xls'), skip = 2)
medals <- read_csv(here('data', 'Medals.csv'))
medals <- medals %>% 
  filter(Year >= 1960)
gdp <- GDP %>% 
  gather(key = 'Year', value = 'GDP', `1960`:`2020`) %>% 
  mutate(Year = as.double(Year))
scatterplotdata <- left_join(medals, gdp, by = c("Country"= "Country Code", "Year")) %>% 
    select(Year, `Country Name`, GDP, Medal) %>%
    filter(!is.na(GDP))%>% 
    mutate(GDP = (GDP/1000000000))%>% 
    group_by(`Country Name`, Year, GDP) %>% 
    count() %>% 
    filter(n >= 5)

finalgold <- left_join(medals, gdp, by = c("Country"= "Country Code", "Year")) %>% 
    select(Year, Country, `Country Name`, GDP) %>%
    filter(!is.na(GDP))%>% 
    group_by(Country, Year, GDP) %>% 
    count()

corr <- cor(     
  scatterplotdata$GDP, 
  scatterplotdata$n,     
  method = 'pearson', 
  use = "complete.obs")

corrLabel <- paste("r =", round(corr, 2))
finalgold1 <- finalgold %>% 
    mutate(GDP = (GDP*1000000000)) %>% 
    mutate(logGDP = log(GDP)) %>%
    filter(Country %in% c('USA','CHN','FRA', 'IND', 'GBR', 'AUS', 'RUS', 'ITA', 'JPN', 'CAN'))

 finalgold1 <-  ggplot(finalgold1,
  aes(x = Year, y = logGDP)) +
  geom_line(color = 'steelblue', size = 0.25) +
  geom_point(color = 'steelblue', size = 1) +
  facet_wrap(vars(Country), nrow = 2) +
  theme_half_open(font_size = 5) +
  labs(x = 'Year',
       y = 'GDP',
       subtitle = 'Top 10 Countries',
       title = 'GDP Trend from 1960 to 2020')
  
finalgold1
#Medal Counts over the years 

medalCount <- medals %>%
  filter(Country %in% c('USA','CHN','FRA', 'IND', 'GBR', 'AUS', 'RUS', 'ITA', 'JPN', 'CAN'))%>%
  group_by(Medal, Country) %>%
  count() %>%
  mutate(isBronze = if_else(Medal == 'Bronze', TRUE, FALSE),
         isGold = if_else(Medal == 'Gold', TRUE, FALSE)) %>%
  ggplot() +
  geom_col(aes(x = Medal, y = n, fill = Medal)) + 
  facet_wrap(vars(Country), nrow = 2) +
  theme_minimal_hgrid(font_size = 10) +
  labs(x = " ",
       y = "Medal Count",
       title = "Total Medal Count for Each Country")

medalCount <- medalCount +
  scale_fill_manual(values = c('sienna', 'gold', 'grey'))

medalCountUSA <- medals %>%
  filter(Country == "USA")%>%
  group_by(Medal, Country) %>%
  count() %>%
  mutate(isBronze = if_else(Medal == 'Bronze', TRUE, FALSE),
         isGold = if_else(Medal == 'Gold', TRUE, FALSE)) %>%
  ggplot() +
  geom_col(aes(x = Medal, y = n, fill = Medal)) + 
  facet_wrap(vars(Country), nrow = 2) +
  theme_minimal_hgrid(font_size = 10) +
  labs(x = " ",
       y = "Medal Count",
       title = "Total Medal Count for Each Country")

medalCountUSA <- medalCountUSA +
  scale_fill_manual(values = c('sienna', 'gold', 'grey'))

medalCount

medalCountUSA
ggplot(scatterplotdata, aes(x=GDP, y=n)) + 
  geom_point() +
  annotate(geom = 'text', x = 3, y = 130,              
            label = corrLabel,              
            hjust = 0, size = 4, )+
  geom_smooth(method = 'lm', se = FALSE)+
  geom_point(size = .25)+
        scale_y_log10() +
        scale_x_log10() +
  theme_classic()+
  theme(
    text = element_text(family = "Times New Roman", color = "grey30"),
    plot.title = element_text(face = "bold", size = 12))+
  labs(x = "GDP (Trillions of Dollars)",
         y = "Medals Earned", title = "Exploring GDP and \nTotal Medal Count Relationship")
finalgold <- finalgold %>% 
    mutate(GDP = GDP/1000000000000)

Usda <- finalgold%>%
  filter(Country == "USA" | Country == "CHN" | Country == "RUS")

Animated <- Usda %>%
  ggplot(
    aes(x = Year, y = n,
        color = Country)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  geom_text_repel(
    aes(label = Country),
    hjust = 0, nudge_x = 1, direction = "y",
    size = 6, segment.color = NA) +
  scale_x_continuous(
    breaks = c(1960, 1980, 2000, 2020),
    limits = c(1960 , 2020),
    expand = expansion(add = c(1, 13))) +
  theme_classic(base_size = 20) +
  scale_color_manual(values = c(
      'sienna', 'forestgreen', 'dodgerblue')) +
  theme(legend.position = 'none') +
  labs(x = 'Year',
       y = 'Medal Count',
       title = 'Annual Medal count \nfor USA, China, and Russia')

Animated <- Animated +
    theme(title = element_text(face = "bold"))

Animated <- Animated +
    transition_reveal(Year)

animate(Animated,
        end_pause = 20,
        duration = 20,
        width = 1100, height = 650, res = 150,
        renderer = magick_renderer())
AnimatedGdp <- Usda %>%
  ggplot(
    aes(x = Year, y = GDP,
        color = Country)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  geom_text_repel(
    aes(label = Country),
    hjust = 0, nudge_x = 1, direction = "y",
    size = 6, segment.color = NA) +
  scale_x_continuous(
    breaks = c(1960, 1980, 2000, 2020),
    limits = c(1960 , 2020),
    expand = expansion(add = c(1, 13))) +
  theme_classic(base_size = 20) +
  scale_color_manual(values = c(
      'sienna', 'forestgreen', 'dodgerblue')) +
  theme(legend.position = 'none') +
  labs(x = 'Year',
       y = 'GDP (in Trillions)',
       title = 'Annual GDP of \nUSA, China, and Russia')

AnimatedGdp <- AnimatedGdp +
    theme(title = element_text(face = "bold"))

AnimatedGdp <- AnimatedGdp +
    transition_reveal(Year)

animate(AnimatedGdp,
        end_pause = 20,
        duration = 20,
        width = 1100, height = 650, res = 150,
        renderer = magick_renderer())