Research Question and Background

The Olympic Games are a longstanding tradition that brings the world together to watch the best athletes compete in the world’s favorite sports. Every four years, the Summer Olympics host countries from around the world, and each country aims to bring home bronze, silver, or gold medals. Over the past century, the Summer Olympics have been dominated by a select few countries that consistently go home with the highest medal counts. Do those countries simply have more talented athletes, or is there an underlying factor that gives them an advantage before the Games even start? This research project asks: how does a country’s GDP affect the total number of medals it earns at the Summer Olympics?

Data Sources

https://www.kaggle.com/the-guardian/olympic-games?select=summer.csv

The medal counts for each country were derived from the Olympic Sports and Medal data set found on kaggle.com. The original data set was created by the IOC Research and Reference Service. It was then published by The Guardian and, at the time of writing, was last updated 4 years ago. The data set lists every gold, silver, and bronze medalist from each Olympic Games, along with their sport and their country.

https://api.worldbank.org/v2/en/indicator/NY.GDP.MKTP.CD?downloadformat=excel

For GDP, a raw data set from worldbank.org was used. The source appears reliable, as the World Bank specializes in providing this type of data, and the data set is updated yearly. It is important to note that this data set excludes countries that no longer exist.

Data Cleaning

For our GDP data set, we had to gather the year columns to reshape the data from wide format (one column per year) into long format (one row per country and year). After this step, it was possible to join the GDP data set to the medals data set. Because both data sets had a column of ISO country codes, the join was made on country code and year. The GDP data set begins in 1960, so even though the medal data set included earlier years, it was filtered to include only 1960 onward. Finally, entries with no GDP information were removed. As a result, some countries, such as the Soviet Union, are not included in this report.
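
The cleaning steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code used for this report: the column names (code, year, medals, gdp), the helper functions to_long and join_medals_with_gdp, and all sample values are hypothetical placeholders rather than real figures from either data set.

```python
# Hypothetical miniature inputs; the values are placeholders, not real data.
# GDP arrives in wide format: one row per country, one column per year.
gdp_wide = [
    {"code": "AAA", "1956": None, "1960": 100.0, "1964": 120.0},
    {"code": "BBB", "1956": None, "1960": None, "1964": 50.0},
]

# Medal counts are assumed to be already aggregated per country and year.
medals = [
    {"code": "AAA", "year": 1956, "medals": 3},  # before the GDP data starts
    {"code": "AAA", "year": 1960, "medals": 5},
    {"code": "BBB", "year": 1960, "medals": 2},  # GDP value is missing
    {"code": "BBB", "year": 1964, "medals": 4},
    {"code": "CCC", "year": 1964, "medals": 7},  # no GDP row at all
]


def to_long(wide_rows, id_key="code"):
    """Reshape wide rows into one row per (country, year)."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_key:
                continue  # the identifier column is not a year
            long_rows.append(
                {id_key: row[id_key], "year": int(column), "gdp": value}
            )
    return long_rows


def join_medals_with_gdp(medal_rows, gdp_long, first_year=1960):
    """Join on (code, year), keeping only rows from first_year on with a GDP value."""
    gdp_lookup = {(r["code"], r["year"]): r["gdp"] for r in gdp_long}
    joined = []
    for m in medal_rows:
        if m["year"] < first_year:
            continue  # the GDP data set begins in 1960
        gdp = gdp_lookup.get((m["code"], m["year"]))
        if gdp is None:
            continue  # drop entries with no GDP information
        joined.append({**m, "gdp": gdp})
    return joined


gdp_long = to_long(gdp_wide)
cleaned = join_medals_with_gdp(medals, gdp_long)
for row in cleaned:
    print(row)
```

In this toy example, only the AAA entry for 1960 and the BBB entry for 1964 survive. The CCC row shows how a country with no GDP record at all drops out of the joined data.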