Research Question

How do factors such as a country’s GDP, health statistics, and average life expectancy affect its population growth rate?

Data Sources

World Population Data Set

Our first data set is a world population data set. It includes country names and abbreviations, populations for selected years (1970, 1980, 1990, 2000, 2010, 2015, 2020, and 2022), each country's area, population density, growth rate, share of the world population, and rank by population size. We downloaded the data from https://www.kaggle.com/datasets/whenamancodes/world-population-live-dataset, which is not the original source: Aman Chauhan compiled it from https://www.worldometers.info/world-population/, which counts the current population. The Worldometers figures are in turn an elaboration of data from the United Nations, Department of Economic and Social Affairs, Population Division. The data could be slightly skewed by the Democratic Republic of the Congo, Ethiopia, India, Indonesia, Nigeria, Pakistan, Uganda, the United Republic of Tanzania, and the United States of America, because these countries are projected to have high population growth in recent and coming years. However, this may reflect a genuine trend rather than a bias in the data.

World Development Indicators by Countries Dataset

The second data set we used comes from the World Development Indicators by Countries collection, which can be downloaded at https://www.kaggle.com/datasets/hn4ever/world-development-indicators-by-countries. It was created and uploaded to Kaggle by Noor B.A. From this collection, we used the health_risk_factors.csv file. The data originally comes from World Development Indicators: Health risk factors and future challenges, accessible from the World Bank at http://wdi.worldbank.org/table/2.17

This data set consists of country names, the percentage of males over 15 who smoke, the percentage of females over 15 who smoke, the incidence of tuberculosis, the prevalence of diabetes, the number of new HIV infections, the percentage of people aged 15 to 49 who have HIV, the percentage of females with HIV, and the percentage of males with HIV.

The health risk factors data set contains NA values, which have to be removed before the data can be used. As a result, some countries drop out of the analysis for the variables where their data is missing, and any conclusion drawn from those variables rests on only a subset of countries.
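The NA handling described above can be sketched as follows. This is a minimal illustration in base R on made-up values, not the code used in our analysis: the data frame health_risk and its column names are hypothetical stand-ins for health_risk_factors.csv.

```r
# Hypothetical toy data standing in for health_risk_factors.csv (values are made up)
health_risk <- data.frame(
    country = c("A", "B", "C"),
    smoking_male = c(30.1, NA, 22.4),
    diabetes = c(8.2, 6.5, NA)
)

# Count missing values per column before dropping anything
na_counts <- colSums(is.na(health_risk))

# Keep only rows that are complete for the variables of interest
vars <- c("smoking_male", "diabetes")
complete_rows <- health_risk[complete.cases(health_risk[, vars]), ]

# Record which countries were lost, so the coverage gap is visible
dropped <- setdiff(health_risk$country, complete_rows$country)
```

Restricting complete.cases() to the variables actually used in a given chart keeps more countries than dropping every row that has an NA anywhere.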

GDP in USD

This data set comes from worldbank.org and gives each country's GDP in US dollars. The original data combines data from the World Bank and OECD National Accounts data files. The data is accessible at: https://data.worldbank.org/indicator/NY.GDP.MKTP.CD. It contains the country name, the country abbreviation, the indicator name, the indicator code, and GDP values for the years 1960-2021.

Fertility Rate

This data set comes from worldbank.org and gives the total fertility rate (births per woman) for each country. The original data combines data from the World Bank; World Population Prospects; census reports and other statistical publications from national statistical offices; Eurostat: Demographic Statistics; United Nations Statistical Division: Population and Vital Statistics Report; U.S. Census Bureau: International Database; and Secretariat of the Pacific Community: Statistics and Demography Programme. The data is accessible at: https://data.worldbank.org/indicator/SP.DYN.TFRT.IN. It contains the country name, the country abbreviation, the indicator name, the indicator code, and fertility values for the years 1960-2020.

Life Expectancy at Birth

This data set comes from worldbank.org and gives life expectancy at birth for each country. The original data combines data from the World Bank and the United Nations Population Division's World Population Prospects: 2019 Revision, or is derived from male and female life expectancy at birth from sources such as: census reports and other statistical publications from national statistical offices; Eurostat: Demographic Statistics; United Nations Statistical Division: Population and Vital Statistics Report; U.S. Census Bureau: International Database; and Secretariat of the Pacific Community: Statistics and Demography Programme. The data is accessible at: https://data.worldbank.org/indicator/SP.DYN.LE00.IN. It contains the country name, the country abbreviation, the indicator name, the indicator code, and life expectancy values for the years 1960-2020.
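The three World Bank files described above share the same wide layout, with one column per year. A minimal sketch of reshaping that layout into one row per country and year is shown here, assuming the dplyr and tidyr packages are installed. The tibble gdp_wide, its column names, and its values are hypothetical; they are not taken from the downloaded files.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of the World Bank wide layout (values are made up)
gdp_wide <- tibble(
    country_name = c("A", "B"),
    country_code = c("AAA", "BBB"),
    `2019` = c(100, 200),
    `2020` = c(110, NA)
)

gdp_long <- gdp_wide %>%
    # Gather every four-digit year column into year/value pairs
    pivot_longer(
        cols = matches("^[0-9]{4}$"),
        names_to = "year",
        values_to = "gdp_usd",
        names_transform = list(year = as.integer)
    ) %>%
    # Drop country-years with no reported value
    filter(!is.na(gdp_usd))
```

In the long form, a single year can be filtered out and joined to the population data by country code.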

Answering the Question

Analysis and Charts

To begin answering our research question, we wanted to get a sense of which countries have the largest population growth. To do so, we created a world map showing the range of population growth rates by country, as shown below. The map shows that the regions with higher population growth rates are sub-Saharan Africa, Southeast Asia, and parts of South America. This high-level view of the patterns in population growth starts to give us an idea of which factors contribute to a country having a high population growth rate.

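# Initial charts: maps showing GDP and growth rate by country.
# Assumes health_system and worldpop were read in earlier in the report.
library(dplyr)
library(forcats)
library(ggplot2)
library(janitor)
library(rnaturalearth)
library(sf)

data1 <- health_system %>%
    clean_names() %>%
    # Drop rows where either 2016 health expenditure measure is missing
    filter(!is.na(health_expenditure_current_percent_of_gdp_2016),
           !is.na(health_expenditure_public_percent_of_current_2016))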

# Bar chart of the 20 countries with the highest growth rate
data2 <- worldpop %>%
    clean_names() %>%
    arrange(desc(growth_rate)) %>%
    mutate(name = fct_reorder(name, growth_rate)) %>%
    slice(1:20) %>%
    ggplot() +
    geom_col(
        aes(x = growth_rate, y = name),
        linewidth = 0.1, color = "black", fill = "black")

# Range check for choosing the map's fill limits:
# min(worldpop$GrowthRate)
# max(worldpop$GrowthRate)


world <- ne_countries(
    scale = "medium",
    returnclass = "sf")

# CCA3 holds three-letter country codes, so join on iso_a3 rather than iso_a2
world_map <- world %>%
    left_join(worldpop, by = c('iso_a3' = 'CCA3'))


growth_rate_map <- ggplot(world_map) +
    geom_sf(aes(fill = GrowthRate), color = NA) +
    scale_fill_viridis_c(limits = c(0.90, 1.1))+
    guides(fill = guide_colorbar(
        title.position = "top",
        title.hjust = 0.5,
        barwidth = 10, barheight = 0.5)) +
    theme_void() +
    theme(
        legend.position = 'bottom',
        plot.title = element_text(hjust = 0.5)) +
    labs(
        title = "Population Growth Rate by Country",
        fill = "Growth Rate"
    )
growth_rate_map