Research Question

How do factors such as a country’s GDP, health statistics, and average life expectancy affect its population growth rate?

Data Sources

World Population Data Set

Our first data set is a world population data set. It includes country names and their abbreviations, the populations for selected years (1970, 1980, 1990, 2000, 2010, 2015, 2020, and 2022), each country’s area, population density, growth rate, percentage of the world population, and rank by population size. The data comes from https://www.kaggle.com/datasets/whenamancodes/world-population-live-dataset. This is not the original data source: Aman Chauhan compiled the Kaggle data from https://www.worldometers.info/world-population/ , which tracks the current population, and the worldometers data is in turn an elaboration of data from the United Nations Department of Economic and Social Affairs, Population Division. The data could be slightly skewed by the Democratic Republic of the Congo, Ethiopia, India, Indonesia, Nigeria, Pakistan, Uganda, the United Republic of Tanzania, and the United States of America, because these countries are projected to have high population growth in recent and coming years. However, this may simply reflect a real trend rather than a bias.

World Development Indicators by Countries Dataset

The second data set comes from World Development Indicators by Countries, which can be downloaded at https://www.kaggle.com/datasets/hn4ever/world-development-indicators-by-countries. It was created and uploaded to Kaggle by Noor B.A. From this data set, we used the health_risk_factors.csv file. The data originally comes from World Development Indicators: Health risk factors and future challenges, accessible from the World Bank at http://wdi.worldbank.org/table/2.17

This data set consists of country names, the percentage of males over 15 who smoke, the percentage of females over 15 who smoke, the incidence of tuberculosis within the country, the prevalence of diabetes in the country, the number of new HIV infections, the percentage of people aged 15-49 who have HIV, the percentage of females with HIV, and the percentage of males with HIV.

The health risk factors data set has NA values, which have to be removed before the data can be used. This means that data for certain countries is missing, so any conclusion drawn from a given variable covers only the countries that report it.
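As a minimal sketch of the NA handling described above, the base R example below drops rows with missing values using `is.na()` and `complete.cases()`. The data frame, its column names, and its values are hypothetical and are not taken from the Kaggle file. Counting the dropped rows makes it visible how many countries fall out of each comparison.

```r
# Hypothetical toy data; names and values are invented for illustration only.
health <- data.frame(
    country = c("A", "B", "C", "D"),
    smoking_male = c(30.1, NA, 22.4, 18.0),
    diabetes = c(7.2, 9.1, NA, 5.5)
)

# Keep rows where one specific column is present.
has_smoking <- health[!is.na(health$smoking_male), ]

# Keep rows where every column is present.
complete_rows <- health[complete.cases(health), ]

# Count how many countries the stricter filter drops.
dropped_any <- nrow(health) - nrow(complete_rows)

print(nrow(has_smoking))
print(as.character(complete_rows$country))
print(dropped_any)
```

Filtering per variable keeps more countries for each individual chart, while `complete.cases()` keeps the same set of countries across all charts; which one fits depends on whether the charts need to be comparable.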

GDP in USD

This data set comes from worldbank.org and gives each country’s GDP in US dollars. The original data combines data from the World Bank and OECD National Accounts data files. The data is accessible at: https://data.worldbank.org/indicator/NY.GDP.MKTP.CD. The data set contains the country name, the country abbreviation, the indicator name, the indicator code, and GDP data for the years 1960-2021.

Fertility Rate

This data set comes from worldbank.org and gives the fertility rate of women in different countries. The original data combines data from the World Bank; World Population Prospects; census reports and other statistical publications from national statistical offices; Eurostat: Demographic Statistics; United Nations Statistical Division: Population and Vital Statistics Report; U.S. Census Bureau: International Database; and Secretariat of the Pacific Community: Statistics and Demography Programme. The data is accessible at: https://data.worldbank.org/indicator/SP.DYN.TFRT.IN. The data set contains the country name, the country abbreviation, the indicator name, the indicator code, and fertility data for the years 1960-2020.

Life Expectancy at Birth

This data set comes from worldbank.org and gives the life expectancy at birth in different countries. The original data combines data from the World Bank and the United Nations Population Division’s World Population Prospects: 2019 Revision, or is derived from male and female life expectancy at birth from sources such as census reports and other statistical publications from national statistical offices; Eurostat: Demographic Statistics; United Nations Statistical Division: Population and Vital Statistics Report; U.S. Census Bureau: International Database; and Secretariat of the Pacific Community: Statistics and Demography. The data is accessible at: https://data.worldbank.org/indicator/SP.DYN.LE00.IN. The data set contains the country name, the country abbreviation, the indicator name, the indicator code, and life expectancy data for the years 1960-2020.

Answering the Question

Analysis and Charts

To begin to answer our research question, we wanted to get a sense of which countries have the largest population growth. To do so, we created a world map showing the range of population growth rates by country, as shown below. The map shows that the regions with higher population growth rates are sub-Saharan Africa, Southeast Asia, and parts of South America. This high-level view of patterns in population growth starts to give us an idea of what factors contribute to a country having a high population growth rate.

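# Packages used throughout the report
library(tidyverse)      # dplyr, ggplot2, forcats
library(janitor)        # clean_names()
library(sf)             # needed for geom_sf()
library(rnaturalearth)  # ne_countries()

# Initial charts: maps showing GDP and growth rate by country
data1 <- health_system %>% 
    clean_names() %>% 
    filter(
        !is.na(health_expenditure_current_percent_of_gdp_2016),
        !is.na(health_expenditure_public_percent_of_current_2016))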
        

#### 
data2 <- worldpop %>% 
    clean_names() %>% 
    arrange(desc(growth_rate)) %>% 
    mutate(name = fct_reorder(name, growth_rate)) %>% 
    slice(1:20) %>% 
    ggplot() +
    geom_col(
        aes(x = growth_rate, y = name), linewidth = 0.1, color = "black", fill = "black")




#min(worldpop$GrowthRate)
#max(worldpop$GrowthRate)


world <- ne_countries(
    scale = "medium",
    returnclass = "sf")

world_map <- world %>% 
    # CCA3 is a 3-letter code, so join on iso_a3 rather than iso_a2
    left_join(worldpop, by = c('iso_a3' = 'CCA3')) 


growth_rate_map <- ggplot(world_map) +
    geom_sf(aes(fill = GrowthRate), color = NA) +
    scale_fill_viridis_c(limits = c(0.90, 1.1))+
    guides(fill = guide_colorbar(
        title.position = "top",
        title.hjust = 0.5,
        barwidth = 10, barheight = 0.5)) +
    theme_void() +
    theme(
        legend.position = 'bottom',
        plot.title = element_text(hjust = 0.5)) +
    labs(
        title = "Population Growth Rate by Country",
        fill = "Growth Rate"
    )
growth_rate_map

gdpClean <- GDPdata2022 %>% 
    clean_names()

#View(gdpClean)

world_gdp <- world %>% 
    left_join(gdpClean, by = c('iso_a3' = 'country_code'))
    
gdp_map <- world_gdp %>% 
    ggplot() +
    geom_sf(aes(fill = x2020/10000000000), color = NA) +
    scale_fill_viridis_c(
        direction = -1,
        trans = 'sqrt', 
        labels = scales::comma) +
    guides(fill = guide_colorbar(
        title.position = "top", reverse = FALSE,
        title.hjust = 0.5,
        barwidth = 13, barheight = 0.5)) +
    theme_void() +
    theme(
        legend.position = 'bottom',
        plot.title = element_text(hjust = 0.5)) +
    labs(
        title = "GDP by Country, 2020",
        fill = "GDP (tens of billions, $)"
    ) 


gdp_map

Through research we have come to understand that economic factors contribute substantially to a country’s population growth rate. Gross Domestic Product (GDP) is a significant indicator of economic status. To get a broad sense of the differences in GDP, we created a world map showing GDP by country in tens of billions of United States dollars ($). Here we can see the countries with the largest GDPs in dark purple and the countries with lower GDPs in yellow. Comparing the two world maps, we start to see that, more often than not, countries with smaller GDPs have larger population growth rates.

Influential Factors for Population Growth

#Secondary Charts #2: Ave Life Expectancy vs Ave Family Size
family_size_data <- ChildrenPerFamilyData %>% 
    clean_names() %>% 
    mutate(kids_2020 = x2020) %>% 
    filter(!is.na(kids_2020))


#View(family_size_data)    

life_expectancy_data <- AveLifeExpectancyData %>% 
    clean_names() %>% 
    left_join(family_size_data, by = c("country_code" = 'country_code')) 

corr <- cor(
    life_expectancy_data$x2020.x,
    life_expectancy_data$kids_2020,
    use = "complete.obs"
)
corLab <- paste("r =", round(corr, 2))

cor_plot <- ggplot(life_expectancy_data) +
    geom_point(aes(x = x2020.x, y = kids_2020), alpha = 0.7) +
    annotate(
        geom = 'text',
        x = 75, y = 6, 
        label = corLab,
        hjust = 0, size = 6
    ) +
    labs(
        x = "Average Life Expectancy (years)",
        y = "Average Number of Children per Family", 
        title = "Life Expectancy vs Children per Family"
    )

cor_plot

#View(life_expectancy_data)

#add another plot connecting it to growth rate

Above is a scatter plot of average life expectancy (in years) against the average number of children per family, by country. The correlation is strong and negative: the Pearson correlation coefficient has a magnitude of about 0.85. The longer people expect to live, the fewer children they tend to have. When a country’s average life expectancy is lower, the typical life timeline is shorter as well. This means people typically start having children at a younger age, and because each child’s survival is much less certain, families tend to have more children. Countries with a lower average life expectancy also tend to have a lower GDP, which translates to fewer resources for citizens, such as access to health care, education, and birth control. All of these factors result in a higher average number of children per family, and more children being born leads to a higher population growth rate. This is consistent with the strong correlation shown above.
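To make the reading of the coefficient concrete, here is a small sketch using base R’s `cor()` and `cor.test()`. The numbers are hypothetical and are not taken from the World Bank data. The sign of the coefficient gives the direction of the relationship, its absolute value gives the strength, and a p-value from a test such as `cor.test()` is what a claim of statistical significance would normally rest on.

```r
# Hypothetical toy values, invented for illustration only.
life_expectancy <- c(55, 60, 65, 70, 75, 80, 85)
children <- c(6.0, 5.1, 4.3, 3.0, 2.4, 1.8, 1.5)

# Pearson correlation; use = "complete.obs" skips pairs with missing values.
r <- cor(life_expectancy, children, method = "pearson", use = "complete.obs")

# The sign gives the direction, the absolute value gives the strength.
direction <- if (r < 0) "negative" else "positive"

# cor.test() reports a p-value for the null hypothesis of zero correlation.
test <- cor.test(life_expectancy, children)

print(round(r, 2))
print(direction)
print(test$p.value < 0.05)
```

A correlation describes association only; it does not by itself show that longer life expectancy causes smaller families.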

Diving Deeper

Population Change

# Secondary charts: dumbbell chart of population change, 1970 to 2022
population_difference <- worldpop %>%
    select(Name, `1970`, `2022`, Rank) %>%
    mutate(
        # Convert to millions; assumes the columns hold raw head counts
        `2022` = as.numeric(`2022`) / 10^6,
        `1970` = as.numeric(`1970`) / 10^6,
        difference_pop = `2022` - `1970`
    )

# Plot the 20 most populous countries in 2022
population_difference %>%
    arrange(desc(`2022`)) %>% 
    slice(1:20) %>% 
    mutate(Name = fct_reorder(Name, `2022`)) %>% 
    ggplot() +
    geom_segment(aes(y = Name, yend = Name, x = `1970`, xend = `2022`)) +
    geom_point(aes(y = Name, x = `1970`), size = 2.5, color = 'lightblue') +
    geom_point(aes(y = Name, x = `2022`), size = 2.5, color = 'steelblue') +
    scale_x_continuous(labels = scales::comma) +
    labs(x = 'Population (millions)',
         y = 'Country',
         title = 'Population Change in the 20 Most Populous Countries',
         subtitle = "(1970 - 2022)"
         )

#dumbell charts -- claire

This graph displays the 20 most populous countries in 2022 and how each country’s population changed between 1970 and 2022. It addresses part of our research question by showing how much these countries grew in absolute terms. The graph agrees with other results and research we have found, in which China and India are the top two countries in population growth: their populations grew far more in absolute terms than those of the other countries shown. By our calculation, India’s population increased by an average of about 20.7 million people per year from 1970 to 2022, and China’s by about 14.2 million people per year. By contrast, Germany’s population, also among the 20 shown, increased by only about 880,000 people per year.
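The per-year figures above are average annual increases, that is, the change in population divided by the number of years. As a sketch of that arithmetic, the example below uses hypothetical populations (not values from the data set) and also computes a compound annual growth rate, which is the quantity comparable to a percentage growth rate.

```r
# Hypothetical populations, invented for illustration only.
pop_1970 <- 500000000
pop_2022 <- 1020000000

# Length of the period in years.
years <- 2022 - 1970

# Average absolute increase, in people per year and in millions per year.
avg_increase <- (pop_2022 - pop_1970) / years
avg_increase_millions <- avg_increase / 1e6

# Compound annual growth rate, as a proportion per year.
cagr <- (pop_2022 / pop_1970)^(1 / years) - 1

print(years)
print(avg_increase_millions)
print(round(100 * cagr, 2))
```

The two measures answer different questions: a very populous country can add many people per year while having a modest percentage growth rate.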

Health Care Factors

Gdp_plot <- health_system %>%
    arrange(desc(`Health expenditure Current % of GDP 2016`)) %>%
    slice(1:20)

Gdp_plot  %>%
    select(`Health expenditure Current % of GDP 2016`, Country) %>%
    ggplot() +
    geom_col(aes(x = `Health expenditure Current % of GDP 2016`/100, y = reorder( Country,`Health expenditure Current % of GDP 2016`)), fill = "light blue") +
    scale_x_continuous(labels = scales::percent)+
    labs(
        title = "% of GDP Spent on Healthcare, Top 20",
        x = "% of GDP Spent on Healthcare, 2016",
        y = "Country"
    )

This graph displays the 20 countries that spend the largest share of their GDP on healthcare. It addresses part of our research question by letting us compare these countries with the countries that have the highest growth rates. Most of the countries in this graph are among the most developed in the world, such as the United States, Switzerland, Sweden, Germany, France, Canada, and Norway. The graph also lets us compare these 20 countries with the top 20 countries in health statistics, other GDP statistics, and average life expectancy. Using those comparisons, we should be able to find countries that rank near the top on several measures and see where they stand in terms of growth rate.

https://worldpopulationreview.com/country-rankings/developed-countries

# Label only the countries with a growth rate under 1%
Gdp_plot <- health_system %>%
    arrange(desc(`Health expenditure Current % of GDP 2016`)) %>%
    slice(1:20)

Gdp_plot  %>%
    select(`Health expenditure Current % of GDP 2016`, Country) %>%
    ggplot() +
    geom_col(aes(x = `Health expenditure Current % of GDP 2016`/100, y = reorder( Country,`Health expenditure Current % of GDP 2016`)), fill = "dark green") +
    scale_x_continuous(labels = scales::percent)+
    scale_y_discrete(
    breaks = c('United States', 'Switzerland', 'Cuba', "Brazil", 'Palau', 'France', 'Germany', 'Sweden', 'Japan', 'Maldives', 'Norway', 'Canada', 'Denmark', 'Austria', 'Andorra')) +
    labs(
        title = "% of GDP Spent on Healthcare, Top 20",
        x = "% of GDP Spent on Healthcare, 2016",
        y = "Country"
    )

The figure above is the same as the previous figure, except that only the countries with a growth rate under 1% are labeled. Fifteen of the twenty countries that spend the largest share of their GDP on healthcare also have a low growth rate. This suggests an association between high healthcare spending, which we take as a rough indicator of health care accessibility, and a lower population growth rate.

Gdp_plot_least <- health_system %>%
    arrange((`Health expenditure Current % of GDP 2016`)) %>%
    slice(1:20)

Gdp_plot_least  %>%
    select(`Health expenditure Current % of GDP 2016`, Country) %>%
    ggplot() +
    geom_col(aes(x = `Health expenditure Current % of GDP 2016`/100, y = reorder( Country,`Health expenditure Current % of GDP 2016`)), fill = "light blue") +
    scale_x_continuous(labels = scales::percent) +
    labs(
        title = "% of GDP Spent on Healthcare, Bottom 20",
        x = "% of GDP Spent on Healthcare, 2016",
        y = "Country"
    )

This graph displays the 20 countries that spend the smallest share of their GDP on healthcare. It addresses part of our research question by letting us compare these countries with the countries’ growth rates. Most of the countries in this graph are less developed countries, such as the Lao People’s Democratic Republic (Lao PDR), Papua New Guinea, Eritrea, Bangladesh, Bhutan, and Angola. The graph also lets us compare these 20 countries with the top 20 countries in health statistics, other GDP statistics, and average life expectancy. Using those comparisons, we should be able to find countries that rank low on several measures and see where they stand in terms of growth rate.

# Label only the countries with a growth rate greater than 1.5%
Gdp_plot_least <- health_system %>%
    arrange((`Health expenditure Current % of GDP 2016`)) %>%
    slice(1:20)

Gdp_plot_least  %>%
    select(`Health expenditure Current % of GDP 2016`, Country) %>%
    ggplot() +
    geom_col(aes(x = `Health expenditure Current % of GDP 2016`/100, y = reorder( Country,`Health expenditure Current % of GDP 2016`)), fill = "dark green") +
    scale_x_continuous(labels = scales::percent) +
    scale_y_discrete(
    breaks = c('Nigeria', 'Equatorial Guinea', 'Iraq', "Qatar", 'Gabon', 'Eritrea', 'Angola', 'Pakistan', 'Papua New Guinea')) +
    labs(
        title = "% of GDP Spent on Healthcare, Bottom 20",
        x = "% of GDP Spent on Healthcare, 2016",
        y = "Country"
    )

The figure above is the same as the previous chart, but now only the countries with a growth rate greater than 1.5% are labeled. Of the twenty countries that spend the smallest share of their GDP on health care, nine also have a growth rate over 1.5%. Taken together with the previous comparison, this suggests that health care spending and a country’s population growth rate are negatively associated: countries that spend less tend to grow faster.
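The comparison above was made by reading two charts side by side. A more direct check, sketched below in base R, joins spending and growth rate by country, counts how many of the lowest spenders exceed the 1.5% threshold, and computes a rank correlation across all countries. The two data frames, their column names, and their values are hypothetical and are not taken from the report’s data.

```r
# Hypothetical toy data; names and numbers are invented for illustration only.
spending <- data.frame(
    country = c("A", "B", "C", "D", "E", "F"),
    health_pct_gdp = c(2.1, 2.8, 3.5, 9.0, 11.2, 16.8)
)
growth <- data.frame(
    country = c("A", "B", "C", "D", "E", "F"),
    growth_pct = c(2.6, 1.9, 0.8, 0.9, 0.3, 0.4)
)

# Join the two tables on the shared country key.
joined <- merge(spending, growth, by = "country")

# Take the three lowest spenders and count how many grow faster than 1.5%.
lowest <- head(joined[order(joined$health_pct_gdp), ], 3)
n_fast <- sum(lowest$growth_pct > 1.5)

# Spearman rank correlation across all countries, not just the extremes.
rho <- cor(joined$health_pct_gdp, joined$growth_pct, method = "spearman")

print(n_fast)
print(round(rho, 2))
```

Using every country rather than only the top and bottom 20 avoids drawing a conclusion from the extremes alone, and the sign of the coefficient states the direction of the association explicitly.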

Conclusions

In conclusion, our data analysis has provided valuable insights into the factors that influence population growth by country. After examining a range of variables including average life expectancy, GDP, healthcare spending, and the average number of children per family, we have identified several key trends and patterns.

One of the most significant findings from our analysis is the relationship between average life expectancy and population growth. Our data shows that countries with higher average life expectancies tend to have slower population growth, likely due to a combination of factors including improved healthcare and increased access to education. On the other hand, countries with lower average life expectancies tend to have faster population growth, potentially due to higher fertility rates and a younger population.

Another important factor that emerged from our analysis is the role of economic development. Countries with higher GDP tend to have slower population growth, likely due to a range of factors including increased access to healthcare, education, and birth control, as well as improved living standards. This suggests that economic development plays an important role in slowing population growth, by providing individuals with the resources and opportunities they need to thrive.

Finally, our analysis also highlighted the impact of cultural and social factors on population growth. In particular, we found that countries with a higher average number of children per family tend to have faster population growth. This could be due to a range of factors including cultural norms and values, as well as access to healthcare and other resources.

Overall, our analysis has provided valuable insights into the complex factors that influence population growth by country. While further research is needed to fully understand the mechanisms at play, our findings suggest that a combination of economic, social, and demographic factors plays a critical role in determining population growth in different countries.


Appendix

Dictionary for World Population Data Set:

| variable | class | description |
|---|---|---|
| CCA3 | string | 3-letter country/territory code |
| Name | string | Name of the country/territory |
| 2022 | double | Population of the country/territory in 2022 |
| 2020 | double | Population of the country/territory in 2020 |
| 2015 | double | Population of the country/territory in 2015 |
| 2010 | double | Population of the country/territory in 2010 |
| 2000 | double | Population of the country/territory in 2000 |
| 1990 | double | Population of the country/territory in 1990 |
| 1980 | double | Population of the country/territory in 1980 |
| 1970 | double | Population of the country/territory in 1970 |
| Area (km²) | double | Area of the country/territory |
| Density (per km²) | double | Population density of the country/territory |
| GrowthRate | double | The rate at which the population is growing |
| World Population Percentage | double | The percentage of the world population that the country/territory makes up |
| Rank | double | The country/territory’s worldwide rank by population |

Dictionary for Health Risk Factors Data Set:

| variable | class | description |
|---|---|---|
| Country | string | Country/territory |
| Prevalence of smoking Male % of adults 2016 | double | The percentage of adult males who smoke |
| Prevalence of smoking Female % of adults 2016 | double | The percentage of adult females who smoke |
| Incidence of tuberculosis per 100,000 people 2018 | double | The number of new tuberculosis cases per 100,000 people |
| Prevalence of diabetes % of population ages 20 to 79 2019 | double | The percentage of people aged 20-79 with diabetes |
| Incidence of HIV Total per 1,000 uninfected population ages 15-49 2018 | double | The number of new HIV infections per 1,000 uninfected people aged 15-49 |
| Prevalence of HIV Total % of population ages 15-49 2018 | double | The percentage of people aged 15-49 who have HIV |
| Prevalence of HIV Women’s share of population ages 15+ living with HIV % 2018 | double | Women’s share (%) of the population aged 15+ living with HIV |
| Prevalence of HIV Youth, Male % of population ages 15-24 2018 | double | % of males aged 15-24 with HIV |
| Prevalence of HIV Youth, Female % of population ages 15-24 2018 | double | % of females aged 15-24 with HIV |
| Antiretroviral therapy coverage % of people living with HIV 2018 | double | % of people living with HIV covered by antiretroviral therapy |
| Cause of death Communicable diseases and maternal, prenatal, and nutrition conditions % of population 2016 | double | % of deaths caused by communicable diseases and maternal, prenatal, and nutrition conditions |
| Cause of death Non-communicable diseases % of population 2016 | double | % of deaths caused by non-communicable diseases |
| Cause of death Injuries % of population 2016 | double | % of deaths caused by injuries |

Dictionary for GDP Data Set:

| variable | class | description |
|---|---|---|
| Country Name | string | The name of the country |
| Country Code | string | 3-letter country/territory code |
| Indicator Name | string | The indicator name, noting that the GDP values are in USD |
| Indicator Code | string | The code identifying the GDP indicator |
| 1960-2021 | double | The GDP values per country from 1960-2021 |

Dictionary for Fertility Data Set:

| variable | class | description |
|---|---|---|
| Country_Name | string | The name of the country |
| Country_Code | string | 3-letter country/territory code |
| Indicator_Name | string | The indicator name, noting that the fertility rates are totals |
| Indicator_Code | string | The code identifying the fertility indicator |
| x1960-x2021 | double | The fertility rates per country from 1960-2021 |

Dictionary for Life Expectancy Data Set:

| variable | class | description |
|---|---|---|
| Country_Name.x | string | The name of the country |
| Country_Code | string | 3-letter country/territory code |
| Indicator_Name.x | string | The indicator name, noting that life expectancy is for the total population |
| Indicator_Code | string | The code identifying the life expectancy indicator |
| x1960.x-x2021.x | double | The life expectancy in years per country from 1960-2021 |