Cuban Migration

Author

Sabina Pereira, Federica Negron, Marena Marzari

Published

December 10, 2023

Introduction

In recent decades, the Cuban immigrant population in the United States has grown remarkably, placing it among the top ten national-origin immigrant groups. This increase raises the question of which factors might have driven such substantial growth. Answering it goes beyond academic curiosity: it sheds light on the forces shaping the cultural, social, and economic dynamics of both the immigrant community and the host society. We examine and compare multiple datasets drawn from reputable sources such as Statista and the Pew Research Center. Our study explores a range of variables, from nativity and race to socioeconomic indicators. Our goal is to highlight the multifaceted influences contributing to the expansion of the Cuban immigrant population in the United States.

Research Question

The Cuban immigrant population has doubled over the last few decades, making it one of the top ten national-origin immigrant groups in the U.S. What factors have influenced this growth?

Data Description

link1

Data Source

  • Description: Statista is a leading provider of market and consumer data. It offers statistical information on various topics, including demographics, business, and the economy.
  • Reference: The information was obtained from a publication by the Statista Research Department.

Data Collection

  • Original Data: The original data is from Statista’s research; the numbers represent the number of Cuban emigrants by destination country in 2020.
  • Pre-processing: The data is already pre-processed: the numbers are provided in tabular form, listing the number of emigrants for each destination country.
  • Collection Details: Statista generally aggregates data from various sources, including government reports, international organizations, and reputable research institutions.

Data Characteristics

  • Potential Biases: Any biases would depend on whether Statista’s original data sources were biased.

link2

Data Source

  • Description: The data is sourced from the Pew Research Center, a nonpartisan think tank that conducts public opinion polling, demographic research, and social science analysis.
  • Reference: The information comes from the 2018 American Community Survey (1% IPUMS) and is presented in the report “Statistical Portrait of the Foreign-Born Population in the United States, 2018.”

Data Collection

  • Original Data: The original data is collected from the 2018 American Community Survey (1% IPUMS), which is a nationally representative survey conducted by the U.S. Census Bureau.
  • Pre-processing: The data is pre-processed by the Pew Research Center for presentation in summary tables.
  • Collection Details: The American Community Survey is an ongoing survey that collects detailed demographic, social, economic, and housing information from a sample of households annually. It covers a wide range of topics and provides valuable information on the U.S. population.

Data Characteristics

  • Potential Biases: The American Community Survey aims to be representative of the U.S. population, but biases can still arise from sampling methods, non-response, and other factors. The Pew Research Center typically publishes its methodology, including the steps taken to address potential biases.
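Summary tables like Pew’s are usually computed from ACS microdata using person weights. Weighting is one of the standard ways a survey compensates for its sampling design and for non-response. The sketch below shows how a weighted percentage is formed. It assumes a hypothetical microdata extract with a logical citizenship flag and a weight column named `perwt` (IPUMS calls its person weight PERWT). The rows are made up for illustration and are not Pew’s data.

```r
library(dplyr)

# Hypothetical microdata: each row is one sampled person, and perwt is the
# number of people in the population that this respondent represents.
microdata <- tibble(
  birthplace = c("Cuba", "Cuba", "Cuba", "Mexico", "Mexico"),
  citizen    = c(TRUE, FALSE, TRUE, FALSE, TRUE),
  perwt      = c(100, 50, 50, 200, 100)
)

# Weighted percent of citizens by birthplace:
# (sum of weights of citizens) / (sum of all weights in the group) * 100
citizen_share <- microdata %>%
  group_by(birthplace) %>%
  summarise(
    pct_citizen = 100 * sum(perwt[citizen]) / sum(perwt),
    .groups = "drop"
  )

citizen_share
```

In this toy example the unweighted share for Cuba would be 2 of 3 respondents (about 67%). The weighted estimate is 75%, which is why published figures depend on the weights and not just on raw counts.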

Data Dictionary

Here is a data dictionary for the variables mentioned:

  • Nativity of U.S. Immigrants:
    • Foreign-born population total
    • Percent born in Mexico
    • Percent who are citizens
  • Race of U.S. Immigrants:
    • Percent who are white alone, not Hispanic
  • Language Use Among U.S. Immigrants:
    • Percent speaking English at least very well (ages 5 and older)
  • Age and Gender of U.S. Immigrants:
    • Median age (in years)
    • Percent who are female
  • Marital Status and Fertility of U.S. Immigrants:
    • Percent who are married (ages 18 and older)
    • Percent who are women ages 15-44 giving birth in the past year
  • Education of U.S. Immigrants:
    • High school or less
    • Two-year degree/some college
    • Bachelor’s degree or more
  • Work Status and Occupations of U.S. Immigrants:
    • Percent in labor force (among civilian population)
  • Earnings and Income of U.S. Immigrants:
    • Median annual personal earnings (in 2018 dollars, among those with earnings)
    • Median annual household income (in 2018 dollars)
  • Poverty Among U.S. Immigrants:
    • Percent living in poverty
  • Homeownership and Households of U.S. Immigrants:
    • Percent in family households
  • Region and Top States of Residence of U.S. Immigrants:
    • West, California
    • South, Texas, Florida
    • Northeast, New York, New Jersey
    • Midwest

link3

Data Source

  • Description: The data on population growth in the Dominican Republic does not explicitly cite its original source. It was most likely compiled from demographic surveys, censuses, and statistical reports produced by national authorities in the Dominican Republic.
  • Data Collection Details:
    • Original Data: The data was likely collected through official reports from the National Statistics Office or other relevant government agencies in the Dominican Republic.

Data Characteristics

  • Potential Biases: Biases may exist due to variations in data collection methods, underreporting, or other factors.

The source of this article is unspecified; it may be a research or statistical organization. Because the original data source and collection methods are not clearly stated, the validity of the data is open to question. There is no apparent missing data.

link4

Data Source

  • Description: The population data for Haiti from 1950 to 2023, including United Nations projections, is sourced from the United Nations - World Population Prospects. The data provides historical population figures, growth rates, and projections.
  • Data Collection Details:
    • Original Data: The primary data source is the United Nations - World Population Prospects.
    • Pre-processing: The data is presented in a pre-processed format that summarizes population figures, annual growth rates, and projections.

Data Characteristics

  • Potential Biases:
    • The reliability of the data depends on the accuracy of reporting and estimation methods used by the United Nations.
    • As with any projections, future estimates may be subject to uncertainties and assumptions.

The source of this data is the United Nations - World Population Prospects. Validity is high because the data comes directly from the original source and was collected by the United Nations using standardized methods. There is no missing data.

link5

Data Source

  • Description: The population data for Cuba in 1980, along with historical trends, is provided by countryeconomy.com. The data includes the total population, gender distribution, and population density for that year.
  • Data Collection Details:
    • Original Data: The primary data source is countryeconomy.com.
    • Pre-processing: The data appears to be presented in a pre-processed format that summarizes population figures, gender distribution, and density.

Data Characteristics

  • Potential Biases:
    • The reliability of the data depends on the accuracy of reporting and estimation methods used by countryeconomy.com.
    • No information is provided about potential biases in the data.

The original source behind these figures on Cuba is unspecified, and more information about their origin is needed. Neither the original source nor the pre-processing steps are disclosed. Because we do not know where the data came from or how it was collected, its validity is limited. For the same reason, we cannot be sure whether it contains biases.

link6

Data Source

  • Description: The data source provides information on the live population of Cuba, historical population data from 1950 to 2023, and forecasts for the years 2025 to 2050. Additionally, it includes details on the yearly population growth rate, demographic indicators, and main cities by population in Cuba.
  • Data Collection Details:
    • Original Data: The primary data sources are Worldometer and the United Nations, Department of Economic and Social Affairs, Population Division.
    • Pre-processing: The data appears to be presented in a pre-processed format that summarizes population figures, growth rates, and other demographic indicators.

Data Characteristics

  • Missing Data: There is no apparent missing data.
  • Potential Biases:
    • The reliability of the data depends on the accuracy of reporting and estimation methods used by Worldometer and the United Nations.
    • No information is provided about potential biases in the data.

Data Dictionary

  • Live Population Data:
    • Current Population (as of December 10, 2023): 11,185,626
  • Historical Population Data (1950 - 2023):
    • Year
    • Population
  • Yearly Population Growth Rate (%):
    • Year
    • Yearly Growth Rate (%)
  • Population Forecast (2025 - 2050):
    • Year
    • Population
    • Yearly Growth Rate (%)
    • Yearly Change
    • Migrants (net)
    • Median Age
    • Fertility Rate
    • Density (P/Km²)
    • Urban Pop %
    • Urban Population
    • Country’s Share of World Pop
    • World Population
    • Global Rank
  • Demographic Indicators:
    • Life Expectancy
    • Infant Mortality
    • Deaths Under Age 5
  • Main Cities by Population in Cuba:
    • Rank
    • City Name
    • Population
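The “Yearly Growth Rate (%)” and “Yearly Change” fields are, under the usual definition, the change from one year’s population to the next. The growth rate expresses that change as a percentage of the previous year’s population. The material above does not state Worldometer’s exact formula, so treat that definition as an assumption. The sketch below computes both fields with `dplyr::lag()` on a hypothetical, made-up population series.

```r
library(dplyr)

# Hypothetical population series (in thousands); not Worldometer's figures
pop <- tibble(
  year       = 2020:2023,
  population = c(1000, 1010, 1005, 1005)
)

# Yearly change = P_t - P_(t-1); growth % = 100 * (P_t - P_(t-1)) / P_(t-1)
pop_growth <- pop %>%
  arrange(year) %>%
  mutate(
    yearly_change = population - dplyr::lag(population),
    growth_pct    = 100 * yearly_change / dplyr::lag(population)
  )

pop_growth
```

The first year has no previous value, so its change and growth rate are `NA`.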

Data Processing

The chart below shows Cuba’s total population over time. Despite economic challenges and political shifts, Cuba’s population has grown steadily.

Code
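# Read the population workbook, skipping the 16 rows above the column headers
xlsx_path4 <- here::here('data_raw','population.xlsx')
dataraw4 <- read_excel(xlsx_path4, sheet = 1, skip = 16)

# Keep Cuba's rows and the country, year, and total-population columns
data4 <- dataraw4[c(14982:15052), c(3, 11, 13)]
colnames(data4) <- c("country", "year", "total")

dataplot4 <- data4 %>%
    # Excel columns may be imported as text; convert both before plotting
    mutate(year = as.numeric(year),
           total = as.numeric(total)) %>%
    ggplot(aes(x = year, y = total)) +
    geom_line() +
    labs(x = 'Year',
         y = 'Total Population (in thousands)',
         title = "Cuba's Population through the years") +
    theme_bw(base_size = 12)

# Interactive version of the line chart
ggplotly(dataplot4)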
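The chunk above selects Cuba’s rows by position (`14982:15052`) and its columns by number. That selection silently breaks if the spreadsheet gains or loses a row. The sketch below makes the same selection by value. It assumes hypothetical column names `location`, `year`, and `total`, since the real workbook may label them differently. A small made-up table stands in for `dataraw4`.

```r
library(dplyr)

# Hypothetical stand-in for dataraw4: several countries, values stored as text
raw <- tibble(
  location = c("Haiti", "Cuba", "Cuba", "Cuba", "Dominican Republic"),
  year     = c("1950", "1950", "1951", "1952", "1950"),
  total    = c("10", "20", "21", "22", "30")
)

cuba_pop <- raw %>%
  filter(location == "Cuba") %>%            # select rows by value, not position
  mutate(year  = as.numeric(year),
         total = as.numeric(total)) %>%      # text -> numbers
  select(country = location, year, total) %>%
  arrange(year)

cuba_pop
```

Filtering on the country name keeps working even if rows above Cuba are added or removed in a later version of the file.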

The chart below shows a peak in 2008. That year Cuba went through a significant leadership transition: Fidel Castro, the country’s long-time leader, officially resigned from the presidency in February 2008 because of health issues. His brother, Raúl Castro, succeeded him as president.

Economic difficulties, political repression, and a desire for greater opportunities were among the factors that led some Cubans to attempt to leave the country in 2008. Cuba faced low wages, limited job opportunities, and economic stagnation. Its centrally planned economy struggled to provide a high standard of living, prompting some individuals to seek better economic prospects abroad.

The same year brought the global financial crisis, triggered by the collapse of the U.S. subprime mortgage market. Widespread liquidity and solvency problems at financial institutions led to a credit freeze and a severe economic decline. The crisis spread worldwide, causing recessions, significant job losses, and prolonged economic instability in many countries.

Code
# Read the first sheet, skipping the 8 rows above the column headers
xlsx_path3 <- here::here('data_raw','stats1990.xlsx')
dataraw3 <- read_excel(xlsx_path3, sheet = 1, skip = 8)

# Row 133 is Cuba; keep the country column and the yearly value columns
data3 <- dataraw3[c(133), c(1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31)]
colnames(data3) <- c("country", paste0("year_", c(2000, 2006:2019, 2021)))

# Reshape to one row per year; names_prefix strips "year_" so year can be numeric
data_plot3 <- data3 %>%
    pivot_longer(
        cols = year_2000:year_2021,
        names_to = 'year',
        names_prefix = 'year_',
        values_to = 'Number'
    ) %>%
    mutate(year = as.numeric(year),
           Number = as.numeric(Number))

plot3 <- data_plot3 %>%
    mutate(Number = Number / 10^3) %>%
    ggplot(aes(x = year, y = Number)) +
    geom_line(linewidth = 1) +
    geom_point(size = 1) +
    geom_text_repel(
        aes(label = country),
        hjust = 0, nudge_x = 1, direction = "y",
        size = 6, segment.color = NA) +
    theme_half_open(font_size = 18) +
    theme(legend.position = 'none') +
    labs(x = 'Year',
         y = 'Number of people (in thousands)',
         title = "Migration out of Cuba",
         subtitle = "Watch out for 2008")

# Animate: the line is drawn progressively along the year axis
animation2 <- plot3 +
    transition_reveal(year)

animation2
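The 2008 peak in the animation can also be checked numerically. The sketch below returns the year with the largest value and the change from one observation to the next. It assumes a data frame shaped like `data_plot3`, with columns `year` and `Number`. The values are hypothetical placeholders, not the figures from `stats1990.xlsx`.

```r
library(dplyr)

# Hypothetical series shaped like data_plot3 (placeholder values)
series <- tibble(
  country = "Cuba",
  year    = c(2006, 2007, 2008, 2009, 2010),
  Number  = c(100, 105, 130, 120, 118)
)

# Year in which the series reaches its maximum
peak_year <- series$year[which.max(series$Number)]

# Change from the previous observation, to see where the largest jump happens
changes <- series %>%
  arrange(year) %>%
  mutate(change = Number - dplyr::lag(Number))

peak_year
```

In the real data the first observation is 2000 and the next is 2006. The first “change” would therefore span six years rather than one.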

2020 was a turning point in many ways: COVID-19 hit, and economies all over the world suffered. What was happening in other countries that led most Cubans who fled their country to go to the United States?

In 2020, the world experienced a profound shift due to the COVID-19 pandemic, leading to significant economic challenges globally. The pandemic caused widespread health crises and triggered economic downturns in many countries. Governments implemented lockdowns, travel restrictions, and social distancing measures to curb the spread of the virus, which resulted in disruptions to businesses, job losses, and economic hardships.

The choice of the United States as a destination for many Cuban migrants historically stems from a variety of factors, including political differences with the Cuban government, economic motivations, and the pursuit of better opportunities and freedoms. The economic upheaval and health crisis of 2020 may have influenced the experiences of Cuban expatriates in the United States, as they navigated the unique challenges posed by the pandemic in their adopted country.

Code
# Read Statista's 2020 table of Cuban emigrants by destination country
xlsx_path <- here::here('data_raw','statistic_2020.xlsx')
data1 <- read_excel(xlsx_path, sheet = 2, skip = 4)

# Add English country names in the same order as the sheet's rows
data1$english <- c('United States', 'Spain', 'Italy', 'Chile', 'Canada', 'Germany',
                   'Brazil', 'Mexico', 'Puerto Rico', 'Venezuela')
colnames(data1) <- c("country", "number", "english")

data1 %>%
    drop_na() %>%
    mutate(number = as.numeric(number) / 10^3,
           # Flag the U.S. bar so it can be highlighted
           is_us = english == "United States") %>%
    ggplot(aes(x = number, y = reorder(english, number), fill = is_us)) +
    geom_col() +
    labs(
        x = 'Number of migrants (in thousands)',
        y = 'Countries',
        title = '96% of Cubans go to the US',
        subtitle = 'Top 10 countries where Cubans migrated in 2020'
    ) +
    scale_fill_manual(values = c('grey', 'pink')) +
    theme_minimal() +
    theme(legend.position = 'none')
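The chart title hard-codes “96%”. Computing the share from the table itself keeps the title in sync with the data. The sketch below does this, assuming the same `english` and `number` columns as `data1`; the numbers are made up, not Statista’s. A share computed this way is a share of the top-10 destinations, not of all Cuban emigrants.

```r
library(dplyr)

# Hypothetical stand-in for the cleaned data1 table (placeholder numbers)
top10 <- tibble(
  english = c("United States", "Spain", "Italy"),
  number  = c(800, 150, 50)
)

# Percentage of the listed emigrants who are in the United States
us_share <- top10 %>%
  summarise(share = 100 * sum(number[english == "United States"]) / sum(number)) %>%
  pull(share)

# Build the title from the computed share instead of typing it by hand
plot_title <- sprintf("%.0f%% of Cuban emigrants (top 10 destinations) are in the US", us_share)
plot_title
```

The resulting string can be passed to `labs(title = plot_title)`.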

Compared with other Caribbean countries, is Cuba among those with the highest emigration? Why have more people fled Cuba than the other countries? What happened in 1968, when emigration peaked?

In 1968, Cuba, led by Fidel Castro, continued on its path as a socialist state aligned with the Soviet Union during the Cold War. Castro’s government pursued economic reforms aimed at social equality, including the nationalization of industries and agrarian changes. In March 1968, the “Revolutionary Offensive” nationalized the country’s remaining small private businesses. The country strengthened its ties with the Soviet Union, solidifying its role as a key ally. Internationally, Cuba supported revolutionary movements in Latin America and Africa. This period consolidated socialist policies and the lasting influence of the Cuban government’s alignment with the Soviet bloc.

Code
# Read the 1960-1990 sheet, skipping the 9 rows above the column headers
xlsx_path2 <- here::here('data_raw','stats1990.xlsx')
data2 <- read_excel(xlsx_path2, sheet = 2, skip = 9)

# Keep the three Caribbean countries being compared
data2c <- data2[c(137, 139, 141), ]
colnames(data2c) <- c("country", "year_1960", "year_1970", "year_1980", "year_1990")

data4plot <- data2c %>%
    # One row per country and year; names_prefix strips "year_"
    pivot_longer(
        cols = year_1960:year_1990,
        names_to = 'year',
        names_prefix = 'year_',
        values_to = 'number'
    ) %>%
    mutate(year = as.numeric(year),
           # Emigrant counts in millions
           number = as.numeric(number) / 10^6,
           # Hand-entered total population (millions), in the same row order:
           # each country's 1960, 1970, 1980, and 1990 values
           total_pop_in_millions = c(7.2, 8.8, 9.8, 10.6,
                                     3.4, 4.5, 5.8, 9.8,
                                     3.9, 4.6, 5.6, 6.9)) %>%
    drop_na() %>%
    # Emigrants as a percentage of the home country's population
    mutate(percentage = number / total_pop_in_millions * 100)

anim_plot <- data4plot %>%
    ggplot(aes(x = year, y = percentage, color = country)) +
    geom_line(linewidth = 1) +
    geom_point(size = 1) +
    geom_text_repel(
        aes(label = country),
        hjust = 0, nudge_x = 1, direction = "y",
        size = 6, segment.color = NA) +
    scale_x_continuous(
        breaks = seq(1960, 1990, 10),
        # Extra room on the right for the country labels
        expand = expansion(add = c(1, 13))) +
    scale_color_manual(values = c('darkgreen', 'blue', 'orange')) +
    theme_half_open(font_size = 18) +
    theme(legend.position = 'none') +
    labs(x = 'Year',
         y = 'Percentage of population',
         title = "What happened in Cuba in 1968?")

animation <- anim_plot +
    transition_reveal(year)

animation
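The hand-typed `total_pop_in_millions` vector only lines up with the reshaped data if the rows stay in exactly the same order. A safer approach stores the totals in a lookup table and matches each one by country and year with `left_join()`. The sketch below does this for one country. Its population totals are the first four values typed in the chunk above, which match Cuba’s population. The emigrant counts are hypothetical placeholders.

```r
library(dplyr)

# Hypothetical emigrant counts (persons); placeholders, not the source figures
emigrants <- tibble(
  country = "Cuba",
  year    = c(1960, 1970, 1980, 1990),
  number  = c(72000, 440000, 980000, 530000)
)

# Population totals in millions, matched by key instead of by row position
population <- tibble(
  country = "Cuba",
  year    = c(1960, 1970, 1980, 1990),
  total_pop_in_millions = c(7.2, 8.8, 9.8, 10.6)
)

# percentage = (emigrants in millions) / (population in millions) * 100
shares <- emigrants %>%
  left_join(population, by = c("country", "year")) %>%
  mutate(percentage = (number / 10^6) / total_pop_in_millions * 100)

shares
```

With the join, a missing or reordered row shows up as an `NA` total instead of silently pairing a country with another country’s population.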

Conclusion

Taken together, these datasets describe a substantial emigration of Cubans seeking opportunities abroad, particularly in the United States. The Statista data shows the scale of Cuban emigration by destination country in 2020, with the U.S. as the primary choice. The Pew Research Center’s analysis of the 2018 American Community Survey adds the demographic characteristics, socioeconomic indicators, and regional distribution of U.S. immigrants. Potential biases in the American Community Survey must still be kept in mind when interpreting these figures. The 1980 population data from countryeconomy.com provides historical context, and the Worldometer and United Nations data offer a comprehensive view of Cuba’s population trends. Despite economic and political challenges, the chart of Cuba’s population shows steady growth, underscoring the nation’s dynamic demographic landscape.

Moving forward, our analysis prompts critical questions about the future of Cuban migration. Will the current emigration rate persist, and how might shifts in Cuba’s political and socio-economic landscape influence these patterns? The prospect of reverse migration, if positive transformations occur, introduces a layer of complexity to this dynamic. In conclusion, the trajectory of Cuban migration is intricately tied to the evolving internal dynamics of the nation, emphasizing the complex relationship between political stability, economic conditions, and the mobility of its people. To deepen our understanding, future research could explore additional data sources, such as qualitative studies capturing the personal narratives of Cuban immigrants and their experiences, providing a more nuanced perspective on the factors influencing migration decisions.

Appendix

Code
# Load libraries and settings here
library(tidyverse)   # also attaches ggplot2, dplyr, tidyr, and stringr
library(here)
library(plotly)
library(dslabs)
library(readxl)
library(cowplot)
library(viridis)
library(magick)
library(ggrepel)
library(gganimate)

knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  comment = "#>",
  fig.path = "figs/", # Folder where rendered plots are saved
  fig.width = 7.252, # Default plot width
  fig.height = 4, # Default plot height
  fig.retina = 3 # For better plot resolution
)

# Put any other "global" settings here, e.g. a ggplot theme:
theme_set(theme_bw(base_size = 20))

Attribution

All members contributed equally.