Research question

The research question that will be answered during the project is: “How do different countries compare in their responses and difficulties while fighting the COVID-19 pandemic?”

The COVID-19 pandemic has tested every type of market in many of the most developed countries in the world during the last months. It started in China, then expanded to Europe, where Italy and Spain have been the most affected countries, and finally reached the United States. This pandemic has had an impact never seen before, creating a state of panic in many communities that led to confusion and shortages of products such as face masks and toilet paper. Governments were forced to take measures that in some cases were criticized, especially for the time it took to implement them.

Data sources

Source 1: The first source used is the World Health Organization (https://covid19.who.int/). By downloading the “Map Data”, I obtained the daily number of infections and deaths from coronavirus. The earliest entry is in China, where the first death due to coronavirus happened on 01/11/2020, while other countries did not have any infections until late February or early March. The World Health Organization (WHO) is a specialized agency of the United Nations whose primary role is to direct international health. The WHO is the original source of the data, which was collected from routine reporting by the health services and disease surveillance systems of the agency’s 194 Member States. [global_data]

Source 2: The second source is ‘Our World in Data’ (https://ourworldindata.org/grapher/total-covid-deaths-per-million?tab=chart). The dataset contains the number of deaths per million people. The data was published by the European Centre for Disease Prevention and Control (ECDC), and the files are available on GitHub (https://github.com/owid/covid-19-data). ECDC is an agency of the European Union, and the data was collected from national governments or institutes of health. GitHub is a hosting service for git repositories; this repository has been used by many media companies, especially during the current pandemic. [test]

Source 3: The third source is ‘Statista’ (https://es.statista.com/estadisticas/1107740/covid-19-tasa-de-pruebas-realizadas-en-paises-seleccionados-del-mundo/), and it contains the number of COVID-19 tests performed per million people in different countries. Statista cites ‘Our World in Data’ as its source; as with Source 2, the data was originally published by the European Centre for Disease Prevention and Control (ECDC) and the files are available on GitHub. [test_per_million]

Source 4: The fourth and last source is ‘The World Bank Group’ (https://data.worldbank.org/indicator/SP.POP.65UP.TO.ZS?name_desc=false), and it contains the percentage of each country's population that is 65 years old or above. The World Bank Group is a global partnership of 189 member countries whose goal is to fight poverty worldwide through sustainable solutions. This is the original source, and the data was collected by World Bank staff through national institutes and government websites of its member countries. [ages]

Description of data

The first variables used are the number of deaths and the number of deaths per million people. These variables indicate how severe the pandemic has been in different countries and the social damage it has caused. Populations differ enormously between countries, so it is important to compare figures on a per-capita basis. In the first visualization, the United States has the most deaths, closely followed by three other countries. In the second visualization, however, the United States is sixth in deaths per million people, while Belgium, with less than a quarter of the deaths of the United States, is first. In proportional terms, the United States does not have the worst numbers, but its confirmed cases and deaths keep increasing faster than in any other country.

While the number of deaths per million people is especially important for measuring the impact of COVID-19, the number of confirmed cases gives an idea of the extent of the pandemic. This variable also helps us compare the reaction time of different governments once the first cases started to appear in their countries.
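The per-million normalization described above can be sketched in a few lines of base R. The two countries and their figures are hypothetical and chosen only to show how a ranking can flip once population is taken into account.

```r
# Hypothetical figures for two made-up countries (illustration only).
countries <- data.frame(
  country    = c("Country A", "Country B"),
  deaths     = c(30000, 6000),
  population = c(330e6, 11.5e6),
  stringsAsFactors = FALSE
)

# Deaths per million = deaths / population * 1,000,000
countries$deaths_per_million <- countries$deaths / countries$population * 1e6

# Ranking by raw deaths puts Country A first ...
countries$country[order(-countries$deaths)]

# ... while ranking by deaths per million puts Country B first.
countries$country[order(-countries$deaths_per_million)]
```

Country A has five times as many deaths, but Country B loses a far larger share of its population, which is the kind of contrast the second visualization is meant to expose.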

Looking at the summary below, we see that for both daily and cumulative values the mean and median are very different, and the range is very wide. This is due to the moment in time at which the datasets were obtained. When the pandemic started in China and its number of cases and deaths began to grow exponentially, most other countries (the United States, Spain and others) did not have any cases of COVID-19. For all those countries, the numbers are very small during the first months (January and part of February), until they experienced exponential growth after February. When the datasets were downloaded, and for weeks or even months after this project is over, the number of cases and deaths for these countries will not have decreased to pre-March levels, so much of the data for the decreasing part of the curve still has to be collected.

summary(global_data)
##       day               Country          Country Name          Region         
##  Min.   :2020-01-08   Length:8829        Length:8829        Length:8829       
##  1st Qu.:2020-03-12   Class :character   Class :character   Class :character  
##  Median :2020-03-25   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :2020-03-21                                                           
##  3rd Qu.:2020-04-05                                                           
##  Max.   :2020-04-15                                                           
##      Deaths        Cumulative Deaths   Confirmed       Cumulative Confirmed
##  Min.   :   0.00   Min.   :    0.0   Min.   :    0.0   Min.   :     1      
##  1st Qu.:   0.00   1st Qu.:    0.0   1st Qu.:    0.0   1st Qu.:     6      
##  Median :   0.00   Median :    0.0   Median :    2.0   Median :    43      
##  Mean   :  13.95   Mean   :  182.6   Mean   :  217.2   Mean   :  3446      
##  3rd Qu.:   1.00   3rd Qu.:    6.0   3rd Qu.:   32.0   3rd Qu.:   418      
##  Max.   :2003.00   Max.   :23476.0   Max.   :35386.0   Max.   :578268
summary(test)
##     Entity              Code               Date          
##  Length:7326        Length:7326        Length:7326       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##  Total confirmed deaths due to COVID-19 per million people (deaths per million)
##  Min.   :   0.001                                                              
##  1st Qu.:   0.317                                                              
##  Median :   1.718                                                              
##  Mean   :  26.658                                                              
##  3rd Qu.:  10.171                                                              
##  Max.   :1208.085

The last main variable of interest in my project is the number of tests performed per million people. The lack of COVID-19 tests has been one of the biggest problems that governments have faced, and it has revealed the lack of preparation for a crisis like this pandemic. Having enough tests available for the population was crucial during the early stages of the pandemic in order to assess the extent of the virus and create a response plan.

The summary shows a mean higher than the median. That means the distribution is not symmetric but skewed to the right: a few countries with very high testing rates pull the mean up. The range is also very large, so some countries were much better prepared for such a pandemic than others.

summary(test_per_million)
##      Pais           Tests realizados
##  Length:25          Min.   :   2.1  
##  Class :character   1st Qu.: 109.6  
##  Mode  :character   Median : 957.1  
##                     Mean   :1309.2  
##                     3rd Qu.:1777.8  
##                     Max.   :6148.0
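As a quick check of the skew direction, the values printed in the summary above can be compared directly. This is only a rough sketch based on the five summary numbers, not a formal skewness test; the variable names are made up for the example.

```r
# Values copied from the summary(test_per_million) output above.
low <- 2.1
med <- 957.1
avg <- 1309.2
top <- 6148.0

# A mean above the median suggests a long upper (right) tail.
avg > med

# The distance from the median to the maximum is several times the
# distance from the median to the minimum, which points the same way.
upper_tail <- top - med
lower_tail <- med - low
upper_tail / lower_tail
```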

Description of results

The COVID-19 pandemic is a worldwide crisis that has touched every country to a different degree. For this project, 15 countries were selected, most of them for their high death rates due to COVID-19, but also for their different responses. The countries studied in this project are the following: Spain, the United States, Italy, France, Germany, the United Kingdom, Turkey, Iran, China, Russia, Brazil, Belgium, Canada, South Korea and Portugal.

The first three graphs show the damage and extent of the virus compared to the reaction time of some governments. However, it is very important to understand that these are not the true numbers, but only what the preparation and knowledge of each country allowed it to record. As will be seen later, it is impossible to test the entire population of a country. There were not enough tests, and many people with mild symptoms were asked to stay at home and quarantine without knowing whether they had COVID-19 or another type of virus. Some people also had the virus and did not present any symptoms at all. Therefore, in each country there is a fraction of the population that has had the virus but has not been recorded in any dataset. One way of recording this data would be an antibody test, which would indicate that an individual has already been in contact with the virus or currently has it.

Moreover, the collecting and recording methods have differed between countries. Due to the extreme pressure on healthcare systems, the recording of deaths has not been as accurate as it should have been. In some cases, people who tested positive for COVID-19 but passed away for other reasons were still recorded as deaths due to COVID-19, while other people passed away due to coronavirus without having been tested.

Visualization 1: Deaths due to COVID-19.

This first graph shows the daily cumulative deaths due to COVID-19 until 04/15/2020. The pandemic started in China, where the number of deaths grew exponentially until mid-February, then slowed down and became nearly constant around March, meaning that there are barely any new deaths.

For the rest of the countries, the number of deaths started to grow exponentially during late February and the beginning of March. Although the virus appeared in China and it seemed to do the most damage there, the United States, Italy and Spain each have more than double the number of deaths in China. However, they had the time to react and prepare that China did not have. As seen in the graph, they had more than a month and a half, during which they seemed to think that the virus was not going to affect them. Many governments seem to regard the virus as a surprise that left them no opportunity to establish a prevention plan, while in fact they had around two months.

The United States recently became the country with the most deaths, and its curve does not seem to have reached its peak, while Spain and Italy are the two countries that have been most overwhelmed by the virus, which has collapsed their healthcare systems.

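# Packages assumed from the functions used in this report.
library(tidyverse)  # dplyr, ggplot2, forcats
library(janitor)    # clean_names()
library(lubridate)  # ymd()
library(ggrepel)    # geom_text_repel()
library(cowplot)    # theme_half_open(), theme_minimal_vgrid()
library(gganimate)  # transition_reveal(), animate()

# Keep the countries of interest; names must match the WHO spelling exactly.
global_data <- global_data %>% 
    clean_names() %>%
    filter(country_name %in% c('Spain', 'United States of America', 'Italy', 'France', 'Germany', 'United Kingdom', 'Turkey', 'Iran', 'China', 'Russia', 'Brazil', 'Belgium', 'Canada', 'South Korea', 'Portugal')) %>%  
    select(-country, -region) %>% 
    rename('date' = 'day') 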

confirmed_animation <- ggplot(global_data, aes(x = date, y = cumulative_deaths, color = country_name))+
    scale_y_log10()+
    geom_point(size = 2, na.rm = TRUE)+
    geom_line(na.rm = TRUE)+
    geom_text_repel(data = global_data %>% filter(country_name %in% c('Spain', 'Italy', 'United States of America')), aes(label = country_name),
                hjust = 0,
                nudge_x = 1,
                nudge_y = 0,
                direction = 'y',
                segment.color = 'grey',
                na.rm = TRUE)+
    geom_text(data = global_data %>% filter(country_name %in% c('China')), aes(label = country_name),
                hjust = 0,
                nudge_x = 1,
                nudge_y = 0,
                na.rm = TRUE)+
    theme_half_open(font_size = 15)+
    scale_color_manual(values = c('grey', 'grey','grey', 'red', 'grey', 'grey', 'forestgreen','grey', 'purple','grey', 'blue', 'grey', 'grey', 'grey', 'grey'))+
    coord_cartesian(clip = 'off')+
    theme(legend.position = 'none',
            plot.margin = margin(0.1,8,0.1,0.1, 'cm'))+
    labs(x = "Date",
           y = "Number of Deaths",
         title = "Number of Deaths due to COVID-19\n",
        subtitle = 'Last updated: 04/15/2020',
        caption = 'Source: World Health Organization')+
    transition_reveal(date)

animate(confirmed_animation, end_pause = 40, width = 1000)
## Warning: Transformation introduced infinite values in continuous y-axis
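The warning above is what a log scale produces when the data contain zeros, since the logarithm of zero is not finite. The sketch below uses hypothetical counts to show where the infinite values come from and two possible ways of handling them before plotting; which one is preferable for this report is a judgment call.

```r
# Hypothetical cumulative death counts for the first days of an outbreak.
cumulative_deaths <- c(0, 0, 1, 3, 10, 25)

# log10(0) is -Inf, which is what the log-scaled y-axis complains about.
log10(cumulative_deaths)

# Option 1: keep only the days with at least one recorded death.
positive_only <- cumulative_deaths[cumulative_deaths > 0]

# Option 2: turn zeros into NA so those points are simply not drawn.
zeros_as_na <- ifelse(cumulative_deaths > 0, cumulative_deaths, NA)

# After either step, the log-transformed values that remain are finite.
all(is.finite(log10(positive_only)))
```

In the pipeline above, the first option would correspond to adding a `filter(cumulative_deaths > 0)` step before calling `ggplot()`.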

Visualization 2: Deaths per Million People due to COVID-19.

This graph shows the number of deaths per million people due to COVID-19 on 04/20/2020. As explained in the ‘Description of data’, this variable shows the real damage caused by COVID-19 in a country. While some countries might have similar numbers of deaths, their total populations might be extremely different, meaning that the country with the smaller population has suffered greater social damage than the country with the same number of deaths but a larger population.

For example, while the United States now has the greatest number of total deaths, its number of deaths per million is sixth among the countries selected. Spain and Italy, however, hold the second and third places both in total deaths and in deaths per million people.

This variable is especially important for countries with large populations, such as China and the United States. In both countries, COVID-19 was catastrophic in some specific areas (Hubei in China and New York City in the United States), and then it spread to other cities and areas. We have seen that in China the death rate has slowed down and decreased to practically zero. The death rate in the United States has not, however, and given that many areas have not yet been affected, it could become a serious problem if the correct measures are not taken by both the government and the population.

test <- test %>% 
  clean_names() %>%
  # Keep the selected countries on a single date (Date is stored as text).
  filter(entity %in% c('Spain', 'United States', 'Italy', 'France', 'Germany', 'United Kingdom', 'Turkey', 'Iran', 'China', 'Russia', 'Brazil', 'Belgium', 'Canada', 'South Korea', 'Portugal'),
         date == 'Apr 20, 2020') %>% 
  mutate(entity = fct_reorder(entity, total_confirmed_deaths_due_to_covid_19_per_million_people_deaths_per_million),
         country_color = case_when(
             entity == 'United States' ~ 'blue',
             entity == 'Spain' ~ 'purple',
             entity == 'Italy' ~ 'forestgreen',
            TRUE ~ 'other'))
             

ggplot(test, aes(x = entity, y = total_confirmed_deaths_due_to_covid_19_per_million_people_deaths_per_million))+
    geom_col(aes(fill = country_color),width = 0.8, alpha = 0.7)+
    coord_flip()+
    scale_y_continuous(expand = expansion(mult = c(0, 0.05)))+
    theme_minimal_vgrid()+
    scale_fill_manual(values = c('blue', 'forestgreen', 'grey', 'purple'))+
    theme(legend.position = 'none')+
    labs(x = 'Country',
         y = 'Number of Deaths per Million people',
         title = 'Deaths per Million People due to COVID-19',
         subtitle = 'Date: 4/20/2020',
         caption = 'Source: Our World in Data')
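The manual colour scales in this report rely on the order in which the categories are sorted. The base R sketch below shows that ordering for the `country_color` categories and how a named vector makes the pairing explicit; `fill_values` is a name introduced here for illustration, and such a named vector could be passed as `values` to `scale_fill_manual()` instead of the positional one.

```r
# Same colour categories as in the bar chart above.
country_color <- c("blue", "purple", "forestgreen", "other", "other")

# A character column is treated like a factor with alphabetically sorted
# levels, so unnamed colour values are matched in this order:
levels(factor(country_color))

# A named vector states the pairing directly and no longer depends on
# the alphabetical order of the categories.
fill_values <- c(blue = "blue", purple = "purple",
                 forestgreen = "forestgreen", other = "grey")

# Colour that each bar would receive.
unname(fill_values[country_color])
```

The same idea applies to the line charts, where the positional colour vector only lines up with the intended countries as long as the set of country names returned by the filter does not change.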

Visualization 3: Confirmed Cases of COVID-19.

This graph shows the cumulative number of confirmed cases of COVID-19 by day until 04/15/2020. While studying the number of deaths gives an idea of the social impact of the virus, the number of confirmed cases can show the different responses taken by different countries. One of the main characteristics of the virus is how easily infected individuals transmit it. That means that during the first weeks of the pandemic, the number of confirmed cases could be much higher than the number of deaths, because the virus takes much longer to kill someone than to infect someone. This gives governments some weeks to respond before the number of deaths and people at risk starts to grow.

As seen in Visualization 1, China was the first country to present any cases of coronavirus. The epicentre of the pandemic was located in Hubei, where a first lockdown was imposed on 01/23/2020, when the exponential growth of cases had already started, and the lockdown was then extended to the whole country. By mid-February, the curve had already flattened out. Within a month, the Chinese government was able to bring the situation under control.

For Italy, which as can be seen in Visualization 1 had the highest number of deaths from mid-March to mid-April, the national lockdown was declared on 03/09/2020. This national lockdown followed some territorial lockdowns that were not able to contain the spread of the virus across the country. The graph shows that after the national lockdown, the exponential growth of confirmed cases started to slow, and by the beginning of April the curve had almost flattened out. So again, it took around one month to reduce the exponential growth of confirmed cases.

Finally, the United States declared the state of emergency on 03/01/2020. This state was declared before Italy’s national lockdown, and the graph shows that the United States had a lower number of confirmed cases than Italy and China during most of March. However, its growth rate was much higher, and by the end of March it had surpassed both countries, becoming the leading country in confirmed cases of COVID-19. This shows that although the national emergency was declared before Italy’s national lockdown, and the country seemed to be reacting in time, the measures taken were probably not enough to slow down the spread of the virus. One of the main measures that the United States did not take is a mandatory quarantine for the whole population. A month and a half after the state of emergency was declared, the number of cases is still increasing.
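One way to put a number on "exponential growth" and its slowdown is the doubling time of the cumulative case count. The sketch below is not part of the original analysis: it uses hypothetical counts and a simple linear fit on the log2 scale, under the assumption that growth is roughly exponential over the chosen window.

```r
# Hypothetical cumulative case counts on seven consecutive days (not real data).
day   <- 0:6
cases <- c(100, 130, 170, 220, 290, 380, 500)

# Under exponential growth, log2(cases) is roughly linear in time and
# the slope of that line is the number of doublings per day.
fit <- lm(log2(cases) ~ day)
doublings_per_day <- unname(coef(fit)["day"])

# Doubling time in days; for these numbers it falls between 2 and 3 days.
doubling_time <- 1 / doublings_per_day
doubling_time
```

Computing this over a window before and after a lockdown date would give a rough, comparable measure of how much the growth slowed in each country.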

global_data_reduced <- global_data %>%
    mutate(date = ymd(date),
           label = ifelse(date == max(date), country_name, NA))


ggplot(global_data_reduced, aes(x = date, y = cumulative_confirmed, color = country_name))+
    scale_y_log10()+
    geom_line(size = 0.5)+
    geom_text(data = global_data_reduced %>% filter(label %in% c('Italy', 'United States of America', 'China')),aes(label = label),
            hjust = 0, 
            nudge_x = 1,
            nudge_y = 0,
            size = 4)+
    theme_half_open(font_size = 15)+
    scale_color_manual(
        values = c('grey', 'grey','grey', 'red', 
                   'grey', 'grey', 'forestgreen','grey', 'grey',
                   'grey', 'blue', 'grey', 'grey', 'grey', 'grey'))+
    coord_cartesian(clip = 'off')+
    theme(legend.position = 'none',
        plot.margin = margin(0.1,4,0.1,0.1, 'cm'))+
    labs(x = "Date",
         y = "Number of Confirmed Cases",
         title = "Confirmed cases of COVID-19",
        subtitle = 'Last updated: 04/15/2020',
        caption = 'Source: World Health Organization')
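The `label` column above is filled only on the overall latest date, so a country whose series ends earlier would get no label. A per-country variant can be sketched in base R as follows; the data are hypothetical, and in the dplyr pipeline a `group_by(country_name)` before the `mutate()` would be one way to get the same effect.

```r
# Hypothetical series: country B stops reporting one day before country A.
cases <- data.frame(
  country_name = c("A", "A", "A", "B", "B"),
  date = as.Date(c("2020-04-13", "2020-04-14", "2020-04-15",
                   "2020-04-13", "2020-04-14")),
  stringsAsFactors = FALSE
)

# Global maximum: only rows dated 2020-04-15 get a label, so B gets none.
cases$label_global <- ifelse(cases$date == max(cases$date),
                             cases$country_name, NA)

# Per-country maximum: ave() returns the latest date within each country.
last_date <- ave(as.numeric(cases$date), cases$country_name, FUN = max)
cases$label_by_country <- ifelse(as.numeric(cases$date) == last_date,
                                 cases$country_name, NA)

cases
```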

Visualization 4: Number of COVID-19 Tests Performed per Million People.

This graph shows the number of COVID-19 tests performed per million people on 03/20/2020. The number of tests has been a decisive factor when comparing countries in terms of their readiness to face such a pandemic. Many countries lacked enough tests for all the people who required one, which also distorted the collection of data on confirmed cases. Being able to quickly test people and quarantine individuals with positive results is crucial to stopping the spread of the virus, especially in the earliest stages of the pandemic. Many countries had difficulties developing and producing tests, which left them without knowledge of the real severity of the situation.

One of the characteristics of the virus that has caused the biggest problems is that a person can carry it without presenting any symptoms. Since there were not enough tests, especially before any lockdown or quarantine, only people with symptoms were tested. People without symptoms who also had the virus were led to believe that they did not need to quarantine, while in fact they could infect other individuals in the streets, in their families and elsewhere.

South Korea has been considered one of the biggest role models in terms of its readiness to confront the virus. As seen in this graph, it has performed at least twice as many tests per million people as almost any other country in the world. Being geographically close to China, South Korea could have been quickly affected by the virus too, but its high testing rate, in addition to other measures, helped it become one of the countries with the lowest deaths per million people, as seen in Visualization 2.

Following South Korea, there is Italy, which was the next big epicentre after China and had the most deaths due to COVID-19 until Spain and the United States surpassed it. Being the second epicentre of the virus, it received a lot of outside help. However, that help came a little late, and Italy is still the country with the third most deaths per million people, mainly for reasons explained in the next visualization.

At the other end of the chart, we see the United States and Spain. The governments of these two countries have been criticized for not being ready for such a pandemic, with the scarcity of tests being one of the main reasons.

test_per_million <- test_per_million %>% 
    clean_names() %>%
    filter(pais %in% c('España', 'Estados Unidos', 'Italia', 'Francia', 'Alemania', 'Reino Unido', 'Turquía', 'Iran', 'China', 'Rusia', 'Brasil', 'Bélgica', 'Canada', 'Corea del Sur', 'Portugal')) %>%
    mutate(pais = fct_reorder(pais, tests_realizados),
         country_color = if_else(pais == 'Corea del Sur', 'Corea del Sur', 'Other'),
          pais = fct_recode(pais, 
                            'South Korea' = 'Corea del Sur',
                            'Italy' = 'Italia',
                            'Germany' = 'Alemania',
                            'Belgium' = 'Bélgica',
                            'United Kingdom' = 'Reino Unido',
                            'Spain' = 'España',
                            'France' = 'Francia',
                            'United States' = 'Estados Unidos',
                            'Turkey' = 'Turquía',
                            'Brazil' = 'Brasil'))
           
ggplot(test_per_million, aes(x = pais, y = tests_realizados))+
  geom_col(aes(fill = country_color), width = 0.7, alpha = 0.8)+
  coord_flip()+
  scale_fill_manual(values = c('burlywood', 'deepskyblue3'))+
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)))+
  theme_minimal_vgrid()+
  theme(legend.position = 'none')+
  labs(x = 'Country',
       y = 'Number of Tests performed per Million people',
       title = 'COVID-19 Tests performed per Million People',
       subtitle = 'Date: 03/20/2020',
       caption = 'Source: Statista')
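Because the Statista file uses Spanish country names, every name in the filter and in the recoding step has to match the file exactly. A lookup vector is one way to translate the names and, at the same time, reveal which ones failed to match. The sketch below is a hypothetical illustration with a handful of entries, where `es_to_en` and the sample `pais` vector are assumptions made for the example.

```r
# Lookup from Spanish to English country names (only a few entries shown;
# the spellings would need to be checked against the actual file).
es_to_en <- c("Italia"        = "Italy",
              "Alemania"      = "Germany",
              "Francia"       = "France",
              "Reino Unido"   = "United Kingdom",
              "Corea del Sur" = "South Korea")

# Hypothetical values of the pais column.
pais <- c("Italia", "Alemania", "Corea del Sur", "Rusia")

# Names without an entry in the lookup come back as NA.
translated <- unname(es_to_en[pais])
translated

# Listing the unmatched names makes spelling mismatches easy to spot.
pais[is.na(translated)]
```

The same check is useful across sources, since the WHO and Our World in Data files do not necessarily spell every country the same way (for example 'United States of America' versus 'United States').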

Visualization 5: Share of the Population Ages 65 and above

This last graph shows the share of the population that was 65 years old or older in 2018. Another characteristic of COVID-19 is that it affects older people and people with pre-existing medical conditions more severely. While anyone can be infected by the virus, present symptoms and even die because of it, older people have a much higher chance of becoming seriously ill. The death rate for people above 65 years old has been much higher than for younger people.

Italy has the oldest population in Europe (and among our selected countries) and the second oldest in the world behind Japan. Having 24% of the total population aged 65 or above, combined with the vulnerability of older generations to this virus, led to a collapse of hospitals and of the whole healthcare system in Italy that was hard to respond to once the virus had spread. Having an old population meant that among the confirmed cases there would be more severe cases and people at risk, which was one of the main reasons why Italy became the leading country in number of deaths due to COVID-19 in February and March.

On the other hand, only 11% of China’s population is in that age range. In addition to the measures taken by the government, this could also have been a factor in why the country was able to flatten the curve of deaths, as seen in Visualization 1.

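# After clean_names(), the country column is called data_source;
# x63 is assumed to be the column holding the 2018 values.
ages <- ages %>% 
    clean_names() %>%
    filter(data_source %in% c('Spain', 'United States', 'Italy', 'France', 'Germany', 'United Kingdom', 'Turkey', 'Iran', 'China', 'Russia', 'Brazil', 'Belgium', 'Canada', 'South Korea', 'Portugal')) %>% 
    # Convert the share to numeric, order the countries by it, and flag Italy.
    mutate(x64 = as.numeric(x63),
           data_source = fct_reorder(data_source, x64),
           country_color = if_else(data_source == 'Italy', 'Italy', 'Other'))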

ggplot(ages, aes(x = data_source, y = x64))+
    geom_col(aes(fill = country_color), width = 0.8, alpha = 0.7)+
    coord_flip()+
    scale_fill_manual(values = c('forestgreen', 'deepskyblue3'))+
    scale_y_continuous(expand = expansion(mult = c(0, 0.05)))+
    theme_minimal_vgrid()+
    theme(legend.position = 'none')+
    labs(x = 'Country',
       y = 'Percentage of Total Population',
       title = 'Share of the Population Ages 65 and above',
       subtitle = 'Date: 2018',
       caption = 'Source: The World Bank Group')
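The column names `data_source` and `x63` used above suggest that the first metadata line of the World Bank file was read as the header. A possible alternative, sketched below, is to skip the metadata lines so that the real header (country name, country code, years) is used. The miniature file and its values are made up, and the assumption that the export starts with four metadata lines should be checked against the downloaded file.

```r
# Miniature stand-in for a World Bank CSV export (values are made up).
# Assumption: four metadata lines precede the real header row.
raw <- '"Data Source","World Development Indicators",

"Last Updated Date","2020-04-09",

"Country Name","Country Code","2017","2018"
"Italy","ITA","22.0","23.0"
"China","CHN","10.0","11.0"
'

# skip = 4 drops the metadata; check.names = FALSE keeps "2018" as a name.
ages_demo <- read.csv(text = raw, skip = 4, check.names = FALSE,
                      stringsAsFactors = FALSE)

# The year can then be selected by name instead of by column position.
ages_demo$share_2018 <- as.numeric(ages_demo[["2018"]])
ages_demo[order(-ages_demo$share_2018), c("Country Name", "share_2018")]
```

Selecting the year by name avoids depending on a positional column such as `x63`, which would shift if the file gained or lost a year.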

Conclusion

Countries have had different responses and difficulties when facing COVID-19. The timing of the national response, the number of tests available, and the share of older people in the population have been crucial factors in determining the rates of deaths and confirmed cases. However, there are other factors that could influence these rates and that have not been studied in this project, such as the number of doctors, nurses, or hospital beds per million people. This pandemic is still far from over, and a project of this kind would yield quite different and interesting results in the future.

Appendix

Source 1: (https://covid19.who.int/) for global_data

Name                 | Type      | Description
---------------------|-----------|---------------------------------------------------------------------------
Day                  | date      | Every date until 4/15/2020. The starting day is different for each country
Country              | character | Names of the countries in abbreviated form
Country Name         | character | Name of each country
Region               | character | Region where those countries are located
Deaths               | double    | Number of daily deaths due to COVID-19
Cumulative Deaths    | double    | Cumulative number of deaths due to COVID-19
Confirmed            | double    | Number of daily confirmed cases of COVID-19
Cumulative Confirmed | double    | Cumulative number of confirmed cases of COVID-19



Source 2: (https://ourworldindata.org/grapher/total-covid-deaths-per-million?tab=chart) for test

Name                         | Type      | Description
-----------------------------|-----------|----------------------------------------------------------------------
Entity                       | character | Name of each country
Code                         | character | Names of the countries in abbreviated form (different from Source 1)
Date                         | date      | Every date until 4/15/20. The starting day is different for each country
Cumulative tests per million | double    | Number of COVID-19 tests for every million people
Confirmed cases per million  | double    | Number of confirmed cases of COVID-19 for every million people
Confirmed deaths per million | double    | Number of confirmed deaths due to COVID-19 for every million people



Source 3: (https://es.statista.com/estadisticas/1107740/covid-19-tasa-de-pruebas-realizadas-en-paises-seleccionados-del-mundo/) for test_per_million

Name                         | Type      | Description
-----------------------------|-----------|----------------------------------------------------------------------
Pais                         | character | Name of each country
Tests realizados             | double    | Number of COVID-19 tests performed for every million people



Source 4: (https://data.worldbank.org/indicator/SP.POP.65UP.TO.ZS?name_desc=false) for ages

Name                         | Type      | Description
-----------------------------|-----------|----------------------------------------------------------------------
Country Name                 | character | Name of each country
Country Code                 | character | Code of each country (abbreviation)
19..                         | double    | Population ages 65 and above (% of total population) for every year