1) What is the Premier League?

The Premier League, also known as the English Premier League (EPL), is the top-flight football league in England and arguably the most competitive league in the world. It was established in 1992 and consists of 20 teams. Each season, the bottom three teams are relegated to the second division, and three teams are promoted from it to replace them. Seasons typically run from August to May, with each team playing 38 matches.

2) Research Question

What are the metrics that define a squad’s performance in the English Premier League?

3) Discussing data sources

The datasets we use are extracted from FBREF, a sports reference database. The data is pre-processed to make it easier to manipulate. The following links point to the datasets for the 2019-2020 through 2021-2022 seasons:

https://fbref.com/en/comps/9/2019-2020/2019-2020-Premier-League-Stats
https://fbref.com/en/comps/9/2020-2021/2020-2021-Premier-League-Stats
https://fbref.com/en/comps/9/2021-2022/2021-2022-Premier-League-Stats

Each of the three links above opens a page with a multitude of tables that can be exported as CSV files. We extracted the CSV files that we thought were most relevant to our research question and saved them in the data_processed folder. Although that folder contains many CSV files, we only use 12 of them, which are imported in the initial code chunk of this report.
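As a rough illustration of that import step, the sketch below reads one CSV per season and stacks the results into a single table with a season column. The file contents and squad names are made-up stand-ins written to a temporary folder, and the file naming pattern simply mirrors the files in data_processed. It is a sketch under those assumptions, not the report's actual import chunk.

```r
library(readr)
library(dplyr)

# Write two tiny stand-in CSV files to a temporary folder (hypothetical data).
tmp <- tempfile("data_processed_")
dir.create(tmp)
write_csv(data.frame(Squad = c("Team A", "Team B"), Gls = c(60, 40)),
          file.path(tmp, "2019-2020_Squad_Standard_Stats.csv"))
write_csv(data.frame(Squad = c("Team A", "Team B"), Gls = c(55, 45)),
          file.path(tmp, "2020-2021_Squad_Standard_Stats.csv"))

# Build one file path per season, named by season.
seasons <- c("2019-2020", "2020-2021")
files <- file.path(tmp, paste0(seasons, "_Squad_Standard_Stats.csv"))

# Read each file, then stack them; the list names become the season column.
standard_stats <- files %>%
  setNames(seasons) %>%
  lapply(read_csv, show_col_types = FALSE) %>%
  bind_rows(.id = "season")

print(standard_stats)
```

Keeping the season as a column from the start means later steps can group or facet by season without building one object per season.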

FBREF obtains its raw data from OPTA Sports, the data analysis division of Stats Perform, a sports data company. OPTA’s services are not free, but many sites like FBREF use them and process the data to share it with the public. The raw data is recorded by international and in-stadium analysts before being published. Because it is based on events logged during Premier League games rather than on opinion, it is largely free of subjective bias. The dataset is thorough and is not missing any data points that are relevant to our topic.

Some variables in the data need explaining, so we include a data dictionary at the end of the report.

First, we look at the number of goals scored by each squad in each season from 2019-2020 to 2021-2022 (Figure 1). Manchester City and Liverpool rank among the highest-scoring teams across these seasons, at around 80-90 goals per season, whereas Crystal Palace and Burnley score roughly 40-50 goals per season.

Alongside goals, a newer metric has emerged in recent years that better represents a team’s attacking performance. Expected goals (xG) measures the probability that a shot results in a goal. An xG model is built from information on thousands of past shots, such as distance, angle, speed, and player positions, and rates each shot’s likelihood of becoming a goal on a scale from 0 to 1. Overall, it is a more accurate way to gauge a team’s ability to create scoring chances than goals alone.

We created a new variable, goals per 90 minutes minus expected goals per 90 minutes, to show whether a team is over-performing (positive values) or under-performing (negative values) relative to its expected goals (Figure 2). Manchester City over-performed in two of the three seasons, including 2019-20, meaning they scored more goals than the xG model expected, and under-performed by that same standard in the other.
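To make the new variable concrete, here is a minimal sketch on made-up shot data. It assumes a team's xG is the sum of its shots' goal probabilities and then takes goals minus xG, as in the xG_Vs_Gls column used in this report. The teams, shots, and probabilities are hypothetical.

```r
library(dplyr)

# Hypothetical shots for two made-up teams; xg is each shot's goal probability (0 to 1).
shots <- data.frame(
  team = c("Team A", "Team A", "Team A", "Team B", "Team B"),
  xg   = c(0.50, 0.25, 0.25, 0.75, 0.25),
  goal = c(1, 1, 0, 0, 0)
)

# Sum shot probabilities to get xG, count goals, and take the difference.
performance <- shots %>%
  group_by(team) %>%
  summarise(xG = sum(xg), Gls = sum(goal), .groups = "drop") %>%
  mutate(xG_Vs_Gls = Gls - xG)  # positive = scored more than expected

print(performance)
```

Both made-up teams have the same xG here, so the sign of xG_Vs_Gls alone separates the over-performer (Team A) from the under-performer (Team B).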

#Goals by season
SS2122 <- X2021_2022_Squad_Standard_Stats %>% 
    select("Squad", "Gls", "xG", "xGper90", "Glsper90" ) %>% 
    mutate(xG_Vs_Gls = Glsper90 - xGper90) %>% 
    add_column(season = "2021-2022") %>% 
    arrange(desc(xG_Vs_Gls))

SS2021 <- X2020_2021_Squad_Standard_Stats %>% 
    select("Squad", "Gls", "xG", "xGper90", "Glsper90" ) %>% 
    mutate(xG_Vs_Gls = Glsper90 - xGper90) %>% 
    add_column(season = "2020-2021") %>% 
    arrange(desc(xG_Vs_Gls))

SS1920 <- X2019_2020_Squad_Standard_Stats %>% 
    select("Squad", "Gls", "xG", "xGper90", "Glsper90" ) %>% 
    mutate(xG_Vs_Gls = Glsper90 - xGper90) %>% 
    add_column(season = "2019-2020") %>% 
    arrange(desc(xG_Vs_Gls))


# Stack the three seasons into one table (all three share the same columns)
SS_Combined <- bind_rows(SS1920, SS2021, SS2122) %>% 
    group_by(season)
#Goals performance by season
SS_Combined %>%
    # Flag the four squads discussed in the text so they stand out in the plot
    mutate(teams = Squad %in% c('Manchester City', 'Liverpool', 'Crystal Palace', 'Burnley')) %>% 
    ggplot() +
    geom_col(aes(x = reorder(Squad, Gls), y = Gls, fill = teams)) +
    coord_flip() +
    facet_wrap(vars(season), nrow = 1, scales = "free_y") +
    scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
    labs(
        x = "Squad name",
        y = "Goals",
        title = "Figure 1: EPL squads' goals scored per season, 2019-2020 to 2021-2022") +
    theme_replace() +
    theme(legend.position = "none")

#Goals per 90 - xG per 90
SS_Combined %>%
    mutate(
        xG_Vs_Gls = Glsper90 - xGper90,  # positive = scored more than expected
        # Flag the four squads discussed in the text so they stand out in the plot
        teams = Squad %in% c('Manchester City', 'Liverpool', 'Crystal Palace', 'Burnley')
    ) %>% 
    ggplot() +
    geom_col(aes(x = reorder(Squad, xG_Vs_Gls), y = xG_Vs_Gls, fill = teams)) +
    coord_flip() +
    facet_wrap(vars(season), nrow = 1, scales = "free_y") +
    scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
    theme_replace() +
    theme(legend.position = "none") +
    labs(
        x = "Squad name",
        y = "Goals per 90 minus xG per 90",
        title = "Figure 2: EPL squads' goals per 90 minus expected goals per 90, 2019-2020 to 2021-2022")

A team’s performance is also affected by whether it plays at its home stadium or the opposition’s. Teams tend to perform better when they are backed by their home supporters and playing in a familiar environment. The next two visualizations show the cumulative expected goals of our four selected teams when playing at home (Figure 3) and away (Figure 4) from the 2019-2020 season to the 2021-2022 season.

In Figure 3, Manchester City and Liverpool each accumulate more than 125 expected goals at home over the three seasons, while the mid-table teams Burnley and Crystal Palace accumulate around 75. Figure 4 shows the same measure for away games. Whether they sit near the top of the table or lower down, teams generate fewer expected goals away from home. For example, Liverpool accumulates around 100 expected goals away over the three seasons, compared with about 125 at home, while Burnley and Crystal Palace accumulate approximately 40 away. This suggests that a team’s performance is affected by the playing environment: teams tend to create more chances, and so score more goals, at home than when visiting an opponent.
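The home and away comparison can be sketched on a toy fixture list. The example below builds a running xG total per home team and then compares each team's home and away totals. It assumes, as in this report's fixture tables, that xGH is the home side's xG and xGA is the away side's xG. The teams and values are hypothetical.

```r
library(dplyr)

# Hypothetical fixtures; xGH and xGA are the home and away side's xG in each match.
fixtures <- data.frame(
  Home = c("Team A", "Team B", "Team A", "Team B"),
  Away = c("Team B", "Team A", "Team B", "Team A"),
  xGH  = c(2.0, 1.0, 1.5, 0.5),
  xGA  = c(0.5, 1.5, 1.0, 1.0)
)

# Running total of xG for each team across its home games.
home_running <- fixtures %>%
  group_by(Home) %>%
  mutate(count = row_number(), total_xG = cumsum(xGH)) %>%
  ungroup()

# Season totals at home and away, then side by side per squad.
home_totals <- fixtures %>% group_by(Home) %>% summarise(home_xG = sum(xGH))
away_totals <- fixtures %>% group_by(Away) %>% summarise(away_xG = sum(xGA))

comparison <- inner_join(home_totals, away_totals, by = c("Home" = "Away")) %>%
  rename(Squad = Home)

print(comparison)
```

Putting the two totals in one table makes the home and away gap a single subtraction per squad, rather than a comparison read off two separate plots.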

#HOME VIZ

FR1920H <- FR_2019_2020 %>% 
    select("Wk","Home","xGH","Away","xGA") %>% 
    arrange(Home) %>% 
    add_column(Season = "2019-2020")
FR2021H <- FR_2020_2021 %>% 
    select("Wk","Home","xGH","Away","xGA") %>% 
    arrange(Home) %>% 
    add_column(Season = "2020-2021")
FR2122H <- FR_2021_2022 %>% 
    select("Wk","Home","xGH","Away","xGA") %>% 
    arrange(Home) %>% 
    add_column(Season = "2021-2022")

#Combined table for three seasons
FR_Home_Combined <- bind_rows(FR1920H, FR2021H, FR2122H) %>%
    arrange(Home) %>%
    group_by(Home) %>% 
    mutate(count = row_number()) %>%   # running count of each team's home games
    mutate(total_xG = cumsum(xGH))     # running total of xG created at home
#Animated graph
FR_Home_animated<- FR_Home_Combined %>% 
    filter(Home == "Manchester City" | Home == "Liverpool" | Home == "Crystal Palace" | Home == "Burnley")

animation_data <- FR_Home_animated %>%
    ggplot()+
    geom_point(aes(x = count, y = total_xG, color = Home), size = 2) +
    geom_line(aes(x = count, y = total_xG, color = Home)) +
    geom_text_repel(aes(x = count, y = total_xG, color = Home, label = Home),
                hjust = 0, nudge_x = 2, direction = "y",
        segment.color = "grey", na.rm = TRUE) +
    scale_color_manual( values = c("burlywood4", "blue", "red", "cyan3" )) +
    scale_x_continuous(
        expand = expansion(add = c(0,15)),
        limits = c(0, 60),
        breaks = c(0,19,38,57)
    ) +
    labs(
        x = "Home games played",
        y = "xG playing at home",
        title = "Figure 3: Cumulative xG at home for the four selected teams"
    ) +
  coord_cartesian(clip = 'off') +
  theme(legend.position = 'none')

animated <- animation_data + transition_reveal(count)

animate(animated,
        end_pause = 20,
        duration = 20,
        width = 1100, height = 650, res = 150,
        renderer = magick_renderer())
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?

#AWAY VIZ
FR1920A <- FR_2019_2020 %>% 
    select("Wk","Home","xGH","Away","xGA") %>% 
    arrange(Away)%>% 
    add_column(Season = "2019-2020")
FR2122A <- FR_2021_2022 %>% 
    select("Wk","Home","xGH","Away","xGA") %>% 
    arrange(Away) %>%
    add_column(Season = "2021-2022")
FR2021A <- FR_2020_2021 %>% 
    select("Wk","Home","xGH","Away","xGA") %>% 
    arrange(Away)%>% 
    add_column(Season = "2020-2021")

#Combined table for three seasons
FR_Away_Combined <- bind_rows(FR1920A, FR2021A, FR2122A) %>%
    arrange(Away) %>%
    group_by(Away) %>% 
    mutate(count = row_number()) %>%   # running count of each team's away games
    mutate(total_xG = cumsum(xGA))     # xGA is the away side's xG in each fixture


#Animated graph
FR_Away_animated<- FR_Away_Combined %>% 
    filter(Away == "Manchester City" | Away == "Liverpool" | Away == "Crystal Palace" | Away == "Burnley")

animation_data <- FR_Away_animated %>%
    ggplot()+
    geom_point(aes(x = count, y = total_xG, color = Away), size = 2) +
    geom_line(aes(x = count, y = total_xG, color = Away)) +
    geom_text_repel(aes(x = count, y = total_xG, color = Away, label = Away),
                hjust = 0, nudge_x = 2, direction = "y",
        segment.color = "grey", na.rm = TRUE) +
    scale_color_manual( values = c("burlywood4", "blue", "red", "cyan3" )) +
    scale_x_continuous(
        expand = expansion(add = c(0,15)),
        limits = c(0, 60),
        breaks = c(0,19,38,57)
    ) +
    labs(
        x = "Away games played",
        y = "xG playing away",
        title = "Figure 4: Cumulative xG away for the four selected teams"
    ) +
  coord_cartesian(clip = 'off') +
  theme(legend.position = 'none')

animated <- animation_data + transition_reveal(count)

animate(animated,
        end_pause = 20,
        duration = 20,
        width = 1100, height = 650, res = 150,
        renderer = magick_renderer())
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?

Now that we have analyzed some metrics that represent a squad’s attacking performance, what are some metrics that can be used to represent a squad’s defensive performance?

The defensive performance of a team is just as important as its offensive showing. Teams that concede many goals have weaker defenses, while teams that concede few have stronger ones (Figure 5). Manchester City and Liverpool conceded the fewest goals across the three seasons analyzed, which aligns with their overall performances. Burnley and Crystal Palace, on the other hand, conceded more goals in each season, which makes it harder for them to move up the standings.

#GOALS ALLOWED CHART
SG2122 <- X2021_2022_Squad_Goalkeeping %>% 
    select("Squad", "GA", "W", "L") %>% 
    add_column(Season = "2021-2022")
SG2021 <- X2020_2021_Squad_Goalkeeping %>% 
    select("Squad", "GA", "W", "L") %>% 
    add_column(Season = "2020-2021")
SG1920 <- X2019_2020_Squad_Goalkeeping %>% 
    select("Squad", "GA", "W", "L") %>% 
    add_column(Season = "2019-2020")

# Stack the three seasons into one table (all three share the same columns)
SG_Combined <- bind_rows(SG2122, SG2021, SG1920)
SG_Combined %>%
    # Flag the four squads discussed in the text so they stand out in the plot
    mutate(teams = Squad %in% c('Manchester City', 'Liverpool', 'Crystal Palace', 'Burnley')) %>% 
    ggplot() +
    geom_col(aes(x = Squad, y = GA, fill = teams)) +
    coord_flip() +
    facet_wrap(vars(Season), nrow = 1, scales = "free_y") +
    scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
    labs(
        x = "Squad name",
        y = "Goals allowed",
        title = "Figure 5: EPL squads' goals allowed per season, 2019-2020 to 2021-2022"
    ) +
    theme_replace() +
    theme(legend.position = 'none')

Diving deeper into the defensive performance of teams, Figure 6 and Figure 7 show the cumulative expected goals allowed in home and away games, respectively, for our selected teams. This metric is the sum of the expected goals of the opponents each squad faced. For the four teams analyzed, expected goals allowed are negatively correlated with expected goals over the three seasons: the teams that create the most also concede the least. As a result, Manchester City has the fewest expected goals allowed in home games, followed by Liverpool, Crystal Palace, and Burnley, in that order. The same order holds for away games, except that Burnley and Crystal Palace swap places.
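One simple way to put a number on such a relationship is the Pearson correlation between each squad's total xG and total xG allowed. The sketch below uses base R's cor() on made-up totals for four hypothetical squads. It only illustrates the calculation and does not reproduce the report's data.

```r
# Hypothetical three-season totals for four made-up squads.
totals <- data.frame(
  Squad      = c("Team A", "Team B", "Team C", "Team D"),
  xG_for     = c(130, 120, 80, 70),
  xG_allowed = c(50, 60, 90, 100)
)

# Pearson correlation: values near -1 mean squads that create more also allow less.
r <- cor(totals$xG_for, totals$xG_allowed)
print(r)
```

With only four squads, a coefficient like this describes the selected teams and should not be read as a league-wide estimate.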

#DEFENSIVE
#xGA Home
# In the fixtures tables, xGH is the home side's xG and xGA is the away side's xG,
# so a home team's expected goals allowed is the xGA column.

FR_OppHome_Combined <- bind_rows(FR1920H, FR2021H, FR2122H) %>%
    arrange(Home) %>%
    group_by(Home) %>% 
    mutate(count = row_number()) %>% 
    mutate(total_xGA = cumsum(xGA))
FR_OppHome_animated<- FR_OppHome_Combined %>% 
    filter(Home == "Manchester City" | Home == "Liverpool" | Home == "Crystal Palace" | Home == "Burnley")

animation_data <- FR_OppHome_animated %>%
    ggplot()+
    geom_point(aes(x = count, y = total_xGA, color = Home), size = 2) +
    geom_line(aes(x = count, y = total_xGA, color = Home)) +
    geom_text_repel(aes(x = count, y = total_xGA, color = Home, label = Home),
                hjust = 0, nudge_x = 2, direction = "y",
        segment.color = "grey", na.rm = TRUE) +
    scale_color_manual( values = c("burlywood4", "blue", "red", "cyan3" )) +
    scale_x_continuous(
        expand = expansion(add = c(0,15)),
        limits = c(0, 60),
        breaks = c(0,19,38,57)
    ) +
    labs(
        x = "Home games played",
        y = "xGA at home",
        title = "Figure 6: Cumulative xG allowed at home for the four selected teams"
    ) +
  coord_cartesian(clip = 'off') +
  theme(legend.position = 'none')

animated <- animation_data + transition_reveal(count)

animate(animated,
        end_pause = 20,
        duration = 20,
        width = 1100, height = 650, res = 150,
        renderer = magick_renderer())
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?

#xGA Away
# For an away team, expected goals allowed is the home side's xG (xGH).

FR_OppAway_Combined <- bind_rows(FR1920A, FR2021A, FR2122A) %>%
    arrange(Away) %>%
    group_by(Away) %>% 
    mutate(count = row_number()) %>% 
    mutate(total_xGA = cumsum(xGH))

FR_OppAway_animated<- FR_OppAway_Combined %>% 
    filter(Away == "Manchester City" | Away == "Liverpool" | Away == "Crystal Palace" | Away == "Burnley")

animation_data <- FR_OppAway_animated %>%
    ggplot()+
    geom_point(aes(x = count, y = total_xGA, color = Away), size = 2) +
    geom_line(aes(x = count, y = total_xGA, color = Away)) +
    geom_text_repel(aes(x = count, y = total_xGA, color = Away, label = Away),
                hjust = 0, nudge_x = 2, direction = "y",
        segment.color = "grey", na.rm = TRUE) +
    scale_color_manual( values = c("burlywood4", "blue", "red", "cyan3" )) +
    scale_x_continuous(
        expand = expansion(add = c(0,15)),
        limits = c(0, 60),
        breaks = c(0,19,38,57)
    ) +
    labs(
        x = "Away games played",
        y = "xGA playing away",
        title = "Figure 7: Cumulative xG allowed away for the four selected teams"
    ) +
  coord_cartesian(clip = 'off') +
  theme(legend.position = 'none')

animated <- animation_data + transition_reveal(count)

animate(animated,
        end_pause = 20,
        duration = 20,
        width = 1100, height = 650, res = 150,
        renderer = magick_renderer())
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?
#> `geom_line()`: Each group consists of only one observation.
#> ℹ Do you need to adjust the group aesthetic?

Now that we have assessed teams’ attacking and defensive performances, we can see how that correlates to their overall standings. Below are images of the standings from the 2019-2020, 2020-2021, and 2021-2022 seasons, respectively.

5) Conclusion

In conclusion, attacking and defensive performances correlate with overall squad performance, as reflected in the league standings at the end of each season. Manchester City and Liverpool scored the most goals, had the most expected goals, and conceded the fewest goals; as a result, they consistently finished at the top of the standings, competing for the title. The overall performance of Burnley and Crystal Palace is likewise mirrored in the standings: these two teams were mediocre at scoring goals and conceded a significant number. The final takeaway of this report is that a good combination of attacking and defensive performance leads to good overall team performance, which in turn leads teams to finish higher in the table.

6) Appendix

Data Dictionary

data_dictionary_path <- here("data_processed", 'ReportDataDictionary.xlsx')
data_dictionary <- read_excel(data_dictionary_path)
kable(data_dictionary) %>% 
  kable_styling(latex_options = "striped")
Variable   Description
MP         Matches Played by the player or squad
W          Wins
D          Draws
L          Losses
GF         Goals For
GA         Goals Against
GD         Goal Difference
Pts        Points
Pts/MP     Points per Match Played
xG         Expected Goals
xGA        Expected Goals Allowed
xGD        Expected Goals Difference
xGD/90     Expected Goals Difference per 90 Minutes
# Load libraries and settings here
library(tidyverse)
library(here)
library(readr)
library(tibble)
library(gganimate)
library(magick)
library(hrbrthemes)
library(ggrepel)
library(upstartr)
library(readxl)
library(kableExtra)
knitr::opts_chunk$set(
    comment = "#>",
    fig.path = "figs/", # Folder where rendered plots are saved
    fig.width = 7.252, # Default plot width
    fig.height = 4, # Default plot height
    fig.retina = 3 # For better plot resolution
)

# Put any other "global" settings here, e.g. a ggplot theme:
theme_set(theme_bw(base_size = 20))

# Regular Season

X2021_2022_Regular_Season <- read_csv("data_processed/2021-2022_Regular_Season.csv")

X2020_2021_Regular_Season <- read_csv("data_processed/2020-2021_Regular_Season.csv")

X2019_2020_Regular_Season <- read_csv("data_processed/2019-2020_Regular_Season.csv")


# Standard Stats

X2019_2020_Squad_Standard_Stats <- read_csv("data_processed/2019-2020_Squad_Standard_Stats.csv")

X2020_2021_Squad_Standard_Stats <- read_csv("data_processed/2020-2021_Squad_Standard_Stats.csv")

X2021_2022_Squad_Standard_Stats <- read_csv("data_processed/2021-2022_Squad_Standard_Stats.csv")


# Squad Goalkeeping

X2021_2022_Squad_Goalkeeping <- read_csv("data_processed/2021-2022_Squad_Goalkeeping.csv")

X2020_2021_Squad_Goalkeeping <- read_csv("data_processed/2020-2021_Squad_Goalkeeping.csv")

X2019_2020_Squad_Goalkeeping <- read_csv("data_processed/2019-2020_Squad_Goalkeeping.csv")


# Fixture Results

FR_2021_2022 <- read_csv("data_processed/Fixtures_Results_2021-2022.csv")

FR_2020_2021 <- read_csv("data_processed/Fixtures_Results_2020-2021.csv")

FR_2019_2020 <- read_csv("data_processed/Fixtures_Results_2019-2020.csv")
