Evolving Trends in Cyber Events: Insights from a Decade of Data

Author

Annie Goodman & Etiene Effiong

Published

December 8, 2024

Primary Research Question

How have the tactics, targets, and motives of cyber actors evolved over the last decade?

Introduction

The number of cyber events has surged in recent years, with a reported 38% increase in 2022 (Checkpoint Blog, 2023). As digital infrastructure becomes essential to daily life, military operations, healthcare, and education, cybercriminals have adapted, employing increasingly sophisticated tactics. Cyber events encompass a wide range of activities, including social media manipulation and fake news generation, prompting an important question: beyond frequency, what else has changed in this landscape?

Despite advancements in cybersecurity, accurately attributing these events to their perpetrators remains a significant challenge. Attackers frequently utilize evasion techniques, such as VPNs and proxy servers, to obscure their identity, location, and motives. False flag operations complicate this issue, as misattributions can have severe consequences. Nation-states have increasingly participated in this arena, posing serious threats to global stability. The disruption caused by an accidental software glitch earlier this year illustrates the risks involved—if such incidents can lead to widespread damage, the potential impact of a malicious actor could be catastrophic.

This research explores the evolution of cyber events over the past decade, focusing on key questions such as whether we have improved in attributing these events, how nation-states are leveraging cyber attacks to gain advantages, and which nations are the most frequent perpetrators. We will also investigate whether industries have been targeted differently based on perpetrator type and event frequency, along with examining relationships between actor countries and their targets. Lastly, we will conduct text analysis of event descriptions to identify patterns in motives and the descriptive language used. These insights aim to inform future defense strategies, helping businesses and governments better manage the growing threat of cyber events.

Data

The dataset used for this analysis is the Cyber Events Database maintained by the Center for International & Security Studies at Maryland (CISSM), University of Maryland School of Public Policy.

Reference

Harry, C., & Gallagher, N. (2018). Classifying Cyber Events. Journal of Information Warfare, 17(3), 17-31.

Description

This dataset includes cyber events recorded from January 2014 to August 2024, detailing various aspects of each incident, such as the type of threat actor, their motives, targeted industries, and the outcomes of the events. It comprises 15 columns and 13,841 rows, pre-processed by the CISSM research team to ensure accuracy and completeness.

The data is collected through a Python script designed to scrape information from reputable sources on both the open and dark web. This script automatically extracts key details, including publication date, title, and URL, which are subsequently reviewed and coded by the CISSM team. Each event is classified using a standard taxonomy, with attributions drawn from the original source material rather than the team’s interpretation.

While the dataset provides valuable insights into cyber event patterns and trends, it does come with limitations. The scraping process is restricted to specific URLs, which may result in the omission of some events. Furthermore, since the sources are human-generated, they can be subject to biases, inaccuracies, or misleading information. Despite these concerns, we believe that the dataset and its underlying framework will yield high-quality insights into the evolving landscape of cyber events.

Code

# Cleaning data

df <- df |> 
    mutate(event_subtype = str_replace_all(event_subtype, "\\bServer\\b", "Servers"),
           event_subtype = str_replace_all(event_subtype, "\\bService\\b", "Services"),
           event_subtype = str_replace_all(event_subtype, "\\bSensor\\b", "Sensors"),
           event_subtype = str_replace_all(event_subtype, "\\bHost\\b", "Hosts"),
           event_subtype = str_replace_all(event_subtype, "\\bUndetermined\\b", "Unknown"),
           event_subtype = str_replace_all(event_subtype, "\\bUser\\b", "Users"),
           actor_type = str_replace_all(actor_type, "\\bHacktvist\\b", "Hacktivist"),
           industry_code = case_when(
               industry_code == 92 & industry == "Professional, Scientific, and Technical Services" ~ 54,
               industry_code == 99 & industry == "Public Administration" ~ 92,
               industry == "Medusa" ~ industry_code,  # Preserve the industry code here
               TRUE ~ industry_code),
           industry = case_when(
               industry == "Medusa" ~ "Undetermined",
               TRUE ~ industry)) |> 
    drop_na(event_subtype) |> 
    separate_rows(event_subtype, sep = ",|;") |>
    mutate(event_subtype = str_trim(event_subtype), 
           year = as.integer(year))
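
One side effect of the cleaning step is worth keeping in mind: `separate_rows()` turns an event that lists several subtypes into several rows, so later row counts are counts of event-subtype pairs rather than unique events. The base-R sketch below reproduces that behaviour on three made-up events. The `event_id` column and the toy values are assumptions for illustration and are not part of the CISSM data.

```r
# Toy data: three hypothetical events, two of which list two subtypes.
toy <- data.frame(
  event_id = 1:3,  # hypothetical identifier, not a column in the dataset
  event_subtype = c("Data Attack",
                    "Data Attack; Message Manipulation",
                    "External Denial of Service, Data Attack"),
  stringsAsFactors = FALSE
)

# Split on commas or semicolons, mirroring sep = ",|;" in the cleaning code.
parts <- strsplit(toy$event_subtype, ",|;")

# One row per (event, subtype) pair, with surrounding whitespace trimmed.
long <- data.frame(
  event_id = rep(toy$event_id, lengths(parts)),
  event_subtype = trimws(unlist(parts)),
  stringsAsFactors = FALSE
)

n_rows <- nrow(long)                       # rows after splitting
n_events <- length(unique(long$event_id))  # distinct events behind those rows
print(long)
```

Comparing `n_rows` with `n_events` on the real data (using whatever event identifier is available) would show how much the split inflates the row count.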

Data Snippet

Code

head(df)

# A tibble: 6 × 15
  event_date  year month actor    actor_type organization industry_code industry
  <chr>      <int> <chr> <chr>    <chr>      <chr>                <dbl> <chr>   
1 2014-01-01  2014 01    Undeter… Criminal   Barry Unive…            61 Educati…
2 2014-01-01  2014 01    Undeter… Criminal   Record Assi…            54 Profess…
3 2014-01-01  2014 01    Syrian … Hacktivist Skype's Soc…            54 Profess…
4 2014-01-02  2014 01    Undeter… Criminal   Snapchat                51 Informa…
5 2014-01-03  2014 01    DERP Tr… Undetermi… Battle.net              51 Informa…
6 2014-01-03  2014 01    DERP Tr… Undetermi… Club Penguin            51 Informa…
# ℹ 7 more variables: motive <chr>, event_type <chr>, event_subtype <chr>,
#   description <chr>, source_url <chr>, country <chr>, actor_country <chr>

Summary Statistics

Code

# Events per Year
events_per_year <- df |>
  filter(year < 2024) |> 
  rename(Year = year) |> 
  group_by(Year) |>
  summarise(Events = n()) |>
  arrange(Year)

events_per_year_table <- events_per_year |>
  knitr::kable("html", caption = "Number of Cyber Events per Year") |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::row_spec(0, background = "#931e18", color = "white") |>
  kableExtra::column_spec(1, bold = TRUE, color = "#2A3C4E") |>
  kableExtra::column_spec(2, bold = TRUE, color = "#2A3C4E")

# Most Frequent Target Countries (Top 5)
most_frequent_target_countries <- df |>
  group_by(country) |>
  summarise(Count = n()) |>
  arrange(desc(Count)) |>
  head(5) |>
  rename(Country = country)

# Render Most Frequent Target Countries Table
target_countries_table <- most_frequent_target_countries |>
  knitr::kable("html", caption = "Most Frequent Target Countries") |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::row_spec(0, background = "#4A6073", color = "white") |>
  kableExtra::column_spec(1, bold = TRUE, color = "#2A3C4E")

# Most Frequent Actor Countries (Top 5)
most_frequent_actor_countries <- df |>
  filter(actor_country != "Undetermined") |>
  group_by(actor_country) |>
  summarise(Count = n()) |>
  arrange(desc(Count)) |>
  head(5) |>
  rename(Actor_Country = actor_country)

# Render Most Frequent Actor Countries Table
actor_countries_table <- most_frequent_actor_countries |>
  knitr::kable("html", caption = "Most Frequent Actor Countries") |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::row_spec(0, background = "#da7901", color = "white") |>
  kableExtra::column_spec(1, bold = TRUE, color = "#2A3C4E")

# List of unique Actor Types and Event Types
actor_types_list <- unique(df$actor_type)
event_types_list <- unique(df$event_type)

# Create a table for Actor Types and Event Types
types_table <- data.frame(
  Statistic = c("Unique Actor Types", "Unique Event Types"),
  Value = c(paste(actor_types_list, collapse = ", "), 
            paste(event_types_list, collapse = ", ")),
  stringsAsFactors = FALSE
) |>
  knitr::kable("html", caption = "Actor and Event Types") |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::row_spec(0, background = "#247d3f", color = "white") |>
  kableExtra::column_spec(1, bold = TRUE, color = "#2A3C4E") 

# Print all tables
events_per_year_table

Number of Cyber Events per Year
Year	Events
2014	667
2015	941
2016	1209
2017	874
2018	946
2019	1225
2020	2221
2021	1784
2022	2618
2023	2446
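
The year-over-year change implied by this table can be worked out directly. The sketch below re-enters the counts from the table by hand, so it does not depend on `df`, and computes the percentage change for each year. The names `events` and `yoy_pct` are introduced here for illustration, and the counts are rows of `df` after the subtype split rather than unique events.

```r
# Counts copied from the "Number of Cyber Events per Year" table.
events <- c("2014" = 667, "2015" = 941, "2016" = 1209, "2017" = 874,
            "2018" = 946, "2019" = 1225, "2020" = 2221, "2021" = 1784,
            "2022" = 2618, "2023" = 2446)

n <- length(events)

# Percentage change relative to the previous year.
yoy_pct <- 100 * (events[-1] - events[-n]) / events[-n]
names(yoy_pct) <- names(events)[-1]

print(round(yoy_pct, 1))
```

On these counts the largest jump is from 2019 to 2020 (about 81%), and 2021 to 2022 rises by about 47%.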

Code

actor_countries_table

Most Frequent Actor Countries
Actor_Country	Count
Russian Federation	2043
China	213
Korea (the Democratic People's Republic of)	172
United States of America	160
Ukraine	136

Code

target_countries_table

Most Frequent Target Countries
Country	Count
United States of America	7410
United Kingdom of Great Britain and Northern Ireland	844
Italy	455
Canada	417
Ukraine	417

Code

types_table

Actor and Event Types
Statistic	Value
Unique Actor Types	Criminal, Hacktivist, Undetermined, Hobbyist, Nation-State, Terrorist
Unique Event Types	Exploitive, Disruptive, Mixed, Undetermined

Data exploration and visualization

With our data now cleaned and ready, we dived straight into the analysis to uncover the trends our research aims to highlight. This thorough exploration enabled us to identify patterns in cyber incidents, including variations in actor types and their motivations. By scrutinizing these trends, we can gain insights into how technological advancements shape the dynamics of cyber events.

Event Subtypes Over Time

Code

density_plot_df <- df |> 
    filter(year <= 2023)  # Filter data to include only up to 2023

density_plot_df |>  
    group_by(year, event_subtype) |> 
    summarize(count = n(), .groups = "drop") |>  
    group_by(event_subtype) |> 
    summarize(total_count = sum(count), .groups = "drop") |> 
    top_n(5, total_count) |> 
    inner_join(density_plot_df, by = "event_subtype") |>  # Use the filtered dataset here
    group_by(year, event_subtype) |> 
    summarize(count = n(), .groups = "drop") |>
    ggplot(aes(x = year, y = event_subtype, height = count, fill = event_subtype)) +
    geom_density_ridges(scale = 0.9, alpha = 0.9, stat = "identity") +
    scale_fill_manual(values = custom_palette) + 
    labs(title = "Density Ridge Plot of Top 5 Event Subtypes Over Time",
         subtitle = "(2014-2023)",
         x = "Year",
         y = "Event Subtype") +
    theme_minimal() + 
    theme(legend.position = "none")

The density ridge plot indicates that while some sub-types, such as Message Manipulation and External Denial of Service, have remained stable over time, others, like Exploitation of Application Server and Data Attack, have shown notable fluctuations.

Specifically, Exploitation of Application Server exhibited distinct peaks, with a significant increase in frequency starting around 2018, reaching its highest point in 2021 before experiencing a slight decline. In contrast, Data Attack displayed a gradual rise beginning in 2018, reaching a substantial peak around 2021-2022, followed by a subsequent decrease.

Proportion of Cyber Attacks by Nation-State Actors Across Industries

Code

# Define critical infrastructure codes
critical_infrastructure_codes <- c(51, 48, 11, 22, 62, 52, 92, 31, 33, 21)

# Function to analyze actor proportions
analyze_actor_proportions <- function(df) {
  # Add critical infrastructure flag
  df <- df %>%
    mutate(is_critical = industry_code %in% critical_infrastructure_codes)
  
  # Calculate proportions for all industries
  actor_proportions <- df %>%
    group_by(industry_code, industry) %>%
    summarise(
      total_events = n(),
      nation_state_events = sum(actor_type == "Nation-State"),
      proportion_nation_state = nation_state_events / total_events,
      .groups = 'drop'
    ) %>%
    arrange(desc(proportion_nation_state))
  
  # Calculate proportions specifically for critical infrastructure
  critical_proportions <- df %>%
    group_by(is_critical) %>%
    summarise(
      total_events = n(),
      nation_state_events = sum(actor_type == "Nation-State"),
      proportion_nation_state = nation_state_events / total_events,
      .groups = 'drop'
    )
  
  return(list(
    actor_proportions = actor_proportions,
    critical_proportions = critical_proportions
  ))
}

# Create visualizations
plot_actor_proportions <- function(actor_proportions) {
  # Industry-specific proportions plot
  p1 <- ggplot(actor_proportions, 
               aes(x = reorder(as.factor(industry_code), -proportion_nation_state),
                   y = proportion_nation_state,
                   text = paste("Industry:", industry,
                                "<br>Proportion:", round(proportion_nation_state * 100, 1), "%",
                                "<br>Nation-State Events:", nation_state_events,
                                "<br>Total Events:", total_events))) +
    geom_bar(stat = "identity", fill = "#2A3C4E", width = .75) + 
    labs(
      title = "Proportion of Nation-State Actors by Industry",
      x = "Industry Code",
      y = "Proportion"
    ) +
    theme(
      plot.title = element_text(size = 12),  # Reduced title size
      axis.title = element_text(size = 10),  # Reduced axis titles size
      axis.text.x = element_text(angle = 45, hjust = 1, size = 8),  # Reduced x-axis text
      axis.text.y = element_text(size = 8),  # Reduced y-axis text
      aspect.ratio = 1/2
    )
  
  # Convert ggplot to plotly for interactivity
  p1_interactive <- ggplotly(p1, tooltip = "text")
  
  return(p1_interactive)  # Return the Plotly version of the plot
}


# Example usage:
results <- analyze_actor_proportions(df)
 
# Display tables
kable(results$actor_proportions,
caption = "Proportion of Nation-State Events by Industry") %>%
kable_styling()

Proportion of Nation-State Events by Industry
industry_code	industry	total_events	nation_state_events	proportion_nation_state
21	Mining, Quarrying, and Oil and Gas Extraction	77	15	0.1948052
99	Undetermined	236	44	0.1864407
49	Transportation and Warehousing	6	1	0.1666667
22	Utilities	304	45	0.1480263
92	Public Administration	2896	303	0.1046271
81	Other Services (except Public Administration)	1042	108	0.1036468
51	Information	1515	128	0.0844884
55	Management of Companies and Enterprises	25	2	0.0800000
48	Transportation and Warehousing	477	34	0.0712788
31	Manufacturing	615	32	0.0520325
52	Finance and Insurance	1425	72	0.0505263
54	Professional, Scientific, and Technical Services	1290	54	0.0418605
11	Agriculture, Forestry, Fishing and Hunting	24	1	0.0416667
71	Arts, Entertainment, and Recreation	448	14	0.0312500
44	Retail Trade	499	14	0.0280561
56	Administrative and Support and Waste Management and Remediation Services	191	5	0.0261780
61	Educational Services	1485	30	0.0202020
33	Manufacturing	73	1	0.0136986
53	Real Estate and Rental and Leasing	97	1	0.0103093
62	Health Care and Social Assistance	2081	18	0.0086497
42	Wholesale Trade	121	1	0.0082645
72	Accommodation and Food Services	323	2	0.0061920
23	Construction	51	0	0.0000000
32	Manufacturing	14	0	0.0000000
45	Retail Trade	11	0	0.0000000

Code

kable(results$critical_proportions,
caption = "Nation-State Events in Critical vs Non-Critical Infrastructure") %>%
kable_styling()

Nation-State Events in Critical vs Non-Critical Infrastructure
is_critical	total_events	nation_state_events	proportion_nation_state
FALSE	5839	276	0.0472684
TRUE	9487	649	0.0684094

Code

# Create interactive plot
plot_actor_proportions(results$actor_proportions)

Our analysis reveals a clear pattern in nation-state operations: critical infrastructure, the backbone of modern society, is their most frequent target. Of the 925 nation-state incidents in the data, 649 (70.2%) were aimed at critical infrastructure and 276 (29.8%) at non-critical sectors.

Part of this skew reflects volume, since critical sectors account for 61.9% of all recorded events (9,487 of 15,326). The per-group rates in the table above give a fairer comparison: nation-states are behind 6.8% of critical-infrastructure events versus 4.7% of non-critical ones. This pattern is consistent with a deliberate targeting strategy in which nation-states concentrate on sectors where disruption would have the greatest strategic impact.
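
The figures in this section can be read two ways: as the share of nation-state events that hit critical infrastructure, or as the rate of nation-state involvement within each group. The base-R sketch below computes both from the counts in the critical vs non-critical table and, as one possible check, runs a two-sample test of proportions with `prop.test()`. The variable names are introduced here for illustration, and the test is an added suggestion, not part of the original analysis.

```r
# Counts copied from the critical vs non-critical infrastructure table.
ns_events  <- c(critical = 649, non_critical = 276)    # nation-state events
all_events <- c(critical = 9487, non_critical = 5839)  # all events

# Share of nation-state events falling in each group.
share_of_ns <- ns_events / sum(ns_events)

# Baseline: share of all events falling in each group.
share_of_all <- all_events / sum(all_events)

# Rate of nation-state involvement within each group.
rate_ns <- ns_events / all_events

print(round(100 * share_of_ns, 1))
print(round(100 * share_of_all, 1))
print(round(100 * rate_ns, 1))

# Two-sample test of equal proportions (critical vs non-critical).
test <- prop.test(x = ns_events, n = all_events)
print(test)
```

Reading the share alongside the baseline separates how much of the 70.2% figure comes from the sheer volume of critical-sector events and how much from a higher nation-state rate within those sectors.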

Annual Event Count Across Actor Types

Cyber threat trends from 2014 to 2023 show a fundamental change in attack patterns. The chart shows that criminal actors have become the dominant source of events, with incidents increasing from 348 to 2,081 over the nine-year period. This shift coincides with a decline in amateur attacks and a rise in more sophisticated operations. Nation-state activity increased from 6 to 106 incidents, while hacktivist operations varied widely, peaking at 658 events in 2022. The overall increase in cyber events from 633 (2014) to 2,443 (2023) reflects the intensification of cyber threats and suggests a trend toward more organized, well-resourced threat actors. These totals are computed from the actor-country dataset (actor_clean_df), so they differ slightly from the per-year table in the summary statistics, which is computed from df.
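
To put the 2014 and 2023 figures quoted above on a common scale, the sketch below computes the growth multiple and the implied average annual growth rate over the nine annual steps. Only the six counts come from the text; the data frame layout and column names are assumptions made for illustration.

```r
# 2014 and 2023 counts as quoted in the paragraph above.
counts <- data.frame(
  series = c("Criminal", "Nation-State", "All actors"),
  y2014  = c(348, 6, 633),
  y2023  = c(2081, 106, 2443),
  stringsAsFactors = FALSE
)

# How many times larger the 2023 count is than the 2014 count.
counts$multiple <- counts$y2023 / counts$y2014

# Implied compound annual growth rate over 9 year-to-year steps, in percent.
counts$cagr_pct <- 100 * (counts$multiple^(1 / 9) - 1)

print(counts)
```

Criminal events grow roughly sixfold and the overall total a little under fourfold, while nation-state events grow nearly eighteenfold from a very small base, so that multiple should be read with caution.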

Code

actor_event_count <- actor_clean_df %>%
    filter(year < 2024) |> 
    group_by(year, actor_type) %>%
    summarise(event_count = n(), .groups = 'drop') %>%
    mutate(
        rank = rank(-event_count),  # Compute rank for sorting
        Value_lbl = paste0(' ', round(event_count))  # Create Value_lbl for labels
    )

actor_race_anim <- actor_event_count %>%
    mutate(year = as.integer(year)) %>%  # Convert year to integer
    ggplot(aes(x = rank, y = event_count, fill = actor_type)) +  # Use fill for color
    geom_bar(stat = "identity", position = "dodge", width = 3, alpha = 0.8) +
    geom_text(aes(y = event_count, label = Value_lbl), hjust = 0) +
    coord_flip(clip = 'off', expand = FALSE) +
    scale_y_continuous(labels = scales::comma) +
    scale_fill_manual(values = custom_palette) +
    scale_x_reverse() +
    theme_minimal() +
    theme(
        axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        plot.title = element_text(
            size = 10, hjust = 0.5, face = 'bold',
            colour = 'grey', vjust = -1),  # Reduced title font size
        plot.subtitle = element_text(
            size = 10, hjust = 0.5, 
            face = 'italic', color = 'grey'),
        plot.caption = element_text(
            size = 8, hjust = 0.5,
            face = 'italic', color = 'grey'),
        plot.margin = margin(0.5, 2, 0.5, 3, 'cm'),
        # Only vertical gridlines (major x-axis gridlines)
        panel.grid.major.x = element_line(color = "grey80"),
        panel.grid.major.y = element_blank(), 
        # Adjust the legend size and position
        legend.position = c(0.9, 0.3),
        legend.background = element_rect(fill = 'white'),
        legend.title = element_text(size = 10),  # Reduce legend title size
        legend.text = element_text(size = 8),  # Reduce legend label size
        legend.key.size = unit(0.5, "cm")  # Reduce size of the legend keys (color boxes)
    ) +
    transition_time(year) +  # Animate over year, treated as a continuous time variable
    view_follow(fixed_x = TRUE) +
    labs(title = 'Year: {frame_time}',
         subtitle = 'Annual Cyber Events by Actor Type, 2014-2023',
         fill = 'Actor Type',
         caption = 'Number of Events')

# Animate with reduced size
animate(actor_race_anim, duration = 17, end_pause = 15,
        width = 600, height = 500, res = 150, 
        fps = 20,  # Set FPS to a factor of 100 (20 is a factor of 100)
        renderer = gifski_renderer("actor_race_anim.gif"))
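
In a bar-chart race, bar positions are usually ranked within each year so that every frame runs from rank 1 to the number of actor types. The code above computes `rank` over the whole summarised table; if per-year ordering is what is intended, a grouped rank is one way to get it. The base-R sketch below contrasts the two on made-up counts. All numbers are hypothetical, and the column names `rank_global` and `rank_in_year` are introduced only for this example.

```r
# Hypothetical counts for two years and three actor types.
toy <- data.frame(
  year        = rep(c(2014, 2015), each = 3),
  actor_type  = rep(c("Criminal", "Hacktivist", "Nation-State"), times = 2),
  event_count = c(300, 120, 10, 90, 150, 20),
  stringsAsFactors = FALSE
)

# Rank across all rows at once (what an ungrouped rank() does).
toy$rank_global <- rank(-toy$event_count, ties.method = "first")

# Rank separately within each year, largest count first.
toy$rank_in_year <- ave(
  -toy$event_count, toy$year,
  FUN = function(x) rank(x, ties.method = "first")
)

print(toy)
```

With dplyr, the equivalent would be a `group_by(year)` before the `mutate()` that computes the rank. Bar widths may also need adjusting if ranks are spaced one unit apart.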

Distribution of Motives by Actor Type

Code

# Preprocess motive data
motive_data <- df |> 
    filter(motive != "Undetermined") |> 
    separate_rows(motive, sep = ",|;") |> 
    mutate(motive = str_trim(motive)) |> 
    group_by(actor_type, motive) |> 
    summarize(count = n(), .groups = 'drop') |> 
    arrange(desc(count))

# Calculate proportions for pie chart
motive_data <- motive_data |> 
    group_by(actor_type) |> 
    mutate(proportion = count / sum(count)) |> 
    ungroup()

# Identify the largest slice for each actor type
largest_labels <- motive_data |> 
    group_by(actor_type) |> 
    filter(proportion == max(proportion)) |> 
    mutate(
        label = paste(motive, "\n", scales::percent(proportion, accuracy = 0.1))  # Create label text
    ) |> 
    ungroup()

# Create pie chart with labels matching slice colors
ggplot(motive_data, aes(x = 1, y = proportion, fill = motive)) + 
    geom_bar(stat = "identity", width = 1) +
    coord_polar(theta = "y") +
    facet_wrap(~ actor_type, scales = "free") +
    scale_fill_manual(values = custom_palette) +
    # Add labels at the center of each pie chart with a dynamic color box
    geom_label(
        data = largest_labels,
        aes(
            x = 0,  
            y = 0,  
            label = label,
            fill = motive 
        ),
        inherit.aes = FALSE,  
        size = 3.5,  
        color = "white", 
        label.padding = unit(0.2, "lines"),  # Padding for the label box
        fontface = "bold",
        show.legend = FALSE  # Don't include these in the legend
    ) +
    labs(
        title = "Distribution of Motives by Actor Type",
        x = NULL,
        y = NULL,
        fill = "Motive"
    ) +
    theme_minimal() +
    theme(
        axis.ticks = element_blank(),
        axis.text = element_blank(),  # This removes both x and y axis text
        axis.title = element_blank(),  # This removes both x and y axis titles
        panel.grid = element_blank(),
        strip.text = element_text(size = 14, face = "bold")  # Larger and bolder facet titles
    )

These charts show the proportions of attributable motives by actor type, which are mostly as expected given our existing knowledge of the topic. Criminals are motivated by financial gain. Hacktivists are generally protesting something. Nation-states and terrorists are committing acts of political espionage.

The two most interesting results, however, are the Hobbyist and Undetermined actors. Hobbyists had a substantial proportion of financial motivation but also a significant amount of protest, which led us to wonder how the dataset distinguishes a hobbyist from a hacktivist. We were unable to determine this from the information provided in the source data, and the boundaries between a criminal actor, a hacktivist, and a hobbyist seem hard to pinpoint. This is a potential avenue for future research.

We were hoping to find some attributable qualities in the Undetermined group, such as a strong linkage to a specific type, but its motives are fairly evenly distributed. Fittingly, however, that group has the largest proportion of sabotage motivations, which makes sense given that actors carrying out sabotage are usually trying not to be identified.
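
The proportions behind each pie are within-actor-type shares: each motive count is divided by the total for its actor type, and the largest share per type supplies the centre label. The base-R sketch below shows that calculation on made-up counts. The motive labels and numbers are hypothetical and do not reproduce the dataset's values.

```r
# Hypothetical motive counts for two actor types.
toy <- data.frame(
  actor_type = c("Criminal", "Criminal", "Hobbyist", "Hobbyist", "Hobbyist"),
  motive     = c("Financial", "Protest", "Financial", "Protest", "Sabotage"),
  count      = c(90, 10, 5, 3, 2),
  stringsAsFactors = FALSE
)

# Share of each motive within its actor type.
toy$proportion <- toy$count / ave(toy$count, toy$actor_type, FUN = sum)

# Keep the largest share per actor type (the slice that gets labelled).
is_top <- toy$proportion == ave(toy$proportion, toy$actor_type, FUN = max)
top <- toy[is_top, ]

print(toy)
print(top)
```

Because the shares are normalised per actor type, a type with very few attributed events can show a large slice that rests on a handful of rows, which is worth checking before comparing pies.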

Monthly Trend Analysis

Code

# Convert event_date to Date format
actor_clean_df$event_date <- as.Date(actor_clean_df$event_date)

# Filter the data to include events up to 2023
actor_clean_df_filtered <- actor_clean_df %>%
  filter(year <= 2023)

# Create a monthly summary
monthly_summary <- actor_clean_df_filtered %>%
  group_by(year, month) %>%
  summarise(event_count = n(), .groups = "drop")

ggplot(monthly_summary, aes(x = as.Date(paste(year, month, "01", sep = "-")), y = event_count)) +
  geom_line() +
  labs(
    x = "Year", 
    y = "Number of Events", 
    title = "Monthly Event Trend (Up to 2023)"
  ) +
  theme_minimal() +
  scale_x_date(
    date_breaks = "1 year", 
    date_labels = "%Y", 
    limits = as.Date(c("2014-01-01", "2023-12-31")), 
    expand = c(0, 0)
  ) +
  # Add reference line at the start of COVID-19 (December 2019)
  geom_vline(xintercept = as.Date("2019-12-01"), col = "#247d3f", linetype = 'dashed', size = .7) +
  # Add annotation with background
  annotate(
    'label', 
    x = as.Date("2019-12-01"), 
    y = 280, 
    label = 'COVID-19 started in late 2019', 
    color = "white", 
    fill = "#247d3f",  # Background color matching the line
    size = 3.2, 
    fontface = "bold",
    label.padding = unit(0.3, "lines")  # Add padding for better readability
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.major = element_line(color = "lightgray", size = .25),  
    panel.grid.minor = element_blank(),  
    panel.border = element_rect(color = "lightgray", fill = NA, size = 1)  
  )

The chart illustrates the monthly trend of cyber events from 2014 to 2023. Between 2014 and 2019, the number of incidents grew gradually, reflecting a steady increase in cyber activities.

However, starting in 2020, there is a marked rise in the frequency of cyber events, coinciding with the onset of the COVID-19 pandemic, which likely contributed to a surge in digital activity and, consequently, cyber incidents. This period is characterized by significant spikes, with the highest number of events occurring around 2022. The peaks during these years suggest increased vulnerability or heightened detection and reporting of cyber incidents.

In late 2023, there is a noticeable decline in the number of events, although this could be due to delayed reporting for more recent months. Overall, the chart indicates a shift from relatively stable trends pre-2020 to a more volatile and elevated rate of cyber events in the last few years, highlighting the growing prominence and complexity of cybersecurity challenges.
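
The size of the shift around 2020 can be roughly quantified from the annual counts in the summary statistics. The sketch below compares the average yearly count for 2014-2019 with that for 2020-2023. It uses the per-year table (rows of `df`) and not the monthly series plotted above, so it is an approximation, and the variable names are introduced here for illustration.

```r
# Counts copied from the "Number of Cyber Events per Year" table.
events <- c("2014" = 667, "2015" = 941, "2016" = 1209, "2017" = 874,
            "2018" = 946, "2019" = 1225, "2020" = 2221, "2021" = 1784,
            "2022" = 2618, "2023" = 2446)
years <- as.integer(names(events))

# Average annual count before and from 2020 onward.
pre_mean  <- mean(events[years <= 2019])
post_mean <- mean(events[years >= 2020])

# Ratio of the two averages, plus the implied per-month averages.
ratio <- post_mean / pre_mean
per_month <- c(pre = pre_mean, post = post_mean) / 12

print(c(pre_mean = pre_mean, post_mean = post_mean, ratio = ratio))
print(round(per_month, 1))
```

On these counts the 2020-2023 average is about 2.3 times the 2014-2019 average.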

Conclusion

Our analysis reveals significant changes in the frequency, nature, and impact of cyber events over the past decade. The increasing dominance of sophisticated actors, such as nation-states and organized criminal groups, underscores the shift from amateur attacks to highly strategic operations. Critical infrastructure remains a primary target, reflecting the strategic importance of these sectors for nation-state actors. The rise in data breaches and exploitation of application servers highlights the growing reliance on digital systems and the vulnerabilities they introduce.

Temporal trends, such as the surge during the COVID-19 pandemic, show how global crises can amplify cyber threats, while a late-2023 decline may signal delays in reporting or shifts in attacker strategies. Analysis of motives confirms known patterns—criminals are financially driven, hacktivists pursue ideological goals, and nation-states engage in espionage—while posing questions about underexplored categories like hobbyists and undetermined actors. These findings emphasize the need for adaptive, motive-specific defenses and robust real-time monitoring systems.

Our report provides a foundation for further research into the interplay of actors, motives, and industry-specific impacts. Future analysis will delve deeper into the relationships between attacker nations and their targets, refine motive categorizations, and explore the language patterns in event descriptions. These efforts aim to better equip policymakers and organizations to combat the evolving landscape of cyber threats.

Attribution

All members contributed equally.

Appendix

Data dictionary

--- title: "Evolving Trends in Cyber Events: Insights from a Decade of Data" author: Annie Goodman & Etiene Effiong date: December 8, 2024 format: html: toc: true toc-location: left toc_depth: 2 toc_float_position: top theme: lumen self-contained: true page-layout: full code-fold: true code-tools: true css: "styles.css" execute: warning: false message: false --- ```{r} #| label: setup #| include: false # Standard Functionality, Themes, & Data Cleaning library(here) library(tidyverse) library(janitor) library(datapasta) library(MetBrewer) # Tables library(knitr) library(kableExtra) library(DT) # Plotting library(ggridges) # Animation & Interactivity library(plotly) # Reactivate below if editing or remaking gif # library(gganimate) # library(magick) # library(gifski) custom_palette <- met.brewer("Lakota") # Put any other "global" settings here, e.g. a ggplot theme: theme_set(theme_bw(base_size = 20)) # Load the data # Cleaned main dataset df <- read_csv(here('data_processed', "clean_data.csv"), show_col_types = FALSE) df <- df %>% mutate(actor_type = str_replace_all(actor_type, "Hacktvist", "Hacktivist")) # Print dimensions of the df cat("Rows:", nrow(df), "Columns:", ncol(df), "\n") # Cleaned actor countries actor_clean_df <- read_csv(here("data_processed", "actor_countries_cleaned.csv"), show_col_types = FALSE) ``` # Primary Research Question How have the tactics, targets, and motives of cyber actors evolved over the last decade? # Introduction The number of cyber events has surged in recent years, with a reported 38% increase in 2022 [(Checkpoint Blog, 2023)](https://blog.checkpoint.com/2023/01/05/38-increase-in-2022-global-cyberattacks/). As digital infrastructure becomes essential to daily life, military operations, healthcare, and education, cybercriminals have adapted, employing increasingly sophisticated tactics. 
Cyber events encompass a wide range of activities, including social media manipulation and fake news generation, prompting an important question: beyond frequency, what else has changed in this landscape? Despite advancements in cybersecurity, accurately attributing these events to their perpetrators remains a significant challenge. Attackers frequently utilize evasion techniques, such as VPNs and proxy servers, to obscure their identity, location, and motives. False flag operations complicate this issue, as misattributions can have severe consequences. Nation-states have increasingly participated in this arena, posing serious threats to global stability. The disruption caused by an accidental software glitch earlier this year illustrates the risks involved—if such incidents can lead to widespread damage, the potential impact of a malicious actor could be catastrophic. This research explores the evolution of cyber events over the past decade, focusing on key questions such as whether we have improved in attributing these events, how nation-states are leveraging cyber attacks to gain advantages, and which nations are the most frequent perpetrators. We will also investigate whether industries have been targeted differently based on perpetrator type and event frequency, along with examining relationships between actor countries and their targets. Lastly, we will conduct text analysis of event descriptions to identify patterns in motives and the descriptive language used. These insights aim to inform future defense strategies, helping businesses and governments better manage the growing threat of cyber events. # Data The dataset used for this analysis is sourced from the [(University of Maryland School of Public Policy, CISSM-Center for International & Security Studies)](https://cissm.umd.edu/research-impact/publications/cyber-events-database-home) *Reference* Harry, C., & Gallagher, N. (2018). Classifying Cyber Events. *Journal of Information Warfare*, 17(3), 17-31. 
*Description* This dataset includes cyber events recorded from January 2014 to August 2024, detailing various aspects of each incident, such as the type of threat actor, their motives, targeted industries, and the outcomes of the events. It comprises 15 columns and 13,841 rows, pre-processed by the CISSM research team to ensure accuracy and completeness. The data is collected through a Python script designed to scrape information from reputable sources on both the open and dark web. This script automatically extracts key details, including publication date, title, and URL, which are subsequently reviewed and coded by the CISSM team. Each event is classified using a standard taxonomy, with attributions drawn from the original source material rather than the team’s interpretation. While the dataset provides valuable insights into cyber event patterns and trends, it does come with limitations. The scraping process is restricted to specific URLs, which may result in the omission of some events. Furthermore, since the sources are human-generated, they can be subject to biases, inaccuracies, or misleading information. Despite these concerns, we believe that the dataset and its underlying framework will yield high-quality insights into the evolving landscape of cyber events. 
```{r data_cleaning}
# Cleaning data
df <- df |>
  mutate(
    event_subtype = str_replace_all(event_subtype, "\\bServer\\b", "Servers"),
    event_subtype = str_replace_all(event_subtype, "\\bService\\b", "Services"),
    event_subtype = str_replace_all(event_subtype, "\\bSensor\\b", "Sensors"),
    event_subtype = str_replace_all(event_subtype, "\\bHost\\b", "Hosts"),
    event_subtype = str_replace_all(event_subtype, "\\bUndetermined\\b", "Unknown"),
    event_subtype = str_replace_all(event_subtype, "\\bUser\\b", "Users"),
    actor_type = str_replace_all(actor_type, "\\bHacktvist\\b", "Hacktivist"),
    industry_code = case_when(
      industry_code == 92 & industry == "Professional, Scientific, and Technical Services" ~ 54,
      industry_code == 99 & industry == "Public Administration" ~ 92,
      industry == "Medusa" ~ industry_code, # Preserve the industry code here
      TRUE ~ industry_code),
    industry = case_when(
      industry == "Medusa" ~ "Undetermined",
      TRUE ~ industry)) |>
  drop_na(event_subtype) |>
  separate_rows(event_subtype, sep = ",|;") |>
  mutate(event_subtype = str_trim(event_subtype),
         year = as.integer(year))
```

## Data Snippet

```{r data_snippet}
head(df)
```

## Summary Statistics

```{r summary_stats}
# Events per Year
events_per_year <- df |>
  filter(year < 2024) |>
  rename(Year = year) |>
  group_by(Year) |>
  summarise(Events = n()) |>
  arrange(Year)

events_per_year_table <- events_per_year |>
  knitr::kable("html", caption = "Number of Cyber Events per Year") |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::row_spec(0, background = "#931e18", color = "white") |>
  kableExtra::column_spec(1, bold = TRUE, color = "#2A3C4E") |>
  kableExtra::column_spec(2, bold = TRUE, color = "#2A3C4E")

# Most Frequent Target Countries (Top 5)
most_frequent_target_countries <- df |>
  group_by(country) |>
  summarise(Count = n()) |>
  arrange(desc(Count)) |>
  head(5) |>
  rename(Country = country)

# Render Most Frequent Target Countries Table
target_countries_table <- most_frequent_target_countries |>
  knitr::kable("html", caption = "Most Frequent Target Countries") |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::row_spec(0, background = "#4A6073", color = "white") |>
  kableExtra::column_spec(1, bold = TRUE, color = "#2A3C4E")

# Most Frequent Actor Countries (Top 5)
most_frequent_actor_countries <- df |>
  filter(actor_country != "Undetermined") |>
  group_by(actor_country) |>
  summarise(Count = n()) |>
  arrange(desc(Count)) |>
  head(5) |>
  rename(Actor_Country = actor_country)

# Render Most Frequent Actor Countries Table
actor_countries_table <- most_frequent_actor_countries |>
  knitr::kable("html", caption = "Most Frequent Actor Countries") |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::row_spec(0, background = "#da7901", color = "white") |>
  kableExtra::column_spec(1, bold = TRUE, color = "#2A3C4E")

# List of unique Actor Types and Event Types
actor_types_list <- unique(df$actor_type)
event_types_list <- unique(df$event_type)

# Create a table for Actor Types and Event Types
types_table <- data.frame(
  Statistic = c("Unique Actor Types", "Unique Event Types"),
  Value = c(paste(actor_types_list, collapse = ", "),
            paste(event_types_list, collapse = ", ")),
  stringsAsFactors = FALSE
) |>
  knitr::kable("html", caption = "Actor and Event Types") |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::row_spec(0, background = "#247d3f", color = "white") |>
  kableExtra::column_spec(1, bold = TRUE, color = "#2A3C4E")

# Print all tables
events_per_year_table
actor_countries_table
target_countries_table
types_table
```

# Data exploration and visualization

With our data now cleaned and ready, we dived straight into the analysis to uncover the trends our research aims to highlight. This thorough exploration enabled us to identify patterns in cyber incidents, including variations in actor types and their motivations.
By scrutinizing these trends, we can gain insights into how technological advancements shape the dynamics of cyber events.

## Event Subtypes Over Time

```{r density_plot_event_subtypes}
#| fig.align: "center"
density_plot_df <- df |>
  filter(year <= 2023) # Filter data to include only up to 2023

density_plot_df |>
  group_by(year, event_subtype) |>
  summarize(count = n(), .groups = "drop") |>
  group_by(event_subtype) |>
  summarize(total_count = sum(count), .groups = "drop") |>
  top_n(5, total_count) |>
  inner_join(density_plot_df, by = "event_subtype") |> # Use the filtered dataset here
  group_by(year, event_subtype) |>
  summarize(count = n(), .groups = "drop") |>
  ggplot(aes(x = year, y = event_subtype, height = count, fill = event_subtype)) +
  geom_density_ridges(scale = 0.9, alpha = 0.9, stat = "identity") +
  scale_fill_manual(values = custom_palette) +
  labs(title = "Density Ridge Plot of Top 5 Event Subtypes Over Time",
       subtitle = "(2014-2023)",
       x = "Year",
       y = "Event Subtype") +
  theme_minimal() +
  theme(legend.position = "none")
```

The density ridge plot indicates that while some subtypes, such as `Message Manipulation` and `External Denial of Service`, have remained stable over time, others, like `Exploitation of Application Server` and `Data Attack`, have shown notable fluctuations. Specifically, `Exploitation of Application Server` exhibited distinct peaks, with a significant increase in frequency starting around 2018, reaching its highest point in 2021 before experiencing a slight decline. In contrast, `Data Attack` displayed a gradual rise beginning in 2018, reaching a substantial peak around 2021-2022, followed by a subsequent decrease.
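The aggregation behind the ridge plot (rank subtypes by total count, keep the most frequent ones, then count them per year) can be sketched in base R as follows. This is an illustrative sketch on toy data, not the report's pipeline: the data frame `toy`, the subtype labels `A`, `B`, `C`, and the cutoff of 2 are assumptions chosen to keep the example small.

```r
# Toy data (hypothetical): one row per event-subtype observation
toy <- data.frame(
  year = c(2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021, 2021),
  event_subtype = c("A", "B", "A", "A", "C", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# Total count per subtype across all years, largest first
totals <- sort(table(toy$event_subtype), decreasing = TRUE)

# Keep the n most frequent subtypes (the report uses 5; 2 here for the toy data)
top_n_subtypes <- names(totals)[seq_len(2)]

# Year-by-subtype counts restricted to those subtypes
top_rows <- toy[toy$event_subtype %in% top_n_subtypes, ]
yearly_counts <- as.data.frame(
  table(year = top_rows$year, event_subtype = top_rows$event_subtype),
  responseName = "count",
  stringsAsFactors = FALSE
)

print(yearly_counts)
```

Unlike a `group_by()` plus `summarize()` pipeline, `table()` also returns year and subtype combinations with a count of zero, which can be convenient when a ridge should drop to the baseline in years with no events.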

## Proportion of Cyber Attacks by Nation-State Actors Across Industries

```{r interactive_bar_chart_industries}
#| fig.align: "center"
# Define critical infrastructure codes
critical_infrastructure_codes <- c(51, 48, 11, 22, 62, 52, 92, 31, 33, 21)

# Function to analyze actor proportions
analyze_actor_proportions <- function(df) {
  # Add critical infrastructure flag
  df <- df %>%
    mutate(is_critical = industry_code %in% critical_infrastructure_codes)

  # Calculate proportions for all industries
  actor_proportions <- df %>%
    group_by(industry_code, industry) %>%
    summarise(
      total_events = n(),
      nation_state_events = sum(actor_type == "Nation-State"),
      proportion_nation_state = nation_state_events / total_events,
      .groups = 'drop'
    ) %>%
    arrange(desc(proportion_nation_state))

  # Calculate proportions specifically for critical infrastructure
  critical_proportions <- df %>%
    group_by(is_critical) %>%
    summarise(
      total_events = n(),
      nation_state_events = sum(actor_type == "Nation-State"),
      proportion_nation_state = nation_state_events / total_events,
      .groups = 'drop'
    )

  return(list(
    actor_proportions = actor_proportions,
    critical_proportions = critical_proportions
  ))
}

# Create visualizations
plot_actor_proportions <- function(actor_proportions) {
  # Industry-specific proportions plot
  p1 <- ggplot(actor_proportions,
               aes(x = reorder(as.factor(industry_code), -proportion_nation_state),
                   y = proportion_nation_state,
                   text = paste("Industry:", industry,
                                "<br>Proportion:", round(proportion_nation_state * 100, 1), "%",
                                "<br>Nation-State Events:", nation_state_events,
                                "<br>Total Events:", total_events))) +
    geom_bar(stat = "identity", fill = "#2A3C4E", width = .75) +
    labs(
      title = "Proportion of Nation-State Actors by Industry",
      x = "Industry Code",
      y = "Proportion"
    ) +
    theme(
      plot.title = element_text(size = 12), # Reduced title size
      axis.title = element_text(size = 10), # Reduced axis titles size
      axis.text.x = element_text(angle = 45, hjust = 1, size = 8), # Reduced x-axis text
      axis.text.y =
        element_text(size = 8), # Reduced y-axis text
      aspect.ratio = 1/2
    )

  # Convert ggplot to plotly for interactivity
  p1_interactive <- ggplotly(p1, tooltip = "text")

  return(p1_interactive) # Return the Plotly version of the plot
}

# Example usage:
results <- analyze_actor_proportions(df)

# Display tables
kable(results$actor_proportions,
      caption = "Proportion of Nation-State Events by Industry") %>%
  kable_styling()

kable(results$critical_proportions,
      caption = "Nation-State Events in Critical vs Non-Critical Infrastructure") %>%
  kable_styling()

# Create interactive plot
plot_actor_proportions(results$actor_proportions)
```

Our analysis reveals a clear pattern in nation-state operations: critical infrastructure, the backbone of modern society, emerges as a preferred target. Of the 925 nation-state incidents, 649 (70.2%) were aimed at critical infrastructure and 276 (29.8%) at non-critical sectors. This pattern is consistent with a deliberate targeting strategy in which nation-states concentrate their efforts on sectors where infrastructure disruption would have the greatest strategic impact.

## Annual Event Count Across Actor Types

![](figs/actor_race_anim.gif)

Cyber threat trends from 2014 to 2023 demonstrate a fundamental transformation in attack patterns. The chart shows that criminal actors have become the primary source of threats, with incidents increasing from 348 to 2,081 over the nine-year period. This shift coincides with a decline in amateur attacks and a rise in sophisticated operations. Nation-state actors' activity increased from 6 to 106 incidents, while hacktivist operations showed significant variability, peaking at 658 events in 2022.
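
To put the quoted counts on a common scale, a growth multiple (last count divided by first count) can be computed for each actor type. The sketch below uses only the figures stated in the text and assumes that both pairs of numbers refer to 2014 and 2023; the data frame `counts` and its column names are hypothetical.

```r
# Event counts quoted in the text
# (assumed here to be the 2014 and 2023 values for each actor type)
counts <- data.frame(
  actor_type = c("Criminal", "Nation-State"),
  n_2014 = c(348, 6),
  n_2023 = c(2081, 106)
)

# Growth multiple: how many times larger the 2023 count is than the 2014 count
counts$growth_multiple <- round(counts$n_2023 / counts$n_2014, 1)

print(counts)
```

On these figures, criminal events grew roughly sixfold, while nation-state events grew nearly eighteenfold from a much smaller base, so the two groups differ in both absolute volume and relative growth.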
The overall increase in cyber events from 633 (2014) to 2,443 (2023) reflects the intensification of cyber threats and suggests a trend toward more organized, well-resourced threat actors.

```{r gif_code, eval=FALSE}
#| fig.align: "center"
actor_event_count <- actor_clean_df %>%
  filter(year < 2024) %>%
  group_by(year, actor_type) %>%
  summarise(event_count = n(), .groups = 'drop') %>%
  mutate(
    rank = rank(-event_count), # Compute rank for sorting
    Value_lbl = paste0(' ', round(event_count)) # Create Value_lbl for labels
  )

actor_race_anim <- actor_event_count %>%
  mutate(year = as.integer(year)) %>% # Convert year to integer
  ggplot(aes(x = rank, y = event_count, fill = actor_type)) + # Use fill for color
  geom_bar(stat = "identity", position = "dodge", width = 3, alpha = 0.8) +
  geom_text(aes(y = event_count, label = Value_lbl), hjust = 0) +
  coord_flip(clip = 'off', expand = FALSE) +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_manual(values = custom_palette) +
  scale_x_reverse() +
  theme_minimal() +
  theme(
    axis.line = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    plot.title = element_text(
      size = 10, hjust = 0.5, face = 'bold',
      colour = 'grey', vjust = -1), # Reduced title font size
    plot.subtitle = element_text(
      size = 10, hjust = 0.5, face = 'italic', color = 'grey'),
    plot.caption = element_text(
      size = 8, hjust = 0.5, face = 'italic', color = 'grey'),
    plot.margin = margin(0.5, 2, 0.5, 3, 'cm'),
    # Only vertical gridlines (major x-axis gridlines)
    panel.grid.major.x = element_line(color = "grey80"),
    panel.grid.major.y = element_blank(),
    # Adjust the legend size and position
    legend.position = c(0.9, 0.3),
    legend.background = element_rect(fill = 'white'),
    legend.title = element_text(size = 10), # Reduce legend title size
    legend.text = element_text(size = 8), # Reduce legend label size
    legend.key.size = unit(0.5, "cm") # Reduce size of the legend keys (color boxes)
  ) +
  transition_time(year) + # Use year as discrete variable
  view_follow(fixed_x = TRUE) +
  labs(title = 'Year: {frame_time}',
       subtitle = 'Annual Cyber Events by Actor Type, 2014-2023',
       fill = 'Actor Type',
       caption = 'Number of Events')

# Animate with reduced size
animate(actor_race_anim,
        duration = 17,
        end_pause = 15,
        width = 600,
        height = 500,
        res = 150,
        fps = 20, # Set FPS to a factor of 100 (20 is a factor of 100)
        renderer = gifski_renderer("actor_race_anim.gif"))
```

## Distribution of Motives by Actor Type

```{r pie_charts}
#| fig.align: "center"
# Preprocess motive data
motive_data <- df |>
  filter(motive != "Undetermined") |>
  separate_rows(motive, sep = ",|;") |>
  mutate(motive = str_trim(motive)) |>
  group_by(actor_type, motive) |>
  summarize(count = n(), .groups = 'drop') |>
  arrange(desc(count))

# Calculate proportions for pie chart
motive_data <- motive_data |>
  group_by(actor_type) |>
  mutate(proportion = count / sum(count)) |>
  ungroup()

# Identify the largest slice for each actor type
largest_labels <- motive_data |>
  group_by(actor_type) |>
  filter(proportion == max(proportion)) |>
  mutate(
    label = paste(motive, "\n", scales::percent(proportion, accuracy = 0.1)) # Create label text
  ) |>
  ungroup()

# Create pie chart with labels matching slice colors
ggplot(motive_data, aes(x = 1, y = proportion, fill = motive)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  facet_wrap(~ actor_type, scales = "free") +
  scale_fill_manual(values = custom_palette) +
  # Add labels at the center of each pie chart with a dynamic color box
  geom_label(
    data = largest_labels,
    aes(
      x = 0, y = 0,
      label = label,
      fill = motive
    ),
    inherit.aes = FALSE,
    size = 3.5,
    color = "white",
    label.padding = unit(0.2, "lines"), # Padding for the label box
    fontface = "bold",
    show.legend = FALSE # Don't include these in the legend
  ) +
  labs(
    title = "Distribution of Motives by Actor Type",
    x = NULL,
    y = NULL,
    fill = "Motive"
  ) +
  theme_minimal() +
  theme(
    axis.ticks = element_blank(),
    axis.text = element_blank(), # This removes both x and y axis text
    axis.title = element_blank(), # This removes both x and y axis titles
    panel.grid = element_blank(),
    strip.text = element_text(size = 14, face = "bold") # Larger and bolder facet titles
  )
```

These charts show the proportions of attributable motives by actor type, which are mostly as expected given our existing knowledge of the topic. `Criminals` are motivated by `financial` gains. `Hacktivists` are generally protesting something. `Nation-states` and `terrorists` are committing acts of `political espionage`.

The two most interesting results, however, are the `Hobbyists` and the `Unknown` actors. `Hobbyists` had a substantial proportion of `financial` motivation, but also a significant amount of `protests`, which led us to wonder how the dataset distinguishes a `Hobbyist` from a `Hacktivist`. We were unable to determine how these are coded based on the information provided in the source data, but it would be a potential avenue for future research. The differences between a criminal actor, a hacktivist, and a hobbyist may be hard to pinpoint. This is something we will explore more as the project goes on.

We were hoping to find some attributable qualities in the undetermined group, such as a strong linkage to a specific type, but its motives are spread across several categories. Fittingly, however, that chart has the largest proportion of sabotage motivations, which makes sense given that those actors are usually trying not to be identified.
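
The within-group shares behind the pie charts, and the selection of the largest slice used for each centre label, can be illustrated with a short base-R sketch. The counts below are invented for illustration and do not come from the dataset; `motives` and `largest` are hypothetical names.

```r
# Toy data (hypothetical counts of motive mentions per actor type)
motives <- data.frame(
  actor_type = c("Criminal", "Criminal", "Hacktivist", "Hacktivist", "Hacktivist"),
  motive = c("Financial", "Sabotage", "Protest", "Sabotage", "Financial"),
  count = c(90, 10, 60, 30, 10),
  stringsAsFactors = FALSE
)

# Share of each motive within its actor type (shares sum to 1 per actor type)
motives$proportion <- motives$count /
  ave(motives$count, motives$actor_type, FUN = sum)

# Largest slice per actor type, i.e. the motive shown as the centre label
is_largest <- motives$proportion ==
  ave(motives$proportion, motives$actor_type, FUN = max)
largest <- motives[is_largest, c("actor_type", "motive", "proportion")]

print(largest)
```

Because the shares are normalised within each actor type, the pies are comparable in composition but not in size: a group with few events gets a full circle just like a group with thousands.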

## Monthly Trend Analysis

```{r monthly_line}
#| fig.align: "center"
# Convert event_date to Date format
actor_clean_df$event_date <- as.Date(actor_clean_df$event_date)

# Filter the data to include events up to 2023
actor_clean_df_filtered <- actor_clean_df %>%
  filter(year <= 2023)

# Create a monthly summary
monthly_summary <- actor_clean_df_filtered %>%
  group_by(year, month) %>%
  summarise(event_count = n())

ggplot(monthly_summary,
       aes(x = as.Date(paste(year, month, "01", sep = "-")), y = event_count)) +
  geom_line() +
  labs(
    x = "Year",
    y = "Number of Events",
    title = "Monthly Event Trend (Up to 2023)"
  ) +
  theme_minimal() +
  scale_x_date(
    date_breaks = "1 year",
    date_labels = "%Y",
    limits = as.Date(c("2014-01-01", "2023-12-31")),
    expand = c(0, 0)
  ) +
  # Add reference line at the start of COVID-19 (December 2019)
  geom_vline(xintercept = as.Date("2019-12-01"),
             col = "#247d3f",
             linetype = 'dashed',
             size = .7) +
  # Add annotation with background
  annotate(
    'label',
    x = as.Date("2019-12-01"),
    y = 280,
    label = 'COVID-19 started in late 2019',
    color = "white",
    fill = "#247d3f", # Background color matching the line
    size = 3.2,
    fontface = "bold",
    label.padding = unit(0.3, "lines") # Add padding for better readability
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.major = element_line(color = "lightgray", size = .25),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(color = "lightgray", fill = NA, size = 1)
  )
```

The chart illustrates the monthly trend of cyber events from 2014 to 2023. Between 2014 and 2019, the number of incidents grew gradually, reflecting a steady increase in cyber activities. However, starting in 2020, there is a marked rise in the frequency of cyber events, coinciding with the onset of the COVID-19 pandemic, which likely contributed to a surge in digital activity and, consequently, cyber incidents. This period is characterized by significant spikes, with the highest number of events occurring around 2022.
The peaks during these years suggest increased vulnerability or heightened detection and reporting of cyber incidents. In late 2023, there is a noticeable decline in the number of events, although this could be due to delayed reporting for more recent months. Overall, the chart indicates a shift from relatively stable trends pre-2020 to a more volatile and elevated rate of cyber events in the last few years, highlighting the growing prominence and complexity of cybersecurity challenges.

# Conclusion

Our analysis reveals significant changes in the frequency, nature, and impact of cyber events over the past decade. The increasing dominance of sophisticated actors, such as nation-states and organized criminal groups, underscores the shift from amateur attacks to highly strategic operations. Critical infrastructure remains a primary target, reflecting the strategic importance of these sectors for nation-state actors. The rise in data breaches and exploitation of application servers highlights the growing reliance on digital systems and the vulnerabilities they introduce.

Temporal trends, such as the surge during the COVID-19 pandemic, show how global crises can amplify cyber threats, while a late-2023 decline may signal delays in reporting or shifts in attacker strategies. Analysis of motives confirms known patterns (criminals are financially driven, hacktivists pursue ideological goals, and nation-states engage in espionage) while raising questions about underexplored categories like hobbyists and undetermined actors. These findings emphasize the need for adaptive, motive-specific defenses and robust real-time monitoring systems.

Our report provides a foundation for further research into the interplay of actors, motives, and industry-specific impacts. Future analysis will delve deeper into the relationships between attacker nations and their targets, refine motive categorizations, and explore the language patterns in event descriptions.
These efforts aim to better equip policymakers and organizations to combat the evolving landscape of cyber threats.

# Attribution

All members contributed equally.

# Appendix

**Data dictionary**

```{r appendix}
#| echo: false
#| fig.align: "center"
data_dict <- tibble::tribble(
  ~variable, ~class, ~description,
  "event_date", "character", "Date or estimated date that the event occurred in DD-MM-YYYY format.",
  "year", "integer", "Year event occurred in YYYY format.",
  "month", "character", "Month of the event in two-digit format.",
  "actor", "character", "Name of the organization or individual responsible for the event; 'undetermined' if unknown.",
  "actor_type", "character", "Nature of the actor responsible for the event (e.g., Criminal, Nation-State, Terrorist, Hacktivist, Hobbyist).",
  "organization", "character", "Name of the target organization whose networks were breached.",
  "industry_code", "integer", "Two-digit NAICS code defining the target organization.",
  "industry", "character", "Name of the NAICS code category.",
  "motive", "character", "Intended results sought by the actor (e.g., Protest, Sabotage, Espionage, Financial).",
  "event_type", "character", "Type of event (e.g., Disruptive, Exploitive, Mixed).",
  "event_subtype", "character", "Further classification of the event based on the impacted part of the organization’s IT infrastructure.",
  "description", "character", "Details of the event in 1-3 sentences.",
  "source_url", "character", "URL from which the data was pulled.",
  "country", "character", "Full name of the country for the target organization’s location.",
  "actor_country", "character", "Full name of the country for the actor’s location."
)

# Render as a scrollable table
datatable(data_dict, options = list(scrollX = TRUE))
```