Primary Research Question
How have the tactics, targets, and motives of cyber actors evolved over the last decade?
The number of cyber events has surged in recent years, with a reported 38% increase in 2022 (Checkpoint Blog, 2023). As digital infrastructure becomes essential to daily life, military operations, healthcare, and education, cybercriminals have adapted, employing increasingly sophisticated tactics. Cyber events encompass a wide range of activities, including social media manipulation and fake news generation, prompting an important question: beyond frequency, what else has changed in this landscape?
Despite advancements in cybersecurity, accurately attributing these events to their perpetrators remains a significant challenge. Attackers frequently utilize evasion techniques, such as VPNs and proxy servers, to obscure their identity, location, and motives. False flag operations complicate this issue, as misattributions can have severe consequences. Nation-states have increasingly participated in this arena, posing serious threats to global stability. The disruption caused by an accidental software glitch earlier this year illustrates the risks involved—if such incidents can lead to widespread damage, the potential impact of a malicious actor could be catastrophic.
This research explores the evolution of cyber events over the past decade, focusing on key questions such as whether we have improved in attributing these events, how nation-states are leveraging cyber attacks to gain advantages, and which nations are the most frequent perpetrators. We will also investigate whether industries have been targeted differently based on perpetrator type and event frequency, along with examining relationships between actor countries and their targets. Lastly, we will conduct text analysis of event descriptions to identify patterns in motives and the descriptive language used. These insights aim to inform future defense strategies, helping businesses and governments better manage the growing threat of cyber events.
The dataset used for this analysis is sourced from the Center for International & Security Studies (CISSM) at the University of Maryland School of Public Policy.
Reference
Harry, C., & Gallagher, N. (2018). Classifying Cyber Events. Journal of Information Warfare, 17(3), 17-31.
Description
This dataset includes cyber events recorded from January 2014 to August 2024, detailing various aspects of each incident, such as the type of threat actor, their motives, targeted industries, and the outcomes of the events. It comprises 15 columns and 13,841 rows, pre-processed by the CISSM research team to ensure accuracy and completeness.
The data is collected through a Python script designed to scrape information from reputable sources on both the open and dark web. This script automatically extracts key details, including publication date, title, and URL, which are subsequently reviewed and coded by the CISSM team. Each event is classified using a standard taxonomy, with attributions drawn from the original source material rather than the team’s interpretation.
While the dataset provides valuable insights into cyber event patterns and trends, it does come with limitations. The scraping process is restricted to specific URLs, which may result in the omission of some events. Furthermore, since the sources are human-generated, they can be subject to biases, inaccuracies, or misleading information. Despite these concerns, we believe that the dataset and its underlying framework will yield high-quality insights into the evolving landscape of cyber events.
# Cleaning data
df <- df |>
  mutate(event_subtype = str_replace_all(event_subtype, "\\bServer\\b", "Servers"),
         event_subtype = str_replace_all(event_subtype, "\\bService\\b", "Services"),
         event_subtype = str_replace_all(event_subtype, "\\bSensor\\b", "Sensors"),
         event_subtype = str_replace_all(event_subtype, "\\bHost\\b", "Hosts"),
         event_subtype = str_replace_all(event_subtype, "\\bUndetermined\\b", "Unknown"),
         event_subtype = str_replace_all(event_subtype, "\\bUser\\b", "Users"),
         actor_type = str_replace_all(actor_type, "\\bHacktvist\\b", "Hacktivist"),
         industry_code = case_when(
           industry_code == 92 & industry == "Professional, Scientific, and Technical Services" ~ 54,
           industry_code == 99 & industry == "Public Administration" ~ 92,
           industry == "Medusa" ~ industry_code, # Preserve the industry code here
           TRUE ~ industry_code),
         industry = case_when(
           industry == "Medusa" ~ "Undetermined",
           TRUE ~ industry)) |>
  drop_na(event_subtype) |>
  separate_rows(event_subtype, sep = ",|;") |>
  mutate(event_subtype = str_trim(event_subtype),
         year = as.integer(year))
head(df)
# A tibble: 6 × 15
event_date year month actor actor_type organization industry_code industry
<chr> <int> <chr> <chr> <chr> <chr> <dbl> <chr>
1 2014-01-01 2014 01 Undeter… Criminal Barry Unive… 61 Educati…
2 2014-01-01 2014 01 Undeter… Criminal Record Assi… 54 Profess…
3 2014-01-01 2014 01 Syrian … Hacktivist Skype's Soc… 54 Profess…
4 2014-01-02 2014 01 Undeter… Criminal Snapchat 51 Informa…
5 2014-01-03 2014 01 DERP Tr… Undetermi… Battle.net 51 Informa…
6 2014-01-03 2014 01 DERP Tr… Undetermi… Club Penguin 51 Informa…
# ℹ 7 more variables: motive <chr>, event_type <chr>, event_subtype <chr>,
# description <chr>, source_url <chr>, country <chr>, actor_country <chr>
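The cleaning step above depends on two behaviours that are easy to check in isolation: the `\\b` word boundaries leave values that are already plural untouched, and splitting on `,` or `;` followed by trimming turns one multi-valued entry into separate clean values. The sketch below is a base R analogue of that logic rather than the report's own code, and the three input strings are hypothetical examples, not rows from the dataset.

```r
# Hypothetical raw subtype values (not taken from the dataset).
raw <- c(
  "Exploitation of Application Server",
  "Exploitation of Application Servers",
  "Data Attack; Message Manipulation"
)

# Word boundaries match the whole word "Server" only, so the value that is
# already plural is left alone instead of becoming "Serverss".
harmonised <- gsub("\\bServer\\b", "Servers", raw, perl = TRUE)

# Base R analogue of separate_rows() + str_trim():
# split on comma or semicolon, then strip surrounding whitespace.
split_values <- trimws(unlist(strsplit(harmonised, ",|;")))

print(split_values)
```

Because each multi-valued entry becomes several rows, any count taken after this step is a count of event-subtype records, which can exceed the number of distinct events.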
# Events per Year
events_per_year <- df |>
  filter(year < 2024) |>
  rename(Year = year) |>
  group_by(Year) |>
  summarise(Events = n()) |>
  arrange(Year)

events_per_year_table <- events_per_year |>
  knitr::kable("html", caption = "Number of Cyber Events per Year") |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::row_spec(0, background = "#931e18", color = "white") |>
  kableExtra::column_spec(1, bold = TRUE, color = "#2A3C4E") |>
  kableExtra::column_spec(2, bold = TRUE, color = "#2A3C4E")
# Most Frequent Target Countries (Top 5)
most_frequent_target_countries <- df |>
  group_by(country) |>
  summarise(Count = n()) |>
  arrange(desc(Count)) |>
  head(5) |>
  rename(Country = country)

# Render Most Frequent Target Countries Table
target_countries_table <- most_frequent_target_countries |>
  knitr::kable("html", caption = "Most Frequent Target Countries") |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::row_spec(0, background = "#4A6073", color = "white") |>
  kableExtra::column_spec(1, bold = TRUE, color = "#2A3C4E")

# Most Frequent Actor Countries (Top 5)
most_frequent_actor_countries <- df |>
  filter(actor_country != "Undetermined") |>
  group_by(actor_country) |>
  summarise(Count = n()) |>
  arrange(desc(Count)) |>
  head(5) |>
  rename(Actor_Country = actor_country)

# Render Most Frequent Actor Countries Table
actor_countries_table <- most_frequent_actor_countries |>
  knitr::kable("html", caption = "Most Frequent Actor Countries") |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::row_spec(0, background = "#da7901", color = "white") |>
  kableExtra::column_spec(1, bold = TRUE, color = "#2A3C4E")
# List of unique Actor Types and Event Types
actor_types_list <- unique(df$actor_type)
event_types_list <- unique(df$event_type)

# Create a table for Actor Types and Event Types
types_table <- data.frame(
  Statistic = c("Unique Actor Types", "Unique Event Types"),
  Value = c(paste(actor_types_list, collapse = ", "),
            paste(event_types_list, collapse = ", ")),
  stringsAsFactors = FALSE
) |>
  knitr::kable("html", caption = "Actor and Event Types") |>
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover")) |>
  kableExtra::row_spec(0, background = "#247d3f", color = "white") |>
  kableExtra::column_spec(1, bold = TRUE, color = "#2A3C4E")
# Print all tables
events_per_year_table
Year | Events |
---|---|
2014 | 667 |
2015 | 941 |
2016 | 1209 |
2017 | 874 |
2018 | 946 |
2019 | 1225 |
2020 | 2221 |
2021 | 1784 |
2022 | 2618 |
2023 | 2446 |
actor_countries_table
Actor_Country | Count |
---|---|
Russian Federation | 2043 |
China | 213 |
Korea (the Democratic People's Republic of) | 172 |
United States of America | 160 |
Ukraine | 136 |
target_countries_table
Country | Count |
---|---|
United States of America | 7410 |
United Kingdom of Great Britain and Northern Ireland | 844 |
Italy | 455 |
Canada | 417 |
Ukraine | 417 |
types_table
Statistic | Value |
---|---|
Unique Actor Types | Criminal, Hacktivist, Undetermined, Hobbyist, Nation-State, Terrorist |
Unique Event Types | Exploitive, Disruptive, Mixed, Undetermined |
With our data now cleaned and ready, we dived straight into the analysis to uncover the trends our research aims to highlight. This thorough exploration enabled us to identify patterns in cyber incidents, including variations in actor types and their motivations. By scrutinizing these trends, we can gain insights into how technological advancements shape the dynamics of cyber events.
density_plot_df <- df |>
  filter(year <= 2023) # Filter data to include only up to 2023

density_plot_df |>
  group_by(year, event_subtype) |>
  summarize(count = n(), .groups = "drop") |>
  group_by(event_subtype) |>
  summarize(total_count = sum(count), .groups = "drop") |>
  top_n(5, total_count) |>
  inner_join(density_plot_df, by = "event_subtype") |> # Use the filtered dataset here
  group_by(year, event_subtype) |>
  summarize(count = n(), .groups = "drop") |>
  ggplot(aes(x = year, y = event_subtype, height = count, fill = event_subtype)) +
  geom_density_ridges(scale = 0.9, alpha = 0.9, stat = "identity") +
  scale_fill_manual(values = custom_palette) +
  labs(title = "Density Ridge Plot of Top 5 Event Subtypes Over Time",
       subtitle = "(2014-2023)", # The dataset starts in January 2014
       x = "Year",
       y = "Event Subtype") +
  theme_minimal() +
  theme(legend.position = "none")
The density ridge plot indicates that while some sub-types, such as Message Manipulation and External Denial of Service, have remained stable over time, others, like Exploitation of Application Server and Data Attack, have shown notable fluctuations.
Specifically, Exploitation of Application Server exhibited distinct peaks, with a significant increase in frequency starting around 2018, reaching its highest point in 2021 before experiencing a slight decline. In contrast, Data Attack displayed a gradual rise beginning in 2018, reaching a substantial peak around 2021-2022, followed by a subsequent decrease.
# Define critical infrastructure codes
critical_infrastructure_codes <- c(51, 48, 11, 22, 62, 52, 92, 31, 33, 21)

# Function to analyze actor proportions
analyze_actor_proportions <- function(df) {
  # Add critical infrastructure flag
  df <- df %>%
    mutate(is_critical = industry_code %in% critical_infrastructure_codes)

  # Calculate proportions for all industries
  actor_proportions <- df %>%
    group_by(industry_code, industry) %>%
    summarise(
      total_events = n(),
      nation_state_events = sum(actor_type == "Nation-State"),
      proportion_nation_state = nation_state_events / total_events,
      .groups = 'drop'
    ) %>%
    arrange(desc(proportion_nation_state))

  # Calculate proportions specifically for critical infrastructure
  critical_proportions <- df %>%
    group_by(is_critical) %>%
    summarise(
      total_events = n(),
      nation_state_events = sum(actor_type == "Nation-State"),
      proportion_nation_state = nation_state_events / total_events,
      .groups = 'drop'
    )

  return(list(
    actor_proportions = actor_proportions,
    critical_proportions = critical_proportions
  ))
}
# Create visualizations
plot_actor_proportions <- function(actor_proportions) {
  # Industry-specific proportions plot
  p1 <- ggplot(actor_proportions,
               aes(x = reorder(as.factor(industry_code), -proportion_nation_state),
                   y = proportion_nation_state,
                   text = paste("Industry:", industry,
                                "<br>Proportion:", round(proportion_nation_state * 100, 1), "%",
                                "<br>Nation-State Events:", nation_state_events,
                                "<br>Total Events:", total_events))) +
    geom_bar(stat = "identity", fill = "#2A3C4E", width = .75) +
    labs(
      title = "Proportion of Nation-State Actors by Industry",
      x = "Industry Code",
      y = "Proportion"
    ) +
    theme(
      plot.title = element_text(size = 12), # Reduced title size
      axis.title = element_text(size = 10), # Reduced axis titles size
      axis.text.x = element_text(angle = 45, hjust = 1, size = 8), # Reduced x-axis text
      axis.text.y = element_text(size = 8), # Reduced y-axis text
      aspect.ratio = 1/2
    )

  # Convert ggplot to plotly for interactivity
  p1_interactive <- ggplotly(p1, tooltip = "text")

  return(p1_interactive) # Return the Plotly version of the plot
}
# Example usage:
results <- analyze_actor_proportions(df)

# Display tables
kable(results$actor_proportions,
      caption = "Proportion of Nation-State Events by Industry") %>%
  kable_styling()
industry_code | industry | total_events | nation_state_events | proportion_nation_state |
---|---|---|---|---|
21 | Mining, Quarrying, and Oil and Gas Extraction | 77 | 15 | 0.1948052 |
99 | Undetermined | 236 | 44 | 0.1864407 |
49 | Transportation and Warehousing | 6 | 1 | 0.1666667 |
22 | Utilities | 304 | 45 | 0.1480263 |
92 | Public Administration | 2896 | 303 | 0.1046271 |
81 | Other Services (except Public Administration) | 1042 | 108 | 0.1036468 |
51 | Information | 1515 | 128 | 0.0844884 |
55 | Management of Companies and Enterprises | 25 | 2 | 0.0800000 |
48 | Transportation and Warehousing | 477 | 34 | 0.0712788 |
31 | Manufacturing | 615 | 32 | 0.0520325 |
52 | Finance and Insurance | 1425 | 72 | 0.0505263 |
54 | Professional, Scientific, and Technical Services | 1290 | 54 | 0.0418605 |
11 | Agriculture, Forestry, Fishing and Hunting | 24 | 1 | 0.0416667 |
71 | Arts, Entertainment, and Recreation | 448 | 14 | 0.0312500 |
44 | Retail Trade | 499 | 14 | 0.0280561 |
56 | Administrative and Support and Waste Management and Remediation Services | 191 | 5 | 0.0261780 |
61 | Educational Services | 1485 | 30 | 0.0202020 |
33 | Manufacturing | 73 | 1 | 0.0136986 |
53 | Real Estate and Rental and Leasing | 97 | 1 | 0.0103093 |
62 | Health Care and Social Assistance | 2081 | 18 | 0.0086497 |
42 | Wholesale Trade | 121 | 1 | 0.0082645 |
72 | Accommodation and Food Services | 323 | 2 | 0.0061920 |
23 | Construction | 51 | 0 | 0.0000000 |
32 | Manufacturing | 14 | 0 | 0.0000000 |
45 | Retail Trade | 11 | 0 | 0.0000000 |
kable(results$critical_proportions,
caption = "Nation-State Events in Critical vs Non-Critical Infrastructure") %>%
kable_styling()
is_critical | total_events | nation_state_events | proportion_nation_state |
---|---|---|---|
FALSE | 5839 | 276 | 0.0472684 |
TRUE | 9487 | 649 | 0.0684094 |
# Create interactive plot
plot_actor_proportions(results$actor_proportions)
Our analysis reveals a compelling pattern in the landscape of cyber attacks, particularly when it comes to nation-state operations. Critical infrastructure, the backbone of modern society, emerges as a preferred target for nation-state actors. When nation-states strike in cyberspace, their targeting preference is stark: 70.2% of their attacks are aimed at critical infrastructure, while only 29.8% target non-critical sectors.
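In absolute numbers, this translates to 649 attacks on critical infrastructure versus 276 on non-critical targets, out of 925 total nation-state incidents. Part of this gap reflects volume, since critical infrastructure sectors account for 9,487 of the 15,326 event records in the table above (61.9%). Even so, nation-state actors are over-represented there: they account for 6.8% of critical infrastructure records compared with 4.7% of non-critical records. This is consistent with a targeting strategy in which nation-states concentrate on sectors that could provide maximum strategic impact through potential infrastructure disruption.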
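The percentages in this section can be reproduced directly from the critical versus non-critical table. The sketch below re-enters the three counts from that table by hand (it does not read the dataset) and separates three different quantities that are easy to conflate: the share of nation-state events in each group, the share of all records in each group, and the nation-state rate within each group. The column names are illustrative choices, not names from the original analysis.

```r
# Counts copied from the critical vs non-critical table in this report.
counts <- data.frame(
  is_critical         = c(FALSE, TRUE),
  total_events        = c(5839, 9487),
  nation_state_events = c(276, 649)
)

# Share of all nation-state events that fall in each group.
counts$share_of_nation_state <-
  counts$nation_state_events / sum(counts$nation_state_events)

# Share of all recorded events that fall in each group (the baseline).
counts$share_of_all_events <-
  counts$total_events / sum(counts$total_events)

# Nation-state events as a fraction of each group's own events.
counts$rate_within_group <-
  counts$nation_state_events / counts$total_events

print(round(counts[, c("share_of_nation_state",
                       "share_of_all_events",
                       "rate_within_group")], 3))
```

Comparing `share_of_nation_state` with `share_of_all_events` shows how much of the 70.2% figure is explained by the sheer volume of critical infrastructure records, while `rate_within_group` reproduces the `proportion_nation_state` column of the table.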
Cyber threat trends from 2014 to 2023 demonstrate a fundamental transformation in attack patterns. The chart reveals that criminal actors have become the primary threat vector, with incidents increasing from 348 to 2,081 over the nine-year period. This shift coincides with a decline in amateur attacks and a rise in sophisticated operations. Nation-state actors' activities increased from 6 to 106 incidents, while hacktivist operations showed significant variability, peaking at 658 events in 2022. The overall increase in cyber events from 633 (2014) to 2,443 (2023) reflects the intensification of cyber threats and suggests a trend toward more organized, well-resourced threat actors.
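To put the start and end figures quoted above on a common scale, the sketch below expresses each as a fold increase between 2014 and 2023. It uses only the counts stated in the paragraph, typed in by hand; the data frame and its column names are illustrative and not part of the original analysis.

```r
# First- and last-year counts as quoted in the paragraph above.
trend <- data.frame(
  series     = c("Criminal", "Nation-State", "All events"),
  count_2014 = c(348, 6, 633),
  count_2023 = c(2081, 106, 2443)
)

# Ratio of the 2023 count to the 2014 count for each series.
trend$fold_increase <- trend$count_2023 / trend$count_2014

print(transform(trend, fold_increase = round(fold_increase, 1)))
```

In relative terms the nation-state series grows fastest (roughly 18-fold), but it starts from only 6 incidents, so the ratio is sensitive to small changes in the 2014 count. Criminal incidents grow roughly six-fold from a much larger base.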
actor_event_count <- actor_clean_df %>%
  filter(year < 2024) |>
  group_by(year, actor_type) %>%
  summarise(event_count = n(), .groups = 'drop') %>%
  mutate(
    rank = rank(-event_count), # Compute rank for sorting
    Value_lbl = paste0(' ', round(event_count)) # Create Value_lbl for labels
  )

actor_race_anim <- actor_event_count %>%
  mutate(year = as.integer(year)) %>% # Convert year to integer
  ggplot(aes(x = rank, y = event_count, fill = actor_type)) + # Use fill for color
  geom_bar(stat = "identity", position = "dodge", width = 3, alpha = 0.8) +
  geom_text(aes(y = event_count, label = Value_lbl), hjust = 0) +
  coord_flip(clip = 'off', expand = FALSE) +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_manual(values = custom_palette) +
  scale_x_reverse() +
  theme_minimal() +
  theme(
    axis.line = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    plot.title = element_text(
      size = 10, hjust = 0.5, face = 'bold',
      colour = 'grey', vjust = -1), # Reduced title font size
    plot.subtitle = element_text(
      size = 10, hjust = 0.5,
      face = 'italic', color = 'grey'),
    plot.caption = element_text(
      size = 8, hjust = 0.5,
      face = 'italic', color = 'grey'),
    plot.margin = margin(0.5, 2, 0.5, 3, 'cm'),
    # Only vertical gridlines (major x-axis gridlines)
    panel.grid.major.x = element_line(color = "grey80"),
    panel.grid.major.y = element_blank(),
    # Adjust the legend size and position
    legend.position = c(0.9, 0.3),
    legend.background = element_rect(fill = 'white'),
    legend.title = element_text(size = 10), # Reduce legend title size
    legend.text = element_text(size = 8), # Reduce legend label size
    legend.key.size = unit(0.5, "cm") # Reduce size of the legend keys (color boxes)
  ) +
  transition_time(year) + # Use year as discrete variable
  view_follow(fixed_x = TRUE) +
  labs(title = 'Year: {frame_time}',
       subtitle = 'Annual Cyber Events by Actor Type, 2014-2023',
       fill = 'Actor Type',
       caption = 'Number of Events')
# Animate with reduced size
animate(actor_race_anim, duration = 17, end_pause = 15,
width = 600, height = 500, res = 150,
fps = 20, # Set FPS to a factor of 100 (20 is a factor of 100)
renderer = gifski_renderer("actor_race_anim.gif"))
# Preprocess motive data
motive_data <- df |>
  filter(motive != "Undetermined") |>
  separate_rows(motive, sep = ",|;") |>
  mutate(motive = str_trim(motive)) |>
  group_by(actor_type, motive) |>
  summarize(count = n(), .groups = 'drop') |>
  arrange(desc(count))

# Calculate proportions for pie chart
motive_data <- motive_data |>
  group_by(actor_type) |>
  mutate(proportion = count / sum(count)) |>
  ungroup()

# Identify the largest slice for each actor type
largest_labels <- motive_data |>
  group_by(actor_type) |>
  filter(proportion == max(proportion)) |>
  mutate(
    label = paste(motive, "\n", scales::percent(proportion, accuracy = 0.1)) # Create label text
  ) |>
  ungroup()

# Create pie chart with labels matching slice colors
ggplot(motive_data, aes(x = 1, y = proportion, fill = motive)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  facet_wrap(~ actor_type, scales = "free") +
  scale_fill_manual(values = custom_palette) +
  # Add labels at the center of each pie chart with a dynamic color box
  geom_label(
    data = largest_labels,
    aes(
      x = 0,
      y = 0,
      label = label,
      fill = motive
    ),
    inherit.aes = FALSE,
    size = 3.5,
    color = "white",
    label.padding = unit(0.2, "lines"), # Padding for the label box
    fontface = "bold",
    show.legend = FALSE # Don't include these in the legend
  ) +
  labs(
    title = "Distribution of Motives by Actor Type",
    x = NULL,
    y = NULL,
    fill = "Motive"
  ) +
  theme_minimal() +
  theme(
    axis.ticks = element_blank(),
    axis.text = element_blank(), # This removes both x and y axis text
    axis.title = element_blank(), # This removes both x and y axis titles
    panel.grid = element_blank(),
    strip.text = element_text(size = 14, face = "bold") # Larger and bolder facet titles
  )
These charts show the proportions of attributable motives by actor type, which are mostly as expected given our existing knowledge of the topic. Criminals are motivated by financial gains. Hacktivists are generally protesting something. Nation-states and terrorists are committing acts of political espionage.
The two most interesting results, however, are the Hobbyists and the Undetermined actors. Hobbyists had a substantial proportion of financial motivation, but also a significant amount of protests, which led us to wonder how the dataset distinguishes between Hobbyist and Hacktivist. We were unable to determine how these are coded based on the information provided in the source data, but it would be a potential avenue for future research. The boundaries between a criminal actor, a hacktivist, and a hobbyist may be hard to pinpoint.
This is something we will explore more as the project goes on. We were hoping to find some attributable qualities in the undetermined group, such as a strong linkage to a specific actor type, but its motives are fairly evenly distributed. Fittingly, however, that chart has the largest proportion of sabotage motivations, which makes sense given that those actors are usually trying not to be identified.
# Convert event_date to Date format
actor_clean_df$event_date <- as.Date(actor_clean_df$event_date)

# Filter the data to include events up to 2023
actor_clean_df_filtered <- actor_clean_df %>%
  filter(year <= 2023)

# Create a monthly summary
monthly_summary <- actor_clean_df_filtered %>%
  group_by(year, month) %>%
  summarise(event_count = n())

ggplot(monthly_summary, aes(x = as.Date(paste(year, month, "01", sep = "-")), y = event_count)) +
  geom_line() +
  labs(
    x = "Year",
    y = "Number of Events",
    title = "Monthly Event Trend (Up to 2023)"
  ) +
  theme_minimal() +
  scale_x_date(
    date_breaks = "1 year",
    date_labels = "%Y",
    limits = as.Date(c("2014-01-01", "2023-12-31")),
    expand = c(0, 0)
  ) +
  # Add reference line at the start of COVID-19 (December 2019)
  geom_vline(xintercept = as.Date("2019-12-01"), col = "#247d3f", linetype = 'dashed', size = .7) +
  # Add annotation with background
  annotate(
    'label',
    x = as.Date("2019-12-01"),
    y = 280,
    label = 'COVID-19 started in late 2019',
    color = "white",
    fill = "#247d3f", # Background color matching the line
    size = 3.2,
    fontface = "bold",
    label.padding = unit(0.3, "lines") # Add padding for better readability
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.major = element_line(color = "lightgray", size = .25),
    panel.grid.minor = element_blank(),
    panel.border = element_rect(color = "lightgray", fill = NA, size = 1)
  )
The chart illustrates the monthly trend of cyber events from 2014 to 2023. Between 2014 and 2019, the number of incidents grew gradually overall, although the annual totals reported earlier show a dip in 2017 before a recovery by 2019.
However, starting in 2020, there is a marked rise in the frequency of cyber events, coinciding with the onset of the COVID-19 pandemic, which likely contributed to a surge in digital activity and, consequently, cyber incidents. This period is characterized by significant spikes, with the highest number of events occurring around 2022. The peaks during these years suggest increased vulnerability or heightened detection and reporting of cyber incidents.
In late 2023, there is a noticeable decline in the number of events, although this could be due to delayed reporting for more recent months. Overall, the chart indicates a shift from relatively stable trends pre-2020 to a more volatile and elevated rate of cyber events in the last few years, highlighting the growing prominence and complexity of cyber security challenges.
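The shift described here can be quantified with year-over-year percentage changes. The sketch below re-enters the annual totals from the events-per-year table earlier in this report; those totals are counted after event subtypes were split into separate rows, so they may differ slightly from the counts behind the monthly chart. The `pct_change` column name is an illustrative choice.

```r
# Annual totals copied from the "Number of Cyber Events per Year" table.
events_per_year <- data.frame(
  Year   = 2014:2023,
  Events = c(667, 941, 1209, 874, 946, 1225, 2221, 1784, 2618, 2446)
)

# Percent change relative to the previous year (NA for the first year).
events_per_year$pct_change <- c(
  NA,
  100 * diff(events_per_year$Events) / head(events_per_year$Events, -1)
)

print(transform(events_per_year, pct_change = round(pct_change, 1)))
```

On these figures the largest single-year jump is from 2019 to 2020, at roughly 81%, while the change from 2022 to 2023 is a decline of under 7%. The 2023 decline is small enough that delayed reporting alone could plausibly account for it.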
Our analysis reveals significant changes in the frequency, nature, and impact of cyber events over the past decade. The increasing dominance of sophisticated actors, such as nation-states and organized criminal groups, underscores the shift from amateur attacks to highly strategic operations. Critical infrastructure remains a primary target, reflecting the strategic importance of these sectors for nation-state actors. The rise in data breaches and exploitation of application servers highlights the growing reliance on digital systems and the vulnerabilities they introduce.
Temporal trends, such as the surge during the COVID-19 pandemic, show how global crises can amplify cyber threats, while a late-2023 decline may signal delays in reporting or shifts in attacker strategies. Analysis of motives confirms known patterns—criminals are financially driven, hacktivists pursue ideological goals, and nation-states engage in espionage—while posing questions about underexplored categories like hobbyists and undetermined actors. These findings emphasize the need for adaptive, motive-specific defenses and robust real-time monitoring systems.
Our report provides a foundation for further research into the interplay of actors, motives, and industry-specific impacts. Future analysis will delve deeper into the relationships between attacker nations and their targets, refine motive categorizations, and explore the language patterns in event descriptions. These efforts aim to better equip policymakers and organizations to combat the evolving landscape of cyber threats.
All members contributed equally.
Data dictionary