Project Proposal

Shark Incidents in California

Author

Luis Bracho, Diego Farah, Diego Paredes

Published

December 9, 2024

Introduction

Shark incidents have long fascinated researchers and the public alike, given the potentially fatal nature of these encounters and their connection to popular beach activities. This project explores the characteristics of shark incidents in California, focusing on the severity of the injuries sustained (fatal, major, minor, or none). We want to use the data to estimate the probability that a shark incident is fatal.

We also want to explore which activities are most common during incidents (scuba diving, swimming, surfing, etc.).

Finally, we want to explore which shark species are most likely to be involved.

The study will provide valuable exploratory insights into the patterns of these incidents, which could help inform safety measures for coastal activities. By analyzing historical data on shark incidents, this research will help understand the frequency of attacks and potentially identify the environmental and behavioral factors that correlate with different injury types.

Research Question

The topic of this project is the exploration of shark incidents in California, based on a dataset provided by the California Department of Fish and Wildlife. The research focuses on identifying patterns in the data rather than establishing causality. The overarching research question is:

What factors affect shark incidents in California, including species, activities, locations, and injury severity?

Data Sources

The primary data source for this project is the California Department of Fish and Wildlife’s dataset on shark incidents, last updated in March 2024. This dataset includes a wide range of variables, such as the date, time, location, water depth, type of human activity, species of shark involved, and the severity of injuries. The data come directly from incidents reported in California, which supports their reliability. However, some entries may be incomplete, and some variables may have been recorded inconsistently. These issues will be addressed through data cleaning and preparation. For example, the depth and injury fields will need to be standardized, and any missing or ambiguous entries will be handled carefully to maintain the integrity of the analysis.
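The standardization step described above could look roughly like the following sketch. It uses base R on a small made-up data frame; the column names `Injury` and `Depth`, the raw values, and the rule of keeping only the first number in a depth entry are all assumptions for illustration, not a description of the actual dataset.

```r
# Toy records standing in for the raw dataset (values are hypothetical)
raw <- data.frame(
  Injury = c("fatal", " Major", "minor ", "None", "MINOR", NA),
  Depth  = c("10", "15 ft", "surface", "3-5", NA, "20"),
  stringsAsFactors = FALSE
)

# Injury: trim whitespace, lowercase, and store as an ordered factor
injury_levels <- c("none", "minor", "major", "fatal")
raw$Injury <- factor(tolower(trimws(raw$Injury)),
                     levels = injury_levels, ordered = TRUE)

# Depth: keep the first number in each entry (assumed rule);
# "surface" becomes 0, anything unparseable becomes NA
depth_txt <- tolower(trimws(raw$Depth))
first_num <- suppressWarnings(
  as.numeric(sub("^\\D*(\\d+\\.?\\d*).*$", "\\1", depth_txt, perl = TRUE))
)
raw$Depth_clean <- ifelse(!is.na(depth_txt) & depth_txt == "surface", 0, first_num)

raw
```

Missing values are kept as `NA` rather than dropped, so later counts can report how many records were excluded.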

Link 1: https://wildlife.ca.gov/Conservation/Marine/White-Shark

Link 2: https://wildlife.ca.gov/Data/Sci-Data

Link 3: https://catalog.data.gov/dataset/shark-incident-database-california-56167

Link 4: https://animalbiotelemetry.biomedcentral.com/articles/10.1186/2050-3385-1-2

Link 4 is particularly important: it presents a study titled “Two-year migration of adult female white sharks (Carcharodon carcharias) reveals widely separated nursery areas and conservation concerns”, published in Animal Biotelemetry, which investigates the migratory patterns and ecological significance of adult female white sharks. Using satellite tracking data collected over two years, the research highlights the extensive movement and habitat use of these apex predators.

Key findings of the study include:

Identification of Migration Routes: Adult female white sharks exhibit long-distance migratory behavior, connecting distinct geographic regions, including widely separated nursery areas.

Results

1. Is there a particularly lethal shark, or a species involved in more incidents than others?

First, we create a pie chart to see the most common species involved in incidents.

Code
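# Packages used throughout the Results section
library(dplyr)
library(ggplot2)
library(plotly)
library(waffle)

# `data` is assumed to be the incident data frame, loaded before this point.
# Collapse species into "White Shark" vs "Other" and compute each group's share.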
species_counts <- data %>%
  count(Species) %>%
  mutate(Species = ifelse(Species == "White", "White Shark", "Other")) %>%
  group_by(Species) %>%
  summarise(n = sum(n)) %>%
  ungroup() %>%
  mutate(percentage = n / sum(n) * 100,
         label = paste0(Species, " (", round(percentage, 1), "%)"))

# Define custom colors
colors <- c("White Shark" = "#00008B", "Other" = "#ADD8E6") 

# Create the pie chart
fig <- plot_ly(
  data = species_counts,
  labels = ~Species,
  values = ~n,
  textinfo = "label+percent",
  hoverinfo = "text",
  text = ~label,
  type = "pie",
  marker = list(colors = colors[species_counts$Species])
)

fig <- fig %>%
  layout(
    title = "Distribution of Shark Species (White Shark vs Other)",
    showlegend = TRUE
  )

fig

Based on the data shown in the pie chart, it is clear that White Sharks are responsible for the majority of incidents, representing 88.6% of the total cases analyzed. Given this overwhelming representation, it is logical and relevant for us to focus primarily on White Sharks.

2. What is the relationship between the activity and the injury type?

To answer this question, we created a bar plot for each type of injury, showing which activities are involved in each one.

Code
# Count incidents for each combination of activity (Mode) and injury type
injury_mode_count <- data %>%
  count(Mode, Injury, name = "injury_count")

total_incidents_per_mode <- injury_mode_count %>%
  group_by(Mode) %>%
  summarise(total_incidents = sum(injury_count)) %>%
  arrange(desc(total_incidents))

injury_mode_count$Mode <- factor(injury_mode_count$Mode, 
                                 levels = total_incidents_per_mode$Mode)

ggplot(injury_mode_count, aes(x = Mode, y = injury_count, fill = Mode)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = injury_count), hjust = -0.5, size = 3) + 
  coord_flip() +
  labs(
    title = "Injury Types by Activity",
    x = "Activity",
    y = "Number of Incidents"
  ) +
  scale_fill_viridis_d(option = "plasma", guide = "none") + 
  facet_wrap(~ Injury, scales = "free_y") + 
  expand_limits(y = max(injury_mode_count$injury_count) * 1.1) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    strip.text = element_text(size = 12, face = "bold")
  )

The data suggest that Surfing/Boarding is the activity most often involved in shark incidents across all injury levels, making it the highest-risk activity in this dataset. Freediving follows, particularly for major and minor injuries. Because these are raw counts that do not account for how many people practice each activity, the pattern is descriptive; it can still help guide safety measures, such as targeted awareness campaigns or safety protocols for these activities.
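One way to compare activities of very different sizes is to look at the share of each injury type within an activity, rather than raw counts. The sketch below does this with base R on a small made-up data frame; the values are illustrative only, and the column names `Mode` and `Injury` are assumed to match the dataset.

```r
# Toy incidents (hypothetical values, not taken from the real dataset)
toy <- data.frame(
  Mode   = c("Surfing", "Surfing", "Surfing", "Freediving", "Freediving", "Swimming"),
  Injury = c("none", "minor", "major", "major", "fatal", "minor")
)

# Contingency table: rows are activities, columns are injury types
tab <- table(toy$Mode, toy$Injury)

# Row proportions: within each activity, the share of each injury type
row_share <- prop.table(tab, margin = 1)
round(row_share, 2)
```

Each row of `row_share` sums to 1, so an activity with few incidents can still be compared with surfing on the mix of outcomes.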

3. What types of injuries are most common in shark incidents? If a shark attacks you, will it kill you for sure?

To answer this, we created a waffle chart showing the proportion of each injury type. We focus on the number of fatal incidents and on the incidents that resulted in no injury.

Code
injury_counts <- data %>%
  count(Injury) %>%
  mutate(percentage = n / sum(n) * 100,
         label = paste0(round(percentage, 1), "%")) 

injury_counts_vector <- setNames(injury_counts$n, injury_counts$Injury)

waffle_chart <- waffle(injury_counts_vector, rows = 10, 
                       colors = RColorBrewer::brewer.pal(n = length(injury_counts$Injury), "Set3"),
                       title = "Distribution of Injury Types")

total_cells <- sum(injury_counts_vector)

waffle_data <- expand.grid(row = 1:10, col = 1:ceiling(total_cells / 10))
waffle_data <- waffle_data[1:total_cells, ]
waffle_data$group <- rep(names(injury_counts_vector), injury_counts_vector)

colors <- RColorBrewer::brewer.pal(n = length(injury_counts$Injury), "Set3")
waffle_data$color <- unlist(lapply(waffle_data$group, function(g) {
  colors[which(names(injury_counts_vector) == g)]
}))

fig <- plot_ly(data = waffle_data,x = ~col,y = ~row,color = ~group,
               colors = colors,type = "scatter",mode = "markers",
            marker = list(size = 15, symbol = "square")) %>%
  layout(title = "Distribution of Injury Types",
        xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
        yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))

fig

This allows us to better understand the severity of shark incidents and to dispel common misconceptions, such as the assumption that all shark attacks are fatal. By focusing on the number of fatal injuries and on the cases with no injury, the chart highlights the variability in outcomes and shows that survival without major injury is a real possibility in shark encounters.
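The probability that an incident is fatal can be estimated as the share of fatal records, together with an interval that reflects the small sample. The following is a minimal sketch in base R using made-up counts; the numbers are hypothetical, and the exact binomial interval from `binom.test` is one reasonable choice, not necessarily the method used in this report.

```r
# Toy injury outcomes (hypothetical counts, for illustration only)
toy_injury <- c(rep("none", 8), rep("minor", 5), rep("major", 5), rep("fatal", 2))

n_total <- length(toy_injury)           # number of incidents
n_fatal <- sum(toy_injury == "fatal")   # number of fatal incidents

# Point estimate of the probability that an incident is fatal
p_fatal <- n_fatal / n_total

# Exact (Clopper-Pearson) 95% confidence interval for that proportion
ci <- binom.test(n_fatal, n_total)$conf.int

p_fatal
ci
```

With the real data, `toy_injury` would be replaced by the cleaned `Injury` column, after deciding how to treat records with a missing injury type.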

4. Is there a county where incidents are more common?

Code
county_counts <- data %>%
  count(County) %>%
  arrange(desc(n))

p <- ggplot(county_counts, aes(x = reorder(County, n), y = n, fill = County)) +
  geom_bar(stat = "identity", fill = "gray40") +
  geom_text(aes(label = n), hjust = -0.1, size = 3.5, color = "black") +
  coord_flip() +
  labs(
    title = "Shark Incidents per County",
    x = "County",
    y = "Number of Incidents"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold"),
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    legend.position = "none"
  )
p

From the analysis of shark incidents by county in California, San Diego County has the highest number of incidents with 23, followed by Santa Barbara County (19) and San Mateo County (18). These counties, particularly those with high incident counts, may represent areas where shark-human interactions are more common due to a combination of environmental factors, human activities, and marine biodiversity.

Counties like Humboldt, Marin, Sonoma, and Santa Cruz also show notable numbers of incidents, indicating potential hotspots for shark activity. However, several counties such as Ventura, San Francisco, and various islands have significantly fewer incidents, suggesting these areas might be less prone to shark interactions.

This distribution highlights the importance of focusing safety measures and awareness campaigns on the counties with the highest incident counts, particularly San Diego, Santa Barbara, and San Mateo, to minimize risks and improve public safety in these regions.

5. Is there a month or time when more accidents occur?

Code
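# Extract the month from the Date column as an ordered factor.
# Month names are looked up by number so the result does not depend on the system locale.
data$Month <- month.name[as.integer(format(as.Date(data$Date, format = "%Y-%m-%d"), "%m"))]
data$Month <- factor(data$Month, levels = month.name, ordered = TRUE)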

# Count the number of incidents per month
month_counts <- data %>%
  count(Month) %>%
  arrange(Month)

# Plot the number of shark incidents by month
p <- ggplot(month_counts, aes(x = Month, y = n, group = 1)) +
  geom_line(color = "darkred", linewidth = 1) +  # `linewidth` replaces `size` for lines in ggplot2 >= 3.4.0
  geom_point(color = "darkred", size = 2) +
  labs(title = "Shark Incidents by Month",
       x = "Month",
       y = "Number of Incidents") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(color = "Coral",size = 16, face = "bold"),
        axis.title.x = element_text(size = 16, face = "bold"),
        axis.title.y = element_text(size = 16, face = "bold"))

ggplotly(p)

According to “Two-year migration of adult female white sharks (Carcharodon carcharias) reveals widely separated nursery areas and conservation concerns”, the white shark mating season occurs in spring or summer, in temperate waters. This suggests that the highest number of incidents occurs toward the end of the mating season.

A more important consideration is that, by far, the highest number of incidents occurs during surfing, so it matters when this activity is practiced. According to information taken from… the month with the best weather for surfing is October. The other activities with the highest number of incidents are freediving and swimming, which, for reasons of weather and vacations, are practiced in summer, from June to October. This is consistent with the monthly plot shown above.
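The link between activity and season can be checked directly by cross-tabulating activity against month and finding each activity's peak month. The sketch below uses base R on a small made-up data frame; the dates and activities are hypothetical, and the column names `Date` and `Mode` are assumed to match the dataset.

```r
# Toy incidents (hypothetical dates and activities)
toy <- data.frame(
  Date = as.Date(c("2001-10-05", "2003-10-20", "2010-08-14",
                   "2012-07-02", "2015-09-30", "2018-10-11")),
  Mode = c("Surfing", "Surfing", "Swimming", "Freediving", "Surfing", "Swimming")
)

# Month as a factor with all twelve levels, independent of the system locale
toy$Month <- factor(month.name[as.integer(format(toy$Date, "%m"))],
                    levels = month.name)

# Incidents per activity (rows) and month (columns)
by_month <- table(toy$Mode, toy$Month)

# Peak month per activity; ties resolve to the earliest month
peak_month <- setNames(colnames(by_month)[apply(by_month, 1, which.max)],
                       rownames(by_month))
peak_month
```

If surfing incidents in the real data peak in October while swimming and freediving incidents peak in the summer months, that would support the explanation given above.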

Conclusion

The present study describes the characteristics and distribution of shark incidents in California: the species involved, human use of the water, and ecological conditions. The analysis showed that white sharks are responsible for the majority of incidents, and that surfing and other water-based activities account for most of the higher-risk interactions. Seasonality and specific geographic hotspots also stood out as factors.

These findings suggest that safety measures should be targeted, for example by raising awareness among surfers and swimmers during peak months of activity in high-incident counties like San Diego. In addition, knowledge of behavioral traits and migratory patterns can help conservation efforts and improve coexistence between humans and marine life.

Although the dataset was of great value, its limitations include incomplete records and possible reporting biases. Future studies can build on this research by investigating how environmental factors, such as water temperature and the presence of prey, affect shark activity.

Ultimately, this study aims to improve the understanding of shark incidents in California and to support appropriate public safety strategies, with a balanced perspective on the ecological role of sharks.

The analysis of shark incidents in California provides valuable insights into patterns and contributing factors. Key findings indicate that White Sharks are overwhelmingly responsible for incidents, making them a central focus for safety strategies. Activities like surfing and freediving present the highest risk, and counties like San Diego and Santa Barbara show greater incident prevalence. Seasonal trends reveal that accidents peak during summer and fall, aligning with increased human activity in the water and shark mating seasons.

Attribution

All members collaborated equally.