Trustworthiness of New York City Subway System

Authors

Ernest Giahyue, Lawrenz Pacayo, Sean Smolyanskiy, Yusuf Ozaydin

Published

December 7, 2025

Introduction:

Millions of New Yorkers rely on public transportation, so the transit system must be safe and reliable. Public safety on the MTA, however, remains a recurring concern for riders. This project examines patterns of crime across NYC, with a particular focus on crime in the subway system, subway-related incidents such as delays, and how these factors may be connected. We analyze citywide crime trends, test whether subway incidents correlate with subway crime, ask whether subway crime relates to street-level crime, and map where shootings near subway stations have occurred to understand their geographic distribution. We also compare our findings with traffic accident data to ask whether driving is a safer alternative to the subway. Overall, our analysis shows that major subway incidents are relatively rare, typically under 100 per month, and mostly mechanical. We find only a weak relationship between incidents and subway crime, suggesting that service disruptions do not necessarily signal unsafe conditions. However, stations in higher-crime neighborhoods tend to experience more transit-related crime, indicating that subway safety often reflects broader community patterns. Finally, when comparing transit safety to driving, the data suggest that riding the subway is generally safer than driving.

Research Questions:

How does the MTA’s safety and reliability compare to the city’s overall safety? Beyond that, how often do incidents and delays happen? Is there a correlation between incidents and subway crime? Are certain boroughs safer than others? Is driving a safer alternative to the subway?

Data Sources:

MTA Subway Major Incidents

Data Link

This dataset records major subway incidents over time, including mechanical and electrical failures. We obtained it from data.gov, which provides publicly accessible, pre-processed data and links back to the original source. The original publisher is NY Open Data. The data belongs to the City of New York and is collected by the MTA each time an incident in the subway system delays 50 or more trains. Two kinds of records may be missing: mechanical problems that went unnoticed or unresolved, and incidents that delayed fewer than 50 trains, which fall below the reporting threshold. The data may also be biased because better-maintained stations may have more problems identified and reported. Overall, however, the dataset is credible.

MTA Major Felonies

Data Link

The dataset used for this analysis is the MTA Major Felonies dataset, which contains monthly counts of arrests for major felony offenses occurring on property operated by the Metropolitan Transportation Authority (MTA), including NYCT subways, the Staten Island Railway, Metro-North Railroad, and the Long Island Rail Road. Major felonies include murder, rape, robbery, felony assault, burglary, grand larceny, and grand larceny auto, as defined by New York State Penal Law. The data are compiled monthly by two law enforcement agencies: the NYPD Transit Bureau for NYCT property and the MTA Police Department for the remaining transit systems. If multiple offenses occur in a single incident, only the most serious felony is recorded, and attempted crimes are generally classified as completed offenses. This dataset is published as part of the MTA’s open data transparency initiative and may be revised slightly if crimes are reported after the monthly cutoff. The file used in this project is a direct public release from the MTA and has not undergone third-party preprocessing beyond standard publication formatting. In addition to felony counts, the dataset includes a standardized metric, Crimes per Million Riders, calculated from monthly ridership totals. Some data may be missing due to delayed reporting or because the dataset reflects arrests rather than all crimes committed. Grand larceny auto is excluded from NYCT because vehicles are not present in subway station areas. Potential biases include variation in enforcement practices across transit systems and differences in ridership volume, both of which influence arrest counts and crime rate calculations. A complete data dictionary for the dataset is provided in the appendix, including variable names, data types, and descriptions for all key fields: Month, Agency, Police Force, Felony Type, Felony Count, and Crimes per Million Riders.
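The "most serious felony only" counting rule described above can be illustrated with a short base R sketch. The incident records here are hypothetical, and the seriousness ranking is an assumption: it simply follows the order in which the seven offenses are listed above, which may not match the agencies' exact procedure.

```r
# Seriousness order ASSUMED to follow the order in which the seven
# major felonies are listed in the text (most serious first).
severity <- c("Murder", "Rape", "Robbery", "Felony Assault",
              "Burglary", "Grand Larceny", "Grand Larceny Auto")

# Hypothetical incident-level records: one row per offense per incident.
offenses <- data.frame(
  incident_id = c(1, 1, 2, 3, 3, 3),
  offense     = c("Grand Larceny", "Robbery",
                  "Felony Assault",
                  "Burglary", "Grand Larceny", "Felony Assault"),
  stringsAsFactors = FALSE
)

# Rank each offense (1 = most serious), then keep the lowest rank per incident.
offenses$rank <- match(offenses$offense, severity)
top_rank <- tapply(offenses$rank, offenses$incident_id, min)

recorded <- data.frame(
  incident_id = as.numeric(names(top_rank)),
  offense     = severity[as.vector(top_rank)],
  stringsAsFactors = FALSE
)

# Tally: each incident contributes exactly one felony to the counts.
felony_counts <- table(factor(recorded$offense, levels = severity))
print(recorded)
print(felony_counts)
```

Under this rule, an incident involving both a robbery and a grand larceny adds one robbery to the monthly count and nothing to grand larceny, so counts for less serious offenses can understate how often those offenses actually occur.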

Traffic Accidents

Data Link

The dataset used for this analysis is the NYC Motor Vehicle Collisions – Crashes dataset, which contains detailed records of traffic crashes in New York City. It is published through the NYC Open Data portal by the New York City Police Department (NYPD). It documents the date, time, location, contributing factors, and number of injuries and fatalities associated with each reported collision. The NYPD collects data at the scene of each reported motor vehicle collision using standardized police accident reports. These reports are completed when officers respond to crashes involving injury, death, or significant property damage, and the records are subsequently entered into the city’s open data system. The file used in this project is a direct public release from NYC Open Data and has not undergone third-party preprocessing beyond standard formatting for public access. Some data may be missing due to unreported minor collisions, incomplete officer reports, or imprecise geocoding. Potential biases include underreporting of low-severity crashes and variation in reporting practices by location or time of day. A complete data dictionary describing all variables used in this dataset is provided in the appendix.

Citywide Crime

Data Link

This dataset is the Citywide Seven Major Felony Offenses table covering 2000 to 2024. The source page offers several versions of the data; we used the Excel version because it gives a more straightforward view of the number and types of incidents from 2000 to 2024. It was published by the NYPD via the “Historical Crime Data” page. The table reports annual citywide counts for the seven major felony categories plus a “Total Seven Major Felony Offenses” row for each year. It is compiled and released by the NYPD from internal data systems that capture complaint, arrest, and follow-up information. The dataset covers calendar years 2000 to 2024, but for the chart we narrowed it to 2019-2024 to align with the period shown in the MTA chart. The counts cover all precincts and boroughs and include all NYPD-reported index crimes in the seven categories, not just those in transit. There were no blank cells, indicating no missing data. A potential bias could stem from changes in law, NYPD policy, or coding rules, which can alter the reported numbers even if the underlying behavior remains constant.

New York City Map (Subway Stations and Shootings)

Shootings Link

Stations Link

The map draws on two datasets: the MTA Subway Stations dataset and the NYPD Shooting Incident Data. Combining these sources provides the geographic station points and the incident-level shooting records needed to examine the relationship between subway infrastructure and nearby violent incidents within a 40-meter radius of each station.

The first dataset is from the Metropolitan Transportation Authority (MTA) and includes the location and attribute information for every subway station in New York City. It was downloaded from the official MTA Open Data Portal as “MTA_Subway_Stations.csv” and contains latitude, longitude, station names, lines served, and borough identifiers. The dataset is maintained by the MTA’s Geographic Information System (GIS) and transit planning teams, who record and update the station points to reflect changes in service patterns, infrastructure updates, or newly opened stations. It covers stations across all five boroughs and is periodically updated to support public transit planning, mapping tools, and operational transparency. The dataset was exported directly with no third-party modifications; however, we filtered out any rows with missing latitude or longitude. This dataset is mostly reliable. Its main limitation is that each station is represented by a single point, so a multi-platform station with multiple entrances and exits appears as one central location rather than as all of its access points.

The second dataset is the NYPD Shooting Incident Data, published through the NYC Open Data portal. This dataset, labeled NYPD_Shooting_Incident_Data_Historic.csv, includes every confirmed shooting incident recorded in New York City from 2006 to the present. Each row represents an individual shooting and includes variables such as the approximate location, date and time, victim and perpetrator characteristics, and police jurisdiction information. The NYPD collects this data through official police reports, which also include coordinates based on the address. The file used in this project was taken directly from the portal with no modifications. This dataset has potential biases: some locations are imprecise due to geocoding limitations, and it only includes shootings with confirmed victims, so shots fired without anyone being hurt are not recorded. Despite these limitations, the dataset remains a comprehensive, publicly available source of gun violence data in the city.

Analysis

Subway Crime Data

Code
library(tidyverse)
library(lubridate)
library(ggrepel)
library(janitor)   # provides clean_names()

mta_final_crime <- felonies_df %>% clean_names()

top_offenses <- mta_final_crime %>%
  mutate(date = mdy(month)) %>%
  filter(year(date) <= 2024) %>%
  count(felony_type, wt = felony_count, name = "total") %>%
  slice_max(total, n = 5) %>%
  pull(felony_type)

mta_by_offense_month <- mta_final_crime %>%
  mutate(
    date       = mdy(month),
    year_month = floor_date(date, unit = "month"),
    offense    = if_else(felony_type %in% top_offenses,
                         felony_type,
                         "Other")
  ) %>%
  filter(
    year(date) <= 2024,
    offense != "Other"
  ) %>%
  group_by(year_month, offense) %>%
  summarise(
    total = sum(felony_count, na.rm = TRUE),
    .groups = "drop"
  )

# pick the x-position for labels
label_date <- as.Date("2023-06-01")

# IMPORTANT: rebuild label_points from mta_by_offense_month
label_points <- mta_by_offense_month %>%
  group_by(offense) %>%
  mutate(dist = abs(as.numeric(year_month - label_date))) %>%
  slice_min(dist, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(year_month, total, offense, dist)




ggplot(mta_by_offense_month, aes(x = year_month, y = total, color = offense)) +
  geom_line() +
  geom_text_repel(
    data = label_points,
    aes(x = year_month, y = total, label = offense),
    nudge_x = 90,
    direction = "y",
    hjust = 0,
    segment.color = NA,
    show.legend = FALSE
  ) +
  scale_x_date(
    date_labels = "%Y",
    date_breaks = "1 year",
    expand = c(0, 0)
  ) +
  coord_cartesian(
    xlim = c(as.Date("2019-01-01"), as.Date("2024-12-31")),
    clip = "off"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.margin = margin(5.5, 80, 5.5, 5.5)
  ) +
  labs(
    title = "Monthly Major Felonies on the MTA by Offense Type",
    x = "Year",
    y = "Number of Incidents"
  )

The chart Monthly Major Felonies on the MTA by Offense Type shows the number of incidents per month from 2019 to 2024. The counts are strikingly low for a city of over 8 million residents. At first glance the lines may suggest that felonies occur at a high rate, but the citywide crime chart shows otherwise. The highest monthly count recorded was just above 150, for grand larceny. That peak came right before the COVID-19 outbreak, after which the counts dropped sharply. Felony counts then rose again around mid-2021, once residents returned to in-person work and school. These numbers suggest that riding the MTA in NYC is safer than many people think. Whether it is safer than other modes of transportation, such as driving, is a separate question.

Felonies Per Million Riders

Code
chart2 <- ggplot(cleaned_felonies_df, aes(x = date, y = total_crimes_per_million)) +
    geom_line(color = "steelblue", linewidth = 1) +
    #geom_point(color = "darkred", size = 2) +
    labs(
        title = "Total Crime per Million Riders Over Time",
        x = "Time",
        y = "Total Crime per Million Riders"
    ) +
    theme_minimal(base_size = 14)
chart2

This chart displays total major felonies per one million subway riders over time, so that crime can be read as a rate of risk rather than as a raw count. This adjustment is especially important during the COVID and post-COVID periods, when ridership fluctuated dramatically. The early spike around 2020 reflects a period when ridership collapsed while felony counts declined more slowly, which temporarily raised the per-rider rate. After that disruption, crime per million riders settles into a fluctuating but relatively moderate range, even as ridership steadily increases in the post-pandemic years. Short-term spikes appear intermittently, but there is no sustained upward trend in per-rider risk across the later period. The individual likelihood of experiencing a major felony on the subway therefore remains low relative to the volume of trips taken each month.

This chart is especially revealing for the project’s broader research question, “How does the MTA’s safety and reliability compare to the city’s overall safety?” Although subway crime often receives heightened public attention, the per-rider rate shows that subway travel exposes individual riders to a comparatively low level of serious criminal risk. Viewed alongside the charts of citywide crime and traffic casualties in the sections that follow, this normalized rate reinforces the conclusion that the subway remains one of the safer high-capacity transportation modes in New York City when risk is evaluated per trip rather than through raw incident counts alone.
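The 2020 spike is easier to see with the arithmetic written out. The sketch below uses made-up monthly figures (not values from the MTA dataset) to show how a per-million rate can rise even while the felony count falls, as long as ridership falls faster.

```r
# Hypothetical monthly figures, chosen only to illustrate the arithmetic.
monthly <- data.frame(
  month     = as.Date(c("2020-02-01", "2020-04-01")),
  felonies  = c(200, 100),      # counts fall by half
  ridership = c(100e6, 10e6)    # ridership falls by 90%
)

# Rate per million riders = felonies / ridership * 1,000,000
monthly$per_million <- monthly$felonies / monthly$ridership * 1e6
print(monthly)
```

In this toy case the felony count halves while the rate goes from 2 to 10 per million riders, which is the same mechanism the chart shows for the early pandemic months.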

Total Subway Incidents

Code
chart3 <- ggplot(df_summary, aes(x = as.Date(month), y = total_incidents)) +
    geom_line(color = "steelblue", linewidth = 1) +
    facet_wrap(~ category, scales = "free_y") +
    labs(
        title = "Incident Trends by Category",
        x = "Time",
        y = "Incidents per Month"
    ) +
    theme_light()
chart3

This figure breaks down monthly MTA operational incidents by infrastructure category (Signals, Subway Car, and Track), giving a more detailed view of how different components of the transit system contribute to reliability problems. Rather than treating incidents as a single aggregate, the chart shows how distinct sources of failure evolve separately.

Signal-related incidents are consistently the largest and most volatile source of monthly disruptions, fluctuating around a relatively high baseline throughout the period with frequent sharp spikes. These swings suggest that the signal system remains a persistent challenge and is likely a major driver of the day-to-day delays riders experience. Even when total incidents appear stable, passengers may still face highly unpredictable delays linked to signaling failures.

Subway car incidents remain relatively low through most of the early period but trend clearly upward in the later years, culminating in a sharp spike near the end of the timeline. This pattern suggests that rolling stock reliability has become a growing concern, potentially reflecting fleet aging, deferred maintenance, or operational strain as ridership rebounded after COVID. Track-related incidents also show a gradual but steady increase, indicating growing stress on the physical infrastructure.

Together, these trends address the research question “How often do incidents and delays happen?” and identify which types of infrastructure failure are most responsible for disruptions. Because subway car and track incidents both trend upward while signal failures stay persistently high, reliability challenges appear structural rather than isolated. Improving subway reliability will likely require targeted investment across multiple layers of the transit network rather than a single-point solution.
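The visual reading of "volatile" versus "steadily rising" categories could be checked numerically. The following is a minimal sketch on invented monthly counts, not the MTA data: it fits a straight line per category to estimate the trend and uses the standard deviation of month-to-month changes as a rough volatility measure. The column names mirror those in the plotting code, but the choice of these two summary statistics is ours.

```r
# Hypothetical monthly incident counts for two categories (not real MTA data).
months <- seq(as.Date("2022-01-01"), by = "month", length.out = 12)
toy <- data.frame(
  month    = rep(months, times = 2),
  category = rep(c("Signals", "Track"), each = 12),
  total_incidents = c(
    20, 26, 18, 25, 19, 27, 21, 24, 18, 26, 20, 25,  # volatile, little trend
     4,  5,  5,  6,  6,  7,  7,  8,  8,  9,  9, 10   # steady increase
  )
)

# Trend: slope of total_incidents on a month index, fitted per category.
trend_slope <- sapply(split(toy, toy$category), function(d) {
  idx <- seq_len(nrow(d))                       # 1, 2, ..., 12
  unname(coef(lm(total_incidents ~ idx, data = d))[2])
})

# Volatility: standard deviation of successive month-to-month changes.
volatility <- sapply(split(toy, toy$category), function(d) {
  sd(diff(d$total_incidents))
})

print(round(trend_slope, 2))
print(round(volatility, 2))
```

A category with a slope near zero but high volatility matches the description of signals above, while a clearly positive slope with low volatility matches the description of track incidents.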

Crime and Incidents Correlation

Code
# ============================
# 5. Final Scatterplot (NO COVID)
# ============================

# ---- Correlation value ----
r_val <- round(cor(ci_scatter$incidents, ci_scatter$crime, use = "complete.obs"), 2)

# ---- Fit regression for precise right-edge annotation ----
fit <- lm(crime ~ incidents, data = ci_scatter)
x_r  <- max(ci_scatter$incidents, na.rm = TRUE)
y_r  <- predict(fit, newdata = data.frame(incidents = x_r))

# ---- Build plot object ----
p_scatter_polished <- ggplot(ci_scatter, aes(x = incidents, y = crime)) +
    geom_point(size = 3, alpha = 0.85) +
    geom_smooth(method = "lm", se = FALSE, linewidth = 1, color = "black") +
   
    # ---- r-value at far right of regression line ----
    annotate(
        "text",
        x = x_r,
        y = y_r,
        label = paste0("r = ", r_val),
        hjust = 1.1,    # pull slightly left from the edge
        vjust = -0.6,   # lift slightly above the line
        size = 5,
        fontface = "bold",
        color = "black"
    ) +
   
    labs(
        title = "Relationship Between Operational Incidents and Felony Crime (NYCT)",
        subtitle = "Each point represents one month; black line shows linear trend",
        x = "Operational Incidents (Monthly Total)",
        y = "Felony Crime (Monthly Total)"
    ) +
   
    theme_minimal(base_size = 14) +
    theme(
        plot.title = element_text(face = "bold", size = 18),
        plot.subtitle = element_text(size = 12),
        legend.position = "none",
        panel.grid.minor = element_blank()
    )
p_scatter_polished

This scatterplot examines the relationship between monthly operational incidents on the MTA and monthly felony crime counts on NYCT property, with each point representing one month. The fitted regression line shows a positive but weak correlation between incidents and crime (r = 0.23), indicating that months with more delays and infrastructure-related incidents tend to be associated with slightly higher reported felony counts, though the relationship is not strong.

In the context of our research question, this chart suggests that operational breakdowns alone do not appear to be a dominant driver of subway crime levels. While the upward slope suggests an association, the wide vertical spread of points indicates substantial variability in crime even during months with similar incident totals. This indicates that other social, economic, and ridership-related factors likely play a larger role in shaping subway crime patterns than delays by themselves.

This chart also contributes to the broader comparison of MTA safety versus overall city safety. If incidents were a primary cause of increased subway crime, we would expect a much tighter relationship. Instead, the weak correlation suggests that subway reliability issues and crime risk are only loosely connected, reinforcing the need to examine additional factors such as borough-level patterns, ridership volume, and street-level crime trends. These findings motivate the next stages of analysis, including geographic mapping of shootings and comparison with citywide motor vehicle accident data to evaluate whether driving represents a safer alternative to public transit.
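To put the reported r = 0.23 in perspective, it helps to square it. The sketch below does that arithmetic and then, on simulated data that is not drawn from the MTA files, confirms the general fact that for a one-predictor linear regression the model R-squared equals the squared Pearson correlation.

```r
# Correlation reported for the scatterplot
r <- 0.23

# Share of the variance in monthly felony counts that a straight-line
# fit on incidents accounts for
r_squared <- r^2
print(r_squared)   # 0.0529

# Simulated (not real) data: in simple linear regression, R-squared
# equals the squared Pearson correlation.
set.seed(1)
incidents <- rpois(60, lambda = 60)
crime     <- 150 + 0.3 * incidents + rnorm(60, sd = 15)
fit <- lm(crime ~ incidents)
print(all.equal(cor(incidents, crime)^2, summary(fit)$r.squared))
```

A correlation of 0.23 therefore corresponds to roughly 5% of the month-to-month variation in felony counts, which is consistent with the wide vertical spread of points described above.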

Subway Shootings Map

Code
library(sf)        # st_as_sf(), st_transform(), st_buffer(), st_join()
library(leaflet)   # interactive map and color palettes
library(tigris)    # counties() for borough boundaries

# ----------------------------------------------------------
# SHOOTINGS & STATIONS AS sf
# ----------------------------------------------------------
shoot_sf <- shoot %>%
  filter(!is.na(Latitude), !is.na(Longitude)) %>%
  st_as_sf(coords = c("Longitude", "Latitude"),
           crs = 4326, remove = FALSE)

stations_sf <- mta_locations %>%
  filter(!is.na(`GTFS Latitude`), !is.na(`GTFS Longitude`)) %>%
  st_as_sf(coords = c("GTFS Longitude", "GTFS Latitude"),
           crs = 4326, remove = FALSE)

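# ----------------------------------------------------------
# STATION BUFFERS & SUBWAY SHOOTINGS (within 40m)
# NOTE: EPSG:3857 (Web Mercator) stretches distances by about
# 1 / cos(latitude). Near 40.7 degrees N, a 40-unit buffer covers
# roughly 30 m on the ground. A metre-based local CRS such as
# EPSG:32618 (UTM zone 18N) would give a true 40 m radius.
# ----------------------------------------------------------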
shoot_3857    <- st_transform(shoot_sf, 3857)
stations_3857 <- st_transform(stations_sf, 3857)

stations_buf <- st_buffer(stations_3857, dist = 40)

shootings_subway_sf <- st_join(
  shoot_3857,
  stations_buf,
  join = st_within,
  left  = FALSE
)

# counts per station for subway shootings
shoot_counts <- shootings_subway_sf %>%
  st_drop_geometry() %>%
  group_by(`Stop Name`, Line, `Daytime Routes`, Borough) %>%
  summarise(shooting_count = n(), .groups = "drop")

stations_with_counts <- stations_sf %>%
  left_join(
    shoot_counts,
    by = c("Stop Name", "Line", "Daytime Routes", "Borough")
  ) %>%
  mutate(
    shooting_count = ifelse(is.na(shooting_count), 0L, shooting_count)
  )

# ----------------------------------------------------------
# STREET SHOOTINGS (NOT NEAR SUBWAYS)
# ----------------------------------------------------------
street_shoot_sf <- shoot_sf %>%
  filter(!(INCIDENT_KEY %in% shootings_subway_sf$INCIDENT_KEY))

boro_stats <- street_shoot_sf %>%
  st_drop_geometry() %>%
  count(BORO, name = "street_shootings")

# map BORO names to county names
boro_map <- c(
  "BRONX"         = "Bronx",
  "BROOKLYN"      = "Kings",
  "MANHATTAN"     = "New York",
  "QUEENS"        = "Queens",
  "STATEN ISLAND" = "Richmond"
)

boro_stats <- boro_stats %>%
  mutate(COUNTY_NAME = recode(BORO, !!!boro_map))

# ----------------------------------------------------------
# NYC BOROUGHS + JOIN STREET SHOOTING COUNTS
# ----------------------------------------------------------
ny_boroughs <- counties(state = "NY", cb = TRUE, class = "sf") %>%
  filter(NAME %in% c("New York", "Kings", "Bronx", "Queens", "Richmond"))

ny_boroughs_shoot <- ny_boroughs %>%
  left_join(boro_stats, by = c("NAME" = "COUNTY_NAME")) %>%
  # NEW: if a borough has no street shootings, set to 0 (e.g. Staten Island)
  mutate(street_shootings = ifelse(is.na(street_shootings), 0L, street_shootings))

# we'll use this name in leaflet instead of boroughs_geo
boroughs_geo <- ny_boroughs_shoot

# ----------------------------------------------------------
# COLOR PALETTES
# ----------------------------------------------------------
# Stations: gray for 0, strong reds for 1+
nonzero_counts <- stations_with_counts$shooting_count[
  stations_with_counts$shooting_count > 0
]

pal_station <- colorNumeric(
  palette = colorRampPalette(c(
    "#FF7B7B",  # light red
    "#FF3B3B",  # solid red
    "#D71414",  # strong red
    "#8B0000"   # deep dark red
  ))(200),
  domain = nonzero_counts
)

zero_color <- "#A6AEC0"   # more visible bluish-gray for 0-shooting stations

# Boroughs: blue gradient for street shootings
pal_boro <- colorNumeric(
  palette = "Blues",
  domain  = boroughs_geo$street_shootings
)

# ----------------------------------------------------------
# LEAFLET MAP (WITH GROUP-CONTROLLED LEGENDS)
# ----------------------------------------------------------
map <- leaflet() %>%
  # addTiles() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%

  # Blue borough borders + blue fill by street shootings
  addPolygons(
    data = boroughs_geo,
    fillColor = ~pal_boro(street_shootings),
    fillOpacity = 0.45,
    weight = 3,
    color = "#0A1D4D",   # dark navy blue border
    group = "Boroughs",
    label = ~paste0(NAME, ": ", street_shootings, " street shootings"),
    highlightOptions = highlightOptions(
      weight = 4,
      color = "#000000",
      fillOpacity = 0.6,
      bringToFront = TRUE
    )
  ) %>%

  # Station dots colored by subway shootings
  addCircleMarkers(
    data = stations_with_counts,
    radius = 6,
    fillColor = ~ifelse(
      shooting_count == 0,
      zero_color,
      pal_station(shooting_count)
    ),
    color = NA,
    stroke = FALSE,
    fillOpacity = 0.9,
    popup = ~paste0(
      "<b>", `Stop Name`, "</b><br>",
      "Lines: ", `Daytime Routes`, "<br>",
      "Borough: ", Borough, "<br>",
      "Subway shootings within 40m: ", shooting_count
    ),
    group = "Stations"
  ) %>%

  # Legend for borough street shootings (blue)
  addLegend(
    position  = "bottomright",
    pal       = pal_boro,
    values    = boroughs_geo$street_shootings,
    title     = "Street shootings per borough",
    opacity   = 0.8,
    className = "legend-boroughs"
  ) %>%

  # Legend for station subway shootings (red)
  addLegend(
    position  = "bottomleft",
    pal       = pal_station,
    values    = nonzero_counts,
    title     = "Subway shootings (within 40m)",
    opacity   = 0.8,
    className = "legend-stations"
  ) %>%

  addLayersControl(
    overlayGroups = c("Boroughs", "Stations"),
    options = layersControlOptions(collapsed = FALSE)
  )

# ----------------------------------------------------------
# JS: HIDE/SHOW LEGENDS WITH THEIR GROUPS
# ----------------------------------------------------------
htmlwidgets::onRender(
  map,
  "
  function(el, x) {
    var map = this;
    var boroughLegend = document.querySelector('.legend-boroughs');
    var stationLegend = document.querySelector('.legend-stations');

    // start visible (since both groups are checked by default)
    if (boroughLegend) boroughLegend.style.display = 'block';
    if (stationLegend) stationLegend.style.display = 'block';

    map.on('overlayadd', function(e) {
      if (e.name === 'Boroughs' && boroughLegend) {
        boroughLegend.style.display = 'block';
      }
      if (e.name === 'Stations' && stationLegend) {
        stationLegend.style.display = 'block';
      }
    });

    map.on('overlayremove', function(e) {
      if (e.name === 'Boroughs' && boroughLegend) {
        boroughLegend.style.display = 'none';
      }
      if (e.name === 'Stations' && stationLegend) {
        stationLegend.style.display = 'none';
      }
    });
  }
  "
)

One of the central questions in this project is whether subway-related crime reflects broader patterns of violence in the neighborhoods surrounding each station. To explore this, we created a map that overlays two layers: borough-level street shootings and station-level shootings within a 40-meter radius of each station.

To construct the station layer, we counted the shootings that fall within each station’s buffer, joined those counts back to the station dataset, and mapped the nonzero counts onto a continuous red palette, so that a marker darkens as the number of nearby incidents rises. Stations with zero shootings are drawn in a bluish gray so viewers can clearly distinguish them from stations with one or more incidents. To avoid interpreting station-level incidents without context, we also added a borough-level gradient for street shootings, meaning shootings that did not fall within any station buffer. Boroughs are shaded from light to dark blue based on the number of these street-level shootings within their boundaries. The blue gradient does not represent subway crime; it illustrates the broader landscape of violence that residents experience. Layering the red station markers over the blue borough shading lets us compare shootings near stations with the general concentration of violent incidents across New York City.

The map reveals several patterns relevant to the project’s research question. Subway shootings are heavily concentrated in areas that also experience high levels of street violence, particularly Northern Manhattan and central Brooklyn, where both the red dots and the borough shading darken noticeably. This suggests that subway-related shootings are not isolated events; they tend to occur in neighborhoods already struggling with elevated crime rates. At the same time, some stations show moderate or high shooting counts despite being located in boroughs with lower overall street-level violence. These outliers raise new questions about whether certain stations face localized safety challenges independent of their surrounding neighborhoods.

Overall, the visualization demonstrates that subway crime cannot be understood solely within the transit system itself. Subway shootings appear closely connected to broader patterns of crime across the city, reinforcing the idea that public transportation safety is intertwined with neighborhood-level conditions, social dynamics, and demand on the transit system. The map therefore both supports and complicates our initial expectation: subway safety outcomes likely emerge from a combination of system-specific issues and the broader urban environment in which stations operate.
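The "within 40 meters of a station" rule can also be sanity-checked without any spatial packages. The sketch below swaps in a different technique from the one used for the map: instead of projected buffers, it measures great-circle distance with the haversine formula in base R. The coordinates, station names, and incident keys are hypothetical.

```r
# Great-circle distance in metres between lon/lat points (haversine formula).
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical station and shooting coordinates (illustrative only).
stations <- data.frame(
  stop_name = c("Station A", "Station B"),
  lon = c(-73.9900, -73.9500),
  lat = c(40.7500, 40.7000)
)
shootings <- data.frame(
  incident_key = c(101, 102, 103),
  lon = c(-73.9900, -73.9900, -73.9400),
  lat = c(40.7502, 40.7510, 40.6800)
)

# Count shootings whose ground distance to each station is at most 40 m.
radius_m <- 40
stations$shooting_count <- sapply(seq_len(nrow(stations)), function(i) {
  d <- haversine_m(stations$lon[i], stations$lat[i],
                   shootings$lon, shootings$lat)
  sum(d <= radius_m)
})
print(stations)
```

Because this approach measures ground distance directly, the result does not depend on the scale of any map projection, which makes it a useful cross-check on buffer-based counts for a handful of stations.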

Overall NYC Crime

Code
label_year <- 2022

label_points_year <- crime_long %>%
  group_by(offense) %>%
  mutate(dist = abs(year - label_year)) %>%
  slice_min(dist, n = 1, with_ties = FALSE) %>%
  ungroup()

ggplot(crime_long, aes(x = year, y = total, color = offense)) +
  geom_line() +
  geom_point() +
  geom_text_repel(
    data = label_points_year,
    aes(x = year, y = total, label = offense),
    nudge_x = 0.3,
    direction = "y",
    hjust = 0,
    segment.color = NA,
    show.legend = FALSE
  ) +
  scale_x_continuous(
    breaks = 2019:2024,
    limits = c(2019, 2024),
    expand = c(0, 0)
  ) +
  coord_cartesian(clip = "off") +
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.margin = margin(5.5, 80, 5.5, 5.5)
  ) +
  labs(
    title = "Seven Major Felony Offenses in NYC, 2019–2024",
    x = "Year",
    y = "Number of Incidents"
  )

The citywide crime visualization provides context, showing that crime is occurring more often outside subway stations. The MTA accounts for only a small slice of the city’s seven major felonies. So we can assume that one of the random major felonies is far more likely to occur somewhere in the city than in a subway station. Taken together, the two figures show that crime on the subway largely mirrors citywide patterns but at a much smaller scale. In the citywide chart, grand larceny is by far the most common of the seven major felonies each year from 2019 to 2024, followed by felony assault, with robbery, burglary, and grand larceny of a motor vehicle in the mid-range, and murder and rape much lower in absolute numbers. The MTA chart shows the same ranking of offenses within the transit system: grand larceny is consistently the most frequent subway felony, felony assault is second, and robbery and burglary form a middle tier, with rape and murder making up a small share of incidents. This suggests that the types of serious crime occurring on transit reflect the broader mix of serious crime in the city as a whole, rather than being dominated by a completely different offense profile. However, the scales of the two charts highlight that only a small fraction of the seven major felonies occur on the MTA. Citywide annual counts for grand larceny and felony assault reach into the tens of thousands, while monthly counts on the MTA are in the tens to low hundreds per offense, even before aggregating to annual totals. In other words, most serious crimes in New York City happen outside the transit system, on streets, in homes, and in businesses, rather than inside subway stations or trains. Per ride, major felonies on the subway are relatively rare compared with the total number of trips, even though they receive outsized attention in the media and public discourse. 
These comparisons position the subway as one crucial, but numerically small, component of the overall serious crime landscape that New Yorkers experience.
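
The scale comparison can be made concrete by dividing annual transit felony counts by the matching citywide counts. The sketch below is a minimal base R illustration, not the calculation behind the figures: the helper `transit_share()` and both demo data frames are hypothetical, and the demo counts are placeholder values rather than numbers from the NYPD or MTA datasets.

```r
# Hypothetical helper: percentage of citywide felonies that occur on transit.
# Both inputs are data frames with columns offense, year, total.
transit_share <- function(citywide, transit) {
  merged <- merge(citywide, transit,
                  by = c("offense", "year"),
                  suffixes = c("_city", "_transit"))
  merged$share_pct <- 100 * merged$total_transit / merged$total_city
  merged[order(merged$offense, merged$year), ]
}

# Placeholder counts for illustration only (NOT real data).
citywide_demo <- data.frame(
  offense = c("GRAND LARCENY", "ROBBERY"),
  year    = c(2023L, 2023L),
  total   = c(50000, 16000)
)
transit_demo <- data.frame(
  offense = c("GRAND LARCENY", "ROBBERY"),
  year    = c(2023L, 2023L),
  total   = c(1000, 400)
)

share_demo <- transit_share(citywide_demo, transit_demo)
print(share_demo)
```

Applying this to the project data would require summing the monthly MTA counts to annual totals first and checking that the offense labels match between the two sources, which is an assumption here.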

Traffic Accidents

Code
ggplot(cleaned_crashes, aes(x = month)) +
    geom_area(aes(y = total_casualties),
              alpha = 0.5, fill = 'red2', color = 'red') +   # semi-transparent casualties area
    geom_line(aes(y = total_crashes)) +
    labs(
        title   = "NYC Car Crashes and Casualties Per Month (post covid)",
        x       = "Time",
        y       = "Count",
        caption = "Line = total crashes; shaded area = total casualties (injured + killed)"
    ) +
    theme_minimal()

To evaluate whether driving is a safer alternative to the subway, this chart shows monthly totals of motor vehicle crashes and total casualties (injuries plus fatalities) in New York City during the post-COVID period. The black line represents the number of crashes per month, while the shaded red area captures the number of people physically harmed in those crashes. Together, these two measures provide a direct view of both the frequency and the severity of driving-related risk over time.

The visualization reveals that thousands of traffic crashes occur every month, with casualty counts consistently in the several-thousand range, even after pandemic-related disruptions subsided. While there is month-to-month variation, the broader pattern shows that driving exposes residents to a persistent and substantial risk of physical injury or death, even during periods when crash totals decline slightly. This level of harm contrasts sharply with subway crime figures, where major felonies occur at much lower absolute levels and remain rare relative to the number of trips taken.

For our broader research question, “Is driving a safer alternative to the subway?”, this chart suggests that driving is far more likely to cause physical harm than riding the subway. While subway safety debates often focus on crime, this comparison shows that transportation risk is not solely criminal but also physical, and that car travel produces far more injuries than subway travel. When viewed alongside earlier charts showing relatively low monthly subway felony counts, this evidence challenges the perception that switching from public transit to driving necessarily improves personal safety.
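
Raw monthly counts for two modes are easier to compare once each is divided by a measure of exposure such as trips taken. The following is a minimal sketch of that normalization, assuming trip counts were available for both modes (the crash dataset described in this report does not include them). The helper `rate_per_million()` and the demo values are hypothetical placeholders, not project results.

```r
# Hypothetical helper: events per one million units of exposure (e.g. trips).
rate_per_million <- function(events, exposure) {
  stopifnot(length(events) == length(exposure), all(exposure > 0))
  events / exposure * 1e6
}

# Placeholder values for illustration only (NOT real data).
demo <- data.frame(
  mode   = c("mode_a", "mode_b"),
  events = c(150, 3000),
  trips  = c(100e6, 200e6)
)
demo$rate <- rate_per_million(demo$events, demo$trips)
print(demo)
```

Dividing by exposure keeps a heavily used mode from looking more dangerous simply because it carries more people.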

Conclusion

Across our analyses, we find that MTA reliability challenges and subway crime risk tell very different stories about safety in New York City. Mechanical incident data show that signal failures remain the most frequent and volatile source of disruptions, while subway car and track problems have increased steadily in recent years, suggesting growing infrastructure strain. However, the correlation between operational incidents and subway crime is weak, indicating that system breakdowns alone do not meaningfully drive serious crime levels. Comparisons between citywide and transit felony trends show that only a small fraction of New York City’s major crimes occur on the subway, even though the offense types on transit mirror the broader city pattern. The station-shootings map further shows that subway shootings are largely concentrated in neighborhoods already experiencing higher street-level violence, reinforcing that subway crime reflects broader community conditions rather than isolated transit-system failures. When safety is evaluated per trip instead of by raw counts, the subway appears even safer. Crime per million riders remains relatively low and stable in the post-COVID period, even as ridership recovers, while traffic crash and casualty data show that driving consistently produces thousands of injuries and deaths each month. Taken together, these results directly challenge the perception that driving is a safer alternative to public transit. While subway reliability remains a legitimate concern tied to aging infrastructure, our findings show that the subway remains one of the safest high-capacity transportation modes in New York City when risk is properly normalized by ridership. 
Future research could be strengthened by incorporating socioeconomic neighborhood data, station-level crowding, police response times, and near-miss traffic exposure metrics to better understand how safety, infrastructure stress, and public perception interact across transportation modes.
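
The weak-correlation finding rests on a Pearson coefficient between monthly incident totals and monthly felony totals. As a sketch of how such a coefficient could be reported together with a confidence interval, the example below uses `cor.test()` from base R. The two series are simulated stand-ins, not project data, and the names `incidents` and `crime` only mirror the column names used in the appendix code.

```r
# Simulated stand-in series (NOT project data); drawn independently,
# so any correlation between them is due to chance.
set.seed(42)
n_months  <- 48
incidents <- rpois(n_months, lambda = 60)
crime     <- rpois(n_months, lambda = 180)

# Pearson correlation with a 95% confidence interval
result <- cor.test(incidents, crime, method = "pearson")
r_val  <- unname(result$estimate)
ci     <- result$conf.int

cat("r =", round(r_val, 2), "\n")
cat("95% CI:", round(ci[1], 2), "to", round(ci[2], 2), "\n")
```

An interval that spans zero would indicate that the data cannot distinguish a weak relationship from no relationship at all.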

Appendix

MTA Subway Major Incidents

Variable Name | Description
--- | ---
month | Date of incident.
Division | Operational division of the subway system where the incident occurred.
Line | Subway line on which the incident occurred.
Day_type | Indicates type of day: 1 = weekday or workday; 2 = holiday or weekend.
Category | Type of incident (e.g., signals, subway car, track).
Count | Number of incidents reported for that combination of month, division, line, day_type, and category.
Cat_group | Indicates whether the incident is infrastructure-related or not.

MTA Major Felonies

Variable Name | Data Type | Description
--- | --- | ---
Month | Date | The month in which the felony metrics are calculated, recorded as the first day of each month (MM-DD-YYYY).
Agency | Text | MTA operating agency (NYCT, SIR, MNR, LIRR).
Police Force | Text | Law enforcement agency responsible for reporting (MTAPD or NYPD).
Felony Type | Text | The classification of the major felony offense.
Felony Count | Numeric | The total number of felony arrests recorded for that category during the month.
Crimes per Million Riders | Numeric | Monthly felony rate standardized by ridership volume.

Traffic Accidents

Variable Name | Data Type | Description
--- | --- | ---
CRASH DATE | Date | The calendar date on which the motor vehicle collision occurred.
NUMBER OF PERSONS INJURED | Numeric (Integer) | The total number of individuals who sustained non-fatal injuries as a result of the collision.
NUMBER OF PERSONS KILLED | Numeric (Integer) | The total number of individuals who died as a result of the collision.

Citywide Crime

Variable Name | Data Type | Description
--- | --- | ---
seven_major_felony_offenses | Text | Offense label (e.g., “MURDER & NON-NEGL. MANSLAUGHTER”, “RAPE”, “ROBBERY”, “FELONY ASSAULT”, “BURGLARY”, “GRAND LARCENY”, “GRAND LARCENY OF MOTOR VEHICLE”, “TOTAL SEVEN MAJOR FELONY OFFENSES”).
2000, 2001, …, 2024 (in Excel: columns like x2, x3, etc.) | Numeric | Annual citywide count of reported offenses in that category and year. These are derived from NYPD complaint and homicide databases as described in the PDF statistical notes.
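
Because the spreadsheet stores one column per year, the counts have to be reshaped into one row per offense and year before they can be plotted as lines. The sketch below shows that reshaping in base R. The helper `wide_to_long()` is hypothetical (the project code uses `tidyr::pivot_longer()` instead), and the two-row demo table holds placeholder values, not NYPD figures.

```r
# Hypothetical helper: turn one-column-per-year data into long format.
# Assumes every column other than `id_col` is named after a year.
wide_to_long <- function(wide, id_col) {
  year_cols <- setdiff(names(wide), id_col)
  long <- data.frame(
    offense = rep(wide[[id_col]], times = length(year_cols)),
    year    = rep(as.integer(year_cols), each = nrow(wide)),
    total   = unlist(wide[year_cols], use.names = FALSE)
  )
  long[order(long$offense, long$year), ]
}

# Placeholder table for illustration only (NOT real data).
wide_demo <- data.frame(
  offense = c("A", "B"),
  `2023`  = c(10, 20),
  `2024`  = c(11, 21),
  check.names = FALSE
)

long_demo <- wide_to_long(wide_demo, "offense")
print(long_demo)
```

The long layout gives one observation per row, which is the shape `ggplot()` expects when `year` is mapped to the x-axis and `offense` to color.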

MTA Subway Station

Variable Name | Description
--- | ---
Station ID | Unique station identifier
Complex ID | Identifier grouping station complexes
GTFS Stop ID | Stop ID used in MTA GTFS feeds
Stop Name | Station name as displayed to riders
Borough | Borough where the station is located
GTFS Latitude | Station latitude
GTFS Longitude | Station longitude
Daytime Routes | Train routes stopping at this station during daytime service
Structure | Elevated, surface, subway, etc.
Line | Line or grouping the station belongs to

NYPD Shooting Data

Variable Name | Description
--- | ---
INCIDENT_KEY | Unique NYPD shooting identifier
BORO | Borough where incident occurred
PRECINCT | Precinct reporting the incident
OCCUR_DATE | Date of shooting
OCCUR_TIME | Time of shooting
STATISTICAL_MURDER_FLAG | Whether the incident involved a homicide
PERP_AGE_GROUP | Age group of perpetrator
PERP_SEX | Gender of perpetrator
PERP_RACE | Race of perpetrator
VIC_AGE_GROUP | Age group of victim
VIC_SEX | Gender of victim
VIC_RACE | Race of victim
Latitude | Geocoded latitude
Longitude | Geocoded longitude
LOCATION_DESC | Location category (e.g., outside, inside)
X_COORD_CD, Y_COORD_CD | NYC State Plane coordinates
JURISDICTION_CODE | Jurisdiction handling the case
Code
# Load libraries and settings here
library(tidyverse)
library(here)
library(readr)
library(fs)
library(janitor)
library(stringr)
library(lubridate)
library(dplyr)
library(ggplot2)
library(ggrepel)
library(leaflet)
library(tigris)
library(htmlwidgets)
library(sf)
library(readxl)


knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  comment = "#>",
  fig.path = "figs/", # Folder where rendered plots are saved
  fig.width = 7.252, # Default plot width
  fig.height = 4, # Default plot height
  fig.retina = 3 # For better plot resolution
)

# Put any other "global" settings here, e.g. a ggplot theme:
theme_set(theme_bw(base_size = 20))

# Write code below here to load any data used in project


cand <- fs::dir_ls(here("data_raw"), regexp = "MTA_Subway_Major_Incidents.*\\.csv$")
cand
mta_subway_incidents <- read_csv(cand[1], show_col_types = FALSE)

data_path_felonies <- here::here('data_raw','MTA_Major_Felonies.csv')
felonies_df <- readr::read_csv(data_path_felonies)

options(tigris_use_cache = TRUE)
options(repos = c(CRAN = "https://cloud.r-project.org"))

# ----------------------------------------------------------
# READ DATA
# ----------------------------------------------------------
mta_locations<- read_csv(here('data_raw',"MTA_Subway_Stations.csv"))
shoot <- read_csv(here('data_raw',"NYPD_Shooting_Incident_Data__Historic_.csv"))

url <- "https://www.nyc.gov/assets/nypd/downloads/excel/analysis_and_planning/historical-crime-data/seven-major-felony-offenses-2000-2024.xls"

tmp <- tempfile(fileext = ".xls")
download.file(url, destfile = tmp, mode = "wb")

crime_raw <- read_xls(tmp)

data_path_crashes <- here::here('data_raw','Motor_Vehicle_collisions_-_Crashes_20251206.csv')
crashes_df <- readr::read_csv(data_path_crashes)





#cleaning work

mta_clean_incidents <- mta_subway_incidents %>%
    mutate(cat_group = dplyr::case_when(
        category %in% "Track"      ~ "Infrastructure",
        category %in% "Signals"    ~ "Infrastructure",
        category %in% "Subway Car" ~ "Infrastructure",
        TRUE                       ~ "Other"
    )) %>% 
    filter(cat_group == 'Infrastructure')


# Aggregate counts by month and category
df_summary <- mta_clean_incidents %>%
    group_by(month, category) %>%
    summarise(total_incidents = sum(count), .groups = "drop")



#MTA_felonies

cleaned_felonies_df <- felonies_df %>% 
    janitor::clean_names() %>% 
    mutate(month = mdy(month)) %>% 
    group_by(month) %>% 
    mutate(total_crimes_per_million = sum(crimes_per_million_riders, na.rm = TRUE)) %>% 
    ungroup() %>% 
    rename(date = month)

mta_final_crime <- felonies_df %>% clean_names()
top_offenses <- mta_final_crime %>%
  mutate(date = mdy(month)) %>%
  filter(year(date) <= 2024) %>%                # stop at 2024
  count(felony_type, wt = felony_count, name = "total") %>%
  slice_max(total, n = 5) %>%                   # adjust n for more/fewer lines
  pull(felony_type)

mta_by_offense_month <- mta_final_crime %>%
  mutate(
    date       = mdy(month),
    year_month = floor_date(date, unit = "month"),
    offense    = if_else(felony_type %in% top_offenses,
                         felony_type,
                         "Other")
  ) %>%
  filter(
    year(date) <= 2024,                         # stop at 2024
    offense != "Other"                          # drop the "Other" line entirely
  ) %>%
  group_by(year_month, offense) %>%
  summarise(
    total = sum(felony_count, na.rm = TRUE),
    .groups = "drop"
  )

# 2. Choose a point on the x-axis where gaps between lines are bigger
label_date <- as.Date("2023-06-01")

label_points <- mta_by_offense_month %>%
  group_by(offense) %>%
  mutate(dist = abs(as.numeric(year_month - label_date))) %>%
  filter(dist == min(dist)) %>%
  ungroup()




# ============================
# 1. Monthly Crime Totals (NYCT)
# ============================

crime_monthly <- felonies_df %>%
    filter(Agency == "NYCT") %>%
    mutate(month = mdy(Month)) %>%
    group_by(month) %>%
    summarise(crime = sum(`Felony Count`, na.rm = TRUE), .groups = "drop")

# ============================
# 2. Monthly Incident Totals
# ============================

incidents_monthly <- mta_clean_incidents %>%
    group_by(month) %>%
    summarise(incidents = sum(count, na.rm = TRUE), .groups = "drop")

# ============================
# 3. Merge Clean Data
# ============================

ci_scatter <- full_join(crime_monthly, incidents_monthly, by = "month") %>%
    filter(!is.na(crime), !is.na(incidents))

# ============================
# 4. Correlation Value
# ============================

r_val <- round(cor(ci_scatter$incidents, ci_scatter$crime, use = "complete.obs"), 2)

crime_clean <- crime_raw %>%
  clean_names() %>%   # optional but usually helpful
  drop_na()           # removes any row that has at least one NA
year_lookup <- tibble(
  col  = names(crime_clean)[-1],               # x2, x3, ...
  year = as.integer(as.vector(crime_clean[1, -1]))
)

# --- 3. Drop the first row (the "OFFENSE / years" header row) ---
#     and make all x* columns the same type (character) to avoid pivot_longer error
crime_no_headerrow <- crime_clean[-1, ] %>%
  mutate(across(-seven_major_felony_offenses, as.character))

# --- 4. Tidy: one row per (offense, year) ---
crime_long <- crime_no_headerrow %>%
  pivot_longer(
    cols = -seven_major_felony_offenses,
    names_to  = "col",
    values_to = "total"
  ) %>%
  left_join(year_lookup, by = "col") %>%
  rename(offense = seven_major_felony_offenses) %>%
  mutate(
    total = parse_number(total),   # turn "32562" / "32,562" etc. into numeric
    year  = as.integer(year)
  ) %>%
  # keep 2019–2024, drop TOTAL row and any missing totals
  filter(
    year >= 2019, year <= 2024,
    offense != "TOTAL SEVEN MAJOR FELONY OFFENSES",
    !is.na(total)
  )

# --- 5. Choose a label year where there’s more separation between lines ---
label_year <- 2022

label_points <- crime_long %>%
  group_by(offense) %>%
  mutate(dist = abs(year - label_year)) %>%
  filter(dist == min(dist)) %>%
  ungroup()

cleaned_crashes <- crashes_df %>% 
    janitor::clean_names() %>% 
    mutate(
        crash_date = mdy(crash_date),
        month = floor_date(crash_date, "month"),
        casualties = number_of_persons_injured + number_of_persons_killed
    ) %>% 
    group_by(month) %>%
    summarise(
        total_crashes = n(),
        total_casualties = sum(casualties, na.rm = TRUE),
        .groups = "drop"
    ) %>% 
    filter(month < max(month)) %>% 
    filter(month >= as.Date("2020-04-01"))


library(tidyverse)
library(lubridate)
library(ggrepel)

mta_final_crime <- felonies_df %>% clean_names()

top_offenses <- mta_final_crime %>%
  mutate(date = mdy(month)) %>%
  filter(year(date) <= 2024) %>%
  count(felony_type, wt = felony_count, name = "total") %>%
  slice_max(total, n = 5) %>%
  pull(felony_type)

mta_by_offense_month <- mta_final_crime %>%
  mutate(
    date       = mdy(month),
    year_month = floor_date(date, unit = "month"),
    offense    = if_else(felony_type %in% top_offenses,
                         felony_type,
                         "Other")
  ) %>%
  filter(
    year(date) <= 2024,
    offense != "Other"
  ) %>%
  group_by(year_month, offense) %>%
  summarise(
    total = sum(felony_count, na.rm = TRUE),
    .groups = "drop"
  )

# pick the x-position for labels
label_date <- as.Date("2023-06-01")

# IMPORTANT: rebuild label_points from mta_by_offense_month
label_points <- mta_by_offense_month %>%
  group_by(offense) %>%
  mutate(dist = abs(as.numeric(year_month - label_date))) %>%
  slice_min(dist, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(year_month, total, offense, dist)




ggplot(mta_by_offense_month, aes(x = year_month, y = total, color = offense)) +
  geom_line() +
  geom_text_repel(
    data = label_points,
    aes(x = year_month, y = total, label = offense),
    nudge_x = 90,
    direction = "y",
    hjust = 0,
    segment.color = NA,
    show.legend = FALSE
  ) +
  scale_x_date(
    date_labels = "%Y",
    date_breaks = "1 year",
    expand = c(0, 0)
  ) +
  coord_cartesian(
    xlim = c(as.Date("2019-01-01"), as.Date("2024-12-31")),
    clip = "off"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.margin = margin(5.5, 80, 5.5, 5.5)
  ) +
  labs(
    title = "Monthly Major Felonies on the MTA by Offense Type",
    x = "Year",
    y = "Number of Incidents"
  )

chart2 <- ggplot(cleaned_felonies_df, aes(x = date, y = total_crimes_per_million)) +
    geom_line(color = "steelblue", linewidth = 1) +   # `size` is deprecated for lines in ggplot2 >= 3.4
    #geom_point(color = "darkred", size = 2) +
    labs(
        title = "Total Crime per Million Riders Over Time",
        x = "Time",
        y = "Total Crime per Million Riders"
    ) +
    theme_minimal(base_size = 14)
chart2
chart3 <- ggplot(df_summary, aes(x = as.Date(month), y = total_incidents)) +
    geom_line(color = "steelblue", linewidth = 1) +   # `size` is deprecated for lines in ggplot2 >= 3.4
    facet_wrap(~ category, scales = "free_y") +
    labs(
        title = "Incident Trends by Category",
        x = "Time",
        y = "Incidents per Month"
    ) +
    theme_light()
chart3
# ============================
# 5. Final Scatterplot (NO COVID) + Save
# ============================

# ---- Correlation value ----
r_val <- round(cor(ci_scatter$incidents, ci_scatter$crime, use = "complete.obs"), 2)

# ---- Fit regression for precise right-edge annotation ----
fit <- lm(crime ~ incidents, data = ci_scatter)
x_r  <- max(ci_scatter$incidents, na.rm = TRUE)
y_r  <- predict(fit, newdata = data.frame(incidents = x_r))

# ---- Build plot object ----
p_scatter_polished <- ggplot(ci_scatter, aes(x = incidents, y = crime)) +
    geom_point(size = 3, alpha = 0.85) +
    geom_smooth(method = "lm", se = FALSE, linewidth = 1, color = "black") +
   
    # ---- r-value at far right of regression line ----
    annotate(
        "text",
        x = x_r,
        y = y_r,
        label = paste0("r = ", r_val),
        hjust = 1.1,    # pull slightly left from the edge
        vjust = -0.6,   # lift slightly above the line
        size = 5,
        fontface = "bold",
        color = "black"
    ) +
   
    labs(
        title = "Relationship Between Operational Incidents and Felony Crime (NYCT)",
        subtitle = "Each point represents one month; black line shows linear trend",
        x = "Operational Incidents (Monthly Total)",
        y = "Felony Crime (Monthly Total)"
    ) +
   
    theme_minimal(base_size = 14) +
    theme(
        plot.title = element_text(face = "bold", size = 18),
        plot.subtitle = element_text(size = 12),
        legend.position = "none",
        panel.grid.minor = element_blank()
    )
p_scatter_polished
# ----------------------------------------------------------
# SHOOTINGS & STATIONS AS sf
# ----------------------------------------------------------
shoot_sf <- shoot %>%
  filter(!is.na(Latitude), !is.na(Longitude)) %>%
  st_as_sf(coords = c("Longitude", "Latitude"),
           crs = 4326, remove = FALSE)

stations_sf <- mta_locations %>%
  filter(!is.na(`GTFS Latitude`), !is.na(`GTFS Longitude`)) %>%
  st_as_sf(coords = c("GTFS Longitude", "GTFS Latitude"),
           crs = 4326, remove = FALSE)

# ----------------------------------------------------------
# STATION BUFFERS & SUBWAY SHOOTINGS (within 40m)
# ----------------------------------------------------------
# Project to UTM zone 18N (EPSG:32618), whose units are metres with little
# distortion over NYC. In Web Mercator (EPSG:3857) a 40-unit buffer covers
# only about 30 m on the ground at this latitude.
shoot_utm    <- st_transform(shoot_sf, 32618)
stations_utm <- st_transform(stations_sf, 32618)

stations_buf <- st_buffer(stations_utm, dist = 40)

shootings_subway_sf <- st_join(
  shoot_utm,
  stations_buf,
  join = st_within,
  left  = FALSE
)

# counts per station for subway shootings
shoot_counts <- shootings_subway_sf %>%
  st_drop_geometry() %>%
  group_by(`Stop Name`, Line, `Daytime Routes`, Borough) %>%
  summarise(shooting_count = n(), .groups = "drop")

stations_with_counts <- stations_sf %>%
  left_join(
    shoot_counts,
    by = c("Stop Name", "Line", "Daytime Routes", "Borough")
  ) %>%
  mutate(
    shooting_count = ifelse(is.na(shooting_count), 0L, shooting_count)
  )

# ----------------------------------------------------------
# STREET SHOOTINGS (NOT NEAR SUBWAYS)
# ----------------------------------------------------------
street_shoot_sf <- shoot_sf %>%
  filter(!(INCIDENT_KEY %in% shootings_subway_sf$INCIDENT_KEY))

boro_stats <- street_shoot_sf %>%
  st_drop_geometry() %>%
  count(BORO, name = "street_shootings")

# map BORO names to county names
boro_map <- c(
  "BRONX"         = "Bronx",
  "BROOKLYN"      = "Kings",
  "MANHATTAN"     = "New York",
  "QUEENS"        = "Queens",
  "STATEN ISLAND" = "Richmond"
)

boro_stats <- boro_stats %>%
  mutate(COUNTY_NAME = recode(BORO, !!!boro_map))

# ----------------------------------------------------------
# NYC BOROUGHS + JOIN STREET SHOOTING COUNTS
# ----------------------------------------------------------
ny_boroughs <- counties(state = "NY", cb = TRUE, class = "sf") %>%
  filter(NAME %in% c("New York", "Kings", "Bronx", "Queens", "Richmond"))

ny_boroughs_shoot <- ny_boroughs %>%
  left_join(boro_stats, by = c("NAME" = "COUNTY_NAME")) %>%
  # NEW: if a borough has no street shootings, set to 0 (e.g. Staten Island)
  mutate(street_shootings = ifelse(is.na(street_shootings), 0L, street_shootings))

# we'll use this name in leaflet instead of boroughs_geo
boroughs_geo <- ny_boroughs_shoot

# ----------------------------------------------------------
# COLOR PALETTES
# ----------------------------------------------------------
# Stations: gray for 0, strong reds for 1+
nonzero_counts <- stations_with_counts$shooting_count[
  stations_with_counts$shooting_count > 0
]

pal_station <- colorNumeric(
  palette = colorRampPalette(c(
    "#FF7B7B",  # light red
    "#FF3B3B",  # solid red
    "#D71414",  # strong red
    "#8B0000"   # deep dark red
  ))(200),
  domain = nonzero_counts
)

zero_color <- "#A6AEC0"   # more visible bluish-gray for 0-shooting stations

# Boroughs: blue gradient for street shootings
pal_boro <- colorNumeric(
  palette = "Blues",
  domain  = boroughs_geo$street_shootings
)

# ----------------------------------------------------------
# LEAFLET MAP (WITH GROUP-CONTROLLED LEGENDS)
# ----------------------------------------------------------
map <- leaflet() %>%
  # addTiles() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%

  # Blue borough borders + blue fill by street shootings
  addPolygons(
    data = boroughs_geo,
    fillColor = ~pal_boro(street_shootings),
    fillOpacity = 0.45,
    weight = 3,
    color = "#0A1D4D",   # dark navy blue border
    group = "Boroughs",
    label = ~paste0(NAME, ": ", street_shootings, " street shootings"),
    highlightOptions = highlightOptions(
      weight = 4,
      color = "#000000",
      fillOpacity = 0.6,
      bringToFront = TRUE
    )
  ) %>%

  # Station dots colored by subway shootings
  addCircleMarkers(
    data = stations_with_counts,
    radius = 6,
    fillColor = ~ifelse(
      shooting_count == 0,
      zero_color,
      pal_station(shooting_count)
    ),
    color = NA,
    stroke = FALSE,
    fillOpacity = 0.9,
    popup = ~paste0(
      "<b>", `Stop Name`, "</b><br>",
      "Lines: ", `Daytime Routes`, "<br>",
      "Borough: ", Borough, "<br>",
      "Subway shootings within 40m: ", shooting_count
    ),
    group = "Stations"
  ) %>%

  # Legend for borough street shootings (blue)
  addLegend(
    position  = "bottomright",
    pal       = pal_boro,
    values    = boroughs_geo$street_shootings,
    title     = "Street shootings per borough",
    opacity   = 0.8,
    className = "legend-boroughs"
  ) %>%

  # Legend for station subway shootings (red)
  addLegend(
    position  = "bottomleft",
    pal       = pal_station,
    values    = nonzero_counts,
    title     = "Subway shootings (within 40m)",
    opacity   = 0.8,
    className = "legend-stations"
  ) %>%

  addLayersControl(
    overlayGroups = c("Boroughs", "Stations"),
    options = layersControlOptions(collapsed = FALSE)
  )

# ----------------------------------------------------------
# JS: HIDE/SHOW LEGENDS WITH THEIR GROUPS
# ----------------------------------------------------------
htmlwidgets::onRender(
  map,
  "
  function(el, x) {
    var map = this;
    var boroughLegend = document.querySelector('.legend-boroughs');
    var stationLegend = document.querySelector('.legend-stations');

    // start visible (since both groups are checked by default)
    if (boroughLegend) boroughLegend.style.display = 'block';
    if (stationLegend) stationLegend.style.display = 'block';

    map.on('overlayadd', function(e) {
      if (e.name === 'Boroughs' && boroughLegend) {
        boroughLegend.style.display = 'block';
      }
      if (e.name === 'Stations' && stationLegend) {
        stationLegend.style.display = 'block';
      }
    });

    map.on('overlayremove', function(e) {
      if (e.name === 'Boroughs' && boroughLegend) {
        boroughLegend.style.display = 'none';
      }
      if (e.name === 'Stations' && stationLegend) {
        stationLegend.style.display = 'none';
      }
    });
  }
  "
)

label_year <- 2022

label_points_year <- crime_long %>%
  group_by(offense) %>%
  mutate(dist = abs(year - label_year)) %>%
  slice_min(dist, n = 1, with_ties = FALSE) %>%
  ungroup()

ggplot(crime_long, aes(x = year, y = total, color = offense)) +
  geom_line() +
  geom_point() +
  geom_text_repel(
    data = label_points_year,
    aes(x = year, y = total, label = offense),
    nudge_x = 0.3,
    direction = "y",
    hjust = 0,
    segment.color = NA,
    show.legend = FALSE
  ) +
  scale_x_continuous(
    breaks = 2019:2024,
    limits = c(2019, 2024),
    expand = c(0, 0)
  ) +
  coord_cartesian(clip = "off") +
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.margin = margin(5.5, 80, 5.5, 5.5)
  ) +
  labs(
    title = "Seven Major Felony Offenses in NYC, 2019–2024",
    x = "Year",
    y = "Number of Incidents"
  )
ggplot(cleaned_crashes, aes(x = month)) +
    geom_area(aes(y = total_casualties),
              alpha = 0.5, fill = 'red2', color = 'red') +   # semi-transparent casualties area
    geom_line(aes(y = total_crashes)) +
    labs(
        title   = "NYC Car Crashes and Casualties Per Month (post covid)",
        x       = "Time",
        y       = "Count",
        caption = "Line = total crashes; shaded area = total casualties (injured + killed)"
    ) +
    theme_minimal()

Attribution

All members contributed equally.