Parking Violations in Washington DC

EMSE 4572

Author

Arsema Demeke, Kaitlyn Frost, Amna Maqsood, and Lola Nurullaeva

Published

December 10, 2023

Introduction

Our research analyzes parking violations across the wards of Washington, DC, examining how their frequencies, fees, and notable trends changed between 2018 and 2022. Through this research, we aim to shed light on potential disparities in violation patterns. With a focus on demographics, our aim as engineers is to provide insights that guide practical approaches to establishing fairness and efficiency in the evolving urban environment.

Additionally, we wanted to consider outside influences on our data. Our study sought to understand how parking behaviors and fees differ across wards, and what that implies for different socioeconomic groups. We found that parking violations are considerably more frequent in Wards 2 and 6, likely due to their status as major tourist hubs with attractions like downtown DC, the White House, and Capitol Hill. The study emphasized the need for nuanced solutions and left us with further questions about demographic disparities and the impact of gentrification on parking behaviors in urban environments.

Research Question

Research Question:

“How do parking violations vary across the wards of Washington DC?”

Sub-questions:

  • How do violation frequencies differ within each ward from 2018 to 2022?
  • What changes occurred in parking violation fees across wards during the specified time?
  • What notable trends or anomalies characterize Washington, DC’s parking violations?

The Data

The Data Sources

We use data compiled from monthly reports (2018-2022) provided by the Metropolitan Police Department and published by Vision Zero. The data is described as “pertain(ing) to parking citation issues by parking enforcement of various DC agencies and federal partners”. The data is organized by time of day, week, and category of violation, along with other information.

In addition to these data sets, we used a second data set to assign each ticket to the ward in which it was issued. The data in this data set was collected from official election reports. We used each ticket's latitude and longitude to add a ward column by matching the coordinates to the ward boundaries in this data set.
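The ward assignment described above is a point-in-polygon join. The following is a minimal sketch of the idea using the sf package, assuming two made-up square "wards" and three made-up ticket locations; the names toy_wards and toy_tickets and the square() helper are hypothetical, and the real boundaries come from the Wards data set.

```r
library(sf)

# Hypothetical helper: a square polygon with its lower-left corner at (xmin, ymin)
square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Two made-up square "wards" (not real DC boundaries)
toy_wards <- st_sf(
  WARD = c(1L, 2L),
  geometry = st_sfc(square(-77.10, 38.90, 0.05),
                    square(-77.05, 38.90, 0.05), crs = 4326)
)

# Three made-up tickets; the third lies outside both squares
toy_tickets <- data.frame(
  ticket = c("a", "b", "c"),
  longitude = c(-77.08, -77.02, -76.50),
  latitude = c(38.92, 38.93, 38.92)
)

# Convert to point geometries, then attach the ward each point falls within
toy_sf <- st_as_sf(toy_tickets, coords = c("longitude", "latitude"), crs = 4326)
joined <- st_join(toy_sf, toy_wards, join = st_within)
ward_by_ticket <- st_drop_geometry(joined)
ward_by_ticket
```

Because st_join() performs a left join by default, a ticket that falls outside every polygon is kept with a missing WARD, which is why later steps filter out rows where the ward is NA.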

Parking Violation Data Set: https://opendata.dc.gov/datasets/f57be968e9184b7fa665b61f40e6bbd8_11/explore

Wards Data Set: https://opendata.dc.gov/datasets/DCGIS::wards-from-2022/about

Data Validity

The parking violation data has been pre-processed by Vision Zero, a government organization focused on creating safer DC streets. The data was collected from the MPD eTIMS meter work management system, exported to DDOT (the District Department of Transportation), and processed into coordinates by the Office of the Chief Technology Officer. We believe this is still a valid source of data, given that all of the pre-processing was done by reputable government organizations. The data might be biased because those issuing the tickets may have internal biases. The ward data was not pre-processed. It was collected from the 2020 US Census and official election data from 2022. The data is complete, but may be biased due to self-reporting in the census.

Figures and Analysis

Figure 1

To understand how parking violations vary across DC, we first explored the number of violations per ward. This chart shows the number of violations, in thousands, in each DC ward during 2018-2022.

The chart above reveals a clear outlier: Ward 2 has the highest number of violations. The non-linear (square-root) color scale of the chart makes differences among the remaining wards easier to recognize. Knowing the general spread of violations across DC wards gives us a baseline for the frequency of violations per ward as we continue our exploration. To broaden our understanding, we next explore the data by year.
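The effect of the non-linear scale can be illustrated with a small sketch. The Figure 1 graph code in the appendix passes trans = "sqrt" to the fill scale; the counts below are hypothetical round numbers chosen for easy arithmetic, not values from the data set.

```r
# Hypothetical ward counts (in thousands); not taken from the data
counts <- c(high = 1600, mid = 400, low = 100)

# Position on a linear scale, as a fraction of the largest value
linear_pos <- counts / max(counts)

# Position on a square-root scale, as a fraction of the largest value
sqrt_pos <- sqrt(counts) / sqrt(max(counts))

round(rbind(linear_pos, sqrt_pos), 3)
```

On the linear scale the two smaller values sit at 25% and about 6% of the range, while on the square-root scale they sit at 50% and 25%, so wards with fewer violations remain distinguishable next to a large outlier.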

Figure 2

To better understand parking violations in each DC ward, it is important to see how they vary by season.

This faceted bar chart shows each ward’s parking violation count by season, with one chart per year from 2018 to 2022. A common trend can be observed: Wards 2 and 6 accumulate the largest number of parking violations, and these violations occur most frequently in the Spring and Summer.
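The season assignment behind this chart can be sketched in base R. The month grouping below mirrors the Figure 2 code in the appendix (December through February as Winter, and so on); season_of() is a hypothetical helper and the dates are made up for illustration.

```r
# Season label for each month number 1..12 (same grouping as the Figure 2 code)
season_by_month <- c("Winter", "Winter", "Spring", "Spring", "Spring",
                     "Summer", "Summer", "Summer", "Fall", "Fall", "Fall",
                     "Winter")

# season_of() is a hypothetical helper, not part of the original analysis
season_of <- function(issue_date) {
  parsed <- as.POSIXct(issue_date, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
  season_by_month[as.integer(format(parsed, "%m"))]
}

# Made-up issue dates in the format the appendix code parses
dates <- c("2021/01/15 08:30:00", "2021/04/01 12:00:00", "2021/07/04 17:45:00",
           "2021/10/31 09:00:00", "2021/12/25 10:15:00")
data.frame(issue_date = dates, season = season_of(dates))
```

One consequence of pairing this grouping with the calendar year of the issue date is that each year's Winter bar combines January, February, and December of that same year rather than one continuous winter.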

Figure 3

Before visualizing the types of violations accumulated in DC over the past 5 years, we simplified the large number of violation types in the data by grouping similar types together. We then analyzed the relationship between these grouped violation types and how frequently they occurred between 2018 and 2022.

Through this visualization, we can observe that Parking/Stopping violations occur most frequently, at much higher counts than the other violation types. The visualization also allows us to examine this trend over time. One thing to note is that the second most frequent violation type is categorized as ‘Other’. This indicates that many violation descriptions were not captured by our groupings, so a large number of violations are not assigned a specific type. It may also reflect missing (N/A) descriptions, and it means the information displayed here should be read as a general overview.
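The grouping step and the size of the ‘Other’ category can be checked with a short sketch. The patterns below are a subset of those in the Figure 3 code in the appendix; the classify() helper and the example descriptions are hypothetical and only illustrate the first-match-wins behavior.

```r
# A subset of the grouping patterns, in the order they are tested
patterns <- c(
  "Parking/Stopping" = "PARK|STAND|NO STOPPING",
  "Display Issues"   = "FAIL TO DISPLAY|IMPROPER DISPLAY",
  "Loading Zone"     = "COMMERCIAL|LOADING ZONE"
)

# classify() is a hypothetical helper: the first matching pattern wins,
# and anything unmatched stays "Other"
classify <- function(desc) {
  out <- rep("Other", length(desc))
  for (label in names(patterns)) {
    hit <- out == "Other" & grepl(patterns[[label]], desc)
    out[hit] <- label
  }
  out
}

# Made-up violation descriptions for illustration
desc <- c("PARK AT EXPIRED METER", "FAIL TO DISPLAY CURRENT TAGS",
          "NO PARKING LOADING ZONE", "COMMERCIAL VEHICLE IN LOADING ZONE",
          "UNKNOWN CODE 123")
types <- classify(desc)

# Share of descriptions that no pattern captured
other_share <- mean(types == "Other")
table(types)
```

Because the patterns are tested in order, a description containing both "PARK" and "LOADING ZONE" is counted as Parking/Stopping. Computing the share of ‘Other’ in this way is one option for judging how much of the data the groupings leave unexplained.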

Figure 4

Figure 4 examines parking enforcement in DC from two perspectives: the agencies that issue the most fines, and the remaining agencies once the Department of Public Works is excluded.

Together, these visualizations provide an overview of how different agencies contribute to parking fines. Notably, the data highlights the Department of Public Works’ dominance: it issues far more fines than any other agency.

Figure 5

This bar chart shows the variation in outstanding fines across wards, offering insight into the financial side of parking violations. By depicting the disparities in unpaid fines across wards, it helps in understanding and addressing the challenges of managing parking violations.

The chart also raises additional questions about the extent of unpaid fines and the number of individuals who have not fulfilled their financial obligations.
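One way to quantify the unpaid share is to subtract the amount paid from the amount fined within each ward. The sketch below uses made-up numbers with the column names from the data dictionary; it assumes total_paid never exceeds fine_amount and ignores the penalty columns, which may not hold in the real data.

```r
# Made-up tickets for two wards; values are for illustration only
toy <- data.frame(
  NAME = c("Ward 1", "Ward 1", "Ward 2", "Ward 2", "Ward 2"),
  fine_amount = c(50, 30, 100, 25, 25),
  total_paid = c(50, 0, 0, 25, 0)
)

# Total fined and total paid per ward
by_ward <- aggregate(cbind(fine_amount, total_paid) ~ NAME, data = toy, FUN = sum)

# Unpaid amount and unpaid percentage per ward
by_ward$unpaid <- by_ward$fine_amount - by_ward$total_paid
by_ward$pct_unpaid <- 100 * by_ward$unpaid / by_ward$fine_amount
by_ward
```

In this toy example Ward 1 has 30 of 80 dollars unpaid (37.5%) and Ward 2 has 125 of 150 dollars unpaid (about 83%), which shows how a ward can rank highest in both paid and unpaid totals simply because it has more tickets.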

Figure 6

This interactive visualization focuses on the relationship between issuing agencies and the city’s wards. By employing an extended color palette and interactive bubble charts, we aim to show how fines are distributed, spotlighting key agencies and their roles in parking enforcement.

The significance of this visualization lies in its ability to make the disparities between the top and bottom agencies more apparent. It also introduces a lower bound: agency totals below $0.5 million in a ward are omitted, which sets a threshold for comparing enforcement intensities.

Conclusion

To address the research question above, this project examined data on DC parking violations between 2018 and 2022. Through this effort, we discovered many facts and trends, some of them surprising. As expected, the number of parking violations in Wards 2 and 6 far exceeded that of the other wards. Many factors likely contribute to this. For instance, these areas of DC attract the most tourism and visitor traffic, with many popular places located in them: Ward 2 has downtown DC, the White House, governmental buildings, and many famous monuments, while Ward 6 has places like Capitol Hill. This suggests a strong association between the number of parking violations and geographic location, with a higher prevalence of violations near tourist attractions.

Indicative of a seasonal influence on parking violations, the faceted chart (Figure 2) showed that the number of violations increased during the Spring and Summer months. This indicates that parking violations vary seasonally, with peak tourist seasons likely contributing to higher violations near attractions. We therefore expect the association to be more pronounced during peak tourist seasons, when visitor numbers are higher and public transportation usage may fluctuate accordingly.

This project also found a trend in parking violation fees: most wards had a higher percentage of unpaid violations, with Wards 2 and 6 having the highest totals of both paid and unpaid violation fees across all 5 years.

Lastly, the bar charts of the number of fines by agency showed that although the Metropolitan Police Department issued the most violations among the other agencies, it is dwarfed by the Department of Public Works. This emphasizes how unevenly the issuing agencies contribute to the number of parking violations.

To further extend our analysis, we could consider the demographics of each ward or its most prevalent building use, such as work, tourism, shopping, or residence. We could also incorporate data from previous years, including the early 2010s. This would allow a more detailed explanation and could account for the missing data seen in the line plot. Furthermore, we could incorporate larger concepts like gentrification and how it affects parking violations, especially in Ward 2, home to Georgetown, in a city that has been described as one of the fastest-gentrifying in the US.

Attribution

All members contributed equally to the ideas, graphs, and writing.

Appendix

Data Cleaning

Originally, the parking violation data set consisted of csv files of violations for individual years (2018-2022), which were combined into a single data frame. Then the parking violation data and the “Wards_from_2022.shp” file from the Wards data set source were used to add a column indicating the ward in which each parking violation occurred. The cleaned data was saved to the “parking_violations_data.parquet” file in the data_processed folder.

Code
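# Packages used for reading, joining, and saving the data
library(here)
library(tidyverse)  # readr, dplyr
library(sf)         # st_read(), st_as_sf(), st_join()
library(arrow)      # write_parquet()

directory <- here::here('data_raw', 'parking_files')
# full.names = TRUE returns complete paths, so each file can be read directly
files <- list.files(directory, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)

df <- data.frame()

# Read one yearly csv, lower-case its column names, and standardize issue_time
read_and_normalize <- function(file_path) {
  temp_df <- read_csv(file_path) %>%
    rename_all(tolower)
  
  temp_df$issue_time <- format(strptime(temp_df$issue_time, format = "%H:%M:%S"), format = "%H:%M:%S")
  
  return(temp_df)
}

# Read every file and stack the rows into one data frame
for (file in files) {
  temp_df <- read_and_normalize(file)
  df <- bind_rows(df, temp_df)
}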

# Mutating Ward Column using file in wards_data folder ----
path_dc_wards <- here::here('wards_data', 'Wards_from_2022.shp')
dc_wards <- st_read(path_dc_wards)

df_sf <- df %>% 
  filter(!is.na(longitude) & !is.na(latitude)) %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)  # 4326 is the WGS 84 coordinate reference system

df_with_wards <- st_join(df_sf, dc_wards)

df <- df_with_wards %>% select(
objectid, ticket_number, issue_date, issue_time, issuing_agency_code, issuing_agency_name, 
issuing_agency_short, violation_code, violation_proc_desc, 
location, plate_state, vehicle_type, 
multi_owner_number, disposition_code, disposition_type, 
disposition_desc, disposition_date, fine_amount, 
total_paid, penalty_1, penalty_2, penalty_3, penalty_4, penalty_5, xcoord, 
ycoord, mar_id, gis_last_mod_dttm, violation_type_desc, WARD, NAME)

df <- st_drop_geometry(df)  # drop the sf geometry column, leaving a plain data frame

# Saving df as parquet ---
savePath <- here::here('data_processed', 'parking_violations_data.parquet')
write_parquet(df, savePath)

Data Dictionary

Below is a data dictionary specifying the variable names of the resulting cleaned data frame.

variable               class      description
objectid               double     The ID used to identify each ticket in the data set.
ticket_number          double     The number used to create and track the incident report.
issue_date             character  The date the ticket was issued, written in year/month/day format.
issue_time             character  The time of day the ticket was issued, written in military time.
issuing_agency_code    double     The number code used to represent the agency that issued the ticket.
issuing_agency_name    character  The name of the agency that issued the ticket.
issuing_agency_short   character  The number and letter code representing the agency that issued the ticket.
violation_code         character  City-wide code representing types of violations.
violation_proc_desc    character  Description of the parking violation.
location               character  Address of the parked car.
plate_state            logical    The state the vehicle is registered in (not recorded).
vehicle_type           logical    The type of vehicle, shortened.
multi_owner_number     double     The number, if connected to a secondary owner.
disposition_code       double     The code representing the type of disposition, if applicable.
disposition_type       character  Other or dismissed, depending on the outcome of appeal.
disposition_desc       character  A description of the type of disposition.
disposition_date       character  The date of the disposition.
fine_amount            double     The amount the vehicle owner was fined.
total_paid             double     The amount the vehicle owner paid.
penalty_1              double     The amount of the first penalty, if applicable.
penalty_2              double     The amount of the second penalty, if applicable.
penalty_3              double     The amount of the third penalty, if applicable.
penalty_4              double     The amount of the fourth penalty, if applicable.
penalty_5              double     The amount of the fifth penalty, if applicable.
xcoord                 double     The x coordinate of the location where the ticket was given.
ycoord                 double     The y coordinate of the location where the ticket was given.
mar_id                 double     A numerical identifier.
gis_last_mod_dttm      character  The time and date of the last modification of the ticket.
violation_type_desc    character  A code representing the type of violation; P means Parking.
WARD                   integer    The DC ward that the violation occurred in.
NAME                   character  The labeled ward that the violation occurred in (e.g., Ward 8).

Graph Code

Code
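# Packages used by the graph code below
library(tidyverse)   # dplyr, tidyr, ggplot2, stringr, forcats
library(lubridate)   # month(), year()
library(sf)          # st_read(), st_centroid(), st_coordinates()
library(sp)
library(viridis)     # scale_fill_viridis()
library(plotly)      # ggplotly(), layout()
library(PNWColors)   # pnw_palette()
library(ggrepel)     # geom_text_repel()
library(gganimate)   # transition_reveal(), animate(), anim_save()
library(hrbrthemes)  # theme_ipsum()

# Figure 1: map of total violations per ward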
total_violations <- df %>%
    group_by(WARD) %>%   
    summarize(ward = n()) %>% 
    drop_na() %>% 
    arrange(desc(ward))
dc_shp <- st_read(here::here("data_raw", "wards_data", "Wards_from_2022.shp"), quiet = TRUE) 
merged_data <- dc_shp %>%
  left_join(total_violations, by = "WARD") %>%
  mutate(total_violations = ifelse(is.na(ward), 0, ward))
centroids <- st_centroid(dc_shp)
centroid_coords <- st_coordinates(centroids)
shape <- ggplot() +
    geom_sf(data = merged_data, aes(fill = ward/1000, tooltip = NAME)) +
    geom_text(data = as.data.frame(centroid_coords), 
            aes(x = X, y = Y, label = dc_shp$NAME), 
            size = 3, color = "steelblue") +
    scale_fill_viridis(option = "rocket",
                       trans = "sqrt",
                       direction = -1) +
    labs(title = 'Parking Violations by DC Ward',
         fill = 'Parking violations in thousands') +
    theme_void() +
    theme(text = element_text(family = "Verdana"), 
          plot.title = element_text(face = "bold"))

ggplotly(shape) %>%
  layout(
    title = list(
      text = paste0('Parking Violations by DC Ward', '<br>', '<sup>', '(Hover for Breakdown)', '</sup>')))

# Figure 2: violations by ward and season, one faceted chart per year
seasonal_df <- df %>% 
    select("issue_date", "WARD") %>% 
    filter(!is.na(WARD)) %>% 
    mutate(issue_date = as.POSIXct(issue_date, format = "%Y/%m/%d %H:%M:%S")) %>% 
    mutate(season = ifelse(month(issue_date) %in% c(12, 1, 2), "Winter",
                  ifelse(month(issue_date) %in% c(3, 4, 5), "Spring",
                         ifelse(month(issue_date) %in% c(6, 7, 8), "Summer", "Fall")))) %>% 
    mutate(year = year(issue_date)) %>% 
    mutate(WARD = paste("Ward", WARD))
    
starfish_palette <- c(pnw_palette("Starfish"), "#de3163")

seasonal_df_facet_func <- function(input) {
    seasonal_df %>%
    filter(year == input) %>% 
    count(WARD, season) %>%
    mutate(n = n / 1000) %>%
    ggplot() +
    geom_col(aes(x = season, y = n, fill = WARD),
             width = 0.7) +
    facet_wrap(vars(WARD), ncol = 2) +
    coord_flip() +
    scale_y_continuous(
        expand = expansion(mult = c(0, 0.05))) +
    scale_fill_manual(values = starfish_palette) +
    labs(x = "Seasons", 
         y = "Count (in thousands)", 
         title = paste("Number of Parking Violations by Ward for", input)
         ) +
    theme_light() +
    theme(legend.position = "none", 
          text = element_text(family = "Verdana"), 
          plot.title = element_text(face = "bold"))
}
seasonal_df_facet_func(2018)
seasonal_df_facet_func(2019)
seasonal_df_facet_func(2020)
seasonal_df_facet_func(2021)
seasonal_df_facet_func(2022)

# Figure 3: violations by grouped violation type over time
group_viol_df <- mutate(df, violation_type = case_when(
  grepl("PARK|STAND|NO STOPPING", violation_proc_desc) ~ "Parking/Stopping",
  grepl("FAIL TO DISPLAY|IMPROPER DISPLAY", violation_proc_desc) ~ "Display Issues",
  grepl("OBSTRUCT|FAIL TO TURN WHEEL", violation_proc_desc) ~ "Obstruction",
  grepl("UNAUTHORIZED VEHICLE|NO PERMIT|RESIDENTIAL PMT PKG|RPP|EXPIRED|FAIL TO SECURE TAGS", violation_proc_desc) ~ "Tag/Permit Issues",
  grepl("DANGEROUS VEHICLE|DEFECTIVE|MUFFLER|NO FRONT|NO REAR|TINTED WINDOWS", violation_proc_desc) ~ "Vehicle Condition",
  grepl("COMMERCIAL|LOADING ZONE", violation_proc_desc) ~ "Loading Zone",
  TRUE ~ "Other"
))

group_viol_df <- group_viol_df %>% 
    mutate(issue_date = as.POSIXct(issue_date, format = "%Y/%m/%d %H:%M:%S")) %>% 
    mutate(year = year(issue_date)) %>% 
    group_by(year) %>% 
    count(violation_type)

starfish_palette <- c(pnw_palette("Starfish"), "#de3163", "#79cecc", "#902e59", "#95afc2", "#91BBE8")

grp_viol_anim_plot <- group_viol_df %>%
  ggplot(
    aes(x = year, y = n/1000,
        color = violation_type)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  geom_text_repel(
    aes(label = violation_type),
    hjust = 0, nudge_x = 0.2, direction = "y",
    size = 3, segment.color = NA,
    max.overlaps = Inf) + 
  scale_x_continuous(
    breaks = seq(2018, 2022, 1)
    ) +
  scale_color_manual(values = starfish_palette) +
  theme_minimal() +
  labs(x = 'Year',
       y = 'Count (in thousands)',
       title = 'Parking/Stopping Violations Outnumber Other Types') +
    theme(legend.position = "none", 
          text = element_text(family = "Verdana"), 
          plot.title = element_text(face = "bold"))

grp_viol_anim <- grp_viol_anim_plot +
    transition_reveal(year)
# Render the animation
animate(grp_viol_anim,
        fps = 25,
        duration = 15,
        width = 1100, height = 650, res = 150,
        renderer = magick_renderer())
# Save last animation
anim_save(here::here(
  'figs', 'grp_viol_animation.gif'))

# Figure 4: number of fines by issuing agency and ward
dff <- df %>% 
    select(issuing_agency_name, fine_amount, total_paid, NAME)
dff$year <- year(as.Date(df$issue_date, format = "%Y/%m/%d %H:%M:%S"))

dff2 <- dff %>% 
    filter(!is.na(fine_amount)) %>% 
    group_by(NAME,issuing_agency_name) %>% 
    summarize(sum_fines = sum(fine_amount), num_fines = n()) %>% 
    arrange(desc(num_fines))

anemone_palette <- c(pnw_palette("Starfish"), "#de3163")

# Create the common theme
common_theme <- theme_minimal() +
  theme(
    plot.title = element_text(size = 17L, face = "bold"),
    plot.subtitle = element_text(size = 13L)
  )

# Plot 2
plot2 <- dff2 %>%
  filter(num_fines >= 20000L & num_fines <= 2560000L) %>%
  ggplot() +
  aes(x = issuing_agency_name, fill = NAME, weight = num_fines) +
  geom_bar() +
  scale_fill_manual(values = anemone_palette) +
  scale_y_continuous(labels = scales::comma_format(scale = 1e-6)) +  # Adjust y-axis labels
  labs(x = "Agency", y = "Number of Fines (Millions)", title = "Number of Parking Fines in DC Wards 2018 - 2022", 
       subtitle = "Top 3 Issuing Agencies", fill = "Wards") +
  common_theme  # theme_minimal() plus the shared title and subtitle sizes

# Plot 1
filtered_data <- dff2 %>%
   filter(num_fines >= 1000L & num_fines <= 25600L & issuing_agency_name != "DEPARTMENT OF PUBLIC WORKS")

plot1 <- filtered_data %>%
  ggplot() +
  aes(x = issuing_agency_name, fill = NAME, weight = num_fines) +
  geom_bar() +
  scale_fill_manual(values = anemone_palette) +  # Use the Anemone palette
  scale_y_continuous(labels = scales::comma_format(scale = 1e-6)) +  # Adjust y-axis labels
  labs(
    y = "Number of Fines (Millions)",
    title = "Number of Parking Fines in DC Wards 2018 - 2022",
    subtitle = "Top Issuing Agencies",
    fill = "Wards"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 17L, face = "bold"),
    plot.subtitle = element_text(size = 13L),
    axis.text.x = element_text(angle = 45, hjust = 1),
    axis.title.x = element_text(color = 'white'),
    plot.caption = element_text(hjust = 1)  # Adjust hjust for the caption
  )

plot1
plot2

# Figure 5: fined and paid amounts by ward
dff3 <- df %>% 
    filter(!is.na(fine_amount)) %>% 
    filter(!is.na(NAME)) %>% 
    group_by(NAME,issuing_agency_name) %>% 
    summarize(sum_fines = sum(fine_amount), num_fines = n(), total_paid = sum(total_paid)) %>% 
    arrange(desc(num_fines))

ggplot(dff3, aes(x = NAME)) +
  geom_bar(aes(y = sum_fines/1e+6, fill = "Unpaid"), position = position_dodge(width = 0.4), stat = "identity", width = 0.6) +
  geom_bar(aes(y = total_paid/1e+6, fill = "Paid"), position = position_dodge(width = 0.4), stat = "identity", width = 0.6) +
  scale_fill_manual(values = c("Unpaid" = "Grey", "Paid" = "#de3163"), name = "Fine Status") +
  labs(
    x = "Ward",
    y = "Amount (in millions of dollars)",
    title = "Fine Status from 2018-2022",
    subtitle = "In D.C. Wards"
  ) +
  theme_minimal() +
    theme(text = element_text(family = "Verdana"),
          plot.title = element_text(face = "bold"))

# Figure 6: interactive bubble chart of total fines by agency and ward
data <- dff2 %>%
  mutate(sum_fines_millions = round(sum_fines / 1e6, 2)) %>%
  
  # Reorder agencies to have big bubbles on top
  arrange(desc(sum_fines)) %>%
  mutate(issuing_agency_name = str_to_title(fct_inorder(issuing_agency_name))) %>%
    
    # Filter out points with values less than 0.5 million
  filter(sum_fines_millions >= 0.5) %>%
  
  # Prepare text for tooltip
  mutate(text = paste("Agency: ", issuing_agency_name, "\nTotal Fines: $", sum_fines_millions, " million", sep="")) %>%
  
  # Classic ggplot
  ggplot(aes(x = NAME, y = sum_fines_millions, size = sum_fines_millions, color = NAME, text = text)) +
  geom_point(alpha = 0.7) +
  scale_size(range = c(1.4, 19), name = "Total Fines (Millions)") +
  scale_color_manual(values = anemone_palette) +  # Use extended palette
  theme_ipsum() +
  theme(legend.position = "none", 
        axis.title.x = element_text(size = 17),  
        axis.title.y = element_text(size = 17), 
        axis.text.x = element_text(face = "bold",  size = 13),  
    axis.text.y = element_text(face = "bold",size = 13),
    text = element_text(family = "Verdana"),
          plot.title = element_text(face = "bold")) +
      labs(
          x = 'DC Wards', 
          y = 'Parking Fines in Millions', 
    title = "DC Parking Fines 2018 - 2022",
    caption = "Leading Agencies"
  )

# Turn ggplot interactive with plotly
pp <- ggplotly(data, tooltip = "text")
pp %>%
  layout(title = list(text = paste0('DC Parking Fines Analysis (2018 - 2022)',
                                    '<br>',
                                    '<sup>','Leading Fine Issuing Agencies in Each Ward','</sup>'))) %>% 

  layout(margin = list(l = 50, r = 50, b = 100, t = 50),
         annotations = list(x = 1, y = -0.3, 
                            xref='paper', yref='paper', showarrow = F, 
                            xanchor='right', yanchor='auto', xshift=0, yshift=0,
                            font = list(size = 10)), autosize = TRUE)