---
title: "Fakes on the Rise"
subtitle: "Final Report – The Unseen Cost of U.S. Tariffs"
author: "Eunice, Donia, Elena, and Andrea"
date: "December 7, 2025"
format:
  html:
    toc: true
    toc-location: right
    theme: flatly
    self-contained: true
    code-fold: true
    code-summary: "Show code"
    code-tools: true
execute:
  echo: true
  warning: false
  message: false
---
```{r}
#| label: setup
#| include: false
library(tidyverse)
library(here)
knitr::opts_chunk$set(
warning = FALSE,
message = FALSE,
comment = "#>",
fig.path = "figs/",
fig.width = 7.252,
fig.height = 4,
fig.retina = 3
)
theme_set(theme_bw(base_size = 20))
#install.packages(c( "tidyverse","janitor","here","lubridate","forcats", "plotly","gifski","readr","stringr","scales"))
library(janitor)
library(lubridate)
library(forcats)
library(plotly)
library(readr)
library(stringr)
library(scales)
library(tibble)
dir.create("data_raw", showWarnings = FALSE)
dir.create("data_processed", showWarnings = FALSE)
dir.create("figs", showWarnings = FALSE)
dir.create("images", showWarnings = FALSE)
```
## Introduction
Global trade tensions and higher tariffs have changed how goods move and how people shop. When tariffs make imported goods more expensive, some consumers turn to cheaper alternatives—including counterfeit versions of popular products.
Our project looks at whether U.S. tariff protection might unintentionally encourage counterfeit activity. We focus on product categories that have both high tariffs and high counterfeit seizure values. Using two datasets—HTS 2025, which lists tariff rates, and CBP IPR Seizures (2019–2023), which tracks counterfeit goods—we compare which product types face the highest tariffs and which have the greatest counterfeit value.
By studying these patterns, we hope to show how trade policy and illegal markets might be connected. If we find that higher tariff categories often have higher counterfeit values, it could suggest that tariffs create unintended effects for businesses, consumers, and enforcement agencies.
## Research Question
Across all products in the HTS 2025 schedule, which major product categories (chapters) have the highest average tariff rates, and do these categories correspond to higher counterfeit seizure values in the United States?
## Data Sources
**Data Set Context**
In the CBP dataset, a “seizure” refers to an official case where U.S. Customs and Border Protection intercepts and confiscates counterfeit goods at the border. Each seizure record includes details such as the product type, country of origin, and mode of transport. While seizures do not capture every counterfeit item entering the U.S., they provide one of the most reliable indicators of counterfeit activity available. However, seizure counts also reflect changes in enforcement intensity, so they may not perfectly represent actual demand for counterfeit goods.
**[CBP’s IPR Seizures Dataset (Counterfeit Goods Seizures from 2019–2023)](https://www.cbp.gov/sites/default/files/assets/documents/2024-Mar/ipr-seizures-fy19-fy23_0.csv)**
**Title:** FY19 – FY23 IPR Seizures (CSV file available at the link above)
**Description:** Covers FY 2019–FY 2023 and reports the number of seizures by product type, country of origin, mode of conveyance, and other fields. A data dictionary is also available.
**Relevance:** This dataset gives us both time-series and cross-sectional data on counterfeit goods entering the U.S. While it mainly reflects enforcement and supply, it also serves as an imperfect proxy for demand for counterfeit goods. Tracking seizures by product type, country, and year helps us see whether tariffs may be linked to changes in counterfeit activity.
**Validity & Limitations:** Seizure data does not fully capture the size of the counterfeit market, since not all counterfeit goods are detected, and enforcement intensity can vary by year or location. However, it remains one of the most systematic, publicly available indicators of counterfeit inflows over time.
**[HTS Archive (2025 Revision 23 – CSV)](https://www.usitc.gov/sites/default/files/tata/hts/hts_2025_revision_23_csv.csv)**
**Title:** Harmonized Tariff Schedule (HTS) Archive, Revision 23 (CSV available at link)
**Description:** This dataset provides scheduled tariff rates for all U.S. import product lines, categorized at the HTS 8-digit level. It includes classification codes and duty rates across product categories. The schedule is revised periodically; we use 2025 Revision 23.
**Relevance:** This dataset allows us to measure tariff exposure by product category. By linking HTS tariff rates to the CBP seizures data, we can test whether higher-tariff categories are associated with higher counterfeit seizure values.
**Validity & Limitations:** Elena found both datasets on GitHub, and Donia and Andrea cleaned and organized the files for analysis. Cleaning included renaming variables, removing missing values, and matching product categories across the two datasets. Both files had some missing information: a few HTS tariff lines had no duty rate, and some CBP records had missing or zero MSRP values. We removed those rows before analysis. Two sources of bias remain. For the CBP seizures, not all counterfeit goods entering the U.S. are caught, so the data reflect only what was seized, which depends on how much enforcement is happening at a given time or port. For the HTS data, the tariff rates are accurate, but they do not show how much of each product is actually imported or sold, so a high tariff does not always mean high trade or consumer demand. Even with these limits, both datasets are trustworthy and give us a strong base for exploring the relationship between tariffs and counterfeit activity.
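The row-removal step described above can be sketched as follows. This is a minimal illustration on toy rows, not the team's actual cleaning script: the tables `hts_toy` and `ipr_toy` and all of their values are hypothetical placeholders, and only the column names `general_rate_of_duty` and `msrp_value` come from the report. The sketch assumes a 0% duty counts as a valid rate and is kept, while a zero MSRP is treated as missing.

```r
library(dplyr)
library(tibble)

# Toy rows only: placeholder values, not taken from the HTS or CBP files.
hts_toy <- tribble(
  ~hts_number,  ~general_rate_of_duty,
  "0000.00.01", 5,
  "0000.00.02", NA,
  "0000.00.03", 0
)
ipr_toy <- tribble(
  ~seizure_id, ~msrp_value,
  "S1",        1200,
  "S2",        0,
  "S3",        NA
)

# Drop tariff lines with no duty rate (assumption: a 0% duty is valid and kept).
hts_kept <- hts_toy %>%
  filter(!is.na(general_rate_of_duty))

# Drop seizure lines whose MSRP is missing or zero.
ipr_kept <- ipr_toy %>%
  filter(!is.na(msrp_value), msrp_value > 0)

# Count how many rows each filter removed, so the loss can be reported.
rows_dropped <- c(
  hts = nrow(hts_toy) - nrow(hts_kept),
  ipr = nrow(ipr_toy) - nrow(ipr_kept)
)
rows_dropped
```

Reporting `rows_dropped` alongside the cleaned files makes it easier to judge how much the filters could affect category totals.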
## Evaluation of Expectations
In our original proposal, we expected that product categories with higher tariffs would also show more counterfeit activity. We also anticipated that certain consumer-facing, brand-intensive goods—especially watches, handbags, and apparel—would dominate counterfeit seizures because they are popular, high-value, and relatively easy to imitate.
After cleaning and re-structuring the data, our updated analyses mostly support these expectations, while adding some nuance. Using the HTS data, we first confirmed that tariff rates are not evenly distributed across product categories. Average U.S. general duties are low for many goods, but a small group of HTS categories face much higher rates, reflecting stronger trade protection for particular sectors. When we map detailed HTS descriptions into broader consumer product types, we find that categories linked to our counterfeit focus areas (such as apparel, footwear, and certain branded consumer goods) tend to carry higher-than-average tariff burdens.
The CBP IPR data also shows a clear concentration of counterfeit value. Our ranked bar chart of total seized MSRP across 2019–2023 indicates that Watches/Jewelry, Handbags/Wallets, and Wearing Apparel/Accessories account for the largest shares of counterfeit value, followed by Consumer Electronics and a smaller set of other categories. This pattern aligns with our expectation that luxury and logo-heavy products are the most attractive targets for counterfeiters, both because of their high retail prices and strong brand recognition.
The new time-series plot provides a dynamic view that was missing from our progress report. Instead of moving steadily, seizure values rise and fall between 2019 and 2023, with categories such as handbags, apparel, and watches experiencing clear spikes before levelling off or declining. This implies that counterfeit activity adapts to changing market conditions, enforcement priorities, or broader disruptions rather than remaining constant over time.
Taken together, these findings partially support our original hypothesis. **In terms of distribution**, tariff rates remain unequal across categories, and counterfeit activity is concentrated in a small number of high-value, brand-intensive goods. **In terms of relationships**, the categories with the highest counterfeit seizure value are also subject to higher tariff protection, though this relationship varies by year and is not completely linear. While this does not prove a link, the correlation between high tariffs and high counterfeit values gives support to the theory that tariff-driven price increases may create incentives for counterfeit alternatives in the very products that consumers demand the most.
## Visualization & Analysis
**Top Counterfeited Product Categories by Total Seizure Value**
```{r}
ipr_raw_data <- read_csv(here::here("data_raw/ipr-seizures.csv"))
ipr_clean_data <- ipr_raw_data %>%
clean_names() %>%
rename(
fiscal_year = fy,
country = trading_partner,
mode = mode_of_transportation,
center = centers_of_excellence,
product_type = product,
seizure_id = unique_seizure_id,
line_number = line,
msrp_value = msrp
)
ipr_clean_data <- ipr_clean_data %>%
mutate(country = str_trim(str_to_upper(country)))
country_abbs <- tibble::tribble(
~country_abb, ~country_full,
"CA", "Canada",
"CN", "China",
"HK", "Hong Kong",
"SG", "Singapore",
"TR", "Turkey",
"OTHER COUNTRIES", "Other Countries"
)
ipr_clean_data <- ipr_clean_data %>%
left_join(country_abbs, by = c("country" = "country_abb")) %>%
mutate(
country = if_else(!is.na(country_full), country_full, country)
) %>%
select(
fiscal_year,
country,
mode,
center,
product_type,
seizure_id,
line_number,
msrp_value
) %>%
distinct()
# Keep only records inside the FY2019-FY2023 window
ipr_clean_data <- ipr_clean_data %>%
  filter(fiscal_year >= 2019, fiscal_year <= 2023)
write_csv(ipr_clean_data, here::here("ipr-seizures-clean.csv"))
ipr_by_product <- ipr_clean_data %>%
group_by(product_type) %>%
summarise(
total_value_usd = sum(msrp_value, na.rm = TRUE),
.groups = "drop"
)
# with_ties = FALSE returns exactly 8 categories even if totals are tied
top8_products <- ipr_by_product %>%
  slice_max(total_value_usd, n = 8, with_ties = FALSE) %>%
  pull(product_type)
ipr_top8 <- ipr_clean_data %>%
mutate(
product_type = fct_other(
product_type,
keep = top8_products,
other_level = "Other"
)
) %>%
group_by(product_type) %>%
summarise(
total_value_usd = sum(msrp_value, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
product_type = fct_reorder(product_type, total_value_usd)
)
ggplot(ipr_top8, aes(x = total_value_usd, y = product_type)) +
geom_col(
fill = "#DAA520",
color = "black",
linewidth = 0.4
) +
scale_x_continuous(
labels = scales::label_dollar(scale_cut = scales::cut_short_scale())
) +
labs(
title = "Top Counterfeited Product Categories by Total Seizure Value",
subtitle = "Summed MSRP value across all fiscal years in the dataset",
x = "Total MSRP Value (USD)",
y = "Product Category",
caption = "Source: CBP IPR Seizures (cleaned)"
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold"),
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
axis.title = element_text(face = "bold")
)
```
**Interpretation:** This graph shows that counterfeit seizure value is highly concentrated in a small number of product categories. Watches/Jewelry, Handbags/Wallets, and Wearing Apparel/Accessories account for the majority of total seized value, far outpacing all other categories. These are high-value, brand-driven products, which makes counterfeiting them more profitable and makes fakes easier to detect at the border. Even after grouping the remaining items into an "Other" category, the top categories continue to dominate overall counterfeit value. This pattern supports our larger finding that counterfeit markets are not evenly distributed; they are concentrated in luxury and high-demand consumer goods.
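One way to put a number on this concentration is to compute each category's share of total seized value and the combined share of the top three. The sketch below uses a hypothetical table `totals_toy` with made-up category names and values; only the column names mirror `ipr_by_product` from the chunk above, and nothing here is an actual result.

```r
library(dplyr)
library(tibble)

# Placeholder totals: category names and values are illustrative only.
totals_toy <- tribble(
  ~product_type, ~total_value_usd,
  "Category A",  500,
  "Category B",  300,
  "Category C",  100,
  "Category D",  60,
  "Category E",  40
)

# Rank categories by value, then compute individual and running shares.
shares <- totals_toy %>%
  arrange(desc(total_value_usd)) %>%
  mutate(
    share = total_value_usd / sum(total_value_usd),
    cumulative_share = cumsum(share)
  )

# Combined share of the three largest categories.
top3_share <- shares %>%
  slice_head(n = 3) %>%
  pull(share) %>%
  sum()
top3_share
```

Running the same steps on `ipr_by_product` would show whether the three leading categories really exceed half of the total.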
**Which major product categories have the highest average tariff rates?**
```{r}
#| eval: false
#install.packages("gganimate")
library(tidyverse)
library(here)
library(forcats)
library(gganimate)
library(av)
hts <- read_csv(here::here("data_processed", "hts_2025_revision_23_complete_Clean.csv"))
# If duty is in proportions (0–1), convert to percent
scale_needed <- max(hts$general_rate_of_duty, na.rm = TRUE) <= 1
if (scale_needed) {
hts <- hts %>% mutate(general_rate_of_duty = general_rate_of_duty * 100)
}
chapter_labels <- tribble(
~chapter, ~category,
"21","Edible goods",
"98","Temporary imports",
"22","Beverages",
"24","Tobacco and substitutes",
"06","Plants",
"18","Cocoa",
"27","Mineral fuels/ oils",
"04","Dairy products"
)
# Summarize by chapter and keep top 8 by average duty
chapters <- hts %>%
mutate(chapter = str_sub(hts_number, 1, 2)) %>%
filter(!is.na(general_rate_of_duty)) %>%
group_by(chapter) %>%
summarise(
avg_general_duty = mean(general_rate_of_duty, na.rm = TRUE),
n_lines = n(),
.groups = "drop"
) %>%
filter(n_lines >= 10) %>%
slice_max(avg_general_duty, n = 8) %>%
left_join(chapter_labels, by = "chapter") %>%
mutate(
category = coalesce(category, paste0("Chapter ", chapter))
) %>%
arrange(desc(avg_general_duty)) %>%
mutate(
rank = row_number(),
category = fct_reorder(category, avg_general_duty)
)
p_base <- ggplot(chapters, aes(x = avg_general_duty, y = category)) +
geom_segment(aes(x = 0, xend = avg_general_duty, yend = category),
linewidth = 0.7, color = "black") +
geom_point(size = 4, color = "#ff66b2") +
labs(
title = "Which major product categories have the highest average tariff rates?",
x = "Average U.S. General Duty (%)",
y = "Top 8 HTS Categories",
caption = "Source: USITC HTS 2025 Revision 23"
) +
theme_minimal(base_size = 13) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(face = "bold")
)
p_anim <- p_base +
transition_states(rank, transition_length = 0.3, state_length = 0.8) +
shadow_mark(past = TRUE) +
enter_grow() + enter_fade() +
ease_aes("cubic-in-out")
# Save GIF
animate(
p_anim,
nframes = 80,
fps = 20,
width = 900,
height = 650,
renderer = gifski_renderer("hts_lollipop_animation.gif")
)
# Save MP4 with pause at end (optional)
mp4 <- animate(
p_anim,
nframes = 120,
fps = 15,
end_pause = 40,
width = 900,
height = 650,
renderer = av_renderer("hts_lollipop_animation.mp4")
)
```
```{r show_hts_animation}
#| echo: false
#| out.width: "80%"
knitr::include_graphics("hts_lollipop_animation.gif")
```
**Interpretation:** This chart shows the eight HTS chapters with the highest average U.S. general duty rates. Each point represents a chapter, and the length of the line indicates that chapter's average rate. The standout is edible goods (Chapter 21), whose average duty far exceeds every other chapter in the group. The remaining chapters (temporary imports, beverages, tobacco and substitutes, plants, cocoa, mineral fuels, and dairy products) also rank among the highest, but at much lower rates than Chapter 21.
Overall, the plot demonstrates that tariff protection is not evenly distributed across product groups. A few chapters have significantly higher duties, indicating sectors where imports are more heavily restricted or protected. The differences help to explain why certain categories may be more vulnerable to price pressure and counterfeit substitution than others.
**Seizure value over time by product type**
```{r}
#| eval: false
library(tidyverse)
library(gganimate)
library(here)
hts <- read_csv(here::here("data_processed", "hts_2025_revision_23_complete_Clean.csv"))
ipr <- read_csv(here::here("ipr-seizures-clean.csv"))
hts_by_product <- hts %>%
mutate(
description_lower = str_to_lower(description),
product_type = case_when(
str_detect(description_lower, "footwear") ~ "Footwear",
str_detect(description_lower, "handbag|wallet") ~ "Handbags/Wallets",
str_detect(description_lower, "apparel|clothing|garment") ~ "Wearing Apparel/Accessories",
str_detect(description_lower, "watch|jewel") ~ "Watches/Jewelry",
str_detect(description_lower, "pharmaceutical|cosmetic") ~ "Pharmaceuticals/Personal Care",
str_detect(description_lower, "electrical|electronic|computer") ~ "Consumer Electronics",
str_detect(description_lower, "toy|game") ~ "Toys",
str_detect(description_lower, "cigarette|tobacco") ~ "Cigarettes",
TRUE ~ NA_character_
)
) %>%
filter(!is.na(product_type)) %>%
group_by(product_type) %>%
summarize(
mean_tariff = mean(general_rate_of_duty, na.rm = TRUE),
.groups = "drop"
)
ipr_by_product_year <- ipr %>%
group_by(fiscal_year, product_type) %>%
summarize(
total_msrp = sum(msrp_value, na.rm = TRUE),
n_lines = n(),
.groups = "drop"
)
merged_data <- ipr_by_product_year %>%
left_join(hts_by_product, by = "product_type")
plot_time_series_millions <- merged_data %>%
filter(!is.na(mean_tariff)) %>%
ggplot(aes(
x = fiscal_year,
y = total_msrp / 1e6,
group = product_type,
color = product_type
)) +
geom_line(linewidth = 1) +
geom_point(size = 2) +
facet_wrap(~ product_type, scales = "free_y") +
scale_x_continuous(breaks = sort(unique(merged_data$fiscal_year))) +
labs(
x = "Fiscal year",
y = "Total seized MSRP (in millions of USD)",
title = "Seizure value over time by product type"
) +
theme_bw() +
theme(
legend.position = "none",
strip.text = element_text(face = "bold"),
axis.title = element_text(face = "bold")
)
plot_time_series_anim <- plot_time_series_millions +
labs(subtitle = "Year: {round(frame_along)}") +
transition_reveal(along = fiscal_year)
anim <- animate(
plot_time_series_anim,
nframes = 120,
end_pause = 40,
width = 900,
height = 600
)
anim_save("seizure_time_series.gif", anim)
```
```{r show_seizure_animation}
#| echo: false
#| out.width: "80%"
knitr::include_graphics("seizure_time_series.gif")
```
**Interpretation:** This graph shows that the seizure values for major counterfeit product types do not remain constant over time. Most categories reach a clear peak around 2021, followed by either a plateau or a decline. High-value categories such as handbags, apparel, and watches experience the greatest swings, while smaller categories such as toys and cigarettes fluctuate less dramatically. Overall, the trends indicate that counterfeit activity varies in response to changing market conditions and enforcement, rather than following a consistent pattern.
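The peak year for each category can also be read off programmatically rather than by eye. The sketch below assumes a table shaped like `ipr_by_product_year` (one row per fiscal year and product type). The toy rows, category names, and values are hypothetical and do not reproduce the CBP figures.

```r
library(dplyr)
library(tibble)

# Toy yearly totals: placeholder categories and values only.
yearly_toy <- tribble(
  ~fiscal_year, ~product_type, ~total_msrp,
  2019,         "Category A",  10,
  2020,         "Category A",  30,
  2021,         "Category A",  20,
  2019,         "Category B",  5,
  2020,         "Category B",  6,
  2021,         "Category B",  9
)

# For each product type, keep the single year with the largest seized value.
peaks <- yearly_toy %>%
  group_by(product_type) %>%
  slice_max(total_msrp, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(product_type, peak_year = fiscal_year, peak_msrp = total_msrp)
peaks
```

Tabulating `peak_year` across categories would show how many of them actually peak in the same year.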
**Analysis Summary**
The three charts point to a consistent pattern: counterfeit seizure value is concentrated in a small set of high-value, brand-intensive product categories, and that value fluctuates noticeably over time. The tariff chart shows that average duties are very uneven across chapters, although the eight highest-duty chapters are mostly food, beverage, tobacco, and fuel categories rather than branded consumer goods. The seizure value chart shows that Watches/Jewelry, Handbags/Wallets, and Wearing Apparel/Accessories dominate counterfeit seizures. When HTS descriptions are mapped to CBP product types, these seizure-heavy categories tend to carry higher-than-average duties. Finally, the time-series trends show that counterfeit activity fluctuates with broader market conditions while remaining focused on the same high-demand products. Taken together, the charts are consistent with the notion that tariff-sensitive, brand-intensive goods attract the most counterfeit activity, and that this pattern holds even as yearly levels fluctuate.
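The tariff-seizure relationship summarized here could be checked with a rank correlation across product types. The sketch below is one possible approach, not the analysis behind the report: it assumes a table shaped like `merged_data` (yearly seized value plus one `mean_tariff` per product type), and every row in `merged_toy` is a hypothetical placeholder. Spearman's rho is used because, with only a handful of categories and very skewed values, ranks are less sensitive to outliers than a linear correlation.

```r
library(dplyr)
library(tibble)

# Toy data in the shape of merged_data: all values are placeholders.
merged_toy <- tribble(
  ~fiscal_year, ~product_type, ~mean_tariff, ~total_msrp,
  2022,         "Category A",  2,            40,
  2023,         "Category A",  2,            60,
  2022,         "Category B",  5,            150,
  2023,         "Category B",  5,            250,
  2022,         "Category C",  8,            100,
  2023,         "Category C",  8,            200,
  2022,         "Category D",  12,           400,
  2023,         "Category D",  12,           500
)

# Collapse to one row per product type before correlating.
per_product <- merged_toy %>%
  group_by(product_type) %>%
  summarise(
    mean_tariff = first(mean_tariff),
    total_msrp = sum(total_msrp),
    .groups = "drop"
  )

# Spearman rank correlation between average tariff and total seized value.
rho <- cor(per_product$mean_tariff, per_product$total_msrp, method = "spearman")
rho
```

With so few product types, any such coefficient is noisy, and a positive value would describe an association rather than a causal effect of tariffs.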
## Conclusion
Our findings indicate that counterfeit activity in the United States is consistently concentrated in a small number of high-value, brand-driven product categories that face high tariff rates. While the data does not show that tariffs directly cause counterfeit inflows, the correlation between tariff-sensitive goods and high counterfeit values suggests that price pressures may influence consumer demand for non-genuine alternatives.
To improve this work, future analyses could include more years of data to capture longer-term trends, as well as additional sources such as import volumes, price indexes, or port-level enforcement measures. Data from CBP operational reports, e-commerce counterfeit tracking, and international tariff comparisons would all help to clarify how trade policy interacts with illicit markets. More multi-year, multi-source data would provide a better picture of the drivers of counterfeit activity.
## Attributions
Each team member contributed to different parts of the project:
**Elena** initially located the HTS and CBP datasets on GitHub and helped design the early visualizations, including the structure for the lollipop, bar, and time-series plots.
**Donia** and **Andrea** cleaned and organized the raw data files, matched product categories across datasets, and prepared the processed datasets used in the analysis. They later assisted with editing and refining the visualizations.
**Eunice** updated and expanded the narrative sections of the report, wrote the interpretations for all finalized charts, and compiled and structured the .qmd file.
All team members contributed equally to discussions, decision-making, and shaping the overall direction of the project.
## Appendix
### HTS Dataset (hts_2025_revision_23_complete_Clean.csv)
| Variable | Description |
|---------|-------------|
| **hts_number** | 8-digit Harmonized Tariff Schedule code identifying the product line. |
| **description** | Text description of the product. |
| **general_rate_of_duty** | Baseline U.S. tariff rate (percentage). |
| **chapter** | First two digits of the HTS code. |
### IPR Seizures Dataset (ipr-seizures-clean.csv)
| Variable | Description |
|---------|-------------|
| **fiscal_year** | Fiscal year of the seizure. |
| **country** | Country of origin. |
| **mode** | Transportation mode. |
| **center** | CBP Center of Excellence. |
| **product_type** | Counterfeit good category. |
| **seizure_id** | Unique seizure ID. |
| **line_number** | Item line number. |
| **msrp_value** | Retail value of seized goods. |