Basketball Teams with Most Loyal Fans
NBA & WNBA
Introduction
People often spend much of their free time watching sports, but some teams draw far more viewers than others. We want to take a close look at fan loyalty, specifically in basketball.
Historical viewing trends have shaped which sports, men's and women's, are broadcast more. Our main motivation is to understand which teams have the most loyal fans by looking at game attendance alongside team performance.
For example, we know that the Eagles football team (Go EAGLES!) has a very loyal fan base, and we are interested in investigating the same for basketball.
A number of factors help qualify fans as loyal or not: game scores, fan attendance, and team details. Through the analysis below, we find that some fan bases are more loyal than others in both the NBA and WNBA, and some of the results are more surprising than others.
Research Question
Main: Which NBA and WNBA teams have the most loyal fan bases?
Sub questions:
How has attendance changed over time in both the NBA and WNBA?
How does each team’s attendance compare to the league average?
How do NBA and WNBA teams rank in terms of fan loyalty between 2000–2021?
Do fans show up regardless of team performance, or is attendance tied closely to win percentage?
Data Sources
NBA Database:
This data set contains a wide range of data about NBA games, such as teams, game info, player stats, draft history, and play-by-play records. The information spans 1946 to 2023. The original data source was NBA Stats, whereas the data file used was posted on this Kaggle Page.
This data has been pre-processed from the original source.
The original data was collected with a Python 3 script using the nba_api (https://github.com/swar/nba_api).
The data does not appear to be missing any important values; however, it is not up to date, as the file only goes up to 2023.
The data should not be biased, as it comes from the original source without changes to the values; the Kaggle author took the data from the website and packaged it into one zip file.
WNBA Database:
wnba-attendance-by-game:
This data set tracks WNBA game attendance from 1997 through 2024, listing attendance by team, season, and game. It provides a detailed view of how fan engagement has evolved over time, allowing for comparisons between teams, arenas, and historical periods. The data comes from a Kaggle dataset originally compiled by user Mayzie Hunter, who aggregated attendance figures from multiple reliable sports sources, including ESPN and Basketball Reference.
The version used in our project has been cleaned and standardized into wnba_attendance_alltime_clean.csv, which includes only the essential variables for year, team, opponent, and attendance. This cleaned format enables easier aggregation of average attendance by team and year for time-series and loyalty analyses.
Validity
Original Source: https://www.kaggle.com/datasets/mayziehunter/wnba-attendance-by-game
Kaggle dataset compiled from public sports statistics and official WNBA box scores.
Data Processing: Pre-processed into a clean CSV to remove incomplete rows, unify date formats, and add a year variable extracted from the game date.
Completeness: Covers most WNBA seasons since the league’s founding, though attendance figures for the 2020 and 2021 seasons may be lower or partially missing due to COVID-19 restrictions.
Bias: Reported attendance figures reflect “announced” crowd sizes rather than actual in-seat counts, which may slightly inflate some numbers. Otherwise, the dataset remains a credible representation of long-term attendance trends.
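The processing steps listed above (unifying date formats, dropping incomplete rows, extracting a year, then averaging by team and year) might look roughly like the sketch below. It uses base R so it runs without extra packages. The toy rows, the two date formats, and the helper name `parse_mixed_dates` are assumptions made for illustration, not details of the actual Kaggle file or of the cleaning script that produced the CSV.

```r
# Toy per-game rows; team names and attendance values are invented for illustration.
raw <- data.frame(
  date       = c("2019-06-01", "07/15/2019", "2019-08-20", "2020-08-02", NA),
  home_team  = c("Team A", "Team A", "Team B", "Team A", "Team B"),
  attendance = c(8000, 7000, 6500, NA, 6900),
  stringsAsFactors = FALSE
)

# Hypothetical helper: try ISO dates first, then fall back to month/day/year.
parse_mixed_dates <- function(x) {
  iso <- as.Date(x, format = "%Y-%m-%d")
  us  <- as.Date(x, format = "%m/%d/%Y")
  out <- iso
  out[is.na(out)] <- us[is.na(out)]
  out
}

# Unify the date column, then drop rows missing a date or an attendance figure.
raw$date <- parse_mixed_dates(raw$date)
clean <- raw[complete.cases(raw), ]

# Add the year variable used for grouping.
clean$year <- as.integer(format(clean$date, "%Y"))

# Average attendance by team and year.
yearly <- aggregate(attendance ~ home_team + year, data = clean, FUN = mean)
print(yearly)
```

Only three of the five toy rows survive the cleaning step, which is worth keeping in mind when a season's average rests on few games.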
teamstats.csv:
This data set contains WNBA team stats such as win percentage, season, and games played. The information spans 1997 to 2023. The original data was gathered from the official WNBA Stats, whereas the data file used was posted on this Kaggle Page.
This data has been pre-processed from the original source.
The data does not appear to be missing any important values; however, it is not up to date, as the file only goes up to 2023.
Results
Attendance Normalized to League Average (Normalized Faceted Plots)
To compare attendance across teams in both the NBA and WNBA, we normalized each team’s attendance as a percentage of the league average and used faceted plots to show these trends. Giving each team its own panel makes the patterns much easier to read and avoids the mess of overlapping lines. Normalizing also helps us look past differences in arena size or market strength so we can focus on actual fan engagement. These side-by-side panels highlight which teams consistently stay above or below league norms and how steady their support is over time, which sets up the loyalty analysis that follows.
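As a small illustration of the normalization described above, the sketch below divides each team's yearly attendance by that year's league average. The team labels and numbers are made up, and base R's `ave()` stands in for the grouped `mutate()` used in the code appendix.

```r
# Toy yearly attendance table (in thousands); values are invented for illustration.
toy <- data.frame(
  team            = c("A", "B", "C", "A", "B", "C"),
  year            = c(2000, 2000, 2000, 2001, 2001, 2001),
  mean_attendance = c(18, 15, 12, 20, 10, 15)
)

# League average for each year, repeated on every row of that year.
toy$league_avg <- ave(toy$mean_attendance, toy$year, FUN = mean)

# Ratio to the league average: 1 means exactly average, 1.2 means 20% above.
toy$attendance_pct <- toy$mean_attendance / toy$league_avg

print(toy)
```

Here the league average is the mean of the team averages, as in the NBA chunk of the code appendix. The WNBA chunk instead averages over individual games, which gives more weight to teams with more home games in the data.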
How We Measure Fan Loyalty
Fan loyalty is measured by looking at how closely a team’s attendance moves with its performance. The idea is simple: if attendance only rises when a team starts winning, that fan base is more performance-dependent. But if attendance stays stable even during losing seasons, those fans are showing strong loyalty because they continue showing up regardless of results. Using win percentage and average attendance by year gives us a way to see whether fans are reacting to performance or supporting their team no matter what.
What Loyalty Looks Like in the Data
We can identify loyalty by comparing each team's average game attendance with its win percentage for the same year. If attendance rises as the win percentage rises, we can assume that fan turnout depends on how well the team is performing. If attendance stays roughly the same regardless of win percentage (a flat line), we can assume that fans keep coming no matter how the team performs.
The graphs below show one example team from each loyalty category in both the NBA and WNBA.
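To make the flat-line versus rising-line idea concrete, here is a minimal sketch on two invented teams. The team names "Steady" and "Swingy" and all values are hypothetical. It computes the Pearson correlation used for the loyalty labels, plus the slope of a per-team linear fit like the trend lines drawn in the example plots.

```r
# Two hypothetical teams observed over the same four win percentages.
toy <- data.frame(
  team            = rep(c("Steady", "Swingy"), each = 4),
  win_pct         = rep(c(0.30, 0.45, 0.60, 0.75), times = 2),
  mean_attendance = c(17.0, 17.2, 16.9, 17.1,  # barely moves with winning
                      12.0, 14.0, 16.5, 18.5)  # climbs with winning
)

by_team <- split(toy, toy$team)

# Pearson correlation between win percentage and attendance, one value per team.
team_cor <- sapply(by_team, function(d) cor(d$win_pct, d$mean_attendance))

# Slope of a per-team linear fit: change in attendance per unit of win percentage.
team_slope <- sapply(by_team, function(d) {
  coef(lm(mean_attendance ~ win_pct, data = d))[["win_pct"]]
})

print(round(team_cor, 2))
print(round(team_slope, 2))
```

Correlation describes how consistently attendance follows winning, while the slope describes how large the change is, so it can help to look at both before labeling a fan base.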
Plot 1 – NBA Example Teams (Lakers, Heat, Clippers)
Plot 2 – WNBA Example Teams (Sun, Fever, Aces)
What is the loyalty score for each basketball team in the US?
Team loyalty is based on how strongly fan attendance correlates with the team's win percentage: the weaker the correlation, the more loyal the fan base. Some fan bases are more loyal than others, and that loyalty can come from things like the local basketball culture, financial investment, or star players such as LeBron James on the Lakers.
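The sketch below shows one way to turn a correlation into a loyalty label. The cut-offs (0.3, 0.5, 0.7) and label names are the ones used for the NBA in the code appendix; the function name `classify_loyalty` and the use of `cut()` instead of `case_when()` are choices made here for illustration.

```r
# Hypothetical helper: map a correlation to a loyalty label.
# Intervals are closed on the left, so 0.3 itself counts as "Moderately Loyal".
classify_loyalty <- function(correlation) {
  cut(
    correlation,
    breaks = c(-Inf, 0.3, 0.5, 0.7, Inf),
    labels = c("Very Loyal", "Moderately Loyal",
               "Performance-Sensitive", "Performance-Dependent"),
    right = FALSE
  )
}

# Example correlations, including a negative value and a missing value.
example_cor <- c(-0.20, 0.30, 0.69, 0.70, NA)
print(data.frame(correlation = example_cor,
                 loyalty_type = classify_loyalty(example_cor)))
```

One difference from the appendix code is worth noting: `cut()` leaves a missing correlation as `NA`, whereas the `case_when()` version sends it to the final catch-all label. Under either version, a negative correlation is labeled "Very Loyal".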
Conclusion
Overall, our analysis shows that loyalty in the NBA and WNBA looks very different when we focus on consistency instead of crowd size. Even though NBA games bring in more fans overall, WNBA fans tend to be more stable and supportive regardless of how well their team is performing. About 60% of WNBA teams show very loyal fan bases, compared to only 33% in the NBA. This means that WNBA attendance is less tied to winning streaks and more tied to long-term community support.
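League-level shares like the ones quoted above can be computed directly from a vector of loyalty labels. The labels below belong to a hypothetical six-team league and are not the project's results.

```r
# Hypothetical loyalty labels for a six-team league (not the project's data).
loyalty_type <- c("Very Loyal", "Moderately Loyal", "Very Loyal",
                  "Performance-Sensitive", "Very Loyal", "Moderately Loyal")

# Percentage of teams in each category.
share_pct <- 100 * prop.table(table(loyalty_type))
print(round(share_pct, 1))

# Percentage of teams labeled "Very Loyal".
very_loyal_pct <- 100 * mean(loyalty_type == "Very Loyal")
print(very_loyal_pct)
```

Because the two leagues have different numbers of teams, comparing percentages rather than raw counts keeps the comparison fair.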
Looking at loyalty this way helps us understand the deeper story behind attendance numbers. High attendance doesn’t always mean a loyal fan base: sometimes it just reflects team performance, city size, or star players. By comparing attendance to win percentage, we were able to see which teams have fans who show up even during tough seasons. These patterns could help teams think about marketing, fan engagement, and long-term investment strategies. There’s more we could explore, including ticket pricing, arena capacity, or player movement, but our results already reveal a clear picture: loyalty is not just about how many fans fill the seats, but how consistently they show up.
Attribution
All members contributed equally.
Appendix
| Variable Name | Description |
|---|---|
| team_name | Name/Nickname of the NBA’s teams |
| year | Year of the games the team was playing |
| mean_attendance | The mean attendance of fans in the stadium for that year, in thousands |
| wins | Amount of wins the team had that year |
| losses | Amount of losses the team had that year |
| total_games | Amount of total games for that year |
| win_pct | Win percentage: wins divided by total games, stored as a fraction between 0 and 1 |

| Variable Name | Description |
|---|---|
| team_name | Name/Nickname of the NBA’s teams |
| correlation | Calculation of correlation between win percentage and mean attendance |
| loyalty_type | Label of loyalty to the team based on the correlation result |

| Variable Name | Description |
|---|---|
| date | The calendar date on which the game was played |
| opponent | The opposing team for that specific game |
| segment | Indicates the part of the season (e.g., Regular season, Conference) |
| arena | The name of the venue where the game was played |
| location | The city and state of the arena |
| attendance | The official attendance figure reported for the game |
| home_team | The team hosting the game |
| year | The year extracted from the game date, used for grouping and trend analysis |

| Variable Name | Description |
|---|---|
| season | Year of the games the team was playing |
| team | Name of the WNBA team |
| gp | Amount of total games for that year |
| w | Amount of wins the team had that year |
| l | Amount of losses the team had that year |
| win_percent | Win percentage: wins divided by games played (w / gp), stored as a fraction between 0 and 1 |
| mean_attendance | The mean attendance of fans in the stadium for that year |

| Variable Name | Description |
|---|---|
| team | Name of the WNBA team |
| correlation | Correlation of win percent and attendance of the team |
| loyalty_type | Label of loyalty to the team based on the correlation result |
Code Appendix
# Load libraries and settings here
library(tidyverse)
library(here)
library(dplyr)
library(plotly)
knitr::opts_chunk$set(
  warning = FALSE,
  message = FALSE,
  comment = "#>",
  fig.path = "figs/",  # Folder where rendered plots are saved
  fig.width = 7.252,   # Default plot width
  fig.height = 4,      # Default plot height
  fig.retina = 3       # For better plot resolution
)
# Put any other "global" settings here, e.g. a ggplot theme:
theme_set(theme_bw(base_size = 20))
# Write code below here to load any data used in project
team_names <- read_csv(here::here('data_raw', 'team.csv'))
game_infos <- read_csv(here::here('data_raw','game_info.csv'))
game_more_info <- read_csv(here::here('data_raw', 'game.csv'))
wnba_teamstats <- read_csv(here::here('data_raw', 'TEAM_MASTER.csv'))
wnba <- read_csv(here::here('data_processed', 'wnba_attendance_alltime_clean.csv'))
library(lubridate)
# ---- NBA cleaning ----
# Drop the team columns from city through year_founded
team_names_clean <- team_names %>%
  select(-("city":"year_founded"))

# Per-game info, restricted to games from 2000-01-02 onward
game_info_clean <- game_infos %>%
  select(-game_time) %>%
  mutate(game_date = ymd(game_date)) %>%
  filter(game_date >= ymd("2000-01-02"))

# Per-game results: home matchup string, season type, and away-team outcome
game_more_info_clean <- game_more_info %>%
  select(game_id, game_date, matchup_home, season_id, season_type, wl_away) %>%
  mutate(game_date = ymd(game_date)) %>%
  filter(game_date >= ymd("2000-01-02"))

nba_games_clean <- game_info_clean %>%
  inner_join(game_more_info_clean, by = c("game_id", "game_date")) %>%
  filter(season_type == "Regular Season") %>%
  # Keep the text before " vs. ", which is the home team's abbreviation
  mutate(
    matchup_home = sapply(
      str_split(matchup_home, " vs\\. "),
      `[`, 1
    )
  ) %>%
  # Replace the abbreviation with the team nickname
  left_join(
    team_names_clean %>% select(abbreviation, nickname),
    by = c("matchup_home" = "abbreviation")
  ) %>%
  mutate(matchup_home = nickname) %>%
  # The home result is the opposite of the away result
  mutate(wl_home = ifelse(wl_away == "W", "L", "W")) %>%
  rename(team_name = matchup_home, W_or_L = wl_home) %>%
  select(-nickname, -game_id, -season_id, -wl_away) %>%
  filter(!is.na(team_name))

# Mean home attendance (in thousands) and home win/loss record per team and calendar year
nba_games_yearly_attendance_clean <- nba_games_clean %>%
  mutate(year = year(game_date)) %>%
  group_by(team_name, year) %>%
  summarize(
    mean_attendance = mean(attendance, na.rm = TRUE) / 10^3,
    wins = sum(W_or_L == "W"),
    losses = sum(W_or_L == "L"),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_attendance)) %>%
  mutate(
    total_games = wins + losses,
    win_pct = wins / total_games
  )

# Correlation between win percentage and attendance per team;
# a lower correlation is read as a more loyal fan base
loyalty_analysis_clean <- nba_games_yearly_attendance_clean %>%
  group_by(team_name) %>%
  summarize(
    correlation = cor(win_pct, mean_attendance, use = "complete.obs"),
    .groups = "drop"
  ) %>%
  mutate(
    loyalty_type = case_when(
      correlation < 0.3 ~ "Very Loyal",
      correlation < 0.5 ~ "Moderately Loyal",
      correlation < 0.7 ~ "Performance-Sensitive",
      TRUE ~ "Performance-Dependent"
    ),
    loyalty_type = factor(
      loyalty_type,
      levels = c(
        "Very Loyal",
        "Moderately Loyal",
        "Performance-Sensitive",
        "Performance-Dependent"
      )
    )
  )

# ---- WNBA cleaning ----
names(wnba_teamstats) <- tolower(names(wnba_teamstats))

wnba_teamstats_clean <- wnba_teamstats %>%
  select(-("min":"season type")) %>%
  mutate(team = str_to_title(team)) %>%
  filter(season >= 2000)

# Mean home attendance per team and year
wnba_yearly_attendance_clean <- wnba %>%
  group_by(home_team, year) %>%
  summarize(mean_attendance = mean(attendance, na.rm = TRUE), .groups = "drop")

# Attach yearly attendance to the team stats and compute win percentage
wbna_stats_clean <- wnba_teamstats_clean %>%
  left_join(
    wnba_yearly_attendance_clean %>% select(home_team, year, mean_attendance),
    by = c("season" = "year", "team" = "home_team")
  ) %>%
  mutate(win_percent = w / gp)

# Correlation between win percentage and attendance per team
wnba_loyalty_analysis_clean <- wbna_stats_clean %>%
  filter(team != "San Antonio Silver Stars") %>%
  group_by(team) %>%
  summarize(
    correlation = cor(win_percent, mean_attendance, use = "complete.obs"),
    .groups = "drop"
  ) %>%
  mutate(
    loyalty_type = case_when(
      correlation < 0.3 ~ "Very Loyal",
      correlation < 0.5 ~ "Moderately Loyal",
      TRUE ~ "Performance-Sensitive"
    ),
    loyalty_type = factor(
      loyalty_type,
      levels = c("Very Loyal", "Moderately Loyal", "Performance-Sensitive")
    )
  )

# Save processed data inside the project folder (portable paths instead of a user-specific absolute path)
write.csv(nba_games_yearly_attendance_clean, here::here('data_processed', 'nba_games_yearly_attendance_clean.csv'))
write.csv(loyalty_analysis_clean, here::here('data_processed', 'loyalty_analysis_clean.csv'))
write.csv(wnba_loyalty_analysis_clean, here::here('data_processed', 'wnba_loyalty_analysis_clean.csv'))
write.csv(wbna_stats_clean, here::here('data_processed', 'wbna_stats_clean.csv'))

# Normalize NBA attendance to the league average of each year
nba_attendance_norm <- nba_games_yearly_attendance_clean %>%
  filter(year >= 2000, year <= 2021) %>%
  group_by(year) %>%
  mutate(
    league_avg = mean(mean_attendance, na.rm = TRUE),
    attendance_pct = mean_attendance / league_avg
  ) %>%
  ungroup()

# One panel per team; the dashed line at 1 marks the league average
ggplot(nba_attendance_norm, aes(x = year, y = attendance_pct)) +
  geom_line(color = "#1f78b4", linewidth = 0.8, alpha = 0.9) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "black", linewidth = 1) +
  facet_wrap(~ team_name) +
  scale_x_continuous(
    breaks = seq(2000, 2020, 5),
    labels = seq(2000, 2020, 5)
  ) +
  labs(
    title = "NBA Attendance as % of League Average (2000–2021)",
    x = "Year",
    y = "Attendance Relative to League Average"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    strip.text = element_text(size = 7, face = "bold"),
    axis.text.x = element_text(size = 6),
    axis.text.y = element_text(size = 6),
    panel.grid.major = element_line(color = "grey85", linewidth = 0.2),
    panel.grid.minor = element_blank()
  )

# WNBA: mean attendance per team and year
attendance_team_year <- wnba %>%
  filter(year >= 2000, year <= 2021) %>%
  group_by(home_team, year) %>%
  summarize(avg_attendance = mean(attendance, na.rm = TRUE), .groups = "drop")

# WNBA: league average per year, computed over individual games
league_year <- wnba %>%
  filter(year >= 2000, year <= 2021) %>%
  group_by(year) %>%
  summarize(league_avg = mean(attendance, na.rm = TRUE), .groups = "drop")

attendance_norm <- attendance_team_year %>%
  left_join(league_year, by = "year") %>%
  mutate(attendance_pct = avg_attendance / league_avg)

ggplot(attendance_norm, aes(x = year, y = attendance_pct)) +
  geom_line(color = "#1f78b4", linewidth = 0.8, alpha = 0.9) +  # consistent line color
  geom_hline(yintercept = 1, linetype = "dashed", color = "black", linewidth = 1) +
  facet_wrap(~ home_team) +
  scale_x_continuous(
    breaks = seq(2000, 2020, 5),
    labels = seq(2000, 2020, 5)
  ) +
  labs(
    title = "WNBA Attendance as % of League Average (2000–2021)",
    x = "Year",
    y = "Attendance Relative to League Average"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    strip.text = element_text(size = 7, face = "bold"),  # smaller facet labels
    axis.text.x = element_text(size = 6),
    axis.text.y = element_text(size = 6),
    panel.grid.major = element_line(color = "grey85", linewidth = 0.2),
    panel.grid.minor = element_blank()
  )

# Plot 1: NBA example teams (plotly is already loaded at the top of the script)
p <- nba_games_yearly_attendance_clean %>%
  filter(team_name %in% c("Lakers", "Heat", "Clippers")) %>%
  mutate(loyalty = case_when(
    team_name == "Lakers" ~ "Very Loyal",
    team_name == "Heat" ~ "Moderately Loyal",
    team_name == "Clippers" ~ "Performance-Sensitive"
  )) %>%
  # Custom text aesthetic for the interactive tooltip
  mutate(hover_text = paste(
    "Team:", team_name, "<br>",
    "Loyalty:", loyalty, "<br>",
    "Win %:", round(win_pct, 3), "<br>",
    "Attendance:", mean_attendance
  )) %>%
  ggplot(aes(x = win_pct, y = mean_attendance, color = loyalty, text = hover_text)) +
  geom_point(size = 2, alpha = 0.8) +
  # One linear trend line per team
  geom_smooth(
    aes(group = team_name),
    method = "lm", se = FALSE, linewidth = 1.2
  ) +
  scale_color_manual(values = c(
    "Very Loyal" = "#1f78b4",
    "Moderately Loyal" = "#33a02c",
    "Performance-Sensitive" = "#e31a1c"
  )) +
  labs(
    title = "Win Percentage vs. Attendance (NBA)",
    x = "Win Percentage",
    y = "Average Attendance (Thousands)",
    color = "Fan Loyalty"
  ) +
  theme_minimal()

# Convert the ggplot object to an interactive plotly object
ggplotly(p, tooltip = "text")

# Plot 2: WNBA example teams
p <- wbna_stats_clean %>%
  filter(team %in% c("Connecticut Sun", "Indiana Fever", "Las Vegas Aces")) %>%
  mutate(loyalty = case_when(
    team == "Connecticut Sun" ~ "Very Loyal",
    team == "Indiana Fever" ~ "Moderately Loyal",
    team == "Las Vegas Aces" ~ "Performance-Sensitive"
  )) %>%
  # Custom text aesthetic for the interactive tooltip
  mutate(hover_text = paste(
    "Team:", team, "<br>",
    "Loyalty:", loyalty, "<br>",
    "Win %:", round(win_percent, 3), "<br>",
    "Attendance:", mean_attendance
  )) %>%
  ggplot(aes(x = win_percent, y = mean_attendance, color = loyalty, text = hover_text)) +
  geom_point(size = 2, alpha = 0.8) +
  # One linear trend line per team
  geom_smooth(
    aes(group = team),
    method = "lm", se = FALSE, linewidth = 1.2
  ) +
  scale_color_manual(values = c(
    "Very Loyal" = "#1f78b4",
    "Moderately Loyal" = "#33a02c",
    "Performance-Sensitive" = "#e31a1c"
  )) +
  labs(
    title = "Win Percentage vs. Attendance (WNBA)",
    x = "Win Percentage",
    y = "Average Attendance",
    color = "Fan Loyalty"
  ) +
  theme_minimal()

# Convert the ggplot object to an interactive plotly object
ggplotly(p, tooltip = "text")

# NBA loyalty ranking: teams ordered by correlation
loyalty_analysis_clean %>%
  ggplot(aes(x = reorder(team_name, correlation),
             y = correlation,
             fill = loyalty_type)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(
    values = c(
      "Very Loyal" = "#1f78b4",
      "Moderately Loyal" = "#33a02c",
      "Performance-Sensitive" = "#e31a1c",
      # Added so the fourth NBA category also gets a fill; the color itself is an arbitrary choice
      "Performance-Dependent" = "#6a3d9a"
    ),
    name = "Fan Base Type"
  ) +
  labs(
    title = "NBA Fan Loyalty: Win % vs Attendance Correlation (2000-2021)",
    x = NULL,
    y = "Correlation"
  ) +
  theme_minimal()

# WNBA loyalty ranking: teams ordered by correlation
wnba_loyalty_analysis_clean %>%
  ggplot(aes(x = reorder(team, correlation),
             y = correlation,
             fill = loyalty_type)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(
    values = c(
      "Very Loyal" = "#1f78b4",
      "Moderately Loyal" = "#33a02c",
      "Performance-Sensitive" = "#e31a1c"
    ),
    name = "Fan Base Type"
  ) +
  labs(
    title = "WNBA Fan Loyalty: Win % vs Attendance Correlation (2000-2021)",
    x = NULL,
    y = "Correlation"
  ) +
  theme_minimal()