<- smoking_drinking %>%
smoking_cleaned mutate(
BMI = weight / ((height / 100) ^ 2),
weight_category = case_when(
> 30 ~ "Obesity",
BMI > 25 ~ "Overweight",
BMI > 18.5 ~ "Healthy",
BMI TRUE ~ "Underweight" # Catch-all for values <= 18.5
),
smoking_status = case_when(
== "1" ~ "Never Smoke",
SMK_stat_type_cd == "2" ~ "Smoked But Quit",
SMK_stat_type_cd == "3" ~ "Still Smoke",),
SMK_stat_type_cd %>%
)select(!c(DRK_YN,SMK_stat_type_cd,urine_protein,hear_left,hear_right, triglyceride,hemoglobin,urine_protein,serum_creatinine,SGOT_AST,SGOT_ALT,gamma_GTP, height, weight, waistline))
The trends, spatial difference, and associated outcome of smoking
<- read.csv(here::here('data_raw', 'by_country.csv'), skip = 1) %>%
by_country clean_names() %>%
select(c(countries_territories_and_areas, year, both_sexes_1,male_1,female_1)) %>%
rename(
smoking_rate = both_sexes_1,
male_smoking_rate = male_1,
female_smoking_rate = female_1
%>%
) filter(!rowSums(is.na(.)) > 0)
<- by_states %>%
by_states clean_names() %>%
mutate(
smoking_rate = 1-never_smoked,
smoking_rate_new = smoke_everyday + smoke_some_days
)
1. Introduction
Smoking remains a pervasive global health concern with significant implications for public well-being. According to CDC, cigarette smoking is the leading cause of preventable death in the United States, which causes more than 480,000 deaths each year in the United States. It is proven to increase the risks of lung cancer, heart disease, and strokes. As shown in the photo, there is a tragic change in the lungs after smoking. Besides, there are multiple toxic chemicals generated in the life cycle of cigarettes and associated products, such as nicotine, and butane. In this context, we are curious about whether the smoking rate has decreased as the public awareness of the risks brought by smoking cigarettes. In addition, are there significant differences in smoking rates all around the world, and across the US? At last, we would like to know if there are other associated health outcomes with smoking status.Understanding these patterns is crucial for informing public health initiatives and guiding efforts to mitigate the adverse effects associated with smoking.
With our survey, we found that the trend of smoking rate went down all over the world. In addition, the data in the US also shows a declining smoking rate all across the US. Some top states with highest smoking rate over the years are Kentucky, West Virginia, and Nevada.
There is a strong relation between systolic blood pressure (SBP) and diastolic blood pressure (DBP), as well as total cholesterol (total_chole) and low-density lipoprotein cholesterol (LDL_chole). Overall, within the dataset, there have been significantly elevated risks in overweight, blood pressure problems, and blood glucose problems in smoking groups (smoke but quit, and still smoke). Surprisingly, the still smoke group shows a better performance in those metrics than the smoke but quit group.
2. Research question
1.What countries have higher smoking rates? What regions in the US have higher tobacco consumption?
2.Does smoking rate increase or decrease over time?
3.Are those measurable health outcomes correlated with each other?
4.What measurable health outcomes are correlated with different types of smoking statuses?
3. Data sources
There are three data sets involved in our research.
The first data set is smoking_driking_dataset_Ver01.csv. It is used to explore: What measurable health outcomes are correlated with different types of smoking statuses? Are those measurable health outcomes correlated with each other? We downloaded this data from this link. This dataset is already pre-cleaned by a Kaggle user.There are 24 columns, 991346 rows of data. The original source is this link. It is collected by the National Health Insurance Service in Korea.
The second data is CDC_US_v2.csv. It is used to explore: What countries have higher smoking rates? Does smoking rate increase or decrease over time? We downloaded from this linkThis data is pre-cleaned by the github owner. This data is originally from this link. This data was collected by the CDC and describes the frequency of smoking rates of different countries.
The third data is by_country.csv. It is used to explore: what regions in the US have higher tobacco consumption? Does smoking rate increase or decrease over time? This data set is originally from this link. We cleaned this data by ourselves. This data is collected by WHO. It shows Age-standardized estimates of current tobacco use, tobacco smoking and cigarette smoking data by country. This data set contains 1478 rows and 11 columns. This estimates uses a statistical model based on a Bayesian negative binomial meta-regression is used to model prevalence of current tobacco use for each country, separately for men and women. The model has two main components: (a) adjusting for missing indicators and age groups, and (b) generating an estimate of trends over time as well as the 95% credible interval around the estimate.
4. Results
Does smoking rate increase or decrease over time? What countries have higher smoking rates?
<- by_country %>%
top10countries filter(year==2000) %>%
arrange(desc(smoking_rate)) %>%
slice(1:10)
<- by_country %>%
bottom10countries filter(year==2000) %>%
arrange(smoking_rate) %>%
slice(1:10)
There are 164 countries/territories/areas in this cleaned data sets.The estimates are in the year of 2000, 2005, 2010, 2018, 2019, 2020, 2023, and 2025. We decide to compare the year of 2000 and 2023 using a slope chart.
<- by_country %>%
by_country_slope filter(
%in% c(2000, 2023),
year %in% top10countries$countries_territories_and_areas) %>%
countries_territories_and_areas mutate(
# Reorder state variables
countries_territories_and_areas = fct_reorder2(countries_territories_and_areas,
desc(smoking_rate)),
year, # Convert year to discrete variable
year = as.factor(year),
# Make labels
label = paste(countries_territories_and_areas, ' (',
round(smoking_rate), ')'),
label_left = ifelse(year == 2000, label, NA),
label_right = ifelse(year == 2023, label, NA))
The overall trend of smoking rate in top 10 countries goes down by 20%. The top three smoking countries in 2000 are Kiribati (68%), Nauru (57%), and Greece (55%).The top three smoking countries in 2023 are Nauru(42%), Serbia(39%), and Bulgaria(38%).
ggplot(by_country_slope,
aes(
x = year,
y = smoking_rate,
group = countries_territories_and_areas)) +
geom_line(size=0.8)+
# Add 2000 labels (left side)
geom_text_repel(
aes(label = label_left),
hjust = 1, nudge_x = -0.05,
direction = 'y', segment.color = 'grey') +
# Add 2023 labels (right side)
geom_text_repel(aes(label = label_right),
hjust = 0, nudge_x = 0.05,
direction = 'y', segment.color = 'grey') +
scale_x_discrete(position = 'top') +
scale_color_manual(values = c('black')) +
# Annotate & adjust theme
labs(x = NULL,
y = 'Smoking rate (%) ',
title = 'Top 10 smoking countries (2000 - 2023)') +
theme_minimal_grid() +
theme(panel.grid = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
legend.position = 'none')
The overall trend goes down by 1-6% for the bottom 10 countries, except Oman has a slight increase in smoking rate. Ethiopia has the lowest smoking rate at 5% in 2000. Ghana has the lowest smoking rate at 2% in 2023.
<- by_country %>%
by_country_slope_bottom filter(
%in% c(2000, 2023),
year %in% bottom10countries$countries_territories_and_areas) %>%
countries_territories_and_areas mutate(
# Reorder state variables
countries_territories_and_areas = fct_reorder2(countries_territories_and_areas,
desc(smoking_rate)),
year, # Convert year to discrete variable
year = as.factor(year),
# Define line color
lineColor = if_else(
== 'Oman', 'OMAN', 'other'),
countries_territories_and_areas # Make labels
label = paste(countries_territories_and_areas, ' (',
round(smoking_rate), ')'),
label_left = ifelse(year == 2000, label, NA),
label_right = ifelse(year == 2023, label, NA))
ggplot(by_country_slope_bottom,
aes(
x = year,
y = smoking_rate,
group = countries_territories_and_areas)) +
geom_line(aes(color = lineColor),
size=0.8)+
# Add 2000 labels (left side)
geom_text_repel(
aes(label = label_left),
hjust = 1, nudge_x = -0.05,
direction = 'y', segment.color = 'grey') +
# Add 2023 labels (right side)
geom_text_repel(aes(label = label_right),
hjust = 0, nudge_x = 0.05,
direction = 'y', segment.color = 'grey') +
scale_x_discrete(position = 'top') +
scale_color_manual(values = c('red', 'black')) +
# Annotate & adjust theme
labs(x = NULL,
y = 'Smoking rate (%) ',
title = '10 countries with lowest smoking rates (2000 - 2023)') +
theme_minimal_grid() +
theme(panel.grid = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
legend.position = 'none')
What is the trends over time? What states have higher smoking rate over time?
This data set ranges from 1995 to 2010. During the 15 years, the overall trends of smoking goes down. Kentucky has the highest smoking rate in most years(0.25 to 0.33). West Virginia has an interesting trend which goes down and rises again in 15 years, showing almost no trends of going down. Nevada is the highest one in 1999 at 31 %. It should be noted that the smoking rate is the total of everyday smoking rate and some day smoking rate, without the presence of the smoked but quit.
<- by_states %>%
by_states_formatted_new group_by(year) %>%
# The * 1 makes it possible to have non-integer ranks while sliding
mutate(new_rank = rank(-smoking_rate_new, ties.method = "first"),
Value_lbl = paste0(" ",smoking_rate_new)) %>%
group_by(state) %>%
filter(new_rank <=10) %>%
ungroup()
<- by_states_formatted_new %>%
by_states_anim_new mutate(year = as.integer(year)) %>%
ggplot(aes(x = new_rank, group = state,fill = state)) +
geom_tile(aes(y = smoking_rate_new / 2,
height = smoking_rate_new),
width = 0.5, alpha = 0.8, color = NA) +
geom_text(aes(y=smoking_rate_new,label = Value_lbl, hjust=0)) +
coord_flip(clip = "off", expand = FALSE) +
geom_text(aes(y = 0, label = paste(state, " ")),
vjust = 0.2, hjust = 1) +
coord_flip(clip = 'off', expand = FALSE) +
scale_y_continuous(labels = scales::comma) +
scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE) +
scale_x_reverse() +
# scale_x_continuous( limits = c(0, 0.6),)+
guides(color = FALSE) +
theme_minimal_vgrid() +
theme(
axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank(),
legend.position = "none",
legend.background = element_rect(fill = 'white'),
plot.title = element_text(
size = 22, hjust = 0.5, face = 'bold',
colour = 'grey', vjust = -1),
plot.subtitle = element_text(
size = 18, hjust = 0.5,
face = 'italic', color = 'grey'),
plot.caption = element_text(
size = 8, hjust = 0.5,
face = 'italic', color = 'grey'),
plot.margin = margin(0.5, 2, 0.5, 3, 'cm')) +
transition_time(year) +
view_follow(fixed_x = TRUE) +
labs(title = 'Year : {frame_time}',
subtitle = 'Top 10 states by smoking rate')
animate(by_states_anim_new, duration = 15, end_pause = 15,
width = 800, height = 700, res = 150,fps = 20,
renderer = magick_renderer())
5. Conclusions
Firstly, we found that the trend of smoking rate went down all over the world. In addition, the data in the US also shows a declining smoking rate all across the US. Some top states with highest smoking rate over the years are Kentucky, West Virginia, and Nevada. Within the south Korea health survey dataset, which shows a strong relation between systolic blood pressure (SBP) and diastolic blood pressure (DBP), as well as total cholesterol (total_chole) and low-density lipoprotein cholesterol (LDL_chole). At last, within the dataset, there have been significantly elevated risks in overweight, blood pressure problems, and blood glucose problems in smoking groups (smoke but quit, and still smoke). Surprisingly, the still smoke group shows a better performance in those metrics than the smoke but quit group.
In this study, the smoking status is only divided into three groups, which might be improved by reporting the years, or how frequent smoking is for each person. We could also further introduce the genders to see if there are any specific patterns for the associated health outcomes and trends. The gender-specific smoking rate/ smoking status is available in the by_country dataset and the South Korea Health Survey Dataset.
6. Attribution
Kejia and Jiaxin contributed equally in this project.
# Load libraries and settings here
library(knitr)
library(cowplot)
library(viridis)
library(readxl)
library(maps)
library(sf)
library(rnaturalearth)
library(rnaturalearthdata)
library(rnaturalearthhires)
library(janitor)
library(tidyverse)
library(here)
library(ggrepel)
library(ggridges)
library(HistData)
library(GGally)
library(palmerpenguins)
library(tidyverse)
library(patchwork)
library(viridis)
library(gganimate)
library(magick)
::opts_chunk$set(
knitrwarning = FALSE,
message = FALSE,
comment = "#>",
fig.path = "figs/", # Folder where rendered plots are saved
fig.width = 7.252, # Default plot width
fig.height = 4, # Default plot height
fig.retina = 3 # For better plot resolution
)
# Put any other "global" settings here, e.g. a ggplot theme:
theme_set(theme_bw(base_size = 20))
# Write code below here to load any data used in project
<-read.csv(here::here('data_raw',"smoking_driking_dataset_Ver01.csv"))
smoking_drinking<-read.csv(here::here('data_raw',"CDC_US_v2.csv"))
by_states
<- smoking_drinking %>%
smoking_cleaned mutate(
BMI = weight / ((height / 100) ^ 2),
weight_category = case_when(
> 30 ~ "Obesity",
BMI > 25 ~ "Overweight",
BMI > 18.5 ~ "Healthy",
BMI TRUE ~ "Underweight" # Catch-all for values <= 18.5
),
smoking_status = case_when(
== "1" ~ "Never Smoke",
SMK_stat_type_cd == "2" ~ "Smoked But Quit",
SMK_stat_type_cd == "3" ~ "Still Smoke",),
SMK_stat_type_cd %>%
)select(!c(DRK_YN,SMK_stat_type_cd,urine_protein,hear_left,hear_right, triglyceride,hemoglobin,urine_protein,serum_creatinine,SGOT_AST,SGOT_ALT,gamma_GTP, height, weight, waistline))
<- read.csv(here::here('data_raw', 'by_country.csv'), skip = 1) %>%
by_country clean_names() %>%
select(c(countries_territories_and_areas, year, both_sexes_1,male_1,female_1)) %>%
rename(
smoking_rate = both_sexes_1,
male_smoking_rate = male_1,
female_smoking_rate = female_1
%>%
) filter(!rowSums(is.na(.)) > 0)
<- by_states %>%
by_states clean_names() %>%
mutate(
smoking_rate = 1-never_smoked,
smoking_rate_new = smoke_everyday + smoke_some_days
)<- by_country %>%
top10countries filter(year==2000) %>%
arrange(desc(smoking_rate)) %>%
slice(1:10)
<- by_country %>%
bottom10countries filter(year==2000) %>%
arrange(smoking_rate) %>%
slice(1:10)
<- by_country %>%
by_country_slope filter(
%in% c(2000, 2023),
year %in% top10countries$countries_territories_and_areas) %>%
countries_territories_and_areas mutate(
# Reorder state variables
countries_territories_and_areas = fct_reorder2(countries_territories_and_areas,
desc(smoking_rate)),
year, # Convert year to discrete variable
year = as.factor(year),
# Make labels
label = paste(countries_territories_and_areas, ' (',
round(smoking_rate), ')'),
label_left = ifelse(year == 2000, label, NA),
label_right = ifelse(year == 2023, label, NA))
ggplot(by_country_slope,
aes(
x = year,
y = smoking_rate,
group = countries_territories_and_areas)) +
geom_line(size=0.8)+
# Add 2000 labels (left side)
geom_text_repel(
aes(label = label_left),
hjust = 1, nudge_x = -0.05,
direction = 'y', segment.color = 'grey') +
# Add 2023 labels (right side)
geom_text_repel(aes(label = label_right),
hjust = 0, nudge_x = 0.05,
direction = 'y', segment.color = 'grey') +
scale_x_discrete(position = 'top') +
scale_color_manual(values = c('black')) +
# Annotate & adjust theme
labs(x = NULL,
y = 'Smoking rate (%) ',
title = 'Top 10 smoking countries (2000 - 2023)') +
theme_minimal_grid() +
theme(panel.grid = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
legend.position = 'none')
<- by_country %>%
by_country_slope_bottom filter(
%in% c(2000, 2023),
year %in% bottom10countries$countries_territories_and_areas) %>%
countries_territories_and_areas mutate(
# Reorder state variables
countries_territories_and_areas = fct_reorder2(countries_territories_and_areas,
desc(smoking_rate)),
year, # Convert year to discrete variable
year = as.factor(year),
# Define line color
lineColor = if_else(
== 'Oman', 'OMAN', 'other'),
countries_territories_and_areas # Make labels
label = paste(countries_territories_and_areas, ' (',
round(smoking_rate), ')'),
label_left = ifelse(year == 2000, label, NA),
label_right = ifelse(year == 2023, label, NA))
ggplot(by_country_slope_bottom,
aes(
x = year,
y = smoking_rate,
group = countries_territories_and_areas)) +
geom_line(aes(color = lineColor),
size=0.8)+
# Add 2000 labels (left side)
geom_text_repel(
aes(label = label_left),
hjust = 1, nudge_x = -0.05,
direction = 'y', segment.color = 'grey') +
# Add 2023 labels (right side)
geom_text_repel(aes(label = label_right),
hjust = 0, nudge_x = 0.05,
direction = 'y', segment.color = 'grey') +
scale_x_discrete(position = 'top') +
scale_color_manual(values = c('red', 'black')) +
# Annotate & adjust theme
labs(x = NULL,
y = 'Smoking rate (%) ',
title = '10 countries with lowest smoking rates (2000 - 2023)') +
theme_minimal_grid() +
theme(panel.grid = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
legend.position = 'none')
<- by_states %>%
by_states_formatted_new group_by(year) %>%
# The * 1 makes it possible to have non-integer ranks while sliding
mutate(new_rank = rank(-smoking_rate_new, ties.method = "first"),
Value_lbl = paste0(" ",smoking_rate_new)) %>%
group_by(state) %>%
filter(new_rank <=10) %>%
ungroup()
<- by_states_formatted_new %>%
by_states_anim_new mutate(year = as.integer(year)) %>%
ggplot(aes(x = new_rank, group = state,fill = state)) +
geom_tile(aes(y = smoking_rate_new / 2,
height = smoking_rate_new),
width = 0.5, alpha = 0.8, color = NA) +
geom_text(aes(y=smoking_rate_new,label = Value_lbl, hjust=0)) +
coord_flip(clip = "off", expand = FALSE) +
geom_text(aes(y = 0, label = paste(state, " ")),
vjust = 0.2, hjust = 1) +
coord_flip(clip = 'off', expand = FALSE) +
scale_y_continuous(labels = scales::comma) +
scale_fill_viridis(discrete = TRUE) +
scale_color_viridis(discrete = TRUE) +
scale_x_reverse() +
# scale_x_continuous( limits = c(0, 0.6),)+
guides(color = FALSE) +
theme_minimal_vgrid() +
theme(
axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank(),
legend.position = "none",
legend.background = element_rect(fill = 'white'),
plot.title = element_text(
size = 22, hjust = 0.5, face = 'bold',
colour = 'grey', vjust = -1),
plot.subtitle = element_text(
size = 18, hjust = 0.5,
face = 'italic', color = 'grey'),
plot.caption = element_text(
size = 8, hjust = 0.5,
face = 'italic', color = 'grey'),
plot.margin = margin(0.5, 2, 0.5, 3, 'cm')) +
transition_time(year) +
view_follow(fixed_x = TRUE) +
labs(title = 'Year : {frame_time}',
subtitle = 'Top 10 states by smoking rate')
animate(by_states_anim_new, duration = 15, end_pause = 15,
width = 800, height = 700, res = 150,fps = 20,
renderer = magick_renderer())
ggcorr(smoking_cleaned, label = TRUE) +
ggtitle("Correlation Plot of Variables") + # Adding a title
theme_minimal() + # Applying a minimal theme (you can change the theme as desired)
theme(axis.text.x = element_text(angle = 45, hjust = 1))
%>%
smoking_cleaned ggplot() +
geom_boxplot(aes(x = BMI, y=smoking_status),outlier.shape = NA, size=1) +
labs(x = 'Body Mass Index', y = NULL) +
geom_vline(
xintercept = c(19,25),
color = 'red',
linetype = 'dashed',
size = 1.5,
+
)scale_x_continuous(
# Set the lower & upper boundaries
limits = c(15, 30),
# Explicitly set the break points
# breaks = c(break1, break2, etc.)
# # Adjust the axis so bars start at 0
expand = expand_scale(mult = c(0, 0.05)))+
labs(
title = 'Higher overweight risks in smoking groups'
+
)theme(
# Adjusting font sizes for various text elements
plot.title = element_text(size = 16), # Change plot title font size
axis.title.x = element_text(size = 14), # Change X-axis label font size
axis.title.y = element_text(size = 14), # Change Y-axis label font size
axis.text = element_text(size = 12) # Change axis tick label font size
# Add other theme settings as needed
)%>%
smoking_cleaned ggplot() +
geom_boxplot(aes(x = SBP, y=smoking_status),outlier.shape = NA, size=1) +
labs(x = 'Systolic blood pressure (mm Hg)', y = NULL) +
geom_vline(
xintercept = 130,
color = 'red',
linetype = 'dashed',
size = 1.5)+
scale_x_continuous(
# Set the lower & upper boundaries
limits = c(75, 175),
# Explicitly set the break points
# breaks = c(break1, break2, etc.)
# # Adjust the axis so bars start at 0
expand = expand_scale(mult = c(0, 0.05)))+
labs(
title = 'Elevated systolic blood pressure in smoking groups'
+
)theme(
# Adjusting font sizes for various text elements
plot.title = element_text(size = 16), # Change plot title font size
axis.title.x = element_text(size = 14), # Change X-axis label font size
axis.title.y = element_text(size = 14), # Change Y-axis label font size
axis.text = element_text(size = 12) # Change axis tick label font size
# Add other theme settings as needed
)
%>%
smoking_cleaned ggplot() +
geom_boxplot(aes(x = DBP, y=smoking_status),outlier.shape = NA, size=1) +
labs(x = 'Diastolic blood pressure(mm Hg)', y = NULL) +
geom_vline(
xintercept = 80,
color = 'red',
linetype = 'dashed',
size = 1.5)+
scale_x_continuous(
# Set the lower & upper boundaries
limits = c(40, 120),
# Explicitly set the break points
# breaks = c(break1, break2, etc.)
# # Adjust the axis so bars start at 0
expand = expand_scale(mult = c(0, 0.05)))+
labs(
title = 'Elevated diastolic blood pressure in smoking groups'
+
)theme(
# Adjusting font sizes for various text elements
plot.title = element_text(size = 16), # Change plot title font size
axis.title.x = element_text(size = 14), # Change X-axis label font size
axis.title.y = element_text(size = 14), # Change Y-axis label font size
axis.text = element_text(size = 12) # Change axis tick label font size
# Add other theme settings as needed
)
%>%
smoking_cleaned ggplot() +
geom_boxplot(aes(x = BLDS, y = smoking_status),outlier.shape = NA, size=1) +
labs(x = 'Fasting blood glucose [mg/dL]', y = NULL) +
geom_vline(
xintercept = c(79, 110),
color = 'red',
linetype = 'dashed',
size = 1.5)+
scale_x_continuous(
# Set the lower & upper boundaries
limits = c(50, 140),
# Explicitly set the break points
# breaks = c(break1, break2, etc.)
# # Adjust the axis so bars start at 0
expand = expand_scale(mult = c(0, 0.05)))+
#theme_minimal_vgrid(font_size = 18) +
labs(
title = 'Elevated blood glucose in smoking groups'
+
)theme(
# Adjusting font sizes for various text elements
plot.title = element_text(size = 16), # Change plot title font size
axis.title.x = element_text(size = 14), # Change X-axis label font size
axis.title.y = element_text(size = 14), # Change Y-axis label font size
axis.text = element_text(size = 12) # Change axis tick label font size
# Add other theme settings as needed
)%>%
smoking_cleaned ggplot() +
geom_boxplot(aes(x = tot_chole, y=smoking_status),outlier.shape = NA, size=1) +
labs(x = 'Total cholesterol [mg/dL]', y = NULL) +
geom_vline(
xintercept = 239,
color = 'red',
linetype = 'dashed',
size = 1.5)+
scale_x_continuous(
# Set the lower & upper boundaries
limits = c(50, 320),
# Explicitly set the break points
# breaks = c(break1, break2, etc.)
# # Adjust the axis so bars start at 0
expand = expand_scale(mult = c(0, 0.05)))+
labs(
title = 'No significant changes in total cholesterol'
+
)theme(
# Adjusting font sizes for various text elements
plot.title = element_text(size = 16), # Change plot title font size
axis.title.x = element_text(size = 14), # Change X-axis label font size
axis.title.y = element_text(size = 14), # Change Y-axis label font size
axis.text = element_text(size = 12) # Change axis tick label font size
# Add other theme settings as needed
)%>%
smoking_cleaned ggplot() +
geom_boxplot(aes(x = LDL_chole, y=smoking_status),outlier.shape = NA, size=1) +
labs(x = 'LDL cholesterol [mg/dL]', y = NULL) +
geom_vline(
xintercept = 160,
color = 'red',
linetype = 'dashed',
size = 1.5,
+
)scale_x_continuous(
# Set the lower & upper boundaries
limits = c(0, 220),
# Explicitly set the break points
# breaks = c(break1, break2, etc.)
# # Adjust the axis so bars start at 0
expand = expand_scale(mult = c(0, 0.05)))+
labs(
title = 'No significant changes in LDL cholesterol'
+
)theme(
# Adjusting font sizes for various text elements
plot.title = element_text(size = 16), # Change plot title font size
axis.title.x = element_text(size = 14), # Change X-axis label font size
axis.title.y = element_text(size = 14), # Change Y-axis label font size
axis.text = element_text(size = 12) # Change axis tick label font size
# Add other theme settings as needed
)