Comparisons

# Week 8: .fancy[Comparisons]

### EMSE 4572 / 6572: Exploratory Data Analysis
### John Paul Helveston
### October 16, 2024

---

# .fancy[.blue[Tip of the week]]

# Shortcut keys

---

## 1) Quick shortcuts

Insert a `<-` operator:

- **Windows**: `ALT` + `-`
- **Mac**: `OPTION` + `-`

Insert a `%>%` operator:

- **Windows**: `CTRL` + `SHIFT` + `M`
- **Mac**: `COMMAND` + `SHIFT` + `M`

---

## 2) Edit multiple lines of code at once

1. Press and hold `ALT` (Windows) or `OPTION` (Mac)
2. Click and drag across multiple lines of code

https://twitter.com/i/status/995394452821721088

---

## "At the heart of quantitative reasoning is a single question: **Compared to what?**"

## -- Edward Tufte

---

## Today's data

``` r
college_all_ages <- read_csv(here('data', 'college_all_ages.csv'))
gapminder        <- read_csv(here('data', 'gapminder.csv'))
marathon         <- read_csv(here('data', 'marathon.csv'))
milk_production  <- read_csv(here('data', 'milk_production.csv'))
*internet_regions <- read_csv(here('data', 'internet_users_region.csv'))
```

## New packages

``` r
install.packages("ggrepel")
install.packages("ggridges")
```

---

# Week 8: .fancy[Comparisons]

## 1. Comparing to a reference
## 2. Comparing variables

## BREAK

## 3. Comparing distributions

---

# Week 8: .fancy[Comparisons]

## 1. .orange[Comparing to a reference]
## 2. Comparing variables

## BREAK

## 3. Comparing distributions

<!--
Comparing things to a reference line:
- Add a simple line
- diverging bars / lollipops,

- In any of these plots, adding a benchmark can be really useful
- Another way is to compare things to a **computed** benchmark, like the mean - diverging bars / lollipops
-->

---

## .center[For this section, we'll be using this data frame:]

``` r
gapminder_americas <- gapminder %>%
  filter(continent == "Americas", year == 2007) %>%
  mutate(country = fct_reorder(country, lifeExp))
```
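
`fct_reorder()` is what puts the countries in order: it reorders the levels of a factor by a summary of another variable (the median, by default). A minimal sketch of the same idea, using ggplot2's built-in `mpg` data as a stand-in and assuming ggplot2 and forcats are installed:

```r
library(ggplot2)  # provides the built-in mpg data
library(forcats)

# Reorder vehicle classes by their median highway mileage (the default .fun)
class_by_hwy <- fct_reorder(mpg$class, mpg$hwy)

# tapply() returns the medians in level order, so they should be non-decreasing
level_medians <- tapply(mpg$hwy, class_by_hwy, median)
level_medians
```

Because `gapminder_americas` has one row per country, each country's "median" is simply its own 2007 life expectancy.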

---

## Use reference lines to add context to a chart

---

## Or make zero the reference line

---

## How to add a reference line

Add horizontal line with `geom_hline()`

Add vertical line with `geom_vline()`

``` r
ggplot(gapminder_americas) +
  geom_point(
    aes(x = lifeExp, y = country),
    color = 'steelblue', size = 2.5) +
* geom_vline(
*   xintercept = mean(gapminder_americas$lifeExp),
*   color = 'red', linetype = 'dashed') +
  theme_minimal_vgrid() +
  labs(x = 'Life expectancy (years)',
       y = 'Country')
```

---

## How to add a reference line

Add text with `annotate()`

``` r
ggplot(gapminder_americas) +
  geom_point(
    aes(x = lifeExp, y = country),
    color = 'steelblue', size = 2.5) +
  geom_vline(
    xintercept = mean(gapminder_americas$lifeExp),
    color = 'red', linetype = 'dashed') +
  annotate(
*   'text', x = 73.2, y = 'Puerto Rico',
*   color = 'red', hjust = 1,
*   label = 'Mean Life\nExpectancy') +
  theme_minimal_vgrid() +
  labs(x = 'Life expectancy (years)',
       y = 'Country')
```
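
The `annotate()` call above hard-codes the label position (`x = 73.2`). A sketch that computes the reference value once and reuses it for both the line and the label, with the built-in `mtcars` data standing in for `gapminder_americas` (the `- 0.5` offset and the label text are arbitrary choices):

```r
library(ggplot2)
library(dplyr)
library(forcats)

# mtcars (built into R) stands in for gapminder_americas; car names are row names
car_data <- mtcars %>%
  mutate(car = fct_reorder(rownames(mtcars), mpg))

# Compute the reference value once and reuse it for the line and the label
mean_mpg <- mean(car_data$mpg)

p <- ggplot(car_data) +
  geom_point(aes(x = mpg, y = car), color = 'steelblue', size = 2.5) +
  geom_vline(xintercept = mean_mpg, color = 'red', linetype = 'dashed') +
  # Right-justify the label just left of the line, on the top row of the chart
  annotate('text', x = mean_mpg - 0.5,
           y = levels(car_data$car)[nlevels(car_data$car)],
           color = 'red', hjust = 1, label = 'Mean MPG') +
  labs(x = 'Miles per gallon', y = NULL)
```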

---

## How to make zero the reference point

``` r
gapminder_diverging <- gapminder_americas %>%
    mutate(
        # Subtract the mean
*       lifeExp = lifeExp - mean(lifeExp),
        # Define the fill color
*       color = ifelse(lifeExp > 0, 'Above', 'Below'))
```

``` r
ggplot(gapminder_diverging) +
  geom_col(
*   aes(x = lifeExp, y = country, fill = color),
    width = 0.7, alpha = 0.8) +
  scale_fill_manual(
*   values = c('steelblue', 'red')) +
  theme_minimal_vgrid() +
  theme(legend.position = 'none') +
  labs(
    x = 'Difference from mean life expectancy (years)',
    y = 'Country')
```
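
`scale_fill_manual()` matches unnamed `values` to the factor levels in order, so the colors above depend on `'Above'` sorting before `'Below'`. A sketch of the same diverging-bar technique with a *named* vector, which ties each color to its category explicitly; `mtcars` stands in for the gapminder data and the column names are illustrative:

```r
library(ggplot2)
library(dplyr)
library(forcats)

car_diverging <- mtcars %>%
  mutate(
    car = fct_reorder(rownames(mtcars), mpg),
    # Center on the mean so zero becomes the reference point
    mpg_diff = mpg - mean(mpg),
    direction = if_else(mpg_diff > 0, 'Above', 'Below'))

p <- ggplot(car_diverging) +
  geom_col(aes(x = mpg_diff, y = car, fill = direction),
           width = 0.7, alpha = 0.8) +
  # Named values map each color to a category, independent of level order
  scale_fill_manual(values = c(Above = 'steelblue', Below = 'red')) +
  theme(legend.position = 'none') +
  labs(x = 'Difference from mean MPG', y = NULL)
```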

---

### Your turn - comparing to a reference

Make a ranking chart and either a) add a reference line or b) show the ranking as the _difference_ from a reference (e.g. the mean). Use any dataset you want in the "data" folder (examples below using `milk_production.csv`)

---

# Week 8: .fancy[Comparisons]

## 1. Comparing to a reference
## 2. .orange[Comparing variables]

## BREAK

## 3. Comparing distributions

<!--
Comparing categories with facets

Comparing two things (dodged bars, slope chart, dumbbell chart)

- dodged comparisons are fine, but really no more than 2 things.
- Finally, overlapping bars are great when you want to show when something exceeds a threshold. E.g. going over your budget.
- Using facets to break up 3-4 groups of 2 is okay.
- A better approach for multiple categories:
    - slope charts
    - dumbbell charts
-->

---

## Neither of these charts is great

---

## "Parallel Coordinates" plots work well

``` r
diamonds %>%
  count(clarity, cut) %>%
  ggplot(
    aes(x = clarity, y = n,
*       color = cut, group = cut)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(limits = c(0, 5100)) +
  theme_half_open(font_size = 18) +
  labs(y = "Count")
```

---

## Consider facets for **comparing across categories**

``` r
diamonds %>%
  count(clarity, cut) %>%
  ggplot() +
  geom_col(aes(x = clarity, y = n),
           width = 0.7) +
* facet_wrap(vars(cut), nrow = 1) +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.05))) +
  theme_minimal_hgrid(font_size = 16)
```

---

## Consider facets for **comparing across categories**

``` r
diamonds %>%
  count(clarity, cut) %>%
  mutate(n = n / 1000) %>%
  ggplot() +
  geom_col(aes(x = clarity, y = n),
           width = 0.7) +
* facet_wrap(vars(cut), ncol = 2) +
* coord_flip() +
  scale_y_continuous(
    expand = expansion(mult = c(0, 0.05))) +
* theme_minimal_vgrid(font_size = 16) +
  labs(y = "Count (thousands)")
```

---

.right[<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br> From [Financial Times](https://www.ft.com/coronavirus-latest)]

---

## When comparing across multiple categories, consider:

## Parallel coordinates charts

## Faceting

---

## .center[When comparing **only 2** things,<br>dodged bars are a good starting point]

``` r
milk_compare <- milk_production %>%
  filter(year %in% c(1970, 2017)) %>%
  mutate(state = fct_other(state,
    keep = c('California', 'Wisconsin'))) %>%
  group_by(year, state) %>%
  summarise(
    milk_produced = sum(milk_produced) / 10^9)
```

```
#> # A tibble: 6 × 3
#> # Groups:   year [2]
#>    year state      milk_produced
#>   <dbl> <fct>              <dbl>
#> 1  1970 California          9.46
#> 2  1970 Wisconsin          18.4 
#> 3  1970 Other              89.1 
#> 4  2017 California         39.8 
#> 5  2017 Wisconsin          30.3 
#> 6  2017 Other             145.
```
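
The `# Groups:   year [2]` line shows that `summarise()` left the result grouped by `year`. A sketch of the same lump-and-summarise pattern on ggplot2's built-in `mpg` data (model years 1999 and 2008), adding `.groups = 'drop'` to return an ungrouped tibble; the two manufacturers kept are an arbitrary choice:

```r
library(ggplot2)  # built-in mpg data
library(dplyr)
library(forcats)

mpg_compare <- mpg %>%
  # Keep two manufacturers; lump every other one into "Other"
  mutate(manufacturer = fct_other(manufacturer, keep = c('ford', 'toyota'))) %>%
  group_by(year, manufacturer) %>%
  # .groups = 'drop' returns an ungrouped tibble instead of one grouped by year
  summarise(n_models = n(), .groups = 'drop')

mpg_compare
```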

---

## .center[When comparing **only 2** things,<br>dodged bars are a good starting point]

``` r
ggplot(milk_compare) +
  geom_col(
    aes(x = milk_produced, y = state,
        fill = as.factor(year)),
    width = 0.7, alpha = 0.8,
*   position = 'dodge') +
  scale_fill_manual(
      values = c('grey', 'steelblue'),
      guide  = guide_legend(reverse = TRUE)) +
  scale_x_continuous(
    expand = expansion(mult = c(0, 0.05))) +
  theme_minimal_vgrid() +
  labs(
    x = 'Milk produced (billion lbs)',
    y = NULL,
    fill = 'Year')
```

---

## Avoid putting >2 categories in legend (if possible)

---

## .center[Or use facets to get rid of the legend!]

---

## "Bullet" charts are also effective for comparing **2** things

---

## How to make a "bullet" chart

``` r
milk_compare %>%
* pivot_wider(
*     names_from = year,
*     values_from = milk_produced) %>%
  ggplot() +
  geom_col(
    aes(x = `1970`, y = state, fill = '1970'),
*       width = 0.7) +
  geom_col(
    aes(x = `2017`, y = state, fill = '2017'),
*       width = 0.3) +
  scale_fill_manual(
    values = c('grey', 'black')) +
  scale_x_continuous(
      expand = expansion(mult = c(0, 0.05))) +
  theme_minimal_vgrid(font_size = 18) +
  labs(
    x = 'Milk produced (billion lbs)',
    y = NULL,
    fill = "Year")
```
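
Backticks are needed above because `pivot_wider()` creates columns named `1970` and `2017`, which are not syntactic R names. A sketch of the same bullet chart on the built-in `mpg` data, using `pivot_wider()`'s `names_prefix` argument to produce `y1999` and `y2008` instead (the prefix is an arbitrary choice):

```r
library(ggplot2)
library(dplyr)
library(tidyr)

hwy_wide <- mpg %>%
  group_by(year, class) %>%
  summarise(mean_hwy = mean(hwy), .groups = 'drop') %>%
  # names_prefix yields syntactic column names, so no backticks are needed
  pivot_wider(names_from = year, values_from = mean_hwy, names_prefix = 'y')

p <- ggplot(hwy_wide) +
  # Wide bar for the earlier year...
  geom_col(aes(x = y1999, y = class, fill = '1999'), width = 0.7) +
  # ...narrow bar for the later year, drawn on top
  geom_col(aes(x = y2008, y = class, fill = '2008'), width = 0.3) +
  scale_fill_manual(values = c('grey', 'black')) +
  labs(x = 'Mean highway MPG', y = NULL, fill = 'Year')
```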

---

## With **more than 2** things, dodged bars can get confusing

Still comparing 2 time periods, but across **10** categories

---

### Strategies for comparing 2 things across **more than 2 categories**

**Dodged bars 😢**

**Dumbbell bars 😄**

---

### Strategies for comparing 2 things across **more than 2 categories**

**Dodged bars 😿**

**Slope chart 😄**

---

Dumbbell charts highlight:

- Differences in **magnitude** across two periods / groups

Slope charts highlight:

- _Changes_ in **rankings**
- Individual categories (e.g., one highlighted line)

---

## How to make a **Dumbbell chart**

Create data frame for plotting

``` r
top10states <- milk_production %>%
    filter(year == 2017) %>%
    arrange(desc(milk_produced)) %>%
    slice(1:10)

milk_summary_dumbbell <- milk_production %>%
  filter(
    year %in% c(1970, 2017),
    state %in% top10states$state) %>%
  mutate(
    # Reorder state variables
    state = fct_reorder2(state,
      year, desc(milk_produced)),
    # Convert year to discrete variable
    year = as.factor(year),
    # Modify units
    milk_produced = milk_produced / 10^9)
```
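
`fct_reorder2()` orders levels by the `.y` value at the largest `.x` (here, each state's 2017 production), sorted in descending order by default. A sketch on a hypothetical four-row tibble (illustrative values, not real data) shows how wrapping the values in `desc()` reverses that order:

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: two states observed in two years
toy <- tibble(
  state = c('A', 'A', 'B', 'B'),
  year = c(1970, 2017, 1970, 2017),
  milk_produced = c(5, 1, 2, 9))

# Default: value at the latest year, sorted high to low
levels(fct_reorder2(toy$state, toy$year, toy$milk_produced))
#> [1] "B" "A"

# desc() flips the order: the smallest 2017 value comes first
levels(fct_reorder2(toy$state, toy$year, desc(toy$milk_produced)))
#> [1] "A" "B"
```

Since the first factor level is drawn at the bottom of a categorical y axis, the `desc()` version places the largest producer at the top.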

---

## How to make a **Dumbbell chart**

Make lines (note the `group` variable)

``` r
ggplot(milk_summary_dumbbell,
*      aes(x = milk_produced, y = state)) +
* geom_line(aes(group = state),
            color = 'lightblue', linewidth = 1)
```

---

## How to make a **Dumbbell chart**

Add points (note the `color` variable)

``` r
ggplot(milk_summary_dumbbell,
       aes(x = milk_produced, y = state)) +
  geom_line(aes(group = state),
            color = 'lightblue', linewidth = 1) +
* geom_point(aes(color = year), size = 2.5)
```

---

## How to make a **Dumbbell chart**

Change the colors:

---

## How to make a **Dumbbell chart**

Adjust the theme and annotate

``` r
ggplot(milk_summary_dumbbell,
       aes(x = milk_produced, y = state)) +
  geom_line(aes(group = state),
            color = 'lightblue', linewidth = 1) +
  geom_point(aes(color = year), size = 2.5) +
  scale_color_manual(
      values = c('lightblue', 'steelblue')) +
* theme_minimal_vgrid() +
  # Remove y axis line and tick marks
  theme(
    axis.line.y = element_blank(),
*   axis.ticks.y = element_blank()) +
* labs(x = 'Milk produced (billion lbs)',
*      y = 'State',
*      color = 'Year',
*      title = 'Top 10 milk producing states',
*      subtitle = '(1970 - 2017)')
```
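
The chart above draws each dumbbell's bar with `geom_line()` and `group = state`. An alternative (not the approach used in these slides) is `geom_segment()` on a wide data frame with one row per category. A sketch on the built-in `mpg` data, assuming ggplot2 3.4.0 or later for the `linewidth` argument:

```r
library(ggplot2)
library(dplyr)
library(tidyr)

hwy_long <- mpg %>%
  group_by(year, class) %>%
  summarise(mean_hwy = mean(hwy), .groups = 'drop') %>%
  mutate(year = as.factor(year))

# One row per class holding both endpoints, as geom_segment() expects
hwy_ends <- hwy_long %>%
  pivot_wider(names_from = year, values_from = mean_hwy, names_prefix = 'y')

p <- ggplot() +
  geom_segment(data = hwy_ends,
               aes(x = y1999, xend = y2008, y = class, yend = class),
               color = 'lightblue', linewidth = 1) +
  geom_point(data = hwy_long,
             aes(x = mean_hwy, y = class, color = year), size = 2.5) +
  scale_color_manual(values = c('lightblue', 'steelblue')) +
  labs(x = 'Mean highway MPG', y = NULL, color = 'Year')
```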

---

Create data frame for plotting

``` r
top10states <- milk_production %>%
    filter(year == 2017) %>%
    arrange(desc(milk_produced)) %>%
    slice(1:10)

milk_summary_slope <- milk_production %>%
  filter(
    year %in% c(1970, 2017),
    state %in% top10states$state) %>%
  mutate(
    # Reorder state variables
    state = fct_reorder2(state,
      year, desc(milk_produced)),
    # Convert year to discrete variable
    year = as.factor(year),
    # Modify units
    milk_produced = milk_produced / 10^9,
*   # Define line color
*   lineColor = if_else(
*     state == 'California', 'CA', 'other'),
*   # Make labels
*   label = paste(state, ' (',
*                 round(milk_produced), ')'),
*   label_left = ifelse(year == 1970, label, NA),
*   label_right = ifelse(year == 2017, label, NA))
```
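
The `label` column is built with `paste()`, whose default separator is a space; that is why the labels print as `New York  ( 10 )`. A quick comparison with `paste0()`, which adds no separator, using New York's 1970 value from the table on the next slide:

```r
# paste() inserts its default separator (" ") between every piece
paste('New York', ' (', round(10.3), ')')
#> [1] "New York  ( 10 )"

# paste0() adds no separator, so spacing comes only from the strings themselves
paste0('New York', ' (', round(10.3), ')')
#> [1] "New York (10)"
```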

## .center[How to make a<br>**Slope chart**]

---

Create data frame for plotting

``` r
top10states <- milk_production %>%
    filter(year == 2017) %>%
    arrange(desc(milk_produced)) %>%
    slice(1:10)
```

## .center[How to make a<br>**Slope chart**]

``` r
milk_summary_slope %>%
    select(state, year, milk_produced, label, lineColor)
```

```
#> # A tibble: 20 × 5
#>    state        year  milk_produced label                lineColor
#>    <fct>        <fct>         <dbl> <chr>                <chr>    
#>  1 New York     1970         10.3   New York  ( 10 )     other    
#>  2 Pennsylvania 1970          7.12  Pennsylvania  ( 7 )  other    
#>  3 Michigan     1970          4.60  Michigan  ( 5 )      other    
#>  4 Wisconsin    1970         18.4   Wisconsin  ( 18 )    other    
#>  5 Minnesota    1970          9.64  Minnesota  ( 10 )    other    
#>  6 Texas        1970          3.06  Texas  ( 3 )         other    
#>  7 Idaho        1970          1.49  Idaho  ( 1 )         other    
#>  8 New Mexico   1970          0.304 New Mexico  ( 0 )    other    
#>  9 Washington   1970          2.09  Washington  ( 2 )    other    
#> 10 California   1970          9.46  California  ( 9 )    CA       
#> 11 New York     2017         14.9   New York  ( 15 )     other    
#> 12 Pennsylvania 2017         10.9   Pennsylvania  ( 11 ) other    
#> 13 Michigan     2017         11.2   Michigan  ( 11 )     other    
#> 14 Wisconsin    2017         30.3   Wisconsin  ( 30 )    other    
#> 15 Minnesota    2017          9.86  Minnesota  ( 10 )    other    
#> 16 Texas        2017         12.1   Texas  ( 12 )        other    
#> 17 Idaho        2017         14.6   Idaho  ( 15 )        other    
#> 18 New Mexico   2017          8.21  New Mexico  ( 8 )    other    
#> 19 Washington   2017          6.53  Washington  ( 7 )    other    
#> 20 California   2017         39.8   California  ( 40 )   CA
```

---

## How to make a **Slope chart**

Start with a line plot - note the `group` variable:

``` r
ggplot(milk_summary_slope,
*      aes(x = year, y = milk_produced,
*          group = state)) +
*   geom_line(aes(color = lineColor))
```
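
Without `group`, ggplot2 groups the data by the discrete `x` variable (`year`), so it cannot connect each state across the two years. A sketch on the built-in `mpg` data comparing the groups ggplot2 computes with and without `group`:

```r
library(ggplot2)
library(dplyr)

hwy_slope <- mpg %>%
  group_by(year, class) %>%
  summarise(mean_hwy = mean(hwy), .groups = 'drop') %>%
  mutate(year = as.factor(year))

# Without group: ggplot2 groups by the discrete x (year), giving 2 groups and no slopes
p_no_group <- ggplot(hwy_slope, aes(x = year, y = mean_hwy)) +
  geom_line()

# With group = class: each class is one line joining its two years
p_group <- ggplot(hwy_slope, aes(x = year, y = mean_hwy, group = class)) +
  geom_line()
```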

---

## How to make a **Slope chart**

Add labels:

``` r
ggplot(milk_summary_slope,
       aes(x = year, y = milk_produced,
           group = state)) +
    geom_line(aes(color = lineColor)) +
    # Add 1970 labels (left side)
*   geom_text(aes(label = label_left),
*             hjust = 1, nudge_x = -0.05) +
    # Add 2017 labels (right side)
*   geom_text(aes(label = label_right),
*             hjust = 0, nudge_x = 0.05)
```

Justification | `hjust`
--------------|-------
Left          | 0
Center        | 0.5
Right         | 1
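
The `label_left` / `label_right` columns rely on `NA` labels being dropped (ggplot2 warns about the removed rows). An alternative sketch passes a function (written as a formula) as each `geom_text()` layer's `data`, so each layer receives only the rows it labels; `mpg` stands in for the milk data:

```r
library(ggplot2)
library(dplyr)

hwy_slope <- mpg %>%
  group_by(year, class) %>%
  summarise(mean_hwy = mean(hwy), .groups = 'drop') %>%
  mutate(year = as.factor(year),
         label = paste0(class, ' (', round(mean_hwy), ')'))

p <- ggplot(hwy_slope, aes(x = year, y = mean_hwy, group = class)) +
  geom_line() +
  # Layer data as a function: filter the plot data for this layer only
  geom_text(data = ~ filter(.x, year == '1999'), aes(label = label),
            hjust = 1, nudge_x = -0.05) +
  geom_text(data = ~ filter(.x, year == '2008'), aes(label = label),
            hjust = 0, nudge_x = 0.05)
```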

---

## Overlapping labels?

---

## Overlapping labels?<br>**ggrepel** library to the rescue!

---

## How to make a **Slope chart**

Align labels so they don't overlap:

``` r
*library(ggrepel)

ggplot(milk_summary_slope,
       aes(x = year, y = milk_produced,
           group = state)) +
    geom_line(aes(color = lineColor)) +
    # Add 1970 labels (left side)
*   geom_text_repel(
      aes(label = label_left),
      hjust = 1, nudge_x = -0.05,
*     direction = 'y', segment.color = 'grey') +
    # Add 2017 labels (right side)
*   geom_text_repel(
      aes(label = label_right),
      hjust = 0, nudge_x = 0.05,
*     direction = 'y', segment.color = 'grey')
```

---

## How to make a **Slope chart**

Adjust colors:

``` r
ggplot(milk_summary_slope,
       aes(x = year, y = milk_produced,
           group = state)) +
    geom_line(aes(color = lineColor)) +
    geom_text_repel(
      aes(label = label_left),
      hjust = 1, nudge_x = -0.05,
      direction = 'y', segment.color = 'grey') +
    geom_text_repel(
      aes(label = label_right),
      hjust = 0, nudge_x = 0.05,
      direction = 'y', segment.color = 'grey') +
    # Move year labels to top, modify line colors
*   scale_x_discrete(position = 'top') +
*   scale_color_manual(values = c('red', 'black'))
```

---

Adjust the theme and annotate

``` r
ggplot(milk_summary_slope,
       aes(x = year, y = milk_produced,
           group = state)) +
    geom_line(aes(color = lineColor)) +
    # Add 1970 labels (left side)
    geom_text_repel(
      aes(label = label_left),
      hjust = 1, nudge_x = -0.05,
      direction = 'y', segment.color = 'grey') +
    # Add 2017 labels (right side)
    geom_text_repel(aes(label = label_right),
      hjust = 0, nudge_x = 0.05,
      direction = 'y', segment.color = 'grey') +
    # Move year labels to top, modify line colors
    scale_x_discrete(position = 'top') +
    scale_color_manual(values = c('red', 'black')) +
    # Annotate & adjust theme
    labs(x = NULL,
         y = 'Milk produced (billion lbs)',
         title = 'Top 10 milk producing states (1970 - 2017)') +
*   theme_minimal_grid() +
*   theme(panel.grid  = element_blank(),
*         axis.text.y = element_blank(),
*         axis.ticks = element_blank(),
*         legend.position = 'none')
```

## .center[How to make a<br>**Slope chart**]

---

## Your turn - comparing multiple categories

Using the `internet_regions` data frame, pick a strategy and create an improved version of this chart.

Strategies:

- Dodged bars
- Facets
- Bullet chart
- Dumbell chart
- Slope chart

---

# Break!

## Stand up, Move around, Stretch!

---

# Week 8: .fancy[Comparisons]

## 1. Comparing to a reference
## 2. Comparing variables

## BREAK

## 3. .orange[Comparing distributions]

<!-- Comparing distributions

- Box plots
- Transparent histograms & densities (good for maybe 2 categories)
- Ridgeline plots (good for lots of categories)

Ridgeline plots: https://wilkelab.org/ggridges/ -->

---

## Overlapping histograms have issues

### Bad

### Slightly better

---

## Good when number of categories is **small**

### Density facets

### Diverging histograms

---

## Good when number of categories is **large**

### Boxplot
<img src="figs/college_boxplot-1.png" width="504" style="display: block; margin: auto;" />

### Ridgeplot

---

## How to make density facets

You can use `facet_wrap()`, but<br>you won't get the full density overlay

``` r
ggplot(marathon,
       aes(x = Age, y = after_stat(count),
           fill = `M/F`)) +
*   geom_density(alpha = 0.7) +
*   facet_wrap(vars(`M/F`)) +
    scale_fill_manual(
        values = c('sienna', 'steelblue')) +
    scale_y_continuous(
        expand = expansion(mult = c(0, 0.05))) +
    theme_minimal_hgrid()
```
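
`after_stat(count)` is the current spelling of `..count..`, which ggplot2 deprecated in version 3.4.0. For densities, `count` is the density multiplied by the number of observations in the group, which scales each curve by its group size. A sketch that checks this on the built-in `mpg` data, with `drv` playing the role of `M/F`:

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = hwy, y = after_stat(count), fill = drv)) +
  geom_density(alpha = 0.7)

# Inspect the columns stat_density() computed for each group
d <- ggplot_build(p)$data[[1]]
head(d[, c('group', 'density', 'count')])
```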

---

## How to make density facets

Make the full density plot first

``` r
base <- ggplot(marathon,
*              aes(x = Age, y = after_stat(count))) +
*   geom_density(fill = 'grey', alpha = 0.7) +
    scale_y_continuous(
        expand = expansion(mult = c(0, 0.05))) +
    theme_minimal_hgrid()
```

---

## How to make density facets

Separately create each sub-plot

``` r
male <- base +
  geom_density(
*   data = marathon %>%
*     filter(`M/F` == 'M'),
    fill = 'steelblue', alpha = 0.7) +
  theme(legend.position = 'none')
```

``` r
female <- base +
  geom_density(
*   data = marathon %>%
*     filter(`M/F` == 'F'),
    fill = 'sienna', alpha = 0.7) +
  theme(legend.position = 'none')
```

---

## How to make density facets

``` r
plot_grid(male, female,
          labels = c('Male', 'Female'))
```
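
An alternative to assembling separate plots with `plot_grid()` (not the method used above) is a background layer whose data omits the faceting column: `facet_wrap()` repeats such a layer, unsplit, in every panel. A sketch on the built-in `mpg` data, with `drv` playing the role of `M/F`:

```r
library(ggplot2)
library(dplyr)

p <- ggplot(mpg, aes(x = hwy, y = after_stat(count))) +
  # Drop the faceting column so this full-data layer appears in every panel
  geom_density(data = ~ select(.x, -drv), fill = 'grey85', color = NA) +
  # Per-group density, split across panels by facet_wrap()
  geom_density(aes(fill = drv), alpha = 0.7) +
  facet_wrap(vars(drv), nrow = 1) +
  theme(legend.position = 'none')
```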

---

## How to make diverging histograms

Make the histograms by filtering the data

``` r
ggplot(marathon, aes(x = Age)) +
    # Add histogram for Female runners:
    geom_histogram(
*     data = marathon %>%
*       filter(`M/F` == 'F'),
      aes(fill = `M/F`, y = after_stat(count)),
      alpha = 0.7, color = 'white') +
    # Add negative histogram for Male runners:
    geom_histogram(
*     data = marathon %>%
*       filter(`M/F` == 'M'),
      aes(fill = `M/F`, y = after_stat(-count)),
      alpha = 0.7, color = 'white')
```

---

## How to make diverging histograms

Map `fill` to gender, and negate the male counts so those bars point the other way

``` r
ggplot(marathon, aes(x = Age)) +
    # Add histogram for Female runners:
    geom_histogram(
      data = marathon %>%
        filter(`M/F` == 'F'),
*     aes(fill = `M/F`, y = after_stat(count)),
      alpha = 0.7, color = 'white') +
    # Add negative histogram for Male runners:
    geom_histogram(
      data = marathon %>%
        filter(`M/F` == 'M'),
*     aes(fill = `M/F`, y = after_stat(-count)),
      alpha = 0.7, color = 'white')
```

---

## How to make diverging histograms

``` r
ggplot(marathon, aes(x = Age)) +
    # Add histogram for Female runners:
    geom_histogram(
      data = marathon %>%
        filter(`M/F` == 'F'),
      aes(fill = `M/F`, y = after_stat(count)),
      alpha = 0.7, color = 'white') +
    # Add negative histogram for Male runners:
    geom_histogram(
      data = marathon %>%
        filter(`M/F` == 'M'),
      aes(fill = `M/F`, y = after_stat(-count)),
      alpha = 0.7, color = 'white') +
*   scale_fill_manual(
        values = c('sienna', 'steelblue')) +
*   coord_flip() +
*   theme_minimal_hgrid() +
*   labs(fill = 'Gender',
*        y    = 'Count')
```
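
Negating the counts leaves negative tick labels on the count axis. A sketch of the same diverging histogram on the built-in `mpg` data (1999 vs. 2008 models) that negates inside `after_stat()` and uses `labels = abs` to display magnitudes; the `binwidth` of 2 is an arbitrary choice:

```r
library(ggplot2)
library(dplyr)

p <- ggplot(mpg, aes(x = hwy)) +
  # 1999 models point one way...
  geom_histogram(data = ~ filter(.x, year == 1999),
                 aes(y = after_stat(count), fill = '1999'),
                 binwidth = 2, alpha = 0.7, color = 'white') +
  # ...2008 models point the other way: negate the count inside after_stat()
  geom_histogram(data = ~ filter(.x, year == 2008),
                 aes(y = after_stat(-count), fill = '2008'),
                 binwidth = 2, alpha = 0.7, color = 'white') +
  # Label the count axis with absolute values
  scale_y_continuous(labels = abs) +
  scale_fill_manual(values = c('sienna', 'steelblue')) +
  coord_flip() +
  labs(fill = 'Model year', y = 'Count')
```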

---

## How to make ridgeplots

Make a ridgeplot with **ggridges** library

``` r
*library(ggridges)

college_all_ages %>%
  mutate(
    major_category = fct_reorder(
      major_category, median)) %>%
  ggplot() +
* geom_density_ridges(
*   aes(x = median, y = major_category),
*   scale = 4, alpha = 0.7) +
  scale_y_discrete(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0, 0)) +
* coord_cartesian(clip = "off") +
* theme_ridges() +
  labs(x = 'Median income ($)',
       y = 'Major category')
```

---

## Your turn - comparing distributions

Use the `gapminder.csv` data to create the following charts comparing the distribution of life expectancy across continents in 2007 (one value per country).

---

# Reminder:<br>Your [Progress Report](https://eda.seas.gwu.edu/2024-Fall/project/2-progress-report.html) is due in 10 days!