Getting Started


# Week 1: .fancy[Getting Started]

### EMSE 4575: Exploratory Data Analysis
### John Paul Helveston
### August 31, 2022


---

# Week 1: .fancy[Getting Started]

## 1. Course Goal
## 2. Course Introduction
## 3. Break: Install Stuff
## 4. Workflow & Reading In Data
## 5. Wrangling Data
## 6. Visualizing Data

---

# Week 1: .fancy[Getting Started]

## 1. .orange[Course Goal]
## 2. Course Introduction
## 3. Break: Install Stuff
## 4. Workflow & Reading In Data
## 5. Wrangling Data
## 6. Visualizing Data

---

## Course 1: [Intro to Programming for Analytics](https://p4a.seas.gwu.edu/)

**"Computational Literacy"**

- Programming: Conditionals (if/else), loops, functions, testing, data types.
- Analytics: Data structures, import / export, basic data manipulation & visualization.

## Course 2: [Exploratory Data Analysis](https://eda.seas.gwu.edu/)

**"Data Literacy"**

- Strategies for conducting an exploratory data analysis.
- Design principles for visualizing and communicating _information_ extracted from data.
- Reproducibility: Reports that contain code, equations, visualizations, and narrative text.

---

# **Class goal**: translate _data_ into _information_

---

# **Class goal**: translate _data_ into _information_

**Data**

Average student engagement scores

Class       | Type    | City | County
------------|---------|------|-------
Special Ed. | Charter | 643  | 793
Special Ed. | Public  | 735  | 928
General Ed. | Charter | 590  | 724
General Ed. | Public  | 863  | 662


**Information**


---

# Data exploration: an iterative process

Encode data:

```r
engagement_data <- data.frame(
    City   = c(643, 735, 590, 863),
    County = c(793, 928, 724, 662),
    School = c('Special Ed., Charter', 'Special Ed., Public',
               'General Ed., Charter', 'General Ed., Public'))
engagement_data
```

```
#>   City County               School
#> 1  643    793 Special Ed., Charter
#> 2  735    928  Special Ed., Public
#> 3  590    724 General Ed., Charter
#> 4  863    662  General Ed., Public
```


Re-format data for plotting:

```r
library(tidyverse)  # provides %>%, gather(), mutate(), and fct_relevel()

engagement_data <- engagement_data %>%
    gather(Location, Engagement, City:County) %>%
    mutate(Location = fct_relevel(
      Location, c('City', 'County')))
engagement_data
```

```
#>                 School Location Engagement
#> 1 Special Ed., Charter     City        643
#> 2  Special Ed., Public     City        735
#> 3 General Ed., Charter     City        590
#> 4  General Ed., Public     City        863
#> 5 Special Ed., Charter   County        793
#> 6  Special Ed., Public   County        928
#> 7 General Ed., Charter   County        724
#> 8  General Ed., Public   County        662
```
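In current versions of **tidyr**, `gather()` is superseded by `pivot_longer()`. A minimal sketch of the same reshape with `pivot_longer()` (the names `engagement_wide` and `engagement_long` are just for illustration); the values match the output above, but the rows come out grouped by school rather than by location:

```r
library(tidyverse)

engagement_wide <- data.frame(
    City   = c(643, 735, 590, 863),
    County = c(793, 928, 724, 662),
    School = c('Special Ed., Charter', 'Special Ed., Public',
               'General Ed., Charter', 'General Ed., Public'))

# Stack the City and County columns into Location / Engagement pairs
engagement_long <- engagement_wide %>%
    pivot_longer(cols      = City:County,
                 names_to  = 'Location',
                 values_to = 'Engagement') %>%
    mutate(Location = fct_relevel(Location, c('City', 'County')))
engagement_long
```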


---

# Data exploration: an iterative process

Initial exploratory plotting:

```r
engagement_data %>%
    ggplot() +
    geom_col(aes(x = Engagement, y = School,
                 fill = Location),
             position = 'dodge')
```


More exploratory plotting:<br>highlight difference
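To find which difference is worth highlighting, one option is to compute it directly from the wide data. A small sketch using the same values as above (`engagement_wide` and `engagement_diff` are hypothetical names):

```r
library(dplyr)

engagement_wide <- data.frame(
    City   = c(643, 735, 590, 863),
    County = c(793, 928, 724, 662),
    School = c('Special Ed., Charter', 'Special Ed., Public',
               'General Ed., Charter', 'General Ed., Public'))

# County minus City engagement for each school type, smallest first
engagement_diff <- engagement_wide %>%
    mutate(Difference = County - City) %>%
    arrange(Difference)
engagement_diff
```

Only the public, general education classes have lower engagement in the county than in the city.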


---

# Data exploration: an iterative process

Directly label figure:


Remove unnecessary axes, change colors, fix labels:


---

**A fully reproducible analysis**

```r
library(tidyverse)  # provides %>%, gather(), mutate(), and fct_relevel()

data <- data.frame(
    City   = c(643, 735, 590, 863),
    County = c(793, 928, 724, 662),
    School = c('Special Ed., Charter', 'Special Ed., Public',
               'General Ed., Charter', 'General Ed., Public'),
    Highlight = c(0, 0, 0, 1)) %>%
    gather(Location, Engagement, City:County) %>%
    mutate(
      Location = fct_relevel(Location, c('City', 'County')),
      Highlight = as.factor(Highlight),
      x = ifelse(Location == 'County', 1, 0))
```


```r
library(ggrepel)  # geom_text_repel()
library(cowplot)  # theme_cowplot(), background_grid()

plot <- ggplot(data, aes(x = x, y = Engagement, group = School, color = Highlight)) +
    geom_point() +
    geom_line() +
    scale_color_manual(values = c('#757575', '#ed573e')) +
    labs(x = 'Location', y = 'Engagement',
         title = paste0('Students in public, general education classes\n',
                        'in county schools have surprisingly low engagement')) +
    scale_x_continuous(limits = c(-1.2, 1.2), labels = c('City', 'County'),
                       breaks = c(0, 1)) +
    # Values next to the County (right-hand) points
    geom_text_repel(aes(label = Engagement, color = as.factor(Highlight)),
                    data          = subset(data, Location == 'County'),
                    size          = 5,
                    nudge_x       = 0.1,
                    segment.color = NA) +
    # Values next to the City (left-hand) points
    geom_text_repel(aes(label = Engagement, color = as.factor(Highlight)),
                    data          = subset(data, Location == 'City'),
                    size          = 5,
                    nudge_x       = -0.1,
                    segment.color = NA) +
    # School labels to the left of the City points
    geom_text_repel(aes(label = School, color = as.factor(Highlight)),
                    data          = subset(data, Location == 'City'),
                    size          = 5,
                    nudge_x       = -0.25,
                    hjust         = 1,
                    segment.color = NA) +
    theme_cowplot() +
    background_grid(major = 'x') +
    theme(axis.line = element_blank(),
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks = element_blank(),
          legend.position = 'none')
```


---

# Week 1: .fancy[Getting Started]

## 1. Course Goal
## 2. .orange[Course Introduction]
## 3. Break: Install Stuff
## 4. Workflow & Reading In Data
## 5. Wrangling Data
## 6. Visualizing Data

---

# Meet your instructor!


### John Helveston, Ph.D.

- 2018 - Present Assistant Professor, Engineering Management & Systems Engineering
- 2016-2018 Postdoc at [Institute for Sustainable Energy](https://www.bu.edu/ise/), Boston University
- 2016 PhD in Engineering & Public Policy at Carnegie Mellon University
- 2015 MS in Engineering & Public Policy at Carnegie Mellon University
- 2010 BS in Engineering Science & Mechanics at Virginia Tech
- Website: [www.jhelvy.com](http://www.jhelvy.com/)


---

# Meet your tutors!


### **Michael Rossetti**

- Graduate Assistant (GA)
- PhD student in EMSE


---

# Meet your tutors!


### **Eliese Ottinger**

- Learning Assistant (LA)
- EMSE senior & P4A / EDA alum


---

# Prerequisites

## [EMSE 4574: Intro to Programming for Analytics](https://p4a.seas.gwu.edu/2020-Fall/)

You should be able to:

- Use RStudio to write basic R commands.
- Know the distinctions between different R operators and data types, including numeric, string, and logical data.
- Use **tidyverse** functions to wrangle and manipulate data in R.
- Use the **ggplot2** library to create plots in R.

> [Check out R for Analytics Primer](http://jhelvy.github.io/r4aPrimer/)

---

# Course website

## Everything you need will be on the course website:<br>https://eda.seas.gwu.edu/2022-Fall/

---

# **Quizzes** (8% of grade)

> **Why quiz at all?** The "retrieval effect": you have to _practice_ remembering things, otherwise your brain won't retain them (see the book ["Make It Stick: The Science of Successful Learning"](https://www.hup.harvard.edu/catalog.php?isbn=9780674729018)).

---

## Assignments

## 1) Weekly Homework / Readings: [HW1](https://eda.seas.gwu.edu/2022-Fall/hw/1-tidy-data.html)

## 2) 3 Mini Projects (due 2 weeks after the date assigned)

## 3) [Final Project](https://eda.seas.gwu.edu/2022-Fall/project-final/0-overview.html) (teams of 2-3 students)

Item            | Due Date
----------------|---------------
Proposal        | March 12
Progress Report | April 16
Final Report    | April 30
Presentation    | May 03
Interview       | Exam week

---

background-color: #FFF

# .center[Grades]

---

# .center[Grades]

Item                           | Weight | Notes
-------------------------------|--------|-------------------------------------
Weekly HW                      | 12 %   |
Quizzes                        | 8 %    | 5 quizzes, lowest dropped
Mini Project 1                 | 8 %    | Individual assignments
Mini Project 2                 | 8 %    |
Mini Project 3                 | 8 %    |
Final Project: Proposal        | 9 %    | Teams of 2-3 students
Final Project: Progress Report | 12 %   |
Final Project: Report          | 16 %   |
Final Project: Presentation    | 9 %    |
Final Interview                | 10 %   | Individual interview
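The weights add up to 100%, so a final grade is a weighted average of the item scores. A sketch of that arithmetic; the helper functions and all scores below are hypothetical, and averaging the four remaining quizzes is an assumption about how the quiz grade is computed:

```r
# Weights (% of final grade) from the table above
weights <- c(weekly_hw = 12, quizzes = 8,
             mini_project_1 = 8, mini_project_2 = 8, mini_project_3 = 8,
             proposal = 9, progress_report = 12, final_report = 16,
             presentation = 9, final_interview = 10)
stopifnot(sum(weights) == 100)  # the weights cover the whole grade

# Hypothetical helper: average of the 5 quiz scores with the lowest dropped
quiz_average <- function(quiz_scores) {
  mean(sort(quiz_scores)[-1])  # sort ascending, then drop the first (lowest)
}

# Hypothetical helper: weighted average of item scores on a 0-100 scale
final_grade <- function(scores, weights) {
  stopifnot(setequal(names(scores), names(weights)))
  sum(scores[names(weights)] * weights) / sum(weights)
}

# Made-up scores: 90 on every item, quizzes of 60, 80, 90, 100, 70
scores <- setNames(rep(90, length(weights)), names(weights))
scores['quizzes'] <- quiz_average(c(60, 80, 90, 100, 70))  # mean(70, 80, 90, 100) = 85
final_grade(scores, weights)  # (92 * 90 + 8 * 85) / 100 = 89.6
```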

---

# Course policies

- ## BE NICE
- ## BE HONEST
- ## DON'T CHEAT


## Copying is good, stealing is bad

> "Plagiarism is trying to pass someone else’s work off as your own. Copying is about reverse-engineering."
>
> .right[-- Austin Kleon, from [Steal Like An Artist](https://austinkleon.com/steal/)&ensp;]


---

# Late submissions

## - **5** late days: use them anytime, no questions asked
## - No more than **2** late days on any one assignment
## - Contact me for special cases

---

# How to succeed in this class

## Start assignments early and **read carefully**!

## Ask for help!

---

# [Getting Help](https://eda.seas.gwu.edu/2022-Fall/help/getting-help.html)

## Use [Slack](https://emse-eda-f22.slack.com/) to ask questions.

- Mondays from 8:00am-5:00pm
- Wednesdays from 3:20-5:00pm
- Thursdays from 12:00-5:00pm

---

# [Course Software](https://eda.seas.gwu.edu/2022-Fall/help/course-software.html)

## [Slack](https://emse-eda-f22.slack.com/): See Blackboard for the link to join;<br>install it on your phone and **turn notifications on**!

## [R](https://cloud.r-project.org/) & [RStudio](https://rstudio.com/products/rstudio/download/) (Install both)

---

<br>
# .fancy[Break]

# Install Stuff

---

# Week 1: .fancy[Getting Started]

## 1. Course Goal
## 2. Course Introduction
## 3. Break: Install Stuff
## 4. .orange[Workflow & Reading In Data]
## 5. Wrangling Data
## 6. Visualizing Data

---

## Workflow for reading in data

1) Use R Projects (`.Rproj` files) to organize your analysis; **don't double-click `.R` files**!

2) Use the `here` package to create file paths

```r
path <- here::here("folder", "file.csv")
```

3) Import data with these functions:

File type  | Function       | Library
-----------|----------------|----------
`.csv`     | `read_csv()`   | **readr**
`.txt`     | `read.table()` | **utils**
`.xlsx`    | `read_excel()` | **readxl**
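The table maps file extensions to import functions. That mapping can be sketched as one hypothetical helper, `read_data_file()` (not part of any package), which dispatches on the extension and passes any extra arguments through:

```r
# Hypothetical helper: pick the import function from the table above based
# on the file extension; extra arguments (skip, header, sheet, ...) are
# passed straight through to that function
read_data_file <- function(path, ...) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv  = readr::read_csv(path, ...),
    txt  = utils::read.table(path, ...),
    xlsx = readxl::read_excel(path, ...),
    stop("Unsupported file type: .", ext)
  )
}

# Usage, assuming the course's data folder and here() paths:
# milk_production <- read_data_file(here::here('data', 'milk_production.csv'))
# global_temps    <- read_data_file(here::here('data', 'nasa_global_temps.txt'),
#                                   skip = 5, header = FALSE)
```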

---

# Importing Comma Separated Values (.csv)

Read in `.csv` files with `read_csv()`:

```r
library(tidyverse)
library(here)

csvPath <- here('data', 'milk_production.csv')
*milk_production <- read_csv(csvPath)

head(milk_production)
```

```
#> # A tibble: 6 × 4
#>   region    state          year milk_produced
#>   <chr>     <chr>         <dbl>         <dbl>
#> 1 Northeast Maine          1970     619000000
#> 2 Northeast New Hampshire  1970     356000000
#> 3 Northeast Vermont        1970    1970000000
#> 4 Northeast Massachusetts  1970     658000000
#> 5 Northeast Rhode Island   1970      75000000
#> 6 Northeast Connecticut    1970     661000000
```
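`read_csv()` guesses column types from the data. To make the types explicit, `col_types` accepts a compact string with one letter per column (`c` = character, `d` = double). A sketch using two rows from the output above as literal data (wrapping a string in `I()` marks it as literal data in readr 2.0 or later); with the real file this would be `read_csv(csvPath, col_types = 'ccdd')`:

```r
library(readr)

# Two rows of milk_production.csv, copied from the output above
milk_text <- I("region,state,year,milk_produced
Northeast,Maine,1970,619000000
Northeast,New Hampshire,1970,356000000")

# One letter per column, in order: region, state, year, milk_produced
milk_production <- read_csv(milk_text, col_types = "ccdd")
milk_production
```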

---

# Importing Text Files (.txt)

Read in `.txt` files with `read.table()`:

```r
txtPath <- here('data', 'nasa_global_temps.txt')
*global_temps <- read.table(txtPath, skip = 5, header = FALSE)

head(global_temps)
```

```
#>     V1    V2    V3
#> 1 1880 -0.15 -0.08
#> 2 1881 -0.07 -0.12
#> 3 1882 -0.10 -0.15
#> 4 1883 -0.16 -0.19
#> 5 1884 -0.27 -0.23
#> 6 1885 -0.32 -0.25
```

---

# Importing Text Files (.txt)

Read in `.txt` files with `read.table()`:

```r
txtPath <- here('data', 'nasa_global_temps.txt')
global_temps <- read.table(txtPath, skip = 5, header = FALSE)
*names(global_temps) <- c('year', 'no_smoothing', 'loess') # Add header

head(global_temps)
```

```
#>   year no_smoothing loess
#> 1 1880        -0.15 -0.08
#> 2 1881        -0.07 -0.12
#> 3 1882        -0.10 -0.15
#> 4 1883        -0.16 -0.19
#> 5 1884        -0.27 -0.23
#> 6 1885        -0.32 -0.25
```
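`read.table()` can also name the columns during import through its `col.names` argument. A self-contained sketch: it writes a small stand-in file to a temporary path, with five hypothetical placeholder lines in place of the real file's preamble, followed by the first rows shown above:

```r
# Stand-in for data/nasa_global_temps.txt (preamble lines are placeholders)
txtPath <- tempfile(fileext = '.txt')
writeLines(c(paste('preamble line', 1:5),
             '1880 -0.15 -0.08',
             '1881 -0.07 -0.12',
             '1882 -0.10 -0.15'), txtPath)

# Skip the preamble and name the columns in one step
global_temps <- read.table(txtPath, skip = 5, header = FALSE,
                           col.names = c('year', 'no_smoothing', 'loess'))
head(global_temps)
```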

---

# Importing Excel Files (.xlsx)

Read in `.xlsx` files with `read_excel()`:

```r
library(readxl)

xlsxPath <- here('data', 'pv_cell_production.xlsx')
*pv_cells <- read_excel(xlsxPath, sheet = 'Cell Prod by Country', skip = 2)
```
.code70[

```r
glimpse(pv_cells)
```

```
#> Rows: 25
#> Columns: 10
#> $ Year            <chr> NA, NA, "1995", "1996", "1997", "1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013", NA, "Note: NA = data not available.", NA, "Source: Compiled by E…
#> $ China           <chr> "Megawatts", NA, "NA", "NA", "NA", "NA", "NA", "2.5", "3", "10", "13", "40", "128.30000000000001", "341.8", "1192.8735755126208", "2535.9804999999997", "5193.2335000000003", "12882.114299891044", "24338.646000000004", "24139…
#> $ Taiwan          <chr> NA, NA, "NA", "NA", "NA", "NA", "NA", "NA", "3.5", "8", "17", "39.299999999999997", "88", "169.5", "413.19362206495737", "871.4", "1573.2", "3755.9046488657718", "4773.1499999999996", "5270.1999999999989", "6338.565000000000…
#> $ Japan           <dbl> NA, NA, 16.4, 21.2, 35.0, 49.0, 80.0, 128.6, 171.2, 251.1, 363.9, 601.5, 833.0, 926.4, 937.5, 1268.0, 1503.0, 2169.0, 2707.0, 2641.8, 3679.0, NA, NA, NA, NA
#> $ Malaysia        <chr> NA, NA, "NA", "NA", "NA", "NA", "NA", "NA", "0", "0", "0", "0", "0", "0", "100.1", "397.9", "1228.0566037735848", "1919.0129442119946", "2684.5953947368421", "2597.365436241611", "3072.59", NA, NA, NA, NA
#> $ Germany         <chr> NA, NA, "NA", "NA", "NA", "NA", "NA", "22.5", "23.5", "55", "121.5", "193", "339", "469.1", "815.35421116529074", "1476.6923205919056", "1606.0497978436656", "2181.2726133183096", "2152.8626315789475", "1406.7827181208054", …
#> $ `South Korea`   <chr> NA, NA, "NA", "NA", "NA", "NA", "NA", "NA", "0", "0", "0", "0", "5.3", "13", "31.883935905674612", "70.848164851527258", "234", "886.29518449560589", "1227.3", "1107.0999999999999", "1127.0999999999999", NA, NA, NA, NA
#> $ `United States` <dbl> NA, NA, 34.7500, 38.8500, 51.0000, 53.7000, 60.8000, 75.0000, 100.3000, 120.6000, 103.0000, 138.7000, 153.1000, 177.6000, 261.9804, 403.1250, 594.7922, 1162.5177, 1044.1895, 886.4018, 868.4250, NA, NA, NA, NA
#> $ Others          <chr> NA, NA, "NA", "NA", "NA", "NA", "NA", "48.200000000000017", "69.800000000000011", "97.299999999999955", "131", "186.29999999999995", "235.70000000000027", "361.09999999999991", "410.97322650945807", "709.03112641453299", "66…
#> $ World           <dbl> NA, NA, 77.600, 88.600, 125.800, 154.900, 201.300, 276.800, 371.300, 542.000, 749.400, 1198.800, 1782.400, 2458.500, 4163.859, 7732.977, 12595.992, 26399.539, 40761.761, 39523.565, 44464.496, NA, NA, NA, NA
```
]

---

# Importing Excel Files (.xlsx)

Read in `.xlsx` files with `read_excel()`:

```r
library(readxl)

xlsxPath <- here('data', 'pv_cell_production.xlsx')
pv_cells <- read_excel(xlsxPath, sheet = 'Cell Prod by Country', skip = 2) %>%
* mutate(Year = as.numeric(Year)) %>% # Convert "non-years" to NA
* filter(!is.na(Year)) # Drop NA rows in Year
```
.code60[

```r
glimpse(pv_cells)
```

```
#> Rows: 19
#> Columns: 10
#> $ Year            <dbl> 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013
#> $ China           <chr> "NA", "NA", "NA", "NA", "NA", "2.5", "3", "10", "13", "40", "128.30000000000001", "341.8", "1192.8735755126208", "2535.9804999999997", "5193.2335000000003", "12882.114299891044", "24338.646000000004", "24139.014999999999", "…
#> $ Taiwan          <chr> "NA", "NA", "NA", "NA", "NA", "NA", "3.5", "8", "17", "39.299999999999997", "88", "169.5", "413.19362206495737", "871.4", "1573.2", "3755.9046488657718", "4773.1499999999996", "5270.1999999999989", "6338.5650000000005"
#> $ Japan           <dbl> 16.4, 21.2, 35.0, 49.0, 80.0, 128.6, 171.2, 251.1, 363.9, 601.5, 833.0, 926.4, 937.5, 1268.0, 1503.0, 2169.0, 2707.0, 2641.8, 3679.0
#> $ Malaysia        <chr> "NA", "NA", "NA", "NA", "NA", "NA", "0", "0", "0", "0", "0", "0", "100.1", "397.9", "1228.0566037735848", "1919.0129442119946", "2684.5953947368421", "2597.365436241611", "3072.59"
#> $ Germany         <chr> "NA", "NA", "NA", "NA", "NA", "22.5", "23.5", "55", "121.5", "193", "339", "469.1", "815.35421116529074", "1476.6923205919056", "1606.0497978436656", "2181.2726133183096", "2152.8626315789475", "1406.7827181208054", "1054.88…
#> $ `South Korea`   <chr> "NA", "NA", "NA", "NA", "NA", "NA", "0", "0", "0", "0", "5.3", "13", "31.883935905674612", "70.848164851527258", "234", "886.29518449560589", "1227.3", "1107.0999999999999", "1127.0999999999999"
#> $ `United States` <dbl> 34.7500, 38.8500, 51.0000, 53.7000, 60.8000, 75.0000, 100.3000, 120.6000, 103.0000, 138.7000, 153.1000, 177.6000, 261.9804, 403.1250, 594.7922, 1162.5177, 1044.1895, 886.4018, 868.4250
#> $ Others          <chr> "NA", "NA", "NA", "NA", "NA", "48.200000000000017", "69.800000000000011", "97.299999999999955", "131", "186.29999999999995", "235.70000000000027", "361.09999999999991", "410.97322650945807", "709.03112641453299", "663.660000…
#> $ World           <dbl> 77.600, 88.600, 125.800, 154.900, 201.300, 276.800, 371.300, 542.000, 749.400, 1198.800, 1782.400, 2458.500, 4163.859, 7732.977, 12595.992, 26399.539, 40761.761, 39523.565, 44464.496
```
]
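Most country columns are still `<chr>`, because those columns contain text cells (such as the literal string `"NA"`). A sketch of one fix, assuming every column other than `Year` should be numeric: convert them all with `across()`. The stand-in tibble holds a few values copied from the output above:

```r
library(dplyr)

# Stand-in for three rows of pv_cells after the Year filter above
pv_cells <- tibble(
  Year   = c(1995, 2000, 2001),
  China  = c("NA", "2.5", "3"),
  Taiwan = c("NA", "NA", "3.5"),
  Japan  = c(16.4, 128.6, 171.2)
)

# Convert every column except Year to numeric; the text "NA" becomes a real
# NA (the expected coercion warning is suppressed)
pv_cells <- pv_cells %>%
  mutate(across(-Year, ~ suppressWarnings(as.numeric(.x))))
glimpse(pv_cells)
```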

---

# Your turn

Open the `practice.Rmd` file.

Write code to import the following data files from the "data" folder:

- For `lotr_words.csv`, call the data frame `lotr`
- For `north_america_bear_killings.txt`, call the data frame `bears`
- For `uspto_clean_energy_patents.xlsx`, call the data frame `patents`

---

# Week 1: .fancy[Getting Started]

## 1. Course Goal
## 2. Course Introduction
## 3. Break: Install Stuff
## 4. Workflow & Reading In Data
## 5. .orange[Wrangling Data]
## 6. Visualizing Data

---

# .center[The data frame...<br>in .darkgreen[Excel]]


# .center[The data frame...<br>in R]

```r
lotr
```

```
#> # A tibble: 18 × 4
#>    film                       race   gender word_count
#>    <chr>                      <chr>  <chr>       <dbl>
#>  1 The Fellowship Of The Ring Elf    Female       1229
#>  2 The Fellowship Of The Ring Elf    Male          971
#>  3 The Fellowship Of The Ring Hobbit Female         14
#>  4 The Fellowship Of The Ring Hobbit Male         3644
#>  5 The Fellowship Of The Ring Man    Female          0
#>  6 The Fellowship Of The Ring Man    Male         1995
#>  7 The Return Of The King     Elf    Female        183
#>  8 The Return Of The King     Elf    Male          510
#>  9 The Return Of The King     Hobbit Female          2
#> 10 The Return Of The King     Hobbit Male         2673
#> 11 The Return Of The King     Man    Female        268
#> 12 The Return Of The King     Man    Male         2459
#> 13 The Two Towers             Elf    Female        331
#> 14 The Two Towers             Elf    Male          513
#> 15 The Two Towers             Hobbit Female          0
#> 16 The Two Towers             Hobbit Male         2463
#> 17 The Two Towers             Man    Female        401
#> 18 The Two Towers             Man    Male         3589
```


---

## **Columns**: _Vectors_ of values (must be same data type)

Extract a column using `$`

```r
lotr$race
```

```
#>  [1] "Elf"    "Elf"    "Hobbit" "Hobbit" "Man"    "Man"    "Elf"    "Elf"    "Hobbit" "Hobbit" "Man"    "Man"    "Elf"    "Elf"    "Hobbit" "Hobbit" "Man"    "Man"
```

---

## **Columns**: _Vectors_ of values (must be same data type)

You can also use brackets, but note that on a tibble `lotr[,2]` returns a one-column tibble rather than a vector:

```r
lotr$race
```

```
#>  [1] "Elf"    "Elf"    "Hobbit" "Hobbit" "Man"    "Man"    "Elf"    "Elf"    "Hobbit" "Hobbit" "Man"    "Man"    "Elf"    "Elf"    "Hobbit" "Hobbit" "Man"    "Man"
```

```r
lotr[,2]
```

```
#> # A tibble: 18 × 1
#>    race  
#>    <chr> 
#>  1 Elf   
#>  2 Elf   
#>  3 Hobbit
#>  4 Hobbit
#>  5 Man   
#>  6 Man   
#>  7 Elf   
#>  8 Elf   
#>  9 Hobbit
#> 10 Hobbit
#> 11 Man   
#> 12 Man   
#> 13 Elf   
#> 14 Elf   
#> 15 Hobbit
#> 16 Hobbit
#> 17 Man   
#> 18 Man
```
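The slide title says columns are vectors, but only some extraction methods return the bare vector. The sketch below uses a small tibble built by hand from the first four rows of `lotr`. The name `lotr_small` is only for illustration. It shows that `$` and `[[` return a character vector, while single brackets keep the tibble structure.

```r
library(tibble)

# Hand-built tibble holding the first four rows of lotr (illustrative)
lotr_small <- tibble(
  film = "The Fellowship Of The Ring",
  race = c("Elf", "Elf", "Hobbit", "Hobbit"),
  gender = c("Female", "Male", "Female", "Male"),
  word_count = c(1229, 971, 14, 3644)
)

race_dollar <- lotr_small$race        # character vector
race_double <- lotr_small[["race"]]   # same character vector, selected by name
race_single <- lotr_small[, 2]        # one-column tibble, not a vector
```

With a base R `data.frame`, `df[, 2]` would drop to a plain vector instead. That difference is one reason `$` or `[[` is the safer habit when you want the values themselves.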

---

## **Rows**: Information about individual observations

Information about the first row:

```r
lotr[1,]
```

```
#> # A tibble: 1 × 4
#>   film                       race  gender word_count
#>   <chr>                      <chr> <chr>       <dbl>
#> 1 The Fellowship Of The Ring Elf   Female       1229
```

Information about rows 1 & 2:

```r
lotr[1:2,]
```

```
#> # A tibble: 2 × 4
#>   film                       race  gender word_count
#>   <chr>                      <chr> <chr>       <dbl>
#> 1 The Fellowship Of The Ring Elf   Female       1229
#> 2 The Fellowship Of The Ring Elf   Male          971
```
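Brackets take the form `[rows, columns]`, and an empty position keeps everything. That means rows and columns can be chosen in a single call, and a negative index drops rows. The sketch below uses a small hand-built tibble, where `lotr_small` is illustrative and its values come from the first four rows of `lotr`:

```r
library(tibble)

lotr_small <- tibble(
  film = "The Fellowship Of The Ring",
  race = c("Elf", "Elf", "Hobbit", "Hobbit"),
  gender = c("Female", "Male", "Female", "Male"),
  word_count = c(1229, 971, 14, 3644)
)

first_two <- lotr_small[1:2, ]                              # rows 1-2, all columns
first_two_cols <- lotr_small[1:2, c("race", "word_count")]  # rows 1-2, two named columns
without_first <- lotr_small[-1, ]                           # negative index drops row 1
```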

---

## Quick Practice

Read in the `data.csv` file in the "data" folder:

```r
library(tidyverse)  # provides read_csv()
library(here)       # builds paths relative to the project root
data <- read_csv(here('data', 'data.csv'))

Now answer these questions:

- How many rows and columns are in the data frame?
- What type of data is each column?
- Preview the different columns. What do you think this data is about? What might one row represent?
- How many unique airlines are in the data frame?
- What are the shortest and longest air times for any one flight in the data frame?

---

### The tidyverse: `stringr` + `dplyr` + `readr` + `ggplot2` + ...

<center>
<img src="images/horst_monsters_tidyverse.jpeg" width="950">
</center>Art by [Allison Horst](https://www.allisonhorst.com/)

---

# .center[The main `dplyr` "verbs"]

<br>

"Verb"        | What it does
--------------|--------------------
`select()`    | Select columns by name
`filter()`    | Keep rows that match criteria
`arrange()`   | Sort rows based on column(s)
`mutate()`    | Create new columns (or modify existing ones)
`summarize()` | Create summary values
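Each verb takes a data frame as its first argument and returns a new data frame. These slides do not demonstrate `summarize()`, so the sketch below shows it collapsing rows into a single row of summary values. It uses a hand-built subset of `lotr` with the Hobbit rows from two films. The name `lotr_small` and the summary column names are illustrative.

```r
library(dplyr)
library(tibble)

# Hobbit rows from two films, values copied from the lotr output
lotr_small <- tibble(
  film = rep(c("The Fellowship Of The Ring", "The Two Towers"), each = 2),
  race = "Hobbit",
  gender = rep(c("Female", "Male"), times = 2),
  word_count = c(14, 3644, 0, 2463)
)

# summarize() collapses the data frame to one row of summary values
totals <- lotr_small %>%
  summarize(total_words = sum(word_count), mean_words = mean(word_count))
```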

---

# .center[Core `tidyverse` concept:<br>**Chain functions together with "pipes"**]

# .center[`%>%`]

## Think of the words "...and then..."

```r
data %>% 
  do_something() %>% 
  do_something_else()
```
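The pipe passes its left-hand side to the next call as the first argument, so `x %>% f(y)` is shorthand for `f(x, y)`. The sketch below checks that a piped chain gives the same result as the equivalent nested call. `%>%` comes from magrittr and is re-exported by dplyr. R 4.1 and later also provide a native `|>` pipe, which behaves the same way for simple chains like this one.

```r
library(dplyr)

word_count <- c(1229, 971, 14, 3644)

# Nested: has to be read inside-out
nested <- round(mean(sqrt(word_count)), 1)

# Piped: take word_count, and then sqrt(), and then mean(), and then round()
piped <- word_count %>%
  sqrt() %>%
  mean() %>%
  round(1)
```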

---

# Select columns with `select()`

---

# Select columns with `select()`

Select the columns `film` & `race`

```r
lotr %>% 
  select(film, race)
```

```
#> # A tibble: 18 × 2
#>    film                       race  
#>    <chr>                      <chr> 
#>  1 The Fellowship Of The Ring Elf   
#>  2 The Fellowship Of The Ring Elf   
#>  3 The Fellowship Of The Ring Hobbit
#>  4 The Fellowship Of The Ring Hobbit
#>  5 The Fellowship Of The Ring Man   
#>  6 The Fellowship Of The Ring Man   
#>  7 The Return Of The King     Elf   
#>  8 The Return Of The King     Elf   
#>  9 The Return Of The King     Hobbit
#> 10 The Return Of The King     Hobbit
#> 11 The Return Of The King     Man   
#> 12 The Return Of The King     Man   
#> 13 The Two Towers             Elf   
#> 14 The Two Towers             Elf   
#> 15 The Two Towers             Hobbit
#> 16 The Two Towers             Hobbit
#> 17 The Two Towers             Man   
#> 18 The Two Towers             Man
```

---

# Select columns with `select()`

Use the `-` sign to drop columns

```r
lotr %>% 
  select(-film)
```

```
#> # A tibble: 18 × 3
#>    race   gender word_count
#>    <chr>  <chr>       <dbl>
#>  1 Elf    Female       1229
#>  2 Elf    Male          971
#>  3 Hobbit Female         14
#>  4 Hobbit Male         3644
#>  5 Man    Female          0
#>  6 Man    Male         1995
#>  7 Elf    Female        183
#>  8 Elf    Male          510
#>  9 Hobbit Female          2
#> 10 Hobbit Male         2673
#> 11 Man    Female        268
#> 12 Man    Male         2459
#> 13 Elf    Female        331
#> 14 Elf    Male          513
#> 15 Hobbit Female          0
#> 16 Hobbit Male         2463
#> 17 Man    Female        401
#> 18 Man    Male         3589
```
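Naming the columns to keep and naming the column to drop with `-` are two routes to the same result. The sketch below checks this on a small hand-built tibble. `lotr_small` is illustrative, and its values come from the Two Towers rows of `lotr`.

```r
library(dplyr)
library(tibble)

lotr_small <- tibble(
  film = "The Two Towers",
  race = c("Elf", "Man"),
  gender = c("Male", "Male"),
  word_count = c(513, 3589)
)

kept <- lotr_small %>% select(race, gender, word_count)  # name the columns to keep
dropped <- lotr_small %>% select(-film)                  # or name the column to drop
```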

---

# Filter for rows with `filter()`

---

# Filter for rows with `filter()`

Keep only the rows with Elf characters

```r
lotr %>% 
    filter(race == "Elf")
```

```
#> # A tibble: 6 × 4
#>   film                       race  gender word_count
#>   <chr>                      <chr> <chr>       <dbl>
#> 1 The Fellowship Of The Ring Elf   Female       1229
#> 2 The Fellowship Of The Ring Elf   Male          971
#> 3 The Return Of The King     Elf   Female        183
#> 4 The Return Of The King     Elf   Male          510
#> 5 The Two Towers             Elf   Female        331
#> 6 The Two Towers             Elf   Male          513
```

---

# Filter for rows with `filter()`

Keep only the rows with Elf or Hobbit characters

```r
lotr %>% 
    filter((race == "Elf") | (race == "Hobbit"))
```

```
#> # A tibble: 12 × 4
#>    film                       race   gender word_count
#>    <chr>                      <chr>  <chr>       <dbl>
#>  1 The Fellowship Of The Ring Elf    Female       1229
#>  2 The Fellowship Of The Ring Elf    Male          971
#>  3 The Fellowship Of The Ring Hobbit Female         14
#>  4 The Fellowship Of The Ring Hobbit Male         3644
#>  5 The Return Of The King     Elf    Female        183
#>  6 The Return Of The King     Elf    Male          510
#>  7 The Return Of The King     Hobbit Female          2
#>  8 The Return Of The King     Hobbit Male         2673
#>  9 The Two Towers             Elf    Female        331
#> 10 The Two Towers             Elf    Male          513
#> 11 The Two Towers             Hobbit Female          0
#> 12 The Two Towers             Hobbit Male         2463
```

---

# Filter for rows with `filter()`

Equivalently, use `%in%` to keep rows whose `race` matches any value in a set

```r
lotr %>% 
    filter(race %in% c("Elf", "Hobbit"))
```
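The sketch below checks that the `|` version and the `%in%` version keep exactly the same rows. For longer sets, `%in%` avoids chaining many `|` comparisons. The data is a hand-built tibble holding the Return of the King rows of `lotr`, and the name `lotr_small` is illustrative.

```r
library(dplyr)
library(tibble)

lotr_small <- tibble(
  film = "The Return Of The King",
  race = c("Elf", "Elf", "Hobbit", "Hobbit", "Man", "Man"),
  gender = rep(c("Female", "Male"), times = 3),
  word_count = c(183, 510, 2, 2673, 268, 2459)
)

with_or <- lotr_small %>% filter((race == "Elf") | (race == "Hobbit"))
with_in <- lotr_small %>% filter(race %in% c("Elf", "Hobbit"))
```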

---

# .center[Logic operators for `filter()`]

<br>

Description | Example
------------|------------
Values greater than 1 | `value > 1`
Values greater than or equal to 1 | `value >= 1`
Values less than 1 | `value < 1`
Values less than or equal to 1 | `value <= 1`
Values equal to 1 | `value == 1`
Values not equal to 1 | `value != 1`
Values in the set `c(1, 4)` | `value %in% c(1, 4)`
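Each operator in the table returns one `TRUE`/`FALSE` per element, and `filter()` keeps the rows where the result is `TRUE`. The sketch below runs the operators on a made-up vector `value`. It also shows `&` (and), `|` (or), and `!` (not), which these slides use to combine conditions.

```r
value <- c(0, 1, 2, 4)

value > 1           # FALSE FALSE  TRUE  TRUE
value >= 1          # FALSE  TRUE  TRUE  TRUE
value == 1          # FALSE  TRUE FALSE FALSE
value != 1          #  TRUE FALSE  TRUE  TRUE
value %in% c(1, 4)  # FALSE  TRUE FALSE  TRUE

# Combine conditions with & (and) or | (or), and negate with !
(value > 0) & (value < 4)  # FALSE  TRUE  TRUE FALSE
!(value %in% c(1, 4))      #  TRUE FALSE  TRUE FALSE
```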

---

# Combine `filter()` and `select()`

Keep only the rows for Elf characters who spoke more than 1,000 words, then select every column except `race`

```r
lotr %>% 
  filter((race == "Elf") & (word_count > 1000)) %>% 
  select(-race)
```

```
#> # A tibble: 1 × 3
#>   film                       gender word_count
#>   <chr>                      <chr>       <dbl>
#> 1 The Fellowship Of The Ring Female       1229
```
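`filter()` also accepts several conditions separated by commas and keeps only the rows where all of them are `TRUE`. That makes the comma form equivalent to `&`. The sketch below checks this on a hand-built tibble holding some of the Elf rows from `lotr`. The name `lotr_small` is illustrative.

```r
library(dplyr)
library(tibble)

lotr_small <- tibble(
  film = c("The Fellowship Of The Ring", "The Fellowship Of The Ring", "The Two Towers"),
  race = "Elf",
  gender = c("Female", "Male", "Female"),
  word_count = c(1229, 971, 331)
)

with_and <- lotr_small %>% filter((race == "Elf") & (word_count > 1000))
with_comma <- lotr_small %>% filter(race == "Elf", word_count > 1000)  # commas mean AND
```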

---

# Create new variables with `mutate()`

---

# Create new variables with `mutate()`

Create a new variable, `word1000`, which is `TRUE` if the character spoke 1,000 or more words

```r
lotr %>%
    mutate(word1000 = word_count >= 1000)
```

```
#> # A tibble: 18 × 5
#>    film                       race   gender word_count word1000
#>    <chr>                      <chr>  <chr>       <dbl> <lgl>   
#>  1 The Fellowship Of The Ring Elf    Female       1229 TRUE    
#>  2 The Fellowship Of The Ring Elf    Male          971 FALSE   
#>  3 The Fellowship Of The Ring Hobbit Female         14 FALSE   
#>  4 The Fellowship Of The Ring Hobbit Male         3644 TRUE    
#>  5 The Fellowship Of The Ring Man    Female          0 FALSE   
#>  6 The Fellowship Of The Ring Man    Male         1995 TRUE    
#>  7 The Return Of The King     Elf    Female        183 FALSE   
#>  8 The Return Of The King     Elf    Male          510 FALSE   
#>  9 The Return Of The King     Hobbit Female          2 FALSE   
#> 10 The Return Of The King     Hobbit Male         2673 TRUE    
#> 11 The Return Of The King     Man    Female        268 FALSE   
#> 12 The Return Of The King     Man    Male         2459 TRUE    
#> 13 The Two Towers             Elf    Female        331 FALSE   
#> 14 The Two Towers             Elf    Male          513 FALSE   
#> 15 The Two Towers             Hobbit Female          0 FALSE   
#> 16 The Two Towers             Hobbit Male         2463 TRUE    
#> 17 The Two Towers             Man    Female        401 FALSE   
#> 18 The Two Towers             Man    Male         3589 TRUE
```

---

# .center[Handling if/else conditions]

### .center[`ifelse(<condition>, <if TRUE>, <else>)`]

```r
lotr %>%
    mutate(word1000 = ifelse(word_count >= 1000, TRUE, FALSE))
```
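`word_count >= 1000` already evaluates to `TRUE`/`FALSE`, so `ifelse()` is redundant in that particular case. It becomes useful when the two outcomes are other values. The sketch below returns text labels, where the labels `"1000+"` and `"under 1000"` are made up for illustration. It also shows dplyr's `if_else()`, which gives the same result here but is stricter: it errors if the two outcomes differ in type.

```r
library(dplyr)
library(tibble)

# A few rows from lotr (illustrative subset)
lotr_small <- tibble(
  race = c("Elf", "Hobbit", "Man"),
  word_count = c(971, 3644, 0)
)

labelled <- lotr_small %>%
  mutate(
    word_group = ifelse(word_count >= 1000, "1000+", "under 1000"),
    # if_else() requires both outcomes to have the same type
    word_group2 = if_else(word_count >= 1000, "1000+", "under 1000")
  )
```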

---

# Sort data frame with `arrange()`

Sort the `lotr` data frame by `word_count` (ascending order by default)

```r
lotr %>%
    arrange(word_count)
```

```
#> # A tibble: 18 × 4
#>    film                       race   gender word_count
#>    <chr>                      <chr>  <chr>       <dbl>
#>  1 The Fellowship Of The Ring Man    Female          0
#>  2 The Two Towers             Hobbit Female          0
#>  3 The Return Of The King     Hobbit Female          2
#>  4 The Fellowship Of The Ring Hobbit Female         14
#>  5 The Return Of The King     Elf    Female        183
#>  6 The Return Of The King     Man    Female        268
#>  7 The Two Towers             Elf    Female        331
#>  8 The Two Towers             Man    Female        401
#>  9 The Return Of The King     Elf    Male          510
#> 10 The Two Towers             Elf    Male          513
#> 11 The Fellowship Of The Ring Elf    Male          971
#> 12 The Fellowship Of The Ring Elf    Female       1229
#> 13 The Fellowship Of The Ring Man    Male         1995
#> 14 The Return Of The King     Man    Male         2459
#> 15 The Two Towers             Hobbit Male         2463
#> 16 The Return Of The King     Hobbit Male         2673
#> 17 The Two Towers             Man    Male         3589
#> 18 The Fellowship Of The Ring Hobbit Male         3644
```

---

# Sort data frame with `arrange()`

Use the `desc()` function to sort in descending order

```r
lotr %>%
    arrange(desc(word_count))
```

```
#> # A tibble: 18 × 4
#>    film                       race   gender word_count
#>    <chr>                      <chr>  <chr>       <dbl>
#>  1 The Fellowship Of The Ring Hobbit Male         3644
#>  2 The Two Towers             Man    Male         3589
#>  3 The Return Of The King     Hobbit Male         2673
#>  4 The Two Towers             Hobbit Male         2463
#>  5 The Return Of The King     Man    Male         2459
#>  6 The Fellowship Of The Ring Man    Male         1995
#>  7 The Fellowship Of The Ring Elf    Female       1229
#>  8 The Fellowship Of The Ring Elf    Male          971
#>  9 The Two Towers             Elf    Male          513
#> 10 The Return Of The King     Elf    Male          510
#> 11 The Two Towers             Man    Female        401
#> 12 The Two Towers             Elf    Female        331
#> 13 The Return Of The King     Man    Female        268
#> 14 The Return Of The King     Elf    Female        183
#> 15 The Fellowship Of The Ring Hobbit Female         14
#> 16 The Return Of The King     Hobbit Female          2
#> 17 The Fellowship Of The Ring Man    Female          0
#> 18 The Two Towers             Hobbit Female          0
```
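`arrange()` also accepts several columns. Rows are sorted by the first column, and ties are broken by the next. `desc()` can be applied to each column independently. The sketch below uses a hand-built tibble with the lowest-word rows of `lotr`, where two rows tie at 0 words. The name `lotr_small` is illustrative.

```r
library(dplyr)
library(tibble)

lotr_small <- tibble(
  film = c("The Fellowship Of The Ring", "The Two Towers", "The Return Of The King"),
  race = c("Man", "Hobbit", "Hobbit"),
  word_count = c(0, 0, 2)
)

# Sort by word_count, then alphabetically by race to break the tie at 0
sorted <- lotr_small %>%
  arrange(word_count, race)

# Descending word_count, ties still broken alphabetically by race
sorted_desc <- lotr_small %>%
  arrange(desc(word_count), race)
```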

---

# Your turn

Read in the `data.csv` file in the "data" folder:

```r
library(tidyverse)  # provides read_csv()
library(here)       # builds paths relative to the project root
data <- read_csv(here('data', 'data.csv'))

Now answer these questions:

- Create a new data frame, `flights_fall`, that contains only flights that departed in the fall semester.
- Create a new data frame, `flights_dc`, that contains only flights that flew to DC airports (Reagan or Dulles).
- Create a new data frame, `flights_dc_carrier`, that contains only flights that flew to DC airports (Reagan or Dulles) and only the columns about the month and airline.
- How many unique airlines were flying to DC airports in July?
- Create a new variable, `speed`, in miles per hour using the `air_time` (minutes) and `distance` (miles) variables.
- Which flight flew the fastest?
- Remove rows that have `NA` for `air_time`, then sort the resulting data frame by longest air time and then by longest flight distance.


---

# Week 1: .fancy[Getting Started]

## 1. Course Goal
## 2. Course Introduction
## 3. Break: Install Stuff
## 4. Workflow & Reading In Data
## 5. Wrangling Data
## 6. .orange[Visualizing Data]

---


# "Grammar of Graphics"

Concept developed by Leland Wilkinson (1999)

**ggplot2** package developed by Hadley Wickham (2005)


---

# Making plot layers with ggplot2

<br>

### 1. The data 
### 2. The aesthetic mapping (what goes on the axes?)
### 3. The geometries (points? bars? etc.)
### 4. The annotations / labels
### 5. The theme
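Each layer in the list above corresponds to one piece of ggplot2 code, and the pieces are joined with `+`. A ggplot is also an ordinary R object, so it can be stored in a variable and extended step by step. The sketch below uses the `mpg` data that ships with ggplot2. The variable name `p` is arbitrary.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy))  # layers 1 & 2: data + aesthetic mapping
p <- p + geom_point()                      # layer 3: geometry
p <- p + labs(                             # layer 4: labels
  x = "Engine displacement (liters)",
  y = "Highway fuel economy (mpg)"
)
p <- p + theme_bw()                        # layer 5: theme

p  # printing the object draws the plot
```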

---

# Layer 1: The data

```r
head(mpg)
```

```
#> # A tibble: 6 × 11
#>   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class  
#>   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>  
#> 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compact
#> 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compact
#> 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compact
#> 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compact
#> 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compact
#> 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compact
```

---

# Layer 1: The data

The `ggplot()` function initializes the plot with whatever data you're using

```r
mpg %>% 
  ggplot()
```


---

# Layer 2: The aesthetic mapping

The `aes()` function determines which variables will be _mapped_ to the geometries<br>(e.g. the axes)

```r
mpg %>% 
* ggplot(aes(x = displ, y = hwy))
```


---

# Layer 3: The geometries

Use `+` to add geometries, e.g. `geom_point()` for points

```r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
* geom_point()
```


---

# Layer 4: The annotations / labels

Use `labs()` to modify most labels

```r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
* labs(
*   x = "Engine displacement (liters)",
*   y = "Highway fuel economy (mpg)",
*   title = "Most larger engine vehicles are less fuel efficient"
* )
```


---

# Layer 5: The theme

```r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
  labs(
    x = "Engine displacement (liters)",  
    y = "Highway fuel economy (mpg)", 
    title = "Most larger engine vehicles are less fuel efficient"
  ) + 
* theme_bw()
```


---

### Common themes

`theme_bw()`

```r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
* theme_bw()
```


`theme_minimal()`

```r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
* theme_minimal()
```


---

### Common themes

`theme_classic()`

```r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
* theme_classic()
```


`theme_void()`

```r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
* theme_void()
```


---


## Your turn

Open `practice.Rmd`

Use the `mpg` data frame and ggplot to create these charts

<img src="figs/unnamed-chunk-55-1.png" width="522.144" />

---

# Extra practice
