Getting Started


# Week 1: .fancy[Getting Started]

### EMSE 4572/6572: Exploratory Data Analysis
### John Paul Helveston
### August 28, 2024


---

# Week 1: .fancy[Getting Started]

### 1. Course Goal
### 2. Course Introduction
### 3. Break: Install Stuff
### 4. Quarto
### 5. Workflow & Reading In Data
### 6. Wrangling Data
### 7. Visualizing Data

---

# Week 1: .fancy[Getting Started]

### 1. .orange[Course Goal]
### 2. Course Introduction
### 3. Break: Install Stuff
### 4. Quarto
### 5. Workflow & Reading In Data
### 6. Wrangling Data
### 7. Visualizing Data

---

## Course 1: [Intro to Programming for Analytics](https://p4a.seas.gwu.edu/)

**"Computational Literacy"**

- Programming: Conditionals (if/else), loops, functions, testing, data types.
- Analytics: Data structures, import / export, basic data manipulation & visualization.

## Course 2: [Exploratory Data Analysis](https://eda.seas.gwu.edu/)

**"Data Literacy"**

- Strategies for conducting an exploratory data analysis.
- Design principles for visualizing and communicating _information_ extracted from data.
- Reproducibility: Reports that contain code, equations, visualizations, and narrative text.

---

# **Class goal**: translate _data_ into _information_

---

# **Class goal**: translate _data_ into _information_

**Data**

Average student engagement scores

Class       | Type    | City | County
------------|---------|------|-------
Special Ed. | Charter | 643  | 793
Special Ed. | Public  | 735  | 928
General Ed. | Charter | 590  | 724
General Ed. | Public  | 863  | 662


**Information**


---

# Data exploration: an iterative process

Encode data:

``` r
engagement_data <- data.frame(
    City   = c(643, 735, 590, 863),
    County = c(793, 928, 724, 662),
    School = c('Special Ed., Charter', 'Special Ed., Public',
               'General Ed., Charter', 'General Ed., Public'))
engagement_data
```

```
#>   City County               School
#> 1  643    793 Special Ed., Charter
#> 2  735    928  Special Ed., Public
#> 3  590    724 General Ed., Charter
#> 4  863    662  General Ed., Public
```


Re-format data for plotting:

``` r
library(tidyverse)

# Stack the City and County columns into Location / Engagement pairs
engagement_data <- engagement_data %>%
    gather(Location, Engagement, City:County) %>%
    mutate(Location = fct_relevel(
      Location, c('City', 'County')))
engagement_data
```

```
#>                 School Location Engagement
#> 1 Special Ed., Charter     City        643
#> 2  Special Ed., Public     City        735
#> 3 General Ed., Charter     City        590
#> 4  General Ed., Public     City        863
#> 5 Special Ed., Charter   County        793
#> 6  Special Ed., Public   County        928
#> 7 General Ed., Charter   County        724
#> 8  General Ed., Public   County        662
```
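The same reshaping can also be sketched with `pivot_longer()`, which the tidyr documentation recommends over the superseded `gather()`. The names `engagement_wide` and `engagement_long` are made up for this sketch, and the rows come out in a different order (grouped by school rather than by location):

``` r
library(dplyr)
library(tidyr)

# Same toy data as above, kept in its original "wide" layout
engagement_wide <- data.frame(
    City   = c(643, 735, 590, 863),
    County = c(793, 928, 724, 662),
    School = c('Special Ed., Charter', 'Special Ed., Public',
               'General Ed., Charter', 'General Ed., Public'))

# One row per School / Location pair
engagement_long <- engagement_wide %>%
    pivot_longer(cols = City:County,
                 names_to = 'Location',
                 values_to = 'Engagement') %>%
    mutate(Location = factor(Location, levels = c('City', 'County')))

engagement_long
```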


---

# Data exploration: an iterative process

Initial exploratory plotting:

``` r
engagement_data %>%
    ggplot() +
    geom_col(aes(x = Engagement, y = School,
                 fill = Location),
             position = 'dodge')
```


More exploratory plotting:<br>highlight difference


---

# Data exploration: an iterative process

Directly label figure:


Remove unnecessary axes, change colors, fix labels:


---

**A fully reproducible analysis**

``` r
library(tidyverse)

data <- data.frame(
    City   = c(643, 735, 590, 863),
    County = c(793, 928, 724, 662),
    School = c('Special Ed., Charter', 'Special Ed., Public',
               'General Ed., Charter', 'General Ed., Public'),
    Highlight = c(0, 0, 0, 1)) %>%  # 1 = the school to emphasize
    gather(Location, Engagement, City:County) %>%
    mutate(
      Location = fct_relevel(Location, c('City', 'County')),
      Highlight = as.factor(Highlight),
      x = ifelse(Location == 'County', 1, 0))  # numeric x position for the plot
```


``` r
library(ggrepel)  # geom_text_repel()
library(cowplot)  # theme_cowplot(), background_grid()

plot <- ggplot(data, aes(x = x, y = Engagement, group = School, color = Highlight)) +
    geom_point() +
    geom_line() +
    scale_color_manual(values = c('#757575', '#ed573e')) +
    labs(x = 'Location', y = 'Engagement',
         title = paste0('Students in public, general education classes\n',
                        'in county schools have surprisingly low engagement')) +
    scale_x_continuous(limits = c(-1.2, 1.2), labels = c('City', 'County'),
                       breaks = c(0, 1)) +
    # Engagement values to the right of the County points
    geom_text_repel(aes(label = Engagement, color = as.factor(Highlight)),
                    data          = subset(data, Location == 'County'),
                    size          = 5,
                    nudge_x       = 0.1,
                    segment.color = NA) +
    # Engagement values to the left of the City points
    geom_text_repel(aes(label = Engagement, color = as.factor(Highlight)),
                    data          = subset(data, Location == 'City'),
                    size          = 5,
                    nudge_x       = -0.1,
                    segment.color = NA) +
    # School names further left, right-aligned
    geom_text_repel(aes(label = School, color = as.factor(Highlight)),
                    data          = subset(data, Location == 'City'),
                    size          = 5,
                    nudge_x       = -0.25,
                    hjust         = 1,
                    segment.color = NA) +
    theme_cowplot() +
    background_grid(major = 'x') +
    theme(axis.line = element_blank(),
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks = element_blank(),
          legend.position = 'none')
```


---

background-color: #fff
class: center

# Data exploration: an iterative process

---

# Week 1: .fancy[Getting Started]

### 1. Course Goal
### 2. .orange[Course Introduction]
### 3. Break: Install Stuff
### 4. Quarto
### 5. Workflow & Reading In Data
### 6. Wrangling Data
### 7. Visualizing Data

---

# Meet your instructor!


### John Helveston, Ph.D.

- 2018-Present Assistant Professor, Engineering Management & Systems Engineering
- 2016-2018 Postdoc at [Institute for Sustainable Energy](https://www.bu.edu/ise/), Boston University
- 2016 PhD in Engineering & Public Policy at Carnegie Mellon University
- 2015 MS in Engineering & Public Policy at Carnegie Mellon University
- 2010 BS in Engineering Science & Mechanics at Virginia Tech
- Website: [www.jhelvy.com](https://www.jhelvy.com/)


---

# Meet your tutors!


### **Pingfan Hu**

- Graduate Teaching Assistant (GTA)
- PhD student in EMSE
- Website: [www.pingfanhu.com](https://www.pingfanhu.com/)


---

# Meet your tutors!


### **Bogdan Bunea**

- Learning Assistant (LA)
- EMSE junior & P4A / EDA alumnus
- Check out his team's [project](https://eda.seas.gwu.edu/showcase/2023-Fall/ukraine-war.html) from 2023


---

# Prerequisites

## [EMSE 4574 / 6574: Intro to Programming for Analytics](https://p4a.seas.gwu.edu/)

You should be able to:

- Use RStudio to write basic R commands.
- Know the distinctions between different R operators and data types, including numeric, string, and logical data.
- Use **tidyverse** functions to wrangle and manipulate data in R.
- Use the **ggplot2** library to create plots in R.

> [Check out the R for Analytics Primer](http://jhelvy.github.io/r4aPrimer/)

---

# Course website

## Everything you need will be on the course website:<br>https://eda.seas.gwu.edu/2024-Fall/

---

# **Quizzes** (10% of grade)

> **Why quiz at all?** The "retrieval effect": you have to _practice_ remembering things, otherwise your brain won't retain them (see the book ["Make It Stick: The Science of Successful Learning"](https://www.hup.harvard.edu/catalog.php?isbn=9780674729018)).

---

## Assignments

## 1) Weekly Homework / Readings: [HW1](https://eda.seas.gwu.edu/2024-Fall/hw/1-tidy-data.html)

## 2) 3 Mini Projects (due 2 weeks from date assigned)

## 3) [Final Project](https://eda.seas.gwu.edu/2023-Fall/project/0-overview.html)

**Undergrads**: Teams of 3 - 4 students

**Grads**: Teams of 2 students


Item            | Due Date
----------------|---------------
Proposal        | Sep 22
Progress Report | Oct 27
Final Report    | Dec 08
Presentation    | Dec 11


---

# .center[Grades]

Item                           | Weight | Notes
-------------------------------|--------|-----------------------------------
Participation / Attendance     | 5%     | (Yes, I take attendance)
Reflections                    | 12%    | Weekly assignment, lowest dropped
Quizzes                        | 10%    | 5 quizzes, lowest dropped
Mini Project 1                 | 10%    | Individual assignments
Mini Project 2                 | 10%    |
Mini Project 3                 | 10%    |
Final Project: Proposal        | 6%     |
Final Project: Progress Report | 6%     |
Final Project: Report          | 15%    |
Final Project: Presentation    | 6%     |
Final Interview                | 10%    | Individual interview
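
As a rough sketch of how these weights combine, the snippet below checks that they sum to 100 and computes a weighted grade. All scores are hypothetical, and the way the lowest quiz is dropped here is an assumption about the mechanics, not the official grading script:

``` r
# Weights copied from the table above (in percent)
weights <- c(participation = 5, reflections = 12, quizzes = 10,
             mini_project_1 = 10, mini_project_2 = 10, mini_project_3 = 10,
             proposal = 6, progress_report = 6, report = 15,
             presentation = 6, final_interview = 10)
stopifnot(sum(weights) == 100)

# Hypothetical quiz scores: drop the lowest, then average the rest
quiz_scores <- c(80, 90, 70, 100, 85)
quiz_score  <- mean(sort(quiz_scores)[-1])

# Hypothetical score of 90 on everything else
scores <- setNames(rep(90, length(weights)), names(weights))
scores["quizzes"] <- quiz_score

final_grade <- sum(weights * scores) / sum(weights)
final_grade
```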

---

background-color: #FFF

# .center[Grades]

---

# Course policies

- ## BE NICE
- ## BE HONEST
- ## DON'T CHEAT


## Copying is good, stealing is bad

> "Plagiarism is trying to pass someone else’s work off as your own. Copying is about reverse-engineering."
>
> .right[-- Austin Kleon, from [Steal Like An Artist](https://austinkleon.com/steal/)&ensp;]


---

## Use of ChatGPT and other AI tools

- Large language models (LLMs) are pretty good...but sometimes suck.

--
 
- Use of AI tools is generally permitted, but **be transparent**.

- All assignments must include a **Use of AI on this assignment** section where you:
    - Describe any AI tool and how it was used, along with the prompt(s) used.
    - Include a link to the chat transcript.

## **Use AI as an assistant, not a solutions manual**

> Curious how LLMs actually work? Check out [this article](https://www.understandingai.org/p/large-language-models-explained-with), which provides a simplified description of how they work (which itself is still quite complicated).

---

# Late submissions

## - **5** late days - use them anytime, no questions asked
## - No more than **2** late days on any one assignment
## - Contact me for special cases

---

# How to succeed in this class

## Participate during class!

## Start assignments early and **read carefully**!

## Ask for help!

---

# Getting Help

## Use [Slack](https://emse-eda-f24.slack.com/) to ask questions.

- Mondays from 8:00-4:30pm
- Tuesdays from 8:00-4:30pm
- Fridays from 8:00-4:00pm

---

# [Course Software](https://eda.seas.gwu.edu/2024-Fall/software.html)

## [Slack](https://emse-eda-f24.slack.com/): See Blackboard for the link to join;<br>install on phone and **turn notifications on**!

## [R](https://cloud.r-project.org/) & [RStudio](https://posit.co/download/rstudio-desktop/) (Install both)

---

<br>

# .center[.fancy[Break]]

1. If you haven't already, install everything on the [software page](https://eda.seas.gwu.edu/2024-Fall/software.html)

2. Stand up, meet each other, (maybe form teams?...use [this sheet](https://docs.google.com/spreadsheets/d/15pn9VNtYBG3XF-1OhvdKLMoNj4hOTXCGf4U1KNx0Tco/edit?usp=sharing))

---

# Week 1: .fancy[Getting Started]

### 1. Course Goal
### 2. Course Introduction
### 3. Break: Install Stuff
### 4. .orange[Quarto]
### 5. Workflow & Reading In Data
### 6. Wrangling Data
### 7. Visualizing Data

---

# .center[Quick demo]

<br>

# 1. Open `quarto_demo.qmd`

# 2. Click "Render"

---

# .center[Anatomy of a .qmd file]

<br>

# .red[Header]

# Markdown text

# R code

---

# Define overall document options in header

Basic html page

```
---
title: Your title
author: Author name
format: html
---
```


Add table of contents, change theme

```
---
title: Your title
author: Author name
toc: true 
format:
  html:
    theme: united
---
```

More on themes at https://quarto.org/docs/output-formats/html-themes.html


---

# Render to multiple outputs

### PDF uses LaTeX

```
---
title: Your title
author: Author name
format: pdf 
---
```

If you don't have LaTeX on your computer, install tinytex in R:

``` r
tinytex::install_tinytex()
```


### Microsoft Word

```
---
title: Your title
author: Author name
format: docx
---
```
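
Rendering can also be scripted from R instead of clicking "Render". This is a minimal sketch that assumes the **quarto** R package and the Quarto command line tool are installed; `demo.qmd` is a throwaway file created only for illustration:

``` r
# Write a tiny .qmd file to a temporary folder (illustrative file name)
qmd_path <- file.path(tempdir(), "demo.qmd")
writeLines(c(
  "---",
  "title: Your title",
  "author: Author name",
  "---",
  "",
  "Some **markdown** text."
), qmd_path)

# Add "pdf" once LaTeX (e.g. via tinytex) is installed
formats <- c("html", "docx")

# Only render if both the R package and the Quarto CLI can be found
quarto_ready <- requireNamespace("quarto", quietly = TRUE) &&
  !is.null(quarto::quarto_path())

if (quarto_ready) {
  for (fmt in formats) {
    quarto::quarto_render(qmd_path, output_format = fmt)
  }
} else {
  message("Quarto not found; skipping render.")
}
```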


---

# .center[Anatomy of a .qmd file]

<br>

# ~~Header~~

# .red[Markdown text]

# R code

---

# Right now, bookmark this! 👇

# https://commonmark.org/help/

# (When you have 10 minutes, do this! 👇)

# https://commonmark.org/help/tutorial/

---

# .center[Headers]

```markdown
# HEADER 1

## HEADER 2

### HEADER 3

#### HEADER 4

##### HEADER 5

###### HEADER 6
```


# HEADER 1

## HEADER 2

### HEADER 3

#### HEADER 4

##### HEADER 5

###### HEADER 6


---

# .center[Basic Text Formatting]

## Type this...

- `normal text`
- `_italic text_`
- `*italic text*`
- `**bold text**`
- `***bold italic text***`
- `~~strikethrough~~`
- `` `code text` ``


## ...to get this

- normal text
- _italic text_
- *italic text*
- **bold text**
- ***bold italic text***
- ~~strikethrough~~
- `code text`


---

# .center[Lists]

Bullet list:

```markdown
- first item
- second item
- third item
```

- first item
- second item
- third item


Numbered list:

```markdown
1. first item
2. second item
3. third item
```

1. first item
2. second item
3. third item


---

# .center[Links]

Simple **url link** to another site:

```markdown
[Download R](http://www.r-project.org/)
```

[Download R](http://www.r-project.org/)

---

# Don't want to use Markdown?

# .red[Use Visual Mode!]

---

# .center[Anatomy of a .qmd file]

<br>

# ~~Header (think of this as the "settings")~~

# ~~Markdown text~~

# .red[R code]

---

# R Code

## Inline code

```markdown
`r insert code here`
```


## Code chunks

````markdown
```{r}
insert code here
insert more code here
```
````


---

# Inline R code

```markdown
The sum of 3 and 4 is `r 3 + 4`
```

Produces this:

The sum of 3 and 4 is 7

---

# R Code chunks

This code chunk...

````markdown
```{r}
library(palmerpenguins)

head(penguins)
```
````


...will produce this when compiled:

``` r
library(palmerpenguins)

head(penguins)
```

```
#> # A tibble: 6 × 8
#>   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex     year
#>   <fct>   <fct>              <dbl>         <dbl>             <int>       <int> <fct>  <int>
#> 1 Adelie  Torgersen           39.1          18.7               181        3750 male    2007
#> 2 Adelie  Torgersen           39.5          17.4               186        3800 female  2007
#> 3 Adelie  Torgersen           40.3          18                 195        3250 female  2007
#> 4 Adelie  Torgersen           NA            NA                  NA          NA <NA>    2007
#> 5 Adelie  Torgersen           36.7          19.3               193        3450 female  2007
#> 6 Adelie  Torgersen           39.3          20.6               190        3650 male    2007
```


---

# Chunk options

Control what chunks output using options

All options [here](https://quarto.org/docs/reference/cells/cells-knitr.html)

---

# .center[Chunk output options]

````markdown
```{r}
#| echo: false

cat('hello world!')
```
````

Prints only **output**<br>(doesn't show code)

```
#> hello world!
```


````markdown
```{r}
#| eval: false

cat('hello world!')
```
````

Prints only **code**<br>(doesn't run the code)

``` r
cat('hello world!')
```


````markdown
```{r}
#| include: false

cat('hello world!')
```
````

Runs, but doesn't print anything


---

# A global `setup` chunk 🌍

````markdown
```{r}
#| label: setup
#| include: false

knitr::opts_chunk$set(
    warning = FALSE,
    message = FALSE,
    fig.path = "figs/",
    fig.width = 7.252,
    fig.height = 4,
    comment = "#>",
    fig.retina = 3
)
```
````


- Typically the first chunk
- All following chunks will use these options (i.e., sets global chunk options)
- You can (and should) use individual chunk options too
- Often where I load libraries, etc.


---

# Week 1: .fancy[Getting Started]

### 1. Course Goal
### 2. Course Introduction
### 3. Break: Install Stuff
### 4. Quarto
### 5. .orange[Workflow & Reading In Data]
### 6. Wrangling Data
### 7. Visualizing Data

---

## Workflow for reading in data

1) Use R Projects (.Rproj files) to organize your analysis - **don't double-click .R files**!

2) Use the `here` package to create file paths

``` r
path <- here::here("folder", "file.csv")
```

3) Import data with these functions:

File type  | Function       | Library
-----------|----------------|----------
`.csv`     | `read_csv()`   | **readr**
`.txt`     | `read.table()` | **utils**
`.xlsx`    | `read_excel()` | **readxl**
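
The table above can be read as a lookup from file extension to reader function. The helper below is a hypothetical sketch of that idea (`read_any()` is not part of any package). It is tried here on a throwaway `.txt` file in a temporary folder; in a real project the path would come from `here::here()` as in step 2:

``` r
# Hypothetical helper: choose a reader based on the file extension
read_any <- function(path, ...) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv  = readr::read_csv(path, ...),
    txt  = utils::read.table(path, ...),
    xlsx = readxl::read_excel(path, ...),
    stop("No reader defined for file type: .", ext)
  )
}

# Toy file so the sketch runs on its own
demo_path <- file.path(tempdir(), "demo.txt")
write.table(data.frame(year = c(1880, 1881), temp = c(-0.15, -0.07)),
            demo_path, row.names = FALSE)

demo <- read_any(demo_path, header = TRUE)
demo
```

Only the matching branch of `switch()` is evaluated, so **readr** and **readxl** are needed only when a `.csv` or `.xlsx` file is actually read.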

---

# Importing Comma Separated Values (.csv)

Read in `.csv` files with `read_csv()`:

``` r
library(tidyverse)
library(here)

csvPath <- here('data', 'milk_production.csv')
milk_production <- read_csv(csvPath)

head(milk_production)
```

```
#> # A tibble: 6 × 4
#>   region    state          year milk_produced
#>   <chr>     <chr>         <dbl>         <dbl>
#> 1 Northeast Maine          1970     619000000
#> 2 Northeast New Hampshire  1970     356000000
#> 3 Northeast Vermont        1970    1970000000
#> 4 Northeast Massachusetts  1970     658000000
#> 5 Northeast Rhode Island   1970      75000000
#> 6 Northeast Connecticut    1970     661000000
```
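
`read_csv()` guesses each column's type, which is why `year` shows up as `<dbl>` above. If you would rather declare the types yourself, one option is the `col_types` argument. The sketch below uses a two-row toy file (`toy_milk.csv`, created just for this example) with the same columns:

``` r
library(readr)

# Toy CSV with the same columns as milk_production.csv
csv_path <- file.path(tempdir(), "toy_milk.csv")
writeLines(c(
  "region,state,year,milk_produced",
  "Northeast,Maine,1970,619000000",
  "Northeast,Vermont,1970,1970000000"
), csv_path)

# Declare column types instead of letting read_csv() guess
toy_milk <- read_csv(
  csv_path,
  col_types = cols(
    region        = col_character(),
    state         = col_character(),
    year          = col_integer(),
    milk_produced = col_double()
  )
)

toy_milk
```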

---

# Importing Text Files (.txt)

Read in `.txt` files with `read.table()`:

``` r
txtPath <- here('data', 'nasa_global_temps.txt')
global_temps <- read.table(txtPath, skip = 5, header = FALSE)

head(global_temps)
```

```
#>     V1    V2    V3
#> 1 1880 -0.15 -0.08
#> 2 1881 -0.07 -0.12
#> 3 1882 -0.10 -0.15
#> 4 1883 -0.16 -0.19
#> 5 1884 -0.27 -0.23
#> 6 1885 -0.32 -0.25
```

---

# Importing Text Files (.txt)

Read in `.txt` files with `read.table()`:

``` r
txtPath <- here('data', 'nasa_global_temps.txt')
global_temps <- read.table(txtPath, skip = 5, header = FALSE)
names(global_temps) <- c('year', 'no_smoothing', 'loess') # Add header

head(global_temps)
```

```
#>   year no_smoothing loess
#> 1 1880        -0.15 -0.08
#> 2 1881        -0.07 -0.12
#> 3 1882        -0.10 -0.15
#> 4 1883        -0.16 -0.19
#> 5 1884        -0.27 -0.23
#> 6 1885        -0.32 -0.25
```
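
The skip-then-rename steps can also be done in one call, since `read.table()` accepts a `col.names` argument. A small sketch on a toy file (created here only so the example runs on its own, with five note lines standing in for the real file's preamble):

``` r
# Toy text file: five lines of notes, then the data
txt_path <- file.path(tempdir(), "toy_temps.txt")
writeLines(c(
  "Toy file: five lines of notes come first",
  "line 2", "line 3", "line 4", "line 5",
  "1880 -0.15 -0.08",
  "1881 -0.07 -0.12"
), txt_path)

# Skip the notes and name the columns in a single step
toy_temps <- read.table(
  txt_path,
  skip      = 5,
  header    = FALSE,
  col.names = c("year", "no_smoothing", "loess")
)

toy_temps
```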

---

# Importing Excel Files (.xlsx)

Read in `.xlsx` files with `read_excel()`:

``` r
library(readxl)

xlsxPath <- here('data', 'pv_cell_production.xlsx')
pv_cells <- read_excel(xlsxPath, sheet = 'Cell Prod by Country', skip = 2)
```

``` r
glimpse(pv_cells)
```

```
#> Rows: 25
#> Columns: 10
#> $ Year            <chr> NA, NA, "1995", "1996", "1997", "1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013", NA, "Note: NA = data not available.", NA, "Source: Compiled by E…
#> $ China           <chr> "Megawatts", NA, "NA", "NA", "NA", "NA", "NA", "2.5", "3", "10", "13", "40", "128.30000000000001", "341.8", "1192.8735755126208", "2535.9804999999997", "5193.2335000000003", "12882.114299891044", "24338.646000000004", "24139…
#> $ Taiwan          <chr> NA, NA, "NA", "NA", "NA", "NA", "NA", "NA", "3.5", "8", "17", "39.299999999999997", "88", "169.5", "413.19362206495737", "871.4", "1573.2", "3755.9046488657718", "4773.1499999999996", "5270.1999999999989", "6338.565000000000…
#> $ Japan           <dbl> NA, NA, 16.4, 21.2, 35.0, 49.0, 80.0, 128.6, 171.2, 251.1, 363.9, 601.5, 833.0, 926.4, 937.5, 1268.0, 1503.0, 2169.0, 2707.0, 2641.8, 3679.0, NA, NA, NA, NA
#> $ Malaysia        <chr> NA, NA, "NA", "NA", "NA", "NA", "NA", "NA", "0", "0", "0", "0", "0", "0", "100.1", "397.9", "1228.0566037735848", "1919.0129442119946", "2684.5953947368421", "2597.365436241611", "3072.59", NA, NA, NA, NA
#> $ Germany         <chr> NA, NA, "NA", "NA", "NA", "NA", "NA", "22.5", "23.5", "55", "121.5", "193", "339", "469.1", "815.35421116529074", "1476.6923205919056", "1606.0497978436656", "2181.2726133183096", "2152.8626315789475", "1406.7827181208054", …
#> $ `South Korea`   <chr> NA, NA, "NA", "NA", "NA", "NA", "NA", "NA", "0", "0", "0", "0", "5.3", "13", "31.883935905674612", "70.848164851527258", "234", "886.29518449560589", "1227.3", "1107.0999999999999", "1127.0999999999999", NA, NA, NA, NA
#> $ `United States` <dbl> NA, NA, 34.7500, 38.8500, 51.0000, 53.7000, 60.8000, 75.0000, 100.3000, 120.6000, 103.0000, 138.7000, 153.1000, 177.6000, 261.9804, 403.1250, 594.7922, 1162.5177, 1044.1895, 886.4018, 868.4250, NA, NA, NA, NA
#> $ Others          <chr> NA, NA, "NA", "NA", "NA", "NA", "NA", "48.200000000000017", "69.800000000000011", "97.299999999999955", "131", "186.29999999999995", "235.70000000000027", "361.09999999999991", "410.97322650945807", "709.03112641453299", "66…
#> $ World           <dbl> NA, NA, 77.600, 88.600, 125.800, 154.900, 201.300, 276.800, 371.300, 542.000, 749.400, 1198.800, 1782.400, 2458.500, 4163.859, 7732.977, 12595.992, 26399.539, 40761.761, 39523.565, 44464.496, NA, NA, NA, NA
```

]

---

# Importing Excel Files (.xlsx)

Read in `.xlsx` files with `read_excel()`:

``` r
library(readxl)

xlsxPath <- here('data', 'pv_cell_production.xlsx')
pv_cells <- read_excel(xlsxPath, sheet = 'Cell Prod by Country', skip = 2) %>%
* mutate(Year = as.numeric(Year)) %>% # Convert "non-years" to NA
* filter(!is.na(Year)) # Drop NA rows in Year
```

``` r
glimpse(pv_cells)
```

```
#> Rows: 19
#> Columns: 10
#> $ Year            <dbl> 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013
#> $ China           <chr> "NA", "NA", "NA", "NA", "NA", "2.5", "3", "10", "13", "40", "128.30000000000001", "341.8", "1192.8735755126208", "2535.9804999999997", "5193.2335000000003", "12882.114299891044", "24338.646000000004", "24139.014999999999", "…
#> $ Taiwan          <chr> "NA", "NA", "NA", "NA", "NA", "NA", "3.5", "8", "17", "39.299999999999997", "88", "169.5", "413.19362206495737", "871.4", "1573.2", "3755.9046488657718", "4773.1499999999996", "5270.1999999999989", "6338.5650000000005"
#> $ Japan           <dbl> 16.4, 21.2, 35.0, 49.0, 80.0, 128.6, 171.2, 251.1, 363.9, 601.5, 833.0, 926.4, 937.5, 1268.0, 1503.0, 2169.0, 2707.0, 2641.8, 3679.0
#> $ Malaysia        <chr> "NA", "NA", "NA", "NA", "NA", "NA", "0", "0", "0", "0", "0", "0", "100.1", "397.9", "1228.0566037735848", "1919.0129442119946", "2684.5953947368421", "2597.365436241611", "3072.59"
#> $ Germany         <chr> "NA", "NA", "NA", "NA", "NA", "22.5", "23.5", "55", "121.5", "193", "339", "469.1", "815.35421116529074", "1476.6923205919056", "1606.0497978436656", "2181.2726133183096", "2152.8626315789475", "1406.7827181208054", "1054.88…
#> $ `South Korea`   <chr> "NA", "NA", "NA", "NA", "NA", "NA", "0", "0", "0", "0", "5.3", "13", "31.883935905674612", "70.848164851527258", "234", "886.29518449560589", "1227.3", "1107.0999999999999", "1127.0999999999999"
#> $ `United States` <dbl> 34.7500, 38.8500, 51.0000, 53.7000, 60.8000, 75.0000, 100.3000, 120.6000, 103.0000, 138.7000, 153.1000, 177.6000, 261.9804, 403.1250, 594.7922, 1162.5177, 1044.1895, 886.4018, 868.4250
#> $ Others          <chr> "NA", "NA", "NA", "NA", "NA", "48.200000000000017", "69.800000000000011", "97.299999999999955", "131", "186.29999999999995", "235.70000000000027", "361.09999999999991", "410.97322650945807", "709.03112641453299", "663.660000…
#> $ World           <dbl> 77.600, 88.600, 125.800, 154.900, 201.300, 276.800, 371.300, 542.000, 749.400, 1198.800, 1782.400, 2458.500, 4163.859, 7732.977, 12595.992, 26399.539, 40761.761, 39523.565, 44464.496
```
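
Several columns above are still `<chr>` because the sheet stores missing values as the text `"NA"`. One possible way to convert them is sketched below, using a small hand-built tibble (`pv_small`, a hypothetical stand-in for `pv_cells` with values copied from the output above) and `across()` from `dplyr`.

``` r
library(dplyr)

# Stand-in for three rows (1999-2001) of pv_cells
pv_small <- tibble(
  Year  = c(1999, 2000, 2001),
  China = c("NA", "2.5", "3"),
  Japan = c(80.0, 128.6, 171.2)
)

# Convert every character column to numeric; the text "NA" becomes a real NA
pv_numeric <- pv_small %>%
  mutate(across(where(is.character), as.numeric))

glimpse(pv_numeric)
```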

]

---

# Your turn

Open the `practice.qmd` file.

Write code to import the following data files from the "data" folder:

- For `lotr_words.csv`, call the data frame `lotr`
- For `north_america_bear_killings.txt`, call the data frame `bears`
- For `uspto_clean_energy_patents.xlsx`, call the data frame `patents`

---

# Week 1: .fancy[Getting Started]

### 1. Course Goal
### 2. Course Introduction
### 3. Break: Install Stuff
### 4. Quarto
### 5. Workflow & Reading In Data
### 6. .orange[Wrangling Data]
### 7. Visualizing Data

---

# .center[The data frame...<br>in .darkgreen[Excel]]

]

# .center[The data frame...<br>in <svg aria-hidden="true" role="img" viewBox="0 0 581 512" style="height:1em;width:1.13em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:blue;overflow:visible;position:relative;"><path d="M581 226.6C581 119.1 450.9 32 290.5 32S0 119.1 0 226.6C0 322.4 103.3 402 239.4 418.1V480h99.1v-61.5c24.3-2.7 47.6-7.4 69.4-13.9L448 480h112l-67.4-113.7c54.5-35.4 88.4-84.9 88.4-139.7zm-466.8 14.5c0-73.5 98.9-133 220.8-133s211.9 40.7 211.9 133c0 50.1-26.5 85-70.3 106.4-2.4-1.6-4.7-2.9-6.4-3.7-10.2-5.2-27.8-10.5-27.8-10.5s86.6-6.4 86.6-92.7-90.6-87.9-90.6-87.9h-199V361c-74.1-21.5-125.2-67.1-125.2-119.9zm225.1 38.3v-55.6c57.8 0 87.8-6.8 87.8 27.3 0 36.5-38.2 28.3-87.8 28.3zm-.9 72.5H365c10.8 0 18.9 11.7 24 19.2-16.1 1.9-33 2.8-50.6 2.9v-22.1z"/></svg>]

``` r
lotr
```

```
#> # A tibble: 18 × 4
#>    film                       race   gender word_count
#>    <chr>                      <chr>  <chr>       <dbl>
#>  1 The Fellowship Of The Ring Elf    Female       1229
#>  2 The Fellowship Of The Ring Elf    Male          971
#>  3 The Fellowship Of The Ring Hobbit Female         14
#>  4 The Fellowship Of The Ring Hobbit Male         3644
#>  5 The Fellowship Of The Ring Man    Female          0
#>  6 The Fellowship Of The Ring Man    Male         1995
#>  7 The Return Of The King     Elf    Female        183
#>  8 The Return Of The King     Elf    Male          510
#>  9 The Return Of The King     Hobbit Female          2
#> 10 The Return Of The King     Hobbit Male         2673
#> 11 The Return Of The King     Man    Female        268
#> 12 The Return Of The King     Man    Male         2459
#> 13 The Two Towers             Elf    Female        331
#> 14 The Two Towers             Elf    Male          513
#> 15 The Two Towers             Hobbit Female          0
#> 16 The Two Towers             Hobbit Male         2463
#> 17 The Two Towers             Man    Female        401
#> 18 The Two Towers             Man    Male         3589
```

]

---

## **Columns**: _Vectors_ of values (must be same data type)

Extract a column using `$`

``` r
lotr$race
```

```
#>  [1] "Elf"    "Elf"    "Hobbit" "Hobbit" "Man"    "Man"    "Elf"    "Elf"    "Hobbit" "Hobbit" "Man"    "Man"    "Elf"    "Elf"    "Hobbit" "Hobbit" "Man"    "Man"
```

---

## **Columns**: _Vectors_ of values (must be same data type)

You can also use brackets. Note that on a tibble, `lotr[,2]` returns a one-column tibble rather than a vector:

``` r
lotr$race
```

```
#>  [1] "Elf"    "Elf"    "Hobbit" "Hobbit" "Man"    "Man"    "Elf"    "Elf"    "Hobbit" "Hobbit" "Man"    "Man"    "Elf"    "Elf"    "Hobbit" "Hobbit" "Man"    "Man"
```

``` r
lotr[,2]
```

```
#> # A tibble: 18 × 1
#>    race  
#>    <chr> 
#>  1 Elf   
#>  2 Elf   
#>  3 Hobbit
#>  4 Hobbit
#>  5 Man   
#>  6 Man   
#>  7 Elf   
#>  8 Elf   
#>  9 Hobbit
#> 10 Hobbit
#> 11 Man   
#> 12 Man   
#> 13 Elf   
#> 14 Elf   
#> 15 Hobbit
#> 16 Hobbit
#> 17 Man   
#> 18 Man
```
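
The difference between these extraction styles can be checked directly. This sketch assumes a four-row stand-in, `lotr_small`, built from rows of the table shown earlier; `pull()` is the `dplyr` way to extract a column as a vector.

``` r
library(dplyr)

# Four rows copied from the lotr data frame
lotr_small <- tibble(
  film       = rep("The Fellowship Of The Ring", 4),
  race       = c("Elf", "Elf", "Hobbit", "Man"),
  gender     = c("Female", "Male", "Male", "Male"),
  word_count = c(1229, 971, 3644, 1995)
)

race_vec  <- lotr_small$race          # character vector
race_vec2 <- lotr_small[["race"]]     # same vector, selected by name
race_tbl  <- lotr_small[, 2]          # one-column tibble
race_pull <- lotr_small %>% pull(race) # vector again, pipe-friendly

is.data.frame(race_vec) # FALSE
is.data.frame(race_tbl) # TRUE
```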

---

## **Rows**: Information about individual observations

Information about the first row:

``` r
lotr[1,]
```

```
#> # A tibble: 1 × 4
#>   film                       race  gender word_count
#>   <chr>                      <chr> <chr>       <dbl>
#> 1 The Fellowship Of The Ring Elf   Female       1229
```

Information about rows 1 & 2:

``` r
lotr[1:2,]
```

```
#> # A tibble: 2 × 4
#>   film                       race  gender word_count
#>   <chr>                      <chr> <chr>       <dbl>
#> 1 The Fellowship Of The Ring Elf   Female       1229
#> 2 The Fellowship Of The Ring Elf   Male          971
```
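
Row and column indices can be combined in one set of brackets. A minimal sketch, again assuming a four-row stand-in (`lotr_small`) copied from the table above; `slice()` is the `dplyr` equivalent for picking rows by position.

``` r
library(dplyr)

lotr_small <- tibble(
  film       = rep("The Fellowship Of The Ring", 4),
  race       = c("Elf", "Elf", "Hobbit", "Man"),
  gender     = c("Female", "Male", "Male", "Male"),
  word_count = c(1229, 971, 3644, 1995)
)

# Rows 1-2, but only two of the columns
two_by_two <- lotr_small[1:2, c("film", "word_count")]

# dplyr equivalent of lotr_small[1:2, ]
first_two <- lotr_small %>%
  slice(1:2)
```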

---

## Quick Practice

Read in the `data.csv` file in the "data" folder:

``` r
data <- read_csv(here('data', 'data.csv'))
```

Now answer these questions:

- How many rows and columns are in the data frame?
- What type of data is each column?
- Preview the different columns - what do you think this data is about? What might one row represent?
- How many unique airlines are in the data frame? 
- What are the shortest and longest air times for any one flight in the data frame?
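
As a hint, the functions below are one way to approach questions like these. The data frame `flights_toy` and its columns are made up for illustration and are not the contents of `data.csv`.

``` r
library(dplyr)

# Hypothetical toy data, NOT the real data.csv
flights_toy <- tibble(
  carrier  = c("AA", "DL", "AA", "UA"),
  air_time = c(120, 95, NA, 310)
)

dim(flights_toy)     # number of rows, then columns
glimpse(flights_toy) # column names and types

# Count distinct values in a column
n_carriers <- n_distinct(flights_toy$carrier)

# Smallest and largest value, ignoring missing values
air_range <- range(flights_toy$air_time, na.rm = TRUE)
```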

---

### The tidyverse: `stringr` + `dplyr` + `readr` +  `ggplot2` + ...

<center>
<img src="images/horst_monsters_tidyverse.jpeg" width="950">
</center>Art by [Allison Horst](https://www.allisonhorst.com/)

---

# .center[The main `dplyr` "verbs"]

<br>

"Verb"        | What it does
--------------|--------------------
`select()`    | Select columns by name
`filter()`    | Keep rows that match criteria
`arrange()`   | Sort rows based on column(s)
`mutate()`    | Create new columns 
`summarize()` | Create summary values

---

# .center[Core `tidyverse` concept:<br>**Chain functions together with "pipes"**]

# .center[`%>%`]

## Think of the words "...and then..."

``` r
data %>% 
  do_something() %>% 
  do_something_else()
```
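
To see what the pipe does, compare it with the equivalent nested call. This small sketch uses base R functions on a made-up vector; `x %>% f()` is the same as `f(x)`.

``` r
library(dplyr)

x <- c(1, 4, 9)

# Nested: read from the inside out
nested <- sum(sqrt(x))

# Piped: take x, and then take the square root, and then sum
piped <- x %>%
  sqrt() %>%
  sum()

nested == piped # TRUE, both are 6
```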

---

# Select columns with `select()`

---

# Select columns with `select()`

Select the columns `film` & `race`

``` r
lotr %>% 
  select(film, race)
```

```
#> # A tibble: 18 × 2
#>    film                       race  
#>    <chr>                      <chr> 
#>  1 The Fellowship Of The Ring Elf   
#>  2 The Fellowship Of The Ring Elf   
#>  3 The Fellowship Of The Ring Hobbit
#>  4 The Fellowship Of The Ring Hobbit
#>  5 The Fellowship Of The Ring Man   
#>  6 The Fellowship Of The Ring Man   
#>  7 The Return Of The King     Elf   
#>  8 The Return Of The King     Elf   
#>  9 The Return Of The King     Hobbit
#> 10 The Return Of The King     Hobbit
#> 11 The Return Of The King     Man   
#> 12 The Return Of The King     Man   
#> 13 The Two Towers             Elf   
#> 14 The Two Towers             Elf   
#> 15 The Two Towers             Hobbit
#> 16 The Two Towers             Hobbit
#> 17 The Two Towers             Man   
#> 18 The Two Towers             Man
```

---

# Select columns with `select()`

Use the `-` sign to drop columns

``` r
lotr %>% 
  select(-film)
```

```
#> # A tibble: 18 × 3
#>    race   gender word_count
#>    <chr>  <chr>       <dbl>
#>  1 Elf    Female       1229
#>  2 Elf    Male          971
#>  3 Hobbit Female         14
#>  4 Hobbit Male         3644
#>  5 Man    Female          0
#>  6 Man    Male         1995
#>  7 Elf    Female        183
#>  8 Elf    Male          510
#>  9 Hobbit Female          2
#> 10 Hobbit Male         2673
#> 11 Man    Female        268
#> 12 Man    Male         2459
#> 13 Elf    Female        331
#> 14 Elf    Male          513
#> 15 Hobbit Female          0
#> 16 Hobbit Male         2463
#> 17 Man    Female        401
#> 18 Man    Male         3589
```
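
`select()` can also rename columns as it selects them, and it accepts helper functions such as `where()`. A brief sketch, assuming a four-row stand-in (`lotr_small`) copied from the table above; the new name `words` is just an example.

``` r
library(dplyr)

lotr_small <- tibble(
  film       = rep("The Fellowship Of The Ring", 4),
  race       = c("Elf", "Elf", "Hobbit", "Man"),
  gender     = c("Female", "Male", "Male", "Male"),
  word_count = c(1229, 971, 3644, 1995)
)

# Select two columns, renaming word_count to words
renamed <- lotr_small %>%
  select(film, words = word_count)

# Select every character column
text_cols <- lotr_small %>%
  select(where(is.character))
```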

---

# Filter for rows with `filter()`

---

# Filter for rows with `filter()`

Keep only the rows with Elf characters

``` r
lotr %>% 
    filter(race == "Elf")
```

```
#> # A tibble: 6 × 4
#>   film                       race  gender word_count
#>   <chr>                      <chr> <chr>       <dbl>
#> 1 The Fellowship Of The Ring Elf   Female       1229
#> 2 The Fellowship Of The Ring Elf   Male          971
#> 3 The Return Of The King     Elf   Female        183
#> 4 The Return Of The King     Elf   Male          510
#> 5 The Two Towers             Elf   Female        331
#> 6 The Two Towers             Elf   Male          513
```

---

# Filter for rows with `filter()`

Keep only the rows with Elf or Hobbit characters

``` r
lotr %>% 
    filter((race == "Elf") | (race  == "Hobbit"))
```

```
#> # A tibble: 12 × 4
#>    film                       race   gender word_count
#>    <chr>                      <chr>  <chr>       <dbl>
#>  1 The Fellowship Of The Ring Elf    Female       1229
#>  2 The Fellowship Of The Ring Elf    Male          971
#>  3 The Fellowship Of The Ring Hobbit Female         14
#>  4 The Fellowship Of The Ring Hobbit Male         3644
#>  5 The Return Of The King     Elf    Female        183
#>  6 The Return Of The King     Elf    Male          510
#>  7 The Return Of The King     Hobbit Female          2
#>  8 The Return Of The King     Hobbit Male         2673
#>  9 The Two Towers             Elf    Female        331
#> 10 The Two Towers             Elf    Male          513
#> 11 The Two Towers             Hobbit Female          0
#> 12 The Two Towers             Hobbit Male         2463
```

---

# Filter for rows with `filter()`

Keep only the rows with Elf or Hobbit characters

``` r
lotr %>% 
    filter(race %in% c("Elf", "Hobbit"))
```
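
The `|` and `%in%` versions return the same rows. The check below is a sketch on a four-row stand-in (`lotr_small`) copied from the `lotr` table.

``` r
library(dplyr)

lotr_small <- tibble(
  film       = rep("The Fellowship Of The Ring", 4),
  race       = c("Elf", "Elf", "Hobbit", "Man"),
  gender     = c("Female", "Male", "Male", "Male"),
  word_count = c(1229, 971, 3644, 1995)
)

with_or <- lotr_small %>%
  filter((race == "Elf") | (race == "Hobbit"))

with_in <- lotr_small %>%
  filter(race %in% c("Elf", "Hobbit"))

all.equal(with_or, with_in) # TRUE
```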

---

# .center[Logic operators for `filter()`]

<br>

Description | Example
------------|------------
Values greater than 1 | `value > 1`
Values greater than or equal to 1 | `value >= 1`
Values less than 1 | `value < 1`
Values less than or equal to 1 | `value <= 1`
Values equal to 1 | `value == 1`
Values not equal to 1 | `value != 1`
Values in the set c(1, 4) | `value %in% c(1, 4)`
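
Each operator returns one `TRUE` or `FALSE` per element, and `filter()` keeps the rows where the result is `TRUE`. A quick sketch on a made-up vector:

``` r
value <- c(0, 1, 4)

value > 1            # FALSE FALSE  TRUE
value >= 1           # FALSE  TRUE  TRUE
value == 1           # FALSE  TRUE FALSE
value != 1           #  TRUE FALSE  TRUE
value %in% c(1, 4)   # FALSE  TRUE  TRUE
```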

---

# Combine `filter()` and `select()`

Keep only the rows with Elf characters that spoke more than 1000 words, then select everything but the race column

``` r
lotr %>% 
  filter((race == "Elf") & (word_count > 1000)) %>% 
  select(-race)
```

```
#> # A tibble: 1 × 3
#>   film                       gender word_count
#>   <chr>                      <chr>       <dbl>
#> 1 The Fellowship Of The Ring Female       1229
```
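
Conditions separated by commas inside `filter()` are combined with "and", so the `&` can be dropped. A sketch on a four-row stand-in (`lotr_small`) copied from the `lotr` table:

``` r
library(dplyr)

lotr_small <- tibble(
  film       = rep("The Fellowship Of The Ring", 4),
  race       = c("Elf", "Elf", "Hobbit", "Man"),
  gender     = c("Female", "Male", "Male", "Male"),
  word_count = c(1229, 971, 3644, 1995)
)

with_and <- lotr_small %>%
  filter((race == "Elf") & (word_count > 1000)) %>%
  select(-race)

# Same result: comma-separated conditions must all be TRUE
with_comma <- lotr_small %>%
  filter(race == "Elf", word_count > 1000) %>%
  select(-race)
```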

---

# Create new variables with `mutate()`

---

# Create new variables with `mutate()`

Create a new variable, `word1000`, which is `TRUE` if the character spoke 1,000 or more words

``` r
lotr %>%
    mutate(word1000 = word_count >= 1000)
```

```
#> # A tibble: 18 × 5
#>    film                       race   gender word_count word1000
#>    <chr>                      <chr>  <chr>       <dbl> <lgl>   
#>  1 The Fellowship Of The Ring Elf    Female       1229 TRUE    
#>  2 The Fellowship Of The Ring Elf    Male          971 FALSE   
#>  3 The Fellowship Of The Ring Hobbit Female         14 FALSE   
#>  4 The Fellowship Of The Ring Hobbit Male         3644 TRUE    
#>  5 The Fellowship Of The Ring Man    Female          0 FALSE   
#>  6 The Fellowship Of The Ring Man    Male         1995 TRUE    
#>  7 The Return Of The King     Elf    Female        183 FALSE   
#>  8 The Return Of The King     Elf    Male          510 FALSE   
#>  9 The Return Of The King     Hobbit Female          2 FALSE   
#> 10 The Return Of The King     Hobbit Male         2673 TRUE    
#> 11 The Return Of The King     Man    Female        268 FALSE   
#> 12 The Return Of The King     Man    Male         2459 TRUE    
#> 13 The Two Towers             Elf    Female        331 FALSE   
#> 14 The Two Towers             Elf    Male          513 FALSE   
#> 15 The Two Towers             Hobbit Female          0 FALSE   
#> 16 The Two Towers             Hobbit Male         2463 TRUE    
#> 17 The Two Towers             Man    Female        401 FALSE   
#> 18 The Two Towers             Man    Male         3589 TRUE
```

---

# .center[Handling if/else conditions]

### .center[`ifelse(<condition>, <if TRUE>, <else>)`]

``` r
lotr %>%
    mutate(word1000 = ifelse(word_count >= 1000, TRUE, FALSE))
```
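
`ifelse()` is more useful when the two outcomes are not simply `TRUE` and `FALSE`. A sketch on a four-row stand-in (`lotr_small`) copied from the `lotr` table; the column name `talkative` and its labels are made up for illustration.

``` r
library(dplyr)

lotr_small <- tibble(
  film       = rep("The Fellowship Of The Ring", 4),
  race       = c("Elf", "Elf", "Hobbit", "Man"),
  gender     = c("Female", "Male", "Male", "Male"),
  word_count = c(1229, 971, 3644, 1995)
)

# Label each row based on the condition
labeled <- lotr_small %>%
  mutate(talkative = ifelse(word_count >= 1000, "1000+", "under 1000"))

labeled$talkative
```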

---

# Sort data frame with `arrange()`

Sort the `lotr` data frame by `word_count`

``` r
lotr %>%
    arrange(word_count)
```

```
#> # A tibble: 18 × 4
#>    film                       race   gender word_count
#>    <chr>                      <chr>  <chr>       <dbl>
#>  1 The Fellowship Of The Ring Man    Female          0
#>  2 The Two Towers             Hobbit Female          0
#>  3 The Return Of The King     Hobbit Female          2
#>  4 The Fellowship Of The Ring Hobbit Female         14
#>  5 The Return Of The King     Elf    Female        183
#>  6 The Return Of The King     Man    Female        268
#>  7 The Two Towers             Elf    Female        331
#>  8 The Two Towers             Man    Female        401
#>  9 The Return Of The King     Elf    Male          510
#> 10 The Two Towers             Elf    Male          513
#> 11 The Fellowship Of The Ring Elf    Male          971
#> 12 The Fellowship Of The Ring Elf    Female       1229
#> 13 The Fellowship Of The Ring Man    Male         1995
#> 14 The Return Of The King     Man    Male         2459
#> 15 The Two Towers             Hobbit Male         2463
#> 16 The Return Of The King     Hobbit Male         2673
#> 17 The Two Towers             Man    Male         3589
#> 18 The Fellowship Of The Ring Hobbit Male         3644
```

---

# Sort data frame with `arrange()`

Use the `desc()` function to sort in descending order

``` r
lotr %>%
    arrange(desc(word_count))
```

```
#> # A tibble: 18 × 4
#>    film                       race   gender word_count
#>    <chr>                      <chr>  <chr>       <dbl>
#>  1 The Fellowship Of The Ring Hobbit Male         3644
#>  2 The Two Towers             Man    Male         3589
#>  3 The Return Of The King     Hobbit Male         2673
#>  4 The Two Towers             Hobbit Male         2463
#>  5 The Return Of The King     Man    Male         2459
#>  6 The Fellowship Of The Ring Man    Male         1995
#>  7 The Fellowship Of The Ring Elf    Female       1229
#>  8 The Fellowship Of The Ring Elf    Male          971
#>  9 The Two Towers             Elf    Male          513
#> 10 The Return Of The King     Elf    Male          510
#> 11 The Two Towers             Man    Female        401
#> 12 The Two Towers             Elf    Female        331
#> 13 The Return Of The King     Man    Female        268
#> 14 The Return Of The King     Elf    Female        183
#> 15 The Fellowship Of The Ring Hobbit Female         14
#> 16 The Return Of The King     Hobbit Female          2
#> 17 The Fellowship Of The Ring Man    Female          0
#> 18 The Two Towers             Hobbit Female          0
```
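
`arrange()` accepts more than one column; later columns break ties in earlier ones. A sketch on a four-row stand-in (`lotr_small`) copied from the `lotr` table:

``` r
library(dplyr)

lotr_small <- tibble(
  film       = rep("The Fellowship Of The Ring", 4),
  race       = c("Elf", "Elf", "Hobbit", "Man"),
  gender     = c("Female", "Male", "Male", "Male"),
  word_count = c(1229, 971, 3644, 1995)
)

# Sort by gender (A to Z), then by word_count from high to low within gender
sorted <- lotr_small %>%
  arrange(gender, desc(word_count))

sorted$word_count # 1229 3644 1995  971
```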

---

# Your turn

Read in the `data.csv` file in the "data" folder:

``` r
data <- read_csv(here('data', 'data.csv'))
```

Now complete these tasks:

- Create a new data frame, `flights_fall`, that contains only flights that departed in the fall semester.
- Create a new data frame, `flights_dc`, that contains only flights that flew to DC airports (Reagan or Dulles).
- Create a new data frame, `flights_dc_carrier`, that contains only flights that flew to DC airports (Reagan or Dulles) and only the columns about the month and airline.
- How many unique airlines were flying to DC airports in July?
- Create a new variable, `speed`, in miles per hour using the `time` (minutes) and `distance` (miles) variables. 
- Which flight flew the fastest?
- Remove rows that have `NA` for `air_time` and re-arrange the resulting data frame based on the longest air time and longest flight distance.

]

---

# Week 1: .fancy[Getting Started]

### 1. Course Goal
### 2. Course Introduction
### 3. Break: Install Stuff
### 4. Quarto
### 5. Workflow & Reading In Data
### 6. Wrangling Data
### 7. .orange[Visualizing Data]

---

]

# "Grammar of Graphics"

Concept developed by Leland Wilkinson (1999)

**ggplot2** package developed by Hadley Wickham (2005)

]

---

# Making plot layers with ggplot2

<br>

### 1. The data 
### 2. The aesthetic mapping (what goes on the axes?)
### 3. The geometries (points? bars? etc.)
### 4. The annotations / labels
### 5. The theme

---

# Layer 1: The data

``` r
head(mpg)
```

```
#> # A tibble: 6 × 11
#>   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class  
#>   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>  
#> 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compact
#> 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compact
#> 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compact
#> 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compact
#> 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compact
#> 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compact
```

---

# Layer 1: The data

The `ggplot()` function initializes the plot with whatever data you're using

``` r
mpg %>% 
  ggplot()
```

]

]]

---

# Layer 2: The aesthetic mapping

The `aes()` function determines which variables will be _mapped_ to the geometries<br>(e.g. the axes)

``` r
mpg %>% 
* ggplot(aes(x = displ, y = hwy))
```

]

]]

---

# Layer 3: The geometries

Use `+` to add geometries, e.g. `geom_point()` for points

``` r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
* geom_point()
```
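
Aesthetic mappings are not limited to the axes. As a sketch, mapping the `class` column of `mpg` to `color` inside `aes()` colors each point by vehicle class and adds a legend:

``` r
library(ggplot2)

# Map a third variable to color in addition to x and y
p_color <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

p_color
```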

]

]]

---

# Layer 4: The annotations / labels

Use `labs()` to modify most labels

``` r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
* labs(
*   x = "Engine displacement (liters)",
*   y = "Highway fuel economy (mpg)",
*   title = "Most larger engine vehicles are less fuel efficient"
* )
```

]

]]

---

# Layer 5: The theme

``` r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
  labs(
    x = "Engine displacement (liters)",  
    y = "Highway fuel economy (mpg)", 
    title = "Most larger engine vehicles are less fuel efficient"
  ) + 
* theme_bw()
```

]

]]

---

### Common themes

`theme_bw()`

``` r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
* theme_bw()
```

]

`theme_minimal()`

``` r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
* theme_minimal()
```

]

---

### Common themes

`theme_classic()`

``` r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
* theme_classic()
```

]

`theme_void()`

``` r
mpg %>% 
  ggplot(aes(x = displ, y = hwy)) +
  geom_point() +
* theme_void()
```
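
Because a theme is just another layer, one option is to store the base plot in an object and add different themes to it. The object names below are arbitrary, and `base_size` is an optional argument of the built-in themes that sets the base font size.

``` r
library(ggplot2)

# Build the plot once
p_base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Reuse it with different themes
p_bw      <- p_base + theme_bw()
p_minimal <- p_base + theme_minimal(base_size = 16) # larger text

p_minimal
```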

]

---

]

## Your turn

Open `practice.qmd`

Use the `mpg` data frame and ggplot to create these charts

]

---

# Extra practice

]

]