1 Teaching Team

1.1 Meet your instructor!

Dr. John Helveston, Assistant Professor in Engineering Management & Systems Engineering

Background:

  • 2016 PhD in Engineering & Public Policy at Carnegie Mellon University
  • 2015 MS in Engineering & Public Policy at Carnegie Mellon University
  • 2010 BS in Engineering Science & Mechanics at Virginia Tech

1.2 Meet your tutors!

Yanjie He, Master's student in Data Analytics

Lingmei Zhao, Master's student in Statistics

2 For new students

Students should have taken Programming for Analytics or have experience with at least one programming language. If you’re not sure whether you have the necessary prerequisite skills, you can try to get up to speed by completing Assignment 0 before classes start. Once classes start, it may be difficult to keep up without this background, and it may be more beneficial to wait and take this course next year after taking Programming for Analytics this coming fall.

3 Course prep

For this class, you’ll need to install some software and register for some websites. Go to the course prep page to get set up.

4 What’s new?

Students taking this course should have already taken Programming for Analytics. If you haven’t, I strongly recommend you review the lessons and assignments on the previous semester website. You can also get up to speed by completing Assignment 0.

While this course follows a structure similar to P4A, there will be several key distinctions:

  1. Whereas in P4A we worked with nice, tidy data sets, in this course we will be working more with messier, “raw” data that often needs significant processing before you can explore it.
  2. Rather than solve puzzles (e.g. write the function isPrime()), assignments will involve more real-world data problems that often have multiple, subjective solutions.
  3. Style and aesthetics will matter - consider your assignments and your final project to be professional data products that you would want to show off to future employers.

5 Course mantras

Here are some philosophies that will get you far in data analytic work. We will be revisiting these over and over again.

1) Embrace plain text

You will write code to produce rich outputs that include text and graphics. While your output may have lots of different formatting, your code will be written in plain text.

2) Embrace reproducibility

Everything you produce in this course will be a reproducible output. That is, you should be able to reproduce your output from the raw data and code. For example, this webpage was generated from this markdown source file on GitHub.

If you want to generate this very HTML page, download the .Rmd file, then open it in RStudio and run the following code:

rmarkdown::render('L1.1-course-introduction.Rmd')

6 The syllabus

The syllabus is lengthy, but I do expect you to look through each section. If any changes need to be made, you’ll be notified through Slack.

7 The schedule

The course schedule is your roadmap for the semester. Visit it often to make sure you are well-prepared for class and aware of upcoming assignment / quiz dates.

8 Communication & Help

This can be a challenging class - don’t suffer in silence! Look at the “Getting Help” page, come to office hours, or send me a message on Slack.

9 Readings!

9.1 Workflow

Now that you’ve got R and RStudio installed, read the “Getting Started” lesson in Healy. We will follow the conventions laid out in this chapter throughout the class, including:

  • Using RStudio Projects to stay organized.
  • Working in plain text.
  • Using RMarkdown to conduct and report on our analyses.

9.2 Reading in Data

Check out the readr and readxl packages - we’ll be using these throughout the semester to import data into R.
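As a rough sketch of what importing data with these two packages can look like: the toy data frame and the file name `example.csv` below are made up for illustration, and the Excel file is one of the sample workbooks bundled with readxl rather than anything from this course.

```r
library(readr)
library(readxl)

# Write a small, made-up data frame to a temporary CSV file so the
# example is self-contained (in practice you would already have a file)
csv_path <- file.path(tempdir(), "example.csv")
toy <- data.frame(id = 1:3, score = c(90.5, 82.0, 77.25))
write_csv(toy, csv_path)

# read_csv() returns a tibble; stating the column types up front
# makes the import explicit instead of relying on type guessing
scores <- read_csv(
  csv_path,
  col_types = cols(id = col_integer(), score = col_double())
)

# readxl ships with sample workbooks; readxl_example() returns the
# path to one of them
xlsx_path <- readxl_example("datasets.xlsx")
sheets <- excel_sheets(xlsx_path)   # names of the sheets in the workbook
cars <- read_excel(xlsx_path, sheet = "mtcars")
```

Both functions return tibbles, so `scores` and `cars` can be passed straight to the rest of the tidyverse.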

9.3 Tidy Data

We’ll cover the concept of “tidy” data in class on day 1. To get familiar with it, read Chapter 12 in R4DS, and take a look at these Tidy data explanations.
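To make the idea concrete before class, here is a minimal sketch using tidyr. The data are invented for illustration (hypothetical countries "A" and "B" with a made-up `cases` measure); the point is only the reshaping step.

```r
library(tibble)
library(tidyr)

# Made-up "wide" data: one column per year, so the year values live
# in the column names instead of in a variable of their own
wide <- tribble(
  ~country, ~`2019`, ~`2020`,
  "A",      10,      12,
  "B",      20,      25
)

# pivot_longer() gathers the year columns into two variables,
# giving one row per country-year observation
tidy <- pivot_longer(
  wide,
  cols = c(`2019`, `2020`),
  names_to = "year",
  values_to = "cases"
)
```

In the tidy version, each variable (`country`, `year`, `cases`) has its own column and each observation has its own row.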

9.4 Writing a research question

Read through this guide from GMU on how to write a research question. We’ll come back to these ideas again later when you start working on your final projects, but it’s a good idea to start thinking about your research question early.


EMSE 4197 (CRN 78916): Exploratory Data Analysis - Spring 2020
George Washington University | School of Engineering & Applied Science
Dr. John Paul Helveston | jph@gwu.edu | Wednesdays | 12:45–3:15 PM | District House B205
This work is licensed under a Creative Commons ShareAlike 4.0 International License.
See the licensing page for more details about copyright information.
Content © 2020 John Paul Helveston