Due: 09 March, 11:59 pm

Weight: This assignment is worth 0.5% of your final grade.

Purpose: The purpose of this assignment is to learn some techniques for dealing with messy data. Most of the time, the raw data you get is in the wrong format and its variables are not properly coded, so you will need to “clean” the data before starting any analysis. Other times, the data will be split across two or more data sets, and you will need to “join” them together into a single data frame.

Assessment: This assignment is graded using a check system:

  • ✔+ (110%): Reflection shows phenomenal thought and engagement with the course content. I will not assign these often.
  • ✔ (100%): Reflection is thoughtful, well-written, and shows engagement with the course content. This is the expected level of performance.
  • ✔− (50%): Reflection is hastily composed, too short, and/or only cursorily engages with the course content. This grade signals that you need to improve next time. I will hopefully not assign these often.

Notice that this is essentially a pass/fail or completion-based system. I’m not grading your writing ability, I’m not counting the exact number of words you write, and I’m not looking for encyclopedic citations of every single reading to prove that you did indeed read everything. I’m looking for thoughtful engagement; that’s all. Do good work and you’ll get a ✔.

Tasks

  1. Read: Open up a notebook (physical, digital…whatever you take notes in best), and take notes while you go through the readings below.

  2. Reflection: When you have completed all of the readings, download and edit this template to write a reflection of ~150 words (or more) on what you’ve read (be sure to edit the YAML at the top). That’s fairly short: there are ~250 words on a typical double-spaced page in Microsoft Word (500 when single-spaced). You can do a lot of different things with this memo, for example:

    • Discuss something you learned from the course content
    • Write about the best or worst data visualization you saw recently
    • Connect the course content to your own work
    • Discuss some of the key insights or things you found interesting in the readings
  3. Submit Everything: Knit your document to an HTML page, then create a zip file of everything in your R Project folder. Go to the “Assignment Submission” page on Blackboard and submit your zip file.


Readings

Optional Practice

If you want some extra practice working with data in R, check out Alison Hill’s course on DataCamp: Working with Data in the Tidyverse. In particular, I suggest going through parts 2 (“Tame your data”) and 4 (“Transform your data”).

Bonus Points*

In your reflection this week, share a story about or link to the messiest data you’ve ever come across.


*There are no actual bonus points


EMSE 4575: Exploratory Data Analysis (Spring 2021)
Wednesdays | 12:45 - 3:15 PM | Dr. John Paul Helveston | jph@gwu.edu
LICENSE: CC-BY-SA