Cleaning Data

Due: Sep 13 by 11:59pm

Weight: This assignment is worth 1% of your final grade.

Purpose: The purpose of this assignment is to learn some techniques for dealing with messy data. Raw data often arrives in the wrong format, with variables that are not properly coded, so you will need to “clean” the data before starting any analysis. Other times the data will be split across two or more data sets, and you will need to “join” them into a single data frame.
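As a rough illustration of what “clean” and “join” can mean in practice, here is a minimal base R sketch. The data frames, column names, and values are all invented for this example and are not taken from the assigned readings, which may use different functions for the same steps.

```r
# Hypothetical raw data: all names and values are invented for illustration.
raw <- data.frame(
  id     = c(1, 2, 3),
  height = c("170cm", "165 cm", NA),  # numbers stored as text with units
  group  = c("a", "B", "b"),          # inconsistently coded categories
  stringsAsFactors = FALSE
)

# "Clean": drop the unit text so height is numeric, and make group codes consistent.
clean <- raw
clean$height_cm <- as.numeric(gsub("[^0-9.]", "", clean$height))
clean$group <- toupper(clean$group)
clean$height <- NULL  # remove the original text column

# A second hypothetical data set that shares the `id` key.
scores <- data.frame(id = c(1, 2, 4), score = c(88, 92, 75))

# "Join": keep every row of `clean` and attach matching scores (a left join).
joined <- merge(clean, scores, by = "id", all.x = TRUE)
print(joined)
```

Rows in `clean` without a match in `scores` get `NA` for `score`, and rows that appear only in `scores` are dropped. The dplyr function `left_join()` performs the same kind of join.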

Assessment: This assignment is graded using a check system:

  • ✔+ (110%): Responses show phenomenal thought and engagement with the course content. I will not assign these often.
  • ✔ (100%): Responses are thoughtful, well-written, and show engagement with the course content. This is the expected level of performance.
  • ✔− (50%): Responses are hastily composed, too short, and/or only cursorily engage with the course content. This grade signals that you need to improve next time. I hope not to assign these often.

Notice that this is essentially a pass/fail system. I’m not grading your writing ability and I’m not counting the number of words you write; I’m looking for thoughtful engagement. One or two sentences are not enough. Write at least a paragraph and show me that you did the assigned readings.

1. Get Organized

Download and edit this template when working through this assignment.

2. Read & Practice

Unzip the template folder (make sure you unzip it!), then open the .Rproj file to open RStudio. Open the hw2.Rmd file, take notes, and write some example code as you go through the following readings / exercises:

3. Reflect

Reflect on what you’ve learned while going through these readings and exercises. Is there anything that jumped out at you? Anything you found particularly interesting or confusing? Write at least a paragraph in your hw2.Rmd file. Here are some suggestions:

  • Discuss some of the key insights or things you found interesting in the readings or recent class periods.
  • Write about the messiest data you’ve seen.
  • Connect the course content to your own work or a project you’re working on.

4. Knit

Click the “knit” button to compile your hw2.Rmd file into an HTML web page. Then open the hw2.html file in a web browser and proofread your report. Does all of the formatting look correct?

5. Submit

To submit this assignment, create a zip file of all the files in your R project folder for this assignment. Name the zip file hw2-netID.zip, replacing netID with your netID (e.g., hw2-jph.zip). Then copy that zip file into the “submissions” folder in the Box folder created for you for this class (the Box folder is named with your GW netID).