Due: 26 January, 11:59 pm

Weight: This assignment is worth 0.5% of your final grade.

Purpose: The purpose of this assignment is to develop some basic strategies for exploring data sets to gain a greater understanding of the variable types, their centrality, and their variability.

Assessment: This assignment is graded using a check system:

  • ✔+ (110%): Reflection shows phenomenal thought and engagement with the course content. I will not assign these often.
  • ✔ (100%): Reflection is thoughtful, well-written, and shows engagement with the course content. This is the expected level of performance.
  • ✔− (50%): Reflection is hastily composed, too short, and/or only cursorily engages with the course content. This grade signals that you need to improve next time. I will hopefully not assign these often.

Notice that this is essentially a pass/fail or completion-based system. I’m not grading your writing ability, I’m not counting the exact number of words you write, and I’m not looking for encyclopedic citations of every single reading to prove that you did indeed read everything. I’m looking for thoughtful engagement, that’s all. Do good work and you’ll get a ✔.

Tasks

  1. Register: If you haven’t already, register for DataCamp. You must use your @gwu.edu email for this to work (not the @email.gwu.edu address). This will give you access to lots of extra practice opportunities. None of these will be mandatory.

  2. Read: Open up a notebook (physical, digital…whatever you take notes in best), and take notes while you go through the readings below.

  3. Optional Exercises: You don’t have to do these, but they can be really helpful, especially if you are coming to this course without having taken EMSE 4574 Programming for Analytics:

  4. Reflection: When you have completed all of the readings, download and edit this template to write a ~150 word reflection on what you’ve read (be sure to edit the YAML at the top). That’s fairly short - there are ~250 words on a typical double-spaced page in Microsoft Word (500 when single-spaced). You can do a lot of different things with this memo, for example:

    • Discuss something you learned from the course content
    • Write about the best or worst data visualization you saw recently
    • Connect the course content to your own work
    • Discuss some of the key insights or things you found interesting in the readings

  5. Submit Everything: Knit your document to an HTML page, then create a zip file of everything in your R Project folder. Go to the “Assignment Submission” page on Blackboard and submit your zip file.


Readings

1) Get Familiar with the Course

Follow Snoop’s advice and read the entire Course Syllabus (actually read the whole thing). Then review the schedule and make sure to note important upcoming deadlines, quizzes, etc.

2) Video on Data types & descriptive statistics

This week, we will learn some strategies for exploring data sets to gain a greater understanding of the variable types and their relationships. To get started, open up a notebook (physical, digital…whatever you take notes in best), and take notes while you watch this 20-minute video to learn about some basic data types and descriptive statistics:

3) Chapters on EDA

Read through the following chapters and take notes.

4) Choosing the right chart

You will want to choose different chart types depending on the relationship or message you want to convey. Fortunately, there are lots of great guides to help you make those choices. View them here.


Page sources: This assignment is inspired by Andrew Heiss’s course on Data Visualization.


EMSE 4575: Exploratory Data Analysis (Spring 2021)
Wednesdays | 12:45 - 3:15 PM | Dr. John Paul Helveston | jph@gwu.edu
LICENSE: CC-BY-SA