Due: 02 February, 11:59 pm

Weight: This assignment is worth 0.5% of your final grade.

Purpose: The purpose of this assignment is to learn some methods to recognize and measure the correlation between two variables as well as how to find a “line of best fit” between them.

Assessment: This assignment is graded using a check system:

  • ✔+ (110%): Reflection shows phenomenal thought and engagement with the course content. I will not assign these often.
  • ✔ (100%): Reflection is thoughtful, well-written, and shows engagement with the course content. This is the expected level of performance.
  • ✔− (50%): Reflection is hastily composed, too short, and/or only cursorily engages with the course content. This grade signals that you need to improve next time. I will hopefully not assign these often.

Notice that this is essentially a pass/fail or completion-based system. I’m not grading your writing ability, I’m not counting the exact number of words you write, and I’m not looking for encyclopedic citations of every single reading to prove that you did indeed read everything. I’m looking for thoughtful engagement; that’s all. Do good work and you’ll get a ✔.

Tasks

  1. Read: Open up a notebook (physical, digital…whatever you take notes in best), and take notes while you go through the readings below.

  2. Optional Exercises: You don’t have to do these, but they can be really helpful for extra practice. This week, take a look at the DataCamp course “Correlation and Regression in R”.

  3. Reflection: When you have completed all of the readings, download and edit this template to write a ~150-word reflection on what you’ve read (be sure to edit the YAML at the top). That’s fairly short: there are ~250 words on a typical double-spaced page in Microsoft Word (500 when single-spaced). You can do a lot of different things with this reflection, for example:

    • Discuss something you learned from the course content
    • Write about the best or worst data visualization you saw recently
    • Connect the course content to your own work
    • Discuss some of the key insights or things you found interesting in the readings
  4. Submit Everything: Knit your document to an HTML page, then create a zip file of everything in your R Project folder. Go to the “Assignment Submission” page on Blackboard and submit your zip file.


Readings

This week, we will learn some methods to recognize and measure the correlation between two variables as well as how to find a “line of best fit” between them. By the end, you will be able to:

  • Recognize when to use correlation.
  • Interpret the magnitude and direction of a correlation.
  • Explain the influence of outliers.
  • Describe the Pearson correlation and the Spearman correlation.
  • Explain the term “line of best fit”.
  • Interpret intercept and slope coefficients.
  • Define residuals.
  • Describe the standard error of the estimate.
  • Report the assumptions of linear regression and how to test them.
  • Explain standardized regression and its connection to correlation.
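To make several of these terms concrete, here is a minimal sketch. The course itself works in R; this version uses only Python’s standard library so that it is self-contained. It computes the Pearson correlation, the Spearman correlation (the Pearson correlation of the ranks), a least-squares line of best fit, its residuals, and the standard error of the estimate, sqrt(SSE / (n − 2)). It then shows that the slope of a standardized regression (both variables converted to z-scores) equals Pearson’s r. The toy data and every function name here (`pearson`, `spearman`, `fit_line`, and so on) are hypothetical illustrations, not code from the readings.

```python
import math
import statistics


def pearson(x, y):
    """Pearson r: sum of cross-deviations over sqrt of the product of squared deviations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho: the Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def fit_line(x, y):
    """Least-squares line of best fit, y_hat = intercept + slope * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Residual = observed y minus the value predicted by the line."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def standard_error_of_estimate(x, y, intercept, slope):
    """sqrt(SSE / (n - 2)) for a simple (one-predictor) regression."""
    sse = sum(e ** 2 for e in residuals(x, y, intercept, slope))
    return math.sqrt(sse / (len(x) - 2))


def zscores(values):
    """Standardize: subtract the mean, divide by the sample standard deviation."""
    m, s = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / s for v in values]


x = [1, 2, 3, 4, 5]  # hypothetical toy data
y = [2, 4, 5, 4, 5]

r = pearson(x, y)
rho = spearman(x, y)
b0, b1 = fit_line(x, y)
se = standard_error_of_estimate(x, y, b0, b1)
_, std_slope = fit_line(zscores(x), zscores(y))  # standardized regression

print(f"r={r:.3f} rho={rho:.3f} intercept={b0:.2f} slope={b1:.2f} "
      f"SE={se:.3f} standardized slope={std_slope:.3f}")
# prints: r=0.775 rho=0.738 intercept=2.20 slope=0.60 SE=0.894 standardized slope=0.775
```

Note that the standardized slope matches r exactly. Once both variables are measured in standard-deviation units, the line of best fit has an intercept of 0 and a slope equal to the correlation.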

1) Readings on visualizing correlation

Optional Readings

Correlation, as innocent as it may seem, has a long and ugly history of racism that should never be forgotten. These readings discuss this topic in more detail:

2) Video on correlation


EMSE 4575: Exploratory Data Analysis (Spring 2021)
Wednesdays | 12:45 - 3:15 PM | Dr. John Paul Helveston | jph@gwu.edu
LICENSE: CC-BY-SA