Correlation

Due: Sep 27 by 11:59pm

Weight: This assignment is worth 1% of your final grade.

Purpose: The purpose of this assignment is to learn some methods to recognize and measure the correlation between two variables as well as how to find a “line of best fit” between them.

Assessment: This assignment is graded using a check system:

  • ✔+ (110%): Responses show phenomenal thought and engagement with the course content. I will not assign these often.
  • ✔ (100%): Responses are thoughtful, well-written, and show engagement with the course content. This is the expected level of performance.
  • ✔− (50%): Responses are hastily composed, too short, and/or only cursorily engage with the course content. This grade signals that you need to improve next time. I will hopefully not assign these often.

Notice that this is essentially a pass/fail system. I’m not grading your writing ability and I’m not counting the number of words you write; I’m looking for thoughtful engagement. One or two sentences is not enough: write at least a paragraph and show me that you did the assigned readings.

1. Get Organized

Download this template and edit it as you work through this assignment.

Unzip the template folder (make sure you actually unzip it!), then open the .Rproj file to launch RStudio. Open the hw4.Rmd file, then take notes and write some example code in it as you work through the following sections.

2. Readings / Videos

This week, we will learn some methods to recognize and measure the correlation between two variables as well as how to find a “line of best fit” between them. By the end, you will be able to:

  • Recognize when to use correlation.
  • Interpret the magnitude and direction of a correlation.
  • Explain the influence of outliers.
  • Describe the Pearson correlation and the Spearman correlation.
  • Explain the term “line of best fit”.
  • Interpret intercept and slope coefficients.
  • Define residuals.
  • Describe the standard error of the estimate.
  • Report the assumptions of linear regression and how to test them.
  • Explain standardized regression and its connection to correlation.
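Several of these objectives map onto a handful of base R functions (cor(), lm(), coef(), residuals(), and scale()). The block below is a minimal sketch you could adapt in your hw4.Rmd file, not the method from the assigned readings: the data are simulated, and the outlier point (30, -10) is made up purely for illustration.

```r
# Simulated example data (hypothetical; not from the course readings)
set.seed(42)
x <- rnorm(50, mean = 10, sd = 2)
y <- 3 + 0.5 * x + rnorm(50, sd = 1)

# Pearson correlation: strength and direction of a *linear* relationship
r_pearson <- cor(x, y, method = "pearson")

# Spearman correlation: Pearson correlation computed on the ranks,
# so it captures monotonic relationships and is less sensitive to outliers
r_spearman <- cor(x, y, method = "spearman")

# Add one made-up extreme outlier to see how each measure reacts
x_out <- c(x, 30)
y_out <- c(y, -10)
r_pearson_out  <- cor(x_out, y_out, method = "pearson")
r_spearman_out <- cor(x_out, y_out, method = "spearman")

# Line of best fit by ordinary least squares
fit <- lm(y ~ x)
b0 <- coef(fit)[["(Intercept)"]]  # predicted y when x = 0
b1 <- coef(fit)[["x"]]            # change in predicted y per 1-unit increase in x

# Residuals: observed y minus fitted y
res <- residuals(fit)

# Standard error of the estimate (R calls it the residual standard error):
# sqrt(sum of squared residuals / (n - 2))
see <- sqrt(sum(res^2) / (length(y) - 2))

# Standardized regression: z-score both variables; the slope then equals Pearson's r
fit_z <- lm(scale(y) ~ scale(x))
b1_z <- coef(fit_z)[[2]]

print(c(pearson = r_pearson, spearman = r_spearman,
        pearson_outlier = r_pearson_out, spearman_outlier = r_spearman_out))
print(c(intercept = b0, slope = b1, see = see, std_slope = b1_z))
```

Comparing the correlations with and without the outlier shows why Spearman is often preferred when a few extreme points are present: a single point can drag Pearson's r a long way, while it moves only one pair of ranks for Spearman. For checking the regression assumptions, plot(fit) draws R's standard diagnostic plots (residuals vs. fitted, normal Q-Q, scale-location, and residuals vs. leverage).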

Readings on visualizing correlation

Video on correlation measures

Optional Readings

The history of statistical correlation is rather ugly and rooted in racism and eugenics. These readings discuss this topic in more detail:

3. Reflect

Reflect on what you’ve learned while going through these readings and exercises. Is there anything that jumped out at you? Anything you found particularly interesting or confusing? Write at least a paragraph in your hw4.Rmd file. Here are some suggestions:

  • Discuss some of the key insights or things you found interesting in the readings or recent class periods.
  • Write about the messiest data you’ve seen.
  • Connect the course content to your own work or project you’re working on.

4. Knit

Click the “Knit” button to compile your hw4.Rmd file into an HTML web page. Then open the hw4.html file in a web browser and proofread your report. Does all of the formatting look correct?

5. Submit

To submit this assignment, create a zip file of all the files in your R project folder for this assignment. Name the zip file hw4-netID.zip, replacing netID with your netID (e.g., hw4-jph.zip). Then copy that zip file into the “submissions” folder in the Box folder created for you for this class.