Exploring Data
Due: 2023-09-19 by 11:59pm
Weight: This assignment is worth 1% of your final grade.
Purpose: The purpose of this assignment is to develop some basic strategies for exploring data sets to gain a greater understanding of the variable types as well as key properties such as centrality, variability, and correlation.
Assessment: This assignment is graded using a check system:
- ✔+ (110%): Responses show phenomenal thought and engagement with the course content. I will not assign these often.
- ✔ (100%): Responses are thoughtful, well-written, and show engagement with the course content. This is the expected level of performance.
- ✔− (50%): Responses are hastily composed, too short, and/or only cursorily engage with the course content. This grade signals that you need to improve next time. I will hopefully not assign these often.
Notice that this is essentially a pass/fail system. I’m not grading your writing ability and I’m not counting the number of words you write - I’m looking for thoughtful engagement. One or two sentences is not enough. Write at least a paragraph and show me that you did the readings assigned.
1. Get Organized
Follow these instructions:
- Download and edit this template.
- Unzip the template folder. Make sure you actually unzip it! (in Windows, right-click it and use “extract all”)
- Open the .Rproj file to open RStudio.
- Inside RStudio, open the hw3.qmd file, take notes, and write some example code as you go through the readings / exercises below.
2. Read & Practice
- R4DS - 7: Exploratory data analysis: Provides some more “hands on” strategies for exploring data using R and the ggplot2 library.
- Wilke - 12: Visualizing associations: An overview of ways to visualize associations between two or more variables.
- “Beware Spurious Correlations”, Harvard Business Review
- Watch this 20-minute video to learn about some basic data types and descriptive statistics.
- Watch this 15-minute video to learn about the basics of correlation measures.
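As a starting point for the example code in your notes, here is a minimal sketch of the ideas covered above: checking variable types, then summarizing centrality, variability, and correlation. It uses only base R and the built-in mtcars data set, which is an assumption made for illustration and not a data set assigned in this course.

```r
# mtcars ships with base R; it is used here only as an illustrative data set.
data(mtcars)

# Variable types: str() lists each column along with its storage type.
str(mtcars)

# Centrality: mean and median of fuel economy (miles per gallon).
mpg_mean   <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)

# Variability: standard deviation and interquartile range.
mpg_sd  <- sd(mtcars$mpg)
mpg_iqr <- IQR(mtcars$mpg)

# Correlation between car weight and fuel economy:
# Pearson measures linear association, Spearman is rank-based.
r_pearson  <- cor(mtcars$wt, mtcars$mpg, method = "pearson")
r_spearman <- cor(mtcars$wt, mtcars$mpg, method = "spearman")

print(c(mean = mpg_mean, median = mpg_median, sd = mpg_sd, iqr = mpg_iqr))
print(c(pearson = r_pearson, spearman = r_spearman))
```

Comparing the mean with the median, and the Pearson value with the Spearman value, is a quick way to notice skew or outliers before you start plotting.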
Choosing the right chart
You will want to choose different chart types depending on the relationship or message you want to convey. Fortunately, there are lots of great guides to help you make those choices.
- View a quick overview of many common plot types in Wilke - 5: Directory of visualizations.
- View the course site help page on choosing the right chart.
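To make the idea of matching a chart to a relationship concrete, here is a small sketch using base R graphics and the built-in mtcars data set. Both choices are assumptions made to keep the example self-contained; the readings above use ggplot2, and the same chart types are available there.

```r
# mtcars ships with base R; it is used here only as an illustrative data set.
data(mtcars)

# One continuous variable: a histogram shows the distribution of mpg.
h <- hist(mtcars$mpg, main = "Fuel economy", xlab = "Miles per gallon")

# Continuous by categorical: boxplots compare mpg across cylinder counts.
boxplot(mpg ~ factor(cyl), data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# Two continuous variables: a scatterplot shows how mpg varies with weight.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```

Each plot answers a different question about the same data, which is why it helps to decide on the message first and the chart type second.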
Optional readings
The history of statistical correlation is rather ugly and rooted in racism and eugenics. These readings discuss this topic in more detail:
3. Exercises
Complete the following RStudio Primer lesson: Exploratory Data Analysis
4. Reflect
Reflect on what you’ve learned while going through these readings and exercises. Is there anything that jumped out at you? Anything you found particularly interesting or confusing? Write at least a paragraph in your hw3.qmd file, and include at least one question. The teaching team will review the questions we get and will try to answer them either in Slack or in class.
Some prompts you may want to try in your reflection:
- “I used to think ______, now I think ______ 🤔”
- Discuss some of the key insights or things you found interesting in the readings or recent class periods.
- Connect the course content to your own work or a project you’re working on.
5. Submit
To submit your assignment, follow these instructions:
- Render your .qmd file by either clicking the “Render” button in RStudio or running the command quarto::quarto_render("hw3.qmd").
- Open the rendered HTML file and make sure it looks good! Is all the formatting as you expected?
- Create a zip file of all the files in your R project folder for this assignment and submit it on the corresponding assignment submission on Blackboard.