For your final project, you will take a dataset, explore it, tinker
with it, and tell a nuanced story about it using at least three key
charts. I want this project to be as useful for you and your future
career as possible - you’ll hopefully want to show off your final
project in a portfolio or during job interviews. Accordingly, you get to
choose what data you will use for your project. Choose whichever dataset
you are most interested in or will have the most fun with.
Check out these examples of other exploratory data analyses for inspiration.
Overview
For the final project, you will:
- Write a research question.
- Conduct an exploratory data analysis of one or more data sources to
address your research question.
- Create visualizations to effectively communicate the results of your
analysis using the principles of information visualization covered in
this course.
- Document your entire analysis in a reproducible format, including a
detailed curation of the data you used as well as your analysis and
results.
Purpose: The skills covered in this course are
rooted in design rules and principles rather than formulas and
equations. As such, the application of these principles to a
real data problem is one of the best ways to learn and assess mastery of
these skills. I guarantee you one day you will need to apply these
principles to communicate an idea to colleagues, so let’s make sure you
have at least one chance to practice before the stakes
are higher.
Skills & Knowledge: Your final project should
illustrate your ability to transform raw data into insights by making
the non-visible visible, showing clear trends or patterns, and / or
identifying outliers or missing information. The specific skills
involved in achieving this goal include all of the course learning objectives.
Teams: You will complete your final project in a team
of 2 to 3 classmates of your choosing. Teams will be finalized
at the time of submitting your proposal. Graduate students taking the
course must work on individual projects.
Summary of Deliverables: To make the overall project
more manageable, it is broken down into separate “milestone”
deliverables due throughout the semester, starting with a proposal due
just before spring break.
Deliverable | Weight | Due
---|---|---
Proposal | 10% | March 12
Progress Report | 10% | April 16
Final Report | 15% | May 02
Presentation | 10% | May 04
Interview | 10% | Exam week
Proposal
Due: March 12 by 11:59pm.
Weight: This assignment is worth 10% of your final
grade.
Assessment: We will use this
rubric to grade your proposal.
Write a proposal for an exploratory data analysis that you plan to
conduct for your final project. The instructors will review and grade
your proposal and provide feedback upon returning from spring break. If
your proposal is approved, you are done and can move on towards the next
project task. In some cases, the instructors may ask you to submit a
revised proposal, most likely by focusing / adjusting the proposal scope
and / or research question. Below is a list of specific items your
proposal should include.
1. Get organized
Download and unzip this template for your proposal report. Open the
`project.Rproj` file and write your proposal in the `report.Rmd` file.
The template comes with some text and code explaining how to use it - you
should delete this text and code, as it is only for explanatory purposes.
Be sure to adjust the content in the YAML:
- Write your project title in the `title` field (and provide a subtitle
if you wish, or delete the `subtitle` field).
- In the `author` field, list the names of all teammates,
e.g. `author: Luke Skywalker, Leia Organa, and Han Solo`.
2. Write a research question
State a clear research question. Follow these
guidelines. Your question should be:
- Clear: it provides enough specifics that your
audience can easily understand its purpose without needing additional
explanation.
- Focused: it is narrow enough that it can be
addressed thoroughly with the data available and within the limits of
the final project report.
- Concise: it is expressed in the fewest possible
words.
- Complex: it is not answerable with a simple “yes”
or “no,” but rather requires synthesis and analysis of data.
- Arguable: its potential answers are open to debate
rather than accepted facts.
3. Discuss your data sources
Discuss the data source(s) you plan to use for your analysis. Follow
these guidelines:
- If you have already identified the source(s), describe them and
include URLs and / or references to the sources.
- If you have not identified a source yet, describe the data you hope
to use, and give at least one plausible source that may have the data
(regardless of whether the source makes it available or not).
- Discuss the validity of the source(s). For each data source, is the
data the original data, or has it been pre-processed by someone
else? How was the original data collected and by whom? If you do not yet
have a source, discuss what concerns you have about a plausible source
that might have the data.
4. Describe anticipated results
- Choose two variables that you expect to find in your data that are
relevant to your research question.
- Regardless of whether those variables actually exist, describe how
you would expect each to be distributed (e.g. unimodal,
multimodal, tightly grouped, widely spread, etc.). For example, you might
expect the price of gasoline over a particular period to be rather
tightly-grouped around a mean, whereas the stock price of a particular
company might vary much more widely over the same period.
- Describe two relationships you expect to find among variables in
your data (again, regardless of whether they actually exist in your
data). For example, you might expect sales of hybrid vehicles to
increase when gas prices increase; in this case, I am expecting that
hybrid vehicle sales and gasoline prices are positively correlated.
- Describe at least two charts that you expect will help you visualize
the relationships that you expect to find. For example, a scatterplot of
gasoline prices and hybrid vehicle sales over a particular time period
might be useful for visualizing the level of correlation between these
two variables.
- Discuss how your expectations about the variables you chose will
help inform you about your research question.
5. Knit and submit
Click the “knit” button to compile your `.Rmd` file into an HTML web
page, then create a zip file of everything in your R Project
folder. Go to the “Assignment Submission” page on Blackboard and submit
your zip file under “Final Project: Proposal”. Only one person
from your team should submit the report.
Progress Report
Due: April 16 by 11:59pm.
Weight: This assignment is worth 10% of your final
grade.
Assessment: We will use this
rubric to grade your progress report.
Write a report summarizing progress you have made towards your
project thus far, including summary statistics of your data and
preliminary charts. You should have already identified data source(s)
for your project and begun exploring the data. Your report should be
written in a narrative format (i.e. using coherent paragraphs rather than
a series of bullet points). You may use headings where appropriate to
break up your report into sections. Below is a list of specific items
your progress report should include.
1. Get organized
Download and unzip this template for your progress report. Open the
`project.Rproj` file and write your progress report in the `report.Rmd`
file. The template comes with some text and code explaining how to use
it - you should delete this text and code, as it is only for explanatory
purposes. Be sure to adjust the content in the YAML:
- Write your project title in the `title` field (and provide a subtitle
if you wish, or delete the `subtitle` field).
- In the `author` field, list the names of all teammates,
e.g. `author: Luke Skywalker, Leia Organa, and Han Solo`.
You should also put the data you are working with for your
project in the appropriate folders.
2. Write a research question
This can (and almost certainly should) be a revised research question
from your proposal.
Again, follow these
guidelines.
3. Discuss your data sources
Discuss the data source(s) you are using for your analysis. For each
data source:
Describe it and include URLs or references to the original
sources as well as links to any pre-processed or formatted data you are
using. These are not always the same. For example, in the Mini Project 1 assignment, the original
source was the Transit Costs
Project whereas the data file used was posted on this
GitHub repo.
Discuss the validity of your data:
- Is the data you are using from the original source, or has
it been pre-processed?
- How was the original data collected and by whom?
- Might some data be missing because it was never observed?
- Could the data be biased in some way?
Finally, provide a data dictionary for each data file you are
using in an appendix at the end of your report. The dictionary should
contain a table of each variable name along with a brief description of
the variable.
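A data dictionary can be written as a small data frame and rendered as a table when the report is knit. The sketch below is only an illustration: the variable names and descriptions are hypothetical placeholders, and `knitr::kable()` is one of several ways to render the table.

```r
# Hypothetical data dictionary: replace the variable names and
# descriptions with those from your own data file
dictionary <- data.frame(
  variable = c("city", "year", "price"),
  description = c(
    "City where the observation was recorded",
    "Year of the observation",
    "Price in US dollars"
  )
)

# Render the dictionary as a table in the knitted report
knitr::kable(dictionary, col.names = c("Variable", "Description"))
```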
4. Evaluate your proposal expectations
- Do you find support for your original proposal expectation about how
one variable might be distributed? Show it by looking at summaries of
the variable. If that variable doesn’t exist in your data, discuss
another key variable instead and how it is distributed.
- Do you find support for the expected relationships you wrote about
in your proposal? If those variables don’t exist in your data, discuss
one key relationship that you have identified between two or more
variables.
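As a rough sketch of what "looking at summaries" can mean in practice, the example below checks the distribution of one variable and the relationship between two. It uses the `mtcars` data set that ships with R purely as a stand-in for your own data; the choice of variables is an assumption made for illustration.

```r
# mtcars ships with R and stands in here for your own data

# Distribution of a single variable: five-number summary plus the mean
mpg_summary <- summary(mtcars$mpg)
mpg_summary

# Quick look at the shape of the distribution
hist(mtcars$mpg,
     main = "Distribution of fuel economy",
     xlab = "Miles per gallon")

# Relationship between two variables: Pearson correlation
mpg_wt_cor <- cor(mtcars$mpg, mtcars$wt)
mpg_wt_cor
```

A strongly negative or positive correlation here would support an expected relationship, while a value near zero would suggest revisiting it.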
5. Include two preliminary charts
- Your charts should either support or oppose your research question,
or they should illustrate what else you might need to address your
research question.
- You may choose whatever chart types you wish, but your choices
should highlight the point you want to make or should clearly show the
relationship you want to emphasize. Consider these resources when choosing your
charts.
- Your charts should follow the design principles we have covered in
class. They do not have to be fully “polished” yet, but at a minimum
they should be accurate (i.e. not misleading) and they should not
include distracting non-data ink.
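A preliminary chart along these lines might look like the sketch below. It assumes the `ggplot2` package is installed and again uses the built-in `mtcars` data as a stand-in for your own; the labels and theme are illustrative choices, not requirements.

```r
library(ggplot2)

# mtcars ships with R and stands in here for your own data
chart <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(
    x = "Weight (1,000 lbs)",
    y = "Fuel economy (miles per gallon)",
    title = "Heavier cars tend to have lower fuel economy"
  ) +
  theme_minimal()  # drops the gray panel background

chart
```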
6. Review your teammates
For students working in teams: On Blackboard under
the assignment titled “Team Member Review: Progress Report”, submit a
short description of the specific contributions of each team member in
your team. Here is an example review:
Student A identified the data source and wrote the documentation for it. Student B led the data cleaning process and did much of the initial data exploration. Student C helped write code for the main visualizations.
These reviews will be kept confidential and compared to ensure that
the workload and grading for team members are equitable. Team members
who do not make meaningful contributions to their projects will not
receive the same grade as their teammates. If you are
having any disputes among team members, please contact Professor
Helveston as soon as possible so we can find a resolution.
7. Knit and submit
Click the “knit” button to compile your `.Rmd` file into an HTML web
page, then create a zip file of everything in your R Project
folder. Go to the “Assignment Submission” page on Blackboard and submit
your zip file under “Final Project: Progress Report”. Only one
person from your team should submit the report.
Final Report
Due: May 02 by 11:59pm.
Weight: This assignment is worth 15% of your final
grade.
Assessment: We will use this
rubric to grade your final report.
Your final report should be a fully reproducible product and
available online as an HTML webpage. It should include text, data, code,
and plots. Your report should be written in a narrative format
(i.e. using coherent paragraphs rather than a series of bullet points).
You may use headings where appropriate to break up your report into
sections. There is no length requirement - your report should be
sufficiently detailed to address the requirements listed below and
sufficiently concise such that it is expressed in the fewest necessary
words. It is also perfectly acceptable for large portions of your final
report to be identical to sections in your progress report, albeit
perhaps more “polished.” Below is a list of specific items your report
should include.
1. Get organized
Download and unzip this template for your final report. Open the
`project.Rproj` file and write your final report in the `report.Rmd`
file. The template comes with some text and code explaining how to use
it - you should delete the template code and text, as it is only for
explanatory purposes. Be sure to adjust the content in the YAML:
- Write your project title in the `title` field (and provide a subtitle
if you wish, or delete the `subtitle` field).
- In the `author` field, list the names of all teammates,
e.g. `author: Luke Skywalker, Leia Organa, and Han Solo`.
Put the data you are working with for your project in the
appropriate folders. Even if you are reading in data directly from a
website (e.g. `data <- read_csv("www.somewhere.com/data.csv")`), be
sure to download a local copy of the data and put it in the appropriate
data folder.
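One way to keep a local copy is sketched below. The URL, the `data` folder name, and the helper `get_local_copy()` are all hypothetical and only illustrate the idea: the file is downloaded once, and the report then reads from the local file.

```r
# Hypothetical URL and file path: replace with your own
data_url  <- "https://www.somewhere.com/data.csv"
data_path <- file.path("data", "data.csv")

# Download the file only if a local copy does not exist yet
get_local_copy <- function(url, path) {
  if (!file.exists(path)) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile = path, mode = "wb")
  }
  path
}

# In your report, read from the local copy rather than the website:
# data <- readr::read_csv(get_local_copy(data_url, data_path))
```

Reading from the local file means the report can still be knit if the website changes or goes offline.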
At the bottom of your report, add an Appendix section that
includes data dictionaries for the key data sets used in your analysis
as well as a copy of all of the code used in your report, which can be
automatically generated with a single chunk like this:
```{r ref.label=knitr::all_labels(), echo=TRUE, eval=FALSE}
```
2. State your research question
Clearly state your research question and motivate why it is important
/ why the reader should care about it. You should follow these
guidelines.
3. Discuss your data sources
Discuss the data source(s) you are using for your analysis. For each
data source:
Describe it and include URLs or references to the original
sources as well as links to any pre-processed or formatted data you are
using. These are not always the same. For example, in the Mini Project 1 assignment, the original
source was the Transit Costs
Project whereas the data file used was posted on this
GitHub repo.
Discuss the validity of your data:
- Is the data you are using from the original source, or has
it been pre-processed?
- How was the original data collected and by whom?
- Might some data be missing because it was never observed?
- Could the data be biased in some way?
Finally, provide a data dictionary for each data file you are
using in an appendix at the end of your report. The dictionary should
contain a table of each variable name along with a brief description of
the variable.
4. Describe your results with text and charts
- Display polished charts that either support or oppose your research
question or illustrate what else you might need to address your research
question.
- Your charts should be “polished” with appropriate labels and follow
the design principles we have covered in class.
- Your plot type choices should highlight the main point(s) you want
to make or clearly show the relationship you want to emphasize.
- Around your charts, write narrative text explaining what the chart
shows and its significance towards addressing your research question.
This should read as a continuous story rather than as a reply to each of
the requirements described here.
- Your analysis must include at least three polished charts /
animations, though you can of course include more.
Alternatively, you can build an interactive shiny app with at least two
dynamic charts. If you build a shiny app, include a link to your app and
screenshots of key charts from your app in your final report.
5. Review your teammates
For students working in teams: On Blackboard under
the assignment titled “Team Member Review: Final Report”, each
team member should submit their own short description of the
specific contributions of each team member in your team. These reviews
will be kept confidential and compared to ensure that the workload and
grading for team members are equitable. Team members who do not make
meaningful contributions to their projects will not receive the same
grade as their teammates.
6. Knit, publish, and submit
- Click the “knit” button to compile your `.Rmd` file into an HTML web page.
- Spell check your report by running `spelling::spell_check_files("report.Rmd")`.
- Proofread your HTML webpage!
- Publish your compiled page online via RPubs (follow these
instructions).
- Create a zip file of everything in your R Project folder. Go to the
“Assignment Submission” page on Blackboard and submit your zip file
under “Final Project: Final Report”. Include a link to the published
report page in your Blackboard submission. Only one person from
your team should submit the report.
Presentation
Due: May 04 by 11:59pm.
Weight: This assignment is worth 10% of your final
grade.
Assessment: We will use this rubric to grade your final presentation.
Create a video recording of a slideshow presentation of the findings
in your final report and publish it on the web (e.g. via YouTube, Vimeo,
etc.). We will watch all the videos together during the period scheduled
for the exam (May 05, 2021). Below is a list of specific items your
presentation should include.
2. Recording strategies
Given that it may be difficult or impossible to be in the same room
with your teammates, I recommend using Zoom to record your presentation.
Have someone share their screen showing the slides and talk over the
slides while recording the call.
Another option is to use a screen recording program, such as
QuickTime (Mac only), to record someone talking
over the slides, though this may be challenging if not all teammates can
be in the same room.
3. Content
Your slides do not need to include the detailed code used to conduct
your analysis - that should be accessible from your report. Rather, the
purpose of your presentation is to present the “big picture” overview of
your project. You should discuss:
- What you studied and why it matters (i.e. your research
question).
- What data you used (describe your sources).
- What you found (show polished charts, discuss key results).
4. Publish and submit
Publish your video recording on the web (e.g. via YouTube, Vimeo,
etc.). Only one team member needs to publish the video.
Once your video is posted, go to the “Assignment Submission” page on
Blackboard and submit your slides as a pdf or pptx file under “Final
Project: Presentation”. Include a link to the published video recording
in your Blackboard submission. Only one person from your team
should submit the presentation files.
Final Interview
Due: Students will sign up for individual meetings
during finals week (May 03 - 07).
Weight: This assignment is worth 10% of your final
grade.
Assessment: We will use this rubric to
assess your interview.
The instructor will hold a 10-minute one-on-one interview with each
student. The interview will consist of questions about your final
project as well as questions about core concepts covered during the
course. The full list of questions can be seen here.
Some suggestions from Prof. Mazzuchi on interviews:
- Bring water to drink - it will calm you when you are nervous and
your mouth dries up. You can also pause and think while you drink.
- Don’t answer right away - think a bit.
- Answer the question asked. Don’t try to impress or I will ask more
questions.
- Don’t say “I don’t know.” Try and I will help you.
Page sources:
Some of the content on this page is inspired by and / or modified
from other sources, including: