For your final project, you will take a dataset, explore it, tinker with it, and tell a nuanced story about it using at least three key charts. I want this project to be as useful for you and your future career as possible - you’ll hopefully want to show off your final project in a portfolio or during job interviews. Accordingly, you get to choose what data you will use for your project. Choose whatever one you are most interested in or will have the most fun with.

Check out these examples of other exploratory data analyses for inspiration.

Overview

For the final project, you will:

  • Write a research question.
  • Conduct an exploratory data analysis of one or more data sources to address your research question.
  • Create visualizations to effectively communicate the results of your analysis using the principles of information visualization covered in this course.
  • Document your entire analysis in a reproducible format, including a detailed curation of the data you used as well as your analysis and results.

Purpose: The skills covered in this course are rooted in design rules and principles rather than formulas and equations. As such, the application of these principles to a real data problem is one of the best ways to learn and assess mastery of these skills. I guarantee you one day you will need to apply these principles to communicate an idea to colleagues, so let’s make sure you have at least one chance to practice before the stakes are higher.

Skills & Knowledge: Your final project should illustrate your ability to transform raw data into insights by making the non-visible visible, showing clear trends or patterns, and / or identifying outliers or missing information. The specific skills involved in achieving this goal include all of the course learning objectives.

Teams: Your final project will be completed by a team of 2 to 3 classmates of your choosing. Teams will be finalized at the time of submitting your proposal. Graduate students taking the course must work on individual projects.

Summary of Deliverables: To make the overall project more manageable, it is broken down into separate “milestone” deliverables due throughout the semester, starting with a proposal due just before spring break.

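| Item            | Weight towards final grade | Due date (by 11:59pm) |
|-----------------|----------------------------|-----------------------|
| Proposal        | 10%                        | March 12              |
| Progress Report | 10%                        | April 16              |
| Final Report    | 15%                        | May 02                |
| Presentation    | 10%                        | May 04                |
| Interview       | 10%                        | Exam week             |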

Proposal

Due: March 12 by 11:59pm.

Weight: This assignment is worth 10% of your final grade.

Assessment: We will use this rubric to grade your proposal.

Write a proposal of an exploratory data analysis that you plan to conduct for your final project. The instructors will review and grade your proposal and provide feedback upon returning from spring break. If your proposal is approved, you are done and can move on towards the next project task. In some cases, the instructors may ask you to submit a revised proposal, most likely by focusing / adjusting the proposal scope and / or research question. Below is a list of specific items your proposal should include.

1. Get organized

Download and unzip this template for your proposal report. Open the project.Rproj file and write your proposal in the report.Rmd file. The template comes with some text and code explaining how to use it - you should delete the template code and text as it is only for explanatory purposes. Be sure to adjust the content in the YAML:

  • Write your project title in the title field (and provide a subtitle if you wish, or delete the subtitle field).
  • In the author field, list the names of all teammates, e.g. author: Luke Skywalker, Leia Organa, and Han Solo.

2. Write a research question

State a clear research question. Follow these guidelines. Your question should be:

  • Clear: it provides enough specifics that your audience can easily understand its purpose without needing additional explanation.
  • Focused: it is narrow enough that it can be addressed thoroughly with the data available and within the limits of the final project report.
  • Concise: it is expressed in the fewest possible words.
  • Complex: it is not answerable with a simple “yes” or “no,” but rather requires synthesis and analysis of data.
  • Arguable: its potential answers are open to debate rather than accepted facts.

3. Discuss your data sources

Discuss the data source(s) you plan to use for your analysis. Follow these guidelines:

  • If you have already identified the source(s), describe them and include urls and / or references to the sources.
  • If you have not identified a source yet, describe the data you hope to use, and give at least one plausible source that may have the data (regardless of whether the source makes it available or not).
  • Discuss the validity of the source(s). For each data source, is the data the original data, or has it been pre-processed by someone else? How was the original data collected and by whom? If you do not yet have a source, discuss what concerns you have about a plausible source that might have the data.

4. Describe anticipated results

  • Choose two variables that you expect to find in your data that are relevant to your research question.
  • Regardless of whether those variables actually exist, describe how you would expect each to be distributed (e.g. unimodal, multimodal, tightly grouped, widely spread, etc.). For example, you might expect the price of gasoline over a particular period to be rather tightly grouped around a mean, whereas the stock price of a particular company might vary much more widely over the same period.
  • Describe two relationships you expect to find among variables in your data (again, regardless of whether they actually exist in your data). For example, you might expect sales of hybrid vehicles to increase when gas prices increase; in this case, I am expecting that hybrid vehicle sales and gasoline prices are positively correlated.
  • Describe at least two charts that you expect will help you visualize the relationships that you expect to find. For example, a scatterplot of gasoline prices and hybrid vehicle sales over a particular time period might be useful for visualizing the level of correlation between these two variables.
  • Discuss how your expectations about the variables you chose will help inform you about your research question.
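
To make the kinds of expectations described above more concrete, here is a minimal sketch in base R using simulated data. Everything in it is an assumption made for illustration: the variable names (`gas_price`, `stock_price`, `hybrid_sales`), the means and standard deviations, and the linear relationship are all invented and do not come from any real data source.

```r
# Simulated (made-up) data purely for illustration; not real prices or sales.
set.seed(42)
n <- 200

# A tightly grouped variable vs. a widely spread one
gas_price   <- rnorm(n, mean = 3, sd = 0.1)    # hypothetical $ per gallon
stock_price <- rnorm(n, mean = 100, sd = 25)   # hypothetical $ per share

# Compare spread relative to the mean (coefficient of variation)
cv <- function(x) sd(x) / mean(x)
cv_gas   <- cv(gas_price)
cv_stock <- cv(stock_price)

# An assumed positive relationship: sales rise with gas prices, plus noise
hybrid_sales <- 1000 + 500 * (gas_price - 3) + rnorm(n, sd = 20)
r <- cor(gas_price, hybrid_sales)

cat("CV gas:", round(cv_gas, 3), " CV stock:", round(cv_stock, 3), "\n")
cat("Correlation (gas price vs. hybrid sales):", round(r, 2), "\n")
```

In a proposal you would only describe these expectations in words; a simulation like this is just one way to check that you understand what "tightly grouped" or "positively correlated" would look like in numbers.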

5. Knit and submit

Click the “knit” button to compile your .Rmd file into an HTML web page, then create a zip file of everything in your R Project folder. Go to the “Assignment Submission” page on Blackboard and submit your zip file under “Final Project: Proposal”. Only one person from your team should submit the report.


Progress Report

Due: April 16 by 11:59pm

Weight: This assignment is worth 10% of your final grade.

Assessment: We will use this rubric to grade your progress report.

Write a report summarizing progress you have made towards your project thus far, including summary statistics of your data and preliminary charts. You should have already identified data source(s) for your project and begun exploring the data. Your report should be written in a narrative format (i.e. using coherent paragraphs rather than a series of bullet points). You may use headings where appropriate to break up your report into sections. Below is a list of specific items your progress report should include.

1. Get organized

Download and unzip this template for your progress report. Open the project.Rproj file and write your progress report in the report.Rmd file. The template comes with some text and code explaining how to use it - you should delete the template code and text as it is only for explanatory purposes. Be sure to adjust the content in the YAML:

  • Write your project title in the title field (and provide a subtitle if you wish, or delete the subtitle field).
  • In the author field, list the names of all teammates, e.g. author: Luke Skywalker, Leia Organa, and Han Solo.

You should also put the data you are working with for your project in the appropriate folders.

2. Write a research question

This can (and almost certainly should) be a revised research question from your proposal.

Again, follow these guidelines.

3. Discuss your data sources

Discuss the data source(s) you are using for your analysis. For each data source:

  • Describe it and include urls or references to the original sources as well as links to any pre-processed or formatted data you are using. These are not always the same. For example, in the Mini Project 1 assignment, the original source was the Transit Costs Project whereas the data file used was posted on this GitHub repo.

  • Discuss the validity of your data:

    • Is the data you are using from the original source, or has it been pre-processed?
    • How was the original data collected and by whom?
    • Are there perhaps any missing data that might not have been observed?
    • Could the data be biased in some way?
  • Finally, provide a data dictionary for each data file you are using in an appendix at the end of your report. The dictionary should contain a table of each variable name along with a brief description of the variable.
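
As a rough starting point for a data dictionary, the sketch below builds a table with one row per variable in base R. The helper name `make_data_dictionary` is hypothetical, and the built-in `mtcars` dataset is used only as a stand-in for your own data; the descriptions still have to be written by hand.

```r
# Build a starter data dictionary from a data frame.
# `mtcars` is a built-in R dataset used here only as a stand-in for your data.
make_data_dictionary <- function(df) {
  data.frame(
    variable    = names(df),
    type        = vapply(df, function(x) class(x)[1], character(1)),
    n_missing   = vapply(df, function(x) sum(is.na(x)), integer(1)),
    description = "",  # fill in by hand: meaning, units, coding
    row.names   = NULL,
    stringsAsFactors = FALSE
  )
}

dictionary <- make_data_dictionary(mtcars)

# Descriptions are added manually, one variable at a time
dictionary$description[dictionary$variable == "mpg"] <-
  "Fuel economy in miles per (US) gallon"

print(head(dictionary))
```

In an R Markdown report, a table like this could be rendered with `knitr::kable(dictionary)`.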

4. Evaluate your proposal expectations

  • Do you find support for your original proposal expectation about how one variable might be distributed? Show it by looking at summaries of the variable. If that variable doesn’t exist in your data, discuss another key variable instead and how it is distributed.
  • Do you find support for the expected relationships you wrote about in your proposal? If those variables don’t exist in your data, discuss one key relationship that you have identified between two or more variables.
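
One possible way to check expectations like these is with a few base R summaries. In the sketch below, the built-in `mtcars` dataset and the variables `mpg` and `wt` are stand-ins chosen only for illustration; substitute your own data frame and variables.

```r
# `mtcars` is a built-in dataset used only as a stand-in for your project data.
df <- mtcars

# Distribution of one variable: center, spread, and missing values
mpg_summary <- summary(df$mpg)
mpg_sd      <- sd(df$mpg, na.rm = TRUE)
mpg_missing <- sum(is.na(df$mpg))

# Relationship between two variables; "complete.obs" drops rows with NAs
r_wt_mpg <- cor(df$wt, df$mpg, use = "complete.obs")

print(mpg_summary)
cat("SD:", round(mpg_sd, 2), " Missing:", mpg_missing, "\n")
cat("Correlation (weight vs. mpg):", round(r_wt_mpg, 2), "\n")
```

The numbers alone are not the discussion: in your report, say whether they match what you wrote in your proposal and what that means for your research question.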

5. Include two preliminary charts

  • Your charts should either support or oppose your research question, or they should illustrate what else you might need to address your research question.
  • You may choose whatever chart types you wish, but your choices should highlight the point you want to make or should clearly show the relationship you want to emphasize. Consider these resources when choosing your charts.
  • Your charts should follow the design principles we have covered in class. They do not have to be fully “polished” yet, but at a minimum they should be accurate (i.e. not misleading) and they should not include distracting non-data ink.

6. Review your teammates

For students working in teams: On Blackboard under the assignment titled “Team Member Review: Progress Report”, submit a short description of the specific contributions of each team member in your team. Here is an example review:

Student A identified the data source and wrote the documentation for it. Student B led the data cleaning process and did much of the initial data exploration. Student C helped write code for the main visualizations.

These reviews will be kept confidential and compared to assess that the workload and grading for team members are equitable. Team members who do not make meaningful contributions to their projects will not receive the same grade as that of their teammates. If you are having any disputes among team members, please contact Professor Helveston as soon as possible so we can find a resolution.

7. Knit and submit

Click the “knit” button to compile your .Rmd file into an HTML web page, then create a zip file of everything in your R Project folder. Go to the “Assignment Submission” page on Blackboard and submit your zip file under “Final Project: Progress Report”. Only one person from your team should submit the report.


Final Report

Due: May 02 by 11:59pm.

Weight: This assignment is worth 15% of your final grade.

Assessment: We will use this rubric to grade your final report.

Your final report should be a fully reproducible product and available online as an HTML webpage. It should include text, data, code, and plots. Your report should be written in a narrative format (i.e. using coherent paragraphs rather than a series of bullet points). You may use headings where appropriate to break up your report into sections. There is no length requirement - your report should be detailed enough to address the requirements listed below, yet concise enough that it is expressed in the fewest necessary words. It is also perfectly acceptable for large portions of your final report to be identical to sections in your progress report, albeit perhaps more “polished.” Below is a list of specific items your report should include.

1. Get organized

  1. Download and unzip this template for your final report. Open the project.Rproj file and write your final report in the report.Rmd file. The template comes with some text and code explaining how to use it - you should delete the template code and text as it is only for explanatory purposes. Be sure to adjust the content in the YAML:

    • Write your project title in the title field (and provide a subtitle if you wish, or delete the subtitle field).
    • In the author field, list the names of all teammates, e.g. author: Luke Skywalker, Leia Organa, and Han Solo.
  2. Put the data you are working with for your project in the appropriate folders. Even if you are reading in data directly from a website (e.g. data <- read_csv("www.somewhere.com/data.csv")), be sure to download a local copy of the data and put it in the appropriate data folder.

  3. At the bottom of your report, add an Appendix section that includes data dictionaries for the key data sets used in your analysis as well as a copy of all of the code used in your report, which can be automatically generated with a single chunk like this:

```{r ref.label=knitr::all_labels(), echo=TRUE, eval=FALSE}
```
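
For the step above about keeping a local copy of data that you read from a website, the following is a minimal sketch in base R. The helper name `read_with_local_copy`, the URL, and the `data_raw` folder name are all placeholders invented for illustration; use whatever data folder your project template provides.

```r
# Read a dataset from a local copy, downloading it first only if needed.
# The URL and file path used below are placeholders, not real locations.
read_with_local_copy <- function(url, path) {
  if (!file.exists(path)) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile = path, mode = "wb")
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Demonstration without network access: create the "local copy" by hand
# in a temporary folder, so the download step is skipped.
demo_path <- file.path(tempdir(), "data_raw", "data.csv")
dir.create(dirname(demo_path), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), demo_path,
          row.names = FALSE)

data <- read_with_local_copy("https://www.somewhere.com/data.csv", demo_path)
print(data)
```

Because the file is only downloaded when it is missing, the report keeps working from the saved copy even if the website later changes or goes offline. The same pattern should work with `readr::read_csv()` in place of `read.csv()`.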

2. State your research question

Clearly state your research question and motivate why it is important / why the reader should care about it. You should follow these guidelines.

3. Discuss your data sources

Discuss the data source(s) you are using for your analysis. For each data source:

  • Describe it and include urls or references to the original sources as well as links to any pre-processed or formatted data you are using. These are not always the same. For example, in the Mini Project 1 assignment, the original source was the Transit Costs Project whereas the data file used was posted on this GitHub repo.

  • Discuss the validity of your data:

    • Is the data you are using from the original source, or has it been pre-processed?
    • How was the original data collected and by whom?
    • Are there perhaps any missing data that might not have been observed?
    • Could the data be biased in some way?
  • Finally, provide a data dictionary for each data file you are using in an appendix at the end of your report. The dictionary should contain a table of each variable name along with a brief description of the variable.

4. Describe your results with text and charts

  • Display polished charts that either support or oppose your research question or illustrate what else you might need to address your research question.
  • Your charts should be “polished” with appropriate labels and follow the design principles we have covered in class.
  • Your plot type choices should highlight the main point(s) you want to make or clearly show the relationship you want to emphasize.
  • Around your charts, write narrative text explaining what the chart shows and its significance towards addressing your research question. This should read as a continuous story rather than as a reply to each of the requirements described here.
  • Your analysis must include at least three polished charts / animations, though you can of course include more. Alternatively, you can build an interactive shiny app with at least two dynamic charts. If you build a shiny app, include a link to your app and screenshots of key charts from your app in your final report.

5. Review your teammates

For students working in teams: On Blackboard under the assignment titled “Team Member Review: Final Report”, each team member should submit their own short description of the specific contributions of each team member in your team. These reviews will be kept confidential and compared to assess that the workload and grading for team members are equitable. Team members who do not make meaningful contributions to their projects will not receive the same grade as that of their teammates.

6. Knit, publish, and submit

  1. Click the “knit” button to compile your .Rmd file into an HTML web page.
  2. Spell check your report by running: spelling::spell_check_files("report.Rmd").
  3. Proofread your html webpage!
  4. Publish your compiled page online via RPubs (follow these instructions).
  5. Create a zip file of everything in your R Project folder. Go to the “Assignment Submission” page on Blackboard and submit your zip file under “Final Project: Final Report”. Include a link to the published report page in your Blackboard submission. Only one person from your team should submit the report.

Presentation

Due: May 04 by 11:59pm.

Weight: This assignment is worth 10% of your final grade.

Assessment: We will use this rubric to grade your final presentation.

Create a video recording of a slideshow presentation of the findings in your final report and publish it on the web (e.g. via YouTube, Vimeo, etc.). We will watch all the videos together during the period scheduled for the exam (May 05, 2021). Below is a list of specific items your presentation should include.

1. Format

  • Your presentation should be no longer than 10 minutes in length (it can be shorter though).
  • Each team member must present at least one slide.
  • You may use as many slides as you feel helps you communicate your ideas, but keep the 10-minute limit in mind (<1 slide / min is probably a good guideline).
  • Your title slide should include the project title, team member names, and the presentation date: May 05, 2021.
  • All slides should be numbered in the bottom-left or bottom-right corners.

2. Recording strategies

Given that it may be difficult or impossible to be in the same room with your teammates, I recommend using Zoom to record your presentation. Have someone share their screen showing the slides and talk over the slides while recording the call.

Another option is to use a screen recording program, such as QuickTime (Mac only), to record someone talking over the slides, though this may be challenging if not all teammates can be in the same room.

3. Content

Your slides do not need to include the detailed code used to conduct your analysis - that should be accessible from your report. Rather, the purpose of your presentation is to present the “big picture” overview of your project. You should discuss:

  1. What you studied and why it matters (i.e. your research question).
  2. What data you used (describe your sources).
  3. What you found (show polished charts, discuss key results).

4. Publish and submit

Publish your video recording on the web (e.g. via YouTube, Vimeo, etc.). Only one team member needs to publish the video.

Once your video is posted, go to the “Assignment Submission” page on Blackboard and submit your slides as a pdf or pptx file under “Final Project: Presentation”. Include a link to the published video recording in your Blackboard submission. Only one person from your team should submit the presentation files.


Final Interview

Due: Students will sign up for individual meetings during finals week (May 03 - 07).

Weight: This assignment is worth 10% of your final grade.

Assessment: We will use this rubric to assess your interview.

The instructor will hold a 10-minute one-on-one interview with each student. The interview will consist of questions about your final project as well as questions about core concepts covered during the course. The full list of questions can be seen here.

Some suggestions from Prof. Mazzuchi on interviews:

  1. Bring water to drink - it will calm you when you are nervous and your mouth dries up. You can also pause and think while you drink.
  2. Don’t answer right away - think a bit.
  3. Answer the question asked. Don’t try to impress or I will ask more questions.
  4. Don’t say “I don’t know.” Try and I will help you.

Page sources:

Some of the content on this page is inspired by and / or modified from other sources, including:


EMSE 4575: Exploratory Data Analysis (Spring 2021)
Wednesdays | 12:45 - 3:15 PM | Dr. John Paul Helveston | jph@gwu.edu |
LICENSE: CC-BY-SA