Due: 06 April, 11:59 pm
Weight: This assignment is worth 9% of your final grade.
Purpose: We have spent a lot of time exploring data, but most of the time that has only involved a single data frame. This project will give you a chance to practice exploring and joining together multiple data files.
Assessment: I will use this rubric to grade your submissions.
For this assignment, you’ll be exploring data on global crop yields from Our World in Data. The data have already been pre-processed and are available in this GitHub repository; take a look at the data dictionaries there to get familiar with the five data sets we’ll be working with and their variables.
Download and unzip this template for your project, then open the project.Rproj file. Run the setup chunk in your report.Rmd file to load all five data frames:
key_crop_yields
fertilizer
tractors
land_use
arable_land
Note: The folders in the template are only suggested names; use them if you find them helpful.
In your report, note that the original data came from the Our World in Data page on crop yields and also that the formatted data you used came from the Tidy Tuesday GitHub page. You don’t need to include the detailed data dictionaries.
Go to the Our World in Data page and view some of the charts. At the very top of the page you can click through all of the charts, or you can scroll down to view charts related to the key research questions on the page. Seeing the different questions and charts will help you get a sense of the key trends and research questions related to the data.
Then, in RStudio, preview the data sets (e.g., using head(), glimpse(), or View(), and/or by making some quick plots). Take note of which variables are available, their types, their units, and what they measure. Hint: Read the data dictionaries on the GitHub page!
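A minimal sketch of that first look in R, assuming the dplyr package is installed. The small data frame below is made up and only stands in for something like key_crop_yields; its column names (entity, code, year, wheat) are assumptions, so check the data dictionaries for the real ones. View() is left out because it opens an interactive viewer in RStudio.

```r
library(dplyr)  # provides tibble() and glimpse()

# Made-up stand-in data; column names are hypothetical
toy_yields <- tibble(
  entity = c("Country A", "Country A", "Region B"),
  code   = c("AAA", "AAA", NA),
  year   = c(1961L, 1962L, 1961L),
  wheat  = c(1.1, NA, 2.0)
)

head(toy_yields)     # print the first rows
glimpse(toy_yields)  # print every column with its type
summary(toy_yields)  # print ranges of the numeric columns

# Count missing values in each column
na_counts <- colSums(is.na(toy_yields))
na_counts
```

Running the same calls on each of the five real data frames gives a quick inventory of variables, types, and gaps before any cleaning.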
Most of the data frames are relatively “clean”, but you may need to do some light cleaning before you dig into your analysis. Be careful to check the following:
… gather() or spread())?
Once you have a sense of what the data capture, list at least three questions you think you could answer with these data (you can list more if you want). You can use some of the research questions on the Our World in Data page directly or as inspiration (for example, you could modify the question “How have crop yields changed since 1960?” to focus on a single crop or a single region / country).
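If a data frame stores one crop per column, the reshaping check in the list above applies. gather() and spread() still work, but tidyr now recommends pivot_longer() and pivot_wider(), which do the same jobs. A minimal sketch, assuming tidyr and dplyr are installed, with made-up numbers and hypothetical column names:

```r
library(tidyr)  # pivot_longer(), pivot_wider()
library(dplyr)  # tibble()

# Made-up wide data: one column per crop (hypothetical names)
wide <- tibble(
  entity = c("Country A", "Country B"),
  year   = c(2000L, 2000L),
  wheat  = c(3.0, 2.5),
  rice   = c(4.0, 5.5)
)

# Wide -> long (the modern replacement for gather()):
# one row per country, year, and crop
long <- pivot_longer(wide, cols = c(wheat, rice),
                     names_to = "crop", values_to = "yield")

# Long -> wide again (the modern replacement for spread())
back <- pivot_wider(long, names_from = crop, values_from = yield)

long
```

A long layout is usually easier to plot, since the crop column can be mapped to colour or facets in ggplot2.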
Your three research questions must meet these criteria:
… tractors and fertilizer data frames could be joined to compare how fertilizer use (from the fertilizer data frame) and total population (from the tractors data frame) changed over time.
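Joining is the main new skill in this project. A minimal sketch of the tractors/fertilizer example with dplyr, using made-up stand-in data frames; the column names (entity, year, total_population, nitrogen_fertilizer_use) are hypothetical, so look up the real names in the data dictionaries before adapting it.

```r
library(dplyr)  # tibble(), left_join(), anti_join()

# Made-up stand-ins for tractors and fertilizer (hypothetical columns)
tractors_toy <- tibble(
  entity = c("Country A", "Country A", "Country B"),
  year   = c(2000L, 2001L, 2000L),
  total_population = c(1000, 1010, 500)
)

fertilizer_toy <- tibble(
  entity = c("Country A", "Country A", "Country C"),
  year   = c(2000L, 2001L, 2000L),
  nitrogen_fertilizer_use = c(50, 55, 20)
)

# Keep every tractors row; match on country AND year.
# Rows with no fertilizer match get NA in the new column.
joined <- left_join(tractors_toy, fertilizer_toy,
                    by = c("entity", "year"))

# Tractors rows that found no match in fertilizer
unmatched <- anti_join(tractors_toy, fertilizer_toy,
                       by = c("entity", "year"))

joined
unmatched
```

Both data frames have one row per country per year, so the join matches on entity and year together; matching on entity alone would pair each year of a country with every other year. anti_join() is a quick way to see which rows failed to match before you start plotting, and inner_join() would keep only the rows found in both.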
For each of your questions:
Important points to consider in your charts:
Click the “Knit” button to compile your .Rmd file into an HTML web page, then create a zip file of everything in your R Project folder. Go to the “Assignment Submission” page on Blackboard and submit your zip file under “Mini Project 3.”