Finding Data
The world is full of data, but finding it can sometimes be difficult. As a starting point, GW Libraries has an excellent set of Research Guides that can be helpful. Depending on your research question, searching Google can also lead to good sources.
Here are some other resources:
Large, searchable sites
Basically like running a Google search, but only for datasets:
- Google Dataset Search
- Quandl: Over 2 million financial, economic, and social datasets.
- Wharton Research Data Services: Single-point access to over 200 terabytes of data across multiple disciplines, including Finance, Marketing, and Economics.
Websites
A collection of various websites that have lots of interesting datasets:
- “Tidy Tuesday” challenges: Wide variety of datasets for data visualization practice. See many of the visualizations others have created here.
- “Makeover Monday” challenges: Wide variety of datasets for data visualization practice.
- Tableau Community Forums: Sometimes you’ll find some interesting datasets in here.
- “Data is Beautiful” subreddit: Feed of user-generated data visualizations, though quality varies substantially.
- Kaggle: Hosts machine learning competitions and, as a result, has lots of really interesting datasets open to the public.
Packages
Many packages contain interesting datasets. If you can find a package with data, it will usually be nicely formatted 😄
Here is a table of lots of packages that contain multiple datasets. Some of my favorites are:
In addition, some packages contain only datasets, such as:
- fivethirtyeight: Datasets and code published by the data journalism website ‘FiveThirtyEight’, available at https://github.com/fivethirtyeight/data
- gapminder
- babynames
- nycflights13
- fueleconomy
- nasaweather
- usdanutrients
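To see what a package actually ships with, base R's `data()` function can list the datasets bundled in any installed package. The sketch below is only an illustration: `list_package_datasets` is a hypothetical helper name (not part of any package), and the demo uses the built-in `datasets` package so it runs without installing anything.

```r
# Return the names and titles of the datasets bundled with an installed package.
# NOTE: `list_package_datasets` is a hypothetical helper, not an existing function.
list_package_datasets <- function(pkg) {
  # data(package = ...) returns an object whose `results` element is a
  # character matrix with columns "Package", "LibPath", "Item", and "Title"
  info <- data(package = pkg)$results
  as.data.frame(info[, c("Item", "Title"), drop = FALSE],
                stringsAsFactors = FALSE)
}

# "datasets" ships with R, so this works on a fresh installation
builtin <- list_package_datasets("datasets")
head(builtin)

# For one of the data-only packages above, install it first (assumed step):
# install.packages("gapminder")
# list_package_datasets("gapminder")
# head(gapminder::gapminder)
```

Once a package is installed, its datasets can usually be used directly after `library(<package>)`, or accessed without attaching the package via `<package>::<dataset>`.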
Government-ish sources
“Government-ish” because while some of these sites host government data, the sites themselves may or may not be affiliated with a government agency:
- US Open Gov Data: Links to federal and non-federal sources in the US and abroad.
- US City Open Data Census: Open data from more than 100 U.S. cities.
- Open Africa
- EU Open Data Portal
- China data: Various datasets on China posted by Gang He.
Energy data
Since I happen to work with energy data a lot, here are some common go-to sources:
- China Energy Portal Statistics: Loads of energy statistics from China.
- U.S. Energy Information Administration: According to Wikipedia, the “principal agency of the U.S. Federal Statistical System responsible for collecting, analyzing, and disseminating energy information to promote sound policymaking, efficient markets, and public understanding of energy and its interaction with the economy and the environment. EIA programs cover data on coal, petroleum, natural gas, electric, renewable and nuclear energy. EIA is part of the U.S. Department of Energy.”
- Environmental Performance Index, Yale University: Ranks 163 countries on 25 performance indicators tracked across ten policy categories covering both environmental public health and ecosystem vitality. These indicators provide a gauge at a national government scale of how close countries are to established environmental policy goals.
Spatial data
- http://freegisdata.rtwilson.com/
- https://data.nasa.gov/
- https://planet.parts/
- https://www.efi.int/knowledge/maps/forest
- https://land.copernicus.eu/
- https://maps.elie.ucl.ac.be/CCI/viewer/
- https://www.ecad.eu//dailydata/predefinedseries.php
- http://www.worldclim.org/version2
- https://www.geoportal.org/
- https://scihub.copernicus.eu/
- https://sedac.ciesin.columbia.edu/
- https://earthexplorer.usgs.gov/