Why electric vehicles matter

Climate change is one of the most urgent problems we face as a society. Humans burn fossil fuels to power our economies, transportation, food systems, and everything in between. Emissions from burning those fuels trap heat in Earth’s atmosphere, causing temperatures to rise. Left unchecked, that warming will eventually cause irreversible damage and make life as we know it impossible to sustain. To slow global warming and change its trajectory, we should focus on the areas where we can make the biggest changes. One of these areas is transportation, which accounts for almost a third of US global warming emissions (source). In fact, personal vehicles are responsible for most of those emissions. Many solutions are available to us, and one of them is the electric vehicle (EV) (source).

Conventional vehicles run on gasoline. Electric cars, on the other hand, run on electricity, which can be much cleaner than gasoline depending on how it is generated. Two-thirds of Americans already live in areas where powering an electric car results in fewer global warming emissions than driving even a 50 mpg gasoline car. As the country adds more sources of renewable energy, driving electric will get even cleaner. Even after accounting for the environmental impact of manufacturing an electric vehicle and its battery, a gasoline car produces about double the emissions of an average battery electric vehicle (BEV) (source). Since EVs emit fewer pollutants, communities would be less exposed to harmful tailpipe emissions that can cause heart and lung disease.

EVs can play an important role in reducing global warming emissions and improving public health. But do people want to drive them? In a 2019 survey, ⅔ of prospective car buyers were interested in owning an electric vehicle (source). However, being interested in owning an EV and actually choosing to purchase one are not the same.

Research Questions

Now that concerns about driving range and charging accessibility are declining as technology improves, the outlook for EV sales is more optimistic. Additionally, new models are announced each year at multiple price points, making EVs accessible to more income levels. In the long term, EVs make economic sense: charging costs less than buying gasoline, and EVs are cheaper to maintain than conventional vehicles (source). Electric vehicles are an important part of reducing global warming emissions, so we are interested in how the market has grown and how it is predicted to grow in the future.

Many companies and organizations forecast EV sales to try and predict how and when the market will grow. We are interested in how the U.S. Energy Information Administration (EIA) has done historically with their EV forecasts as well as what they think the future of electric vehicles will look like. So, we ask the following two research questions.

  1. What is the difference between historical BEV sales and EIA’s projected sales in the past?

  2. Has EIA become more or less optimistic about future BEV sales? What were they predicting the future market would look like 10 years ago vs now?

We chose to focus our research on data from EIA because it is published on the web and updated regularly. EIA is responsible for collecting, analyzing, and disseminating energy information to promote sound policymaking, efficient markets, and public understanding of energy. This report focuses on forecasts for battery electric vehicle (BEV) sales in the US because BEVs made up the largest share of electric vehicle sales in 2017 after hybrids and are seen as a promising option by buyers. BEVs are powered solely by an electric battery and have no gasoline engine parts. They are considered zero-emission vehicles because they do not generate the harmful tailpipe emissions or air pollution hazards caused by traditional gasoline-powered vehicles (source).

Data Sources

EIA Annual EV Sales Forecasts

Our main data source for EV forecasts is the U.S. Energy Information Administration (EIA) Annual Energy Outlook reports from 2010 to 2021. Each annual report includes a chart titled “Light-duty vehicle sales by technology type and Census Division United States”, which breaks down projected EV sales for the next 20-30 years. The data for these charts is also provided in an Excel sheet, which we downloaded and used as our raw data.

The data was downloaded directly from the EIA website. EIA is recognized as a gold standard in the US in terms of energy reporting. The data was originally produced by EIA using their own modeling system to project into the future.

Below are the links to the raw data for each energy report published:

US Historical EV Sales

The next data source breaks down EV sales in the US by month from 2010 to 2019. This data was given to us by Professor Helveston, who previously used it in his own work. The data was collected from two sources, HybridCars.com and InsideEVs.com.

Hybridcars.com reports the EV sales by year and technology type (bev, hybrid, diesel). The HybridCars.com monthly sales Dashboard is a collaboration between HybridCars.com and Baum & Associates, a Michigan-based market research firm focusing on automotive issues including the hybrid and electric vehicle market. Here is an example of the June 2018 sales dashboard.

InsideEVs.com reported monthly EV sales in the US based on reports from the manufacturers themselves. Unfortunately, most manufacturers stopped reporting sales data, so the data only goes to the end of 2019. Here is the monthly sales data.

We note that since the data was self-reported by manufacturers, some companies could have under- or over-reported sales, whether deliberately or by accident.

Cleaning the data

After loading in all the data, we cleaned up the names of the columns and harmonized them so we could join all the EIA data into one table in tidy form.

### turning eia data into tidy data 
eia_2010 <- eia_2010 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2011 <- eia_2011 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2012 <- eia_2012 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2013 <- eia_2013 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2014 <- eia_2014 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2015 <- eia_2015 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2016 <- eia_2016 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2017 <- eia_2017 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2018 <- eia_2018 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2019 <- eia_2019 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2020 <- eia_2020 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2021 <- eia_2021 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)



##binding together all the eia data 

eia <- eia_2010 %>% 
  full_join(eia_2011) %>% 
  full_join(eia_2012) %>% 
  full_join(eia_2013) %>% 
  full_join(eia_2014) %>%
  full_join(eia_2015) %>% 
  full_join(eia_2016) %>% 
  full_join(eia_2017) %>% 
  full_join(eia_2018) %>% 
  full_join(eia_2019) %>% 
  full_join(eia_2020) %>% 
  full_join(eia_2021) 


kable(head(eia))
publish_year forcast_year source forcastMarket type cars_in_mil
2010 2010 eia US 100_mi_BEV NA
2010 2011 eia US 100_mi_BEV NA
2010 2012 eia US 100_mi_BEV NA
2010 2013 eia US 100_mi_BEV NA
2010 2014 eia US 100_mi_BEV NA
2010 2015 eia US 100_mi_BEV NA
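
The twelve reshaping steps above all follow one pattern, so they could also be written once and applied to every report. The following is only a sketch of that idea, not the code used for this report. It assumes two small made-up tables (`wide_2010` and `wide_2011`, with hypothetical values and only two vehicle-type columns), uses `pivot_longer()` in place of `gather()`, and stacks the results with `bind_rows()` rather than `full_join()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide tables standing in for two EIA reports (values are made up)
wide_2010 <- tibble(publish_year = 2010, forcast_year = c(2010, 2011),
                    BEV = c(0.001, 0.002), HEV = c(0.30, 0.35))
wide_2011 <- tibble(publish_year = 2011, forcast_year = c(2011, 2012),
                    BEV = c(0.003, 0.004), HEV = c(0.36, 0.40))

# Reshape one report from wide to tidy form:
# one row per publish year, forecast year, and vehicle type
tidy_report <- function(df) {
  df %>%
    pivot_longer(cols = -c(publish_year, forcast_year),
                 names_to = "type",
                 values_to = "cars_in_mil")
}

# Apply the same reshaping to every report and stack the results
eia_toy <- list(wide_2010, wide_2011) %>%
  lapply(tidy_report) %>%
  bind_rows()

eia_toy
```

Keeping the reshaping in one function means a change to the column range only has to be made in one place.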

The ev_sales data is reported monthly, so next we find yearly aggregates for true EV sales and then join the two data frames into the data frame proj_vs_true.

#find yearly aggregate sales
ev_sales_agg <- ev_sales %>% 
  group_by(year, category) %>% 
  summarize(sale = sum(sale), .groups = 'drop') %>% 
  #recode observations to match naming in eia
  mutate(type = case_when(
    category == 'hybrid' ~ 'HEV',
    category == 'bev'    ~ 'BEV',
    category == 'phev'   ~ 'PHEV',
    TRUE                 ~ NA_character_)) 

#join tables (sale_proj is in millions of cars; sale is still in single cars)
proj_vs_true <- eia %>% 
  left_join(ev_sales_agg, by = c('forcast_year' = 'year', 'type' = 'type')) %>% 
  rename(sale_proj = cars_in_mil) %>% 
  select(forcast_year, type, sale_proj, sale, publish_year)

kable(head(proj_vs_true))
forcast_year type sale_proj sale publish_year
2010 100_mi_BEV NA NA 2010
2011 100_mi_BEV NA NA 2010
2012 100_mi_BEV NA NA 2010
2013 100_mi_BEV NA NA 2010
2014 100_mi_BEV NA NA 2010
2015 100_mi_BEV NA NA 2010

Looking Historically

In general, we want to look at how accurate EIA forecasts have been compared to the true BEV sales.

Were EIA forecasts getting better year after year?

To see if EIA forecasts for BEVs were improving year after year, we look at the forecasts made for 2016, 2017, and 2018 versus the actual number of BEVs sold in those years.

eia_historical_2017 <- proj_vs_true %>% 
  filter(type == "BEV", 
         forcast_year %in% 2016:2018) %>% 
  #convert true sales to millions to match the projections
  mutate(sale = sale/10^6) 

#data frame with the true sales (y intercept) for each forecast year;
#only the first facet gets a text label
df_intercept <- eia_historical_2017 %>% 
  distinct(forcast_year, sale) %>% 
  mutate(label = c("true sales", "", ""))

eia_historical_2017_plot <- eia_historical_2017 %>% 
  ggplot()+
  geom_point(aes(x = publish_year, y = sale_proj))+
  geom_hline( data = df_intercept, aes( yintercept = sale),
    color = "red", linetype = 'dashed'
    )+
  facet_wrap(vars(forcast_year))+
  labs(x = "Year forecast was published", 
       y = "Projected BEV sales (in millions)", 
       subtitle = "EIA is generally conservative with its BEV forecasts", 
       title = "EIA is updating and improving their forecasts every year")+
  theme_minimal_hgrid()+
  theme(panel.grid = element_blank())+
  geom_text(data = df_intercept, aes(x = 2012.5, y = 0.10, label = label), color = "red")
  

eia_historical_2017_plot 

Figure 1: Comparing forecasts to the actual number of BEVs sold in the US

Looking at Figure 1, we see that as the publish year approaches the year we are looking at, the projections get closer to the actual sales for that year. The forecasts approaching the actual sales for all 3 years leads us to believe EIA was updating their forecasts for BEVs each year. As the year in question got closer, the forecasts got better. This makes sense because there is more uncertainty forecasting 8 years into the future than there would be forecasting 2 years into the future.
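
To put a number on this pattern, one could compute the forecast error as a function of lead time (the forecast year minus the publish year). The sketch below is a minimal illustration with made-up values for a single forecast year, not results from the EIA data. The column names mirror `proj_vs_true`, and `lead_time`, `error`, and `abs_error` are names we introduce here for illustration.

```r
library(dplyr)

# Hypothetical forecasts for 2017 in millions of BEVs (values are made up)
toy <- tibble(
  publish_year = c(2012, 2014, 2016),
  forcast_year = 2017,
  sale_proj    = c(0.02, 0.05, 0.08),
  sale         = 0.10   # true sales, already converted to millions
)

# Lead time = how many years ahead the forecast was made
forecast_error <- toy %>%
  mutate(lead_time = forcast_year - publish_year,
         error     = sale_proj - sale,   # negative = forecast was too low
         abs_error = abs(error)) %>%
  arrange(desc(lead_time))

forecast_error
```

With the real data, a shrinking `abs_error` as `lead_time` falls would support the reading of Figure 1 given above.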

Generally, EIA was conservative with its forecasts, since most points fall below the red lines marking true sales. We also see that BEV sales did increase year after year, since the red dashed lines move up the y axis from 2016 to 2018. The jump from 2017 to 2018 is larger than the jump from 2016 to 2017. This jump could have led EIA to be overly optimistic about BEV sales after 2018, which we investigate later in the report.

Two major marketplace factors could have contributed to the jump in true sales from 2017 to 2018. The first is a regulatory environment that is increasingly favorable to EV purchases. Across states, this may include tax incentives for purchasing BEVs and PHEVs, free parking programs, HOV lane access, and outreach campaigns targeting low- and middle-income consumers. For example, a consumer purchasing a 2021 Toyota RAV-4 PHEV is eligible for 7,500 USD in federal tax rebates (20% of MSRP) plus potential state rebates. The second factor is expanding consumer choice in the EV marketplace. While nearly all manufacturers sell an EV model, the majority of Americans outside of major metropolitan areas only had access to about 10 EV models in 2017.

Charging station availability is another concern for EV buyers. Electric utility companies across the country have recently become more accommodating of EV charging, likely in response to rising EV sales and a growing need for household charging. Together with regulatory incentives to expand public and workplace charging, this has further fueled EV marketplace growth (source: International Council on Clean Transportation).

Another likely reason for the increase in true BEV sales between 2017 and 2018 was the release of the Tesla Model 3, which entered production in 2017. This BEV offered luxury and sustainability at a mid-level price point of around 45,000 USD MSRP. Consumers were also eligible for 7,500 USD in federal tax rebates (17% of MSRP). This was a substantial drop in price from other Tesla models, like the Model S, which cost roughly 62,000 USD more in 2018 (source: US News).

Finally, we note an odd occurrence across all 3 years. In the 2017 facet, for example, EIA’s 2018 report published a value for 2017 that was well below the true BEV sales of 2017. We expected any ‘forecast’ published after 2017 to equal the true sales of 2017. We see two possible explanations for this discrepancy: either EIA uses different true sales data, or it does not update its forecasting model with true sales before publishing each new report.

Have BEV sales projections become more accurate over time?

To study how EIA’s BEV sales projections have performed historically, we compare the projections for forecast years 2010 to 2019 with true BEV sales over the same period. We make this comparison for each report published from 2010 to 2019. The EIA data includes projections published in 2020 and 2021, but the true sales data only runs through 2019, since most manufacturers stopped reporting monthly sales after that point.

proj_vs_true_plot <- proj_vs_true %>% 
  filter(publish_year <= 2019,
         forcast_year %in% 2010:2019,
         type == 'BEV') %>%
  #convert true sales to millions to match the projections
  mutate(sale = sale/(10^6)) %>% 
  ggplot()+
  geom_line(aes(x=forcast_year, y=sale), color='steelblue') +
  geom_line(aes(x=forcast_year, y=sale_proj), color='red')+
  #line labels; the publish_year column places them in the 2010 facet only
  geom_text(data=data.frame(x=2017.6, y=0.03, label='projected', publish_year=2010),
            aes(x=x,y=y,label=label),color='red', size=3)+
  geom_text(data=data.frame(x=2018, y=0.28, label='true', publish_year=2010),
            aes(x=x,y=y,label=label),color='steelblue', size=3)+
  facet_wrap(vars(publish_year), nrow=2)+
  theme_minimal_hgrid(font_size=12)+
  scale_x_continuous(expand = c(0, 0.5), breaks=seq(2012,2018,3))+
  labs(x='Forecast Year', y='BEV Sales, Millions of Units',
       title='BEV True vs. Projected Sales', 
       subtitle='BEV sales projections have become more accurate in reflecting true marketplace trends over time')

proj_vs_true_plot

Figure 2: Breakdown of true BEV sales vs. projected by publication year

Figure 2 shows that true BEV sales have increased across the country over the past decade. BEV marketplace growth was relatively slow before 2017, with a sharper increase towards the end of the decade. It is unclear exactly why we see this change, but we can assume that the financial and regulatory incentives discussed previously contributed to this growth.

In Figure 2, we can see that EIA’s sales projections for BEVs have come to reflect true marketplace conditions more accurately over time: the red projected sales lines get closer to the blue true sales line year after year. The projected year-over-year growth in BEV sales has also increased in later publish years.
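
The visual impression of improving accuracy could be backed by a single summary number per report, such as the mean absolute error between projected and true sales. The sketch below shows the calculation on made-up values, so the numbers are illustrative only. The column names follow `proj_vs_true`, while `mean_abs_error` and `n_years` are names introduced here.

```r
library(dplyr)

# Hypothetical projections and true sales in millions of BEVs (values are made up)
toy <- tibble(
  publish_year = c(2012, 2012, 2016, 2016),
  forcast_year = c(2017, 2018, 2017, 2018),
  sale_proj    = c(0.02, 0.02, 0.08, 0.12),
  sale         = c(0.10, 0.24, 0.10, 0.24)
)

# Mean absolute error of each report across the forecast years it covers
mae_by_report <- toy %>%
  filter(!is.na(sale_proj), !is.na(sale)) %>%
  group_by(publish_year) %>%
  summarize(mean_abs_error = mean(abs(sale_proj - sale)),
            n_years = n())

mae_by_report
```

Because later reports cover fewer of the years 2010 to 2019 as true forecasts, `n_years` is worth reporting alongside the error when comparing reports.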

Finally, the 2019 publication shows great optimism in its BEV sales projections. Not only has EIA changed its marketplace outlook over the past decade, it has become generally more optimistic about BEV sales over time. In reality, the number of consumers choosing to buy BEVs has increased steadily over the past decade, and more so in the past 3 years. This growth was unexpected, as seen in the 2014 and 2015 publications, where EIA projected BEV sales to decrease in the future.

At the end of the day, more consumers are choosing to purchase BEVs than was expected at the start of the decade. EIA anticipates a substantial increase in BEV sales in the future.

Looking Forward

While we can compare past EIA forecasts to true sales to get an idea for how EIA forecasts, we do not have car sales data on years in the future. Instead, we want to look ahead at forecasts for the future to understand how EIA envisions the BEV market 15 - 25 years out from now.

Has EIA gotten more or less optimistic over time?

To explore this, we are going to look at what 2035, 2040, and 2045 were forecast to be from the 2010 to 2021 reports.

eia_future_bev <- eia %>% 
  filter(type == "BEV", 
         forcast_year %in% c(2035, 2040, 2045)) %>%
  #shorten the year labels so they fit under each narrow facet
  mutate(forcast_year = as.factor(forcast_year), 
         forcast_year = fct_recode(forcast_year, 
                                  "'35" = "2035",
                                  "'40" = "2040",
                                  "'45" = "2045"
                                  )) %>% 
  ggplot()+
    geom_col(aes(x = forcast_year, y = cars_in_mil), width = 0.5)+
    facet_wrap(vars(publish_year), nrow = 1)+
    theme_minimal_hgrid(font_size = 12)+
    scale_y_continuous(expand = c(0, 0))+
    labs(x = "Forecast year ('35 = 2035)", 
         y = "Projected BEVs sold (in millions)", 
         title = "BEV sales forecasts over time", 
         subtitle = "EIA was very optimistic in 2018 and has since reduced its forecast back to 2016 levels", 
         caption = "Note: EIA only started forecasting out to '35 and '40 after 2012")

eia_future_bev

Figure 3: EIA forecasts the BEV market to grow in the future but the rate of growth is unknown.

In Figure 3, we can see that EIA only started projecting out to 2040 in 2013, and 2017 is the first time EIA projected all the way out to 2045. From 2017 to 2019, EIA was optimistic about BEV sales, with projections for 2045 peaking at around 1.25 million BEVs sold. Since 2019, however, forecasts for 2035, 2040, and 2045 have declined, leading us to believe EIA has become less optimistic about BEV sales in the last 3 years. The most recent forecast, published in 2021, projects sales at around the same level as was projected in 2016, before the peak of optimism. The estimated sales for 2045 are now 0.75 million, roughly 40% below the peak of around 1.25 million. The over-optimistic swell around 2018 could be partially due to the sharp increase in sales from 2017 to 2018 that we saw in Figure 1. The steep slope could have led forecasters to believe that BEV sales were increasing more rapidly than they actually were. EIA has since readjusted, consistently reducing its forecasts over the past 3 years.
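
The size of this pullback can be expressed as a percent change relative to the peak forecast. The sketch below shows the arithmetic. Only the peak of about 1.25 million and the 2021 value of 0.75 million come from the discussion above. The other values, and the assignment of the peak to the 2018 report, are made-up placeholders so the example runs on its own.

```r
library(dplyr)

# Forecasts for 2045 by publish year, in millions of BEVs.
# 1.25 (peak) and 0.75 (2021) are the approximate values read from Figure 3;
# the remaining values and the year of the peak are hypothetical.
forecast_2045 <- tibble(
  publish_year = 2017:2021,
  cars_in_mil  = c(1.00, 1.25, 1.20, 0.90, 0.75)
)

# Percent difference between each report's forecast and the peak forecast
change_from_peak <- forecast_2045 %>%
  mutate(pct_vs_peak = 100 * (cars_in_mil - max(cars_in_mil)) / max(cars_in_mil))

change_from_peak
```

The same calculation could be run on the `eia` table after filtering to `type == "BEV"` and a single forecast year.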

On the bright side, EIA still projects that sales will increase from 2035 to 2045, meaning the market will continue to grow, just at a slower pace than was forecast in 2018 and 2019.

As of this year, EIA forecasts that fewer than 500,000 BEVs will be sold in the US in 2035. For comparison, China sold over 1 million BEVs in 2019 (source). That means that, 15 years in the future, the US is expected to sell fewer than half as many BEVs as China sold two years ago. The US is slow to catch up to China in electric vehicle adoption. But it’s possible that, with changes to regulations and incentives, the US could speed up its BEV market growth.

Conclusions

As the climate crisis comes to a head in the coming years, environmental regulations across the US aim to increase electric car sales in order to reduce carbon emissions from transportation. Financial incentives like tax rebates, regulatory steps promoting EV usage like HOV lane access and free parking, and the increasing availability of charging and EV models across the country have all significantly reduced barriers to purchasing EVs. The US Energy Information Administration has published projections of future vehicle sales every year since 2010.

We asked how EIA’s projections have fared compared to true sales in the past. More specifically, has consumer behavior followed expectations? We found that, overall, more consumers entered the BEV marketplace each year than were expected. At the beginning of the decade, EIA expected BEV sales either not to increase at all or to decline by 2019. By the end of the decade, EIA had more optimistic expectations, yet these were still below true BEV sales across the country, except for 2019. Meanwhile, true sales increased steadily from 2010 to 2017, rose sharply between 2017 and 2018, and leveled off in 2019. This tells us two things: 1) EIA has improved its BEV projections over time as the BEV marketplace has expanded, and 2) more consumers have entered the BEV marketplace than were expected.

Next, we asked how EIA projects BEV sales to fare in the future. By the end of the decade, EIA was highly optimistic about BEV sales: its 2019 publication projected substantially more BEV sales for 2019 than actually occurred. We found that EIA was most optimistic about future growth in the 2019 publication, while the 2020 and 2021 publications showed less optimism. However, they still predicted BEV marketplace growth in the future.

If we were to continue our exploration of the data, it would be interesting to see how the results changed if we were to look at different types of electric vehicles such as hybrids or plug-ins. Additionally, we would like to see 2020 sales data added to our raw data so we can draw each comparison out a year further. That may mean finding a new true sales data set since the one we used is not collecting data anymore.

Through our analysis, we see that true BEV sales have grown over the past decade. It would be interesting to research more when certain regulations changed or incentives were offered and test which regulatory or economic changes actually resulted in a statistically significant increase in EV sales.

Reducing emissions from transportation is a crucial step toward fostering a green, livable Earth. Families replacing gasoline-powered cars with EVs could help lower greenhouse gas emissions and keep our air clean. So, learning which government or societal changes have actually increased BEV sales could be useful to companies and governments with a green transportation agenda.

Appendix

Data dictionaries

EIA forecast data

Data files:

  • eia_2010.csv

  • eia_2011.csv

  • eia_2012.csv

  • eia_2013.csv

  • eia_2014.csv

  • eia_2015.csv

  • eia_2016.csv

  • eia_2017.csv

  • eia_2018.csv

  • eia_2019.csv

  • eia_2020.csv

  • eia_2021.csv

Date downloaded: January 4, 2020

Description: Electric vehicle sales projections from the Energy Information Administration (EIA) Annual Energy Outlook reports from 2010 to 2021. Each annual report includes a chart titled “Light-duty vehicle sales by technology type and Census Division United States”, which breaks down projected EV sales for the next 20-30 years.

Source of downloaded files:

Original source: EIA produced these data files using its own forecasting modeling system. The exact inputs to the modeling system are below:

  • Projected values are sourced from Projections: EIA, AEO2020 National Energy Modeling System (runs: ref2020.d112119a, highprice.d112619a, lowprice.d112619a, highmacro.d112619a, lowmacro.d112619a, highogs.d112619a, lowogs.d112619a, hirencst.d1126a, lorencst.1201a)

Modifications: Before loading the data into R we converted all the files into millions of vehicles (a few were in thousands of vehicles). We also only selected a portion of the data EIA provided in their spreadsheets because we are only interested in light duty vehicle sales for this report. Otherwise the data is identical to the data provided directly from EIA.
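
For reference, the unit conversion described above could also be done in R rather than by hand. This is a minimal sketch, assuming a hypothetical table `raw_thousands` (made-up values) in which every column except `forcast_year` holds sales in thousands of vehicles. It is not the procedure we actually followed, since our conversion was done before loading the data.

```r
library(dplyr)

# Hypothetical report whose sales columns are in thousands of vehicles
raw_thousands <- tibble(
  forcast_year = c(2020, 2021),
  BEV = c(250, 300),
  HEV = c(400, 420)
)

# Divide every sales column by 1000 to get millions of vehicles,
# leaving the year column untouched
in_millions <- raw_thousands %>%
  mutate(across(-forcast_year, ~ .x / 1000))

in_millions
```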

eia.csv is just all the years of the eia data joined together and in tidy format.

Dictionary of eia.csv:

variable class description
publish_year double Year the forecast was published
forcast_year double Year the sales are projected for
source character Where the data came from
forcastMarket character Where the forecast is for
type character Type of electric vehicle
cars_in_mil double Projected number of light duty cars sold (in millions of cars)

For reference:

  • 100_mi_BEV = 100 Mile Electric Vehicle (car that travels 100 miles on 33.7 kWh of battery power)

  • 200_mi_BEV = 200 Mile Electric Vehicle

  • 300_mi_BEV = 300 Mile Electric Vehicle

  • BEV = Battery electric vehicle (exclusively uses chemical energy stored in rechargeable battery packs)

  • PHEV = Plug-in hybrid electric vehicle (battery can be recharged by plugging a charging cable into an external electric power source)

  • BEV_PHEV = Battery electric vehicle + Plug-in hybrid electric vehicle

  • HEV = Hybrid electric vehicle (uses two or more distinct types of power, one of which being electricity)

  • CV = Conventional vehicle (uses gasoline or diesel to power an internal combustion engine)

  • NGV = Natural gas vehicle (alternative fuel vehicle that uses compressed natural gas or liquefied natural gas)

  • FCV = Fuel cell vehicle

  • NGV_FCV = Natural gas vehicle + Fuel cell vehicle

True electric vehicle sales in the US

Data File: usPevSales.csv

Date downloaded: January 4, 2020

Description: This data includes electric vehicle (EV) sales in the US by month from 2010 to 2019. The data set includes reported sales from both HybridCars.com and InsideEVs.com. The data includes information on the type of vehicle, month, year, total sales, and source.

Source of Downloaded File:

  • The HybridCars.com data was downloaded from this page: https://www.hybridcars.com/june-2018-hybrid-cars-sales-dashboard/

Original Data Source: HybridCars.com monthly data is “a collaboration between HybridCars.com and Baum & Associates, a Michigan-based market research firm focusing on automotive issues including the hybrid and electric vehicle market.” The InsideEVs.com monthly data was collected directly from auto manufacturers who reported EV sales data, as compiled by InsideEVs’ Lead Analyst Wade Malone. That data ends in 2019 because most auto manufacturers stopped reporting monthly EV sales data then.

Validity: This data was used by Professor Helveston in his prior work. While the original data could be biased, this is unlikely since each source collaborated with external sources in their data collection.

Dictionary of usPevSales.csv:

variable class description
category character Type of Vehicle
year double Sales Year
month character Sales Month
source character Where the data came from
sale double Monthly Vehicle Sale

Additional Plots

#forecast years covered by the true sales data
sales_years <- 2010:2019

#aggregate projected CV sales by forecast year for publish year 2010
#(natural gas and fuel cell vehicles are counted with CVs here)
cv <- eia %>% 
  filter(type %in% c('CV', 'NGV', 'FCV', 'NGV_FCV'),
         !is.na(cars_in_mil),
         publish_year == 2010,
         forcast_year %in% sales_years) %>% 
  group_by(forcast_year) %>%  
  summarize(cv = sum(cars_in_mil))

#aggregate projected EV sales by forecast year for publish year 2010
ev <- eia %>% 
  filter(type %in% c('BEV', 'HEV', 'PHEV', 'BEV_PHEV'),
         !is.na(cars_in_mil),
         publish_year == 2010,
         forcast_year %in% sales_years) %>% 
  group_by(forcast_year) %>%  
  summarize(ev = sum(cars_in_mil))
  
#join data frames
proj_ev_vs_cv <- ev %>% 
  left_join(cv, by='forcast_year')

#calculate r and r-squared
corr <- cor(proj_ev_vs_cv$cv, proj_ev_vs_cv$ev, use = "complete.obs")
corr2 <- corr^2
#create labels
corr_label <- paste0('r=', round(corr,2))
corr2_label <- paste0('r squared=', round(corr2,2))

#scatterplot
proj_ev_vs_cv_plot <- 
  proj_ev_vs_cv %>% 
  ggplot()+
  geom_point(aes(x=ev, y=cv))+
  theme_minimal_hgrid()+
  theme(panel.grid = element_blank())+
  annotate(geom='text', x=0.3, y=7, label=corr_label)+
  annotate(geom='text', x=0.33, y=6.8, label=corr2_label)+
  labs(x='EV Projected Sales', y='CV Projected Sales',
       title = 'Yearly Projected EV vs Yearly Projected CV Sales',
       subtitle = 'The EIA projected sales for EVs and CVs in 2010 are positively correlated')

proj_ev_vs_cv_plot

Figure 4: In 2010, EIA did not think consumers would replace their conventional vehicles with electric vehicles.

As seen in Figure 4, EIA’s projections for CV and EV sales across the decade, as published in 2010, are positively correlated. This goes against the economic intuition that EVs and CVs are substitutes, so that sales of one should rise as sales of the other fall. At the beginning of the decade, EIA evidently assumed that consumers would not replace their CV purchases with EV purchases. To explore this correlation further, it would be interesting to make the same scatter plot for each published report to see if this assumption has changed.
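
As a first step toward that follow-up, the correlation could be computed for every report at once instead of one plot at a time. The sketch below uses made-up yearly totals for two publish years. The columns `ev` and `cv` stand for aggregated projected EV and CV sales, and all values are hypothetical.

```r
library(dplyr)

# Hypothetical aggregated projections in millions of cars, for two reports
toy <- tibble(
  publish_year = rep(c(2010, 2019), each = 4),
  forcast_year = rep(2016:2019, times = 2),
  ev = c(0.30, 0.35, 0.40, 0.45,   0.40, 0.60, 0.80, 1.00),
  cv = c(6.0, 6.2, 6.5, 6.6,       6.8, 6.6, 6.3, 6.1)
)

# Correlation between projected EV and CV sales within each report
cor_by_report <- toy %>%
  group_by(publish_year) %>%
  summarize(r = cor(ev, cv, use = "complete.obs"),
            r_squared = r^2)

cor_by_report
```

A change in the sign of `r` between early and late reports would suggest that EIA came to treat EVs as replacing CVs rather than growing alongside them.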

#positions and text of the line labels at the right edge of the plot
line_labels <- data.frame(
  x = 2019.3,
  y = c(0.24, 0.44, 0.18, 0.29, 0.15, 0.03, 0.04, 0.003, 0.015, 0.053, -0.005),
  label = c('True', '2019', '2018', '2017', '2016', '2015', '2014', '2013',
            '2012', '2011', '2010'))

proj_vs_true_plot <- proj_vs_true %>% 
  filter(publish_year<=2019,
         forcast_year %in% 2010:2019,
         type =='BEV') %>%
  mutate(sale=sale/(10^6), 
         publish_year = as.factor(publish_year)) %>% 
  ggplot()+
  geom_line(aes(x=forcast_year, y=sale), color='steelblue', size = 2) +
  geom_text(data=line_labels, aes(x=x,y=y,label=label), size=3)+
  geom_line(aes(x=forcast_year, y=sale_proj, group=publish_year), alpha=0.5, color = "dimgray")+
  labs(x='Forecast Year', y='BEV Sales, Millions of Units', 
       title='BEV True vs. Projected Sales', 
       subtitle='BEV sales projections have become more accurate in reflecting true marketplace trends over time')+
  theme_classic()+
  theme(legend.position = 'none')+
  scale_x_continuous(expand = c(0, 0.5), breaks=seq(2010,2019,3))
  
proj_vs_true_plot
Figure 5: True BEV sales compared with EIA forecasts over the past decade

Figure 5 shows the same results as Figure 2, but in a single plot rather than in facets. The blue line represents true BEV sales over the decade. Each grey line represents the BEV sales projected in one publication year, labeled at its right edge. As seen above, the EIA projections have moved progressively closer to true BEV sales over the decade.
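One way to put a number on "closer" is the mean absolute error of each report's projections against true sales. The sketch below is only illustrative: the data frame `toy` is hypothetical and merely shaped like `proj_vs_true` (sales in millions), and it uses base R rather than the report's tidyverse pipeline.

```r
# Hypothetical toy values shaped like proj_vs_true (sales in millions); not EIA data.
toy <- data.frame(
  publish_year = rep(c(2011, 2017), each = 3),
  forcast_year = rep(2017:2019, times = 2),
  sale_proj    = c(0.01, 0.02, 0.03, 0.10, 0.18, 0.26),
  sale         = rep(c(0.10, 0.20, 0.24), times = 2)
)

# Absolute gap between each projection and the observed sales
toy$abs_error <- abs(toy$sale_proj - toy$sale)

# Mean absolute error per publication year; smaller means closer to true sales
mae_by_report <- aggregate(abs_error ~ publish_year, data = toy, FUN = mean)

mae_by_report
```

If the report's conclusion holds, running the same calculation on the real `proj_vs_true` table should give errors that shrink for later publication years.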

All code used in report

library(knitr)
library(tidyverse)
library(cowplot)
library(readxl)
library(lubridate)
library(janitor)
library(here)
library(viridis)
library(ggrepel)
library(hrbrthemes)

#install.packages("rmdformats")
#library(rmdformats)


knitr::opts_chunk$set(
    fig.height = 4,
    fig.path = "figs/",
    fig.retina = 3,
    fig.width = 7.252,
    message = FALSE,
    warning = FALSE,
    comment = "#>"
)
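#set the print width through options(); a bare assignment would only create an unused variable
options(dplyr.width = Inf)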

eia_2010 <- read_csv(here('data', 'eia_2010.csv'))
eia_2011 <- read_csv(here('data', 'eia_2011.csv'))
eia_2012 <- read_csv(here('data', 'eia_2012.csv'))
eia_2013 <- read_csv(here('data', 'eia_2013.csv'))
eia_2014 <- read_csv(here('data', 'eia_2014.csv'))
eia_2015 <- read_csv(here('data', 'eia_2015.csv'))
eia_2016 <- read_csv(here('data', 'eia_2016.csv'))
eia_2017 <- read_csv(here('data', 'eia_2017.csv'))
eia_2018 <- read_csv(here('data', 'eia_2018.csv'))
eia_2019 <- read_csv(here('data', 'eia_2019.csv'))
eia_2020 <- read_csv(here('data', 'eia_2020.csv'))
eia_2021 <- read_csv(here('data', 'eia_2021.csv'))

ev_sales <- read_csv(here('data', 'usPevSales.csv'))


### turning eia data into tidy data 
eia_2010 <- eia_2010 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2011 <- eia_2011 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2012 <- eia_2012 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2013 <- eia_2013 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2014 <- eia_2014 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2015 <- eia_2015 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2016 <- eia_2016 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2017 <- eia_2017 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2018 <- eia_2018 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2019 <- eia_2019 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2020 <- eia_2020 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)

eia_2021 <- eia_2021 %>% 
  gather(key = "type", 
         value = "cars_in_mil", 
         `100_mi_BEV`:NGV_FCV)



##binding together all the eia data 

eia <- eia_2010 %>% 
  full_join(eia_2011) %>% 
  full_join(eia_2012) %>% 
  full_join(eia_2013) %>% 
  full_join(eia_2014) %>%
  full_join(eia_2015) %>% 
  full_join(eia_2016) %>% 
  full_join(eia_2017) %>% 
  full_join(eia_2018) %>% 
  full_join(eia_2019) %>% 
  full_join(eia_2020) %>% 
  full_join(eia_2021) 


kable(head(eia))
#find yearly aggregate sales
ev_sales_agg <- ev_sales %>% 
  group_by(year,category) %>% 
  summarize(sum(sale)) %>% 
  #recode observations to match naming in eia
  mutate(type=ifelse(
    category == 'hybrid','HEV',ifelse(
    category == 'bev','BEV', ifelse(
    category == 'phev','PHEV',NA)))) 

#join tables
proj_vs_true <- eia %>% 
  left_join(ev_sales_agg, by=c('forcast_year'='year', 'type'='type')) %>% 
  rename(sale_proj=cars_in_mil, sale=`sum(sale)`) %>% 
  select(forcast_year, type, sale_proj, sale, publish_year)

kable(head(proj_vs_true))

eia_historical_2017 <- proj_vs_true %>% 
  filter( type== "BEV", 
        forcast_year %in% c("2016", "2017", "2018")
        #forcast_year %in% c("2016")
         )%>% 
  mutate(sale = sale/10^6) 

eia_historical_2017_bev <- eia_historical_2017 %>% 
  filter(type == "BEV")

#data frame with just category and y intercept 

df_intercept <- eia_historical_2017 %>% 
  slice(1:3) %>% 
  mutate(label = c("true sales", "", ""))

eia_historical_2017_plot <- eia_historical_2017 %>% 
  ggplot()+
  geom_point(aes(x = publish_year, y = sale_proj))+
  geom_hline( data = df_intercept, aes( yintercept = sale),
    color = "red", linetype = 'dashed'
    )+
  facet_wrap(vars(forcast_year))+
  labs(x = "Year forecast was published", 
       y = "Projected BEV sales (in millions)", 
       subtitle = "EIA is generally conservative with its BEV forecasts", 
       title = "EIA is updating and improving their forecasts every year")+
  theme_minimal_hgrid()+
  theme(panel.grid = element_blank())+
  geom_text(data = df_intercept, aes(x = 2012.5, y = 0.10, label = label), color = "red")
  

eia_historical_2017_plot 
proj_vs_true_plot <- proj_vs_true %>% 
  filter(publish_year<=2019,
         forcast_year %in% c(2010,2011,2012,2013,2014,2015,2016,2017,2018,2019),
         type =='BEV') %>%
  mutate(sale=sale/(10^6)) %>% 
  ggplot()+
  geom_line(aes(x=forcast_year, y=sale), colour='steelblue') +
  geom_line(aes(x=forcast_year, y=sale_proj), color='red')+
  #the label data carries publish_year=2010, so the labels appear only in the 2010 facet
  geom_text(data=data.frame(x=2017.6, y=0.03, label='projected', publish_year=2010),
            aes(x=x,y=y,label=label),color='red', size=3)+
  geom_text(data=data.frame(x=2018, y=0.28, label='true', publish_year=2010),
            aes(x=x,y=y,label=label),color='steelblue', size=3)+
  facet_wrap(vars(publish_year), nrow=2)+
  theme_minimal_hgrid(font_size=12)+
  scale_x_continuous(expand = c(0, 0.5), breaks=seq(2012,2018,3))+
  labs(x='Forecast Year', y='BEV Sales, Millions of Units',
       title='BEV True vs. Projected Sales', 
       subtitle='BEV sales projections have become more accurate in reflecting true marketplace trends over time')

proj_vs_true_plot
  

eia_future_bev <- eia %>% 
  filter(type == "BEV", 
         forcast_year %in% c( "2035", "2040", "2045")) %>%
  mutate(forcast_year = as.factor(forcast_year), 
         forcast_year = fct_recode(forcast_year, 
                                  "'35" = "2035",
                                  "'40" = "2040",
                                  "'45" = "2045"
                                  )) %>% 
  #filter(publish_year %in% c(2016, 2017)) %>% 
  ggplot()+
    geom_col(aes(x = forcast_year, y = cars_in_mil), width = 0.5)+
    facet_wrap(vars(publish_year), nrow = 1)+
    theme_minimal_hgrid(font_size = 12)+
    scale_y_continuous(expand = c(0, 0))+
    labs(x = "Forecast year ('35 = 2035)", 
         y = "Projected BEVs sold (in millions)", 
         title = "BEV sales forecasts over time", 
         subtitle = "EIA was very optimistic in 2018 and has since reduced its forecast back to 2016 levels", 
         caption = "Note: EIA only started forecasting out to '35 and '40 after 2012")

eia_future_bev

#create data frame of aggregate CV sales across years for publish year 2010
cv <- eia %>% 
  mutate(cv = ifelse(
    type == 'CV', cars_in_mil, ifelse(
      type == 'NGV', cars_in_mil, ifelse(
        type == 'FCV', cars_in_mil, ifelse(
          type== 'NGV_FCV', cars_in_mil, NA
        )
      )
    )
  )) %>% 
  filter(!is.na(cv),
         publish_year == 2010,
         forcast_year %in% c(2010,2011,2012,2013,2014,2015,2016,2017,2018,2019)) %>% 
  select(forcast_year, cv) %>% 
    group_by(forcast_year) %>%  
    summarize(sum(cv))

#create data frame of aggregate EV sales across years for publish year 2010
ev <- eia %>% 
  mutate(ev= ifelse (
    type == 'BEV', cars_in_mil, ifelse(
      type == 'HEV', cars_in_mil, ifelse(
        type == 'PHEV', cars_in_mil, ifelse(
          type == 'BEV_PHEV', cars_in_mil, NA
        )
      )
    )
  )) %>% 
  filter(!is.na(ev),
         publish_year == 2010,
         forcast_year %in% c(2010,2011,2012,2013,2014,2015,2016,2017,2018,2019)) %>% 
    select(forcast_year, ev) %>% 
    group_by(forcast_year) %>%  
    summarize(sum(ev))
  
#join data frames
proj_ev_vs_cv <- ev %>% 
  left_join(cv, by='forcast_year')

#calculate r and r-squared
corr <- cor(proj_ev_vs_cv$`sum(cv)`, proj_ev_vs_cv$`sum(ev)` ,use = "complete.obs")
corr2 <- corr^2
#create labels
corr_label <- paste0('r=', round(corr,2))
corr2_label <- paste0('r squared=', round(corr2,2))

#scatterplot
proj_ev_vs_cv_plot <- 
  proj_ev_vs_cv %>% 
  ggplot()+
  geom_point(aes(x=`sum(ev)`, y=`sum(cv)`))+
  theme_minimal_hgrid()+
  theme(panel.grid = element_blank())+
  annotate(geom='text', x=0.3, y=7, label=corr_label)+
  annotate(geom='text', x=0.33, y=6.8, label=corr2_label)+
  labs(x='EV Projected Sales', y='CV Projected Sales',
       title = 'Yearly Projected EV vs Yearly Projected CV Sales',
       subtitle = 'The EIA projected sales for EVs and CVs in 2010 are positively correlated')

proj_ev_vs_cv_plot
proj_vs_true_plot <- proj_vs_true %>% 
  filter(publish_year<=2019,
         forcast_year %in% c(2010,2011,2012,2013,2014,2015,2016,2017,2018,2019),
         type =='BEV') %>%
  mutate(sale=sale/(10^6), 
         publish_year = as.factor(publish_year)) %>% 
  ggplot()+
  geom_line(aes(x=forcast_year, y=sale), color='steelblue', size = 2) +
  #label each line at its right edge: the true-sales line plus one label per EIA publication year
  geom_text(data=data.frame(
              x=2019.3,
              y=c(0.24, 0.44, 0.18, 0.29, 0.15, 0.03, 0.04, 0.003, 0.015, 0.053, -0.005),
              label=c('True', '2019', '2018', '2017', '2016', '2015', '2014', '2013', '2012', '2011', '2010')),
            aes(x=x,y=y,label=label), size=3)+
  geom_line(aes(x=forcast_year, y=sale_proj, group=publish_year), alpha=0.5, color = "dimgray")+
  labs(x='Forecast Year', y='BEV Sales, Millions of Units', 
       title='BEV True vs. Projected Sales', 
       subtitle='BEV sales projections have become more accurate in reflecting true marketplace trends over time')+
  theme_classic()+
  theme(legend.position = 'none')+
  scale_x_continuous(expand = c(0, 0.5), breaks=seq(2010,2019,3))
  
proj_vs_true_plot