Research Question

The US and China are two major markets for electric vehicles (EVs). Combined, the two markets accounted for over 60% of the world's EV sales in 2017. Despite these large shares, car makers face very different kinds of competition in the two markets. In the US, the EV market is dominated by Tesla alone, while in China significantly more brands compete with each other.

It is interesting that China is able to sustain a good number of EV makers while the market is still growing. One factor that needs to be considered here is local protectionism. This kind of policy differs from nationwide subsidy policies, from which all EV makers can benefit. Local governments and state-owned enterprises have strong incentives to shift their EV purchases toward local brands in order to gain jobs and tax revenue. Although purchases from government agencies are small in number, they are important for the survival of local EV makers when the EV market is at an early stage, supporting infrastructure is still waiting to be built, and private purchases are extremely low.

I would like to examine protectionism in China's EV market: what patterns it exhibits, how it evolves over time, how local markets are affected by it, and how it affects EV technology adoption in China. For this analysis, I examine whether there is disproportionate local brand purchasing in each province, whether those orders come from personal buyers or government agencies, and how the EV market evolves over time.

Data Source

NEV data 2015-2017.xlsx

The data I have in hand is insurance filing data from the Innovation Center of Energy and Transportation (iCET). It contains the location and specification information of each newly insured car in China from 2015 to 2017. The data is not publicly disclosed, so we cannot link to a public source for this dataset.

The 2016 data contains no city information. Most of the non-numeric data is in Chinese, except for some car model names, so translation is required before it can be presented in a chart. A portion of the records is also missing ownership and OEM. Overall, since insurance is required for all new vehicles, the dataset is a good representation of what happened in the Chinese EV market from 2015 to 2017.

I split the original data into three files by year, and those three datasets are loaded here for analysis.

OEM_list.xlsx

This dataset was created by the Innovation Center of Energy and Transportation (iCET). It includes the locations of all OEMs in China. This is the original data and has not been manipulated at all.

passenger vehicle sales data 2015-2016.xlsx

2017 passenger vehicle sales.xlsx

Those two datasets are provided by the Innovation Center of Energy and Transportation (iCET). Together they include total auto sales in China from 2015 to 2017. The data is original and has not been modified by others. Since some records in the NEV sales data contain NA values, combining the NEV data with the total auto sales data introduces some bias. However, only a small number of EV sales are affected by this NA issue and, given the size of the Chinese auto market, we can ignore the small bias here.
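One way to check how small that bias is would be to measure the share of sales volume that sits in incomplete records. The following is a minimal sketch in base R on made-up numbers; the data frame `records`, the OEM labels and the amounts are all hypothetical and only mirror the column roles described in the Appendix.

```r
# Hypothetical NEV records; NA marks a missing OEM or ownership field.
records <- data.frame(
  oem       = c("OEM A", NA, "OEM B", "OEM C"),
  ownership = c("Personal", "State", NA, "Personal"),
  amount    = c(100, 5, 3, 92),
  stringsAsFactors = FALSE
)

# A record is incomplete if either field is missing.
incomplete <- is.na(records$oem) | is.na(records$ownership)

# Share of total sales volume carried by incomplete records.
affected_share <- sum(records$amount[incomplete]) / sum(records$amount)
print(affected_share)
```

If the resulting share is close to zero on the real data, dropping the incomplete records should have little effect on the market-share figures.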

Results

In this analysis, I choose 15 provinces out of 31, because not all provinces had local EV makers in 2015 and 2016. In order to analyze local purchases over time, I limited the scope to provinces that have had EV makers since 2015. I also divided the provinces into two groups: Top 5 Provinces and Other Provinces. The top 5 provinces are Beijing, Shanghai, Zhejiang, Guangdong and Hunan.
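The selection rule can be illustrated with a small sketch. It uses base R and a made-up data frame `sales` (hypothetical values, province names written in English for readability); it is an illustration of the rule, not the report's actual pipeline.

```r
# Hypothetical sales records. `local` says whether the OEM is based
# in the same province as the buyer.
sales <- data.frame(
  year     = c(2015, 2015, 2015, 2016, 2016, 2017),
  province = c("Beijing", "Anhui", "Hainan", "Hainan", "Beijing", "Anhui"),
  local    = c("Local", "Local", "Not Local", "Local", "Local", "Not Local"),
  amount   = c(120, 40, 5, 8, 300, 90),
  stringsAsFactors = FALSE
)

# Keep only provinces that already had local-brand sales in 2015.
had_local_2015 <- unique(
  sales$province[sales$year == 2015 & sales$local == "Local" & sales$amount > 0]
)
scoped <- sales[sales$province %in% had_local_2015, ]

# Label each remaining province as "Top 5" or "Other".
top5 <- c("Beijing", "Shanghai", "Zhejiang", "Guangdong", "Hunan")
scoped$group <- ifelse(scoped$province %in% top5, "Top 5", "Other")

print(had_local_2015)
print(table(scoped$group))
```

In this toy example the third province is dropped because its first local-brand sale only appears in 2016.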

As shown in the previous chart, those 5 provinces together had more EV sales than the other 10 provinces combined in 2015 and 2016, and only slightly fewer in 2017. They represent provinces that had a relatively big EV market at the early stage.

For those two groups, I break down orders into two categories, personal purchases and state purchases, according to the ownership of the car. I then plot the local brand share of EV sales by province group and order type; the following chart presents the result.
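The share calculation behind such a chart can be sketched as follows. This is a minimal base R illustration on hypothetical numbers (the data frame `orders` and its values are invented for the example): sales are summed per year, owner type and brand origin, then divided by the total for the same year and owner type.

```r
# Hypothetical orders for one province group.
orders <- data.frame(
  year      = 2015,
  ownership = c("State", "State", "State", "Personal", "Personal"),
  local     = c("Local", "Local", "Not Local", "Local", "Not Local"),
  amount    = c(40, 20, 40, 30, 70),
  stringsAsFactors = FALSE
)

# Sum sales per (year, ownership, local) cell.
by_cell <- aggregate(amount ~ year + ownership + local, data = orders, FUN = sum)

# Divide each cell by the total of its (year, ownership) group.
group_total <- ave(by_cell$amount, by_cell$year, by_cell$ownership, FUN = sum)
by_cell$pct <- by_cell$amount / group_total

# The local brand share is the "Local" row of each group.
local_share <- by_cell[by_cell$local == "Local", c("year", "ownership", "pct")]
print(local_share)
```

Because the shares are normalized within each owner type, the state and personal bars can be compared directly even when state purchases are far smaller in volume.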

We can clearly see that, in the top 5 provinces, state purchases have focused on local brands in all three years. In 2015, when the EV market was still at an early stage, over 50% of state orders in those 5 provinces went to local brands, and the share stays around 50% for the next two years. This pattern also holds for the other provinces, where local brands are the main driving force of both private and state purchases in 2015 and 2016. A high local brand share in state purchases is plausible: those purchases are not market-oriented, meaning that price and performance are not the main concern, and the relationship between the local automaker and government agencies could play an important role. As for the high local brand share in private purchases in the other provinces, one reason could be that a less developed economy leads to a very small market that lacks brand diversity. Only a few brands are sold there, and out-of-province brands sell more expensive models while local brands focus on cheap ones that are more affordable to local buyers.

This lack of market diversity in less developed provinces is obvious in the Fig2 charts. Between 2015 and 2016, the top 5 provinces generally have more brands selling in their EV markets, and those markets become more and more competitive over the three years. The fairly low local brand market share reflects this high level of competition. On the private side, only 2 to 3 of the 10 other provinces have brand diversity similar to that of a top 5 province. The brand diversity of the other provinces catches up in 2017, and this is reflected in the decrease in local brand market share. So high brand diversity is clearly associated with a lower local brand market share, which is reasonable since the market becomes more and more competitive.
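Brand diversity here means the number of distinct brands sold in a province for a given owner type. A minimal base R sketch of that count, on a hypothetical data frame `sales` with placeholder brand names "A", "B" and "C", could look like this:

```r
# Hypothetical sales rows; a brand may appear on several rows.
sales <- data.frame(
  province  = c("Beijing", "Beijing", "Beijing", "Beijing", "Anhui", "Anhui"),
  ownership = c("Personal", "Personal", "Personal", "State", "Personal", "Personal"),
  brand     = c("A", "B", "A", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# Count each brand only once per province and owner type.
pairs <- unique(sales[, c("province", "ownership", "brand")])

# Number of distinct brands in each (province, ownership) market.
diversity <- aggregate(brand ~ province + ownership, data = pairs, FUN = length)
names(diversity)[names(diversity) == "brand"] <- "brand_number"
print(diversity)
```

Deduplicating before counting matters because the raw data has one row per model and month, so a single brand would otherwise be counted many times.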

However, this is not the case for state purchases. In the top 5 provinces, state purchases take place in a much more diversified market in which more brands are selling, yet local brands dominate according to Fig1. In the private market, high brand diversity leads to more competition, so customers are not limited to local brands; state purchase orders, in contrast, stick to local brands. This further suggests that strong local protectionism may exist in state purchases.

Finally, if we take a closer look at less developed provinces, we will see that even with a diversified EV market like those of the top 5 provinces, local protectionism could still play an important role there. I choose Anhui, Jiangxi, Zhejiang and Beijing here. The first two represent less developed provinces with big local auto manufacturers. Zhejiang represents a highly developed province whose local automaker did not start making EVs until late 2015. Beijing is a highly developed province with huge EV and auto markets.

In Fig3, I plot the local brands' share of NEV sales against the NEV share of total auto sales. We can see a clear cluster of two provinces in the gray area. This indicates that Anhui and Jiangxi have really small EV markets compared to their own auto markets, and that over 75% of those EV sales go to a handful of local brands. Meanwhile in Zhejiang, the major automaker Geely did not launch its EV model until November 2015, which is part of the reason why the local brand share jumps from about 55% to over 80% in 2016. However, as the EV market grew significantly in 2017, the local brand share also saw a sharp drop.
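The two coordinates of each point in Fig3 can be derived as in the sketch below. It is a base R illustration on invented numbers (the data frames `nev` and `auto` and all amounts are hypothetical); the gray-area thresholds of 2.5% and 75% are the ones used for the shaded rectangle in the plotting code.

```r
# Hypothetical NEV sales split by brand origin, for one year.
nev <- data.frame(
  province = c("Anhui", "Anhui", "Beijing", "Beijing"),
  local    = c("Local", "Not Local", "Local", "Not Local"),
  amount   = c(8000, 2000, 24000, 56000),
  stringsAsFactors = FALSE
)
# Hypothetical total passenger vehicle sales for the same year.
auto <- data.frame(
  province   = c("Anhui", "Beijing"),
  total_sale = c(500000, 1000000),
  stringsAsFactors = FALSE
)

# All NEV sales and local-brand NEV sales per province.
nev_total   <- aggregate(amount ~ province, data = nev, FUN = sum)
local_total <- aggregate(amount ~ province, data = nev[nev$local == "Local", ], FUN = sum)
names(local_total)[2] <- "local_amount"

fig3 <- merge(merge(nev_total, local_total, by = "province"), auto, by = "province")
fig3$local_share_pct <- 100 * fig3$local_amount / fig3$amount  # y axis
fig3$nev_share_pct   <- 100 * fig3$amount / fig3$total_sale    # x axis

# Small EV market combined with a dominant local share (gray area).
fig3$in_gray_area <- fig3$nev_share_pct <= 2.5 & fig3$local_share_pct >= 75
print(fig3)
```

A province lands in the gray area only when both conditions hold, which is what separates the Anhui-like case from the Beijing-like case in this toy example.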

For a less developed province, local auto manufacturers are a very important source of income for the local government and also major employers of local residents. As a result, compared to Beijing and Zhejiang, where the auto industry is only one piece of the economy, local auto manufacturers may have more influence over government agencies. As the EV market is too small to support all those brands competing with each other, local protectionism will be vital for local brands to survive, and Fig3 indicates that strong protectionism may exist in those less developed provinces.

An interesting finding from Fig3 is that, even in a highly developed province like Zhejiang, local protectionism may still be important for local brands when the EV market is at an early stage. After Geely launched its EV model in late 2015, over 80% of EV sales went to local brands in the following year, but as the EV market share tripled in 2017, this surge in local brand purchases did not seem to continue. Also, if we look at Beijing's curve, we can see that the local brands' market share drops as the EV market grows over time. It would be interesting to have data from 2010 to 2015 to see whether similar patterns occurred when Beijing's EV market was at an early stage. If so, we could be more confident in saying that local protectionism plays an important role in supporting local brands at an early stage and leads to more competition in the future.

In conclusion, Fig1 shows that local protectionism may exist in state purchases and in less developed EV markets. Combined with Fig2, the charts indicate that the high local brand share in private purchases in less developed provinces may be due to low brand diversity: customers do not have many choices besides local brands, and the less developed economy in those provinces makes affordability a major concern for customers as well. In state purchases, however, high market diversity does not mean a lower local brand share, which points to local protectionism on that side of the market. Lastly, Fig3 indicates that local protectionism may be vital to the survival of local brands in less developed provinces. For provinces like Beijing and Zhejiang, local protectionism may also be important when the EV market is at an early stage.

Appendix

The data includes the following variables relevant to this study:

from NEV data 2015-2017.xlsx

| Variable | Description |
|---|---|
| Province | Name of the province (in Chinese) |
| City | Name of the city (in Chinese) |
| Year | Year of purchase |
| Month | Month of purchase |
| Model | Model of the EV (some in Chinese) |
| OEM | Name of the OEM (in Chinese) |
| Amount | Number of NEVs sold |
| Ownership | Type of ownership of the NEV (in Chinese) |

from OEM_list.xlsx:

| Variable | Description |
|---|---|
| Province | Name of the province (in Chinese) |
| City | Name of the city (in Chinese) |
| OEM | Name of the OEM (in Chinese) |

from passenger vehicle sales data 2015-2016.xlsx and 2017 passenger vehicle sales.xlsx

| Variable | Description |
|---|---|
| Province | Name of the province (in Chinese) |
| Total | Total number of vehicles sold in each province |
# Load libraries and settings here
library(tidyverse)
library(here)
library(readxl)
library(janitor)
library(lubridate)
library(cowplot)
library(viridis)
library(ggpubr)

# Define a "not in" operator as the negation of %in%
`%ni%` <- Negate(`%in%`)
knitr::opts_chunk$set(
    fig.height = 4,
    fig.path = "figs/",
    fig.retina = 3,
    fig.width = 7.252,
    message = FALSE,
    warning = FALSE,
    comment = "#>"
)
# Load data below here
df_evsale_15 <- read_csv(here::here('data_raw','df15.csv'))
df_evsale_16 <- read_csv(here::here('data_raw','df16.csv'))
df_evsale_17 <- read_excel(here::here('data_raw/data-raw','2017 NEV sales data-new.xlsx'))
df_oem <-  read_excel(here::here('data_raw/data-raw','OEM_list.xlsx'))
df_17autosale <- read_excel(here::here('data_raw/data-raw','2017 passenger vehicle sales.xlsx'))
df_15autosale <- read_excel(here::here('data_raw/data-raw','passenger vehicle sales data 2015-2016.xlsx'),sheet='2015')
df_16autosale <- read_excel(here::here('data_raw/data-raw','passenger vehicle sales data 2015-2016.xlsx'),sheet='2016')

colnames(df_17autosale)[1] <- 'province'


# Put any other "global" settings here, e.g. a ggplot theme:
theme_set(theme_bw(base_size = 20))

#rename the oem column
df_oem <- df_oem %>% 
  rename(oem_province=省份, oem_city=城市) %>% 
  clean_names()

# process sale data 
df_evsale_15 <- df_evsale_15 %>% 
  select("Year", 'Month','Brand',"Model","OEM","Province" ,"amount","ownership" )%>%
  clean_names()

df_evsale_17 <- df_evsale_17 %>% 
  separate(年月, into = c('year','month'),sep=4) %>% 
  select("year",'month', '省份','品牌',"车型","汽车生产企业名称" ,"amount","所有权") %>% 
  rename(
    ownership=所有权,
    province=省份,
    brand=品牌,
    model=车型,
    oem=汽车生产企业名称) %>% 
  mutate(
    month=as.numeric(month),
    year=as.numeric(year)
  ) %>%
  clean_names()

df_evsale_16 <- df_evsale_16 %>% 
  select("year", 'month','brand2',"model","OEM","province" ,"amount","所有权") %>% 
  rename(
    brand=brand2) %>% 
  rename(ownership=所有权) %>% 
  clean_names()

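# Merge OEM locations into the sales data, flag each sale as Local or Not Local,
# and drop records with missing or unknown ("未知") ownership or an unmatched OEM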

df_evsale_joint <- rbind(df_evsale_15,df_evsale_16,df_evsale_17)
df_evsale_joint <- df_evsale_joint %>% 
  left_join(df_oem,
            by= 'oem') %>% 
  mutate(
    local=ifelse(province==oem_province,'Local',"Not Local")) %>% 
  filter(!is.na(ownership),ownership!="未知",!is.na(local))

# get the total auto sale in each province
province_name <-c( unique(df_evsale_joint$province))
df_17autosale <- df_17autosale %>% 
  select("province","总计") %>% 
  rename(total_sale=总计) %>%
  mutate(year=2017) %>% 
  filter(province %in% province_name) %>% 
  distinct(province,total_sale,year)

df_15autosale <- df_15autosale %>% 
  rename(total_sale=amount,
         province=`Province/city`) %>%
  mutate(year=2015) %>% 
  filter(province %in% province_name) %>% 
  distinct(province,total_sale,year)

df_16autosale <- df_16autosale %>% 
  rename(total_sale=amount,
         province=`Province/city`) %>%
  mutate(year=2016) %>% 
  filter(province %in% province_name) %>% 
  distinct(province,total_sale,year)

Auto_sale <- rbind(df_15autosale,df_16autosale,df_17autosale)


target_province <- c("北京市",
                     "上海市",
                     "广东省",
                     "浙江省",
                     "湖南省"
                    # "天津市",
                    # "山东省",
                    # "河南省",
                     #"江苏省"
                    # "福建省",
                    # "江西省"
                     )

# target_province <- c("Beijing",
#                      "Shanghai", 
#                      "Guangdong", 
#                      "Zhejiang", 
#                      "Tianjin",
#                      "Shandong",
#                      "Henan",
#                      "Jiangsu",
#                      "Fujian",
#                      "Jiangxi")
# df_evsale_joint$province[df_evsale_joint$province=='北京市'] <-'Beijing' 
# df_evsale_joint$province[df_evsale_joint$province=='上海市'] <-'Shanghai' 
# df_evsale_joint$province[df_evsale_joint$province=='广东省'] <-'Guangdong' 
# df_evsale_joint$province[df_evsale_joint$province=='浙江省'] <-'Zhejiang' 
# df_evsale_joint$province[df_evsale_joint$province=='天津市'] <-'Tianjin' 
# df_evsale_joint$province[df_evsale_joint$province=='山东省'] <-'Shandong' 
# df_evsale_joint$province[df_evsale_joint$province=='河南省'] <-'Henan' 
# df_evsale_joint$province[df_evsale_joint$province=='江苏省'] <-'Jiangsu' 
# df_evsale_joint$province[df_evsale_joint$province=='福建省'] <-'Fujian' 
# df_evsale_joint$province[df_evsale_joint$province=='江西省'] <-'Jiangxi' 
df_evsale_joint$ownership[df_evsale_joint$ownership=='个人'] <-'Personal'
df_evsale_joint$ownership[df_evsale_joint$ownership=='单位'] <-'State'
# Provinces that already had local-brand EV sales in 2015
OEM_location <- df_evsale_joint %>% 
  group_by(year,province,local) %>% 
  summarise(amount=sum(amount)) %>% 
  ungroup() %>% 
  filter(amount!=0,year==2015,local=="Local") %>% 
  pull(province)
 
  
df_evsale_joint %>% 
  mutate(
    province=fct_other(province,keep=target_province),
    province=if_else(province!='Other','Top 5','Other')
  ) %>% 
  group_by(province,year) %>% 
  summarise(amount=sum(amount)/1000) %>% 
  ungroup() %>% 
  mutate(
    year=as_factor(year),
    province=fct_reorder2(province, year, amount)) %>% 
  ggplot()+
  geom_col(aes(x=amount,y=province,fill=year),position='dodge')+
  theme_minimal_vgrid()+
  scale_x_continuous(
    # expansion() replaces the deprecated expand_scale()
    expand = expansion(mult = c(0, 0.05))
  )+
  labs(
    x='EV sale in thousands',
    y='',
    title='Sale comparison between top 5 provinces and other provinces ',
    caption = 'Source: Innovation Center of Energy and Transportation'
  )

#Plot 1 State VS Personal ownership
plot1 <- df_evsale_joint %>% 
  filter(province%in% target_province,province%in% OEM_location) %>% 
  mutate(
    year=as.factor(year)
    )%>% 
  group_by(year,ownership,local) %>%
  summarise(amount=sum(amount)) %>% 
  group_by(year,ownership) %>% 
  # share of local vs. non-local sales within each year and owner type
  mutate(pct=amount/sum(amount)) %>% 
  ungroup() %>% 
  mutate(label=str_c(round(pct*100,digits = 2),'%')) %>% 
  ggplot()+
  geom_col(aes(x=pct,y=year,fill=local))+
  # geom_text(
  # aes(x = 1-pct+0.1, y = year, label = label),
  # color = "white", size = 6)+
  scale_fill_brewer(palette="Set1")+
  theme_minimal_grid(font_size = 12)+
  facet_wrap(~ownership, ncol=5)+
  labs(
    x='',
    y='top 5 provinces',
    title = "Fig1. New EV sale in China from 2015 to 2017"
  )

plot2 <-  df_evsale_joint %>% 
  filter(province%ni% target_province,province %in% OEM_location) %>% 
  mutate(
    year=as.factor(year)
    )%>% 
  group_by(year,ownership,local) %>%
  summarise(amount=sum(amount)) %>% 
  group_by(year,ownership) %>% 
  # share of local vs. non-local sales within each year and owner type
  mutate(pct=amount/sum(amount)) %>% 
  ungroup() %>% 
  mutate(label=str_c(round(pct*100,digits = 2),'%')) %>% 
  ggplot()+
  geom_col(aes(x=pct,y=year,fill=local))+
  theme_minimal_grid(font_size = 12)+
  scale_fill_brewer(palette="Set1")+
  facet_wrap(~ownership, ncol=5)+
  labs(
    x='New EV sale in percentage',
    y ='Other Provinces',
    caption = 'Source: Innovation Center of Energy and Transportation'
  )

figure <- ggarrange(plot1, plot2,nrow=2)
figure
# Plot 2 Brand Diversity (2015)
plot_2015_1 <- df_evsale_joint %>% 
  filter(province%in% target_province, province%in% OEM_location,year==2015) %>%
  distinct(year, province, ownership, brand, local) %>% 
  group_by(year, province,ownership) %>% 
  summarise(brand_number=length(unique(brand)),
            local_brand=length(which(local=="Local"))) %>% 
  ungroup() %>% 
  mutate(label=str_c(round((local_brand/brand_number)*100,digits = 2),'%'),
      province=case_when(
      province=="北京市"~'Beijing',
      province=="上海市"~'Shanghai',
      province=="广东省"~'Guangdong',
      province=="湖南省"~'Hunan',
      province=="浙江省"~'Zhejiang'
    ),
      province=fct_reorder(province,brand_number)
    ) %>% 
  ggplot()+
  geom_point(aes(x=brand_number,y=province),size=3,color='red')+
  geom_segment(aes(x=0,xend=brand_number,y=province,yend=province),color='grey')+
  # geom_vline(
  #   xintercept = ,
  #   color = 'red', linetype = 'dashed') +
  facet_wrap(~ownership,ncol = 2)+
  theme_minimal_vgrid()+
  # geom_text(
  #   aes(x = brand_number+2, y = province, label = label),
  #   color = "black", size = 6)+
  scale_x_continuous(
    breaks = (seq(0, 14, by = 2)),
    limits = c(0,15),
    expand = expansion(mult = c(0, 0.05))
  )+
  theme(text = element_text(size=10),
        axis.text.x = element_text( size = 10),
        axis.text.y = element_text( size = 10)) +
  labs(
    x='',
    y='Top 5 Provinces',
    title = 'Fig2-1. Diversity of brands in local market in 2015'
  )

plot_2015_2 <- df_evsale_joint %>% 
  filter(province%ni% target_province, province%in% OEM_location,year==2015) %>%
  distinct(year, province, ownership, brand, local) %>% 
  group_by(year, province,ownership) %>% 
  summarise(brand_number=length(unique(brand)),
            local_brand=length(which(local=="Local"))) %>% 
  ungroup() %>% 
  mutate(
      label=str_c(round((local_brand/brand_number)*100,digits = 2),'%'),
      province=case_when(
                      province=="吉林省"~'Jilin',
                      province=="四川省"~'Sichuan',
                      province=="安徽省"~'Anhui',
                      province=="江西省"~'Jiangxi',
                      province=="河南省"~'Henan',
                      province=="江苏省"~'Jiangsu',
                      province=="湖北省"~'Hubei',
                      province=="福建省"~'Fujian',
                      province=="重庆市"~'Chongqing',
                      province=="陕西省"~'Shaanxi'
                      ),
      province=fct_reorder(province,brand_number)
    ) %>% 
  ggplot()+
  geom_point(aes(x=brand_number,y=province),size=3,color='steelblue')+
  geom_segment(aes(x=0,xend=brand_number,y=province,yend=province),color='grey')+
  # geom_text(
  #   aes(x = brand_number+2, y = province, label = label),
  #   color = "black", size = 6)+
  facet_wrap(~ownership,ncol = 2)+
  theme_minimal_vgrid()+
  scale_x_continuous(
    breaks = (seq(0, 14, by = 2)),
    limits = c(0,15),
    expand = expansion(mult = c(0, 0.05))
  )+
  theme(text = element_text(size=10),
        axis.text.x = element_text( size = 10),
        axis.text.y = element_text( size = 10)) +
  labs(
    x='Number of brands in the market',
    y='Other Provinces'
  )
figure1 <- ggarrange(plot_2015_1, plot_2015_2,nrow=2)
figure1
# Plot 2 Brand Diversity (2016)
plot_2016_1 <- df_evsale_joint %>% 
  filter(province%in% target_province, province%in% OEM_location,year==2016) %>%
  distinct(year, province, ownership, brand, local) %>% 
  group_by(year, province,ownership) %>% 
  summarise(brand_number=length(unique(brand)),
            local_brand=length(which(local=="Local"))) %>% 
  ungroup() %>% 
  mutate(label=str_c(round((local_brand/brand_number)*100,digits = 2),'%'),
      province=case_when(
      province=="北京市"~'Beijing',
      province=="上海市"~'Shanghai',
      province=="广东省"~'Guangdong',
      province=="湖南省"~'Hunan',
      province=="浙江省"~'Zhejiang'
    ),
      province=fct_reorder(province,brand_number)
    ) %>% 
  ggplot()+
  geom_point(aes(x=brand_number,y=province),size=3,color='red')+
  geom_segment(aes(x=0,xend=brand_number,y=province,yend=province),color='grey')+
  facet_wrap(~ownership,ncol = 2)+
  theme_minimal_vgrid()+
  # geom_text(
  #   aes(x = brand_number+2, y = province, label = label),
  #   color = "black", size = 6)+
  scale_x_continuous(
    breaks = (seq(0, 25, by = 2)),
    limits = c(0,25),
    expand = expansion(mult = c(0, 0.05))
  )+
  theme(text = element_text(size=10),
        axis.text.x = element_text( size = 10),
        axis.text.y = element_text( size = 10)) +
  labs(
    x='',
    y='Top 5 Provinces',
    title = 'Fig2-2. Diversity of brands in local market in 2016'
  )

plot_2016_2 <- df_evsale_joint %>% 
  filter(province%ni% target_province, province%in% OEM_location,year==2016) %>%
  distinct(year, province, ownership, brand, local) %>% 
  group_by(year, province,ownership) %>% 
  summarise(brand_number=length(unique(brand)),
            local_brand=length(which(local=="Local"))) %>% 
  ungroup() %>% 
  mutate(
      label=str_c(round((local_brand/brand_number)*100,digits = 2),'%'),
      province=case_when(
                      province=="吉林省"~'Jilin',
                      province=="四川省"~'Sichuan',
                      province=="安徽省"~'Anhui',
                      province=="江西省"~'Jiangxi',
                      province=="河南省"~'Henan',
                      province=="江苏省"~'Jiangsu',
                      province=="湖北省"~'Hubei',
                      province=="福建省"~'Fujian',
                      province=="重庆市"~'Chongqing',
                      province=="陕西省"~'Shaanxi'
                      ),
      province=fct_reorder(province,brand_number)
    ) %>% 
  ggplot()+
  geom_point(aes(x=brand_number,y=province),size=3,color='steelblue')+
  geom_segment(aes(x=0,xend=brand_number,y=province,yend=province),color='grey')+
  # geom_text(
  #   aes(x = brand_number+2, y = province, label = label),
  #   color = "black", size = 6)+
  facet_wrap(~ownership,ncol = 2)+
  theme_minimal_vgrid()+
  scale_x_continuous(
    breaks = (seq(0, 25, by = 2)),
    limits = c(0,25),
    expand = expansion(mult = c(0, 0.05))
  )+
  theme(text = element_text(size=10),
        axis.text.x = element_text( size = 10),
        axis.text.y = element_text( size = 10)) +
  labs(
    x='Number of brands in the market',
    y='Other Provinces'
  )
figure2 <- ggarrange(plot_2016_1, plot_2016_2,nrow=2)
figure2

# Plot 2 Brand Diversity (2017)
plot_2017_1 <- df_evsale_joint %>% 
  filter(province%in% target_province, province%in% OEM_location,year==2017) %>%
  distinct(year, province, ownership, brand, local) %>% 
  group_by(year, province,ownership) %>% 
  summarise(brand_number=length(unique(brand)),
            local_brand=length(which(local=="Local"))) %>% 
  ungroup() %>% 
  mutate(label=str_c(round((local_brand/brand_number)*100,digits = 2),'%'),
      province=case_when(
      province=="北京市"~'Beijing',
      province=="上海市"~'Shanghai',
      province=="广东省"~'Guangdong',
      province=="湖南省"~'Hunan',
      province=="浙江省"~'Zhejiang'
    ),
      province=fct_reorder(province,brand_number)
    ) %>% 
  ggplot()+
  geom_point(aes(x=brand_number,y=province),size=3,color='red')+
  geom_segment(aes(x=0,xend=brand_number,y=province,yend=province),color='grey')+
  # geom_text(
  #   aes(x = brand_number+5, y = province, label = label),
  #   color = "black", size = 6)+
  facet_wrap(~ownership,ncol = 2)+
  theme_minimal_vgrid()+
  scale_x_continuous(
    breaks = (seq(0, 40, by = 4)),
    limits = c(0,40),
    expand = expansion(mult = c(0, 0.05))
  )+
  theme(text = element_text(size=10),
        axis.text.x = element_text( size = 10),
        axis.text.y = element_text( size = 10)) +
  labs(
    x='',
    y='Top 5 Provinces',
    title = 'Fig2-3. Diversity of brands in local market in 2017'
  )

plot_2017_2 <- df_evsale_joint %>% 
  filter(province%ni% target_province, province%in% OEM_location,year==2017) %>%
  distinct(year, province, ownership, brand, local) %>% 
  group_by(year, province,ownership) %>% 
  summarise(brand_number=length(unique(brand)),
            local_brand=length(which(local=="Local"))) %>% 
  ungroup() %>% 
  mutate(
      label=str_c(round((local_brand/brand_number)*100,digits = 2),'%'),
      province=case_when(
                      province=="吉林省"~'Jilin',
                      province=="四川省"~'Sichuan',
                      province=="安徽省"~'Anhui',
                      province=="江西省"~'Jiangxi',
                      province=="河南省"~'Henan',
                      province=="江苏省"~'Jiangsu',
                      province=="湖北省"~'Hubei',
                      province=="福建省"~'Fujian',
                      province=="重庆市"~'Chongqing',
                      province=="陕西省"~'Shaanxi'
                      ),
      province=fct_reorder(province,brand_number)
    ) %>% 
  ggplot()+
  geom_point(aes(x=brand_number,y=province),size=3,color='steelblue')+
  geom_segment(aes(x=0,xend=brand_number,y=province,yend=province),color='grey')+
  # geom_text(
  #   aes(x = brand_number+5, y = province, label = label),
  #   color = "black", size = 6)+
  facet_wrap(~ownership,ncol = 2)+
  theme_minimal_vgrid()+
  scale_x_continuous(
    breaks = (seq(0, 40, by = 4)),
    limits = c(0,40),
    expand = expansion(mult = c(0, 0.05))
  )+
  theme(text = element_text(size=10),
        axis.text.x = element_text( size = 10),
        axis.text.y = element_text( size = 10)) +
  labs(
    x='Number of brands in the market',
    y='Other Provinces',
    caption = 'Source: Innovation Center of Energy and Transportation'
  )
figure3 <- ggarrange(plot_2017_1, plot_2017_2,nrow=2)
figure3
# Plot 3: big manufacturers in small markets
target_province <- c("北京市",
                     #"上海市",
                     "安徽省",
                     "江西省",
                     "浙江省"
                     )
NEV_sale <- df_evsale_joint %>% 
  filter(province%in% target_province) %>% 
  group_by(year,province,local) %>%
  summarise(amount=sum(amount)) %>% 
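  group_by(year,province) %>% 
  # local vs. non-local share of NEV sales within each province and year
  mutate(pct=100*amount/sum(amount)) %>% 
  ungroup() %>% 
  filter(local=='Local') %>% 
  select(year,province,pct,local)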

total_auto_sale <- Auto_sale %>% 
  filter(province%in% target_province) %>% 
  group_by(year,province) %>% 
  summarise(total_sale=sum(total_sale)) 

NEV_sale_pct <- df_evsale_joint %>% 
  filter(province%in% target_province) %>% 
  group_by(year,province) %>%
  summarise(amount=sum(amount)) %>% 
  left_join(total_auto_sale,by=c('province','year')) %>% 
  mutate(auto_sale_pct=100*amount/total_sale)

NEV_sale %>% 
  left_join(NEV_sale_pct,by=c('province','year')) %>% 
  
  mutate(
    province=case_when(
      province=="北京市"~'Beijing',
      province=="上海市"~'Shanghai',
      province=="安徽省"~'Anhui',
      province=="江西省"~'Jiangxi',
      province=="浙江省"~'Zhejiang'
      
    ),
    year=as.factor(year)
  ) %>% 
  select(province,year,pct,auto_sale_pct) %>% 
  ggplot(aes(x=auto_sale_pct,y=pct,shape=year,color=province,group=province))+
  geom_line()+
  geom_point(size=3)+
  annotate(geom = "rect",
    xmin = 0, xmax = 2.5,
    ymin = 75, ymax = 100,
    fill = "grey55", alpha = 0.2) +
 #theme_minimal_grid(font_size = 16)+
  theme_half_open(font_size = 15)+
  scale_x_continuous(
  breaks = c(0, 2, 4,6, 8),
  limits = c(0 , 10),
  expand = expansion(mult = c(0, 0.05)))+
  annotate(geom = 'text', x = 1.5, y = 100,
             label = 'Anhui', size =4 , color = 'orange')+
  annotate(geom = 'text', x = 3, y = 90,
             label = 'Jiangxi', size = 4, color = 'green')+
  annotate(geom = 'text', x = 3, y = 40,
             label = 'Zhejiang', size = 4, color = 'purple')+
  annotate(geom = 'text', x = 9, y = 32,
             label = 'Beijing', size = 4, color = 'steelblue')+
  #annotate(geom = 'text', x = 11, y = 37,
             #label = 'Shanghai', size = 4, color = 'steel blue')+
  labs(
    x='NEV market share of total sales in %',
    y='Local brands market share of NEV sales in %',
    title = 'Fig3. local brand market share vs NEV market share',
    caption = 'Source: Innovation Center of Energy and Transportation'

  )
  #theme(legend.position = 'none')