Research Question:

How does FEMA (Federal Emergency Management Agency), complete disaster relief projects that are assigned to different CRC’s (Consolidated Recourse Centers) throughout the country?

About the Data:

Our data is a Microsoft Excel spreadsheet that has been provided to us by a representative that works for FEMA. Because the data has been directly given to us by Julia Moline, we know that the data is from a primary source and processed. Julia currently works for FEMA and was kind enough to present us with this data set upon request. We also know from her description, that it is original data. The data consists of 19 variables describing some 119,959 projects that have been assigned to FEMA. The variables include the project number, Consolidated Resource Center (CRC) assigned to, category of project (i.e Debris Removal, Emergency Protective Measures, Utilities), net cost, federal share, project status type, and many more. The data covers all projects FEMA processes from the years 2016-2020. It is a private data set that FEMA internally collects to track all projects that come through FEMA.

Describing the Data:

The main variables of interest in our project were CRC Net Cost, Sent to CRC date, Sent to EHP date, Catagory, and CRC. Our analysis begins exploring the money that FEMA spends within each CRC. To do this, both the “CRC variable”, which describes which CRC the project went through, and the “CRC Net Cost” variable, which gives the total USD amount spent on that project, are crucial. As we move into analyzing how many projects are processed, the “Catagory” variable is important to make conclusions regarding the workload FEMA is seeing. Lastly, when analyzing the time projects take to process, the “Sent to CRC” and “Sent to EHP” variables are important. “Sent to CRC” essentially represents the date that FEMA recieved the project and “Sent to EHP” represents the date the project was completed.

Load all the data sets:

xlsxPath <- here::here('data_raw', 'Copy of Data for S20.xlsm')
FemaData <- read_excel(xlsxPath, sheet = 'ProjectsQuery')
glimpse(FemaData)
## Observations: 119,959
## Variables: 19
## $ `Project #`                   <dbl> 550, 541, 597, 665, 646, 376, 377,…
## $ `EMMIE P/W #`                 <dbl> 437, 200, 339, NA, 248, 222, 379, …
## $ CRC                           <chr> "PA CRC Central", "PA CRC Central"…
## $ Category                      <chr> "B - Emergency Protective Measures…
## $ Event                         <chr> "4284DR-GA (4284DR)", "4284DR-GA (…
## $ Type                          <chr> "Work Completed / Fully Documented…
## $ `Sent to CRC Date`            <dttm> 2017-03-22 18:05:06, 2017-02-17 2…
## $ `Sent to EHP Date`            <dttm> 2017-05-04 16:11:54, 2017-03-21 1…
## $ Large_Project                 <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, TRU…
## $ `Approx Cost`                 <dbl> 489654.27, 13077807.05, 3730869.57…
## $ `CRC Net Cost`                <dbl> 489654.27, 13078205.08, 3730869.57…
## $ `Federal Share`               <dbl> 367240.703, 10462564.064, 2798152.…
## $ version                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Process Step`                <chr> "Obligated", "Obligated", "Obligat…
## $ `Phase #`                     <chr> "Phase: 5", "Phase: 5", "Phase: 5"…
## $ Applicant_Project_Status_Type <chr> "Active", "Active", "Active", "Del…
## $ `Alternate Project Count`     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Self-Cert Count`             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Expedited Project Count`     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…

Filtering out the discontinued projects:

filteredFemaData <- FemaData %>%
  clean_names(case = 'lower_camel') %>% 
  filter(str_detect(processStep, "Process Discontinued") == FALSE) %>%
  filter(!is.na(crc))
glimpse(filteredFemaData)
## Observations: 69,584
## Variables: 19
## $ projectNumber              <dbl> 550, 541, 597, 646, 376, 377, 306, 24…
## $ emmiePWNumber              <dbl> 437, 200, 339, 248, 222, 379, 213, 10…
## $ crc                        <chr> "PA CRC Central", "PA CRC Central", "…
## $ category                   <chr> "B - Emergency Protective Measures", …
## $ event                      <chr> "4284DR-GA (4284DR)", "4284DR-GA (428…
## $ type                       <chr> "Work Completed / Fully Documented", …
## $ sentToCrcDate              <dttm> 2017-03-22 18:05:06, 2017-02-17 21:0…
## $ sentToEhpDate              <dttm> 2017-05-04 16:11:54, 2017-03-21 19:5…
## $ largeProject               <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, …
## $ approxCost                 <dbl> 489654.27, 13077807.05, 3730869.57, 1…
## $ crcNetCost                 <dbl> 489654.27, 13078205.08, 3730869.57, 1…
## $ federalShare               <dbl> 367240.703, 10462564.064, 2798152.178…
## $ version                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ processStep                <chr> "Obligated", "Obligated", "Obligated"…
## $ phaseNumber                <chr> "Phase: 5", "Phase: 5", "Phase: 5", "…
## $ applicantProjectStatusType <chr> "Active", "Active", "Active", "Active…
## $ alternateProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ selfCertCount              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ expeditedProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
unique(filteredFemaData$crc)
## [1] "PA CRC Central"  "PA CRC East"     "PA CRC West"     "PA CRC Atlantic"
FemaPlusTime <- filteredFemaData %>%
  mutate(time = as.double(sentToEhpDate - sentToCrcDate)/(3600*24)) 
    
glimpse(FemaPlusTime)
## Observations: 69,584
## Variables: 20
## $ projectNumber              <dbl> 550, 541, 597, 646, 376, 377, 306, 24…
## $ emmiePWNumber              <dbl> 437, 200, 339, 248, 222, 379, 213, 10…
## $ crc                        <chr> "PA CRC Central", "PA CRC Central", "…
## $ category                   <chr> "B - Emergency Protective Measures", …
## $ event                      <chr> "4284DR-GA (4284DR)", "4284DR-GA (428…
## $ type                       <chr> "Work Completed / Fully Documented", …
## $ sentToCrcDate              <dttm> 2017-03-22 18:05:06, 2017-02-17 21:0…
## $ sentToEhpDate              <dttm> 2017-05-04 16:11:54, 2017-03-21 19:5…
## $ largeProject               <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, …
## $ approxCost                 <dbl> 489654.27, 13077807.05, 3730869.57, 1…
## $ crcNetCost                 <dbl> 489654.27, 13078205.08, 3730869.57, 1…
## $ federalShare               <dbl> 367240.703, 10462564.064, 2798152.178…
## $ version                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ processStep                <chr> "Obligated", "Obligated", "Obligated"…
## $ phaseNumber                <chr> "Phase: 5", "Phase: 5", "Phase: 5", "…
## $ applicantProjectStatusType <chr> "Active", "Active", "Active", "Active…
## $ alternateProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ selfCertCount              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ expeditedProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ time                       <dbl> 42.921387, 31.950747, 12.775515, NA, …

Following the Money

# total cost of each crc per year

glimpse(filteredFemaData)
## Observations: 69,584
## Variables: 19
## $ projectNumber              <dbl> 550, 541, 597, 646, 376, 377, 306, 24…
## $ emmiePWNumber              <dbl> 437, 200, 339, 248, 222, 379, 213, 10…
## $ crc                        <chr> "PA CRC Central", "PA CRC Central", "…
## $ category                   <chr> "B - Emergency Protective Measures", …
## $ event                      <chr> "4284DR-GA (4284DR)", "4284DR-GA (428…
## $ type                       <chr> "Work Completed / Fully Documented", …
## $ sentToCrcDate              <dttm> 2017-03-22 18:05:06, 2017-02-17 21:0…
## $ sentToEhpDate              <dttm> 2017-05-04 16:11:54, 2017-03-21 19:5…
## $ largeProject               <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, …
## $ approxCost                 <dbl> 489654.27, 13077807.05, 3730869.57, 1…
## $ crcNetCost                 <dbl> 489654.27, 13078205.08, 3730869.57, 1…
## $ federalShare               <dbl> 367240.703, 10462564.064, 2798152.178…
## $ version                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ processStep                <chr> "Obligated", "Obligated", "Obligated"…
## $ phaseNumber                <chr> "Phase: 5", "Phase: 5", "Phase: 5", "…
## $ applicantProjectStatusType <chr> "Active", "Active", "Active", "Active…
## $ alternateProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ selfCertCount              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ expeditedProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
CRCYearCost <- filteredFemaData %>% 
    filter(!is.na(sentToCrcDate)) %>% 
    filter(!is.na(crcNetCost)) %>%
    mutate(year = year(sentToCrcDate)) %>% 
    group_by(year) %>% 
    summarise(totalCost = sum(crcNetCost)/1000000000)


CRCYearCostPerCRC <- filteredFemaData %>%
    filter(!is.na(sentToCrcDate)) %>% 
    filter(!is.na(crcNetCost)) %>%
    group_by(year(sentToCrcDate), crc) %>%
    summarise(totalCost = sum(crcNetCost)/1000000000)


label <- ("2019 had a drastically higher 
total cost compared to 
other years. Why?")

glimpse(CRCYearCost)
## Observations: 5
## Variables: 2
## $ year      <dbl> 2016, 2017, 2018, 2019, 2020
## $ totalCost <dbl> 0.00184426, 0.29458422, 3.10428014, 18.45513223, 0.497…
ggplot(CRCYearCost) +
    geom_area(aes(x = year, y = totalCost), alpha = 0.9, fill = '#3A6190') +
    geom_line(aes(x = 2019, y = totalCost - 0.25), color = '#E75146', lwd = 2 ) +
    geom_line(aes(x = year, y = totalCost)) +
    scale_x_continuous(expand = expand_scale(mult = c(0, 0.05))) +
    theme_minimal_hgrid() + 
    labs(x = 'Year',
         y = 'Total Cost (in billions USD)', 
         title = 'Total CRC Net Cost per Year',
         subtitle = 'From the years 2016-2020') +
    geom_label(aes(x = 2017, y = 12.5, label = label), hjust = 0)

This graph is displaying the summed net cost accross all four CRCs for each given year. The data shows that there is a significant jump in the net cost in 2019 that reaches a wopping 18.5 billion dollars spent. In the next graph, we will investigate why this has happened, and what factors contribute to the influx.

CRCMoney <- filteredFemaData %>%
    filter(!is.na(crcNetCost)) %>%
    filter(!is.na(crc)) %>% 
    group_by(crc) %>%
    summarise(total = sum(crcNetCost)/1000000000)  
    
   glimpse(CRCMoney)
## Observations: 4
## Variables: 2
## $ crc   <chr> "PA CRC Atlantic", "PA CRC Central", "PA CRC East", "PA CR…
## $ total <dbl> 11.005482, 5.677476, 4.271855, 2.259152
ggplot(CRCMoney) +
    geom_col(aes(x = fct_reorder(crc,total), y = total), alpha = 0.9, fill = '#3A6190') +
    coord_flip() +
    scale_y_continuous(expand = expand_scale(mult = c(0, 0.05)),
                     breaks = c(0, 3, 6, 9, 12)) +
    theme_minimal_vgrid() +
    labs(
          x = "CRC" , 
          y = "Total Money Spent (in billions USD)",
          title = 'Total Money Spent per CRC',
          subtitle = 'From the years 2016-2020'
      )

This graph shows the distribution of money spent in each specific CRC. It is clear to see that the CRC Atlantic has spent the most money from the years 2016-2020. It makes sense that the Atlantic coast takes up most of the expenses especially because they have to deal with the most dangerous tropical storms. CRC Atlantic is located in Puerto Rico and covers projects done in Florida and the US Virgin Islands as well. Upon further investigation on why CRC Atlantic spent so much money during this time, I found that in 2019, Puerto Rico and the US Virgin Islands were still rebuilding after the damage done by hurricane Maria, which struck in 2017. Also, due to the damage done by hurricane Irma (2017), Florida was still rebuilding through mid year 2019. With the social eye being on rebuilding after those hard times, it put pressure on the US Government to offer significant help and we see they did just that.

#breakdown of each Category cost

CRC2019 <- filteredFemaData %>% 
    mutate(year = year(sentToCrcDate)) %>% 
    filter(year == 2019) %>% 
    filter(!is.na(crcNetCost)) %>% 
    group_by(category) %>% 
    summarise(cost = sum(crcNetCost)/1000000000)
glimpse(CRC2019)
## Observations: 11
## Variables: 2
## $ category <chr> "A - Debris Removal", "B - Emergency Protective Measure…
## $ cost     <dbl> 3.36880164, 7.87217392, 0.01130943, 1.32683969, 0.30877…
label1 <- ('Category B (Emergency 
protective Measures), 
consumes more money than the other 
categories significantly.')

ggplot(CRC2019)+
    geom_col(aes(x = fct_reorder(category,cost), y = cost), fill = '#3A6190', alpha = 0.9)+
    geom_col(aes(x = 'B - Emergency Protective Measures', y = 0.426*cost), fill = '#E75146')+
    coord_flip()+
    theme_minimal_vgrid()+
    scale_y_continuous(expand = expand_scale(c(0,0.05)), breaks = c(0, 2, 4, 6, 8))+
    labs(x = 'Damage Category',
         y = 'Cost in 2019 (in billions USD)',
         title = 'Category of Disaster vs. Cost',
         subtitle = 'From the year 2019'
       )

This graph is displaying the distribution of expenses between the different categories. It is very clear to see that the category that contributed most to the 2019 total net cost was the Emergency Protective Measures category. This category far outweighs the others as its cost is more than double the second most expensive category. As we learn from this graph, most of the damages done by a hurricanes fall into this catagory.

Figuring out the distribution of projects:

FemaCount <- FemaPlusTime %>%  
  filter(!is.na(crc)) %>% 
  select(category, crc) %>% 
  group_by(category) %>% 
  summarise(count = n()) 


glimpse(FemaCount)
## Observations: 11
## Variables: 2
## $ category <chr> "A - Debris Removal", "B - Emergency Protective Measure…
## $ count    <int> 7596, 11126, 482, 17514, 1558, 14296, 4300, 6540, 2552,…
ggplot(FemaCount)+
  geom_col(aes(x = fct_reorder(category, count), y = count), fill = '#3A6190', alpha = 0.9) +
  coord_flip()+
  scale_y_continuous(expand = expand_scale(mult = c(0, 0.05)))+
  theme_minimal_vgrid()+
  labs(
        x = "Damage Category" ,
        y = 'Total # of Projects',
        title = 'Damage Category vs. \nTotal Projects',
        fill = 'CRC',
        subtitle = "From the years 2016-2020"
      )

This graph shows the damage category occurences from the years 2016-2020. This helps us to realize what type of category gets the most attention. As you can see, the top 3 categories are categories that are essential in maintaining productive civiliation. For insance, if the roads and bridges category was near the bottom of the list, you would see a country that does not do a whole lot to maintain their travelling ability throughout the country. The US has a fantastic road system, which helps a lot with getting anywhere you want in a very convenient way. Another thing to point out is that debris removal will have a high variance due to inclimate weather in different years. For instance, in 2019, the hurricane season was very bad, which caused for high expense in the debris removal category.

CRC Analysis

CRCSide <- FemaPlusTime %>%
    filter(!is.na(crc)) %>% 
    group_by(category, crc) %>% 
    summarise(count = n()/1000) 

glimpse(CRCSide)
## Observations: 43
## Variables: 3
## Groups: category [11]
## $ category <chr> "A - Debris Removal", "A - Debris Removal", "A - Debris…
## $ crc      <chr> "PA CRC Atlantic", "PA CRC Central", "PA CRC East", "PA…
## $ count    <dbl> 0.568, 3.773, 2.932, 0.323, 1.712, 5.881, 2.896, 0.637,…
ggplot(CRCSide) +
  geom_col(aes(x = fct_reorder(category, count), y = count),  fill = '#3A6190', alpha = 0.9) +
  facet_wrap(~crc, ncol = 2) +
  coord_flip() +
  scale_y_continuous(expand = expand_scale(c(0,0.05)))+
  theme_minimal_vgrid()+
  labs(
        x = 'Disaster Category' , 
        y = 'Total # of Projects (in thousands)' ,
        title = "Number of Projects \nOver Different CRC's",
        subtitle = 'From the years 2016-2020'
      )

This graph is highlighting what the 4 different CRC’s deal with over the years. It is clear to see that CRC Central deals mostly with road and bridges due to the interstate system that runs right through the middle of the country. There is expected to be a higher correlation between the different CRC’s total incidents and the total net cost, but surprisingly, there is not a strong correlation. For example, CRC Atlantic had the highest CRC net cost, yet it doesn’t have nearly the number of projects that CRC Central has. This is due to the significance of each project and whether or not it is costing a large chunk of money. While CRC Central has more projects, CRC Atlantic likely has a much larger cost per project.

Project Distribution Over the Years

glimpse(FemaPlusTime)
## Observations: 69,584
## Variables: 20
## $ projectNumber              <dbl> 550, 541, 597, 646, 376, 377, 306, 24…
## $ emmiePWNumber              <dbl> 437, 200, 339, 248, 222, 379, 213, 10…
## $ crc                        <chr> "PA CRC Central", "PA CRC Central", "…
## $ category                   <chr> "B - Emergency Protective Measures", …
## $ event                      <chr> "4284DR-GA (4284DR)", "4284DR-GA (428…
## $ type                       <chr> "Work Completed / Fully Documented", …
## $ sentToCrcDate              <dttm> 2017-03-22 18:05:06, 2017-02-17 21:0…
## $ sentToEhpDate              <dttm> 2017-05-04 16:11:54, 2017-03-21 19:5…
## $ largeProject               <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, …
## $ approxCost                 <dbl> 489654.27, 13077807.05, 3730869.57, 1…
## $ crcNetCost                 <dbl> 489654.27, 13078205.08, 3730869.57, 1…
## $ federalShare               <dbl> 367240.703, 10462564.064, 2798152.178…
## $ version                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ processStep                <chr> "Obligated", "Obligated", "Obligated"…
## $ phaseNumber                <chr> "Phase: 5", "Phase: 5", "Phase: 5", "…
## $ applicantProjectStatusType <chr> "Active", "Active", "Active", "Active…
## $ alternateProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ selfCertCount              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ expeditedProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ time                       <dbl> 42.921387, 31.950747, 12.775515, NA, …
femaYears <- FemaPlusTime %>% 
  mutate(year = year(sentToCrcDate)) %>% 
  filter(!is.na(sentToCrcDate)) %>% 
  group_by(year, category) %>% 
  summarise(count = n()/1000) %>% 
  filter(year != 2016) %>% 
  filter(year != 2020) 

glimpse(femaYears)
## Observations: 29
## Variables: 3
## Groups: year [3]
## $ year     <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2…
## $ category <chr> "A - Debris Removal", "B - Emergency Protective Measure…
## $ count    <dbl> 0.432, 0.441, 0.563, 0.018, 0.129, 0.153, 0.095, 0.004,…
ggplot(femaYears) +
  geom_col(aes(x = fct_reorder(category, count), y = count), fill = '#3A6190', alpha = 0.9)+
  facet_wrap(~year, nrow = 2)+
  coord_flip()+
  theme_minimal_vgrid() +
  labs(x = 'Damage Category',
       y = '# of Incidents (in thousands)',
       title = 'Damage Category vs. # of Incidents)',
       subtitle = 'From the years 2017-2019'
  )

This graph is describing the yearly rundown of what category was called upon the most. It is also showing how the projects needed ranges year to year. 2019 was clearly a big year given that it had significantly more projects to complete than the previous years. The reason for the range of years from 2017-2019 was due to the fact the there was very little data collected from our source on 2016 and 2020. I would predict that the 2020 damage category that would outweigh the others would have to be Emergency Protective Measures. With the Coronavirus surging, I would suspect that there will be a mass number of projects in 2020.

Breakdown of Time for each CRC

CRCTime <- FemaPlusTime %>%
  filter(!is.na(time)) %>%
  group_by(crc) %>% 
  summarise( totalTime = sum(time)/1000)


ggplot(CRCTime) +
  geom_col(aes(x = fct_reorder(crc,totalTime), y = totalTime), fill = '#3A6190', alpha = 0.9) +
    scale_y_continuous(expand = expand_scale(c(0,0.05)))+
    coord_flip() +
    theme_minimal_vgrid()+
    scale_y_continuous(expand = expand_scale(mult = c(0, 0.05))) +
    labs(
        title = 'Total Hours Spent on Projects per CRC',
        x = 'CRC' , 
        y = '# of total processing hours spent (in thousands)',
        subtitle = "From the years 2016-2020"
      )

Here in the plot abive, we can see the summed # of processing hours spent. Numbers in the x-axis are pretty big just because it sums the processing times for all projects across all four CRC’s. We know that many projects can be undertaken at the same time so if we sum all the days that took to complete each project, the number would be huge. This means that CRC has a lot of projects going on at their hands.In order to follow up with them, they need to process as many projects as they can in a timely manner.

As seen in the plot, CRC Central and CRC East is taking a big load in FEMA system. On the other hand CRC Atlantic and CRC West is taking relatively little amount of load if compared to Central and West. So in order to better see the time distribution, I want to seperate them into two groups.

CRCTime2 <- FemaPlusTime %>% 
  filter(!is.na(crc)) %>% 
  filter(!is.na(time)) %>%
  filter(time >0) %>% 
  filter(str_detect(crc, "PA CRC Atlantic") | str_detect(crc,"PA CRC West") == T) %>% 
  mutate(
          category = as.factor(category)
        ) %>%
  group_by(category)  
CRCTime2
## # A tibble: 4,750 x 20
## # Groups:   category [7]
##    projectNumber emmiePWNumber crc        
##            <dbl>         <dbl> <chr>      
##  1         24413             2 PA CRC West
##  2         35478           333 PA CRC West
##  3         39191           350 PA CRC West
##  4         26979           117 PA CRC West
##  5         39951           255 PA CRC West
##  6         40658           194 PA CRC West
##  7         48315           314 PA CRC West
##  8         43397           251 PA CRC West
##  9         41156           299 PA CRC West
## 10         41161           273 PA CRC West
##    category                                            event             
##    <fct>                                               <chr>             
##  1 A - Debris Removal                                  4344DR-CA (4344DR)
##  2 B - Emergency Protective Measures                   4344DR-CA (4344DR)
##  3 A - Debris Removal                                  4344DR-CA (4344DR)
##  4 G - Parks, Recreational Facilities, and Other Items 4344DR-CA (4344DR)
##  5 C - Roads and Bridges                               4344DR-CA (4344DR)
##  6 E - Buildings and Equipment                         4344DR-CA (4344DR)
##  7 E - Buildings and Equipment                         4344DR-CA (4344DR)
##  8 G - Parks, Recreational Facilities, and Other Items 4344DR-CA (4344DR)
##  9 C - Roads and Bridges                               4344DR-CA (4344DR)
## 10 G - Parks, Recreational Facilities, and Other Items 4344DR-CA (4344DR)
##    type                              sentToCrcDate      
##    <chr>                             <dttm>             
##  1 Work Completed / Fully Documented 2019-07-18 15:39:31
##  2 Work Completed / Fully Documented 2018-08-27 22:12:20
##  3 Work Completed / Fully Documented 2019-04-24 23:44:03
##  4 Specialized                       2018-05-14 20:52:04
##  5 Standard                          2018-07-02 21:56:03
##  6 Specialized                       2018-06-06 21:52:04
##  7 Standard                          2018-08-13 14:25:00
##  8 Standard                          2018-06-07 23:42:19
##  9 Standard                          2018-07-12 16:46:03
## 10 Standard                          2018-07-12 15:28:27
##    sentToEhpDate       largeProject approxCost crcNetCost federalShare
##    <dttm>              <lgl>             <dbl>      <dbl>        <dbl>
##  1 2019-08-15 20:43:32 TRUE         153855201. 163755201.   147379681.
##  2 2019-02-15 22:53:26 TRUE          20500000   14906794.    14906794.
##  3 2019-07-19 18:42:05 TRUE            250000    1329162.     1196246.
##  4 2018-07-12 17:51:54 TRUE            500000     544019       408014.
##  5 2018-07-27 16:34:52 FALSE           100000     103703.       77777.
##  6 2018-10-04 15:21:58 TRUE           1058400     485100       363825 
##  7 2018-09-20 15:43:56 FALSE            50000      40022.       30017.
##  8 2018-07-27 16:27:20 FALSE            73056      17236.       12927.
##  9 2018-10-12 21:56:07 TRUE            364005     139585.      104689.
## 10 2018-08-09 15:51:27 TRUE            244354     200573.      150430.
##    version processStep                    phaseNumber
##      <dbl> <chr>                          <chr>      
##  1       1 Pending Final FEMA Review      Phase: 4   
##  2       0 Pending Recipient Final Review Phase: 4   
##  3       0 Pending Recipient Final Review Phase: 4   
##  4       0 Obligated                      Phase: 5   
##  5       0 Obligated                      Phase: 5   
##  6       0 Obligated                      Phase: 5   
##  7       0 Obligated                      Phase: 5   
##  8       0 Obligated                      Phase: 5   
##  9       0 Obligated                      Phase: 5   
## 10       0 Obligated                      Phase: 5   
##    applicantProjectStatusType alternateProjectCount selfCertCount
##    <chr>                                      <dbl>         <dbl>
##  1 Active                                         0             0
##  2 Active                                         0             0
##  3 Active                                         0             0
##  4 Active                                         0             0
##  5 Active                                         0             0
##  6 Active                                         0             0
##  7 Active                                         0             0
##  8 Active                                         0             0
##  9 Active                                         0             0
## 10 Active                                         0             0
##    expeditedProjectCount  time
##                    <dbl> <dbl>
##  1                     1  28.2
##  2                     0 172. 
##  3                     0  85.8
##  4                     0  58.9
##  5                     0  24.8
##  6                     0 120. 
##  7                     0  38.1
##  8                     0  49.7
##  9                     0  92.2
## 10                     0  28.0
## # … with 4,740 more rows
ggplot(CRCTime2)+
  geom_col(aes(x = fct_reorder(category , time)  , y = time , fill = type)) +
  facet_wrap(~crc, nrow = 2)+
  coord_flip()+
  theme_minimal_vgrid()+
  scale_y_continuous(expand = expand_scale(mult = c(0, 0.1))) +
  labs(
        title = 'Time Spent on Different CRCs' , 
        x = 'Damage Categoty', 
        y = 'Total # of Processing Hours Spent(Summed)' , 
        fill = 'Type of Project',
        subtitle = 'From the years 2016-2020'
      )

Here in the plot above, we can see the total processing days spent of Atlantic and West in different damage categories.As we can see in the plots, CRC Atlantic mostly spent time on processing Category C which is ‘Roads and Bridges’ however CRC West nearly spent 1/3 of the that time on Category C. West spent more time on Category E type of Damage. This uneven distribution between CRC’s can simply be due to the different events that occurs in that region.

We can also see that no Expedidet projects processed in CRC Atlantic but there are some time spent on it at the West.

Another point to highlight would be that most of the projects that have been processed in CRC Atlantic are the ‘Standart’ type of projects, There are a good portion of ‘Completed’ projects as well but not that much compared to ‘Standart’. Also some amount of ‘Specialized’ projects have been processed as well but it is relatively small if compared to other types of projects.

CRCTime3 <- FemaPlusTime %>% 
  filter(!is.na(crc)) %>% 
  filter(!is.na(time)) %>%
  filter(time >0) %>% 
  filter(str_detect(crc, "PA CRC East") | str_detect(crc,"PA CRC Central") == T) %>% 
  mutate(
          category = as.factor(category) , 
          time = time/1000
        ) %>% 
  group_by(category)
CRCTime3
## # A tibble: 32,075 x 20
## # Groups:   category [7]
##    projectNumber emmiePWNumber crc           
##            <dbl>         <dbl> <chr>         
##  1           550           437 PA CRC Central
##  2           541           200 PA CRC Central
##  3           597           339 PA CRC Central
##  4           244           101 PA CRC Central
##  5           257           476 PA CRC Central
##  6           596           100 PA CRC Central
##  7           534           223 PA CRC Central
##  8           689           350 PA CRC Central
##  9           751           446 PA CRC Central
## 10           746           411 PA CRC Central
##    category                                            event             
##    <fct>                                               <chr>             
##  1 B - Emergency Protective Measures                   4284DR-GA (4284DR)
##  2 A - Debris Removal                                  4284DR-GA (4284DR)
##  3 A - Debris Removal                                  4284DR-GA (4284DR)
##  4 A - Debris Removal                                  4284DR-GA (4284DR)
##  5 B - Emergency Protective Measures                   4284DR-GA (4284DR)
##  6 A - Debris Removal                                  4284DR-GA (4284DR)
##  7 A - Debris Removal                                  4284DR-GA (4284DR)
##  8 G - Parks, Recreational Facilities, and Other Items 4284DR-GA (4284DR)
##  9 F - Utilities                                       4284DR-GA (4284DR)
## 10 G - Parks, Recreational Facilities, and Other Items 4284DR-GA (4284DR)
##    type                              sentToCrcDate      
##    <chr>                             <dttm>             
##  1 Work Completed / Fully Documented 2017-03-22 18:05:06
##  2 Work Completed / Fully Documented 2017-02-17 21:09:41
##  3 Standard                          2017-03-29 18:53:50
##  4 Work Completed / Fully Documented 2017-01-26 18:57:20
##  5 Work Completed / Fully Documented 2017-03-13 20:55:06
##  6 Work Completed / Fully Documented 2017-03-03 15:02:17
##  7 Work Completed / Fully Documented 2017-02-09 21:05:59
##  8 Work Completed / Fully Documented 2017-02-14 19:09:40
##  9 Work Completed / Fully Documented 2017-03-03 11:45:19
## 10 Standard                          2017-04-14 15:33:51
##    sentToEhpDate       largeProject approxCost crcNetCost federalShare
##    <dttm>              <lgl>             <dbl>      <dbl>        <dbl>
##  1 2017-05-04 16:11:54 TRUE            489654.    489654.      367241.
##  2 2017-03-21 19:58:45 TRUE          13077807.  13078205.    10462564.
##  3 2017-04-11 13:30:35 TRUE           3730870.   3730870.     2798152.
##  4 2017-02-09 20:58:06 TRUE           7899645.   7912152.     6725329.
##  5 2017-07-21 17:23:59 TRUE            728724.    838166.      628624.
##  6 2017-03-13 13:46:11 FALSE           110284.    110284.       93742.
##  7 2017-05-01 19:21:47 FALSE            39842.     39842.       31874.
##  8 2017-04-13 19:38:50 FALSE             5932.      5932.        4449.
##  9 2017-05-11 13:48:48 TRUE            134263.    134747.      101061.
## 10 2017-04-28 14:46:03 FALSE            38854.     24674.       18506.
##    version processStep phaseNumber applicantProjectStatusType
##      <dbl> <chr>       <chr>       <chr>                     
##  1       0 Obligated   Phase: 5    Active                    
##  2       0 Obligated   Phase: 5    Active                    
##  3       0 Obligated   Phase: 5    Active                    
##  4       0 Obligated   Phase: 5    Active                    
##  5       0 Obligated   Phase: 5    Active                    
##  6       0 Obligated   Phase: 5    Active                    
##  7       0 Obligated   Phase: 5    Active                    
##  8       0 Obligated   Phase: 5    Active                    
##  9       0 Obligated   Phase: 5    Active                    
## 10       0 Obligated   Phase: 5    Active                    
##    alternateProjectCount selfCertCount expeditedProjectCount    time
##                    <dbl>         <dbl>                 <dbl>   <dbl>
##  1                     0             0                     0 0.0429 
##  2                     0             0                     0 0.0320 
##  3                     0             0                     0 0.0128 
##  4                     0             0                     0 0.0141 
##  5                     0             0                     0 0.130  
##  6                     0             0                     0 0.00995
##  7                     0             0                     0 0.0809 
##  8                     0             0                     0 0.0580 
##  9                     0             0                     0 0.0691 
## 10                     0             0                     0 0.0140 
## # … with 32,065 more rows
ggplot(CRCTime3)+
  geom_col(aes(x = fct_reorder(category , time)  , y = time , fill = type)) +
  facet_wrap(~crc, nrow = 2)+
  theme_minimal_vgrid()+
  scale_y_continuous(expand = expand_scale(mult = c(0, 0.5))) +
  coord_flip()+
  
  labs(
        title = 'Time Spent on Different CRCs' , 
        x = 'Damage Categoty', 
        y = '# of Hours Spent on Projects(in thousands)' , 
        fill = 'Type of Project'
      )

Here in the plot above, we can see the total processing days spent of Central and East in different damage categories. We can seethat Central spend the most time on Category C projects however, East spent more time on Category E. In fact, Central spent more time on Category E projects than East which gives us the idea that, Central is the CRC with the highest capacity across the US(this was also obvious from the first plot). Another thing we can understand from the plots is that, Central Spends most of their time on Category C and E projects, rest is nearly evenly distributed except Category D projects which is much less than the other categories. We cannot see neither the spikes in certain categories or nearly even distribution in East.

There little or no ‘Expedited’ types of projects in both CRC’s.

We saw the dominance of ‘Standat’ types of projects in Atlantic and even distribution in West, if we come to this one, with only an eye inspection, we can say that ‘Completed’ type of projects look relatively took more time in the process. We can also add that ‘Specialized’ types of projects in both CRC’s seem much less than the previous ones, but this might be due to the dimensions and axis labels of the plots. Needs to be double-checked from the actual dataset to confirm.

CRCTime4 <- FemaPlusTime %>% 
  filter(crc == "PA CRC Central") %>%
  filter(!is.na(crc)) %>% 
  filter(!is.na(time)) %>%
  filter(time >0) %>% 
  group_by(category) %>% 
  summarise(avg_time = mean(time)) %>% 
    mutate(total_average = mean(avg_time))
CRCTime4
## # A tibble: 7 x 3
##   category                                            avg_time
##   <chr>                                                  <dbl>
## 1 A - Debris Removal                                      31.9
## 2 B - Emergency Protective Measures                       31.2
## 3 C - Roads and Bridges                                   46.6
## 4 D - Water Control Facilities                            64.8
## 5 E - Buildings and Equipment                             62.3
## 6 F - Utilities                                           57.5
## 7 G - Parks, Recreational Facilities, and Other Items     65.6
##   total_average
##           <dbl>
## 1          51.4
## 2          51.4
## 3          51.4
## 4          51.4
## 5          51.4
## 6          51.4
## 7          51.4
ggplot(CRCTime4)+
  geom_col(aes(x = fct_reorder(category,avg_time) , y = avg_time), alpha = 0.9, fill = '#3A6190')+
  geom_hline(aes(yintercept = total_average),
            color = '#E75146')+
  coord_flip()+
  theme_minimal_vgrid()+
  scale_y_continuous(expand = expand_scale(mult = c(0, 0.05))) +
  labs(
        title = 'Average Time Spent\nFor Different\nDamage Categories\n at CRC Central' , 
        x = 'Damage Category' , 
        y = '# of Days Spent'
      )+
    geom_label(aes(x = 'B - Emergency Protective Measures', y = 50, label = 'Mean time spent\n for all categories'))

In order to see how different categories are processed in different locations, I wanted to find the average processing times for different categories in CRC’s.

Here in the plot above, we can see that Category G takes the most amount of time in order to be processed at CRC Cental. For category B projects however, processing time is nearly half of the time taken for processing Category G.

CRC Central takes at least around 30 days in order to process a project.

CRC Central takes at most around 60 days in order to proces a project.

CRCTime5 <- FemaPlusTime %>% 
  filter(crc == "PA CRC East") %>%
  filter(!is.na(crc)) %>% 
  filter(!is.na(time)) %>%
  filter(time >0) %>% 
  group_by(category) %>% 
  summarise(avg_time = mean(time)) %>% 
    mutate(mean_total = mean(avg_time))
CRCTime5
## # A tibble: 7 x 3
##   category                                            avg_time mean_total
##   <chr>                                                  <dbl>      <dbl>
## 1 A - Debris Removal                                      66.3       65.3
## 2 B - Emergency Protective Measures                       50.8       65.3
## 3 C - Roads and Bridges                                   54.4       65.3
## 4 D - Water Control Facilities                            71.2       65.3
## 5 E - Buildings and Equipment                             73.5       65.3
## 6 F - Utilities                                           67.9       65.3
## 7 G - Parks, Recreational Facilities, and Other Items     73.1       65.3
ggplot(CRCTime5)+
  geom_col(aes(x = fct_reorder(category,avg_time) , y = avg_time), alpha = 0.9, fill = '#3A6190')+
    geom_hline(aes(yintercept = mean_total),
            color = '#E75146')+
  coord_flip()+
  theme_minimal_vgrid()+
  scale_y_continuous(expand = expand_scale(mult = c(0, 0.05))) +
  labs(
        title = 'Average Time Spent\nFor Different\n Damage Categories\nat CRC East' , 
        x = 'Damage Category' , 
        y = '# of Days Spent'
      )

In this plot we can see the average time taken for different damage categories in CRC East. There slight shifts in the order if compared to Central. Instead of G, CRC East takes the most time on processing category E. Category E is followed by G and D which are nearly same likewise CRC Central.

Category B also takes the least amount of average time to process in CRC East as well. So we can easily say that, easiest projects that are processed in these two CRC’s are categories A,C and B.

CRCTime6 <- FemaPlusTime %>% 
  filter(crc == "PA CRC Atlantic") %>%
  filter(!is.na(crc)) %>% 
  filter(!is.na(time)) %>%
  filter(time >0) %>% 
  group_by(category) %>% 
  summarise(avg_time = mean(time)) %>% 
    mutate(mean_total = mean(avg_time))
CRCTime6
## # A tibble: 7 x 3
##   category                                            avg_time mean_total
##   <chr>                                                  <dbl>      <dbl>
## 1 A - Debris Removal                                      13.8       59.3
## 2 B - Emergency Protective Measures                       19.3       59.3
## 3 C - Roads and Bridges                                   81.3       59.3
## 4 D - Water Control Facilities                            72.3       59.3
## 5 E - Buildings and Equipment                             64.0       59.3
## 6 F - Utilities                                           88.5       59.3
## 7 G - Parks, Recreational Facilities, and Other Items     76.1       59.3
ggplot(CRCTime6)+
  geom_col(aes(x = fct_reorder(category,avg_time) , y = avg_time), alpha = 0.9, fill = '#3A6190')+
    geom_hline(aes(yintercept = mean_total),
            color = '#E75146')+
  coord_flip()+
  theme_minimal_vgrid()+
  scale_y_continuous(expand = expand_scale(mult = c(0, 0.05))) +
  labs(
        title = 'Average Time Spent\nFor Different\nDamage Categories\nat CRC Atlantic' , 
        x = 'Damage Category' , 
        y = '#of Days Spent'
      )+
     geom_label(aes(x = 'A - Debris Removal', y = mean_total, label = 'Mean time spent\n for all categories'))

If we jump across the States, to CRC Atlantic we see that average times to process shifted. CRC Atlantic spends most time on Category F followed by C and G. An interesting point is that Categories D and E nearly takes the same amount of time to be processed if compared to two precious CRC’s. We can see a slight increse in the average time required to process Category G. However there is a huge increse in the time taken to process categories F and C which is odd. This might be due to lack of professionals ir field officers regarding to that category. On the other hand, we see a sharp decrease in the average processing time of categories A and B which is very interesting. It is obvious that this CRC is a lot faster in some categories and a lot slower in some.

CRCTime7 <- FemaPlusTime %>% 
  filter(crc == "PA CRC West") %>%
  filter(!is.na(crc)) %>% 
  filter(!is.na(time)) %>%
  filter(time >0) %>% 
  group_by(category) %>% 
  summarise(avg_time = mean(time)) %>% 
    mutate(mean_total = mean(avg_time))
CRCTime7
## # A tibble: 7 x 3
##   category                                            avg_time mean_total
##   <chr>                                                  <dbl>      <dbl>
## 1 A - Debris Removal                                      38.9       53.4
## 2 B - Emergency Protective Measures                       35.3       53.4
## 3 C - Roads and Bridges                                   56.2       53.4
## 4 D - Water Control Facilities                            59.2       53.4
## 5 E - Buildings and Equipment                             66.5       53.4
## 6 F - Utilities                                           57.8       53.4
## 7 G - Parks, Recreational Facilities, and Other Items     60.1       53.4
ggplot(CRCTime7)+
  geom_col(aes(x = fct_reorder(category,avg_time) , y = avg_time),alpha = 0.9, fill = '#3A6190')+
    geom_hline(aes(yintercept = mean_total),
            color = '#E75146')+
  coord_flip()+
  theme_minimal_vgrid()+
  scale_y_continuous(expand = expand_scale(mult = c(0, 0.05))) +
  labs(
        title = 'Average Time Spent\nFor Different\nDamage Categories\nat CRC West' , 
        x = 'Damage Category' , 
        y = '#of Days Spent'
      )+
     geom_label(aes(x = 'B - Emergency Protective Measures', y = mean_total, label = 'Mean time spent\n for all categories'))

Here in the last plot we see the average times taken to process at CRC West. This CRC looks similer to the Central and East CRC’s by eye inspection. We see same three categories(E,G and D) at the top of the plot as the ones that takes the most time to complete. Also the bottom three are alike as well as the top three. We can see that times taken for bottom three increased if compared to CRC Central but decreased for CRC East and Atlantic. CRC West spends the most time on Category E following by G and D, spends the least time in Category B following by A and C. If compared to Atlantic, we can see a dramatic increse in the average processing times of Category A and B projects however there is a sharp decrease in the time taken for Category F and C projects. This is an average CRC which means the processing times distributions are similer to the majority unlike CRC Atlantic.

Appendix

Name Description Type
Project # Project number dbl
EMMIE P/W System Identification number dbl
CRC CRC that processes project chr
Category Damage Category chr
Event Event occured that caused the project chr
Type Type of work chr
Sent to CRC Date Coming in to the CRC date
Sent to EHP Date Going out from CRC date
Large_Project Determines whether if the project is large or not logical
Approx Cost Approximate cost of the project dbl
CRC Net Cost Net cost calculated by CRC dbl
Federal Share Federal Share that is rewarded dbl
Process Step Determines whether if the project is obligated or not chr
Phase # Determines the step in the system chr
Applicant Project Status Type Status of the project chr
Alternate Project Count Determines whether it is an alternate project dbl
Expedited Project Count Determines whether it is an expedited project dbl

.