Research Question:

How does FEMA (Federal Emergency Management Agency), complete disaster relief projects that are assigned to different CRC’s (Consolidated Recourse Centers) throughout the country?

About the Data:

Our data is a Microsoft Excel spreadsheet that has been provided to us by a representative that works for FEMA. Because the data has been directly given to us by Julia Moline, we know that the data is from a primary source and processed. Julia currently works for FEMA and was kind enough to present us with this data set upon request. We also know from her description, that it is original data. The data consists of 19 variables describing some 119,959 projects that have been assigned to FEMA. The variables include the project number, Consolidated Resource Center (CRC) assigned to, category of project (i.e Debris Removal, Emergency Protective Measures, Utilities), net cost, federal share, project status type, and many more. The data covers all projects FEMA processes from the years 2016-2020. It is a private data set that FEMA internally collects to track all projects that come through FEMA.

Describing the Data:

The main variables of interest in our project were CRC Net Cost, Sent to CRC date, Sent to EHP date, Catagory, and CRC. Our analysis begins exploring the money that FEMA spends within each CRC. To do this, both the “CRC variable”, which describes which CRC the project went through, and the “CRC Net Cost” variable, which gives the total USD amount spent on that project, are crucial. As we move into analyzing how many projects are processed, the “Catagory” variable is important to make conclusions regarding the workload FEMA is seeing. Lastly, when analyzing the time projects take to process, the “Sent to CRC” and “Sent to EHP” variables are important. “Sent to CRC” essentially represents the date that FEMA recieved the project and “Sent to EHP” represents the date the project was completed.

Load all the data sets:

xlsxPath <- here::here('data_raw', 'Copy of Data for S20.xlsm')
FemaData <- read_excel(xlsxPath, sheet = 'ProjectsQuery')
glimpse(FemaData)
## Observations: 119,959
## Variables: 19
## $ `Project #`                   <dbl> 550, 541, 597, 665, 646, 376, 377,…
## $ `EMMIE P/W #`                 <dbl> 437, 200, 339, NA, 248, 222, 379, …
## $ CRC                           <chr> "PA CRC Central", "PA CRC Central"…
## $ Category                      <chr> "B - Emergency Protective Measures…
## $ Event                         <chr> "4284DR-GA (4284DR)", "4284DR-GA (…
## $ Type                          <chr> "Work Completed / Fully Documented…
## $ `Sent to CRC Date`            <dttm> 2017-03-22 18:05:06, 2017-02-17 2…
## $ `Sent to EHP Date`            <dttm> 2017-05-04 16:11:54, 2017-03-21 1…
## $ Large_Project                 <lgl> TRUE, TRUE, TRUE, FALSE, TRUE, TRU…
## $ `Approx Cost`                 <dbl> 489654.27, 13077807.05, 3730869.57…
## $ `CRC Net Cost`                <dbl> 489654.27, 13078205.08, 3730869.57…
## $ `Federal Share`               <dbl> 367240.703, 10462564.064, 2798152.…
## $ version                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Process Step`                <chr> "Obligated", "Obligated", "Obligat…
## $ `Phase #`                     <chr> "Phase: 5", "Phase: 5", "Phase: 5"…
## $ Applicant_Project_Status_Type <chr> "Active", "Active", "Active", "Del…
## $ `Alternate Project Count`     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Self-Cert Count`             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ `Expedited Project Count`     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…

Filtering out the discontinued projects:

filteredFemaData <- FemaData %>%
  clean_names(case = 'lower_camel') %>% 
  filter(str_detect(processStep, "Process Discontinued") == FALSE) %>%
  filter(!is.na(crc))
glimpse(filteredFemaData)
## Observations: 69,584
## Variables: 19
## $ projectNumber              <dbl> 550, 541, 597, 646, 376, 377, 306, 24…
## $ emmiePWNumber              <dbl> 437, 200, 339, 248, 222, 379, 213, 10…
## $ crc                        <chr> "PA CRC Central", "PA CRC Central", "…
## $ category                   <chr> "B - Emergency Protective Measures", …
## $ event                      <chr> "4284DR-GA (4284DR)", "4284DR-GA (428…
## $ type                       <chr> "Work Completed / Fully Documented", …
## $ sentToCrcDate              <dttm> 2017-03-22 18:05:06, 2017-02-17 21:0…
## $ sentToEhpDate              <dttm> 2017-05-04 16:11:54, 2017-03-21 19:5…
## $ largeProject               <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, …
## $ approxCost                 <dbl> 489654.27, 13077807.05, 3730869.57, 1…
## $ crcNetCost                 <dbl> 489654.27, 13078205.08, 3730869.57, 1…
## $ federalShare               <dbl> 367240.703, 10462564.064, 2798152.178…
## $ version                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ processStep                <chr> "Obligated", "Obligated", "Obligated"…
## $ phaseNumber                <chr> "Phase: 5", "Phase: 5", "Phase: 5", "…
## $ applicantProjectStatusType <chr> "Active", "Active", "Active", "Active…
## $ alternateProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ selfCertCount              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ expeditedProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
unique(filteredFemaData$crc)
## [1] "PA CRC Central"  "PA CRC East"     "PA CRC West"     "PA CRC Atlantic"
FemaPlusTime <- filteredFemaData %>%
  mutate(time = as.double(sentToEhpDate - sentToCrcDate)/(3600*24)) 
    
glimpse(FemaPlusTime)
## Observations: 69,584
## Variables: 20
## $ projectNumber              <dbl> 550, 541, 597, 646, 376, 377, 306, 24…
## $ emmiePWNumber              <dbl> 437, 200, 339, 248, 222, 379, 213, 10…
## $ crc                        <chr> "PA CRC Central", "PA CRC Central", "…
## $ category                   <chr> "B - Emergency Protective Measures", …
## $ event                      <chr> "4284DR-GA (4284DR)", "4284DR-GA (428…
## $ type                       <chr> "Work Completed / Fully Documented", …
## $ sentToCrcDate              <dttm> 2017-03-22 18:05:06, 2017-02-17 21:0…
## $ sentToEhpDate              <dttm> 2017-05-04 16:11:54, 2017-03-21 19:5…
## $ largeProject               <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, …
## $ approxCost                 <dbl> 489654.27, 13077807.05, 3730869.57, 1…
## $ crcNetCost                 <dbl> 489654.27, 13078205.08, 3730869.57, 1…
## $ federalShare               <dbl> 367240.703, 10462564.064, 2798152.178…
## $ version                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ processStep                <chr> "Obligated", "Obligated", "Obligated"…
## $ phaseNumber                <chr> "Phase: 5", "Phase: 5", "Phase: 5", "…
## $ applicantProjectStatusType <chr> "Active", "Active", "Active", "Active…
## $ alternateProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ selfCertCount              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ expeditedProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ time                       <dbl> 42.921387, 31.950747, 12.775515, NA, …

Following the Money

# total cost of each crc per year

glimpse(filteredFemaData)
## Observations: 69,584
## Variables: 19
## $ projectNumber              <dbl> 550, 541, 597, 646, 376, 377, 306, 24…
## $ emmiePWNumber              <dbl> 437, 200, 339, 248, 222, 379, 213, 10…
## $ crc                        <chr> "PA CRC Central", "PA CRC Central", "…
## $ category                   <chr> "B - Emergency Protective Measures", …
## $ event                      <chr> "4284DR-GA (4284DR)", "4284DR-GA (428…
## $ type                       <chr> "Work Completed / Fully Documented", …
## $ sentToCrcDate              <dttm> 2017-03-22 18:05:06, 2017-02-17 21:0…
## $ sentToEhpDate              <dttm> 2017-05-04 16:11:54, 2017-03-21 19:5…
## $ largeProject               <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, …
## $ approxCost                 <dbl> 489654.27, 13077807.05, 3730869.57, 1…
## $ crcNetCost                 <dbl> 489654.27, 13078205.08, 3730869.57, 1…
## $ federalShare               <dbl> 367240.703, 10462564.064, 2798152.178…
## $ version                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ processStep                <chr> "Obligated", "Obligated", "Obligated"…
## $ phaseNumber                <chr> "Phase: 5", "Phase: 5", "Phase: 5", "…
## $ applicantProjectStatusType <chr> "Active", "Active", "Active", "Active…
## $ alternateProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ selfCertCount              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ expeditedProjectCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
CRCYearCost <- filteredFemaData %>% 
    filter(!is.na(sentToCrcDate)) %>% 
    filter(!is.na(crcNetCost)) %>%
    mutate(year = year(sentToCrcDate)) %>% 
    group_by(year) %>% 
    summarise(totalCost = sum(crcNetCost)/1000000000)


CRCYearCostPerCRC <- filteredFemaData %>%
    filter(!is.na(sentToCrcDate)) %>% 
    filter(!is.na(crcNetCost)) %>%
    group_by(year(sentToCrcDate), crc) %>%
    summarise(totalCost = sum(crcNetCost)/1000000000)


label <- ("2019 had a drastically higher 
total cost compared to 
other years. Why?")

glimpse(CRCYearCost)
## Observations: 5
## Variables: 2
## $ year      <dbl> 2016, 2017, 2018, 2019, 2020
## $ totalCost <dbl> 0.00184426, 0.29458422, 3.10428014, 18.45513223, 0.497…
ggplot(CRCYearCost) +
    geom_area(aes(x = year, y = totalCost), alpha = 0.9, fill = '#3A6190') +
    geom_line(aes(x = 2019, y = totalCost - 0.25), color = '#E75146', lwd = 2 ) +
    geom_line(aes(x = year, y = totalCost)) +
    scale_x_continuous(expand = expand_scale(mult = c(0, 0.05))) +
    theme_minimal_hgrid() + 
    labs(x = 'Year',
         y = 'Total Cost (in billions USD)', 
         title = 'Total CRC Net Cost per Year',
         subtitle = 'From the years 2016-2020') +
    geom_label(aes(x = 2017, y = 12.5, label = label), hjust = 0)

This graph is displaying the summed net cost accross all four CRCs for each given year. The data shows that there is a significant jump in the net cost in 2019 that reaches a wopping 18.5 billion dollars spent. In the next graph, we will investigate why this has happened, and what factors contribute to the influx.

CRCMoney <- filteredFemaData %>%
    filter(!is.na(crcNetCost)) %>%
    filter(!is.na(crc)) %>% 
    group_by(crc) %>%
    summarise(total = sum(crcNetCost)/1000000000)  
    
   glimpse(CRCMoney)
## Observations: 4
## Variables: 2
## $ crc   <chr> "PA CRC Atlantic", "PA CRC Central", "PA CRC East", "PA CR…
## $ total <dbl> 11.005482, 5.677476, 4.271855, 2.259152
ggplot(CRCMoney) +
    geom_col(aes(x = fct_reorder(crc,total), y = total), alpha = 0.9, fill = '#3A6190') +
    coord_flip() +
    scale_y_continuous(expand = expand_scale(mult = c(0, 0.05)),
                     breaks = c(0, 3, 6, 9, 12)) +
    theme_minimal_vgrid() +
    labs(
          x = "CRC" , 
          y = "Total Money Spent (in billions USD)",
          title = 'Total Money Spent per CRC',
          subtitle = 'From the years 2016-2020'
      )