To explain the importance of this research question, I will follow up with another question: Do you know what a credit union is?
Before this year, I had no idea what a credit union was, let alone what the “credit union movement” was.
Credit unions are financial institutions similar to banks, but they are nonprofit entities owned by their members. They exist solely for the benefit of those members.
The purpose of this research question is to put a value on how much credit unions actually benefit their local communities and their members. Using Home Mortgage Disclosure Act (HMDA) data, I can compare banks to credit unions to see whether credit unions provide better mortgage opportunities than banks. This is a very specific topic, but it is a first step toward quantifying the value of credit unions and showing whether they are in fact more beneficial than banks.
By the end of this report, we definitely won’t have settled whether credit unions are better than banks. However, we will be able to contribute to that debate by seeing whether credit unions are more inclined than banks to originate mortgage loans for their members.
Link to my data source:
The data I am using comes from the Consumer Financial Protection Bureau (CFPB). The data is processed by the CFPB, formatted as CSV files, and can be downloaded with one of these filters: “Mortgages for first lien, owner-occupied, 1-4 family homes”, “All originated mortgages”, and “All records.” The CFPB does not offer completely raw versions of these reports, in order to protect applicant and borrower privacy. I can also filter the data by state rather than nationwide. The CFPB has data collected under the Home Mortgage Disclosure Act from 2007-2017. The data is free and public, and should be highly reliable coming from the CFPB. The Federal Financial Institutions Examination Council (FFIEC) mandates that certain institutions report their data each year. As of 2018, only credit unions with over $45 million in assets were required to report. According to the documentation provided, the only fields intentionally left empty in the data tables are those that are blank because a loan had no edit failures.
When attempting to download all of the data, I found that the nationwide files were too large to load, so I downloaded the DC data spanning all 11 years instead. HMDA data helps analysts look at market activity by lender, geography, race, gender, and household income.
For more information on the Home Mortgage Disclosure Act:
https://www.ffiec.gov/hmda/reporter.htm
The source also provides its own explanations of the raw data variables in the PDFs linked here. However, I include my own data dictionary below, built from that information.
HMDA Loan Application Register Format: https://files.consumerfinance.gov/hmda-historic-data-dictionaries/lar_record_format.pdf
HMDA Loan Application Register Code Sheet: https://files.consumerfinance.gov/hmda-historic-data-dictionaries/lar_record_codes.pdf
The raw data had a very large number of variables, and for the purpose of what I am trying to show, I cut it down considerably. To start, I filtered the data to two agencies: the Federal Deposit Insurance Corporation and the National Credit Union Administration. I did this because I only want to compare banks and credit unions, and these are the two agencies under which those institution types are reported. The variables I created across my data sets were mean applicant income (of the loans originated), median applicant income (of the loans originated), total loans originated, and percent of loans originated. Originally, I was only interested in the loans originated under the two agencies, so I filtered for those. I then went on to look at the entire applicant pool for both groups once I realized they were not directly comparable. To be more specific, banks reach more customers and receive more applications than credit unions because they are more commercialized, so it was not fair to compare only the total loans originated.
In the results you will see a progression in my analysis: when one graph points against my hypothesis that credit unions are more beneficial, I try to normalize the data and create different metrics to explore further.
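The filtering and summary variables described above can be sketched as follows. This is a minimal illustration, not the cleaning script used for this report: the `hmda_raw` rows are made up, and the output column names (`total_applications`, `total_originated`, `percent_originated`, `mean_income`, `median_income`) are my own assumptions. The input column names and the agency codes 3 and 5 match the printed data in this report; `action_taken == 1` marks an originated loan, as in the originated-only data sets.

```r
library(dplyr)

# Toy stand-in for the raw HMDA rows; the values are made up for illustration.
hmda_raw <- tibble(
  year                  = c(2007, 2007, 2007, 2007, 2007, 2007),
  agency_code           = c(3, 3, 3, 5, 5, 9),   # 3 = FDIC, 5 = NCUA, 9 = another agency
  action_taken          = c(1, 1, 3, 1, 3, 1),   # 1 = loan originated
  applicant_income_000s = c(120, 80, 60, 90, 45, 200)
)

income_summary <- hmda_raw %>%
  # Keep only the two agencies being compared.
  filter(agency_code %in% c(3, 5)) %>%
  group_by(year, agency_code) %>%
  summarise(
    total_applications = n(),
    total_originated   = sum(action_taken == 1),
    percent_originated = 100 * total_originated / total_applications,
    # Income statistics are taken over originated loans only.
    mean_income   = mean(applicant_income_000s[action_taken == 1], na.rm = TRUE),
    median_income = median(applicant_income_000s[action_taken == 1], na.rm = TRUE),
    .groups = "drop"
  )

income_summary
```

Computing the counts and the percentage in one grouped step keeps the numerator and denominator tied to the same year and agency.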
library(readr)    # read_csv()
library(ggplot2)  # ggplot(), used for the charts later in the report

# Load the processed data sets (originated loans only)
allYearsBanks <- read_csv(here::here("data_processed", "allYearsBanks.csv"))
allYearsCU <- read_csv(here::here("data_processed", "allYearsCU.csv"))
Bank2007 <- read_csv(here::here("data_processed", "Bank2007.csv"))
Bank2017 <- read_csv(here::here("data_processed", "Bank2017.csv"))
CU2007 <- read_csv(here::here("data_processed", "CU2007.csv"))
CU2017 <- read_csv(here::here("data_processed", "CU2017.csv"))
#Summary of data
summary(allYearsBanks)
## year agency_code agency_name applicant_income_000s
## Min. :2007 Min. :3 Length:13003 Min. : 1.0
## 1st Qu.:2010 1st Qu.:3 Class :character 1st Qu.: 88.0
## Median :2013 Median :3 Mode :character Median : 133.0
## Mean :2012 Mean :3 Mean : 176.5
## 3rd Qu.:2015 3rd Qu.:3 3rd Qu.: 201.0
## Max. :2017 Max. :3 Max. :8008.0
##
## loan_amount_000s action_taken_name action_taken applicant_ethnicity_name
## Min. : 3.0 Length:13003 Min. :1 Length:13003
## 1st Qu.: 250.0 Class :character 1st Qu.:1 Class :character
## Median : 366.0 Mode :character Median :1 Mode :character
## Mean : 396.2 Mean :1
## 3rd Qu.: 508.0 3rd Qu.:1
## Max. :4000.0 Max. :1
##
## applicant_ethnicity applicant_sex hud_median_family_income
## Min. :1.000 Min. :1.000 Min. : 92600
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:101700
## Median :2.000 Median :1.000 Median :105700
## Mean :2.149 Mean :1.594 Mean :104136
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:107100
## Max. :4.000 Max. :4.000 Max. :109400
## NA's :1
summary(allYearsCU)
## year agency_code agency_name applicant_income_000s
## Min. :2007 Min. :5 Length:10925 Min. : 1
## 1st Qu.:2009 1st Qu.:5 Class :character 1st Qu.: 92
## Median :2012 Median :5 Mode :character Median : 140
## Mean :2012 Mean :5 Mean : 194
## 3rd Qu.:2015 3rd Qu.:5 3rd Qu.: 214
## Max. :2017 Max. :5 Max. :47000
##
## loan_amount_000s action_taken_name action_taken applicant_ethnicity_name
## Min. : 1.0 Length:10925 Min. :1 Length:10925
## 1st Qu.: 120.0 Class :character 1st Qu.:1 Class :character
## Median : 288.0 Mode :character Median :1 Mode :character
## Mean : 385.1 Mean :1
## 3rd Qu.: 445.0 3rd Qu.:1
## Max. :255000.0 Max. :1
##
## applicant_ethnicity applicant_sex hud_median_family_income
## Min. :1.000 Min. :1.000 Min. : 92600
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:100800
## Median :2.000 Median :2.000 Median :105700
## Mean :2.097 Mean :1.602 Mean :103181
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:105900
## Max. :4.000 Max. :4.000 Max. :109400
## NA's :15
Bank2007
## # A tibble: 864 x 11
## year agency_code agency_name applicant_incom… loan_amount_000s
## <dbl> <dbl> <chr> <dbl> <dbl>
## 1 2007 3 Federal De… 120 400
## 2 2007 3 Federal De… 220 78
## 3 2007 3 Federal De… 246 210
## 4 2007 3 Federal De… 113 304
## 5 2007 3 Federal De… 58 200
## 6 2007 3 Federal De… 200 600
## 7 2007 3 Federal De… 149 500
## 8 2007 3 Federal De… 126 432
## 9 2007 3 Federal De… 76 258
## 10 2007 3 Federal De… 73 291
## # … with 854 more rows, and 6 more variables: action_taken_name <chr>,
## # action_taken <dbl>, applicant_ethnicity_name <chr>,
## # applicant_ethnicity <dbl>, applicant_sex <dbl>,
## # hud_median_family_income <dbl>
#864 loans originated by banks in 2007
Bank2017
## # A tibble: 657 x 11
## year agency_code agency_name applicant_incom… loan_amount_000s
## <dbl> <dbl> <chr> <dbl> <dbl>
## 1 2017 3 Federal De… 157 304
## 2 2017 3 Federal De… 117 424
## 3 2017 3 Federal De… 127 548
## 4 2017 3 Federal De… 70 218
## 5 2017 3 Federal De… 114 570
## 6 2017 3 Federal De… 72 186
## 7 2017 3 Federal De… 43 138
## 8 2017 3 Federal De… 205 792
## 9 2017 3 Federal De… 204 647
## 10 2017 3 Federal De… 105 312
## # … with 647 more rows, and 6 more variables: action_taken_name <chr>,
## # action_taken <dbl>, applicant_ethnicity_name <chr>,
## # applicant_ethnicity <dbl>, applicant_sex <dbl>,
## # hud_median_family_income <dbl>
#657 loans originated by banks in 2017 (the row count in the tibble header above)
#Positive result = percent decline from 2007 to 2017
bankGrowth <- round(100 * ((864 - 657) / 864), 2)
bankGrowth
## [1] 23.96
CU2007
## # A tibble: 1,015 x 11
## year agency_code agency_name applicant_incom… loan_amount_000s
## <dbl> <dbl> <chr> <dbl> <dbl>
## 1 2007 5 National C… 85 36
## 2 2007 5 National C… 141 417
## 3 2007 5 National C… 82 255
## 4 2007 5 National C… 23 19
## 5 2007 5 National C… 54 200
## 6 2007 5 National C… 166 400
## 7 2007 5 National C… 203 585
## 8 2007 5 National C… 75 117
## 9 2007 5 National C… 193 675
## 10 2007 5 National C… 102 440
## # … with 1,005 more rows, and 6 more variables: action_taken_name <chr>,
## # action_taken <dbl>, applicant_ethnicity_name <chr>,
## # applicant_ethnicity <dbl>, applicant_sex <dbl>,
## # hud_median_family_income <dbl>
#1,015 loans originated
CU2017
## # A tibble: 900 x 11
## year agency_code agency_name applicant_incom… loan_amount_000s
## <dbl> <dbl> <chr> <dbl> <dbl>
## 1 2017 5 National C… 100 115
## 2 2017 5 National C… 345 750
## 3 2017 5 National C… 145 679
## 4 2017 5 National C… 215 675
## 5 2017 5 National C… 181 100
## 6 2017 5 National C… 29000 81000
## 7 2017 5 National C… 258 150
## 8 2017 5 National C… 59 34
## 9 2017 5 National C… 69 25
## 10 2017 5 National C… 85 255
## # … with 890 more rows, and 6 more variables: action_taken_name <chr>,
## # action_taken <dbl>, applicant_ethnicity_name <chr>,
## # applicant_ethnicity <dbl>, applicant_sex <dbl>,
## # hud_median_family_income <dbl>
#900 loans originated in 2017
CUGrowth <- round(100 * ((1015 - 900) / 1015), 2)
CUGrowth
## [1] 11.33
I had originally suspected that credit unions would originate more loans over the years than banks, and I had hoped they would show a higher growth rate in loan origination than banks. When cleaning the data, I kept only originated loans from credit unions and commercial banks and created separate data sets for each group over the years. Comparing only 2007 and 2017, the numbers do not show the growth I had hoped for. Because the calculation subtracts the 2017 count from the 2007 count, the positive results are percent declines: credit union originations fell by 11.33%, and bank originations fell by roughly a quarter. Neither group grew, although credit unions declined less than banks.
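To make the sign of the change explicit, the calculation can be wrapped in a small helper. This is a sketch: `pct_change` is a hypothetical function name of my own, and it uses the conventional (new - old) / old form, so a decline comes out negative rather than positive.

```r
# Percent change from an earlier count to a later count.
# Negative values mean the count fell; positive values mean it grew.
pct_change <- function(old, new) {
  round(100 * (new - old) / old, 2)
}

# Credit union originations: 1,015 in 2007 and 900 in 2017 (counts from above)
cu_change <- pct_change(old = 1015, new = 900)
cu_change
```

With the data frames loaded, calling `pct_change(nrow(CU2007), nrow(CU2017))` would avoid retyping the counts by hand, which removes one place where a number can be copied incorrectly.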
Looking at the number of loans originated in 2007 and 2017, however, credit unions did originate more loans than banks in both of those years. Also, looking at the summaries of applicant income for 2017, we see that on average credit unions (mean income = $365,000) originate loans for applicants with higher incomes than banks do (mean income = $164,000). Note that the CU2017 preview above includes a row with an applicant income of 29000 (in thousands of dollars), and a value that large can pull the mean upward. Taken at face value, this does not support the idea that credit unions have recently been taking on riskier loans, that is, lending to more low-income households in order to support their members.
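To see how much a single extreme value can move the mean, here is a small check using only the ten applicant incomes printed in the CU2017 preview above. It is an illustration on those ten rows, not a summary of the full 900-row data set, and the variable names are my own.

```r
# applicant_income_000s for the first ten CU2017 rows shown above
cu2017_income_preview <- c(100, 345, 145, 215, 181, 29000, 258, 59, 69, 85)

# The mean is dominated by the 29000 entry; the median is not.
preview_mean   <- mean(cu2017_income_preview)
preview_median <- median(cu2017_income_preview)

preview_mean
preview_median
```

On these ten rows the mean is 3045.7 while the median is 163, which is why the median applicant income is the safer figure to compare between banks and credit unions.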
The summaries of allYearsBanks and allYearsCU give a rough sense of how originations are spread over time. The years run from 2007 to 2017, and the median year is 2013 for banks and 2012 for credit unions, at or slightly past the midpoint of that range. For banks, this is consistent with somewhat more loans being originated in the later years, though the summaries alone cannot establish a trend for either group.
As of right now, the data analyzed does not show that credit unions provide more mortgage loan opportunities than banks. However, I believe the important next step is to measure the share of total applications that each group funds, since banks tend to have more total customers than credit unions.
I wanted to begin with looking at the total loans originated between the two institutions and below are my results.
totalLoansOverYears <- read_csv(here::here("data_processed", "totalLoansOverYears.csv"))
# Line chart of yearly origination counts (n), one line per agency
totalLoanOverYearsChart <- ggplot(totalLoansOverYears) +
  geom_line(aes(x = year, y = n, group = agency_code, color = agency_name)) +
  geom_point(aes(x = year, y = n, group = agency_code, color = agency_name), size = 0.3) +
  labs(
    title = 'Total Loan Origination Between DC Banks and Credit Unions\n2007-2017',
    x = 'Year',
    y = 'Total Amount of Loans Originated',
    caption = "Source: Consumer Financial Protection Bureau - 2007-17",
    color = 'Agency Name'
  ) +
  scale_x_continuous(limits = c(2007, 2018), breaks = 2007:2017) +
  # Dashed reference lines at the 2008 drop and the 2016 spike
  geom_vline(xintercept = 2008, col = 'darkgrey', linetype = 'dashed') +
  geom_vline(xintercept = 2016, col = 'darkgrey', linetype = 'dashed') +
  annotate('text', x = 2008.5, y = 700,
           label = 'Very low drop \nin loan originations \nin 2008 \nthe year of the\nhousing crisis.',
           color = 'black', size = 2, hjust = 0) +
  annotate('text', x = 2014, y = 1700,
           label = 'Huge spike \nin loan\noriginations \nin 2016.',
           color = 'black', size = 2, hjust = 0)
# Save a copy of the chart to the images folder
ggsave(filename = file.path('images', 'totalLoanOverYearsChart.png'),
       plot = totalLoanOverYearsChart,
       width = 10,
       height = 6)
totalLoanOverYearsChart
This chart shows that both groups of institutions originate, or fund, more loans as the years go on. The chart goes directly against my expectations, as it shows that banks originate more loans than credit unions over the years. However, I don’t think total loans is a great measure here, as the gap could simply be due to banks reaching a larger customer base. When polishing my data further, I will create a new variable that divides loan originations by the total loans applied for, so that we can compare the rates at which banks and credit unions originate loans. It is also important to note the spikes and drops in loans originated over the years. Unsurprisingly, a drop occurs in 2008 during the housing crisis. There is also an interesting spike in 2016. Through research, I found that total mortgage originations across all institutions rose 13%, which can be attributed to rising housing prices over the years. According to Business Insider, 2016 was the best year for the housing market since the 2008 crisis, which contributed to the spike in loans originated. According to the Federal Reserve, mortgage rates hovered just above the historic lows of 2012 and remained low until they spiked around the November election. That rise is probably what contributed to the drop in loans originated in 2017. (Sources cited at the end of the report.)
As you can see, the first visualization did not show that credit unions provide better mortgage opportunities for their members. I realized that banks may simply have more customers than credit unions have members, and that I should therefore create a variable to normalize the data. For each institution type and year, I measured the percent of loans originated out of the total applicant pool. To my dismay, the result did not differ much from the first graph.
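The normalization described above can be sketched as follows. This is an illustrative reconstruction, not the script that produced ratesBoth.csv: the `applications` rows are made up, and I am assuming the rate is the share of rows with `action_taken == 1` (loan originated) within each year and agency. The output columns mirror the ones the chart code in this report uses (`year`, `rate`, `agency_code`, `agency_name`).

```r
library(dplyr)

# Toy application-level data; the values are made up for illustration.
applications <- tibble(
  year         = rep(2012, 7),
  agency_code  = c(3, 3, 3, 3, 5, 5, 5),
  agency_name  = c(rep("Federal Deposit Insurance Corporation", 4),
                   rep("National Credit Union Administration", 3)),
  action_taken = c(1, 1, 3, 1, 1, 3, 1)   # 1 = originated, 3 = denied
)

rates_sketch <- applications %>%
  group_by(year, agency_code, agency_name) %>%
  # Percent of all applications in the group that were originated
  summarise(rate = 100 * mean(action_taken == 1), .groups = "drop")

rates_sketch
```

Dividing by each group's own application count removes the advantage banks get from receiving more applications, so the two lines in the chart become directly comparable.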
ratesBoth <- read_csv(here::here("data_processed", "ratesBoth.csv"))
# Line chart of the yearly origination rate (percent of applications), one line per agency
totalRateOverYearsChart <- ggplot(ratesBoth, aes(x = year, y = rate)) +
  geom_line(aes(group = agency_code, color = agency_name)) +
  geom_point(aes(group = agency_code, color = agency_name), size = 0.3) +
  theme_minimal() +
  labs(
    title = 'Percent of Loans Originated\nAmong DC Credit Unions vs Banks\n2007-2017',
    x = 'Year',
    y = 'Percent of Loans Originated\nOver Total Applications',
    caption = "Source: Consumer Financial Protection Bureau - 2007-17",
    color = "Agency"
  ) +
  # Dashed reference line at the 2012 spike
  geom_vline(xintercept = 2012, col = 'darkgrey', linetype = 'dashed') +
  annotate('text', x = 2012.5, y = 60,
           label = 'Spike \nin loan originations \nin 2012.',
           color = 'black', size = 2, hjust = 0) +
  scale_x_continuous(limits = c(2007, 2017), breaks = 2007:2017)
totalRateOverYearsChart