To explain why this research question matters, I will start with another question: do you know what a credit union is?
Before this year, I had no idea what a credit union was, let alone what the whole “credit union movement” was.
Credit unions are financial institutions similar to banks, but they are nonprofit entities owned by their members. They aim to exist solely for the benefit of their members.
The purpose of this research question is to put a value on how much credit unions actually benefit their local communities and their members. Using Home Mortgage Disclosure Act (HMDA) data, I can compare banks to credit unions to see whether credit unions provide better mortgage opportunities than banks. This is a narrow topic, but it is a first step toward putting a value on credit unions and showing whether they are actually more beneficial than banks.
By the end of this report we will not settle whether credit unions are better than banks. However, we will be able to contribute to that debate by examining whether credit unions are more inclined than banks to originate mortgage loans for their members.
Link to my data source:
The data I am using come from the Consumer Financial Protection Bureau (CFPB). The CFPB processes the data and provides them as CSV files, which can be downloaded with the filters “Mortgages for first lien, owner-occupied, 1-4 family homes”, “All originated mortgages”, and “All records.” To protect applicant and borrower privacy, the CFPB does not offer completely raw versions of these reports. The data can also be filtered by state rather than downloaded nationwide. The CFPB provides data collected under the Home Mortgage Disclosure Act from 2007 to 2017. The data are free and public, and should be highly reliable coming from the CFPB. The Federal Financial Institutions Examination Council (FFIEC) requires certain institutions to report their data each year; as of 2018, only credit unions with over $45 million in assets were required to report. Also, according to the documentation provided, the only fields intentionally left blank in the data tables are edit-status fields, which are blank when a loan had no edit failures.
When I attempted to download all of the data, I realized the nationwide reports were far too large to load, so I downloaded only the data for DC, covering all 11 years. HMDA data help analysts examine market activity by lender, geography, race, gender, and household income.
For more information on the Home Mortgage Disclosure Act:
https://www.ffiec.gov/hmda/reporter.htm
The source also provides its own explanations of the raw data variables in the PDFs linked below. In addition, I include my own data dictionary below, based on the information given.
HMDA Loan Application Register Format: https://files.consumerfinance.gov/hmda-historic-data-dictionaries/lar_record_format.pdf
HMDA Loan Application Register Code Sheet: https://files.consumerfinance.gov/hmda-historic-data-dictionaries/lar_record_codes.pdf
The raw data had a very large number of variables, and for the purpose of what I am trying to show, I cut them down considerably. To start, I filtered the data to two agencies: the Federal Deposit Insurance Corporation (FDIC) and the National Credit Union Administration (NCUA). I do this because I only want to compare banks and credit unions, and these are the two agencies under which those types of institutions report. The variables I created across the various data sets were mean applicant income (of loans originated), median applicant income (of loans originated), total loans originated, and percent of loans originated. Originally, I was only interested in the loans originated by institutions under the two agencies, so I filtered for originated loans; after noticing discrepancies between the two, I went on to examine the entire applicant pool of both types of institution. Specifically, banks as a whole reach more customers and receive more applications than credit unions because they are more commercial, so comparing total loans originated alone would not be fair.
In the results you will see my analysis progress: whenever a graph points against my hypothesis that credit unions are more beneficial, I try to normalize the data and create different metrics to explore further.
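The filtering and summarizing steps described above could look roughly like the sketch below. It is a minimal sketch, not the actual cleaning script (which is not shown in this report). It assumes the raw DC file has been read into a data frame with the column names used in the processed files (`year`, `agency_code`, `agency_name`, `action_taken`, `applicant_income_000s`), and it uses a few made-up rows in place of the real data.

```r
library(dplyr)

# Made-up rows standing in for the raw DC HMDA file (illustration only)
raw <- tibble(
  year                  = c(2007, 2007, 2007, 2007, 2007, 2017, 2017),
  agency_code           = c(3, 3, 3, 5, 1, 5, 5),
  agency_name           = c("FDIC", "FDIC", "FDIC", "NCUA", "OCC", "NCUA", "NCUA"),
  action_taken          = c(1, 1, 3, 1, 1, 1, 4),
  applicant_income_000s = c(120, 200, 80, 85, 60, 100, 50)
)

originations <- raw %>%
  # keep only agency code 3 (FDIC) and 5 (NCUA)
  filter(agency_code %in% c(3, 5)) %>%
  # keep only applications that ended in an originated loan (action_taken 1)
  filter(action_taken == 1) %>%
  group_by(year, agency_code, agency_name) %>%
  summarise(
    n                     = n(),  # total loans originated
    meanApplicantIncome   = mean(applicant_income_000s, na.rm = TRUE),
    medianApplicantIncome = median(applicant_income_000s, na.rm = TRUE),
    .groups = "drop"
  )

originations
```

Filtering on `action_taken == 1` is what restricts the income measures to originated loans; dropping that filter and counting every row instead gives the applicant-pool denominator needed for percentages.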
library(tidyverse)  # read_csv(), ggplot2, dplyr
allYearsBanks <- read_csv(here::here("data_processed", "allYearsBanks.csv"))
allYearsCU <- read_csv(here::here("data_processed", "allYearsCU.csv"))
Bank2007 <- read_csv(here::here("data_processed", "Bank2007.csv"))
Bank2017 <- read_csv(here::here("data_processed", "Bank2017.csv"))
CU2007 <- read_csv(here::here("data_processed", "CU2007.csv"))
CU2017 <- read_csv(here::here("data_processed", "CU2017.csv"))
#Summary of data
summary(allYearsBanks)
## year agency_code agency_name applicant_income_000s
## Min. :2007 Min. :3 Length:13003 Min. : 1.0
## 1st Qu.:2010 1st Qu.:3 Class :character 1st Qu.: 88.0
## Median :2013 Median :3 Mode :character Median : 133.0
## Mean :2012 Mean :3 Mean : 176.5
## 3rd Qu.:2015 3rd Qu.:3 3rd Qu.: 201.0
## Max. :2017 Max. :3 Max. :8008.0
##
## loan_amount_000s action_taken_name action_taken applicant_ethnicity_name
## Min. : 3.0 Length:13003 Min. :1 Length:13003
## 1st Qu.: 250.0 Class :character 1st Qu.:1 Class :character
## Median : 366.0 Mode :character Median :1 Mode :character
## Mean : 396.2 Mean :1
## 3rd Qu.: 508.0 3rd Qu.:1
## Max. :4000.0 Max. :1
##
## applicant_ethnicity applicant_sex hud_median_family_income
## Min. :1.000 Min. :1.000 Min. : 92600
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:101700
## Median :2.000 Median :1.000 Median :105700
## Mean :2.149 Mean :1.594 Mean :104136
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:107100
## Max. :4.000 Max. :4.000 Max. :109400
## NA's :1
summary(allYearsCU)
## year agency_code agency_name applicant_income_000s
## Min. :2007 Min. :5 Length:10925 Min. : 1
## 1st Qu.:2009 1st Qu.:5 Class :character 1st Qu.: 92
## Median :2012 Median :5 Mode :character Median : 140
## Mean :2012 Mean :5 Mean : 194
## 3rd Qu.:2015 3rd Qu.:5 3rd Qu.: 214
## Max. :2017 Max. :5 Max. :47000
##
## loan_amount_000s action_taken_name action_taken applicant_ethnicity_name
## Min. : 1.0 Length:10925 Min. :1 Length:10925
## 1st Qu.: 120.0 Class :character 1st Qu.:1 Class :character
## Median : 288.0 Mode :character Median :1 Mode :character
## Mean : 385.1 Mean :1
## 3rd Qu.: 445.0 3rd Qu.:1
## Max. :255000.0 Max. :1
##
## applicant_ethnicity applicant_sex hud_median_family_income
## Min. :1.000 Min. :1.000 Min. : 92600
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:100800
## Median :2.000 Median :2.000 Median :105700
## Mean :2.097 Mean :1.602 Mean :103181
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:105900
## Max. :4.000 Max. :4.000 Max. :109400
## NA's :15
Bank2007
## # A tibble: 864 x 11
## year agency_code agency_name applicant_incom… loan_amount_000s
## <dbl> <dbl> <chr> <dbl> <dbl>
## 1 2007 3 Federal De… 120 400
## 2 2007 3 Federal De… 220 78
## 3 2007 3 Federal De… 246 210
## 4 2007 3 Federal De… 113 304
## 5 2007 3 Federal De… 58 200
## 6 2007 3 Federal De… 200 600
## 7 2007 3 Federal De… 149 500
## 8 2007 3 Federal De… 126 432
## 9 2007 3 Federal De… 76 258
## 10 2007 3 Federal De… 73 291
## # … with 854 more rows, and 6 more variables: action_taken_name <chr>,
## # action_taken <dbl>, applicant_ethnicity_name <chr>,
## # applicant_ethnicity <dbl>, applicant_sex <dbl>,
## # hud_median_family_income <dbl>
#864 loans originated by banks in 2007
Bank2017
## # A tibble: 657 x 11
## year agency_code agency_name applicant_incom… loan_amount_000s
## <dbl> <dbl> <chr> <dbl> <dbl>
## 1 2017 3 Federal De… 157 304
## 2 2017 3 Federal De… 117 424
## 3 2017 3 Federal De… 127 548
## 4 2017 3 Federal De… 70 218
## 5 2017 3 Federal De… 114 570
## 6 2017 3 Federal De… 72 186
## 7 2017 3 Federal De… 43 138
## 8 2017 3 Federal De… 205 792
## 9 2017 3 Federal De… 204 647
## 10 2017 3 Federal De… 105 312
## # … with 647 more rows, and 6 more variables: action_taken_name <chr>,
## # action_taken <dbl>, applicant_ethnicity_name <chr>,
## # applicant_ethnicity <dbl>, applicant_sex <dbl>,
## # hud_median_family_income <dbl>
#657 loans originated by banks in 2017
# Percent decline in bank originations from 2007 to 2017
bankGrowth <- round(100 * (nrow(Bank2007) - nrow(Bank2017)) / nrow(Bank2007), 2)
bankGrowth
## [1] 23.96
CU2007
## # A tibble: 1,015 x 11
## year agency_code agency_name applicant_incom… loan_amount_000s
## <dbl> <dbl> <chr> <dbl> <dbl>
## 1 2007 5 National C… 85 36
## 2 2007 5 National C… 141 417
## 3 2007 5 National C… 82 255
## 4 2007 5 National C… 23 19
## 5 2007 5 National C… 54 200
## 6 2007 5 National C… 166 400
## 7 2007 5 National C… 203 585
## 8 2007 5 National C… 75 117
## 9 2007 5 National C… 193 675
## 10 2007 5 National C… 102 440
## # … with 1,005 more rows, and 6 more variables: action_taken_name <chr>,
## # action_taken <dbl>, applicant_ethnicity_name <chr>,
## # applicant_ethnicity <dbl>, applicant_sex <dbl>,
## # hud_median_family_income <dbl>
#1,015 loans originated by credit unions in 2007
CU2017
## # A tibble: 900 x 11
## year agency_code agency_name applicant_incom… loan_amount_000s
## <dbl> <dbl> <chr> <dbl> <dbl>
## 1 2017 5 National C… 100 115
## 2 2017 5 National C… 345 750
## 3 2017 5 National C… 145 679
## 4 2017 5 National C… 215 675
## 5 2017 5 National C… 181 100
## 6 2017 5 National C… 29000 81000
## 7 2017 5 National C… 258 150
## 8 2017 5 National C… 59 34
## 9 2017 5 National C… 69 25
## 10 2017 5 National C… 85 255
## # … with 890 more rows, and 6 more variables: action_taken_name <chr>,
## # action_taken <dbl>, applicant_ethnicity_name <chr>,
## # applicant_ethnicity <dbl>, applicant_sex <dbl>,
## # hud_median_family_income <dbl>
#900 loans originated by credit unions in 2017
# Percent decline in credit union originations from 2007 to 2017
CUGrowth <- round(100 * (nrow(CU2007) - nrow(CU2017)) / nrow(CU2007), 2)
CUGrowth
## [1] 11.33
I originally suspected that credit unions would originate more loans over the years than banks, and I hoped they would show a higher growth rate in loan originations. When cleaning the data, I kept only loans originated by credit unions and banks and created several data sets that track each type of institution separately over the years. Comparing only 2007 and 2017, however, neither group grew: the calculations above measure declines. Loan originations fell by 11.33% at credit unions and by 23.96% at banks. Credit unions therefore declined less than banks, but this two-year comparison does not show the growth I had expected.
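Because both counts fell between 2007 and 2017, a signed percent change makes the direction explicit. Below is a minimal sketch with a hypothetical helper, `pct_change()`, applied to the row counts printed in the tibble headers (864 and 657 bank originations, 1,015 and 900 credit union originations):

```r
# Signed percent change between two counts; negative values mean a decline.
# pct_change() is a hypothetical helper written for this illustration.
pct_change <- function(old, new) {
  round(100 * (new - old) / old, 2)
}

pct_change(old = 864, new = 657)   # banks, 2007 to 2017
# [1] -23.96
pct_change(old = 1015, new = 900)  # credit unions, 2007 to 2017
# [1] -11.33
```

Keeping the sign avoids calling a decline a "growth rate": the larger number here belongs to the group whose originations fell more.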
Looking at the numbers of loans originated in 2007 and 2017, however, credit unions originated more loans than banks in both years. Also, in 2017 credit unions on average originated loans for applicants with higher incomes (mean income of about $365,000) than banks did (mean income of about $164,000), although a mean like this is sensitive to extreme values such as the applicant income of 29,000 (in thousands of dollars) in the CU2017 preview. This does not support the idea that credit unions have recently been taking on riskier loans, that is, lending to more low-income households to support their members.
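The summaries of allYearsBanks and allYearsCU show that both data sets span 2007 to 2017 (the min and max years), but a range of years alone cannot show whether more loans are originated as the years go on. The year quartiles hint that bank originations lean toward later years (median year 2013, versus 2012 for credit unions), so a positive relationship between year and number of loans originated is plausible for both banks and credit unions, but it has to be checked by counting loans per year.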
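A direct way to check whether originations rise over time is to count loans per year rather than read the year range off `summary()`. A minimal base-R sketch, using made-up rows in place of `allYearsBanks` (one row per originated loan):

```r
# Made-up stand-in for allYearsBanks: one row per originated loan
loans <- data.frame(year = c(2007, 2007, 2008, 2016, 2016, 2016, 2017))

# Count originations per year; the count column is named "n"
per_year <- as.data.frame(table(year = loans$year), responseName = "n")
# table() returns the years as a factor, so convert them back to numbers
per_year$year <- as.numeric(as.character(per_year$year))

per_year
```

With the real data, `dplyr::count(allYearsBanks, year)` produces the same kind of per-year table; the `n` column in `totalLoansOverYears.csv` was likely built this way, though the script is not shown.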
So far, the data analyzed suggest that credit unions do not provide more mortgage loan opportunities than banks. However, I believe the next step is to measure each institution's rate of funding loans relative to total applicants, since banks tend to have more total members than credit unions.
I wanted to begin by looking at the total loans originated by the two types of institutions; my results are below.
totalLoansOverYears <- read_csv(here::here("data_processed", "totalLoansOverYears.csv"))
totalLoanOverYearsChart <- ggplot(totalLoansOverYears) +
geom_line(aes(x = year, y = n, group = agency_code, color = agency_name))+
geom_point(aes(x = year, y = n, group = agency_code, color = agency_name), size = 0.3)+
labs(
title = 'Total Loan Origination Between DC Banks and Credit Unions\n2007-2017',
x = 'Year',
y = 'Total Amount of Loans Originated',
caption = "Source: Consumer Financial Protection Bureau - 2007-17",
color = 'Agency Name'
)+
scale_x_continuous(limits=c(2007, 2018), breaks = 2007:2017)+
geom_vline(xintercept = 2008, col = 'darkgrey',
linetype = 'dashed')+
geom_vline(xintercept = 2016, col = 'darkgrey',
linetype = 'dashed')+
annotate('text', x = 2008.5, y = 700,
label = 'Very low drop \nin loan originations \nin 2008 \nthe year of the\nhousing crisis.',
color = 'black', size = 2, hjust = 0)+
annotate('text', x = 2014, y = 1700,
label = 'Huge spike \nin loan\noriginations \nin 2016.',
color = 'black', size = 2, hjust = 0)
ggsave(filename = file.path('images', 'totalLoanOverYearsChart.png'),
plot = totalLoanOverYearsChart,
width = 10,
height = 6)
totalLoanOverYearsChart
This chart shows that both groups of institutions originate (fund) more loans as the years go on. It directly contradicts my expectations, since it shows banks originating more loans than credit unions over the years. However, I don't think total loans is a great measure here, because the difference could simply be due to banks reaching a larger customer base. To refine the data, I will create a new variable that divides loan originations by the total number of loan applications, so that we can compare the rates at which banks and credit unions originate loans. The spikes and drops in loans originated over the years are also worth noting. As expected, a drop occurs in 2008, during the housing crisis. There is also an interesting spike in 2016. Through research, I found that total mortgage originations across all institutions rose 13% that year, which can be attributed to rising housing prices over the years. According to Business Insider, 2016 was the best year for the housing market since the 2008 crisis, which helps explain the spike in loans originated. Also, according to the Federal Reserve, mortgage rates in 2016 hovered just above the historic lows reached in 2012 and remained low until they spiked after the November 2016 election. This probably contributed to the drop in loans originated in 2017. (Sources are cited at the end of the report.)
The first visualization did not show that credit unions provide better mortgage opportunities for their members. I realized that banks may generally have more members than credit unions, so I created a variable to normalize the data: below, I measure the percent of loans originated out of the total applicant pool for each institution over the years. To my dismay, the result did not differ much from the first graph.
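The normalized rate is the number of originated loans divided by all applications received, per institution type and year. The script that produced `ratesBoth.csv` is not shown, so the following is a minimal sketch assuming an applications table with `year`, `agency_name`, and `action_taken` columns; the rows are made up.

```r
library(dplyr)

# Made-up applications: one row per application, all action_taken codes kept
applications <- tibble(
  year         = c(2012, 2012, 2012, 2012, 2012, 2012, 2012),
  agency_name  = c("FDIC", "FDIC", "FDIC", "FDIC", "NCUA", "NCUA", "NCUA"),
  action_taken = c(1, 1, 1, 3, 1, 4, 3)
)

rates <- applications %>%
  group_by(year, agency_name) %>%
  summarise(
    n_applications = n(),
    originated     = sum(action_taken == 1),  # 1 = loan originated
    rate           = 100 * originated / n_applications,
    .groups = "drop"
  )

rates
```

Which `action_taken` codes belong in the denominator is a judgment call: code 6 (loan purchased by the institution), for example, is arguably not an application the institution decided on.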
ratesBoth <- read_csv(here::here("data_processed", "ratesBoth.csv"))
totalRateOverYearsChart <- ggplot(ratesBoth, aes(x = year, y = rate)) +
geom_line(aes(group = agency_code, color = agency_name))+
geom_point(aes(group = agency_code, color = agency_name), size = 0.3)+
theme_minimal()+
labs(
title = 'Percent of Loans Originated\nAmong DC Credit Unions vs Banks\n2007-2017',
x = 'Year',
y = 'Percent of Loans Originated\nOver Total Applications',
caption = "Source: Consumer Financial Protection Bureau - 2007-17",
color = "Agency"
)+
geom_vline(xintercept = 2012, col = 'darkgrey',
linetype = 'dashed')+
annotate('text', x = 2012.5, y = 60,
label = 'Spike \nin loan originations \nin 2012.',
color = 'black', size = 2, hjust = 0)+
scale_x_continuous(limits=c(2007, 2017), breaks = 2007:2017)
totalRateOverYearsChart
Even when the two institutions are put on an even playing field, the data show that banks tend to originate a higher share of mortgage applications than credit unions. As explained before, according to the Federal Reserve, mortgage interest rates hit record lows in 2012, which likely explains the increase in loan originations that year.
I therefore looked at a different metric to compare the two, hoping it might point toward the answer I expected: I created graphs of the applicant income of loans originated by the two institutions.
averageIncomeAcceptance <- ggplot(totalLoansOverYears) +
geom_col(aes(x = year, y = meanApplicantIncome, fill = agency_name))+
facet_wrap(~agency_name)+
labs(x = "Year", y = "Mean Applicant Income (in thousands)", title = "Average Income of DC Loan Applicants\n2007-2017",
fill = NULL)+
theme_minimal()+
theme(legend.position = 'none')+
theme(axis.text.x = element_text(angle = 60, hjust = 1))+
scale_x_continuous(limits=c(2006, 2018), breaks = 2007:2017)
averageIncomeAcceptance
ggsave(filename = file.path('data_processed', 'averageIncomeAcceptance.pdf'),
plot = averageIncomeAcceptance,
width = 10,
height = 6)
ggsave(filename = file.path('images', 'averageIncomeAcceptance.png'),
plot = averageIncomeAcceptance,
width = 10,
height = 6)
This chart compares the two agencies' institutions on average applicant income by year. The faceted panels show that from 2010 to 2016, the average income of applicants whose loans were originated was much lower at credit unions. In 2008 and 2017, however, the pattern reverses sharply, and banks originated loans for applicants with lower incomes.
The average income fluctuated considerably from year to year, so I next graphed the median income of applicants whose loans were originated, since the median is less sensitive to extreme values. I also improved the visualization by using dodged bars.
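Why the median is steadier than the mean can be seen with the ten credit union applicant incomes printed in the CU2017 preview above, one of which is 29,000 (in thousands of dollars, so $29 million):

```r
# Applicant incomes (in thousands) from the first ten rows of the CU2017 preview
income <- c(100, 345, 145, 215, 181, 29000, 258, 59, 69, 85)

mean(income)    # pulled far upward by the single 29,000 value
# [1] 3045.7
median(income)  # average of the two middle sorted values, 145 and 181
# [1] 163
```

One extreme applicant moves the mean by thousands while the median stays near the typical value, which is why year-to-year means can swing even when most applicants look similar.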
library(cowplot)  # theme_minimal_vgrid()
loansOrig7_17 <- read_csv(here::here("data_processed", "loansOrig7_17.csv"))
ggplot(loansOrig7_17) +
geom_col(aes(x = as.factor(year),
y = median_income, fill = agency_name),
position = 'dodge',
width = 0.7, alpha = 0.8)+
coord_flip() +
theme_minimal_vgrid() +
labs(x = 'Year',
y = 'Median Income (in thousands)',
fill = 'Agency',
title = 'Median Income of Loans Originated\nBetween Banks and Credit Unions\n2007-2017')
Overall, my data do not support my original expectation that credit unions provide significantly better mortgage opportunities than banks. For this data set, the answer to my research question is no. I was surprised, to say the least, but as they say, “the facts don’t lie.”
In the future, I could explore other variables in the raw data. This report focuses purely on numbers, but it might be interesting to look at differences among loan applicants by ethnicity and race. Applicant sex is another variable to examine; I could facet graphs to see the relationship between loans originated and these variables.
In conclusion, the data contradicted my expectations and, for now, point to the answer: no, credit unions do not appear to be more beneficial than banks in providing mortgage opportunities.
# Appendix
For further detail on each variable, see the source's code sheet: https://files.consumerfinance.gov/hmda-historic-data-dictionaries/lar_record_codes.pdf
Variable Name | Variable Type | Variable Description |
---|---|---|
As of Year | Numeric | The year data was reported |
Respondent ID | Alphanumeric | 10 Character Identifier for each institution |
Agency Code | Alphanumeric | Each agency the institution falls under is assigned a number: 1 = Office of the Comptroller of the Currency (OCC), 2 = Federal Reserve System (FRS), 3 = Federal Deposit Insurance Corporation (FDIC), 5 = National Credit Union Administration (NCUA), 7 = Department of Housing and Urban Development (HUD), 9 = Consumer Financial Protection Bureau (CFPB) |
Loan Type | Numeric | Each type is assigned a number: 1 = Conventional (any loan other than FHA, VA, FSA, or RHS loans), 2 = FHA-insured (Federal Housing Administration), 3 = VA-guaranteed (Veterans Administration), 4 = FSA/RHS (Farm Service Agency or Rural Housing Service) |
Property Type | Alphanumeric | Each property type is assigned a number |
Loan Purpose | Numeric | Each loan purpose is assigned a number: 1 = Home purchase, 2 = Home improvement, 3 = Refinancing |
Occupancy | Numeric | Each owner occupancy is assigned a number: 1 = Owner occupied as a principal dwelling, 2 = not owner occupied, 3 = not applicable |
Loan Amount (000s) | Numeric | In thousands of dollars |
Preapproval | Alphanumeric | 1 = preapproval was requested, 2 = preapproval was not requested, 3 = not applicable |
Action Taken | Numeric | Action taken once the loan was applied for: 1 = Loan originated, 2 = Application approved but not accepted, 3 = Application denied by financial institution, 4 = Application withdrawn by applicant, 5 = File closed for incompleteness, 6 = Loan purchased by the institution, 7 = Preapproval request denied by financial institution, 8 = Preapproval request approved but not accepted (optional reporting) |
MSA/MD | Alphanumeric | Metropolitan Statistical Area/Metropolitan Division |
State Code | Alphanumeric | Two-digit FIPS state identifier |
County Code | Alphanumeric | Three-digit FIPS county identifier |
Tract | Alphanumeric | Census tract number |
Applicant Ethnicity | Alphanumeric | Each ethnicity is assigned a number: 1 = Hispanic or Latino, 2 = Not Hispanic or Latino, 3 = Information not provided by applicant in mail, Internet, or telephone application, 4 = Not applicable, 5 = No co-applicant |
Co Applicant Ethnicity | Alphanumeric | Each co-applicant ethnicity is assigned a number, using the same codes as Applicant Ethnicity |
Applicant Race 1 to 5 | Alphanumeric | Up to five races can be reported for each applicant, each as a number: 1 = American Indian or Alaska Native, 2 = Asian, 3 = Black or African American, 4 = Native Hawaiian or Other Pacific Islander, 5 = White |
Co Applicant Race 1 to 5 | Alphanumeric | Up to five races can be reported for each co-applicant, using the same codes as Applicant Race |
Applicant Sex | Numeric | Each applicant sex is assigned a number: 1 = male, 2 = female, 3 = information not provided, 4 = not applicable, 5 = no co-applicant |
Co Applicant Sex | Numeric | Each co-applicant sex is assigned a number, using the same codes as Applicant Sex |
Applicant Income (000s) | Alphanumeric | In thousands of dollars |
Purchaser Type | Alphanumeric | Each type of purchaser is assigned a number (ex: 1 = Fannie Mae) |
Denial Reason 1 to 3 | Alphanumeric | Up to three denial reasons can be reported for each loan, each as a number, for example: 1 = Debt-to-income ratio, 2 = Employment history, 3 = Credit history |
HOEPA Status | Alphanumeric | 1 = HOEPA loan, 2 = Not a HOEPA loan |
Lien Status | Alphanumeric | (only for applications and origination) Each status is assigned a number (ex: 1 = Secured by a first lien) |
Edit Status | Alphanumeric | Each edit is assigned a number (ex: 5 = validity edit failure only) |
Sequence Number | Alphanumeric | One-up number scheme for each respondent |
Population | Alphanumeric | Total population in tract |
Minority Population % | Alphanumeric | Percentage of minority population to total population for tract |
FFIEC Median Family Income | Alphanumeric | In dollars |
Tract to MSA/MD Income % | Alphanumeric | Percentage of tract median family income compared to MSA/MD median family income |
Number of Owner-occupied units | Alphanumeric | Number of dwellings that are lived in by the owner |
Number of 1- to 4-Family Units | Alphanumeric | Number of dwellings built to house fewer than 5 families |
Application Date Indicator | Numeric | Whether the application was filed before or after 01-01-2004 |
The table below shows one of the datasets I used and how drastically I trimmed the raw data to keep only the variables relevant to my report.
Variable Name | Variable Type | Variable Description |
---|---|---|
year | double | years 2007-2017 |
n | double | total number of loans originated for each institution per year |
agency_code | double | agency code, either 3 (FDIC) or 5 (NCUA) |
agency_name | character | FDIC or NCUA |
applicant_income_000s | double | in thousands of dollars |
meanApplicantIncome | double | in thousands of dollars |
Bhutta, Neil, et al. “Residential Mortgage Lending in 2016: Evidence from the Home Mortgage Disclosure Act Data - November 2017.” Board of Governors of the Federal Reserve System, Nov. 2017, www.federalreserve.gov/publications/2017-november-residential-mortgage-lending-in-2016.htm.
“Credit Unions vs. Banks.” US Community Credit Union, www.usccu.org/news/credit-unions-vs-banks.
“Home Mortgage Disclosure Act.” Federal Financial Institutions Examination Council (FFIEC), www.ffiec.gov/hmda/.
Oyedele, Akin. “2016 Was the Best Year for the US Housing Market since the Financial Crisis.” Business Insider, 26 Jan. 2017, www.businessinsider.com/new-home-sales-december-2017-2017-1. Accessed 4 May 2020.