Intro
- Credit Risk - The likelihood that a borrower will not repay their loan, i.e. the lender does not receive the owed principal and interest
- Collection costs
- Default - A borrower not being able to repay their debt
- Lenders must assess credit risk associated with each borrower
- Collateral
- Increase the interest rate (Risk-based pricing)
- Expected loss (EL) - PD, LGD, EAD
- UL - Unexpected losses - result of adverse economic circumstances
- SL - Exceptional (stress) losses - result of severe economic downturn
- There is a certain amount of credit risk associated with every borrower
- Estimating expected loss (expected credit loss) - The amount a lender might lose by lending to a borrower
- Probability of Default (PD) - The likelihood that the borrower is unable to repay their debt in full or on time
- Loss Given Default (LGD) - The proportion of the total exposure that cannot be recovered by the lender once a default has occurred
- Exposure At Default (EAD) - The total value that a lender is exposed to when a borrower defaults
EL = PD * LGD * EAD
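A quick worked example of the formula (the numbers below are illustrative only):

```python
# Illustrative numbers only.
PD = 0.02        # 2% probability of default
LGD = 0.40       # 40% of the exposure is lost if the borrower defaults
EAD = 100_000    # exposure at default

EL = PD * LGD * EAD   # expected loss = 800
```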
- Loan to value
- Capital Adequacy, Regulations and Basel II Accord
- Capital requirement/capital adequacy/regulatory capital
- Risk weighted assets
- The greater the risk a bank is exposed to, the greater the amount of capital it needs to hold
- Basel II Accord - three pillars:
  - Minimum capital requirements
    - Credit risk
      - Standardized Approach (SA) (% of the total exposure)
      - Internal Ratings Based (IRB) Approaches
        - Foundation Internal Ratings Based (F-IRB) Approach
        - Advanced Internal Ratings Based (A-IRB) Approach
    - Operational risk
    - Market risk
  - Supervisory review
  - Market discipline
  - (The focus here: minimum capital requirements for credit risk)
- Different facility types (asset classes) and credit risk modeling approaches
- PD - Binomial Logistic regression
- LGD/EAD - Beta regression
- Risk based pricing
- Dependent variables / Independent variables
- Discrete / Continuous
- Fine classing / Coarse classing
Dummy Variables
Dummy variables are binary indicators, 1 if an observation belongs to a category, 0 if it does not.
In statistics and econometrics, particularly in regression analysis, a dummy variable is one that takes only the value 0 or 1 to indicate the absence or presence of some categorical effect that may be expected to shift the outcome. They can be thought of as numeric stand-ins for qualitative facts in a regression model, sorting data into mutually exclusive categories (such as smoker and non-smoker).
A dummy independent variable (also called a dummy explanatory variable) which for some observation has a value of 0 will cause that variable's coefficient to have no role in influencing the dependent variable, while when the dummy takes on a value of 1 its coefficient acts to alter the intercept. For example, suppose membership in a group is one of the qualitative variables relevant to a regression. If group membership is arbitrarily assigned the value of 1, then all others would get the value 0. Then the intercept would be the constant term for non-members but would be the constant term plus the coefficient of the membership dummy in the case of group members.
Dummy variables are used frequently in time series analysis with regime switching, seasonal analysis and qualitative data applications.
https://en.wikipedia.org/wiki/Dummy_variable_(statistics)
We need only k-1 dummy variables to represent the information about k categories.
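A small sketch of creating k-1 dummy variables with pandas (pandas is not mentioned above and the column name home_ownership is illustrative):

```python
import pandas as pd

# Illustrative categorical variable with k = 3 categories.
df = pd.DataFrame({"home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT"]})

# drop_first=True keeps k - 1 dummies; the dropped category becomes the
# reference level, absorbed by the regression intercept.
dummies = pd.get_dummies(df["home_ownership"], prefix="home_ownership", drop_first=True)
```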
Weight of Evidence
Shows to what extent an independent variable predicts the dependent variable
https://www.listendata.com/2015/03/weight-of-evidence-woe-and-information.html
Fine classing
Splitting a variable into many initial, fine-grained categories (intervals)
Coarse classing
The process of constructing new, broader categories based on the initial (fine) ones
Information value
How much information the original independent variable brings with respect to explaining the dependent variable
- Widely used in credit risk modeling
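Following the linked article, the WoE of a category is ln(% of all good borrowers in the category / % of all bad borrowers in the category), and the information value of a variable is the sum over its categories of (% good - % bad) * WoE. A small sketch with toy data (the column names and values are illustrative only):

```python
import numpy as np
import pandas as pd

# Toy data: a categorical independent variable and a good/bad indicator (1 = non-defaulted).
df = pd.DataFrame({
    "grade":    ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "good_bad": [1,   1,   0,   1,   0,   1,   0,   0,   1],
})

counts = df.groupby("grade")["good_bad"].agg(n_obs="count", n_good="sum")
counts["n_bad"] = counts["n_obs"] - counts["n_good"]

# Share of all good (respectively bad) borrowers that falls into each category.
counts["prop_good"] = counts["n_good"] / counts["n_good"].sum()
counts["prop_bad"] = counts["n_bad"] / counts["n_bad"].sum()

# Weight of Evidence per category and the variable's overall Information Value.
counts["WoE"] = np.log(counts["prop_good"] / counts["prop_bad"])
information_value = ((counts["prop_good"] - counts["prop_bad"]) * counts["WoE"]).sum()
```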
Overfitting
The model has fit a particular dataset so closely that it captures its noise and fails to generalize to new data
Underfitting
When the model fails to capture the underlying logic of the data
Logistic Regression
- scikit-learn's LogisticRegression does not have a built-in way to calculate p-values
- One of the cleanest ways to get them is to override .fit() of the LogisticRegression class (see the sketch after this list)
- Each original independent variable is represented by several dummy variables
- If the coefficients of all these dummy variables are statistically significant, we retain all of them; if none of them are statistically significant, we remove all of them
- If only one or a few of the dummy variables representing one original independent variable are statistically significant, it is still best to retain all dummy variables representing that original variable
- Conventionally, if a p-value is lower than 0.05, we conclude that the coefficient of a variable is statistically significant
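One common way to do this (a sketch only: the class name and the p_values_ attribute are not part of scikit-learn, and the calculation ignores the effect of regularization) is to subclass LogisticRegression and compute Wald p-values from the estimated covariance matrix of the coefficients:

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression


class LogisticRegressionWithPValues(LogisticRegression):
    """LogisticRegression that also stores Wald p-values after fitting (illustrative)."""

    def fit(self, X, y, **kwargs):
        super().fit(X, y, **kwargs)
        X = np.asarray(X, dtype=float)

        # Predicted probabilities of the positive class on the training data.
        p = self.predict_proba(X)[:, 1]

        # Design matrix with an intercept column; W holds the logistic weights.
        X_design = np.hstack([np.ones((X.shape[0], 1)), X])
        W = np.diag(p * (1 - p))

        # Covariance of the coefficient estimates = inverse of the Fisher information.
        cov = np.linalg.inv(X_design.T @ W @ X_design)
        standard_errors = np.sqrt(np.diag(cov))

        # Wald z-statistics and two-sided p-values (intercept first, then coefficients).
        coefficients = np.concatenate([self.intercept_, self.coef_.ravel()])
        z = coefficients / standard_errors
        self.p_values_ = 2 * (1 - stats.norm.cdf(np.abs(z)))
        return self
```

After fitting, p_values_ can be compared against the 0.05 threshold mentioned above to decide which original variables (via their dummies) to keep.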
logistic_regression_model.predict - returns the predicted class labels (0 or 1)
logistic_regression_model.predict_proba - returns the predicted probabilities for each class
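A minimal runnable sketch of how these two methods are typically used (the synthetic data below is purely illustrative and stands in for the matrix of dummy variables):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Purely illustrative synthetic data.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)

logistic_regression_model = LogisticRegression().fit(X, y)

predicted_labels = logistic_regression_model.predict(X)                      # hard 0/1 predictions
predicted_probabilities = logistic_regression_model.predict_proba(X)[:, 1]   # P(class = 1)
```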
ROC curve, AUC / AUROC (area under the ROC curve)
Gini - Measure of the inequality between rich and poor individuals in an economy
- In credit risk, it measures the inequality between good (non-defaulted) and bad (defaulted) borrowers
- Computed from the curve of the cumulative proportion of defaulted borrowers as a function of the cumulative proportion of all borrowers
- AUROC = (Gini + 1) / 2, i.e. Gini = 2 * AUROC - 1
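A small sketch of how AUROC and Gini relate in code, using scikit-learn's roc_auc_score (the toy outcomes and probabilities below are illustrative only):

```python
from sklearn.metrics import roc_auc_score

# Toy example: actual outcomes (1 = default) and model-estimated probabilities of default.
y_actual = [0, 0, 1, 0, 1, 1, 0, 1]
pd_estimates = [0.10, 0.30, 0.25, 0.20, 0.80, 0.60, 0.15, 0.70]

auroc = roc_auc_score(y_actual, pd_estimates)
gini = 2 * auroc - 1          # equivalently, AUROC = (Gini + 1) / 2
```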
Kolmogorov-Smirnov
Shows to what extent the model separates the actual good borrowers from the actual bad borrowers
K-S coefficient
- The maximum difference between the cumulative distribution functions of good and bad borrowers
- The greater the difference, the better the model: the further apart the two distributions are, the more clearly the model tells good borrowers from bad ones
- Perfect model -> Maximum distance -> K-S = 1
- Predicting by chance -> Almost no distance -> K-S = 0
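A minimal sketch of the K-S statistic, applying scipy's two-sample test to the model's estimated probabilities split by actual outcome (the toy data is illustrative only):

```python
import numpy as np
from scipy.stats import ks_2samp

# Toy example: actual outcomes (1 = default) and model-estimated probabilities of default.
y_actual = np.array([0, 0, 1, 0, 1, 1, 0, 1])
pd_estimates = np.array([0.10, 0.30, 0.25, 0.20, 0.80, 0.60, 0.15, 0.70])

# K-S statistic: maximum distance between the cumulative distributions of the two groups.
result = ks_2samp(pd_estimates[y_actual == 0], pd_estimates[y_actual == 1])
ks_statistic = result.statistic
```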
ScoreCard
The scores in the scorecard we created are transformations of the regression coefficients of the PD model.
To calculate the credit score of an applicant, it is enough to sum, from the scorecard, the scores of the categories the applicant belongs to.
The way we created our scorecard, to obtain an estimate of an applicant's PD from their score, we raise e to the power of the total score and divide that by the same quantity plus one (see the sketch below).
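A minimal sketch of the two calculations above. The scorecard entries and their values are hypothetical, and the final formula assumes, as the note above implies, that the total score plays the role of the model's log-odds (i.e. the untransformed sum of regression coefficients):

```python
import numpy as np

# Hypothetical scorecard fragment: the scores of the categories one applicant falls into.
applicant_scores = {
    "Intercept": -2.1,
    "grade:A": 0.9,
    "home_ownership:OWN": 0.4,
}

# Credit score: sum of the scores of the categories the applicant belongs to.
total_score = sum(applicant_scores.values())

# Estimate derived from the score via the logistic transformation described above.
pd_estimate = np.exp(total_score) / (np.exp(total_score) + 1)
```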