Lending Club case analysis in R using Naïve Bayes, KNN, logistic regression, and CART models
Question Description
I am on COVID duty at the hospital and have no time to work on this. I need answers to all the questions and the report described in this document. You will need a good knowledge of R programming and RStudio. The attached document has all the details. Everything should be done in RStudio, and I need the .Rmd file. Questions are welcome. Thank you.
You are given a single combined file of “approved” loan data covering six years (2012 to 2017), which supposedly span the periods before and after the controversy.
Step 1: Create two new columns as follows:
a) Comb_Risk_One: a binary column that groups loan grades A and B as Low Risk and all remaining grades (C to G) as High Risk.
b) Comb_Risk_Two: a binary column that groups loan grades A, B, and C as Low Risk and all remaining grades (D to G) as High Risk.
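A minimal sketch of the grouping, written in Python for illustration (in R the same step is an `ifelse` on the grade column). It assumes each loan is a record with a `grade` field holding a letter from A to G; that field name is an assumption, not taken from the data file.

```python
# Minimal sketch: derive the two binary risk columns from a loan grade.
# ASSUMPTION: each loan record is a dict with a "grade" key holding "A".."G";
# the key name "grade" is hypothetical, not taken from the data file.

LOW_RISK_ONE = {"A", "B"}        # Comb_Risk_One: A and B are Low Risk
LOW_RISK_TWO = {"A", "B", "C"}   # Comb_Risk_Two: A, B and C are Low Risk


def risk_label(grade: str, low_grades: set) -> str:
    """Return "Low Risk" if the grade is in low_grades, else "High Risk"."""
    return "Low Risk" if grade.strip().upper() in low_grades else "High Risk"


def add_risk_columns(loans: list) -> list:
    """Return copies of the loan records with Comb_Risk_One and Comb_Risk_Two added."""
    out = []
    for loan in loans:
        row = dict(loan)  # copy so the input records are not modified
        row["Comb_Risk_One"] = risk_label(loan["grade"], LOW_RISK_ONE)
        row["Comb_Risk_Two"] = risk_label(loan["grade"], LOW_RISK_TWO)
        out.append(row)
    return out


if __name__ == "__main__":
    sample = [{"grade": g} for g in "ABCDEFG"]
    for row in add_risk_columns(sample):
        print(row["grade"], row["Comb_Risk_One"], row["Comb_Risk_Two"])
```

Grade C is the only grade whose label differs between the two columns, which is what makes comparing the two response variables informative.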
Then split the data into two files: one containing the loans from 2012, 2013, and 2014, and the other containing the loans from 2015, 2016, and 2017.
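A minimal Python sketch of the period split. It assumes a hypothetical integer `year` field on each record; the actual file may store the issue date under a different column name and format, in which case the year has to be parsed out first.

```python
# Minimal sketch: split loan records into the 2012-2014 and 2015-2017 periods.
# ASSUMPTION: each record carries an integer "year" field (hypothetical name;
# the real file may store the issue date in another column and format).

PRE_YEARS = {2012, 2013, 2014}
POST_YEARS = {2015, 2016, 2017}


def split_by_period(loans):
    """Return (pre, post) lists; records from any other year are dropped."""
    pre = [loan for loan in loans if loan["year"] in PRE_YEARS]
    post = [loan for loan in loans if loan["year"] in POST_YEARS]
    return pre, post


if __name__ == "__main__":
    sample = [{"year": y, "grade": "A"} for y in range(2011, 2019)]
    pre, post = split_by_period(sample)
    print(len(pre), len(post))  # 3 3
```

Each returned list can then be written to its own file, and every later step (outlier checks, scaling, modeling) is run separately on each period.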
Step 2: Each loan is graded from A to G based on risk, with A the least risky and G the riskiest category. Predict the Low and High Risk categories for both new response variables (Comb_Risk_One and Comb_Risk_Two) using Naïve Bayes, KNN, logistic regression, and CART models. Make sure to address the following:
a. Outliers in the independent columns (predictors)
b. Multicollinearity among the predictors
c. Scaling and standardization of the predictors
d. A train-test split for both files, comparing the confusion matrices on the test sets
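The checks in items a to d can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the required method: outliers are flagged with Tukey's 1.5 × IQR fences, multicollinearity is screened with pairwise Pearson correlations above an arbitrary 0.8 threshold (variance inflation factors are a common, more thorough alternative), and predictors are standardized with the training set's mean and standard deviation so that no test-set information leaks into the scaling. The 70/30 split, the seed, and all function names are assumptions.

```python
import random
import statistics


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def pearson(x, y):
    """Pearson correlation between two equal-length numeric columns."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def high_correlation_pairs(columns, threshold=0.8):
    """Return predictor pairs whose |r| exceeds threshold (a simple multicollinearity screen).

    columns: dict mapping predictor name -> list of numeric values.
    """
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


def split_train_test(rows, test_frac=0.3, seed=42):
    """Shuffle a copy of rows with a fixed seed and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def standardize(train_col, test_col):
    """Z-score both columns using the training mean and standard deviation only."""
    mu = statistics.fmean(train_col)
    sd = statistics.stdev(train_col)
    return [(v - mu) / sd for v in train_col], [(v - mu) / sd for v in test_col]
```

Standardizing matters most for KNN, whose distance calculations are otherwise dominated by predictors measured on large scales; in R, `cor()` and `scale()` cover the correlation and standardization steps.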
Step 3: Produce a well-documented and clearly explained R Markdown file, knitted to a report, that analyzes the data and presents findings on the model with the best classification performance. Also describe the characteristics of the loans that are not classified correctly: use a confusion matrix to identify them, and run descriptive statistics on the misclassified observations. Provide any EDA and visuals needed to make the analysis easier to understand.
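A minimal Python sketch of the misclassification analysis: it tabulates actual versus predicted risk labels into a confusion matrix, computes accuracy from its diagonal, and summarizes one numeric predictor over the misclassified rows only. The label strings, the function names, and the `int_rate` predictor in the demo are hypothetical; in R, `table(actual, predicted)` produces the same cross-tabulation.

```python
import statistics
from collections import Counter


def confusion_matrix(actual, predicted, labels=("Low Risk", "High Risk")):
    """Return a nested dict: matrix[actual_label][predicted_label] -> count."""
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}


def accuracy(matrix):
    """Share of observations on the diagonal of the confusion matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in matrix)
    return correct / total if total else 0.0


def describe_misclassified(rows, actual, predicted, column):
    """Descriptive statistics of one numeric predictor over misclassified rows only."""
    values = [row[column] for row, a, p in zip(rows, actual, predicted) if a != p]
    if not values:
        return {"n": 0}
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Toy data; "int_rate" is a hypothetical predictor name.
    rows = [{"int_rate": r} for r in (6.0, 7.5, 14.0, 18.0, 9.0)]
    actual = ["Low Risk", "Low Risk", "High Risk", "High Risk", "High Risk"]
    predicted = ["Low Risk", "High Risk", "High Risk", "High Risk", "Low Risk"]
    cm = confusion_matrix(actual, predicted)
    print(cm)
    print(accuracy(cm))  # 0.6
    print(describe_misclassified(rows, actual, predicted, "int_rate"))
```

Comparing these summaries with the same statistics for the correctly classified rows shows which predictor ranges the chosen model struggles with.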
Refer to the attached document. The data file is available here: https://www.sendspace.com/file/csi8r1