Shouvik Nandy

Data Governance | Data Management | Data Quality | Data Analytics

Data Science Projects

[Capstone Project : Loan Default Prediction]


Project 1 : Unsupervised Learning - PCA & t-SNE

Principal Component Analysis - t-SNE - Exploratory Data Analysis

In this project, we explore dimensionality reduction using the PCA and t-SNE algorithms. We use the built-in Iris dataset from the scikit-learn package. The data describes three species of Iris flower: Setosa, Versicolour and Virginica. Each sample has four features: sepal length, sepal width, petal length and petal width.
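As a rough illustration of the idea, the sketch below projects the four Iris features down to two dimensions with both methods. It is a minimal example, not the project notebook: standardising the features first and the values `perplexity=30` and `random_state=42` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Load the built-in Iris data: 150 samples, 4 features, 3 species.
iris = load_iris()
X = StandardScaler().fit_transform(iris.data)  # assumption: scale before reducing

# PCA: linear projection onto the two directions of highest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()

# t-SNE: non-linear embedding that tries to preserve local neighbourhoods.
# perplexity and random_state are illustrative choices, not tuned values.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X)

print(f"Variance retained by 2 principal components: {explained:.2%}")
print("PCA shape:", X_pca.shape, "| t-SNE shape:", X_tsne.shape)
```

Either two-column output can then be drawn as a scatter plot coloured by `iris.target` to see how well the species separate.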


Project 2 : Segmentation of Bank customers using clustering techniques

K-Means - DBSCAN - Gaussian Mixture Model

In this project, we will cluster customer data from a bank. The goal is to identify customer segments for marketing campaigns. The bank has been advised by its marketing research team that its market penetration can be improved. Based on this input, the Marketing team proposes to run personalised campaigns to target new customers as well as upsell to existing customers. Another insight from the market research was that customers perceive the bank's support services poorly. Based on this, the Operations team wants to upgrade the service delivery model to ensure that customers' queries are resolved faster.
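The three clustering techniques can be compared side by side on the same feature matrix. The sketch below is only an illustration under stated assumptions: the bank data is not reproduced here, so `make_blobs` generates a hypothetical two-feature stand-in, and the number of clusters, `eps` and `min_samples` are illustrative values rather than tuned ones.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for customer features (NOT the bank dataset).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [5, 5], [-5, 5]],
    cluster_std=0.8,
    random_state=0,
)
X = StandardScaler().fit_transform(X)  # distance-based methods need comparable scales

# K-Means: partitions the data into a fixed number of spherical clusters.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# DBSCAN: density-based; label -1 marks points treated as noise.
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# Gaussian Mixture Model: soft clustering with one Gaussian per component.
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

for name, labels in [
    ("K-Means", kmeans_labels),
    ("DBSCAN", dbscan_labels),
    ("GMM", gmm_labels),
]:
    n_found = len(set(labels) - {-1})  # do not count DBSCAN noise as a cluster
    print(f"{name}: {n_found} clusters")
```

On real customer data the cluster count is not known in advance, so it would normally be chosen with a diagnostic such as the elbow method or silhouette score.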


Project 3 : Prediction of Housing Prices

Linear Regression - Machine Learning - Model Metrics

In this project, we will predict the housing prices of a town or suburb based on the features of its locality. In the process, we identify the most important features in the dataset, apply data preprocessing techniques and build a linear regression model that predicts the prices. Each record in the dataset describes a Boston suburb or town. The data was drawn from the Boston Standard Metropolitan Statistical Area (SMSA) in 1970.
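The modelling step can be sketched as follows. This is a minimal example on synthetic data, not the Boston dataset: the three features, their coefficients and the noise level are hypothetical values chosen so that the fitted model can be checked against a known answer.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: 3 standardised features, the third one irrelevant.
n_samples = 500
X = rng.normal(size=(n_samples, 3))
true_coef = np.array([5.0, -3.0, 0.0])
y = 22.0 + X @ true_coef + rng.normal(scale=1.0, size=n_samples)

# Hold out 20% of the rows to measure performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Model metrics: root mean squared error and coefficient of determination.
rmse = mean_squared_error(y_test, y_pred) ** 0.5
r2 = r2_score(y_test, y_pred)

print("Coefficients:", np.round(model.coef_, 2))
print(f"RMSE: {rmse:.2f} | R^2: {r2:.3f}")
```

Because the features here share the same scale, the magnitude of each coefficient gives a rough ranking of feature importance. With unscaled features that comparison would not be valid.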


Project 4 : Prediction of Loan Eligibility

Logistic Regression - k-Nearest Neighbors - Feature Importance

In this project, we will predict whether a customer is eligible for a bank loan. Credit risk is the risk that a borrower defaults on loan payments. In the banking sector, this is an important factor to consider before approving an applicant's loan. Dream Housing Finance company deals in all home loans. It has a presence across urban, semi-urban and rural areas. A customer first applies for a home loan, after which the company validates the customer's eligibility for the loan.

The company wants to automate the loan eligibility process in real time, based on the customer details provided in the online application form. These details include Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit History and others. To automate this process, the company has posed the problem of identifying the customer segments that are eligible for a loan amount, so that it can specifically target these customers.
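A minimal sketch of the two classifiers is shown below. It assumes a synthetic binary dataset from `make_classification` as a stand-in for the loan application data, which is not reproduced here. The six features, `n_neighbors=5` and the train/test split are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the loan data: 1 = eligible, 0 = not eligible.
X, y = make_classification(
    n_samples=600,
    n_features=6,
    n_informative=3,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=1.5,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Both models are sensitive to feature scale, so each gets its own scaler.
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "k-nearest neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out rows
    print(f"{name}: accuracy = {scores[name]:.3f}")

# Coefficients on standardised features give a rough view of feature importance.
logit_coef = models["logistic regression"][-1].coef_[0]
print("Logistic regression coefficients:", logit_coef.round(2))
```

On the real application form data, categorical fields such as Gender or Education would first need to be encoded as numbers before either model could be fitted.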


Project 5 : Prediction of Loan Eligibility

Decision Tree - Random Forest Classifier

In this project, we will use the same dataset to predict loan eligibility with tree-based models. We will implement Decision Tree and Random Forest algorithms for classification.
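The tree-based approach can be sketched in the same way. Again this is an illustration only: the data is a synthetic stand-in rather than the loan dataset, and `max_depth=4` and `n_estimators=200` are assumed values, not tuned hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the loan data: 1 = eligible, 0 = not eligible.
X, y = make_classification(
    n_samples=600,
    n_features=6,
    n_informative=3,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=1.5,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A single shallow tree: easy to interpret, but prone to high variance.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)
tree_acc = tree.score(X_test, y_test)

# A random forest averages many trees grown on bootstrap samples.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(
    X_train, y_train
)
forest_acc = forest.score(X_test, y_test)

# Impurity-based importances; they sum to 1 across all features.
importances = forest.feature_importances_

print(f"Decision tree accuracy: {tree_acc:.3f}")
print(f"Random forest accuracy: {forest_acc:.3f}")
print("Feature importances:", importances.round(3))
```

Tree-based models do not require feature scaling, which is why no scaler appears in this sketch.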


Project 6 : Forecasting of Air Passengers

Time Series Forecasting - AR, MA & ARMA (autoregressive–moving-average)

In this project, we will forecast the volume of air passengers based on historical data. The dataset is one of the built-in datasets in R. The objective is to predict the monthly volume of air passengers for the next 24 months.
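To illustrate the autoregressive part, the sketch below fits an AR(p) model by ordinary least squares and produces a 24-month recursive forecast. It is a simplified stand-in, not the project code: the helper names `fit_ar` and `forecast_ar` are hypothetical, the series is synthetic rather than the R dataset, and `p=12` is an assumed lag order. The synthetic series has no trend; a trending series would typically need differencing or a log transform first, since AR, MA and ARMA models assume stationarity.

```python
import numpy as np


def fit_ar(series, p):
    """Fit y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p} by least squares."""
    y = np.asarray(series, dtype=float)
    # Column i holds the lag-i values aligned with the targets y[p:].
    lags = np.column_stack([y[p - i : len(y) - i] for i in range(1, p + 1)])
    design = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(design, y[p:], rcond=None)
    return coef  # coef[0] is c, coef[1:] are phi_1 .. phi_p


def forecast_ar(series, coef, steps):
    """Forecast recursively, feeding each prediction back in as a new lag."""
    p = len(coef) - 1
    history = list(np.asarray(series, dtype=float)[-p:])
    predictions = []
    for _ in range(steps):
        recent = history[::-1][:p]  # y_{t-1}, ..., y_{t-p}
        next_value = coef[0] + float(np.dot(coef[1:], recent))
        predictions.append(next_value)
        history.append(next_value)
    return np.array(predictions)


# Hypothetical monthly series: 12 years of seasonal data plus noise.
rng = np.random.default_rng(0)
months = np.arange(144)
series = 200 + 20 * np.sin(2 * np.pi * months / 12) + rng.normal(scale=3, size=144)

coef = fit_ar(series, p=12)
forecast = forecast_ar(series, coef, steps=24)
print("Forecast for the next 24 months:", forecast.round(1))
```

The MA and ARMA variants add lagged error terms, which cannot be estimated by a single least-squares fit like this one. A statistics library would normally be used for those.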