Python
Machine Learning
Classification
Customer Churn
Streamlit
Project Overview
This project aims to predict customer churn for a bank using machine learning techniques. Churn occurs when customers stop using the bank’s services, leading to revenue loss. By accurately predicting churn, the bank can proactively implement retention strategies and improve customer satisfaction.
The dataset includes key customer attributes such as credit score, age, account balance, and engagement metrics.
Key Insights
- Banks can reduce churn by identifying high-risk customers early.
- Feature importance analysis indicated that age, credit score, and account balance were among the strongest predictors of churn.
- Handling class imbalance is crucial: models trained on imbalanced data underperform at identifying actual churners.
Different approaches suit different business priorities:
- SMOTE improves precision (better targeting of churners).
- Class Weight Adjustment improves recall (catching more actual churners).
Technical Implementation
Preprocessing & Feature Engineering:
- Handled missing values and outliers.
- Encoded categorical variables like gender.
- Scaled numerical features for better model performance.
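The preprocessing steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the column names, the sample rows, median imputation, and the 0/1 gender encoding are all hypothetical, since the project does not state which strategies it used. Outlier handling is left out for the same reason.

```python
from statistics import mean, median, pstdev

# Hypothetical rows; field names and values are illustrative only.
rows = [
    {"credit_score": 650, "age": 42, "balance": 1200.0, "gender": "Female"},
    {"credit_score": 720, "age": None, "balance": 0.0, "gender": "Male"},
    {"credit_score": 580, "age": 35, "balance": 98000.0, "gender": "Male"},
    {"credit_score": 700, "age": 51, "balance": 45000.0, "gender": "Female"},
]
NUMERIC = ["credit_score", "age", "balance"]


def impute_median(rows, col):
    """Replace missing (None) values in `col` with the column median."""
    fill = median(r[col] for r in rows if r[col] is not None)
    for r in rows:
        if r[col] is None:
            r[col] = fill


def standardize(rows, col):
    """Rescale `col` to zero mean and unit (population) standard deviation."""
    values = [r[col] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    for r in rows:
        r[col] = (r[col] - mu) / sigma if sigma else 0.0


def preprocess(rows):
    rows = [dict(r) for r in rows]  # work on a copy, leave the input untouched
    for col in NUMERIC:
        impute_median(rows, col)
    for r in rows:
        # Assumed binary encoding: Male -> 1, anything else -> 0.
        r["gender"] = 1 if r["gender"] == "Male" else 0
    for col in NUMERIC:
        standardize(rows, col)
    return rows


clean = preprocess(rows)
```

In a real pipeline the median, mean, and standard deviation would be computed on the training split only and then reused on validation and test data, so that no information leaks from held-out rows.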
Modeling Approaches:
- Random Forest + SMOTE: Used the Synthetic Minority Over-sampling Technique (SMOTE) to balance the classes by adding synthetic minority-class samples.
- Random Forest + Class Weight Adjustment: Adjusted class weights to give more importance to churners.
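To make the two imbalance strategies concrete, here is a small standard-library sketch of the ideas behind them. It is not the project's code: the project presumably relied on library implementations, and the function names `smote_sketch` and `balanced_class_weights`, the toy points, and the choice of the "balanced" weighting formula are assumptions made for illustration. The Random Forest itself is omitted.

```python
import math
import random
from collections import Counter


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Toy data: 9 retained customers (label 0) and 3 churners (label 1).
labels = [0] * 9 + [1] * 3
churners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical scaled features

new_points = smote_sketch(churners, n_new=6)  # oversample 3 -> 9 churners
weights = balanced_class_weights(labels)      # {0: 0.666..., 1: 2.0}
```

The first approach changes the training data by adding interpolated minority points; the second leaves the data alone and makes each churner count more in the loss. With the weighting formula above, the rarer class receives the larger weight, which is the same rule scikit-learn applies when `class_weight="balanced"` is set.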
Evaluation Metrics:
- Used precision, recall, F1-score, and the confusion matrix to assess model performance.
- Used cross-validation so that performance estimates do not depend on a single train/test split.
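As a reminder of how these metrics relate to the confusion matrix, the following sketch computes them by hand for the churn (positive) class. The labels and predictions are made-up toy values, not results from this project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) with `positive` as the churn class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for the positive class (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example: 3 churners among 10 customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / len(y_true)
```

Here a model that predicted "no churn" for everyone would also reach 0.7 accuracy while catching no churners at all, which is why precision and recall on the churn class are more informative than accuracy on imbalanced data.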
Key Learnings
- Class imbalance significantly affects model performance, requiring targeted techniques like SMOTE or weight adjustment.
- Feature importance analysis helps in making data-driven business decisions.
- Model selection should align with business goals, in particular whether precision or recall matters more.
- Hyperparameter tuning plays a vital role in optimizing model performance.