Engagement
Customer Loyalty Prediction
This project focuses on predicting customer loyalty by applying machine learning techniques to classify loyal and non-loyal customers based on individual profiles and behavioral data. Using variables such as age, gender, income, purchase frequency, membership duration, and spending behavior, the analysis combines clustering methods to identify customer segments and classification models, specifically logistic regression and decision trees, to predict loyalty. The goal is to determine the most accurate model while generating insights to enhance customer retention and support data-driven marketing strategies.
Course
605
Duration
2 months
Tools
Python, Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
Link
Overview
The dataset contained variables with strong predictive potential for customer loyalty. However, a key challenge was the absence of an explicit label defining loyalty, requiring the derivation of a proxy metric from behavioral patterns and purchase history. Moreover, the analysis integrated multiple complementary datasets and involved heterogeneous rating systems and numerous behavioral and demographic factors.
To address dimensionality and redundancy, dimensionality reduction techniques such as PCA and t-SNE were applied to preserve informative structure while minimizing noise. I engineered a loyalty label using statistical thresholds and business logic, then applied machine learning techniques to classify customers as loyal or non-loyal. The dataset was preprocessed through numerical standardization and categorical encoding. Using unsupervised methods like K-Means clustering, I segmented the population into behaviorally distinct groups, which enhanced the model's ability to differentiate loyalty profiles. Supervised learning models were then trained and tested, allowing for evaluation through misclassification error and performance metrics.
Results
Dimensionality reduction techniques such as PCA and t-SNE were explored to simplify the dataset; however, only PCA proved effective and was incorporated into the analysis. Clustering analysis revealed three distinct customer segments, each defined by unique combinations of income and spending behavior, providing actionable insights for targeted marketing strategies. In the classification phase, both logistic regression and decision trees identified key behavioral and demographic predictors of loyalty. While there is room to improve model performance, the results highlight the value of customer segmentation and the potential of integrating richer data, such as sentiment and psychographic variables for more accurate and meaningful loyalty prediction.



