FROM FEATURES TO FINANCIAL PERSONAS: MAPPING FEATURE TRANSFORMATION EFFICACY TO CUSTOMER ARCHETYPES IN BEHAVIORAL BANKING DATA

Authors

Rajitha Gentyala
Frisco, Texas, USA.

Keywords:

Customer Archetypes, Feature Engineering, Interpretable Machine Learning, Personalized Banking, Behavioral Segmentation, SHAP, Predictive Modeling

Synopsis

Current research on customer behavior prediction in retail banking largely benchmarks the aggregate performance of feature engineering and machine learning pipelines. These studies show that techniques such as logarithmic scaling, polynomial expansion, and interaction-term creation improve overall classification accuracy for targets such as customer activity status. However, they offer little insight into how the effects of these transformations vary across a bank’s diverse customer base. This study posits that the efficacy of a feature transformation is not universal but depends on the customer’s underlying financial behavior pattern, or ‘persona.’ To address this gap, we conduct a two-stage analysis of a real-world dataset of 30,000 customers from a major European bank. First, we combine k-means clustering and Gaussian Mixture Models (GMMs) on raw transactional and demographic features to derive five distinct, interpretable customer archetypes: The Digital Nomad (high-frequency, low-value digital transactions), The Traditional Accumulator (low-frequency, high-value, branch-based savings), The Credit-Reliant (revolving credit users), The Multi-Product Holder (diversified product portfolio), and The Dormant Saver. Second, we apply eight common feature transformation techniques, including log1p, Box-Cox, degree-2 and degree-3 polynomial expansion, and principal component analysis (PCA), to create multiple engineered datasets. A Random Forest classifier is then trained and evaluated separately on each archetype’s subpopulation, using both raw and transformed features. The results reveal a striking disparity in transformation utility across archetypes. For instance, the logarithmic transformation raised recall for The Digital Nomad from 0.72 to 0.89 by normalizing heavy-tailed transaction-frequency data, yet yielded negligible improvement for The Traditional Accumulator. Conversely, interaction terms between branch-visit frequency and account tenure were uniquely powerful predictors of activity for The Traditional Accumulator. SHAP (SHapley Additive exPlanations) was applied post hoc to explain why certain transformations ‘resonate’ with specific personas. For The Credit-Reliant segment, this analysis showed that polynomial transformations captured a non-linear, threshold-based relationship between credit utilization ratio and churn risk, a pattern absent in the other segments. This research shifts the emphasis from a one-size-fits-all feature engineering strategy to a persona-aware modeling framework. We demonstrate that feature transformation should be guided by customer-segment characteristics, offering bankers and data scientists a nuanced roadmap for developing more precise, interpretable, and actionable predictive models. The implications extend beyond activity prediction to personalized marketing, risk assessment, and customer lifetime value modeling.
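The per-feature transformations named in the synopsis can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' pipeline. The helper names (`log1p_transform`, `box_cox`, `poly2_features`) are hypothetical. Box-Cox is shown with a caller-supplied λ rather than the maximum-likelihood estimate that libraries typically fit, and degree-3 expansion and PCA are omitted. The degree-2 expansion includes the pairwise products that act as interaction terms, such as branch-visit frequency times account tenure.

```python
import math
from itertools import combinations_with_replacement
from typing import List, Sequence


def log1p_transform(values: Sequence[float]) -> List[float]:
    """Elementwise log(1 + x): compresses heavy right tails and maps 0 to 0."""
    return [math.log1p(v) for v in values]


def box_cox(values: Sequence[float], lam: float) -> List[float]:
    """Box-Cox with a caller-supplied lambda (assumption; normally estimated).

    y = (x**lam - 1) / lam  if lam != 0
    y = log(x)              if lam == 0
    Only defined for strictly positive x.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def poly2_features(row: Sequence[float]) -> List[float]:
    """Degree-2 expansion: linear terms, then all squares and pairwise products.

    For [a, b] this yields [a, b, a*a, a*b, b*b]; a*b is the interaction term.
    """
    linear = list(row)
    quadratic = [
        row[i] * row[j]
        for i, j in combinations_with_replacement(range(len(row)), 2)
    ]
    return linear + quadratic


if __name__ == "__main__":
    # Toy, invented monthly transaction counts with a heavy right tail.
    counts = [0.0, 3.0, 12.0, 450.0]
    print(log1p_transform(counts))
    print(box_cox([1.0, 4.0, 9.0], lam=0.5))
    # Hypothetical row: 4 branch visits per month, 10 years of tenure.
    print(poly2_features([4.0, 10.0]))
```

On the toy counts, log1p shrinks the 0 to 450 range to roughly 0 to 6.1. This is the kind of tail compression the synopsis associates with the recall gain for The Digital Nomad. For the hypothetical row, `poly2_features([4.0, 10.0])` returns `[4.0, 10.0, 16.0, 40.0, 100.0]`, where 40.0 is the visits-by-tenure interaction.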

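The persona-aware evaluation idea, segmenting customers first and then scoring each transformation within each segment, can be sketched as follows. The sketch rests on several assumptions. A plain Lloyd's k-means stands in for the study's combined k-means/GMM procedure, and no Random Forest is trained: predictions from any model are passed in. The helper names `kmeans`, `recall`, and `recall_by_segment` are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(p: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points: Sequence[Point], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means returning one cluster label per point.

    Stand-in for the study's k-means + GMM segmentation (assumption).
    """
    rng = random.Random(seed)
    # Initialise centroids from k input points sampled without replacement.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall = TP / (TP + FN) for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)
    return tp / (tp + fn) if (tp + fn) else 0.0


def recall_by_segment(
    segments: Sequence[int], y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[int, float]:
    """Group customers by archetype label and compute recall within each group."""
    groups: Dict[int, Tuple[List[int], List[int]]] = defaultdict(lambda: ([], []))
    for s, t, p in zip(segments, y_true, y_pred):
        groups[s][0].append(t)
        groups[s][1].append(p)
    return {s: recall(ts, ps) for s, (ts, ps) in sorted(groups.items())}


if __name__ == "__main__":
    # Toy, invented data: two segments, raw-feature vs transformed-feature predictions.
    segments = [0, 0, 0, 0, 1, 1, 1, 1]
    y_true = [1, 1, 1, 0, 1, 1, 0, 0]
    pred_raw = [1, 0, 0, 0, 1, 1, 0, 0]
    pred_transformed = [1, 1, 1, 0, 1, 1, 0, 0]
    print("raw:        ", recall_by_segment(segments, y_true, pred_raw))
    print("transformed:", recall_by_segment(segments, y_true, pred_transformed))
```

In the toy run, segment 0 moves from a recall of 1/3 to 1.0 while segment 1 stays at 1.0. A per-segment breakdown makes such concentrated gains visible, whereas a single aggregate recall would blend them together.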
Published

May 15, 2024