Benchmarking Machine Learning Algorithms for Imbalanced Data in Fraud Detection
Keywords:
Fraud detection, Imbalanced data, Machine learning, Benchmarking, Classification performance, Resampling techniques
Synopsis
Fraud detection is a critical application of machine learning, yet fraud datasets are highly imbalanced: fraudulent records are a small minority, which poses a significant challenge to conventional algorithms. This study benchmarks multiple machine learning techniques on imbalanced datasets for fraud detection tasks. We evaluate them with standard performance metrics suited to skewed class distributions and assess their strengths on real-world financial datasets. The results highlight the superiority of ensemble methods and data resampling techniques, offering practitioners a roadmap for optimizing fraud detection systems.
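The synopsis does not name the metrics used, so the following is only a minimal sketch of why metrics suited to skewed class distributions matter. It assumes precision, recall, and F1 on the fraud class as the metrics, and the confusion-matrix counts for the two classifiers are hypothetical, chosen for illustration rather than taken from the study.

```python
# Illustrative sketch: accuracy versus fraud-class metrics on imbalanced data.
# All counts are hypothetical: a 10,000-transaction test set with 50 frauds (0.5%).

def accuracy(tp, fp, fn, tn):
    """Fraction of all transactions classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for the positive (fraud) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Classifier A (hypothetical): labels every transaction as legitimate.
always_legit = {"tp": 0, "fp": 0, "fn": 50, "tn": 9950}
# Classifier B (hypothetical): catches 40 of 50 frauds, raising 60 false alarms.
detector = {"tp": 40, "fp": 60, "fn": 10, "tn": 9890}

for name, c in [("always_legit", always_legit), ("detector", detector)]:
    acc = accuracy(c["tp"], c["fp"], c["fn"], c["tn"])
    p, r, f1 = precision_recall_f1(c["tp"], c["fp"], c["fn"])
    print(f"{name}: accuracy={acc:.3f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

With these made-up counts, the classifier that never flags fraud has the higher accuracy yet zero recall, which is why accuracy alone is a poor basis for comparing models on data like this.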
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.