Managing Technical Debt Using Machine Learning Models in Large Codebases
Keywords:
Technical debt, machine learning, code analysis, software quality, code refactoring, large codebases, debt prediction
Synopsis
Managing technical debt in large-scale software systems is a persistent challenge. As codebases grow and evolve, accumulated design shortcuts, suboptimal practices, and undocumented decisions, collectively known as technical debt, hinder maintainability and scalability. In this paper, we explore how machine learning (ML) models can automate the identification, prediction, and mitigation of technical debt across large codebases. By drawing on historical code data, code smells, static analysis outputs, and version control metrics, ML-based approaches can surface debt hotspots at scale and suggest targeted refactorings. Empirical evaluations on open-source repositories indicate that debt-aware ML tooling is promising.
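To make the idea of predicting debt hotspots from code metrics concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes three hypothetical per-file features (commit count, cyclomatic complexity, and code-smell count), uses made-up toy training data, and fits a plain logistic regression by stochastic gradient descent; the function names `train`, `debt_risk`, and `rank_hotspots` are illustrative only.

```python
import math

# Hypothetical per-file features:
# (commits touching the file, cyclomatic complexity, code-smell count).
# All values and labels are made-up toy data for illustration only.
# Label 1 = file later needed debt-related refactoring, 0 = it did not.
TRAIN = [
    ((2, 5, 0), 0),
    ((3, 8, 1), 0),
    ((1, 4, 0), 0),
    ((4, 10, 1), 0),
    ((25, 40, 6), 1),
    ((30, 35, 8), 1),
    ((18, 50, 5), 1),
    ((22, 45, 7), 1),
]


def feature_bounds(rows):
    """Return the (min, max) of each feature column."""
    return [(min(col), max(col)) for col in zip(*rows)]


def normalize(row, bounds):
    """Min-max scale a feature row using the training bounds."""
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, (lo, hi) in zip(row, bounds)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, epochs=500, lr=0.5):
    """Fit logistic regression with per-sample gradient descent."""
    bounds = feature_bounds([x for x, _ in data])
    weights = [0.0] * len(bounds)
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            xs = normalize(x, bounds)
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, xs)))
            err = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * v for w, v in zip(weights, xs)]
            bias -= lr * err
    return weights, bias, bounds


def debt_risk(features, model):
    """Estimated probability that a file is a debt hotspot."""
    weights, bias, bounds = model
    xs = normalize(features, bounds)
    return sigmoid(bias + sum(w * v for w, v in zip(weights, xs)))


def rank_hotspots(files, model):
    """Sort file names from highest to lowest estimated risk."""
    return sorted(files, key=lambda name: debt_risk(files[name], model), reverse=True)


if __name__ == "__main__":
    model = train(TRAIN)
    candidates = {"billing.py": (28, 42, 7), "utils.py": (2, 6, 0)}
    for name in rank_hotspots(candidates, model):
        print(name, round(debt_risk(candidates[name], model), 3))
```

In practice the features would come from version control history and static analysis reports rather than hand-typed tuples, and a library classifier with proper validation would likely replace the hand-rolled training loop.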
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.