A MICROSERVICES-BASED FRAMEWORK FOR DISTRIBUTED MACHINE LEARNING MODEL TRAINING AND DEPLOYMENT USING ARTIFICIAL INTELLIGENCE

Authors

Jason Sankai J
Independent Researcher, United Kingdom.

Keywords:

Microservices, Distributed Machine Learning, AI Orchestration, Model Deployment, Cloud Computing, MLOps

Synopsis

Purpose – This paper proposes a microservices-based framework for the efficient training and deployment of machine learning (ML) models in distributed environments, leveraging artificial intelligence (AI) to optimize model performance and scalability.

Design/methodology/approach – The framework utilizes containerized microservices to modularize ML tasks such as preprocessing, model training, validation, and deployment. These services communicate over APIs, enabling distributed execution across cloud-native infrastructures. AI-based orchestration selects optimal resources and configurations dynamically.
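The modular pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: each ML stage (preprocessing, training, validation, deployment) is modeled as an independent service with a uniform request/response contract, mimicking how containerized microservices would exchange JSON payloads over HTTP APIs. All service names, payload fields, and the toy "model" are assumptions made for the example.

```python
# Minimal sketch: each ML stage as an independent service with a uniform
# JSON-style contract, chained by a simple orchestrator. In a real system
# each call would be an HTTP request to a separately scaled container.
import json
from typing import Callable, Dict, List

# A "service" is any callable that takes and returns a JSON-serializable dict.
Service = Callable[[dict], dict]

def preprocess(payload: dict) -> dict:
    # Normalize raw feature values into the [0, 1] range.
    xs = payload["data"]
    lo, hi = min(xs), max(xs)
    scaled = [(x - lo) / (hi - lo) for x in xs] if hi > lo else [0.0] * len(xs)
    return {**payload, "data": scaled}

def train(payload: dict) -> dict:
    # Stand-in "training": fit the mean of the preprocessed data.
    data = payload["data"]
    return {**payload, "model": {"mean": sum(data) / len(data)}}

def validate(payload: dict) -> dict:
    # Accept the model if its fitted parameter lies within the scaled range.
    ok = 0.0 <= payload["model"]["mean"] <= 1.0
    return {**payload, "validated": ok}

def deploy(payload: dict) -> dict:
    return {**payload, "deployed": payload["validated"]}

# The orchestrator chains services by name, crossing an API-like boundary.
REGISTRY: Dict[str, Service] = {
    "preprocess": preprocess, "train": train,
    "validate": validate, "deploy": deploy,
}

def run_pipeline(stages: List[str], payload: dict) -> dict:
    for name in stages:
        # Round-trip through JSON to mimic serialization at the API boundary.
        payload = REGISTRY[name](json.loads(json.dumps(payload)))
    return payload

result = run_pipeline(["preprocess", "train", "validate", "deploy"],
                      {"data": [2.0, 4.0, 6.0]})
```

Because each stage only depends on the shared payload contract, any stage can be replaced, scaled, or redeployed independently of the others, which is the fault-isolation property the framework relies on.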

Findings – The microservices architecture enhances flexibility, fault isolation, scalability, and continuous integration/deployment (CI/CD) of ML pipelines. AI integration reduces resource waste by learning optimal configurations for task allocation. Experimental simulations demonstrate reduced model training time and improved inference throughput.
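The idea of an orchestrator that "learns optimal configurations for task allocation" can be illustrated with a simple epsilon-greedy bandit over candidate resource configurations. This is a hedged sketch only: the configuration names and the simulated training-time costs are invented for the example, and the paper's orchestrator is not specified at this level of detail.

```python
# Epsilon-greedy bandit sketch of AI-driven resource selection: the learner
# repeatedly picks a (hypothetical) resource configuration, observes a noisy
# training-time cost, and converges on the cheapest option.
import random

random.seed(0)  # fixed seed so the simulation is reproducible

CONFIGS = ["2cpu", "4cpu", "1gpu"]
# Simulated mean training times (lower is better); unknown to the learner.
TRUE_COST = {"2cpu": 9.0, "4cpu": 5.0, "1gpu": 2.0}

counts = {c: 0 for c in CONFIGS}
avg_cost = {c: 0.0 for c in CONFIGS}

def choose(eps: float = 0.1) -> str:
    # Explore until every config has been tried, and occasionally thereafter;
    # otherwise exploit the cheapest configuration observed so far.
    if random.random() < eps or not all(counts.values()):
        return random.choice(CONFIGS)
    return min(CONFIGS, key=lambda c: avg_cost[c])

for _ in range(500):
    c = choose()
    observed = TRUE_COST[c] + random.gauss(0, 0.5)  # noisy measurement
    counts[c] += 1
    # Incremental running average of the observed training time.
    avg_cost[c] += (observed - avg_cost[c]) / counts[c]

best = min(CONFIGS, key=lambda c: avg_cost[c])
```

After enough trials the orchestrator routes most work to the lowest-cost configuration, which is the mechanism behind the reduced resource waste reported above.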

Practical implications – This framework facilitates collaboration among ML engineers and DevOps teams by abstracting model development workflows into manageable services. It supports hybrid cloud deployment, GPU pooling, and federated learning scenarios.
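The federated learning scenario mentioned above can be sketched with federated averaging (FedAvg): clients train locally on their own data, and a coordinator service aggregates their weights in proportion to local dataset size. The client datasets and the linear model below are illustrative assumptions, not the paper's experimental setup.

```python
# FedAvg sketch: each client runs local gradient steps on a linear model
# y = w0 + w1*x, and the coordinator averages client weights proportionally
# to how many samples each client holds.
from typing import Dict, List

def local_update(weights: List[float], data: List[tuple], lr: float = 0.1,
                 epochs: int = 20) -> List[float]:
    # One client's local training: least-squares gradient steps.
    w0, w1 = weights
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in data:
            err = (w0 + w1 * x) - y
            g0 += err
            g1 += err * x
        n = len(data)
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
    return [w0, w1]

def fed_avg(client_weights: Dict[str, List[float]],
            client_sizes: Dict[str, int]) -> List[float]:
    # Weighted average of client models, proportional to local dataset size.
    total = sum(client_sizes.values())
    dims = len(next(iter(client_weights.values())))
    return [sum(client_weights[c][d] * client_sizes[c]
                for c in client_weights) / total
            for d in range(dims)]

# Two clients hold disjoint samples of the same underlying line y = 1 + 2x;
# raw data never leaves a client, only the fitted weights do.
clients = {"a": [(0.0, 1.0), (1.0, 3.0)], "b": [(2.0, 5.0), (3.0, 7.0)]}
local = {c: local_update([0.0, 0.0], data) for c, data in clients.items()}
global_weights = fed_avg(local, {c: len(data) for c, data in clients.items()})

# Deterministic check of the aggregation rule itself.
check = fed_avg({"a": [1.0, 2.0], "b": [3.0, 4.0]}, {"a": 1, "b": 3})
```

Only model weights cross the service boundary, which is what makes the pattern suitable for the hybrid-cloud and data-locality constraints the framework targets.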

Originality/value – This paper uniquely combines AI orchestration with microservices for end-to-end ML pipeline automation in distributed systems, demonstrating significant improvements in model lifecycle management.

Published

August 15, 2023