A Service-Oriented Approach to Real-Time Machine Learning Inference in Cloud-Native AI Systems

Authors

Susan Jane Doe
Senior Machine Learning Platform Engineer, Russian Federation

Keywords:

cloud-native, machine learning inference, real-time AI, service-oriented architecture, Kubernetes, microservices, MLOps, model serving

Synopsis

Purpose: This paper explores a service-oriented design paradigm for integrating real-time machine learning (ML) inference within cloud-native AI systems to address low-latency and high-scalability requirements.

Design/methodology/approach: A systematic review of architectures was conducted, drawing insights from 11 academic sources. We then present a two-layer architectural model that uses container orchestration and service-mesh technologies for efficient ML inference.

Findings: Service-oriented deployment enhances the modularity and observability of AI systems, improving throughput and fault tolerance in dynamic environments. Stateless microservices combined with model-serving frameworks such as TensorFlow Serving enable real-time inference at scale.
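The stateless pattern named in the findings can be sketched briefly: because each request carries its full input, any replica of a model server behind a load balancer can answer it. The helper below builds a request in the shape of TensorFlow Serving's documented REST predict API; the host and model name (`tf-serving.default.svc`, `fraud_model`) are hypothetical placeholders, not values from the paper.

```python
import json

def build_predict_request(host, model_name, instances):
    """Return (url, body) for a TensorFlow Serving REST predict call.

    TF Serving's REST API expects:
        POST /v1/models/<model_name>:predict
        body: {"instances": [...]}

    The call is stateless: the request contains everything needed to
    score it, so any replica behind a Kubernetes Service can serve it.
    """
    url = f"http://{host}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return url, body

# Example with placeholder cluster-internal host and model name.
url, body = build_predict_request(
    "tf-serving.default.svc:8501", "fraud_model", [[0.1, 0.4, 0.7]]
)
```

A sketch like this is where the modularity claim becomes concrete: the caller depends only on the model name and a service DNS name, while Kubernetes scales the replicas behind that name independently.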

Practical implications: Organizations can achieve flexible and cost-efficient AI deployment by leveraging cloud-native patterns. This model is particularly beneficial for streaming analytics, fintech, and IoT systems.

Originality/value: This paper synthesizes service-based AI deployment with Kubernetes-native orchestration, offering a reusable pattern for operationalizing ML at low latency in distributed environments.

Author Biography

Susan Jane Doe, Senior Machine Learning Platform Engineer, Russian Federation


References

(1) Basnet, R. K. (2019). The Influence of Serverless AI Models on Optimizing Computational Efficiency. ResearchGate. https://www.researchgate.net/publication/398570975

(2) Harun, H. (2019). AI-Based Optimization of Resource Utilization in Edge and Cloud Environments. AIJCST. https://aijcst.org/article/view/13

(3) Kumar, T. V. (2015). Cloud-Native Model Deployment for Financial Applications. PhilPapers. https://philpapers.org/rec/VARCMD

(4) Gummadi, V. P. K. (2019). Microservices architecture with APIs: Design, implementation, and MuleSoft integration. Journal of Electrical Systems, 15(4), 130–134. https://doi.org/10.52783/jes.9328

(5) Micheal, L. (2019). Event-Driven ML Pipelines Leveraging Serverless Computing. ResearchGate. https://www.researchgate.net/publication/394276538

(6) Mohapatra, A., & Sehgal, N. (2018). Scalable Deep Learning on Cloud Platforms. IJTMH. https://ijtmh.com/article/view/183

(7) Nafees, M., Hossain, A., Alam, T., & Anjum, N. (2019). Real-Time Data Engineering for AI-Driven Applications. ResearchGate. https://www.researchgate.net/publication/393744889

(8) Oloke, K. (2019). Designing Cloud-Native Risk Orchestration Layers for Real-Time Fraud Detection. ResearchGate. https://www.researchgate.net/publication/398805292

(9) Pentyala, D. K. (2019). Cloud-Centric Data Engineering for AI Data Quality. IJMC. https://yuktabpublisher.com/article/view/175

(10) Prosper, J. (2019). Optimizing Cloud-Native AI Architectures for Retail. ResearchGate. https://www.researchgate.net/publication/389735279

(11) Rahman, M., Mahbuba, T., Siddiqui, A., & Nowshin, S. (2019). Cloud-Native Data Architectures for Machine Learning. ResearchGate. https://www.researchgate.net/publication/393745076

Published

June 1, 2021