AN AI-ENABLED MICROSERVICES ARCHITECTURE FOR REAL-TIME MACHINE LEARNING INFERENCE IN CLOUD-NATIVE SYSTEMS
Keywords:
Microservices, Cloud-Native, Machine Learning Inference, Kubernetes, Real-Time AI, Model Serving, Scalability
Synopsis
Purpose: This paper proposes a cloud-native architecture that integrates AI-enabled microservices to facilitate real-time machine learning inference with scalable and efficient deployment mechanisms.
Design/methodology/approach: A hybrid microservices design was implemented, combining container orchestration (Kubernetes) with AI inference runtimes such as NVIDIA TensorRT and ONNX Runtime. Diagrams illustrate the end-to-end architecture, while tables compare latency and resource efficiency across deployments.
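To make the serving layer concrete, the sketch below shows one way such an inference microservice could look: a single HTTP endpoint wrapping an ONNX Runtime session, suitable for packaging into a container image and replicating behind a Kubernetes Service. The web framework (FastAPI), model path, and endpoint route are illustrative assumptions, not details taken from the paper.

    import numpy as np
    import onnxruntime as ort
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    # Load the ONNX model once at container start; path is an assumption.
    session = ort.InferenceSession("model.onnx")
    input_name = session.get_inputs()[0].name

    class PredictRequest(BaseModel):
        features: list[float]  # one flat feature vector per request

    @app.post("/predict")
    def predict(req: PredictRequest):
        # Shape the request into a (1, n) float32 batch and run inference.
        x = np.asarray(req.features, dtype=np.float32)[np.newaxis, :]
        outputs = session.run(None, {input_name: x})
        return {"prediction": outputs[0].tolist()}

Served with a standard ASGI server (e.g., uvicorn), replicas of this container can be scaled horizontally while the orchestrator handles routing, health checks, and restarts.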
Findings: Microservices-based deployment improves performance, reduces inference latency, and enables model versioning and rollback. Containerization further ensures portability across environments.
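As a hedged illustration of the versioning-and-rollback finding, the sketch below keeps several model versions loaded side by side and switches the served version atomically; rollback is simply re-promoting a previously registered version. The registry class, version labels, and file layout are hypothetical, not the paper's implementation.

    import onnxruntime as ort

    class ModelRegistry:
        """Holds several loaded model versions; one is marked active."""

        def __init__(self):
            self._sessions = {}  # version label -> ort.InferenceSession
            self._active = None

        def register(self, version, path):
            # Load up front so promotion and rollback are instantaneous.
            self._sessions[version] = ort.InferenceSession(path)

        def promote(self, version):
            if version not in self._sessions:
                raise KeyError(f"unknown model version: {version}")
            self._active = version

        def predict(self, feeds):
            # Serve with whichever version is currently active.
            return self._sessions[self._active].run(None, feeds)

    registry = ModelRegistry()
    registry.register("v1", "models/v1/model.onnx")
    registry.register("v2", "models/v2/model.onnx")
    registry.promote("v2")  # cut over to the new model
    registry.promote("v1")  # rollback: re-promote the previous version

In a Kubernetes setting the same effect is typically achieved with rolling updates and rollbacks at the Deployment level; the in-process registry above simply makes the mechanism explicit.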
Practical implications: The proposed architecture can be applied across IoT systems, autonomous monitoring, and financial analytics platforms where low-latency inference is critical.
Originality/value: The novelty lies in integrating AI inference serving into a microservices pipeline while satisfying real-time latency constraints and supporting elastic scaling in cloud-native environments.
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.