Benchmarking AutoML Frameworks for Scalable and Reproducible Data Science Workflows
Keywords:
AutoML, Reproducibility, Scalability, Benchmarking, Data Science Workflow, Model Selection, Hyperparameter Tuning

Synopsis
Automated Machine Learning (AutoML) frameworks have transformed the landscape of data science by automating model selection, hyperparameter tuning, and pipeline optimization. However, the proliferation of AutoML tools raises critical questions about their scalability, reproducibility, and performance across diverse datasets and computing environments. This study benchmarks four widely used AutoML frameworks—Auto-sklearn, TPOT, H2O AutoML, and Google Cloud AutoML—on standardized datasets and distributed infrastructure. We evaluate each framework on predictive accuracy, runtime efficiency, scalability as dataset size grows, and reproducibility of outputs. The findings highlight nuanced trade-offs: cloud-based frameworks excel in scalability, while open-source tools offer stronger reproducibility. This paper contributes to best practices for selecting and deploying AutoML frameworks in robust, transparent, and production-ready machine learning workflows.
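To illustrate the kind of evaluation protocol described above, the sketch below shows a minimal Python harness that records test accuracy, wall-clock runtime, and seed-level reproducibility for any scikit-learn-compatible estimator. This is an illustrative sketch, not the paper's actual benchmarking code: the dataset, time measurement, and reproducibility criterion are assumptions made for the example. A plain RandomForestClassifier stands in so the snippet runs with scikit-learn alone; Auto-sklearn's AutoSklearnClassifier and TPOT's TPOTClassifier expose the same fit/predict interface and could be substituted.

```python
# Minimal benchmarking harness sketch (illustrative, not the study's protocol).
import time
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def benchmark(make_estimator, X, y, seed=0):
    """Return (accuracy, wall-clock seconds, predictions) for one run."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    model = make_estimator(seed)
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    runtime = time.perf_counter() - start
    preds = model.predict(X_te)
    return accuracy_score(y_te, preds), runtime, preds

X, y = load_breast_cancer(return_X_y=True)
# Stand-in estimator; an AutoML framework with a scikit-learn interface
# (e.g. AutoSklearnClassifier or TPOTClassifier) could be swapped in here.
make = lambda s: RandomForestClassifier(random_state=s)

acc1, t1, p1 = benchmark(make, X, y, seed=0)
acc2, t2, p2 = benchmark(make, X, y, seed=0)  # repeat with an identical seed
print(f"accuracy={acc1:.4f}  runtime={t1:.2f}s")
# Reproducibility check: identical seeds should yield identical predictions.
print("bitwise-reproducible:", np.array_equal(p1, p2))
```

Fixing the seed end to end (data split plus estimator) is what makes the repeated-run comparison meaningful; a framework whose second run diverges under identical seeds fails the reproducibility criterion regardless of its accuracy.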
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.