ROBUSTNESS AND FAIRNESS EVALUATION OF MACHINE LEARNING MODELS IN HIGH-STAKES DECISION SYSTEMS

Authors

Nguyen Leblanc Chen
Research Scientist – Ethical AI, USA.

Keywords:

Machine Learning, Fairness, Robustness, High-Stakes Decision Systems, Evaluation Metrics, Algorithmic Bias

Synopsis

Purpose: This paper examines the importance of evaluating both robustness and fairness in machine learning (ML) models deployed in high-stakes decision environments such as healthcare, criminal justice, and finance.

Design/methodology/approach: We review theoretical and empirical studies on robustness and fairness, categorize key evaluation methods, and compare metrics and testing frameworks.

Findings: Results indicate that while many fairness techniques reduce disparate impact, robustness to distributional shifts and adversarial perturbations remains a critical gap in current evaluations. Fairness interventions can inadvertently affect robustness, highlighting the need for joint evaluation frameworks.

Practical implications: The paper offers guidance for practitioners on selecting appropriate evaluation metrics and testing protocols for high-stakes ML deployments.

Originality/value: This work systematically contrasts robustness and fairness assessments, highlighting trade-offs and proposing integrated evaluation recommendations.
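To make the joint-evaluation idea concrete, the sketch below computes one common fairness metric (the disparate impact ratio, i.e., the ratio of positive-decision rates across groups) alongside a simple robustness proxy (the fraction of decisions that flip under a small score perturbation) for a toy threshold classifier. This is an illustrative assumption on our part, not a method from the paper; the function names, the 0.5 decision threshold, and the eps perturbation size are all hypothetical choices.

```python
def predict(scores, threshold=0.5):
    """Binary decisions from model scores."""
    return [1 if s >= threshold else 0 for s in scores]

def disparate_impact_ratio(preds, groups):
    """Ratio of positive-decision rates: lowest-rate group over
    highest-rate group. Values near 1.0 indicate parity; the common
    'four-fifths rule' flags ratios below 0.8."""
    rates = {}
    for g in set(groups):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        rates[g] = sum(preds[i] for i in idx) / len(idx)
    return min(rates.values()) / max(rates.values())

def flip_rate(scores, eps=0.05, threshold=0.5):
    """Fraction of decisions that flip under a worst-case +/- eps
    perturbation of the score: a crude proxy for local robustness."""
    base = predict(scores, threshold)
    flipped = 0
    for s, b in zip(scores, base):
        lo = 1 if s - eps >= threshold else 0
        hi = 1 if s + eps >= threshold else 0
        if lo != b or hi != b:
            flipped += 1
    return flipped / len(scores)

# Toy data: model scores and a binary group attribute.
scores = [0.9, 0.52, 0.48, 0.2, 0.7, 0.55, 0.45, 0.3]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
preds = predict(scores)
print("disparate impact ratio:", disparate_impact_ratio(preds, groups))
print("decision flip rate:    ", flip_rate(scores, eps=0.05))
```

Reporting the two numbers side by side is the point: a fairness intervention that shifts the decision threshold can look good on the parity metric while moving more individuals close to the boundary, which shows up as a higher flip rate.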



Published

February 18, 2025