Detecting and Mitigating Algorithmic Bias in Machine Learning Models Trained on Imbalanced Datasets
Keywords:
algorithmic bias, imbalanced datasets, fairness, machine learning, resampling, mitigation
Synopsis
The prevalence of algorithmic bias in machine learning (ML) systems trained on imbalanced datasets poses significant ethical and technical challenges. Bias can propagate existing societal inequities, undermine fairness, and lead to discriminatory outcomes, especially in high-stakes domains such as hiring, lending, and healthcare. This short paper surveys detection and mitigation techniques for algorithmic bias arising from class imbalance, presents an empirical analysis on benchmark datasets, and discusses effective mitigation strategies. Our study demonstrates that combining resampling methods with fairness-aware learning frameworks significantly reduces bias while maintaining predictive performance.
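To make the two ingredients named above concrete, the sketch below implements a minimal pre-processing resampler (random oversampling of minority classes) and the demographic-parity-difference fairness metric in plain NumPy. This is an illustrative sketch only: the function names are ours, and random oversampling stands in for the more sophisticated resampling methods (e.g. SMOTE) discussed in the literature.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates between two groups (0 and 1).

    A value of 0 means both groups receive positive predictions
    at the same rate; larger values indicate more disparity.
    """
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rates = [y_pred[group == g].mean() for g in (0, 1)]
    return abs(rates[0] - rates[1])

def random_oversample(X, y, rng=None):
    """Duplicate rows of each class until all classes match the majority count."""
    rng = np.random.default_rng(rng)
    X, y = np.asarray(X), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        # Sample with replacement so minority classes reach n_max rows.
        idx.append(rng.choice(c_idx, size=n_max, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]
```

In a typical pipeline, one would resample the training split with `random_oversample`, fit any classifier on the balanced data, and then audit its held-out predictions with `demographic_parity_difference` per sensitive group.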
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.