Investigating the Role of Data Augmentation Strategies in Improving Robustness of Vision Transformers
Keywords:
Vision Transformers, Model Robustness, Data Augmentation, Computer Vision, Deep Learning, RandAugment, Adversarial Noise

Synopsis
Vision Transformers (ViTs) have achieved remarkable performance across various visual recognition tasks. However, their robustness against real-world variability remains a challenge. This paper investigates the influence of diverse data augmentation techniques on the performance and robustness of ViTs. Through empirical analysis across benchmark datasets and augmentation strategies such as RandAugment, CutMix, and adversarial perturbations, we highlight their role in improving generalization. Results show that tailored augmentation pipelines can significantly improve the robustness of ViTs without additional architectural changes.
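As an illustration of one of the augmentation strategies named above, the following is a minimal NumPy sketch of the CutMix operation (Yun et al., 2019): a random box from one image is pasted into another, and the labels are mixed in proportion to the pasted area. The function name, signature, and defaults here are illustrative, not the paper's implementation; in practice one would use a library routine such as the CutMix transform shipped with modern augmentation toolkits.

```python
import numpy as np

def cutmix(img_a, img_b, label_a, label_b, alpha=1.0, rng=None):
    """Sketch of CutMix: paste a random box from img_b into img_a.

    Labels (one-hot or soft) are mixed by the exact pasted-area ratio.
    """
    rng = rng or np.random.default_rng()
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)                     # sampled mixing ratio
    cut_h = int(h * np.sqrt(1.0 - lam))              # box size so that
    cut_w = int(w * np.sqrt(1.0 - lam))              # area ratio ~ (1 - lam)
    cy, cx = rng.integers(h), rng.integers(w)        # random box center
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]        # paste the box
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)  # exact area after clipping
    mixed_label = lam_adj * label_a + (1.0 - lam_adj) * label_b
    return mixed, mixed_label
```

Because the labels are mixed by the clipped box's true area rather than the sampled ratio, the target stays consistent with the pixels the model actually sees.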
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.