DATA AUGMENTATION STRATEGIES FOR ENHANCING MODEL GENERALIZATION IN LOW-RESOURCE DOMAINS

Authors

Aiko Mexi Tanaka

Independent Researcher, Australia.

Keywords:

Data Augmentation, Model Generalization, Low-Resource Domains, Transfer Learning, Synthetic Data

Synopsis

Purpose: This paper examines how data augmentation strategies improve model generalization in low-resource domains, where labeled data is scarce. It synthesizes findings from prior research and proposes a structured framework for selecting augmentation methods.

Design/methodology/approach: Through a qualitative synthesis of ten original studies published before 2020, we identify augmentation techniques, categorize them, and evaluate their effectiveness across tasks. Two conceptual diagrams illustrate the augmentation taxonomy and workflow, and two tables summarize strategies and outcomes.

Findings: Augmentation consistently improves generalization, especially when combined with task-aware transformations and fine-tuning. Synthetic sampling, perturbations, and generative modeling all show robust performance gains, with domain-specific customization yielding the best results.

Practical implications: Practitioners in low-resource settings can use the taxonomy to select augmentation methods, balancing computational cost against performance gains. Implementation guides aid transfer to real-world tasks.

Originality/value: This paper synthesizes evidence across tasks and modalities and proposes an integrated framework to guide augmentation selection for better generalization in low-resource scenarios.
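As one illustration of the perturbation-style strategies named in the Findings, the following is a minimal Python sketch. It is not the paper's own method. It assumes whitespace-tokenized text and perturbs it with random token swaps and deletions. The function names (`random_swap`, `random_deletion`, `augment`) and the default parameter values are hypothetical choices made for this example.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # Never return an empty example: fall back to one randomly chosen token.
    return kept if kept else [rng.choice(tokens)]


def augment(sentence, n_copies=4, p_delete=0.1, n_swaps=1, seed=0):
    """Generate n_copies perturbed variants of one labeled sentence.

    The defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    tokens = sentence.split()
    variants = []
    for _ in range(n_copies):
        perturbed = random_swap(tokens, n_swaps, rng)
        perturbed = random_deletion(perturbed, p_delete, rng)
        variants.append(" ".join(perturbed))
    return variants


if __name__ == "__main__":
    for variant in augment("the lesion appears benign on the scan"):
        print(variant)
```

Each variant would keep the label of its source sentence, so a small labeled set can be expanded several-fold. Whether this helps depends on the task, since deleting or reordering tokens can change the meaning in some domains.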
