ADAPTIVE DOCUMENT CLASSIFICATION USING HYBRID NLP AND COMPUTER VISION MODELS IN ENTERPRISE WORKFLOWS

Authors

Carter Rachel Thompson

Amazon Web Services (AWS), United States.

Keywords:

Hybrid Document Classification, NLP, Computer Vision, Multimodal Learning, Enterprise Workflows

Synopsis

Purpose: This paper examines how hybrid models that combine Natural Language Processing (NLP) and Computer Vision (CV) enable adaptive document classification in enterprise workflows, improving accuracy and robustness across diverse document types.

Design/methodology/approach: We propose a multimodal classification framework that integrates text features extracted by an NLP network with visual features extracted by a CV network. The framework is evaluated against traditional unimodal classification approaches.

Findings: The hybrid approach achieves higher classification performance than single-modality methods and adapts better to varying document structures.

Practical implications: Enterprises can enhance document-processing automation pipelines, such as indexing, routing, and retrieval, by deploying hybrid models.

Originality/value: This study synthesizes prior research and proposes a practical architecture for real-world enterprise classification tasks, demonstrating the benefits of multimodality.
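To make the idea of integrating text and visual features concrete, the following minimal sketch shows one possible form of feature-level fusion. It is an illustration under stated assumptions, not the framework evaluated in this paper: the synopsis does not specify the encoders or the fusion strategy, so the toy encoders, the concatenation step, the linear softmax classifier, and every name and weight below (`text_features`, `visual_features`, `fuse`, `classify`, `VOCAB`, `WEIGHTS`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def text_features(text: str, vocabulary: Sequence[str]) -> List[float]:
    """Toy stand-in for an NLP encoder: normalized term counts (assumption)."""
    tokens = text.lower().split()
    total = max(len(tokens), 1)
    return [tokens.count(term) / total for term in vocabulary]


def visual_features(page: Sequence[Sequence[float]]) -> List[float]:
    """Toy stand-in for a CV encoder (assumption).

    Returns the mean pixel value of the top and bottom halves of a
    grayscale page whose values lie in [0, 1].
    """
    half = len(page) // 2

    def mean(rows: Sequence[Sequence[float]]) -> float:
        cells = [value for row in rows for value in row]
        return sum(cells) / len(cells) if cells else 0.0

    return [mean(page[:half]), mean(page[half:])]


def fuse(text_vec: Sequence[float], visual_vec: Sequence[float]) -> List[float]:
    """Feature-level fusion by simple concatenation (assumed strategy)."""
    return list(text_vec) + list(visual_vec)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw class scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def classify(fused: Sequence[float], weights: Dict[str, List[float]]) -> Dict[str, float]:
    """Linear classifier over the fused vector; returns class probabilities."""
    labels = list(weights)
    scores = [sum(w * x for w, x in zip(weights[label], fused)) for label in labels]
    return dict(zip(labels, softmax(scores)))


# Hypothetical vocabulary and hand-picked weights, for illustration only.
VOCAB = ["invoice", "total", "dear", "sincerely"]
WEIGHTS = {
    "invoice": [4.0, 4.0, -2.0, -2.0, 0.5, 1.0],
    "letter": [-2.0, -2.0, 4.0, 4.0, 1.0, 0.0],
}

page = [[0.1, 0.2], [0.1, 0.2], [0.8, 0.9], [0.7, 0.6]]
fused = fuse(text_features("Invoice total due", VOCAB), visual_features(page))
probabilities = classify(fused, WEIGHTS)
```

In a realistic pipeline the two toy encoders would be replaced by trained NLP and CV networks and the weights would be learned from labeled documents. Concatenation is only one option; attention-based or late (score-level) fusion are common alternatives.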

Published

February 6, 2026

Series

Research paper

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.