AUTOMATING INVOICE PROCESSING THROUGH INTELLIGENT OPTICAL CHARACTER RECOGNITION AND ENTITY EXTRACTION
Keywords:
Automated Invoice Processing, OCR, Entity Extraction, Machine Learning, Document Understanding
Synopsis
Purpose: This paper investigates how intelligent Optical Character Recognition (OCR) combined with entity extraction techniques can automate invoice processing to improve accuracy and efficiency.
Design/methodology/approach: We propose a hybrid framework integrating OCR and structured data extraction using machine learning and natural language processing (NLP).
Findings: Results indicate that automated processing significantly reduces manual input, minimizes errors, and accelerates invoice handling.
Practical implications: The framework enables businesses to adopt scalable, touchless invoice processing workflows that reduce operational costs and improve financial data quality.
Originality/value: While existing systems focus on simple OCR, this work integrates entity extraction and context‑aware parsing optimized for real‑world invoice formats.
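To make the entity-extraction step concrete, the following is a minimal sketch of pulling a few fields out of text that an OCR engine has already produced. It is not the framework described above: it substitutes simple rule-based regular expressions for the machine learning and NLP components, and the names `InvoiceFields` and `extract_invoice_fields`, the field set, and the label patterns are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class InvoiceFields:
    """Hypothetical container for the fields this sketch extracts."""
    invoice_number: Optional[str]
    invoice_date: Optional[str]
    total: Optional[float]


# Assumed label patterns; real invoices vary widely and would need more rules
# or a learned model, as the abstract suggests.
_INVOICE_NO = re.compile(
    r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
)
_ISO_DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
# The leading \b keeps "Subtotal" from being read as the invoice total.
_TOTAL = re.compile(
    r"\btotal\s*(?:due)?\s*[:\-]?\s*[$€£]?\s*([\d,]+\.\d{2})", re.IGNORECASE
)


def extract_invoice_fields(ocr_text: str) -> InvoiceFields:
    """Extract a few entities from OCR output; missing fields become None."""
    number = _INVOICE_NO.search(ocr_text)
    date = _ISO_DATE.search(ocr_text)
    total = _TOTAL.search(ocr_text)
    return InvoiceFields(
        invoice_number=number.group(1) if number else None,
        invoice_date=date.group(1) if date else None,
        # Strip thousands separators before converting to a number.
        total=float(total.group(1).replace(",", "")) if total else None,
    )


if __name__ == "__main__":
    sample = (
        "ACME Supplies Ltd\n"
        "Invoice No: INV-2024-0042\n"
        "Date: 2024-03-15\n"
        "Subtotal: 1,100.00\n"
        "Total: $1,234.50\n"
    )
    print(extract_invoice_fields(sample))
```

Returning `None` for a field that was not found, rather than guessing, lets a downstream step route incomplete invoices to manual review. In a fuller pipeline the regular expressions would presumably be replaced or supplemented by a trained extraction model that also uses layout context.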
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.