The next evolution of intelligent document processing lies in combining traditional discriminative vision systems with the contextual understanding of vision‑language models (VLMs) fine‑tuned on domain‑specific document data. As general‑purpose models, VLMs offer strong zero‑shot capabilities and semantic understanding, but their limited precision, high resource demands, and the data‑privacy concerns of relying on external model services make them unsuitable as stand‑alone replacements for production pipelines. A hybrid approach instead integrates VLMs alongside an optimized document pipeline comprising pre‑processing, layout analysis, OCR and handwriting recognition, document classification, and field‑level semantic extraction. In this talk, we explore the design of an intelligent document processing system built on this hybrid approach, balancing the determinism and efficiency of classical pipelines with the multimodal understanding of VLMs. We show how it enables document systems that are accurate, configurable by document family, resource‑optimized, and adaptable to enterprise and on‑premise environments.
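One way the hybrid routing described above can be realized is a confidence‑gated fallback: classical per‑field extractors run first, and only fields below a confidence threshold are escalated to the VLM. The sketch below is a minimal, hypothetical illustration of that pattern; the field names, the `FieldResult` type, the 0.85 threshold, and the stubbed `vlm_extract` callable are all assumptions, not part of any specific system.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class FieldResult:
    value: str
    confidence: float
    source: str  # "classical" or "vlm"

def hybrid_extract(
    page_text: str,
    extractors: Dict[str, Callable[[str], FieldResult]],  # classical per-field extractors
    vlm_extract: Callable[[str, str], str],               # fallback: (page_text, field_name) -> value
    threshold: float = 0.85,                              # assumed confidence gate
) -> Dict[str, FieldResult]:
    """Run classical extractors first; route only low-confidence fields to the VLM."""
    results: Dict[str, FieldResult] = {}
    for name, extractor in extractors.items():
        result = extractor(page_text)
        if result.confidence < threshold:
            # Escalate this field alone to the slower, costlier VLM path.
            result = FieldResult(vlm_extract(page_text, name), 0.9, "vlm")
        results[name] = result
    return results

# Toy demo with stubbed extractors (all values hypothetical)
def invoice_no(text: str) -> FieldResult:
    m = re.search(r"Invoice\s+No\.\s*(\S+)", text)
    return FieldResult(m.group(1) if m else "", 0.95 if m else 0.0, "classical")

def total(text: str) -> FieldResult:
    return FieldResult("", 0.2, "classical")  # simulate a weak classical extractor

out = hybrid_extract(
    "Invoice No. INV-042  Total: 99.00 EUR",
    {"invoice_no": invoice_no, "total": total},
    vlm_extract=lambda text, field: "99.00 EUR",  # stand-in for a fine-tuned VLM call
)
print(out["invoice_no"].source, out["total"].source)  # → classical vlm
```

Gating per field, rather than per document, keeps VLM calls to the minimum needed, which matters for the resource and on‑premise constraints the talk addresses.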

