Date: 2026-02-09

Bengaluru-based startup Sarvam AI has been making waves in the global artificial intelligence (AI) community with its latest releases, Sarvam Vision and Bulbul V3. Sarvam Vision has reportedly outperformed global heavyweights such as Google Gemini and ChatGPT in key areas of optical character recognition (OCR).

In a post on X (formerly Twitter), co-founder Pratyush Kumar claimed that Sarvam Vision achieved 84.3% accuracy on olmOCR-Bench, surpassing Gemini 3 Pro and DeepSeek OCR v2, and 93.28% on OmniDocBench v1.5.

Bulbul V3, the company's text-to-speech model, supports 35 voices. The sample set is distributed across the 22 official Indian languages and covers material from 1800 to the present, with scans and content of varying quality. "On Indian languages, Sarvam Vision is the best model by far, while supporting all 22 scheduled Indian languages," Kumar claimed.

The series includes a 3B-parameter state-space vision-language model capable of visual understanding tasks such as image captioning, scene text recognition, chart interpretation, and complex table parsing.

Sarvam AI is a "sovereign" AI

The official website stated that the company aims to build a future where AI is widely accessible to everyone in India. "We want India to embrace the most important technological shift of our time with confidence and control. Our ambition is to build foundational components and apply them to the country's unique needs," the company wrote on its website.

Sarvam AI's success marks a significant milestone in India's AI journey, showcasing the country's potential in core AI innovation. The startup's focus on India-specific challenges has earned recognition from global experts, including tech commentator Deedy Das, who acknowledged the value of Sarvam's OCR and speech models for Indian languages.

"I was wrong about Sarvam," Das wrote on X.

"When I wrote about them a year ago, I felt like the direction to train small "indic" language models was wrong. But boy, have they turned it around. They have the best text-to-speech, speech-to text, and OCR models for Indic languages, and that's actually really valuable. The pricing is very reasonable. And the website is not only beautifully designed but dirt easy to use."

"They're filling a well needed gap in the ecosystem and doing things big labs will probably never focus on to the fullest extent (at least in the short term). I don't know anything about the business, but there's a lot to appreciate about what they've build technologically and I can't remember the last time I felt that way about software products coming out of India. Well done."