Amid a global boom in the artificial intelligence sector, characterised by intense investment in infrastructure, Bengaluru-based startup Sarvam AI has emerged as a major competitor to OpenAI's ChatGPT and Google Gemini.

Founded by Vivek Raghavan and Pratyush Kumar in August 2023, Sarvam AI has recently been in the spotlight in the AI community across the globe, especially for its two new innovations: Sarvam Vision and Bulbul V3.

In an X post, Kumar stated that Sarvam Vision is competitive with the best results in digitisation in English. He added that it defines a "significantly higher bar for Indian languages."

While sharing a graph, Kumar claimed that Sarvam Vision has achieved a state-of-the-art accuracy of 84.3 per cent on the English-only subset of olmOCR-Bench. It has outperformed several frontier models such as Gemini 3 Pro, along with recent OCR models such as DeepSeek OCR 2.

Similarly, it has achieved an overall score of 93.28 per cent on OmniDocBench v1.5 (English-only subset) and excelled in "complex formulas and layout parsing and being within touching distance of the current state of the art," Kumar noted.

He called it the "best model by far" as it supports "all 22 scheduled Indian languages".

The AI model is said to perform better than several global models, including Google Gemini and ChatGPT, on many optical character recognition (OCR) tasks.

Introduced earlier this month, it is a three-billion-parameter state-space vision-language model that is capable of performing a wide range of visual understanding tasks, such as image captioning, scene text recognition, chart interpretation and complex table parsing.

Beyond documents, Sarvam Vision is also capable of general natural scene understanding: it can perceive a real-world scene and describe it.

Another major highlight is Bulbul V3, a text-to-speech model. It has been designed to deliver "natural, expressive and production-ready voices for Indian languages," the company said in a February 5 blog post.

It supports more than 35 voices across 11 Indian languages. It will soon expand to support 22 Indian languages.

The model delivered the highest listener preference and low error rates across use-cases and languages in an independent third-party human listening study, Kumar wrote on X.

In the blind study, listeners compared the model with ElevenLabs (v3 alpha and v2.5 flash) and Cartesia Sonic-3. The model topped the scores for 8 kHz audio and set a new benchmark for speech synthesis for voice agents.

On its official website, Sarvam AI states that the company aims to build a future where AI is widely accessible to everyone in India. "We want India to embrace the most important technological shift of our time with confidence and control. Our ambition is to build foundational components and apply them to the country's unique needs," it said.