- The AI automatically generates lyrics and lets users control genre, tempo, mood, and vocal style
- Lyria 3 analyses photos and videos to create music reflecting the mood or story in the input
- Tracks include AI-generated cover art and can be shared or downloaded via the Gemini app
When Google first introduced AI tools for text, images and video, the focus was largely on productivity and visual creativity. Now, the company is pushing deeper into another frontier: music. Its latest model, Lyria 3, signals how quickly generative AI is expanding into new forms of human expression. But rather than composing full albums or orchestral scores, as some might expect, Lyria 3 is currently designed for something more immediate: short, personalised music creation.
What Is Lyria 3?
According to a Google blog post, Lyria 3 is the newest generative music model from Google DeepMind and is rolling out in beta inside the Gemini app. The system lets users create 30-second music tracks simply by describing an idea in text or by uploading an image or video for inspiration.
The post gives an example prompt: "A comical R&B slow jam about a sock finding its match."
Within seconds, the AI generates a short track complete with vocals, instrumentation and lyrics.
The goal is not to produce studio-ready songs but to enable fast, creative self-expression, something closer to musical messaging than professional composition.
Key Features Explained
Google says Lyria 3 improves significantly over earlier versions of its music models in three major ways:
1. Automatic Lyrics Generation: Users no longer need to provide lyrics. Gemini writes them automatically based on the prompt, matching the theme, tone and style requested.
2. Greater Creative Control: Users can guide multiple musical elements, including:
- Genre (pop, R&B, afrobeat, electronic, etc.)
- Tempo and energy level
- Vocal style
- Mood and narrative
This makes the interaction feel more collaborative than earlier AI music tools.
3. More Realistic and Complex Audio: Google claims improvements in musical layering and sound quality, producing tracks that feel more coherent and polished despite the short duration.
Music From Photos and Videos
One of the more distinctive capabilities is multimodal generation.
Users can upload:
- A photo
- A short video
- Personal memories or references
The AI then analyses the content and creates a track with lyrics reflecting the mood or story.
For instance, uploading images of a pet hiking could result in a custom song about that experience.
This reflects a broader industry shift toward AI systems that combine multiple types of input - text, visuals and audio - into a single creative workflow.
Tracks generated through the Gemini app are limited to about 30 seconds and come with cover art created by Nano Banana, Google's AI image-generation model. Users can download the tracks or share them directly via a link.
Google's positioning is clear: This is meant to be fun, fast and social, not necessarily a replacement for professional music production.
That design choice mirrors how short-form video transformed content creation, where tools prioritised speed and accessibility over cinematic quality.
Integration With YouTube Creators
Lyria 3 is also being integrated into YouTube through Dream Track, a feature that helps creators generate custom soundtracks for Shorts.
Initially launched in the United States and expanding to more regions, the technology allows creators to produce:
- Short lyrical segments
- Background music
- Personalised audio themes
For short-form creators, music is often a critical part of engagement, and AI-generated soundtracks could reduce dependence on licensed audio libraries.
Why Lyria 3 Matters in the AI Race
While a 30-second music generator may seem modest compared to large language models, it reflects a deeper trend: AI systems are rapidly becoming multimodal creative engines.
Companies such as OpenAI and Google are competing to build platforms that can generate text, images, video and audio within a single interface. The pace of releases has accelerated dramatically, with major upgrades appearing every few months.
Three broader shifts define the current AI landscape:
- Creativity at scale: AI moving beyond productivity into entertainment and art
- Multimodal interaction: Combining text, visuals and audio seamlessly
- Consumer accessibility: Advanced tools reaching everyday users, not just professionals
Music generation is particularly complex because it involves timing, structure and emotional nuance.
Lyria 3 may not yet compose full symphonies, but the speed at which these capabilities are arriving suggests one thing clearly: The AI evolution curve is still climbing.
Track Latest News Live on NDTV.com and get news updates from India and around the world