Virasat: giving Goa's heritage a voice with Gemini and Gemma

The ruined bell tower of the Church of St. Augustine in Old Goa

Goa has some of the oldest churches in India and the largest surviving collection of Portuguese-colonial architecture in Asia. Walk through Old Goa and you are standing inside 16th-century history. But here is the quiet problem: most people walking past a 400-year-old facade — locals included — have no idea what it is, who built it, or why it matters. The buildings survive; the stories don't.

The same is true in people's homes. Almost every Goan family has a tin of old photographs — a wedding outside a village chapel, a grandfather in a borrowed suit, a feast-day procession — and within a generation, nobody is left who can say where or when they were taken.

So over a weekend I built Virasat (Hindi/Urdu for "heritage") to see how much of that gap free-tier AI models could close.

What it does

You drop in any photo of Goan or Indian heritage — a church, a fort, a monument, an old portrait — and Virasat does three things:

Identifies it — names the subject when it can confidently recognise it, and describes the architectural style and likely era when it cannot, rather than making things up.
Tells its story — a short, spoken-style history: who built it, when, and what it meant.
Reads it aloud — narrates that story in a warm voice, so a heritage photo becomes something you listen to, not just read.

No account, nothing stored on a server. It's meant to feel like standing next to someone who knows the history and is happy to tell you.

A real example: the Church of St. Augustine

This isn't a mockup. I took a real photograph of the ruined bell tower of the Church of St. Augustine in Old Goa and ran it through Virasat untouched. Here's what it produced.

Gemma 4 identified it as: Church of St. Augustine — Old Goa, India, 16th century. Correct.

Gemini 3.5 Flash wrote this story:

Step back in time with me as we gaze upon the majestic ruins of Saint Augustine's Tower in Old Goa. Completed in 1602 by Augustinian friars, this striking forty-six-meter-high laterite belfry is all that remains of what was once one of the grandest churches in the entire Iberian world. Standing proudly against the Goan sky, it survived the abandonment of the monastery in the nineteenth century and the dramatic collapse of its massive vaults. Today, this UNESCO World Heritage monument whispers stories of Goa's golden age, acting as a silent, beautiful sentinel of history, faith, and the passage of time.

Every fact there checks out — the 1602 completion, the Augustinian friars, the 46-metre laterite tower, the collapsed vaults, the UNESCO status.

And here is the actual audio Virasat generated — Gemini text-to-speech reading that story aloud:

How it works — built on Google AI

The whole thing runs on Google's models, called through the Gemini API in Google AI Studio, and deliberately uses three different models, each for what it's best at:

Gemini 3.5 Flash looks at the photo and writes the narration — the main story plus a short title. Its prose is clean and warm, and it's careful to describe rather than fabricate when unsure.
Gemma 4, Google's open model, runs a second, independent identification pass — a one-line "what and where" — so the result isn't resting on a single model's opinion.
Gemini 2.5 Flash text-to-speech reads the story aloud. It returns raw 16-bit PCM audio, which the server wraps in a WAV header so it plays straight in the browser.
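The WAV-wrapping step mentioned above can be sketched roughly as follows. This is a minimal illustration, not Virasat's actual code: the function name pcmToWav is hypothetical, and the 24 kHz mono defaults are an assumption about the TTS output that should be checked against the MIME type returned with the audio.

```typescript
// Wrap raw little-endian 16-bit PCM samples in a minimal 44-byte WAV (RIFF) header.
// ASSUMPTION: 24 kHz, mono. Verify against the MIME type the API returns.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate: number = 24000,
  channels: number = 1,
): Uint8Array {
  const bitsPerSample = 16;
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second

  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);

  // Write a short ASCII tag such as "RIFF" at the given byte offset.
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // total file size minus 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // size of the fmt chunk
  view.setUint16(20, 1, true);              // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true);     // size of the sample data
  out.set(pcm, 44);                         // samples follow the header unchanged
  return out;
}
```

A server route could then return the result with an audio/wav content type, which browsers can play directly in an audio element.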

All three calls run in parallel from one server route, so the story, the identification and the audio arrive together. The front end is Next.js on Vercel — lightweight on purpose, because the interesting part is the models.
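One way to fan out the three calls from a single route is sketched below. It is an illustration under assumptions, not the app's real handler: runAll, Task and HeritageResponse are hypothetical names, and each model call is treated as an opaque async function because the post doesn't show how the calls are wired.

```typescript
// A model call, represented as a function that returns a promise when invoked.
type Task<T> = () => Promise<T>;

// Hypothetical response shape; null marks a call that failed.
interface HeritageResponse {
  story: string | null;
  identification: string | null;
  audio: Uint8Array | null;
}

// Start all three calls at once and wait for all of them to finish.
// Each call gets its own catch, so one failure does not discard the other results.
async function runAll(
  narrate: Task<string>,
  identify: Task<string>,
  speak: Task<Uint8Array>,
): Promise<HeritageResponse> {
  const [story, identification, audio] = await Promise.all([
    narrate().catch(() => null),
    identify().catch(() => null),
    speak().catch(() => null),
  ]);
  return { story, identification, audio };
}
```

Catching each failure separately is a design choice in this sketch: if the second-opinion identification errors out, the visitor can still get the story and the audio.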

A couple of things I learned the hard way: Gemma loves to leak its chain-of-thought ("1. Analyze the request…"), so the server strips the response down to the substantive line. And Gemini's image-generation models, which I'd originally wanted for photo restoration, are not available on the free tier. Rather than gate the app behind billing, I built it entirely on the free models and leaned into what they do brilliantly: understanding an image, and speaking.
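The chain-of-thought cleanup could look something like the sketch below. The patterns and the name extractAnswerLine are assumptions made for illustration; the post doesn't describe the actual filter, and a real one would need tuning against the model's real output.

```typescript
// ASSUMPTION: lines matching these patterns are reasoning, not the answer.
const REASONING_PATTERNS: RegExp[] = [
  /^\d+[.)]\s/,                                 // numbered steps, e.g. "1. Analyze the request"
  /^[-*•]\s/,                                   // bullet-point notes
  /^(thinking|analysis|reasoning|step \d+)\b/i, // explicit reasoning labels
];

// Return the last line that does not look like leaked reasoning.
// Falls back to the last non-empty line, or "" if the input is blank.
function extractAnswerLine(raw: string): string {
  const lines = raw
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const kept = lines.filter(
    (line) => !REASONING_PATTERNS.some((pattern) => pattern.test(line)),
  );
  const pick = kept.length > 0 ? kept[kept.length - 1] : lines[lines.length - 1];
  return pick ?? "";
}
```

Taking the last surviving line reflects the observation that the model tends to reason first and answer last. The fallback keeps the function from returning nothing when every line happens to match a pattern.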

Why it matters here

This is the kind of thing generative AI is genuinely good for: not replacing a guide or a historian, but putting a little of their knowledge in everyone's pocket. A tourist can understand the chapel in front of them. A student can hear the history of a monument in their own town. A family can attach a story to a photo before the last person who remembers it is gone.

It's small on purpose. But the idea — point modern AI at the specific, local, neglected job of remembering — scales far beyond Goa.

What's next

Narration in Konkani and Marathi, in the language of the people in the photo.
A growing gallery of narrated Goan landmarks.
Photo restoration, once I wire up a billing-enabled key for the image models.

If you have an old photo, or you're standing in front of a monument you can't name, try it here.

Tools & technologies used: Google Gemini 3.5 Flash, Google Gemma 4, Google Gemini 2.5 Flash text-to-speech, Google AI Studio (Gemini API), Next.js, Vercel. The St. Augustine photograph is used under a Creative Commons licence via Wikimedia Commons.