Case study · 2024 – 2026
Northway Tech.
AI voice caddie featuring 2× US Open champion Andy North — hands-free course navigation, contextual shot recommendations, voice-recorded analytics.
40k+
courses indexed
Andy North
celebrity caddie voice
<1.2s
voice E2E target
WebRTC
real-time transport
The problem
A round of golf is four hours of micro-decisions: which club, which line, what wind correction, where to miss, what the last three shots tell you about your swing today. Most golfers play with none of that information. Caddies who can hold it all in their head are a luxury — there are roughly 30,000 working caddies for the 60+ million golfers worldwide. The information loop between "what should I do?" and "here's the right call" is broken on the course in a way that no app has fully fixed, because the right interface during a round is voice, not a screen.
Northway Tech was built to be that caddie. A hands-free voice agent in your pocket, fronted by Andy North — 2× US Open champion — so the recommendations carry the credibility of a major winner. The product had to be conversational over real-time voice (no walkie-talkie press-and-hold ceremony), context-aware (what hole, what weather, what your swing did the last three shots), and respectful of the etiquette of the sport (silent during the address, available the moment you're walking down the fairway). And it had to do all of that on a phone in someone's pocket, sometimes on a 4G hotspot in the middle of nowhere.
Constraints
- The voice loop has to feel like a conversation. Anything above a 1.2s end-to-end gap breaks "asking my caddie a question" and turns it into "waiting for an app." Voice UX in noisy environments is a stricter version of this — half-second hesitations on a windy course feel like 3 seconds in a quiet room.
- 40,000+ courses, geospatially indexed. Yardage to the front / middle / back of every green, hazard locations, dogleg geometries — all queryable in real time as the player walks. The geospatial layer was as much a product as the voice layer.
- Andy North's voice has to sound like Andy North, not a TTS model. ElevenLabs voice cloning is the right tool, but the prosody on golf jargon ("172 to the pin, downwind half a club") is its own discipline.
- Field network conditions are hostile. Cellular coverage on a back-nine ranges from excellent to nothing. The voice agent had to handle dropouts gracefully and resume mid-conversation without losing context.
- Etiquette compliance. The agent must not speak during a player's address. It must be summoned by wake-word, button press, or by walking back to a known "asking distance" from the ball. Talking at the wrong time is a product-killing failure mode.
Architecture
Real-time voice over WebRTC, not HTTP
Northway is not a chatbot with a voice skin. It runs the same LiveKit + Pipecat + Deepgram + Claude + ElevenLabs pipeline as Phiny — but tuned for outdoor mobile use. The mobile app joins a LiveKit room with a short-lived JWT minted by our NestJS backend, and from that point on the audio path is phone ↔ LiveKit edge ↔ voice-agent — the API server sits out of the hot path entirely. WebRTC's adaptive bitrate, packet retransmission, and jitter buffer handle the field-network reality far better than a naive HTTP-streaming approach would.
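The token shape is worth seeing concretely. In production this goes through the LiveKit server SDK; the sketch below hand-rolls the same HS256 JWT with Node's crypto so the claims are visible — function and field names beyond the standard JWT/LiveKit claims are illustrative, and the 10-minute TTL is an assumption.

```typescript
import { createHmac } from "crypto";

// Hypothetical sketch of the short-lived room-token mint. Production uses
// livekit-server-sdk; the output is a plain HS256 JWT with a "video" grant.
const b64url = (buf: Buffer): string =>
  buf.toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");

export function mintRoomToken(
  apiKey: string,
  apiSecret: string,
  identity: string,
  room: string,
  ttlSeconds = 600, // short-lived: authorises one join, not a session credential
): string {
  const now = Math.floor(Date.now() / 1000);
  const header = b64url(Buffer.from(JSON.stringify({ alg: "HS256", typ: "JWT" })));
  const payload = b64url(
    Buffer.from(
      JSON.stringify({
        iss: apiKey,              // LiveKit API key
        sub: identity,            // player identity in the room
        nbf: now,
        exp: now + ttlSeconds,
        video: { roomJoin: true, room }, // LiveKit grant claim
      }),
    ),
  );
  const sig = b64url(createHmac("sha256", apiSecret).update(`${header}.${payload}`).digest());
  return `${header}.${payload}.${sig}`;
}
```

Because the backend only mints this token, it can fall over mid-round without interrupting audio — the phone keeps talking to the LiveKit edge directly.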
Geospatial intelligence as a first-class context source
Every course in our 40,000+ catalog is stored as a structured document with hole-by-hole geometry — tee boxes, fairway centerlines, hazards (water, bunkers, OB), green outlines and pin positions. Storage is MongoDB with geospatial indexes so spatial queries ("what's within 30 yards of my GPS position?") run in milliseconds. The phone's GPS plus the player's self-reported starting hole feeds a state machine that always knows what hole the player is on, how far they are from each green feature, and what the relevant hazards in front of them are. That state is passed into the Claude prompt as compact structured context — never as a 5000-token map description.
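The "what's within 30 yards of me" primitive is, underneath the MongoDB 2dsphere index, plain haversine distance. A minimal in-memory sketch of that primitive — feature kinds and shapes are illustrative, not the production schema:

```typescript
// In-memory equivalent of the "what's near me" geospatial query.
// Production delegates this to a MongoDB 2dsphere index; the math is haversine.
type Feature = {
  kind: "green_front" | "green_middle" | "green_back" | "bunker" | "water";
  lat: number;
  lon: number;
};

const YARDS_PER_METER = 1.09361;

function haversineYards(aLat: number, aLon: number, bLat: number, bLon: number): number {
  const R = 6371000; // mean earth radius, meters
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(bLat - aLat);
  const dLon = toRad(bLon - aLon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(aLat)) * Math.cos(toRad(bLat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h)) * YARDS_PER_METER;
}

// Returns the features inside the radius, nearest first — the list the agent
// compresses into the Claude context on every position update.
export function featuresWithin(
  player: { lat: number; lon: number },
  features: Feature[],
  radiusYards: number,
): Array<Feature & { yards: number }> {
  return features
    .map((f) => ({ ...f, yards: haversineYards(player.lat, player.lon, f.lat, f.lon) }))
    .filter((f) => f.yards <= radiusYards)
    .sort((a, b) => a.yards - b.yards);
}
```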
Player-history loop
Every recorded shot — club selection, distance hit, dispersion direction — is captured as a voice log that's transcribed by Deepgram, structured by a small Claude pass into a typed event, and stored against the user's play history. By hole 8 of a round, the agent knows the player's 7-iron is going 5 yards short of expected today, and the recommendation engine adjusts. By round 10 on the platform, the player's personal yardage curves replace defaults. This single feature is what turns Northway from "voice yardage book" into "caddie that knows you."
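The "7-iron is going 5 yards short today" adjustment can be sketched as a per-club delta over the round's typed events. Field names, stock yardages, and the minimum-sample threshold below are assumptions, not the production schema:

```typescript
// Hedged sketch of the per-club adjustment fed into next-shot context.
type ShotEvent = { club: string; carryYards: number };

// Stock yardages stand in for the player's personal curves (illustrative).
const DEFAULT_CARRY: Record<string, number> = { "7i": 150, "8i": 140 };

// Average of today's recorded carries minus the stock number →
// "your 7-iron is going 5 short today".
export function clubDeltaToday(club: string, shots: ShotEvent[]): number | null {
  const carries = shots.filter((s) => s.club === club).map((s) => s.carryYards);
  if (carries.length < 3) return null; // not enough signal this round yet
  const avg = carries.reduce((a, b) => a + b, 0) / carries.length;
  const stock = DEFAULT_CARRY[club];
  return stock === undefined ? null : Math.round(avg - stock);
}
```

Once a player has enough history, the same computation runs against their personal curves instead of the defaults.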
Andy North as a voice model
ElevenLabs voice cloning trained on Andy's clinic recordings, with prompt engineering to keep the output in his cadence — short declarative calls, no filler phrases, no over-explanation. The Claude system prompt explicitly steers toward "the way a major winner talks to an amateur during a round" — which is not the way a chatbot wants to talk by default. Continuously tuning this voice was the most product-sensitive part of the build.
Media pipeline for shot capture
Shot videos and round recaps run through an AWS Lambda + MediaConvert pipeline that compresses, watermarks, and pushes to CloudFront for low-latency edge delivery. Players get a per-round highlight reel without the app ever blocking on it.
Etiquette gates
The agent stays silent during a player's address. The trigger is a combination of wake-word detection (server-side, with a tight VAD threshold so wind doesn't false-trigger), a manual push-to-ask button, and a geospatial heuristic — when a player walks back ~10 yards from their last recorded position, the agent assumes the previous shot is over and re-arms.
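The geospatial re-arm gate described above reduces to a small pure function. This is an illustrative version — the 10-yard threshold comes from the text, but the function shape and the flat-earth distance approximation are assumptions:

```typescript
// Illustrative re-arm gate: the agent mutes once a shot position is recorded
// and re-arms when the player has walked ~10 yards from that spot
// (the address is assumed over).
type Fix = { lat: number; lon: number };

const REARM_YARDS = 10;

// Equirectangular approximation — accurate at walking distances.
function yardsBetween(a: Fix, b: Fix): number {
  const metersPerDegLat = 111_195;
  const dx = (b.lon - a.lon) * metersPerDegLat * Math.cos((a.lat * Math.PI) / 180);
  const dy = (b.lat - a.lat) * metersPerDegLat;
  return Math.sqrt(dx * dx + dy * dy) * 1.09361;
}

export function shouldRearm(lastShot: Fix | null, current: Fix): boolean {
  if (!lastShot) return true; // no shot in progress → agent already armed
  return yardsBetween(lastShot, current) >= REARM_YARDS;
}
```

In the product this runs alongside the wake-word and push-to-ask paths; any one of the three can arm the agent, but only etiquette state can silence it.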
What I built
- The NestJS backend integration — LiveKit token issuance, course catalog API, geospatial query layer, player-history ingestion, the round state machine, the wake-word + push-to-ask control plane.
- The media pipeline — AWS Lambda + MediaConvert + CloudFront for shot videos and round recaps; S3 for raw uploads; SES for round summary emails.
- The geospatial course catalog — schema, ingestion pipeline, MongoDB geospatial indexes, the "what's near me" query primitives the voice agent calls every few seconds while the player is moving.
- The voice agent integration — Pipecat assembly with Deepgram + Claude + ElevenLabs, the structured-output recommendation prompt, the prompt-cached system message that holds Andy North's voice + golf vocabulary + etiquette rules.
- The player-history loop — voice → Deepgram → Claude structured event → MongoDB user profile → fed back into next-shot context. This is the part that makes the caddie personal.
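The structured-output recommendation contract mentioned above can be sketched as a typed shape plus a defensive parse — the agent speaks only validated output, never raw model text. Field names here are illustrative, not the production contract:

```typescript
// Hypothetical shape of the typed contract the recommendation prompt fills.
type Recommendation = {
  club: string;            // e.g. "7i"
  targetLine: string;      // e.g. "left edge of the green"
  adjustmentYards: number; // wind/elevation correction applied
  say: string;             // the one-sentence call read in Andy's voice
};

// Defensive parse: anything malformed is rejected rather than spoken.
export function parseRecommendation(json: string): Recommendation | null {
  try {
    const r = JSON.parse(json);
    if (typeof r.club !== "string" || typeof r.say !== "string") return null;
    if (typeof r.targetLine !== "string" || typeof r.adjustmentYards !== "number") return null;
    return r as Recommendation;
  } catch {
    return null;
  }
}
```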
Trade-offs
- WebRTC vs. WebSocket-streamed audio. WebSockets are simpler to wire and debug. WebRTC's built-in adaptive bitrate, jitter buffer, and packet recovery were non-negotiable for on-course mobile networks. We took the operational complexity and got reliability we couldn't have built ourselves.
- Cloud Claude vs. on-device LLM. An on-device LLM would survive offline holes but can't reach the production-grade reasoning Andy's voice demands. We made the cloud round-trip mandatory and instead invested in prompt caching, intent compression, and TTS chunking to keep latency inside the conversational budget.
- Custom course geometry vs. third-party feed. Several feeds exist for course geometry, but their licensing models conflict with consumer-facing AI products and their data quality on smaller/regional courses is uneven. We built our own catalog and ingestion pipeline. Higher up-front cost; better economics and product control downstream.
- One celebrity voice vs. a roster. Andy North is the marquee voice. We architected the voice layer so additional caddies (each as their own ElevenLabs voice + Claude system prompt variant) can drop in without changing any of the round / geospatial / history layers — the voice is a configurable layer, not the product.
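One of the latency levers named above — TTS chunking — is worth showing: flush the LLM token stream to the TTS engine at sentence boundaries, so playback starts on the first clause instead of waiting for the full answer. The boundary rule below is an assumption (a simple punctuation heuristic), not the production tokenizer:

```typescript
// Illustrative TTS chunking: emit complete sentences from a streaming token
// source so synthesis can begin before the model finishes the whole reply.
export function* sentenceChunks(tokens: Iterable<string>): Generator<string> {
  let buf = "";
  for (const t of tokens) {
    buf += t;
    // Flush on sentence-ending punctuation followed by whitespace.
    const m = buf.match(/^(.*?[.!?])\s+(.*)$/s);
    if (m) {
      yield m[1]; // complete sentence → send to TTS now
      buf = m[2]; // keep the remainder buffering
    }
  }
  if (buf.trim()) yield buf.trim(); // final partial sentence
}
```

Short declarative calls ("172 to the pin, downwind half a club") chunk especially well, which is one reason the clipped caddie cadence and the latency budget reinforce each other.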
Outcome
Northway Tech is live at northway-tech.com, with the Andy North voice caddie running over the production LiveKit + Pipecat + Claude + ElevenLabs stack. The geospatial layer indexes 40,000+ courses; the player-history loop re-personalises the recommendation engine round over round; the media pipeline produces shareable round recaps without blocking the app.
The deeper outcome is that Northway proved the voice-AI architecture from Phiny generalises far beyond career prep. Same skeleton — typed contracts between a small number of single-purpose processes, prompt-cached LLMs at the edges, an orchestrator that stays out of the audio hot path — applied to a completely different domain (golf, geospatial, celebrity voice) without architectural changes. That portability is what made the Voice AI Latency paper possible: you can't generalise findings from a single product, but two production deployments running the same skeleton is a real signal.
Want help shipping something like this? Book a call, or grab the snippets this case study draws from.