RAG with Algolia + Claude rerank
Hybrid retrieval (BM25 + vector) plus Claude as a relevance judge — the stack I use in production.
Pure vector search has a recall problem on real corpora — exact matches and rare tokens slip through. Pure keyword has a synonym problem. Production RAG that I trust = Algolia (BM25 + dense) for top-K, then Claude as a reranker. Below is the whole pipeline.
Retrieve top-K
import { algoliasearch } from "algoliasearch";
const client = algoliasearch(
process.env.ALGOLIA_APP_ID!,
process.env.ALGOLIA_API_KEY!
);
export async function retrieve(query: string, k = 20) {
const { results } = await client.search({
requests: [
{
indexName: "docs",
query,
hitsPerPage: k,
attributesToRetrieve: ["title", "body", "url"],
},
],
});
return (results[0] as any).hits as Array<{
objectID: string;
title: string;
body: string;
url: string;
}>;
}
Rerank with Claude
import Anthropic from "@anthropic-ai/sdk";
import { z } from "zod";
const client = new Anthropic();
const Ranking = z.object({
ranking: z.array(z.object({ id: z.string(), score: z.number() })),
});
export async function rerank(
query: string,
hits: { objectID: string; title: string; body: string }[]
) {
const corpus = hits
.map((h, i) => `<doc id="${h.objectID}">\n${h.title}\n\n${h.body.slice(0, 800)}\n</doc>`)
.join("\n\n");
const res = await client.messages.create({
model: "claude-haiku-4-5-20251001",
max_tokens: 1024,
system: [
{
type: "text",
text: "Rank docs by relevance to the query. Return JSON: { ranking: [{ id, score }] } where score is 0..1. Output JSON only.",
cache_control: { type: "ephemeral" },
},
],
messages: [
{ role: "user", content: `Query: ${query}\n\n${corpus}` },
],
});
const text = res.content
.filter((b): b is Anthropic.TextBlock => b.type === "text")
.map((b) => b.text)
.join("");
return Ranking.parse(JSON.parse(text)).ranking
.sort((a, b) => b.score - a.score)
.slice(0, 5);
}
Compose
const candidates = await retrieve(userQuery, 20);
const ranked = await rerank(userQuery, candidates);
const context = ranked
.map((r) => candidates.find((c) => c.objectID === r.id)!)
.map((c) => `[${c.title}](${c.url})\n${c.body}`)
.join("\n\n---\n\n");
// Now pass `context` into your answering call as cached system content.
Why hybrid + rerank
- BM25 catches exact tokens (error codes, function names, library names) that embeddings smear.
- Dense catches paraphrase ("how do I sign in" ≈ "login flow").
- Rerank is cheap. Haiku on 20 short docs is ~150 ms. Recall@K jumps because the ranker actually understands the query, not just lexical overlap.
- Truncate aggressively. Send only the first 800 chars per doc to the reranker. The full body goes to the answering call.