Run experiments in AI pipelines
When you're building AI features with durable functions, you often need to compare approaches: which model produces better results, whether a refined prompt outperforms a simpler one, or how a RAG pipeline stacks up against a single-shot call. group.experiment() lets you run these comparisons inside your function, with durable memoization so the same variant is always selected on retries and replays.
import { experiment } from "inngest";
Compare AI models
The simplest case: you want to test two models against each other. Define each model call as a variant and use weighted selection to control the traffic split.
import { experiment } from "inngest";
// Assumes an `inngest` client and the fetchDocument, callOpenAI, and
// callAnthropic helpers are defined elsewhere in your app.
export default inngest.createFunction(
  {
    id: "summarize-document",
    triggers: { event: "document/uploaded" },
  },
  async ({ event, step, group }) => {
    const doc = await step.run("fetch-document", () =>
      fetchDocument(event.data.documentId)
    );
    // Each variant wraps its model call in its own step.
    const summary = await group.experiment("model-comparison", {
      variants: {
        gpt4o: () =>
          step.run("summarize-gpt4o", () =>
            callOpenAI({ model: "gpt-4o", prompt: `Summarize: ${doc.text}` })
          ),
        claude: () =>
          step.run("summarize-claude", () =>
            callAnthropic({
              model: "claude-sonnet-4-20250514",
              prompt: `Summarize: ${doc.text}`,
            })
          ),
      },
      // Even 50/50 traffic split between the two models.
      select: experiment.weighted({ gpt4o: 50, claude: 50 }),
    });
    return summary;
  }
);
The variant selection is wrapped in a memoized step. If the function retries or replays, the same model is used every time.
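To see why recording the selection matters, here is a minimal sketch of weighted selection whose result is stored once per experiment ID. It is not the SDK's implementation: the in-memory Map stands in for durable step state, and pickWeighted and selectVariant are hypothetical names. The first call draws a random number; every later call for the same experiment ID returns the recorded variant, which is the property retries and replays depend on.

```typescript
// Minimal sketch (hypothetical, not the Inngest SDK): weighted selection whose
// result is recorded once per experiment ID, so later calls replay the same pick.
type Weights = Record<string, number>;

// Map r in [0, 1) onto cumulative weight ranges and return the matching variant.
function pickWeighted(weights: Weights, r: number): string {
  const entries = Object.entries(weights);
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  let point = r * total;
  for (const [name, w] of entries) {
    if (point < w) return name;
    point -= w;
  }
  return entries[entries.length - 1][0]; // guard against floating-point drift
}

// In-memory stand-in for durable step state: experiment ID -> recorded variant.
const recordedVariants = new Map<string, string>();

function selectVariant(
  experimentId: string,
  weights: Weights,
  random: () => number = Math.random,
): string {
  const recorded = recordedVariants.get(experimentId);
  if (recorded !== undefined) return recorded; // retry or replay: reuse the first pick
  const chosen = pickWeighted(weights, random());
  recordedVariants.set(experimentId, chosen);
  return chosen;
}

const firstRun = selectVariant("model-comparison", { gpt4o: 50, claude: 50 });
const replayRun = selectVariant("model-comparison", { gpt4o: 50, claude: 50 });
// firstRun and replayRun are always equal, whichever model was drawn.
```

In the SDK, the memoized step plays the role of this Map, so the choice is tied to the run rather than to process memory.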
Bucket users to a consistent model
When users interact with your AI features repeatedly, you usually want them to get consistent behavior. Use experiment.bucket() with the user ID so the same user always hits the same variant.
// `conversation` and generateResponse are assumed to be defined in the surrounding function.
const response = await group.experiment("assistant-model", {
  variants: {
    current: () =>
      step.run("current-model", () =>
        generateResponse({ model: "gpt-4o", messages: conversation })
      ),
    candidate: () =>
      step.run("candidate-model", () =>
        generateResponse({ model: "gpt-4o-mini", messages: conversation })
      ),
  },
  // Bucket on the user ID: roughly 90% of users stay on the current model.
  select: experiment.bucket(event.data.userId, {
    weights: { current: 90, candidate: 10 },
  }),
});
The same user ID always maps to the same variant, even across different function runs. This prevents users from experiencing inconsistent quality between requests.
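One common way to get this sticky behavior is to hash the bucketing key to a fixed point and map that point onto the weight ranges. The sketch below does this with 32-bit FNV-1a over UTF-16 code units; the hash choice and the fnv1a32 and bucket names are assumptions for illustration, and the SDK may use a different algorithm.

```typescript
// Hypothetical sketch, not the SDK's algorithm: deterministic weighted
// bucketing by hashing the key with 32-bit FNV-1a over UTF-16 code units.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep uint32
  }
  return hash >>> 0;
}

// Map the key's hash onto cumulative weight ranges: same key, same variant.
function bucket(key: string, weights: Record<string, number>): string {
  const entries = Object.entries(weights);
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  let point = (fnv1a32(key) / 2 ** 32) * total; // in [0, total)
  for (const [name, w] of entries) {
    if (point < w) return name;
    point -= w;
  }
  return entries[entries.length - 1][0];
}
```

Because nothing random is involved, bucket("user_123", { current: 90, candidate: 10 }) returns the same variant on every run, and with a well-mixed hash roughly 90% of distinct user IDs should land on current.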
Test prompt strategies with multi-step variants
Variant callbacks can contain multiple sequential steps. Each step is individually retried and memoized. This is useful when one approach involves more work than another, like comparing a single-shot prompt against a retrieval-augmented pipeline.
// `question`, callLLM, and searchVectorStore are assumed to be defined elsewhere.
const answer = await group.experiment("prompt-strategy", {
  variants: {
    // One step: answer the question directly.
    single_shot: () =>
      step.run("single-shot", () =>
        callLLM({ prompt: `Answer this question: ${question}` })
      ),
    // Two steps, each retried and memoized on its own.
    rag_pipeline: async () => {
      const chunks = await step.run("retrieve-context", () =>
        searchVectorStore(question, { topK: 5 })
      );
      const context = chunks.map((c) => c.text).join("\n\n");
      return await step.run("generate-with-context", () =>
        callLLM({
          prompt: `Using this context:\n${context}\n\nAnswer: ${question}`,
        })
      );
    },
  },
  select: experiment.weighted({ single_shot: 70, rag_pipeline: 30 }),
});
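Per-step memoization is what keeps a multi-step variant from redoing finished work when a later step fails. In this toy model, ToyStepRunner, makeRagVariant, and runWithRetries are hypothetical and not SDK APIs: the variant is re-invoked after a transient failure in its second step, and the retrieval step's recorded result is reused instead of being executed again.

```typescript
// Toy stand-in for durable steps (hypothetical, not the Inngest SDK): each
// step result is recorded by ID, and a recorded step is never executed again.
class ToyStepRunner {
  private results = new Map<string, unknown>();
  public executions: string[] = []; // step IDs in the order they actually ran

  async run<T>(id: string, fn: () => Promise<T> | T): Promise<T> {
    if (this.results.has(id)) return this.results.get(id) as T; // memoized
    this.executions.push(id);
    const value = await fn(); // a throw here leaves the step unrecorded
    this.results.set(id, value);
    return value;
  }
}

type Variant = (step: ToyStepRunner, question: string) => Promise<string>;

// A two-step variant shaped like rag_pipeline whose generation step fails
// a configurable number of times before succeeding.
function makeRagVariant(failuresBeforeSuccess: number): Variant {
  let failuresLeft = failuresBeforeSuccess;
  return async (step, question) => {
    const chunks = await step.run("retrieve-context", () => [`context for ${question}`]);
    return step.run("generate-with-context", () => {
      if (failuresLeft > 0) {
        failuresLeft--;
        throw new Error("transient LLM error");
      }
      return `answer using ${chunks.join(", ")}`;
    });
  };
}

// Re-invoke the whole variant after a failure, the way a retry replays a function.
async function runWithRetries(
  step: ToyStepRunner,
  variant: Variant,
  question: string,
  maxAttempts = 3,
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await variant(step, question);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

With makeRagVariant(1), one failed attempt is followed by a successful one: retrieve-context executes once and generate-with-context twice.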
Get the selected variant name
Set withVariant: true to receive both the result and which variant was selected. This is useful for logging, analytics, or downstream decisions.
const outcome = await group.experiment("tone-test", {
  variants: {
    concise: () =>
      step.run("concise-prompt", () =>
        callLLM({ prompt: "Be brief. " + userQuery })
      ),
    detailed: () =>
      step.run("detailed-prompt", () =>
        callLLM({ prompt: "Be thorough and explain your reasoning. " + userQuery })
      ),
  },
  select: experiment.weighted({ concise: 50, detailed: 50 }),
  // Return the selected variant name alongside the result.
  withVariant: true,
});
// Record which variant ran, plus a simple metric from its output.
await step.run("log-experiment", () =>
  trackExperiment({
    experiment: "tone-test",
    variant: outcome.variant,
    responseLength: outcome.result.length,
  })
);
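If you want that outcome typed in your own code, a shape like the one below matches how the example reads outcome.variant and outcome.result. VariantOutcome, toExperimentRecord, and followUpBudget are hypothetical helpers and the token budgets are made-up values; check the reference for the SDK's actual return type.

```typescript
// Assumed shape (hypothetical, not the SDK's published type) of a result
// returned with withVariant: true.
type VariantOutcome<V extends string, R> = { variant: V; result: R };

type ToneVariant = "concise" | "detailed";

// Build an analytics payload like the one passed to trackExperiment above.
function toExperimentRecord(
  experiment: string,
  outcome: VariantOutcome<ToneVariant, string>,
) {
  return {
    experiment,
    variant: outcome.variant,
    responseLength: outcome.result.length,
  };
}

// A downstream decision keyed on the variant; the token budgets are made-up values.
function followUpBudget(variant: ToneVariant): number {
  switch (variant) {
    case "concise":
      return 256;
    case "detailed":
      return 1024;
  }
}
```

Typing variant as a union of the variant names lets the compiler flag a switch that forgets a variant.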
Run multiple experiments in one pipeline
You can run independent experiments in a single function. Use experiment.bucket() with a composite key so each experiment assigns variants independently.
export default inngest.createFunction(
  {
    id: "ai-document-pipeline",
    triggers: { event: "document/process" },
  },
  async ({ event, step, group }) => {
    const userId = event.data.userId;
    // Experiment 1: bucketed on "<userId>:extraction".
    const extraction = await group.experiment("extraction-model", {
      variants: {
        structured: () =>
          step.run("structured-extract", () =>
            extractWithSchema(event.data.documentUrl)
          ),
        freeform: () =>
          step.run("freeform-extract", () =>
            extractFreeform(event.data.documentUrl)
          ),
      },
      select: experiment.bucket(`${userId}:extraction`),
      withVariant: true,
    });
    // Experiment 2: bucketed on "<userId>:summary", independent of experiment 1.
    const summary = await group.experiment("summary-approach", {
      variants: {
        map_reduce: async () => {
          const chunks = await step.run("chunk-document", () =>
            chunkText(extraction.result)
          );
          const partials = await step.run("summarize-chunks", () =>
            Promise.all(chunks.map((c) => summarize(c)))
          );
          return await step.run("combine-summaries", () =>
            combineSummaries(partials)
          );
        },
        single_pass: () =>
          step.run("single-pass-summary", () =>
            summarize(extraction.result)
          ),
      },
      select: experiment.bucket(`${userId}:summary`),
      withVariant: true,
    });
    // Both values are { result, variant } because of withVariant: true.
    return { extraction, summary };
  }
);
By appending a feature-specific suffix to the bucket key (userId:extraction vs userId:summary), the same user can be independently assigned to different variants in each experiment.
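Why the suffix matters: if both experiments hashed the bare user ID with the same number of evenly weighted variants, a user's position in one experiment would fully determine their position in the other. This sketch assumes hash-based bucketing with an even split when no weights are given; both are assumptions rather than the SDK's documented algorithm, and hashToUnit and assign are hypothetical names.

```typescript
// Hypothetical sketch, not the SDK's algorithm: hash a key to [0, 1) with
// 32-bit FNV-1a, then split that range evenly across the variants.
function hashToUnit(key: string): number {
  let hash = 0x811c9dc5; // FNV-1a offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept as uint32
  }
  return hash / 2 ** 32;
}

function assign<V extends string>(key: string, variants: readonly V[]): V {
  return variants[Math.floor(hashToUnit(key) * variants.length)];
}

const exampleUserId = "user_123";
const extractionVariants = ["structured", "freeform"] as const;
const summaryVariants = ["map_reduce", "single_pass"] as const;

// Suffixed keys hash independently, so the two assignments are unrelated.
const independentPair = [
  assign(`${exampleUserId}:extraction`, extractionVariants),
  assign(`${exampleUserId}:summary`, summaryVariants),
];

// The bare key gives the same index in both experiments, locking the pair together.
const lockedPair = [
  assign(exampleUserId, extractionVariants),
  assign(exampleUserId, summaryVariants),
];
```

With the bare user ID, structured always pairs with map_reduce and freeform with single_pass. With the suffixed keys, the two hashes are unrelated, so the pairings are no longer locked together.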
Every variant callback must call at least one step tool, such as step.run(). If a variant completes without calling any step tools, the SDK throws a NonRetriableError.
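A wrapper can enforce a rule like this by counting calls made through the step object it hands to the variant. In this sketch NonRetriableError is a local stand-in class rather than the SDK's export, and runVariant and the Step type are hypothetical; it illustrates the check only, not how the SDK actually detects step tool usage.

```typescript
// Local stand-in for the SDK's error class (hypothetical, not imported from inngest).
class NonRetriableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NonRetriableError";
  }
}

// Minimal step object: just enough surface to count calls.
type Step = { run: (id: string, fn: () => unknown) => Promise<unknown> };

// Run one variant callback and reject if it never touched a step tool.
async function runVariant<T>(name: string, variant: (step: Step) => Promise<T>): Promise<T> {
  let stepCalls = 0;
  const countingStep: Step = {
    run: async (_id, fn) => {
      stepCalls++;
      return await fn();
    },
  };
  const result = await variant(countingStep);
  if (stepCalls === 0) {
    throw new NonRetriableError(`Variant "${name}" completed without calling any step tools`);
  }
  return result;
}
```

Throwing a non-retriable error fits here because retrying cannot fix a callback that structurally never calls a step.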
For the full API surface, parameters, and selection strategy details, see the group.experiment() reference.