Step experiments
Step experiments let you test more than one version of function logic in production. You define named variants inside group.experiment(), choose a selection strategy, and Inngest executes one variant for each run.
They are useful when you want to:
- Roll out a risky rewrite to a small slice of traffic.
- Compare models, prompts, providers, or workflow strategies.
- Keep users or accounts on a consistent experience while you evaluate a change.
- Tune operational settings, such as batch size, concurrency, or retry behavior.
Experiments are part of the TypeScript SDK v4 and require version 4.8.0 or later. You do not need a separate package or feature-flag service to start using them.
Basic example
Import the experiment helper from inngest, then call group.experiment() from inside a function handler. The group argument is injected alongside event and step.
import { experiment } from "inngest";
import { inngest } from "./client";
export default inngest.createFunction(
  {
    id: "summarize-document",
    triggers: { event: "document/uploaded" },
  },
  async ({ event, step, group }) => {
    const doc = await step.run("fetch-document", () =>
      fetchDocument(event.data.documentId)
    );
    const summary = await group.experiment("model-comparison", {
      variants: {
        gpt4o: () =>
          step.run("summarize-gpt4o", () =>
            callOpenAI({
              model: "gpt-4o",
              prompt: `Summarize: ${doc.text}`,
            })
          ),
        claude: () =>
          step.run("summarize-claude", () =>
            callAnthropic({
              model: "claude-sonnet-4-20250514",
              prompt: `Summarize: ${doc.text}`,
            })
          ),
      },
      select: experiment.weighted({ gpt4o: 50, claude: 50 }),
    });
    return summary;
  }
);
Only the selected variant runs. The other variant callbacks are skipped.
How selection works
The experiment selection is itself a durable, memoized step. When a function run reaches group.experiment() for the first time, Inngest evaluates the select strategy and stores the selected variant. If the run retries or replays later, it uses the same selected variant again.
Variant callbacks can contain normal step tools, including step.run(), step.invoke(), step.waitForEvent(), and step.sendEvent(). Each step inside the selected variant is retried and memoized like any other step.
Every variant callback must call at least one step.* tool. Code that runs outside of steps is not durable and can re-execute on replay.
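The memoization described above can be pictured with a toy model. This is a sketch only, not Inngest's implementation: `SelectionStore`, `selectOnce`, and the in-memory object are hypothetical names. It shows the selector running once and later attempts of the same run reusing the stored variant.

```typescript
// Toy model of memoized selection. NOT Inngest's implementation:
// the store and function names are hypothetical.
type SelectionStore = { [key: string]: string };

// Evaluate `select` only if this run has no stored choice for the experiment.
function selectOnce(
  store: SelectionStore,
  runId: string,
  experimentId: string,
  select: () => string
): string {
  const key = runId + ":" + experimentId;
  const existing = store[key];
  if (existing !== undefined) {
    return existing; // retry or replay: reuse the stored variant
  }
  const chosen = select(); // first time this run reaches the experiment
  store[key] = chosen;
  return chosen;
}

// Usage: a selector that would answer differently on a second call.
const demoStore: SelectionStore = {};
let selectorCalls = 0;
const demoSelect = (): string => {
  selectorCalls += 1;
  return selectorCalls === 1 ? "gpt4o" : "claude";
};

const firstAttempt = selectOnce(demoStore, "run-1", "model-comparison", demoSelect);
const afterRetry = selectOnce(demoStore, "run-1", "model-comparison", demoSelect);
// Both attempts resolve to the variant stored on the first attempt.
```

Because the second call finds a stored value, the selector is never asked again for that run, even though it would have returned a different variant.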
Selection strategies
Choose the strategy that matches the behavior you need.
Weighted
Use experiment.weighted() for run-level splits where each new run can receive a fresh assignment.
select: experiment.weighted({ control: 90, candidate: 10 })
Weights are relative. { control: 9, candidate: 1 } and { control: 90, candidate: 10 } produce the same split.
Weighted selection is seeded by the run, so the same run keeps its selected variant on retries. Changing weights in a later deploy only affects new selections.
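To see why only the ratio between weights matters, here is a minimal sketch of weight normalization. `pickWeighted` is a hypothetical helper, not an SDK function, and `roll` stands in for whatever seeded random number the real strategy derives from the run.

```typescript
// Sketch of weight normalization. `pickWeighted` is a hypothetical helper,
// not part of the Inngest SDK. `roll` is a number in [0, 1).
type Weights = { [variant: string]: number };

function pickWeighted(weights: Weights, roll: number): string {
  const names = Object.keys(weights);
  let total = 0;
  for (const name of names) {
    total += weights[name] ?? 0;
  }
  if (total <= 0) {
    throw new Error("weights must contain at least one positive weight");
  }
  // Walk the cumulative distribution until the roll falls inside a slice.
  let cumulative = 0;
  for (const name of names) {
    cumulative += (weights[name] ?? 0) / total;
    if (roll < cumulative) {
      return name;
    }
  }
  // Guard against floating-point rounding on the last slice.
  return names[names.length - 1] as string;
}

// The same roll lands in the same slice for both weight scales.
const smallScale = pickWeighted({ control: 9, candidate: 1 }, 0.95);
const largeScale = pickWeighted({ control: 90, candidate: 10 }, 0.95);
```

Dividing each weight by the total turns both maps into the same 0.9 / 0.1 split, so the two calls agree for any roll.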
Bucket
Use experiment.bucket() when the same user, account, or tenant should usually get the same variant across runs.
select: experiment.bucket(event.data.userId, {
  weights: { control: 80, candidate: 20 },
})
The stable value you pass, such as event.data.userId, is hashed into a variant. This prevents a user from seeing a different experience every time they trigger the function.
Changing bucket weights can change future assignments for a key. If you need strict migration stickiness, store the assignment in your own database and use experiment.custom().
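The following sketch illustrates the idea of hashing a stable key into a variant, and why a weight change can move a key. The djb2 hash and the names `hashToUnit` and `pickBucket` are assumptions made for illustration; the hash Inngest actually uses may differ.

```typescript
// Sketch of key-based bucketing. The djb2 hash and both helper names are
// stand-ins for illustration; Inngest's actual hashing may differ.
type BucketWeights = { [variant: string]: number };

// Deterministically map a string to a number in [0, 1).
function hashToUnit(key: string): number {
  let hash = 5381;
  for (let i = 0; i < key.length; i++) {
    // hash * 33 + charCode, kept as an unsigned 32-bit integer
    hash = ((hash << 5) + hash + key.charCodeAt(i)) >>> 0;
  }
  return hash / 4294967296; // divide by 2^32
}

// Pick the variant whose cumulative weight range contains the key's position.
function pickBucket(key: string, weights: BucketWeights): string {
  const names = Object.keys(weights);
  let total = 0;
  for (const name of names) {
    total += weights[name] ?? 0;
  }
  if (total <= 0) {
    throw new Error("weights must contain at least one positive weight");
  }
  const position = hashToUnit(key) * total;
  let cumulative = 0;
  for (const name of names) {
    cumulative += weights[name] ?? 0;
    if (position < cumulative) {
      return name;
    }
  }
  return names[names.length - 1] as string; // rounding guard
}

// Same key and weights: same variant on every call.
const firstRun = pickBucket("user_123", { control: 80, candidate: 20 });
const secondRun = pickBucket("user_123", { control: 80, candidate: 20 });

// An extreme weight change moves every key, which is why strict
// stickiness needs a stored assignment.
const beforeChange = pickBucket("user_123", { control: 100, candidate: 0 });
const afterChange = pickBucket("user_123", { control: 0, candidate: 100 });
```

The key's position in the range is fixed, but the boundaries between variants shift when the weights change, so a key near a boundary can cross into another variant.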
Custom
Use experiment.custom() when the selection should come from your own system, such as a feature flag, rollout table, entitlement, or database-backed migration assignment.
select: experiment.custom(async () => {
  const assignment = await rolloutTable.get(event.data.accountId);
  return assignment ?? "control";
})
The custom selector must return one of the variant names. The returned value is memoized for the run.
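Since an external system can return a stale or misspelled name, one option is to normalize its output before returning it from the selector. The guard below is a sketch under that assumption; `toKnownVariant` is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical guard for a custom selector's output; not part of the SDK.
// Returns `value` if it names a known variant, otherwise `fallback`.
function toKnownVariant<V extends string>(
  value: string | null | undefined,
  variantNames: readonly V[],
  fallback: V
): V {
  for (const name of variantNames) {
    if (name === value) {
      return name;
    }
  }
  return fallback;
}

const variantNames = ["control", "candidate"] as const;

const exactMatch = toKnownVariant("candidate", variantNames, "control");
const typoValue = toKnownVariant("candidtae", variantNames, "control");
const missingValue = toKnownVariant(undefined, variantNames, "control");
// exactMatch is "candidate"; the typo and the missing value fall back to "control".
```

Falling back silently is a design choice. If a bad assignment should be loud rather than hidden, throw inside the guard instead of returning the fallback.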
Fixed
Use experiment.fixed() when you want one variant every time. This is useful for manual overrides, testing a single path, or pinning one variant while you remove the experiment.
select: experiment.fixed("candidate")
Return the selected variant
By default, group.experiment() returns the selected variant's result. Set withVariant: true when you also need the selected variant name for logging, analytics, scoring, or downstream decisions.
const outcome = await group.experiment("copy-style", {
  variants: {
    short: () => step.run("short-copy", () => generateShortCopy(event.data)),
    detailed: () =>
      step.run("detailed-copy", () => generateDetailedCopy(event.data)),
  },
  select: experiment.bucket(event.data.userId, {
    weights: { short: 50, detailed: 50 },
  }),
  withVariant: true,
});
await step.run("track-experiment", () =>
  analytics.track("experiment.variant_selected", {
    experiment: "copy-style",
    variant: outcome.variant,
    userId: event.data.userId,
  })
);
return outcome.result;
With withVariant: true, the returned shape is { result, variant }.
Track outcomes
An experiment tells you which variant ran. It does not decide which variant is best for you. Attach your own outcome signal:
- Use scoring when the score is known during the run.
- Use deferred scoring when the outcome arrives later.
- Emit analytics events from a step.run() when your team already uses an external analytics warehouse.
For AI model or prompt experiments, withVariant: true is often the simplest way to include the selected variant in your scoring or analytics payload.
Notes and best practices
- Give every experiment in a function a unique ID.
- Keep variant names stable because they show up in traces and analytics.
- Use weighted() when each run can receive a fresh assignment.
- Use bucket() when a stable user or account experience matters.
- Use custom() when assignment must be controlled outside code or changed without a deploy.
- Keep selectors simple. Put the behavior you are comparing inside the variant callbacks.
Troubleshooting
| Issue | Solution |
|---|---|
| Setup | |
| Experiments and scoring are not available. | Your SDK version must be at least 4.8.0. |
| Variant selection | |
| The same user sees different variants across runs. | experiment.weighted() is seeded by run ID, not user ID. Use experiment.bucket(userId, { weights }) for sticky user-level assignment. |
| select() returns a variant that does not exist. | Make sure the name passed to experiment.fixed() or returned by an experiment.custom() selector exactly matches a key in variants. Typos fail the run. |
| experiment.bucket() gives surprising skew. | Confirm the bucket value is not null or undefined. Missing values hash as an empty string, so they all collapse into the same deterministic bucket. |
| Scoring | |
| step.score() is missing or throws. | Register scoreMiddleware() on the Inngest client: middleware: [scoreMiddleware()]. |
| Scores run, but do not attach to the experiment. | For in-callback scoring, call step.score() inside the selected variant callback. For outside-callback scoring, use inngest.score.experiment() with the returned experimentRef. |
| Score values are rejected. | Each score value must be a finite number or a boolean. Do not pass strings, objects, NaN, or Infinity. |
| Boolean scores look like numbers in aggregate views. | Boolean scores are valid and aggregate numerically. Treat true as 1 and false as 0 when interpreting averages. |
| Reading results | |
| A 50/50 split looks uneven. | Small samples are noisy. Fire more events before assuming selection is wrong. |
| Two functions use the same experiment name and the dashboard feels confusing. | This is allowed, and the backend scopes detail views by function. Still, prefer distinct experiment names when humans will compare them in the list or docs. |
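
The scoring rows in the table can be checked locally before a value is sent. The helpers below are a sketch: `isValidScore` and `averageScore` are hypothetical names, not SDK functions. They mirror the stated rules that a score is a finite number or a boolean, and that booleans count as 1 and 0 in averages.

```typescript
// Hypothetical helpers that mirror the rules in the table; not SDK code.

// A score is acceptable if it is a boolean or a finite number.
function isValidScore(value: unknown): value is number | boolean {
  if (typeof value === "boolean") {
    return true;
  }
  return typeof value === "number" && isFinite(value);
}

// Average a list of scores, treating true as 1 and false as 0.
function averageScore(scores: Array<number | boolean>): number {
  if (scores.length === 0) {
    throw new Error("no scores to average");
  }
  let sum = 0;
  for (const score of scores) {
    sum += typeof score === "boolean" ? (score ? 1 : 0) : score;
  }
  return sum / scores.length;
}

// Three passes out of four read as an average of 0.75.
const passRate = averageScore([true, true, false, true]);
```

Read this way, an average of boolean scores is simply the fraction of runs that scored true.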