TypeScript SDK v4 is now available! See what's new
AI Evals

Run experiments in production

Use group.experiment() to split traffic, keep cohorts stable, compare variants, and roll changes forward safely.

Production experiments let you compare two versions of logic on real traffic without building a separate A/B testing system. With group.experiment(), you declare the variants inside an Inngest function, choose a selection strategy, and Inngest records which variant ran in the trace.

This pattern is useful when you need to canary a rewrite, compare model or prompt choices, tune a costly operation, or route a stable customer cohort through a migration.

§Which strategy do you need?

GoalStrategyWhy
Roll out a rewrite graduallyexperiment.weighted()Split new runs by relative weights.
Keep a user or account on one experienceexperiment.bucket()Hash a stable key to a variant.
Read a flag or assignment from your own systemexperiment.custom()Delegate selection to your database or flag service.
Pin one variant for a manual overrideexperiment.fixed()Always choose the same variant.

Every strategy returns one variant for the run. The selected variant is memoized, so retries and replays run the same variant again.

§Weighted: canary a rewrite

Use experiment.weighted() when each new run can be assigned independently. Weights are relative, not percentages: { v1: 99, v2: 1 } gives v2 about one out of every hundred new runs, and { v1: 1, v2: 1 } gives an even split.

typescript
01import { experiment } from "inngest";
02import { inngest } from "./client";
03
04export default inngest.createFunction(
05 {
06 id: "generate-invoice",
07 triggers: { event: "billing/invoice.requested" },
08 },
09 async ({ event, step, group }) => {
10 return await group.experiment("invoice-engine", {
11 variants: {
12 current: () =>
13 step.run("generate-current", () =>
14 generateInvoiceV1(event.data)
15 ),
16 rewrite: () =>
17 step.run("generate-rewrite", () =>
18 generateInvoiceV2(event.data)
19 ),
20 },
21 select: experiment.weighted({ current: 99, rewrite: 1 }),
22 });
23 }
24);

To ramp the rewrite, deploy new weights over time: { current: 90, rewrite: 10 }, then { current: 50, rewrite: 50 }, then { current: 0, rewrite: 100 }. Runs that already selected a variant keep that memoized choice. New runs use the new weights.

Use the same strategy to tune configuration, such as batch size or model temperature, when each run can safely receive a fresh assignment.

§Bucket: keep a cohort stable

Use experiment.bucket() when a user, account, or tenant should get the same variant across runs. Pass a stable key, such as event.data.accountId, and optionally pass weights for the eligible variants.

typescript
01const charge = await group.experiment("payments-provider", {
02 variants: {
03 stripe: () =>
04 step.run("charge-stripe", () =>
05 chargeStripe(event.data.order)
06 ),
07 adyen: () =>
08 step.run("charge-adyen", () =>
09 chargeAdyen(event.data.order)
10 ),
11 },
12 select: experiment.bucket(event.data.accountId, {
13 weights: { stripe: 90, adyen: 10 },
14 }),
15});

This is the right shape for an A/B test where users should not bounce between experiences on every request. It can also help with migrations where each account should usually stay with one provider while the rollout is active.

Changing bucket weights can change future assignments for a key. If a migration requires a strict "once moved, never move back" guarantee, store the account assignment in your own database and select it with experiment.custom() instead.

§Custom: use your own assignment logic

Use experiment.custom() when the selection comes from a system outside Inngest: a feature flag, a rollout table, an entitlement, or a per-account migration record. The custom selector is still memoized for the run.

typescript
01const invoice = await group.experiment("invoice-engine", {
02 variants: {
03 current: () =>
04 step.run("generate-current", () =>
05 generateInvoiceV1(event.data)
06 ),
07 rewrite: () =>
08 step.run("generate-rewrite", () =>
09 generateInvoiceV2(event.data)
10 ),
11 },
12 select: experiment.custom(async () => {
13 const assignment = await rolloutAssignments.get(event.data.accountId);
14 return assignment ?? "current";
15 }),
16});

The selector must return one of the variant names. Use this for no-deploy kill switches too: read a flag in the selector and return the safe variant when the flag is off.

§Fixed: force one variant

Use experiment.fixed() when you want one variant every time. It is useful for manual overrides, testing a single code path, or temporarily pinning a winner while you decide whether to remove the experiment.

typescript
01select: experiment.fixed("rewrite")

Keeping the experiment wrapper in place lets you add another challenger later without rebuilding the function shape. When the old variant is no longer useful, remove it and simplify the function.

§Track outcomes

Experiments tell you which variant ran. To decide which variant is better, you still need an outcome signal. For simple technical outcomes, use the run trace, step latency, errors, and logs. For product or AI-quality outcomes, attach your own score or analytics event from inside the selected variant.

typescript
01const outcome = await group.experiment("email-copy", {
02 variants: {
03 short: () => step.run("short-copy", () => generateShortCopy(event.data)),
04 detailed: () =>
05 step.run("detailed-copy", () => generateDetailedCopy(event.data)),
06 },
07 select: experiment.bucket(event.data.userId, {
08 weights: { short: 50, detailed: 50 },
09 }),
10 withVariant: true,
11});
12
13await step.run("track-selected-variant", () =>
14 analytics.track("experiment.variant_selected", {
15 experiment: "email-copy",
16 variant: outcome.variant,
17 userId: event.data.userId,
18 })
19);

If the outcome is available during the run, pair the experiment with scoring. If the outcome arrives later, use a deferred scorer or your analytics system.

§Best practices

  • Give every experiment in a function a unique ID.
  • Keep variant names stable. Variant names appear in traces and in any analytics you emit.
  • Put real work inside step.* calls. Variant callbacks must call at least one step.* tool so the selected work is durable.
  • Use weighted() for independent run-level assignment and bucket() for stable user or account assignment.
  • Use custom() when changing assignment must not require a deploy, or when you need stronger migration stickiness than bucket weights provide.
  • Keep the selection strategy small. Put the behavior you are comparing in the variants, not in the selector.