Step experiments

Step experiments let you test more than one version of function logic in production. You define named variants inside group.experiment(), choose a selection strategy, and Inngest executes one variant for each run.

They are useful when you want to:

Roll out a risky rewrite to a small slice of traffic.
Compare models, prompts, providers, or workflow strategies.
Keep users or accounts on a consistent experience while you evaluate a change.
Tune operational settings, such as batch size, concurrency, or retry behavior.

Experiments are part of the TypeScript SDK v4 and require version 4.8.0 or later. You do not need a separate package or feature-flag service to start using them.

Basic example

Import the experiment helper from inngest, then call group.experiment() from inside a function handler. The group argument is injected alongside event and step.

import { experiment } from "inngest";
import { inngest } from "./client";
// fetchDocument, callOpenAI, and callAnthropic are your own application helpers.

export default inngest.createFunction(
  {
    id: "summarize-document",
    triggers: { event: "document/uploaded" },
  },
  async ({ event, step, group }) => {
    const doc = await step.run("fetch-document", () =>
      fetchDocument(event.data.documentId)
    );

    const summary = await group.experiment("model-comparison", {
      variants: {
        gpt4o: () =>
          step.run("summarize-gpt4o", () =>
            callOpenAI({
              model: "gpt-4o",
              prompt: `Summarize: ${doc.text}`,
            })
          ),
        claude: () =>
          step.run("summarize-claude", () =>
            callAnthropic({
              model: "claude-sonnet-4-20250514",
              prompt: `Summarize: ${doc.text}`,
            })
          ),
      },
      select: experiment.weighted({ gpt4o: 50, claude: 50 }),
    });

    return summary;
  }
);

Only the selected variant runs. The other variant callbacks are skipped.

How selection works

The experiment selection is itself a durable, memoized step. When a function run reaches group.experiment() for the first time, Inngest evaluates the select strategy and stores the selected variant. If the run retries or replays later, it uses the same selected variant again.
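
The memoization described above can be pictured with a small in-memory model. This is only a sketch of the idea, not how Inngest stores step state: RunState, selectOnce, and flakySelector are hypothetical names introduced for illustration.

```typescript
// Hypothetical model of a run's memoized state (not Inngest internals).
type RunState = Map<string, string>;

// Evaluate the selector only the first time an experiment ID is seen in a
// run. Later attempts reuse the stored variant name.
function selectOnce(
  state: RunState,
  experimentId: string,
  select: () => string
): string {
  const stored = state.get(experimentId);
  if (stored !== undefined) {
    return stored; // retry or replay: reuse the earlier decision
  }
  const chosen = select();
  state.set(experimentId, chosen);
  return chosen;
}

// Simulate a first attempt followed by a retry of the same run.
const runState: RunState = new Map();
let selectorCalls = 0;
const flakySelector = (): string => {
  selectorCalls += 1;
  // This selector would answer differently if it were called twice.
  return selectorCalls === 1 ? "candidate" : "control";
};

const firstAttempt = selectOnce(runState, "model-comparison", flakySelector);
const retryAttempt = selectOnce(runState, "model-comparison", flakySelector);
console.log(firstAttempt, retryAttempt, selectorCalls); // candidate candidate 1
```

The selector runs once even though the run reaches the experiment twice, which is why a retry cannot flip a run from one variant to another.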

Variant callbacks can contain normal step tools, including step.run(), step.invoke(), step.waitForEvent(), and step.sendEvent(). Each step inside the selected variant is retried and memoized like any other step.

Every variant callback must call at least one step.* tool. Code that runs outside of steps is not durable and can re-execute on replay.

Selection strategies

Choose the strategy that matches the behavior you need.

Weighted

Use experiment.weighted() for run-level splits where each new run can receive a fresh assignment.

select: experiment.weighted({ control: 90, candidate: 10 })

Weights are relative. { control: 9, candidate: 1 } and { control: 90, candidate: 10 } produce the same split.
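
To see why the two weight objects are equivalent, it can help to convert them to probabilities. The helper below is a hypothetical illustration, not part of the SDK.

```typescript
// Convert relative weights into probabilities that sum to 1.
function toProbabilities(
  weights: Record<string, number>
): Record<string, number> {
  const total = Object.values(weights).reduce((sum, w) => sum + w, 0);
  if (total <= 0) {
    throw new Error("weights must sum to a positive number");
  }
  const result: Record<string, number> = {};
  for (const [name, w] of Object.entries(weights)) {
    result[name] = w / total;
  }
  return result;
}

const small = toProbabilities({ control: 9, candidate: 1 });
const large = toProbabilities({ control: 90, candidate: 10 });
console.log(small); // { control: 0.9, candidate: 0.1 }
console.log(large); // { control: 0.9, candidate: 0.1 }
```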

Weighted selection is seeded by the run ID, so the same run keeps its selected variant on retries. Changing weights in a later deploy only affects new selections.

Bucket

Use experiment.bucket() when the same user, account, or tenant should usually get the same variant across runs.

select: experiment.bucket(event.data.userId, {
  weights: { control: 80, candidate: 20 },
})

The stable value you pass, such as event.data.userId, is hashed to choose a variant, so the same value maps to the same variant while the weights stay the same. This prevents a user from seeing a different experience every time they trigger the function.

Changing bucket weights can change future assignments for a key. If you need strict migration stickiness, store the assignment in your own database and use experiment.custom().
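
The sketch below shows one common way to hash a stable key into a weighted bucket. It is an assumption-laden illustration: this page does not say which hash Inngest uses, so FNV-1a stands in here, and fnv1a and bucketVariant are hypothetical names.

```typescript
// 32-bit FNV-1a over UTF-16 code units. Chosen only for illustration;
// the hash Inngest uses internally is not specified on this page.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a stable key onto a weighted variant. The same key and the same
// weights always produce the same variant.
function bucketVariant(key: string, weights: Record<string, number>): string {
  const entries = Object.entries(weights).filter(([, w]) => w > 0);
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  if (total <= 0) {
    throw new Error("at least one weight must be positive");
  }

  // Scale the hash into [0, total) and walk the cumulative weights.
  const point = (fnv1a(key) / 0x100000000) * total;
  let cumulative = 0;
  let last: string | undefined;
  for (const [name, w] of entries) {
    cumulative += w;
    last = name;
    if (point < cumulative) return name;
  }
  // Only reachable through floating-point rounding at the upper edge.
  if (last === undefined) throw new Error("no variants available");
  return last;
}

const bucketWeights = { control: 80, candidate: 20 };
const firstPick = bucketVariant("user-42", bucketWeights);
const secondPick = bucketVariant("user-42", bucketWeights);
console.log(firstPick === secondPick); // true
```

In this model the boundaries between variants move whenever the weights change, so a key that sits near a boundary can land in a different variant afterwards. That is the behavior the note about storing assignments yourself is meant to guard against.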

Custom

Use experiment.custom() when the selection should come from your own system, such as a feature flag, rollout table, entitlement, or database-backed migration assignment.

select: experiment.custom(async () => {
  const assignment = await rolloutTable.get(event.data.accountId);
  return assignment ?? "control";
})

The custom selector must return one of the variant names. The returned value is memoized for the run.
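
Because values from an external system can be missing or misspelled, you may want to narrow them to a known variant name before returning them from the selector. The helper below is a hypothetical sketch, not an SDK function.

```typescript
// Narrow an untrusted assignment (for example, a database value) to a known
// variant name, falling back to a default when it is missing or unrecognized.
function resolveVariant<V extends string>(
  assignment: unknown,
  variantNames: readonly V[],
  fallback: V
): V {
  const match = variantNames.find((name) => name === assignment);
  return match ?? fallback;
}

const variantNames = ["control", "candidate"] as const;
console.log(resolveVariant("candidate", variantNames, "control")); // candidate
console.log(resolveVariant("candidte", variantNames, "control")); // control (typo)
console.log(resolveVariant(null, variantNames, "control")); // control
```

Falling back silently is a design choice. If you would rather surface a bad assignment, throw from the helper instead of returning the fallback.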

Fixed

Use experiment.fixed() when you want one variant every time. This is useful for manual overrides, testing a single path, or pinning one variant while you remove the experiment.

select: experiment.fixed("candidate")

Return the selected variant

By default, group.experiment() returns the selected variant's result. Set withVariant: true when you also need the selected variant name for logging, analytics, scoring, or downstream decisions.

const outcome = await group.experiment("copy-style", {
  variants: {
    short: () => step.run("short-copy", () => generateShortCopy(event.data)),
    detailed: () =>
      step.run("detailed-copy", () => generateDetailedCopy(event.data)),
  },
  select: experiment.bucket(event.data.userId, {
    weights: { short: 50, detailed: 50 },
  }),
  withVariant: true,
});

await step.run("track-experiment", () =>
  analytics.track("experiment.variant_selected", {
    experiment: "copy-style",
    variant: outcome.variant,
    userId: event.data.userId,
  })
);

return outcome.result;

With withVariant: true, the returned shape is { result, variant }.

Track outcomes

An experiment tells you which variant ran. It does not decide which variant is best for you. Attach your own outcome signal:

Use scoring when the score is known during the run.
Use deferred scoring when the outcome arrives later.
Emit analytics events from a step.run() when your team already uses an external analytics warehouse.

For AI model or prompt experiments, withVariant: true is often the simplest way to include the selected variant in your scoring or analytics payload.

Notes and best practices

Give every experiment in a function a unique ID.
Keep variant names stable because they show up in traces and analytics.
Use weighted() when each run can receive a fresh assignment.
Use bucket() when a stable user or account experience matters.
Use custom() when assignment must be controlled outside code or changed without a deploy.
Keep selectors simple. Put the behavior you are comparing inside the variant callbacks.

Troubleshooting

Issue	Solution
Setup
Experiments and scoring are not available.	Your SDK version must be at least `4.8.0`.
Variant selection
The same user sees different variants across runs.	`experiment.weighted()` is seeded by run ID, not user ID. Use `experiment.bucket(userId, { weights })` for sticky user-level assignment.
`select()` returns a variant that does not exist.	Make sure the name passed to `experiment.fixed()` or returned by your `experiment.custom()` selector exactly matches a key in `variants`. Typos fail the run.
`experiment.bucket()` gives surprising skew.	Confirm the bucket value is not `null` or `undefined`. Missing values hash as an empty string, so they all collapse into the same deterministic bucket.
Scoring
`step.score()` is missing or throws.	Register `scoreMiddleware()` on the Inngest client: `middleware: [scoreMiddleware()]`.
Scores run, but do not attach to the experiment.	For in-callback scoring, call `step.score()` inside the selected variant callback. For outside-callback scoring, use `inngest.score.experiment()` with the returned `experimentRef`.
Score values are rejected.	Score values must be finite numbers or booleans. Do not pass strings, objects, `NaN`, or `Infinity`.
Boolean scores look like numbers in aggregate views.	Boolean scores are valid and aggregate numerically. Treat `true` as `1` and `false` as `0` when interpreting averages.
Reading results
A 50/50 split looks uneven.	Small samples are noisy. Fire more events before assuming selection is wrong.
Two functions use the same experiment name and the dashboard feels confusing.	This is allowed, and the backend scopes detail views by function. Still, prefer distinct experiment names when humans will compare them in the list or docs.
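
The score rules in the table above can be checked before you submit a value. The sketch below validates a candidate score and shows how boolean scores average numerically. isValidScore, toNumeric, and averageScore are hypothetical helpers, not SDK functions, and the averaging only models the rule stated in the table.

```typescript
type Score = number | boolean;

// Accept booleans and finite numbers; reject strings, objects, NaN, Infinity.
function isValidScore(value: unknown): value is Score {
  if (typeof value === "boolean") return true;
  return typeof value === "number" && Number.isFinite(value);
}

// Treat true as 1 and false as 0, as aggregate views do.
function toNumeric(score: Score): number {
  if (typeof score === "boolean") return score ? 1 : 0;
  return score;
}

function averageScore(scores: Score[]): number {
  if (scores.length === 0) {
    throw new Error("no scores to average");
  }
  const total = scores.reduce<number>((sum, s) => sum + toNumeric(s), 0);
  return total / scores.length;
}

console.log(isValidScore(0.75)); // true
console.log(isValidScore(NaN)); // false
console.log(isValidScore("1")); // false
console.log(averageScore([true, false, true, true])); // 0.75
```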