# Run experiments in AI pipelines

When you're building AI features with durable functions, you often need to compare approaches: which model produces better results, whether a refined prompt outperforms a simpler one, or how a RAG pipeline stacks up against a single-shot call. `group.experiment()` lets you run these comparisons inside your function, with durable memoization so the same variant is always selected on retries and replays. Selection strategies such as `experiment.weighted()` and `experiment.bucket()` come from the `experiment` export:

```ts
import { experiment } from "inngest";
```

## Compare AI models

The simplest case: you want to test two models against each other. Define each model call as a variant and use `weighted` selection to control the traffic split.

```ts
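import { experiment } from "inngest";

// Assumes `inngest` is your Inngest client and that fetchDocument, callOpenAI,
// and callAnthropic are app-specific helpers defined elsewhere.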

export default inngest.createFunction(
  {
    id: "summarize-document",
    triggers: { event: "document/uploaded" },
  },
  async ({ event, step, group }) => {
    const doc = await step.run("fetch-document", () =>
      fetchDocument(event.data.documentId)
    );

    const summary = await group.experiment("model-comparison", {
      variants: {
        gpt4o: () =>
          step.run("summarize-gpt4o", () =>
            callOpenAI({ model: "gpt-4o", prompt: `Summarize: ${doc.text}` })
          ),
        claude: () =>
          step.run("summarize-claude", () =>
            callAnthropic({ model: "claude-sonnet-4-20250514", prompt: `Summarize: ${doc.text}` })
          ),
      },
      select: experiment.weighted({ gpt4o: 50, claude: 50 }),
    });

    return summary;
  }
);
```

The variant selection is wrapped in a memoized step. If the function retries or replays, the same model is used every time.
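To make the effect of memoized selection concrete, here is a minimal sketch that is independent of the SDK. The names `pickWeighted`, `selectOnce`, and the `memo` map are hypothetical stand-ins, not Inngest internals: the map plays the role of durable step state, and the weighted pick is one common way to implement a traffic split.

```typescript
// Illustrative only: a stand-in for weighted selection plus memoization.
// None of these names come from the Inngest SDK.
type Weights = Record<string, number>;

// Pick a variant name from the weights, given a random number in [0, 1).
function pickWeighted(weights: Weights, rand: number): string {
  const entries = Object.entries(weights);
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  let threshold = rand * total;
  for (const [name, weight] of entries) {
    if (threshold < weight) return name;
    threshold -= weight;
  }
  // Floating-point edge case: fall back to the last variant.
  return entries[entries.length - 1][0];
}

// A Map stands in for the durable state that survives retries and replays.
const memo = new Map<string, string>();

function selectOnce(
  experimentId: string,
  weights: Weights,
  random: () => number = Math.random
): string {
  const cached = memo.get(experimentId);
  if (cached !== undefined) return cached; // replay: reuse the stored choice
  const choice = pickWeighted(weights, random());
  memo.set(experimentId, choice);
  return choice;
}

const first = selectOnce("model-comparison", { gpt4o: 50, claude: 50 });
const replayed = selectOnce("model-comparison", { gpt4o: 50, claude: 50 });
console.log(first === replayed); // true
```

Without the memo, a retry would roll the dice again and could switch models partway through a run.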

## Bucket users to a consistent model

When users interact with your AI features repeatedly, you usually want them to get consistent behavior. Use `experiment.bucket()` with the user ID so the same user always hits the same variant.

```ts
const response = await group.experiment("assistant-model", {
  variants: {
    current: () =>
      step.run("current-model", () =>
        generateResponse({ model: "gpt-4o", messages: conversation })
      ),
    candidate: () =>
      step.run("candidate-model", () =>
        generateResponse({ model: "gpt-4o-mini", messages: conversation })
      ),
  },
  select: experiment.bucket(event.data.userId, {
    weights: { current: 90, candidate: 10 },
  }),
});
```

The same user ID always maps to the same variant, even across different function runs. This prevents users from experiencing inconsistent quality between requests.
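The general idea behind this kind of bucketing can be sketched as hashing the key onto a weighted range. The sketch below is an assumption-laden illustration: it uses a 32-bit FNV-1a hash over UTF-16 code units, and `fnv1a` and `bucketVariant` are hypothetical names. This guide does not specify which hash the SDK actually uses, so real assignments will differ from what this code produces.

```typescript
// Illustrative only: deterministic bucketing with a simple FNV-1a hash.
// The SDK's actual hashing scheme is not described in this guide.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i); // UTF-16 code units, for brevity
    hash = Math.imul(hash, 0x01000193); // FNV 32-bit prime
  }
  return hash >>> 0; // force unsigned 32-bit
}

function bucketVariant(key: string, weights: Record<string, number>): string {
  const entries = Object.entries(weights);
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  // Map the hash onto [0, total) so the key lands in one weighted slice.
  let point = (fnv1a(key) / 0x100000000) * total;
  for (const [name, weight] of entries) {
    if (point < weight) return name;
    point -= weight;
  }
  return entries[entries.length - 1][0];
}

const rollout = { current: 90, candidate: 10 };
const firstCall = bucketVariant("user_123", rollout);
const secondCall = bucketVariant("user_123", rollout);
console.log(firstCall === secondCall); // true
```

Because the result depends only on the key and the weights, no per-user state needs to be stored to keep assignments stable across runs.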

## Test prompt strategies with multi-step variants

Variant callbacks can contain multiple sequential steps. Each step is individually retried and memoized. This is useful when one approach involves more work than another, like comparing a single-shot prompt against a retrieval-augmented pipeline.

```ts
const answer = await group.experiment("prompt-strategy", {
  variants: {
    single_shot: () =>
      step.run("single-shot", () =>
        callLLM({ prompt: `Answer this question: ${question}` })
      ),
    rag_pipeline: async () => {
      const chunks = await step.run("retrieve-context", () =>
        searchVectorStore(question, { topK: 5 })
      );
      const context = chunks.map((c) => c.text).join("\n\n");
      return await step.run("generate-with-context", () =>
        callLLM({
          prompt: `Using this context:\n${context}\n\nAnswer: ${question}`,
        })
      );
    },
  },
  select: experiment.weighted({ single_shot: 70, rag_pipeline: 30 }),
});
```
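To illustrate why per-step memoization matters for the `rag_pipeline` variant above, here is a toy, synchronous model of the behavior. `runStep` is a hypothetical stand-in for `step.run` (which is asynchronous and durable in practice), and the simulated failure is invented for the demonstration.

```typescript
// Illustrative only: a toy, synchronous stand-in for step memoization.
// Real steps are asynchronous and persisted durably; this is not SDK code.
const stepResults = new Map<string, unknown>();
const executions: string[] = [];

function runStep<T>(id: string, fn: () => T): T {
  if (stepResults.has(id)) return stepResults.get(id) as T; // replay: skip the work
  executions.push(id);
  const result = fn(); // if this throws, nothing is stored, so a retry re-runs it
  stepResults.set(id, result);
  return result;
}

let generateAttempts = 0;

function ragPipeline(question: string): string {
  const chunks = runStep("retrieve-context", () => ["chunk A", "chunk B"]);
  return runStep("generate-with-context", () => {
    generateAttempts += 1;
    // Simulate a transient failure on the first attempt only.
    if (generateAttempts === 1) throw new Error("transient LLM failure");
    return `Answer to "${question}" using ${chunks.length} chunks`;
  });
}

let answer: string | undefined;
try {
  answer = ragPipeline("What is durable execution?"); // fails in the second step
} catch (err) {
  answer = ragPipeline("What is durable execution?"); // retry reuses the retrieval
}

console.log(executions);
// [ 'retrieve-context', 'generate-with-context', 'generate-with-context' ]
```

The retrieval runs once even though the pipeline is invoked twice, which is the property that keeps a more expensive variant from repeating completed work after a late failure.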

## Get the selected variant name

Set `withVariant: true` to receive both the result and the name of the selected variant, exposed as `result` and `variant` on the returned value. This is useful for logging, analytics, or downstream decisions.

```ts
const outcome = await group.experiment("tone-test", {
  variants: {
    concise: () =>
      step.run("concise-prompt", () =>
        callLLM({ prompt: "Be brief. " + userQuery })
      ),
    detailed: () =>
      step.run("detailed-prompt", () =>
        callLLM({ prompt: "Be thorough and explain your reasoning. " + userQuery })
      ),
  },
  select: experiment.weighted({ concise: 50, detailed: 50 }),
  withVariant: true,
});

await step.run("log-experiment", () =>
  trackExperiment({
    experiment: "tone-test",
    variant: outcome.variant,
    responseLength: outcome.result.length,
  })
);
```
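What `trackExperiment` does is up to you. As one possibility, the sketch below aggregates response length per variant in memory. The `Outcome` type is inferred from the `outcome.result` and `outcome.variant` accesses above rather than copied from the SDK's types, and `recordOutcome` and `meanLength` are hypothetical helpers; a production version would more likely write to an analytics store.

```typescript
// Illustrative only: the Outcome shape is inferred from the example above,
// not taken from the SDK's type definitions.
type Outcome<V extends string, R> = { variant: V; result: R };

type Stats = { count: number; totalLength: number };

// Hypothetical in-memory aggregator keyed by variant name.
const statsByVariant: Record<string, Stats> = {};

function recordOutcome(outcome: Outcome<string, string>): void {
  const stats = statsByVariant[outcome.variant] || { count: 0, totalLength: 0 };
  stats.count += 1;
  stats.totalLength += outcome.result.length;
  statsByVariant[outcome.variant] = stats;
}

// Mean response length for a variant, or 0 if nothing has been recorded.
function meanLength(variant: string): number {
  const stats = statsByVariant[variant];
  return stats ? stats.totalLength / stats.count : 0;
}

recordOutcome({ variant: "concise", result: "Short answer." });
recordOutcome({
  variant: "detailed",
  result: "A longer answer that explains its reasoning.",
});
console.log(meanLength("concise"), meanLength("detailed"));
```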

## Run multiple experiments in one pipeline

You can run independent experiments in a single function. Use `experiment.bucket()` with a composite key so each experiment assigns variants independently.

```ts
export default inngest.createFunction(
  {
    id: "ai-document-pipeline",
    triggers: { event: "document/process" },
  },
  async ({ event, step, group }) => {
    const userId = event.data.userId;

    const extraction = await group.experiment("extraction-model", {
      variants: {
        structured: () =>
          step.run("structured-extract", () =>
            extractWithSchema(event.data.documentUrl)
          ),
        freeform: () =>
          step.run("freeform-extract", () =>
            extractFreeform(event.data.documentUrl)
          ),
      },
      select: experiment.bucket(`${userId}:extraction`),
      withVariant: true,
    });

    const summary = await group.experiment("summary-approach", {
      variants: {
        map_reduce: async () => {
          const chunks = await step.run("chunk-document", () =>
            chunkText(extraction.result)
          );
          const partials = await step.run("summarize-chunks", () =>
            Promise.all(chunks.map((c) => summarize(c)))
          );
          return await step.run("combine-summaries", () =>
            combineSummaries(partials)
          );
        },
        single_pass: () =>
          step.run("single-pass-summary", () =>
            summarize(extraction.result)
          ),
      },
      select: experiment.bucket(`${userId}:summary`),
      withVariant: true,
    });

    return { extraction, summary };
  }
);
```

By appending a feature-specific suffix to the bucket key (`userId:extraction` vs `userId:summary`), the same user can be independently assigned to different variants in each experiment.
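The effect of the suffix can be sketched with a toy equal-weight bucketing function. As before, this is an assumption-based illustration: `hashToUnit` and `assign` are hypothetical, the FNV-1a hash is a stand-in for whatever the SDK uses, and the user IDs are made up. With a reasonably well-mixed hash you would expect users to spread across all four combinations rather than always pairing the same two variants.

```typescript
// Illustrative only: an equal-weight bucketing stand-in, not the SDK's algorithm.
function hashToUnit(key: string): number {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0) / 0x100000000; // value in [0, 1)
}

function assign(key: string, variants: string[]): string {
  return variants[Math.floor(hashToUnit(key) * variants.length)];
}

// Tally how hypothetical users fall across the two experiments.
const tally: Record<string, number> = {};
for (let i = 0; i < 1000; i++) {
  const userId = `user_${i}`;
  const extraction = assign(`${userId}:extraction`, ["structured", "freeform"]);
  const summary = assign(`${userId}:summary`, ["map_reduce", "single_pass"]);
  const combo = `${extraction}+${summary}`;
  tally[combo] = (tally[combo] || 0) + 1;
}

console.log(tally);
```

If both experiments used the bare `userId` as the key, each user's position in the hash range would be identical in both, so their assignments would be tied together instead of independent.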

> **Callout:** Every variant callback must make at least one `step.run()` call. The SDK throws a `NonRetriableError` if a variant completes without calling any step tools.

For the full API surface, parameters, and selection strategy details, see the [`group.experiment()` reference](/docs-markdown/reference/typescript/v4/functions/group-experiment).