Zero-infrastructure LLM & AI

Build reliable LLM and AI chains in minutes, with no memory store, state management, or infrastructure to set up. Test locally, then deploy to any platform using normal code.

Automatic Memory & Context

Functions automatically maintain state, allowing you to reference the output of any API call in normal code without using databases or caching.
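To make the idea concrete, here is a minimal sketch of how step results could be remembered between invocations. It is an in-memory model for illustration only, not Inngest's implementation; `createStepRunner`, `StepState`, `fakeSummarize`, and `chain` are hypothetical names.

```typescript
// Hypothetical model of step memoization; not Inngest's actual implementation.
type StepState = Map<string, unknown>;

function createStepRunner(state: StepState) {
  return {
    // Run `fn` only if no result is stored under `id`; otherwise return the stored result.
    async run<T>(id: string, fn: () => Promise<T>): Promise<T> {
      if (state.has(id)) {
        return state.get(id) as T;
      }
      const result = await fn();
      state.set(id, result);
      return result;
    },
  };
}

// Count how often the simulated API is really called.
let apiCalls = 0;
async function fakeSummarize(input: string): Promise<string> {
  apiCalls += 1;
  return `summary of: ${input}`;
}

// A two-step chain: the second step uses the first step's output as a plain variable.
async function chain(state: StepState): Promise<string> {
  const step = createStepRunner(state);
  const summary = await step.run("summarize-input", () => fakeSummarize("hello"));
  const title = await step.run("generate-a-title", async () => summary.toUpperCase());
  return title;
}
```

Calling `chain` twice with the same `state` map reuses the stored results, so the simulated API is called only once even though the function body runs twice.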

Fully Serverless

Deploy to any provider, on any platform. Inngest ensures that each step runs once, and spreads the steps of a function across multiple invocations while maintaining state.

Reliable by Default

Inngest automatically retries steps within functions on error. Never worry about issues with your provider's availability or API downtime again.
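For comparison, the sketch below shows roughly what retrying a flaky call by hand involves. Inngest does this for you; `runWithRetries` and `flakyCompletion` are hypothetical helpers, and the attempt count and delays are illustrative assumptions rather than Inngest's defaults.

```typescript
// Hypothetical helper; it only illustrates the idea of retrying a failing step.
async function runWithRetries<T>(
  fn: (attempt: number) => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Exponential backoff: the delay doubles after every failed attempt.
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}

// Simulated provider that fails twice before succeeding.
async function flakyCompletion(attempt: number): Promise<string> {
  if (attempt < 3) {
    throw new Error(`provider unavailable (attempt ${attempt})`);
  }
  return "ok";
}
```

With these helpers, `await runWithRetries(flakyCompletion)` resolves on the third attempt, while a call that always throws rejects once the attempts are exhausted.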

Build reliable AI products in a few lines of code

Chained LLMs

1 Define an event to trigger your chain function

2 Use step.run for reliable API calls

3 Return state from each step

4 Use state in following steps in your chain

Automatic retries and persisted state across all steps in your chain.

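import { inngest } from "./client";
// OpenAI, db, createSummaryPrompt, and createTitlePrompt are assumed to be defined or imported elsewhere.

export default inngest.createFunction(
  { id: "summarize-chat-and-documents" },
  { event: "api/chat.submitted" },
  async ({ event, step }) => {
    const llm = new OpenAI();

    // Each step.run call is retried on error, and its return value is persisted.
    const output = await step.run("summarize-input", async () => {
      return await llm.createCompletion({
        model: "gpt-3.5-turbo",
        // Assumes the event payload carries the text to summarize in an "input" field.
        prompt: createSummaryPrompt(event.data.input),
      });
    });

    const title = await step.run("generate-a-title", async () => {
      return await llm.createCompletion({
        model: "gpt-3.5-turbo",
        prompt: createTitlePrompt(output),
      });
    });

    await step.run("save-to-db", async () => {
      // The stored fields are illustrative.
      await db.summaries.create({ output, title });
    });

    return { output, title };
  },
);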

Advanced features, for production-ready systems


Cancel long running functions automatically or via an API call, keeping your resources free.


Set custom concurrency limits on functions or specific API calls, and only run when there's capacity.

Per-User Rate-Limiting

Set hard rate limits on functions using custom keys like user IDs, ensuring that you use your model tokens or GPU efficiently.
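As a rough sketch, cancellation, concurrency, and rate limiting are typically set in the options object passed as the first argument to `inngest.createFunction`. The event name `api/chat.cancelled`, the `requestID` and `userID` fields, and all numeric limits below are assumptions made for illustration; check the option names against the current SDK reference before relying on them.

```typescript
// Options sketch; event names, keys, and limits are illustrative assumptions.
const functionOptions = {
  id: "summarize-chat-and-documents",
  // Cancel a running function when a cancellation event with the same requestID arrives.
  cancelOn: [{ event: "api/chat.cancelled", match: "data.requestID" }],
  // Run at most 5 instances of this function at a time for each user.
  concurrency: { limit: 5, key: "event.data.userID" },
  // Start at most 10 runs per user per minute.
  rateLimit: { limit: 10, period: "1m", key: "event.data.userID" },
};
```

Keying the limits on a user ID means one heavy user cannot consume the model tokens or GPU time that other users need.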

