Keeping your API fast

Offload LLM calls, data processing, and other heavy work from the request path into reliable background functions.

For web applications, a common guideline is to complete a user's task within 100ms to 1000ms. Beyond that, users are likely to find your app sluggish or abandon it. In e-commerce, the tolerance is even lower.

The first thing to do is minimize the work performed in each request. Your API endpoint should do only what the critical path requires: returning a response to the user. Everything else, whether it's sending a notification, calling an LLM, syncing data to a third-party service, or processing an uploaded file, should run asynchronously in the background.

This is especially important for AI-powered features. A single LLM call can take 5-30 seconds. Chaining multiple calls for summarization, classification, or tool use can take minutes. None of that belongs in a request handler. Send an event, return a 200, and let a background function handle the heavy lifting with built-in retries and observability.
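
To make the critical-path argument concrete, here is a rough sketch of the arithmetic. The task list, the per-task estimates, and the names `Task`, `RESPONSE_BUDGET_MS`, and `totalMs` are illustrative assumptions, not measurements. Only the 1000ms budget and the 5-second lower bound for an LLM call come from the text above.

```typescript
// Illustrative numbers only; measure your own handlers.
interface Task {
  name: string;
  estimatedMs: number;
  critical: boolean; // must it finish before the response is sent?
}

const RESPONSE_BUDGET_MS = 1000; // upper end of the 100ms-1000ms guideline

const tasks: Task[] = [
  { name: "save document", estimatedMs: 40, critical: true },
  { name: "send event", estimatedMs: 20, critical: true },
  { name: "summarize with LLM", estimatedMs: 5000, critical: false },
  { name: "send notification", estimatedMs: 300, critical: false },
];

// Sums the estimated duration of a list of tasks.
const totalMs = (items: Task[]): number =>
  items.reduce((sum, task) => sum + task.estimatedMs, 0);

// Everything done inside the request handler.
const inlineMs = totalMs(tasks);
// Only the critical path stays in the handler; the rest runs in the background.
const offloadedMs = totalMs(tasks.filter((task) => task.critical));

const inlineFits = inlineMs <= RESPONSE_BUDGET_MS;
const offloadedFits = offloadedMs <= RESPONSE_BUDGET_MS;
```

With these assumed estimates, the inline handler takes about 5,360ms and blows the budget, while the offloaded handler takes about 60ms.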

§What this requires

For a request handler to finish in under a second, every non-essential piece of work has to move elsewhere. That takes:

A transport to hand the work off: a queue, an event bus, or an HTTP call to another service.
A worker that picks it up (long-running process, container, or serverless function) and a way to deploy and observe it.
Retry semantics for when the worker fails: at-least-once delivery, dead-letter routing, and a way to inspect what's stuck.
Idempotency so that retries don't double-charge, double-send, or double-create.
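
The last two requirements interact: at-least-once delivery means a job can run more than once, so the handler needs a guard against repeating its side effect. The following is a minimal in-memory sketch, not a production design. The `Job` shape, its `idempotencyKey` field, and the helpers `chargeOnce` and `withRetries` are hypothetical names introduced for illustration, and the `Set` stands in for a durable store such as a database table with a unique constraint.

```typescript
// Hypothetical job shape; the idempotencyKey field is an assumption.
interface Job {
  idempotencyKey: string;
  userId: string;
  amountCents: number;
}

// Stand-ins for durable storage. Process memory is lost on restart and is
// not shared between workers, so a real system needs a shared store.
const processedKeys = new Set<string>();
const charges: Job[] = [];

// Performs the side effect at most once per idempotency key.
function chargeOnce(job: Job): "charged" | "skipped" {
  if (processedKeys.has(job.idempotencyKey)) {
    return "skipped"; // a retry or duplicate delivery: do nothing
  }
  charges.push(job); // the side effect that must not repeat
  processedKeys.add(job.idempotencyKey);
  return "charged";
}

// Runs an operation up to maxAttempts times, rethrowing the last error.
function withRetries<T>(maxAttempts: number, operation: (attempt: number) => T): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return operation(attempt);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

const job: Job = { idempotencyKey: "order-123", userId: "u1", amountCents: 500 };

// Simulate a worker that crashes after charging on its first attempt,
// before it can acknowledge the job, so the job is delivered again.
const outcome = withRetries(3, (attempt) => {
  const result = chargeOnce(job);
  if (attempt === 1) throw new Error("worker crashed before ack");
  return result;
});
```

The second attempt finds the key already recorded and skips the charge, so `charges` holds one entry even though the job ran twice.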

In a hand-rolled system, each of these is a separate concern with separate operational tooling. Most teams build it incrementally and end up with a system that no single person on the team fully understands.

§With Inngest

Inngest is serverless, so there are no queues to set up or configure. You send events, and any number of functions can be defined to run automatically when a matching event is received:

typescript

import express from "express";
import { Inngest } from "inngest";

// Create the client once at module scope and reuse it across requests
const inngest = new Inngest({ id: "my-app" });

// Assumes an Express app; createDocument and req.user come from your own
// persistence layer and auth middleware
const app = express();
app.use(express.json());

app.post("/api/documents", async (req, res) => {
  // Critical path: save the document and return immediately
  const doc = await createDocument(req.body);

  // Send an event for background processing
  await inngest.send({
    name: "api/document.uploaded",
    data: { documentId: doc.id, userId: req.user.id },
  });

  res.json({
    data: { documentId: doc.id },
    message: "Document uploaded successfully!"
  });
});

Functions are defined by declaring which event(s) should trigger them. When matching events are received, all corresponding functions run in the background automatically. Any returned value is logged, and any thrown error tells Inngest to retry the function.

typescript

import { inngest } from "./client";

// getDocument, updateDocument, and llm are your own application helpers
export const processDocument = inngest.createFunction(
  { id: "process-document", triggers: [{ event: "api/document.uploaded" }] },
  async ({ event, step }) => {
    const doc = await step.run("fetch-document", async () => {
      return await getDocument(event.data.documentId);
    });

    // LLM calls can take 5-30 seconds each. Running them as steps
    // means each retries independently if it fails.
    const summary = await step.run("summarize-with-llm", async () => {
      return await llm.summarize(doc.content);
    });

    await step.run("save-summary", async () => {
      await updateDocument(doc.id, { summary });
    });

    return `Processed document ${doc.id} for user ${event.data.userId}`;
  }
);
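
The comment in the function above says each step retries independently. One way to picture this is a cache of finished step results that outlives any single attempt. The following toy model is only an illustration of that idea, not Inngest's implementation. `makeStepRunner`, `StepCache`, and `processDocumentToy` are hypothetical names, and the steps are synchronous to keep the sketch short.

```typescript
// Toy model only: this is NOT how Inngest is implemented.
type StepCache = Map<string, unknown>;

// Returns a run() function that executes a step body once per step id
// and reuses the stored result on later attempts.
function makeStepRunner(cache: StepCache) {
  return function run<T>(id: string, fn: () => T): T {
    if (cache.has(id)) {
      return cache.get(id) as T; // finished on an earlier attempt
    }
    const result = fn(); // may throw; nothing is cached in that case
    cache.set(id, result);
    return result;
  };
}

// Counters showing how often each step body actually executes.
const calls = { fetch: 0, summarize: 0 };

function processDocumentToy(run: ReturnType<typeof makeStepRunner>): string {
  const content = run("fetch-document", () => {
    calls.fetch++;
    return "long document text";
  });
  return run("summarize-with-llm", () => {
    calls.summarize++;
    if (calls.summarize === 1) throw new Error("LLM timeout"); // first try fails
    return `summary of: ${content}`;
  });
}

// The cache outlives individual attempts, like persisted step state would.
const stepCache: StepCache = new Map();
let summary = "";
for (let attempt = 1; attempt <= 3; attempt++) {
  try {
    summary = processDocumentToy(makeStepRunner(stepCache));
    break;
  } catch {
    // Retry the whole function; completed steps are served from the cache.
  }
}
```

In this model the fetch step runs once and the failing summarize step runs twice, so a retry repeats only the work that did not finish.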

Functions can be deployed in several ways that fit your current stack or needs.

§Alternative approaches

Redis + BullMQ / Celery / Sidekiq. Mature, well-understood. You own the Redis instance, the worker fleet, and the observability layer that ties them together.
AWS SQS + Lambda. Serverless workers solve the hosting problem but introduce SQS-specific behaviors (visibility timeouts, batching, redrive policies) you now have to learn and operate.
Inline in the request. No queue, no worker. Fine for sub-100ms tasks. Becomes a problem the first time someone adds an LLM call.
