Post-completion side effects for AI agents

Score model output, log token costs, and notify your team after an agent run finishes, without blocking the response or losing context.

AI agent functions do expensive, latency-sensitive work: calling models, executing tool loops, generating responses. After the agent finishes, you need to do secondary things that the user should never wait for: score the output quality, log token costs to track spend per customer, send a summary to Slack, update a CRM record.

If you add those as steps at the end of the function, a failure in your analytics pipeline retries the entire agent run, including the model calls you already paid for. If you send events to separate functions, you lose the typed connection to the data the agent just produced.

Deferred functions solve this. Register side effects inline with typed payloads. The agent returns immediately. Each side effect runs as its own function after the parent finalizes, with independent retries and no impact on the parent's success status.

§How this works

Define deferred functions for each post-completion task. Each receives a typed payload containing the AI-specific data it needs.

typescript

import { createDefer } from "inngest/experimental";
import { z } from "zod";

// `inngest`, `openai`, `analytics`, and `slack` are clients assumed to be
// initialized elsewhere in your app.

// Scores the agent's reply with a second, cheaper model.
const scoreOutput = createDefer(inngest, {
  id: "score-agent-output",
  schema: z.object({
    response: z.string(),
    model: z.string(),
    ticketId: z.string(),
  }),
}, async ({ event, step }) => {
  // Use a second model as a judge
  const evaluation = await step.run("llm-as-judge", async () => {
    return await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content: "Rate this support response 1-5 for helpfulness, accuracy, and tone. Return JSON: { helpfulness: number, accuracy: number, tone: number }",
        },
        { role: "user", content: event.data.response },
      ],
    });
  });

  await step.run("persist-scores", async () => {
    // Normalize each 1-5 rating to a 0-1 value before persisting.
    const scores = JSON.parse(evaluation.choices[0].message.content);
    await inngest.score({ name: "helpfulness", value: scores.helpfulness / 5, runId: event.data.ticketId });
    await inngest.score({ name: "accuracy", value: scores.accuracy / 5, runId: event.data.ticketId });
  });
});

// Logs token usage and an estimated cost per customer.
const trackCosts = createDefer(inngest, {
  id: "track-ai-costs",
  schema: z.object({
    model: z.string(),
    promptTokens: z.number(),
    completionTokens: z.number(),
    customerId: z.string(),
  }),
}, async ({ event, step }) => {
  await step.run("log-usage", async () => {
    // Flat illustrative rate per 1K tokens; substitute your provider's pricing.
    const costPer1k = event.data.model === "gpt-4o" ? 0.005 : 0.00015;
    const totalTokens = event.data.promptTokens + event.data.completionTokens;
    const cost = (totalTokens / 1000) * costPer1k;

    await analytics.track("ai.cost.incurred", {
      model: event.data.model,
      tokens: totalTokens,
      cost_usd: cost,
      customer_id: event.data.customerId,
    });
  });
});

// Posts a short summary of the resolved ticket to Slack.
const notifyTeam = createDefer(inngest, {
  id: "notify-agent-completion",
  schema: z.object({
    channel: z.string(),
    summary: z.string(),
    ticketId: z.string(),
    model: z.string(),
  }),
}, async ({ event, step }) => {
  await step.run("post-to-slack", async () => {
    await slack.chat.postMessage({
      channel: event.data.channel,
      text: `Agent resolved ticket ${event.data.ticketId} using ${event.data.model}:\n>${event.data.summary}`,
    });
  });
});
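
The cost arithmetic inside log-usage can be checked in isolation. The following is a minimal sketch, assuming the same flat per-1K-token rates used in the example above; `estimateCostUsd` and the rate table are hypothetical names introduced here for illustration, and the rates are placeholders rather than current provider pricing.

```typescript
// Hypothetical helper mirroring the flat per-1K-token pricing in trackCosts.
// The rates are the illustrative values from the example, not real list prices.
const COST_PER_1K_TOKENS_USD: Record<string, number> = {
  "gpt-4o": 0.005,
};
const DEFAULT_COST_PER_1K_USD = 0.00015;

function estimateCostUsd(
  model: string,
  promptTokens: number,
  completionTokens: number,
): number {
  // Fall back to the default rate for any model not in the table.
  const costPer1k = COST_PER_1K_TOKENS_USD[model] ?? DEFAULT_COST_PER_1K_USD;
  const totalTokens = promptTokens + completionTokens;
  return (totalTokens / 1000) * costPer1k;
}

// 1,200 prompt tokens + 300 completion tokens on gpt-4o = 1,500 tokens.
const exampleCost = estimateCostUsd("gpt-4o", 1200, 300);
console.log(exampleCost.toFixed(4)); // "0.0075"
```

Keeping the calculation in a pure function like this makes it easy to unit test and to swap in separate prompt and completion rates later, if your provider prices them differently.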

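Pass the deferred functions to serve() alongside the agent function so they are registered with your app.

typescript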

serve({
  client: inngest,
  functions: [handleTicket, scoreOutput, trackCosts, notifyTeam],
});

In the agent function, call defer() after the work is done. The agent returns the response to the user. Scoring, cost tracking, and notifications happen in the background.

typescript

const handleTicket = inngest.createFunction(
  { id: "handle-support-ticket", triggers: { event: "support/ticket.created" } },
  async ({ event, step, defer }) => {
    const response = await step.run("generate-response", async () => {
      return await openai.chat.completions.create({
        model: "gpt-4o",
        messages: [
          { role: "system", content: "You are a support agent. Be concise and helpful." },
          { role: "user", content: event.data.content },
        ],
      });
    });

    // message.content can be null, so fall back to an empty string.
    const reply = response.choices[0].message.content ?? "";

    await step.run("send-reply", async () => {
      await supportPlatform.reply(event.data.ticketId, reply);
    });

    // Score the response with LLM-as-judge. Runs after the parent finishes.
    defer("score-quality", {
      function: scoreOutput,
      data: {
        response: reply,
        model: "gpt-4o",
        ticketId: event.data.ticketId,
      },
    });

    // Track token costs per customer. usage may be absent, so default to 0.
    defer("track-spend", {
      function: trackCosts,
      data: {
        model: "gpt-4o",
        promptTokens: response.usage?.prompt_tokens ?? 0,
        completionTokens: response.usage?.completion_tokens ?? 0,
        customerId: event.data.customerId,
      },
    });

    // Notify the team.
    defer("notify-slack", {
      function: notifyTeam,
      data: {
        channel: "#support-resolved",
        summary: reply.slice(0, 200),
        ticketId: event.data.ticketId,
        model: "gpt-4o",
      },
    });

    return { ticketId: event.data.ticketId, status: "resolved" };
  }
);

The agent responds to the customer as soon as the reply is generated and sent. Three deferred functions fire after the parent finalizes: one scores the output with a cheaper model, one logs token costs per customer, and one posts to Slack. Each has its own retries. If Slack is down, scoring and cost tracking still succeed. If the scoring model is slow, the customer already has their answer.
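
The failure isolation described above can be illustrated without Inngest at all. The sketch below is plain TypeScript with hypothetical stand-in tasks (`sideEffects`, `runIsolated`); it is an analogy for the behavior, not how deferred functions are implemented. It is synchronous for brevity, and it omits the per-function retries that real deferred functions get.

```typescript
type SideEffect = { name: string; run: () => void };
type Outcome = "ok" | "failed";

// Hypothetical stand-ins for the three deferred functions.
// The Slack task fails first to show that later tasks still run.
const sideEffects: SideEffect[] = [
  {
    name: "notify-slack",
    run: () => {
      throw new Error("Slack is down");
    },
  },
  { name: "score-quality", run: () => {} },
  { name: "track-spend", run: () => {} },
];

// Runs every task inside its own try/catch, so a failure is recorded
// for that task alone and the remaining tasks still run.
function runIsolated(tasks: SideEffect[]): Record<string, Outcome> {
  const outcomes: Record<string, Outcome> = {};
  for (const task of tasks) {
    try {
      task.run();
      outcomes[task.name] = "ok";
    } catch {
      outcomes[task.name] = "failed";
    }
  }
  return outcomes;
}

console.log(runIsolated(sideEffects));
```

Only notify-slack is reported as failed; score-quality and track-spend complete regardless of its outcome.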

§Why this matters for AI

AI functions are uniquely expensive to retry. A failed analytics step at the end of a function that made three GPT-4o calls means re-running those calls on retry. With deferred functions, the agent run succeeds or fails based on the agent work alone. The scoring, logging, and notifications run on their own lifecycle.

This also opens up patterns like LLM-as-judge scoring where you use a second model to evaluate the first. That evaluation can take seconds or minutes, and it should never hold the user's response hostage.
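
A judge model does not always return the JSON it was asked for, so the scoring function benefits from validating the reply before persisting anything. The following is a minimal sketch of that validation, assuming the 1-5 rating format requested in the judge prompt; `parseJudgeScores` and `JudgeScores` are hypothetical names introduced here, not part of any SDK.

```typescript
type JudgeScores = { helpfulness: number; accuracy: number; tone: number };

const SCORE_KEYS = ["helpfulness", "accuracy", "tone"] as const;

// Parses the judge's raw reply and divides each 1-5 rating by 5.
// Returns null instead of throwing when the reply is not valid JSON
// or when a rating is missing or out of range.
function parseJudgeScores(raw: string | null): JudgeScores | null {
  if (raw === null) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const record = parsed as Record<string, unknown>;
  const scores: JudgeScores = { helpfulness: 0, accuracy: 0, tone: 0 };
  for (const key of SCORE_KEYS) {
    const value = record[key];
    if (typeof value !== "number" || value < 1 || value > 5) return null;
    scores[key] = value / 5;
  }
  return scores;
}

console.log(parseJudgeScores('{"helpfulness": 4, "accuracy": 5, "tone": 3}'));
// { helpfulness: 0.8, accuracy: 1, tone: 0.6 }
console.log(parseJudgeScores("I'd rate this a 4 out of 5.")); // null
```

Whether a null result should throw, so the scoring function retries the judge call, or simply be skipped is a design choice left to the caller. Either way, the decision stays inside the deferred function and never affects the agent run.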

§Alternative approaches

Add more steps at the end. A failure in analytics retries the entire function, including the model calls. Expensive and wasteful.
Send events to separate functions. Works, but you lose the typed schema and the parent/child linking in traces. You serialize data into event payloads manually.
External eval platforms (Braintrust, LangSmith, Arize). Scoring and cost tracking live in a separate system, so keeping evaluation in step with your agent code means maintaining two platforms.
Fire-and-forget HTTP calls. No retries, no observability, no connection to the run that produced the data.
