Streaming

Send stream: true on a message and the reply comes back as Server-Sent Events. Each event is named, so you can render text, reasoning, and tool activity live. Events arrive in order, and the terminal response.completed event carries the final output_text and usage. The base URL is your instance URL: https://{instanceId}.agent37.app, with the same sk_live_ key sent as the X-Agent37-Key header on every request. This page documents the gateway’s streaming contract, the API every instance serves.

Start a stream

curl

curl -N https://ab12cd34ef.agent37.app/v1/responses \
  -H "X-Agent37-Key: sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Research the top 3 EV makers, write a memo.",
    "stream": true
  }'

The connection stays open and frames arrive as event: plus data: pairs separated by a blank line:

stream

event: response.created
data: {"id":"c91d2a7e84f04b6f9a3d5e1c0b87f4a2","session_id":"7f3e0b6c52a949d2b1c4a8e9d0f31726"}

event: response.reasoning.delta
data: {"text":"Comparing deliveries and margins across the big three..."}

event: response.tool_call.started
data: {"tool":"web_search","label":"EV deliveries 2025"}

event: response.tool_call.completed
data: {"tool":"web_search","duration_ms":1840}

event: response.output_text.delta
data: {"text":"## EV market memo\n\n"}

:keepalive

event: response.completed
data: {"output_text":"## EV market memo\n\n...","usage":{"input_tokens":1840,"output_tokens":920,"cost_usd":0.0137}}

Events

There are exactly eight event types:

Event	Payload
`response.created`	`{ id, session_id }`, always first; the response id and the session it runs in
`response.reasoning.delta`	`{ text }`, a chunk of the agent’s thinking
`response.output_text.delta`	`{ text }`, a chunk of the visible answer
`response.tool_call.started`	`{ tool, label? }`
`response.tool_call.completed`	`{ tool, duration_ms? }`
`response.tool_call.failed`	`{ tool, error? }`, the run continues
`response.completed`	`{ output_text, usage }`, terminal; `usage` can be `null`, and `cost_usd` inside it is `null` when the provider reports no cost
`response.failed`	`{ error: { code, message, param?, hint? } }`, terminal

These event names and payloads are the gateway’s streaming contract, not a per-agent detail. Hermes and OpenClaw emit them today, and the agents that follow will emit the same eight, so your client code does not change when you switch templates.

Rules the stream always follows:

response.created is always first, and exactly one terminal event (response.completed or response.failed) ends every live stream.
Every 30 seconds the gateway writes the comment line :keepalive, whether or not events are flowing. Comments are not events: ignore any line starting with :.
There is no [DONE] sentinel. The server closes the connection right after the terminal event; terminal event plus close is end of stream.
Once streaming starts, failures arrive as a response.failed event with the standard error body, never as an HTTP error status.
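
The rules above can be checked on the client with a small tracker. This is a minimal sketch, not part of the gateway: `createStreamTracker`, `onEvent`, and `endedCleanly` are illustrative names. It assumes you call `onEvent` with each event name after comment lines have been skipped.

```javascript
// Hypothetical client-side helper that checks the ordering rules listed above.
const TERMINAL_EVENTS = new Set(["response.completed", "response.failed"]);

function createStreamTracker() {
  let sawCreated = false;
  let terminal = null;

  return {
    // Call once per event name. Comment lines such as :keepalive are not
    // events and should never be passed in.
    onEvent(name) {
      if (terminal !== null) {
        throw new Error(`event ${name} arrived after terminal ${terminal}`);
      }
      if (!sawCreated) {
        if (name !== "response.created") {
          throw new Error(`expected response.created first, got ${name}`);
        }
        sawCreated = true;
        return;
      }
      if (TERMINAL_EVENTS.has(name)) terminal = name;
    },
    // Call when the connection closes. false means the stream closed
    // without a terminal event, which is the case a reconnect covers.
    endedCleanly() {
      return terminal !== null;
    },
  };
}
```

If `endedCleanly()` returns false when the reader reports the connection closed, treat it as a drop rather than the end of the turn.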

A cancelled turn (POST /v1/responses/{id}/cancel) still ends with response.completed, carrying whatever output_text accumulated before the cancel. Only failures emit response.failed. The stored response’s status is cancelled.
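
A cancel request might look like the sketch below. `cancelResponse` and its injectable `fetchImpl` parameter are illustrative names, not part of the API. This page does not describe the cancel endpoint's response body, so the helper only reports whether the HTTP call succeeded.

```javascript
// Hypothetical helper: asks the gateway to cancel a running turn.
// fetchImpl is injectable so the function can be exercised without a network.
async function cancelResponse(baseUrl, apiKey, responseId, fetchImpl = fetch) {
  const url = `${baseUrl}/v1/responses/${encodeURIComponent(responseId)}/cancel`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "X-Agent37-Key": apiKey },
  });
  // This only confirms the cancel request itself succeeded. The open stream
  // still delivers response.completed with the text produced so far.
  return res.ok;
}
```

Keep reading the open stream after calling it: the partial answer arrives in the terminal event, not in the cancel call.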

Reconnect after a drop

GET /v1/responses/{id}/stream

If your connection drops mid-turn, reconnect with the response id from response.created. Lost the id too (page reload, new device)? GET /v1/sessions/{id} returns the running response as active_response_id — null means no turn is in flight (see that field’s timing notes on the Sessions page).
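
When the response id is lost, the lookup described above can be sketched as follows. `findActiveStreamUrl` and `fetchImpl` are illustrative names, and the code assumes `active_response_id` is a top-level field of the session JSON, which this page implies but does not show.

```javascript
// Hypothetical helper: returns the /stream URL for the turn currently running
// in a session, or null when no turn is in flight.
async function findActiveStreamUrl(baseUrl, apiKey, sessionId, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/v1/sessions/${encodeURIComponent(sessionId)}`, {
    headers: { "X-Agent37-Key": apiKey },
  });
  if (!res.ok) throw new Error(`session lookup failed: HTTP ${res.status}`);

  const session = await res.json();
  // Assumption: active_response_id sits at the top level of the body.
  const responseId = session.active_response_id;
  if (responseId == null) return null; // null means no turn is in flight

  return `${baseUrl}/v1/responses/${encodeURIComponent(responseId)}/stream`;
}
```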

curl

curl -N https://ab12cd34ef.agent37.app/v1/responses/c91d2a7e84f04b6f9a3d5e1c0b87f4a2/stream \
  -H "X-Agent37-Key: sk_live_..."

While the run is live, the gateway replays the entire ordered event buffer from response.created onward, then stays attached for the rest of the run. If the run just finished, it replays the buffer and ends. The buffer holds up to 100,000 events per run; the rare run that exceeds it stops buffering, so a reconnect replays the first 100,000 events and may end without the terminal event. When that happens, wait for active_response_id on GET /v1/sessions/{id} to read null, then recover the final answer from the same call’s history.
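
Because a reconnect replays from response.created, a client that appends deltas to what it already rendered would show the text twice. One way to avoid that is to reset on response.created, sketched below. `initialState` and `applyEvent` are illustrative names, and the state shape is an assumption rather than anything the gateway defines.

```javascript
// Hypothetical client state for one turn.
const initialState = { id: null, text: "", done: false, error: null };

// Pure reducer: returns the next state for one parsed event.
function applyEvent(state, event, data) {
  switch (event) {
    case "response.created":
      // A replay starts over from here, so discard anything rendered so far.
      return { id: data.id, text: "", done: false, error: null };
    case "response.output_text.delta":
      return { ...state, text: state.text + data.text };
    case "response.completed":
      // The terminal event carries the final output_text; prefer it.
      return { ...state, text: data.output_text ?? state.text, done: true };
    case "response.failed":
      return { ...state, done: true, error: data.error };
    default:
      // Reasoning and tool-call events do not change the answer text.
      return state;
  }
}
```

If the connection closes while `done` is still false, the replay ended without its terminal event, and the session lookup described above is the way to recover the final answer.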

Reconnect and the answer is still there. Reconnect within about 30 minutes of a turn finishing and /stream still replays the final output_text.

One caveat: about 60 seconds after a turn finishes, the in-memory event buffer expires. From then on the replay is synthesized from the retained in-memory response record as response.created, one response.output_text.delta carrying the full text (omitted when the turn produced none), then the terminal event. Reasoning and tool-call events from the original run are not preserved in that synthesized replay.

After the record expires (about 30 minutes, or on a gateway restart), /stream returns 404 response_not_found; recover the answer from the session transcript instead, which holds every finished turn. The transcript holds finished turns only: the harness writes a turn’s messages at turn end, so a still-running turn is never in it. To find a running turn, read active_response_id from GET /v1/sessions/{id}.
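
A reconnect therefore has two outcomes worth telling apart: a replay you can parse, or an expired record. A minimal sketch follows; `openReplay`, `fetchImpl`, and the returned `kind` values are illustrative names. It checks only the 404 status, since this page does not show the exact shape of the error body on that response.

```javascript
// Hypothetical helper: opens /stream for a known response id and reports
// whether a replay is available or the record has already expired.
async function openReplay(baseUrl, apiKey, responseId, fetchImpl = fetch) {
  const url = `${baseUrl}/v1/responses/${encodeURIComponent(responseId)}/stream`;
  const res = await fetchImpl(url, { headers: { "X-Agent37-Key": apiKey } });

  if (res.status === 404) {
    // Record gone (about 30 minutes, or a gateway restart):
    // read the session transcript instead.
    return { kind: "expired" };
  }
  if (!res.ok) throw new Error(`reconnect failed: HTTP ${res.status}`);

  // Parse res.body the same way as the original stream.
  return { kind: "replay", body: res.body };
}
```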

Parse the stream

No SSE library needed. Read the response body, split on the blank-line frame boundary, skip comment lines, and branch on each frame’s event: line. Stop when the connection closes after a terminal event.

node

const res = await fetch("https://ab12cd34ef.agent37.app/v1/responses", {
  method: "POST",
  headers: {
    "X-Agent37-Key": "sk_live_...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    input: "Research the top 3 EV makers, write a memo.",
    stream: true,
  }),
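});

// A request that fails before the stream starts is an ordinary HTTP error,
// not an SSE frame, so check the status before reading the body.
if (!res.ok) throw new Error(`request failed: HTTP ${res.status}`);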

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // SSE frames are separated by a blank line
  const frames = buffer.split("\n\n");
  buffer = frames.pop(); // keep the trailing partial frame

  for (const frame of frames) {
    if (frame.startsWith(":")) continue; // comment line, e.g. :keepalive

    const event = frame.match(/^event: (.+)$/m)?.[1];
    const data = JSON.parse(frame.match(/^data: (.+)$/m)?.[1] ?? "{}");

    switch (event) {
      case "response.output_text.delta":
        process.stdout.write(data.text); // stream the answer
        break;
      case "response.reasoning.delta":
        // show the agent thinking, if you want
        break;
      case "response.tool_call.started":
        console.log(`\n[${data.tool}] ${data.label ?? ""}`);
        break;
      case "response.completed":
        console.log("\nusage:", data.usage);
        break;
      case "response.failed":
        console.error("\nerror:", data.error);
        break;
    }
  }
}

The browser’s built-in EventSource cannot send a POST body or custom headers like X-Agent37-Key, so it cannot start a stream here. Use fetch as above, in the browser and in Node. The hermes-chat example runs a browser version of this parser in a real chat UI (public/chat.js), with reconnect and cancel wired in.

Prefer not to stream? Send stream: false (the default) and the call returns the finished response as one JSON body, with the agent’s reply in output_text.
