Realtime

Voice-enabled agents with real-time audio streaming. Browser and Node.js patterns.

Reference: RealtimeAgent, RealtimeSession

Architecture patterns

There are two main approaches depending on where your audio originates:

Browser (direct connection)

For web apps, the browser connects directly to the realtime API using a short-lived ephemeral credential from your server. Audio never touches your server—lower latency and simpler infrastructure.

┌─────────────┐    credential    ┌─────────────┐
│   Browser   │◀─────────────────│   Server    │
└─────────────┘                  └─────────────┘
       │
       │  WebSocket / WebRTC
       ▼
┌─────────────┐
│  Realtime   │
│    Model    │
└─────────────┘

Alternatively, you can proxy audio through your server if you need to inspect or modify the stream.

Node.js (server-side)

For Twilio, Discord, or custom audio sources, your server owns the session. Audio flows through your server to the realtime API.

┌─────────────┐               ┌─────────────┐               ┌─────────────┐
│   Twilio    │───┐           │             │               │             │
└─────────────┘   │   audio   │             │   WebSocket   │  Realtime   │
┌─────────────┐   ├──────────▶│   Server    │──────────────▶│    Model    │
│   Discord   │───┤           │             │               │             │
└─────────────┘   │◀──────────│             │◀──────────────│             │
┌─────────────┐   │   audio   │             │               │             │
│   Custom    │───┘           │             │               └─────────────┘
└─────────────┘               └─────────────┘

Channels abstract audio I/O—swap TwilioChannel for DiscordChannel without changing your agent logic.


Node.js usage

For server-side realtime (Twilio, Discord bots), create a session directly:

import { RealtimeAgent, RealtimeSession } from "kernl";
import { openai } from "@kernl-sdk/openai";

const jarvis = new RealtimeAgent({
  id: "jarvis",
  name: "Jarvis",
  instructions: "You are a helpful voice assistant. Be concise.",
});

const session = new RealtimeSession(jarvis, {
  model: openai.realtime("gpt-realtime"),
  channel: twilioChannel, // the audio channel for this session (here, a Twilio call)
});

await session.connect();

For Node.js < 22, you must provide a WebSocket implementation:

import WebSocket from "ws";

const session = new RealtimeSession(jarvis, {
  model: openai.realtime("gpt-realtime"),
  channel: twilioChannel,
  websocket: WebSocket,
});

Browser usage

1. Define your agent

This definition lives in your client app (e.g. src/agents/jarvis.ts):

import { RealtimeAgent } from "kernl";

export const jarvis = new RealtimeAgent({
  id: "jarvis",
  name: "Jarvis",
  instructions: "You are a helpful voice assistant. Be concise.",
});

2. Create credential endpoint

Your API key stays on the server. The client fetches a short-lived ephemeral credential:

import { openai } from "@kernl-sdk/openai";

// POST /api/realtime/credential
export async function POST() {
  const model = openai.realtime("gpt-realtime");
  const credential = await model.authenticate();

  return Response.json({ credential });
}

The credential expires after a short period. The client connects directly to the realtime API—audio never touches your server.

3. React

For React apps, use the useRealtime and useBrowserAudio hooks:

import { useCallback } from "react";
import { useRealtime, useBrowserAudio, LiveWaveform } from "@kernl-sdk/react";
import { openai } from "@kernl-sdk/openai";

import { jarvis } from "@/agents/jarvis";

function VoiceAgent() {
  const { channel } = useBrowserAudio();

  const { status, connect, disconnect, muted, mute, unmute } = useRealtime(jarvis, {
    model: openai.realtime("gpt-realtime"),
    channel,
  });

  const start = useCallback(async () => {
    if (!channel) return;

    const res = await fetch("/api/realtime/credential");
    const { credential } = await res.json();

    await channel.init();
    connect(credential);
  }, [channel, connect]);

  const stop = useCallback(() => {
    disconnect();
    channel?.close();
  }, [disconnect, channel]);

  return (
    <div>
      <LiveWaveform audio={channel} active={status === "connected"} />

      {status === "disconnected" && (
        <button onClick={start}>Start</button>
      )}

      {status === "connected" && (
        <>
          <button onClick={() => (muted ? unmute() : mute())}>
            {muted ? "Unmute" : "Mute"}
          </button>
          <button onClick={stop}>End</button>
        </>
      )}
    </div>
  );
}

Tools

Realtime agents support tools. There are two patterns for where tools execute:

Client-side tools with context

For tools that need to update React state or trigger UI effects, use the ctx pattern. This gives your tool access to React closures:

import { useMemo, useState } from "react";
import { tool, Toolkit } from "kernl";
import { z } from "zod";

// Define the context type
type CartContext = {
  addToCart: (item: Item) => void;
};

// Create tools that use context
const addToCart = tool({
  id: "add_to_cart",
  description: "Add an item to the shopping cart",
  parameters: z.object({
    productId: z.string(),
    quantity: z.number(),
  }),
  execute: async (ctx, { productId, quantity }) => {
    const item = await fetchProduct(productId);
    ctx.context.addToCart({ ...item, quantity }); // <- access React state via ctx.context
    return `Added ${item.name} to cart`;
  },
});

const cartToolkit = new Toolkit<CartContext>({
  id: "cart",
  tools: [addToCart],
});

// In your component, pass the context
function VoiceAgent() {
  const [cart, setCart] = useState<Item[]>([]);

  const ctx = useMemo<CartContext>(
    () => ({
      addToCart: (item) => setCart((prev) => [...prev, item]),
    }),
    [],
  );

  const { status, connect, disconnect } = useRealtime(jarvis, {
    model: openai.realtime("gpt-realtime"),
    channel,
    ctx, // <- pass context to make it available in tools
  });

  // ...
}

Server-side tools

For tools that need server-side resources (database, secrets, external APIs), call your server from the client tool:

const createOrder = tool({
  id: "create_order",
  description: "Create an order for the items in the cart",
  parameters: z.object({
    items: z.array(z.object({
      productId: z.string(),
      quantity: z.number(),
    })),
  }),
  execute: async (ctx, { items }) => {
    const res = await fetch("/api/orders", { // <- call your server for secrets/DB access
      method: "POST",
      body: JSON.stringify({ items }),
    });
    const order = await res.json();
    return `Order ${order.id} created successfully`;
  },
});

This keeps your API keys and business logic on the server while the tool executes client-side.
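The matching server route can stay thin. A sketch in the same route-handler style as the credential endpoint above; the in-memory `db` object is a placeholder for your real data layer:

```typescript
// Placeholder in-memory "database" standing in for your real data layer.
const db = {
  orders: {
    async create(data: { items: unknown[] }) {
      return { id: `order_${Date.now()}`, items: data.items };
    },
  },
};

// POST /api/orders -- runs on the server, where secrets and business logic live.
export async function POST(req: Request) {
  const { items } = await req.json();
  const order = await db.orders.create({ items });
  return Response.json({ id: order.id });
}
```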

Voice configuration

Customize the voice:

const jarvis = new RealtimeAgent({
  id: "jarvis",
  name: "Jarvis",
  instructions: "...",
  voice: {
    voiceId: "alloy", // OpenAI realtime voices include alloy, ash, coral, echo, sage, shimmer, verse
    speed: 1.0,
  },
});
