Realtime
Voice-enabled agents with real-time audio streaming. Browser and Node.js patterns.
Reference: RealtimeAgent, RealtimeSession
Build voice-enabled AI agents with real-time audio streaming.
Architecture patterns
There are two main approaches depending on where your audio originates:
Browser (direct connection)
For web apps, the browser connects directly to the realtime API using a short-lived ephemeral credential from your server. Audio never touches your server—lower latency and simpler infrastructure.
┌─────────────┐    credential    ┌─────────────┐
│   Browser   │◀─────────────────│   Server    │
└─────────────┘                  └─────────────┘
       │
       │ WebSocket / WebRTC
       ▼
┌─────────────┐
│  Realtime   │
│    Model    │
└─────────────┘
Alternatively, you can proxy audio through your server if you need to inspect or modify the stream.
Node.js (server-side)
For Twilio, Discord, or custom audio sources, your server owns the session. Audio flows through your server to the realtime API.
┌─────────────┐              ┌─────────────┐                 ┌─────────────┐
│   Twilio    │───┐          │             │                 │             │
└─────────────┘   │  audio   │             │    WebSocket    │  Realtime   │
┌─────────────┐   ├─────────▶│   Server    │────────────────▶│    Model    │
│   Discord   │───┤          │             │                 │             │
└─────────────┘   │◀─────────│             │◀────────────────│             │
┌─────────────┐   │  audio   │             │                 └─────────────┘
│   Custom    │───┘          │             │
└─────────────┘              └─────────────┘
Channels abstract audio I/O—swap TwilioChannel for DiscordChannel without changing your agent logic.
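The sketch below illustrates that swap at the session level (a hedged sketch: the discordChannel and custom-channel values in the comments are hypothetical and constructing each channel is not shown; the session wiring itself matches the Node.js example that follows):
// Only the channel value changes between audio sources; agent and session code stay the same.
// const channel = twilioChannel;   // phone calls via Twilio
// const channel = discordChannel;  // a Discord bot's voice connection (hypothetical)
// const channel = myCustomChannel; // any custom audio source (hypothetical)

const session = new RealtimeSession(jarvis, {
  model: openai.realtime("gpt-realtime"),
  channel,
});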
Node.js usage
For server-side realtime (Twilio, Discord bots), create a session directly:
import { RealtimeAgent, RealtimeSession } from "kernl";
import { openai } from "@kernl-sdk/openai";

const jarvis = new RealtimeAgent({
  id: "jarvis",
  name: "Jarvis",
  instructions: "You are a helpful voice assistant. Be concise.",
});

const session = new RealtimeSession(jarvis, {
  model: openai.realtime("gpt-realtime"),
  channel: twilioChannel, // your channel instance (e.g. a TwilioChannel) carrying the call's audio
});

await session.connect();
For Node.js < 22, you must provide a WebSocket implementation:
import WebSocket from "ws";

const session = new RealtimeSession(jarvis, {
  model: openai.realtime("gpt-realtime"),
  channel: twilioChannel,
  websocket: WebSocket,
});
Browser usage
1. Define your agent
This definition lives in your client app (e.g. src/agents/jarvis.ts):
import { RealtimeAgent } from "kernl";

export const jarvis = new RealtimeAgent({
  id: "jarvis",
  name: "Jarvis",
  instructions: "You are a helpful voice assistant. Be concise.",
});
2. Create credential endpoint
Your API key stays on the server. The client fetches a short-lived ephemeral credential:
import { openai } from "@kernl-sdk/openai";

// POST /api/realtime/credential
export async function POST() {
  const model = openai.realtime("gpt-realtime");
  const credential = await model.authenticate();
  return Response.json({ credential });
}
The credential expires after a short period. The client connects directly to the realtime API—audio never touches your server.
3. React
For React apps, use the useRealtime and useBrowserAudio hooks:
import { useCallback } from "react";
import { useRealtime, useBrowserAudio, LiveWaveform } from "@kernl-sdk/react";
import { openai } from "@kernl-sdk/openai";
import { jarvis } from "@/agents/jarvis";

function VoiceAgent() {
  const { channel } = useBrowserAudio();
  const { status, connect, disconnect, muted, mute, unmute } = useRealtime(jarvis, {
    model: openai.realtime("gpt-realtime"),
    channel,
  });

  const start = useCallback(async () => {
    if (!channel) return;
    const res = await fetch("/api/realtime/credential", { method: "POST" });
    const { credential } = await res.json();
    await channel.init();
    connect(credential);
  }, [channel, connect]);

  const stop = useCallback(() => {
    disconnect();
    channel?.close();
  }, [disconnect, channel]);

  return (
    <div>
      <LiveWaveform audio={channel} active={status === "connected"} />
      {status === "disconnected" && (
        <button onClick={start}>Start</button>
      )}
      {status === "connected" && (
        <>
          <button onClick={() => (muted ? unmute() : mute())}>
            {muted ? "Unmute" : "Mute"}
          </button>
          <button onClick={stop}>End</button>
        </>
      )}
    </div>
  );
}
Tools
Realtime agents support tools. There are two patterns for where tools execute:
Client-side tools with context
For tools that need to update React state or trigger UI effects, use the ctx pattern. This gives your tool access to React closures:
import { useMemo, useState } from "react";
import { tool, Toolkit } from "kernl";
import { z } from "zod";
import { useRealtime, useBrowserAudio } from "@kernl-sdk/react";
import { openai } from "@kernl-sdk/openai";
import { jarvis } from "@/agents/jarvis";

// Define the context type
type CartContext = {
  addToCart: (item: Item) => void;
};

// Create tools that use context
const addToCart = tool({
  id: "add_to_cart",
  description: "Add an item to the shopping cart",
  parameters: z.object({
    productId: z.string(),
    quantity: z.number(),
  }),
  execute: async (ctx, { productId, quantity }) => {
    const item = await fetchProduct(productId);
    ctx.context.addToCart({ ...item, quantity }); // <- access React state via ctx.context
    return `Added ${item.name} to cart`;
  },
});

const cartToolkit = new Toolkit<CartContext>({
  id: "cart",
  tools: [addToCart],
});

// In your component, pass the context
function VoiceAgent() {
  const [cart, setCart] = useState<Item[]>([]);
  const { channel } = useBrowserAudio();

  const ctx = useMemo<CartContext>(
    () => ({
      addToCart: (item) => setCart((prev) => [...prev, item]),
    }),
    [],
  );

  const { status, connect, disconnect } = useRealtime(jarvis, {
    model: openai.realtime("gpt-realtime"),
    channel,
    ctx, // <- pass context to make it available in tools
  });

  // ...
}
Server-side tools
For tools that need server-side resources (database, secrets, external APIs), call your server from the client tool:
const createOrder = tool({
  id: "create_order",
  description: "Create an order for the items in the cart",
  parameters: z.object({
    items: z.array(z.object({
      productId: z.string(),
      quantity: z.number(),
    })),
  }),
  execute: async (ctx, { items }) => {
    const res = await fetch("/api/orders", { // <- call your server for secrets/DB access
      method: "POST",
      body: JSON.stringify({ items }),
    });
    const order = await res.json();
    return `Order ${order.id} created successfully`;
  },
});
This keeps your API keys and business logic on the server while the tool executes client-side.
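For completeness, here is a minimal sketch of the matching server route, assuming the same Response-style route handlers used for the credential endpoint above (createOrderInDatabase is a hypothetical helper standing in for your data layer):
// POST /api/orders
export async function POST(req: Request) {
  const { items } = await req.json();

  // Secrets, database access, and validation stay on the server.
  const order = await createOrderInDatabase(items); // hypothetical helper

  return Response.json({ id: order.id });
}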
Voice configuration
Customize the voice:
const jarvis = new RealtimeAgent({
  id: "jarvis",
  name: "Jarvis",
  instructions: "...",
  voice: {
    voiceId: "alloy", // OpenAI voices: alloy, echo, fable, onyx, nova, shimmer
    speed: 1.0,
  },
});