Helicone smoothly integrates streaming functionality and offers benefits that you can’t find with the standard OpenAI package!

Currently, OpenAI doesn’t provide usage statistics such as prompt and completion tokens when streaming. Helicone overcomes this limitation by estimating these statistics with the help of the gpt3-tokenizer package, which is designed to work with all tokenized OpenAI GPT models.

All you have to do is import helicone, or import openai through the Helicone package, and the rest of your code works as-is.

from helicone.openai_proxy import openai

The following examples work with or without Helicone!

Streaming mode with synchronous requests

In this mode, the request is made synchronously, but the response is streamed.

for chunk in openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[{
        'role': 'user',
        'content': "Hello World!"
    }],
    stream=True
):
    # Each chunk carries an incremental delta; print the content as it arrives
    content = chunk["choices"][0].get("delta", {}).get("content")
    if content is not None:
        print(content, end='')

Streaming mode with asynchronous requests

In this mode, the request is made asynchronously and the response is streamed. You’ll need to use the await keyword when calling openai.ChatCompletion.acreate, and iterate over the response with an async for loop.

# Run this inside an async function (e.g. with asyncio.run)
async for chunk in await openai.ChatCompletion.acreate(
    model='gpt-3.5-turbo',
    messages=[{
        'role': 'user',
        'content': "Hello World!"
    }],
    stream=True
):
    content = chunk["choices"][0].get("delta", {}).get("content")
    if content is not None:
        print(content, end='')

Enhanced Streaming Support

Helicone now provides significantly improved streaming functionality with several key updates:

Stream Fixes and Improvements

We’ve made several improvements to our stream handling across different LLM providers:

  • Better handling of stream interruptions and reconnections
  • Enhanced error handling for streaming responses
  • Improved compatibility with different LLM provider streaming formats
  • More reliable token counting for streamed content
  • Accurate timing calculations for streamed responses

New Streaming Methods

The HeliconeManualLogger class now includes enhanced methods for working with streams:

  • logBuilder: The recommended method for handling streaming responses with better error handling and simplified workflow
  • logStream: Logs a streaming operation with full control over stream handling
  • logSingleStream: Simplified method for logging a single ReadableStream
  • logSingleRequest: Logs a single request with a response body (a minimal sketch follows this list)
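
The logBuilder, logStream, and logSingleStream methods are shown in the examples later in this guide. For logSingleRequest, here is a minimal non-streaming sketch. It assumes the method accepts the request body and a stringified response body, by analogy with the streaming examples below; the helper function name and the exact signature are illustrative, so check the @helicone/helpers package for the authoritative API.

import Together from "together-ai";
import { HeliconeManualLogger } from "@helicone/helpers";

const together = new Together({ apiKey: process.env.TOGETHER_API_KEY });
const helicone = new HeliconeManualLogger({
  apiKey: process.env.HELICONE_API_KEY!,
});

// Hypothetical helper: one non-streaming completion, logged manually
async function generateOnce(prompt: string) {
  const body = {
    model: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages: [{ role: "user", content: prompt }],
    // No stream flag: this is a single request/response pair
  };

  const response = await together.chat.completions.create(body);

  // Assumed signature: the request body plus the stringified response body
  await helicone.logSingleRequest(body, JSON.stringify(response));

  return response;
}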

Asynchronous Stream Parser

Our new asynchronous stream parser significantly improves performance when working with streamed responses:

  • Processes stream chunks asynchronously for reduced latency
  • Provides more reliable token counting for streamed responses
  • Accurately captures time-to-first-token metrics
  • Efficiently handles multiple concurrent streams

The new logBuilder method provides a more streamlined approach to working with streaming responses, with better error handling:

import { HeliconeManualLogger } from "@helicone/helpers";
import { after } from "next/server";
import Together from "together-ai";

const together = new Together();
const helicone = new HeliconeManualLogger({
  apiKey: process.env.HELICONE_API_KEY!,
});

export async function POST(request: Request) {
  const { question } = await request.json();
  const body = {
    model: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages: [{ role: "user", content: question }],
    stream: true,
  };

  const heliconeLogBuilder = helicone.logBuilder(body, {
    "Helicone-Property-Environment": "dev",
  });

  try {
    const response = await together.chat.completions.create(body);
    return new Response(heliconeLogBuilder.toReadableStream(response));
  } catch (error) {
    heliconeLogBuilder.setError(error);
    throw error;
  } finally {
    after(async () => {
      // This will be executed after the response is sent to the client
      await heliconeLogBuilder.sendLog();
    });
  }
}

The logBuilder approach offers several advantages:

  • Better error handling with setError method
  • Simplified stream handling with toReadableStream
  • More flexible async/await patterns with sendLog
  • Proper error status code tracking

Using the Enhanced Streaming Features

OpenAI Streaming Example

import OpenAI from "openai";
import { HeliconeManualLogger } from "@helicone/helpers";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const helicone = new HeliconeManualLogger({
  apiKey: process.env.HELICONE_API_KEY!,
  headers: {
    "Helicone-Property-Environment": "production",
  },
});

async function generateStreamingResponse(prompt: string, userId: string) {
  const requestBody = {
    model: "gpt-4-turbo",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  };

  const response = await openai.chat.completions.create(requestBody);

  // For OpenAI's Node.js SDK, we can use the logSingleStream method
  const stream = response.toReadableStream();
  const [streamForUser, streamForLogging] = stream.tee();

  helicone.logSingleStream(requestBody, streamForLogging, {
    "Helicone-User-Id": userId,
  });

  return streamForUser;
}

Together AI Streaming Example

import Together from "together-ai";
import { HeliconeManualLogger } from "@helicone/helpers";

const together = new Together({ apiKey: process.env.TOGETHER_API_KEY });

const helicone = new HeliconeManualLogger({
  apiKey: process.env.HELICONE_API_KEY!,
  headers: {
    "Helicone-Property-Environment": "production",
  },
});

export async function generateWithTogetherAI(prompt: string, userId: string) {
  const body = {
    model: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  };

  const response = await together.chat.completions.create(body);

  // Create two copies of the stream
  const [stream1, stream2] = response.tee();

  // Log the stream with Helicone
  helicone.logStream(
    body,
    async (resultRecorder) => {
      resultRecorder.attachStream(stream2.toReadableStream());
      return stream1;
    },
    { "Helicone-User-Id": userId }
  );

  return new Response(stream1.toReadableStream());
}

Anthropic Streaming Example

import Anthropic from "@anthropic-ai/sdk";
import { HeliconeManualLogger } from "@helicone/helpers";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const helicone = new HeliconeManualLogger({
  apiKey: process.env.HELICONE_API_KEY!,
  headers: {
    "Helicone-Property-Environment": "production",
  },
});

async function generateWithAnthropic(prompt: string, userId: string) {
  const requestBody = {
    model: "claude-3-opus-20240229",
    max_tokens: 1024, // the Anthropic Messages API requires max_tokens
    messages: [{ role: "user", content: prompt }],
    stream: true,
  };

  const response = await anthropic.messages.create(requestBody);
  const stream = response.toReadableStream();
  const [userStream, loggingStream] = stream.tee();

  helicone.logSingleStream(requestBody, loggingStream, {
    "Helicone-User-Id": userId,
  });

  return userStream;
}

Calculating Costs with Streaming

For information on how to accurately calculate costs when using streaming features, please refer to our streaming usage guide.

You can enable accurate cost calculation by either:

  1. Including stream_options: { include_usage: true } in your request
  2. Adding the helicone-stream-usage: true header to your request

This ensures that token usage is properly tracked even when using streaming responses.
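
As a rough illustration, the sketch below shows both options in a single request, using the OpenAI SDK pointed at the Helicone proxy (oai.helicone.ai). The stream_options flag is part of OpenAI’s chat completions API, and the helicone-stream-usage header is the one described above; in practice you only need one of the two.

import OpenAI from "openai";

// Route the request through the Helicone proxy so usage can be recorded
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  defaultHeaders: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
    // Option 2: ask Helicone to estimate usage for the streamed response
    "helicone-stream-usage": "true",
  },
});

async function main() {
  const stream = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [{ role: "user", content: "Hello World!" }],
    stream: true,
    // Option 1: ask OpenAI to append a final usage chunk to the stream
    stream_options: { include_usage: true },
  });

  for await (const chunk of stream) {
    // The final usage chunk has an empty choices array, hence the optional chaining
    process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
  }
}

main();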

Vercel App Router Integration

When using Next.js App Router with Vercel, you can use the after function to log streaming responses without blocking the response to the client:

import { HeliconeManualLogger } from "@helicone/helpers";
import { after } from "next/server";
import Together from "together-ai";

export async function POST(request: Request) {
  const { prompt } = await request.json();

  const together = new Together({ apiKey: process.env.TOGETHER_API_KEY });
  const helicone = new HeliconeManualLogger({
    apiKey: process.env.HELICONE_API_KEY!,
  });

  // Create a streaming request
  const requestBody = {
    model: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  };

  const response = await together.chat.completions.create(requestBody);
  const [stream1, stream2] = response.tee();

  // Log the stream after sending the response to the client
  after(helicone.logSingleStream(requestBody, stream2.toReadableStream()));

  return new Response(stream1.toReadableStream());
}

This approach ensures that logging doesn’t delay the response to the user, providing the best possible experience while still capturing all the necessary data.

Learn More

For a comprehensive guide on using the Manual Logger with streaming functionality, check out our Manual Logger with Streaming cookbook.