Helicone’s AI Gateway provides a unified interface for reasoning across providers. Use the same parameters regardless of provider; the Gateway handles the translation automatically.

Quick Start

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.HELICONE_API_KEY,
  baseURL: "https://ai-gateway.helicone.ai/v1",
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-20250514",
  messages: [
    { role: "user", content: "What is the sum of the first 100 prime numbers?" }
  ],
  reasoning_effort: "medium",
  max_completion_tokens: 16000
});

Configuration

{
  reasoning_effort: "low" | "medium" | "high",
  reasoning_options: {
    budget_tokens: 8000  // Optional token budget
  }
}

reasoning_effort

| Level  | Description                         |
| ------ | ----------------------------------- |
| low    | Light reasoning for simple tasks    |
| medium | Balanced reasoning                  |
| high   | Deep reasoning for complex problems |
For Anthropic models, the default is 4096 max completion tokens with a 2048-token reasoning budget.

reasoning_options.budget_tokens

The budget_tokens parameter sets the maximum number of tokens the model can use for reasoning.
For Google (Gemini) models, reasoning_effort is required to enable thinking. Passing budget_tokens alone will not enable reasoning; you must also specify reasoning_effort.
// ✅ Correct: reasoning_effort enables thinking, budget_tokens limits it
{
  reasoning_effort: "high",
  reasoning_options: { budget_tokens: 4096 }
}

// ❌ Incorrect for Gemini: budget_tokens alone does nothing
{
  reasoning_options: { budget_tokens: 4096 }  // Reasoning will be disabled
}
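To make the rule explicit in code, here is a minimal sketch of a hypothetical request builder (the helper name and shape are illustrative, not part of the Gateway API) that only attaches reasoning_options when reasoning_effort is set, so a budget can never be sent without the effort flag that enables it:

```typescript
type ReasoningEffort = "low" | "medium" | "high";

interface ReasoningRequest {
  model: string;
  messages: { role: string; content: string }[];
  reasoning_effort?: ReasoningEffort;
  reasoning_options?: { budget_tokens: number };
}

// Illustrative helper: budget_tokens is only included when
// reasoning_effort is also set, matching the Gemini requirement above.
function buildReasoningRequest(
  model: string,
  prompt: string,
  effort?: ReasoningEffort,
  budgetTokens?: number
): ReasoningRequest {
  const request: ReasoningRequest = {
    model,
    messages: [{ role: "user", content: prompt }],
  };
  if (effort !== undefined) {
    request.reasoning_effort = effort;
    if (budgetTokens !== undefined) {
      request.reasoning_options = { budget_tokens: budgetTokens };
    }
  }
  return request;
}
```

The returned object can be passed directly to `client.chat.completions.create` as in the Quick Start example.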

Handling Responses

Chat Completions

When streaming, reasoning content arrives in chunks via the reasoning delta field, followed by content, and finally reasoning_details with the finish reason:
// Reasoning chunks arrive first
{
  "choices": [{
    "delta": { "reasoning": "Let me think about this..." }
  }]
}

// Then content chunks
{
  "choices": [{
    "delta": { "content": "The answer is 42." }
  }]
}

// Final chunk includes reasoning_details with signature
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "thinking": "The user is asking for...",
        "signature": "EpICCkYIChgCKkCfWt1pnGxEcz48yQJvie3ppkXZ8ryd..."
      }]
    },
    "finish_reason": "stop"
  }]
}
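A consumer can accumulate these three kinds of deltas separately. The sketch below models only the fields shown in the chunks above (the type names are illustrative); it concatenates reasoning and content text and collects any reasoning_details from the final chunk:

```typescript
interface ReasoningDetail {
  thinking?: string;
  signature?: string;
}

interface StreamChunk {
  choices: {
    delta: {
      reasoning?: string;
      content?: string;
      reasoning_details?: ReasoningDetail[];
    };
    finish_reason?: string;
  }[];
}

// Accumulate reasoning text, content text, and reasoning_details
// from a sequence of streamed chunks.
function collectStream(chunks: StreamChunk[]) {
  let reasoning = "";
  let content = "";
  const details: ReasoningDetail[] = [];
  for (const chunk of chunks) {
    const delta = chunk.choices[0]?.delta;
    if (!delta) continue;
    if (delta.reasoning) reasoning += delta.reasoning;
    if (delta.content) content += delta.content;
    if (delta.reasoning_details) details.push(...delta.reasoning_details);
  }
  return { reasoning, content, details };
}
```

In a real application you would feed each chunk from the async iterator returned by `client.chat.completions.create({ ..., stream: true })` into the same accumulation logic.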

Responses API

Streaming events follow the Responses API format:
// Reasoning summary text delta
{
  "type": "response.reasoning_summary_text.delta",
  "item_id": "rs_0ab50bce3156357b...",
  "output_index": 0,
  "summary_index": 0,
  "delta": "Let me think about this..."
}

// Reasoning item complete
{
  "type": "response.output_item.done",
  "output_index": 0,
  "item": {
    "id": "rs_0ab50bce3156357b...",
    "type": "reasoning",
    "summary": [{
      "type": "summary_text",
      "text": "**Crafting the response**\n\nThe user wants..."
    }]
  }
}
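A minimal sketch of dispatching on these two event types follows; it models only the fields shown above, and the function name is illustrative. It prefers the final summary from the completed reasoning item, falling back to the concatenated streamed deltas:

```typescript
interface SummaryDeltaEvent {
  type: "response.reasoning_summary_text.delta";
  item_id: string;
  output_index: number;
  summary_index: number;
  delta: string;
}

interface OutputItemDoneEvent {
  type: "response.output_item.done";
  output_index: number;
  item: {
    id: string;
    type: string;
    summary?: { type: string; text: string }[];
  };
}

type ResponsesEvent = SummaryDeltaEvent | OutputItemDoneEvent;

// Return the reasoning summary: the completed item's summary if present,
// otherwise the concatenated streamed delta text.
function summarizeReasoning(events: ResponsesEvent[]): string {
  let streamed = "";
  for (const event of events) {
    if (event.type === "response.reasoning_summary_text.delta") {
      streamed += event.delta;
    } else if (event.item.type === "reasoning" && event.item.summary?.length) {
      return event.item.summary.map((s) => s.text).join("\n");
    }
  }
  return streamed;
}
```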
Anthropic models always return encrypted_content (signatures) in reasoning items. These signatures validate the reasoning chain and are required for multi-turn conversations. Other providers like OpenAI can optionally return signatures when configured.