Reasoning

Helicone’s AI Gateway provides a unified interface for reasoning across providers. Use the same parameters regardless of provider - the Gateway handles the translation automatically.

Quick Start

Chat Completions

import OpenAI from "openai";

// Point the OpenAI SDK at the Helicone AI Gateway and authenticate with your Helicone API key
const client = new OpenAI({
  apiKey: process.env.HELICONE_API_KEY,
  baseURL: "https://ai-gateway.helicone.ai/v1",
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-20250514",
  messages: [
    { role: "user", content: "What is the sum of the first 100 prime numbers?" }
  ],
  // Same parameter for every provider; the Gateway translates it
  reasoning_effort: "medium",
  max_completion_tokens: 16000
});

Responses API

import OpenAI from "openai";

// Same client setup as for Chat Completions
const client = new OpenAI({
  apiKey: process.env.HELICONE_API_KEY,
  baseURL: "https://ai-gateway.helicone.ai/v1",
});

const response = await client.responses.create({
  model: "claude-sonnet-4-20250514",
  input: "What is the sum of the first 100 prime numbers?",
  // The Responses API nests the effort level under `reasoning`
  reasoning: {
    effort: "medium"
  }
});

Configuration

Chat Completions

{
  reasoning_effort: "low" | "medium" | "high",
  reasoning_options: {
    budget_tokens: 8000  // Optional token budget
  }
}

Responses API

{
  reasoning: {
    effort: "low" | "medium" | "high"
  },
  reasoning_options: {
    budget_tokens: 8000  // Optional token budget
  }
}
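The two shapes above differ only in where the effort level lives. As a rough sketch, they can be written as TypeScript types with a small converter between them. The type and function names here (`ChatReasoningParams`, `ResponsesReasoningParams`, `toResponsesParams`) are hypothetical and not part of any SDK; only the field names come from the configuration shown above.

```typescript
// Hypothetical type names; only the field names come from the configuration above.
type ReasoningEffort = "low" | "medium" | "high";

interface ReasoningOptions {
  budget_tokens?: number; // optional token budget
}

// Chat Completions shape: effort is a top-level field.
interface ChatReasoningParams {
  reasoning_effort?: ReasoningEffort;
  reasoning_options?: ReasoningOptions;
}

// Responses API shape: effort is nested under `reasoning`.
interface ResponsesReasoningParams {
  reasoning?: { effort: ReasoningEffort };
  reasoning_options?: ReasoningOptions;
}

// Convert Chat Completions-style reasoning params to the Responses API shape.
function toResponsesParams(params: ChatReasoningParams): ResponsesReasoningParams {
  const out: ResponsesReasoningParams = {};
  if (params.reasoning_effort !== undefined) {
    out.reasoning = { effort: params.reasoning_effort };
  }
  if (params.reasoning_options !== undefined) {
    out.reasoning_options = { ...params.reasoning_options };
  }
  return out;
}

const converted = toResponsesParams({
  reasoning_effort: "high",
  reasoning_options: { budget_tokens: 8000 },
});
console.log(JSON.stringify(converted));
// {"reasoning":{"effort":"high"},"reasoning_options":{"budget_tokens":8000}}
```

A helper like this is only useful if the same application calls both APIs; otherwise the literal objects above are enough.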

reasoning_effort

Level	Description
`low`	Light reasoning for simple tasks
`medium`	Balanced reasoning
`high`	Deep reasoning for complex problems

For Anthropic models, the defaults are a maximum of 4096 completion tokens and a reasoning budget of 2048 tokens.

reasoning_options.budget_tokens

The budget_tokens parameter sets the maximum number of tokens the model can use for reasoning.

For Google (Gemini) models: reasoning_effort is required to enable thinking. Passing budget_tokens alone will not enable reasoning - you must also specify reasoning_effort.

// ✅ Correct: reasoning_effort enables thinking, budget_tokens limits it
{
  reasoning_effort: "high",
  reasoning_options: { budget_tokens: 4096 }
}

// ❌ Incorrect for Gemini: budget_tokens alone does nothing
{
  reasoning_options: { budget_tokens: 4096 }  // Reasoning will be disabled
}
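To catch the Gemini case before a request is sent, a client-side check along these lines may help. This is a minimal sketch: `hasBudgetWithoutEffort` and `ReasoningParams` are hypothetical names, and the check only encodes the rule stated above (a budget without `reasoning_effort` does not enable reasoning on Gemini models).

```typescript
type ReasoningEffort = "low" | "medium" | "high";

// Hypothetical type covering only the two reasoning fields discussed above.
interface ReasoningParams {
  reasoning_effort?: ReasoningEffort;
  reasoning_options?: { budget_tokens?: number };
}

// True when a token budget is set but nothing enables reasoning,
// which is the configuration described above as ineffective for Gemini models.
function hasBudgetWithoutEffort(params: ReasoningParams): boolean {
  const budget = params.reasoning_options?.budget_tokens;
  return budget !== undefined && params.reasoning_effort === undefined;
}

const okConfig: ReasoningParams = {
  reasoning_effort: "high",
  reasoning_options: { budget_tokens: 4096 },
};
const badConfig: ReasoningParams = {
  reasoning_options: { budget_tokens: 4096 },
};

console.log(hasBudgetWithoutEffort(okConfig));  // false
console.log(hasBudgetWithoutEffort(badConfig)); // true
```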

Handling Responses

Chat Completions

When streaming, reasoning content arrives first, in chunks carried by the delta's reasoning field, followed by content chunks. The final chunk carries reasoning_details along with the finish reason:

// Reasoning chunks arrive first
{
  "choices": [{
    "delta": { "reasoning": "Let me think about this..." }
  }]
}

// Then content chunks
{
  "choices": [{
    "delta": { "content": "The answer is 42." }
  }]
}

// Final chunk includes reasoning_details with signature
{
  "choices": [{
    "delta": {
      "reasoning_details": [{
        "thinking": "The user is asking for...",
        "signature": "EpICCkYIChgCKkCfWt1pnGxEcz48yQJvie3ppkXZ8ryd..."
      }]
    },
    "finish_reason": "stop"
  }]
}
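One way to consume these chunks is to fold them into a single result as they arrive. The sketch below works on an in-memory array so it can run without a network call. The `StreamChunk` and `Accumulated` types are hypothetical and model only the fields shown in the chunks above; real chunks carry more fields.

```typescript
// Hypothetical types that mirror only the fields shown in the chunks above.
interface ReasoningDetail {
  thinking: string;
  signature: string;
}

interface StreamChunk {
  choices: Array<{
    delta: {
      reasoning?: string;
      content?: string;
      reasoning_details?: ReasoningDetail[];
    };
    finish_reason?: string | null;
  }>;
}

interface Accumulated {
  reasoning: string;
  content: string;
  reasoningDetails: ReasoningDetail[];
  finishReason: string | null;
}

// Fold a sequence of chunks into the full reasoning text, answer text,
// reasoning_details entries, and finish reason.
function accumulateChunks(chunks: StreamChunk[]): Accumulated {
  const acc: Accumulated = {
    reasoning: "",
    content: "",
    reasoningDetails: [],
    finishReason: null,
  };
  for (const chunk of chunks) {
    const choice = chunk.choices[0];
    if (!choice) continue;
    acc.reasoning += choice.delta.reasoning ?? "";
    acc.content += choice.delta.content ?? "";
    if (choice.delta.reasoning_details) {
      acc.reasoningDetails.push(...choice.delta.reasoning_details);
    }
    if (choice.finish_reason) {
      acc.finishReason = choice.finish_reason;
    }
  }
  return acc;
}

// Sample chunks copied from the example above (truncated values kept as shown).
const sampleChunks: StreamChunk[] = [
  { choices: [{ delta: { reasoning: "Let me think about this..." } }] },
  { choices: [{ delta: { content: "The answer is 42." } }] },
  {
    choices: [{
      delta: {
        reasoning_details: [{
          thinking: "The user is asking for...",
          signature: "EpICCkYIChgCKkCfWt1pnGxEcz48yQJvie3ppkXZ8ryd...",
        }],
      },
      finish_reason: "stop",
    }],
  },
];

const accumulated = accumulateChunks(sampleChunks);
console.log(accumulated.content);      // The answer is 42.
console.log(accumulated.finishReason); // stop
```

With a live stream from the OpenAI SDK, the same folding logic would likely sit inside a `for await` loop; the SDK's chunk types may not declare the reasoning fields, so a cast to a type like `StreamChunk` could be needed.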

Non-streaming responses include the full reasoning in the message:

{
  "id": "msg_01S1QpjYur8kLeEVKVoKxdTP",
  "object": "chat.completion",
  "model": "claude-haiku-4-5-20251001",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Why don't scientists trust atoms?\n\nBecause they make up everything!",
      "reasoning": "The user is asking for a very short joke. I should provide something quick, light, and funny...",
      "reasoning_details": [{
        "thinking": "The user is asking for a very short joke...",
        "signature": "Ev8DCkYIChgCKkBeHyembBdwl8C/a/8luinDP0w5/oQP..."
      }]
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 58,
    "completion_tokens": 108,
    "total_tokens": 166
  }
}
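Reading these fields from a non-streaming message could look like the sketch below. `AssistantMessage` and `readReasoning` are hypothetical names, and the type covers only the message fields shown in the response above.

```typescript
interface ReasoningDetail {
  thinking: string;
  signature: string;
}

// Hypothetical type covering only the message fields shown above.
interface AssistantMessage {
  role: string;
  content: string | null;
  reasoning?: string;
  reasoning_details?: ReasoningDetail[];
}

// Separate the final answer from the reasoning text and its signatures.
function readReasoning(message: AssistantMessage) {
  return {
    answer: message.content ?? "",
    reasoning: message.reasoning ?? "",
    signatures: (message.reasoning_details ?? []).map((detail) => detail.signature),
  };
}

// Message copied from the example response above (truncated values kept as shown).
const sampleMessage: AssistantMessage = {
  role: "assistant",
  content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
  reasoning: "The user is asking for a very short joke. I should provide something quick, light, and funny...",
  reasoning_details: [{
    thinking: "The user is asking for a very short joke...",
    signature: "Ev8DCkYIChgCKkBeHyembBdwl8C/a/8luinDP0w5/oQP...",
  }],
};

const parsedMessage = readReasoning(sampleMessage);
console.log(parsedMessage.signatures.length); // 1
```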

Responses API

The examples below cover streaming events and non-streaming responses from OpenAI and Anthropic models.

Streaming events follow the Responses API format:

// Reasoning summary text delta
{
  "type": "response.reasoning_summary_text.delta",
  "item_id": "rs_0ab50bce3156357b...",
  "output_index": 0,
  "summary_index": 0,
  "delta": "Let me think about this..."
}

// Reasoning item complete
{
  "type": "response.output_item.done",
  "output_index": 0,
  "item": {
    "id": "rs_0ab50bce3156357b...",
    "type": "reasoning",
    "summary": [{
      "type": "summary_text",
      "text": "**Crafting the response**\n\nThe user wants..."
    }]
  }
}
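A possible way to assemble reasoning summaries from these events is to append each delta under its `item_id`, then replace the partial text with the completed item's summary once `response.output_item.done` arrives. This sketch models only the two event types shown above; the type names, the function name, and the `rs_example` id are hypothetical, and treating the completed item's summary as the final text is an assumption.

```typescript
// Hypothetical types that model only the two events shown above.
interface SummaryDeltaEvent {
  type: "response.reasoning_summary_text.delta";
  item_id: string;
  output_index: number;
  summary_index: number;
  delta: string;
}

interface OutputItemDoneEvent {
  type: "response.output_item.done";
  output_index: number;
  item: {
    id: string;
    type: string;
    summary?: Array<{ type: string; text: string }>;
  };
}

type ReasoningStreamEvent = SummaryDeltaEvent | OutputItemDoneEvent;

// Build a map from reasoning item id to its summary text.
function collectReasoningSummaries(events: ReasoningStreamEvent[]): Record<string, string> {
  const summaries: Record<string, string> = {};
  for (const ev of events) {
    if (ev.type === "response.reasoning_summary_text.delta") {
      // Append partial text as it streams in.
      summaries[ev.item_id] = (summaries[ev.item_id] ?? "") + ev.delta;
    } else if (ev.item.type === "reasoning" && ev.item.summary) {
      // Assumption: the completed item carries the full summary text.
      summaries[ev.item.id] = ev.item.summary.map((part) => part.text).join("\n");
    }
  }
  return summaries;
}

// "rs_example" is a made-up id used for illustration only.
const sampleEvents: ReasoningStreamEvent[] = [
  { type: "response.reasoning_summary_text.delta", item_id: "rs_example", output_index: 0, summary_index: 0, delta: "Let me think " },
  { type: "response.reasoning_summary_text.delta", item_id: "rs_example", output_index: 0, summary_index: 0, delta: "about this..." },
];

const streamedSummaries = collectReasoningSummaries(sampleEvents);
console.log(streamedSummaries["rs_example"]); // Let me think about this...
```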

Non-streaming responses from OpenAI models return the reasoning summary as a separate output item:

{
  "id": "resp_038bfaf6e50f1c45...",
  "object": "response",
  "status": "completed",
  "model": "gpt-5-mini-2025-08-07",
  "output": [
    {
      "id": "rs_038bfaf6e50f1c45...",
      "type": "reasoning",
      "summary": [{
        "type": "summary_text",
        "text": "**Generating programming jokes**\n\nThe user wants a short joke..."
      }]
    },
    {
      "id": "msg_038bfaf6e50f1c45...",
      "type": "message",
      "status": "completed",
      "role": "assistant",
      "content": [{
        "type": "output_text",
        "text": "To understand recursion, you must first understand recursion."
      }]
    }
  ],
  "usage": {
    "input_tokens": 17,
    "output_tokens": 336,
    "output_tokens_details": {
      "reasoning_tokens": 320
    }
  }
}

Anthropic responses include encrypted_content for reasoning validation:

{
  "id": "msg_017G4K2w5s6zEn3KZ6jp455j",
  "object": "response",
  "status": "completed",
  "model": "claude-haiku-4-5-20251001",
  "output": [
    {
      "id": "rs_msg_017G4K2w5s6zEn3KZ6jp455j_0",
      "type": "reasoning",
      "summary": [{
        "type": "summary_text",
        "text": "The user wants me to tell a short joke about programming..."
      }],
      "encrypted_content": "EuYGCkYIChgCKkBxEozbYO/Z5AL2tlDHwBHcBEOG..."
    },
    {
      "id": "msg_msg_017G4K2w5s6zEn3KZ6jp455j",
      "type": "message",
      "status": "completed",
      "role": "assistant",
      "content": [{
        "type": "output_text",
        "text": "Why do programmers prefer dark mode?\n\nBecause light attracts bugs!"
      }]
    }
  ],
  "usage": {
    "input_tokens": 47,
    "output_tokens": 294
  }
}
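Both non-streaming shapes above can be read with the same code, since the only difference shown is the optional `encrypted_content` on reasoning items. The following sketch walks the `output` array and separates reasoning summaries, the answer text, and any `encrypted_content` values. The type names and `summarizeOutput` are hypothetical and cover only the fields that appear in the examples above.

```typescript
// Hypothetical types covering only the fields shown in the responses above.
interface ReasoningItem {
  id: string;
  type: "reasoning";
  summary: Array<{ type: string; text: string }>;
  encrypted_content?: string; // present in the Anthropic example above
}

interface MessageItem {
  id: string;
  type: "message";
  status: string;
  role: string;
  content: Array<{ type: string; text: string }>;
}

type OutputItem = ReasoningItem | MessageItem;

// Split an `output` array into reasoning summary text, answer text,
// and any encrypted_content values attached to reasoning items.
function summarizeOutput(items: OutputItem[]) {
  const reasoningParts: string[] = [];
  const encrypted: string[] = [];
  let answer = "";
  for (const item of items) {
    if (item.type === "reasoning") {
      reasoningParts.push(...item.summary.map((part) => part.text));
      if (item.encrypted_content !== undefined) {
        encrypted.push(item.encrypted_content);
      }
    } else {
      answer += item.content
        .filter((part) => part.type === "output_text")
        .map((part) => part.text)
        .join("");
    }
  }
  return { reasoning: reasoningParts.join("\n"), answer, encrypted };
}

// Output array copied from the Anthropic example above (truncated values kept as shown).
const sampleOutput: OutputItem[] = [
  {
    id: "rs_msg_017G4K2w5s6zEn3KZ6jp455j_0",
    type: "reasoning",
    summary: [{
      type: "summary_text",
      text: "The user wants me to tell a short joke about programming...",
    }],
    encrypted_content: "EuYGCkYIChgCKkBxEozbYO/Z5AL2tlDHwBHcBEOG...",
  },
  {
    id: "msg_msg_017G4K2w5s6zEn3KZ6jp455j",
    type: "message",
    status: "completed",
    role: "assistant",
    content: [{
      type: "output_text",
      text: "Why do programmers prefer dark mode?\n\nBecause light attracts bugs!",
    }],
  },
];

const parsedOutput = summarizeOutput(sampleOutput);
console.log(parsedOutput.answer);
console.log(parsedOutput.encrypted.length); // 1
```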

Anthropic models always return encrypted_content (signatures) in reasoning items. These signatures validate the reasoning chain and are required for multi-turn conversations. Other providers, such as OpenAI, can return signatures when configured to do so.

Related

Responses API: Alternative API format with reasoning support
Context Editing: Manage context in long reasoning sessions
