When you make an LLM call with a prompt ID, the AI Gateway compiles your saved prompt together with the runtime parameters you provide. Understanding this assembly process helps you design effective prompt templates and make the most of runtime customization.

Version Selection

The AI Gateway automatically determines which prompt version to use based on the parameters you provide:
  • environment (string): Uses the version deployed to that environment (e.g., production, staging, development)
  • version_id (string): Uses a specific version directly by its ID
Default behavior: If neither parameter is provided, the production version is used. Environment takes precedence over version_id if both are specified.
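For example, to target the version deployed to staging (a sketch, assuming the environment parameter is passed alongside prompt_id in the same call, mirroring the override examples later on):
// Select the staging deployment instead of the production default
const response = await openai.chat.completions.create({
  prompt_id: "abc123",
  environment: "staging"
});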

Parameter Priority

Saved prompts store all the configuration you set in the playground: temperature, max tokens, response format, system messages, and more. At runtime, these saved parameters serve as defaults, and any parameters you specify in your API call override them.
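For example, a saved prompt might store the following configuration: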
{
  "model": "gpt-4o-mini",
  "temperature": 0.6,
  "max_tokens": 1000,
  "messages": [
    {
      "role": "system", 
      "content": "You are a helpful customer support agent for {{hc:company:string}}."
    },
    {
      "role": "user",
      "content": "Hello, I need help with my account."
    }
  ]
}

Message Handling

Messages work differently from other parameters. Instead of overriding the saved messages, runtime messages are appended to them. This allows you to:
  • Define consistent system prompts and example conversations in your saved prompt
  • Add dynamic user messages at runtime
  • Build multi-turn conversations that maintain context
Since your saved prompts contain the required messages, the messages parameter becomes optional in API calls when using Helicone prompts. However, if your prompt template is empty or lacks messages, you’ll need to provide them at runtime.
Runtime messages are always appended to the end of your saved prompt messages. Make sure your saved prompt structure accounts for this behavior.
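For example, the saved system message stays in place and the runtime user message is appended after it (a sketch using the same call style as the override examples below):
// The saved prompt already contains the system message; this runtime
// user message is appended after it rather than replacing it.
const response = await openai.chat.completions.create({
  prompt_id: "abc123",
  messages: [
    { role: "user", content: "Hello, I need help with my account." }
  ]
});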

Prompt Partial Resolution

Prompt partials are resolved before variable substitution, allowing you to reference messages from other prompts and control their variables from the main prompt.

Resolution Order

The prompt assembly process follows this order:
  1. Prompt Partial Resolution: All {{hcp:prompt_id:index:environment}} tags are replaced with the corresponding message content
  2. Variable Substitution: All {{hc:name:type}} variables are replaced with their provided values
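For example, a saved message can combine a partial with a variable: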
{
  "messages": [
    {
      "role": "system",
      "content": "{{hcp:sysPrompt:0}} Always be {{hc:tone:string}}."
    }
  ]
}
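To make the order concrete, suppose message 0 of the sysPrompt prompt is "You are a support agent for {{hc:company:string}}." Step 1 expands the partial into "You are a support agent for {{hc:company:string}}. Always be {{hc:tone:string}}." and step 2 substitutes the provided inputs, so { company: "Acme", tone: "friendly" } yields "You are a support agent for Acme. Always be friendly."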

Partial Resolution Process

When a prompt partial is encountered:
  1. Version Selection: The system determines which version of the referenced prompt to use based on the environment parameter (or defaults to production)
  2. Message Extraction: The message at the specified index is extracted from that prompt version
  3. Content Replacement: The partial tag is replaced with the extracted message content (which may contain its own variables)
  4. Variable Collection: Variables from the resolved partial are collected and made available for substitution
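The optional environment segment of the partial tag pins the referenced prompt to a specific environment's deployed version; omitting it falls back to production. For example (a sketch):
{
  "role": "system",
  "content": "{{hcp:sysPrompt:0:staging}} Always be {{hc:tone:string}}."
}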

Variable Control

Since partials are resolved before variables, variables within partials can be controlled from the main prompt’s inputs:
{
  "messages": [
    {
      "role": "user",
      "content": "{{hcp:greeting:0}} How can you help me?"
    }
  ]
}
Variables from prompt partials are automatically extracted and shown in the prompt editor. You only need to provide values for these variables in your main prompt’s inputs; they will be substituted in both the main prompt and any resolved partials.
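For example (a hypothetical sketch, assuming message 0 of the greeting prompt contains a {{hc:name:string}} variable), supplying name once fills it inside the resolved partial:
const response = await openai.chat.completions.create({
  prompt_id: "abc123",
  inputs: { name: "Alex" } // substituted inside the resolved greeting partial too
});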

Override Examples

You can override any saved parameter at runtime, including temperature, max tokens, and response format. For example, overriding temperature:
// Saved prompt has temperature: 0.8
const response = await openai.chat.completions.create({
  prompt_id: "abc123",
  temperature: 0.2, // Uses 0.2, not 0.8
  inputs: { topic: "AI safety" }
});
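A max tokens override follows the same pattern (a sketch, reusing the saved prompt shown earlier, which sets max_tokens: 1000):
// Saved prompt has max_tokens: 1000
const response = await openai.chat.completions.create({
  prompt_id: "abc123",
  max_tokens: 250, // Uses 250, not 1000
  inputs: { topic: "AI safety" }
});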
This compilation approach lets you keep consistent prompt templates while still allowing runtime customization for specific use cases.