When you make an LLM call with a prompt ID, the AI Gateway compiles your saved prompt together with the runtime parameters you provide. Understanding this assembly process helps you design effective prompt templates and make the most of runtime customization.

Version Selection

The AI Gateway automatically determines which prompt version to use based on the parameters you provide:
  • environment (string): Uses the version deployed to that environment (e.g., production, staging, development)
  • version_id (string): Uses a specific version directly by its ID
Default behavior: If neither parameter is provided, the production version is used. Environment takes precedence over version_id if both are specified.
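For example, to target the version deployed to staging (a sketch, assuming the environment parameter is passed alongside prompt_id in the same call, mirroring the override examples later on):
// Select the staging deployment instead of the production default
const response = await openai.chat.completions.create({
  prompt_id: "abc123",
  environment: "staging"
});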

Parameter Priority

Saved prompts store all the configuration you set in the playground: temperature, max tokens, response format, system messages, and more. At runtime, these saved parameters serve as defaults, and any parameters you specify in your API call override them.
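For example, a saved prompt might store the following configuration: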
{
  "model": "gpt-4o-mini",
  "temperature": 0.6,
  "max_tokens": 1000,
  "messages": [
    {
      "role": "system", 
      "content": "You are a helpful customer support agent for {{hc:company:string}}."
    },
    {
      "role": "user",
      "content": "Hello, I need help with my account."
    }
  ]
}

Message Handling

Messages work differently from other parameters. Instead of overriding the saved messages, runtime messages are appended to them. This allows you to:
  • Define consistent system prompts and example conversations in your saved prompt
  • Add dynamic user messages at runtime
  • Build multi-turn conversations that maintain context
Since your saved prompts contain the required messages, the messages parameter becomes optional in API calls when using Helicone prompts. However, if your prompt template is empty or lacks messages, you’ll need to provide them at runtime.
Runtime messages are always appended to the end of your saved prompt messages. Make sure your saved prompt structure accounts for this behavior.
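For example, the saved system message stays in place and the runtime user message is appended after it (a sketch using the same call style as the override examples below):
// The saved prompt already contains the system message; this runtime
// user message is appended after it rather than replacing it.
const response = await openai.chat.completions.create({
  prompt_id: "abc123",
  messages: [
    { role: "user", content: "Hello, I need help with my account." }
  ]
});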

Prompt Partial Resolution

Prompt partials are resolved before variable substitution, allowing you to reference messages from other prompts and control their variables from the main prompt.

Resolution Order

The prompt assembly process follows this order:
  1. Prompt Partial Resolution: All {{hcp:prompt_id:index:environment}} tags are replaced with the corresponding message content
  2. Variable Substitution: All {{hc:name:type}} variables are replaced with their provided values
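For example, a saved message can combine a partial with a variable: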
{
  "messages": [
    {
      "role": "system",
      "content": "{{hcp:sysPrompt:0}} Always be {{hc:tone:string}}."
    }
  ]
}
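To make the order concrete, suppose message 0 of the sysPrompt prompt is "You are a support agent for {{hc:company:string}}." Step 1 expands the partial into "You are a support agent for {{hc:company:string}}. Always be {{hc:tone:string}}." and step 2 substitutes the provided inputs, so { company: "Acme", tone: "friendly" } yields "You are a support agent for Acme. Always be friendly."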

Partial Resolution Process

When a prompt partial is encountered:
  1. Version Selection: The system determines which version of the referenced prompt to use based on the environment parameter (or defaults to production)
  2. Message Extraction: The message at the specified index is extracted from that prompt version
  3. Content Replacement: The partial tag is replaced with the extracted message content (which may contain its own variables)
  4. Variable Collection: Variables from the resolved partial are collected and made available for substitution
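The optional environment segment of the partial tag pins the referenced prompt to a specific environment's deployed version; omitting it falls back to production. For example (a sketch):
{
  "role": "system",
  "content": "{{hcp:sysPrompt:0:staging}} Always be {{hc:tone:string}}."
}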

Variable Control

Since partials are resolved before variables, variables within partials can be controlled from the main prompt’s inputs:
{
  "messages": [
    {
      "role": "user",
      "content": "{{hcp:greeting:0}} How can you help me?"
    }
  ]
}
Variables from prompt partials are automatically extracted and shown in the prompt editor. You only need to provide values for these variables in your main prompt’s inputs; they will be substituted in both the main prompt and any resolved partials.
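For example (a hypothetical sketch, assuming message 0 of the greeting prompt contains a {{hc:name:string}} variable), supplying name once fills it inside the resolved partial:
const response = await openai.chat.completions.create({
  prompt_id: "abc123",
  inputs: { name: "Alex" } // substituted inside the resolved greeting partial too
});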

Override Examples

You can override any saved parameter at runtime, including temperature, max tokens, and response format. For example, overriding temperature:
// Saved prompt has temperature: 0.8
const response = await openai.chat.completions.create({
  prompt_id: "abc123",
  temperature: 0.2, // Uses 0.2, not 0.8
  inputs: { topic: "AI safety" }
});
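A max tokens override follows the same pattern (a sketch, reusing the saved prompt shown earlier, which sets max_tokens: 1000):
// Saved prompt has max_tokens: 1000
const response = await openai.chat.completions.create({
  prompt_id: "abc123",
  max_tokens: 250, // Uses 250, not 1000
  inputs: { topic: "AI safety" }
});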
This compilation approach lets you keep consistent prompt templates while still allowing runtime customization for specific use cases.