LLM Security

Introduction

Generative AI is quickly changing the cybersecurity landscape. Helicone provides built-in security measures powered by Meta’s state-of-the-art security models to protect your LLM applications.

Common attack types include:
- Adversarial instructions
- Indirect injection
- Data exfiltration
- Phishing

Security Implementation

Helicone’s LLM security is powered by two advanced models from Meta:

Prompt Guard (86M): A specialized model for detecting:
- Direct prompt injections
- Indirect/embedded malicious instructions
- Jailbreak attempts
- Multi-language attacks (supports 8 languages)

Advanced Security Analysis: Optional deeper security analysis using Meta’s Llama Guard (3.8B) for comprehensive threat detection across 14 categories:

Category	Description
Violent Crimes	Violence toward people or animals
Non-Violent Crimes	Financial crimes, property crimes, cyber crimes
Sex-Related Crimes	Trafficking, assault, harassment
Child Exploitation	Any content related to child abuse
Defamation	False statements harming reputation
Specialized Advice	Unauthorized financial/medical/legal advice
Privacy	Handling of sensitive personal information
Intellectual Property	Copyright and IP violations
Indiscriminate Weapons	Creation of dangerous weapons
Hate Speech	Content targeting protected characteristics
Suicide & Self-Harm	Content promoting self-injury
Sexual Content	Adult content and erotica
Elections	Misinformation about voting
Code Interpreter Abuse	Malicious code execution attempts

Quick Start

To enable LLM security in Helicone, simply add Helicone-LLM-Security-Enabled: true to your request headers. For advanced security analysis using Llama Guard, add Helicone-LLM-Security-Advanced: true:

curl https://oai.helicone.ai/v1/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <YOUR_API_KEY>' \
  -H 'Helicone-LLM-Security-Enabled: true' \
  -H 'Helicone-LLM-Security-Advanced: true' \
  -d '{
    "model": "text-davinci-003",
    "prompt": "How do I enable LLM security with Helicone?"
}'
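The same request can be assembled from application code. The following is a minimal Python sketch using only the standard library; the two `Helicone-LLM-Security-*` header names, the endpoint, and the model come from the curl example above, while the helper names `build_security_headers` and `build_request` are hypothetical and chosen here for illustration.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
HELICONE_URL = "https://oai.helicone.ai/v1/completions"


def build_security_headers(api_key: str, advanced: bool = False) -> dict:
    """Return request headers with Helicone LLM security enabled.

    Hypothetical helper: only the header names come from the docs.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
        # Base tier: Prompt Guard screening.
        "Helicone-LLM-Security-Enabled": "true",
    }
    if advanced:
        # Advanced tier: adds Llama Guard analysis.
        headers["Helicone-LLM-Security-Advanced"] = "true"
    return headers


def build_request(api_key: str, prompt: str, advanced: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    body = json.dumps({"model": "text-davinci-003", "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        HELICONE_URL,
        data=body,
        headers=build_security_headers(api_key, advanced),
        method="POST",
    )
```

Passing the result of `build_request(...)` to `urllib.request.urlopen` would send it. Keeping header construction in one function makes it easy to switch the advanced tier on only for the requests that need it.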

Security Checks

When LLM Security is enabled, Helicone:

Analyzes each user message using Meta’s Prompt Guard model (86M parameters) to detect:
- Direct jailbreak attempts
- Indirect injection attacks
- Malicious content in 8 languages (English, French, German, Hindi, Italian, Portuguese, Spanish, Thai)

When advanced security is enabled (Helicone-LLM-Security-Advanced: true), activates Meta’s Llama Guard (3.8B) model for:
- Deeper content analysis across 14 threat categories
- Higher accuracy threat detection
- More nuanced understanding of context and intent

Blocks detected threats and returns an error response:

{
  "success": false,
  "error": {
    "code": "PROMPT_THREAT_DETECTED",
    "message": "Prompt threat detected. Your request cannot be processed.",
    "details": "See your Helicone request page for more info."
  }
}

Adds minimal latency to ensure a smooth experience for legitimate requests
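A client may want to tell a blocked request apart from other failures. The sketch below checks a response body against the error shape shown above. The `PROMPT_THREAT_DETECTED` code and the `success` and `error` fields come from that example; the function name `is_prompt_threat` is hypothetical, and because the HTTP status code of the blocked response is not stated here, the check relies on the body alone.

```python
import json

# Error code from the blocked-request response shown above.
THREAT_CODE = "PROMPT_THREAT_DETECTED"


def is_prompt_threat(raw_body: str) -> bool:
    """Return True when the body matches the blocked-request error shape."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        # Not JSON at all, so it cannot be the documented error object.
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    return (
        payload.get("success") is False
        and isinstance(error, dict)
        and error.get("code") == THREAT_CODE
    )
```

An application could call this on the response text and, on a match, show the user a generic refusal and point developers to the Helicone request page mentioned in the `details` field.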

Advanced Security Features

Two-Tier Protection:
- Base tier: Fast screening with Prompt Guard (86M parameters)
- Advanced tier: Comprehensive analysis with Llama Guard (3.8B parameters)
Multilingual Support: Detects threats across 8 languages
Low Base Latency: Initial screening uses the lightweight Prompt Guard model
High Accuracy:
- Base: Over 97% detection rate on jailbreak attempts
- Advanced: Enhanced accuracy with Llama Guard’s larger model
Customizable: Security thresholds can be adjusted based on your application’s needs
