Introduction

Generative AI is quickly changing the cybersecurity landscape. Helicone provides built-in security measures powered by Meta’s state-of-the-art security models to protect your LLM applications.

Security Implementation

Helicone’s LLM security is powered by two advanced models from Meta:

  1. Prompt Guard (86M): A specialized model for detecting:

    • Direct prompt injections
    • Indirect/embedded malicious instructions
    • Jailbreak attempts
    • Multi-language attacks (supports 8 languages)
  2. Advanced Security Analysis: Optional deeper security analysis using Meta’s Llama Guard (3.8B) for comprehensive threat detection across 14 categories:

    CategoryDescription
    Violent CrimesViolence toward people or animals
    Non-Violent CrimesFinancial crimes, property crimes, cyber crimes
    Sex-Related CrimesTrafficking, assault, harassment
    Child ExploitationAny content related to child abuse
    DefamationFalse statements harming reputation
    Specialized AdviceUnauthorized financial/medical/legal advice
    PrivacyHandling of sensitive personal information
    Intellectual PropertyCopyright and IP violations
    Indiscriminate WeaponsCreation of dangerous weapons
    Hate SpeechContent targeting protected characteristics
    Suicide & Self-HarmContent promoting self-injury
    Sexual ContentAdult content and erotica
    ElectionsMisinformation about voting
    Code Interpreter AbuseMalicious code execution attempts

Quick Start

To enable LLM security in Helicone, simply add Helicone-LLM-Security-Enabled: true to your request headers. For advanced security analysis using Llama Guard, add Helicone-LLM-Security-Advanced: true:

Security Checks

When LLM Security is enabled, Helicone:

  • Analyzes each user message using Meta’s Prompt Guard model (86M parameters) to detect:
    • Direct jailbreak attempts
    • Indirect injection attacks
    • Malicious content in 8 languages (English, French, German, Hindi, Italian, Portuguese, Spanish, Thai)
  • When advanced security is enabled (Helicone-LLM-Security-Advanced: true), activates Meta’s Llama Guard (3.8B) model for:
    • Deeper content analysis across 14 threat categories
    • Higher accuracy threat detection
    • More nuanced understanding of context and intent
  • Blocks detected threats and returns an error response:
    {
      "success": false,
      "error": {
        "code": "PROMPT_THREAT_DETECTED",
        "message": "Prompt threat detected. Your request cannot be processed.",
        "details": "See your Helicone request page for more info."
      }
    }
    
  • Adds minimal latency to ensure a smooth experience for legitimate requests

Advanced Security Features

  • Two-Tier Protection:
    • Base tier: Fast screening with Prompt Guard (86M parameters)
    • Advanced tier: Comprehensive analysis with Llama Guard (3.8B parameters)
  • Multilingual Support: Detects threats across 8 languages
  • Low Base Latency: Initial screening uses the lightweight Prompt Guard model
  • High Accuracy:
    • Base: Over 97% detection rate on jailbreak attempts
    • Advanced: Enhanced accuracy with Llama Guard’s larger model
  • Customizable: Security thresholds can be adjusted based on your application’s needs