When running evaluation frameworks to measure model performance, you need visibility into how well your AI applications are performing across different metrics. Scores let you report evaluation results from any framework to Helicone, providing centralized observability for accuracy, hallucination rates, helpfulness, and custom metrics.
Helicone doesn’t run evaluations for you - we’re not an evaluation framework. Instead, we provide a centralized location to report and analyze evaluation results from any framework (like RAGAS, LangSmith, or custom evaluations), giving you unified observability across all your evaluation metrics.
Use your evaluation framework or custom logic to assess model responses and generate scores (integers or booleans) for metrics like accuracy, helpfulness, or safety.
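Once you have scores, report them with a POST to Helicone's scores endpoint. Here is a minimal sketch, assuming `HELICONE_API_KEY` is set in your environment and `request_id` is the Helicone request ID of the logged call; the score names `helpfulness` and `is_safe` are illustrative placeholders:

```python
import os
import requests

# Assumption: HELICONE_API_KEY is available as an environment variable.
HELICONE_API_KEY = os.environ["HELICONE_API_KEY"]

# Assumption: replace with the Helicone request ID of the call you scored.
request_id = "your-request-id-here"

# Report one or more scores for the request. Values must be
# integers or booleans.
response = requests.post(
    f"https://api.helicone.ai/v1/request/{request_id}/score",
    headers={
        "Authorization": f"Bearer {HELICONE_API_KEY}",
        "Content-Type": "application/json",
    },
    json={"scores": {"helpfulness": 85, "is_safe": True}},
)
response.raise_for_status()
```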
For a fuller example, use RAGAS to evaluate a retrieval-augmented generation (RAG) response for faithfulness (a hallucination check) and answer relevancy, then report both scores to Helicone:
```python
import os

import requests
from ragas import evaluate
from ragas.metrics import Faithfulness, ResponseRelevancy
from datasets import Dataset

HELICONE_API_KEY = os.environ["HELICONE_API_KEY"]

# Run RAG evaluation and report the scores to Helicone
def evaluate_rag_response(question, answer, contexts, ground_truth, requestId):
    # Initialize RAGAS metrics
    metrics = [Faithfulness(), ResponseRelevancy()]

    # Create a single-row dataset in RAGAS format
    data = {
        "question": [question],
        "answer": [answer],
        "contexts": [contexts],
        "ground_truth": [ground_truth],
    }
    dataset = Dataset.from_dict(data)

    # Run evaluation
    result = evaluate(dataset, metrics=metrics)

    # Extract scores (RAGAS returns 0-1 values)
    faithfulness_score = result["faithfulness"] if "faithfulness" in result else 0
    relevancy_score = result["answer_relevancy"] if "answer_relevancy" in result else 0

    # Report to Helicone (convert to a 0-100 integer scale)
    response = requests.post(
        f"https://api.helicone.ai/v1/request/{requestId}/score",
        headers={
            "Authorization": f"Bearer {HELICONE_API_KEY}",
            "Content-Type": "application/json",
        },
        json={
            "scores": {
                "faithfulness": int(faithfulness_score * 100),
                "answer_relevancy": int(relevancy_score * 100),
            }
        },
    )
    response.raise_for_status()

    return result

# Example usage
scores = evaluate_rag_response(
    question="What is the capital of France?",
    answer="The capital of France is Paris.",
    contexts=["France is a country in Europe. Paris is its capital."],
    ground_truth="Paris",
    requestId="your-request-id-here",
)
```
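Because Helicone scores must be integers or booleans, the 0-1 floats that RAGAS produces are multiplied by 100 and truncated to integers before being reported; pick whatever scale suits your metrics, as long as the reported values are integers or booleans.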