How to Run LLM Prompt Experiments
Run experiments with historical datasets to test, evaluate, and improve prompts over time while preventing regressions in production systems.
Feature Highlight
- Create as many prompt versions as you like, without impacting production data.
- Evaluate the outputs of your new prompt (and have data to back you up 📈).
- Save costs by testing on specific datasets and making fewer calls to providers like OpenAI. 🤑
Running your first prompt experiment
To start an experiment, first go to the Prompts tab and select a prompt.
Click `Start Experiment`
On the top right, click `Start Experiment`.
Select the base prompt
Select a base prompt and click `Continue`. You can edit the prompt in the next step.
To run an experiment on the production prompt, look for the `production` tag.
Edit the prompt
Your changes will not affect the original prompt; instead, they create a new prompt version for the experiment to run against.
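For intuition, here is a minimal sketch of what a base prompt and an edited experiment variant might look like. The prompt text and the `{{ticket}}` input variable below are hypothetical examples, not values from your account:

```python
# Hypothetical example of a prompt edit. The base prompt stays in
# production untouched; the experiment runs against the edited copy.
base_prompt = "Summarize the following support ticket: {{ticket}}"

# Experimental variant: same {{ticket}} input variable, tighter instructions.
experiment_prompt = (
    "Summarize the following support ticket in three bullet points, "
    "keeping the customer's main complaint verbatim: {{ticket}}"
)
```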
Configure your experiment
Select the dataset, model, and provider keys.
To run your experiment on a random dataset, click `Generate random dataset`. We will pick up to 10 random entries from your existing requests.
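As a rough sketch, you can think of a randomly generated dataset as a small list of input bundles sampled from your logged requests. The field names below are illustrative assumptions, not the product's actual schema:

```python
# Illustrative sketch only: a random dataset as up to 10 input bundles
# sampled from previously logged requests. Field names are hypothetical.
dataset = [
    {"inputs": {"ticket": "My invoice was charged twice last month."}},
    {"inputs": {"ticket": "The export button does nothing on Safari."}},
    {"inputs": {"ticket": "How do I rotate my API key?"}},
    # ... up to 10 entries picked at random from existing requests
]
```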
Confirm and run
The Diff Viewer compares your new prompt to the base prompt that you selected.
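If you want the same intuition locally, a unified diff over the two prompt strings approximates what the Diff Viewer shows. This is a rough stand-in using Python's standard `difflib`, not the viewer's actual implementation:

```python
import difflib

# Hypothetical prompt texts; substitute your own base and experiment versions.
base_prompt = "Summarize the following support ticket: {{ticket}}"
experiment_prompt = (
    "Summarize the following support ticket in three bullet points: {{ticket}}"
)

# Print a unified diff, roughly what the Diff Viewer highlights in the UI.
for line in difflib.unified_diff(
    base_prompt.splitlines(),
    experiment_prompt.splitlines(),
    fromfile="base prompt",
    tofile="experiment prompt",
    lineterm="",
):
    print(line)
```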
View outputs
Once the experiment is finished, click on it to see a list of inputs and the associated outputs from the base prompt and the experiment.
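If you transcribe the results for review, a side-by-side comparison might look like the sketch below. The rows, field names, and outputs are placeholders made up for illustration, not real model responses:

```python
# Illustrative only: reviewing experiment results as input/output triples.
results = [
    {
        "input": "My invoice was charged twice last month.",
        "base_output": "The customer reports a duplicate charge.",
        "experiment_output": "- Duplicate charge on last month's invoice",
    },
]

# Print each input alongside both outputs to spot regressions at a glance.
for row in results:
    print("INPUT:      ", row["input"])
    print("BASE:       ", row["base_output"])
    print("EXPERIMENT: ", row["experiment_output"])
    print()
```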