Run experiments with historical datasets to test, evaluate, and improve prompts over time while preventing regressions in production systems.
We are deprecating the Experiments feature, and it will be removed from the platform on September 1st, 2025.
To start an experiment, first go to the Prompts tab and select a prompt.
Click `Start Experiment`
In the top right, click `Start Experiment`.
Select the base prompt
Select a base prompt and click `Continue`. You can edit the prompt in the next step.
To run an experiment on the production prompt, look for the `production` tag.
Edit the prompt
Your changes will not affect the original prompt. Instead, they create a new prompt that the experiment runs against.
Configure your experiment
Select the dataset, model and provider keys.
To run your experiment on a random dataset, click `Generate random dataset`. We will pick up to 10 requests at random from your existing requests.
Confirm and run
The Diff Viewer compares your new prompt to the base prompt that you selected.
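The Diff Viewer's internals are not documented here. The sketch below only illustrates the general idea of a line-by-line prompt diff using Python's standard `difflib.unified_diff`. The helper name `diff_prompts` and the example prompts are hypothetical.

```python
import difflib

def diff_prompts(base, experiment):
    """Hypothetical helper: unified, line-by-line diff of two prompt texts."""
    lines = difflib.unified_diff(
        base.splitlines(),
        experiment.splitlines(),
        fromfile="base_prompt",
        tofile="experiment_prompt",
        lineterm="",  # lines come from splitlines(), so they have no newline
    )
    return "\n".join(lines)

base = "You are a helpful assistant.\nAnswer briefly."
experiment = "You are a helpful assistant.\nAnswer in one sentence."
print(diff_prompts(base, experiment))
```

Lines prefixed with `-` exist only in the base prompt, and lines prefixed with `+` exist only in the experiment. Identical prompts produce an empty diff.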
View outputs
Once the experiment is finished, click on it to see a list of inputs and the associated outputs from the base prompt and the experiment.