Supercharge prompt engineering
Compose, improve and evaluate LLM prompts.
Choose a mode: Optimize and follow-up, Follow-up only, or Evaluate. Assisted revisions are tracked as you iterate on a prompt.
Evaluate your prompt
Design comprehensive test scenarios and tools to evaluate your prompt's performance across different AI models.
Template
Switch between your pre-saved templates.
Create new
Start with a fresh evaluation template.
Test cases
Define test cases with expected results; see the sketch below.
Generate from transcript
Paste a previous conversation transcript to generate a relevant test case.
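As a hedged illustration of a test case with an expected result, here is a minimal sketch; the field names (name, input, expected_result) are assumptions for this example, not the tool's actual schema.

    # Hypothetical test case: one user input paired with the behavior we expect.
    refund_case = {
        "name": "refund-request",
        "input": "My order arrived damaged. Can I get a refund?",
        "expected_result": "Apologizes, explains the refund policy, and asks for the order number.",
    }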
Available tools
Define tools that can be used across scenarios.
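A tool definition might look like the sketch below, which follows the common function-calling shape (name, description, JSON-schema parameters); the exact fields the evaluator expects, and the get_weather example itself, are assumptions.

    # Hypothetical tool definition in the usual function-calling layout.
    weather_tool = {
        "name": "get_weather",
        "description": "Look up the current weather for a given city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }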
Available context
Define context definitions that can be used across scenarios.
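A context definition could be as simple as a named block of background text that scenarios reference, as in this sketch; both the shape and the account_status name are assumptions.

    # Hypothetical context definition: named background text a scenario can pull in.
    account_context = {
        "name": "account_status",
        "content": "The customer is on the Pro plan and has one open support ticket.",
    }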
Models
Select AI models to evaluate against your scenarios.
Providers include OpenAI, Anthropic, and xAI, each with more models available under "More models".
API usage
Select how the AI models are used during evaluation.
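To make "how the models are used" concrete, here is a rough sketch of one way an evaluation run might call a model per test case, using the OpenAI Python SDK; the run_case helper, the prompt wiring, and the assumption that your own API key is used are illustrative, not the tool's actual mechanism.

    # Hedged sketch: call one model on one test case during an evaluation run.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def run_case(model: str, system_prompt: str, case: dict) -> str:
        # Send the prompt under test plus the test case input to the model.
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": case["input"]},
            ],
        )
        return response.choices[0].message.content

    output = run_case(
        model="gpt-4o-mini",
        system_prompt="You are a helpful support agent.",
        case={"input": "My order arrived damaged. Can I get a refund?"},
    )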