Writing Good Tests
Writing good tests is a skill, and like any skill, it can be learned. At its core, writing tests is really about prompting: giving the agent clear, well-structured instructions that lead to consistent, reliable results.
Don’t expect perfection on the first try. Iteration is key. You’ll write a test, run it, see where the agent struggles or misinterprets your intent, then refine. Each iteration gets you closer to a test that works reliably every time.
Throughout this section, we’ll share the patterns, tips, and tricks we’ve learned from thousands of test runs. Whether you’re new to AI-driven testing or looking to level up, we’re here to help you write tests that actually work.
The Three Agent Types
There are three types of agents, each designed for a different use case. The right one depends on what you’re trying to achieve.
| Agent | Best For | Setup Effort | Output |
|---|---|---|---|
| Verification | Regression & smoke testing | High (structured test cases) | Consistent, repeatable results |
| Discovery | Exploration & quick checks | Low (minimal or no instructions) | Broad coverage, bug discovery |
| Task | Workflow automation | Medium (task-specific prompts) | Reports, docs, automated workflows |
Verification
Use verification tests when you need consistent, repeatable checks. These are structured test cases with defined steps, goals, and expected results. They take more effort upfront, but they pay off with stable tests you can run daily or on every build.
→ Ideal for: regression testing, smoke testing, core flow validation
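To make "defined steps, goals, and expected results" concrete, here is a minimal sketch of how a structured test case could be organized before it is handed to the agent. The `VerificationTest` class, its field names, and the `to_prompt` helper are hypothetical, chosen only for illustration; they are not the product's actual test format or API.

```python
from dataclasses import dataclass


@dataclass
class VerificationTest:
    """Hypothetical container for one structured test case (illustrative only)."""
    name: str
    goal: str
    steps: list[str]
    expected_result: str

    def to_prompt(self) -> str:
        """Render the test case as plain-text instructions for an agent."""
        numbered = "\n".join(
            f"{i}. {step}" for i, step in enumerate(self.steps, start=1)
        )
        return (
            f"Test: {self.name}\n"
            f"Goal: {self.goal}\n"
            f"Steps:\n{numbered}\n"
            f"Expected result: {self.expected_result}"
        )


# Example test case; the app and its screens are made up for this sketch.
login_test = VerificationTest(
    name="Login with valid credentials",
    goal="Confirm a registered user can sign in",
    steps=[
        "Open the login page",
        "Enter the test account's email and password",
        "Press the Sign in button",
    ],
    expected_result="The account dashboard is displayed",
)

print(login_test.to_prompt())
```

Keeping the goal, the steps, and the expected result in separate fields makes it easier to see which part to refine when a run misinterprets the test.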
Discovery
Use discovery when you want to explore without strict structure. The Discovery Agent (Disco) adapts as it goes, finding bugs and issues you might not have thought to test for. It works well for quick checks, new features, or simply asking “does anything break?”
→ Ideal for: exploratory testing, validating fixes, testing new content
Task
Use the Task Agent when you need automation beyond testing. It’s a general-purpose agent for workflows like generating documentation, conducting market research, running compliance checks, or emulating specific user behaviors.
→ Ideal for: documentation, research, workflow automation
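The choice between the three agents can be read as a two-question decision. The sketch below is one possible reading of the descriptions above, not an official selection rule; the `Agent` enum and the `choose_agent` function are hypothetical names introduced for illustration.

```python
from enum import Enum


class Agent(Enum):
    VERIFICATION = "Verification"
    DISCOVERY = "Discovery"
    TASK = "Task"


def choose_agent(is_testing: bool, needs_repeatable_results: bool) -> Agent:
    """Pick an agent type from two yes/no questions (an assumed heuristic)."""
    if not is_testing:
        # Automation beyond testing: documentation, research, workflows.
        return Agent.TASK
    if needs_repeatable_results:
        # Regression and smoke testing with structured test cases.
        return Agent.VERIFICATION
    # Exploration and quick checks with minimal instructions.
    return Agent.DISCOVERY


print(choose_agent(is_testing=True, needs_repeatable_results=True).value)
```

Real projects will often use more than one agent, for example Discovery on a new feature first and Verification once the flow is stable enough to check on every build.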