Verification Testing
Test cases cover repetitive scenarios you want to verify consistently—every day or on every new build. They ensure core systems, flows, and features work as expected.
Ideal for:
- Regression testing (ensuring nothing breaks after changes)
- Smoke testing (checking that core systems still work)
- Repetitive checks (daily runs, every new build)
Once written, test cases provide high-quality, consistent verification. You explicitly define the steps and expected results, so any unexpected change is flagged.
Step Structure
A test case is a sequence of steps executed from a fresh app install/launch. Each step has:
**Instructions:**
Actions to perform (with clear completion indicators)
**Expected Results:**
Observable outcomes to verify
**Hints:** (optional)
Extra guidance for tricky situations
Instructions
A self-contained instruction with a clear endpoint. Each step should avoid overlapping with other steps to prevent the agent from running ahead.
| Bad | Problem | Good |
|---|---|---|
| Get some wood | No end condition | Collect 10 wood (inventory shows 10/10) |
| Follow the instructions on screen | Could run forever | Follow instructions until the “Well Done” popup appears |
| Play the game | No endpoint | Play until you reach level 3 |
Look for distinctive visual elements—titles, popup text, menu icons—to define clear endpoints. For example: “Follow the instructions until you see the ‘Tutorial Complete’ banner.”
Instruction patterns:
- “Wait until [specific UI element] appears”
- “Tap X, then tap Y, then tap Z”
- “Repeat [action] until [condition]”
Handle variations:
If the Terms of Service popup appears, tap "I Agree" (optional, can be skipped).
If you see popup X, close it by tapping the X button (this must appear).
Expected Results
Specific, verifiable yes/no checks the agent confirms during or after completing the instructions.
For “Craft a Pickaxe”, expected results might include:
- The pickaxe is added to the inventory
- Materials used to craft are consumed
- The “Craft” button is disabled when requirements aren’t met
| Bad | Problem | Good |
|---|---|---|
| Game loads in expected time | What is expected? | Game loads within 2 minutes |
| Main menu displays correctly | What is correct? | Main menu shows Play button, Settings button, and player avatar |
| You receive stardust | How much? | You receive between 300 and 900 stardust |
Result patterns:
- “[Element] is visible/present”
- “[Counter] increases/decreases by [amount]”
- “No [error states]: no popups blocking, no loading spinners”
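To illustrate what "specific, verifiable yes/no checks" means, here is a minimal Python sketch that expresses the "Good" column of the table above as predicates over an observed state. The state keys (`load_seconds`, `visible`, `stardust_gained`) and the `evaluate` helper are hypothetical; the agent itself works from natural-language results, so this only shows the level of precision to aim for.

```python
from typing import Any, Callable

# Hypothetical snapshot of what was observed after the step (keys are assumptions).
State = dict[str, Any]

# Each expected result is a named yes/no check over the observed state.
CHECKS: dict[str, Callable[[State], bool]] = {
    "Game loads within 2 minutes": lambda s: s["load_seconds"] <= 120,
    "Main menu shows Play button, Settings button, and player avatar":
        lambda s: {"Play", "Settings", "avatar"} <= set(s["visible"]),
    "You receive between 300 and 900 stardust":
        lambda s: 300 <= s["stardust_gained"] <= 900,
}


def evaluate(state: State) -> dict[str, bool]:
    """Return a pass/fail verdict for every expected result."""
    return {name: check(state) for name, check in CHECKS.items()}


observed = {
    "load_seconds": 45,
    "visible": ["Play", "Settings", "avatar", "Shop"],
    "stardust_gained": 1200,  # outside the allowed range, so this check fails
}
results = evaluate(observed)
for name, passed in results.items():
    print("PASS" if passed else "FAIL", name)
```

A vague result such as "You receive stardust" cannot be written as a predicate like this, which is a quick way to spot results that need tightening.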
Turn Budget
The maximum number of actions the agent can take per step.
- Completes within budget → moves to next step
- Exceeds budget → step fails (prevents infinite loops)
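The budget rule can be sketched as a bounded loop. This is an illustrative Python model only, not the agent's actual implementation; `run_step` and the `act` callback are hypothetical names.

```python
from typing import Callable


def run_step(act: Callable[[], bool], turn_budget: int) -> tuple[bool, int]:
    """Call `act` once per turn until it reports completion or the budget runs out.

    Returns (passed, turns_used).
    """
    for turn in range(1, turn_budget + 1):
        if act():  # act() returns True once the step's end condition is met
            return True, turn
    return False, turn_budget  # budget exhausted: the step fails


# Simulated step that reaches its end condition on the 4th action.
progress = iter([False, False, False, True])
passed, used = run_step(lambda: next(progress), turn_budget=10)

# A step whose end condition is never met fails instead of looping forever.
stuck_passed, stuck_used = run_step(lambda: False, turn_budget=5)

print(passed, used)
print(stuck_passed, stuck_used)
```

This is also why instructions need a clear endpoint: without one, the only thing that stops the step is the budget.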
Hints
Optional guidance that helps the agent succeed:
- “To interact with the King, press E when standing close to the throne.”
- “The crafting bench is in the bottom-right corner of the village.”
Hints make test cases more stable, reliable, and faster to execute.
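Putting the pieces together, a step can be modeled as a small record with instructions, expected results, optional hints, and a turn budget. The sketch below is one possible Python representation; the `Step` class, its field names, and the default budget of 30 are assumptions for illustration, and `render` simply mirrors the headings used in the Example section.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One test-case step; field names are illustrative, not a documented schema."""

    instructions: str
    expected_results: list[str]
    hints: list[str] = field(default_factory=list)  # optional
    turn_budget: int = 30  # assumed value; pick a budget that fits the step

    def render(self, number: int) -> str:
        """Format the step using the headings from the Example section."""
        lines = [
            f"## STEP {number}",
            "**Instructions:**",
            self.instructions,
            "**Expected Results:**",
            *self.expected_results,
        ]
        if self.hints:  # omit the Hints block entirely when there are none
            lines.append("**Hints:**")
            lines.extend(f"- {hint}" for hint in self.hints)
        return "\n".join(lines)


step = Step(
    instructions="Tap the inbox icon and collect ALL daily bonuses.",
    expected_results=["The Rewards tab loads without errors."],
    hints=["The inbox icon is in the bottom-right corner of the main menu."],
)
rendered = step.render(2)
print(rendered)
```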
Step Sequencing
Start Fresh
All test cases assume a fresh install or app launch. Step 1 typically handles:
- Initial loading
- TOS/privacy acceptance (often optional)
- Login or guest play
- Dismissing initial popups
Clear Transitions
Each step should end in a known state that the next step can reliably continue from:
STEP 1 ends: "You are on the main menu, no popups blocking"
STEP 2 starts: "From the main menu, tap Settings..."
When to Split
The agent performs better when complex tasks are broken into smaller steps.
Before (one step):
Wait for the game to load, accept the TOS, run the cheat command, and play until you’re in the main world.
After (three steps):
- Wait for the game to load and accept the Terms of Service.
- Run the cheat command: activate_category(categoryName="User").
- Play until you reach the main game world and all tutorial popups are gone.
Split when a step has more than ~10 verification points, or when you need to verify intermediate states.
Keep in mind:
- The agent starts fresh for each step, so write each step to stand on its own and don’t reference previous steps
- Define clear completion points to prevent running ahead
Shared Steps
For common sequences (setup, login, reset), create reusable shared step collections:
- setup-fresh-account: TOS, login, dismiss popups
- setup-with-debug-reset: Clear data, relaunch, fresh start
- navigate-to-settings: Standard path to settings menu
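One way to picture reuse is a registry of named collections that test cases reference instead of repeating the steps. The Python sketch below is a hypothetical model: the `@name` reference syntax, the `SHARED_STEPS` registry, and the step wording are all assumptions, not the product's actual format.

```python
# Hypothetical registry: shared collection name -> ordered step instructions.
SHARED_STEPS: dict[str, list[str]] = {
    "setup-fresh-account": [
        'Wait for the app to load; if the TOS popup appears, tap "I Agree".',
        'Tap "Play as Guest", then dismiss popups until the main menu is visible.',
    ],
    "navigate-to-settings": [
        "From the main menu, tap Settings and wait until the settings menu appears.",
    ],
}


def expand(test_case: list[str]) -> list[str]:
    """Replace '@name' references with the steps of the named shared collection."""
    steps: list[str] = []
    for entry in test_case:
        if entry.startswith("@"):
            steps.extend(SHARED_STEPS[entry[1:]])  # KeyError on an unknown name
        else:
            steps.append(entry)
    return steps


test_case = expand([
    "@setup-fresh-account",
    "@navigate-to-settings",
    "Toggle Music off and wait until the Music switch shows Off.",
])
for number, instruction in enumerate(test_case, start=1):
    print(f"STEP {number}: {instruction}")
```

Keeping setup in one place means a change to the login flow is fixed once rather than in every test case that starts from a fresh account.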
Test Case Execution
When running a test case, the agent:
- Follows the instructions for the current step
- Verifies all expected results within the turn budget
- Decides what to do if results are not met
The agent handles failures according to their severity:
- Critical failures (e.g., cheat command doesn’t work) → Ends run early, marks as failed
- Non-critical failures (e.g., misspelled button label) → Continues to next step, logs as bug
This means:
- More coverage per run (fewer early cancellations)
- More bugs discovered, including minor issues
- Clear distinction between critical failures and minor bugs
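The failure policy above can be modeled in a few lines. This Python sketch is a simplified illustration, assuming each step's outcome has already been classified; `Severity`, `RunReport`, and `run_test_case` are hypothetical names, and how the agent actually judges severity is not shown.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Severity(Enum):
    CRITICAL = "critical"
    NON_CRITICAL = "non-critical"


@dataclass
class RunReport:
    failed: bool = False
    steps_run: int = 0
    bugs: list[str] = field(default_factory=list)


def run_test_case(outcomes: list[tuple[str, Optional[Severity]]]) -> RunReport:
    """Process one (step name, failure severity or None) pair per step, in order."""
    report = RunReport()
    for name, severity in outcomes:
        report.steps_run += 1
        if severity is Severity.CRITICAL:
            report.failed = True  # end the run early and mark it as failed
            break
        if severity is Severity.NON_CRITICAL:
            report.bugs.append(name)  # log as a bug and continue to the next step
    return report


report = run_test_case([
    ("launch", None),
    ("misspelled button label", Severity.NON_CRITICAL),
    ("cheat command doesn't work", Severity.CRITICAL),
    ("never reached", None),
])
print(report)
```

In this simulated run the misspelled label is logged and the run continues, while the broken cheat command stops the run before the last step.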
Example
## STEP 1
**Instructions:**
Wait for the app to finish loading.
If the Terms of Service popup appears, tap "I Agree" (optional, can be skipped).
Tap "Play as Guest" and confirm when prompted.
**Expected Results:**
You are on the main menu.
Game logo and "New Game" button are visible.
**Hints:**
- TOS popup may not appear on every launch—skip if absent.
- Always play as guest for this test.
## STEP 2
**Instructions:**
Tap the inbox icon and collect ALL daily bonuses.
**Expected Results:**
The inbox popup opens with 3 tabs: Rewards, Messages, News.
The Rewards tab loads without errors.
DAILY BONUS is displayed and collectible (coins increase after collection).
**Hints:**
- The inbox icon is in the bottom-right corner of the main menu.
In short: Verification Testing takes more upfront work than Discovery, but it delivers repeatable, high-quality results essential for regression and smoke testing.