Verification Testing

Test cases cover repeatable scenarios that you want to verify consistently, for example every day or on every new build. They ensure that core systems, flows, and features work as expected.

Ideal for:

Regression testing (ensuring nothing breaks after changes)
Smoke testing (checking that core systems still work)
Repetitive checks (daily runs, every new build)

Once written, test cases provide high-quality, consistent verification. You explicitly define the steps and expected results, so any unexpected change is flagged.

Step Structure

A test case is a sequence of steps executed from a fresh app install/launch. Each step has:


**Instructions:**
Actions to perform (with clear completion indicators)

**Expected Results:**
Observable outcomes to verify

**Hints:** (optional)
Extra guidance for tricky situations
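The three parts of a step can be pictured as a small data structure. This is only an illustrative sketch: the `Step` class, its field names, and `render` are hypothetical and not part of any documented API.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One test-case step; field names mirror the three parts listed above."""
    instructions: str                # actions, with a clear completion indicator
    expected_results: list[str]      # observable outcomes to verify
    hints: list[str] = field(default_factory=list)  # optional guidance

    def render(self) -> str:
        """Format the step with Instructions / Expected Results / Hints labels."""
        parts = [
            "**Instructions:**", self.instructions, "",
            "**Expected Results:**", *self.expected_results,
        ]
        if self.hints:  # hints are optional, so the label is omitted when empty
            parts += ["", "**Hints:**", *(f"- {h}" for h in self.hints)]
        return "\n".join(parts)


step = Step(
    instructions="Collect 10 wood (inventory shows 10/10).",
    expected_results=["Inventory shows 10/10 wood."],
)
print(step.render())
```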

Instructions

Each instruction should be self-contained and have a clear endpoint. Steps should not overlap with one another, so the agent does not run ahead into the next step.

Bad	Problem	Good
Get some wood	No end condition	Collect 10 wood (inventory shows 10/10)
Follow the instructions on screen	Could run forever	Follow instructions until the “Well Done” popup appears
Play the game	No endpoint	Play until you reach level 3

Look for distinctive visual elements—titles, popup text, menu icons—to define clear endpoints. For example: “Follow the instructions until you see the ‘Tutorial Complete’ banner.”
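As a rough aid when reviewing instructions, a crude text check can flag ones that lack an obvious end condition. The marker list below is an assumed heuristic (the words "until", "appears", "shows", or any number), not a rule the agent itself applies, and it will miss valid phrasings.

```python
import re

# Assumed heuristic: an endpoint is signalled by "until", "appear(s)",
# "show(s)", or a number. Tune the markers for your own game.
END_CONDITION = re.compile(r"\b(until|appears?|shows?)\b|\d", re.IGNORECASE)


def has_end_condition(instruction: str) -> bool:
    """Return True when the instruction contains one of the endpoint markers."""
    return END_CONDITION.search(instruction) is not None


for text in ["Get some wood", "Play the game", "Play until you reach level 3"]:
    status = "ok" if has_end_condition(text) else "no end condition"
    print(f"{text!r}: {status}")
```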

Instruction patterns:

“Wait until [specific UI element] appears”
“Tap X, then tap Y, then tap Z”
“Repeat [action] until [condition]”

Handle variations:


If the Terms of Service popup appears, tap "I Agree" (optional, can be skipped).
When popup X appears, close it by tapping the X button (required, this popup must appear).

Expected Results

Expected results are specific, verifiable yes/no checks that the agent confirms during or after completing the instructions.

For “Craft a Pickaxe”, expected results might include:

The pickaxe is added to the inventory
Materials used to craft are consumed
The “Craft” button is disabled when requirements aren’t met

Bad	Problem	Good
Game loads in expected time	What is expected?	Game loads within 2 minutes
Main menu displays correctly	What is correct?	Main menu shows Play button, Settings button, and player avatar
You receive stardust	How much?	You receive between 300 and 900 stardust

Result patterns:

“[Element] is visible/present”
“[Counter] increases/decreases by [amount]”
“No [error states]: no popups blocking, no loading spinners”
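To show what "yes/no" means in practice, the stardust example can be written as a boolean check. The agent verifies results from what it observes on screen, so this function is purely illustrative. Its name and parameters are hypothetical, and treating the 300 to 900 range as inclusive is an assumption.

```python
def stardust_reward_ok(before: int, after: int, low: int = 300, high: int = 900) -> bool:
    """True when the stardust gained falls inside the (assumed inclusive) range."""
    gained = after - before
    return low <= gained <= high


print(stardust_reward_ok(before=100, after=500))  # gained 400
print(stardust_reward_ok(before=100, after=200))  # gained 100
```

A result phrased this way leaves no room for interpretation: the check either holds or it does not.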

Turn Budget

The turn budget is the maximum number of actions the agent can take per step.

Completes within budget → moves to next step
Exceeds budget → step fails (prevents infinite loops)
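The budget rule can be sketched as a bounded loop. This is a simplified model, not the agent's actual implementation: `run_step` and the `take_action` callback are hypothetical names, and the callback is assumed to return True once the step's endpoint is reached.

```python
from typing import Callable


def run_step(take_action: Callable[[], bool], turn_budget: int) -> bool:
    """Call take_action until it reports completion or the budget runs out."""
    for _ in range(turn_budget):
        if take_action():
            return True   # completed within budget: move to the next step
    return False          # budget exhausted: the step fails


# Simulated step that finishes on its third action.
actions = iter([False, False, True])
print(run_step(lambda: next(actions), turn_budget=5))  # True

# Simulated step that never finishes; the budget stops it.
print(run_step(lambda: False, turn_budget=5))          # False
```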

Hints

Optional guidance that helps the agent succeed:

“To interact with the King, press E when standing close to the throne.”
“The crafting bench is in the bottom-right corner of the village.”

Hints make test cases more stable, reliable, and faster to execute.

Step Sequencing

Start Fresh

All test cases assume a fresh install or app launch. Step 1 typically handles:

Initial loading
TOS/privacy acceptance (often optional)
Login or guest play
Dismissing initial popups

Clear Transitions

Each step should end in a known state that the next step can reliably continue from:


STEP 1 ends:   "You are on the main menu, no popups blocking"
STEP 2 starts: "From the main menu, tap Settings..."

When to Split

The agent performs better when complex tasks are broken into smaller steps.

Before (one step):

Wait for the game to load, accept the TOS, run the cheat command, and play until you’re in the main world.

After (three steps):

Wait for the game to load and accept the Terms of Service.
Run the cheat command: activate_category(categoryName="User").
Play until you reach the main game world and all tutorial popups are gone.

Split when a step has more than ~10 verification points, or when you need to verify intermediate states.

Keep in mind:

The agent starts fresh for each step—don’t reference previous steps
Define clear completion points to prevent running ahead

Shared Steps

For common sequences (setup, login, reset), create reusable shared step collections:

setup-fresh-account: TOS, login, dismiss popups
setup-with-debug-reset: Clear data, relaunch, fresh start
navigate-to-settings: Standard path to settings menu
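One way to picture shared step collections is as a lookup table that is expanded into a test case before it runs. The registry contents, the `use: <name>` reference syntax, and the `expand` function below are all hypothetical and chosen only for illustration.

```python
# Hypothetical registry: collection name -> ordered list of step instructions.
SHARED_STEPS: dict[str, list[str]] = {
    "setup-fresh-account": [
        "Wait for the app to finish loading and accept the Terms of Service.",
        'Tap "Play as Guest" and dismiss popups until the main menu is visible.',
    ],
    "navigate-to-settings": [
        "From the main menu, tap Settings and wait until the settings menu appears.",
    ],
}


def expand(test_case: list[str]) -> list[str]:
    """Replace 'use: <name>' entries with the steps of that shared collection."""
    steps: list[str] = []
    for entry in test_case:
        if entry.startswith("use: "):
            steps.extend(SHARED_STEPS[entry.removeprefix("use: ")])
        else:
            steps.append(entry)
    return steps


case = ["use: setup-fresh-account", "Tap the inbox icon and collect the daily bonus."]
for number, text in enumerate(expand(case), start=1):
    print(f"STEP {number}: {text}")
```

Keeping setup in one place means a change to the login flow only needs to be fixed once.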

Test Case Execution

When running a test case, the agent:

Follows the instructions for the current step
Verifies all expected results within the turn budget
Decides what to do if results are not met

The agent handles failures according to their severity:

Critical failures (e.g., cheat command doesn’t work) → Ends run early, marks as failed
Non-critical failures (e.g., misspelled button label) → Continues to next step, logs as bug

This means:

More coverage per run (fewer early cancellations)
More bugs discovered, including minor issues
A clear distinction between critical failures and minor bugs
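The failure policy can be sketched as a loop over step outcomes. All names here (`Outcome`, `RunReport`, `run_test_case`) are hypothetical, and the outcomes are supplied up front instead of being produced by a real agent. Whether a run with only minor bugs counts as passed is not stated above; this sketch assumes it is not marked as failed.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"
    MINOR_BUG = "minor bug"        # non-critical: log and continue
    CRITICAL = "critical failure"  # critical: end the run early


@dataclass
class RunReport:
    failed: bool = False
    bugs: list[str] = field(default_factory=list)
    steps_run: int = 0


def run_test_case(outcomes: list[Outcome]) -> RunReport:
    """Walk pre-computed step outcomes and apply the failure policy."""
    report = RunReport()
    for number, outcome in enumerate(outcomes, start=1):
        report.steps_run = number
        if outcome is Outcome.CRITICAL:
            report.failed = True   # stop here; later steps are skipped
            break
        if outcome is Outcome.MINOR_BUG:
            report.bugs.append(f"Step {number}: expected result not met")
    return report


report = run_test_case(
    [Outcome.PASSED, Outcome.MINOR_BUG, Outcome.CRITICAL, Outcome.PASSED]
)
print(report)
```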

Example


## STEP 1
**Instructions:**
Wait for the app to finish loading.
If the Terms of Service popup appears, tap "I Agree" (optional, can be skipped).
Tap "Play as Guest" and confirm when prompted.

**Expected Results:**
You are on the main menu.
Game logo and "New Game" button are visible.

**Hints:**
- TOS popup may not appear on every launch—skip if absent.
- Always play as guest for this test.


## STEP 2
**Instructions:**
Tap the inbox icon and collect ALL daily bonuses.

**Expected Results:**
The inbox popup opens with 3 tabs: Rewards, Messages, News.
The Rewards tab loads without errors.
DAILY BONUS is displayed and collectible (coins increase after collection).

**Hints:**
- The inbox icon is in the bottom-right corner of the main menu.
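If you keep test cases as text files in the layout shown above, a small parser can split them into steps for review or linting. This is a sketch that assumes exactly this layout (`## STEP n` headings and the three bold labels). `parse_steps` is a hypothetical helper, not part of any tool.

```python
import re

SECTION = re.compile(r"^\*\*(Instructions|Expected Results|Hints):\*\*\s*$")


def parse_steps(text: str) -> list[dict[str, list[str]]]:
    """Split '## STEP n' blocks into their labelled sections."""
    steps: list[dict[str, list[str]]] = []
    current = None  # the section list currently being filled
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("## STEP"):
            steps.append({"Instructions": [], "Expected Results": [], "Hints": []})
            current = None
        elif (match := SECTION.match(line)) and steps:
            current = steps[-1][match.group(1)]
        elif line and current is not None:
            current.append(line.removeprefix("- "))  # drop hint bullet markers
    return steps


sample = """\
## STEP 1
**Instructions:**
Wait for the app to finish loading.

**Expected Results:**
You are on the main menu.

**Hints:**
- Always play as guest for this test.
"""
for step in parse_steps(sample):
    print(step)
```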

In short: Verification Testing takes more upfront work than Discovery, but it delivers repeatable, high-quality results essential for regression and smoke testing.