Improve the AI Agent

Think of the agent as an untrained human doing a task for the first time. Many tasks seem simple, but that’s often because you have context the agent doesn’t. Providing the right information can dramatically improve results.

Quick Comparison

| Method | Notes |
| --- | --- |
| Instructions | Most impactful; get these right first. Always loaded. |
| Hints | Always loaded. Medium ROI and effort; quality over quantity is key. |
| Knowledge | Lazy loaded. Low-effort way to add large amounts of information. |
| Bug Review | Highest ROI for testing. Lazy loaded by the main agent and auto loaded by the bug checker. |
| Flayer Functions | High effort and high ROI. Most powerful; unlocks new capabilities. |

Methods

Instructions

Clear instructions are the most important factor in agent performance. Focus on writing instructions that are easy to follow, self-contained, and have clear termination conditions.

If you don’t explicitly tell the agent to do X, it may sometimes do X and sometimes Y. For consistent behavior, be explicit. The same applies to stopping conditions: without clear termination criteria, the agent may stop early or continue longer than expected.

See the test case writing guide to learn more.

Hints

Hints provide additional guidance that’s always present in the agent’s context. There are two levels:

  • Agent Hints: Apply to all test cases and are configured at the agent level.
  • Test Case Hints: Apply only to a specific test step.

Use Agent Hints for recurring issues across multiple tests:

  • “Do not report capitalization errors”
  • “Close ads by clicking the >>> button in the bottom right corner”
  • “If stuck, focus on upgrading X to unlock more quests”

Use Test Case Hints for step-specific guidance:

  • “The forge is the building with the hammer icon”
  • “To interact with the furnace, click on it with a lava bucket in hand”
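
The two hint levels can be pictured as a simple merge: agent hints are present for every step, and test case hints are added only for the step they belong to. The sketch below is illustrative only; `HintConfig`, `hints_for_step`, and the step names are hypothetical and are not the product's configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class HintConfig:
    """Hypothetical container for the two hint levels (illustration only)."""
    agent_hints: list[str] = field(default_factory=list)
    # Maps a test step name to the hints that apply only to that step.
    test_case_hints: dict[str, list[str]] = field(default_factory=dict)


def hints_for_step(config: HintConfig, step: str) -> list[str]:
    """Return agent hints followed by the hints for the given step, if any."""
    return config.agent_hints + config.test_case_hints.get(step, [])


config = HintConfig(
    agent_hints=["Do not report capitalization errors"],
    test_case_hints={
        "craft_sword": ["The forge is the building with the hammer icon"],
    },
)

# A step with its own hints sees both levels; any other step sees agent hints only.
print(hints_for_step(config, "craft_sword"))
print(hints_for_step(config, "open_shop"))
```

Keeping the agent-level list short matters in this picture, because every entry in it is repeated for every step.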

If the agent ignores your hints, they may be conflicting or lack weight. Add emphasis with words like CRITICAL, IMPORTANT, NEVER, or YOU MUST.

Knowledge

Knowledge files are lazy loaded: the agent queries and reads them only when it thinks they’re relevant. There are three scopes:

  • Project: Shared across all agents and tasks (design documents, cheat sheets)
  • Agent: Available to a specific agent (domain knowledge, wikis)
  • Task: Attached to a specific test case (reference images, specs)

Unlike instructions and hints, knowledge is not always in context. This means you can add large amounts of reference material without bloating the agent’s context window. If you need the agent to consistently reference a specific file, mention it in your instructions:

Use the store_gdd.pdf to structure your exploration session
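
The difference between always-loaded and lazy-loaded material can be sketched as follows. This is a conceptual illustration under assumed names (`KnowledgeStore`, `build_context`, a naive substring search, and made-up file contents), not a description of how the agent is actually implemented.

```python
class KnowledgeStore:
    """Hypothetical store: files stay out of context until explicitly read."""

    def __init__(self, files: dict[str, str]):
        self._files = files

    def search(self, query: str) -> list[str]:
        """Return names of files whose name or text mentions the query (naive match)."""
        q = query.lower()
        return sorted(
            name for name, text in self._files.items()
            if q in name.lower() or q in text.lower()
        )

    def read(self, name: str) -> str:
        """Return the full text of one file; only now would it enter the context."""
        return self._files[name]


def build_context(instructions: str, hints: list[str]) -> str:
    """Always-loaded context: instructions and hints only, never knowledge."""
    return "\n".join([instructions, *hints])


# Made-up file contents, repeated to simulate large reference documents.
store = KnowledgeStore({
    "store_gdd.pdf": "Store layout, item prices, and purchase flow. " * 1000,
    "combat_wiki.md": "Damage formulas and enemy types. " * 1000,
})

context = build_context(
    "Use the store_gdd.pdf to structure your exploration session",
    ["Do not report capitalization errors"],
)

# The base context stays small no matter how large the knowledge files are.
print(len(context))

# Knowledge is located by a query and read only when it is needed.
matches = store.search("purchase")
print(matches)
```

Naming the file in the instructions, as in the example above, is what makes the lookup predictable rather than left to the agent's judgment.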

See the Knowledge System documentation for more details.

Bug Review

For testing agents, bug awareness significantly improves quality. The bug review system automatically generates bugs.md and issues.md files based on the bugs and ignored issues you’ve tracked, then adds them to the agent’s knowledge. Bug review is less about fixing how the agent executes instructions and more about improving what it reports as an issue.

This is one of the highest-ROI improvements you can make. It requires minimal effort because it’s built into the run recording view, and the generated files are automatically included in the bug checker to reduce false positives.

See the Bug Review documentation to learn more.

Flayer Functions

Integrating the Nunu SDK requires the most effort but offers the highest impact by expanding what the agent can do. Common use cases:

  • Exposing cheat commands
  • Adding internal game state information
  • Navigation and camera control functions

These functions handle tasks that would otherwise require complex motor control or trial-and-error. Instead of playing through 27 levels to test a clan feature, the agent runs one cheat command. Instead of navigating a maze frame-by-frame, it calls a single navigation function.
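
The idea of exposing game functions to the agent can be sketched with a small name-to-function registry. Everything here (`expose`, `call`, `GameState`, and the three example functions) is hypothetical and is not the Nunu SDK API; refer to the Flayer Functions documentation for the real integration.

```python
from typing import Any, Callable, Dict

# Hypothetical registry of functions the agent may call by name.
# This is an illustration only, NOT the Nunu SDK API.
FUNCTIONS: Dict[str, Callable[..., Any]] = {}


def expose(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function under a name the agent can refer to."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        FUNCTIONS[name] = fn
        return fn
    return decorator


class GameState:
    """Stand-in for the game's internal state."""
    def __init__(self) -> None:
        self.level = 1
        self.position = (0, 0)


game = GameState()


@expose("set_level")
def set_level(level: int) -> str:
    """Cheat command: jump straight to a level instead of playing through."""
    game.level = level
    return f"level set to {level}"


@expose("get_state")
def get_state() -> dict:
    """Internal state the agent could not read reliably from screenshots."""
    return {"level": game.level, "position": game.position}


@expose("navigate_to")
def navigate_to(x: int, y: int) -> str:
    """Navigation helper: move directly instead of steering frame by frame."""
    game.position = (x, y)
    return f"moved to ({x}, {y})"


def call(name: str, **kwargs: Any) -> Any:
    """Dispatch a call requested by the agent to the registered function."""
    if name not in FUNCTIONS:
        raise KeyError(f"unknown function: {name}")
    return FUNCTIONS[name](**kwargs)


print(call("set_level", level=27))
print(call("navigate_to", x=12, y=5))
print(call("get_state"))
```

Each call replaces a long sequence of screenshot-and-click steps with a single deterministic action, which is where the extra integration effort pays off.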

See the Flayer Functions documentation to get started.

Current Limitations

The AI Agent excels at reasoning, planning, and performing static actions. Understanding its limitations will help you write better tests and set appropriate expectations.

Real-Time Scenarios

The agent needs time to think before each action, making it unsuitable for time-sensitive gameplay. Games with strict time limits or quick-reaction requirements are challenging.

3D Navigation

Navigating 3D environments from 2D screenshots is difficult. The agent lacks spatial and temporal awareness, resulting in poor navigation performance. This can be mitigated by exposing Points of Interest (POI) and navigation functions via the Nunu SDK.

Frame-Based Vision

The agent sees the world as discrete screenshots, not continuous video. This makes it difficult to detect animations or observe what happens during actions.

We’re actively working on frame sequences and limited video support for better temporal awareness.
