What test plan formats do you support?

DOCX and PDF today. Tables, numbered lists, and free-form prose all parse — we extract steps, IDs, and expected outcomes automatically.

Which AI models power the verdicts?

Step routing uses Gemini 2.5 Flash for speed. Pass/fail vision judgements use Gemini 2.5 Pro on every screenshot.

Can the agent test sites that require login?

Yes. Include login credentials or steps inside your plan — the agent executes them like any other action, in an isolated cloud session.

Where do runs execute?

On Browserbase, in a real Chromium instance per run. No headless emulation, no shared state between runs.

How are findings ranked?

Vision reasoning produces a severity score per failure. The PDF surfaces the highest-severity items first so PMs see the blockers without scrolling.
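For the curious, here is a minimal sketch of what severity-first ordering can look like. It is an illustration, not our production code: the `Finding` fields, the 0.0 to 1.0 score scale, and the `rank_findings` name are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One failed step. All field names here are hypothetical."""
    step_id: str
    summary: str
    severity: float  # assumed scale: 0.0 (cosmetic) to 1.0 (blocker)


def rank_findings(findings):
    """Return findings with the highest severity first.

    sorted() is stable, so findings with equal severity keep
    the order in which they appeared in the plan.
    """
    return sorted(findings, key=lambda f: f.severity, reverse=True)
```

Sorting by score alone keeps the ranking independent of step order, which is why a blocker found at step 40 can still lead the report.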

What happens to my plan and screenshots?

Both are stored privately against your account so you can re-open past runs. You can delete them at any time from the Runs page.

Agentic UI QA, driven by your test plan.

Upload a DOCX or PDF test plan, point at a URL, and get a PM-grade report with screenshots, pass/fail verdicts, and ranked findings. No selectors. No flaky tests. No QA backlog.

Start your first run

10×: Faster than manual click-through QA
0: Selectors or locators to maintain
Vision: Pass/fail verdicts, not DOM guesses
Minutes: From plan upload to PDF report

Manual QA is slow. Scripted tests are brittle.

HandyQATester replaces the spreadsheet-and-stopwatch ritual with a vision-based agent that reads your plan the way a human PM would.

The old way

PMs babysit Loom recordings for every release
Selectors break the moment a designer ships a polish pass
Engineers maintain hundreds of Playwright files
QA findings live in a Slack thread nobody re-reads

The HandyQATester way

Upload the test plan you already have
Agent runs each step in real Chromium
Gemini vision decides pass/fail from the screen, not the DOM
Ship the polished PDF straight to stakeholders

From test plan to PDF report in three steps

Upload plan

DOCX or PDF. Steps, IDs, expected outcomes — even messy tables work.

Agent runs

Real Chromium via Browserbase. Per-step screenshots. Vision-based verdicts.

PDF report

PM narrative or test-matrix layout. Embedded screenshots. One-click download.

Built for teams who ship every day

Every capability is designed around one principle: your test plan is the source of truth.

Plan-aware agent

The agent parses your plan and routes each step independently — no scripting required.

Vision verdicts

Gemini 2.5 Pro scores pass/fail from the actual screenshot. No flaky DOM selectors.

Real Chromium

Browserbase runs an isolated cloud browser per session, not a headless emulation.

PM-grade PDF

Choose a narrative report or a test-matrix layout. Screenshots embedded, ready to share.

Accept & re-score loop

Disagree with a verdict? Edit the step or accept an AI suggestion, then re-score in seconds.

Duplicate-plan guard

We fingerprint every upload so you never run the same plan twice by accident.
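One common way to fingerprint an upload is a content hash. The sketch below assumes a SHA-256 digest of the raw file bytes and an in-memory lookup; the `DuplicatePlanGuard` class and its method names are hypothetical and only illustrate the idea, not how the product actually implements it.

```python
import hashlib


class DuplicatePlanGuard:
    """Hypothetical in-memory guard keyed by a content hash."""

    def __init__(self):
        self._seen = {}  # fingerprint -> id of the first run that used it

    @staticmethod
    def fingerprint(plan_bytes: bytes) -> str:
        # Assumption: the fingerprint is a SHA-256 digest of the file bytes.
        return hashlib.sha256(plan_bytes).hexdigest()

    def check_and_register(self, plan_bytes: bytes, run_id: str):
        """Return the earlier run id if this plan was seen, else None."""
        fp = self.fingerprint(plan_bytes)
        existing = self._seen.get(fp)
        if existing is not None:
            return existing
        self._seen[fp] = run_id
        return None
```

A byte-level hash only catches exact re-uploads. Two plans that differ by a single edited step would produce different fingerprints and both would run.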

One run. A team of agents.

Behind every test is a small crew of specialized agents — each doing the one job it's best at, then handing off.

Planner agent

Reads your DOCX/PDF, preserves your numbering, and turns each row into an executable step the rest of the crew can route.

Executor agent

Drives a real Browserbase Chromium session, deciding the next action from intent and the live screenshot — no selectors, no scripts.

Judge agent

Scores every step from the screenshot with vision-grade reasoning, and writes the verdict your PDF will quote.

Coach agent

Reviews shaky steps before and during the run, suggests rewrites you can accept in one click, and re-scores without restarting.

Plus a confidence scorer, a step router, and two report writers working quietly in the background — eight agents per run, one shareable PDF.
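To make the Planner's job concrete, here is a rough sketch of turning numbered plan lines into steps while preserving the original numbering. It assumes the text has already been extracted from the DOCX or PDF; the `Step` type, the `parse_steps` function, and the regular expression are illustrative assumptions, not the real parser.

```python
import re
from dataclasses import dataclass

# Matches lines such as "1. Do X", "2.1 Do Y", or "3) Do Z".
STEP_LINE = re.compile(r"^\s*(\d+(?:\.\d+)*)[.)]?\s+(.+?)\s*$")


@dataclass
class Step:
    """A single executable step. Field names are hypothetical."""
    step_id: str  # the plan's own numbering, kept verbatim
    action: str


def parse_steps(text: str):
    """Collect numbered lines as steps and skip everything else."""
    steps = []
    for line in text.splitlines():
        match = STEP_LINE.match(line)
        if match:
            steps.append(Step(match.group(1), match.group(2)))
    return steps
```

Keeping `step_id` as a string means an ID like `2.1` survives unchanged, so the report can quote the same numbering the plan author used.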

Reports your PM will actually read.

Every run produces a designed PDF with per-step screenshots, reasoned verdicts, and a severity-ranked findings list. Send it to stakeholders the same day you cut the release branch.

Step-by-step results with timestamps
Inline screenshots at the moment of verdict
Suggested rewrites for ambiguous steps
Findings ranked by severity, not order

https://acme.com/checkout

running · Logs

checkout-flow-v3.docx · narrative report

1. Navigate to /pricing: PASS
2. Click 'Start free trial': PASS
3. Enter card details and submit: FAIL
4. Verify confirmation message: pending

Top finding · Severity: High

Card submission returns a 500 on Visa test cards. Confirmation never renders.

Built for the people who care about quality

Product Managers

Stop pinging engineers for screenshots. Run the regression you wrote, get a PDF you can paste into a release note.

QA Leads

Cut maintenance to zero. Plan-driven runs scale across products without growing the script library.

Indie Founders

Ship on Friday and sleep on Saturday. Run your critical-path tests after every deploy without hiring a tester.

Frequently asked questions

Stop babysitting test runs.

Upload your first plan and get a PDF report in minutes. No credit card required.