Agentic UI QA,
driven by your test plan.
Upload a DOCX or PDF test plan, point at a URL, and get a PM-grade report with screenshots, pass/fail verdicts, and ranked findings. No selectors. No flaky tests. No QA backlog.
- 10× faster than manual click-through QA
- 0 selectors or locators to maintain
- Vision: pass/fail verdicts, not DOM guesses
- Minutes from plan upload to PDF report
Manual QA is slow. Scripted tests are brittle.
HandyQATester replaces the spreadsheet-and-stopwatch ritual with a vision-based agent that reads your plan the way a human PM would.
The old way
- PMs babysit Loom recordings for every release
- Selectors break the moment a designer ships a polish pass
- Engineers maintain hundreds of Playwright files
- QA findings live in a Slack thread nobody re-reads
The HandyQATester way
- Upload the test plan you already have
- The agent runs each step in a real Chromium browser
- Gemini vision decides pass/fail from the screen, not the DOM
- Ship the polished PDF straight to stakeholders
From test plan to PDF report in three steps
Upload plan
DOCX or PDF. Steps, IDs, expected outcomes — even messy tables work.
Agent runs
Real Chromium via Browserbase. Per-step screenshots. Vision-based verdicts.
PDF report
PM narrative or test-matrix. Embedded screenshots. One-click download.
Built for teams who ship every day
Every capability is designed around one principle: your test plan is the source of truth.
Plan-aware agent
The agent parses your plan and routes each step independently — no scripting required.
Vision verdicts
Gemini 2.5 scores pass/fail from the actual screenshot. No flaky DOM selectors.
Real Chromium
Browserbase runs an isolated cloud browser per session, not an emulated one.
PM-grade PDF
Choose a narrative report or a test-matrix layout. Screenshots embedded, ready to share.
Accept & re-score loop
Disagree with a verdict? Edit the step or accept an AI suggestion, then re-score in seconds.
Duplicate-plan guard
We fingerprint every upload so you never run the same plan twice by accident.
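For the curious, here is one way a duplicate-plan guard like this could work. It is a minimal sketch, not HandyQATester's actual implementation: it assumes the fingerprint is a SHA-256 hash of the uploaded file's raw bytes, and the names `fingerprint` and `DuplicatePlanGuard` are hypothetical.

```python
import hashlib
from typing import Dict, Optional


def fingerprint(plan_bytes: bytes) -> str:
    """Return a hex SHA-256 digest of the uploaded file's raw bytes.

    Assumption: byte-identical files count as the same plan.
    """
    return hashlib.sha256(plan_bytes).hexdigest()


class DuplicatePlanGuard:
    """Hypothetical in-memory registry of plan fingerprints."""

    def __init__(self) -> None:
        # Maps fingerprint -> filename of the first upload with that content.
        self._seen: Dict[str, str] = {}

    def check(self, filename: str, plan_bytes: bytes) -> Optional[str]:
        """Return the earlier filename if this content was already uploaded.

        If the content is new, record it and return None.
        """
        digest = fingerprint(plan_bytes)
        earlier = self._seen.get(digest)
        if earlier is None:
            self._seen[digest] = filename
        return earlier
```

A byte-level hash only catches exact re-uploads. A plan that was re-saved or lightly edited would get a new fingerprint, so a real guard might hash the parsed steps instead.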
One run. A team of agents.
Behind every test is a small crew of specialized agents — each doing the one job it's best at, then handing off.
Planner agent
Reads your DOCX/PDF, preserves your numbering, and turns each row into an executable step the rest of the crew can route.
Executor agent
Drives a real Browserbase Chromium session, deciding the next action from intent and the live screenshot — no selectors, no scripts.
Judge agent
Scores every step from the screenshot with vision-grade reasoning, and writes the verdict your PDF will quote.
Coach agent
Reviews shaky steps before and during the run, suggests rewrites you can accept in one click, and re-scores without restarting.
Plus a confidence scorer, a step router, and two report writers working quietly in the background — eight agents per run, one shareable PDF.
Reports your PM will actually read.
Every run produces a designed PDF with per-step screenshots, reasoned verdicts, and a severity-ranked findings list. Send it to stakeholders the same day you cut the release branch.
- Step-by-step results with timestamps
- Inline screenshots at the moment of verdict
- Suggested rewrites for ambiguous steps
- Findings ranked by severity, not by step order
checkout-flow-v3.docx · narrative report
Card submission returns an HTTP 500 on Visa test cards. Confirmation never renders.
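A severity-ranked findings list comes down to a sort key. The sketch below is illustrative only: the severity labels, the `Finding` fields, and the `rank_findings` helper are assumptions, not the product's real schema. It puts the most severe findings first and falls back to step order as a tiebreaker.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical severity scale; a lower rank sorts first.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


@dataclass(frozen=True)
class Finding:
    step: int       # position of the step in the test plan
    severity: str   # one of the keys in SEVERITY_RANK
    summary: str


def rank_findings(findings: List[Finding]) -> List[Finding]:
    """Order findings by severity first, then by plan step number."""
    return sorted(findings, key=lambda f: (SEVERITY_RANK[f.severity], f.step))


# Illustrative data; only the checkout finding echoes the sample report above.
findings = [
    Finding(2, "minor", "Footer link label is truncated on mobile."),
    Finding(7, "critical", "Card submission returns a 500 on Visa test cards."),
    Finding(4, "major", "Promo code field accepts expired codes."),
]

ranked = rank_findings(findings)
```

With this ordering, the checkout failure at step 7 leads the list even though it occurs last in the plan.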
Built for the people who care about quality
Product Managers
Stop pinging engineers for screenshots. Run the regression you wrote, get a PDF you can paste into a release note.
QA Leads
Cut maintenance to zero. Plan-driven runs scale across products without growing the script library.
Indie Founders
Ship on Friday and sleep on Saturday. Run your critical-path tests after every deploy without hiring a tester.
Stop babysitting test runs.
Upload your first plan and get a PDF report in minutes. No credit card.
