How does this compare to https://github.com/alibaba/page-agent?
Also, PageAgent's DOM-based page understanding is pretty simple and built on top of Browser-Use's approach.
We, on the other hand, construct our own custom Agent Accessibility Trees to represent webpages to models. This approach gets roughly twice the performance on WebBench's 300+ tasks (81% vs 40%).
I appreciate the responses and will be looking deeply into Rover. Thank you.
PageAgent doesn't have strong page understanding, i.e. a semantic tree representation of the page. It's just a flat DOM with basic HTML stripping, which makes it hard to navigate shadow DOMs, even same-origin iframes for that matter, or different frameworks. They also do CUA-style element marking, though I'm not sure they use it in the actual calls to Qwen. And yeah, as arjun said, it takes 30 steps to do even a basic task like finding some info.
What we strengthened while building agents that ran 2M+ web workflows over the past 4 months is our representation of pages, which helps agents seamlessly get through any page, old or new: iframes, shadow DOMs and more. The best part of Rover: if you as the website owner enable cross-origin requests, then say Doordash has Rover and a merchant asks "get my restaurant menu from my website and update it in Doordash." The Rover agent determines that a 3P website is needed, launches our cloud browser to securely execute the actions on the 3P site, gets the menu, and updates the merchant's menu on Doordash, so your users never have to leave your site to do a task. Enabling cross-site interactions like this is one of a kind.
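To make the flat-DOM vs. semantic-tree point a bit more concrete, here is a toy sketch in Python. It is not Rover's or PageAgent's actual code: the `Node` model, `flat_strip`, and `semantic_tree` are all hypothetical names made up for illustration, and a real page would be walked through browser APIs rather than a hand-built tree. It only shows why a walk over light-DOM children alone misses anything living inside a shadow root or a same-origin iframe document.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a DOM element."""
    tag: str
    role: Optional[str] = None          # accessibility role, if the element has one
    name: str = ""                      # accessible name / label
    children: list["Node"] = field(default_factory=list)
    shadow_root: list["Node"] = field(default_factory=list)  # nodes inside an open shadow root
    frame_doc: Optional["Node"] = None  # root of a same-origin iframe document


def flat_strip(root: Node) -> list[tuple[str, str]]:
    """Flatten light-DOM children only; shadow roots and iframe documents are never visited."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.role:
            found.append((node.role, node.name))
        stack.extend(reversed(node.children))
    return found


def semantic_tree(node: Node, depth: int = 0) -> list[str]:
    """Build an indented role/name outline, descending into shadow roots and iframe documents."""
    lines = []
    if node.role:
        lines.append("  " * depth + f"{node.role}: {node.name}")
        depth += 1
    for child in node.children + node.shadow_root:
        lines.extend(semantic_tree(child, depth))
    if node.frame_doc is not None:
        lines.extend(semantic_tree(node.frame_doc, depth))
    return lines


# A made-up page: one plain button, one web component, one same-origin iframe.
page = Node("body", children=[
    Node("button", role="button", name="Search"),
    Node("my-widget", shadow_root=[Node("input", role="textbox", name="Email")]),
    Node("iframe", frame_doc=Node("body", children=[
        Node("a", role="link", name="Checkout"),
    ])),
])

print(flat_strip(page))
print(semantic_tree(page))
```

On this made-up page the flat walk only sees the Search button, while the tree walk also reaches the Email textbox inside the shadow root and the Checkout link inside the iframe.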
I actually tried out PageAgent; it was reaaaally slow and not that accurate.
You can actually try it out on our own site, rtrvr.ai