MolmoWeb
Open web agents from data to deployment
MolmoWeb is an open visual web agent that performs browser tasks using only screenshot inputs. It includes a full codebase with training scripts, an evaluation harness, an annotation tool for recording human demonstrations, a synthetic data generation pipeline, and a client‑side demo UI. The accompanying MolmoWebMix dataset provides a large public collection of web‑navigation examples for training and benchmarking.
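To make "screenshot inputs only" concrete, the loop such an agent runs can be sketched roughly as follows. This is an illustrative sketch, not MolmoWeb's actual code: `Action`, `Browser`, `Policy`, `run_episode`, `FakeBrowser`, and `scripted_policy` are all hypothetical names, and the action kinds are assumed for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class Action:
    """Hypothetical action record; MolmoWeb's real action space may differ."""
    kind: str            # assumed kinds: "click", "type", "scroll", "done"
    argument: str = ""   # e.g. coordinates or text, kept as a free-form string


class Browser(Protocol):
    """Hypothetical browser interface: pixels out, actions in."""
    def screenshot(self) -> bytes: ...
    def apply(self, action: Action) -> None: ...


# A policy maps (task, current screenshot, past actions) to the next action.
# It never sees the DOM or accessibility tree, only the image bytes.
Policy = Callable[[str, bytes, List[Action]], Action]


def run_episode(task: str, browser: Browser, policy: Policy,
                max_steps: int = 10) -> List[Action]:
    """Observe a screenshot, pick an action, apply it, repeat until done."""
    history: List[Action] = []
    for _ in range(max_steps):
        image = browser.screenshot()
        action = policy(task, image, history)
        history.append(action)
        if action.kind == "done":
            break
        browser.apply(action)
    return history


class FakeBrowser:
    """Stand-in browser that records applied actions instead of driving a page."""
    def __init__(self) -> None:
        self.applied: List[Action] = []

    def screenshot(self) -> bytes:
        return b"fake-png-bytes"

    def apply(self, action: Action) -> None:
        self.applied.append(action)


def scripted_policy(task: str, image: bytes, history: List[Action]) -> Action:
    """Stand-in for the model: replays a fixed script, one step per call."""
    script = [Action("click", "120,340"),
              Action("type", "open web agents"),
              Action("done")]
    return script[len(history)]


browser = FakeBrowser()
trace = run_episode("search for open web agents", browser, scripted_policy)
```

In a real setup the scripted policy would be replaced by a call to the model and the fake browser by an automated browser session; the loop itself would stay the same shape.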
The system is aimed at researchers and developers who want to build, fine‑tune, or evaluate web‑automation agents without relying on proprietary models or hidden data. Users can record custom task demonstrations, adapt the model to new domains, and run standardized benchmarks such as WebVoyager, Online‑Mind2Web, WebTailBench, and DeepShop through the provided evaluation harness.
What distinguishes MolmoWeb is its end‑to‑end openness: the same repository supplies the model, training data, synthetic data generator, and a ready‑to‑use interface for real‑time navigation. This unified stack enables reproducible experiments and serves as a baseline for further development of multimodal web agents.
Similar apps

AI Coding Agents
OpenBrowser-AI
Connect AI agents to browser through raw CDP
System Monitoring & Maintenance
stagewise
The coding agent that works in its own browser environment
AI Coding Agents
GLM-5V-Turbo
Vision-to-code foundation model for real GUI automation

AI Coding Agents
Rusty Browser
Spawn Zero Code AI Browser Agents Infinitely

AI Coding Agents
ML Intern
Plan research-backed ML runs effortlessly

AI Coding Agents
ml-intern
Hugging Face's AI agent that automates post-training