Web Crawling API
Crawl any site. Every page. Zero infrastructure. Crawl entire websites and discover every URL automatically: follow internal links, respect robots.txt, handle JavaScript-rendered navigation, and return structured sitemaps or full page content at scale. BrowserSolver handles the infrastructure so your team can focus on what matters.
Quick Start
A single HTTP request is all you need. No SDK, no dependencies.
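As a minimal sketch, here is what that single request can look like using only the Python standard library. The `/api/crawl` path comes from the FAQ below; the base URL, the `url` and `max_pages` field names, and the Bearer-token auth scheme are assumptions, so check your dashboard for the exact values.

```python
# Minimal sketch of a crawl request using only the Python standard library.
# The endpoint path /api/crawl is documented; the base URL, field names,
# and auth scheme shown here are assumptions.
import json
import urllib.request

def build_crawl_request(seed_url, api_key, max_pages=50):
    """Build a POST request that starts a crawl from seed_url."""
    payload = {"url": seed_url, "max_pages": max_pages}
    return urllib.request.Request(
        "https://api.browsersolver.com/api/crawl",  # assumed base URL
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )

# To actually start the crawl:
# with urllib.request.urlopen(build_crawl_request("https://example.com", "YOUR_KEY")) as resp:
#     result = json.load(resp)
```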
How It Works
Send a Request
POST a JSON body or use GET with query parameters. No SDK or library needed.
Get Your Results
BrowserSolver launches a cloud browser, crawls the site by following internal links, and returns each page's URL, status code, title, metadata, and content as structured JSON.
Use It Anywhere
Feed clean Markdown or HTML into knowledge bases, search indexes, LLM pipelines, or monitoring systems. Works everywhere structured data works.
Features
Full-Site Discovery
Start from any seed URL and follow all internal links automatically. Get a complete map of every page on a domain in one request.
JavaScript Navigation
Real Chromium renders client-side routing, infinite scroll, and dynamic menus before crawling. No content is missed.
Configurable Depth & Scope
Set crawl depth, page limits, URL include/exclude patterns, and domain scope to target exactly the content you need.
Structured Output
Each crawled page returns its URL, status code, title, metadata, and full Markdown or HTML content as structured JSON.
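To illustrate the shape of that output, here is a sketch of one crawled-page record and a small helper that merges pages into a single corpus. The fields mirror the list above (URL, status code, title, metadata, content), but the exact key names are assumptions.

```python
# Sketch of one crawled-page record. The fields mirror the docs (URL, status
# code, title, metadata, content); the exact key names are assumptions.
sample_page = {
    "url": "https://example.com/docs/intro",
    "status_code": 200,
    "title": "Introduction",
    "metadata": {"description": "Getting started guide"},
    "content": "# Introduction\n\nWelcome to the docs.",
}

def pages_to_markdown_corpus(pages):
    """Concatenate successfully crawled pages into one Markdown corpus."""
    ok = [p for p in pages if p["status_code"] == 200]
    return "\n\n---\n\n".join(f"<!-- {p['url']} -->\n{p['content']}" for p in ok)
```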
Use Cases
- Building knowledge bases from documentation sites
- SEO audits and broken link detection across entire domains
- Competitive intelligence by crawling competitor websites
- Training LLMs on domain-specific web content
- Content aggregation for news and research platforms
- Automated site change monitoring and archiving
Frequently Asked Questions
How does the Web Crawling API work?
POST a seed URL to the /api/crawl endpoint. BrowserSolver launches a cloud Chromium browser, follows all internal links up to your configured depth, and returns each page's URL, title, metadata, and full content as Markdown or HTML. No local browser or infrastructure needed.
Does the crawler handle JavaScript-rendered navigation?
Yes. BrowserSolver uses a real Chromium browser to render each page before extracting links. Client-side routing, lazy-loaded menus, and SPAs are all handled correctly. Content hidden behind JavaScript is never missed.
How do I control which pages get crawled?
Set max_pages to cap the total crawl size, crawl_depth to limit link-following depth, and use include_patterns or exclude_patterns (regex or glob) to include or skip URLs matching specific patterns.
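Putting those knobs together, a scoped crawl payload might look like the sketch below. The parameter names `max_pages`, `crawl_depth`, `include_patterns`, and `exclude_patterns` come from the answer above; the overall payload shape is an assumption.

```python
# Sketch of a scoped crawl: only /docs/ pages, skipping changelogs.
# Parameter names come from the docs; the payload shape is an assumption.
def scoped_crawl_payload(seed_url):
    return {
        "url": seed_url,
        "max_pages": 200,          # hard cap on total pages crawled
        "crawl_depth": 3,          # follow links at most 3 hops from the seed
        "include_patterns": ["*/docs/*"],       # glob: only documentation pages
        "exclude_patterns": [".*changelog.*"],  # regex: skip changelog pages
    }
```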
What output formats does the crawl API return?
Each crawled page returns structured JSON with the URL, status code, title, and content. Content can be Markdown (clean, LLM-ready text), HTML (raw source), or plain text. The full crawl result is available via a status polling endpoint.
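Because the full result arrives via polling, client code typically loops on the status endpoint. A sketch of that loop follows; the docs confirm a status polling endpoint exists, but the `status`/`pages`/`error` response fields used here are assumptions, and `fetch_status` stands in for whatever HTTP GET you make against the crawl's status URL.

```python
# Sketch of polling for a finished crawl. A status endpoint is documented,
# but the response fields (status, pages, error) assumed here may differ.
import time

def wait_for_crawl(fetch_status, poll_interval=5.0, timeout=600.0):
    """Poll fetch_status() until the crawl finishes or times out.

    fetch_status is any callable returning the status JSON, e.g. an
    HTTP GET against the crawl's status URL.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("status") == "completed":
            return status["pages"]
        if status.get("status") == "failed":
            raise RuntimeError(f"crawl failed: {status.get('error')}")
        time.sleep(poll_interval)
    raise TimeoutError("crawl did not finish in time")
```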
Can I crawl authenticated sites?
Yes. Pass session cookies, Authorization headers, or use BrowserSolver's profile system to persist a logged-in browser state across crawl requests. Authenticated content is fully accessible.
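A sketch of what an authenticated payload might look like is below. The docs confirm you can pass session cookies or Authorization headers; the `cookies` and `headers` field names and their structure are assumptions.

```python
# Sketch of an authenticated crawl payload. Passing cookies or Authorization
# headers is documented; the field names and structure here are assumptions.
def authed_crawl_payload(seed_url, session_cookie=None, bearer_token=None):
    payload = {"url": seed_url}
    if session_cookie:
        payload["cookies"] = [
            {"name": "session", "value": session_cookie, "domain": ".example.com"}
        ]
    if bearer_token:
        payload["headers"] = {"Authorization": f"Bearer {bearer_token}"}
    return payload
```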
Is there a free tier to get started?
Yes. Sign up for a free API key and start with the included free credits. No credit card required to get started.
Ready to build without browser headaches?
Join engineering teams shipping AI agents and automation at scale. No browser fleet to manage, no infrastructure to maintain: just call the API and go.