TheCrawler
MCP/CLI/Apify web crawler for agent workflows: markdown, metadata, PDF/DOCX, structured errors, and schema extraction.
Features
The currently recommended path is to install from the GitHub source, because the version published on npm is older.
1. git clone https://github.com/manchittlab/TheCrawler.git
2. cd TheCrawler/engine
3. npm install
4. npm run build
5. Run the MCP server from the built engine.
See the repository README and llms-install.md for current client setup notes.
crawl
Extract text, markdown, metadata, links/images, structured data, and structured errors from a public URL.
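As a rough illustration of how an agent might invoke this tool, the sketch below builds a JSON-RPC 2.0 request for the MCP tools/call method. The tool name crawl comes from the listing above; the url argument name and the buildCrawlRequest helper are assumptions made for illustration, so check the input schema the server reports via tools/list before relying on them.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: builds a request for the `crawl` tool.
// ASSUMPTION: the argument name `url` is a guess, not taken from the
// TheCrawler documentation; the real schema may differ.
function buildCrawlRequest(id: number, url: string): ToolCallRequest {
  // The tool targets public URLs, so reject anything that is not http(s).
  if (!/^https?:\/\//i.test(url)) {
    throw new Error("expected an absolute http(s) URL, got: " + url);
  }
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: "crawl", arguments: { url: url } },
  };
}

// Serialize the request; an MCP client would send this over its transport.
const crawlRequest = buildCrawlRequest(1, "https://example.com/");
const crawlRequestJson = JSON.stringify(crawlRequest);
```

In practice an MCP client library would normally construct and send this message for you; the sketch only shows the shape of the call.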
git clone https://github.com/manchittlab/TheCrawler.git
cd TheCrawler/engine
npm install
npm run build
Playwright
This MCP server helps you run browser automation and web scraping using Playwright.
Brave Search
Web and local search using Brave's Search API
Fetch
Web content fetching and conversion for efficient LLM usage
Puppeteer
Browser automation and web scraping
Stagehand
AI-native browser automation using natural language selectors and visual understanding.
Browserbase
Automate browser interactions in the cloud (e.g. web navigation, data extraction, form filling, and more)