TheCrawler

Community
TypeScript

MCP/CLI/Apify web crawler for agent workflows: markdown, metadata, PDF/DOCX, structured errors, and schema extraction.

Getting Started

The currently recommended installation path is to build from the GitHub source, because the version published on npm is older.

1. git clone https://github.com/manchittlab/TheCrawler.git
2. cd TheCrawler/engine
3. npm install
4. npm run build
5. Run the MCP server from the built engine.

See the repository README and llms-install.md for current client setup notes.
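The last step depends on how your MCP client launches local servers. As a rough sketch only: many MCP clients read a JSON config with an `mcpServers` map whose entries give a `command` and `args`. The helper below builds such an entry. The server name `thecrawler`, the use of `node` as the launcher, and the entry path are all assumptions for illustration and are not taken from this listing; the actual entry file and client-specific details are in the repository README and llms-install.md.

```typescript
// Shape used by many MCP clients for a locally launched (stdio) server.
interface StdioServerEntry {
  command: string;
  args: string[];
}

interface McpClientConfig {
  mcpServers: Record<string, StdioServerEntry>;
}

// Build a client config entry for the locally built engine.
// `entryPath` is whatever file the build produces (see the README);
// launching it with `node` is an assumption, not a documented command.
function buildClientConfig(entryPath: string): McpClientConfig {
  return {
    mcpServers: {
      // "thecrawler" is a hypothetical label; clients accept any key here.
      thecrawler: { command: "node", args: [entryPath] },
    },
  };
}

// Placeholder path, NOT the real entry point of the built engine.
const config = buildClientConfig("/absolute/path/to/built/entry.js");
console.log(JSON.stringify(config, null, 2));
```

Replace the placeholder path with the file named in the README before pasting the printed JSON into a client config.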

Tools (1)
Functions this server exposes to AI clients

crawl

Extract text, markdown, metadata, links/images, structured data, and structured errors from a public URL.
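For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request that names the tool and passes its arguments. The sketch below builds such a request for `crawl`. The argument name `url` is an assumption based on the description above; the tool's real input schema may use a different name or accept further options, so check what the server reports via `tools/list`.

```typescript
// JSON-RPC 2.0 request shape for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tools/call request for the `crawl` tool.
// The `url` argument name is hypothetical; verify it against the tool schema.
function buildCrawlRequest(id: number, url: string): ToolCallRequest {
  // new URL() throws a TypeError on malformed input.
  const parsed = new URL(url);
  // The tool targets public web pages, so accept only http(s) URLs.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "crawl", arguments: { url: parsed.href } },
  };
}

console.log(JSON.stringify(buildCrawlRequest(1, "https://example.com/"), null, 2));
```

In practice an MCP client library builds and sends this message for you; the sketch only shows what travels over the wire.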

Example (building the server from source)
git clone https://github.com/manchittlab/TheCrawler.git
cd TheCrawler/engine
npm install
npm run build