The currently recommended path installs from the GitHub source, because the npm package is still on an older published version.

1. git clone https://github.com/manchittlab/TheCrawler.git
2. cd TheCrawler/engine
3. npm install
4. npm run build
5. Run the MCP server from the built engine.

See the repository README and llms-install.md for current client setup notes.

Tools (1)

Functions this server exposes to AI clients

crawl

Extract text, markdown, metadata, links/images, structured data, and structured errors from a public URL.

Example

git clone https://github.com/manchittlab/TheCrawler.git
cd TheCrawler/engine
npm install
npm run build
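
Once the engine is built and the server is running, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call. The sketch below only builds that request body for the crawl tool; it does not start or contact the server. The argument name "url" and the helper name build_crawl_request are assumptions made for illustration and are not confirmed by this listing, so check the repository README for the actual input schema.

```python
import json


def build_crawl_request(url: str, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request that targets the crawl tool."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "crawl",
            # Assumption: "url" is a hypothetical argument name; the real
            # input schema is defined by the server, not by this sketch.
            "arguments": {"url": url},
        },
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Print the request body a client would send over its MCP transport.
    print(build_crawl_request("https://example.com"))
```

In practice an MCP client library would build and send this message itself; the sketch is only meant to show what a call to crawl looks like on the wire.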

Related Servers

Other Web MCP servers you might find useful

Playwright

An MCP server for browser automation and web scraping using Playwright

83.1K

Brave Search

Web and local search using Brave's Search API

79.5K

Fetch

Web content fetching and conversion for efficient LLM usage

79.5K

Puppeteer

Browser automation and web scraping

79.5K

Stagehand

AI-native browser automation using natural language selectors and visual understanding.

5.5K

Browserbase

Automate browser interactions in the cloud (e.g. web navigation, data extraction, form filling, and more)

3.2K
