Overview
ArcFetch is a URL fetching and article extraction tool that converts web pages to clean markdown. It extracts content with Mozilla Readability and automatically falls back to Playwright for JavaScript-heavy sites.
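The HTTP-first, browser-fallback flow described above can be sketched roughly as follows. This is a minimal illustration, not ArcFetch's actual implementation: `fetchWithFallback`, `looksJsHeavy`, the `Fetcher` type, and the 200-character threshold are all hypothetical names and values chosen for the example. The two fetchers are injected so the sketch stays independent of any particular HTTP client or Playwright setup.

```typescript
// Hypothetical types and names for illustration; not ArcFetch's real API.
type FetchResult = { html: string; via: "http" | "browser" };
type Fetcher = (url: string) => Promise<string>;

// Assumed heuristic: treat a page as JS-heavy when the raw HTML
// contains very little visible text once scripts, styles, and tags are removed.
function looksJsHeavy(html: string, minTextLength = 200): boolean {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return text.length < minTextLength;
}

// Try the cheap HTTP fetcher first; use the browser fetcher only when
// the HTTP attempt fails or returns what looks like an empty app shell.
async function fetchWithFallback(
  url: string,
  httpFetch: Fetcher,
  browserFetch: Fetcher,
): Promise<FetchResult> {
  try {
    const html = await httpFetch(url);
    if (!looksJsHeavy(html)) return { html, via: "http" };
  } catch {
    // Network or HTTP error: fall through to the browser path.
  }
  return { html: await browserFetch(url), via: "browser" };
}
```

In a real setup, `httpFetch` might wrap the built-in `fetch` and `browserFetch` might drive a Playwright page; both are left abstract here.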
Features
- Smart Fetching: Simple HTTP first, automatic Playwright fallback for JS-heavy sites
- Quality Gates: Scoring (0-100) with boilerplate, login wall, paywall, and error page detection
- Anti-Bot Detection: Stealth plugin, viewport/timezone/locale rotation, and realistic headers to reduce bot-detection blocks
- Clean Markdown: Mozilla Readability + Turndown for a 90-95% token reduction versus raw HTML
- Temp to Docs Workflow: Cache to temp folder, promote to docs when ready
- Link Extraction: Extract and batch-fetch all links from a cached reference
- CLI & MCP Server: Available as command-line tool and MCP server with 6 tools
- Multiple Output Formats: Plain text, JSON, filepath, or summary
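To make the Quality Gates feature more concrete, here is one way a 0-100 score with login wall, paywall, and error page checks could be computed. Everything in this sketch is an assumption made for illustration: the `scoreContent` function, the regex patterns, the penalty weights, the 50-word minimum (a crude stand-in for boilerplate detection), and the pass threshold of 60 are not taken from ArcFetch itself.

```typescript
// Hypothetical quality gate; patterns, penalties, and thresholds are
// illustrative assumptions, not ArcFetch's real scoring rules.
type QualityReport = { score: number; issues: string[]; passed: boolean };

const GATES: { issue: string; pattern: RegExp; penalty: number }[] = [
  { issue: "login-wall", pattern: /\b(sign in|log in) to (continue|view|read)\b/i, penalty: 50 },
  { issue: "paywall", pattern: /\bsubscribe to (continue|keep) reading\b/i, penalty: 50 },
  { issue: "error-page", pattern: /\b(404 not found|page not found|access denied)\b/i, penalty: 60 },
];

function scoreContent(markdown: string, minScore = 60): QualityReport {
  const issues: string[] = [];
  let score = 100;

  // Subtract a penalty for each gate whose pattern appears in the text.
  for (const gate of GATES) {
    if (gate.pattern.test(markdown)) {
      issues.push(gate.issue);
      score -= gate.penalty;
    }
  }

  // Very short output is treated here as a sign that little real content survived.
  const words = markdown.split(/\s+/).filter(Boolean).length;
  if (words < 50) {
    issues.push("too-short");
    score -= 30;
  }

  // Clamp to the 0-100 range before comparing against the threshold.
  score = Math.max(0, Math.min(100, score));
  return { score, issues, passed: score >= minScore };
}
```

A caller could use `passed` to decide whether to keep the HTTP result or retry through the browser path, and surface `issues` in a JSON or summary output.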
Technology Stack
- TypeScript
- Bun
- Mozilla Readability
- Playwright (automatic fallback)
- Turndown (HTML to Markdown)