ArcFetch

Zero-config URL fetching that converts web pages to clean markdown, with an automatic JavaScript rendering fallback. Suited to AI workflows: the markdown output uses 90-95% fewer tokens than the raw HTML.

TypeScript · CLI · MCP · AI · Web Scraping

Overview

ArcFetch is a URL fetching and article extraction tool that converts web pages to clean markdown. It uses Mozilla Readability for content extraction with automatic Playwright fallback for JavaScript-heavy sites.
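
The fetch-then-fallback flow described above can be sketched roughly as follows. This is a minimal illustration, not ArcFetch's actual code: `fetchWithFallback`, `looksUnrendered`, the `Fetcher` type, and the 200-character threshold are all hypothetical names and values chosen for the example. The two fetchers are passed in as plain functions, so an HTTP client and a Playwright-based renderer could be plugged in without changing the logic.

```typescript
// Hypothetical sketch of an HTTP-first fetch with a rendering fallback.
// None of these names come from ArcFetch itself.

type Fetcher = (url: string) => Promise<string>;

interface FetchResult {
  html: string;
  via: "http" | "browser";
}

// Assumed heuristic: a page whose visible text is very short after
// stripping scripts, styles, and tags probably needs JavaScript to render.
function looksUnrendered(html: string, minTextLength = 200): boolean {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return text.length < minTextLength;
}

async function fetchWithFallback(
  url: string,
  httpFetch: Fetcher,
  browserFetch: Fetcher,
): Promise<FetchResult> {
  try {
    const html = await httpFetch(url);
    if (!looksUnrendered(html)) {
      return { html, via: "http" };
    }
  } catch {
    // Network or HTTP failure: fall through to the browser path.
  }
  // Either the cheap fetch failed or it returned an empty shell.
  return { html: await browserFetch(url), via: "browser" };
}
```

Trying the cheap request first keeps the common case fast; the browser is only launched when the first response fails or appears to be an empty application shell.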

Features

  • Smart Fetching: Simple HTTP first, automatic Playwright fallback for JS-heavy sites
  • Quality Gates: Scoring (0-100) with boilerplate, login wall, paywall, and error page detection
  • Anti-Bot Detection: Stealth plugin, viewport/timezone/locale rotation, realistic headers
  • Clean Markdown: Mozilla Readability + Turndown for 90-95% token reduction vs raw HTML
  • Temp to Docs Workflow: Cache to temp folder, promote to docs when ready
  • Link Extraction: Extract and batch-fetch all links from a cached reference
  • CLI & MCP Server: Available as command-line tool and MCP server with 6 tools
  • Multiple Output Formats: Plain text, JSON, filepath, or summary
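
To make the Quality Gates idea concrete, here is one way a 0-100 score with login wall, paywall, error page, and boilerplate checks might be computed. Everything in it is an assumption for illustration: `assessQuality`, the `QualityReport` shape, the regular expressions, the penalties, and the pass threshold of 60 are hypothetical and do not reflect ArcFetch's real rules.

```typescript
// Hypothetical quality gate: scores extracted markdown from 0 to 100.
// The patterns, penalties, and threshold are illustrative assumptions.

interface QualityReport {
  score: number;     // 0 (unusable) to 100 (clean article)
  issues: string[];  // names of the checks that fired
  passed: boolean;   // true when score >= minScore
}

const CHECKS: { name: string; pattern: RegExp; penalty: number }[] = [
  { name: "login-wall", pattern: /\b(sign in|log in) to (continue|view|read)\b/i, penalty: 50 },
  { name: "paywall", pattern: /\bsubscribe to (continue|keep) reading\b/i, penalty: 50 },
  { name: "error-page", pattern: /\b(404 not found|page not found|access denied)\b/i, penalty: 60 },
  { name: "boilerplate", pattern: /\b(accept all cookies|enable javascript)\b/i, penalty: 20 },
];

function assessQuality(markdown: string, minScore = 60): QualityReport {
  const issues: string[] = [];
  let score = 100;

  // Subtract a fixed penalty for every pattern that matches.
  for (const check of CHECKS) {
    if (check.pattern.test(markdown)) {
      issues.push(check.name);
      score -= check.penalty;
    }
  }

  // Very short output usually means extraction missed the article body.
  const wordCount = markdown.split(/\s+/).filter(Boolean).length;
  if (wordCount < 50) {
    issues.push("too-short");
    score -= 30;
  }

  score = Math.max(0, Math.min(100, score));
  return { score, issues, passed: score >= minScore };
}
```

In a design like this, a failing report could serve as one more trigger for retrying the page through the Playwright path, since a login wall or empty shell is often an artifact of the simple HTTP request.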

Technology Stack

  • TypeScript
  • Bun
  • Mozilla Readability
  • Playwright (automatic fallback)
  • Turndown (HTML to Markdown)