Paratran

A CLI, REST API, and MCP server for fast audio transcription on Apple Silicon, built on parakeet-mlx and the Parakeet TDT model.

Python MLX Apple Silicon Speech-to-Text CLI MCP FastAPI

Overview

Paratran is a transcription tool for macOS on Apple Silicon. It wraps NVIDIA’s Parakeet TDT model (running through parakeet-mlx) in three interfaces: a CLI for one-off jobs, a REST API for long-running services, and an MCP server so Claude Code or Claude Desktop can transcribe audio as a tool call.

I built it because Whisper is slow on a laptop and most of the fast alternatives are cloud services. The default Parakeet model hits 6.34% average WER across eight English benchmarks, supports 25 languages, and runs roughly 30x faster than Whisper on the same hardware via MLX. It’s fast enough that a one-hour recording transcribes in well under a minute on an M-series chip.

How paratran works: audio files (wav, mp3, flac, m4a, ogg, webm) flow into one of three clients (the local CLI, a FastAPI REST server, or an MCP server), which all hit the same core service. The core service ingests, decodes and resamples to 16 kHz mono, optionally chunks long audio with overlap, runs greedy or beam-search decoding, and post-processes into sentences with timestamps and tokens. Inference runs on parakeet-mlx via Apple's MLX framework using the default mlx-community/parakeet-tdt-0.6b-v3 model, optimized for M1/M2/M3/M4 with ~2 GB RAM and roughly 30x faster than Whisper. Output is returned as txt, json, srt, vtt, or all four. The Hugging Face model cache is configurable via --cache-dir or PARATRAN_MODEL_DIR, and CLI flags, env vars, and API/MCP parameters drive configuration.

Features

CLI

  • Transcribe one file or a batch in a single command
  • Output as plain text, JSON, SRT, or WebVTT (or all four at once)
  • Greedy or beam search decoding with full control over beam size, length penalty, patience, and duration reward
  • Chunking for long audio with configurable overlap so sentence boundaries survive joins
  • Optional sentence splitting by max words, max duration, or silence gap
  • BF16 by default with an --fp32 flag for environments that need it
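
A sketch of how these options might combine on one invocation. Only --output-format and --fp32 are confirmed elsewhere on this page; the beam-search and chunking flag names below are illustrative assumptions:

```bash
# Sketch: beam-search decoding on a long recording, chunked with overlap.
# --beam-size and --chunk-overlap are assumed flag names, not confirmed.
paratran interview.wav \
  --output-format srt \
  --beam-size 5 \
  --chunk-overlap 15 \
  --fp32
```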

Client mode (no per-file model reload)

The model takes a few seconds to load, and reloading it for every file is wasteful, so the same paratran binary doubles as a client: start paratran serve once, then point subsequent invocations at it with -s http://localhost:8000 (or set PARATRAN_SERVER in the environment); they hand the file off over HTTP and print the result. The CLI surface is identical either way.
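
The client flow, using the pieces named above (server address illustrative):

```bash
# One-time: start the server so the model loads once
paratran serve &

# Later invocations reuse the loaded model over HTTP
export PARATRAN_SERVER=http://localhost:8000
paratran meeting.mp3
paratran followup.wav   # same CLI surface, no per-file model reload
```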

REST API

A FastAPI server exposes /health and /transcribe. Upload a wav, mp3, flac, m4a, ogg, or webm file and get back the text plus per-sentence timing and per-token timestamps as JSON. All decoding and sentence-splitting options are query parameters, and interactive docs ship at /docs.

bash
curl -X POST http://localhost:8000/transcribe -F "file=@recording.m4a"
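
Decoding and output options ride along as query parameters on the same request; the parameter names below are illustrative guesses, not confirmed by this page:

```bash
# Request SRT output with beam-search decoding
# (output_format and beam_size are assumed parameter names)
curl -X POST "http://localhost:8000/transcribe?output_format=srt&beam_size=5" \
  -F "file=@recording.m4a"
```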

MCP server

The same package installs paratran-mcp, an MCP server that exposes a transcribe tool to any MCP client. Stdio is the default (drop a few lines into .claude/settings.json and Claude Code can transcribe audio files in-place), and a streamable HTTP transport is available for remote or multi-client setups.

json
{
  "mcpServers": {
    "paratran": {
      "command": "uvx",
      "args": ["--from", "paratran", "paratran-mcp"]
    }
  }
}
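
For the streamable HTTP transport, a URL-based MCP client entry could point at the running endpoint instead. The shape below follows Claude Code's convention for HTTP servers; the port and path are illustrative, not confirmed by this page:

```json
{
  "mcpServers": {
    "paratran": {
      "type": "http",
      "url": "http://localhost:8001/mcp"
    }
  }
}
```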

Quick start

bash
# Run without installing
uvx paratran recording.wav

# Or install as a tool
uv tool install paratran
paratran recording.wav

# Start the server once, transcribe many files instantly
paratran serve &
paratran -s http://localhost:8000 --output-format all recording.wav

The default model (parakeet-tdt-0.6b-v3) downloads on first use into the Hugging Face cache. --cache-dir or PARATRAN_MODEL_DIR redirects it to an external drive when local SSD space is tight.
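
For example, using the documented flag and environment variable (the drive path is illustrative):

```bash
# Keep model weights on an external drive via the flag
paratran --cache-dir /Volumes/External/models recording.wav

# Or via the environment, for every subsequent invocation
export PARATRAN_MODEL_DIR=/Volumes/External/models
paratran recording.wav
```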

Technology stack

  • Python 3.11+, distributed on PyPI as paratran
  • parakeet-mlx for inference, running on Apple’s MLX framework
  • FastAPI and Uvicorn for the REST server
  • The official mcp Python SDK for the MCP server, with stdio and streamable-HTTP transports
  • macOS on M1/M2/M3/M4 (the MLX backend is Apple Silicon only)