Paperless-ngx Docker Setup: Full Install with OCR, PostgreSQL, and Auto-Tagging
By LK Wood IV · 2026-05-21 · ~14 min read · St. Louis County, MO
Drop a scanned PDF into a folder. Thirty seconds later it’s OCR’d, tagged, searchable by full text, and automatically filed under the correspondent that sent it. That’s Paperless-ngx working correctly.
Getting there requires the full compose stack, not just the Paperless container. The official docs make Tika and Gotenberg optional. Treat them as required: without them, Word, Excel, PowerPoint, and email files that hit your consume folder are skipped with nothing more than a warning in the container log, so they never become searchable. This guide builds the complete stack from scratch.
The full stack
Paperless-ngx is not a single container. For production use you need five services:
| Container | Role |
|---|---|
| paperless | Core app, web UI, OCR processing |
| paperless-db | PostgreSQL: document metadata, tags, correspondents |
| paperless-redis | Task queue for background OCR jobs |
| paperless-tika | Document text extraction (Word, Excel, PowerPoint, ODT) |
| paperless-gotenberg | Converts Office documents and email to PDF for the archived copy and preview |
Redis is the task broker: the web app queues OCR and other background jobs there for the task workers, so Paperless won’t process documents without it. Tika and Gotenberg together cover the non-PDF formats. Use PostgreSQL rather than the default SQLite; SQLite locks the whole database on every write, which causes contention during large imports.
Directory layout
Create this before writing any compose files:
mkdir -p /opt/paperless/{consume,data,db,export,media}
- consume/: drop documents here for auto-processing
- data/: Paperless application data (classification models, search index)
- db/: PostgreSQL data files, mounted into the paperless-db container
- export/: output of the document_exporter command, for backups
- media/: archived original files and thumbnails
Docker Compose
services:
  paperless-db:
    image: postgres:16
    container_name: paperless-db
    restart: unless-stopped
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: CHANGE_THIS_PASSWORD
    volumes:
      - /opt/paperless/db:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U paperless"]
      interval: 10s
      timeout: 5s
      retries: 5

  paperless-redis:
    image: redis:7-alpine
    container_name: paperless-redis
    restart: unless-stopped
    command: redis-server --save 60 1 --loglevel warning

  paperless-tika:
    image: ghcr.io/paperless-ngx/tika:latest
    container_name: paperless-tika
    restart: unless-stopped

  paperless-gotenberg:
    image: gotenberg/gotenberg:8
    container_name: paperless-gotenberg
    restart: unless-stopped
    command:
      - gotenberg
      - --chromium-disable-javascript=true
      - --chromium-allow-list=file:///tmp/.*

  paperless:
    image: ghcr.io/paperless-ngx/paperless-ngx:2
    container_name: paperless
    restart: unless-stopped
    depends_on:
      paperless-db:
        condition: service_healthy
      paperless-redis:
        condition: service_started
      paperless-tika:
        condition: service_started
      paperless-gotenberg:
        condition: service_started
    ports:
      - "8010:8000"
    volumes:
      - /opt/paperless/data:/usr/src/paperless/data
      - /opt/paperless/media:/usr/src/paperless/media
      - /opt/paperless/consume:/usr/src/paperless/consume
      - /opt/paperless/export:/usr/src/paperless/export
    environment:
      PAPERLESS_REDIS: redis://paperless-redis:6379
      PAPERLESS_DBHOST: paperless-db
      PAPERLESS_DBNAME: paperless
      PAPERLESS_DBUSER: paperless
      PAPERLESS_DBPASS: CHANGE_THIS_PASSWORD
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_ENDPOINT: http://paperless-tika:9998
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://paperless-gotenberg:3000
      PAPERLESS_URL: https://paperless.yourdomain.com
      PAPERLESS_SECRET_KEY: CHANGE_THIS_SECRET_KEY
      PAPERLESS_TIME_ZONE: America/Chicago
      PAPERLESS_OCR_LANGUAGE: eng
      PAPERLESS_CONSUMER_POLLING: 60
      PAPERLESS_CONSUMER_RECURSIVE: 1
      PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS: 1
      USERMAP_UID: 1000
      USERMAP_GID: 1000
Set these values before the first start:
- PAPERLESS_DBPASS and POSTGRES_PASSWORD: must match each other, and must not be the placeholder
- PAPERLESS_SECRET_KEY: generate with openssl rand -hex 32
- PAPERLESS_URL: the URL you’ll access Paperless from (used for CSRF protection)
- PAPERLESS_TIME_ZONE: your local timezone (see the IANA timezone list)
- PAPERLESS_OCR_LANGUAGE: three-letter Tesseract language code; eng for English, deu for German, fra for French, spa for Spanish
- USERMAP_UID and USERMAP_GID: your host user’s IDs (id -u, id -g), so the container can write to the bind-mounted folders
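To keep secrets out of the compose file itself, one option is a .env file beside it: Docker Compose reads .env from the project directory and substitutes ${VAR} references in the YAML. A minimal sketch, assuming you run it from /opt/paperless; DB_PASSWORD and SECRET_KEY are hypothetical variable names chosen for this example, and the script falls back to /dev/urandom if openssl isn’t installed:

```shell
#!/bin/sh
# Sketch: create a .env file holding a database password and a secret key.
# DB_PASSWORD and SECRET_KEY are example names; reference them in the
# compose file as ${DB_PASSWORD} and ${SECRET_KEY}.
set -eu

ENV_FILE="${ENV_FILE:-./.env}"

# Print 2*N random hex characters; prefer openssl, fall back to /dev/urandom.
rand_hex() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex "$1"
  else
    od -An -N "$1" -tx1 /dev/urandom | tr -d ' \n'
  fi
}

# Never overwrite existing secrets: the database was initialised with them.
if [ -e "$ENV_FILE" ]; then
  echo "$ENV_FILE already exists, leaving it alone" >&2
else
  umask 077  # new file is readable by its owner only
  {
    printf 'DB_PASSWORD=%s\n' "$(rand_hex 24)"
    printf 'SECRET_KEY=%s\n' "$(rand_hex 32)"
  } > "$ENV_FILE"
  echo "wrote $ENV_FILE"
fi
```

With the file in place, set both POSTGRES_PASSWORD and PAPERLESS_DBPASS to ${DB_PASSWORD} so they can’t drift apart, and PAPERLESS_SECRET_KEY to ${SECRET_KEY}. Run it before the first docker compose up: the postgres image only applies POSTGRES_PASSWORD when it initialises an empty data directory.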
First-run setup
cd /opt/paperless
docker compose up -d
# Create the admin user
docker compose exec paperless python3 manage.py createsuperuser
Follow the prompts for username, email, and password. Then access the UI at http://your-host-ip:8010.
Reverse proxy with Nginx Proxy Manager
Port 8010 is fine for local access. For a proper subdomain (paperless.yourdomain.com) with SSL, add an NPM proxy host pointing to your Paperless container’s port 8000.
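In NPM:
- Add a Proxy Host for paperless.yourdomain.com and forward it to paperless:8000. That hostname only resolves if NPM shares a Docker network with the Paperless container; otherwise forward to your host IP and port 8010.
- Enable SSL with Let’s Encrypt.
- Turn on Websockets Support; the web UI receives live consumption status over a WebSocket.
- In the Advanced tab, add these directives: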
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
client_max_body_size 50M;
The client_max_body_size 50M is necessary for uploading large scanned documents through the web UI. Without it, large PDFs return 413 errors.
Full NPM setup is in the Nginx Proxy Manager guide.
Subdirectory-to-tag mapping
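The compose file sets PAPERLESS_CONSUMER_SUBDIRS_AS_TAGS: 1, which works together with PAPERLESS_CONSUMER_RECURSIVE: 1. If you create subdirectories inside the consume folder, files dropped there are automatically tagged with the folder name. Nested folders add one tag per level, so a file in consume/financial/2025/ gets both financial and 2025: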
consume/
financial/ → documents tagged "financial"
medical/ → documents tagged "medical"
insurance/ → documents tagged "insurance"
tax/ → documents tagged "tax"
This is the fastest way to build an organized archive from the start — your scanner or phone app can save to the right subfolder and the tag appears automatically.
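Creating the folders takes one mkdir per tag. A sketch, assuming you run it from /opt/paperless and want the four example tags above; when run as root it also hands the tree to UID/GID 1000, matching USERMAP_UID and USERMAP_GID in the compose file, so Paperless can remove files from the consume folder after ingesting them:

```shell
#!/bin/sh
# Sketch: create one consume subfolder per tag.
# Run from /opt/paperless; set CONSUME_DIR to use a different location.
set -eu

CONSUME_DIR="${CONSUME_DIR:-./consume}"
TAGS="financial medical insurance tax"   # folder name becomes the tag name

for tag in $TAGS; do
  mkdir -p "$CONSUME_DIR/$tag"
done

# When run as root, give the folders to the UID/GID the container maps to
# (USERMAP_UID/USERMAP_GID = 1000 in the compose file).
if [ "$(id -u)" -eq 0 ]; then
  chown -R 1000:1000 "$CONSUME_DIR"
fi

ls "$CONSUME_DIR"
```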
Auto-tagging with matching rules
After the initial setup, create correspondents, document types, and tags from the Manage section of the web UI’s sidebar. Each of these objects carries its own matching rule.
A matching rule checks the OCR’d text and applies metadata automatically. Examples that work well:
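Correspondent: Chase Bank
- Matching algorithm: Any word
- Match: Chase JPMorgan
- Companion tag: give the financial tag the same Any word match
Correspondent: IRS
- Matching algorithm: Any word
- Match: "Internal Revenue Service" "Department of the Treasury" IRS
- Companion tags: tax and federal, each with the same match
Document type: Insurance Policy
- Matching algorithm: Any word
- Match: "policy number" deductible premium coverage
Separate match terms with spaces, not commas, and wrap multi-word phrases in double quotes. Paperless splits the Match field on whitespace, so an unquoted policy number would fire on either word alone. Rules belong to the object they’re defined on: a correspondent’s rule assigns only that correspondent, so tags need match rules of their own (or a workflow).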
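Once these rules exist, every new document that hits the consume folder gets its correspondent, type, and tags assigned without you touching it. Rules run at consumption time; to apply them to documents already in the archive, use the document_retagger management command. The payoff compounds: after six months, your library is organized with very little manual filing.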
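Before saving a rule, it helps to confirm your terms actually appear in a document’s OCR text, which the Content tab on the document’s detail page shows. The helper below, any_word_match, is a hypothetical grep-based approximation of the Any word algorithm, not Paperless’s own implementation: it matches whole words case-insensitively and treats each argument, including a quoted phrase, as one term. Save the text to a file and pass the terms you plan to use:

```shell
#!/bin/sh
# Sketch: check candidate match terms against a document's text.
# any_word_match FILE TERM... prints the first term found and returns 0,
# or returns 1 if no term is found. grep flags used:
#   -i  case-insensitive
#   -w  whole words only, so "Chase" does not hit "chased"
#   -F  fixed strings, so "Internal Revenue Service" is matched literally
# Unlike Paperless, grep works line by line: a phrase that wraps across
# two lines of OCR text will not match here.
any_word_match() {
  file=$1
  shift
  for term in "$@"; do
    if grep -qiwF -- "$term" "$file"; then
      printf 'matched: %s\n' "$term"
      return 0
    fi
  done
  return 1
}
```

For example, any_word_match statement.txt Chase JPMorgan prints the first term that hits and exits 0, or exits 1 if none do.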
Supported file types
Files Paperless-ngx handles with the full Tika + Gotenberg stack:
| Format | Handling |
|---|---|
| PDF (native text) | Extracted directly |
| PDF (scanned image) | Tesseract OCR |
| JPEG, PNG, TIFF | Tesseract OCR |
| DOCX, ODT | Tika text extraction + Gotenberg render |
| XLSX, ODS | Tika text extraction |
| PPTX, ODP | Tika text extraction + Gotenberg render |
| HTML | Gotenberg render + OCR |
| TXT, CSV | Direct text import |
| EML (email export) | Tika parsing |
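Without Tika and Gotenberg, Paperless consumes only PDFs, images, and plain text (TXT, CSV); it skips every other format in the table with nothing more than a warning in the log.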
Performance and resource usage
At idle with no active processing:
| Container | Idle RAM |
|---|---|
| paperless | ~180 MB |
| paperless-db (PostgreSQL) | ~50 MB |
| paperless-redis | ~10 MB |
| paperless-tika | ~200 MB |
| paperless-gotenberg | ~60 MB |
| Total | ~500 MB |
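During OCR of a 10-page scanned PDF, the Paperless container holds one core at 80–100%. OCR is CPU-bound: on an Intel N100, it takes about 8–12 seconds per page. A multi-core machine processes documents faster because Paperless can OCR several pages of a document in parallel and, with more task workers, several documents at once.
For bulk imports (100+ documents), raise PAPERLESS_TASK_WORKERS so several documents are processed at the same time. Each worker also OCRs pages in parallel (PAPERLESS_THREADS_PER_WORKER), and the Paperless-ngx configuration docs advise keeping workers × threads at or below your core count; going over it makes processing slower, not faster. For a 4-core host:
PAPERLESS_TASK_WORKERS: 2        # 2 workers × 2 threads = 4 cores
PAPERLESS_THREADS_PER_WORKER: 2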
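To pick values for a given machine, this sketch reads the online core count and splits it between workers and threads so their product never exceeds it. The two-workers-from-four-cores split is an assumed heuristic, not a Paperless setting; Paperless itself defaults to a single task worker:

```shell
#!/bin/sh
# Sketch: suggest PAPERLESS_TASK_WORKERS and PAPERLESS_THREADS_PER_WORKER
# for this host, keeping workers x threads <= online CPU cores.
# The "2 workers once there are 4+ cores" rule is an assumed heuristic.
set -eu

cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || nproc)

if [ "$cores" -ge 4 ]; then
  workers=2
else
  workers=1
fi

# Integer division rounds down, so workers * threads never exceeds cores.
threads=$((cores / workers))
if [ "$threads" -lt 1 ]; then
  threads=1
fi

printf 'PAPERLESS_TASK_WORKERS: %s\n' "$workers"
printf 'PAPERLESS_THREADS_PER_WORKER: %s\n' "$threads"
```

Paste the two printed lines into the environment section of the paperless service and run docker compose up -d to apply them.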
Back up your archive
Paperless-ngx backup strategy has two components:
1. PostgreSQL database dump
# -T: no pseudo-TTY, so the redirected dump stays clean and the command also works from cron
docker compose exec -T paperless-db pg_dump -U paperless paperless \
  > /opt/paperless/export/paperless-$(date +%Y%m%d).sql
2. Document export (optional but useful)
The document_exporter command writes all archived documents plus a manifest JSON to your export directory:
docker compose exec -T paperless document_exporter /usr/src/paperless/export
This export is human-readable and self-contained: if you need to rebuild Paperless from scratch, running the document_importer command against this output on a fresh, empty install restores everything, including tags, correspondents, and custom fields.
Schedule both to run nightly via a cron job or systemd timer, then send the output to off-site storage. The restic off-site backup guide covers sending these exports to Backblaze B2 automatically.
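A minimal nightly wrapper might look like this. It assumes it is saved as a hypothetical /opt/paperless/paperless-backup.sh and run from that directory; the 14-day retention for SQL dumps is an arbitrary example value. It passes -T to docker compose exec because cron doesn’t provide a TTY:

```shell
#!/bin/sh
# Sketch of a nightly Paperless backup, run from /opt/paperless.
# BACKUP_DIR and KEEP_DAYS defaults are example values; adjust to taste.
set -eu

BACKUP_DIR="${BACKUP_DIR:-./export}"
KEEP_DAYS="${KEEP_DAYS:-14}"

# Delete database dumps older than $2 days in directory $1.
# Only files named paperless-*.sql are touched.
prune_old_dumps() {
  find "$1" -maxdepth 1 -type f -name 'paperless-*.sql' -mtime +"$2" -exec rm -f {} +
}

backup() {
  mkdir -p "$BACKUP_DIR"
  stamp=$(date +%Y%m%d)
  # -T: no pseudo-TTY, so the redirected dump is clean under cron
  docker compose exec -T paperless-db pg_dump -U paperless paperless \
    > "$BACKUP_DIR/paperless-$stamp.sql"
  docker compose exec -T paperless document_exporter /usr/src/paperless/export
  prune_old_dumps "$BACKUP_DIR" "$KEEP_DAYS"
}

# Only run when called with "run"; otherwise just define the functions.
if [ "${1:-}" = run ]; then
  backup
fi
```

A crontab line such as 30 2 * * * cd /opt/paperless && ./paperless-backup.sh run would run it nightly at 02:30. The copy to off-site storage remains a separate step.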
What to back up:
| Path | Contents | Required for restore |
|---|---|---|
| /opt/paperless/media/ | Original document files + thumbnails | Yes |
| /opt/paperless/data/ | Classification models, search index | Yes |
| /opt/paperless/export/paperless-YYYYMMDD.sql | Database dump | Yes |
| /opt/paperless/export/ | Human-readable document export | Optional but recommended |
Scanning workflow that works
The consume folder approach works best when your scanner can write directly to it over the network. Two reliable paths:
Network scanner with SMB/FTP support: configure the scanner to save to a Samba share backed by the consume directory. On Linux, a simple smb.conf entry makes /opt/paperless/consume accessible as \\server\paperless-consume.
Mobile scanning: apps like Microsoft Lens, Adobe Scan, or Genius Scan save to any cloud storage or network share. Point them at an SMB or WebDAV share backed by the consume folder. Alternatively, use Nextcloud as the intermediary — Nextcloud can watch a folder and copy files to the consume directory via an automation. The Nextcloud AIO setup guide covers the WebDAV configuration.
Manual drop: for documents you already have digitally, drag them into the consume directory via scp, a mounted SMB share, or the Nextcloud interface.
Upgrading Paperless-ngx
Pin the major version tag (2) rather than latest to avoid breaking changes across major versions:
image: ghcr.io/paperless-ngx/paperless-ngx:2
To upgrade to a new minor release:
docker compose pull
docker compose up -d
Paperless runs database migrations automatically on startup. No manual migration step needed for minor version upgrades. For major version bumps (1.x → 2.x), read the release notes — major versions occasionally require a manual migration command.
Common problems
Office documents or emails fail to import: Tika or Gotenberg is not reachable. Check docker logs paperless for connection errors to paperless-tika:9998 or paperless-gotenberg:3000. Both containers must be running before Paperless starts.
413 Request Entity Too Large: NPM’s default body size limit is too low. Add client_max_body_size 50M; to your NPM proxy host’s Advanced configuration.
CSRF verification failed: PAPERLESS_URL is set incorrectly. It must match the exact URL you access in the browser — including the scheme (https://), domain, and no trailing slash.
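A quick shape check on the value can catch the scheme and slash mistakes before you restart. check_paperless_url is a hypothetical helper that tests only those two rules; it can’t tell whether the URL matches what your browser actually shows:

```shell
#!/bin/sh
# Sketch: check the shape of a PAPERLESS_URL value.
# Rules checked: http:// or https:// scheme with a host, no trailing slash.
check_paperless_url() {
  url=$1
  case $url in
    https://?*|http://?*) ;;
    *)
      echo "needs a scheme and host, e.g. https://paperless.example.com: $url" >&2
      return 1
      ;;
  esac
  case $url in
    */)
      echo "remove the trailing slash: $url" >&2
      return 1
      ;;
  esac
  echo "ok: $url"
}

# Example: check the value used in the compose file.
check_paperless_url "https://paperless.yourdomain.com"
```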
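OCR’d text looks garbled: wrong language set. English documents OCR’d with German (deu) produce garbage. Check that PAPERLESS_OCR_LANGUAGE matches your document language. Multiple languages are supported with a + separator: eng+deu for bilingual archives. The Docker image ships with English, German, French, Italian, and Spanish; for any other language, also list its code in PAPERLESS_OCR_LANGUAGES (plural) so the container installs it.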
Consumer not picking up files: check that the consume volume mount in the compose file points to the right host path. Use docker compose exec paperless ls /usr/src/paperless/consume to confirm the container sees the files you dropped.
For the broader self-hosted stack this lives in, the 12 best self-hosted apps guide covers how Paperless-ngx fits alongside Immich, Nextcloud, and Vaultwarden. For file sync to mobile so you can scan from your phone, the Nextcloud AIO setup guide covers WebDAV and folder sync configuration. For off-site backups of your document archive, the restic backup guide handles the export-to-B2 pipeline. The Docker Compose starter stack covers the monitoring and reverse proxy layer that Paperless runs alongside.