LLM Base Media Server — Requirements
Overview
A self-hosted video media server that uses LLMs to automatically index, tag, classify, and generate metadata/descriptions/subtitles for a personal video library (movies, TV shows, home videos). Inspired by Plex/Jellyfin but with deep LLM integration. Deployable via Docker or bare-metal.
1. Functional Requirements
1.1 Media Ingestion & Library Management
- Watch configured directories for new media files
- Support formats: MKV, MP4, AVI, MOV, WEBM, TS
- Fingerprint media files to detect duplicates
- Organize library by: Movies / TV Shows / Home Videos
- Store library state in a local embedded database (SQLite by default)
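The format filter and duplicate detection above could be sketched as follows. This is a minimal illustration, not a chosen design: it assumes a full-content SHA-256 digest as the fingerprint (the requirements do not name an algorithm), and the names `isSupported`, `fingerprint`, and `fingerprintString` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"path/filepath"
	"strings"
)

// supportedExts mirrors the container formats listed in section 1.1.
var supportedExts = map[string]bool{
	".mkv": true, ".mp4": true, ".avi": true,
	".mov": true, ".webm": true, ".ts": true,
}

// isSupported reports whether path has a supported media extension,
// ignoring case.
func isSupported(path string) bool {
	return supportedExts[strings.ToLower(filepath.Ext(path))]
}

// fingerprint streams r through SHA-256 so large files are never held
// in memory, and returns the digest as a hex string.
// Assumption: a full-content hash; the requirements do not fix the method.
func fingerprint(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// fingerprintString is a convenience wrapper for in-memory demo data.
func fingerprintString(s string) string {
	sum, _ := fingerprint(strings.NewReader(s)) // reading a string cannot fail
	return sum
}

func main() {
	fmt.Println(isSupported("Movie.MKV"), isSupported("notes.txt"))
	// Two files with identical bytes yield the same fingerprint.
	fmt.Println(fingerprintString("same bytes") == fingerprintString("same bytes"))
}
```

Hashing whole files is simple but slow for large videos; hashing the file size plus a few sampled chunks would be a cheaper alternative, at the cost of a small chance of false matches.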
1.2 LLM-Powered Auto-Classification
- Identify content type (movie, TV episode, home video) from filename + video analysis
- Match movies/TV shows against known titles (local heuristics + LLM reasoning)
- Extract season/episode numbers for TV shows
- Tag home videos with inferred subjects, locations, events (via frame analysis + LLM)
- Classify content genre, mood, rating (family-safe, etc.)
- Confidence scoring for all LLM-generated tags; flag low-confidence for manual review
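As a rough sketch of the filename heuristics and the low-confidence review flag, assuming the common `S02E05` naming pattern and a review threshold of 0.6 (both are illustrative assumptions, as are the names `parseEpisode`, `Tag`, and `needsReview`):

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// episodeRe matches "S02E05"-style markers, case-insensitively, with an
// optional separator between the season and episode parts. The marker
// must start the name or follow a non-alphanumeric character.
var episodeRe = regexp.MustCompile(`(?i)(?:^|[^a-z0-9])s(\d{1,2})[ ._-]?e(\d{1,3})`)

// parseEpisode extracts season and episode numbers from a file name.
// ok is false when no marker is found, which is the case an LLM would
// be asked to disambiguate.
func parseEpisode(name string) (season, episode int, ok bool) {
	m := episodeRe.FindStringSubmatch(name)
	if m == nil {
		return 0, 0, false
	}
	season, _ = strconv.Atoi(m[1]) // digits are guaranteed by the regexp
	episode, _ = strconv.Atoi(m[2])
	return season, episode, true
}

// Tag is one LLM-generated label with its confidence in [0, 1].
type Tag struct {
	Name       string
	Confidence float64
}

// reviewThreshold is an assumed cut-off, not a value from the requirements.
const reviewThreshold = 0.6

// needsReview returns the tags whose confidence is below the threshold.
func needsReview(tags []Tag) []Tag {
	var flagged []Tag
	for _, t := range tags {
		if t.Confidence < reviewThreshold {
			flagged = append(flagged, t)
		}
	}
	return flagged
}

func main() {
	s, e, ok := parseEpisode("Show.Name.S02E05.1080p.mkv")
	fmt.Println(s, e, ok)
	fmt.Println(needsReview([]Tag{
		{Name: "beach", Confidence: 0.91},
		{Name: "birthday", Confidence: 0.42},
	}))
}
```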
1.3 Metadata Generation
Metadata priority order: TMDB/IMDB first → LLM as fallback/supplement.
Primary: External Data Sources
- Fetch structured metadata from TMDB (movies, TV shows): title, year, director, cast, genre, runtime, language, poster, backdrop
- Fetch ratings and IDs from IMDB (via TMDB's IMDB ID field)
- Fetch TV episode details from TMDB: season/episode synopsis, air date, guest cast
- Cache API responses locally to avoid redundant external calls
- Store source attribution (e.g., `metadata_source: tmdb`) per item
Fallback/Supplement: LLM-Generated
LLM metadata is used only when the external source returns no match or partial data:
- Generate description/summary for unmatched titles (home videos, obscure content)
- Fill missing fields that TMDB/IMDB did not return
- Tag and describe home videos (no external source exists for these)
- Mark all LLM-generated fields with an `llm_generated: true` flag for transparency
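The priority rule (external source first, LLM only for gaps) can be illustrated with a small merge function. This is a sketch under assumptions: metadata is modeled as flat string maps, provenance is tracked per field, and `Field` and `mergeMetadata` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Field is one metadata value plus its provenance.
type Field struct {
	Value        string `json:"value"`
	Source       string `json:"metadata_source"`
	LLMGenerated bool   `json:"llm_generated"`
}

// mergeMetadata keeps every non-empty TMDB value and uses an LLM value
// only for fields that TMDB left missing or empty.
func mergeMetadata(tmdb, llm map[string]string) map[string]Field {
	out := make(map[string]Field, len(tmdb)+len(llm))
	for k, v := range tmdb {
		if v != "" {
			out[k] = Field{Value: v, Source: "tmdb"}
		}
	}
	for k, v := range llm {
		if _, filled := out[k]; !filled && v != "" {
			out[k] = Field{Value: v, Source: "llm", LLMGenerated: true}
		}
	}
	return out
}

func main() {
	merged := mergeMetadata(
		map[string]string{"title": "Example Film", "overview": ""},
		map[string]string{"title": "Exmaple Flim", "overview": "A short LLM-written summary."},
	)
	out, _ := json.MarshalIndent(merged, "", "  ")
	fmt.Println(string(out))
}
```

In the demo, the TMDB title wins over the LLM's guess, while the empty overview is filled by the LLM and flagged accordingly.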
General
- Embed/store final metadata in sidecar files (NFO/JSON) alongside originals
- Thumbnail/poster: use TMDB poster if available, else extract keyframe via ffmpeg
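The keyframe fallback could be driven by an ffmpeg subprocess along these lines. The sketch only builds and prints the command; the 10-second offset, the paths, and the helper names `thumbnailArgs` and `posterFor` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// thumbnailArgs builds ffmpeg arguments that seek to offsetSec in input
// and write a single video frame to output.
func thumbnailArgs(input, output string, offsetSec float64) []string {
	return []string{
		// Overwrite an existing thumbnail without prompting.
		"-y",
		// Placing -ss before -i seeks in the input instead of decoding up to the offset.
		"-ss", strconv.FormatFloat(offsetSec, 'f', 3, 64),
		"-i", input,
		// Emit exactly one video frame.
		"-frames:v", "1",
		// Low quantizer value, i.e. high JPEG quality.
		"-q:v", "2",
		output,
	}
}

// posterFor prefers the TMDB poster when one is known and otherwise
// falls back to the locally extracted keyframe.
func posterFor(tmdbPosterURL, keyframePath string) string {
	if tmdbPosterURL != "" {
		return tmdbPosterURL
	}
	return keyframePath
}

func main() {
	args := thumbnailArgs("/media/home/clip.mov", "/media/home/clip.jpg", 10)
	cmd := exec.Command("ffmpeg", args...)
	fmt.Println(cmd.Args) // printed only; call cmd.Run() to execute ffmpeg
	fmt.Println(posterFor("", "/media/home/clip.jpg"))
}
```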
1.4 LLM Provider Abstraction
- Support local LLMs via Ollama (llama.cpp-compatible)
- Support Claude API (Anthropic) for cloud inference
- Support OpenAI-compatible APIs
- Provider selection per task type (e.g., local for tagging, cloud for summaries)
- Graceful fallback: cloud → local if cloud unavailable
- Token/cost tracking for cloud providers
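One possible shape for the provider abstraction is an interface plus a router that holds an ordered provider chain per task type, which covers both per-task selection and fallback. Everything here (`Provider`, `Router`, the `stub` backend, the task names) is a hypothetical sketch; real backends would wrap the Ollama, Anthropic, and OpenAI-compatible HTTP APIs.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Provider is a hypothetical interface every LLM backend would implement.
type Provider interface {
	Name() string
	Complete(ctx context.Context, prompt string) (string, error)
}

// Router maps a task type to an ordered list of providers to try.
type Router struct {
	chains map[string][]Provider
}

// Complete tries each provider configured for task in order and returns
// the first successful reply together with the name of the provider
// that produced it. If all fail, the individual errors are joined.
func (r *Router) Complete(ctx context.Context, task, prompt string) (string, string, error) {
	chain := r.chains[task]
	if len(chain) == 0 {
		return "", "", fmt.Errorf("no provider configured for task %q", task)
	}
	var errs []error
	for _, p := range chain {
		reply, err := p.Complete(ctx, prompt)
		if err == nil {
			return reply, p.Name(), nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), err))
	}
	return "", "", errors.Join(errs...)
}

// stub is a stand-in backend for the demo; it performs no network calls.
type stub struct {
	name string
	down bool
}

func (s stub) Name() string { return s.name }

func (s stub) Complete(_ context.Context, prompt string) (string, error) {
	if s.down {
		return "", errors.New("unavailable")
	}
	return "[" + s.name + "] " + prompt, nil
}

func main() {
	r := &Router{chains: map[string][]Provider{
		// Summaries prefer the cloud provider and fall back to local.
		"summary": {stub{name: "claude", down: true}, stub{name: "ollama"}},
		"tagging": {stub{name: "ollama"}},
	}}
	reply, used, err := r.Complete(context.Background(), "summary", "Summarize this video")
	fmt.Println(reply, used, err)
}
```

Returning the provider name alongside the reply gives the caller a natural hook for the token/cost tracking and cloud-call logging requirements.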
1.5 Streaming & Playback
- HTTP streaming with range request support
- HLS adaptive bitrate transcoding (via ffmpeg)
- Direct play for supported client formats
- Basic web UI for browsing and playback
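For direct play, Go's standard library already implements range requests: `http.ServeContent` handles the `Range` header for any `io.ReadSeeker`. The sketch below exercises it against an in-memory stand-in for a media file; `serveMedia`, `fetchRange`, and the `/stream/demo.mp4` path are illustrative names, not part of the specified API.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// serveMedia streams content and lets net/http handle Range and HEAD
// requests. A zero modification time means no Last-Modified header is
// sent in this sketch.
func serveMedia(w http.ResponseWriter, r *http.Request, name string, content io.ReadSeeker) {
	http.ServeContent(w, r, name, time.Time{}, content)
}

// fetchRange is a demo helper: it requests one byte range from an
// in-memory "file" and returns the response status and body.
func fetchRange(data, rangeHeader string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/stream/demo.mp4", nil)
	req.Header.Set("Range", rangeHeader)
	rec := httptest.NewRecorder()
	serveMedia(rec, req, "demo.mp4", strings.NewReader(data))
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := fetchRange("0123456789", "bytes=2-5")
	fmt.Println(status, body) // prints "206 2345"
}
```

In a real handler the `io.ReadSeeker` would be an `*os.File` opened per request, so that concurrent streams do not share a read position.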
1.6 Search & Discovery
- Full-text search across titles, descriptions, tags
- Natural language search ("action movies from the 90s", "videos of kids at the beach")
- Filter by: genre, year, rating, tags, classification confidence
1.7 API
- REST API for all library operations
- Webhook/event system for processing status updates
- API key authentication
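API key authentication fits naturally as HTTP middleware in front of the REST handlers. The following is a minimal sketch; the `X-API-Key` header name, the `/api/library` path, and the helper names are assumptions, since the requirements do not specify them.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireAPIKey wraps next and rejects requests whose X-API-Key header
// does not match key. An empty configured key rejects everything.
func requireAPIKey(key string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-API-Key")
		// Constant-time comparison avoids leaking key contents via timing.
		if key == "" || subtle.ConstantTimeCompare([]byte(got), []byte(key)) != 1 {
			http.Error(w, "invalid or missing API key", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor is a demo helper: it sends one request carrying sentKey to
// a protected endpoint and returns the HTTP status code.
func statusFor(sentKey string) int {
	api := requireAPIKey("s3cret", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, `{"items":[]}`)
	}))
	req := httptest.NewRequest(http.MethodGet, "/api/library", nil)
	if sentKey != "" {
		req.Header.Set("X-API-Key", sentKey)
	}
	rec := httptest.NewRecorder()
	api.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("s3cret"), statusFor("wrong"), statusFor(""))
}
```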
2. Non-Functional Requirements
2.1 Performance
- Processing pipeline must not block streaming; run as background workers
- Playback start latency: < 2 s for direct play; < 5 s for transcoded streams
- Support concurrent streams: minimum 2 simultaneous (hardware-dependent)
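Keeping the processing pipeline off the streaming path usually comes down to a bounded pool of background workers. A minimal sketch, assuming jobs are identified by file path and that `processAll` and `demoProcess` are hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// processAll runs process over paths using a fixed number of background
// workers and returns how many paths failed. Bounding the worker count
// keeps indexing from starving the streaming handlers of CPU.
func processAll(paths []string, workers int, process func(string) error) int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		failed int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				if err := process(p); err != nil {
					mu.Lock()
					failed++
					mu.Unlock()
				}
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs) // lets the workers exit once the queue drains
	wg.Wait()
	return failed
}

// demoProcess stands in for real analysis; it rejects non-media files.
func demoProcess(path string) error {
	if strings.HasSuffix(path, ".txt") {
		return errors.New("unsupported file: " + path)
	}
	return nil
}

func main() {
	failed := processAll([]string{"a.mkv", "notes.txt", "c.mp4"}, 2, demoProcess)
	fmt.Println("failed jobs:", failed)
}
```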
2.2 Storage
- Metadata and index stored locally (no mandatory cloud dependency)
- Sidecar files (.nfo, .json) stored alongside media
- SQLite for metadata DB (upgradeable to PostgreSQL via config)
2.3 Privacy
- All LLM inference can run fully local (no data leaves the machine)
- Cloud LLM calls are opt-in and clearly logged
- No telemetry by default
2.4 Reliability
- Crash recovery: resume interrupted processing jobs on restart
- Idempotent processing: re-indexing a file does not duplicate metadata
- Graceful degradation: server remains operational if LLM provider is unavailable
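Idempotent processing and crash recovery can both be expressed as operations on a job table keyed by file fingerprint. The sketch below uses an in-memory map as a stand-in for that table; the state names and the `JobStore` type are assumptions, and a real implementation would persist the states in SQLite and guard them against concurrent access.

```go
package main

import "fmt"

// JobState is the lifecycle state of one processing job.
type JobState string

const (
	Pending JobState = "pending"
	Running JobState = "running"
	Done    JobState = "done"
)

// JobStore is an in-memory stand-in for a jobs table keyed by file
// fingerprint. It is not safe for concurrent use.
type JobStore struct {
	jobs map[string]JobState
}

func NewJobStore() *JobStore {
	return &JobStore{jobs: make(map[string]JobState)}
}

// Enqueue registers a fingerprint once. Repeated calls are no-ops,
// which keeps re-indexing idempotent. It reports whether a job was added.
func (s *JobStore) Enqueue(fingerprint string) bool {
	if _, exists := s.jobs[fingerprint]; exists {
		return false
	}
	s.jobs[fingerprint] = Pending
	return true
}

// Set records a state transition for a job.
func (s *JobStore) Set(fingerprint string, state JobState) {
	s.jobs[fingerprint] = state
}

// Recover moves jobs that a crash left in Running back to Pending so
// they are retried after restart, and returns how many were reset.
func (s *JobStore) Recover() int {
	n := 0
	for fp, st := range s.jobs {
		if st == Running {
			s.jobs[fp] = Pending
			n++
		}
	}
	return n
}

func main() {
	store := NewJobStore()
	fmt.Println(store.Enqueue("abc123"), store.Enqueue("abc123"))
	store.Set("abc123", Running) // simulate a crash in mid-processing
	fmt.Println("recovered:", store.Recover())
}
```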
2.5 Portability
- Docker image with bundled ffmpeg
- Single binary for bare-metal deployment (Go preferred for this)
- Config via TOML file + environment variable overrides
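The override order (file values first, environment variables on top) might look like the sketch below. TOML parsing is left out to keep the example dependency-free, so `defaults` stands in for values read from the file; the field names and the `MEDIASERVER_*` variable names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// Config holds a few illustrative settings.
type Config struct {
	ListenAddr  string
	LibraryPath string
	OllamaURL   string
}

// defaults stands in for values that would be loaded from the TOML file.
func defaults() Config {
	return Config{
		ListenAddr:  ":8080",
		LibraryPath: "/media",
		OllamaURL:   "http://localhost:11434", // Ollama's default local port
	}
}

// applyEnv overrides fields with non-empty environment values. The
// lookup function is injected so the logic can be exercised without
// touching the real environment; pass os.Getenv in production.
func applyEnv(cfg Config, getenv func(string) string) Config {
	if v := getenv("MEDIASERVER_LISTEN_ADDR"); v != "" {
		cfg.ListenAddr = v
	}
	if v := getenv("MEDIASERVER_LIBRARY_PATH"); v != "" {
		cfg.LibraryPath = v
	}
	if v := getenv("MEDIASERVER_OLLAMA_URL"); v != "" {
		cfg.OllamaURL = v
	}
	return cfg
}

func main() {
	cfg := applyEnv(defaults(), os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```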
3. Architecture Overview
┌─────────────────────────────────────────────────────────┐
│ Media Server Core │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Watcher │→ │ Ingest │→ │ Processing │ │
│ │ (inotify)│ │ Pipeline │ │ Queue (workers) │ │
│ └──────────┘ └──────────────┘ └────────┬─────────┘ │
│ │ │
│ ┌────────────────────────────┤ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Metadata │ │Classification│ │
│ │ Worker │ │Worker │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ └─────────────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ LLM Router │ │
│ └──┬──────┬───┘ │
│ │ │ │
│ ┌───────┘ └──────────┐ │
│ ┌────▼─────┐ ┌───────▼──────┐ │
│ │ Ollama │ │ Claude/OAI │ │
│ │ (local) │ │ (cloud) │ │
│ └──────────┘ └──────────────┘ │
│ │
│ ┌──────────────┐ ┌─────────────┐ ┌───────────────┐ │
│ │ SQLite DB │ │ HTTP API │ │ Web UI │ │
│ │ (metadata) │ │ (REST) │ │ (player) │ │
│ └──────────────┘ └─────────────┘ └───────────────┘ │
└─────────────────────────────────────────────────────────┘
4. Tech Stack Decisions
| Concern | Choice | Rationale |
|---|---|---|
| Backend language | Go | Single binary, excellent HTTP/concurrency support, cgo bindings to ffmpeg optional |
| Database | SQLite (sqlx) | Zero-config, embeddable, enough for single-user |
| Media processing | ffmpeg (subprocess) | Industry standard, broad format support |
| LLM (local) | Ollama REST API | Simple HTTP interface, model management built-in |
| LLM (cloud) | Anthropic SDK + OpenAI SDK | Dual-provider via abstraction layer |
| Containerization | Docker + Compose | Multi-service: server + ollama + optional GPU |
| Config format | TOML | Human-friendly, Go ecosystem support (viper) |
| Web UI | HTMX + Tailwind | No JS framework needed, Go template rendering |
Rust alternative: Rust is viable if performance is critical (transcoding pipeline), but Go is recommended for faster initial development and simpler deployment.
5. MVP Scope (Phase 1)
Goal: Working library scanner + LLM tagging + basic web UI + streaming
- Directory watcher + file ingestion
- Movie/TV classification (filename heuristics + LLM disambiguation)
- Metadata fetch from TMDB API (primary); LLM fills unmatched/missing fields only
- Thumbnail extraction
- SQLite metadata store
- REST API: list library, get item, trigger re-scan
- Basic web UI: grid view + video player
- Direct-play HTTP streaming
- Ollama integration (local LLM)
- Docker Compose setup
6. Phase 2 (Post-MVP)
- HLS transcoding
- Claude API / OpenAI API integration + provider router
- Natural language search
- Enrichment from additional external metadata sources (TVDB, Trakt)
- Multi-user support with watch history
7. Phase 3 (Future)
- Mobile-friendly UI / PWA
- GPU-accelerated transcoding
- Home video scene detection + auto-chapter marking
- Face recognition for home video tagging
- Collections and playlists
- Client apps (Jellyfin protocol compatibility)
8. Open Questions
- Should the server expose a Jellyfin-compatible API to leverage existing clients (Infuse, Swiftfin)?
- GPU passthrough in Docker for local LLM acceleration — required or optional?
- Should home video tagging use vision models (LLaVA/Claude vision) for frame analysis?
- TMDB/TVDB integration (resolved): TMDB is the primary metadata source; the LLM fills gaps only
- Multi-user: single-user MVP acceptable, or needed from day one?