# LLM-Based Media Server: Requirements

## Overview

A self-hosted video media server that uses LLMs to automatically index, tag, classify, and generate metadata, descriptions, and subtitles for a personal video library (movies, TV shows, home videos). Inspired by Plex/Jellyfin, but with deep LLM integration. Deployable via Docker or on bare metal.

---

## 1. Functional Requirements

### 1.1 Media Ingestion & Library Management

- [ ] Watch configured directories for new media files
- [ ] Support formats: MKV, MP4, AVI, MOV, WEBM, TS
- [ ] Fingerprint media files to detect duplicates
- [ ] Organize library by: Movies / TV Shows / Home Videos
- [ ] Store library state in a local database (SQLite or another embedded DB)

### 1.2 LLM-Powered Auto-Classification

- [ ] Identify content type (movie, TV episode, home video) from filename + video analysis
- [ ] Match movies/TV shows against known titles (local heuristics + LLM reasoning)
- [ ] Extract season/episode numbers for TV shows
- [ ] Tag home videos with inferred subjects, locations, and events (via frame analysis + LLM)
- [ ] Classify content genre, mood, and rating (family-safe, etc.)
- [ ] Confidence scoring for all LLM-generated tags; flag low-confidence tags for manual review

### 1.3 Metadata Generation

Metadata priority order: **TMDB/IMDB first → LLM as fallback/supplement**.
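The priority order above can be sketched as a field-level merge: any value returned by TMDB wins, and LLM output only fills fields TMDB left missing. This is a minimal Go sketch under stated assumptions, not a definitive design. It assumes both sources have already been flattened into `map[string]string` keyed by field name and treats an empty string as "not returned". The type and function names (`Field`, `Item`, `MergeMetadata`) are hypothetical, and the item-level `metadata_source` value (`"tmdb"` if any TMDB field was used, otherwise `"llm"`) is an assumed convention; only the `metadata_source` and `llm_generated` keys come from the requirements in this section.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Field is one metadata value. LLMGenerated mirrors the `llm_generated: true`
// transparency flag from the requirements.
type Field struct {
	Value        string `json:"value"`
	LLMGenerated bool   `json:"llm_generated"`
}

// Item is the merged metadata for one media file (hypothetical shape).
type Item struct {
	MetadataSource string           `json:"metadata_source"` // assumed values: "tmdb" or "llm"
	Fields         map[string]Field `json:"fields"`
}

// MergeMetadata applies "TMDB first, LLM as fallback": every non-empty TMDB
// value is kept, and LLM values only fill fields TMDB did not return.
func MergeMetadata(tmdb, llm map[string]string) Item {
	item := Item{MetadataSource: "llm", Fields: make(map[string]Field)}
	for k, v := range tmdb {
		if v == "" {
			continue // assumption: empty string means TMDB had no data for this field
		}
		item.Fields[k] = Field{Value: v}
		item.MetadataSource = "tmdb"
	}
	for k, v := range llm {
		if _, taken := item.Fields[k]; taken || v == "" {
			continue // never overwrite external data with LLM output
		}
		item.Fields[k] = Field{Value: v, LLMGenerated: true}
	}
	return item
}

func main() {
	tmdb := map[string]string{"title": "Example Title", "year": "1999", "overview": ""}
	llm := map[string]string{"title": "LLM guess", "overview": "An LLM-written summary."}
	out, _ := json.MarshalIndent(MergeMetadata(tmdb, llm), "", "  ")
	fmt.Println(string(out)) // this JSON could be written to the sidecar file
}
```

For a home video, the TMDB map would be empty (or nil), so every field ends up flagged `llm_generated` and the item is attributed to `llm`.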

#### Primary: External Data Sources

- [ ] Fetch structured metadata from TMDB (movies, TV shows): title, year, director, cast, genre, runtime, language, poster, backdrop
- [ ] Fetch ratings and IDs from IMDB (via TMDB's IMDB ID field)
- [ ] Fetch TV episode details from TMDB: season/episode synopsis, air date, guest cast
- [ ] Cache API responses locally to avoid redundant external calls
- [ ] Store source attribution (e.g., `metadata_source: tmdb`) per item

#### Fallback/Supplement: LLM-Generated

LLM metadata is used **only when** the external source returns no match or only partial data:

- [ ] Generate a description/summary for unmatched titles (home videos, obscure content)
- [ ] Fill fields that TMDB/IMDB did not return
- [ ] Tag and describe home videos (no external source exists for these)
- [ ] Mark all LLM-generated fields with an `llm_generated: true` flag for transparency

#### General

- [ ] Embed/store final metadata in sidecar files (NFO/JSON) alongside the originals
- [ ] Thumbnail/poster: use the TMDB poster if available, else extract a keyframe via ffmpeg

### 1.4 LLM Provider Abstraction

- [ ] Support local LLMs via Ollama (llama.cpp-compatible)
- [ ] Support the Claude API (Anthropic) for cloud inference
- [ ] Support OpenAI-compatible APIs
- [ ] Provider selection per task type (e.g., local for tagging, cloud for summaries)
- [ ] Graceful fallback: cloud → local if the cloud provider is unavailable
- [ ] Token/cost tracking for cloud providers

### 1.5 Streaming & Playback

- [ ] HTTP streaming with range request support
- [ ] HLS adaptive bitrate transcoding (via ffmpeg)
- [ ] Direct play for supported client formats
- [ ] Basic web UI for browsing and playback

### 1.6 Search & Discovery

- [ ] Full-text search across titles, descriptions, and tags
- [ ] Natural language search ("action movies from the 90s", "videos of kids at the beach")
- [ ] Filter by: genre, year, rating, tags, classification confidence

### 1.7 API

- [ ] REST API for all library operations
- [ ] Webhook/event system for processing status updates
- [ ] API key authentication

---

## 2. Non-Functional Requirements

### 2.1 Performance

- The processing pipeline must not block streaming; it runs as background workers
- Streaming latency < 2 s for direct play; < 5 s for transcoded streams
- Concurrent streams: at least 2 simultaneous (hardware-dependent)

### 2.2 Storage

- Metadata and index stored locally (no mandatory cloud dependency)
- Sidecar files (.nfo, .json) stored alongside media
- SQLite for the metadata DB (upgradeable to PostgreSQL via config)

### 2.3 Privacy

- All LLM inference can run fully locally (no data leaves the machine)
- Cloud LLM calls are opt-in and clearly logged
- No telemetry by default

### 2.4 Reliability

- Crash recovery: resume interrupted processing jobs on restart
- Idempotent processing: re-indexing a file does not duplicate metadata
- Graceful degradation: the server remains operational if the LLM provider is unavailable

### 2.5 Portability

- Docker image with bundled ffmpeg
- Single binary for bare-metal deployment (Go preferred for this)
- Config via TOML file + environment variable overrides

---

## 3. Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│                    Media Server Core                    │
│                                                         │
│ ┌──────────┐   ┌──────────────┐   ┌──────────────────┐  │
│ │ Watcher  │ → │    Ingest    │ → │    Processing    │  │
│ │(inotify) │   │   Pipeline   │   │ Queue (workers)  │  │
│ └──────────┘   └──────────────┘   └────────┬─────────┘  │
│                                            │            │
│               ┌────────────────────────────┤            │
│               ▼                            ▼            │
│        ┌──────────────┐            ┌──────────────┐     │
│        │   Metadata   │            │Classification│     │
│        │    Worker    │            │    Worker    │     │
│        └──────┬───────┘            └──────┬───────┘     │
│               └─────────────┬─────────────┘             │
│                             │                           │
│                      ┌──────▼──────┐                    │
│                      │ LLM Router  │                    │
│                      └──┬───────┬──┘                    │
│                         │       │                       │
│             ┌───────────┘       └───────────┐           │
│        ┌────▼─────┐                  ┌──────▼───────┐   │
│        │  Ollama  │                  │  Claude/OAI  │   │
│        │ (local)  │                  │   (cloud)    │   │
│        └──────────┘                  └──────────────┘   │
│                                                         │
│ ┌──────────────┐   ┌─────────────┐   ┌───────────────┐  │
│ │  SQLite DB   │   │  HTTP API   │   │    Web UI     │  │
│ │  (metadata)  │   │   (REST)    │   │   (player)    │  │
│ └──────────────┘   └─────────────┘   └───────────────┘  │
└─────────────────────────────────────────────────────────┘
```

---

## 4. Tech Stack Decisions

| Concern          | Choice                     | Rationale                                                         |
|------------------|----------------------------|-------------------------------------------------------------------|
| Backend language | Go                         | Single binary, excellent HTTP/concurrency, ffmpeg via CGO optional |
| Database         | SQLite (sqlx)              | Zero-config, embeddable, sufficient for single-user use          |
| Media processing | ffmpeg (subprocess)        | Industry standard, broad format support                           |
| LLM (local)      | Ollama REST API            | Simple HTTP interface, built-in model management                  |
| LLM (cloud)      | Anthropic SDK + OpenAI SDK | Dual-provider via abstraction layer                               |
| Containerization | Docker + Compose           | Multi-service: server + ollama + optional GPU                     |
| Config format    | TOML                       | Human-friendly, Go ecosystem support (viper)                      |
| Web UI           | HTMX + Tailwind            | No JS framework needed, Go template rendering                     |

> **Rust alternative**: Rust is viable if performance is critical (transcoding pipeline), but Go is recommended for faster initial development and simpler deployment.

---

## 5. MVP Scope (Phase 1)

Goal: working library scanner + LLM tagging + basic web UI + streaming

- [ ] Directory watcher + file ingestion
- [ ] Movie/TV classification (filename heuristics + LLM disambiguation)
- [ ] Metadata fetch from TMDB API (primary); LLM fills unmatched/missing fields only
- [ ] Thumbnail extraction
- [ ] SQLite metadata store
- [ ] REST API: list library, get item, trigger re-scan
- [ ] Basic web UI: grid view + video player
- [ ] Direct-play HTTP streaming
- [ ] Ollama integration (local LLM)
- [ ] Docker Compose setup

---

## 6. Phase 2 (Post-MVP)

- [ ] HLS transcoding
- [ ] Claude API / OpenAI API integration + provider router
- [ ] Natural language search
- [ ] Enrichment from additional external metadata sources (TVDB, Trakt)
- [ ] Multi-user support with watch history

---

## 7. Phase 3 (Future)

- [ ] Mobile-friendly UI / PWA
- [ ] GPU-accelerated transcoding
- [ ] Home video scene detection + auto-chapter marking
- [ ] Face recognition for home video tagging
- [ ] Collections and playlists
- [ ] Client apps (Jellyfin protocol compatibility)

---

## 8. Open Questions

- [ ] Should the server expose a Jellyfin-compatible API to leverage existing clients (Infuse, Swiftfin)?
- [ ] GPU passthrough in Docker for local LLM acceleration: required or optional?
- [ ] Should home video tagging use vision models (LLaVA/Claude vision) for frame analysis?
- [x] TMDB/TVDB integration: **decided**. TMDB is the primary metadata source; the LLM fills gaps only
- [ ] Multi-user: is a single-user MVP acceptable, or is multi-user needed from day one?