Initial requirements documentation: English and Chinese versions covering media ingestion, LLM classification, TMDB-first metadata, streaming, search, and API specs

2026-05-11 12:59:26 +08:00
commit d185fccd46
2 changed files with 429 additions and 0 deletions
@@ -0,0 +1,202 @@
+# LLM-Based Media Server — Requirements
+
+## Overview
+
+A self-hosted video media server that uses LLMs to automatically index, tag, and classify a personal video library (movies, TV shows, home videos) and to generate metadata, descriptions, and subtitles for it. Inspired by Plex/Jellyfin but with deep LLM integration. Deployable via Docker or on bare metal.
+
+---
+
+## 1. Functional Requirements
+
+### 1.1 Media Ingestion & Library Management
+- [ ] Watch configured directories for new media files
+- [ ] Support formats: MKV, MP4, AVI, MOV, WEBM, TS
+- [ ] Fingerprint media files to detect duplicates
+- [ ] Organize library by: Movies / TV Shows / Home Videos
+- [ ] Store library state in a local database (SQLite or another embedded store)
+
+### 1.2 LLM-Powered Auto-Classification
+- [ ] Identify content type (movie, TV episode, home video) from filename + video analysis
+- [ ] Match movies/TV shows against known titles (local heuristics + LLM reasoning)
+- [ ] Extract season/episode numbers for TV shows
+- [ ] Tag home videos with inferred subjects, locations, events (via frame analysis + LLM)
+- [ ] Classify content genre, mood, rating (family-safe, etc.)
+- [ ] Confidence scoring for all LLM-generated tags; flag low-confidence tags for manual review
+
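The season/episode extraction and confidence flagging above could be sketched as follows. This covers only the filename-heuristic half (no LLM call). The `Guess` type, both patterns, the confidence values, and the `0.7` review threshold are illustrative assumptions, not values from the requirements.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Guess is a hypothetical result type: a season/episode pair plus a
// confidence score in [0, 1].
type Guess struct {
	Season, Episode int
	Confidence      float64
}

// reviewThreshold is an assumed cut-off; the requirements do not fix a value.
const reviewThreshold = 0.7

var (
	strictPattern = regexp.MustCompile(`(?i)s(\d{1,2})e(\d{1,3})`) // e.g. S02E05
	loosePattern  = regexp.MustCompile(`(\d{1,2})x(\d{1,3})`)      // e.g. 2x05
)

// guessEpisode applies filename heuristics only; ok is false when nothing matches.
func guessEpisode(filename string) (g Guess, ok bool) {
	if m := strictPattern.FindStringSubmatch(filename); m != nil {
		return toGuess(m, 0.9), true
	}
	if m := loosePattern.FindStringSubmatch(filename); m != nil {
		return toGuess(m, 0.5), true
	}
	return Guess{}, false
}

func toGuess(m []string, conf float64) Guess {
	s, _ := strconv.Atoi(m[1]) // safe: the patterns capture digits only
	e, _ := strconv.Atoi(m[2])
	return Guess{Season: s, Episode: e, Confidence: conf}
}

// needsReview reports whether a guess should be queued for manual review.
func needsReview(g Guess) bool { return g.Confidence < reviewThreshold }

func main() {
	for _, name := range []string{"Show.S02E05.mkv", "Show 2x05.mkv", "holiday.mp4"} {
		g, ok := guessEpisode(name)
		if !ok {
			fmt.Printf("%s: no episode pattern found\n", name)
			continue
		}
		fmt.Printf("%s: S%02dE%02d review=%v\n", name, g.Season, g.Episode, needsReview(g))
	}
}
```

The loose pattern will also match strings such as video resolutions, which is one reason to give it a score below the review threshold.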
+### 1.3 Metadata Generation
+
+Metadata priority order: **TMDB/IMDB first → LLM as fallback/supplement**.
+
+#### Primary: External Data Sources
+- [ ] Fetch structured metadata from TMDB (movies, TV shows): title, year, director, cast, genre, runtime, language, poster, backdrop
+- [ ] Resolve each title's IMDB ID through TMDB's IMDB ID field, and use that ID to fetch IMDB ratings
+- [ ] Fetch TV episode details from TMDB: season/episode synopsis, air date, guest cast
+- [ ] Cache API responses locally to avoid redundant external calls
+- [ ] Store source attribution (e.g., `metadata_source: tmdb`) per item
+
+#### Fallback/Supplement: LLM-Generated
+LLM metadata is used **only when** the external source returns no match or partial data:
+- [ ] Generate description/summary for unmatched titles (home videos, obscure content)
+- [ ] Fill missing fields that TMDB/IMDB did not return
+- [ ] Tag and describe home videos (no external source exists for these)
+- [ ] Mark all LLM-generated fields with a `llm_generated: true` flag for transparency
+
+#### General
+- [ ] Embed/store final metadata in sidecar files (NFO/JSON) alongside originals
+- [ ] Thumbnail/poster: use TMDB poster if available, else extract keyframe via ffmpeg
+
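One way to read the TMDB-first priority in this section is as a per-field merge. The sketch below assumes both sources have already been flattened into string maps. The `Field` type and `mergeMetadata` function are hypothetical, while `Source` and `LLMGenerated` mirror the `metadata_source` and `llm_generated` attributes named above.

```go
package main

import "fmt"

// Field carries a value together with its provenance.
type Field struct {
	Value        string
	Source       string // mirrors `metadata_source`
	LLMGenerated bool   // mirrors `llm_generated`
}

// mergeMetadata keeps every non-empty TMDB value and lets the LLM fill only
// the keys TMDB left missing or empty.
func mergeMetadata(tmdb, llm map[string]string) map[string]Field {
	merged := make(map[string]Field, len(tmdb)+len(llm))
	for k, v := range tmdb {
		if v != "" {
			merged[k] = Field{Value: v, Source: "tmdb"}
		}
	}
	for k, v := range llm {
		if _, taken := merged[k]; !taken && v != "" {
			merged[k] = Field{Value: v, Source: "llm", LLMGenerated: true}
		}
	}
	return merged
}

func main() {
	tmdb := map[string]string{"title": "Example Movie", "overview": ""}
	llm := map[string]string{"title": "Example Movie (guess)", "overview": "A generated summary."}
	for k, f := range mergeMetadata(tmdb, llm) {
		fmt.Printf("%s=%q source=%s llm_generated=%v\n", k, f.Value, f.Source, f.LLMGenerated)
	}
}
```

Because the LLM pass never overwrites an existing key, re-running it after a later TMDB match cannot silently replace authoritative data.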
+### 1.4 LLM Provider Abstraction
+- [ ] Support local LLMs via Ollama (llama.cpp-compatible)
+- [ ] Support Claude API (Anthropic) for cloud inference
+- [ ] Support OpenAI-compatible APIs
+- [ ] Provider selection per task type (e.g., local for tagging, cloud for summaries)
+- [ ] Graceful fallback: cloud → local if cloud unavailable
+- [ ] Token/cost tracking for cloud providers
+
+### 1.5 Streaming & Playback
+- [ ] HTTP streaming with range request support
+- [ ] HLS adaptive bitrate transcoding (via ffmpeg)
+- [ ] Direct play for supported client formats
+- [ ] Basic web UI for browsing and playback
+
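For the range-request requirement above, Go's standard `http.ServeContent` already implements `Range` handling for any `io.ReadSeeker`. The sketch below exercises it with an in-memory stand-in for a media file; the handler name, the `/stream/clip.mp4` path, and the helper `rangeRequest` are illustrative only.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// serveMedia streams content with Range support; http.ServeContent parses
// the Range header and writes the partial response itself.
func serveMedia(w http.ResponseWriter, r *http.Request) {
	// Stand-in for an opened media file; *os.File is also an io.ReadSeeker.
	content := strings.NewReader("0123456789")
	http.ServeContent(w, r, "clip.mp4", time.Time{}, content)
}

// rangeRequest issues an in-process GET, optionally with a Range header,
// and returns the response status and body.
func rangeRequest(rangeHeader string) (status int, body string) {
	req := httptest.NewRequest(http.MethodGet, "/stream/clip.mp4", nil)
	if rangeHeader != "" {
		req.Header.Set("Range", rangeHeader)
	}
	rec := httptest.NewRecorder()
	serveMedia(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	status, body := rangeRequest("bytes=2-5")
	fmt.Println(status, body)
}
```

A request for `bytes=2-5` should come back as `206 Partial Content` carrying only those four bytes, which is what lets a player seek without downloading the whole file.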
+### 1.6 Search & Discovery
+- [ ] Full-text search across titles, descriptions, tags
+- [ ] Natural language search ("action movies from the 90s", "videos of kids at the beach")
+- [ ] Filter by: genre, year, rating, tags, classification confidence
+
+### 1.7 API
+- [ ] REST API for all library operations
+- [ ] Webhook/event system for processing status updates
+- [ ] API key authentication
+
+---
+
+## 2. Non-Functional Requirements
+
+### 2.1 Performance
+- Processing pipeline must not block streaming; run as background workers
+- Stream start-up latency: < 2s for direct play; < 5s for transcoded streams
+- Support at least 2 concurrent streams (the upper limit is hardware-dependent)
+
+### 2.2 Storage
+- Metadata and index stored locally (no mandatory cloud dependency)
+- Sidecar files (.nfo, .json) stored alongside media
+- SQLite for metadata DB (upgradeable to PostgreSQL via config)
+
+### 2.3 Privacy
+- All LLM inference can run fully local (no data leaves the machine)
+- Cloud LLM calls are opt-in and clearly logged
+- No telemetry by default
+
+### 2.4 Reliability
+- Crash recovery: resume interrupted processing jobs on restart
+- Idempotent processing: re-indexing a file does not duplicate metadata
+- Graceful degradation: server remains operational if LLM provider is unavailable
+
+### 2.5 Portability
+- Docker image with bundled ffmpeg
+- Single binary for bare-metal deployment (Go preferred for this)
+- Config via TOML file + environment variable overrides
+
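The environment-override half of the config requirement can be sketched with the standard library alone. TOML parsing is left out because it needs a third-party package. The `Config` fields, the default values, and the `MEDIASERVER_*` variable names are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds a few hypothetical settings; in the real server these would
// first be loaded from the TOML file.
type Config struct {
	ListenAddr string
	DBPath     string
	MaxStreams int
}

func defaults() Config {
	return Config{ListenAddr: ":8080", DBPath: "library.db", MaxStreams: 2}
}

// applyEnv overrides cfg with any variables that lookup reports as set.
// Passing the lookup function in keeps the logic testable without touching
// the real process environment.
func applyEnv(cfg Config, lookup func(string) (string, bool)) (Config, error) {
	if v, ok := lookup("MEDIASERVER_LISTEN_ADDR"); ok {
		cfg.ListenAddr = v
	}
	if v, ok := lookup("MEDIASERVER_DB_PATH"); ok {
		cfg.DBPath = v
	}
	if v, ok := lookup("MEDIASERVER_MAX_STREAMS"); ok {
		n, err := strconv.Atoi(v)
		if err != nil {
			return cfg, fmt.Errorf("MEDIASERVER_MAX_STREAMS: %w", err)
		}
		cfg.MaxStreams = n
	}
	return cfg, nil
}

func main() {
	cfg, err := applyEnv(defaults(), os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```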
+---
+
+## 3. Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                     Media Server Core                    │
+│                                                         │
+│  ┌──────────┐  ┌──────────────┐  ┌──────────────────┐  │
+│  │  Watcher │→ │  Ingest      │→ │  Processing      │  │
+│  │ (inotify)│  │  Pipeline    │  │  Queue (workers) │  │
+│  └──────────┘  └──────────────┘  └────────┬─────────┘  │
+│                                           │             │
+│              ┌────────────────────────────┤             │
+│              ▼            ▼               ▼             │
+│            ┌──────────────┐  ┌──────────────┐          │
+│            │ Metadata     │  │Classification│          │
+│            │ Worker       │  │Worker        │          │
+│            └──────┬───────┘  └──────┬───────┘          │
+│                   └─────────────────┘                   │
+│                          │                              │
+│                   ┌──────▼──────┐                       │
+│                   │ LLM Router  │                        │
+│                   └──┬──────┬───┘                       │
+│                      │      │                           │
+│              ┌───────┘      └──────────┐                │
+│         ┌────▼─────┐           ┌───────▼──────┐         │
+│         │  Ollama  │           │  Claude/OAI  │         │
+│         │ (local)  │           │  (cloud)     │         │
+│         └──────────┘           └──────────────┘         │
+│                                                         │
+│  ┌──────────────┐  ┌─────────────┐  ┌───────────────┐  │
+│  │  SQLite DB   │  │  HTTP API   │  │  Web UI       │  │
+│  │  (metadata)  │  │  (REST)     │  │  (player)     │  │
+│  └──────────────┘  └─────────────┘  └───────────────┘  │
+└─────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 4. Tech Stack Decisions
+
+| Concern              | Choice              | Rationale                                        |
+|----------------------|---------------------|--------------------------------------------------|
+| Backend language     | Go                  | Single binary, strong HTTP and concurrency support, optional CGO bindings for ffmpeg |
+| Database             | SQLite (sqlx)       | Zero-config, embeddable, sufficient for one user |
+| Media processing     | ffmpeg (subprocess) | Industry standard, broad format support          |
+| LLM (local)          | Ollama REST API     | Simple HTTP interface, model management built-in |
+| LLM (cloud)          | Anthropic SDK + OpenAI SDK | Dual-provider via abstraction layer        |
+| Containerization     | Docker + Compose    | Multi-service: server + ollama + optional GPU    |
+| Config format        | TOML                | Human-friendly, Go ecosystem support (viper)     |
+| Web UI               | HTMX + Tailwind     | No JS framework needed, Go template rendering    |
+
+> **Rust alternative**: Rust is viable if performance is critical (transcoding pipeline), but Go is recommended for faster initial development and simpler deployment.
+
+---
+
+## 5. MVP Scope (Phase 1)
+
+Goal: Working library scanner + LLM tagging + basic web UI + streaming
+
+- [ ] Directory watcher + file ingestion
+- [ ] Movie/TV classification (filename heuristics + LLM disambiguation)
+- [ ] Metadata fetch from TMDB API (primary); LLM fills unmatched/missing fields only
+- [ ] Thumbnail extraction
+- [ ] SQLite metadata store
+- [ ] REST API: list library, get item, trigger re-scan
+- [ ] Basic web UI: grid view + video player
+- [ ] Direct-play HTTP streaming
+- [ ] Ollama integration (local LLM)
+- [ ] Docker Compose setup
+
+---
+
+## 6. Phase 2 (Post-MVP)
+
+- [ ] HLS transcoding
+- [ ] Claude API / OpenAI API integration + provider router
+- [ ] Natural language search
+- [ ] Enrichment from additional external metadata sources (TVDB, Trakt)
+- [ ] Multi-user support with watch history
+
+---
+
+## 7. Phase 3 (Future)
+
+- [ ] Mobile-friendly UI / PWA
+- [ ] GPU-accelerated transcoding
+- [ ] Home video scene detection + auto-chapter marking
+- [ ] Face recognition for home video tagging
+- [ ] Collections and playlists
+- [ ] Client apps (Jellyfin protocol compatibility)
+
+---
+
+## 8. Open Questions
+
+- [ ] Should the server expose a Jellyfin-compatible API to leverage existing clients (Infuse, Swiftfin)?
+- [ ] GPU passthrough in Docker for local LLM acceleration — required or optional?
+- [ ] Should home video tagging use vision models (LLaVA/Claude vision) for frame analysis?
+- [x] TMDB/TVDB integration: **decided** — TMDB is the primary metadata source; LLM fills gaps only
+- [ ] Multi-user: single-user MVP acceptable, or needed from day one?