Initial requirements documentation: English and Chinese versions covering media ingestion, LLM classification, TMDB-first metadata, streaming, search, and API specs

2026-05-11 12:59:26 +08:00
commit d185fccd46
2 changed files with 429 additions and 0 deletions
# LLM Base Media Server — Requirements
## Overview
A self-hosted video media server that uses LLMs to automatically index, tag, classify, and generate metadata/descriptions/subtitles for a personal video library (movies, TV shows, home videos). Inspired by Plex/Jellyfin but with deep LLM integration. Deployable via Docker or bare-metal.
---
## 1. Functional Requirements
### 1.1 Media Ingestion & Library Management
- [ ] Watch configured directories for new media files
- [ ] Support formats: MKV, MP4, AVI, MOV, WEBM, TS
- [ ] Fingerprint media files to detect duplicates
- [ ] Organize library by: Movies / TV Shows / Home Videos
- [ ] Store library state in a local database (SQLite or another embedded store)
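Duplicate detection by fingerprint could look roughly like the sketch below. It assumes a full-file SHA-256 digest as the fingerprint, which the requirements do not specify; `fingerprint` and `dedupIndex` are hypothetical names, and the in-memory map stands in for the library database.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// fingerprint returns the hex-encoded SHA-256 digest of everything read from r.
// Hashing the whole file is an assumption; the requirements name no algorithm.
func fingerprint(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// dedupIndex maps a fingerprint to the first path seen with that content.
type dedupIndex map[string]string

// add records path under fp. If fp is already known, it leaves the index
// unchanged and returns the earlier path with duplicate set to true.
func (d dedupIndex) add(fp, path string) (existing string, duplicate bool) {
	if prev, ok := d[fp]; ok {
		return prev, true
	}
	d[fp] = path
	return "", false
}

func main() {
	idx := dedupIndex{}
	// Each command-line argument is treated as a media file to check.
	for _, p := range os.Args[1:] {
		f, err := os.Open(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fp, err := fingerprint(f)
		f.Close()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		if prev, dup := idx.add(fp, p); dup {
			fmt.Printf("%s duplicates %s\n", p, prev)
		}
	}
}
```

Hashing whole files is simple but slow for large videos; hashing only the file size plus a few sampled chunks is a common trade-off that this sketch does not attempt.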
### 1.2 LLM-Powered Auto-Classification
- [ ] Identify content type (movie, TV episode, home video) from filename + video analysis
- [ ] Match movies/TV shows against known titles (local heuristics + LLM reasoning)
- [ ] Extract season/episode numbers for TV shows
- [ ] Tag home videos with inferred subjects, locations, events (via frame analysis + LLM)
- [ ] Classify content genre, mood, rating (family-safe, etc.)
- [ ] Confidence scoring for all LLM-generated tags; flag low-confidence for manual review
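Two of the items above lend themselves to a small sketch: extracting season/episode numbers with a filename heuristic, and flagging low-confidence tags for review. The `S01E02` pattern, the `Tag` type, and the 0.6 threshold are all assumptions made for illustration; the requirements do not fix any of them.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// episodePattern matches the common "S01E02" marker, case-insensitively.
var episodePattern = regexp.MustCompile(`(?i)s(\d{1,2})e(\d{1,3})`)

// parseEpisode extracts season and episode numbers from a filename.
// ok is false when no marker is found; such files would go to the LLM step.
func parseEpisode(name string) (season, episode int, ok bool) {
	m := episodePattern.FindStringSubmatch(name)
	if m == nil {
		return 0, 0, false
	}
	// Atoi cannot fail here: both groups are short runs of digits.
	season, _ = strconv.Atoi(m[1])
	episode, _ = strconv.Atoi(m[2])
	return season, episode, true
}

// Tag is a hypothetical shape for an LLM-generated label.
type Tag struct {
	Label      string
	Confidence float64 // 0.0 to 1.0
}

// reviewThreshold is an assumed cut-off below which tags need manual review.
const reviewThreshold = 0.6

// needsReview returns the tags whose confidence is below reviewThreshold.
func needsReview(tags []Tag) []Tag {
	var flagged []Tag
	for _, t := range tags {
		if t.Confidence < reviewThreshold {
			flagged = append(flagged, t)
		}
	}
	return flagged
}

func main() {
	s, e, ok := parseEpisode("Show.Name.S02E07.1080p.mkv")
	fmt.Println(s, e, ok)
	fmt.Println(needsReview([]Tag{{"beach", 0.9}, {"birthday", 0.4}}))
}
```

Other naming schemes (for example `2x07` or absolute episode numbers) would need additional patterns or the LLM disambiguation step.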
### 1.3 Metadata Generation
Metadata priority order: **TMDB/IMDB first → LLM as fallback/supplement**.
#### Primary: External Data Sources
- [ ] Fetch structured metadata from TMDB (movies, TV shows): title, year, director, cast, genre, runtime, language, poster, backdrop
- [ ] Resolve IMDB IDs through TMDB's `imdb_id` field and use them to fetch IMDB ratings
- [ ] Fetch TV episode details from TMDB: season/episode synopsis, air date, guest cast
- [ ] Cache API responses locally to avoid redundant external calls
- [ ] Store source attribution (e.g., `metadata_source: tmdb`) per item
#### Fallback/Supplement: LLM-Generated
LLM metadata is used **only when** the external source returns no match or partial data:
- [ ] Generate description/summary for unmatched titles (home videos, obscure content)
- [ ] Fill missing fields that TMDB/IMDB did not return
- [ ] Tag and describe home videos (no external source exists for these)
- [ ] Mark all LLM-generated fields with a `llm_generated: true` flag for transparency
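The priority rule above can be sketched as a field-by-field merge. This is a minimal illustration, assuming metadata arrives as flat string maps; the `Field` type and `mergeMetadata` function are hypothetical, while the `tmdb` source label and `llm_generated` flag come from the requirements.

```go
package main

import (
	"fmt"
	"sort"
)

// Field is one metadata value plus its provenance.
type Field struct {
	Value        string
	Source       string // "tmdb" or "llm"
	LLMGenerated bool   // mirrors the llm_generated flag
}

// mergeMetadata keeps every non-empty TMDB value and uses the LLM value
// only for fields that TMDB left missing or empty.
func mergeMetadata(tmdb, llm map[string]string) map[string]Field {
	out := make(map[string]Field, len(tmdb)+len(llm))
	for k, v := range tmdb {
		if v != "" {
			out[k] = Field{Value: v, Source: "tmdb"}
		}
	}
	for k, v := range llm {
		if _, taken := out[k]; !taken && v != "" {
			out[k] = Field{Value: v, Source: "llm", LLMGenerated: true}
		}
	}
	return out
}

func main() {
	merged := mergeMetadata(
		map[string]string{"title": "Example Movie", "overview": ""},
		map[string]string{"title": "Exmple Movie", "overview": "A generated summary."},
	)
	// Sort keys so the output order is stable.
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %+v\n", k, merged[k])
	}
}
```

In this example the TMDB title wins over the LLM's misspelled one, and only the empty `overview` field is filled from the LLM and flagged.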
#### General
- [ ] Embed/store final metadata in sidecar files (NFO/JSON) alongside originals
- [ ] Thumbnail/poster: use TMDB poster if available, else extract keyframe via ffmpeg
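For the keyframe fallback, one possible ffmpeg invocation is sketched below. The 10-second seek offset and the helper name `thumbnailArgs` are assumptions; the flags themselves (`-ss`, `-i`, `-frames:v`, `-q:v`, `-y`) are standard ffmpeg options.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// thumbnailArgs builds ffmpeg arguments that seek to offsetSec in the input
// and write a single frame to output.
func thumbnailArgs(input, output string, offsetSec int) []string {
	return []string{
		"-ss", strconv.Itoa(offsetSec), // seek before decoding (input seek)
		"-i", input,
		"-frames:v", "1", // emit one video frame
		"-q:v", "2", // high JPEG quality
		"-y", // overwrite an existing thumbnail
		output,
	}
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: thumb <input-video> <output.jpg>")
		os.Exit(2)
	}
	// The offset of 10 seconds is an arbitrary choice to skip black intro frames.
	cmd := exec.Command("ffmpeg", thumbnailArgs(os.Args[1], os.Args[2], 10)...)
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "ffmpeg failed:", err)
		os.Exit(1)
	}
}
```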
### 1.4 LLM Provider Abstraction
- [ ] Support local LLMs via Ollama (llama.cpp-compatible)
- [ ] Support Claude API (Anthropic) for cloud inference
- [ ] Support OpenAI-compatible APIs
- [ ] Provider selection per task type (e.g., local for tagging, cloud for summaries)
- [ ] Graceful fallback: cloud → local if the cloud provider is unavailable
- [ ] Token/cost tracking for cloud providers
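A provider abstraction with ordered fallback might be shaped like this. The `Provider` interface, `Router` type, and stub providers are hypothetical; a real implementation would likely also accept a `context.Context` and make HTTP calls to Ollama or a cloud API, which this sketch omits.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is a hypothetical minimal interface over any LLM backend.
type Provider interface {
	Name() string
	Complete(prompt string) (string, error)
}

// Router tries providers in order, e.g. cloud first, then local.
type Router struct {
	chain []Provider
}

// Complete returns the first successful completion and the name of the
// provider that produced it. If every provider fails, the errors are joined.
func (r Router) Complete(prompt string) (text, provider string, err error) {
	if len(r.chain) == 0 {
		return "", "", errors.New("no LLM providers configured")
	}
	var errs []error
	for _, p := range r.chain {
		out, callErr := p.Complete(prompt)
		if callErr == nil {
			return out, p.Name(), nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), callErr))
	}
	return "", "", errors.Join(errs...)
}

// errUnavailable simulates an unreachable provider in the demo below.
var errUnavailable = errors.New("provider unavailable")

// stubProvider is a test double standing in for a real backend.
type stubProvider struct {
	name  string
	reply string
	err   error
}

func (s stubProvider) Name() string                    { return s.name }
func (s stubProvider) Complete(string) (string, error) { return s.reply, s.err }

func main() {
	r := Router{chain: []Provider{
		stubProvider{name: "cloud", err: errUnavailable},
		stubProvider{name: "ollama", reply: "a short summary"},
	}}
	text, used, err := r.Complete("Summarize this video")
	fmt.Println(text, used, err)
}
```

Per-task provider selection could be layered on top by keeping one such chain per task type.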
### 1.5 Streaming & Playback
- [ ] HTTP streaming with range request support
- [ ] HLS adaptive bitrate transcoding (via ffmpeg)
- [ ] Direct play for supported client formats
- [ ] Basic web UI for browsing and playback
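Direct play with range support needs little custom code in Go, because `http.ServeContent` from the standard library handles `Range` headers and partial responses. The sketch below assumes files are served from a single root directory; `streamHandler` and `probeRange` are hypothetical names, and `probeRange` exists only to exercise the handler without starting a server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"path"
	"path/filepath"
)

// streamHandler serves files under root. http.ServeContent takes care of
// Range requests and answers them with 206 Partial Content.
func streamHandler(root string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Clean the URL path so a request cannot escape root via "..".
		name := filepath.Join(root, filepath.FromSlash(path.Clean("/"+r.URL.Path)))
		f, err := os.Open(name)
		if err != nil {
			http.NotFound(w, r)
			return
		}
		defer f.Close()
		info, err := f.Stat()
		if err != nil || info.IsDir() {
			http.NotFound(w, r)
			return
		}
		http.ServeContent(w, r, info.Name(), info.ModTime(), f)
	})
}

// probeRange writes body to a temporary file, requests it with the given
// Range header, and returns the response status and body.
// It returns status 0 if the temporary file cannot be created.
func probeRange(body, rangeHeader string) (int, string) {
	dir, err := os.MkdirTemp("", "media")
	if err != nil {
		return 0, ""
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "clip.mp4"), []byte(body), 0o600); err != nil {
		return 0, ""
	}
	req := httptest.NewRequest(http.MethodGet, "/clip.mp4", nil)
	req.Header.Set("Range", rangeHeader)
	rec := httptest.NewRecorder()
	streamHandler(dir).ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := probeRange("0123456789", "bytes=2-5")
	fmt.Println(code, body)
}
```

Byte ranges are inclusive, so `bytes=2-5` on a ten-byte file yields the four bytes at offsets 2 through 5.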
### 1.6 Search & Discovery
- [ ] Full-text search across titles, descriptions, tags
- [ ] Natural language search ("action movies from the 90s", "videos of kids at the beach")
- [ ] Filter by: genre, year, rating, tags, classification confidence
### 1.7 API
- [ ] REST API for all library operations
- [ ] Webhook/event system for processing status updates
- [ ] API key authentication
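API key authentication can be sketched as HTTP middleware. The `X-API-Key` header name and the helper names are assumptions; the requirements only state that API keys are used.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireAPIKey rejects requests whose X-API-Key header does not match
// validKey. An empty validKey rejects everything, so a missing config
// value cannot silently disable authentication.
func requireAPIKey(validKey string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-API-Key")
		// Constant-time comparison avoids leaking key contents via timing.
		match := subtle.ConstantTimeCompare([]byte(got), []byte(validKey)) == 1
		if validKey == "" || !match {
			http.Error(w, "invalid API key", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one request with the presented key through the middleware
// and returns the resulting HTTP status code.
func statusFor(configured, presented string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/api/library", nil)
	if presented != "" {
		req.Header.Set("X-API-Key", presented)
	}
	rec := httptest.NewRecorder()
	requireAPIKey(configured, ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("secret", "secret"), statusFor("secret", "wrong"))
}
```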
---
## 2. Non-Functional Requirements
### 2.1 Performance
- Processing pipeline must not block streaming; run as background workers
- Playback start latency < 2 s for direct play; < 5 s for transcoded streams
- Support concurrent streams: minimum 2 simultaneous (hardware-dependent)
### 2.2 Storage
- Metadata and index stored locally (no mandatory cloud dependency)
- Sidecar files (.nfo, .json) stored alongside media
- SQLite for metadata DB (upgradeable to PostgreSQL via config)
### 2.3 Privacy
- All LLM inference can run fully local (no data leaves the machine)
- Cloud LLM calls are opt-in and clearly logged
- No telemetry by default
### 2.4 Reliability
- Crash recovery: resume interrupted processing jobs on restart
- Idempotent processing: re-indexing a file does not duplicate metadata
- Graceful degradation: server remains operational if LLM provider is unavailable
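Crash recovery and idempotent processing can both hang off a job table keyed by file fingerprint. The sketch below uses an in-memory map as a stand-in for a SQLite table; `JobStore`, its methods, and the three status values are hypothetical.

```go
package main

import "fmt"

// Status is the processing state of one media file.
type Status string

const (
	Pending Status = "pending"
	Running Status = "running"
	Done    Status = "done"
)

// JobStore tracks one job per file fingerprint.
// A real implementation would persist this in the metadata database.
type JobStore struct {
	jobs map[string]Status
}

func NewJobStore() *JobStore {
	return &JobStore{jobs: make(map[string]Status)}
}

// Enqueue registers a fingerprint once. Re-indexing the same file is a
// no-op and returns false, which keeps processing idempotent.
func (s *JobStore) Enqueue(fp string) bool {
	if _, known := s.jobs[fp]; known {
		return false
	}
	s.jobs[fp] = Pending
	return true
}

// Set updates the status of a job.
func (s *JobStore) Set(fp string, st Status) { s.jobs[fp] = st }

// Get returns the status of a job, or "" if the fingerprint is unknown.
func (s *JobStore) Get(fp string) Status { return s.jobs[fp] }

// Recover is meant to run at startup: jobs left in Running were interrupted
// by a crash, so they go back to Pending. It returns how many were reset.
func (s *JobStore) Recover() int {
	n := 0
	for fp, st := range s.jobs {
		if st == Running {
			s.jobs[fp] = Pending
			n++
		}
	}
	return n
}

func main() {
	store := NewJobStore()
	store.Enqueue("abc")
	store.Set("abc", Running)
	fmt.Println(store.Recover(), store.Get("abc"))
}
```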
### 2.5 Portability
- Docker image with bundled ffmpeg
- Single binary for bare-metal deployment (Go preferred, since it compiles to one self-contained executable)
- Config via TOML file + environment variable overrides
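Environment variable overrides can be applied on top of whatever the TOML file produced. The sketch below skips TOML parsing entirely and shows only the override step; the `Config` fields and the `MEDIASERVER_*` variable names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// Config holds a couple of illustrative settings.
type Config struct {
	ListenAddr string
	LibraryDir string
}

// applyEnv returns cfg with any set environment variables applied on top.
// lookup has the same signature as os.LookupEnv so tests can substitute it.
func applyEnv(cfg Config, lookup func(string) (string, bool)) Config {
	if v, ok := lookup("MEDIASERVER_LISTEN_ADDR"); ok {
		cfg.ListenAddr = v
	}
	if v, ok := lookup("MEDIASERVER_LIBRARY_DIR"); ok {
		cfg.LibraryDir = v
	}
	return cfg
}

func main() {
	// Stand-in for values that would be parsed from the TOML file.
	fileCfg := Config{ListenAddr: ":8080", LibraryDir: "/media"}
	fmt.Printf("%+v\n", applyEnv(fileCfg, os.LookupEnv))
}
```

Passing the lookup function in keeps the override logic testable without mutating the process environment.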
---
## 3. Architecture Overview
```
┌─────────────────────────────────────────────────────────┐
│                    Media Server Core                    │
│                                                         │
│  ┌──────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ Watcher  │→ │   Ingest     │→ │   Processing     │   │
│  │ (inotify)│  │   Pipeline   │  │  Queue (workers) │   │
│  └──────────┘  └──────────────┘  └────────┬─────────┘   │
│                                           │             │
│              ┌────────────────────────────┤             │
│              ▼                            ▼             │
│       ┌──────────────┐             ┌──────────────┐     │
│       │   Metadata   │             │Classification│     │
│       │    Worker    │             │    Worker    │     │
│       └──────┬───────┘             └──────┬───────┘     │
│              └─────────────┬──────────────┘             │
│                            │                            │
│                     ┌──────▼──────┐                     │
│                     │ LLM Router  │                     │
│                     └──┬──────┬───┘                     │
│                        │      │                         │
│                ┌───────┘      └──────────┐              │
│           ┌────▼─────┐           ┌───────▼──────┐       │
│           │  Ollama  │           │  Claude/OAI  │       │
│           │ (local)  │           │   (cloud)    │       │
│           └──────────┘           └──────────────┘       │
│                                                         │
│  ┌──────────────┐  ┌─────────────┐  ┌───────────────┐   │
│  │  SQLite DB   │  │  HTTP API   │  │    Web UI     │   │
│  │  (metadata)  │  │   (REST)    │  │   (player)    │   │
│  └──────────────┘  └─────────────┘  └───────────────┘   │
└─────────────────────────────────────────────────────────┘
```
---
## 4. Tech Stack Decisions
| Concern | Choice | Rationale |
|----------------------|---------------------|--------------------------------------------------|
| Backend language     | Go                  | Single binary, strong HTTP and concurrency support; cgo bindings to ffmpeg are optional |
| Database | SQLite (sqlx) | Zero-config, embeddable, enough for single-user |
| Media processing | ffmpeg (subprocess) | Industry standard, broad format support |
| LLM (local) | Ollama REST API | Simple HTTP interface, model management built-in |
| LLM (cloud) | Anthropic SDK + OpenAI SDK | Dual-provider via abstraction layer |
| Containerization | Docker + Compose | Multi-service: server + ollama + optional GPU |
| Config format | TOML | Human-friendly, Go ecosystem support (viper) |
| Web UI | HTMX + Tailwind | No JS framework needed, Go template rendering |
> **Rust alternative**: Rust is viable if performance is critical (transcoding pipeline), but Go is recommended for faster initial development and simpler deployment.
---
## 5. MVP Scope (Phase 1)
Goal: Working library scanner + LLM tagging + basic web UI + streaming
- [ ] Directory watcher + file ingestion
- [ ] Movie/TV classification (filename heuristics + LLM disambiguation)
- [ ] Metadata fetch from TMDB API (primary); LLM fills unmatched/missing fields only
- [ ] Thumbnail extraction
- [ ] SQLite metadata store
- [ ] REST API: list library, get item, trigger re-scan
- [ ] Basic web UI: grid view + video player
- [ ] Direct-play HTTP streaming
- [ ] Ollama integration (local LLM)
- [ ] Docker Compose setup
---
## 6. Phase 2 (Post-MVP)
- [ ] HLS transcoding
- [ ] Claude API / OpenAI API integration + provider router
- [ ] Natural language search
- [ ] Enrichment from additional external metadata sources (TVDB, Trakt)
- [ ] Multi-user support with watch history
---
## 7. Phase 3 (Future)
- [ ] Mobile-friendly UI / PWA
- [ ] GPU-accelerated transcoding
- [ ] Home video scene detection + auto-chapter marking
- [ ] Face recognition for home video tagging
- [ ] Collections and playlists
- [ ] Client app support via Jellyfin protocol compatibility
---
## 8. Open Questions
- [ ] Should the server expose a Jellyfin-compatible API to leverage existing clients (Infuse, Swiftfin)?
- [ ] GPU passthrough in Docker for local LLM acceleration — required or optional?
- [ ] Should home video tagging use vision models (LLaVA/Claude vision) for frame analysis?
- [x] TMDB/TVDB integration: **decided** — TMDB is the primary metadata source; LLM fills gaps only
- [ ] Multi-user: single-user MVP acceptable, or needed from day one?