
csf123321 d185fccd46 Initial requirements documentation: English and Chinese versions covering media ingestion, LLM classification, TMDB-first metadata, streaming, search, and API specs

2026-05-11 12:59:26 +08:00


LLM Base Media Server — Requirements

Overview

A self-hosted video media server that uses LLMs to automatically index, tag, classify, and generate metadata/descriptions/subtitles for a personal video library (movies, TV shows, home videos). Inspired by Plex/Jellyfin but with deep LLM integration. Deployable via Docker or bare-metal.

1. Functional Requirements

1.1 Media Ingestion & Library Management

Watch configured directories for new media files
Support formats: MKV, MP4, AVI, MOV, WEBM, TS
Fingerprint media files to detect duplicates
Organize library by: Movies / TV Shows / Home Videos
Store library state in a local embedded database (SQLite or similar)
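A minimal sketch of content fingerprinting for duplicate detection, assuming a full-file SHA-256 digest is acceptable. The document does not specify the fingerprinting method, so the hash choice and the names Fingerprint, FingerprintBytes, and DuplicateIndex are illustrative assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// Fingerprint streams r through SHA-256 and returns the hex digest.
// Streaming keeps memory use flat even for very large video files.
func Fingerprint(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// FingerprintBytes is a convenience wrapper for in-memory data.
func FingerprintBytes(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// DuplicateIndex maps a fingerprint to the first path seen with that content.
type DuplicateIndex map[string]string

// Add records path under fp. If fp is already known, it returns the
// earlier path and true, and leaves the index unchanged.
func (d DuplicateIndex) Add(fp, path string) (string, bool) {
	if prev, ok := d[fp]; ok {
		return prev, true
	}
	d[fp] = path
	return "", false
}

func main() {
	idx := DuplicateIndex{}
	// Hypothetical paths; strings stand in for file contents.
	files := []struct{ path, content string }{
		{"movies/a.mkv", "same bytes"},
		{"movies/b.mkv", "other bytes"},
		{"incoming/a-copy.mkv", "same bytes"},
	}
	for _, f := range files {
		fp, err := Fingerprint(strings.NewReader(f.content))
		if err != nil {
			fmt.Println("fingerprint failed:", err)
			continue
		}
		if prev, dup := idx.Add(fp, f.path); dup {
			fmt.Printf("%s duplicates %s\n", f.path, prev)
		}
	}
}
```

For real files, the reader would come from os.Open. Hashing whole files is slow on large libraries, so hashing only sampled chunks plus the file size is a possible trade-off.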

1.2 LLM-Powered Auto-Classification

Identify content type (movie, TV episode, home video) from filename + video analysis
Match movies/TV shows against known titles (local heuristics + LLM reasoning)
Extract season/episode numbers for TV shows
Tag home videos with inferred subjects, locations, events (via frame analysis + LLM)
Classify content genre, mood, rating (family-safe, etc.)
Attach a confidence score to every LLM-generated tag; flag low-confidence tags for manual review
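The filename heuristics and the review flag above could look roughly like this. It is a sketch under assumptions: only the common "S01E02" naming convention is handled, and the 0.6 review threshold and the Tag type are hypothetical, not values taken from this document.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// episodePattern matches the common "S01E02" convention, case-insensitively.
var episodePattern = regexp.MustCompile(`(?i)s(\d{1,2})e(\d{1,3})`)

// ParseEpisode extracts season and episode numbers from a filename.
// ok is false when the name does not follow the S01E02 convention,
// in which case a caller might hand the name to the LLM instead.
func ParseEpisode(name string) (season, episode int, ok bool) {
	m := episodePattern.FindStringSubmatch(name)
	if m == nil {
		return 0, 0, false
	}
	// The capture groups contain digits only, so Atoi cannot fail here.
	season, _ = strconv.Atoi(m[1])
	episode, _ = strconv.Atoi(m[2])
	return season, episode, true
}

// Tag is a hypothetical LLM-produced label with a confidence in [0, 1].
type Tag struct {
	Label      string
	Confidence float64
}

// reviewThreshold is an assumed cut-off, not a value from the requirements.
const reviewThreshold = 0.6

// NeedsReview reports whether a tag should be queued for manual review.
func NeedsReview(t Tag) bool {
	return t.Confidence < reviewThreshold
}

func main() {
	for _, name := range []string{"Show.Name.S02E07.1080p.mkv", "Movie.1999.mkv"} {
		if s, e, ok := ParseEpisode(name); ok {
			fmt.Printf("%s: season %d, episode %d\n", name, s, e)
		} else {
			fmt.Printf("%s: no episode marker found\n", name)
		}
	}
	fmt.Println(NeedsReview(Tag{Label: "beach", Confidence: 0.4}))
}
```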

1.3 Metadata Generation

Metadata priority order: TMDB/IMDB first → LLM as fallback/supplement.

Primary: External Data Sources

Fetch structured metadata from TMDB (movies, TV shows): title, year, director, cast, genre, runtime, language, poster, backdrop
Fetch IMDb IDs via the IMDb ID that TMDB exposes, and use them to look up IMDb ratings (TMDB itself returns only its own vote average, not IMDb ratings)
Fetch TV episode details from TMDB: season/episode synopsis, air date, guest cast
Cache API responses locally to avoid redundant external calls
Store source attribution (e.g., metadata_source: tmdb) per item

Fallback/Supplement: LLM-Generated

LLM metadata is used only when the external source returns no match or partial data:

Generate description/summary for unmatched titles (home videos, obscure content)
Fill missing fields that TMDB/IMDB did not return
Tag and describe home videos (no external source exists for these)
Mark all LLM-generated fields with a llm_generated: true flag for transparency
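One way to express the priority order is a merge in which external values always win and LLM values only fill empty or missing fields. This is a sketch: the Field type and MergeMetadata function are hypothetical, and flat string maps stand in for the real metadata model.

```go
package main

import (
	"fmt"
	"sort"
)

// Field carries a value together with its provenance.
type Field struct {
	Value        string
	Source       string // "tmdb" or "llm"
	LLMGenerated bool
}

// MergeMetadata keeps every non-empty external value and uses LLM values
// only for fields the external source left empty or did not return.
func MergeMetadata(external, llm map[string]string) map[string]Field {
	out := make(map[string]Field, len(external)+len(llm))
	for k, v := range external {
		if v != "" {
			out[k] = Field{Value: v, Source: "tmdb"}
		}
	}
	for k, v := range llm {
		if _, taken := out[k]; !taken && v != "" {
			out[k] = Field{Value: v, Source: "llm", LLMGenerated: true}
		}
	}
	return out
}

func main() {
	merged := MergeMetadata(
		map[string]string{"title": "Example Movie", "year": "1999", "overview": ""},
		map[string]string{"title": "Exmple Movi", "overview": "An LLM-written summary."},
	)
	// Sort keys so the output order is stable.
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		f := merged[k]
		fmt.Printf("%s=%q source=%s llm_generated=%t\n", k, f.Value, f.Source, f.LLMGenerated)
	}
}
```

Here the LLM's misspelled title is discarded because TMDB supplied one, while the empty overview is filled by the LLM and flagged accordingly.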

General

Embed/store final metadata in sidecar files (NFO/JSON) alongside originals
Thumbnail/poster: use TMDB poster if available, else extract keyframe via ffmpeg
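The poster rule could be sketched as follows, assuming ffmpeg is run as a subprocess. The function names and the 10-second offset are illustrative. The arguments grab a single frame at a fixed offset, which is not guaranteed to be a keyframe, so this is a simplification of the requirement.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// PosterSource applies the rule above: prefer the TMDB poster,
// otherwise fall back to a frame extracted with ffmpeg.
func PosterSource(tmdbPosterURL string) string {
	if tmdbPosterURL != "" {
		return "tmdb"
	}
	return "ffmpeg"
}

// ThumbnailArgs builds ffmpeg arguments that write one JPEG frame taken
// offsetSec seconds into the input. Placing -ss before -i makes ffmpeg
// seek in the input rather than decode from the start.
func ThumbnailArgs(input, output string, offsetSec int) []string {
	return []string{
		"-y", // overwrite an existing thumbnail
		"-ss", strconv.Itoa(offsetSec),
		"-i", input,
		"-frames:v", "1", // stop after one video frame
		"-q:v", "2", // high JPEG quality
		output,
	}
}

func main() {
	if PosterSource("") == "ffmpeg" {
		// Hypothetical paths for illustration.
		args := ThumbnailArgs("/media/home/beach.mp4", "/media/home/beach.jpg", 10)
		fmt.Println("ffmpeg " + strings.Join(args, " "))
	}
}
```

The slice can be passed to exec.Command("ffmpeg", args...) from os/exec; the sketch only prints the command so it runs without ffmpeg installed.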

1.4 LLM Provider Abstraction

Support local LLMs via Ollama (llama.cpp-compatible)
Support Claude API (Anthropic) for cloud inference
Support OpenAI-compatible APIs
Provider selection per task type (e.g., local for tagging, cloud for summaries)
Graceful fallback: cloud → local if cloud unavailable
Token/cost tracking for cloud providers
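A possible shape for the abstraction is a small interface plus a router that holds an ordered provider chain per task type and falls through on error. All names here (Provider, Router, stubProvider, the task names) are hypothetical. The stubs stand in for real Ollama, Anthropic, or OpenAI clients, and a real interface would likely also accept a context.Context for cancellation.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is the minimal surface a backend has to implement.
type Provider interface {
	Name() string
	Complete(prompt string) (string, error)
}

// stubProvider simulates a backend that is either up or down.
type stubProvider struct {
	name string
	down bool
}

func (s stubProvider) Name() string { return s.name }

func (s stubProvider) Complete(prompt string) (string, error) {
	if s.down {
		return "", errors.New(s.name + " unavailable")
	}
	return "[" + s.name + "] " + prompt, nil
}

// Router maps a task type to an ordered chain of providers.
type Router struct {
	routes map[string][]Provider
}

// Complete tries each provider configured for task in order and returns
// the first successful answer together with the provider that produced it.
func (r Router) Complete(task, prompt string) (text, provider string, err error) {
	chain := r.routes[task]
	if len(chain) == 0 {
		return "", "", fmt.Errorf("no provider configured for task %q", task)
	}
	for _, p := range chain {
		out, e := p.Complete(prompt)
		if e == nil {
			return out, p.Name(), nil
		}
		err = e // remember the most recent failure
	}
	return "", "", fmt.Errorf("all providers failed for task %q: %w", task, err)
}

func main() {
	r := Router{routes: map[string][]Provider{
		"tagging": {stubProvider{name: "ollama"}},
		"summary": {stubProvider{name: "claude", down: true}, stubProvider{name: "ollama"}},
	}}
	text, used, err := r.Complete("summary", "Summarize this video.")
	fmt.Println(text, used, err)
}
```

In the usage above the cloud provider is down, so the summary task falls back to the local one, matching the cloud-to-local fallback requirement.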

1.5 Streaming & Playback

HTTP streaming with range request support
HLS adaptive bitrate transcoding (via ffmpeg)
Direct play for supported client formats
Basic web UI for browsing and playback
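For direct play, Go's standard library already handles range requests through http.ServeContent. The sketch below serves an in-memory byte slice so it can run without a media file; a real handler would pass an opened *os.File instead. ServeMedia and rangeRequest are illustrative names.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// ServeMedia returns a handler that serves data with Range support.
// http.ServeContent parses the Range header and answers with
// 206 Partial Content and a Content-Range header when appropriate.
func ServeMedia(name string, data []byte) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// A zero modification time makes ServeContent omit Last-Modified.
		http.ServeContent(w, r, name, time.Time{}, bytes.NewReader(data))
	})
}

// rangeRequest issues an in-process GET with an optional Range header and
// returns the status code, body, and Content-Range header of the response.
func rangeRequest(h http.Handler, rangeHeader string) (int, string, string) {
	req := httptest.NewRequest(http.MethodGet, "/stream", nil)
	if rangeHeader != "" {
		req.Header.Set("Range", rangeHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String(), rec.Header().Get("Content-Range")
}

func main() {
	h := ServeMedia("clip.mp4", []byte("0123456789"))
	status, body, cr := rangeRequest(h, "bytes=2-5")
	fmt.Println(status, body, cr) // prints: 206 2345 bytes 2-5/10
}
```

Players seek by requesting byte ranges, which is why range support is listed alongside direct play.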

1.6 Search & Discovery

Full-text search across titles, descriptions, tags
Natural language search ("action movies from the 90s", "videos of kids at the beach")
Filter by: genre, year, rating, tags, classification confidence
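Structured filtering might be modeled as below. The sketch assumes that a natural language query such as "action movies from the 90s" is first translated, for example by the LLM, into a Filter value; that translation step is not shown. Item, Filter, and Search are hypothetical names, and a linear scan stands in for a real full-text index.

```go
package main

import (
	"fmt"
	"strings"
)

// Item is a simplified library entry.
type Item struct {
	Title      string
	Year       int
	Genre      string
	Tags       []string
	Confidence float64 // classification confidence in [0, 1]
}

// Filter holds optional criteria; zero values mean "no constraint".
type Filter struct {
	Text             string
	Genre            string
	YearFrom, YearTo int
	MinConfidence    float64
}

// Matches reports whether it satisfies every constraint set in f.
func (f Filter) Matches(it Item) bool {
	if f.Genre != "" && !strings.EqualFold(f.Genre, it.Genre) {
		return false
	}
	if f.YearFrom != 0 && it.Year < f.YearFrom {
		return false
	}
	if f.YearTo != 0 && it.Year > f.YearTo {
		return false
	}
	if it.Confidence < f.MinConfidence {
		return false
	}
	if f.Text == "" {
		return true
	}
	// Case-insensitive substring match over title and tags.
	needle := strings.ToLower(f.Text)
	if strings.Contains(strings.ToLower(it.Title), needle) {
		return true
	}
	for _, t := range it.Tags {
		if strings.Contains(strings.ToLower(t), needle) {
			return true
		}
	}
	return false
}

// Search returns the items that match f, in their original order.
func Search(items []Item, f Filter) []Item {
	var out []Item
	for _, it := range items {
		if f.Matches(it) {
			out = append(out, it)
		}
	}
	return out
}

func main() {
	library := []Item{
		{Title: "Example Action Film", Year: 1995, Genre: "Action", Confidence: 0.9},
		{Title: "Later Action Film", Year: 2008, Genre: "Action", Confidence: 0.9},
		{Title: "Summer 2019", Year: 2019, Genre: "Home Video", Tags: []string{"kids", "beach"}, Confidence: 0.7},
	}
	for _, it := range Search(library, Filter{Genre: "action", YearFrom: 1990, YearTo: 1999}) {
		fmt.Println(it.Title)
	}
	for _, it := range Search(library, Filter{Text: "beach"}) {
		fmt.Println(it.Title)
	}
}
```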

1.7 API

REST API for all library operations
Webhook/event system for processing status updates
API key authentication
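API key authentication can be sketched as HTTP middleware. The X-API-Key header name, the /api/library path, and the function names are assumptions for illustration; the document does not fix how the key is transmitted.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// RequireAPIKey rejects requests whose X-API-Key header does not match want.
// The comparison is constant-time to avoid leaking key contents through
// timing, and an empty configured key rejects everything.
func RequireAPIKey(want string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-API-Key")
		ok := want != "" && subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
		if !ok {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// listLibrary is a placeholder endpoint returning an empty library.
func listLibrary(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	fmt.Fprintln(w, `{"items":[]}`)
}

// demoHandler wires the placeholder endpoint behind the key "secret".
func demoHandler() http.Handler {
	return RequireAPIKey("secret", http.HandlerFunc(listLibrary))
}

// statusFor sends an in-process request with the given key and
// returns the resulting HTTP status code.
func statusFor(h http.Handler, apiKey string) int {
	req := httptest.NewRequest(http.MethodGet, "/api/library", nil)
	if apiKey != "" {
		req.Header.Set("X-API-Key", apiKey)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := demoHandler()
	fmt.Println(statusFor(h, "secret"), statusFor(h, "wrong"), statusFor(h, ""))
	// prints: 200 401 401
}
```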

2. Non-Functional Requirements

2.1 Performance

Processing pipeline must not block streaming; run as background workers
Playback start latency (time from request to first frame): < 2s for direct play; < 5s for transcoded streams
Support concurrent streams: minimum 2 simultaneous (hardware-dependent)
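Keeping the processing pipeline off the request path usually means a bounded pool of background goroutines. A minimal sketch follows, with ProcessAll as a hypothetical helper; a real server would feed jobs continuously from a persistent queue rather than from a fixed slice.

```go
package main

import (
	"fmt"
	"sync"
)

// ProcessAll runs handle over paths using at most `workers` goroutines
// and returns the results in input order. Bounding the pool leaves CPU
// headroom for streaming while files are being analyzed.
func ProcessAll(paths []string, workers int, handle func(string) string) []string {
	if workers < 1 {
		workers = 1
	}
	results := make([]string, len(paths))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one goroutine,
				// so no extra locking is needed.
				results[i] = handle(paths[i])
			}
		}()
	}

	for i := range paths {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	paths := []string{"a.mkv", "b.mp4", "c.mov"}
	out := ProcessAll(paths, 2, func(p string) string {
		return "indexed " + p
	})
	fmt.Println(out)
}
```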

2.2 Storage

Metadata and index stored locally (no mandatory cloud dependency)
Sidecar files (.nfo, .json) stored alongside media
SQLite for metadata DB (upgradeable to PostgreSQL via config)

2.3 Privacy

All LLM inference can run fully locally (no data leaves the machine)
Cloud LLM calls are opt-in and clearly logged
No telemetry by default

2.4 Reliability

Crash recovery: resume interrupted processing jobs on restart
Idempotent processing: re-indexing a file does not duplicate metadata
Graceful degradation: server remains operational if LLM provider is unavailable
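Crash recovery and idempotent processing can both be built on a job table keyed by file fingerprint. The sketch below keeps the table in memory for brevity; in the actual server it would presumably live in SQLite so that it survives restarts. JobStore and its methods are hypothetical names.

```go
package main

import "fmt"

// State is the lifecycle stage of a processing job.
type State string

const (
	Pending State = "pending"
	Running State = "running"
	Done    State = "done"
)

// JobStore tracks one job per file fingerprint.
type JobStore struct {
	jobs map[string]State
}

// NewJobStore returns an empty store.
func NewJobStore() *JobStore {
	return &JobStore{jobs: map[string]State{}}
}

// Enqueue adds a pending job for fp and reports whether it was new.
// Enqueuing a known fingerprint is a no-op, which keeps re-indexing idempotent.
func (s *JobStore) Enqueue(fp string) bool {
	if _, exists := s.jobs[fp]; exists {
		return false
	}
	s.jobs[fp] = Pending
	return true
}

// Set moves the job for fp to the given state.
func (s *JobStore) Set(fp string, st State) { s.jobs[fp] = st }

// Get returns the current state for fp, or "" if fp is unknown.
func (s *JobStore) Get(fp string) State { return s.jobs[fp] }

// Recover is meant to run at startup: jobs left in Running were
// interrupted by a crash, so they are returned to Pending.
// It reports how many jobs were reset.
func (s *JobStore) Recover() int {
	n := 0
	for fp, st := range s.jobs {
		if st == Running {
			s.jobs[fp] = Pending
			n++
		}
	}
	return n
}

func main() {
	store := NewJobStore()
	store.Enqueue("fp-a")
	store.Enqueue("fp-b")
	fmt.Println("re-enqueue is new:", store.Enqueue("fp-a"))

	store.Set("fp-a", Done)
	store.Set("fp-b", Running) // simulate a crash while fp-b is in progress

	fmt.Println("jobs resumed:", store.Recover())
	fmt.Println("fp-a:", store.Get("fp-a"), "fp-b:", store.Get("fp-b"))
}
```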

2.5 Portability

Docker image with bundled ffmpeg
Single binary for bare-metal deployment (Go preferred for this)
Config via TOML file + environment variable overrides
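Environment overrides can be layered on top of whatever the TOML file produced. In this sketch the Config fields and the MEDIASERVER_* variable names are hypothetical, and TOML parsing itself is omitted because it needs a third-party library. The lookup function is injected so the logic can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds a few illustrative settings.
type Config struct {
	ListenAddr string
	MediaDir   string
	Workers    int
}

// ApplyEnv returns cfg with any set environment variables applied on top.
// lookup has the same signature as os.LookupEnv.
func ApplyEnv(cfg Config, lookup func(string) (string, bool)) (Config, error) {
	if v, ok := lookup("MEDIASERVER_LISTEN_ADDR"); ok {
		cfg.ListenAddr = v
	}
	if v, ok := lookup("MEDIASERVER_MEDIA_DIR"); ok {
		cfg.MediaDir = v
	}
	if v, ok := lookup("MEDIASERVER_WORKERS"); ok {
		n, err := strconv.Atoi(v)
		if err != nil {
			return cfg, fmt.Errorf("MEDIASERVER_WORKERS: %w", err)
		}
		cfg.Workers = n
	}
	return cfg, nil
}

func main() {
	// Stand-in for values that would be loaded from the TOML file.
	fromFile := Config{ListenAddr: ":8080", MediaDir: "/media", Workers: 2}

	cfg, err := ApplyEnv(fromFile, os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Libraries such as viper can bind environment variables automatically; the manual version above only makes the precedence explicit.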

3. Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                     Media Server Core                    │
│                                                         │
│  ┌──────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │  Watcher │→ │  Ingest      │→ │  Processing      │  │
│  │ (inotify)│  │  Pipeline    │  │  Queue (workers) │  │
│  └──────────┘  └──────────────┘  └────────┬─────────┘  │
│                                           │             │
│              ┌────────────────────────────┤             │
│              ▼            ▼               ▼             │
│            ┌──────────────┐  ┌──────────────┐          │
│            │ Metadata     │  │Classification│          │
│            │ Worker       │  │Worker        │          │
│            └──────┬───────┘  └──────┬───────┘          │
│                   └─────────────────┘                   │
│                          │                              │
│                   ┌──────▼──────┐                       │
│                   │ LLM Router  │                        │
│                   └──┬──────┬───┘                       │
│                      │      │                           │
│              ┌───────┘      └──────────┐                │
│         ┌────▼─────┐           ┌───────▼──────┐         │
│         │  Ollama  │           │  Claude/OAI  │         │
│         │ (local)  │           │  (cloud)     │         │
│         └──────────┘           └──────────────┘         │
│                                                         │
│  ┌──────────────┐  ┌─────────────┐  ┌───────────────┐  │
│  │  SQLite DB   │  │  HTTP API   │  │  Web UI       │  │
│  │  (metadata)  │  │  (REST)     │  │  (player)     │  │
│  └──────────────┘  └─────────────┘  └───────────────┘  │
└─────────────────────────────────────────────────────────┘

4. Tech Stack Decisions

Concern	Choice	Rationale
Backend language	Go	Single binary, excellent HTTP/concurrency support, cgo bindings to ffmpeg optional
Database	SQLite (sqlx)	Zero-config, embeddable, enough for single-user
Media processing	ffmpeg (subprocess)	Industry standard, broad format support
LLM (local)	Ollama REST API	Simple HTTP interface, model management built-in
LLM (cloud)	Anthropic SDK + OpenAI SDK	Dual-provider via abstraction layer
Containerization	Docker + Compose	Multi-service: server + ollama + optional GPU
Config format	TOML	Human-friendly, Go ecosystem support (viper)
Web UI	HTMX + Tailwind	No JS framework needed, Go template rendering

Rust alternative: Rust is viable if performance is critical (transcoding pipeline), but Go is recommended for faster initial development and simpler deployment.

5. MVP Scope (Phase 1)

Goal: Working library scanner + LLM tagging + basic web UI + streaming

Directory watcher + file ingestion
Movie/TV classification (filename heuristics + LLM disambiguation)
Metadata fetch from TMDB API (primary); LLM fills unmatched/missing fields only
Thumbnail extraction
SQLite metadata store
REST API: list library, get item, trigger re-scan
Basic web UI: grid view + video player
Direct-play HTTP streaming
Ollama integration (local LLM)
Docker Compose setup

6. Phase 2 (Post-MVP)

HLS transcoding
Claude API / OpenAI API integration + provider router
Natural language search
Enrichment from additional external metadata sources (TVDB, Trakt)
Multi-user support with watch history

7. Phase 3 (Future)

Mobile-friendly UI / PWA
GPU-accelerated transcoding
Home video scene detection + auto-chapter marking
Face recognition for home video tagging
Collections and playlists
Client apps (Jellyfin protocol compatibility)

8. Open Questions

Should the server expose a Jellyfin-compatible API to leverage existing clients (Infuse, Swiftfin)?
GPU passthrough in Docker for local LLM acceleration — required or optional?
Should home video tagging use vision models (LLaVA/Claude vision) for frame analysis?
TMDB/TVDB integration (resolved): TMDB is the primary metadata source; the LLM fills gaps only
Multi-user: single-user MVP acceptable, or needed from day one?
