# LLM-Based Media Server — Requirements

## Overview

A self-hosted video media server that uses LLMs to automatically index, tag, classify, and generate metadata/descriptions/subtitles for a personal video library (movies, TV shows, home videos). Inspired by Plex/Jellyfin but with deep LLM integration. Deployable via Docker or on bare metal.

---

## 1. Functional Requirements

### 1.1 Media Ingestion & Library Management
- [ ] Watch configured directories for new media files
- [ ] Support formats: MKV, MP4, AVI, MOV, WEBM, TS
- [ ] Fingerprint media files to detect duplicates
- [ ] Organize library by: Movies / TV Shows / Home Videos
- [ ] Store library state in a local embedded database (SQLite by default)
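
The fingerprinting requirement above could be sketched as follows. This is a minimal illustration, assuming a full-content SHA-256 digest is an acceptable fingerprint; the names `fingerprintFile` and `groupDuplicates` are hypothetical, and a real implementation might hash only selected chunks of large files.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"io"
	"os"
	"sort"
)

// fingerprintFile returns the hex-encoded SHA-256 digest of a file's contents.
// Assumption: hashing the whole file is acceptable; for very large videos a
// partial hash (size plus a few chunks) may be a better trade-off.
func fingerprintFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// groupDuplicates takes a path -> fingerprint map and returns, for every
// fingerprint shared by two or more files, the sorted list of those paths.
func groupDuplicates(fingerprints map[string]string) map[string][]string {
	byPrint := make(map[string][]string)
	for path, fp := range fingerprints {
		byPrint[fp] = append(byPrint[fp], path)
	}
	for fp, paths := range byPrint {
		if len(paths) < 2 {
			delete(byPrint, fp) // unique file, not a duplicate group
			continue
		}
		sort.Strings(paths)
	}
	return byPrint
}
```

Storing the fingerprint as a unique key in the library database would let the ingest step reject or link duplicates before any LLM work is queued.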

### 1.2 LLM-Powered Auto-Classification
- [ ] Identify content type (movie, TV episode, home video) from filename + video analysis
- [ ] Match movies/TV shows against known titles (local heuristics + LLM reasoning)
- [ ] Extract season/episode numbers for TV shows
- [ ] Tag home videos with inferred subjects, locations, events (via frame analysis + LLM)
- [ ] Classify content genre, mood, rating (family-safe, etc.)
- [ ] Assign a confidence score to every LLM-generated tag; flag low-confidence tags for manual review
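
As an illustration of the filename side of season/episode extraction, the sketch below recognizes two common naming patterns (`S02E05` and `3x07`). The function name `parseEpisode` and the exact patterns are assumptions; files that match neither pattern would be handed to the LLM step.

```go
package main

import (
	"regexp"
	"strconv"
)

// episodePatterns are tried in order; each has two capture groups:
// season and episode. The patterns are illustrative, not exhaustive.
var episodePatterns = []*regexp.Regexp{
	// "S02E05", "s2.e5", "S02_E05"
	regexp.MustCompile(`(?i)(?:^|[^a-z0-9])s(\d{1,2})[ ._-]?e(\d{1,3})`),
	// "3x07" (delimited on both sides so that "1920x1080" does not match)
	regexp.MustCompile(`(?i)(?:^|[^a-z0-9])(\d{1,2})x(\d{2,3})(?:$|[^a-z0-9])`),
}

// parseEpisode extracts season and episode numbers from a filename.
// ok is false when no known pattern matches.
func parseEpisode(filename string) (season, episode int, ok bool) {
	for _, re := range episodePatterns {
		m := re.FindStringSubmatch(filename)
		if m == nil {
			continue
		}
		// The capture groups contain only digits, so Atoi cannot fail here.
		season, _ = strconv.Atoi(m[1])
		episode, _ = strconv.Atoi(m[2])
		return season, episode, true
	}
	return 0, 0, false
}
```

A heuristic match like this could be stored with a high confidence score, while LLM-inferred numbers would carry the model's own lower confidence.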

### 1.3 Metadata Generation

Metadata priority order: **TMDB/IMDB first → LLM as fallback/supplement**.

#### Primary: External Data Sources
- [ ] Fetch structured metadata from TMDB (movies, TV shows): title, year, director, cast, genre, runtime, language, poster, backdrop
- [ ] Look up IMDB ratings using the IMDB ID that TMDB returns for each title (its `imdb_id` field)
- [ ] Fetch TV episode details from TMDB: season/episode synopsis, air date, guest cast
- [ ] Cache API responses locally to avoid redundant external calls
- [ ] Store source attribution (e.g., `metadata_source: tmdb`) per item

#### Fallback/Supplement: LLM-Generated
LLM metadata is used **only when** the external source returns no match or partial data:
- [ ] Generate description/summary for unmatched titles (home videos, obscure content)
- [ ] Fill missing fields that TMDB/IMDB did not return
- [ ] Tag and describe home videos (no external source exists for these)
- [ ] Mark all LLM-generated fields with an `llm_generated: true` flag for transparency

#### General
- [ ] Store final metadata in sidecar files (NFO/JSON) alongside the original media files
- [ ] Thumbnail/poster: use TMDB poster if available, else extract keyframe via ffmpeg
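
The priority order in this section (external source first, LLM only for gaps) can be expressed as a small merge step. The sketch below is one possible shape, assuming metadata is handled as flat string fields; the `Field` type and `mergeMetadata` function are hypothetical names.

```go
package main

// Field is one metadata value together with its provenance.
type Field struct {
	Value        string
	Source       string // "tmdb" or "llm", mirroring metadata_source
	LLMGenerated bool   // mirrors the llm_generated flag
}

// mergeMetadata prefers non-empty external values and uses LLM values only
// for fields the external source left missing or empty.
func mergeMetadata(external, llm map[string]string) map[string]Field {
	merged := make(map[string]Field, len(external)+len(llm))
	for key, value := range external {
		if value != "" {
			merged[key] = Field{Value: value, Source: "tmdb"}
		}
	}
	for key, value := range llm {
		if _, found := merged[key]; !found && value != "" {
			merged[key] = Field{Value: value, Source: "llm", LLMGenerated: true}
		}
	}
	return merged
}
```

Keeping provenance per field rather than per item means a title matched on TMDB can still carry an LLM-written summary without losing track of which parts are machine-generated.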

### 1.4 LLM Provider Abstraction
- [ ] Support local LLMs via Ollama (llama.cpp-compatible)
- [ ] Support Claude API (Anthropic) for cloud inference
- [ ] Support OpenAI-compatible APIs
- [ ] Provider selection per task type (e.g., local for tagging, cloud for summaries)
- [ ] Graceful fallback: cloud → local if cloud unavailable
- [ ] Token/cost tracking for cloud providers
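
One way to picture the provider abstraction with fallback is an interface plus a router that tries providers in order. Everything named here (`Provider`, `Router`, `staticProvider`, `errUnavailable`) is a hypothetical sketch, not a settled design; real providers would wrap the Ollama, Anthropic, and OpenAI-compatible HTTP APIs.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Provider is a hypothetical minimal interface over an LLM backend.
type Provider interface {
	Name() string
	Complete(ctx context.Context, prompt string) (string, error)
}

// Router tries its providers in order (for example cloud first, then local)
// and returns the first successful reply.
type Router struct {
	Providers []Provider
}

// Complete returns the reply text and the name of the provider that produced
// it. If every provider fails, the last error is returned.
func (r *Router) Complete(ctx context.Context, prompt string) (reply, provider string, err error) {
	err = errors.New("no providers configured")
	for _, p := range r.Providers {
		text, callErr := p.Complete(ctx, prompt)
		if callErr == nil {
			return text, p.Name(), nil
		}
		err = fmt.Errorf("%s: %w", p.Name(), callErr)
	}
	return "", "", err
}

// errUnavailable simulates an unreachable provider in examples.
var errUnavailable = errors.New("provider unavailable")

// staticProvider is a stand-in for a real client; it returns a fixed reply
// or a fixed error.
type staticProvider struct {
	name  string
	reply string
	err   error
}

func (s staticProvider) Name() string { return s.name }

func (s staticProvider) Complete(ctx context.Context, prompt string) (string, error) {
	return s.reply, s.err
}
```

Per-task provider selection could then be a map from task type to a `Router` with a different provider order, and the returned provider name gives a natural hook for token and cost accounting.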

### 1.5 Streaming & Playback
- [ ] HTTP streaming with range request support
- [ ] HLS adaptive bitrate transcoding (via ffmpeg)
- [ ] Direct play for supported client formats
- [ ] Basic web UI for browsing and playback
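
For the HLS requirement, the server would most likely shell out to ffmpeg once per rendition. The sketch below builds the argument list for a single rendition; the function name `hlsArgs`, the 6-second segment length, and the file naming are assumptions, and `libx264` must be available in the ffmpeg build being used.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// hlsArgs builds ffmpeg arguments that transcode input into one HLS
// rendition (H.264 video, AAC audio) written to outDir.
func hlsArgs(input, outDir string, height, videoKbps int) []string {
	return []string{
		"-i", input,
		"-vf", fmt.Sprintf("scale=-2:%d", height), // keep aspect ratio, force even width
		"-c:v", "libx264",
		"-b:v", fmt.Sprintf("%dk", videoKbps),
		"-c:a", "aac",
		"-f", "hls",
		"-hls_time", "6", // target segment duration in seconds (assumed value)
		"-hls_playlist_type", "vod",
		"-hls_segment_filename", filepath.Join(outDir, "seg_%04d.ts"),
		filepath.Join(outDir, "index.m3u8"),
	}
}
```

The slice would typically be passed to `exec.Command("ffmpeg", args...)`. Adaptive bitrate playback additionally needs one such rendition per quality level and a master playlist that references them.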

### 1.6 Search & Discovery
- [ ] Full-text search across titles, descriptions, tags
- [ ] Natural language search ("action movies from the 90s", "videos of kids at the beach")
- [ ] Filter by: genre, year, rating, tags, classification confidence

### 1.7 API
- [ ] REST API for all library operations
- [ ] Webhook/event system for processing status updates
- [ ] API key authentication
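
API key authentication could be implemented as HTTP middleware in front of the REST handlers. The following is a sketch under assumptions: the header name `X-API-Key` and the function names are illustrative, and a single shared key is assumed rather than per-client keys.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"net/http"
)

// validKey compares the presented key with the expected one in constant
// time. Both are hashed first so the comparison always runs over equal-length
// inputs. An empty expected key never validates.
func validKey(presented, expected string) bool {
	if expected == "" {
		return false
	}
	a := sha256.Sum256([]byte(presented))
	b := sha256.Sum256([]byte(expected))
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

// requireAPIKey wraps next and rejects requests whose X-API-Key header
// (an assumed header name) does not carry the expected key.
func requireAPIKey(expected string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !validKey(r.Header.Get("X-API-Key"), expected) {
			http.Error(w, "invalid or missing API key", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

A router would then register protected routes as, for example, `mux.Handle("/api/", requireAPIKey(key, apiHandler))`.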

---

## 2. Non-Functional Requirements

### 2.1 Performance
- Processing pipeline must not block streaming; run as background workers
- Stream start-up latency: < 2 s for direct play; < 5 s for transcoded streams
- Support at least 2 concurrent streams (the upper limit is hardware-dependent)
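
Running the processing pipeline on a bounded pool of background workers is one way to keep it from starving the streaming path. The sketch below assumes a fixed worker count and an in-memory list of paths; `processAll` and `Result` are hypothetical names.

```go
package main

import "sync"

// Result pairs an input path with the outcome of processing it.
type Result struct {
	Path string
	Err  error
}

// processAll runs process over paths using at most `workers` goroutines and
// returns one Result per path, in input order.
func processAll(workers int, paths []string, process func(path string) error) []Result {
	if workers < 1 {
		workers = 1
	}
	results := make([]Result, len(paths))
	indexes := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is delivered to exactly one worker, so every
			// results[i] is written by a single goroutine.
			for i := range indexes {
				results[i] = Result{Path: paths[i], Err: process(paths[i])}
			}
		}()
	}
	for i := range paths {
		indexes <- i
	}
	close(indexes)
	wg.Wait()
	return results
}
```

Capping the worker count below the number of CPU cores leaves headroom for HTTP handlers and any running ffmpeg transcodes.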

### 2.2 Storage
- Metadata and index stored locally (no mandatory cloud dependency)
- Sidecar files (.nfo, .json) stored alongside media
- SQLite for metadata DB (upgradeable to PostgreSQL via config)

### 2.3 Privacy
- All LLM inference can run fully local (no data leaves the machine)
- Cloud LLM calls are opt-in and clearly logged
- No telemetry by default

### 2.4 Reliability
- Crash recovery: resume interrupted processing jobs on restart
- Idempotent processing: re-indexing a file does not duplicate metadata
- Graceful degradation: server remains operational if LLM provider is unavailable
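
Crash recovery and idempotent processing both follow naturally if jobs are keyed by file fingerprint and carry an explicit status. The sketch below keeps the queue in memory for brevity; in the server this state would presumably live in the SQLite database. All type and method names are assumptions.

```go
package main

// Status is the lifecycle state of a processing job.
type Status string

const (
	Pending Status = "pending"
	Running Status = "running"
	Done    Status = "done"
)

// Job is one unit of processing work for a media file.
type Job struct {
	Fingerprint string
	Path        string
	Status      Status
}

// Queue holds jobs keyed by fingerprint.
type Queue struct {
	jobs map[string]*Job
}

// NewQueue returns an empty queue.
func NewQueue() *Queue {
	return &Queue{jobs: make(map[string]*Job)}
}

// Enqueue is idempotent: submitting a fingerprint that is already known only
// refreshes its path and returns false instead of creating a second job.
func (q *Queue) Enqueue(fingerprint, path string) bool {
	if job, found := q.jobs[fingerprint]; found {
		job.Path = path
		return false
	}
	q.jobs[fingerprint] = &Job{Fingerprint: fingerprint, Path: path, Status: Pending}
	return true
}

// Recover is meant to run at startup: jobs left in Running state by a crash
// are reset to Pending so workers pick them up again. It returns how many
// jobs were reset.
func (q *Queue) Recover() int {
	reset := 0
	for _, job := range q.jobs {
		if job.Status == Running {
			job.Status = Pending
			reset++
		}
	}
	return reset
}
```

Because a recovered job may run twice, each worker step should itself be safe to repeat, for example by writing metadata with an upsert keyed on the fingerprint.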

### 2.5 Portability
- Docker image with bundled ffmpeg
- Single binary for bare-metal deployment (Go preferred for this)
- Config via TOML file + environment variable overrides
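
The environment override layer can be illustrated independently of the TOML parser. In the sketch below, the `Config` fields and the `MEDIASERVER_*` variable names are assumptions; the struct would first be filled from the TOML file (for example via viper) and then passed through `applyEnv`.

```go
package main

import (
	"fmt"
	"strconv"
)

// Config holds a few illustrative settings.
type Config struct {
	MediaDir   string
	ListenAddr string
	MaxWorkers int
}

// defaultConfig returns assumed defaults used when neither the TOML file
// nor the environment sets a value.
func defaultConfig() Config {
	return Config{MediaDir: "/media", ListenAddr: ":8080", MaxWorkers: 2}
}

// applyEnv overrides cfg with values found through lookup, which has the
// same signature as os.LookupEnv. Invalid numeric values are reported
// rather than silently ignored.
func applyEnv(cfg *Config, lookup func(string) (string, bool)) error {
	if v, ok := lookup("MEDIASERVER_MEDIA_DIR"); ok {
		cfg.MediaDir = v
	}
	if v, ok := lookup("MEDIASERVER_LISTEN_ADDR"); ok {
		cfg.ListenAddr = v
	}
	if v, ok := lookup("MEDIASERVER_MAX_WORKERS"); ok {
		n, err := strconv.Atoi(v)
		if err != nil || n < 1 {
			return fmt.Errorf("MEDIASERVER_MAX_WORKERS: invalid value %q", v)
		}
		cfg.MaxWorkers = n
	}
	return nil
}
```

In the server this would be called as `applyEnv(&cfg, os.LookupEnv)`; taking the lookup function as a parameter keeps the override logic easy to test.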

---

## 3. Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│                     Media Server Core                    │
│                                                         │
│  ┌──────────┐  ┌──────────────┐  ┌──────────────────┐  │
│  │  Watcher │→ │  Ingest      │→ │  Processing      │  │
│  │ (inotify)│  │  Pipeline    │  │  Queue (workers) │  │
│  └──────────┘  └──────────────┘  └────────┬─────────┘  │
│                                           │             │
│              ┌────────────────────────────┤             │
│              ▼            ▼               ▼             │
│            ┌──────────────┐  ┌──────────────┐          │
│            │ Metadata     │  │Classification│          │
│            │ Worker       │  │Worker        │          │
│            └──────┬───────┘  └──────┬───────┘          │
│                   └─────────────────┘                   │
│                          │                              │
│                   ┌──────▼──────┐                       │
│                   │ LLM Router  │                       │
│                   └──┬──────┬───┘                       │
│                      │      │                           │
│              ┌───────┘      └──────────┐                │
│         ┌────▼─────┐           ┌───────▼──────┐         │
│         │  Ollama  │           │  Claude/OAI  │         │
│         │ (local)  │           │  (cloud)     │         │
│         └──────────┘           └──────────────┘         │
│                                                         │
│  ┌──────────────┐  ┌─────────────┐  ┌───────────────┐  │
│  │  SQLite DB   │  │  HTTP API   │  │  Web UI       │  │
│  │  (metadata)  │  │  (REST)     │  │  (player)     │  │
│  └──────────────┘  └─────────────┘  └───────────────┘  │
└─────────────────────────────────────────────────────────┘
```

---

## 4. Tech Stack Decisions

| Concern              | Choice              | Rationale                                        |
|----------------------|---------------------|--------------------------------------------------|
| Backend language     | Go                  | Single binary, strong HTTP and concurrency support; CGO bindings to ffmpeg are optional |
| Database             | SQLite (sqlx)       | Zero-config, embeddable, enough for single-user  |
| Media processing     | ffmpeg (subprocess) | Industry standard, broad format support          |
| LLM (local)          | Ollama REST API     | Simple HTTP interface, model management built-in |
| LLM (cloud)          | Anthropic SDK + OpenAI SDK | Dual-provider via abstraction layer        |
| Containerization     | Docker + Compose    | Multi-service: server + ollama + optional GPU    |
| Config format        | TOML                | Human-friendly, Go ecosystem support (viper)     |
| Web UI               | HTMX + Tailwind     | No JS framework needed, Go template rendering    |

> **Rust alternative**: Rust is viable if performance is critical (transcoding pipeline), but Go is recommended for faster initial development and simpler deployment.

---

## 5. MVP Scope (Phase 1)

Goal: Working library scanner + LLM tagging + basic web UI + streaming

- [ ] Directory watcher + file ingestion
- [ ] Movie/TV classification (filename heuristics + LLM disambiguation)
- [ ] Metadata fetch from TMDB API (primary); LLM fills unmatched/missing fields only
- [ ] Thumbnail extraction
- [ ] SQLite metadata store
- [ ] REST API: list library, get item, trigger re-scan
- [ ] Basic web UI: grid view + video player
- [ ] Direct-play HTTP streaming
- [ ] Ollama integration (local LLM)
- [ ] Docker Compose setup

---

## 6. Phase 2 (Post-MVP)

- [ ] HLS transcoding
- [ ] Claude API / OpenAI API integration + provider router
- [ ] Natural language search
- [ ] Enrichment from additional external metadata sources (TVDB, Trakt)
- [ ] Multi-user support with watch history

---

## 7. Phase 3 (Future)

- [ ] Mobile-friendly UI / PWA
- [ ] GPU-accelerated transcoding
- [ ] Home video scene detection + auto-chapter marking
- [ ] Face recognition for home video tagging
- [ ] Collections and playlists
- [ ] Client apps (Jellyfin protocol compatibility)

---

## 8. Open Questions

- [ ] Should the server expose a Jellyfin-compatible API to leverage existing clients (Infuse, Swiftfin)?
- [ ] GPU passthrough in Docker for local LLM acceleration — required or optional?
- [ ] Should home video tagging use vision models (LLaVA/Claude vision) for frame analysis?
- [x] TMDB/TVDB integration: **decided** — TMDB is the primary metadata source; LLM fills gaps only
- [ ] Multi-user: single-user MVP acceptable, or needed from day one?