Initial requirements documentation: English and Chinese versions covering media ingestion, LLM classification, TMDB-first metadata, streaming, search, and API specs

2026-05-11 12:59:26 +08:00
commit d185fccd46
2 changed files with 429 additions and 0 deletions
@@ -0,0 +1,202 @@
+# LLM Base Media Server — Requirements
+
+## Overview
+
+A self-hosted video media server that uses LLMs to automatically index, tag, classify, and generate metadata/descriptions/subtitles for a personal video library (movies, TV shows, home videos). Inspired by Plex/Jellyfin but with deep LLM integration. Deployable via Docker or bare-metal.
+
+---
+
+## 1. Functional Requirements
+
+### 1.1 Media Ingestion & Library Management
+- [ ] Watch configured directories for new media files
+- [ ] Support formats: MKV, MP4, AVI, MOV, WEBM, TS
+- [ ] Fingerprint media files to detect duplicates
+- [ ] Organize library by: Movies / TV Shows / Home Videos
+- [ ] Store library state in a local database (SQLite or another embedded database)
+
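The fingerprinting requirement in 1.1 could be sketched as follows. This is a minimal illustration, not the project's implementation: hashing the full file with SHA-256 is an assumption (a real scanner might hash only selected chunks of large videos), and `Library`, `Register`, and the paths are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// Fingerprint returns the hex-encoded SHA-256 digest of everything read from r.
// Hashing the whole stream is an assumption of this sketch.
func Fingerprint(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// Library is a hypothetical in-memory index from fingerprint to first-seen path.
type Library struct {
	seen map[string]string
}

func NewLibrary() *Library { return &Library{seen: make(map[string]string)} }

// Register records path under fp. If fp is already known, it returns the
// earlier path and true, and leaves the index unchanged.
func (l *Library) Register(path, fp string) (dupOf string, isDup bool) {
	if prev, ok := l.seen[fp]; ok {
		return prev, true
	}
	l.seen[fp] = path
	return "", false
}

func main() {
	lib := NewLibrary()
	files := []struct{ path, content string }{
		{"/media/a.mkv", "same bytes"},
		{"/media/copy-of-a.mkv", "same bytes"},
	}
	for _, f := range files {
		fp, err := Fingerprint(strings.NewReader(f.content))
		if err != nil {
			panic(err)
		}
		if prev, dup := lib.Register(f.path, fp); dup {
			fmt.Printf("%s duplicates %s\n", f.path, prev)
		}
	}
}
```

In the server itself the reader would be an opened media file and the index would live in the metadata database rather than in a map.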
+### 1.2 LLM-Powered Auto-Classification
+- [ ] Identify content type (movie, TV episode, home video) from filename + video analysis
+- [ ] Match movies/TV shows against known titles (local heuristics + LLM reasoning)
+- [ ] Extract season/episode numbers for TV shows
+- [ ] Tag home videos with inferred subjects, locations, events (via frame analysis + LLM)
+- [ ] Classify content genre, mood, rating (family-safe, etc.)
+- [ ] Confidence scoring for all LLM-generated tags; flag low-confidence for manual review
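Two of the items above lend themselves to a small sketch: pulling season/episode numbers out of a filename, and flagging low-confidence tags for review. The `SxxEyy` pattern is only one common naming convention, and the `Tag` type and the 0.7 threshold are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// episodePattern matches the "S01E02" naming convention, case-insensitively.
// Other conventions (e.g. "1x02") are deliberately out of scope here.
var episodePattern = regexp.MustCompile(`(?i)s(\d{1,2})e(\d{1,3})`)

// ParseEpisode extracts season and episode numbers from a filename.
// ok is false when the name does not contain an SxxEyy marker.
func ParseEpisode(name string) (season, episode int, ok bool) {
	m := episodePattern.FindStringSubmatch(name)
	if m == nil {
		return 0, 0, false
	}
	season, _ = strconv.Atoi(m[1]) // the groups are digits only, so Atoi cannot fail
	episode, _ = strconv.Atoi(m[2])
	return season, episode, true
}

// Tag is a hypothetical shape for an LLM-generated label.
type Tag struct {
	Label      string
	Confidence float64 // 0.0 to 1.0
}

// NeedsReview reports whether a tag falls below the manual-review threshold.
func NeedsReview(t Tag, threshold float64) bool {
	return t.Confidence < threshold
}

func main() {
	s, e, ok := ParseEpisode("Show.Name.S02E05.1080p.mkv")
	fmt.Println(s, e, ok)
	fmt.Println(NeedsReview(Tag{Label: "beach", Confidence: 0.42}, 0.7))
}
```

Filenames that fail the heuristic are the natural candidates to hand to the LLM for disambiguation.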
+
+### 1.3 Metadata Generation
+
+Metadata priority order: **TMDB/IMDB first → LLM as fallback/supplement**.
+
+#### Primary: External Data Sources
+- [ ] Fetch structured metadata from TMDB (movies, TV shows): title, year, director, cast, genre, runtime, language, poster, backdrop
+- [ ] Resolve IMDB IDs through TMDB's `imdb_id` field and use them to fetch IMDB ratings
+- [ ] Fetch TV episode details from TMDB: season/episode synopsis, air date, guest cast
+- [ ] Cache API responses locally to avoid redundant external calls
+- [ ] Store source attribution (e.g., `metadata_source: tmdb`) per item
+
+#### Fallback/Supplement: LLM-Generated
+LLM metadata is used **only when** the external source returns no match or partial data:
+- [ ] Generate description/summary for unmatched titles (home videos, obscure content)
+- [ ] Fill missing fields that TMDB/IMDB did not return
+- [ ] Tag and describe home videos (no external source exists for these)
+- [ ] Mark all LLM-generated fields with a `llm_generated: true` flag for transparency
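The "external source first, LLM fills gaps" rule can be expressed as a small merge function. The sketch below assumes metadata is handled as flat string fields; the `Field` type and `Merge` function are hypothetical, while the `llm_generated` flag and source attribution mirror the requirements above.

```go
package main

import "fmt"

// Field is one metadata value plus its provenance.
type Field struct {
	Value        string
	Source       string // e.g. "tmdb" or "llm"
	LLMGenerated bool   // corresponds to the llm_generated flag
}

// Merge builds the final record. Every non-empty external value wins; LLM
// values are used only for keys the external source left empty or omitted.
func Merge(source string, external, llm map[string]string) map[string]Field {
	out := make(map[string]Field, len(external)+len(llm))
	for k, v := range external {
		if v != "" {
			out[k] = Field{Value: v, Source: source}
		}
	}
	for k, v := range llm {
		if _, ok := out[k]; !ok && v != "" {
			out[k] = Field{Value: v, Source: "llm", LLMGenerated: true}
		}
	}
	return out
}

func main() {
	external := map[string]string{"title": "Example Movie", "overview": ""}
	llm := map[string]string{"title": "Exmple Movie?", "overview": "A generated summary."}
	merged := Merge("tmdb", external, llm)
	fmt.Printf("%+v\n", merged["title"])
	fmt.Printf("%+v\n", merged["overview"])
}
```

Keeping provenance per field, rather than per item, is what lets a partially matched title mix TMDB values with LLM-filled ones without losing transparency.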
+
+#### General
+- [ ] Embed/store final metadata in sidecar files (NFO/JSON) alongside originals
+- [ ] Thumbnail/poster: use TMDB poster if available, else extract keyframe via ffmpeg
+
+### 1.4 LLM Provider Abstraction
+- [ ] Support local LLMs via Ollama (llama.cpp-compatible)
+- [ ] Support Claude API (Anthropic) for cloud inference
+- [ ] Support OpenAI-compatible APIs
+- [ ] Provider selection per task type (e.g., local for tagging, cloud for summaries)
+- [ ] Graceful fallback: cloud → local if cloud unavailable
+- [ ] Token/cost tracking for cloud providers
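One way to picture the provider abstraction is an interface plus a router that holds an ordered provider chain per task type and falls through on failure. Everything named here (`Provider`, `Router`, `stubProvider`, the task names) is hypothetical; a real implementation would call the Ollama, Anthropic, or OpenAI-compatible HTTP APIs and would likely pass a `context.Context`, which is omitted to keep the sketch short.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is the minimal surface every LLM backend would implement.
type Provider interface {
	Name() string
	Complete(prompt string) (string, error)
}

// Router keeps an ordered provider chain per task type.
type Router struct {
	chains map[string][]Provider
}

func NewRouter() *Router { return &Router{chains: make(map[string][]Provider)} }

// Register sets the providers to try, in order, for a task type.
func (r *Router) Register(task string, chain ...Provider) { r.chains[task] = chain }

// Complete tries each provider in order and returns the first success
// together with the name of the provider that answered.
func (r *Router) Complete(task, prompt string) (string, string, error) {
	chain := r.chains[task]
	if len(chain) == 0 {
		return "", "", fmt.Errorf("no provider configured for task %q", task)
	}
	var lastErr error
	for _, p := range chain {
		out, err := p.Complete(prompt)
		if err == nil {
			return out, p.Name(), nil
		}
		lastErr = fmt.Errorf("%s: %w", p.Name(), err)
	}
	return "", "", lastErr
}

// stubProvider stands in for a real backend in this sketch.
type stubProvider struct {
	name string
	fail bool
}

func (s stubProvider) Name() string { return s.name }

func (s stubProvider) Complete(prompt string) (string, error) {
	if s.fail {
		return "", errors.New("unavailable")
	}
	return s.name + " answered: " + prompt, nil
}

func main() {
	r := NewRouter()
	// Cloud first, local as fallback, matching the requirement above.
	r.Register("summary", stubProvider{name: "claude", fail: true}, stubProvider{name: "ollama"})
	text, used, err := r.Complete("summary", "Summarize this video")
	fmt.Println(text, used, err)
}
```

Token and cost tracking would fit naturally as a wrapper around `Provider` that records usage after each successful call.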
+
+### 1.5 Streaming & Playback
+- [ ] HTTP streaming with range request support
+- [ ] HLS adaptive bitrate transcoding (via ffmpeg)
+- [ ] Direct play for supported client formats
+- [ ] Basic web UI for browsing and playback
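For direct play, Go's standard library already implements HTTP range handling: `http.ServeContent` serves any `io.ReadSeeker` and answers `Range` requests with `206 Partial Content`. The sketch below uses an in-memory payload and a hypothetical `/stream/demo` route so that it runs without a media file; a real handler would pass an opened `*os.File` and its modification time instead.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// streamHandler serves data under the given name. http.ServeContent takes
// care of Range parsing and the Content-Range response header.
func streamHandler(name string, data []byte) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, name, time.Time{}, bytes.NewReader(data))
	})
}

// fetchRange performs an in-process GET with an optional Range header and
// returns the status code, Content-Range header, and body.
func fetchRange(h http.Handler, rangeHeader string) (int, string, string) {
	req := httptest.NewRequest(http.MethodGet, "/stream/demo", nil)
	if rangeHeader != "" {
		req.Header.Set("Range", rangeHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Content-Range"), rec.Body.String()
}

func main() {
	h := streamHandler("demo.mp4", []byte("0123456789"))
	code, contentRange, body := fetchRange(h, "bytes=2-5")
	fmt.Println(code, contentRange, body) // 206 bytes 2-5/10 2345
}
```

Seeking in a browser player produces exactly this kind of ranged request, which is why range support is listed before transcoding.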
+
+### 1.6 Search & Discovery
+- [ ] Full-text search across titles, descriptions, tags
+- [ ] Natural language search ("action movies from the 90s", "videos of kids at the beach")
+- [ ] Filter by: genre, year, rating, tags, classification confidence
+
+### 1.7 API
+- [ ] REST API for all library operations
+- [ ] Webhook/event system for processing status updates
+- [ ] API key authentication
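API key authentication can be layered on as HTTP middleware. The sketch below assumes the key travels in an `X-API-Key` header and that a single static key is configured; both are illustrative choices, not decisions taken in this document. `requireAPIKey`, `okHandler`, and `statusFor` are hypothetical names.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requireAPIKey rejects requests whose X-API-Key header does not match
// validKey. An empty validKey rejects everything rather than allowing all.
func requireAPIKey(validKey string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := r.Header.Get("X-API-Key")
		// Constant-time comparison avoids leaking key contents through timing.
		if validKey == "" || subtle.ConstantTimeCompare([]byte(got), []byte(validKey)) != 1 {
			http.Error(w, "invalid or missing API key", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// okHandler stands in for a real library endpoint.
var okHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "ok")
})

// statusFor issues an in-process request with the given key and returns the status code.
func statusFor(h http.Handler, key string) int {
	req := httptest.NewRequest(http.MethodGet, "/api/items", nil)
	if key != "" {
		req.Header.Set("X-API-Key", key)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := requireAPIKey("s3cret", okHandler)
	fmt.Println(statusFor(h, "s3cret"), statusFor(h, "wrong"), statusFor(h, "")) // 200 401 401
}
```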
+
+---
+
+## 2. Non-Functional Requirements
+
+### 2.1 Performance
+- Processing pipeline must not block streaming; run as background workers
+- Playback start-up latency < 2s for direct play; < 5s for transcoded streams
+- Support concurrent streams: minimum 2 simultaneous (hardware-dependent)
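The "background workers" requirement amounts to bounding how much processing runs concurrently so that it cannot starve the streaming path. A minimal sketch of such a bounded pool follows; `ProcessAll` is a hypothetical helper that drains a fixed batch, whereas the server's pool would presumably run for the lifetime of the process and pull from a persistent queue.

```go
package main

import (
	"fmt"
	"sync"
)

// ProcessAll runs fn over paths using a fixed number of worker goroutines
// and returns the results keyed by path.
func ProcessAll(paths []string, workers int, fn func(string) string) map[string]string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string]string, len(paths))
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				out := fn(p)
				mu.Lock() // the map is shared between workers
				results[p] = out
				mu.Unlock()
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs) // lets the workers' range loops finish
	wg.Wait()
	return results
}

func main() {
	res := ProcessAll([]string{"a.mkv", "b.mp4", "c.avi"}, 2, func(p string) string {
		return "indexed:" + p
	})
	fmt.Println(len(res), res["b.mp4"]) // 3 indexed:b.mp4
}
```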
+
+### 2.2 Storage
+- Metadata and index stored locally (no mandatory cloud dependency)
+- Sidecar files (.nfo, .json) stored alongside media
+- SQLite for metadata DB (upgradeable to PostgreSQL via config)
+
+### 2.3 Privacy
+- All LLM inference can run fully local (no data leaves the machine)
+- Cloud LLM calls are opt-in and clearly logged
+- No telemetry by default
+
+### 2.4 Reliability
+- Crash recovery: resume interrupted processing jobs on restart
+- Idempotent processing: re-indexing a file does not duplicate metadata
+- Graceful degradation: server remains operational if LLM provider is unavailable
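Crash recovery and idempotent processing both follow from tracking job state per file. The sketch below keeps that state in memory for brevity; `JobStore` and its states are hypothetical, and the real store would persist them in the metadata database so that they survive a restart.

```go
package main

import "fmt"

// State is the lifecycle stage of a processing job.
type State int

const (
	Pending State = iota
	Running
	Done
)

// JobStore maps a job key (for example a file fingerprint) to its state.
type JobStore struct {
	jobs map[string]State
}

func NewJobStore() *JobStore { return &JobStore{jobs: make(map[string]State)} }

// Enqueue adds a pending job for key unless one already exists, so
// re-indexing the same file never creates a second job.
func (s *JobStore) Enqueue(key string) bool {
	if _, ok := s.jobs[key]; ok {
		return false
	}
	s.jobs[key] = Pending
	return true
}

// Set records a state transition for key.
func (s *JobStore) Set(key string, st State) { s.jobs[key] = st }

// Get returns the current state for key and whether the job exists.
func (s *JobStore) Get(key string) (State, bool) {
	st, ok := s.jobs[key]
	return st, ok
}

// Recover is meant to run at startup: jobs left in Running were interrupted
// by a crash, so they go back to Pending. It returns how many were reset.
func (s *JobStore) Recover() int {
	n := 0
	for k, st := range s.jobs {
		if st == Running {
			s.jobs[k] = Pending
			n++
		}
	}
	return n
}

func main() {
	s := NewJobStore()
	s.Enqueue("fp-1")
	s.Enqueue("fp-2")
	s.Set("fp-1", Running) // simulate a crash while fp-1 was being processed
	s.Set("fp-2", Done)

	again := s.Enqueue("fp-1")
	reset := s.Recover()
	st, _ := s.Get("fp-1")
	fmt.Println(again, reset, st == Pending) // false 1 true
}
```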
+
+### 2.5 Portability
+- Docker image with bundled ffmpeg
+- Single binary for bare-metal deployment (Go preferred for this)
+- Config via TOML file + environment variable overrides
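The "TOML file + environment variable overrides" layering can be sketched without committing to a TOML library. In the example below the `Config` fields and the `MEDIASERVER_*` variable names are hypothetical, and the struct literal in `main` stands in for values that would have been parsed from the TOML file.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds a few illustrative settings.
type Config struct {
	MediaDir   string
	ListenAddr string
	MaxStreams int
}

// ApplyEnv returns cfg with any set environment variables applied on top.
// lookup has the signature of os.LookupEnv so callers can inject a fake.
func ApplyEnv(cfg Config, lookup func(string) (string, bool)) (Config, error) {
	if v, ok := lookup("MEDIASERVER_MEDIA_DIR"); ok {
		cfg.MediaDir = v
	}
	if v, ok := lookup("MEDIASERVER_LISTEN_ADDR"); ok {
		cfg.ListenAddr = v
	}
	if v, ok := lookup("MEDIASERVER_MAX_STREAMS"); ok {
		n, err := strconv.Atoi(v)
		if err != nil {
			return cfg, fmt.Errorf("MEDIASERVER_MAX_STREAMS: %w", err)
		}
		cfg.MaxStreams = n
	}
	return cfg, nil
}

func main() {
	// Stand-in for values loaded from the TOML file.
	fileCfg := Config{MediaDir: "/media", ListenAddr: ":8080", MaxStreams: 2}

	cfg, err := ApplyEnv(fileCfg, os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the override logic testable without touching the process environment.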
+
+---
+
+## 3. Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                     Media Server Core                    │
+│                                                         │
+│  ┌──────────┐  ┌──────────────┐  ┌──────────────────┐  │
+│  │  Watcher │→ │  Ingest      │→ │  Processing      │  │
+│  │ (inotify)│  │  Pipeline    │  │  Queue (workers) │  │
+│  └──────────┘  └──────────────┘  └────────┬─────────┘  │
+│                                           │             │
+│              ┌────────────────────────────┤             │
+│              ▼            ▼               ▼             │
+│            ┌──────────────┐  ┌──────────────┐          │
+│            │ Metadata     │  │Classification│          │
+│            │ Worker       │  │Worker        │          │
+│            └──────┬───────┘  └──────┬───────┘          │
+│                   └─────────────────┘                   │
+│                          │                              │
+│                   ┌──────▼──────┐                       │
+│                   │ LLM Router  │                        │
+│                   └──┬──────┬───┘                       │
+│                      │      │                           │
+│              ┌───────┘      └──────────┐                │
+│         ┌────▼─────┐           ┌───────▼──────┐         │
+│         │  Ollama  │           │  Claude/OAI  │         │
+│         │ (local)  │           │  (cloud)     │         │
+│         └──────────┘           └──────────────┘         │
+│                                                         │
+│  ┌──────────────┐  ┌─────────────┐  ┌───────────────┐  │
+│  │  SQLite DB   │  │  HTTP API   │  │  Web UI       │  │
+│  │  (metadata)  │  │  (REST)     │  │  (player)     │  │
+│  └──────────────┘  └─────────────┘  └───────────────┘  │
+└─────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 4. Tech Stack Decisions
+
+| Concern              | Choice              | Rationale                                        |
+|----------------------|---------------------|--------------------------------------------------|
+| Backend language     | Go                  | Single binary, excellent HTTP/concurrency support, optional CGO bindings to ffmpeg |
+| Database             | SQLite (sqlx)       | Zero-config, embeddable, enough for single-user  |
+| Media processing     | ffmpeg (subprocess) | Industry standard, broad format support          |
+| LLM (local)          | Ollama REST API     | Simple HTTP interface, model management built-in |
+| LLM (cloud)          | Anthropic SDK + OpenAI SDK | Dual-provider via abstraction layer        |
+| Containerization     | Docker + Compose    | Multi-service: server + ollama + optional GPU    |
+| Config format        | TOML                | Human-friendly, Go ecosystem support (viper)     |
+| Web UI               | HTMX + Tailwind     | No JS framework needed, Go template rendering    |
+
+> **Rust alternative**: Rust is viable if performance is critical (transcoding pipeline), but Go is recommended for faster initial development and simpler deployment.
+
+---
+
+## 5. MVP Scope (Phase 1)
+
+Goal: Working library scanner + LLM tagging + basic web UI + streaming
+
+- [ ] Directory watcher + file ingestion
+- [ ] Movie/TV classification (filename heuristics + LLM disambiguation)
+- [ ] Metadata fetch from TMDB API (primary); LLM fills unmatched/missing fields only
+- [ ] Thumbnail extraction
+- [ ] SQLite metadata store
+- [ ] REST API: list library, get item, trigger re-scan
+- [ ] Basic web UI: grid view + video player
+- [ ] Direct-play HTTP streaming
+- [ ] Ollama integration (local LLM)
+- [ ] Docker Compose setup
+
+---
+
+## 6. Phase 2 (Post-MVP)
+
+- [ ] HLS transcoding
+- [ ] Claude API / OpenAI API integration + provider router
+- [ ] Natural language search
+- [ ] Enrichment from additional external metadata sources (TVDB, Trakt)
+- [ ] Multi-user support with watch history
+
+---
+
+## 7. Phase 3 (Future)
+
+- [ ] Mobile-friendly UI / PWA
+- [ ] GPU-accelerated transcoding
+- [ ] Home video scene detection + auto-chapter marking
+- [ ] Face recognition for home video tagging
+- [ ] Collections and playlists
+- [ ] Client apps (Jellyfin protocol compatibility)
+
+---
+
+## 8. Open Questions
+
+- [ ] Should the server expose a Jellyfin-compatible API to leverage existing clients (Infuse, Swiftfin)?
+- [ ] GPU passthrough in Docker for local LLM acceleration — required or optional?
+- [ ] Should home video tagging use vision models (LLaVA/Claude vision) for frame analysis?
+- [x] TMDB/TVDB integration: **decided** — TMDB is the primary metadata source; LLM fills gaps only
+- [ ] Multi-user: single-user MVP acceptable, or needed from day one?
@@ -0,0 +1,227 @@
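+# LLM Media Server: Requirements (Chinese Parallel Version)
+
+## Project Overview
+
+A self-hosted video media server that uses LLMs to automatically index, tag, and classify a personal video library (movies, TV shows, home videos) and to generate metadata and subtitles for it. Modeled on Plex/Jellyfin, but with deep LLM integration. Deployable via Docker or on bare metal.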
+
+---
+
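+## 1. Functional Requirements
+
+### 1.1 Media Ingestion & Library Management
+- [ ] Watch configured directories and automatically discover new media files
+- [ ] Supported formats: MKV, MP4, AVI, MOV, WEBM, TS
+- [ ] Fingerprint files to prevent duplicate library entries
+- [ ] Organize the library by type: Movies / TV Shows / Home Videos
+- [ ] Store library state in a local database (SQLite or another embedded database)
+
+### 1.2 LLM Auto-Classification
+- [ ] Identify content type (movie, TV episode, home video) from the filename and video content
+- [ ] Match known movie/TV show titles (local heuristic rules + LLM-assisted judgment)
+- [ ] Extract season and episode numbers for TV shows
+- [ ] Infer subjects, locations, and events for home videos (frame analysis + LLM)
+- [ ] Classify content: genre, mood, rating (family-friendly, etc.)
+- [ ] Score confidence for all LLM-generated tags; flag low-confidence tags for manual review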
+
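+### 1.3 Metadata Generation
+
+**Metadata priority: external data sources first → LLM as supplement/fallback**
+
+#### Primary: External Data Sources
+
+| Content type | Primary source | Secondary source | Fallback |
+|---|---|---|---|
+| Movies | TMDB | IMDB (via TMDB ID) | LLM |
+| TV shows | TMDB + TVDB | Trakt | LLM |
+| Anime | AniDB / AniList | TMDB | LLM |
+| Home videos | none | none | LLM only |
+
+- [ ] Fetch structured metadata from TMDB: title, year, director, cast, genre, runtime, language, poster, backdrop
+- [ ] Fetch ratings and IDs from IMDB (via TMDB's IMDB ID field)
+- [ ] Fetch TV episode details from TVDB: synopsis, air date, guest cast
+- [ ] Fetch watch data and community ratings from Trakt (optional)
+- [ ] Fetch anime-specific metadata from AniDB or AniList
+- [ ] Cache API responses locally to avoid repeated external requests
+- [ ] Store source attribution per record (e.g., `metadata_source: tmdb`)
+
+#### Other Candidate Data Sources (Undecided)
+- [ ] OMDb API: simplified wrapper around IMDB data, with a free tier
+- [ ] MyAnimeList (MAL): anime community ratings and metadata
+- [ ] Rotten Tomatoes: critic + audience scores (no official API; requires scraping)
+- [ ] Metacritic: professional critic scores (same caveat)
+
+#### Fallback/Supplement: LLM-Generated
+The LLM is used only in the following cases:
+- [ ] Generate a description/summary when no external source matches the title
+- [ ] Fill missing fields that TMDB/IMDB did not return
+- [ ] Generate tags and descriptions for home videos (no external data exists for these)
+- [ ] Mark all LLM-generated fields with `llm_generated: true` for transparency
+
+#### General
+- [ ] Write final metadata to sidecar files (NFO/JSON) stored in the same directory as the media files
+- [ ] Cover/poster: prefer the TMDB poster; otherwise extract a keyframe thumbnail via ffmpeg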
+
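+### 1.4 LLM Provider Abstraction Layer
+- [ ] Support local LLMs (via the Ollama REST API)
+- [ ] Support the Claude API (Anthropic) for cloud inference
+- [ ] Support OpenAI-compatible APIs
+- [ ] Select the provider per task type (e.g., local for tagging, cloud for summaries)
+- [ ] Graceful degradation: automatically fall back to local when the cloud is unavailable
+- [ ] Record token usage and cost for cloud providers
+
+### 1.5 Streaming & Playback
+- [ ] HTTP streaming with Range request support
+- [ ] HLS adaptive-bitrate transcoding (via ffmpeg)
+- [ ] Support direct play on the client (no transcoding required)
+- [ ] Basic web UI for browsing and playback
+
+### 1.6 Search & Discovery
+- [ ] Full-text search (titles, descriptions, tags)
+- [ ] Natural language search (e.g., "action movies from the 90s", "videos of kids at the beach")
+- [ ] Filter by genre, year, rating, tags, confidence
+
+### 1.7 API
+- [ ] REST API covering all library operations
+- [ ] Webhook/event system that pushes processing status updates
+- [ ] API key authentication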
+
+---
+
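+## 2. Non-Functional Requirements
+
+### 2.1 Performance
+- The processing pipeline must not block streaming; background workers run it asynchronously
+- Direct-play latency < 2 s; transcoded-stream latency < 5 s
+- Support at least 2 concurrent streams (hardware-dependent)
+
+### 2.2 Storage
+- Metadata and index stored locally (no mandatory cloud dependency)
+- Sidecar files (.nfo, .json) stored in the same directory as the media files
+- SQLite as the metadata database (upgradeable to PostgreSQL via config)
+
+### 2.3 Privacy
+- LLM inference can run fully locally (no data leaves the machine)
+- Cloud LLM calls are opt-in and explicitly logged
+- No telemetry by default
+
+### 2.4 Reliability
+- Crash recovery: automatically resume interrupted processing jobs after a restart
+- Idempotent processing: re-indexing a file does not produce duplicate metadata
+- Graceful degradation: the server keeps running normally when the LLM provider is unavailable
+
+### 2.5 Portability
+- Docker image with ffmpeg bundled
+- Single binary for bare-metal deployment (Go preferred)
+- Config via TOML file + environment variable overrides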
+
+---
+
+## 3. Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────┐
+│                    Media Server Core                    │
+│                                                         │
+│  ┌──────────┐  ┌──────────────┐  ┌──────────────────┐  │
+│  │  Watcher │→ │  Ingest      │→ │  Processing      │  │
+│  │ (inotify)│  │  Pipeline    │  │  Queue (workers) │  │
+│  └──────────┘  └──────────────┘  └────────┬─────────┘  │
+│                                           │             │
+│              ┌────────────────────────────┤             │
+│              ▼            ▼               ▼             │
+│            ┌──────────────┐  ┌──────────────┐          │
+│            │ Metadata     │  │Classification│          │
+│            │ Worker       │  │Worker        │          │
+│            └──────┬───────┘  └──────┬───────┘          │
+│                   └─────────────────┘                   │
+│                          │                              │
+│             ┌────────────▼──────────────┐               │
+│             │  External Source Router   │               │
+│             │ TMDB / TVDB / AniDB /     │               │
+│             │ OpenSubtitles / Trakt     │               │
+│             └────────────┬──────────────┘               │
+│                          │ unmatched / missing fields   │
+│                   ┌──────▼──────┐                       │
+│                   │ LLM Router  │                       │
+│                   └──┬──────┬───┘                       │
+│                      │      │                           │
+│              ┌────────┘      └──────────┐               │
+│         ┌────▼─────┐           ┌────────▼─────┐         │
+│         │  Ollama  │           │ Claude/OpenAI│         │
+│         │ (local)  │           │ (cloud)      │         │
+│         └──────────┘           └──────────────┘         │
+│                                                         │
+│  ┌──────────────┐  ┌─────────────┐  ┌───────────────┐  │
+│  │  SQLite DB   │  │  HTTP API   │  │  Web UI       │  │
+│  │  (metadata)  │  │  (REST)     │  │  (player)     │  │
+│  └──────────────┘  └─────────────┘  └───────────────┘  │
+└─────────────────────────────────────────────────────────┘
+```
+
+---
+
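+## 4. Tech Stack Decisions
+
+| Concern | Choice | Rationale |
+|---|---|---|
+| Backend language | Go | Single binary, excellent HTTP/concurrency, simple deployment |
+| Database | SQLite (sqlx) | Zero-config, embeddable, enough for single-user |
+| Media processing | ffmpeg (subprocess) | Industry standard, broad format support |
+| LLM (local) | Ollama REST API | Simple HTTP interface, built-in model management |
+| LLM (cloud) | Anthropic SDK + OpenAI SDK | Dual-provider support via abstraction layer |
+| Containerization | Docker + Compose | Multi-service: server + ollama + optional GPU |
+| Config format | TOML | Human-friendly, Go ecosystem support (viper) |
+| Web UI | HTMX + Tailwind | No JS framework, Go template rendering |
+
+> **Rust alternative**: Rust is a viable option if performance becomes a bottleneck (transcoding pipeline), but Go offers faster development and simpler deployment and is the recommended first choice.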
+
+---
+
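+## 5. MVP Scope (Phase 1)
+
+Goal: a usable library scanner + LLM-assisted tagging + basic web UI + streaming playback
+
+- [ ] Directory watcher + file ingestion
+- [ ] Movie/TV classification (filename heuristics + LLM assistance)
+- [ ] TMDB metadata fetch (primary source); LLM fills missing fields
+- [ ] Thumbnail extraction (TMDB poster first, ffmpeg keyframe as fallback)
+- [ ] SQLite metadata store
+- [ ] REST API: list library, get item, trigger re-scan
+- [ ] Basic web UI: grid view + video player
+- [ ] Direct-play HTTP streaming
+- [ ] Ollama integration (local LLM)
+- [ ] Docker Compose setup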
+
+---
+
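+## 6. Phase 2 (Post-MVP)
+
+- [ ] HLS transcoding
+- [ ] Claude API / OpenAI API integration + provider router
+- [ ] Natural language search
+- [ ] TVDB integration (TV show enrichment)
+- [ ] Trakt integration (watch history and community data)
+- [ ] Multi-user support with watch history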
+
+---
+
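+## 7. Phase 3 (Future)
+
+- [ ] Mobile-friendly UI / PWA
+- [ ] GPU-accelerated transcoding
+- [ ] Home video scene detection + auto-chapter marking
+- [ ] Face recognition tagging for home videos
+- [ ] Collections and playlists
+- [ ] Jellyfin protocol compatibility (to work with existing clients such as Infuse and Swiftfin)
+- [ ] AniDB / AniList / MAL anime data source integration
+- [ ] Rotten Tomatoes / Metacritic score scraping (no official API)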
+
+---
+
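+## 8. Open Questions
+
+- [ ] Should the server expose a Jellyfin-compatible API to reuse existing clients (Infuse, Swiftfin)?
+- [ ] GPU passthrough in Docker: required or optional?
+- [ ] Should home video frame analysis use vision models (LLaVA / Claude Vision)?
+- [ ] Should multi-user be supported (is a single-user MVP acceptable)?
+- [x] ~~Whether to integrate TMDB/TVDB~~ **Decided**: TMDB is the primary data source; the LLM only supplements it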