Initial requirements documentation: English and Chinese versions covering media ingestion, LLM classification, TMDB-first metadata, streaming, search, and API specs
@@ -0,0 +1,202 @@
# LLM Base Media Server: Requirements

## Overview

A self-hosted video media server that uses LLMs to automatically index, tag, classify, and generate metadata/descriptions/subtitles for a personal video library (movies, TV shows, home videos). Inspired by Plex/Jellyfin but with deep LLM integration. Deployable via Docker or bare-metal.

---

## 1. Functional Requirements

### 1.1 Media Ingestion & Library Management
- [ ] Watch configured directories for new media files
- [ ] Support formats: MKV, MP4, AVI, MOV, WEBM, TS
- [ ] Fingerprint media files to detect duplicates
- [ ] Organize library by: Movies / TV Shows / Home Videos
- [ ] Store library state in a local database (SQLite or another embedded database)

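The duplicate-detection requirement above can be sketched as a content hash plus a lookup table. This is a minimal illustration, not a chosen design: SHA-256 over the full file is an assumption (the requirements do not name a fingerprint algorithm), and `fingerprintBytes`, `fingerprintFile`, and `dedupIndex` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// fingerprintBytes returns the hex-encoded SHA-256 digest of data.
func fingerprintBytes(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// fingerprintFile streams a file through SHA-256 so that large videos are
// never loaded into memory at once.
func fingerprintFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// dedupIndex maps a fingerprint to the first path ingested with that content.
type dedupIndex map[string]string

// add records path under fp. If fp is already known it returns the earlier
// path and true, and leaves the index unchanged.
func (d dedupIndex) add(fp, path string) (string, bool) {
	if prev, ok := d[fp]; ok {
		return prev, true
	}
	d[fp] = path
	return "", false
}

func main() {
	idx := dedupIndex{}
	// In-memory strings stand in for files; fingerprintFile would be used on disk.
	inputs := []struct{ path, content string }{
		{"movies/a.mkv", "same bytes"},
		{"incoming/copy-of-a.mkv", "same bytes"},
		{"movies/b.mkv", "other bytes"},
	}
	for _, in := range inputs {
		fp := fingerprintBytes([]byte(in.content))
		if prev, dup := idx.add(fp, in.path); dup {
			fmt.Printf("%s duplicates %s\n", in.path, prev)
		}
	}
}
```

Hashing whole files is simple but slow for multi-gigabyte media; hashing a fixed-size prefix plus the file size is a common cheaper alternative, at the cost of possible false positives.
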
### 1.2 LLM-Powered Auto-Classification
- [ ] Identify content type (movie, TV episode, home video) from filename + video analysis
- [ ] Match movies/TV shows against known titles (local heuristics + LLM reasoning)
- [ ] Extract season/episode numbers for TV shows
- [ ] Tag home videos with inferred subjects, locations, events (via frame analysis + LLM)
- [ ] Classify content genre, mood, rating (family-safe, etc.)
- [ ] Confidence scoring for all LLM-generated tags; flag low-confidence for manual review

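As an illustration of the "local heuristics" half of season/episode extraction, the sketch below tries two filename conventions before any LLM is consulted. The two patterns (`S01E02` and `1x02`) and the name `parseEpisode` are assumptions; real libraries contain many more naming schemes, and files that match nothing would be handed to the LLM for disambiguation.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// episodePatterns lists the filename conventions tried in order.
// Capture group 1 is the season, group 2 the episode.
var episodePatterns = []*regexp.Regexp{
	regexp.MustCompile(`(?i)\bS(\d{1,2})[ ._-]?E(\d{1,3})\b`), // Show.S01E02
	regexp.MustCompile(`(?i)\b(\d{1,2})x(\d{2,3})\b`),         // Show 1x02
}

// parseEpisode extracts season and episode numbers from a filename.
// ok is false when no pattern matches.
func parseEpisode(name string) (season, episode int, ok bool) {
	for _, re := range episodePatterns {
		m := re.FindStringSubmatch(name)
		if m == nil {
			continue
		}
		// The patterns only capture digits, so Atoi cannot fail here.
		season, _ = strconv.Atoi(m[1])
		episode, _ = strconv.Atoi(m[2])
		return season, episode, true
	}
	return 0, 0, false
}

func main() {
	for _, name := range []string{
		"Show.Name.S01E02.720p.mkv",
		"show 3x07.mp4",
		"Holiday.1920x1080.mp4",
	} {
		s, e, ok := parseEpisode(name)
		fmt.Printf("%-28s season=%d episode=%d matched=%t\n", name, s, e, ok)
	}
}
```

The word-boundary anchors keep a resolution string such as `1920x1080` from being read as an episode marker.
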
### 1.3 Metadata Generation

Metadata priority order: **TMDB/IMDB first → LLM as fallback/supplement**.

#### Primary: External Data Sources
- [ ] Fetch structured metadata from TMDB (movies, TV shows): title, year, director, cast, genre, runtime, language, poster, backdrop
- [ ] Fetch ratings and IDs from IMDB (via TMDB's IMDB ID field)
- [ ] Fetch TV episode details from TMDB: season/episode synopsis, air date, guest cast
- [ ] Cache API responses locally to avoid redundant external calls
- [ ] Store source attribution (e.g., `metadata_source: tmdb`) per item

#### Fallback/Supplement: LLM-Generated
LLM metadata is used **only when** the external source returns no match or partial data:
- [ ] Generate description/summary for unmatched titles (home videos, obscure content)
- [ ] Fill missing fields that TMDB/IMDB did not return
- [ ] Tag and describe home videos (no external source exists for these)
- [ ] Mark all LLM-generated fields with a `llm_generated: true` flag for transparency

#### General
- [ ] Embed/store final metadata in sidecar files (NFO/JSON) alongside originals
- [ ] Thumbnail/poster: use TMDB poster if available, else extract keyframe via ffmpeg

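The priority rule in this section (external values win, the LLM only fills gaps, and every LLM-supplied field is flagged) can be sketched as a per-field merge. The `Field` type, the `mergeMetadata` function, and the flat string-map representation are assumptions made for illustration; the sample values are placeholders, not real TMDB data.

```go
package main

import (
	"fmt"
	"sort"
)

// Field is one metadata value together with its provenance.
type Field struct {
	Value        string
	Source       string // "tmdb" or "llm"
	LLMGenerated bool
}

// mergeMetadata keeps every non-empty external value and uses an LLM value
// only for keys the external source left absent or empty.
func mergeMetadata(external, llm map[string]string) map[string]Field {
	out := make(map[string]Field, len(external)+len(llm))
	for k, v := range external {
		if v != "" {
			out[k] = Field{Value: v, Source: "tmdb"}
		}
	}
	for k, v := range llm {
		if _, filled := out[k]; !filled && v != "" {
			out[k] = Field{Value: v, Source: "llm", LLMGenerated: true}
		}
	}
	return out
}

func main() {
	external := map[string]string{"title": "Example Movie", "year": "2001", "overview": ""}
	llm := map[string]string{"title": "Example Movie (guess)", "overview": "A placeholder synopsis."}

	merged := mergeMetadata(external, llm)
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map order is random; sort for stable output
	for _, k := range keys {
		f := merged[k]
		fmt.Printf("%s=%q source=%s llm_generated=%t\n", k, f.Value, f.Source, f.LLMGenerated)
	}
}
```

Here the LLM's guess at the title is discarded because the external source already supplied one, while the empty overview is filled and flagged.
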
### 1.4 LLM Provider Abstraction
- [ ] Support local LLMs via Ollama (llama.cpp-compatible)
- [ ] Support Claude API (Anthropic) for cloud inference
- [ ] Support OpenAI-compatible APIs
- [ ] Provider selection per task type (e.g., local for tagging, cloud for summaries)
- [ ] Graceful fallback: cloud → local if cloud unavailable
- [ ] Token/cost tracking for cloud providers

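One possible shape for the provider abstraction and the cloud → local fallback is an interface plus an ordered chain. Everything named here (`Provider`, `Router`, `stubProvider`) is hypothetical, and the stubs stand in for real Ollama or cloud clients. A production version would likely also pass a `context.Context` for timeouts, which is omitted to keep the sketch short. `errors.Join` needs Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is the minimal surface the router needs from any LLM backend.
type Provider interface {
	Name() string
	Complete(prompt string) (string, error)
}

// Router tries providers in order and falls through on error.
type Router struct {
	chain []Provider
}

// Complete returns the first successful completion and the name of the
// provider that produced it. If every provider fails, the errors are joined.
func (r Router) Complete(prompt string) (string, string, error) {
	var errs []error
	for _, p := range r.chain {
		out, err := p.Complete(prompt)
		if err == nil {
			return out, p.Name(), nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), err))
	}
	if len(errs) == 0 {
		return "", "", errors.New("no providers configured")
	}
	return "", "", errors.Join(errs...)
}

// stubProvider simulates a backend that either echoes the prompt or fails.
type stubProvider struct {
	name string
	fail bool
}

func (s stubProvider) Name() string { return s.name }

func (s stubProvider) Complete(prompt string) (string, error) {
	if s.fail {
		return "", errors.New("unavailable")
	}
	return "[" + s.name + "] " + prompt, nil
}

func main() {
	// The cloud provider is down, so the router falls back to the local one.
	r := Router{chain: []Provider{
		stubProvider{name: "cloud", fail: true},
		stubProvider{name: "local"},
	}}
	text, used, err := r.Complete("summarize this video")
	fmt.Println(text, used, err)
}
```

Per-task provider selection could be layered on top by keeping one `Router` per task type.
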
### 1.5 Streaming & Playback
- [ ] HTTP streaming with range request support
- [ ] HLS adaptive bitrate transcoding (via ffmpeg)
- [ ] Direct play for supported client formats
- [ ] Basic web UI for browsing and playback

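For direct play, Go's standard library already implements byte-range semantics: `http.ServeContent` answers `Range` requests with `206 Partial Content`. The sketch below exercises that with an in-memory request. The handler name, the URL path, and the ten-byte payload are illustrative assumptions; a real handler would pass an opened file and its modification time.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// streamHandler serves content with byte-range support. An in-memory reader
// stands in for the *os.File a real handler would open.
func streamHandler(content string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "clip.mp4", time.Time{}, strings.NewReader(content))
	})
}

// fetchRange performs an in-memory GET with the given Range header and
// returns the status code, the Content-Range header, and the body.
func fetchRange(h http.Handler, rangeHeader string) (int, string, string) {
	req := httptest.NewRequest(http.MethodGet, "/stream/clip.mp4", nil)
	if rangeHeader != "" {
		req.Header.Set("Range", rangeHeader)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Content-Range"), rec.Body.String()
}

func main() {
	code, contentRange, body := fetchRange(streamHandler("0123456789"), "bytes=2-5")
	fmt.Println(code, contentRange, body) // 206 bytes 2-5/10 2345
}
```

Seeking in a browser player relies on exactly this behavior: the client requests the byte window it needs instead of the whole file.
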
### 1.6 Search & Discovery
- [ ] Full-text search across titles, descriptions, tags
- [ ] Natural language search ("action movies from the 90s", "videos of kids at the beach")
- [ ] Filter by: genre, year, rating, tags, classification confidence

### 1.7 API
- [ ] REST API for all library operations
- [ ] Webhook/event system for processing status updates
- [ ] API key authentication

---

## 2. Non-Functional Requirements

### 2.1 Performance
- Processing pipeline must not block streaming; run as background workers
- Streaming latency < 2s for direct play; < 5s for transcoded streams
- Support concurrent streams: minimum 2 simultaneous (hardware-dependent)

### 2.2 Storage
- Metadata and index stored locally (no mandatory cloud dependency)
- Sidecar files (.nfo, .json) stored alongside media
- SQLite for metadata DB (upgradeable to PostgreSQL via config)

### 2.3 Privacy
- All LLM inference can run fully local (no data leaves the machine)
- Cloud LLM calls are opt-in and clearly logged
- No telemetry by default

### 2.4 Reliability
- Crash recovery: resume interrupted processing jobs on restart
- Idempotent processing: re-indexing a file does not duplicate metadata
- Graceful degradation: server remains operational if LLM provider is unavailable

### 2.5 Portability
- Docker image with bundled ffmpeg
- Single binary for bare-metal deployment (Go preferred for this)
- Config via TOML file + environment variable overrides

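The "TOML file + environment variable overrides" layering can be illustrated with the override step alone. This sketch uses only the standard library, so the TOML parsing step is replaced by hard-coded defaults. The `MEDIASERVER_` prefix, the three settings, and their default values are all assumptions.

```go
package main

import (
	"fmt"
	"os"
)

// Config holds a few illustrative settings; the field set is hypothetical.
type Config struct {
	ListenAddr  string
	LibraryDir  string
	LLMProvider string
}

// defaultConfig stands in for values that would be read from the TOML file.
func defaultConfig() Config {
	return Config{ListenAddr: ":8080", LibraryDir: "/media", LLMProvider: "ollama"}
}

// applyEnv overlays non-empty environment values on top of cfg. The lookup
// function is injected so tests can supply a fake environment; production
// code passes os.LookupEnv.
func applyEnv(cfg Config, lookup func(string) (string, bool)) Config {
	targets := map[string]*string{
		"MEDIASERVER_LISTEN_ADDR":  &cfg.ListenAddr,
		"MEDIASERVER_LIBRARY_DIR":  &cfg.LibraryDir,
		"MEDIASERVER_LLM_PROVIDER": &cfg.LLMProvider,
	}
	for key, dst := range targets {
		if v, ok := lookup(key); ok && v != "" {
			*dst = v
		}
	}
	return cfg
}

func main() {
	cfg := applyEnv(defaultConfig(), os.LookupEnv)
	fmt.Printf("%+v\n", cfg)
}
```

Running with `MEDIASERVER_LLM_PROVIDER=claude` set would change only that field and leave the file-derived values intact, which is the behavior Docker deployments usually rely on.
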
---

## 3. Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│                    Media Server Core                    │
│                                                         │
│  ┌──────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ Watcher  │→ │   Ingest     │→ │  Processing      │   │
│  │ (inotify)│  │   Pipeline   │  │  Queue (workers) │   │
│  └──────────┘  └──────────────┘  └────────┬─────────┘   │
│                                           │             │
│                         ┌─────────────────┤             │
│                         ▼                 ▼             │
│                  ┌──────────────┐  ┌──────────────┐     │
│                  │   Metadata   │  │Classification│     │
│                  │    Worker    │  │    Worker    │     │
│                  └──────┬───────┘  └──────┬───────┘     │
│                         └────────┬────────┘             │
│                                  │                      │
│                           ┌──────▼──────┐               │
│                           │ LLM Router  │               │
│                           └──┬──────┬───┘               │
│                              │      │                   │
│                      ┌───────┘      └──────────┐        │
│                 ┌────▼─────┐           ┌───────▼──────┐ │
│                 │  Ollama  │           │  Claude/OAI  │ │
│                 │ (local)  │           │   (cloud)    │ │
│                 └──────────┘           └──────────────┘ │
│                                                         │
│  ┌──────────────┐  ┌─────────────┐  ┌───────────────┐   │
│  │  SQLite DB   │  │  HTTP API   │  │    Web UI     │   │
│  │  (metadata)  │  │   (REST)    │  │   (player)    │   │
│  └──────────────┘  └─────────────┘  └───────────────┘   │
└─────────────────────────────────────────────────────────┘
```

---

## 4. Tech Stack Decisions

| Concern | Choice | Rationale |
|---|---|---|
| Backend language | Go | Single binary, excellent HTTP/concurrency, ffmpeg CGO optional |
| Database | SQLite (sqlx) | Zero-config, embeddable, enough for single-user |
| Media processing | ffmpeg (subprocess) | Industry standard, broad format support |
| LLM (local) | Ollama REST API | Simple HTTP interface, model management built-in |
| LLM (cloud) | Anthropic SDK + OpenAI SDK | Dual-provider via abstraction layer |
| Containerization | Docker + Compose | Multi-service: server + ollama + optional GPU |
| Config format | TOML | Human-friendly, Go ecosystem support (viper) |
| Web UI | HTMX + Tailwind | No JS framework needed, Go template rendering |

> **Rust alternative**: Rust is viable if performance is critical (transcoding pipeline), but Go is recommended for faster initial development and simpler deployment.

---

## 5. MVP Scope (Phase 1)

Goal: Working library scanner + LLM tagging + basic web UI + streaming

- [ ] Directory watcher + file ingestion
- [ ] Movie/TV classification (filename heuristics + LLM disambiguation)
- [ ] Metadata fetch from TMDB API (primary); LLM fills unmatched/missing fields only
- [ ] Thumbnail extraction
- [ ] SQLite metadata store
- [ ] REST API: list library, get item, trigger re-scan
- [ ] Basic web UI: grid view + video player
- [ ] Direct-play HTTP streaming
- [ ] Ollama integration (local LLM)
- [ ] Docker Compose setup

---

## 6. Phase 2 (Post-MVP)

- [ ] HLS transcoding
- [ ] Claude API / OpenAI API integration + provider router
- [ ] Natural language search
- [ ] External metadata sources (TVDB, Trakt) enrichment
- [ ] Multi-user support with watch history

---

## 7. Phase 3 (Future)

- [ ] Mobile-friendly UI / PWA
- [ ] GPU-accelerated transcoding
- [ ] Home video scene detection + auto-chapter marking
- [ ] Face recognition for home video tagging
- [ ] Collections and playlists
- [ ] Client apps (Jellyfin protocol compatibility)

---

## 8. Open Questions

- [ ] Should the server expose a Jellyfin-compatible API to leverage existing clients (Infuse, Swiftfin)?
- [ ] GPU passthrough in Docker for local LLM acceleration: required or optional?
- [ ] Should home video tagging use vision models (LLaVA/Claude vision) for frame analysis?
- [x] TMDB/TVDB integration: **decided**. TMDB is the primary metadata source; LLM fills gaps only
- [ ] Multi-user: single-user MVP acceptable, or needed from day one?

@@ -0,0 +1,227 @@
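# LLM Media Server: Requirements (Chinese Parallel Edition)

## Project Overview

A self-hosted video media server that uses LLMs to automatically index, tag and classify, and generate metadata and subtitles for a personal video library (movies, TV shows, home videos). Modeled on Plex/Jellyfin but with deep LLM integration. Deployable via Docker or bare-metal.

---

## 1. Functional Requirements

### 1.1 Media Ingestion & Library Management
- [ ] Watch configured directories and automatically discover new media files
- [ ] Support formats: MKV, MP4, AVI, MOV, WEBM, TS
- [ ] Fingerprint files to prevent duplicate ingestion
- [ ] Organize library by type: Movies / TV Shows / Home Videos
- [ ] Store library state in a local database (SQLite or another embedded database)

### 1.2 LLM Auto-Classification
- [ ] Identify content type (movie, TV episode, home video) from filename and video content
- [ ] Match known movie/TV titles (local heuristic rules + LLM-assisted judgment)
- [ ] Extract season and episode numbers for TV shows
- [ ] Infer subjects, locations, and events for home videos (frame analysis + LLM)
- [ ] Classify content by genre, mood, and rating (family-friendly, etc.)
- [ ] Score confidence for all LLM-generated tags; flag low-confidence tags for manual review
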
### 1.3 Metadata Generation

**Metadata priority: external data sources first → LLM as supplement/fallback**

#### Primary: External Data Sources

| Content type | Primary source | Secondary source | Fallback |
|---|---|---|---|
| Movies | TMDB | IMDB (via TMDB ID) | LLM |
| TV shows | TMDB + TVDB | Trakt | LLM |
| Anime | AniDB / AniList | TMDB | LLM |
| Home videos | none | none | LLM only |

- [ ] Fetch structured metadata from TMDB: title, year, director, cast, genre, runtime, language, poster, backdrop
- [ ] Fetch ratings and IDs from IMDB (via TMDB's IMDB ID field)
- [ ] Fetch TV episode details from TVDB: synopsis, air date, guest cast
- [ ] Fetch watch data and community ratings from Trakt (optional)
- [ ] Fetch anime-specific metadata from AniDB or AniList
- [ ] Cache API responses locally to avoid redundant external requests
- [ ] Store source attribution per record (e.g., `metadata_source: tmdb`)

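The source table above is effectively an ordered lookup chain per content type. A minimal sketch of that reading follows; the content-type keys, the lower-case source identifiers, and `resolveSource` are hypothetical, and treating "TMDB + TVDB" and "AniDB / AniList" as consecutive entries in one chain is an interpretation of the table, not something it states.

```go
package main

import "fmt"

// sourcePriority lists, per content type, the sources to try in order.
// "llm" always comes last because it is the fallback.
var sourcePriority = map[string][]string{
	"movie":      {"tmdb", "imdb", "llm"},
	"tv":         {"tmdb", "tvdb", "trakt", "llm"},
	"anime":      {"anidb", "anilist", "tmdb", "llm"},
	"home_video": {"llm"},
}

// resolveSource returns the first source in the chain for which hasMatch
// reports a hit. The LLM fallback is treated as always available. The second
// result is false only for unknown content types.
func resolveSource(contentType string, hasMatch func(source string) bool) (string, bool) {
	chain, ok := sourcePriority[contentType]
	if !ok {
		return "", false
	}
	for _, s := range chain {
		if s == "llm" || hasMatch(s) {
			return s, true
		}
	}
	return "", false
}

func main() {
	// Pretend only TMDB knows this title.
	onlyTMDB := func(source string) bool { return source == "tmdb" }
	for _, ct := range []string{"movie", "anime", "home_video"} {
		src, _ := resolveSource(ct, onlyTMDB)
		fmt.Printf("%-10s -> %s\n", ct, src)
	}
}
```
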
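#### Other Candidate Data Sources (Undecided)
- [ ] OMDb API: a simplified wrapper around IMDB data, with a free tier
- [ ] MyAnimeList (MAL): anime community ratings and metadata
- [ ] Rotten Tomatoes: critic + audience scores (no official API; requires scraping)
- [ ] Metacritic: professional critic scores (same limitation)

#### Fallback/Supplement: LLM-Generated
The LLM is used only in the following cases:
- [ ] Generate a description/summary when no external source matches the title
- [ ] Fill missing fields that TMDB/IMDB did not return
- [ ] Generate tags and descriptions for home videos (no external data exists for these)
- [ ] Mark all LLM-generated fields with `llm_generated: true` for transparency

#### General
- [ ] Write final metadata to sidecar files (NFO/JSON) in the same directory as the media file
- [ ] Cover/poster: prefer the TMDB poster; otherwise extract a keyframe thumbnail via ffmpeg

### 1.4 LLM Provider Abstraction Layer
- [ ] Support local LLMs (via the Ollama REST API)
- [ ] Support Claude API (Anthropic) for cloud inference
- [ ] Support OpenAI-compatible APIs
- [ ] Select provider per task type (e.g., local for tagging, cloud for summaries)
- [ ] Graceful degradation: fall back to local automatically when cloud is unavailable
- [ ] Track token usage and cost for cloud providers

### 1.5 Streaming & Playback
- [ ] HTTP streaming with Range request support
- [ ] HLS adaptive bitrate transcoding (via ffmpeg)
- [ ] Direct play on the client (no transcoding required)
- [ ] Basic web UI for browsing and playback

### 1.6 Search & Discovery
- [ ] Full-text search (titles, descriptions, tags)
- [ ] Natural language search (e.g., "action movies from the 90s", "videos of kids at the beach")
- [ ] Filter by genre, year, rating, tags, confidence

### 1.7 API
- [ ] REST API covering all library operations
- [ ] Webhook/event system that pushes processing status updates
- [ ] API key authentication

---
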
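## 2. Non-Functional Requirements

### 2.1 Performance
- Processing pipeline does not block streaming; background workers run asynchronously
- Direct-play latency < 2 s; transcoded-stream latency < 5 s
- Support at least 2 concurrent streams (hardware-dependent)

### 2.2 Storage
- Metadata and index stored locally (no mandatory cloud dependency)
- Sidecar files (.nfo, .json) stored in the same directory as the media files
- SQLite as the metadata database (upgradeable to PostgreSQL via config)

### 2.3 Privacy
- LLM inference can run fully local (no data leaves the machine)
- Cloud LLM calls are opt-in and clearly logged
- No telemetry by default

### 2.4 Reliability
- Crash recovery: automatically resume interrupted processing jobs after restart
- Idempotent processing: re-indexing a file does not produce duplicate metadata
- Graceful degradation: the server keeps running normally when the LLM provider is unavailable

### 2.5 Portability
- Docker image with bundled ffmpeg
- Single binary for bare-metal deployment (Go preferred)
- Config via TOML file + environment variable overrides

---
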
## 3. Architecture Overview

```
┌─────────────────────────────────────────────────────────┐
│                    Media Server Core                    │
│                                                         │
│  ┌──────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ Watcher  │→ │   Ingest     │→ │  Processing      │   │
│  │ (inotify)│  │   Pipeline   │  │  Queue (workers) │   │
│  └──────────┘  └──────────────┘  └────────┬─────────┘   │
│                                           │             │
│                         ┌─────────────────┤             │
│                         ▼                 ▼             │
│                  ┌──────────────┐  ┌──────────────┐     │
│                  │   Metadata   │  │Classification│     │
│                  │    Worker    │  │    Worker    │     │
│                  └──────┬───────┘  └──────┬───────┘     │
│                         └────────┬────────┘             │
│                                  │                      │
│                     ┌────────────▼──────────────┐       │
│                     │  External Source Router   │       │
│                     │  TMDB / TVDB / AniDB /    │       │
│                     │  OpenSubtitles / Trakt    │       │
│                     └────────────┬──────────────┘       │
│                                  │ unmatched / missing  │
│                           ┌──────▼──────┐               │
│                           │ LLM Router  │               │
│                           └──┬──────┬───┘               │
│                              │      │                   │
│                      ┌───────┘      └──────────┐        │
│                 ┌────▼─────┐           ┌───────▼──────┐ │
│                 │  Ollama  │           │ Claude/OpenAI│ │
│                 │ (local)  │           │   (cloud)    │ │
│                 └──────────┘           └──────────────┘ │
│                                                         │
│  ┌──────────────┐  ┌─────────────┐  ┌───────────────┐   │
│  │  SQLite DB   │  │  HTTP API   │  │    Web UI     │   │
│  │  (metadata)  │  │   (REST)    │  │   (player)    │   │
│  └──────────────┘  └─────────────┘  └───────────────┘   │
└─────────────────────────────────────────────────────────┘
```

---

## 4. Tech Stack Decisions

| Concern | Choice | Rationale |
|---|---|---|
| Backend language | Go | Single binary, excellent HTTP/concurrency, simple deployment |
| Database | SQLite (sqlx) | Zero-config, embeddable, enough for single-user |
| Media processing | ffmpeg (subprocess) | Industry standard, broad format support |
| LLM (local) | Ollama REST API | Simple HTTP interface, built-in model management |
| LLM (cloud) | Anthropic SDK + OpenAI SDK | Dual-provider support via abstraction layer |
| Containerization | Docker + Compose | Multi-service: server + ollama + optional GPU |
| Config format | TOML | Human-friendly, Go ecosystem support (viper) |
| Web UI | HTMX + Tailwind | No JS framework, Go template rendering |

> **Rust alternative**: Rust is a viable option if performance becomes a bottleneck (transcoding pipeline), but Go offers faster development and simpler deployment and is the recommended first choice.

---

## 5. MVP Scope (Phase 1)

Goal: a working library scanner + LLM-assisted tagging + basic web UI + streaming playback

- [ ] Directory watcher + file ingestion
- [ ] Movie/TV classification (filename heuristics + LLM assistance)
- [ ] TMDB metadata fetch (primary source); LLM fills missing fields
- [ ] Thumbnail extraction (TMDB poster preferred, ffmpeg keyframe as fallback)
- [ ] SQLite metadata store
- [ ] REST API: list library, get item, trigger re-scan
- [ ] Basic web UI: grid view + video player
- [ ] Direct-play HTTP streaming
- [ ] Ollama integration (local LLM)
- [ ] Docker Compose setup

---

## 6. Phase 2 (Post-MVP)

- [ ] HLS transcoding
- [ ] Claude API / OpenAI API integration + provider router
- [ ] Natural language search
- [ ] TVDB integration (TV show enrichment)
- [ ] Trakt integration (watch history and community data)
- [ ] Multi-user support with watch history

---

## 7. Phase 3 (Future)

- [ ] Mobile-friendly UI / PWA
- [ ] GPU-accelerated transcoding
- [ ] Home video scene detection + auto-chapter marking
- [ ] Face recognition tagging for home videos
- [ ] Collections and playlists
- [ ] Jellyfin protocol compatibility (to support existing clients such as Infuse and Swiftfin)
- [ ] AniDB / AniList / MAL anime data source integration
- [ ] Rotten Tomatoes / Metacritic score scraping (no official API)

---

## 8. Open Questions

- [ ] Should the server expose a Jellyfin-compatible API to reuse existing clients (Infuse, Swiftfin)?
- [ ] GPU passthrough in Docker: required or optional?
- [ ] Should home video frame analysis use vision models (LLaVA / Claude Vision)?
- [ ] Should multi-user be supported (is a single-user MVP acceptable)?
- [x] ~~Whether to integrate TMDB/TVDB~~: **decided**. TMDB is the primary data source; LLM is a supplement only