15 Commits

Author SHA1 Message Date
csf123321 a039b957d0 Remove AI traces from frontend 2026-04-20 16:02:50 +08:00
csf123321 bba6de25ac Remove AI traces from backend 2026-04-20 15:53:02 +08:00
stardrophere 7a34fc0079 Improve the prompt screen 2026-04-04 12:11:34 +08:00
csf123321 6af713b67a Merge pull request #5 from stardrophere/fix_problem
Update README
2026-04-03 01:54:53 +08:00
csf123321 6992b58208 Update README 2026-04-03 01:51:45 +08:00
csf123321 1604decd3c Merge pull request #4 from stardrophere/fix_problem
Hide hn errors for now, just to get by
2026-04-03 01:33:45 +08:00
csf123321 98971588ae Hide hn errors for now, just to get by 2026-04-03 01:26:36 +08:00
csf123321 531844f33c Merge pull request #3 from stardrophere/backend_optimize
Backend optimize
2026-04-03 01:18:02 +08:00
csf123321 76f00db86d Update u description 2026-04-02 23:53:25 +08:00
csf123321 761fad17bc Sync application-layer limits 2026-04-02 23:41:06 +08:00
csf123321 0cab5c1cda Remove redundant logs 2026-04-02 18:36:34 +08:00
csf123321 9574b02d8a Temporary fix for the vue-router issue 2026-04-02 18:35:49 +08:00
csf123321 c48c2b9143 Stop ignoring lock files; force CPU 2026-04-02 17:36:02 +08:00
csf123321 cdad76cd3b Merge branch 'main' into backend_optimize
Merge the algorithm from main
2026-04-02 14:07:21 +08:00
47 changed files with 2046 additions and 361 deletions
+9
View File
@@ -0,0 +1,9 @@
{
  "permissions": {
    "allow": [
      "Bash(git checkout *)",
      "Bash(git add *)",
      "Bash(git commit -m ' *)"
    ]
  }
}
+4 -3
View File
@@ -37,9 +37,6 @@ MANIFEST
*.manifest
*.spec
# uv
*.lock
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
@@ -193,3 +190,7 @@ cython_debug/
**/data/*
**/docker/*
backend/app/static/*
test*.*
docs/**
+108
View File
@@ -0,0 +1,108 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
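## Project overview
InsightRadar (聚势智见) is a trending-news aggregation platform. Core pipeline: periodically scrape trending searches from Weibo, Zhihu, Baidu, and other platforms → run cosine-similarity semantic clustering with a local embedding model (Qwen3-Embedding-4B) → merge the results into `UnifiedEvent` records ("big events") → call an LLM such as DeepSeek to generate AI summaries and tags → send scheduled email briefings matched to each user's subscribed keywords.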
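The clustering step in the pipeline above can be sketched roughly as follows. This is a minimal, standard-library illustration, not the project's code: it assumes a greedy "attach to the most similar existing center, otherwise open a new cluster" rule, reuses the documented default threshold of 0.72, and uses tiny hand-written vectors in place of real embeddings. The function names are hypothetical.

```python
import math

SIMILARITY_THRESHOLD = 0.72  # documented default; everything else here is illustrative


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_or_create(centers: list[list[float]], vec: list[float],
                    threshold: float = SIMILARITY_THRESHOLD) -> int:
    """Return the index of the best-matching center, or append vec as a new center."""
    best_index, best_score = -1, -1.0
    for i, center in enumerate(centers):
        score = cosine_similarity(center, vec)
        if score > best_score:
            best_index, best_score = i, score
    if best_index >= 0 and best_score >= threshold:
        return best_index
    centers.append(vec)
    return len(centers) - 1


centers: list[list[float]] = []
first = match_or_create(centers, [1.0, 0.0, 0.0])   # no centers yet, so a new cluster
second = match_or_create(centers, [0.9, 0.1, 0.0])  # close to the first vector
third = match_or_create(centers, [0.0, 1.0, 0.0])   # orthogonal, so a new cluster
print(first, second, third)
```

A higher threshold yields more, tighter clusters; a lower one merges more loosely related headlines into the same event.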
## Development commands
### Backend (Python / FastAPI)
```bash
cd backend
uv sync # install dependencies
uv run python main.py # start the dev server (default :8000)
# or
uv run uvicorn app.main:app --reload --port 8000
```
### Frontend (Vue 3 / Vite)
```bash
cd frontend
npm install
npm run dev # dev server (Vite, default :5173, proxies to the backend)
npm run build # build output into dist/
npm run type-check # TypeScript type checking
npm run lint # oxlint + eslint double lint (with auto-fix)
npm run format # format src/ with Prettier
```
### Production deployment (bundle the frontend build into the backend)
```bash
cd frontend && npm run build
cp -r dist/* ../backend/app/static/
```
## Architecture overview
### Backend layers
```
backend/app/
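├── main.py # FastAPI entry point; APScheduler scheduling (three scheduled jobs: fetch / summarize / deliver)
├── database.py # SQLAlchemy engine (SQLite in WAL mode; switchable via SQLALCHEMY_DATABASE_URL)
├── initialize.py # Idempotently writes the default information sources at startup (Toutiao, Weibo, and others; 11 platforms)
├── models/models.py # All ORM table definitions (single file)
├── api/
│ ├── router.py # Mounts all sub-routers in one place, prefix /api/v1
│ └── endpoints/ # auth / events / preferences / delivery / revisions / sources / stats
├── services/
│ ├── fetcher_service.py # Trending scrape + embedding generation + semantic clustering into the database (core)
│ ├── summary_service.py # Calls an LLM to generate AI summaries and tags
│ ├── matching_service.py # Matches user interests in two modes: exact + semantic
│ └── delivery_service.py # Checks the delivery time window and sends email briefings
├── core/
│ ├── security.py # JWT issuing and verification
│ └── verification/ # Verification-code logic (Redis or DB storage modes)
├── crud/ # Database CRUD operations
├── schemas/ # Pydantic request/response schemas
├── prompts/ # LLM prompt templates
└── static/ # Frontend build output (production)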
```
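The "delivery time window" check mentioned for `delivery_service.py` could look roughly like the sketch below. It reuses the default of 2 minutes seen for `DELIVERY_WINDOW_MINUTES` elsewhere in this changeset, but the one-sided tolerance (fire only at or shortly after the scheduled time) and the midnight wraparound are assumptions, and the function names are hypothetical rather than the project's own helpers.

```python
from datetime import time

DELIVERY_WINDOW_MINUTES = 2  # default value seen in the diff; the logic below is assumed


def time_to_minutes(t: time) -> int:
    """Minutes elapsed since midnight for a wall-clock time."""
    return t.hour * 60 + t.minute


def is_within_window(scheduled: time, current: time,
                     window: int = DELIVERY_WINDOW_MINUTES) -> bool:
    """True when `current` is at most `window` minutes past `scheduled`.

    The modulo handles a schedule just before midnight and a check just after it.
    """
    elapsed = (time_to_minutes(current) - time_to_minutes(scheduled)) % (24 * 60)
    return elapsed <= window


print(is_within_window(time(8, 0), time(8, 1)))    # one minute late
print(is_within_window(time(8, 0), time(8, 3)))    # three minutes late
print(is_within_window(time(23, 59), time(0, 0)))  # crosses midnight
```

Because the scheduler polls once a minute, a small tolerance like this lets a slightly delayed tick still trigger the push, while a separate delivery-history check is needed to avoid sending twice inside the same window.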
### Frontend layers
```
frontend/src/
├── api/ # Wraps fetch requests (built on config/apiBase.ts, prefix /api/v1)
├── stores/ # Pinia state (auth / theme)
├── router/ # Vue Router (requiresAuth / guestOnly meta guards)
├── views/ # Pages: Dashboard / Search / Topics / Delivery / Revisions / Login / Register
├── layouts/ # DashboardLayout (shared sidebar)
└── components/ # Shared components (UnifiedEventCard, etc.)
```
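### Key data models
- `UnifiedEvent`: a "big event" produced by semantic clustering; holds the AI summary, `center_embedding` (the cluster-center vector), and `hot_score`
- `TrendingEvent`: raw trending items from each platform, deduplicated by `external_id` (an MD5 fingerprint); `unified_event_id` links each item to its big event
- `ExtractedTopic` / `DiscussionComment`: polymorphic design; `target_type` distinguishes whether the row is attached to an EVENT, TREND, or ARTICLE
- `DeliveryHistory`: anti-duplicate push records, with the unique constraint `(user_id, target_type, target_id)`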
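As a rough illustration of the `external_id` fingerprint, the sketch below hashes a string with MD5 and uses the digest as a deduplication key. The diff only shows that `generate_md5` returns a 32-character MD5; what text is actually hashed is not visible, so the "platform id plus title" input and the `make_external_id` helper are assumptions.

```python
import hashlib


def generate_md5(text: str) -> str:
    """Return the 32-character hexadecimal MD5 digest of `text`."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def make_external_id(platform_id: str, title: str) -> str:
    # Assumption: the fingerprint combines the platform id and the headline.
    return generate_md5(f"{platform_id}:{title}")


# Deduplicate a small batch: repeated (platform, title) pairs collapse to one key.
seen: dict[str, str] = {}
batch = [("weibo", "Headline A"), ("weibo", "Headline A"), ("zhihu", "Headline A")]
for platform_id, title in batch:
    seen.setdefault(make_external_id(platform_id, title), title)

print(len(seen))
```

MD5 is used here only as a cheap content fingerprint; it is not suitable for anything security-sensitive.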
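### Embedding model
`fetcher_service.py` loads a global `SentenceTransformer` singleton (`embedder_model`) at module level. `matching_service.py` imports and reuses that singleton directly, so the model is not loaded twice. The model path is set by `EMBEDDING_MODEL_PATH`; the model files must be placed in `backend/data/` beforehand.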
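The project loads the model eagerly at import time. A lazy variant of the same "load once, share everywhere" idea is sketched below, purely for illustration: `FakeEmbedder` is a stand-in so the example runs without any model files, and `get_embedder` and the default path are hypothetical names, not part of the codebase.

```python
from functools import lru_cache


class FakeEmbedder:
    """Stand-in for a heavy model object; counts how many times it is constructed."""

    load_count = 0

    def __init__(self, path: str) -> None:
        FakeEmbedder.load_count += 1
        self.path = path


@lru_cache(maxsize=1)
def get_embedder(path: str = "data/model") -> FakeEmbedder:
    """Build the embedder on first use and return the cached instance afterwards."""
    return FakeEmbedder(path)


a = get_embedder()
b = get_embedder()
print(a is b, FakeEmbedder.load_count)
```

The trade-off is that lazy loading moves the cold-start cost from process startup to the first request that needs an embedding.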
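## Configuration
Place the `.env` file in the project root (or in `backend/data/`; either location works). Key variables:
| Variable | Description |
|------|------|
| `SQLALCHEMY_DATABASE_URL` | Defaults to `sqlite:///./data/demo.db`; can be switched to PostgreSQL |
| `EMBEDDING_MODEL_PATH` | Path to the local embedding model |
| `AI_API_KEY` | LLM API key (DeepSeek or another OpenAI-compatible API) |
| `SIMILARITY_THRESHOLD` | Semantic clustering threshold for trending items (default 0.72) |
| `AUTH_CODE_STORE` | Verification-code storage mode: `db` (when Redis is unavailable) or `redis` |
| `REDIS_URL` | Redis connection; when empty, verification codes automatically fall back to the database |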
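One way to read these variables with their documented defaults is sketched below. The `Settings` dataclass and `load_settings` function are hypothetical, and the exact fallback rule (use `db` whenever `REDIS_URL` is empty, even if `redis` was requested) is an assumption based on the table, not the project's implementation.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    similarity_threshold: float
    auth_code_store: str
    redis_url: str


def load_settings(env: dict[str, str] | None = None) -> Settings:
    """Read settings from `env` (defaults to os.environ), applying documented defaults."""
    env = dict(os.environ) if env is None else env
    redis_url = env.get("REDIS_URL", "").strip()
    store = env.get("AUTH_CODE_STORE", "redis" if redis_url else "db").strip().lower()
    if store == "redis" and not redis_url:
        store = "db"  # assumed fallback: no Redis URL means database storage
    return Settings(
        database_url=env.get("SQLALCHEMY_DATABASE_URL", "sqlite:///./data/demo.db"),
        similarity_threshold=float(env.get("SIMILARITY_THRESHOLD", "0.72")),
        auth_code_store=store,
        redis_url=redis_url,
    )


print(load_settings({}))
print(load_settings({"AUTH_CODE_STORE": "redis"}).auth_code_store)
```

Passing an explicit dictionary instead of reading `os.environ` directly keeps the function easy to test.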
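## Notes
- **Backend working directory**: the backend must be run from `backend/`, because the static file path `app/static` is relative
- **Slow embedding-model cold start**: the first load of Qwen3-Embedding-4B takes tens of seconds; this is expected
- **Frontend API paths**: all requests go through `fetchApi()` in `src/config/apiBase.ts`, which applies the `/api/v1` prefix, so paths do not need to be assembled by hand
- **Database migrations**: tables are currently created automatically with `Base.metadata.create_all()` and Alembic is not used; after changing a model field, existing databases must be handled manually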
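Since `create_all()` only creates missing tables and does not alter existing ones, a newly added model field has to be applied to an existing SQLite file by hand. The sketch below shows one possible approach using only the standard library; the helper name, the table, and the column are illustrative assumptions, and identifiers are interpolated directly, so it should only be called with trusted names.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str,
                          column: str, ddl_type: str) -> bool:
    """Add `column` to `table` unless it already exists; return True if it was added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_users (id INTEGER PRIMARY KEY, email TEXT)")
print(add_column_if_missing(conn, "app_users", "nickname", "TEXT"))  # added
print(add_column_if_missing(conn, "app_users", "nickname", "TEXT"))  # already present
```

For anything beyond adding nullable columns (renames, type changes, constraints), a migration tool is usually the safer route.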
+70 -2
View File
@@ -1,2 +1,70 @@
# InsightRadar
An AI-powered trend monitoring and news intelligence platform
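# 聚势智见 (InsightRadar): a trending-news aggregation platform built on semantic clustering and LLMs
An intelligent trend-monitoring and personalized-delivery platform. Using semantic clustering and large language models, it automatically merges trending items scattered across Weibo, Zhihu, Douyin, Baidu, and other platforms into unified events, generates AI summaries and tags, and supports personalized subscriptions and scheduled pushes.
## Core features
- **Cross-platform trend aggregation**: uses embedding-based semantic similarity to automatically recognize the same event across platforms
- **AI summaries**: calls an LLM to generate a unified title, a combined summary, and normalized tags
- **Personalized recommendations**: supports keyword subscriptions, semantic matching, and multi-factor ranking
- **Public-opinion analysis tools**: heat-trend tracking, headline-revision monitoring, and timeline analysis
- **Scheduled briefings**: custom push times and recipient addresses, with a personalized AI briefing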
## Quick deployment
### Option 1: Docker (recommended)
**Requirements**
- Linux (Ubuntu 22.04 LTS or Debian 12 recommended)
- Docker ≥ 20.10.0, Docker Compose v2
- Memory ≥ 512 MB (1 GB or more recommended)
**Steps**
```bash
# 1. Build the image
docker build -t insightradar:latest .
# 2. Prepare the directories (see docker/ereadm.txt)
mkdir -p ./data ./logs
# 3. Start the services
cd docker
docker compose up -d
```
### Option 2: From source
**Requirements**
- Python ≥ 3.11, uv package manager
- Node.js ≥ 22
- Memory ≥ 512 MB
**Backend**
```bash
# Copy
cd backend
uv sync
uv run python main.py
```
**Frontend**
```bash
# Copy
cd frontend
npm install
npm run build
# Copy the contents of dist/ into backend/app/static/
```
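**Configuration**
- Copy .env.example to .env and fill in the settings
- Place the embedding model (Qwen3-Embedding-4B) in the backend/data/ directory
### Accessing the application
Once deployed, open the web UI at http://<server IP>:<configured port>.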
-3
View File
@@ -1,6 +1,3 @@
"""
Authentication module: user registration, login, and email verification codes (Redis / database dual storage with automatic fallback)
"""
import json
import math
import os
-2
View File
@@ -1,5 +1,3 @@
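# Delivery settings API: manages each user's delivery schedule and delivery channels
# Key constraint: two delivery times for the same user must be at least 30 minutes apart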
from datetime import time as dt_time
from typing import List
-6
View File
@@ -1,7 +1,3 @@
# app/api/endpoints/events.py
"""
Events module: unified event list, event detail, and search timeline (exact / semantic / hybrid matching)
"""
import json
import os
import time
@@ -41,10 +37,8 @@ SEARCH_MAX_HOURS = int(os.getenv("SEARCH_MAX_HOURS", "168"))
router = APIRouter()
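# Maximum number of points returned for a ranking trajectory, to keep responses small over long time spans.
MAX_RANKING_POINTS = 30
# Short-lived cache for the unified event list endpoint.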
_UNIFIED_EVENTS_CACHE: Dict[str, Tuple[float, PaginatedUnifiedEventResponse]] = {}
CACHE_TTL_SECONDS = 60
+2 -11
View File
@@ -1,6 +1,3 @@
"""
User preferences module: add, delete, and list interest keywords, plus keyword-based personalized event recommendations
"""
import time
from typing import Any, Dict, List, Tuple
@@ -20,7 +17,6 @@ from app.services.matching_service import recommend_events_for_user
router = APIRouter()
# --- Lightweight endpoint cache configuration ---
_RECOMMEND_CACHE: Dict[str, Tuple[float, Any]] = {}
CACHE_TTL_SECONDS = 60
@@ -29,7 +25,6 @@ def _invalidate_user_cache(user_id: int):
keys_to_delete = [k for k in _RECOMMEND_CACHE.keys() if k.startswith(f"{user_id}:")]
for k in keys_to_delete:
_RECOMMEND_CACHE.pop(k, None)
# ---------------------------
def _ensure_self_access(path_user_id: int, current_user: AppUser) -> None:
"""Check that the path user_id belongs to the currently logged-in user."""
@@ -93,7 +88,7 @@ def create_user_preference(
)
db.refresh(db_obj)
_invalidate_user_cache(user_id) # invalidate the recommendation cache
_invalidate_user_cache(user_id)
return db_obj
@@ -122,7 +117,7 @@ def delete_user_preference(
db.delete(preference)
db.commit()
_invalidate_user_cache(user_id) # invalidate the recommendation cache
_invalidate_user_cache(user_id)
return None
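@@ -143,7 +138,6 @@ def recommend_events(
"""Recommend events based on the user's interest keywords (exact match + semantic match)."""
_ensure_self_access(user_id, current_user)
# Cache recommendation results to avoid calling the matching service too often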
cache_key = f"{user_id}:{min_hot}:{hours}:{limit}:{semantic_threshold}:{sort_by}"
current_time = time.time()
@@ -151,7 +145,6 @@ def recommend_events(
expire_time, cached_data = _RECOMMEND_CACHE[cache_key]
if current_time < expire_time:
return cached_data
# -----------------------
matched = recommend_events_for_user(
db,
@@ -189,10 +182,8 @@ def recommend_events(
# Write to the cache; clear it once it exceeds 2000 entries to prevent memory bloat
if len(_RECOMMEND_CACHE) > 2000:
# Prevent unbounded memory growth
_RECOMMEND_CACHE.clear()
_RECOMMEND_CACHE[cache_key] = (current_time + CACHE_TTL_SECONDS, response)
# ------------------
return response
-2
View File
@@ -1,4 +1,3 @@
# PR-edit tracking API: queries the history of trending headlines that were quietly edited, for public-opinion monitoring
from datetime import timedelta
from typing import List, Optional
@@ -39,7 +38,6 @@ def list_headline_revisions(
"""
time_limit = utcnow() - timedelta(hours=hours)
# Join TrendingEvent and InfoSource to get the platform name and link
rows = (
db.query(HeadlineRevision, InfoSource.source_name, TrendingEvent.event_url)
.join(TrendingEvent, HeadlineRevision.event_id == TrendingEvent.id)
-6
View File
@@ -1,7 +1,3 @@
# app/api/endpoints/sources.py
"""
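Information source module: create, read, update, and delete for information sources, used by the scraper and the admin backend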
"""
from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy.orm import Session
from typing import List
@@ -43,6 +39,4 @@ async def update_info_source(source_id: int, source_in: InfoSourceUpdate, db: Se
source = crud_source.get(db=db, source_id=source_id)
if not source:
raise HTTPException(status_code=404, detail="该信息源不存在")
# Pass the fetched database object and the Pydantic object from the frontend straight to the CRUD layer
return crud_source.update(db=db, db_obj=source, obj_in=source_in)
-4
View File
@@ -1,4 +1,3 @@
# System status monitoring API: returns an overview of the scraper cluster (source count, items fetched today, last sync time, etc.)
from datetime import datetime, timedelta
from typing import Optional
@@ -28,7 +27,6 @@ def get_system_stats(db: Session = Depends(get_db)):
"""Get today's running status of the scraper cluster."""
today_start = utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
# Source statistics: total and enabled counts
total_sources = db.query(func.count(InfoSource.id)).scalar() or 0
active_sources = (
db.query(func.count(InfoSource.id))
@@ -36,7 +34,6 @@ def get_system_stats(db: Session = Depends(get_db)):
.scalar() or 0
)
# Today's task statistics: items fetched, successful and failed task counts
today_tasks = (
db.query(DataSyncTask)
.filter(DataSyncTask.created_at >= today_start)
@@ -47,7 +44,6 @@ def get_system_stats(db: Session = Depends(get_db)):
success_count = sum(1 for t in today_tasks if t.task_status == TaskStatus.SUCCESS)
error_count = sum(1 for t in today_tasks if t.task_status == TaskStatus.ERROR)
# Time of the last sync
last_task = (
db.query(DataSyncTask)
.filter(DataSyncTask.task_status == TaskStatus.SUCCESS)
-1
View File
@@ -1,4 +1,3 @@
# app/api/router.py
from fastapi import APIRouter
from app.api.endpoints import auth, delivery, events, preferences, revisions, sources, stats
@@ -1,5 +1,3 @@
# app/verification/backends/memory.py
from functools import lru_cache
import time
import json
+4 -4
View File
@@ -1,7 +1,3 @@
# app/crud/crud_source.py
"""
InfoSource CRUD: create, read, update, and delete for InfoSource, used by the API and the scraper
"""
from sqlite3 import IntegrityError
from sqlalchemy.orm import Session
@@ -24,6 +20,10 @@ def get_multi(db: Session, skip: int = 0, limit: int = 100) -> List[InfoSource]:
def create(db: Session, obj_in: InfoSourceCreate) -> InfoSource:
"""Create a new information source"""
db_obj = InfoSource(**obj_in.model_dump())
existing = db.query(InfoSource).filter(InfoSource.source_name == db_obj.source_name).first()
if existing:
db.close()
return db_obj
try:
db.add(db_obj)
db.commit()
+1 -1
View File
@@ -1,4 +1,4 @@
# database.py
# AI-assisted generation: deepseek-v3-2, 2026-03-20
import os
from dotenv import load_dotenv
+1 -6
View File
@@ -1,14 +1,12 @@
import json
from app.database import SessionLocal
from app.crud.crud_source import create
from app.models.models import SourceType
from app.schemas.source_schema import InfoSourceCreate
# AI-assisted generation: deepseek-v3-2, 2026-03-20
def init():
# Parsed list of data sources
sources_data = [
{"name": "今日头条", "url": "toutiao"},
{"name": "百度热搜", "url": "baidu"},
@@ -23,11 +21,8 @@ def init():
{"name": "知乎", "url": "zhihu"}
]
# Iterate over the data and send POST requests
for item in sources_data:
try:
with SessionLocal() as db:
create(db, InfoSourceCreate(
+20 -27
View File
@@ -1,10 +1,11 @@
# app/main.py
# AI-assisted generation: deepseek-v3-2, 2026-03-20
import logging
import os
from fastapi.responses import FileResponse
from pathlib import Path
from fastapi.responses import FileResponse, HTMLResponse, JSONResponse
import httpx
from contextlib import asynccontextmanager
from fastapi import FastAPI, staticfiles
from fastapi import FastAPI, HTTPException, Request, staticfiles
from fastapi.middleware.cors import CORSMiddleware
from dotenv import load_dotenv
@@ -34,9 +35,6 @@ SUMMARY_INTERVAL = int(os.getenv("SUMMARY_INTERVAL_MINUTES", 30))
scheduler = AsyncIOScheduler()
# ==========================================
# 1. Lifecycle management: create tables and start the scheduler automatically at app startup
# ==========================================
@asynccontextmanager
async def lifespan(app: FastAPI):
# 1. Create database tables
@@ -48,7 +46,7 @@ async def lifespan(app: FastAPI):
init()
logging.info("订阅源初始化完毕")
# 2. Configure and start the scheduled jobs
# Fetch the subscribed sources
scheduler.add_job(
fetch_and_save_trending_data,
'interval',
@@ -65,7 +63,7 @@ async def lifespan(app: FastAPI):
id='ai_summary_job',
replace_existing=True
)
# Delivery scheduling: check every minute whether any user is due an email push
# Delivery scheduling
scheduler.add_job(
check_and_deliver,
'interval',
@@ -79,24 +77,14 @@ async def lifespan(app: FastAPI):
logging.info(f"AI 摘要生成任务已启动,每 {SUMMARY_INTERVAL} 分钟执行一次")
logging.info("邮件推送调度已启动,每分钟检查一次")
# For testing convenience, run once immediately at startup
# await fetch_and_save_trending_data()
yield
# await generate_unified_summaries()
yield # FastAPI starts accepting requests at this point
# Graceful shutdown
scheduler.shutdown()
logging.info("定时任务已安全关闭")
# Initialize FastAPI
app = FastAPI(title="AI 新闻聚合引擎 API", lifespan=lifespan)
# ==========================================
# 2. CORS middleware: allow cross-origin requests from the frontend dev server
# ==========================================
app.add_middleware(
CORSMiddleware,
# allow_origins=["http://localhost:5173", "http://127.0.0.1:5173"],
@@ -106,21 +94,26 @@ app.add_middleware(
allow_headers=["*"],
)
# ==========================================
# 3. Mount the main router
# ==========================================
# Versioning
app.include_router(api_router, prefix="/api/v1")
# Just point the directory at static, the path where the dist contents are placed
app.mount("/", staticfiles.StaticFiles(directory="app/static", html=True), name="static")
# End of AI-assisted generation
# Only the API's priority matching needs to be kept; catch_all can be simplified like this
@app.get("/api/{full_path:path}")
async def api_not_found(full_path: str):
return {"detail": "API Not Found"}
# Health check
staticPath = staticfiles.StaticFiles(directory="app/static", html=True)
app.mount("/", staticPath, name="static")
INDEX_HTML = Path("app/static/index.html").read_text(encoding="utf-8")
@app.exception_handler(404)
async def not_found_handler(request: Request, exc: HTTPException):
if request.url.path.startswith("/api/"):
return JSONResponse({"detail": "Not Found"}, status_code=404)
return HTMLResponse(INDEX_HTML)
@app.get("/", tags=["健康检查"])
async def root():
return {"message": "Welcome to AI News Aggregator API", "status": "ok"}
-51
View File
@@ -1,4 +1,3 @@
# models.py
from datetime import datetime, timezone, time
from typing import Optional, Any
import enum
@@ -9,11 +8,6 @@ from sqlalchemy import (
)
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship
# ==========================================
# 0. Global base class, enum definitions, and dynamic types
# ==========================================
class Base(DeclarativeBase):
"""
SQLAlchemy 2.0 declarative base class
@@ -21,9 +15,6 @@ class Base(DeclarativeBase):
"""
pass
# Falls back to Integer automatically under SQLite so that autoincrement works correctly,
# while production deployments on PostgreSQL or MySQL still use the larger BigInteger.
BigIntType = BigInteger().with_variant(Integer, "sqlite")
@@ -70,10 +61,6 @@ def utcnow():
"""
return datetime.now(timezone.utc)
# ==========================================
# Module 1: information source management
# ==========================================
class InfoSource(Base):
"""
Scrape-source configuration table
@@ -98,10 +85,6 @@ class InfoSource(Base):
UniqueConstraint("source_name", name="uix_source_name"),
)
# ==========================================
# Module 2: AI semantic clustering hub (big-event pool)
# ==========================================
class UnifiedEvent(Base):
"""
AI unified event table (the core brain)
@@ -124,10 +107,6 @@ class UnifiedEvent(Base):
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, onupdate=utcnow)
# ==========================================
# Module 3: content store (trending items and news child nodes)
# ==========================================
class TrendingEvent(Base):
"""
Per-platform trending-search detail table
@@ -199,10 +178,6 @@ class NewsArticle(Base):
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, onupdate=utcnow)
# ==========================================
# Module 4: heat and trajectory tracking
# ==========================================
class HeadlineRevision(Base):
"""
Headline revision history table
@@ -241,10 +216,6 @@ class RankingLog(Base):
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
# ==========================================
# Module 5: polymorphic topics and polymorphic comments
# ==========================================
class ExtractedTopic(Base):
"""
Table of core topic tags extracted by AI
@@ -291,10 +262,6 @@ class DiscussionComment(Base):
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
# ==========================================
# Module 6: user profiles and the multi-channel high-availability push system
# ==========================================
class AppUser(Base):
"""
Core system user table
@@ -305,16 +272,10 @@ class AppUser(Base):
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
email: Mapped[str] = mapped_column(String(150), unique=True, index=True, comment="主账号邮箱")
password_hash: Mapped[Optional[str]] = mapped_column(String(255), comment="密码哈希(第三方登录可为空)")
nickname: Mapped[Optional[str]] = mapped_column(String(100), comment="用户展示昵称")
avatar_url: Mapped[Optional[str]] = mapped_column(String(500), comment="用户头像地址")
gender: Mapped[GenderType] = mapped_column(Enum(GenderType), default=GenderType.UNKNOWN, comment="用户性别(用于AI调整行文语气)")
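# Very flexible: a catch-all container. Any future frontend toggle, such as "night mode" or "larger font",
# can simply go into this JSON field, with no need to change the backend table schema by hand.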
metadata_: Mapped[Optional[Any]] = mapped_column("metadata", JSON, comment="JSON扩展字段: 存放灵活多变的前端用户偏好设置")
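# Time zones matter a great deal for scheduled delivery: they ensure users in New York and Beijing both receive their news at 8 a.m. local time.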
timezone: Mapped[str] = mapped_column(String(50), default="Asia/Shanghai", comment="用户所在地时区")
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, onupdate=utcnow)
@@ -333,14 +294,10 @@ class UserPushEndpoint(Base):
id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
user_id: Mapped[int] = mapped_column(ForeignKey("app_users.id"), comment="所属用户ID")
# An uppercase plain string, e.g. EMAIL, WECHAT_BOT, TELEGRAM
channel_type: Mapped[str] = mapped_column(String(50), comment="推送渠道类型标识")
# The concrete delivery target address
channel_account: Mapped[str] = mapped_column(String(255), comment="具体的接收账号(邮箱号/微信号/Webhook)")
is_active: Mapped[bool] = mapped_column(Boolean, default=True, comment="用户是否临时关闭了该渠道")
# High-availability failover: e.g. 1 means WeChat must be tried first; if that fails, fall back to the priority=2 email
priority_level: Mapped[int] = mapped_column(Integer, default=1, comment="推送优先级(1最高,用于错误降级重试)")
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow)
updated_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, onupdate=utcnow)
@@ -352,7 +309,6 @@ class UserTopicPreference(Base):
"""
__tablename__ = "user_topic_preferences"
__table_args__ = (
# Composite debounce constraint: prevents a user from subscribing to the same keyword twice by double-clicking when the UI lags
UniqueConstraint("user_id", "interested_keyword", name="idx_unique_preference"),
)
@@ -389,7 +345,6 @@ class DeliveryHistory(Base):
"""
__tablename__ = "delivery_history"
__table_args__ = (
# Final dedup constraint: for one user and one news item, only a single record may ever exist
UniqueConstraint("user_id", "target_type", "target_id", name="idx_prevent_duplicate_push"),
)
@@ -397,15 +352,10 @@ class DeliveryHistory(Base):
user_id: Mapped[int] = mapped_column(ForeignKey("app_users.id"), comment="接收推送的用户")
target_type: Mapped[TargetType] = mapped_column(Enum(TargetType), comment="推送出去的具体内容类型")
target_id: Mapped[int] = mapped_column(BigIntType, comment="推送内容的主键ID")
# Records whether this push fully succeeded or failed because of a channel or network problem
status: Mapped[TaskStatus] = mapped_column(Enum(TaskStatus), comment="最终推送结果状态")
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, comment="记录或实际推送的准确时间")
# ==========================================
# Module 7: system task monitoring
# ==========================================
class DataSyncTask(Base):
"""
Data-sync health monitoring table (for ops inspection)
@@ -418,7 +368,6 @@ class DataSyncTask(Base):
source_id: Mapped[int] = mapped_column(ForeignKey("info_sources.id"), comment="本次运行爬取的哪个源")
items_fetched: Mapped[int] = mapped_column(Integer, default=0, comment="本次爬虫成功插入或更新的新闻条数")
task_status: Mapped[TaskStatus] = mapped_column(Enum(TaskStatus), comment="该平台的宏观抓取状态")
# If the code crashes unexpectedly or hits a 403/502, the Python traceback is stored here verbatim
error_trace: Mapped[Optional[str]] = mapped_column(Text, comment="若失败则保存完整报错堆栈")
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=utcnow, comment="任务执行的发生时间")
@@ -1,7 +1,3 @@
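# Push email HTML template
# Used to generate the trending-summary emails sent to users on a schedule
# Email clients do not support Font Awesome, so emoji are used for platform icons instead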
PLATFORM_EMOJI: dict[str, str] = {
"微博热搜": "🔴",
"微博": "🔴",
+1 -1
View File
@@ -1,9 +1,9 @@
# Request/response models for delivery settings
from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel, ConfigDict, Field
# AI-assisted generation: deepseek-v3-2, 2026-03-20
# ==========================================
# Delivery schedule (UserDeliverySchedule)
-2
View File
@@ -1,9 +1,7 @@
# app/schemas/event_schema.py
from pydantic import BaseModel, Field
from typing import List, Optional
from datetime import datetime
class PlatformTrendResponse(BaseModel):
source_id: int
platform_name: str
-1
View File
@@ -3,7 +3,6 @@ from typing import List, Optional
from pydantic import BaseModel, ConfigDict, Field
class UserTopicPreferenceCreate(BaseModel):
"""Request body for adding a user interest keyword."""
interested_keyword: str = Field(..., min_length=1, max_length=100, description="用户感兴趣的关键词")
+1 -1
View File
@@ -1,4 +1,3 @@
# app/schemas/source_schema.py
from pydantic import BaseModel, ConfigDict, Field
from typing import Optional
from datetime import datetime
@@ -6,6 +5,7 @@ from datetime import datetime
# Enums
from app.models.models import SourceType
# AI-assisted generation: deepseek-v3-2, 2026-03-20
# ==========================================
# Schemas for InfoSource (information sources)
+9 -24
View File
@@ -1,7 +1,3 @@
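# Scheduled delivery service
# Called by APScheduler every minute; checks whether any user is due a push at the current time,
# and if so generates and sends the summary email, also writing DeliveryHistory to prevent duplicates.
# Delivery priority: keywords present and matched → personalized briefing; no keywords or no match → default trending digest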
import logging
import os
from logging.handlers import TimedRotatingFileHandler
@@ -34,7 +30,7 @@ from app.utils.email_utils import send_html_email
logger = logging.getLogger("delivery_service")
# delivery_service logs are written to a separate file
_delivery_log_dir = Path(__file__).resolve().parents[2] / "logs"
_delivery_log_dir.mkdir(parents=True, exist_ok=True)
_delivery_log_file = _delivery_log_dir / "delivery_check.log"
@@ -51,6 +47,8 @@ if not logger.handlers:
logger.setLevel(logging.INFO)
logger.propagate = False
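# AI-assisted generation: deepseek-v3-2, 2026-03-20
# Delivery time window: maximum tolerance (minutes) between the actual run time and the scheduled time
DELIVERY_WINDOW_MINUTES = int(os.getenv("DELIVERY_WINDOW_MINUTES", 2))
# Minimum interval (minutes) between two pushes to the same user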
@@ -64,13 +62,10 @@ DEFAULT_MODE_HOURS = int(os.getenv("DEFAULT_MODE_HOURS", 24))
# Fallback time zone when the user's time zone is invalid
DEFAULT_FALLBACK_TIMEZONE = os.getenv("DEFAULT_FALLBACK_TIMEZONE", "Asia/Shanghai")
# ==========================================
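# Default trending-event container (used when there are no keywords)
# ==========================================
@dataclass
class _DefaultEventItem:
"""
Default trending-event container
Default trending wrapper used when there is no keyword subscription or no keyword match;
its interface matches MatchedEventResult so it can be passed to the template uniformly.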
"""
@@ -81,10 +76,6 @@ class _DefaultEventItem:
tags: list[str] = field(default_factory=list)
is_default: bool = True
# ==========================================
# Time zone utilities
# ==========================================
def _time_to_minutes(t: dt_time) -> int:
return t.hour * 60 + t.minute
@@ -125,10 +116,10 @@ def _ensure_aware(dt: datetime) -> datetime:
return dt.replace(tzinfo=timezone.utc)
return dt
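# End of AI-assisted generation
# ==========================================
# Database query helpers
# ==========================================
def _should_skip_by_interval(db: Session, user_id: int) -> bool:
"""Check whether the user is still in the cooldown period, to avoid repeated pushes within a short time"""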
row = (
@@ -297,9 +288,9 @@ def _record_delivery(
db.commit()
# ==========================================
# AI-assisted generation: deepseek-v3-2, 2026-03-20
# Push preparation
# ==========================================
@dataclass
class _PendingPush:
"""Holds the information for an email that needs sending, so it can be sent from an async context."""
@@ -309,6 +300,7 @@ class _PendingPush:
html_body: str
event_ids: list[int]
# End of AI generation
def _prepare_user_push(db: Session, user: AppUser, schedule: UserDeliverySchedule) -> _PendingPush | None:
"""
@@ -331,7 +323,6 @@ def _prepare_user_push(db: Session, user: AppUser, schedule: UserDeliverySchedul
pushed_ids = _get_already_pushed_event_ids(db, user_id)
# Decision: keywords present and matched → matching mode; otherwise → default trending mode
items: list = []
is_default = False
@@ -361,7 +352,6 @@ def _prepare_user_push(db: Session, user: AppUser, schedule: UserDeliverySchedul
logger.info(f"用户 {user_id} 默认热点无可推送内容,跳过")
return None
# Batch-load platform data (source name, title, URL, ranking)
event_ids = [item.event.id for item in items]
platforms_map = _load_event_platforms(db, event_ids)
@@ -383,9 +373,6 @@ def _prepare_user_push(db: Session, user: AppUser, schedule: UserDeliverySchedul
)
# ==========================================
# Scheduler main entry point
# ==========================================
async def check_and_deliver() -> None:
"""
Main entry point for scheduled delivery, called by APScheduler every minute.
@@ -412,7 +399,6 @@ async def check_and_deliver() -> None:
if not user:
continue
# Convert UTC to the user's local time and check whether it falls within the delivery window
user_current = _user_local_time(now, user.timezone)
if not _is_within_window(schedule.delivery_time, user_current):
continue
@@ -422,7 +408,6 @@ async def check_and_deliver() -> None:
if pending is None:
continue
# Try each email channel asynchronously, in priority order
sent = False
for target_email in pending.email_targets:
try:
+13 -36
View File
@@ -1,8 +1,3 @@
# app/services/fetcher_service.py
"""
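Fetch service: pulls trending/RSS data from external APIs, then deduplicates, vector-clusters, and stores it
Trending branch: semantic clustering into UnifiedEvent; RSS branch: write to NewsArticle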
"""
import os
import hashlib
from datetime import timedelta
@@ -19,6 +14,8 @@ from app.models.models import (
HeadlineRevision, RankingLog, SourceType, utcnow, UnifiedEvent
)
# AI-assisted generation: deepseek-v3-2, 2026-03-20
# Load environment variables
load_dotenv()
hf_token = os.getenv("HF_TOKEN")
@@ -26,11 +23,13 @@ SIMILARITY_THRESHOLD = float(os.getenv("SIMILARITY_THRESHOLD", 0.72))
API_BASE_URL = os.getenv("API_BASE_URL", "https://newsnow.busiyi.world/api/s")
EMBEDDING_MODEL_PATH = os.getenv("EMBEDDING_MODEL_PATH", "")
print("正在加载 BAAI/bge-m3 向量模型...")
print("正在加载模型...")
# Global singleton
embedder_model = SentenceTransformer(EMBEDDING_MODEL_PATH, local_files_only=True, device="cuda")
embedder_model = SentenceTransformer(EMBEDDING_MODEL_PATH, local_files_only=True)
print("模型加载完成。")
# End of AI generation
def generate_md5(text: str) -> str:
"""Generate a 32-character MD5 used as external_id for cross-platform deduplication"""
@@ -88,10 +87,10 @@ class UnifiedEventClusterer:
new_unified = UnifiedEvent(
unified_title=title,
center_embedding=embedding_json,
hot_score=1 # initial heat
hot_score=1
)
self.db.add(new_unified)
self.db.flush() # obtain the auto-increment primary key ID
self.db.flush()
# Update the cache
self.event_vectors.append(new_vec)
@@ -109,11 +108,8 @@ def process_hot_trend_item(db, source, item, index: int, external_id: str, exist
event_to_log = None
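# Dedup: if it already exists, only the headline/ranking may need updating; otherwise cluster it and create a new record
if existing_event:
# Scenario A1: a known item
if existing_event.current_headline != title:
# The headline was silently changed, so the embedding must be recomputed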
new_embedding_json, _ = embeddings_dict[title]
revision = HeadlineRevision(
@@ -123,30 +119,25 @@ def process_hot_trend_item(db, source, item, index: int, external_id: str, exist
)
db.add(revision)
existing_event.current_headline = title
existing_event.title_embedding = new_embedding_json # update to the new headline's semantic vector
# Note: unified_event_id is left unchanged here, because it is still broadly the same event
existing_event.title_embedding = new_embedding_json
existing_event.current_ranking = index
existing_event.event_url = item_url
event_to_log = existing_event
else:
# Scenario A2: a brand-new trending item
# 1. Compute the vector
new_embedding_json, new_vec = embeddings_dict[title]
# 2. Hand it to the clustering hub to find a home
new_embedding_json, new_vec = embeddings_dict[title]
matched_event_id = clusterer.match_or_create(title, new_embedding_json, new_vec)
# 3. Persist to the database
new_event = TrendingEvent(
source_id=source.id,
external_id=external_id,
current_headline=title,
event_url=item_url,
current_ranking=index,
title_embedding=new_embedding_json, # store the vector
unified_event_id=matched_event_id # attach to the big event
title_embedding=new_embedding_json,
unified_event_id=matched_event_id
)
db.add(new_event)
db.flush()
@@ -192,7 +183,6 @@ def process_source_data(db, source, items: list) -> int:
saved_count = 0
platform_id = source.home_url
# 1. Batch-compute external IDs and collect the texts to embed
valid_items = []
external_ids = []
for item in items:
@@ -209,7 +199,6 @@ def process_source_data(db, source, items: list) -> int:
if not valid_items:
return 0
# Batch dedup: use external_id to decide between update and insert
existing_events_dict = {}
existing_articles_dict = {}
@@ -226,7 +215,6 @@ def process_source_data(db, source, items: list) -> int:
).all()
existing_articles_dict = {art.external_id: art for art in existing_articles}
# Batch-embed only the titles that need vectors, to avoid redundant computation
texts_to_embed = []
if source.source_type in (SourceType.HOT_TREND, SourceType.API):
for item, external_id in valid_items:
@@ -238,15 +226,12 @@ def process_source_data(db, source, items: list) -> int:
else:
texts_to_embed.append(title)
# 4. Run batched model inference
embeddings_dict = generate_embeddings_batch(texts_to_embed)
# Initialize the clusterer (only needed in trending mode, and only once)
clusterer = None
if source.source_type in (SourceType.HOT_TREND, SourceType.API):
clusterer = UnifiedEventClusterer(db)
# Route by source type: trending/API → TrendingEvent + clustering; RSS → NewsArticle
for index, (item, external_id) in enumerate(valid_items, 1):
if source.source_type in (SourceType.HOT_TREND, SourceType.API):
existing_event = existing_events_dict.get(external_id)
@@ -269,14 +254,12 @@ async def fetch_and_save_trending_data():
"""
print(f"[{utcnow()}] 开始执行定时抓取任务...")
# Get the enabled sources; this read-only operation uses a short-lived connection
with SessionLocal() as db:
sources = db.query(InfoSource).filter(InfoSource.is_enabled == True).all()
if not sources:
print("没有找到启用的信息源,任务结束。")
return
# Extract the source info up front to avoid holding a session for long in async code
source_configs = [
{
"id": s.id,
@@ -287,7 +270,6 @@ async def fetch_and_save_trending_data():
for s in sources
]
# Spoof request headers to get past anti-scraping measures
custom_headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/145.0.0.0 Safari/537.36",
"Accept": "application/json, text/plain, */*",
@@ -304,13 +286,11 @@ async def fetch_and_save_trending_data():
url = f"{API_BASE_URL}?id={platform_id}&latest"
try:
# 1. Network request (may be slow; do not wrap it in a db session)
response = await client.get(url)
response.raise_for_status()
data_json = response.json()
items = data_json.get("items", [])
# 2. Database transaction (keep it short; use a dedicated session)
with SessionLocal() as db:
# Re-fetch the source instance in the short session to avoid a detached object
source = db.query(InfoSource).get(s_config["id"])
@@ -319,10 +299,8 @@ async def fetch_and_save_trending_data():
task_log = DataSyncTask(source_id=source.id, items_fetched=0)
try:
# Call the data-processing layer
saved_count = process_source_data(db, source, items)
# Business transaction committed successfully
task_log.items_fetched = saved_count
task_log.task_status = TaskStatus.SUCCESS
db.add(task_log)
@@ -330,10 +308,9 @@ async def fetch_and_save_trending_data():
print(f"[{source.source_name}] ({source.source_type}) 成功抓取并更新了 {saved_count} 条数据")
except Exception as e:
db.rollback()
raise e # re-raise so the outer handler can catch and log it
raise e
except Exception as e:
# Exception interception and error isolation: open a separate, very short transaction to record the log
with SessionLocal() as log_db:
try:
new_task_log = DataSyncTask(source_id=s_config["id"], items_fetched=0)
+2 -23
View File
@@ -1,7 +1,3 @@
"""
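Matching service: recommends events based on user interest keywords (exact + semantic)
Score fusion: tag/title match score + tag relevance + heat + freshness bonus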
"""
import os
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
@@ -13,6 +9,7 @@ from sqlalchemy.orm import Session
from app.models.models import ExtractedTopic, TargetType, UnifiedEvent, UserTopicPreference, utcnow
from app.services.fetcher_service import embedder_model
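# AI-assisted generation: deepseek-v3-2, 2026-03-20
# Semantic match threshold: the similarity between a user keyword and an event tag/title vector must reach this value to count as a semantic hit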
DEFAULT_PREFERENCE_SEMANTIC_THRESHOLD = 0.78
@@ -35,6 +32,7 @@ class MatchedEventResult:
semantic_hits: list[dict[str, Any]]
tags: list[str]
# End of AI generation
def _normalize_text(text: str) -> str:
"""Lowercase the text and strip surrounding whitespace, for stable matching."""
@@ -80,7 +78,6 @@ def _build_keyword_embedding_map(keywords: list[str]) -> dict[str, np.ndarray]:
uncached_keywords = []
# 1. Try the cache first
for keyword in keywords:
if not keyword:
continue
@@ -89,9 +86,7 @@ def _build_keyword_embedding_map(keywords: list[str]) -> dict[str, np.ndarray]:
else:
uncached_keywords.append(keyword)
# 2. Run one batched inference for the cache misses
if uncached_keywords:
# Deduplicate so that an uncached keyword is not computed more than once
unique_uncached = list(dict.fromkeys(uncached_keywords))
vectors = embedder_model.encode(unique_uncached, normalize_embeddings=True, show_progress_bar=False)
@@ -102,7 +97,6 @@ def _build_keyword_embedding_map(keywords: list[str]) -> dict[str, np.ndarray]:
for k in keys_to_delete:
del _EMBEDDING_CACHE[k]
# 3. Store the newly computed vectors in the cache and fill in the result
for keyword, vec in zip(unique_uncached, vectors):
vec_array = np.asarray(vec, dtype=np.float32)
_EMBEDDING_CACHE[keyword] = vec_array
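The three steps in `_build_keyword_embedding_map` (cache lookup, one batched encode over the deduplicated misses, write-back) can be sketched without the real model. Everything below is illustrative: `build_embedding_map`, `encode_batch`, and `max_size` are hypothetical names. The oldest-first eviction is an assumption, since the diff shows only that some keys are deleted.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]

_CACHE: Dict[str, Vector] = {}
MAX_CACHE_SIZE = 1000  # hypothetical limit, not taken from the source


def build_embedding_map(
    keywords: Sequence[str],
    encode_batch: Callable[[List[str]], List[Vector]],
    cache: Dict[str, Vector] = _CACHE,
    max_size: int = MAX_CACHE_SIZE,
) -> Dict[str, Vector]:
    result: Dict[str, Vector] = {}
    uncached: List[str] = []

    # 1. Serve whatever is already cached; skip empty keywords.
    for keyword in keywords:
        if not keyword:
            continue
        if keyword in cache:
            result[keyword] = cache[keyword]
        else:
            uncached.append(keyword)

    if uncached:
        # 2. Deduplicate (order-preserving) and encode all misses in one call.
        unique = list(dict.fromkeys(uncached))
        vectors = encode_batch(unique)

        # Assumed eviction policy: drop the oldest entries to make room.
        overflow = len(cache) + len(unique) - max_size
        if overflow > 0:
            for key in list(cache)[:overflow]:
                del cache[key]

        # 3. Write back to the cache and fill in the result.
        for keyword, vec in zip(unique, vectors):
            cache[keyword] = vec
            result[keyword] = vec

    return result
```

Passing the encoder in as a callable keeps the sketch testable with a stub. In the service the equivalent call is `embedder_model.encode(...)` as shown in the diff.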
@@ -172,7 +166,6 @@ def recommend_events_for_user(
else PREFERENCE_SEMANTIC_THRESHOLD
)
# 1. Load the user's interest keywords
preferences = (
db.query(UserTopicPreference)
.filter(UserTopicPreference.user_id == user_id)
@@ -185,7 +178,6 @@ def recommend_events_for_user(
if not preference_keywords:
return []
# 2. Load candidate events (filter by time + hot score to avoid a full table scan)
time_limit = utcnow() - timedelta(hours=hours)
events = (
db.query(UnifiedEvent)
@@ -213,20 +205,17 @@ def recommend_events_for_user(
.all()
)
# Build the event-to-tags mapping: event_id -> [(tag, relevance_score), ...]
event_topics: dict[int, list[tuple[str, float | None]]] = {}
for event_id, topic_keyword, relevance_score in topic_rows:
if not topic_keyword:
continue
event_topics.setdefault(event_id, []).append((topic_keyword, relevance_score))
# 3. Batch-encode user keywords and tag keywords to reduce the number of model calls
unique_preference_keywords = list(dict.fromkeys(preference_keywords))
unique_topic_keywords = list(dict.fromkeys([row[1] for row in topic_rows if row[1]]))
pref_vec_map = _build_keyword_embedding_map(unique_preference_keywords)
topic_vec_map = _build_keyword_embedding_map(unique_topic_keywords)
# Precompute the set of normalized user keywords for exact matching
normalized_preference_pairs = [
(word, _normalize_text(word))
for word in unique_preference_keywords
@@ -246,20 +235,15 @@ def recommend_events_for_user(
exact_hits: list[str] = []
semantic_hits: list[dict[str, Any]] = []
score = 0.0
# Try an exact match or a semantic match for each event tag
for topic_keyword, topic_relevance in topic_list:
topic_relevance_score = float(topic_relevance) if topic_relevance is not None else 50.0
# 1) Exact hit (covers both full equality and containment)
matched_pref = _find_exact_preference_match(topic_keyword, normalized_preference_pairs)
if matched_pref is not None:
exact_hits.append(topic_keyword)
# An exact hit earns a higher base score; the tag's own relevance adds a boost
score += 45.0 + topic_relevance_score * 0.2
continue
# 2) Semantic hit (computed only when there is no exact hit)
best_pref, best_sim = _find_best_semantic_match(topic_keyword, topic_vec_map, pref_vec_map)
if best_pref is not None and best_sim >= similarity_threshold:
@@ -270,10 +254,8 @@ def recommend_events_for_user(
"similarity": round(best_sim, 4),
}
)
# A semantic hit scores slightly lower than an exact hit and is scaled by the similarity
score += best_sim * 35.0 + topic_relevance_score * 0.12
# The title also takes part in matching, but with a lower weight than structured tags, so long titles do not dominate the ranking.
event_title = (event.unified_title or "").strip()
if event_title:
title_exact_pref = _find_exact_preference_match(event_title, normalized_preference_pairs)
@@ -292,15 +274,12 @@ def recommend_events_for_user(
)
score += best_sim * 24.0
# Skip the event if there is neither an exact nor a semantic hit
if not exact_hits and not semantic_hits:
continue
# Blend in the event's hot score and freshness so ranking does not rely on the semantic score alone
score += min(event.hot_score, 100) * 0.3
score += _calc_freshness_bonus(event)
# Deduplicate tags before returning them to keep the API output stable
tags = list(dict.fromkeys([item[0] for item in topic_list]))
scored_results.append(
MatchedEventResult(
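The tag-level scoring visible in this diff can be summarized as a small pure function. The sketch below copies the constants from the hunks above (45.0, 0.2, 35.0, 0.12, the 0.3 hot-score weight with its cap at 100, and the default relevance of 50.0). The function names are hypothetical. Title matching is omitted because its exact-hit weight is not visible here, and the freshness bonus is taken as a parameter because `_calc_freshness_bonus` is not shown.

```python
from typing import List, Optional

DEFAULT_RELEVANCE = 50.0  # used when a tag has no relevance_score


def tag_score(relevance: Optional[float], exact: bool, similarity: float = 0.0) -> float:
    """Score one tag hit: exact hits get a fixed base, semantic hits scale with similarity."""
    rel = float(relevance) if relevance is not None else DEFAULT_RELEVANCE
    if exact:
        return 45.0 + rel * 0.2
    return similarity * 35.0 + rel * 0.12


def event_score(
    tag_scores: List[float],
    hot_score: float,
    freshness_bonus: float = 0.0,
) -> Optional[float]:
    """Fuse tag scores with the capped hot score and a freshness bonus.

    Returns None when there were no hits, mirroring the `continue` in the diff.
    """
    if not tag_scores:
        return None
    return sum(tag_scores) + min(hot_score, 100) * 0.3 + freshness_bonus
```

For example, one exact hit on a tag with no stored relevance contributes 45.0 + 50.0 * 0.2. Capping the hot score at 100 bounds its contribution to 30 points, so a very hot but weakly matched event cannot outrank a strong match on popularity alone.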
+4 -8
@@ -1,8 +1,3 @@
# app/services/summary_service.py
"""
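Summary service: calls an LLM to generate a unified title, a combined summary, and topic tags
Scheduled task: batch-processes events that meet the hot-score threshold and have no summary yet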
"""
import json
import os
from datetime import timedelta
@@ -26,12 +21,16 @@ from app.prompts.summary_prompts import (
)
from app.services.fetcher_service import embedder_model
# AI-assisted generation: deepseek-v3-2, March 20, 2026
HOT_SCORE_THRESHOLD = int(os.getenv("HOT_SCORE_THRESHOLD", 3))
TOPIC_TAG_MIN_HOT_SCORE = int(os.getenv("TOPIC_TAG_MIN_HOT_SCORE", HOT_SCORE_THRESHOLD))
TOPIC_SIMILARITY_THRESHOLD = float(os.getenv("TOPIC_SIMILARITY_THRESHOLD", 0.82))
TOPIC_TAG_MAX_COUNT = int(os.getenv("TOPIC_TAG_MAX_COUNT", 8))
AI_API_KEY = os.getenv("AI_API_KEY", "")
# End of AI-generated section
deepseek_client = AsyncOpenAI(
api_key=AI_API_KEY,
@@ -184,7 +183,6 @@ async def generate_unified_summaries():
"""Scheduled task: refresh the title, summary, and tags for events that meet the hot-score threshold and have no summary yet."""
print(f"[{utcnow()}] Start unified summary generation task...")
# Collect the IDs of the events to process first, then release the session early instead of holding a db session for long
with SessionLocal() as db:
recent_threshold = utcnow() - timedelta(days=3)
events = db.query(UnifiedEvent).filter(
@@ -197,11 +195,9 @@ async def generate_unified_summaries():
print("No events require summary update in this round.")
return
# Copy out the data we need so it is independent of the session
event_ids = [e.id for e in events]
event_hot_scores = {e.id: e.hot_score for e in events}
# Outer loop: for each event_id, open a very short-lived session to load the data it depends on
for event_id in event_ids:
platform_dict: dict[str, set[str]] = {}
with SessionLocal() as db:
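The threshold constants in this file are cast at import time with `int(os.getenv(...))` and `float(os.getenv(...))`, so a malformed environment value would raise when the module loads. One possible alternative, shown only as a sketch, is a helper that falls back to the default. `env_int` and `env_float` are hypothetical names and do not appear in the project.

```python
import os


def env_int(name: str, default: int) -> int:
    """Read an integer setting, falling back to the default if unset or malformed."""
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        return default


def env_float(name: str, default: float) -> float:
    """Read a float setting, falling back to the default if unset or malformed."""
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return float(raw)
    except ValueError:
        return default


# Usage mirroring the constants in the diff (same names and defaults).
HOT_SCORE_THRESHOLD = env_int("HOT_SCORE_THRESHOLD", 3)
TOPIC_SIMILARITY_THRESHOLD = env_float("TOPIC_SIMILARITY_THRESHOLD", 0.82)
```

Silently falling back trades strictness for availability. Failing fast on a bad value, as the original code does, is equally defensible for a scheduled service.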
+1 -1
@@ -1,4 +1,4 @@
# app/utils/email_utils.py
# AI-assisted generation: deepseek-v3-2, March 20, 2026
import os
from email.message import EmailMessage
import aiosmtplib
+1 -3
@@ -1,4 +1,4 @@
# run.py
# AI-assisted generation: deepseek-v3-2, March 20, 2026
import uvicorn
import os
from dotenv import load_dotenv
@@ -8,11 +8,9 @@ if __name__ == "__main__":
load_dotenv()
PORT = int(os.getenv("PORT", 8000))
# Start the server
uvicorn.run(
app="app.main:app",
host="0.0.0.0",
port=PORT,
# reload=True,
workers=1
)
+12 -3
@@ -49,7 +49,6 @@ dependencies = [
"safetensors==0.7.0",
"scikit-learn==1.8.0",
"scipy==1.17.1",
"sentence-transformers==5.2.3",
"shellingham==1.5.4",
"sniffio==1.3.1",
"sqlalchemy==2.0.48",
@@ -57,8 +56,6 @@ dependencies = [
"sympy==1.14.0",
"threadpoolctl==3.6.0",
"tokenizers==0.22.2",
"torch==2.10.0",
"torchvision==0.25.0",
"tqdm==4.67.3",
"transformers==5.3.0",
"typer==0.24.1",
@@ -68,4 +65,16 @@ dependencies = [
"tzlocal==5.3.1",
"urllib3==2.6.3",
"uvicorn==0.41.0",
"torch==2.11.0+cpu",
"torchvision==0.26.0+cpu",
"torchaudio==2.11.0+cpu",
"sentence-transformers>=5.3.0",
]
[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
default = false
[tool.uv]
index-strategy = "unsafe-best-match"
+1720
File diff suppressed because it is too large
+1 -1
@@ -20,7 +20,7 @@ WORKDIR /backend
COPY backend/pyproject.toml backend/uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
pip install --no-cache-dir uv && \
uv sync --frozen --no-dev
uv sync --frozen --no-dev --index https://pypi.tuna.tsinghua.edu.cn/simple/
# Copy the backend code
COPY backend/app ./app
-1
@@ -5,7 +5,6 @@
<link rel="icon" href="/favicon.svg">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>聚势智见 - 基于语义聚类与大模型的热点资讯聚合平台</title>
<!-- Font Awesome icon library -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.1/css/all.min.css">
</head>
<body>
-6
@@ -6,9 +6,6 @@ export function fetchDeliveryConfig(userId: number): Promise<DeliveryConfig> {
return apiGet<DeliveryConfig>(`/users/${userId}/delivery-config`)
}
// ==========================================
// Delivery schedules
// ==========================================
export function createDeliverySchedule(
userId: number,
payload: { delivery_time: string; is_active?: boolean },
@@ -34,9 +31,6 @@ export function deleteDeliverySchedule(
return apiDelete(`/users/${userId}/delivery-schedules/${scheduleId}`)
}
// ==========================================
// Delivery channels
// ==========================================
export function createPushEndpoint(
userId: number,
payload: {
-3
@@ -1,6 +1,3 @@
/**
* Auth API: login, registration, and sending verification codes (bypasses the shared client; no Bearer token)
*/
import type {
AuthTokenResponse,
LoginPayload,
+2
@@ -1,3 +1,5 @@
// AI-assisted generation: deepseek-v3-2, March 20, 2026
export interface UserProfile {
id: number
email: string
+1
@@ -1,3 +1,4 @@
<!-- AI-assisted generation: deepseek-v3-2, March 20, 2026 -->
<!-- Dashboard layout: sidebar navigation, main content area, mobile drawer -->
<script setup lang="ts">
import { computed, ref } from 'vue'
+1 -3
@@ -1,6 +1,4 @@
/**
* App entry point: initializes Vue, Pinia, the router, and the theme
*/
// AI-assisted generation: deepseek-v3-2, March 20, 2026
import './assets/main.css'
import { createApp } from 'vue'
+2
@@ -1,3 +1,5 @@
<!-- AI-assisted generation: deepseek-v3-2, March 20, 2026 -->
<!-- About page placeholder -->
<template>
<div class="about">
+4 -51
@@ -1,4 +1,3 @@
<!-- Main dashboard: event feed, recommended for you, headline revision tracking, system status -->
<script setup lang="ts">
import { onMounted, ref, computed, watch } from 'vue'
import { useRoute, useRouter } from 'vue-router'
@@ -13,9 +12,6 @@ import type { MatchedEvent, UserTopicPreference } from '@/types/preference'
const route = useRoute()
const router = useRouter()
// ==========================================
// Spotlight: when arriving from the recommendations page, fetch the target event separately by ID
// ==========================================
const spotlightEvent = ref<UnifiedEvent | null>(null)
const loadingSpotlight = ref(false)
@@ -41,9 +37,7 @@ function dismissSpotlight() {
const authStore = useAuthStore()
const userId = computed(() => authStore.user?.id ?? 0)
// ==========================================
// State
// ==========================================
const events = ref<UnifiedEvent[]>([])
const revisions = ref<HeadlineRevision[]>([])
const stats = ref<SystemStats | null>(null)
@@ -101,9 +95,6 @@ const recSortOptions = [
{ label: '最新', value: 'created_at' },
]
// ==========================================
// Platform visual mapping
// ==========================================
const platformIconMap: Record<string, string> = {
微博热搜: 'fa-brands fa-weibo',
微博: 'fa-brands fa-weibo',
@@ -171,9 +162,7 @@ function formatRelativeTime(dateStr: string): string {
return `${days} 天前`
}
// ==========================================
// Ranking chart options
// ==========================================
function getRankingChartOptions(history: number[], platformColor: string) {
return {
series: [{ name: '排名', data: history }],
@@ -249,9 +238,6 @@ function platformKey(eventId: number, index: number, prefix: string = ''): strin
return prefix ? `${prefix}-${eventId}-${index}` : `${eventId}-${index}`
}
// ==========================================
// Data loading
// ==========================================
async function loadEvents(append = false) {
if (!append) {
loading.value = true
@@ -681,9 +667,6 @@ watch(() => route.query.event, (newId) => {
</template>
</div>
<!-- ==========================================
Right side: widget panel
========================================== -->
<div class="widgets-column">
<!-- Recommended for you (matched against the user's keywords) -->
@@ -846,10 +829,10 @@ watch(() => route.query.event, (newId) => {
<i class="fa-regular fa-clock"></i>
最后同步: {{ lastSyncText }}
</span>
<span v-if="stats.error_tasks_today > 0" class="error-count">
<!-- <span v-if="stats.error_tasks_today > 0" class="error-count">
<i class="fa-solid fa-triangle-exclamation"></i>
{{ stats.error_tasks_today }} 个异常
</span>
</span> -->
</div>
</section>
</div>
@@ -897,9 +880,6 @@ watch(() => route.query.event, (newId) => {
margin-top: 6px;
}
/* ==========================================
Grid layout
========================================== */
.content-grid {
display: flex;
flex-direction: column;
@@ -931,9 +911,6 @@ watch(() => route.query.event, (newId) => {
}
}
/* ==========================================
Section title + hot-score threshold (premium frosted-glass style)
========================================== */
.section-header {
margin-bottom: 24px;
}
@@ -1023,9 +1000,6 @@ watch(() => route.query.event, (newId) => {
box-shadow: var(--shadow-sm);
}
/* ==========================================
Event card
========================================== */
/* Event card with frosted glass and refined shadows */
.event-card {
background: var(--bg-surface);
@@ -1142,9 +1116,6 @@ watch(() => route.query.event, (newId) => {
color: transparent;
}
/* ==========================================
Platform list + hover ranking chart
========================================== */
.platforms-list {
display: flex;
flex-direction: column;
@@ -1262,9 +1233,6 @@ watch(() => route.query.event, (newId) => {
max-height: 120px;
}
/* ==========================================
Load more
========================================== */
.load-more-wrapper {
display: flex;
flex-direction: column;
@@ -1310,9 +1278,6 @@ watch(() => route.query.event, (newId) => {
color: var(--text-placeholder);
}
/* ==========================================
Widget panel (shared) - premium glassmorphism look
========================================== */
.widget-panel {
background: var(--bg-surface);
backdrop-filter: var(--backdrop-blur);
@@ -1406,9 +1371,6 @@ watch(() => route.query.event, (newId) => {
font-size: 13px;
}
/* ==========================================
"Recommended for you" panel
========================================== */
.recommend-header {
background: rgba(139, 92, 246, 0.06);
border-bottom-color: rgba(139, 92, 246, 0.15);
@@ -1584,9 +1546,6 @@ watch(() => route.query.event, (newId) => {
font-size: 9px;
}
/* ==========================================
Headline revision tracking
========================================== */
.revision-header {
background: rgba(239, 68, 68, 0.06);
border-bottom-color: rgba(239, 68, 68, 0.15);
@@ -1687,9 +1646,6 @@ watch(() => route.query.event, (newId) => {
margin: 0;
}
/* ==========================================
System status
========================================== */
.stats-widget {
padding: 16px;
}
@@ -1759,9 +1715,6 @@ watch(() => route.query.event, (newId) => {
color: var(--status-error);
}
/* ==========================================
Spotlight block
========================================== */
.spotlight-wrap {
margin-bottom: 20px;
}
-28
@@ -1,4 +1,3 @@
<!-- Delivery settings page: manage delivery schedules and delivery channels (email, etc.) -->
<script setup lang="ts">
import { onMounted, ref, computed } from 'vue'
@@ -62,9 +61,6 @@ async function loadConfig() {
}
}
// ==========================================
// Delivery schedule operations
// ==========================================
async function handleAddSchedule() {
if (!userId.value || !newTime.value) return
submittingSchedule.value = true
@@ -109,9 +105,6 @@ async function handleDeleteSchedule(schedule: DeliverySchedule) {
}
}
// ==========================================
// Delivery channel operations
// ==========================================
async function handleAddEndpoint() {
if (!userId.value || !newChannelAccount.value.trim()) return
submittingEndpoint.value = true
@@ -186,9 +179,6 @@ onMounted(loadConfig)
</div>
<div v-else class="config-sections">
<!-- ==========================================
Delivery time management
========================================== -->
<section class="config-section">
<div class="section-title">
<h2><i class="fa-regular fa-clock"></i> 推送时间</h2>
@@ -229,9 +219,6 @@ onMounted(loadConfig)
</div>
</section>
<!-- ==========================================
Delivery channel management
========================================== -->
<section class="config-section">
<div class="section-title">
<h2><i class="fa-solid fa-envelope"></i> 接收邮箱</h2>
@@ -374,9 +361,6 @@ onMounted(loadConfig)
color: var(--text-secondary);
}
/* ==========================================
Shared section styles
========================================== */
.config-sections {
display: flex;
flex-direction: column;
@@ -418,9 +402,6 @@ onMounted(loadConfig)
margin: 0;
}
/* ==========================================
Add row
========================================== */
.add-row {
display: flex;
gap: 10px;
@@ -497,9 +478,6 @@ onMounted(loadConfig)
font-size: 13px;
}
/* ==========================================
Schedule list
========================================== */
.schedule-list {
display: flex;
flex-direction: column;
@@ -573,9 +551,6 @@ onMounted(loadConfig)
background: rgba(239, 68, 68, 0.1);
}
/* ==========================================
Channel list
========================================== */
.endpoint-add {
flex-wrap: wrap;
}
@@ -661,9 +636,6 @@ onMounted(loadConfig)
gap: 6px;
}
/* ==========================================
How-it-works notes
========================================== */
.info-section {
background: transparent;
border: 1px dashed var(--border-subtle);
-1
@@ -1,4 +1,3 @@
<!-- Overview page: shows the current account, session status, and authentication integration notes -->
<script setup lang="ts">
import { computed } from 'vue'
import { useRouter } from 'vue-router'
-4
@@ -1,4 +1,3 @@
<!-- Login page: supports password login and email verification-code login -->
<script setup lang="ts">
import { computed, onUnmounted, reactive, ref, watch } from 'vue'
import { useRoute, useRouter } from 'vue-router'
@@ -303,9 +302,6 @@ onUnmounted(() => {
</template>
<style scoped>
/* ==========================================
New premium split-screen layout and background
========================================== */
.split-layout {
display: flex;
min-height: 100vh;
-4
@@ -1,4 +1,3 @@
<!-- Registration page: email verification code + password, with a password-strength hint -->
<script setup lang="ts">
import { computed, onUnmounted, reactive, ref } from 'vue'
import { useRouter } from 'vue-router'
@@ -280,9 +279,6 @@ onUnmounted(() => {
</template>
<style scoped>
/* ==========================================
New premium split-screen layout and background
========================================== */
.split-layout {
display: flex;
min-height: 100vh;
-1
@@ -1,4 +1,3 @@
<!-- Headline revision tracking page: shows the history of trending headlines that were quietly edited -->
<script setup lang="ts">
import { computed, onMounted, ref, reactive } from 'vue'
+39 -6
@@ -1,4 +1,3 @@
<!-- Event tracking and analysis page: keyword search, time-heat chart, related event list -->
<script setup lang="ts">
import { ref, computed } from 'vue'
import VueApexCharts from 'vue3-apexcharts'
@@ -235,17 +234,17 @@ async function handleSearch() {
<div class="tips-box glass-panel">
<h2 class="panel-title"><i class="fa-regular fa-lightbulb"></i> 搜索建议</h2>
<div class="tips-content">
<button class="tip-tag" @click="keyword='新能源汽车'; hours=168; handleSearch()">
<i class="fa-solid fa-rocket"></i> 新能源汽车
<button class="tip-tag" @click="keyword='火箭发射'; hours=168; handleSearch()">
<i class="fa-solid fa-rocket"></i> 火箭发射
</button>
<button class="tip-tag" @click="keyword='苹果公司'; hours=168; handleSearch()">
<i class="fa-brands fa-apple"></i> 苹果产业链
<i class="fa-brands fa-apple"></i> 苹果公司
</button>
<button class="tip-tag regex-tag" @click="keyword='AI|LLM'; hours=168; handleSearch()">
<i class="fa-solid fa-code-branch"></i> AI / 大模型
</button>
<button class="tip-tag regex-tag" @click="keyword='美国关税'; hours=168; handleSearch()">
<i class="fa-solid fa-flag-usa"></i> 美国关税
<button class="tip-tag regex-tag" @click="keyword='美国'; hours=168; handleSearch()">
<i class="fa-solid fa-flag-usa"></i> 美国
</button>
</div>
</div>
@@ -261,9 +260,15 @@ async function handleSearch() {
<div v-else-if="searchResult" class="results-container">
<section class="chart-section glass-panel">
<div class="section-header">
<div class="section-title-group">
<h2 class="section-title">
<i class="fa-solid fa-wave-square"></i> 时间热度脉络
</h2>
<span class="chart-tip">
<i class="fa-solid fa-hand-pointer"></i>
点击时间点查看具体事件列表
</span>
</div>
<span class="meta-info"> {{ searchResult.timeline.length }} 个时间节点 · 覆盖 {{ searchResult.events.length }} 个聚合事件</span>
</div>
@@ -553,6 +558,30 @@ async function handleSearch() {
color: var(--brand-primary);
}
.section-title-group {
display: flex;
align-items: center;
gap: 12px;
flex-wrap: wrap;
}
.chart-tip {
display: inline-flex;
align-items: center;
gap: 6px;
padding: 4px 10px;
border-radius: var(--radius-md);
background: var(--brand-primary-alpha);
border: 1px solid rgba(99, 102, 241, 0.2);
color: var(--brand-primary);
font-size: 12px;
font-weight: 600;
}
.chart-tip i {
font-size: 12px;
}
.time-filter-badge {
display: inline-flex;
align-items: center;
@@ -599,6 +628,10 @@ async function handleSearch() {
outline: none;
}
.chart-container :deep(.apexcharts-marker) {
cursor: pointer;
}
.events-section {
margin-top: 8px;
}
-1
@@ -1,4 +1,3 @@
<!-- Interest keywords page: add/remove keywords and view matched events -->
<script setup lang="ts">
import { onMounted, ref, computed } from 'vue'