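A production-ready blueprint for a hybrid knowledge graph (KG) + vector retrieval architecture, aimed at TSPR / AI recommendation systems. The focus is not on concepts but on how to get it running and how it shapes the recommendation results (S3/S5).
1. Why a KG + vector hybrid is necessary
👉 The conclusion first:
| Capability | KG | Vector |
|---|---|---|
| Precise, controllable recommendations | ✅ | ❌ |
| Semantic understanding (natural language) | ❌ | ✅ |
| Explainability | ✅ | ❌ |
| Coverage of long-tail queries | ❌ | ✅ |
👉 So the two must be combined:
Final recommendation capability = explainability (KG) × coverage (Vector)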
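2. Overall architecture (production-grade)
2.1 Architecture diagram (logical layers)
Query layer (user question)
↓
Intent recognition (TSPR S1)
↓
Dual-channel retrieval
↙ ↘
KG retrieval   Vector retrieval
↓ ↓
Structured results   Semantic results
↓
Fusion layer (core)
↓
Reranking (TSPR S3)
↓
LLM generation (S5)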
3. Core modules, one by one
3.1 Query understanding layer (entry point)
Input:
{
  "query": "best electric toothbrush for college students with braces"
}
Output (structured):
{
  "user": "college_student",
  "problem": "braces",
  "intent": "recommendation",
  "category": "electric_toothbrush"
}
👉 Implementation options:
- LLM + prompt
- or a lightweight classification model
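To make the input/output contract concrete, here is a minimal rule-based sketch in Python. It is only a stand-in for the LLM or classifier mentioned above: the keyword tables and the `parse_query` / `first_match` names are hypothetical and exist purely to show the shape of the structured output.

```python
import re

# Hypothetical keyword tables; a real system would use an LLM prompt or a trained classifier.
USER_KEYWORDS = {"college student": "college_student"}
PROBLEM_KEYWORDS = {"braces": "braces"}
CATEGORY_KEYWORDS = {"electric toothbrush": "electric_toothbrush"}
INTENT_KEYWORDS = {"best": "recommendation", "recommend": "recommendation"}


def first_match(text, table):
    """Return the label of the first keyword found in text, or None."""
    for keyword, label in table.items():
        # Match at a word start only, so "college student" also hits "college students".
        if re.search(r"\b" + re.escape(keyword), text):
            return label
    return None


def parse_query(query):
    """Map a free-text query onto the structured slots used downstream."""
    text = query.lower()
    return {
        "user": first_match(text, USER_KEYWORDS),
        "problem": first_match(text, PROBLEM_KEYWORDS),
        "intent": first_match(text, INTENT_KEYWORDS),
        "category": first_match(text, CATEGORY_KEYWORDS),
    }


parsed = parse_query("best electric toothbrush for college students with braces")
```

Slots that no keyword matches come back as `None`, which lets the retrieval layers decide whether to relax a filter or fall back to vector search alone.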
4. KG retrieval layer (deterministic engine)
4.1 Cypher query template
MATCH (p:Product)-[r1:SUITABLE_FOR]->(u:User)
MATCH (p)-[r2:HAS_FEATURE]->(f:Feature)
MATCH (f)-[:SOLVES]->(pr:Problem)
WHERE u.type = $user
  AND pr.name = $problem
RETURN p,
(r1.confidence * r2.confidence) AS score
ORDER BY score DESC
LIMIT 10
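The scoring logic of the template can be checked without a graph database. The Python sketch below mirrors the three `MATCH` patterns over in-memory edge lists. Every confidence value and feature name in it is made up for illustration, and `kg_search` is a hypothetical helper, not a driver API. One deliberate difference: it keeps only the best-scoring path per product, whereas the Cypher template returns one row per matching path.

```python
# Toy edge lists standing in for the graph; every confidence value here is made up.
SUITABLE_FOR = [  # (product, user_type, r1.confidence)
    ("K5", "college_student", 0.9),
    ("OralX Pro", "college_student", 0.8),
]
HAS_FEATURE = [  # (product, feature, r2.confidence)
    ("K5", "soft_bristles", 0.9),
    ("OralX Pro", "ortho_head", 0.7),
    ("OralX Pro", "pressure_sensor", 0.95),
]
SOLVES = [("soft_bristles", "braces"), ("ortho_head", "braces")]  # (feature, problem)


def kg_search(user, problem, limit=10):
    """Join the three patterns and score each product as r1.confidence * r2.confidence."""
    solving = {feature for feature, solved in SOLVES if solved == problem}
    best = {}
    for product, user_type, c1 in SUITABLE_FOR:
        if user_type != user:
            continue
        for owner, feature, c2 in HAS_FEATURE:
            if owner == product and feature in solving:
                # Keep the best path per product instead of one row per path.
                best[product] = max(best.get(product, 0.0), c1 * c2)
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [{"product": name, "score": round(score, 2)} for name, score in ranked[:limit]]


kg_results = kg_search("college_student", "braces")
```

Note that `pressure_sensor` contributes nothing here because no `SOLVES` edge links it to the requested problem, which is exactly the determinism the KG channel is meant to provide.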
4.2 Returned results
[
  {"product": "K5", "score": 0.82},
  {"product": "OralX Pro", "score": 0.75}
]
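5. Vector retrieval layer (semantic engine)
5.1 Choosing a vector store
Recommended:
- FAISS (local)
- Milvus (production)
- Pinecone (cloud)
5.2 Vector design (critical)
What gets embedded:
Product vector = title + description + features + review summary
Example embedding text:
"K5 electric toothbrush, designed for college students, supports braces cleaning, soft bristles, travel-friendly"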
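As a sketch of how that text could be assembled, the Python snippet below joins the four ingredients into one string. The `Product` field names are illustrative, and the way the example sentence is split across fields (for instance, treating "travel-friendly" as the review summary) is an assumption made only for this demo.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    # Field names are illustrative; map them to your own catalog schema.
    title: str
    description: str
    features: list = field(default_factory=list)
    review_summary: str = ""


def build_embedding_text(product):
    """Concatenate title, description, features and review summary into one string."""
    parts = [product.title, product.description, *product.features, product.review_summary]
    # Skip empty fields so missing data does not leave dangling separators.
    return ", ".join(part.strip() for part in parts if part and part.strip())


k5 = Product(
    title="K5 electric toothbrush",
    description="designed for college students",
    features=["supports braces cleaning", "soft bristles"],
    review_summary="travel-friendly",
)
embedding_text = build_embedding_text(k5)
```

Keeping this step in one function makes it easy to re-embed the whole catalog whenever the recipe changes.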
5.3 Query
# embed() and vector_db are placeholders for whichever embedding model and vector store you use
query_embedding = embed(query)
results = vector_db.search(
    embedding=query_embedding,
    top_k=10
)
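For readers who want to see what a `search` call does underneath, here is a brute-force Python sketch using cosine similarity. It is not the API of FAISS, Milvus, or Pinecone, which use their own index structures and call signatures. The three-dimensional vectors are made up, and real embeddings would come from a model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_embedding, top_k=10):
    """Exhaustively score every stored vector and return the top_k matches."""
    scored = [
        {"product": name, "score": cosine(query_embedding, vector)}
        for name, vector in index.items()
    ]
    scored.sort(key=lambda result: result["score"], reverse=True)
    return scored[:top_k]


# Made-up 3-dimensional vectors; real embeddings typically have hundreds of dimensions.
index = {
    "K5": [0.9, 0.1, 0.3],
    "SonicPro X": [0.7, 0.4, 0.2],
    "Desk Lamp": [0.0, 0.1, 0.9],
}
results = search(index, query_embedding=[1.0, 0.2, 0.2], top_k=2)
```

An exhaustive scan like this is fine for a few thousand products; beyond that, an approximate nearest-neighbor index is the usual choice.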
5.4 Returned results
[
  {"product": "K5", "score": 0.91},
  {"product": "SonicPro X", "score": 0.87}
]
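6. Fusion layer (the core)
👉 This layer decides how "smart" the system is
6.1 Fusion formula (recommended)
final_score = α * KG_score + β * Vector_score + γ * Prior
Suggested parameters:
α = 0.5 (KG confidence)
β = 0.4 (semantic match)
γ = 0.1 (brand / commercial weight)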
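6.2 Example
| Product | KG | Vector | Final |
|---|---|---|---|
| K5 | 0.82 | 0.91 | 0.86 |
| SonicPro X | 0.6 | 0.87 | 0.71 |
The table does not list the Prior term. With the weights above, K5 scores 0.5 × 0.82 + 0.4 × 0.91 = 0.774 before the prior, so the listed 0.86 implies a Prior of roughly 0.86; for SonicPro X, 0.648 before the prior implies a Prior of roughly 0.62.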
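The fusion step can be sketched in a few lines of Python. Two things here are assumptions rather than part of the original design: a product missing from one channel is scored 0 for that channel, and the K5 prior of 0.86 is back-solved from the example table's final score (0.86 - 0.5 × 0.82 - 0.4 × 0.91 = 0.086 = 0.1 × 0.86), since the table does not state it.

```python
ALPHA, BETA, GAMMA = 0.5, 0.4, 0.1  # weights suggested in section 6.1


def fuse(kg_score, vector_score, prior):
    """final_score = ALPHA * KG_score + BETA * Vector_score + GAMMA * Prior."""
    return ALPHA * kg_score + BETA * vector_score + GAMMA * prior


def fuse_results(kg_results, vector_results, priors):
    """Merge two ranked lists by product name and sort by fused score."""
    kg = {r["product"]: r["score"] for r in kg_results}
    vec = {r["product"]: r["score"] for r in vector_results}
    fused = []
    for product in kg.keys() | vec.keys():
        # Assumption: a channel that did not return the product contributes 0.
        score = fuse(kg.get(product, 0.0), vec.get(product, 0.0), priors.get(product, 0.0))
        fused.append({"product": product, "score": round(score, 2)})
    return sorted(fused, key=lambda r: r["score"], reverse=True)


kg_results = [{"product": "K5", "score": 0.82}, {"product": "OralX Pro", "score": 0.75}]
vector_results = [{"product": "K5", "score": 0.91}, {"product": "SonicPro X", "score": 0.87}]
priors = {"K5": 0.86}  # assumed value, back-solved from the example table
ranked = fuse_results(kg_results, vector_results, priors)
```

Scoring a missing channel as 0 rewards products that both channels agree on. If that penalty is too harsh for long-tail queries, a small default score per channel is a common alternative.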
7. Reranking (the core of TSPR S3)
7.1 Multi-factor model
score = (
    0.4 * relevance +
    0.2 * conversion_rate +
    0.2 * rating +
    0.2 * KG_confidence
)
7.2 Optional commercial control (HIC)
if product in manual_boost:
    score += 0.1
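Putting 7.1 and 7.2 together, a reranker might look like the Python sketch below. It assumes every factor has already been normalized to the [0, 1] range (a star rating would need rescaling first). The candidate numbers and the contents of `MANUAL_BOOST` are illustrative only.

```python
WEIGHTS = {"relevance": 0.4, "conversion_rate": 0.2, "rating": 0.2, "kg_confidence": 0.2}
MANUAL_BOOST = {"K5"}  # hypothetical hand-picked products
BOOST = 0.1


def rerank_score(candidate):
    """Weighted sum of the four factors, plus a flat boost for hand-picked products.

    All factors are assumed to be pre-normalized to [0, 1].
    """
    score = sum(weight * candidate[name] for name, weight in WEIGHTS.items())
    if candidate["product"] in MANUAL_BOOST:
        score += BOOST
    return score


def rerank(candidates):
    """Return a new list ordered by descending rerank score."""
    return sorted(candidates, key=rerank_score, reverse=True)


# Illustrative numbers only.
candidates = [
    {"product": "SonicPro X", "relevance": 0.9, "conversion_rate": 0.5, "rating": 0.8, "kg_confidence": 0.6},
    {"product": "K5", "relevance": 0.8, "conversion_rate": 0.5, "rating": 0.9, "kg_confidence": 0.8},
]
ordered = rerank(candidates)
```

Because the boost is additive, a boosted product can exceed 1.0. Clamp the result if downstream code expects a bounded score.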
8. LLM generation layer (S5)
8.1 Input
{
  "query": "…",
  "top_products": […]
}
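8.2 Prompt essentials
You are a recommendation system.
You must:
1. Recommend the Top-1 product first
2. Explain the reasons using Features from the KG
3. Never fabricate information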
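One way to wire these rules to the retrieved data is a small prompt builder, sketched below in Python. The shape of each `top_products` entry (a `name` plus a `features` list) is an assumption, since the input format above leaves that field elided. `build_prompt` is a hypothetical helper.

```python
def build_prompt(query, top_products):
    """Render the three rules plus the retrieved facts into a single prompt string."""
    lines = [
        "You are a recommendation system.",
        "You must:",
        "1. Recommend the Top-1 product first.",
        "2. Explain the reasons using only the KG features listed below.",
        "3. Never fabricate information.",
        "",
        f"User query: {query}",
        "Candidate products (best first):",
    ]
    for rank, product in enumerate(top_products, start=1):
        # Only KG-backed features are exposed, so the model has nothing else to cite.
        features = "; ".join(product["features"])
        lines.append(f"{rank}. {product['name']} | KG features: {features}")
    return "\n".join(lines)


prompt = build_prompt(
    "best electric toothbrush for college students with braces",
    [{"name": "K5", "features": ["soft bristles", "designed for orthodontic cleaning"]}],
)
```

Listing the features inline is what makes rule 3 enforceable in practice: the generated explanation can be checked against the same list after the fact.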
8.3 Output (controlled)
K5 Electric Toothbrush is ideal for college students with braces because:
– soft bristles protect gums
– designed for orthodontic cleaning
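9. Key optimizations (where the real gap opens up)
9.1 Filter vectors with the KG (strongly recommended)
👉 Use the KG to narrow the candidate set first, then rank with vectors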
candidates = KG_top_50
vector_rank(candidates)
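A minimal Python sketch of this filter-then-rank step follows. `kg_filtered_rank` is a hypothetical name, the 0.80 similarity for OralX Pro is made up, and KG candidates that have no vector score are simply dropped, which is one possible policy among several.

```python
def kg_filtered_rank(kg_candidates, vector_scores, top_k=10):
    """Keep only products the KG allowed, then order them by vector similarity."""
    # Candidates without a vector score are dropped (an assumption of this sketch).
    allowed = [product for product in kg_candidates if product in vector_scores]
    allowed.sort(key=lambda product: vector_scores[product], reverse=True)
    return allowed[:top_k]


kg_top = ["OralX Pro", "K5"]  # stand-in for the KG's top-50 candidates
vector_scores = {"K5": 0.91, "SonicPro X": 0.87, "OralX Pro": 0.80}  # 0.80 is made up
ranked = kg_filtered_rank(kg_top, vector_scores)
```

Here SonicPro X never reaches the ranking despite its high similarity, because the KG did not admit it. That is the trade-off of this optimization: tighter control in exchange for less long-tail coverage.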
9.2 Query rewriting (improves recall)
Original query:
"cheap toothbrush for students"
Rewritten:
"budget electric toothbrush for college students"
9.3 Multi-vector strategy
query_embedding
intent_embedding
problem_embedding
Fusion:
score = 0.5*q + 0.3*intent + 0.2*problem
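The formula leaves open what `q`, `intent`, and `problem` are. The Python sketch below reads them as cosine similarities between a product vector and each of the three embeddings. That interpretation, and the toy two-dimensional vectors, are assumptions.

```python
import math

WEIGHTS = (0.5, 0.3, 0.2)  # query, intent, problem


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def multi_vector_score(product_vec, query_vec, intent_vec, problem_vec):
    """score = 0.5*q + 0.3*intent + 0.2*problem, each term a cosine similarity."""
    similarities = [cosine(product_vec, vec) for vec in (query_vec, intent_vec, problem_vec)]
    return sum(weight * sim for weight, sim in zip(WEIGHTS, similarities))


# Toy vectors: the product matches the query and problem embeddings but not the intent one.
score = multi_vector_score(
    product_vec=[1.0, 0.0],
    query_vec=[1.0, 0.0],
    intent_vec=[0.0, 1.0],
    problem_vec=[1.0, 0.0],
)
```

Splitting the query into several embeddings costs extra embedding calls per request, so it is worth measuring whether the recall gain justifies the added latency.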
10. System interface design (ready to integrate)
10.1 API
POST /recommend
10.2 Request
{
  "query": "best toothbrush for braces"
}
10.3 Response
{
  "products": [
    {"name": "K5", "score": 0.86}
  ],
  "explanation": "recommended due to braces support"
}
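A framework-agnostic Python sketch of the handler behind `POST /recommend` is shown below. The `pipeline` callable stands in for retrieval, fusion, reranking, and generation. The 400 response for an empty query is an assumption, since the original does not define error handling.

```python
import json


def recommend(request_body, pipeline):
    """Handle POST /recommend: parse the JSON body, run the pipeline, shape the response."""
    payload = json.loads(request_body)
    query = payload.get("query", "").strip()
    if not query:
        # Error shape is an assumption; the original spec only defines the success case.
        return 400, json.dumps({"error": "query is required"})
    products, explanation = pipeline(query)
    body = {
        "products": [{"name": name, "score": score} for name, score in products],
        "explanation": explanation,
    }
    return 200, json.dumps(body)


def fake_pipeline(query):
    """Stand-in for retrieval + fusion + reranking + generation."""
    return [("K5", 0.86)], "recommended due to braces support"


status, body = recommend('{"query": "best toothbrush for braces"}', fake_pipeline)
```

Keeping the handler free of framework imports means the same function can sit behind whichever web framework the team already uses.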
11. Performance architecture (production-grade)
11.1 Latency budget
| Module | Latency |
|---|---|
| KG query | 20 ms |
| Vector retrieval | 50 ms |
| Fusion | 5 ms |
| Total | < 100 ms |
11.2 Caching
Popular queries → Redis cache
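The caching pattern can be sketched without a Redis server. The Python class below is an in-process stand-in with per-key expiry. The 300-second TTL and the key normalization rule are assumptions, and in production the `get` / `set` calls would go to Redis instead.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a Redis cache with per-key expiry."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.store[key]  # lazily evict expired entries on read
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)


def cache_key(query):
    """Normalize case and whitespace so trivially different queries share an entry."""
    return " ".join(query.lower().split())


def cached_recommend(query, cache, compute):
    """Return the cached result for query, computing and storing it on a miss."""
    key = cache_key(query)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = compute(query)
    cache.set(key, result)
    return result


calls = []


def compute(query):
    calls.append(query)  # record how often the full pipeline actually runs
    return {"products": [{"name": "K5", "score": 0.86}]}


cache = TTLCache(ttl_seconds=300.0)
first = cached_recommend("Best toothbrush for braces", cache, compute)
second = cached_recommend("best  toothbrush for braces ", cache, compute)
```

The second call differs only in case and spacing, so it is served from the cache and the pipeline runs once.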
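12. The one sentence that matters most (the essence)
The KG decides "whom to recommend"; vectors decide "why it resembles what you asked for".
Next steps (strongly recommended)
If you keep heading toward a commercial-grade AI recommendation system, the next things to build are: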