Reproducing DeepResearch Through Engineering: Iterative Reasoning, Evaluation, and Core Implementation Mechanisms

Posted on 2026-3-21 05:29:48 | Views: 145 | Replies: 0

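Deep Research aims to produce long-form research reports that are high in quality and easy to read. This goes well beyond simple information retrieval: it is a systems-engineering effort that has to integrate charts, plan a sensible section structure, keep the logic flowing, use terminology consistently, and avoid redundant information.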

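The key to deep-research-style search is to imitate the human habit of "chain-of-thought search":

1. The model first reasons about the question to set an initial search direction.
2. It runs an initial search and gets a first batch of information.
3. Based on what it has gathered, it reasons again and decides where to search next.
4. It runs a refined search to obtain more precise information.
5. It keeps iterating this "reason → search → reason" loop until it has collected enough information.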
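The loop described above can be sketched in a few lines of Python. This is a minimal sketch, not any vendor's implementation: `reason`, `search`, `deep_search`, and `FAKE_INDEX` are hypothetical names, and both the search tool and the reasoning step are stubbed with canned data so the block runs offline.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins so the sketch runs offline: a real system would
# replace `search` with a search API call and `reason` with an LLM call.
FAKE_INDEX = {
    "topic overview": "Overview text. See also: subtopic details.",
    "subtopic details": "Detailed text about the subtopic.",
}


def search(query: str) -> str:
    """Return a canned snippet for the query (empty string if unknown)."""
    return FAKE_INDEX.get(query, "")


def reason(question: str, notes: List[str]) -> Optional[str]:
    """Choose the next query from what is known so far; None means stop."""
    if not notes:
        return "topic overview"  # initial direction, before any search
    marker = "See also: "
    if marker in notes[-1]:
        # Follow the lead found in the latest search result.
        return notes[-1].split(marker, 1)[1].rstrip(".")
    return None  # nothing left to follow up


def deep_search(
    question: str,
    reason_fn: Callable[[str, List[str]], Optional[str]],
    search_fn: Callable[[str], str],
    max_rounds: int = 5,
) -> Tuple[List[str], List[str]]:
    """Alternate reasoning and searching until reasoning says 'enough'."""
    queries: List[str] = []
    notes: List[str] = []
    for _ in range(max_rounds):
        query = reason_fn(question, notes)
        if query is None or query in queries:
            break  # enough information, or the reasoner is repeating itself
        queries.append(query)
        notes.append(search_fn(query))
    return queries, notes
```

Calling `deep_search("any question", reason, search)` issues two queries here: the second one exists only because the first result pointed to it, which is the point of interleaving reasoning with search.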

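Deep Search is therefore the foundation of Deep Research: only with this iterative, chain-of-thought style of search can a full deep-research workflow be supported. Beyond search, reproducing Deep Research also depends on long context windows and a strong reasoning model.

A "reasoning model" here means a large language model (LLM) that can handle complex multi-step tasks. Unlike simple factual Q&A (for example, "What is the capital of France?"), a reasoning model has to decompose the problem, generate intermediate steps, and only then arrive at an answer.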

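Some Background

Google's and OpenAI's DeepResearch

When Google released Gemini 2.0 Flash, it launched the Deep Research feature alongside it. The feature uses AI to explore complex topics on your behalf and produces a comprehensive, easy-to-read report.

OpenAI's Deep Research may have drawn on Google's approach, but the model behind it is a customized o3-series model. Models of this kind are trained with reinforcement learning on real-world tasks that require browsing and tool use, so they learn to plan and execute multi-step research trajectories; they can browse files, plot with Python, embed charts, and cite sources.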

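Changing User Habits

o1-preview, which OpenAI released in September 2024, introduced the idea of "inference-time compute" and quietly shifted how the industry thinks. The idea is to spend more compute at the answer-generation stage, using techniques such as chain of thought and self-reflection so that the model thinks more deeply.

This conditioned users to accept "delayed gratification": a longer wait in exchange for higher-quality, more useful results. DeepSeek-R1 reinforced that experience. As a result, when users are willing to wait several minutes or more for a report, their expectations are high. A successful Deep Research application has to produce content that impresses users and leaves them feeling that "it considered more angles than I would have."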

Search, AI Search & Deep Search

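At its core, search is about finding information that is both comprehensive and direct.

3.1 Traditional Search Engines and RAG

| Stage | Traditional search engine | RAG (retrieval-augmented generation) |
| --- | --- | --- |
| Overall | Two stages: recall and ranking | Two stages: retrieval and generation |
| 1 | Recall: quickly filter relevant documents with techniques such as an inverted index; the emphasis is on speed and coverage. | Retrieval: use vector-similarity matching to fetch the most relevant knowledge chunks from the knowledge base. |
| 2 | Ranking: finely rank the recalled results; the emphasis is on quality and relevance. | Generation: use the user query and the retrieved knowledge as context to generate a coherent answer. |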

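The two pipelines are structurally similar but differ in emphasis: RAG focuses on semantic retrieval and integration of knowledge to support generation tasks such as question answering and dialogue, whereas a traditional search engine aims to match and rank existing web documents and give users navigation.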
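To make the contrast concrete, here is a toy sketch of both two-stage pipelines over the same three documents. Everything in it is an illustrative assumption: the corpus `DOCS`, the helper names, term counting as the ranking signal, and a bag-of-words `Counter` standing in for a real embedding model. The RAG half stops at building the prompt, since the generation step would be an LLM call.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set

# Tiny illustrative corpus (made up for this sketch).
DOCS: Dict[str, str] = {
    "d1": "inverted index maps terms to documents",
    "d2": "vector similarity retrieves knowledge chunks",
    "d3": "ranking orders recalled documents by relevance",
}


def tokenize(text: str) -> List[str]:
    return text.lower().split()


# --- Traditional search: recall with an inverted index, then rank ---
def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def recall_then_rank(query: str, docs: Dict[str, str],
                     index: Dict[str, Set[str]]) -> List[str]:
    terms = tokenize(query)
    # Recall: any document containing at least one query term.
    recalled = set().union(*(index.get(t, set()) for t in terms))

    # Rank: more query-term occurrences first, doc id as a tie-breaker.
    def score(doc_id: str) -> int:
        counts = Counter(tokenize(docs[doc_id]))
        return sum(counts[t] for t in terms)

    return sorted(recalled, key=lambda d: (-score(d), d))


# --- RAG: retrieve by vector similarity, then build the generation prompt ---
def embed(text: str) -> Counter:
    """Bag-of-words stand-in for an embedding model."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_then_build_prompt(query: str, docs: Dict[str, str],
                               top_k: int = 1) -> str:
    q = embed(query)
    ranked = sorted(docs, key=lambda d: (-cosine(q, embed(docs[d])), d))
    context = "\n".join(docs[d] for d in ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The search half returns an ordered list of document ids for the user to navigate, while the RAG half returns a prompt whose retrieved context an LLM would turn into an answer.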

3.2 AI Search

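AI search comes in two forms: "AI + search" and "search + AI". In the former (for example, GPT calling a search tool), search is a tool that supplies knowledge to the AI; in the latter (for example, Google's AI Overviews), AI is a service that summarizes search results.

Take Tavily as an example. It does not build its own crawler; instead it integrates results from multiple search engines. Its core work comes down to three things:

- Query rewriting: optimize the user's query with NLP techniques.
- Deduplication and aggregation: merge information from multiple sources and drop duplicate links.
- Result reranking: use an LLM or an embedding model to rank results semantically and improve relevance.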
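The three steps above can be sketched as a small aggregation pipeline. This is not Tavily's code; it is a minimal sketch under stated assumptions: every function name is hypothetical, query rewriting is reduced to whitespace cleanup, duplicates are detected by a normalized URL, and term overlap stands in for the LLM or embedding reranker.

```python
from typing import Dict, List
from urllib.parse import urlsplit

Result = Dict[str, str]  # assumed shape: {"url": ..., "snippet": ...}


def rewrite_query(query: str) -> str:
    """Stand-in for NLP query rewriting: trim and collapse whitespace."""
    return " ".join(query.split())


def normalize_url(url: str) -> str:
    """Ignore scheme, a leading 'www.', and a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_and_dedupe(result_lists: List[List[Result]]) -> List[Result]:
    """Aggregate results from several engines, keeping the first copy of each URL."""
    seen = set()
    merged: List[Result] = []
    for results in result_lists:
        for result in results:
            key = normalize_url(result["url"])
            if key not in seen:
                seen.add(key)
                merged.append(result)
    return merged


def rerank(query: str, results: List[Result]) -> List[Result]:
    """Order by query-term overlap (a stand-in for semantic reranking)."""
    terms = set(query.lower().split())

    def overlap(result: Result) -> int:
        return len(terms & set(result["snippet"].lower().split()))

    # sorted() is stable, so ties keep their aggregation order.
    return sorted(results, key=overlap, reverse=True)
```

In use, the rewritten query is sent to each engine, the per-engine result lists go through `merge_and_dedupe`, and `rerank` orders what is left before it reaches the model.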

3.3 Deep Search

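There are two ways to implement Deep Search. One is end-to-end model training, as OpenAI does, relying on scale and brute force; the other uses a non-specialized model and implements the search -> reasoning -> search loop through engineering.

3.3.1 End-to-End Model Training
Research such as Search-o1 and Search-R1 points in the same direction: build a long chain of thought that interleaves search and reasoning, and keep adjusting the search strategy along the chain. One important consensus is that reinforcement learning generalizes better than supervised fine-tuning.

3.3.2 An Engineered Deep Search Implementation
Deep Search is a process in which text generation and search-tool calls alternate over many iterations. Unlike RAG's one-shot retrieval, it breaks the original question into multiple search queries and works through them step by step. The figure below shows a classic implementation framework proposed by the Jina AI team:

[Figure: DeepSearch loop flowchart]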

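Search
- Generating related search queries: when the search action is chosen, the model generates several related queries to get comprehensive coverage. For example, for "analyze the food-delivery competition between Meituan and JD", it might generate "Meituan vs. JD food-delivery market comparison", "JD food delivery competitive advantages", and "future trends in the food-delivery market".
- Rewriting search queries: based on the initial results, the model takes on roles such as expert skeptic or detail analyst to dig further and rewrite the queries, narrowing the question down.
Read: fetching information
Once there are search results, the model can choose the "visit" action to read specific URLs from the results in depth and obtain more accurate, detailed information.
Generating the answer
When it judges that it has enough information, the system generates an answer with citations. That is not the end, though; it still has to check:
- Does the answer address the original question, or only a sub-question?
- Does the answer pass all the preset evaluation criteria (such as definitiveness and completeness)?
Reflect
This is the key to being "deep". The system maintains a list of knowledge-gap questions, continually identifying gaps and adding the resulting sub-questions to the list.
- Reflection questions: deeper decompositions of the original question into sub-questions that can be researched independently; once solved, they become context for answering the original question.
- Search queries: the concrete instructions sent to the search engine to resolve the current (sub-)question.
- Mechanisms for traversing the gap questions:
  - FIFO (first-in, first-out) queue: newly discovered sub-questions are placed in the queue ahead of the original question, which always stays at the tail, and the system always processes the question at the head. The benefit is that all questions share one continuously accumulating context, so knowledge gained by solving a sub-question is immediately available to every later question.
  - Recursion: a sub-question (and every sub-question derived from it) must be fully resolved before the next one is handled. The context stays clean, but the budget is hard to control, the process can easily fall into unbounded recursion, and token resources are hard to allocate. By comparison, the FIFO queue strikes a better balance between depth and breadth.
- Memory management: a key challenge in multi-step reasoning. In Jina's design the memory system distinguishes "memory" from "knowledge"; both are part of the LLM prompt context and are separated with different XML tags.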
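The queue-based traversal can be sketched with `collections.deque`. This is one reading of the description above, not Jina's code: `run_gap_queue`, `find_gaps`, and `solve` are hypothetical names, the original question is used only when the queue is empty (so it is effectively always last), and re-queuing a sub-question behind its own sub-questions is an assumption of this sketch.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

FindGaps = Callable[[str, List[str]], List[str]]  # (question, knowledge) -> sub-questions
Solve = Callable[[str, List[str]], str]           # (question, knowledge) -> new fact


def run_gap_queue(original: str, find_gaps: FindGaps, solve: Solve,
                  max_steps: int = 10) -> Tuple[List[str], List[str]]:
    gaps: Deque[str] = deque()   # pending knowledge-gap questions
    knowledge: List[str] = []    # one shared, accumulating context
    order: List[str] = []        # questions in the order they were processed
    seen = {original}

    for _ in range(max_steps):   # hard step budget instead of recursion depth
        # Always take the head; fall back to the original question when empty.
        current = gaps.popleft() if gaps else original
        order.append(current)

        new_gaps = [g for g in find_gaps(current, knowledge) if g not in seen]
        if new_gaps:
            seen.update(new_gaps)
            gaps.extend(new_gaps)
            if current != original:
                gaps.append(current)  # assumption: revisit after its sub-questions
            continue

        # No open gaps for this question: solve it and share the result.
        knowledge.append(solve(current, knowledge))
        if current == original:
            break
    return order, knowledge
```

With a stub where `Q` has gaps `Q1` and `Q2`, and `Q1` has gap `Q1a`, the processing order is `Q, Q1, Q2, Q1a, Q1, Q`. By the time `Q` is answered, three facts from sub-questions are already in the shared `knowledge` list, and `max_steps` bounds the total work no matter how deep the decomposition goes.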

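3.3.3 Limits of Engineering Approaches
Whether it is a workflow or a multi-agent system, each is a way to raise the ceiling with engineering techniques while model capability falls short. We have to recognize that engineering optimization has a ceiling and will quite possibly be replaced by stronger model capabilities. That does not make today's effort worthless: we should not stop just because models fall short of expectations in some respects, but do as well as we can with what is available now.

In the history of AI, general methods that rely on scaled-up computation have in the end always beaten specialized methods built on human expertise and careful design.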

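The Core of Reproducing Deep Research

4.1 Core Goal

Always remember that the goal of the reproduction is to make your agent "deeper". All workflow construction, prompt tuning, and fine-tuning should revolve around that goal.

4.2 Core Mechanism: Iterative Reasoning

The core of Deep Research is its iterative reasoning mechanism: it loops through searching, reading, and reasoning until it finds an answer or exhausts its budget. Below is a simplified skeleton of the main loop: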

// Main reasoning loop
while (tokenUsage < tokenBudget && badAttempts <= maxBadAttempts) {
  // Track progress
  step++;
  totalStep++;

  // Take the current question from the gaps queue; fall back to the original question if it is empty
  const currentQuestion = gaps.length > 0 ? gaps.shift() : question;

  // Build the prompt from the current context and the allowed actions
  system = getPrompt(diaryContext, allQuestions, allKeywords, allowReflect, allowAnswer, allowRead, allowSearch, allowCoding, badContext, allKnowledge, unvisitedURLs);

  // Let the LLM decide the next action
  const result = await LLM.generateStructuredResponse(system, messages, schema);
  thisStep = result.object;

  // Execute the chosen action (answer, reflect, search, visit, coding)
  if (thisStep.action === 'answer') {
    // Handle the answer action...
  } else if (thisStep.action === 'reflect') {
    // Handle the reflect action...
  }
  // ... and so on for the other actions
}

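An engineered implementation usually needs a loop like this. Frameworks such as LangChain can help build it, but they are not required. Frameworks sometimes hide the details of interacting with the LLM directly, so working with the LLM API natively, without leaning too heavily on a framework, may be the better choice.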
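A framework-free version of the loop's control flow fits in a few lines of Python. This is a minimal sketch under assumptions, not a port of the skeleton above: `Step`, `research_loop`, and `decide` are hypothetical names, `decide` stands in for the structured LLM call, and each step reports its own token cost and (for answers) whether it passed evaluation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    action: str        # e.g. "search", "reflect", or "answer"
    payload: str       # query, sub-question, or answer text
    tokens: int        # tokens this step consumed
    good: bool = True  # for "answer": did it pass evaluation?


# Hypothetical signature for the LLM call that picks the next action.
Decide = Callable[[str, List[str]], Step]


def research_loop(question: str, decide: Decide, token_budget: int = 1000,
                  max_bad_attempts: int = 2) -> Tuple[Optional[str], List[str]]:
    token_usage = 0
    bad_attempts = 0
    diary: List[str] = []  # running log that is fed back to the model

    # Stop when the token budget is spent or too many answers were rejected.
    while token_usage < token_budget and bad_attempts <= max_bad_attempts:
        step = decide(question, diary)
        token_usage += step.tokens
        diary.append(f"{step.action}: {step.payload}")

        if step.action == "answer":
            if step.good:
                return step.payload, diary  # accepted answer ends the loop
            bad_attempts += 1               # rejected answer: keep researching
        # Other actions (search, reflect, ...) would run their tools here.
    return None, diary  # budget exhausted without an accepted answer
```

The two loop conditions are what keep the process bounded: a run ends with an accepted answer, with the token budget spent, or after too many rejected answers.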

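4.3 Evaluation Criteria

Clear evaluation criteria are essential for steering Deep Research. Jina's article lists the following:

- Definitiveness: whether the question requires a definite answer rather than a vague one.
- Freshness: whether the question requires up-to-date information.
- Plurality: whether the question requires multiple items or examples.
- Completeness: whether the question contains multiple elements that all need to be addressed.

The prompt in another project, deerflow, uses similar criteria:

- Comprehensive coverage: the information must cover every aspect of the topic, including multiple perspectives.
- Sufficient depth: detailed data points, facts, and in-depth analysis are required, not surface-level information.
- Sufficient volume: aim for plenty of relevant information; more high-quality information is always better than less.

"Comprehensive coverage" corresponds to completeness, "sufficient depth" to definitiveness, and "sufficient volume" to plurality.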
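One way to put such criteria to work is to decide which of them a question requires and then reject any answer that fails one. The sketch below assumes that design; it is not taken from Jina or deerflow. The criterion keys, `evaluate_answer`, and `keyword_judge` are hypothetical, and the keyword-based judge is a toy stand-in for what would normally be an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Criterion keys are this sketch's own labels for the four criteria above.
CRITERIA: Dict[str, str] = {
    "definitive": "The answer is definite rather than vague.",
    "freshness": "The answer uses up-to-date information.",
    "plurality": "The answer provides multiple items or examples.",
    "completeness": "The answer addresses every element of the question.",
}

Judge = Callable[[str, str, str], bool]  # (criterion, question, answer) -> passed?


@dataclass
class Evaluation:
    passed: bool
    failed: List[str]


def evaluate_answer(question: str, answer: str, required: List[str],
                    judge: Judge) -> Evaluation:
    """Run every required criterion; the answer passes only if all pass."""
    unknown = [c for c in required if c not in CRITERIA]
    if unknown:
        raise ValueError(f"unknown criteria: {unknown}")
    failed = [c for c in required if not judge(c, question, answer)]
    return Evaluation(passed=not failed, failed=failed)


def keyword_judge(criterion: str, question: str, answer: str) -> bool:
    """Toy rule-based judge; a real system would ask an LLM instead."""
    if criterion == "definitive":
        hedges = ("maybe", "not sure", "it depends")
        return not any(h in answer.lower() for h in hedges)
    if criterion == "plurality":
        return sum(line.startswith("- ") for line in answer.splitlines()) >= 2
    return True  # freshness and completeness are not checked by this toy judge
```

The `failed` list matters as much as the pass/fail flag: it can be fed back into the loop as the reason an answer was rejected, so the next round of search targets the missing quality.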

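Other Points to Watch in Implementation

- Once the task is decomposed, different models can divide the work. For example: a small model handles planning, a reasoning model handles complex reasoning, and a model that writes well handles summarization.
- Good system design leans toward multi-model collaboration; an "all-in-one" single-model solution is not necessarily optimal.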
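The division of labor described above amounts to a routing table from role to model. A minimal sketch, assuming three roles and placeholder model names that do not refer to real models; `call_model` is a hypothetical callable wrapping whatever LLM client is in use.

```python
from typing import Callable, Dict

# Placeholder model names: substitute the models you actually deploy.
MODEL_FOR_ROLE: Dict[str, str] = {
    "plan": "small-fast-model",   # cheap model drafts the outline
    "reason": "reasoning-model",  # reasoning model does the heavy analysis
    "write": "writing-model",     # fluent model produces the final prose
}

CallModel = Callable[[str, str], str]  # (model_name, prompt) -> completion


def route(role: str) -> str:
    """Look up the model assigned to a role."""
    try:
        return MODEL_FOR_ROLE[role]
    except KeyError:
        raise ValueError(f"no model configured for role {role!r}") from None


def run_pipeline(question: str, call_model: CallModel) -> str:
    """Plan, reason, and write with a different model at each stage."""
    outline = call_model(route("plan"), f"Outline a report for: {question}")
    analysis = call_model(route("reason"), f"Analyze each section of:\n{outline}")
    return call_model(route("write"), f"Write the report from:\n{analysis}")
```

Keeping the mapping in one table means a stage can be moved to a different model without touching the pipeline code.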

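Other Ideas Worth Referencing

- Co-STORM (Stanford): multiple AI agents hold a "roundtable discussion" to research a topic together. They proactively ask questions, search, analyze, and discuss information, and finally produce a structured report with citations.
- Chain-of-thought prompting: having the model show its step-by-step reasoning before answering can effectively improve its performance on complex problems. The paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" is the foundational work in this area.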

Minimal Implementation

Below is a basic version, based on an open-source project, that shows the core workflow of a Deep Research agent:

import openai
from dataclasses import dataclass, field
from typing import List
import json
from tavily import TavilyClient

client = openai.OpenAI(
    api_key="XXX",
    base_url="https://ark.cn-beijing.volces.com/api/v3",
)

SYSTEM_PROMPT_REPORT_STRUCTURE = r"""
You are a Deep Research assistant. Given a query, plan a structure for a report and the paragraphs to be included. Make sure that the ordering of paragraphs makes sense. Once the outline is created, you will be given tools to search the web and reflect for each of the sections separately. Format the output in json with the following json schema definition:

```ts
interface Paragraph {
  title: string;
  content: string;
}

interface Report {
  report_title: string;
  paragraphs: Paragraph[];
}
```

Title and content properties will be used for deeper research. Make sure that the output is a json object with an output json schema defined above. Only return the json object, no explanation or additional text.
"""

SYSTEM_PROMPT_FIRST_SEARCH = r"""

You are a Deep Research assistant. You will be given a paragraph in a report, its title and expected content in the following json schema definition:

interface Paragraph {
  title: string;
  content: string;
}
interface Input {
  paragraphs: Paragraph[];
}

You can use a web search tool that takes a 'search_query' as parameter. Your job is to reflect on the topic and provide the most optimal web search query to enrich your current knowledge. Format the output in json with the following json schema definition:

interface SearchResult {
  search_query: string;
  reasoning: string;
}

interface Output {
  searchResults: SearchResult[];
}

Make sure that the output is a json object with an output json schema defined above. Only return the json object, no explanation or additional text.

"""

SYSTEM_PROMPT_FIRST_SUMMARY = r"""
You are a Deep Research assistant. You will be given a search query, search results and the paragraph of a report that you are researching in the following json schema definition:

interface FirstSummaryInput {
    title: string
    content: string
    search_query: string
    search_results: []string
}

Your job is to write the paragraph as a researcher using the search results to align with the paragraph topic and structure it properly to be included in the report. Format the output in json with the following json schema definition:

interface FirstSummaryOutput {
  paragraph_latest_state string
}

Make sure that the output is a json object with an output json schema defined above. Only return the json object, no explanation or additional text.
"""

SYSTEM_PROMPT_REFLECTION = r"""
You are a Deep Research assistant. You are responsible for constructing comprehensive paragraphs for a research report. You will be provided paragraph title and planned content summary, also the latest state of the paragraph that you have already created all in the following json schema definition:

interface ReflectInput {
   title: string
   content: string
   paragraph_latest_state: string
}

You can use a web search tool that takes a 'search_query' as a parameter. Your job is to reflect on the current state of the paragraph text and think if you haven't missed some critical aspect of the topic and provide the most optimal web search query to enrich the latest state. Format the output in json with the following json schema definition:

interface ReflectOutput {
    search_query: string
    reasoning: string
}

Make sure that the output is a json object with an output json schema defined above. Only return the json object, no explanation or additional text.
"""

SYSTEM_PROMPT_REPORT_FORMATTING = r"""You are a Deep Research assistant. You have already performed the research and constructed final versions of all paragraphs in the report. You will get the data in the following json format:

interface SummaryInput {
    parts: []Part
}

interface Part {
    title: string,
    paragraph_latest_state: string
}

Your job is to format the Report nicely and return it in MarkDown. If Conclusion paragraph is not present, add it to the end of the report from the latest state of the other paragraphs.
"""

def tavily_search(query, include_raw_content=True, max_results=3):
    # Placeholder key, like the "XXX" api_key above; substitute your own Tavily API key.
    tavily_client = TavilyClient("XXX")
    return tavily_client.search(query,
                                include_raw_content=include_raw_content,
                                max_results=max_results, include_images=False)

def plan_tool(question):
    # Ask the model for a report outline and return its list of paragraphs.
    print("plan tool")
    response = client.chat.completions.create(
        model="doubao-1-5-pro-32k-250115",
        messages=[{"role": "system", "content": SYSTEM_PROMPT_REPORT_STRUCTURE},
                  {"role": "user", "content": question}],
        temperature=1
    )
    print(response.choices[0].message.content)
    report_structure = json.loads(response.choices[0].message.content)
    paragraphs = report_structure["paragraphs"]
    return paragraphs

def search_prompt_test(paragraphs):
    # Ask the model for an initial search query for each planned paragraph.
    print("run search_prompt_test")
    response = client.chat.completions.create(
        model="doubao-1-5-pro-32k-250115",
        messages=[{"role": "system", "content": SYSTEM_PROMPT_FIRST_SEARCH},
                  {"role": "user", "content": json.dumps(paragraphs)}],
        temperature=1
    )
    search_query = json.loads(response.choices[0].message.content)
    print("search_prompt_test res = ", search_query["searchResults"])
    return search_query["searchResults"]

# search_query needs to be a string
def single_search_tool(search_query):
    print("single_search_tool: ", search_query)
    search_result = tavily_search(search_query)
    return get_search_results(search_result)

def summary_tool(summary_input):
    # Write (or rewrite) one paragraph from the latest search results.
    print("summary_tool")
    response = client.chat.completions.create(
        model="doubao-1-5-pro-32k-250115",
        messages=[{"role": "system", "content": SYSTEM_PROMPT_FIRST_SUMMARY},
                  {"role": "user", "content": json.dumps(summary_input)}],
        temperature=1
    )
    result = response.choices[0].message.content
    print("summary_tool result = ", result)
    return json.loads(result)

def get_search_results(search_results):
    # Collect the raw content of every result, truncated to 20,000 characters each.
    results = []
    if search_results.get("results") is None:
        print("results is nil")
        return results
    # Iterate over the result entries themselves, not over the response's keys.
    for search_result in search_results["results"]:
        raw_content = search_result.get("raw_content")
        if raw_content is None:
            print("raw_content is nil")
        else:
            results.append(raw_content[0:20000])
    return results

def reflection_tool(data):
    # Ask the model what the paragraph still lacks and which query would fill the gap.
    print("reflection_tool")
    response = client.chat.completions.create(
        model="doubao-1-5-pro-32k-250115",
        messages=[{"role": "system", "content": SYSTEM_PROMPT_REFLECTION},
                  {"role": "user", "content": json.dumps(data)}],
        temperature=1)
    print(response.choices[0].message.content)
    reflect_result = json.loads(response.choices[0].message.content)
    print("reflect_result = ", reflect_result)
    return reflect_result

def report_tool(data):
    # Format the finished paragraphs into the final Markdown report.
    print("report_tool")
    response = client.chat.completions.create(
        model="doubao-1-5-pro-32k-250115",
        messages=[{"role": "system", "content": SYSTEM_PROMPT_REPORT_FORMATTING},
                  {"role": "user", "content": json.dumps(data)}],
        temperature=1)
    print("report_tool result")
    print(response.choices[0].message.content)
    return response.choices[0].message.content

max_iter = 3

def main():
    question = "how to be a better programmer?"
    paragraphs = plan_tool(question)
    search_querys = search_prompt_test(paragraphs)
    # Guard against the model returning more queries than there are paragraphs.
    query_len = min(len(search_querys), len(paragraphs))
    report_parts = []
    for i in range(query_len):
        part = {}
        summary_result = ""
        summary_input = {}
        reflect_input = {}
        title = paragraphs[i]["title"]
        content = paragraphs[i]["content"]

        # first search
        search_query = search_querys[i]["search_query"]
        search_res = single_search_tool(search_query)
        for j in range(max_iter):
            summary_input["title"] = title
            summary_input["content"] = content
            summary_input["search_query"] = search_query
            summary_input["search_results"] = search_res
            summary_result = summary_tool(summary_input)
            if j < max_iter - 1:  # skip reflection on the final iteration
                reflect_input["title"] = title
                reflect_input["content"] = content
                reflect_input["paragraph_latest_state"] = summary_result["paragraph_latest_state"]
                reflect_res = reflection_tool(reflect_input)
                search_query = reflect_res["search_query"]
                search_res = single_search_tool(search_query)
        part["title"] = title
        part["paragraph_latest_state"] = summary_result["paragraph_latest_state"]
        report_parts.append(part)
    report_res = report_tool(report_parts)
    print("report parts ===")
    print(report_parts)
    print("report result ====")
    print(report_res)

if __name__ == "__main__":
    main()



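Basic as it is, this implementation clearly shows the iterative process of planning, searching, summarizing, and reflecting. For developers who want to understand the Deep Research mechanism in depth and try it hands-on, starting from a similar [hands-on open-source](https://yunpan.plus/f/39-1) project is a good entry point. For more cutting-edge discussion and technical detail on AI agents and model training, see the relevant boards of [云栈社区](https://yunpan.plus).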

Deep Search, LLM, Agents, Python, RAG