From Theory to Practice: The Four Core Patterns for Building AI Agents

Posted on 2026-3-17 02:13:48

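Do you think of an AI agent as merely an "upgraded chatbot"? That view is understandable at first, since we are used to simple question-and-answer exchanges. But the single-turn prompt-response model soon proves inadequate for complex tasks. The real shift comes when an AI system starts to mimic how people work through a problem: plan, act, observe, reflect, and repeat.

That is the core appeal of agentic AI. To build systems with this capability, you need to understand the core design patterns behind it.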

Figure: Agentic AI patterns mind map

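This article examines the four core agentic patterns that are reshaping how AI systems are built. For each pattern it covers how it works and where it fits, and then it looks at how to combine them into systems that are usable in production.

Why do agentic patterns matter so much now?

A traditional large language model (LLM) is knowledgeable but brittle. Given a query, it tends to produce a final answer in one shot, without correcting itself, iterating, or laying out an execution plan. On its own, such a model also lacks the "hands and feet" to interact with the outside world: it cannot browse the web, call an image generator, or run the scripts it writes, let alone handle several complex tasks in parallel.

Agentic patterns exist to address these limitations. They turn a static language model into a dynamic reasoning engine: a complex task is broken into simpler sub-tasks, which are then completed by one agent or by several agents working together. Along the way, an agent can call external tools, verify its own output, and coordinate with other agents, none of which a single-turn prompt can do.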

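Pattern 1: Reflection (teaching the agent to review its own work)

Picture yourself working at a newspaper, where the editor-in-chief asks you for a feature story. Once the first draft is done, you read it through before submitting it: you check for factual errors, find weak arguments, verify sources, and revise the conclusion. That round of self-review is reflection.

An LLM can do the same: review and iteratively improve what it has generated before returning a final result.

In its simplest form, Reflection is a two-step loop. In step one, the agent produces an initial output (code, an article, a research report, and so on). In step two, the same agent or a dedicated "reviewer" agent analyzes that output and identifies defects and possible improvements. The loop can repeat until the output quality reaches a preset threshold or a maximum number of rounds is hit.

Figure: Flowchart of the reflection-based agentic pattern

Code example: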

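# Pattern 1: Reflection
# The agent generates a response, critiques it, then improves it.
# Install: pip install ollama
# Requires: Ollama running locally → https://ollama.com
#           Pull the model first: ollama pull llama3.2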

import ollama

MODEL = "llama3.2"

def generate(prompt: str) -> str:
    response = ollama.chat(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}]
    )
    return response["message"]["content"]

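def reflect_and_improve(task: str, iterations: int = 2) -> str:
    print(f"📝 Task: {task}\n")

    # Step 1: generate the initial draft
    draft = generate(task)
    print(f"--- Draft ---\n{draft}\n")

    for i in range(iterations):
        # Step 2: critique the draft (kept separate from the draft itself)
        critique = generate(
            "You are a strict reviewer. Here is a task and a draft response.\n\n"
            f"Task: {task}\nDraft: {draft}\n\n"
            "List the weaknesses or errors in the draft. Do not rewrite it."
        )
        print(f"--- Critique {i + 1} ---\n{critique}\n")

        # Step 3: revise using the critique; ask for the revised text only,
        # so reviewer notes do not leak into the next draft
        draft = generate(
            f"Task: {task}\nDraft: {draft}\nReviewer notes: {critique}\n\n"
            "Write an improved version of the draft that addresses the notes. "
            "Return only the improved version."
        )
        print(f"--- Iteration {i + 1} ---\n{draft}\n")

    return draft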

if __name__ == "__main__":
    final = reflect_and_improve(
        task="Explain what a Large Language Model is in 3 sentences."
    )
    print(f"✅ Final Answer:\n{final}")
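The example above always runs a fixed number of rounds. The threshold-based stop described earlier could be sketched as follows. This is a minimal illustration, not part of the original example: `reflect_until_good`, the `(score, feedback)` critic interface, and the stand-in functions are all assumptions, and in practice both callables would wrap model calls.

```python
from typing import Callable, Tuple

def reflect_until_good(
    task: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Tuple[float, str]],
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[str, float, int]:
    """Revise a draft until the critic's score reaches `threshold`.

    Returns (final_draft, final_score, revision_rounds_used).
    """
    draft = generate(task)
    score, feedback = critique(task, draft)
    rounds = 0
    # Stop on either condition: good enough, or out of budget
    while score < threshold and rounds < max_rounds:
        revise_prompt = (
            f"Task: {task}\n\nDraft:\n{draft}\n\n"
            f"Reviewer feedback:\n{feedback}\n\n"
            "Rewrite the draft to address the feedback. Return only the new draft."
        )
        draft = generate(revise_prompt)
        score, feedback = critique(task, draft)
        rounds += 1
    return draft, score, rounds


# Deterministic stand-ins so the loop can be exercised without a model.
def fake_generate(prompt: str) -> str:
    if "Reviewer feedback" in prompt:
        return "a longer, revised draft"
    return "short draft"

def fake_critique(task: str, draft: str) -> Tuple[float, str]:
    # Toy rubric: longer drafts score higher.
    return (0.9, "ok") if len(draft) > 15 else (0.4, "too short")

result = reflect_until_good("Explain LLMs", fake_generate, fake_critique)
print(result)
```

Capping the rounds matters as much as the threshold: a critic that never awards a passing score would otherwise loop forever.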

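Typical use cases include:

The generate, test, locate the bug, regenerate loop in code generation.
Repeated polishing of tone, clarity, and logic in creative writing.
Self-checking and correcting logical flaws in data analysis reports.
Verifying and repairing intermediate steps one at a time in mathematical reasoning.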
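The first use case, where tests rather than a second model act as the critic, might look like the sketch below. The names `run_tests` and `generate_until_passing` are hypothetical, the code generator is a stub standing in for a model call, and `exec` is used only for illustration: generated code should be run in a sandbox in any real system.

```python
from typing import Callable, List, Optional, Tuple

def run_tests(source: str, func_name: str,
              cases: List[Tuple[tuple, object]]) -> Optional[str]:
    """Run `source`, call `func_name` on each case; return an error report or None."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # WARNING: illustration only, not a sandbox
        func = namespace[func_name]
        for args, expected in cases:
            got = func(*args)
            if got != expected:
                return f"{func_name}{args} returned {got!r}, expected {expected!r}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None

def generate_until_passing(
    spec: str,
    generate_code: Callable[[str], str],
    func_name: str,
    cases: List[Tuple[tuple, object]],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Regenerate code until all test cases pass; return (source, attempts)."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        prompt = spec
        if feedback is not None:
            # Feed the concrete failure back so the next attempt can target it
            prompt = f"{spec}\n\nThe previous attempt failed:\n{feedback}\nReturn a corrected version."
        source = generate_code(prompt)
        feedback = run_tests(source, func_name, cases)
        if feedback is None:
            return source, attempt
    raise RuntimeError(f"No passing solution after {max_attempts} attempts: {feedback}")


# Stub generator: buggy on the first try, fixed once it sees a failure report.
def fake_codegen(prompt: str) -> str:
    if "previous attempt failed" in prompt:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a - b\n"

source, attempts = generate_until_passing(
    "Write add(a, b) that returns the sum.",
    fake_codegen, "add", [((1, 2), 3), ((0, 0), 0)],
)
print(attempts)
```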

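Pattern 2: Tool Use (giving the agent "hands and feet")

Tool use lets an agent interact with the live world beyond its training data.

The basic flow of Tool Use (also known as function calling) is as follows. After receiving a task, the agent first decides whether it needs an external tool. If it does, it emits a structured tool-call request. The tool runs and returns a result, and the agent folds that result into its reasoning and continues.

Figure: Flowchart of the tool-use agentic pattern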
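A structured tool-call request is only useful if it is validated before anything is executed. One way to do that is sketched below, assuming the `{"tool": ..., "args": {...}}` JSON shape used in this article's example. `TOOL_SCHEMAS` and `parse_tool_call` are hypothetical names, and the "first `{` to last `}`" extraction is a simple heuristic rather than a full parser.

```python
import json
from typing import Any, Dict, Optional, Tuple

# Hypothetical registry: tool name -> exact set of expected argument names.
TOOL_SCHEMAS = {
    "get_weather": {"city"},
    "calculator": {"expression"},
}

def parse_tool_call(reply: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (tool_name, args) if `reply` contains a valid tool call, else None."""
    # Models often wrap JSON in prose or a code fence, so take the outermost braces
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        call = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict):
        return None
    name, args = call.get("tool"), call.get("args", {})
    if not isinstance(name, str) or name not in TOOL_SCHEMAS:
        return None  # unknown tool: never dispatch
    if not isinstance(args, dict) or set(args) != TOOL_SCHEMAS[name]:
        return None  # missing or unexpected arguments
    return name, args


print(parse_tool_call('{"tool": "calculator", "args": {"expression": "144 / 12"}}'))
print(parse_tool_call("The answer is 12."))
```

Returning `None` for anything that does not match a known tool and its expected arguments keeps the dispatch step simple: the caller either has a call it can trust structurally or treats the reply as plain text.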

Code example:

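# Pattern 2: Tool Use
# The agent decides when to call an external tool (weather, calculator)
# and folds the result into its final answer.
# Install: pip install ollama requests
# Requires: Ollama running locally → https://ollama.com
#           Pull the model first: ollama pull llama3.2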

import ollama
import requests
import json

MODEL = "llama3.2"

# ── Tool definitions ────────────────────────────────────────────────────────

def get_weather(city: str) -> str:
    """Fetch current weather for a city using the free Open-Meteo API."""
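    # Step 1: geocode the city name to latitude/longitude
    geo = requests.get(
        "https://geocoding-api.open-meteo.com/v1/search",
        params={"name": city, "count": 1},
        timeout=10,  # avoid hanging the agent loop on a slow network
    ).json()

    if not geo.get("results"):
        return f"Could not find location for '{city}'."

    lat = geo["results"][0]["latitude"]
    lon = geo["results"][0]["longitude"]

    # Step 2: fetch the current weather
    weather = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": lat,
            "longitude": lon,
            # requests would serialize Python's True as "True";
            # send the lowercase form explicitly
            "current_weather": "true",
        },
        timeout=10,
    ).json()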

    current = weather["current_weather"]
    return (
        f"Weather in {city}: {current['temperature']}°C, "
        f"wind speed {current['windspeed']} km/h."
    )

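def calculator(expression: str) -> str:
    """Evaluate a basic math expression (demo only, not safe for untrusted input)."""
    try:
        # WARNING: eval() with empty builtins is not a real sandbox. Attribute
        # access on literals can still reach dangerous objects, so do not pass
        # untrusted input through this in production.
        result = eval(expression, {"__builtins__": {}})
        return f"Result of '{expression}' = {result}"
    except Exception as e:
        return f"Error evaluating expression: {e}"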

# Tool registry: maps tool names to functions
TOOLS = {
    "get_weather": get_weather,
    "calculator": calculator,
}

# Tool descriptions sent to the model so it knows which tools are available
TOOL_DESCRIPTIONS = """
You have access to the following tools. Call a tool by responding with JSON in this exact format:
{"tool": "<tool_name>", "args": {"<arg>": "<value>"}}

Available tools:
1. get_weather(city: str) — Returns the current weather for a city.
2. calculator(expression: str) — Evaluates a math expression like "12 * 9".

If you don't need a tool, just answer directly in plain text.
Only call ONE tool per response.
"""

# ── Agent loop ────────────────────────────────────────────────────────────────

def run_tool_agent(user_query: str) -> str:
    print(f"🧠 Query: {user_query}\n")

    messages = [
        {"role": "system", "content": TOOL_DESCRIPTIONS},
        {"role": "user",   "content": user_query}
    ]

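    for step in range(5):  # at most 5 steps
        response = ollama.chat(model=MODEL, messages=messages)
        reply = response["message"]["content"].strip()

        # Try to parse a tool call from the model's reply
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            call = None  # plain text, not a tool call

        # json.loads can also return a number, string, or list (e.g. the reply "12"),
        # so only dicts are treated as tool calls
        tool_name = call.get("tool") if isinstance(call, dict) else None

        if isinstance(tool_name, str) and tool_name in TOOLS:
            args = call.get("args") or {}
            print(f"🔧 Calling tool: {tool_name}({args})")
            try:
                tool_result = TOOLS[tool_name](**args)
            except TypeError as e:  # wrong or missing arguments
                tool_result = f"Invalid arguments for {tool_name}: {e}"
            print(f"📦 Tool result: {tool_result}\n")

            # Feed the result back to the model
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user",      "content": f"Tool result: {tool_result}\nCall another tool if you still need one; otherwise answer the original question."})
            continue

        # No tool call: this is the final answer
        print(f"✅ Final Answer:\n{reply}")
        return reply

    return "Agent did not reach a conclusion."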

if __name__ == "__main__":
    run_tool_agent("What is the weather in Tokyo? Also what is 144 divided by 12?")
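The `calculator` tool above relies on `eval`, and an empty `__builtins__` does not make that a safe sandbox. A safer alternative is to walk the parsed syntax tree and allow only arithmetic nodes, as in the sketch below. `safe_eval` is a hypothetical helper, not part of the original example; it deliberately leaves out exponentiation so that inputs like `9**9**9` cannot tie up the process.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelisted operators; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_eval(expression: str) -> Number:
    """Evaluate +, -, *, /, //, % on numeric literals; raise ValueError otherwise."""
    def _eval(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_eval("144 / 12"))
print(safe_eval("12 * 9"))
```

Malformed input still raises `SyntaxError` from `ast.parse`, and division by zero raises `ZeroDivisionError`, so a tool wrapper would want to catch those and return an error string to the model.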

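The applicable scenarios are broad:

Research agents that browse and synthesize information from the web, APIs, PDFs, and other sources.
Personal-assistant agents that manage to-do lists, send email, or place orders in specific apps.
Finance agents that pull stock prices and market data in real time.
News-aggregation agents that fetch and consolidate the latest stories from multiple channels.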

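Pattern 3: Planning (thinking it through before acting)

Suppose someone is asked to build an application and starts writing code right away, with no upfront planning. Few people would be comfortable with that approach.

Good engineers first break the problem down, list an executable sequence of steps, and then work through them. An agent needs the same ability, and the Planning pattern gives it this "think first, act second" discipline.

Planning has two mainstream variants.

The first is ReAct (Reasoning + Acting). This is one of the most widely used approaches to planning. The agent cycles through three stages, Thought, Action, and Observation: it reasons about what to do next, performs an action (calling a tool or completing a step), observes the result, and then uses that result in the next round of reasoning, repeating until the task is done.

Figure: Flowchart of the ReAct-based planning pattern

The second is Plan-and-Execute. Unlike ReAct's plan-as-you-go style, Plan-and-Execute has the agent generate a complete, linear plan up front and then carry out the steps strictly in order.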

Phase 1 (Plan):
"To complete this task, I will take the following 5 steps: ..."

Phase 2 (Execute):
Step 1 → [execute] → result
Step 2 → [execute] → result
...
Step 5 → final output
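The two phases above could be sketched as follows. This is a minimal illustration under stated assumptions: `make_plan` and `execute_plan` are hypothetical names, the `llm` parameter is any function that maps a prompt to text (for example, a thin wrapper around a chat call), and the stub used here returns canned output so the flow can be followed without a model.

```python
import re
from typing import Callable, List

LLM = Callable[[str], str]

def make_plan(task: str, llm: LLM) -> List[str]:
    """Phase 1: ask for a numbered plan and parse it into a list of steps."""
    raw = llm(f"Break this task into a short numbered list of steps, one per line.\nTask: {task}")
    steps = []
    for line in raw.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)  # accepts "1. step" or "1) step"
        if match:
            steps.append(match.group(1).strip())
    return steps

def execute_plan(task: str, steps: List[str], llm: LLM) -> List[str]:
    """Phase 2: run the steps strictly in order, passing earlier results along."""
    results: List[str] = []
    for i, step in enumerate(steps, 1):
        context = "\n".join(f"Step {n}: {r}" for n, r in enumerate(results, 1)) or "(none yet)"
        prompt = (
            f"Overall task: {task}\n"
            f"Results so far:\n{context}\n"
            f"Now carry out step {i}: {step}"
        )
        results.append(llm(prompt))
    return results


# Stub model: returns a fixed plan, then echoes whichever step it was given.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("Break this task"):
        return "1. Collect facts\n2) Draft summary\n(this line is not a step)"
    return "done: " + prompt.rsplit(": ", 1)[-1]

plan = make_plan("Write a report", fake_llm)
results = execute_plan("Write a report", plan, fake_llm)
print(plan)
print(results[-1])
```

The trade-off against ReAct is visible in the structure: the plan is fixed before any step runs, so execution is predictable, but nothing here replans if a step's result invalidates the later steps.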

Code example (ReAct):

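# Pattern 3: Planning (ReAct: Reasoning + Acting)
# The agent reasons about what to do next, acts (uses a tool),
# observes the result, then reasons again, until the task is done.
# Install: pip install ollama
# Requires: Ollama running locally → https://ollama.com
#           Pull the model first: ollama pull llama3.2

import ollama
import re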

MODEL = "llama3.2"

# ── Mock tools ───────────────────────────────────────────────────────────────

def search_web(query: str) -> str:
    """Simulated web search (replace with real search API in production)."""
    mock_db = {
        "transformer architecture":
            "Transformers use self-attention mechanisms to process sequences in parallel.",
        "who invented transformers":
            "The Transformer architecture was introduced by Vaswani et al. in the paper "
            "'Attention Is All You Need' (2017) at Google Brain.",
        "applications of transformers":
            "Transformers are used in NLP (BERT, GPT), computer vision (ViT), "
            "speech recognition, and drug discovery.",
    }
    for key, value in mock_db.items():
        if key in query.lower():
            return value
    return f"No results found for '{query}'."

def summarize(text: str) -> str:
    """Use the LLM itself to summarize a piece of text."""
    response = ollama.chat(
        model=MODEL,
        messages=[{"role": "user", "content": f"Summarize this in one sentence:\n{text}"}]
    )
    return response["message"]["content"].strip()

TOOLS = {
    "search_web": search_web,
    "summarize":  summarize,
}

# ── ReAct system prompt ────────────────────────────────────────────────────

REACT_PROMPT = """You are a reasoning agent. Solve tasks step by step using this loop:

Thought: Think about what to do next.
Action: <tool_name>(<argument>)
Observation: [result of the action]
... repeat as needed ...
Final Answer: <your complete answer>

Available tools:
- search_web(query) — Search for information on a topic.
- summarize(text)   — Summarize a block of text into one sentence.

Always start with a Thought. End with Final Answer when done.
"""

# ── ReAct agent loop ──────────────────────────────────────────────────────────

def parse_action(text: str):
    """Extract tool name and argument from 'Action: tool_name(argument)'"""
    match = re.search(r"Action:\s*(\w+)\((.+?)\)", text, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip().strip('"').strip("'")
    return None, None

def run_react_agent(task: str) -> str:
    print(f"🎯 Task: {task}\n")

    messages = [
        {"role": "system", "content": REACT_PROMPT},
        {"role": "user",   "content": f"Task: {task}"}
    ]

    for step in range(8):  # at most 8 ReAct steps
        response = ollama.chat(model=MODEL, messages=messages)
        reply    = response["message"]["content"].strip()

        # Parse the requested action first
        tool_name, argument = parse_action(reply)
        has_action = tool_name in TOOLS
        if has_action:
            # Models often keep writing past the Action line and invent their own
            # Observation or Final Answer; drop that text and run the tool instead
            reply = reply.split("Observation:")[0].strip()

        print(f"[Step {step + 1}]\n{reply}\n")

        if has_action:
            observation = TOOLS[tool_name](argument)
            print(f"🔭 Observation: {observation}\n")

            # Append the exchange to the message history
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user",      "content": f"Observation: {observation}"})
            continue

        # Check whether the agent has reached a final answer
        if "Final Answer:" in reply:
            final = reply.split("Final Answer:")[-1].strip()
            print(f"✅ Final Answer:\n{final}")
            return final

        # No valid action found: ask the agent to continue
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user",      "content": "Continue your reasoning."})

    return "Agent reached maximum steps without a final answer."

if __name__ == "__main__":
    run_react_agent(
        "Who invented the Transformer architecture and what are its main applications?"
    )
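The `parse_action` helper in the example uses a non-greedy match, so an argument that itself contains a closing parenthesis gets cut short. A slightly more tolerant variant is sketched below as one possible alternative: it matches up to the last `)` on the Action line and ignores anything after a model-written "Observation:". The one-action-per-line format is an assumption carried over from the prompt above.

```python
import re
from typing import Optional, Tuple

# "." does not cross newlines here, so the greedy group stops at the
# last ")" on the Action line itself.
ACTION_RE = re.compile(r"^\s*Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)

def parse_action(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (tool_name, argument) from a ReAct reply, or (None, None)."""
    # Ignore anything the model wrote after its own "Observation:" line
    text = text.split("Observation:", 1)[0]
    match = ACTION_RE.search(text)
    if not match:
        return None, None
    name, arg = match.group(1), match.group(2).strip()
    # Strip one pair of matching surrounding quotes, if present
    if len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in "\"'":
        arg = arg[1:-1]
    return name, arg


reply = (
    "Thought: I need a source.\n"
    'Action: search_web("transformers (ML)")\n'
    "Observation: (invented by the model)"
)
print(parse_action(reply))
```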

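The Planning pattern is best suited to the following kinds of tasks:

Research tasks with several interdependent steps, such as market research or an academic literature review.
Software development workflows where architecture design has to come before coding.
Business process automation where key decision points call for repeated reasoning and correction.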

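Pattern 4: Multi-Agent Collaboration (division of labor, specialization, and synergy)

In practice, nobody is an expert in every field. Teams are powerful because each member has a specialty, and by working together they produce results that no individual could deliver alone.

AI agents follow the same logic. Several agents with different specialties each handle their own part and cooperate on complex tasks that a single agent would struggle with. The pattern has the following main architectural variants.

The first: orchestrator + sub-agents. A central orchestrator receives the original task, decomposes it, assigns the sub-tasks to specialized sub-agents that work in parallel, and finally collects and merges their results.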

Figure: Flowchart of multi-stage multi-agent collaboration
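When the sub-tasks are independent, the orchestrator can fan them out concurrently and merge the results once all have returned. The sketch below shows one way to do that with a thread pool. `orchestrate`, the role names, and the stub workers are hypothetical; real workers would be model calls, which are I/O-bound and therefore a reasonable fit for threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Worker = Callable[[str], str]

def orchestrate(
    subtasks: Dict[str, str],             # role -> sub-task text
    workers: Dict[str, Worker],           # role -> agent function
    merge: Callable[[Dict[str, str]], str],
) -> str:
    """Run each sub-task on its worker concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        # Fan out: submit every sub-task before waiting on any of them
        futures = {role: pool.submit(workers[role], text) for role, text in subtasks.items()}
        # Fan in: results are collected in the original role order
        results = {role: future.result() for role, future in futures.items()}
    return merge(results)


# Stub workers standing in for specialized agents.
workers = {
    "market": lambda text: f"[market] {text.upper()}",
    "tech":   lambda text: f"[tech] {text[::-1]}",
}
subtasks = {"market": "size", "tech": "abc"}

report = orchestrate(
    subtasks, workers,
    merge=lambda results: " | ".join(f"{role}={out}" for role, out in results.items()),
)
print(report)
```

If one sub-task depends on another's output, those two cannot be parallelized; they belong in sequence, with only the independent branches fanned out.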

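The second: peer agents (collaboration or debate). Several agents of comparable capability work on the same task independently, then compare and debate their conclusions, moving toward a better answer through the exchange. This resembles academic discussion among researchers and is especially valuable where accuracy and reliability requirements are very high, such as legal analysis or medical diagnosis.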

Figure: Flowchart of multi-agent debate
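A very reduced form of the debate idea is sketched below: each agent answers independently, then sees its peers' answers and may revise, and the final answer is chosen by majority vote. This is only one possible reading of "debate". The `Agent` signature, the `debate` function, and the stub agents are assumptions for illustration, and exact-match voting only makes sense when answers are short and comparable.

```python
from collections import Counter
from typing import Callable, List, Sequence

# An agent maps (question, peers' current answers) to its own answer.
Agent = Callable[[str, Sequence[str]], str]

def debate(question: str, agents: List[Agent], rounds: int = 1) -> str:
    """Independent answers, `rounds` of peer-informed revision, then majority vote."""
    if not agents:
        raise ValueError("at least one agent is required")
    # Round 0: everyone answers without seeing anyone else
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent sees the others' previous answers, not its own
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    winner, _votes = Counter(answers).most_common(1)[0]
    return winner


# Stub agents: two are confident, one defers to the majority of its peers.
def confident(question: str, peers: Sequence[str]) -> str:
    return "42"

def follower(question: str, peers: Sequence[str]) -> str:
    if not peers:
        return "41"  # initial, independent guess
    return Counter(peers).most_common(1)[0][0]

print(debate("What is 6 * 7?", [confident, confident, follower]))
```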

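The third: sequential pipeline. Think of it as a specialized assembly-line team that hands the same task along in sequence. Agent A's output goes to Agent B, Agent B's processed result goes to Agent C, and so on until the final output is produced.

[Data collector] → [Analyst] → [Fact checker] → [Writer] → final report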
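Structurally, the pipeline above is function composition: each stage takes the previous stage's output as its input. A minimal sketch, with stub stages standing in for the four agents (the stage functions and `run_pipeline` are hypothetical names):

```python
from functools import reduce
from typing import Callable, List

Stage = Callable[[str], str]

def run_pipeline(stages: List[Stage], initial: str) -> str:
    """Feed `initial` through each stage in order and return the last output."""
    return reduce(lambda data, stage: stage(data), stages, initial)


# Stub stages; in practice each would be an agent with its own system prompt.
def data_collector(topic: str) -> str:
    return f"facts({topic})"

def analyst(facts: str) -> str:
    return f"insights({facts})"

def fact_checker(insights: str) -> str:
    return f"verified({insights})"

def writer(verified: str) -> str:
    return f"report({verified})"

final_report = run_pipeline([data_collector, analyst, fact_checker, writer], "agentic AI")
print(final_report)
```

Because every stage sees only its predecessor's output, an error introduced early propagates downstream, which is one reason a fact-checking stage sits before the writer.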

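Code example (orchestrator + sub-agents; the three sub-agents run in sequence here because each one depends on the previous one's output):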

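# Pattern 4: Multi-Agent Collaboration
# An orchestrator splits the job into sub-tasks and delegates them to
# specialized agents (researcher, analyst, writer).
# Each agent has a focused role and its own system prompt.
# Install: pip install ollama
# Requires: Ollama running locally → https://ollama.com
#           Pull the model first: ollama pull llama3.2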

import ollama

MODEL = "llama3.2"

# ── Base agent ────────────────────────────────────────────────────────────────

def run_agent(system_prompt: str, user_message: str) -> str:
    """Run a single agent with a given role and task."""
    response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user",   "content": user_message}
        ]
    )
    return response["message"]["content"].strip()

# ── Specialized agents ──────────────────────────────────────────────────

def researcher_agent(topic: str) -> str:
    """Gathers key facts and background on a topic."""
    system = (
        "You are a research specialist. When given a topic, provide 4–5 key facts, "
        "background context, and important figures or milestones. Be concise and factual."
    )
    print("🔍 Researcher Agent working...")
    result = run_agent(system, f"Research this topic: {topic}")
    print(f"   Done.\n")
    return result

def analyst_agent(research: str) -> str:
    """Identifies trends, insights, and implications from research."""
    system = (
        "You are a data analyst and critical thinker. Given research notes, "
        "extract 3 key insights or trends and explain their implications. "
        "Be analytical and structured."
    )
    print("📊 Analyst Agent working...")
    result = run_agent(system, f"Analyze these research notes:\n{research}")
    print(f"   Done.\n")
    return result

def writer_agent(topic: str, research: str, analysis: str) -> str:
    """Writes a polished short article from research and analysis."""
    system = (
        "You are a professional tech writer. Given a topic, research notes, and analysis, "
        "write a clear, engaging 150-word summary suitable for a technical blog. "
        "Use simple language and a logical structure."
    )
    print("✍️  Writer Agent working...")
    result = run_agent(
        system,
        f"Topic: {topic}\n\nResearch:\n{research}\n\nAnalysis:\n{analysis}\n\nWrite the blog summary."
    )
    print(f"   Done.\n")
    return result

# ── Orchestrator ─────────────────────────────────────────────────────────

def orchestrator(topic: str) -> str:
    """
    Central orchestrator that:
    1. Delegates research to the Researcher Agent
    2. Sends research to the Analyst Agent
    3. Passes both outputs to the Writer Agent
    4. Returns the final polished output
    """
    print(f"🎬 Orchestrator: Starting pipeline for topic → '{topic}'\n")
    print("=" * 55)

    # Step 1: research
    research = researcher_agent(topic)

    # Step 2: analysis
    analysis = analyst_agent(research)

    # Step 3: writing
    final_article = writer_agent(topic, research, analysis)

    print("=" * 55)
    print(f"\n✅ Final Output:\n\n{final_article}")
    return final_article

if __name__ == "__main__":
    orchestrator("The rise of Agentic AI systems in 2024")

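Multi-agent collaboration applies to a wide range of scenarios:

Software development: a virtual team of architect, coder, code reviewer, and tester.
Financial analysis: agents for data collection, quantitative analysis, risk assessment, and report writing working together.
Content creation: research, writing, editing, and SEO agents forming a production pipeline.
Customer support: a coordinated response system of agents for triage, specialist knowledge handling, and escalation of complex issues.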

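How do the patterns combine?

The four core patterns are by no means mutually exclusive. In real production systems, the most mature and capable designs usually layer and combine them.

Take an AI-driven competitive intelligence platform as an example. Its architecture would likely combine all four patterns: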

Figure: Example table of combined pattern usage

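Tool Use: agents collect information from external sources such as the web, APIs, and PDFs.
Planning: an agent draws up a plan, deciding which data sources to check and in what order.
Multi-Agent: research, analysis, and writing agents work in parallel.
Reflection: a dedicated reviewer agent checks the final report for accuracy.

Each pattern covers a specific kind of weakness: Reflection improves output quality, Tool Use extends what the agent can interact with, Planning handles multi-step complexity, and Multi-Agent addresses scale and specialization. Stack these four layers and a plain LLM becomes an intelligent system that can run autonomously in production and keep improving.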

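Summary

The era of single-turn prompts and simple Q&A is passing. Taking its place are agents that can think, plan, act, check their own work, and collaborate, much as a well-coordinated human engineering team does.

Reflection, Tool Use, Planning, and Multi-Agent Collaboration are the building blocks of this shift in AI capability. Understanding them matters well beyond theory: it determines whether what you build is a fragile, demo-only prototype or a robust system that keeps delivering value in real, complex production environments. Developers interested in AI agent architecture are welcome to share more hands-on experience in the 云栈社区 community.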


Tags: Agents, LLM, Python, Multi-Agent Collaboration, Automation