building multi-agent systems with langchain

single agents hit walls fast. ask a general-purpose llm a question that spans a pdf, a live api, and a database at once, and it either hallucinates or refuses. the fix isn’t a smarter model. it’s decomposition.

this is what led me to multi-agent architecture on the smartle project: a system where each agent owns a specific capability and a coordinator routes tasks to the right one.

the shift from single to multi-agent

a single-agent loop looks like:

user → llm + tools → response

a multi-agent system looks more like:

user → coordinator → [doc agent | api agent | db agent | web agent] → synthesiser → response

the coordinator doesn’t answer questions. it classifies intent, selects the right specialist, passes context, and merges results. each specialist is a focused agent with a smaller tool surface and a tighter prompt.
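to make the dispatch-and-merge shape concrete, here is a minimal sketch in plain python. it is not smartle's code: `Coordinator`, `doc_agent`, `db_agent` and the lambda classifier are names invented for illustration, and the specialists are stubs standing in for real llm-backed agents.

```python
from dataclasses import dataclass
from typing import Callable

# a specialist takes (query, context) and returns its partial answer
Specialist = Callable[[str, dict], str]


@dataclass
class Coordinator:
    specialists: dict[str, Specialist]
    classify: Callable[[str], list[str]]              # query -> agent names to consult
    synthesise: Callable[[str, dict[str, str]], str]  # merges the partial answers

    def handle(self, query: str, context: dict) -> str:
        chosen = self.classify(query)
        # the coordinator never answers itself; it only dispatches and merges
        partials = {name: self.specialists[name](query, context) for name in chosen}
        return self.synthesise(query, partials)


# stand-in specialists; real ones would wrap an llm plus a small tool set
def doc_agent(query: str, context: dict) -> str:
    return f"doc: looked up '{query}'"


def db_agent(query: str, context: dict) -> str:
    return f"db: queried for '{query}'"


coordinator = Coordinator(
    specialists={"doc": doc_agent, "db": db_agent},
    classify=lambda q: ["db"] if "table" in q else ["doc"],
    synthesise=lambda q, partials: " | ".join(partials.values()),
)

answer = coordinator.handle("which table holds orders?", context={})
```

because classification, the specialists and synthesis are separate callables, each can be swapped or tested without touching the others.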

task routing

the routing logic is the most important piece. i experimented with two approaches:

llm-based routing — pass the query to a lightweight model and ask it to classify which agent should handle it. flexible, but adds latency and can misclassify ambiguous queries.

rule-based routing — keywords and patterns mapped to agents. fast and predictable, but brittle for edge cases.

in production i ended up with a hybrid: a rule-based first pass, then an llm fallback for anything the rules don’t match. with that setup, 90% of queries hit the fast path.
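a rough sketch of that hybrid, under some assumptions: the keyword patterns are made up, `fake_llm_classify` stands in for the lightweight model, and sending queries that match more than one rule to the llm is one possible policy, not necessarily the one smartle uses.

```python
import re
from typing import Callable, Optional

# hypothetical keyword rules; the real ones depend on your query mix
RULES: list[tuple[re.Pattern, str]] = [
    (re.compile(r"\b(pdf|document|report|page)\b", re.I), "doc"),
    (re.compile(r"\b(sql|table|rows?|database)\b", re.I), "db"),
    (re.compile(r"\b(endpoint|api|status code)\b", re.I), "api"),
]


def route_by_rules(query: str) -> Optional[str]:
    """return an agent name only when exactly one agent's rules match."""
    matches = {agent for pattern, agent in RULES if pattern.search(query)}
    return matches.pop() if len(matches) == 1 else None


def route(query: str, llm_classify: Callable[[str], str]) -> tuple[str, str]:
    """rule-based fast path first, llm fallback for everything else."""
    agent = route_by_rules(query)
    if agent is not None:
        return agent, "rules"
    return llm_classify(query), "llm"


# stand-in for the lightweight classifier model
def fake_llm_classify(query: str) -> str:
    return "web"
```

returning the path taken (`"rules"` or `"llm"`) alongside the agent name makes it cheap to measure how many queries actually hit the fast path.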

memory architecture

agents need memory at two levels:

working memory — what’s happening in the current session. i keep this in a ConversationBufferWindowMemory with a k of 6 turns. beyond that you’re paying for tokens that don’t meaningfully affect answers.
long-term memory — facts the system should recall across sessions. i store these as embeddings in chromadb and retrieve them at the start of each new conversation via a similarity search.
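the windowing idea behind working memory can be sketched without langchain. this is an illustration of the behaviour only, not how ConversationBufferWindowMemory is implemented; `WindowMemory` is a made-up name.

```python
from collections import deque


class WindowMemory:
    """keeps only the last k (user, assistant) exchanges."""

    def __init__(self, k: int = 6) -> None:
        self.turns: deque[tuple[str, str]] = deque(maxlen=k)

    def add(self, user: str, assistant: str) -> None:
        # once the deque is full, the oldest exchange drops off automatically
        self.turns.append((user, assistant))

    def as_prompt(self) -> str:
        return "\n".join(f"user: {u}\nassistant: {a}" for u, a in self.turns)


memory = WindowMemory(k=6)
for i in range(10):
    memory.add(f"question {i}", f"answer {i}")
```

after ten exchanges only exchanges 4 through 9 remain, so prompt size stays bounded however long the session runs.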

the key insight: don’t give every agent access to all memory. the doc agent only needs document context. the db agent only needs schema and previous query results. scoping memory per agent cuts token usage and reduces hallucination.
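one simple way to express per-agent scoping is an allow-list of memory keys. the sketch below assumes memory is a flat dict; the key names and `AGENT_SCOPES` are hypothetical.

```python
# hypothetical scope map: which memory keys each agent is allowed to see
AGENT_SCOPES: dict[str, set[str]] = {
    "doc": {"document_context"},
    "db": {"schema", "previous_queries"},
}


def scoped_view(memory: dict[str, str], agent: str) -> dict[str, str]:
    """return only the memory entries this agent is allowed to read."""
    allowed = AGENT_SCOPES.get(agent, set())
    return {key: value for key, value in memory.items() if key in allowed}


shared_memory = {
    "document_context": "chunk 12 of the onboarding guide",
    "schema": "orders(id, customer_id, total)",
    "previous_queries": "SELECT count(*) FROM orders",
    "api_tokens_seen": "irrelevant to both agents",
}

doc_memory = scoped_view(shared_memory, "doc")
db_memory = scoped_view(shared_memory, "db")
```

an agent with no entry in the scope map gets an empty view, which is a safer default than handing it everything.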

tool design

each tool should do exactly one thing and return structured output. if a tool returns unstructured text, the llm has to parse it — and parsing errors cascade.

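from langchain_core.tools import BaseTool

class SearchDocumentsTool(BaseTool):
    # BaseTool is a pydantic model, so overridden fields need type annotations
    name: str = "search_documents"
    description: str = "search indexed documents for relevant chunks given a query"

    def _run(self, query: str) -> list[dict]:
        # vectorstore is assumed to be initialised elsewhere in the module
        results = vectorstore.similarity_search(query, k=4)
        # one dict per chunk, so downstream code can cite the source
        return [{"source": r.metadata["source"], "text": r.page_content} for r in results]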

returning a list of dicts instead of a string means the coordinator can reason about sources explicitly.
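as a small example of what structured output buys you, the chunks can be grouped by source before synthesis. the sample results are made up and `group_by_source` is an illustrative helper, not part of langchain.

```python
from collections import defaultdict


def group_by_source(chunks: list[dict]) -> dict[str, list[str]]:
    """collect chunk texts under the document they came from."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for chunk in chunks:
        grouped[chunk["source"]].append(chunk["text"])
    return dict(grouped)


# made-up results in the same shape the tool returns
results = [
    {"source": "handbook.pdf", "text": "refunds take 5 days"},
    {"source": "faq.md", "text": "refunds are automatic"},
    {"source": "handbook.pdf", "text": "contact support for exceptions"},
]
by_source = group_by_source(results)
```

doing the same with a free-text tool result would mean asking the llm to recover the source boundaries first.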

what breaks in production

context bleed — agents sometimes pass too much context forward, causing the synthesiser to over-weight irrelevant information. solution: summarise agent outputs before passing them downstream.
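a sketch of that compaction step, assuming a character budget as a crude proxy for tokens. `compact_outputs` is a hypothetical helper, and `first_sentence` is a toy stand-in for what would normally be a cheap llm summarisation call.

```python
from typing import Callable


def compact_outputs(
    outputs: dict[str, str],
    summarise: Callable[[str], str],
    max_chars: int = 500,
) -> dict[str, str]:
    """summarise any agent output longer than max_chars; pass short ones through."""
    return {
        agent: text if len(text) <= max_chars else summarise(text)
        for agent, text in outputs.items()
    }


# stand-in summariser; in practice this would be a cheap llm call
def first_sentence(text: str) -> str:
    return text.split(". ")[0].rstrip(".") + "."


raw = {
    "db": "42 orders matched.",
    "doc": "the policy allows refunds. " + "background detail. " * 50,
}
compacted = compact_outputs(raw, summarise=first_sentence, max_chars=100)
```

short outputs pass through untouched, so the extra summarisation cost is only paid where the bleed would otherwise happen.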

tool call loops — an agent can get stuck calling the same tool repeatedly when it doesn’t get the answer it expects. add a max-iteration guard and fail gracefully.
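the guard itself is only a few lines. in this sketch `step` is a hypothetical callable that runs one think-and-call-tool cycle and returns an answer or `None`; the fallback message is a placeholder.

```python
from typing import Callable, Optional


class ToolLoopError(RuntimeError):
    """raised when an agent exhausts its step budget without an answer."""


def run_with_guard(
    step: Callable[[int], Optional[str]],
    max_iterations: int = 5,
) -> str:
    """call step() until it returns an answer or the budget runs out."""
    for i in range(max_iterations):
        answer = step(i)
        if answer is not None:
            return answer
    raise ToolLoopError(f"no answer after {max_iterations} iterations")


def answer_or_fallback(
    step: Callable[[int], Optional[str]],
    max_iterations: int = 5,
) -> str:
    try:
        return run_with_guard(step, max_iterations)
    except ToolLoopError:
        # fail gracefully instead of looping (and billing) forever
        return "i couldn't find a reliable answer to that."
```

if you run agents through langchain's AgentExecutor, its `max_iterations` parameter covers the same ground; the hand-rolled version is mainly useful for custom loops.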

cost — multi-agent systems can burn through tokens quickly. instrument every llm call with token counts from day one. i added prometheus metrics and was surprised how much a single complex query cost.
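a stdlib-only sketch of per-agent token accounting. `TokenLedger` is an invented name and the token counts are made-up numbers; in a real setup `record` is also where a prometheus counter would be incremented.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TokenLedger:
    """accumulates token usage per agent so cost is visible per query."""
    prompt: dict[str, int] = field(default_factory=lambda: defaultdict(int))
    completion: dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def record(self, agent: str, prompt_tokens: int, completion_tokens: int) -> None:
        # this is also the place to increment a metrics counter
        self.prompt[agent] += prompt_tokens
        self.completion[agent] += completion_tokens

    def total(self) -> int:
        return sum(self.prompt.values()) + sum(self.completion.values())


# one illustrative query that touches three agents (numbers are invented)
ledger = TokenLedger()
ledger.record("coordinator", prompt_tokens=350, completion_tokens=20)
ledger.record("doc", prompt_tokens=2400, completion_tokens=310)
ledger.record("doc", prompt_tokens=2600, completion_tokens=280)
ledger.record("synthesiser", prompt_tokens=900, completion_tokens=450)
```

breaking usage down by agent shows which specialist dominates the bill, which a single per-query total hides.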

google adk — a different take on agents

while building the stitch-mcp project, i used google’s agent development kit (adk) as an alternative approach to multi-agent systems, specifically for gemini-native workflows. where langchain is flexible and model-agnostic, adk is opinionated and optimised for gemini’s multimodal and function-calling capabilities.

the core adk abstraction is clean:

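from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner
from google.adk.tools import FunctionTool
from google.genai import types

def get_project_list() -> list[dict]:
    """list all available stitch projects"""
    return stitch_client.list_projects()  # stitch_client is the project's own client

agent = Agent(
    name="stitch_agent",  # adk agents need a name; this one is a placeholder
    model="gemini-2.0-flash",
    tools=[FunctionTool(get_project_list)],
    instruction="you are a UI generation assistant with access to google stitch.",
)

# adk drives agents through a runner and a session rather than a raw string call;
# run this inside an async function or a notebook. session details vary by adk version.
runner = InMemoryRunner(agent=agent, app_name="stitch")
session = await runner.session_service.create_session(app_name="stitch", user_id="user")
message = types.Content(role="user", parts=[types.Part(text="create a new project called 'dashboard' and generate a login screen")])
async for event in runner.run_async(user_id="user", session_id=session.id, new_message=message):
    if event.is_final_response():
        response = event.content.parts[0].text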

the major advantage: adk handles gemini’s function calling format natively. you define python functions, adk generates the tool schema, and gemini calls them — no manual schema mapping. the tradeoff is that you’re locked into the gemini ecosystem.
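to show what "schema from a python signature" means in practice, here is a simplified sketch using only `inspect`. it is not adk's actual implementation, the type mapping is deliberately tiny, and `describe_tool` and `create_project` are hypothetical names.

```python
import inspect
from typing import get_type_hints

# simplified mapping from python types to json-schema type names
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func) -> dict:
    """build a minimal function-calling schema from a signature and docstring."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)
    properties = {
        name: {"type": JSON_TYPES.get(hints.get(name), "string")}
        for name in signature.parameters
    }
    # parameters without a default value are treated as required
    required = [
        name for name, param in signature.parameters.items()
        if param.default is inspect.Parameter.empty
    ]
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def create_project(title: str, screens: int = 1) -> dict:
    """create a project with the given title"""
    return {"title": title, "screens": screens}


schema = describe_tool(create_project)
```

the practical upshot is that type hints and docstrings become part of the prompt, so vague names or missing annotations directly degrade tool selection.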

adk vs langchain for multi-agent work:

feature          langchain            google adk
model support    any model            gemini-first
tool schema      manual or pydantic   auto from python signatures
multi-agent      langgraph required   native agent teams (preview)
streaming        manual               built-in
mcp integration  via custom tools     via mcp adapter

for the stitch-mcp project, i used adk’s gemini integration to drive the mcp tool calls directly — the agent sees the mcp tool schema at runtime and chains tool calls (generate screen → apply design system → export) without explicit orchestration code.

where i’d take it next

declarative agent graphs (like langgraph) make the routing logic explicit and testable. i’m moving smartle to a graph-based architecture where each node is an agent and edges are conditional — much easier to debug than chained llm calls.
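the idea of nodes plus conditional edges fits in a few lines of plain python. this sketch is not langgraph's api: `AgentGraph`, the node names and the lambda agents are all invented to show the shape.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]
# an edge function looks at the state and names the next node (or "end")
Edge = Callable[[State], str]


class AgentGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: dict[str, Edge] = {}

    def add_node(self, name: str, node: Node, next_node: Edge) -> None:
        self.nodes[name] = node
        self.edges[name] = next_node

    def run(self, start: str, state: State, max_steps: int = 10) -> State:
        current = start
        for _ in range(max_steps):
            state = self.nodes[current](state)
            state.setdefault("trace", []).append(current)  # record the path taken
            current = self.edges[current](state)
            if current == "end":
                return state
        raise RuntimeError("graph did not reach 'end' within max_steps")


graph = AgentGraph()
graph.add_node(
    "router",
    lambda s: {**s, "target": "db" if "table" in s["query"] else "doc"},
    lambda s: s["target"],  # conditional edge: depends on what the router decided
)
graph.add_node("db", lambda s: {**s, "partial": "db result"}, lambda s: "synth")
graph.add_node("doc", lambda s: {**s, "partial": "doc result"}, lambda s: "synth")
graph.add_node("synth", lambda s: {**s, "answer": f"final: {s['partial']}"}, lambda s: "end")

final = graph.run("router", {"query": "which table holds orders?"})
```

the recorded trace is what makes this easier to debug: a wrong answer can be traced to a specific node or edge decision, and each edge function can be unit-tested on its own.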

this is based on what i built for smartle. the patterns here apply broadly but your mileage will vary depending on your data sources and latency requirements.