Comparison, by Mahmoud Zalt
Multi-step chains beat single LLM calls on reliability and structure for complex tasks. Single calls win on latency and cost. Here is how to choose.
A single LLM call is one prompt, one response. You hand the model the whole task and trust it to figure the rest out in a single generation. A multi-step chain breaks that same task into a sequence (or graph) of smaller calls, each with its own prompt, often with tool calls, retrieval, or validators between them. Frameworks like LangChain, CrewAI, LangGraph, Haystack, and n8n popularized this pattern, and tools like Lindy bake it into a visual canvas. The real difference is not the number of calls. It is whether intermediate state is inspectable and recoverable. In a single call, the model holds everything in its head and you get one shot at the answer. In a chain, every step writes structured output that the next step reads, so you can log it, branch on it, retry it, and replay it. That changes the failure mode from silent hallucination to a step that visibly broke at step 3.
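The inspectable-state idea can be sketched in a few lines of plain Python. This is a minimal illustration, not any framework's API: `Chain`, `StepFailed`, and the three step functions are hypothetical names, and the steps are ordinary functions standing in for model calls. Each step reads a shared state dictionary and returns new keys, every output is appended to a trace, and a failure names the step that broke.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

State = dict[str, Any]
Step = Callable[[State], State]  # reads the state, returns the keys it adds


class StepFailed(Exception):
    """Carries the number and name of the step that broke."""


@dataclass
class Chain:
    steps: list[tuple[str, Step]]
    trace: list[dict[str, Any]] = field(default_factory=list)

    def run(self, state: State, start: int = 0) -> State:
        """Run steps in order; `start` skips steps whose output is already in `state`."""
        state = dict(state)  # do not mutate the caller's dictionary
        for number, (name, step) in enumerate(self.steps[start:], start=start + 1):
            try:
                output = step(state)
            except Exception as exc:
                self.trace.append({"step": number, "name": name, "error": repr(exc)})
                raise StepFailed(f"step {number} ({name}) failed: {exc}") from exc
            state.update(output)
            self.trace.append({"step": number, "name": name, "output": output})
        return state


# Hypothetical steps; in a real chain each would wrap one model or tool call.
def extract_facts(state: State) -> State:
    return {"facts": state["text"].split(". ")}


def count_facts(state: State) -> State:
    return {"n_facts": len(state["facts"])}


def pick_summary(state: State) -> State:
    return {"summary": state["facts"][0]}


if __name__ == "__main__":
    chain = Chain([("extract", extract_facts), ("count", count_facts), ("summarize", pick_summary)])
    print(chain.run({"text": "A chain logs each step. It can be replayed"}))
    for entry in chain.trace:
        print(entry)
```

Because the trace holds every intermediate output, a failed run can be resumed by passing the saved state back in with `start` set to the number of steps already completed, rather than paying for the earlier calls again.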
Single calls win more often than the agent crowd admits. If a task fits inside one prompt, has no branching logic, needs no external data fetch, and the model is strong enough to one-shot it, then adding a chain is pure latency tax and complexity. Classification, short rewriting, tone-shifting, summarization of a single document, simple SQL generation from a known schema: all of these usually do better as one call with a tight system prompt. Modern frontier models (Claude, GPT, Gemini, Kimi) are good enough that a five-step chain over a task they could finish in one call adds several seconds of latency, multiplies the cost at least threefold, and opens a fresh surface area for errors at every hop. The honest rule: if you cannot articulate which step would fail and why, you do not need a chain. Reach for one when the task has real branching, tool use, or quality gates, not because frameworks make it look professional.
Single-document summarization: the whole input fits in context, no fetching is needed, and the model picks the salient parts in one shot.
Tone-shifting and short rewriting: input plus style instruction go in one prompt. Chains add latency without changing the output quality.
Classification: bounded label set, structured output, and deterministic enough that retry logic is the wrong fix.
Simple SQL generation: known schema, short query. One model strong enough to one-shot it beats a planner plus executor.
Turn-by-turn chat: the next turn depends on the previous turn, not on external data or branching logic.
Chains earn their cost when the task has at least one of four traits: branching that depends on prior output, external data fetch that cannot fit in the original prompt, a quality gate that must reject the answer and retry, or a long horizon where context simply will not survive a single pass. Lead enrichment is the textbook case: fetch the company domain, scrape the about page, classify the industry, generate a personalized opener, validate the opener against a tone rubric, then write to CRM. No single model call does that cleanly because the steps depend on each other and each one needs different tools. The reliability gain is real but uneven. Across the kind of multi-stage tasks Apollo, Clay, and Lindy run in production, chains typically push end-to-end success from somewhere around fifty percent on a single-shot prompt to ninety percent or higher once you add explicit validators and retries between steps.
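The lead-enrichment flow above can be sketched with one quality gate in the middle. Everything here is an assumption made for illustration: `ABOUT_PAGES` stands in for a scraper, the keyword check stands in for a classifier call, `draft_opener` stands in for a generation call whose first draft is deliberately bad, and the CRM is a plain list. The part worth noticing is `with_gate`, which retries a step until a validator accepts the result or the attempt budget runs out.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Stand-in for scraping an about page (hypothetical data).
ABOUT_PAGES = {"acme.example": "Acme moves freight across Europe."}


class GateRejected(Exception):
    """Raised when no attempt passed the quality gate."""


def with_gate(step: Callable[[int], T], accept: Callable[[T], bool], max_attempts: int = 3) -> T:
    """Call `step(attempt)` until `accept` approves the result."""
    for attempt in range(1, max_attempts + 1):
        result = step(attempt)
        if accept(result):
            return result
    raise GateRejected(f"rejected after {max_attempts} attempts")


def draft_opener(company: str, industry: str, attempt: int) -> str:
    # Stand-in for a model call: the first draft is too pushy on purpose.
    drafts = [
        f"BUY NOW, {company}!!!",
        f"Hi {company} team, I saw your work in {industry}.",
    ]
    return drafts[min(attempt - 1, len(drafts) - 1)]


def passes_tone_rubric(opener: str) -> bool:
    # A toy rubric: no shouting, no exclamation marks, short enough to read.
    return "!" not in opener and not opener.isupper() and len(opener) <= 120


def enrich_lead(email: str, crm: list[dict[str, str]]) -> dict[str, str]:
    domain = email.split("@", 1)[1]                                  # 1. company domain
    about = ABOUT_PAGES[domain]                                      # 2. about page
    industry = "logistics" if "freight" in about.lower() else "other"  # 3. classify
    company = domain.split(".")[0].capitalize()
    opener = with_gate(                                              # 4 and 5. generate, then validate
        lambda attempt: draft_opener(company, industry, attempt),
        passes_tone_rubric,
    )
    record = {"email": email, "industry": industry, "opener": opener}
    crm.append(record)                                               # 6. write to CRM
    return record


if __name__ == "__main__":
    crm: list[dict[str, str]] = []
    print(enrich_lead("ana@acme.example", crm))
```

The gate is what a single call cannot express: the rejected first draft never reaches the CRM, and the retry is visible in code instead of being left to the model.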
The middle ground most teams miss is that you do not have to pick chains or single calls globally. Inside one workflow, the right shape is often a single strong call for the bounded steps and a chain only for the genuinely branching ones. That is also the shape that lines up with how goal-driven AI Employees plan their work in practice. Instead of forcing every task through a hand-built graph, the employee decides per task whether one shot is enough or whether to expand into a small chain of tool calls. That removes the static-graph maintenance cost that breaks LangChain and n8n setups the moment a downstream step changes shape.
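The per-task decision can be written down as a small routing function. This is a sketch under the assumption that the four traits named earlier (branching on prior output, external fetch, quality gate, long horizon) are the only inputs; `TaskProfile`, `chain_reasons`, and `choose_shape` are hypothetical names, not part of any product or framework.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TaskProfile:
    """One flag per trait that can justify a chain."""
    branches_on_prior_output: bool = False
    needs_external_fetch: bool = False
    needs_quality_gate: bool = False
    exceeds_single_pass: bool = False


def chain_reasons(task: TaskProfile) -> list[str]:
    """Names of the traits that are set, in declaration order."""
    return [f.name for f in fields(task) if getattr(task, f.name)]


def choose_shape(task: TaskProfile) -> str:
    # No articulable reason for a chain means a single call is the default.
    return "chain" if chain_reasons(task) else "single_call"


if __name__ == "__main__":
    tone_rewrite = TaskProfile()
    lead_enrichment = TaskProfile(needs_external_fetch=True, needs_quality_gate=True)
    print(choose_shape(tone_rewrite), chain_reasons(tone_rewrite))
    print(choose_shape(lead_enrichment), chain_reasons(lead_enrichment))
```

Returning the reasons alongside the decision keeps the rule honest: if the list is empty, there is no step you can point to that would fail, so there is nothing for a chain to fix.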
If hand-wiring chains in LangChain, CrewAI, or n8n is not where you want to spend your week, the practical alternative is a goal-driven AI Employee that picks its own steps. You give it a goal (enrich this lead, write this campaign, audit this funnel) and the employee decides whether the task is one call or twelve, then executes. The point is not that chains are bad. They are just an implementation detail that most non-engineering founders should not be wiring themselves.
Latency: a chain pays the round-trip cost of every step, so a five-step chain on a fast model is usually four to eight seconds when a single call would be one. Cost: tokens compound at every step because each prompt re-states context, so a multi-step task often costs three to ten times the single-call equivalent. Reliability: the trade flips in favor of chains the moment task complexity rises. Single calls get noisier as task length grows because the model has to keep more state in mind. Chains keep each step bounded, so noise stays local. The honest pattern across production agent systems (Apollo for sales, Clay for enrichment, Lindy for ops, Sistava for the full workforce) is that chains win on long-horizon tasks even when they cost more, because retry-loops, validators, and explicit state are what take reliability from coin-flip to dependable.
Latency: single calls typically take 1-3 seconds. Chains take 4-30 seconds depending on steps, tools, and retries.
Cost: single calls are cheapest. Chains use 3-10x more tokens because context re-enters at every step.
Reliability: single calls plateau once task complexity passes a threshold. Chains keep improving because retries are explicit.
Observability: a single call is one black box. In a chain, every step is a log line you can replay and inspect later.
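A back-of-envelope model makes these trade-offs concrete. The formulas and every number below are illustrative assumptions, not measurements: latency is taken as steps times a fixed round trip, tokens assume each step re-sends the full context, and reliability assumes steps fail independently and that a validator catches every bad output so a retry is a fresh attempt.

```python
def chain_latency_s(steps: int, seconds_per_call: float) -> float:
    """Sequential steps pay one round trip each."""
    return steps * seconds_per_call


def single_call_tokens(context: int, output: int) -> int:
    return context + output


def chain_tokens(steps: int, context: int, output_per_step: int) -> int:
    """Each step re-states the context and emits its own output."""
    return steps * (context + output_per_step)


def end_to_end_success(step_success: float, steps: int, attempts: int = 1) -> float:
    """Chance that every step succeeds within its attempt budget."""
    step_with_retries = 1 - (1 - step_success) ** attempts
    return step_with_retries ** steps


if __name__ == "__main__":
    # Assumed figures: 5 steps, 1.2 s per call, 2000-token context.
    print("latency (s):", chain_latency_s(5, 1.2))
    ratio = chain_tokens(5, 2000, 300) / single_call_tokens(2000, 500)
    print("token ratio:", round(ratio, 1))
    print("no retries:", round(end_to_end_success(0.87, 5), 3))
    print("3 attempts:", round(end_to_end_success(0.87, 5, attempts=3), 3))
```

With these assumed inputs, five steps at 87 percent each succeed together only about half the time, while three validated attempts per step lift the same sequence to roughly 99 percent. Real validators miss some failures and retries are not fully independent, so treat the second figure as an upper bound.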
LangChain and LangGraph are code-first and give engineers full control of the graph, which is great if you have an engineering team and bad if you do not. CrewAI sits one level up with a multi-agent abstraction that is easier to reason about for role-based work. n8n and Make are visual node editors, popular with no-code builders, strong at simple integrations and brittle at long-running stateful chains. Lindy is the polished no-code attempt at multi-step agents and has the smoothest builder experience in the visual category, though you still build the graph yourself. Apollo and Clay are vertical: Apollo for outbound sales chains, Clay for enrichment chains. Sistava is goal-driven instead of graph-driven: you hire an AI Employee, give it a goal, and the employee decides per task whether to one-shot or chain, which removes the static-graph maintenance cost. Pick the layer that matches how much engineering time you actually have.
Chains are not always more reliable than single calls. On bounded tasks (classification, short rewrites, single-document summaries) a strong single call is more reliable because there are no inter-step failures to absorb. Chains beat single calls only once the task has branching, tool use, or quality gates that benefit from explicit retries.
To get most of a chain's reliability without its cost, use a single strong model with a structured-output schema, and run a single critic pass only when the first answer fails validation. That is roughly two calls instead of seven and captures most of the reliability gain at a fraction of the cost. Sistava and goal-driven agents do this by default.
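A minimal sketch of that validate-then-critique pattern follows. It assumes a hypothetical `call_model` function that takes a prompt string and returns raw text, and a toy schema (a `label` from a fixed set plus a `confidence` between 0 and 1). It is not any vendor's API and not a description of how a particular product implements the pattern.

```python
import json
from typing import Any, Callable

ALLOWED_LABELS = ("billing", "bug", "feature_request")  # assumed label set


def validate(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the answer is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    problems = []
    if data.get("label") not in ALLOWED_LABELS:
        problems.append(f"label must be one of {list(ALLOWED_LABELS)}")
    confidence = data.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 1:
        problems.append("confidence must be a number between 0 and 1")
    return problems


def answer_with_critic(task: str, call_model: Callable[[str], str]) -> tuple[dict[str, Any], int]:
    """Return (parsed answer, number of model calls used)."""
    raw = call_model(task)
    problems = validate(raw)
    if not problems:
        return json.loads(raw), 1  # common path: one call, no critic
    # Critic pass: runs only because validation failed.
    repair_prompt = (
        f"{task}\n\nYour previous answer was:\n{raw}\n"
        f"Fix these problems and return only JSON: {'; '.join(problems)}"
    )
    raw = call_model(repair_prompt)
    problems = validate(raw)
    if problems:
        raise ValueError(f"still invalid after critic pass: {problems}")
    return json.loads(raw), 2


if __name__ == "__main__":
    # Fake model for the demo: bad first answer, valid second answer.
    replies = iter(['{"label": "refund"}', '{"label": "billing", "confidence": 0.9}'])
    print(answer_with_critic("Classify: 'I was charged twice.'", lambda prompt: next(replies)))
```

The second call is paid for only on the failure path, which is where the cost saving over a fixed multi-step chain comes from.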
You do not need a framework to build a chain. A chain is just a sequence of model calls with state passed between them. You can write that in plain Python or TypeScript in under fifty lines. LangChain, LangGraph, CrewAI, n8n, and Lindy add structure, observability, and a builder UI, which matters more once the chain has more than five steps or runs in production.
Wherever possible, each step in a chain should be one focused call that does one thing well. Chains that nest sub-chains inside each step quickly become unmaintainable. The single-call discipline at the step level is what makes chains debuggable.
Goal-driven planning hands the agent a goal plus a toolbox and lets the agent choose steps at runtime, rather than committing to a fixed graph at build time. The result is per-task: simple tasks become one call, complex tasks expand into a chain. Sistava AI Employees use this pattern, which is why setup does not involve drawing a workflow on a canvas.
The takeaway most teams need: stop treating chains as a default. The right shape is whatever the task actually needs, evaluated per task, not per platform. Single calls cover more ground than the agent crowd implies. Chains are not free. And if you do not want to be the engineer holding the graph in your head every time a downstream API changes shape, goal-driven AI Employees collapse the decision into something you do not have to design at all.
If you are weighing chains versus single calls for a real task this week, the practical move is not to pick a side. Map the task. If it is bounded and self-contained, write the single strongest prompt you can and ship that. If it has branching, fetching, or quality gates, sketch the smallest chain that covers them and resist adding more. If you do not want to be the person maintaining that map every time the world changes, hire a goal-driven AI Employee from Sistava and let it pick the shape per task. Plans start at {PERSONAL_USD} when you outgrow the free tier, and the chain-or-not decision becomes something you read about, not something you maintain. The chain debate is real for builders, but it is mostly invisible for founders who buy the outcome instead of the wiring.