How-to by Mahmoud Zalt
Stop conflicting AI agent actions with a shared journal, a leader approval gate, and tenant-isolated state. Five patterns that hold under real load.
AI agents produce conflicting actions for the same reason a team of humans does: nobody is reading the same state at the same time, decisions get made on stale snapshots, and two workers grab the same task before either finishes. With LLM agents this gets worse fast because each agent is non-deterministic, runs in its own loop, and is usually fed only a slice of context. A research agent and a sales agent that both touch the same CRM lead in the same minute will happily send two different emails, update the same status twice, or assign the lead to two different reps. The root cause is almost never the model. It is missing shared state, missing serialization on risky writes, and missing a recovery path when something half-finishes. Get those three right and the conflict rate drops by a large factor before you touch a single prompt.
A shared work journal is an append-only log every agent writes to after each meaningful step and reads from before starting one. The structure is simple: timestamp, agent name, action taken, target object, outcome. Before an agent acts on a record, it scans the journal for the last few entries touching the same target. If another agent already handled it inside a short window, the current agent skips, defers, or escalates instead of duplicating the work. The journal is not a database lock and it is not a queue. It is a coordination surface that costs almost nothing to write and prevents the most common collision pattern in multi-agent systems. CrewAI, LangGraph, and n8n can all be extended to do something similar with custom storage, but the schema, the read pattern, and the retention policy are yours to design. Sistava ships it as a first-class concept inside every AI Employee.
Every meaningful action lands in a single shared stream that all agents in the team can read.
Each agent scans recent entries for the same target before acting, so duplicate sends and double-updates stay rare.
The journal is not a database lock. It is a coordination surface that is cheap to write and cheap to query.
When something goes wrong, the journal is your audit trail and the seed for any rollback or undo.
New agents joining the team get hours of context in seconds because the journal carries the team memory.
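Here is a minimal in-memory sketch of the journal and its read-before-act check. The entry fields mirror the structure described above: timestamp, agent, action, target, outcome. Several details are assumptions for illustration: the class and function names, the `"ok"` outcome string, and the ten-minute default window. A real deployment would back this with durable, shared storage rather than a Python list.

```python
import time
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class JournalEntry:
    timestamp: float
    agent: str
    action: str
    target: str
    outcome: str


class WorkJournal:
    """Append-only log shared by every agent on the team (in-memory sketch)."""

    def __init__(self) -> None:
        self._entries: List[JournalEntry] = []

    def append(self, agent: str, action: str, target: str, outcome: str,
               now: Optional[float] = None) -> JournalEntry:
        ts = time.time() if now is None else now
        entry = JournalEntry(ts, agent, action, target, outcome)
        self._entries.append(entry)  # entries are only ever appended, never edited
        return entry

    def recent_for(self, target: str, window_s: float,
                   now: Optional[float] = None) -> List[JournalEntry]:
        """Entries touching `target` inside the last `window_s` seconds."""
        ts = time.time() if now is None else now
        return [e for e in self._entries
                if e.target == target and ts - e.timestamp <= window_s]


def should_act(journal: WorkJournal, action: str, target: str,
               window_s: float = 600.0, now: Optional[float] = None) -> bool:
    """Read before acting: skip if any agent already completed this action on this target recently."""
    return not any(e.action == action and e.outcome == "ok"
                   for e in journal.recent_for(target, window_s, now))


journal = WorkJournal()
t0 = 1_000.0
journal.append("sales-agent", "send_email", "lead:42", "ok", now=t0)
print(should_act(journal, "send_email", "lead:42", now=t0 + 60))   # False: emailed a minute ago
print(should_act(journal, "send_email", "lead:42", now=t0 + 900))  # True: outside the 10-minute window
```

The check deliberately ignores which agent wrote the entry. A research agent and a sales agent about to email the same lead both see the earlier send and back off.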
A leader approval gate is a small but critical pattern: agents draft, a team leader approves, then the action lands. The leader can be another agent dedicated to the role, or a human-in-the-loop step for high-stakes actions. Either way, the gate serializes the writes that hurt most when duplicated: sending email, posting to public channels, charging cards, mutating shared records. Underneath, the gate is a queue plus a small policy that decides what needs approval and what flows through. The policy can be as simple as a list of action types, or as elaborate as a per-tenant rule set. Without a gate, every agent thinks its action is the right one and ships it. With a gate, conflicts collapse into a single serialized decision, and the recovery story is straightforward because every approved write is logged.
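The gate can be sketched the same way: a queue plus a policy. In this hypothetical version the policy is just a set of action types, the simplest form described above. The leader is any callable, standing in for a dedicated leader agent or a human approval step. Names like `ApprovalGate`, `Draft`, and `REQUIRES_APPROVAL` are illustrative, not a real library API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Set

# Hypothetical policy: the action types that must pass the leader before landing.
REQUIRES_APPROVAL: Set[str] = {"send_email", "post_public", "charge_card", "update_shared_record"}


@dataclass(frozen=True)
class Draft:
    agent: str
    action: str
    target: str
    payload: str


class ApprovalGate:
    """Queue plus policy: risky drafts wait for the leader, everything else flows through."""

    def __init__(self, leader: Callable[[Draft], bool]) -> None:
        self.leader = leader                    # leader agent or human-in-the-loop callback
        self.pending: Deque[Draft] = deque()
        self.approved_log: List[Draft] = []     # every write that landed is logged
        self.rejected: List[Draft] = []

    def submit(self, draft: Draft) -> str:
        if draft.action in REQUIRES_APPROVAL:
            self.pending.append(draft)
            return "queued"
        self.approved_log.append(draft)         # low-risk actions flow straight through
        return "passed"

    def drain(self) -> None:
        """Leader decides queued drafts one at a time, in arrival order."""
        while self.pending:
            draft = self.pending.popleft()
            if self.leader(draft):
                self.approved_log.append(draft)
            else:
                self.rejected.append(draft)


def one_per_target() -> Callable[[Draft], bool]:
    """Example leader policy: approve the first draft per (action, target), reject the rest."""
    seen: Set[str] = set()

    def decide(d: Draft) -> bool:
        key = f"{d.action}:{d.target}"
        if key in seen:
            return False
        seen.add(key)
        return True

    return decide


gate = ApprovalGate(one_per_target())
gate.submit(Draft("research-agent", "send_email", "lead:42", "intro A"))
gate.submit(Draft("sales-agent", "send_email", "lead:42", "intro B"))
gate.submit(Draft("sales-agent", "add_note", "lead:42", "called"))
gate.drain()
print([d.payload for d in gate.approved_log])  # ['called', 'intro A']
```

Two agents drafting an email to the same lead collapse into one approved send. The note, which is not on the risky list, flows through untouched.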
Most teams that hit the conflicting-action wall on LangChain or CrewAI try to fix it with bigger prompts first. That almost never works. The agents are not confused. They are uncoordinated. The fix is structural: a shared journal, a leader gate, and an honest retry policy that knows the difference between a transient failure and a duplicate request. Once those three are in place, the prompts can stay small and the model can stay cheap, which is the opposite of what most teams expect when they start. The next part is the embedded team you can hire instead of building this from scratch.
If you do go the build-it-yourself route, the next thing that bites is state isolation across customers. The journal and the gate solve coordination inside a single tenant. They do nothing about one tenant accidentally reading or writing another tenant's data, which is the kind of failure that turns into a public incident in a week. The pattern below is the one we landed on after watching the alternatives fail in production, and it is the third leg of the stool that keeps multi-agent systems honest.
Tenant-isolated state means every read, every write, every queue, every memory store is scoped to the tenant the agent is acting for. No shared caches across tenants. No global memory pool. No agent loop that pulls work for tenant A and ends up writing for tenant B because a context variable leaked. This sounds obvious, but it is the single most common bug in homegrown multi-agent setups, because the natural way to wire LangGraph or CrewAI is to share storage and pass a tenant ID around as a string. One missed pass and the data crosses. The fix is to make tenant-scoped storage the only kind of storage the agent can touch, enforced at the framework level, not at the prompt level. Sistava enforces this in the platform, so an AI Employee cannot read across tenants even if a prompt tries to.
Every database read, cache lookup, and memory fetch carries the tenant ID as a non-optional argument.
Worker loops are partitioned per tenant so a stuck agent on one tenant cannot starve another.
A bad write or runaway agent affects one tenant only. Recovery is local and quick.
Tenant-scoped logs and journals make incident review a search instead of a forensic exercise.
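Below is one way to make tenant scope structural rather than a convention, sketched under stated assumptions. The storage handle is bound to a tenant when it is created, so no read or write method accepts a tenant argument that could be forgotten or swapped. Work is held in one queue per tenant and handed out round-robin. Binding at construction is a variation on passing the tenant ID as a non-optional argument. `TenantScopedStore`, `PartitionedWork`, and the dict backend are hypothetical stand-ins for your database, cache, and memory store.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, Tuple


class TenantScopedStore:
    """Storage handle bound to one tenant at construction; no method accepts another tenant."""

    def __init__(self, backend: Dict[Tuple[str, str], Any], tenant_id: str) -> None:
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._backend = backend
        self._tenant_id = tenant_id

    def put(self, key: str, value: Any) -> None:
        self._backend[(self._tenant_id, key)] = value

    def get(self, key: str, default: Any = None) -> Any:
        # Every lookup is keyed by (tenant, key), so it can only hit this tenant's records.
        return self._backend.get((self._tenant_id, key), default)


class PartitionedWork:
    """One work queue per tenant, so a backlog on one tenant cannot starve the others."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = defaultdict(deque)

    def enqueue(self, tenant_id: str, task: str) -> None:
        self._queues[tenant_id].append(task)

    def next_round(self) -> Dict[str, str]:
        """Round-robin: hand out at most one task per tenant per round."""
        return {t: q.popleft() for t, q in self._queues.items() if q}


shared_backend: Dict[Tuple[str, str], Any] = {}
acme = TenantScopedStore(shared_backend, "acme")
globex = TenantScopedStore(shared_backend, "globex")
acme.put("lead:42:status", "contacted")
print(globex.get("lead:42:status"))  # None: globex cannot see acme's record

work = PartitionedWork()
for i in range(3):
    work.enqueue("acme", f"acme-task-{i}")
work.enqueue("globex", "globex-task-0")
print(work.next_round())  # {'acme': 'acme-task-0', 'globex': 'globex-task-0'}
```

Even though both handles share one physical backend, the only keys either can reach are its own.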
Recovery is the part most teams skip until they cannot. A good path has three properties. First, every risky action is idempotent at the target: a re-send produces the same outcome as the original, not a second email. Second, every failure has a category: transient (retry), policy (escalate), duplicate (drop), unknown (queue for human). Third, the recovery agent reads the journal and the gate log to decide which category applies, instead of guessing from a single error message. CrewAI and n8n give you retry primitives but leave the categorization to you. LangGraph lets you wire a recovery state but you build the policy. Sistava ships the policy and the categorization by default, with the journal and the gate already feeding it. The result is that most failures self-heal inside one or two ticks and the rest land in a clear human queue.
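Here is a sketch of the categorization step, under two assumptions. First, each risky action carries an idempotency key derived from tenant, action, target, and payload; the target has to actually dedupe on that key for a re-send to be harmless. Second, the set of completed keys can be read from the journal and gate log. The error codes, `Category` names, and helper functions are hypothetical. The four categories and their responses come from the paragraph above.

```python
import hashlib
from enum import Enum
from typing import Set


class Category(Enum):
    TRANSIENT = "retry"
    POLICY = "escalate"
    DUPLICATE = "drop"
    UNKNOWN = "queue_for_human"


# Hypothetical error codes; map your own integrations' errors into these sets.
TRANSIENT_ERRORS = {"timeout", "rate_limited", "connection_reset"}
POLICY_ERRORS = {"approval_rejected", "permission_denied"}


def idempotency_key(tenant_id: str, action: str, target: str, payload: str) -> str:
    """Same logical action, same key, so a target that dedupes on it turns a re-send into a no-op."""
    raw = f"{tenant_id}|{action}|{target}|{payload}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def categorize(error_code: str, key: str, completed_keys: Set[str]) -> Category:
    """Check journal and gate evidence before trusting the error message."""
    if key in completed_keys:        # the journal/gate log says this exact action already landed
        return Category.DUPLICATE
    if error_code in POLICY_ERRORS:
        return Category.POLICY
    if error_code in TRANSIENT_ERRORS:
        return Category.TRANSIENT
    return Category.UNKNOWN          # anything unrecognized goes to a human, not a blind retry


sent = idempotency_key("acme", "send_email", "lead:42", "intro A")
fresh = idempotency_key("acme", "send_email", "lead:43", "intro A")
completed = {sent}  # e.g. keys recorded in journal entries whose outcome was "ok"
print(categorize("timeout", sent, completed).value)     # drop
print(categorize("timeout", fresh, completed).value)    # retry
print(categorize("weird_500", fresh, completed).value)  # queue_for_human
```

The order of the checks matters. A timeout on an email that actually went out is the classic duplicate, and only the journal can tell you that; the error message alone would say "retry".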
Almost always the architecture. The model is non-deterministic but the conflicts come from missing shared state, no serialization on risky writes, and no recovery path. A smaller model with a shared journal and a leader gate beats a bigger model without them every time.
A queue alone helps with throughput, not coordination. You still need the journal so agents can see what already happened, and you still need the approval gate so risky writes are decided once. The queue sits underneath both as the serializer.
A database lock blocks writes at the row level for the duration of a transaction. The journal and gate work at the action level over a longer window: an agent reads the journal, sees that another agent already emailed the lead in the last ten minutes, and skips, even though no row is locked. It is coordination, not contention.
The shared work journal, the leader approval gate, the tenant-isolated state primitives, and the categorized recovery policy. All four are first-class concepts inside every AI Employee. On CrewAI, LangGraph, or n8n you can reach the same shape, but it is on you to design, build, and harden each one over several weeks of work.
Start with the journal. Pick one shared log, write every meaningful action to it, and make agents read before acting. That alone removes most duplicate sends. Then add the leader gate for the top three risky action types. Tenant isolation and recovery come after that, but they are non-negotiable before any real customer load.
If you want to go one level deeper on how a coordinated team of AI agents actually operates across the tools they need to touch (CRM, inbox, calendar, billing), the next read is the practical companion to this article. It covers the orchestration layer, the data flow between roles, and the integration choices that keep the team honest when the workload spikes. Use it as the architecture map after you have stopped the conflicts.
The honest summary of this whole problem: conflicting AI agent actions are a coordination failure, not an intelligence failure. The three patterns that fix it (a shared journal, a leader approval gate, tenant-isolated state) plus a categorized recovery path are not exotic. They are old ideas borrowed from distributed systems and applied to LLM agents. You can build them yourself on CrewAI, LangGraph, or n8n if you have the engineering time and the patience to harden each piece against real load. Or you can pick a platform that ships them by default. Sistava is built around exactly that shape, with plans from {PERSONAL_USD} for solo founders and {AGENCY_USD} for teams running multi-agent workloads at scale. The right pick depends on whether you want to own the coordination layer or rent it. Either way, do not try to fix conflicts with bigger prompts. Fix them with structure.