How-to, by Mahmoud Zalt
Run scope, audit, and reversibility checks before any AI Employee touches email, CRM, or payments. Use least-privilege keys and per-action approvals.
Email, CRM, and payments are the three systems where an AI mistake can become a real-world incident in minutes. A wrong email goes to a customer and cannot be unsent. A CRM bulk-update can silently overwrite months of notes. A payment action moves real money and triggers refunds, chargebacks, and tax events. None of these are like a sandbox bug you can roll back with a fresh container. That is why the security posture for an AI Employee on those surfaces has to be different from how you treat a calendar or a notes app. The blast radius is wider, the recovery is slower, and the audit trail matters in a way that a casual integration setup does not capture. I run the same six checks every time before any new write scope reaches one of these three systems, and I do them in order because each one builds on the previous.
Least-privilege means the AI Employee gets the smallest credential that still lets it do the job, not the broadest token your provider hands out by default. In practice that means a dedicated service account, scoped OAuth (not a personal token), read-only when read-only is enough, and a separate write credential that only unlocks for approved workflows. The trap is that every vendor SDK example uses an admin token because it is easy to demo. You inherit that default, ship it, and the AI is suddenly an admin on systems where it only needs to send one type of email or update one CRM field. The fix is unglamorous: spend the hour creating the narrow scope, naming it after the AI Employee, and rotating it quarterly. The audit log gets cleaner, blast radius shrinks, and revoking a single employee no longer means rotating a shared key that ten other tools rely on.
One named identity per AI Employee, never a personal user token. Makes revocation atomic.
Pick the narrowest scope (for example, Gmail's gmail.send scope rather than full mailbox access) and refuse the convenient admin defaults.
Two credentials per system: a read key for context, a write key gated behind approval flows.
Restrict the AI to specific objects and fields, never the whole record graph.
Calendar rotation on every credential, with the AI Employee re-onboarded against the new key.
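The points above can be sketched as a small scope check. This is a minimal illustration, not any vendor's API: the `ScopedCredential` class, the `deny_reason` function, the employee name, and the field names are all hypothetical. It assumes two credentials per system (a read key and an approval-gated write key), each limited to an explicit set of fields.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class ScopedCredential:
    """One credential for one AI Employee on one system (hypothetical model)."""
    employee: str                   # named identity, never a personal user
    system: str                     # e.g. "crm"
    mode: str                       # "read" or "write"
    allowed_fields: FrozenSet[str]  # the only fields this key may touch


def deny_reason(cred: ScopedCredential, action: str,
                fields: Iterable[str], approved: bool = False) -> Optional[str]:
    """Return why the action is denied, or None if it fits the scope."""
    if action not in ("read", "write"):
        return f"unknown action: {action}"
    if action == "write":
        if cred.mode != "write":
            return "read key cannot write"
        if not approved:
            return "write key requires an approved workflow"
    outside = sorted(set(fields) - cred.allowed_fields)
    if outside:
        return f"fields outside scope: {outside}"
    return None


# Two credentials for the same employee: broad-ish read, very narrow write.
read_key = ScopedCredential("ai-support-01", "crm", "read",
                            frozenset({"email", "stage", "notes"}))
write_key = ScopedCredential("ai-support-01", "crm", "write",
                             frozenset({"stage"}))

print(deny_reason(read_key, "read", {"email", "stage"}))
print(deny_reason(write_key, "write", {"notes"}, approved=True))
```

In a real integration the provider enforces the scope on its side; a local check like this only mirrors that boundary so a denied action fails early and with a readable reason.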
I run the same six checks in order every time, because each one assumes the previous one is true. Skipping any of them has bitten me on past integrations, and the order matters: scope before audit, audit before approval, approval before kill switch, kill switch before rollback. The point is that by the time the AI Employee makes its first real write, every layer that could catch a mistake is already in place. None of the checks are exotic. The real value is doing them deliberately, writing them into the onboarding sequence, and not letting the convenience of one-click connectors skip past them. Below is the exact sequence I follow, whether the system is Gmail, HubSpot, or Stripe, with one note per step on the specific failure it prevents.
The reason I write these six down rather than rely on memory is that each one disappears under deadline pressure. The most common failure mode I see (and have lived) is that someone grants admin scope for a quick test, the test works, and the credential never gets narrowed. Three months later the AI Employee is still running on an admin key, the audit log is full but no one reviews it, and the rollback was never tested. The checklist exists because that pattern is the default unless you fight it on day one.
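One way to fight that default is a periodic hygiene check that flags credentials still carrying admin scope or overdue for rotation. The sketch below is an assumption-heavy illustration: `CredentialRecord`, the scope strings, and the 90-day window (standing in for "quarterly") are hypothetical, and matching on the substring "admin" is a simplification of how real providers name scopes.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

ROTATION_WINDOW = timedelta(days=90)  # "quarterly", approximated as 90 days


@dataclass
class CredentialRecord:
    """Inventory entry for one credential (hypothetical shape)."""
    name: str
    scopes: List[str]
    last_rotated: date


def hygiene_findings(cred: CredentialRecord, today: date) -> List[str]:
    """List the problems that tend to survive a 'quick test' grant."""
    findings = []
    # Simplified heuristic: treat any scope containing "admin" as too broad.
    if any("admin" in scope.lower() for scope in cred.scopes):
        findings.append("admin scope still granted")
    if today - cred.last_rotated > ROTATION_WINDOW:
        findings.append("rotation overdue")
    return findings


stale = CredentialRecord("ai-sales-01", ["crm.admin"], date(2024, 1, 10))
clean = CredentialRecord("ai-support-01", ["crm.contacts.read"], date(2024, 4, 1))

stale_findings = hygiene_findings(stale, today=date(2024, 4, 15))
clean_findings = hygiene_findings(clean, today=date(2024, 4, 15))
print(stale_findings, clean_findings)
```

Running something like this on a schedule turns "the credential never gets narrowed" from a silent state into a visible finding.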
Beyond the checklist, the harder question is how you actually run these checks in practice when the AI Employee is operating across multiple systems at once. A scoped credential for Gmail is easy in isolation. A scoped credential for Gmail plus HubSpot plus Stripe, with consistent audit logging across all three and one kill switch that revokes everything, takes real architectural work. That is the part most teams underestimate, and it is the part that determines whether the AI Employee is actually safer than a human contractor or just looks safer on paper. The next two sections cover the system-level questions I get asked most often after teams have done the per-credential work and are now thinking about governance at the workforce level.
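A single kill switch across several systems can be sketched as a registry of per-system revoke functions. Everything here is hypothetical: the `KillSwitch` class and the stub revokers stand in for whatever credential-revocation call each vendor actually offers. The sketch assumes that one failed revocation should not stop the others, and that the result for every system should be reported.

```python
from typing import Callable, Dict, List, Tuple

Revoker = Callable[[str], None]  # takes the AI Employee's identity


class KillSwitch:
    """Revokes one AI Employee's credentials on every registered system."""

    def __init__(self) -> None:
        self._revokers: Dict[str, Revoker] = {}

    def register(self, system: str, revoke: Revoker) -> None:
        self._revokers[system] = revoke

    def trigger(self, employee: str) -> Dict[str, str]:
        results: Dict[str, str] = {}
        for system, revoke in self._revokers.items():
            try:
                revoke(employee)
                results[system] = "revoked"
            except Exception as exc:
                # Keep going: a failure on one system must not block the rest.
                results[system] = f"FAILED: {exc}"
        return results


# Stub revokers for illustration; real ones would call each vendor's API.
revoked: List[Tuple[str, str]] = []


def failing_revoker(employee: str) -> None:
    raise RuntimeError("API timeout")


switch = KillSwitch()
switch.register("gmail", lambda emp: revoked.append(("gmail", emp)))
switch.register("hubspot", failing_revoker)
switch.register("stripe", lambda emp: revoked.append(("stripe", emp)))

results = switch.trigger("ai-support-01")
print(results)
```

Reporting failures per system matters here: a kill switch that silently misses one of the three systems is exactly the "looks safer on paper" case.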
The trick is to split actions into three risk tiers and treat each tier differently. Tier one is read-only actions (search inbox, query CRM, list charges) and runs without approval. Tier two is reversible writes (draft an email, create a CRM note, log a refund request) and runs with a soft approval, which is a one-click confirm inside the chat thread. Tier three is irreversible writes (send the email, merge two CRM contacts, execute a charge or refund) and always requires a hard approval, which means a separate confirmation step with the action and its arguments shown explicitly before it fires. Most AI Employee workflows are about 80 percent tier one, 15 percent tier two, and 5 percent tier three. If you tune the tiers correctly, you barely notice the approvals on normal work, but the dangerous 5 percent always pauses for a human.
Tier 1: Search, query, list. Runs without approval, logged for audit. About 80 percent of activity.
Tier 2: Drafts, notes, internal labels. Soft confirm inside the chat thread. About 15 percent of activity.
Tier 3: Send, merge, charge, delete. Hard confirm with full payload shown. About 5 percent of activity.
Founders can self-approve tier 2. Team accounts get tier 2 routed to a designated reviewer.
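The three tiers map naturally onto a lookup table plus an approval gate. The following is a minimal sketch: the action names, the `ACTION_TIERS` table, and both functions are hypothetical. Defaulting unknown actions to the strictest tier is an added assumption of this sketch, not something the tiers above specify.

```python
from enum import Enum


class Tier(Enum):
    READ = 1                # runs without approval
    REVERSIBLE_WRITE = 2    # soft confirm in the chat thread
    IRREVERSIBLE_WRITE = 3  # hard confirm with full payload shown


# Hypothetical action names, grouped the way the tiers are described above.
ACTION_TIERS = {
    "search_inbox": Tier.READ,
    "query_crm": Tier.READ,
    "list_charges": Tier.READ,
    "draft_email": Tier.REVERSIBLE_WRITE,
    "create_crm_note": Tier.REVERSIBLE_WRITE,
    "log_refund_request": Tier.REVERSIBLE_WRITE,
    "send_email": Tier.IRREVERSIBLE_WRITE,
    "merge_contacts": Tier.IRREVERSIBLE_WRITE,
    "execute_refund": Tier.IRREVERSIBLE_WRITE,
}

APPROVAL_FOR_TIER = {
    Tier.READ: "none",
    Tier.REVERSIBLE_WRITE: "soft",
    Tier.IRREVERSIBLE_WRITE: "hard",
}
APPROVAL_ORDER = ["none", "soft", "hard"]  # weakest to strongest


def required_approval(action: str) -> str:
    # Assumption: anything not explicitly classified gets the strictest tier.
    tier = ACTION_TIERS.get(action, Tier.IRREVERSIBLE_WRITE)
    return APPROVAL_FOR_TIER[tier]


def may_run(action: str, approval_given: str) -> bool:
    """True if the approval obtained is at least as strong as required."""
    needed = APPROVAL_ORDER.index(required_approval(action))
    return APPROVAL_ORDER.index(approval_given) >= needed


print(required_approval("search_inbox"), required_approval("send_email"))
print(may_run("send_email", "soft"))
```

Keeping the table explicit makes a wrong tier assignment a one-line fix after a postmortem, rather than a change buried in prompt text.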
Assume mistakes happen, because they will. The plan I follow has three steps. Step one is contain the damage: hit the kill switch on the credential so no further writes can fire while you investigate. Step two is execute the rollback you documented in check five, in the exact order written, with no improvisation. Step three is run the postmortem and decide whether the failure was a scope problem (credential too broad), an approval problem (tier was wrong for that action), a model problem (the AI made a bad call inside its allowed scope), or a system problem (an integration silently changed semantics). Most mistakes I have seen are scope or approval issues, not model failures, which is why the checklist front-loads those. Recovery is rarely about the AI itself. It is about whether the surrounding system caught the action in time and gave you clean undo steps.
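The three steps can be written down as a small runner so that the order is fixed ahead of time. This is an illustrative sketch only: `respond`, `RootCause`, and the step names are hypothetical, and the rollback steps are plain callables standing in for whatever the documented rollback actually involves. If a rollback step raises, the exception propagates and the runner stops, which reflects the "no improvisation" rule.

```python
from enum import Enum
from typing import Callable, List, Tuple


class RootCause(Enum):
    """The four postmortem categories described above."""
    SCOPE = "credential too broad"
    APPROVAL = "tier was wrong for that action"
    MODEL = "bad call inside allowed scope"
    SYSTEM = "integration silently changed semantics"


def respond(kill_switch: Callable[[], None],
            rollback_steps: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Contain first, then roll back in the documented order."""
    log: List[str] = []
    kill_switch()                     # step one: stop further writes
    log.append("contained")
    for name, step in rollback_steps:  # step two: exact order, no skipping
        step()                        # an exception here halts the sequence
        log.append(f"rolled back: {name}")
    log.append("postmortem pending")  # step three happens with humans
    return log


# Stubs that record call order, for illustration.
calls: List[str] = []
incident_log = respond(
    kill_switch=lambda: calls.append("kill"),
    rollback_steps=[
        ("restore CRM notes from export", lambda: calls.append("crm")),
        ("cancel queued drafts", lambda: calls.append("drafts")),
    ],
)
print(incident_log)
```

The postmortem outcome can then be recorded as one `RootCause` value, which makes it easy to see over time whether failures cluster in scope and approval rather than in the model.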
No. Prompt-level guardrails are not a security boundary. The only durable boundary is the credential scope. If the token can send admin emails, the AI can be persuaded, or triggered by accident, to send admin emails. Always scope at the credential layer first, then add prompt guardrails on top as a second line of defense.
Scoped credentials. Every other check helps, but if the credential is admin, no audit log or approval flow is enough to fully contain a bad action. Narrow the scope first, then build the rest of the safety net on top of a credential that already cannot do most dangerous things.
Not all six, but at least scope, audit, and a kill switch. Read access still leaks data if compromised, especially for CRM and email, where reads can include customer PII and contract terms. Skip approvals and rollback for read-only, but never skip the credential boundary.
Use the native audit log of the underlying system (Gmail audit, HubSpot activity feed, Stripe events). Filter by the dedicated service account you created in check one. On Sistava, the work journal mirrors the same view at the platform level, so you can review per-employee actions without bouncing through every vendor dashboard.
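Filtering by the service account looks roughly like the sketch below once events are in a common shape. The event dictionaries here are hypothetical and already normalized; real Gmail, HubSpot, and Stripe logs each have their own schema, so a real version would need a mapping step per vendor that is not shown.

```python
from collections import Counter
from typing import Dict, List

Event = Dict[str, str]


def actions_by_employee(events: List[Event], service_account: str) -> List[Event]:
    """Keep only events performed by the given dedicated service account."""
    return [e for e in events if e.get("actor") == service_account]


def count_by_system(events: List[Event]) -> Dict[str, int]:
    """Summarize how many actions landed on each system."""
    return dict(Counter(e["system"] for e in events))


# Hypothetical, pre-normalized events from three vendor audit logs.
events = [
    {"system": "gmail", "actor": "ai-support-01", "action": "send_email"},
    {"system": "hubspot", "actor": "jane@example.com", "action": "update_contact"},
    {"system": "hubspot", "actor": "ai-support-01", "action": "create_note"},
    {"system": "stripe", "actor": "ai-support-01", "action": "list_charges"},
    {"system": "gmail", "actor": "ai-support-01", "action": "draft_email"},
]

mine = actions_by_employee(events, "ai-support-01")
summary = count_by_system(mine)
print(summary)
```

This only works cleanly because of the dedicated identity: with a shared or personal token, the `actor` field cannot separate the AI Employee's actions from a human's.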
Anything where a wrong write is permanent and has a large blast radius: production database migrations, tax filings, KYC submissions, banking transfers above a threshold, and contract execution. For those, the AI drafts the action and a human always executes it. The line is not about trust; it is about reversibility.
If you want to go deeper on the workforce-level governance question (how to think about which AI Employee owns which scope, how to keep responsibilities clean as the team grows, and where the human stays in the loop), the next read is the practical companion to this checklist. It walks through how I structure the team, what each role owns, and the boundaries I set so no single AI Employee accumulates more access than its job needs. Use it once you have shipped the first scoped credential and are thinking about the second and third.
The honest framing for all of this: security for AI Employees is not a special discipline. It is the same least-privilege, audit, and rollback hygiene you would apply to any human contractor with system access, just made explicit because the AI does not have professional intuition to lean on when something looks off. The six checks exist because they are the smallest set that survives contact with the real failure modes I have seen on email, CRM, and payment integrations. None of them are theoretical. Each one came from a moment where I wished the previous me had done one more step before the AI Employee got its keys. Run the checks once on the next integration you add, write them into your onboarding playbook, and the AI Employee becomes one of the better-governed actors in your stack, not one of the riskier ones.