What is Agentic Engineering?

Agentic Engineering combines three capabilities: push-to-main delivery with strong automated checks, production agents that monitor and respond to issues, and multi-model challenge where a second model reviews plans before execution. It does not replace your team. It gives your engineers a tighter operating system so they can ship with less rework.

Is this a fit for our team right now?

Usually yes if you already ship software and want to reduce cycle time without lowering quality. We work best with teams that can commit engineering time, adopt operating conventions, and measure outcomes. If your current bottleneck is product strategy or hiring capacity, we will say that early.

How long does implementation take?

Most teams start seeing workflow changes in the first few weeks. Full rollout depends on your architecture, compliance needs, and how many systems we need to integrate. We define phases up front so you can see what lands first and what comes next.

Do we need to replace our current infrastructure?

No. We usually layer Agentic Engineering practices onto your existing stack. We keep what is working, tighten the weak points, and avoid unnecessary migration risk. If a core piece is holding you back, we will recommend a staged change rather than a rewrite.

How do you handle risk, safety, and compliance?

We treat these as design constraints, not cleanup work. That includes scoped access for agents, audit trails, environment controls, and review gates around sensitive actions. We align controls to your current requirements and roadmap so delivery speed does not come at the expense of governance.

How do we measure ROI?

We define success metrics before rollout: lead time, deployment frequency, change failure rate, incident recovery time, and engineering hours spent on repeat work. Then we track baseline versus post-implementation so you can see what improved and what still needs work.

Agentic AI Evolution Roadmap: From GenAI Chat to Autonomous Agents

Agentic AI Evolution Roadmap chalkboard diagram showing 5 stages: Stage 0 GenAI Chat, Stage 1 Human + Agent, Stage 2 Knowledge Architecture, Stage 3 Autonomous Agent, Stage 4 Multi-Agent Orchestration — with foundation layer of enterprise agreements, multiple models, and team agreement on AI use — The 5-stage evolution from GenAI chat to autonomous multi-agent orchestration. Groundwork, not technology.

The Thesis

AI agents are coming to every company. Most won't be ready — not because of technology, but because they skipped the groundwork.

Almost every SaaS tool is racing to add AI capabilities. But the market is waking up to something bigger: the era of SaaS tools charging a fortune is ending. Companies can now build what they need at a fraction of the cost.

Chamath put it, the big question is whether any SaaS company's growth will be overtaken by a much cheaper AI-developed solution. The answer increasingly is yes.

"Software is eating the world. Now AI is eating software." — Marc Andreessen

"AI agents will use enterprise software as tools" — Jensen Huang

That means the real capability isn't buying another tool — it's building the internal muscle to work with AI effectively. AI agents are only as good as the skills they run, the context they operate in, and the feedback loops that refine both. Those things are built by humans, stage by stage. There are no shortcuts.

Not every company needs Stage 4. Every company needs Stages 1 and 2.

GenAI vs. Agentic AI

The underlying models are the same. The difference is tool access and autonomy.

GenAI: you give it input, it gives you output. It writes your email, summarizes your doc, generates code. It's powerful. Millions of people get real value from it daily. But it lives in a box. You are the hands and feet. You copy the output, switch to the right tool, and apply it yourself.

An agent has hands and feet. It can act on your environment. It reads your files, runs commands, hits APIs, observes the results, adjusts, and keeps going. You can ask it the same question ("why is this test failing?") but the difference is what happens next. Without an agent, you copy the error, paste it into a chat, get a guess based on the error message alone, then go back and copy more context (the test file, the function, the recent diff) and go back and forth narrowing it down. You're the one assembling the picture. With an agent, it reads the error, checks the git log, looks at what changed recently, reads the diff, and comes back with: "this test is failing because yesterday's PR changed the response schema but didn't update the assertion." It assembled the context itself.

The Stage 0 → 1 jump isn't smarter AI. It's giving AI the ability to act.

The Stages

Stage 0: GenAI Chat

Everyone knows ChatGPT, Claude, Gemini, Grok. They use it for email drafting, summarizing docs, brainstorming, explaining concepts, generating code snippets. This is real, daily value. It's not a toy.

But the human is the integration layer. You copy from the chat, switch to your IDE or terminal or browser, paste, adjust, apply.

Not agentic. But not worthless. This is the foundation of AI literacy that makes everything else possible.

Stage 1: Human + Agent

Every question gathers context. That's where it gets interesting.

An engineer installs Claude Code, Cursor, Copilot, or similar. The agent runs on their machine, uses their credentials, and can act on their environment: read files, run commands, execute code, interact with APIs.

The interaction is a loop, not a one-shot. You ask a question. The agent gathers context to answer it: reads files, checks git history, looks at configs. That answer surfaces something you didn't know, so you ask a follow-up. The agent gathers more. Each question deepens the picture for both of you. By the time you get to "ok do it," the agent has assembled a rich understanding of the problem that neither of you had at the start.

The adoption arc is natural: use common skills first, refine them, then build new ones. Skills are reusable instruction files (a SKILL.md) that teach an agent how to perform a specific task. They follow an open standard that works across Claude Code, Cursor, Gemini CLI, Codex CLI, and other agents. Nobody starts from scratch. You pick up a skill that already exists. You run it. It mostly works. When it doesn't, you fix it for your context and push the improvement.

One example is the Compound Engineering Plugin. You can go watch the video about it here.

The real value emerges when the team dynamic kicks in.

The custom skill feedback loop

Engineer A creates a new skill that is very specific to your organization. It works for their task in a specific context.
Engineer B picks it up, runs it against a different context.
Different context exposes edge cases or opportunities to improve it.
Engineer B fixes the bug or improves it and pushes the change.
Engineer C refines the output format for their workflow.
Eventually, someone encounters a gap. No skill exists for it. They build a new one.
The cycle repeats: use → refine → build.

The craft feedback loop

It's not just the skills that get refined. The team gets better at operating agents. One engineer uses /ce:plan to scope work. A teammate discovers /ce:deepen-plan works better for complex features. Someone else figures out that reviewing a plan across multiple models before writing code catches design-level blind spots. CCL. This knowledge (when to use which command, which plugin, which workflow) compounds across the team and keeps evolving as the tools evolve.

The agent is the execution layer. Humans are the quality layer. Skills get hardened and operating knowledge deepens through distributed use, not through formal training programs. Every teammate who works with agents in a different context is expanding the team's collective understanding of how to use them effectively.

The access model is simple:

👤 Engineer on their machine
    ↕
🤖 Agent (Claude Code / Cursor / Copilot etc)
    ↕
Engineer's own credentials — agent sees what they see
Skills shared via git repo — no new infrastructure required

You don't need to mandate this. If the tooling is right and skills are shareable, cross-pollination happens organically. The manager's job isn't to build a program. It's to not block it.

Stage 2: Knowledge Architecture

The shift from organic sharing to intentional system.

The team formalizes how context gets created, structured, and maintained, so that both humans and agents can operate with full context from day one.

This is not "writing documentation." This is externalizing institutional knowledge: the why behind every decision, the dependencies between systems, the things that will break if you change something without understanding the history. This knowledge lives in people's heads. Stage 2 gets it into the repo, structured and discoverable, so it works identically for a new hire and a new agent session.

What a Stage 2 repo looks like

This looks like a lot of documentation. The good part about working with AI is you're not hand-writing all of this. You use AI to generate it. But you carefully review it. Your job is to catch the mistakes AI makes and help it not make the same mistake again. That's the same feedback loop from Stage 1 applied to knowledge.

docs/
├── brainstorms/          ← Raw thinking. "We might need X. Here's what requirements could look like."
│   ├── 2026-03-18-ai-hand-coach-requirements.md
│   └── 2026-03-23-pwa-install-guide-requirements.md
├── ideation/             ← Shaped thinking. Structured enough to evaluate, not yet committed.
│   └── 2026-03-27-security-posture-ideation.md
├── plans/                ← Committed decisions. Structured frontmatter, linked to PRs.
│   └── 2026-03-28-001-fix-idor-hand-access-control-plan.md
├── ARCHITECTURE.md       ← How the system fits together
└── DEPLOYMENT_ARCHITECTURE.md
CLAUDE.md                 ← Agent entry point. Rules of engagement.

CLAUDE.md is a markdown file you add to your project root that the agent reads at the start of every session. It sets coding standards, architecture decisions, preferred libraries, review checklists. Everything the agent needs to know to contribute effectively without asking basic questions. Think of it as an onboarding document written for an AI engineer joining your team.

Plan docs have structured context: scope, date, findings, change history:

# Security Audit Findings

**Date:** 2026-02-21
**Scope:** Full repository
**Total Findings:** 39
**Last Updated:** 2026-02-21 (C1 accepted risk, H1→M22 downgrade, Appendix A corrections, M21 added)

An agent dropping into this repo for the first time can:

Read CLAUDE.md → understands the rules of engagement
Read ARCHITECTURE.md → understands how the system fits together
Browse docs/plans/ → sees the history of decisions and their rationale
Check brainstorms/ → understands what's being considered but not committed

The repo is the onboarding. No two-hour walkthrough needed.

The self-reinforcing loop

Plan doc → PR → CLAUDE.md references the plans folder → next agent reads the plans → understands patterns → produces work following the same conventions → creates its own plan doc. Every plan doc makes the next agent interaction better.

What changes at Stage 2 vs Stage 1

Stage 1: good engineers write decent docs because they're good engineers
Stage 2: the team agrees on a structure (the frontmatter schema, the folder conventions, the naming patterns) and that agreement is what makes it compoundable

Context contracts: team agreements about how context is created and maintained so it's consumable by both humans and agents. The team says: "this is how we explain work in this repo, for everyone."

Multi-model review at the plan stage

The plan docs aren't just for humans and agents. They're the artifact you run through multiple frontier models for review before you write a single line of code. Every model has biases shaped by how it was built. Cross-checking a plan across Claude, Codex, and Gemini catches blind spots that any single model would miss. The knowledge architecture enables this: without structured plan docs, there's nothing to review.

Stage 3: Autonomous Agent

The agent operates off the machine. Human supervises.

The agent gets deployed on its own infrastructure: platforms like OpenClaw, A2A framework, or custom setups. It's no longer tethered to a developer's laptop session.

It has access to the same tools as Stage 1 (GitHub, AWS, CI/CD, monitoring, error logs, etc) but no human is actively driving. It picks up tasks, executes against the codebase, creates PRs, flags issues, connects errors to root causes.

This only works because of Stages 1 and 2:

Stage 1 gave it battle-tested skills refined through human feedback loops
Stage 2 gave it a self-describing codebase where the context is discoverable

What this looks like in practice

We run this today. As of March 2026, our production app ships about 30 deploys a month with a two-person team and two AI agents. The humans push to main. The agents open PRs. Every agent PR gets human review before merge. Average cycle time from PR open to production: 13.5 minutes.

Why starting with the tool fails

Most think the tool is the solution. They're buying an autonomous agent when the real work is building the skills, craft, and context that make an agent effective.

They point it at a repo with no CLAUDE.md, no architecture docs, no plan history, no structured context. The agent has tool access but zero understanding. It's like hiring a contractor, handing them repo access, and saying "figure it out." Compare that to handing them a repo where every decision is documented and the context is self-describing.

What changes at the access level

🤖 Agent on its own infrastructure
    ↕
🔧 GitHub · AWS · CI/CD · Monitoring · Slack
    ↕
Agent's own credentials — scoped to what it needs (least privilege)
Human reviews output — PRs, reports, escalations

For smaller orgs, this may be the endgame. A well-scoped agent with good permissions, running autonomously, human reviewing output. Not every company needs multi-agent orchestration.

Stage 4: Multi-Agent Orchestration (Enterprise)

Multiple agents coordinate across organizational boundaries.

This stage exists because enterprise workflows cross permission boundaries that shouldn't be collapsed.

Why does HR have different access than Infrastructure? Why can't Finance modify production? Because segregation of duties exists for compliance, security, and accountability reasons. The same logic applies to agents.

When a workflow requires crossing permission boundaries where no single identity should have full access, that's Stage 4.

New hire onboarding:

HR agent triggers provisioning (has access to employee PII)
IT agent creates accounts and sets up machines (has access to identity systems)
Finance agent allocates budget and approves spend (has access to financial systems)
Security agent sets access policies based on role (has access to IAM)

Four domains. Four permission scopes. Four compliance boundaries. A single agent with all those permissions is a security incident waiting to happen.

Multi-agent isn't "more agents = more advanced." It's that the organizational structure demands it.

What's required: service identities for each agent scoped to their domain, an agent registry, cross-agent communication protocols (A2A protocol), full audit trail on every handoff, governance policies for what decisions require human approval, and an accountability model for autonomous decisions.

Who needs Stage 4: Enterprises with segregation of duties. Companies large enough that cross-team workflows involve different security domains. Not a 20-person startup.

The Skip Problem

Most start with the tool, not the team. The vendor sells the agent. Nobody sells the groundwork that makes the agent useful.

Without Stages 1 and 2, agents operate blind.

No battle-tested skills, no operating craft, no externalized institutional knowledge. The team hasn't learned how to direct agents or developed the judgment for which tools and workflows work in which situations. No architecture docs, no decision history, no dependency mapping between components, systems, and infrastructure. The agent doesn't know that changing the auth service affects the billing pipeline, or that the queue config is tied to three downstream consumers. It doesn't know why something was built a certain way at a certain time, or what will break where.

The result: agents make changes that look correct but silently break things downstream. Changes work in isolation but violate patterns and break dependencies nobody told the agent existed. The codebase degrades in ways that compound over time.

Both feedback loops are non-skippable. At Stage 1, skills get hardened and the team builds operating craft. At Stage 2, the team externalizes institutional knowledge and structures the codebase so agents can discover context. At Stage 3, the deployed agent runs battle-tested skills against a self-describing codebase, configured by humans who know how to direct agents effectively. Without that history, Stage 3 is just an expensive experiment.

Foundation Layer

1. Enterprise Agreement with Model Providers

Data isn't used for training. The legal side is covered. BAAs, DPAs, data residency as required. As of March 2026, the frontier models are Claude Opus 4.6, Codex GPT-5.4, and Gemini 3.1 Pro. You should have access to multiple, not just one.

2. Use Multiple Models

Every model has biases shaped by how it was built. Cross-check plans and approaches across multiple frontier models before you write code. The cheapest bug to fix is the one you catch at the design stage. The plan docs from Stage 2 are the artifact you run through multiple models for review.

3. Team Agreement on AI Use

What goes into prompts, what doesn't. How AI-generated output gets reviewed. This doesn't need to be heavy. It needs to exist.

Where the Human Sits

Stage	Human Role	Agent Role	Relationship
Stage 0	Does the work	Answers questions	You ask, it talks
Stage 1	Drives and refines	Executes on the machine	Pair programming
Stage 2	Builds the system	Operates within the system	Way-of-working shift
Stage 3	Reviews output	Operates independently	Supervision
Stage 4	Sets governance	Agents coordinate	Policy-driven

Not every company needs Stage 4. Every company needs Stages 1 and 2.

Sanjeev Nithyanandam is the founder of Accelra Technologies, an agentic engineering consultancy in Vancouver. Follow the journey at Ship With Sanjeev.

Agentic AI Evolution Roadmap