Your AI Rollout Didn't Create Inconsistency. It Revealed It.

When AI scales faster than your engineering culture, the codebase tells you the truth.

Black-and-white illustration of a bearded man in a baseball cap with three thick arrows behind his head pointing up, left, and right, suggesting diverging paths and inconsistency.

The PRs were bigger. Review times were flat. And somewhere in week six of the rollout, I started noticing the same problem solved three different ways across three different services. Try/catch blocks blanketing everything in one service. A custom logging wrapper nobody else knew existed in another. Home-built retry logic in a third that duplicated what was sitting two directories over. All technically valid. All completely inconsistent. All generated, at least in part, by the same AI tools we had just given the team.

My first instinct was to fix the AI.

I started thinking about guardrails in Cursor. Tighter prompt guidelines. Review checkpoints before AI-generated code could make it into a PR. We had rolled the tools out without enough constraints, and the codebase was showing it. That felt like a clean diagnosis. The kind where the fix is scoped and technical and doesn't require any uncomfortable conversations.

Then I pulled up the git history.

The inconsistency predated our AI rollout by two years. The try/catch patterns were already diverging before we installed Copilot. The logging wrappers had been proliferating since the last re-org. The retry logic was home-built in multiple services because engineers had been solving the same problem independently, without knowing someone else had already solved it. The shared decision about error handling that we thought we had made in Q2 of the prior year existed in one person's memory and nowhere else.

The AI hadn't created any of this. It had just started moving faster than we had. And when a tool that generates code all day is working on a codebase that never agreed on anything, it scales inconsistency instead of coherence.

I called it a junk drawer with a CI/CD pipeline. The description was accurate. And so was the part I hadn't wanted to admit yet ... we had built the junk drawer ourselves, long before the AI arrived.

What the Industry Is Reporting

This is not a story unique to my team. Engineering teams adopting GitHub Copilot and Cursor without shared architectural standards are reporting the same pattern. More code per day. More inconsistency per sprint. The output is higher, the review burden is higher, and the codebase is drifting faster than it used to because the tool that's supposed to help is working from whatever looked plausible, not from what the team decided months ago in a meeting nobody documented.

The tool is not confused. It is not generating bad code. It is doing exactly what it was designed to do ... find the pattern that looks correct and produce the next token. If your codebase has five patterns for the same problem, the tool treats all five as valid. Because to it, they are. There is no judgment in the model about which approach was the canonical one your team agreed on, because you never made that agreement in a durable form.

AI generates technically valid solutions. It cannot generate coherence that was never there.

What AI adoption does is force the question your engineering culture has been deferring. Teams with strong shared standards see the tools accelerate consistent output. Teams running on informal alignment and tribal knowledge see the inconsistency they've been accumulating suddenly visible everywhere at once, at a pace no code review cadence can keep up with. The tool doesn't create the inconsistency. It reveals it at scale, all at once, in a way that's hard to attribute to any single PR or any single engineer.

The Misdiagnosis That Almost Cost Us Weeks

I spent nearly a week convinced we had an AI tooling problem. That felt like a reasonable read of the evidence at the time. More AI-generated code, more inconsistency in the output ... the logical conclusion was that the tool needed more constraints.

The problem with that diagnosis is that it would have let me off the hook too easily. If the issue was the AI config, the fix was technical. Adjust the settings, tighten the guardrails, maybe roll back some of the autonomy we had given the tool. Scoped, clean, no ownership required beyond the platform side of the org.

What the git history showed was a human alignment problem wearing an AI costume.

We had made decisions in meetings and lost them because we never wrote them down. We had solved problems multiple times across different services because context didn't travel the way we assumed it did. We had tribal knowledge that was actually siloed knowledge ... one team's understanding of the right approach that had never been formalized into something another team could find and follow. The AI had made all of it visible faster than any code review cadence would have caught it.

The fix had to happen at the human layer before we touched anything else.

What We Actually Did

We fixed it before opening a single AI config file.

The first move was documentation, but not documentation as a feel-good exercise. Documentation as decision-making. We went through the patterns that should be canonical and made the choices we had been deferring. Error handling. Logging. State management. Retry logic. For each one, we had the final argument, picked a winner, and wrote it down. Not in a Confluence page buried under three levels of navigation. In markdown files that lived in the repo alongside the code they governed, where engineers would actually encounter them.
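
As an illustration, one of those repo-resident standards files might look like this. The path, helper name, and wording are hypothetical, a sketch of the shape rather than our actual standard:

```markdown
<!-- docs/standards/retries.md — lives in the repo, next to the code it governs -->
# Retries

**Decision (canonical):** all outbound calls use the shared retry helper in
`common/retry.py`. Do not hand-roll retry loops inside services.

**Why:** multiple services had independently reimplemented backoff, each with
different behavior. One implementation, one place to fix it.

**Enforced by:** lint rules and an architectural test that fail CI when a
service contains its own retry loop.
```

The point of the format is that the decision, the rationale, and the enforcement mechanism sit together where an engineer (or a tool) will actually find them.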

That documentation became the source of truth. Not just for the team's memory, but for the tools.

Once the canonical patterns were written, we built on top of them. Lint rules that enforced the decisions automatically. Architectural tests that would catch drift before it made it past CI. Then we built AI workflows trained on our documented standards, so the tool was generating from what we had decided rather than from whatever looked plausible in the broader training data.

The sequence matters. You cannot train AI on standards you haven't written. You cannot enforce standards you haven't decided. And you cannot decide standards during a crisis by skipping the argument that produces actual alignment. We did all three steps in order, and it made the last step work in a way that jumping straight to guardrails never would have.

The codebase stabilized within weeks.

What the Junk Drawer Was Actually Telling Us

What I keep coming back to is how long it had been there before the AI made it undeniable.

We thought we had consistent engineering practices. We had consistent intentions. The engineers were strong. The individual code quality was fine. But the practices were mostly in our heads, unwritten and unverifiable, passed through conversations that didn't always reach everyone who needed them. The system looked coherent because nobody was moving fast enough to expose the gaps. A slow, carefully reviewed codebase can sustain informal alignment for years before the seams show.

The AI moved fast enough.

The AI rollout didn't break anything that was working. It found what had been quietly not working. The debt we had accumulated in unwritten decisions and informal context was always there. The moment we put a tool on the team that accelerates output by two or three times, that debt came due all at once. Every gap in our shared understanding surfaced simultaneously in the form of valid code that didn't belong together.

This is why the instinct to blame the tool is so easy and so wrong. The tool is honest. It works from what exists. If what exists is a junk drawer, the tool will generate from a junk drawer with remarkable efficiency. You cannot fix that by adjusting the tool's behavior. You can only fix it by becoming the kind of team that a good tool can work from.

For leaders looking at inconsistent AI output right now, the question worth sitting with is not how to constrain the tool. The question is what the output is telling you about the standards you thought you had.

AI doesn't create chaos in engineering teams. It inherits theirs.