You Used to Know Good Code

Proximity built that instinct. Drift is spending it.


There is a specific sensation that happens when you open a pull request with fourteen hundred lines of diff. You register the size in the first few seconds. Your brain does a quiet calculation about what it would actually take to review this ... tracing the data model down through the service layers, checking whether the error handling covers the edge cases that show up in production rather than just the ones in the tests, evaluating whether the pattern decision buried in line six hundred is going to hold under load six months from now. And before you've done any of that, you feel the pull toward Approve.

I have felt that pull. More than once, I didn't fight it.

That pull used to be rarer. Not because I had more discipline, but because the PRs were smaller. When a developer scoped a PR to a manageable slice of work, the review was tractable. You could hold the whole thing in working memory. You could read it rather than skim it. The difference between reading a pull request and skimming one is the difference between evaluating the thinking and approving the output. They look identical in the audit trail. They are not the same thing.

The instinct I'm talking about runs deeper than what any linter catches. Whether the error handling covers the real edge cases, whether the data model will hold under production load, whether the approach behind the code is sound or whether it's generated-looking logic stitched together in a way that will work right up until it doesn't. That instinct comes from proximity. From having been wrong about these things enough times to recognize the smell before the failure.

When that instinct drifts, you don't lose it all at once. Approaches that shouldn't look reasonable just start to.

What the Dashboard Showed

I built a dashboard I call game tape. Engineering is a team sport and you can't coach what you can't see. The dashboard tracks PR cycle times, review patterns, where work slows down and where it moves. Not a performance proxy. A window into how the system is actually functioning underneath the sprint velocity numbers.

When I looked at the data, one number stopped me. Pull requests with over a thousand lines were taking an average of only five minutes longer to review than the 250-line ones. Five minutes sounds like a rounding error. It isn't. Five minutes is the difference between a reviewer who actually read the PR and a reviewer who got the gist of it. It's the gap between the review that catches the thing buried in the middle and the review that approves the shape of the work without evaluating the thinking behind it.

And the same one or two engineers were handling almost all of the large reviews.
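The dashboard itself isn't shown here, so the record shape, names, and numbers below are hypothetical stand-ins. But the cut that surfaced the finding, average review time by PR size bucket, plus who absorbs the large reviews, can be sketched in a few lines:

```python
from collections import Counter
from statistics import mean

# Hypothetical PR records: (lines_changed, review_minutes, reviewer).
# Real data would come from your Git host's API, not a literal list.
prs = [
    (180, 22, "maya"), (240, 30, "sam"), (260, 28, "maya"),
    (1100, 31, "maya"), (1400, 34, "maya"), (1250, 33, "sam"),
]

def avg_review_minutes(prs, lo, hi):
    """Average review time for PRs whose diff size falls in [lo, hi)."""
    times = [mins for lines, mins, _ in prs if lo <= lines < hi]
    return mean(times) if times else None

small = avg_review_minutes(prs, 0, 300)        # the ~250-line PRs
large = avg_review_minutes(prs, 1000, 100_000)  # the 1000+ line PRs

# Who is actually absorbing the large reviews?
large_reviewers = Counter(r for lines, _, r in prs if lines >= 1000)

print(f"small: {small:.1f} min, large: {large:.1f} min, gap: {large - small:.1f} min")
print(large_reviewers.most_common())
```

The gap is the tell: four to five times the code for only a few extra minutes of review means the large PRs are being skimmed, and the `Counter` shows the skimming is concentrated on one or two people.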

The reason PRs got that large was AI. Not as a failure of the tools but as a predictable consequence of how people use them. AI makes output faster. Faster output means more code per session. More code per session means larger PRs when the culture hasn't set a standard for what a reviewable unit of work actually is. The tooling scaled the volume. The review process didn't scale with it.

PRs got bigger because AI made them bigger. Reviews didn't scale with them. And the leaders who should have been the quality backstop had been away from the code for long enough that they couldn't tell a sound pattern from a generated one.

The problem compounds when you add the leadership layer. A PR that's too large for a careful review is still just a workflow problem when senior engineers are reviewing it. It becomes a quality problem when the leaders who should be the backstop have drifted far enough from the codebase that they've lost the reference point for what good looks like in their own system.

Losing the Reference Point

Here is what technical drift actually feels like from the inside. It doesn't arrive as a discrete moment where you recognize your judgment is gone. It arrives as a slow softening. A PR that would have raised a flag six months ago starts to look reasonable. An approach that should have a question attached to it starts to look like a sensible choice. When every approach starts to look reasonable, that's the signal that the reference point has gone soft, not that the team's judgment has gotten sharper.

I watch for this in myself, looking for the specific signals that tell me the thread has gone slack ... the 1:1s that quietly become status reports because I've started asking "what's going on" instead of "walk me through the approach," the technical explanation I'm nodding through without fully following, the stretches where I go too long without being genuinely surprised by something in the codebase. Any of those and I know I've started to drift in ways that are slow to show up and fast to compound.

The tricky part about drift is that it doesn't announce itself. And the leaders who have drifted the furthest are usually the last to know it, because they stopped having the conversations that would surface the gap. They're not in the PRs. They're not in the logs. They're in the strategy conversations where the questions are about direction rather than implementation, and everything still looks like it's moving.

Setting the Standard

When I recognized the pull-toward-approve pattern in myself, I didn't add a process. I changed the standard.

PRs on my team need to be scoped to a vertical slice ... small enough that a reviewer can hold the whole thing in working memory and actually evaluate the thinking, not just approve the output. When a PR comes in too large, I've been known to close it and work with the developer to break it down. Not as a punitive move. As a forcing function for what a reviewable unit of work actually is.
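A standard like this can also be enforced mechanically, before a human ever opens the diff. The threshold and the `git` invocation below are assumptions for illustration, not my actual tooling; a minimal pre-review size check might look like:

```python
import subprocess

MAX_LINES = 400  # hypothetical threshold for a reviewable vertical slice

def count_changed_lines(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        # Binary files show "-" for both counts; skip them.
        if added != "-":
            total += int(added) + int(deleted)
    return total

def pr_size(base: str = "main") -> int:
    """Diff size of the current branch against its merge base with `base`."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return count_changed_lines(out)

# Example with canned numstat output, so this sketch runs without a repo:
sample = "12\t4\tservice/orders.py\n-\t-\tassets/logo.png\n380\t60\tservice/models.py\n"
size = count_changed_lines(sample)
print(f"{size} lines changed; {'split it' if size > MAX_LINES else 'reviewable'}")
```

A check like this doesn't replace the conversation about what a good slice looks like. It just makes the standard visible at the moment the PR is opened instead of at the moment a reviewer gives up on reading it.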

Having that conversation requires knowing the system. You can't give useful feedback on how to slice a PR if you don't know the codebase well enough to know what a good slice looks like. That context doesn't stay current on its own. You have to read the PRs, get into the logs, let yourself be surprised by the codebase sometimes. Not because your job is to approve every line. Because the gap between what you think you know about the system and what's actually happening is where drift gets comfortable and starts making decisions on your behalf.

I still read pull requests. Not to check syntax. To understand how the team is thinking, whether the patterns we aligned on are holding under real delivery pressure, where connections are being made across the codebase and where they're not. That proximity is what keeps the reference point calibrated.


The Retreat to Strategy

There is a particular move I've watched happen, and one I've caught myself making. When technical leaders lose confidence in their technical judgment, they retreat upward. Strategy. Roadmap. Stakeholder alignment. The conversations where the currency is narrative and the questions are about direction rather than implementation. It looks like leadership maturity. Sometimes it is. Often it's a move away from the part of the job that started to feel unfamiliar.

The tells are specific. A leader who used to have opinions in architecture reviews starts agreeing with the room. A leader who used to come to a PR with a real read starts asking the engineer whether it looks good to them. A leader who used to know the system well enough to spot a smell before it became a failure starts routing everything through the people who still know the code.

Technical credibility doesn't disappear in a day. It disappears through accumulated small moves, each of which looked reasonable at the time. You skip the PR review because the sprint is moving. You defer to the engineer because they're closer to it. You let the strategy conversation run long and don't make it to the standup.

The engineers notice before the leader does. They always do. They can tell when the review is a skim. They can tell when the question in the 1:1 is covering for a gap in context.

Most leaders would rather retreat to strategy, where nobody can prove them wrong with a stack trace.

The codebase keeps score anyway.