If your best engineer disappears for 10 days and the system wobbles, your org was not resilient. It was carried.
I am not writing that as a thought experiment. I am writing it from a failure we lived through.
Our single load-bearing engineer left, and every project in flight stalled for over a week ... not because the team lacked talent, but because the system context left with one person.
That was one of the clearest leadership failures I have ever been part of, and it was not subtle once it happened. I let a system form where speed lived in one person instead of the team. When that person left, engineers across regions were blocked, waiting on context that had only ever lived in one brain.
The Pattern Leaders Keep Complimenting
Most teams do not call this risk. They call it excellence: "She is our best engineer," "He just moves faster than everyone else," and "If anything gets weird, we bring them in."
From the outside, that sounds like high performance. From the inside, it is a single load-bearing wall with a nice performance-review label on it.
We had new engineers asking the same person questions 4 times a day. It looked like mentorship, but it was undocumented dependency. I even told myself we were being efficient. We were not efficient ... we were borrowing stability from one human and calling it process.
If you have read The Phoenix Project, you already know this pattern. It is the Brent Effect ... one indispensable person holding too many critical paths while leadership mistakes heroic recovery for system health. Different decade, same movie, slightly better dashboards.
The Leadership Bench Test
You do not need a six-month initiative to know whether this is true on your team. Run this test on one critical domain this week.
- Coverage check: Can 2 other engineers explain and safely modify this area without the usual owner?
- Recovery check: Can your on-call rotation recover core flows here without escalating to the same name every time?
- Transfer check: Can a new engineer ship a meaningful change in 30 days using your existing docs and standards?
If any answer is no, you do not have a people problem first. You have a system design problem in your leadership model.
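If you want a rough first signal before you sit down with the team, the coverage check can be approximated from commit history. What follows is a minimal sketch, not a tool we shipped: the function name `coverage_check`, the `min_commits` threshold, and the sample names are all hypothetical, and commit counts are only a weak proxy for "can explain and safely modify." Treat a failing result as a reason to go ask the question in person, not as the answer.

```python
from collections import Counter


def coverage_check(commit_authors, usual_owner, min_backups=2, min_commits=3):
    """Rough proxy for the coverage check on one critical area.

    commit_authors: one author name per commit that touched the area.
    usual_owner:    the person everyone escalates to today.
    Returns (passes, backups), where backups are engineers other than
    the usual owner with at least min_commits commits in the area.
    """
    # Count commits per author, ignoring the usual owner entirely.
    counts = Counter(a for a in commit_authors if a != usual_owner)
    # Only engineers with repeated, hands-on changes count as backups.
    backups = sorted(a for a, n in counts.items() if n >= min_commits)
    return len(backups) >= min_backups, backups


# Hypothetical commit history for one load-bearing service.
history = ["brent"] * 40 + ["ana"] * 5 + ["raj"] * 2
passes, backups = coverage_check(history, usual_owner="brent")
print(passes, backups)  # False ['ana']
```

In this made-up history only one other engineer clears the bar, so the area fails the check. The thresholds are assumptions to tune per team; the point is to make the single-name dependency visible before someone's PTO does it for you.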
What We Changed After It Broke
We stopped pretending heroics were a strategy. We required context in pull requests ... not just what changed, but why, what was considered, and what would break if assumptions were wrong. We made runbooks non-negotiable for critical systems and treated stale runbooks like production risk. We rotated ownership on load-bearing services even when the original owner could do it faster. We wrote ADRs (architecture decision records) so major decisions stopped living in hallway conversations. Turns out "ask Brent" is not a scalable architecture pattern.
Did this slow us down in the short term? Yes. Did it make us dramatically more resilient 2 quarters later? Also yes. That is the trade most teams avoid until the bill arrives.
If your bench strategy is "pray nobody takes PTO in Q4," that is not strategy. That is crossed fingers in a leadership costume.
The Rule
Your best engineer should be an accelerator, not a life-support machine.
If one person can pause your roadmap, your first priority this quarter is not shipping faster ... it is distributing context.
The AI Leadership Audit maps this directly in the operating model chapter most teams skip.