Blog
April 17, 2026

What the Claude Code leak can teach us about automation

On 31 March, Anthropic accidentally shipped roughly half a million lines of Claude Code's source code to the world. A source-map file got bundled into an npm release, pointed at a zip of the original TypeScript, and within hours, the internet was mirroring it across GitHub faster than the DMCA takedowns could land. Boris Cherny (the guy who actually built Claude Code) went on X and said:

"In this case, there was a manual deploy step that should have been better automated."

There is a very obvious, very lazy reading of this: "haha, even the AI people can't automate their own deploys." And then there's what I think is actually interesting, which is: of course there was a manual step. There's a manual step in every pipeline that matters, including Claude Code's, and the reason is not laziness. It's that somebody, somewhere, probably looked at the trade-off and decided that on this particular edge of this particular system, a slow, error-prone human was still the less bad option than a fast, confident machine.

Even Anthropic doesn’t automate everything

Anthropic is probably the company with the strongest commercial incentive on Earth to automate software delivery. Claude Code is roughly fourteen months old as a product and already doing $2.5 billion of run-rate revenue. Cherny has said publicly that Claude Code writes nearly 100% of its own code. If anyone in the industry was going to have a pipeline with zero human touchpoints, surely it would be them?

So, have they got tech debt in their delivery pipeline? In a product that is barely out of its first birthday? At a company that is three years old? Running on modern infrastructure built by people who could, in theory (and according to the media), have AI write the automation for them over a long weekend?

The question is, "why?" and the answer is not "they hadn't got to it yet." The answer is that, at some point, an engineer, likely tired after an incident, made a call that looked like this: "the automation we'd need to write here is complicated enough, and the edge cases are weird enough, and the failure modes are subtle enough, that I am more afraid of a bad piece of automation silently doing the wrong thing than I am of a human occasionally forgetting a step." And so they kept the human in the loop. Deterministic, error-prone, understandable: you can easily ask the human what they did and why.

That is not a failure of engineering. The job has always been to decide whether determinism-with-human-fallibility or automation-with-machine-confidence gives you the better failure surface. Most of the time, you automate. Sometimes you don't.

The Claude Code leak is the "sometimes you don't" case where the call turned out to be wrong in hindsight — or rather, turned out to be right until the day it wasn't, which is how these things always go.

For me, the right fix is better automation for that specific step, not "automate everything" as a slogan.

When human error causes catastrophic failure: Don’t replace the humans, upgrade them

If you've never read the report on the Tenerife air disaster, it's worth your time. In 1977, two 747s collided on a runway in the Canary Islands and 583 people died. It's still the worst accident in aviation history. If you come to it expecting a simple cause, you will find one: the root cause is almost embarrassingly banal — the KLM captain started his takeoff roll without clearance.

Don't take off without clearance. Lesson learned.

Except that's not the lesson at all, because the interesting question isn't what the captain did, it's why a man with more than ten thousand flight hours, who was literally KLM's chief 747 instructor — the guy in the airline's print adverts — did it. And the answer is dozens of edge cases stacking up on each other in a way that no individual factor predicts:

  • A terrorist bomb at the intended destination had diverted both planes to a regional airport that couldn't really handle 747s.
  • Fog rolled in so thick that the tower couldn't see the runway and the two aircraft couldn't see each other.
  • Dutch duty-hour rules had just been tightened, so the KLM crew was under real legal pressure to get wheels-up before a cutoff.
  • Radio transmissions were stepping on each other — two people keyed their mics at the same moment and what the captain heard was garbled into something that sounded like clearance.
  • The first officer and flight engineer both had doubts but didn't push back hard, partly because the captain was a celebrity inside the airline — and in the first officer's case, partly because the captain was literally the man who had signed off his own 747 qualification about two months earlier. Try telling the guy who certified you last quarter that he's making a mistake.
  • The Pan Am crew had missed their assigned taxiway exit and were still on the runway when they weren't supposed to be.

None of those factors on its own causes a crash. All of them together, on the same afternoon, at the same airport, cause the worst aviation disaster in history. That's the shape of the real problem: it's combinatorial, and the individual ingredients look fine right up until they don't.

So, what's the reason I'm dragging Tenerife and aviation into a post about deploy pipelines? Human judgment failed on that runway, and human judgment is still probably more flexible than any piece of software we could have put in its place. The post-Tenerife fix wasn't "replace the captain with an autopilot," it was Crew Resource Management — training the humans to communicate better, flattening the cockpit hierarchy so the first officer would actually say "Captain, we don't have clearance," standardizing phraseology so "OK" could never again be mistaken for "cleared for takeoff." The industry automated the bits that benefited from automation (ground radar, TCAS, autoland) and upgraded the humans on the bits that didn't. Both, in sequence, on purpose.

This is the move I think a lot of enterprise AI strategy is currently missing.

Do-nothing scripting: The road from manual slog to automation

There's a beautiful small idea from incident management veteran Dan Slimmon that I think every enterprise automation team should have tattooed on their wrist. He calls it do-nothing scripting, and it's exactly the bridge between "we do this by hand" and "a robot does this" that most teams never build.

The idea is this: you take a manual procedure — the kind of 15-step slog that lives in a Confluence page nobody trusts — and you write a script that doesn't actually do any of the steps. It just prints them out, one at a time, and waits for a human to press Enter before showing the next one. That's it. That's the whole pattern.
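A minimal sketch of the pattern in Python might look like this. The step names and instructions are invented for illustration; a real script would mirror your actual runbook:

```python
# Do-nothing script: each step is a function that only tells the
# human what to do. Nothing here performs the work itself.

def rotate_keys():
    print("Step 1: Rotate the deploy keys in the secrets manager.")

def bump_version():
    print("Step 2: Bump the version number in package.json.")

def publish():
    print("Step 3: Run the publish command and paste the output in #releases.")

def run(steps=(rotate_keys, bump_version, publish), ask=input):
    # Walk the operator through the procedure, one step at a time.
    # `ask` is injectable so the script can run non-interactively in tests.
    for step in steps:
        step()
        ask("Press Enter when done... ")
    print("Procedure complete.")
```

Calling `run()` narrates the procedure and blocks on Enter between steps. The payoff is in the shape: the day step two is safe to automate, `bump_version` gains a body that does the work, and nothing else about the procedure changes.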

At first glance, it looks pointless. You haven't saved anyone any work; you've just turned a document into a program that narrates the document back at you. But it does three things that matter enormously:

  1. It stops you losing your place. In a 15-step procedure run at 2am by a tired on-call engineer, step-skipping is one of the most common failure modes. The Claude Code leak was, at root, a version of this — a manual step that needed to happen and didn't. A do-nothing script makes skipping physically harder.
  2. Every step is now a function. Which means the day you decide that step three is safe to automate, you replace the print("go do this thing") with actual code — and nothing else about the procedure changes. The humans who run it don't even need to know. You've created a joint in the system where automation can land incrementally, instead of demanding a big-bang "let's automate the whole pipeline" project that never ships.
  3. It gives you data. Because the script is now the canonical way to run the procedure, you can log how long each step takes, which steps get skipped, which steps fail, and which steps people complain about. That's the data you need to decide where to move the boundaries next. Without it, you're just guessing which joints to automate, and guessing is how you end up automating the wrong ones and leaving the Claude-Code-leak-shaped ones manual.
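That third point is cheap to get at. Here's a hedged sketch of the same idea with timing bolted on, assuming JSON lines are an acceptable log format (the step names are again invented):

```python
import json
import time

def timed_run(steps, ask=input, log=print):
    # Run (name, instruction) pairs do-nothing style, recording how
    # long the human took on each step as a JSON line.
    for name, instruction in steps:
        start = time.monotonic()
        print(instruction)
        ask("Press Enter when done... ")
        log(json.dumps({"step": name,
                        "seconds": round(time.monotonic() - start, 1)}))

# Hypothetical procedure for illustration:
RELEASE_STEPS = [
    ("bump_version", "Bump the version number in package.json."),
    ("publish", "Run the publish command and check the output."),
]
```

A few weeks of those logs tells you which steps dominate wall-clock time and which ones stall, which is exactly the evidence you want when arguing over what to automate next.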

The reason I love this pattern so much is that it takes the whole "human vs automation vs AI" question out of the realm of architectural theology and makes it incremental and reversible. You don't have to know the final mix on day one; you just have to encode the current mix in a form that lets you move one task at a time, watch what happens, and move another. Boring, cheap, reliable, and the thing that actually gets you to remove toil.

The reason I probably love it quite this much is that it's more or less what we build at Cutover for our customers. Cutover is, at its heart, a collection of tasks arranged in a directed graph: some human, some automated, some (increasingly) AI. Because it's software rather than tasks in some other software masquerading as a directed graph, we get to capture the execution data that do-nothing scripting hints at but can't really deliver on its own, such as: 

  • When a task was ready to start 
  • When the task actually started 
  • How long the task took 
  • Whether the task failed 
  • Whether it always fails at 2am on the last Sunday of the month when the batch window collides with it
  • Whether this branch of the graph behaves differently when a particular team runs it

That's the instrumentation layer you need to decide, with evidence rather than vibes, which tasks are ready to move from human to automated, which ones should stay human, and which ones have quietly drifted into being the next Claude-Code-leak-shaped failure waiting to happen. The script-that-does-nothing is the idea. A graph-that-remembers-everything is the version you can actually run and improve a large enterprise on.

System boundaries constantly move

Because you cannot work out the right automation mix from first principles, you have to instrument the process, run it, watch where it fails, and move the boundaries. Maybe a step you thought was safely automated keeps producing weird outputs on Tuesdays, and it turns out there's a monthly batch job that collides with it, and the right fix is to hand that step back to a human for now. Maybe a step you thought needed a human is actually just three if-statements in a trench coat, and a script would do it more reliably than the tired person doing it at 2am. You only know by looking at the data. And the boundaries keep moving, especially in large enterprises, which are not static systems: they are constantly being reorganized, acquired, regulated, and re-platformed underneath you. The automation mix that was right last quarter is not necessarily right this quarter.

The failure mode I keep seeing is treating "automate everything" as a strategy rather than a direction. That is not a strategy. The actual strategy is: automate everything except the things we don't, and have the discipline and data to know — and keep re-checking — which is which.

Don’t automate everything, automate better

So, back to Cherny, because I like his response to the leak.

He didn't say "we'll automate everything." He said that particular step should have been better automated. So let's assume they've made a few improvements, and a couple more are on the way. He named it as a process issue. The mix after the incident is not "all automation" or "all humans" or "all AI." It's a different mix, with the boundaries adjusted in one specific place where the data now says they should be adjusted.

That's sadly the whole game in an enterprise context. It's likely boring, it's definitely iterative, it requires you to actually look at what's happening rather than ship a slide that says "agentic transformation," and you know that it will be wrong again in six months and you will have to move the boundaries again. If you're building an automation strategy right now and your plan doesn't include a mechanism for continuously deciding where the humans go, you don't have a strategy; you have a vibe that's likely a liability disguised as speed.

Find out how Cutover Respond is helping major organizations orchestrate humans, automation, and AI for faster, safer major incident management.

Kieran Gutteridge