Blog
February 24, 2026

AI will transform software engineering. But not all of it.

There’s a prevailing narrative in tech right now that goes something like this: AI coding agents are improving rapidly, models are getting bigger and smarter, and it’s only a matter of time before the big blob of compute steamrollers its way through every software engineering task we can think of. Junior engineers first, then seniors, then architects, then everyone.

I’m genuinely bullish on AI’s impact on software engineering. The progress is real, it’s accelerating, and organizations that aren’t adapting are going to fall behind. But I think the “inevitable steamroller” narrative misunderstands something fundamental about how these systems actually learn - and that misunderstanding is going to lead to bad investment decisions, misplaced expectations, and a lot of wasted money.

The distinction that matters isn’t between easy and hard tasks. It’s between verifiable and non-verifiable ones.

The engine under the hood of AI

The breakthrough behind today’s AI coding agents isn’t just bigger language models. It’s the combination of large language models with reinforcement learning (RL) - the ability to generate code, test it against some objective, and iteratively improve.

This loop is extraordinarily powerful when it works. An agent writes code, runs the tests, sees the failures, adjusts, and tries again. Each cycle tightens the feedback loop. The result is systems that can generate working code, fix bugs, write tests, and handle migrations with increasing reliability.

But here’s what often gets glossed over: this entire loop depends on having a clear, computable signal for what “good” looks like. In RL terms, you need a reward function. And in software engineering, that reward function is straightforward for some tasks and essentially non-existent for others.
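To make this concrete, for a verifiable task the reward function can be as simple as the fraction of tests a candidate passes. Here is a minimal sketch (the task, the tests, and all names are hypothetical, not any specific agent's implementation):

```python
def reward(candidate, tests):
    """Computable signal for 'good': fraction of tests the candidate passes."""
    passed = sum(1 for test in tests if test(candidate))
    return passed / len(tests)

# Hypothetical task: implement absolute value.
tests = [
    lambda f: f(-3) == 3,
    lambda f: f(0) == 0,
    lambda f: f(5) == 5,
]

buggy = lambda x: x                      # fails the negative case
fixed = lambda x: x if x >= 0 else -x

print(reward(buggy, tests))  # 2/3 - partial credit steers the next attempt
print(reward(fixed, tests))  # 1.0 - fully verified
```

The whole RL loop hangs off that one number: when it exists, the agent can climb it; when it doesn't, there is nothing to climb.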

Where AI agents already excel

Let’s give full credit. For verifiable tasks where correctness can be evaluated programmatically, AI agents are already transforming the game. Code generation against a clear spec. Bug fixes with reproducible test cases. Unit test creation. Boilerplate and scaffolding. Data transformations. Dependency upgrades and API migrations.

These tasks share a common trait: you can write a function that checks whether the output is correct. That makes them perfect targets for the LLM + RL loop. The agent generates, evaluates, and improves. More compute, more iterations, better results. The trajectory here is genuinely impressive and I expect continued rapid improvement.
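For a data transformation, for example, the checker can be a handful of property assertions. A minimal sketch, using a made-up camelCase-to-snake_case migration as the spec:

```python
def verify_migration(rows_in, rows_out):
    """Programmatic checker for a hypothetical column-rename migration:
    same row count, and every output row uses exactly the new schema."""
    if len(rows_in) != len(rows_out):
        return False
    expected_keys = {"user_id", "created_at"}
    return all(set(row) == expected_keys for row in rows_out)

legacy = [{"userId": 1, "createdAt": "2026-01-01"}]
migrated = [{"user_id": 1, "created_at": "2026-01-01"}]

print(verify_migration(legacy, migrated))  # True - correctness is decidable
```

Any task that admits a checker like this sits on the automatable end of the spectrum, because the agent can run it thousands of times per hour at near-zero cost.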

If your engineering organization isn’t already putting these capabilities to work on this class of problem, you’re leaving significant productivity on the table.

The verifiability wall for AI agents

Now consider a different class of task. A senior engineer deciding whether to decompose a monolith into microservices. A tech lead choosing between three viable architectural patterns, each with different tradeoff profiles that will only become apparent over the next eighteen months. A staff engineer writing a design document that needs to communicate intent, constraints, and unknowns to a cross-functional audience. An incident commander making triage decisions under pressure with incomplete information.

These tasks don’t have a clean reward signal. You can’t run a test suite against an architectural decision. There’s no compiler for “was this the right abstraction?” The feedback is delayed, noisy, context-dependent, and often subjective. Two experienced engineers can reasonably disagree about the right call, and both can be right depending on how the future unfolds.

This is where the steamroller narrative breaks down. It’s not that these tasks are “harder” in some abstract sense; it’s that the fundamental mechanism driving AI agent improvement (generate, evaluate, improve) doesn’t apply in the same way. You can’t RL your way to better judgment when you can’t define the reward function.

Throwing more compute at this problem is like turning up the volume on a radio that’s not tuned to a station. More power, same static.

The dangerous middle ground for AI productivity

The riskiest territory isn’t either extreme. It’s the tasks that look verifiable but aren’t.

An AI agent can generate a system architecture that compiles, passes tests, and even performs well under synthetic benchmarks, but is fundamentally wrong for the organization’s trajectory. It can suggest a refactoring that passes every test but makes the codebase harder to reason about.

These are the cases where overconfidence in AI output creates real risk. The system produces something that clears every automated check but fails the checks that only exist in experienced human judgment. And because the output looks correct - it compiles, it passes, it deploys - the failure mode is silent.

As entrepreneur and software developer Wes McKinney observed in a recent blog post, the hardest part of programming isn’t programming - it’s containing mission creep and ensuring the architecture being introduced is actually fit for the system’s future direction. Get that wrong, and you create development debt at agentic speed.

A more useful mental model of AI

Rather than asking “will AI replace software engineers?”, a more productive question is: “for each class of task in my engineering organization, how verifiable is the output?”

On one end: tasks with deterministic, fast feedback loops. Automate aggressively. Deploy agents. Measure the gains. This is real, it’s here, and it’s significant.

On the other end: tasks where the “right answer” depends on organizational context, evolves over time, and involves tradeoffs between competing goods. Here, AI is a powerful thinking partner and force multiplier, but the idea that it will autonomously handle these tasks end-to-end is a category error based on a misunderstanding of the underlying technology.

In the middle: tasks that require careful orchestration of human judgment and AI capability, with clear handoff points and appropriate human oversight.

What this approach to AI tooling means for leaders

The organizations that will get the most from AI in software engineering are the ones that resist the temptation to treat it as a monolithic force and instead think carefully about the verifiability spectrum of their work.

That means investing aggressively in AI tooling for the verifiable end of the spectrum. It means building the orchestration capability to coordinate human and AI contributions across the messy middle. And it means recognizing that senior engineering judgment — the ability to navigate ambiguity, make contextual tradeoffs, and communicate complex decisions — isn’t just surviving the AI transition. It’s becoming more valuable, not less.

The big blob of compute is immensely powerful. But power without a target is just heat. The organizations that understand where to point it — and where to complement it with human judgment — are the ones that will actually capture the value.

What we are doing with AI at Cutover

At Cutover, we are putting these principles into practice by mapping our internal development efforts directly onto this verifiability spectrum. Where the feedback loop is deterministic, we automate aggressively, applying AI to tasks like boilerplate generation, unit testing, and dependency management, where success is programmatically verifiable. For the messy middle, we have built orchestration workflows that treat AI as a high-speed thinking partner while maintaining clear human handoffs to verify outputs. By offloading the verifiable to the “big blob of compute,” we are not replacing our engineers; we are elevating them. This focus keeps our human talent on the highest-value, non-verifiable decisions: navigating the complex trade-offs and organizational context that define the future of our platform.

If you would like to learn more about what we are doing at Cutover, feel free to contact us.

Ky Nichol
CEO