Tuesday, March 24, 2026

The diff is where the learning happens: Why AI development needs hands-on engineering

Everyone is talking about AI-assisted development. Benchmarks. Productivity gains. 10x engineers. Code written in seconds.

What gets less attention is what happens after the AI writes the code.

At Scality, we have learned that the real improvement does not happen in the prompt. It happens in the diff — the space between what the AI generates and what the engineer ultimately decides to ship.

AI doesn’t improve on its own. It improves when someone studies the output, questions it, and feeds the lessons back into the system.

Looking at the actual work

As VP of Engineering, I oversee eight engineering teams working on distributed storage systems across multiple languages and technologies, including C, Go, Python, JavaScript, and React. Our work spans everything from low-level hardware management to the management interface that customers use every day. My role includes maintaining performance standards and developing our engineering leaders.

Over the past year, as AI tools have become part of our workflow, I have leaned into a principle borrowed from manufacturing practice: gemba.

Several times per week, I sit with an engineer and their manager. We do not review slides. We do not debate roadmap priorities. We examine the actual work:

  • The real prompts
  • The generated code
  • The final version that ships

And most importantly, the diff between what the AI produced and what the engineer ultimately decides to ship. If you want to understand how AI is affecting engineering practice, that’s where the signal lives.
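One lightweight way to surface that signal is to diff the AI's draft against the shipped version. A minimal sketch, assuming both versions have been saved; the `retry` function and file names here are hypothetical, not actual Scality code:

```python
import difflib

# Hypothetical AI draft: extra parameters and a broad except clause.
ai_draft = """\
def retry(fn, attempts=5, backoff=2.0, jitter=True, logger=None):
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if logger:
                logger.warning("retry %d failed: %s", i, exc)
"""

# Hypothetical shipped version: simplified to what the system needs.
shipped = """\
def retry(fn, attempts=3):
    for i in range(attempts):
        try:
            return fn()
        except OSError:
            continue
    raise RuntimeError("all retries failed")
"""

# The diff between the two versions is the learning artifact.
diff_text = "".join(difflib.unified_diff(
    ai_draft.splitlines(keepends=True),
    shipped.splitlines(keepends=True),
    fromfile="ai_draft.py",
    tofile="shipped.py",
))
print(diff_text)
```

Every removed parameter and narrowed exception in that output is a judgment call worth discussing in a review session.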

Gemba in AI development: The real factory floor is the codebase

In manufacturing, gemba means going to the factory floor — the place where the work actually happens. For engineering teams, the factory floor is the codebase. In an AI-assisted workflow, it also includes prompts, shared configuration files, and the outputs those systems produce.

Leadership in that environment isn’t about approving tools. It’s about staying close enough to the work to understand how those tools behave in practice. 

When you sit with engineers and review diffs together, you start to see judgment in action:

  • Where something was simplified
  • Where a piece of logic disappeared entirely
  • Where an abstraction was rewritten to better fit the system
  • Where the generated code was accepted as-is

Metrics don’t show that level of detail. You only see it when you look directly at the work.

What diffs reveal about AI-generated code

AI coding tools are undeniably useful. They accelerate iteration and remove a lot of mechanical work from the development process. But once you spend time reviewing the diffs, certain patterns become obvious.

AI often over-engineers. It introduces abstractions that aren’t necessary for the specific system, or repeats logic across layers where a human engineer would consolidate it. Sometimes the generated code technically works and even passes tests, but it still isn’t production-ready.
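A hypothetical illustration of that over-engineering pattern, not drawn from any real diff: a generated version introduces a strategy hierarchy and factory for a system that only ever serializes one way, and the shipped version collapses it to a direct call.

```python
import json

# Hypothetical generated code: an abstraction the system does not need.
class SerializerStrategy:
    def serialize(self, obj):
        raise NotImplementedError

class JsonSerializerStrategy(SerializerStrategy):
    def serialize(self, obj):
        return json.dumps(obj)

class SerializerFactory:
    @staticmethod
    def create(kind="json"):
        return JsonSerializerStrategy()

generated_payload = SerializerFactory.create().serialize({"ok": True})

# What an engineer might ship instead: the system only needs JSON here.
shipped_payload = json.dumps({"ok": True})
```

Both versions work and would pass the same tests; the diff records why the simpler one fits the system better.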

None of this is surprising. The model optimizes for general solutions while engineers optimize for context. The tension between those two perspectives is where a lot of engineering knowledge lives.

Why the diff matters more than the prompt

The first output is not the artifact. The meaningful artifact is the difference between the AI output and the final engineer-reviewed version. That difference contains architectural judgment, performance tradeoffs, conventions learned through production experience, and the discipline that keeps systems reliable.

If you don’t study that gap, you miss the signal.

And if the reasoning behind those changes never gets captured, the organization doesn’t really improve. The code improves, but the system that generates the code does not.

Engineers still own the feedback loop

AI does not own the feedback loop; engineers do. AI cannot evaluate or correct itself. It has no judgment and no accountability. When AI generates code that needs changes, the engineer is the one who cares about the consequences.

But the real feedback is not just fixing the code. It is asking why the AI generated it that way and then updating the system that shapes the output. That might mean refining shared prompt frameworks, updating configuration files, strengthening architectural guidance, or codifying standards that repeatedly appear in diffs.

In other words, humans don’t just fix the code. They improve the system that generates it.

Over time, when recurring patterns from diffs are fed back into prompts, configurations, and engineering standards, the tools become more useful, but they remain tools inside a human-owned system.
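That feedback loop can be as simple as counting why corrections happen. A sketch with invented category names and counts, purely to show the mechanic of turning repeated corrections into candidate standards:

```python
from collections import Counter

# Hypothetical review log: each entry tags why a generated change was rewritten.
review_log = [
    "unnecessary-abstraction",
    "duplicated-logic",
    "unnecessary-abstraction",
    "missing-error-handling",
    "unnecessary-abstraction",
]

# Corrections that recur past a threshold are candidates for shared
# prompt frameworks, configuration files, or written standards.
counts = Counter(review_log)
recurring = [pattern for pattern, n in counts.most_common() if n >= 3]
print(recurring)
```

The threshold and categories would come from the team's own review sessions; the point is that the tally, not any single fix, is what improves the system.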

AI adoption is an engineering practice, not a tooling decision

Many organizations treat AI adoption as a tooling decision: Which model? Which vendor? Which license?

Those questions matter, but maturity does not come from procurement. It comes from leadership involvement in real workflows: observing friction directly, turning repeated corrections into standards, and improving the engineering system continuously.

AI improves when your engineering system improves, and that requires discipline.

The discipline behind the productivity

AI is powerful. It speeds up iteration and lets engineers try things that would have taken much longer before, but it doesn’t automatically improve how an organization builds software.

Improvement still happens how it always has: through review, through diffs, through the accumulation of small engineering decisions that shape production systems over time.

That mindset shapes how we adopt AI at Scality — the same way we build resilient, high-performance software: through observation, iteration, and disciplined engineering practice, not hype.