Tuesday, March 24, 2026

The diff is where the learning happens: Why AI development needs hands-on engineering

Everyone is talking about AI-assisted development. Benchmarks. Productivity gains. 10x engineers. Code written in seconds.

What gets less attention is what happens after the AI writes the code.

At Scality, we have learned that the real improvement does not happen in the prompt. It happens in the diff — the space between what the AI generates and what the engineer ultimately decides to ship.

AI doesn’t improve on its own. It improves when someone studies the output, questions it, and feeds the lessons back into the system.

Looking at the actual work

As VP of Engineering, I oversee eight engineering teams working on distributed storage systems across multiple languages and technologies, including C, Go, Python, JavaScript, and React. Our work spans everything from low-level hardware management to the management interface that customers use every day. My role includes maintaining performance standards and developing our engineering leaders.

Over the past year, as AI tools have become part of our workflow, I have leaned into a principle borrowed from manufacturing practice: gemba.

Several times per week, I sit with an engineer and their manager. We do not review slides. We do not debate roadmap priorities. We examine the actual work:

  • The real prompts
  • The generated code
  • The final version that ships

And most importantly, the diff between what the AI produced and what the engineer ultimately decides to ship. If you want to understand how AI is affecting engineering practice, that’s where the signal lives.
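One lightweight way to surface that signal is to diff the AI's draft against the shipped version. A minimal sketch, assuming both versions have been saved; the `retry` function and file names here are hypothetical, not actual Scality code:

```python
import difflib

# Hypothetical AI draft: extra parameters and a broad except clause.
ai_draft = """\
def retry(fn, attempts=5, backoff=2.0, jitter=True, logger=None):
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if logger:
                logger.warning("retry %d failed: %s", i, exc)
"""

# Hypothetical shipped version: simplified to what the system needs.
shipped = """\
def retry(fn, attempts=3):
    for i in range(attempts):
        try:
            return fn()
        except OSError:
            continue
    raise RuntimeError("all retries failed")
"""

# The diff between the two versions is the learning artifact.
diff_text = "".join(difflib.unified_diff(
    ai_draft.splitlines(keepends=True),
    shipped.splitlines(keepends=True),
    fromfile="ai_draft.py",
    tofile="shipped.py",
))
print(diff_text)
```

Every removed parameter and narrowed exception in that output is a judgment call worth discussing in a review session.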

Gemba in AI development: The real factory floor is the codebase

In manufacturing, gemba means going to the factory floor — the place where the work actually happens. For engineering teams, the factory floor is the codebase. In an AI-assisted workflow, it also includes prompts, shared configuration files, and the outputs those systems produce.

Leadership in that environment isn’t about approving tools. It’s about staying close enough to the work to understand how those tools behave in practice. 

When you sit with engineers and review diffs together, you start to see judgment in action:

  • Where something was simplified
  • Where a piece of logic disappeared entirely
  • Where an abstraction was rewritten to better fit the system
  • Where the generated code was accepted as-is

Metrics don’t show that level of detail. You only see it when you look directly at the work.

What diffs reveal about AI-generated code

AI coding tools are undeniably useful. They accelerate iteration and remove a lot of mechanical work from the development process. But once you spend time reviewing the diffs, certain patterns become obvious.

AI often over-engineers. It introduces abstractions that aren’t necessary for the specific system, or repeats logic across layers where a human engineer would consolidate it. Sometimes the generated code technically works and even passes tests, but it still isn’t production-ready.
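A hypothetical illustration of that over-engineering pattern, not drawn from any real diff: a generated version introduces a strategy hierarchy and factory for a system that only ever serializes one way, and the shipped version collapses it to a direct call.

```python
import json

# Hypothetical generated code: an abstraction the system does not need.
class SerializerStrategy:
    def serialize(self, obj):
        raise NotImplementedError

class JsonSerializerStrategy(SerializerStrategy):
    def serialize(self, obj):
        return json.dumps(obj)

class SerializerFactory:
    @staticmethod
    def create(kind="json"):
        return JsonSerializerStrategy()

generated_payload = SerializerFactory.create().serialize({"ok": True})

# What an engineer might ship instead: the system only needs JSON here.
shipped_payload = json.dumps({"ok": True})
```

Both versions work and would pass the same tests; the diff records why the simpler one fits the system better.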

None of this is surprising. The model optimizes for general solutions while engineers optimize for context. The tension between those two perspectives is where a lot of engineering knowledge lives.

Why the diff matters more than the prompt

The first output is not the artifact. The meaningful artifact is the difference between the AI output and the final engineer-reviewed version. That difference contains architectural judgment, performance tradeoffs, conventions learned through production experience, and the discipline that keeps systems reliable.

If you don’t study that gap, you miss the signal.

And if the reasoning behind those changes never gets captured, the organization doesn’t really improve. The code improves, but the system that generates the code does not.

Engineers still own the feedback loop

AI does not own the feedback loop; engineers do. AI cannot evaluate or correct itself. It has no judgment and no accountability. When AI generates code that needs changes, the engineer is the one who cares about the consequences.

But the real feedback is not just fixing the code. It is asking why the AI generated it that way and then updating the system that shapes the output. That might mean refining shared prompt frameworks, updating configuration files, strengthening architectural guidance, or codifying standards that repeatedly appear in diffs.

In other words, humans don’t just fix the code. They improve the system that generates it.

Over time, when recurring patterns from diffs are fed back into prompts, configurations, and engineering standards, the tools become more useful, but they remain tools inside a human-owned system.
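That feedback loop can be as simple as counting why corrections happen. A sketch with invented category names and counts, purely to show the mechanic of turning repeated corrections into candidate standards:

```python
from collections import Counter

# Hypothetical review log: each entry tags why a generated change was rewritten.
review_log = [
    "unnecessary-abstraction",
    "duplicated-logic",
    "unnecessary-abstraction",
    "missing-error-handling",
    "unnecessary-abstraction",
]

# Corrections that recur past a threshold are candidates for shared
# prompt frameworks, configuration files, or written standards.
counts = Counter(review_log)
recurring = [pattern for pattern, n in counts.most_common() if n >= 3]
print(recurring)
```

The threshold and categories would come from the team's own review sessions; the point is that the tally, not any single fix, is what improves the system.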

AI adoption is an engineering practice, not a tooling decision

Many organizations treat AI adoption as a tooling decision: Which model? Which vendor? Which license?

Those questions matter, but maturity does not come from procurement. It comes from leadership involvement in real workflows: observing friction directly, turning repeated corrections into standards, and improving the engineering system continuously.

AI improves when your engineering system improves, and that requires discipline.

The discipline behind the productivity

AI is powerful. It speeds up iteration and lets engineers try things that would have taken much longer before, but it doesn’t automatically improve how an organization builds software.

Improvement still happens how it always has: through review, through diffs, through the accumulation of small engineering decisions that shape production systems over time.

That mindset shapes how we adopt AI at Scality — the same way we build resilient, high-performance software: through observation, iteration, and disciplined engineering practice, not hype.