Tuesday, May 26, 2026

Feeding AI at scale doesn’t require all-flash

There’s a stubborn assumption baked into how most enterprises plan AI infrastructure today: if you want serious throughput, you need all-flash. Anything else gets relegated to the cold tier.

That assumption deserves a hard look. We completed a structured performance validation on a production-grade Scality RING deployment at a major sovereign AI cloud provider, and the numbers tell a different story. 

The setup: Scality RING on HDD-based storage servers, with flash media (SSDs) used only for metadata, in a real production path with TLS encryption, load balancers, and traffic crossing three availability zones. Over the S3 API, the system delivered sustained read throughput of approximately 420 GB per second and sustained write throughput of approximately 250 GB per second. Throughput held steady across a continuous two-hour window, with no storage-layer errors.

That’s bandwidth historically associated with parallel file systems, delivered here by hard disk drives.

The architect’s dilemma: AI throughput without all-flash economics

If you’re designing infrastructure for AI pipelines, retrieval-augmented generation (RAG), large-scale analytics, or consolidated backup, you already know aggregate throughput has become the binding constraint. Inference clusters need embeddings, checkpoints, and intermediate artifacts pulled at hundreds of GB/s in parallel. Analytics platforms generate distributed read and write patterns across compute clusters. Backup environments consolidate petabyte-scale datasets into windows that keep shrinking.

The default reflex is to push performance-sensitive data onto flash and use HDD for cold capacity. But at petabyte and exabyte scale, the all-flash premium starts to dominate the budget, and it forces architects to maintain multiple storage tiers purely to satisfy bandwidth requirements that, as it turns out, HDD-based object storage can already meet.

The question has never really been whether HDDs have enough raw aggregate bandwidth. It’s whether an object storage system can extract that bandwidth under realistic deployment conditions, with encryption, load balancing, and multi-AZ architecture in the data path. That’s what this validation set out to answer.

Testing HDD-based object storage under production-realistic load

The deployment ran Scality RING 9.5 on just over 100 x86 nodes with HDD storage media, deployed across three availability zones in a synchronous geo-stretched configuration. Load was generated from 120 injector VMs distributed across all three zones, driving up to 35,000 concurrent S3 operations across 696 buckets in parallel. All traffic flowed through a global DNS endpoint over HTTPS, traversing the same load-balancing, TLS, and cross-zone routing infrastructure that production workloads see. 

Note that with a geo-stretched RING, all write operations happen synchronously to ensure that the system can maintain full service and full consistency even if one of the sites is lost or becomes unavailable (due to power loss, network connectivity issues, or site disaster). 

The primary workload used 10 MB objects, which are representative of AI dataset distribution and analytical pipeline transfers. Crucially, the validation focused on sustained throughput across a two-hour window, not short-duration peak bursts. Data injection and retrieval used a common third-party open-source S3 testing tool: an industry-standard tool, not a proprietary harness.

The result: Hundreds of GB/s over S3, sustained

Measured at the load balancer layer, and therefore reflecting the externally visible encrypted traffic that applications actually see, sustained reads held close to 420 GB/s, with peaks approaching 450 GB/s. Sustained writes held at approximately 250 GB/s.

Throughput stayed flat across the full two-hour window. Latency increased predictably as concurrency approached saturation, but aggregate bandwidth never collapsed, showing controlled saturation rather than instability. The system served 20,000 to 40,000 large-object operations per second under sustained load with zero storage-layer errors.
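As a rough sanity check on those figures, the relationship between bandwidth, object size, and operation rate can be sketched in a few lines of Python. This is illustrative arithmetic, not part of the validation: it assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes), and the function names are made up for this example. The latency line applies Little's law to a hypothetical case in which all 35,000 concurrent operations are 10 MB reads; it is not a measured value.

```python
# Illustrative arithmetic only. Decimal units are an assumption.
GB = 10**9
MB = 10**6


def ops_per_second(throughput_gb_s: float, object_size_mb: float) -> float:
    """Object operations per second implied by a bandwidth and an object size."""
    return (throughput_gb_s * GB) / (object_size_mb * MB)


def mean_latency_seconds(in_flight_ops: float, ops_per_s: float) -> float:
    """Little's law (L = lambda * W) solved for W, the mean time per operation."""
    return in_flight_ops / ops_per_s


read_ops = ops_per_second(420, 10)   # sustained reads, ~420 GB/s, 10 MB objects
write_ops = ops_per_second(250, 10)  # sustained writes, ~250 GB/s, 10 MB objects

print(f"reads:  {read_ops:,.0f} ops/s")
print(f"writes: {write_ops:,.0f} ops/s")

# Hypothetical scenario: every one of the 35,000 concurrent operations is a read.
latency = mean_latency_seconds(35_000, read_ops)
print(f"implied mean read latency: {latency:.2f} s")
```

Under these assumptions the sketch gives about 42,000 read and 25,000 write operations per second, roughly in line with the reported 20,000 to 40,000 range. It also shows why latency rises with concurrency once bandwidth is saturated: with the operation rate pinned, every additional in-flight request adds to the mean time per operation.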

A secondary profile using 4 KB objects validated behavior under high request rates: 173,456 S3 PUT/sec and 151,091 S3 GET/sec, again with no errors. Different workload shape, same stable behavior.
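To see why this profile stresses request handling rather than bandwidth, the same kind of back-of-the-envelope arithmetic helps. The sketch below assumes a 4 KB object is 4,000 bytes (switch the constant to 1024 for KiB); the helper name is hypothetical.

```python
# Illustrative arithmetic only. Decimal units are an assumption.
KB = 1000   # use 1024 if the objects were 4 KiB
GB = 10**9


def bandwidth_gb_s(requests_per_s: int, object_size_bytes: int) -> float:
    """Payload bandwidth implied by a request rate and an object size."""
    return requests_per_s * object_size_bytes / GB


put_bw = bandwidth_gb_s(173_456, 4 * KB)
get_bw = bandwidth_gb_s(151_091, 4 * KB)

print(f"PUT payload bandwidth: {put_bw:.2f} GB/s")
print(f"GET payload bandwidth: {get_bw:.2f} GB/s")
```

At well under 1 GB/s of payload, the small-object run moves a tiny fraction of the data of the 10 MB profile. What it exercises instead is per-request overhead.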

What this means for AI, analytics, and backup architecture

For architects, the result challenges a default assumption: not every performance problem is a latency problem. Some workloads are constrained by sustained aggregate bandwidth, and that changes the role HDD-based object storage can play in AI, analytics and backup architecture.

For AI data pipelines and RAG retrieval workloads, sustained reads in the 420-450 GB/s range keep GPUs fed and retrieval response times predictable when many services hit object storage concurrently. Aggregate read bandwidth is the binding constraint when object storage is the primary data layer for AI, and these results clear the bar.

For backup and data protection, sustained write throughput in the range of 230-250 GB/s (up to 900 TB per hour) collapses backup windows for petabyte-scale environments, while 420 GB/s reads (about 1,500 TB per hour) accelerate large-scale restores, which is critical to restoring business operations after a major cyber event. Backup teams that have been planning around all-flash ingest tiers for performance reasons should rerun the math.
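Rerunning that math is straightforward. The sketch below converts the measured GB/s figures to TB per hour and estimates a transfer window. The 1 PB dataset is a hypothetical example rather than a figure from the validation, and decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB) are assumed.

```python
# Illustrative arithmetic only. Decimal units are an assumption.
SECONDS_PER_HOUR = 3600
GB_PER_TB = 1000
TB_PER_PB = 1000


def tb_per_hour(gb_per_s: float) -> float:
    """Convert a sustained rate in GB/s to TB per hour."""
    return gb_per_s * SECONDS_PER_HOUR / GB_PER_TB


def hours_to_move(dataset_pb: float, gb_per_s: float) -> float:
    """Hours needed to move a dataset at a sustained rate."""
    return dataset_pb * TB_PER_PB / tb_per_hour(gb_per_s)


for label, rate in [("write, low", 230), ("write, high", 250), ("read", 420)]:
    print(f"{label:12s} {rate} GB/s = {tb_per_hour(rate):,.0f} TB/hour")

# Hypothetical 1 PB backup set: ingest window and restore window.
backup_hours = hours_to_move(1, 250)
restore_hours = hours_to_move(1, 420)
print(f"1 PB backup at 250 GB/s:  {backup_hours:.2f} h")
print(f"1 PB restore at 420 GB/s: {restore_hours:.2f} h")
```

Swapping in your own dataset size and a conservative sustained rate gives a first-order window estimate. Real backup jobs add catalog, deduplication, and source-side limits that this arithmetic ignores.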

For data lakes and analytics, the validation confirms that distributed read and write patterns from large compute clusters can run against HDD-based object storage without storage becoming the bottleneck, even with encryption and multi-AZ traffic in the path.

The architecture takeaway: Bandwidth and latency are not the same problem

None of this argues that flash is irrelevant. Latency-sensitive workloads still benefit from it, and some will continue to require it. But the industry reflex to equate performance with all-flash and relegate HDD to cold capacity is increasingly outdated. 

At scale, HDD-based object storage can actually deliver sustained throughput that changes the economics of AI, analytics and backup architecture. 

The implication for architects is straightforward: AI pipelines, large-scale analytics, and consolidated backup can run on a single high-throughput object storage foundation without introducing additional storage tiers solely to chase bandwidth. The cost difference at petabyte and exabyte scale is significant. The performance penalty, based on this validation, is not. 

Before defaulting to an all-flash architecture, architects should pressure-test whether the workload truly needs flash-level latency or whether the real requirement is sustained aggregate bandwidth at scale.

Read the full validation

The full performance paper covers the test environment, methodology, measurement integrity across the storage and load balancer layers, and detailed results, including system behavior under sustained load and the small-object profile. 

If you’re designing infrastructure for AI, analytics, or large-scale data protection, it’s worth the read.