Friday, December 6, 2024

Why app developers will love Scality RING + RING XP for optimizing their AI pipelines

As AI technology continues to transform industries, developers face growing demands for speed, scalability, and reliability in their AI pipelines. At the heart of these pipelines lies the data infrastructure — a crucial factor that can either accelerate or bottleneck innovation. 

Enter Scality RING and RING XP, a powerful combination that brings unmatched flexibility and performance to support AI workloads at scale. With Scality RING's multidimensional scaling and RING XP's extreme performance, developers gain the storage they need to keep pace with rapidly evolving AI applications. Let's dive into how it works.

Scality RING delivers the foundational capabilities that make massive-scale AI data lakes a reality. RING XP (an all-flash configuration of RING) raises performance to a previously unheard-of level: microsecond-scale response times for small (4KB) objects, accelerating AI model training and fine-tuning. Use both, and you can unleash AI's full potential across the entire pipeline.

Scality RING: The gold standard for data lakes in high-stakes industries

Scality RING is perfect for customers looking to build resilient, feature-rich data pipelines — without sacrificing easy scalability or the flexibility to adapt to ever-evolving requirements. 

RING is now established as a foundational data lake for AI and analytics across a diverse set of customers representing critical industries, including:

Financial services 

  • Major U.S. bank: Data lake for AI-driven fraud detection
  • Leading U.S. insurance provider: Data lake for automated claims processing

Transportation and research

  • Space research agency: Data lake for Earth observation, astronomy, and orbitography images
  • Space exploration agency: Data lake for launch vehicle configurations and telemetry
  • Top automobile manufacturer: Data lake supporting crash avoidance technology
  • European railway services: Data lake supporting real-time maintenance updates 
  • Global travel services provider: Data lake for analyzing travel search patterns 

Life sciences and genomics labs

  • Multiple genomics labs: Data lake for genome sequencing and research 
  • U.S. pharmaceutical companies: Data lake for biopharma research automation

Government and public sector

  • Several government intelligence agencies: Data lake for forensic crime data  
  • Military and defense agencies: Data lakes in top secret agencies across the U.S., Europe, the Middle East and Asia

This range of customers showcases how Scality RING supports the complex AI workflows of industries with rigorous (and ever-evolving) demands for performance, scalability, and security.

A closer look at Scality RING in action: SeqOIA’s Genomics Lab

A prime example of RING’s impact is its deployment by SeqOIA, one of only two national labs in France specializing in whole-genome sequencing to advance research and enable more positive outcomes for patients with rare diseases and cancer.

With RING, SeqOIA’s high-throughput genomics lab significantly accelerates the analysis of petabyte-scale genetic data, driving improvements in the speed and precision of diagnostics and healthcare delivery. For application developers in genomics and other data-heavy fields, RING’s capabilities translate into more efficient data management for complex research projects.

Bottom line: RING is proven as an ideal AI data lake repository

Our customers share a common set of needs that reflects the modern requirements for data lakes, especially with the rise of AI and analytics. These requirements map directly onto the capabilities and strengths of RING:

  • Data lakes need to aggregate data from multiple external repositories
    → RING provides a single high-throughput, scalable repository that stores any type of unstructured data and can grow to huge capacities: tens to hundreds of petabytes, and in some cases exabyte scale
  • Data must be filtered and cleansed before AI tools can analyze and process it
    → RING provides flexible, fast performance for data access and processing
  • Sensitive aggregated data must be kept secure and private
    → RING provides CORE5 cyber resiliency with secure multi-tenancy and immutability
  • Data augmentation and vector data are stored in object metadata
    → RING provides application-extensible metadata tagging, usable for direct search and with emerging vector databases and retrieval-augmented generation (RAG) techniques
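As a rough illustration of how application-extensible metadata can serve both direct search and vector-style retrieval, here is a minimal in-memory sketch. The `StoredObject` type, its `tags` and `embedding` fields, and the search functions are hypothetical and do not represent the RING API; a real deployment would read metadata through the object storage interface and would typically hand embeddings to a vector database.

```python
import math
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    """Hypothetical object record: a key plus application-defined metadata."""
    key: str
    tags: dict[str, str] = field(default_factory=dict)
    embedding: list[float] = field(default_factory=list)

def find_by_tag(objects, name, value):
    """Direct search: return keys whose metadata tag matches exactly."""
    return [o.key for o in objects if o.tags.get(name) == value]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(objects, query, k=2):
    """Vector-style retrieval: rank objects by cosine similarity to the query."""
    ranked = sorted(objects, key=lambda o: cosine(o.embedding, query), reverse=True)
    return [o.key for o in ranked[:k]]

# Hypothetical objects with tags and small illustrative embeddings.
objects = [
    StoredObject("claims/0001.pdf", {"type": "claim", "status": "open"}, [0.9, 0.1, 0.0]),
    StoredObject("claims/0002.pdf", {"type": "claim", "status": "closed"}, [0.8, 0.2, 0.1]),
    StoredObject("scans/0003.png", {"type": "image"}, [0.0, 0.2, 0.9]),
]

print(find_by_tag(objects, "status", "open"))  # ['claims/0001.pdf']
print(nearest(objects, [1.0, 0.0, 0.0], k=2))  # ['claims/0001.pdf', 'claims/0002.pdf']
```

The tag lookup and the similarity ranking both read only metadata, never object payloads, which is the property that makes metadata a useful hook for search and RAG pipelines.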

RING supports all of these capabilities through its underlying ultra-flexible architecture that scales in multiple dimensions, to handle the widest range of application and workload requirements.

Redefining scale

Why next-gen AI and cloud data demands multidimensional scaling

Given all the hype around all-flash storage solutions, it is important to mention that the customers I’ve listed use hybrid (mixed flash/HDD) storage servers and still achieve high throughput rates.

Specifically, the bank's fraud detection data lake currently achieves 80GB/sec of read throughput per site, or 160GB/sec of aggregate throughput across its two sites, to serve a large Splunk analytics cluster. The travel services company ingests 1 petabyte per day into the RING for its AI and analytics applications.
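The throughput figures above can be sanity-checked with simple arithmetic. This sketch assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and a two-site deployment, as implied by 80GB/sec per site adding up to 160GB/sec. It also assumes, for illustration only, that the 1 petabyte per day ingest is spread evenly over 24 hours, which real workloads rarely are.

```python
GB = 10**9   # decimal units assumed
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60

def aggregate_read_gbps(per_site_gbps, sites):
    """Aggregate read throughput when every site serves reads in parallel."""
    return per_site_gbps * sites

def average_ingest_gbps(bytes_per_day):
    """Average ingest rate if the daily volume is spread evenly over 24 hours."""
    return bytes_per_day / SECONDS_PER_DAY / GB

print(aggregate_read_gbps(80, 2))              # 160 (GB/sec)
print(round(average_ingest_gbps(1 * PB), 2))   # 11.57 (GB/sec, sustained average)
```

In other words, ingesting a petabyte a day means sustaining roughly 11.6GB/sec around the clock, before accounting for peaks.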

In addition to Splunk, these customers also successfully use a wide range of partner-provided (ISV) AI tools and applications to analyze data from the RING data lake, including Weka, Dremio, Presto, Trino, Spark, Cloudera, HPE Pachyderm, Elastic, Cribl and others.

With a roster of satisfied RING data lake customers worldwide, the question remains: What is the final component needed to completely solve their AI data pipeline storage needs?

Enter Scality RING XP: eXtreme Performance for AI workloads

Combining RING XP, the world's fastest object store, with RING means app developers can achieve unprecedented microsecond-scale response times for small (4KB or smaller) objects, making it the ideal target for AI tools, custom-developed applications, and performance-optimized file systems used for training AI models.

When deploying RING XP, IT leaders and AI engineers continue to enjoy the clear benefits that RING object storage provides for AI data lakes: scalability, security, and low cost of ownership. Application developers retain unbounded flat namespaces and API-based access to storage, which fits naturally into today’s stateless, services-based application architectures.

RING XP specifically steps in to deliver eXtreme Performance on smaller "hot" objects. AI model training and fine-tuning typically entail processing millions to billions of small objects (a few KB or smaller).

RING + RING XP – eXtreme Performance solution for AI model training and fine-tuning

For these extreme-performance AI applications, RING XP, a special configuration of Scality RING, delivers microsecond-level access latencies on KB-size objects. We believe RING XP to be the first and only storage solution that delivers latencies well below 1 millisecond over an object storage interface.

So, what is RING XP exactly?

From a software perspective, it is fundamentally based on our proven RING software stack with:

  • High-performance object storage connector, with streamlined object APIs 
    • An ideal target for AI applications, tools, and file systems via a simple object API
  • Performance-tuned RING storage stack
    • Retains all the advantages of RING data durability (EC, replication, integrity, self-healing)
    • Also enables ultra-efficient durability policies for the most extreme storage performance requirements such as for scratch storage
  • RING software deployed on AMD EPYC, all-NVMe flash server platforms
    • Supported on eight different reference platforms from vendors such as Lenovo, Supermicro, Dell and HPE at launch
    • The same usable capacity-based subscription licensing as with RING, with no additional charges or fees for RING XP
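To see why the choice of durability policy matters for capacity (and for the amount of data each write must move), here is the standard storage-overhead arithmetic for replication and erasure coding (EC). The specific schemas below (3 copies, 9 data + 3 parity fragments, and a single copy for scratch data) are illustrative assumptions, not RING XP defaults.

```python
from fractions import Fraction

def replication_raw_per_usable(copies):
    """With n-way replication, every usable byte consumes n raw bytes."""
    return Fraction(copies)

def ec_raw_per_usable(data, parity):
    """With k+m erasure coding, every k data fragments carry m parity fragments."""
    return Fraction(data + parity, data)

# Illustrative policies only; not taken from RING XP documentation.
policies = {
    "3 copies": replication_raw_per_usable(3),
    "EC 9+3": ec_raw_per_usable(9, 3),
    "1 copy (scratch)": replication_raw_per_usable(1),
}

for name, ratio in policies.items():
    overhead_pct = float(ratio - 1) * 100
    print(f"{name}: {float(ratio):.2f} raw bytes per usable byte, {overhead_pct:.0f}% overhead")
```

The trade-off is fault tolerance: n copies survive the loss of n - 1 of them, k+m erasure coding survives the loss of any m fragments, and a single copy survives no loss at all, which is why that last option only suits scratch data that can be regenerated.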

To provide some context on performance, RING XP can deliver latencies below today’s fastest public object storage offerings:

  • Standard Amazon Web Services (AWS) S3 cloud object storage: Delivers latencies of 10 to 50 milliseconds (not deterministic or bounded by an SLA, and potentially higher).
  • Amazon S3 Express One Zone: In late 2023, AWS introduced a streamlined object interface called S3 Express One Zone, which is less functional and not fully compatible with the standard S3 API. In exchange for reduced functionality, it promises "single-digit millisecond" access latencies.

Scality RING XP achieves 500 microsecond read access and 700 microsecond write access latencies on 4KB objects. 
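The latency gap compounds when a training job touches millions to billions of small objects. The sketch below uses the figures quoted above (500 microsecond reads for RING XP, 10 to 50 milliseconds for standard S3) and assumes, purely for illustration, that one million 4KB objects are read one at a time with no parallelism. Real data loaders issue many requests concurrently, so absolute times would be far lower, but the per-request ratio is the same.

```python
RING_XP_READ_S = 500e-6              # 500 microseconds per 4KB read (from the text)
S3_STANDARD_READ_S = (10e-3, 50e-3)  # 10 to 50 milliseconds (from the text)

def speedup(baseline_s, candidate_s):
    """How many times lower the candidate latency is than the baseline."""
    return baseline_s / candidate_s

def serial_read_time_s(num_objects, latency_s):
    """Total time to read objects one after another at a fixed per-request latency."""
    return num_objects * latency_s

low, high = (speedup(b, RING_XP_READ_S) for b in S3_STANDARD_READ_S)
print(f"per-request latency ratio: {low:.0f}x to {high:.0f}x")

N = 1_000_000  # hypothetical workload: one million small objects, read serially
print(f"RING XP: {serial_read_time_s(N, RING_XP_READ_S) / 60:.1f} min")
print(f"standard S3: {serial_read_time_s(N, S3_STANDARD_READ_S[0]) / 60:.0f} "
      f"to {serial_read_time_s(N, S3_STANDARD_READ_S[1]) / 60:.0f} min")
```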

A brief history: How Amazon S3 evolved to meet lower-latency demands

In 2023, AWS introduced S3 Express One Zone as a faster, lower-latency option for AI developers. The standard S3 API had become so feature-rich that it added overhead to the S3 stack, impacting performance.
To reduce latency, AWS made significant changes in the S3 Express API, such as:

New directory bucket structure: Unlike the standard S3 buckets, S3 Express operates with directory buckets, an optimized design that reduces latency and supports high-frequency access.

Session-based authentication: Whereas standard S3 authentication occurs with each API call to ensure secure access, S3 Express consolidates authentication at the session level to minimize repetitive authentication tasks and enhance overall response times.
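The difference between per-request and session-level authentication can be sketched as a simple credential cache. This is a hypothetical model of the general pattern, not AWS code: `authenticate` stands in for an expensive credential check, and the 300-second session lifetime is an assumed value chosen for the example. In practice, the AWS SDKs are documented as handling S3 Express session creation and refresh for you.

```python
import time
from dataclasses import dataclass

@dataclass
class Session:
    token: str
    expires_at: float

class PerRequestAuth:
    """Authenticates on every call, like signing each request independently."""
    def __init__(self, authenticate):
        self.authenticate = authenticate

    def credentials(self):
        return self.authenticate()

class SessionAuth:
    """Authenticates once, then reuses the session token until it expires."""
    def __init__(self, authenticate, lifetime_s=300.0, clock=time.monotonic):
        self.authenticate = authenticate
        self.lifetime_s = lifetime_s   # assumed session lifetime for illustration
        self.clock = clock
        self.session = None

    def credentials(self):
        now = self.clock()
        if self.session is None or now >= self.session.expires_at:
            self.session = Session(self.authenticate(), now + self.lifetime_s)
        return self.session.token

calls = {"count": 0}

def authenticate():
    """Hypothetical expensive credential check; counts how often it runs."""
    calls["count"] += 1
    return f"token-{calls['count']}"

per_request = PerRequestAuth(authenticate)
for _ in range(1000):
    per_request.credentials()
print(calls["count"])  # 1000 authentications

calls["count"] = 0
fake_now = [0.0]  # simulated clock so the example is deterministic
session = SessionAuth(authenticate, lifetime_s=300.0, clock=lambda: fake_now[0])
for i in range(1000):
    fake_now[0] = i * 0.5  # one request every 0.5 s, about 500 s in total
    session.credentials()
print(calls["count"])  # 2 authentications
```

The same 1,000 requests cost 1,000 authentications in the first model and only two in the second, which is the kind of repeated work session-level authentication is meant to remove from the request path.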

RING XP vs. all-flash file storage solutions: What sets us apart

When evaluating storage solutions, ask file system vendors one critical question: how do they measure performance? All-flash file storage solutions come from both legacy giants and nimble startups, and each measures performance differently.

Typically, file system vendors report performance metrics through traditional file interfaces like NFS, with some also offering limited object storage compatibility. However, vendors often do not disclose latency measurements for object storage interfaces — choosing instead to highlight low latency achieved through file system APIs.

File system interfaces can achieve low latencies, typically in the hundreds of microseconds, similar to the new RING XP. However, our figures are measured directly at an object storage interface. We're confident that file systems accessed through their object storage APIs cannot match RING XP's low latency, because that access path typically adds response time.
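One way to answer the measurement question is to time requests at the object interface itself, from the client's point of view, and to report percentiles rather than an average. The harness below is a minimal sketch: `get_object` is any callable that fetches one object by key (for example, a wrapper around an S3-compatible client), and the in-memory store used to exercise it is a hypothetical stand-in, so its numbers say nothing about any real system.

```python
import os
import statistics
import time

def measure_latency_us(get_object, keys):
    """Time each GET at the client and return per-request latencies in microseconds."""
    samples = []
    for key in keys:
        start = time.perf_counter_ns()
        get_object(key)
        samples.append((time.perf_counter_ns() - start) / 1000)
    return samples

def summarize(samples):
    """Report median and tail latency rather than a mean."""
    cuts = statistics.quantiles(samples, n=100)  # 99 cut points: cuts[49] ~ p50, cuts[98] ~ p99
    return {"p50_us": cuts[49], "p99_us": cuts[98], "max_us": max(samples)}

# Hypothetical stand-in store holding 4KB objects in memory.
store = {f"obj-{i:05d}": os.urandom(4096) for i in range(1000)}
samples = measure_latency_us(store.__getitem__, list(store))
print(summarize(samples))
```

Swapping in a real client call for `store.__getitem__` measures exactly the path an application sees, which is the comparison that matters when a vendor quotes file-interface numbers.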

The difference? RING XP offers the unique combination of scalability and ultra-low latency through a purpose-built object storage interface that delivers distinct advantages.

Did you know: Unlike RING, which is software-defined object storage, traditional file systems are usually proprietary hardware-based solutions. This locks customers in and limits their flexibility in choosing storage server hardware platforms.

Why RING + RING XP object storage is optimal for AI developers

Object storage brings significant advantages to modern applications. Its flat, single namespace enables limitless scalability, while API-based access is perfectly suited for containerized architectures and stateless, microservice-driven applications. 
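To make the flat-namespace point concrete: in object storage, a key such as `datasets/train/0001.jpg` is just a string, and "folders" are produced only at list time by grouping keys on a prefix and delimiter, as the S3 `ListObjectsV2` operation does with its `Prefix` and `Delimiter` parameters. The sketch below reproduces that behavior over a plain Python dictionary; it is a simplified model for illustration, not how RING indexes keys.

```python
def list_objects(store, prefix="", delimiter="/"):
    """Emulate prefix/delimiter listing over a flat key space.

    Returns (keys, common_prefixes): keys directly 'under' the prefix, and
    the distinct next-level pseudo-folders derived from deeper keys.
    """
    keys, common_prefixes = [], set()
    for key in sorted(store):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Collapse deeper keys into a single pseudo-folder entry.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            keys.append(key)
    return keys, sorted(common_prefixes)

# A flat namespace: no directories exist, only full keys.
store = {
    "datasets/train/0001.jpg": b"...",
    "datasets/train/0002.jpg": b"...",
    "datasets/val/0001.jpg": b"...",
    "datasets/manifest.json": b"...",
}

print(list_objects(store, prefix="datasets/"))
# (['datasets/manifest.json'], ['datasets/train/', 'datasets/val/'])
```

Because no directory objects exist, there is nothing to create, lock, or rebalance when many keys share a prefix; the hierarchy is only a view computed over key strings.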

RING XP takes these benefits to the next level. With its massive 160-bit address space (2^160, roughly 1.5 × 10^48 unique addresses), RING XP offers developers boundless scalability without directory constraints or performance degradation. This makes it ideal for data-intensive AI applications where storage limitations and bottlenecks are simply not an option.
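For a sense of scale, the arithmetic behind a 160-bit key space is easy to check. The creation rate below (one billion new objects per second, sustained indefinitely) is a hypothetical figure chosen to be far beyond any real workload.

```python
ADDRESS_BITS = 160
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def address_space(bits):
    """Number of distinct keys representable with the given number of bits."""
    return 2 ** bits

def years_to_exhaust(bits, objects_per_second):
    """Years needed to use every key at a constant creation rate."""
    return address_space(bits) / objects_per_second / SECONDS_PER_YEAR

space = address_space(ADDRESS_BITS)
print(f"{space:.2e} possible keys")                        # 1.46e+48 possible keys
print(f"{years_to_exhaust(ADDRESS_BITS, 1e9):.1e} years")  # 4.6e+31 years
```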

Object storage also simplifies data management. There's no need to worry about directory structure limitations, or about the performance bottlenecks file systems can hit when many files are grouped in a single directory. RING XP empowers developers to build data-intensive applications without structural limitations, supporting scalable, API-driven workflows that simplify access and enhance flexibility.

Expanding on Scality RING’s foundational capabilities, RING XP easily meets the extreme performance demands of AI workloads. Organizations can now seamlessly leverage a high-performance, scalable storage solution within the trusted Scality ecosystem, ensuring that AI pipelines run faster and more efficiently than ever. A robust storage foundation with the flexibility and speed necessary to build cutting-edge, data-driven solutions fit for the future — what’s not to love?


Learn more

RING product overview
RING for AI data lakes
RING XP: eXtreme Performance for AI workloads

Blog: How Scality RING XP performs 20x faster than Amazon S3 Express One Zone

About Us

Solved is a digital magazine exploring the latest innovations in Cloud Data Management and other topics related to Scality.
