From warehouses to lakehouses, each evolutionary step in data architecture has solved one problem while creating another.

Data architecture has come a long way since the golden age of traditional data warehouses. Those legacy systems served their purpose, but they weren’t built for the complexity, scale, and speed demanded by today’s AI and analytics-driven world.

Traditional data warehouses (early 2000s era):
✅ Structured, reliable, optimized for reporting
❌ Rigid, costly, and unable to manage the surge of unstructured and semi-structured data

Data lakes (big data era):
✅ Vast, low-cost repositories for storing raw data of any type
❌ Often became “data swamps” with weak governance, poor performance, and limited real-time analytics support

Data lakehouses (modern hybrid era, warehouses + lakes):
✅ Combine the governance and schema of warehouses with the flexibility and scale of lakes
❌ Still maturing, with challenges in consistency, performance, and enterprise-wide adoption

In part 1 of our AI blog series, we broke down the core concepts driving today’s AI boom, clarifying the terminology and cutting through the hype with A primer on the concepts of AI: ML, LLMs, DL, NLP, GenAI, and the rise of RAG. Part 2 takes the next step: showing how Scality RING + Starburst turn messy, distributed data into a solid foundation for AI.

As AI and machine learning workloads accelerate, enterprises need data environments that go beyond the tradeoffs of past architectures, bringing together scale, governance, and real-time access in a way that makes data truly AI-ready.

That’s where Scality RING, a massively scalable and immutable object storage platform, and Starburst, a modern SQL-based query engine, combine to make data lakes AI-ready with a practical path forward. Together, they form a validated architecture that turns enterprise-scale data sprawl into fast, queryable insight.

Most blogs stop at theory. This one goes further.
Here, we’ll walk you step by step through integrating Starburst with Scality RING, creating schemas, and running SQL queries on real data, building a fast, flexible, and governed platform for AI/ML workloads at scale. Ready to roll up your sleeves?

What is Starburst?

Starburst is a data lakehouse platform that provides fast, secure access to data where it lives (across clouds, on-prem storage, databases, or data lakes) without needing to move or duplicate it. Accessing and analyzing data from diverse sources lets organizations respond faster to changing business needs and shortens their time to insight.

Starburst enables high-performance queries across:
- Structured data (e.g., tables, Apache Iceberg, Parquet)
- Semi-structured data (e.g., JSON, CSV, YAML)
- Unstructured data (tagged via metadata)

This makes it a powerful tool for unifying data access across environments. And when paired with Scality RING, it becomes an enterprise-ready platform for analytics and AI that respects scale, performance, and security.

Scality-validated design application

Starburst has been certified as a validated design by Scality. This process involves deploying Starburst and Scality RING in a lab environment, tuning both products for predictable performance, and testing workloads that reflect typical customer scenarios. Customers benefit from integration instructions and sizing guidance, which support seamless onboarding and predictable performance. Scality’s partner app certification program helps our customers ensure seamless integration.

Why Scality RING + Starburst?

Scality RING isn’t just an object storage platform; it’s a foundation for next-generation data analytics and AI/ML workloads.
When you integrate it with Starburst, several powerful capabilities come together:

Search unstructured and semi-structured data

While SQL can’t query raw unstructured data directly, metadata tagging on RING makes it possible to organize and classify unstructured content, such as documents, images, or log files, so that it becomes discoverable and useful. Starburst can query that metadata alongside structured and semi-structured data for a comprehensive view.

Structured query power at scale

Starburst brings high-performance SQL querying to structured data sitting in RING. Think Iceberg tables, ORC files, and Parquet: queryable at speed and at scale, even across petabytes of data.

Fuel for AI model training

Every AI/ML workflow starts with a dataset. The better organized and governed the data is, the more accurate your models will be. With Starburst + RING, you can:
- Create tables and schemas on your data
- Run SQL queries to extract the right training sets
- Enrich or label data before feeding it into AI pipelines

And best of all, you don’t need to move the data.

Built-in governance and access controls

Security and data governance are non-negotiable, especially when working with sensitive or regulated data. Scality RING supports granular access controls and permissioning. Starburst respects and extends these controls, letting you enforce role-based access policies across your entire data lake.

In practice, that governance means you can:
- Enable data access for specific teams or users
- Control visibility at the dataset or metadata level
- Keep your compliance team happy

How it works: A quick look

Setting up Starburst to query data on RING is straightforward.
Step 1: Connect Starburst to RING as an object storage data source.

Create a new catalog configuration file in the etc/catalog folder, and add the following configuration. The file name sets the catalog name, so s3.properties creates the s3 catalog used in the queries below. Replace RING_ACCESS_KEY, RING_SECRET_KEY, and RING_S3_ENDPOINT with the values for your RING deployment.

    # etc/catalog/s3.properties
    connector.name=hive
    hive.metastore=file
    hive.s3.aws-access-key=RING_ACCESS_KEY
    hive.s3.aws-secret-key=RING_SECRET_KEY
    hive.s3.endpoint=RING_S3_ENDPOINT
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=true
    hive.s3.max-connections=100
    hive.non-managed-table-writes-enabled=true
    hive.s3select-pushdown.enabled=false

Step 2: Define schemas and tables on structured/semi-structured datasets.

    -- Schema rooted at a bucket path on RING
    CREATE SCHEMA s3.mydata
    WITH (location = 's3://your-bucket/path/to/data');

    -- External table over existing ORC files
    CREATE TABLE s3.mydata.mytable (
        id BIGINT,
        name VARCHAR,
        value DOUBLE
    )
    WITH (
        external_location = 's3://your-bucket/path/to/data/mytable/',
        format = 'ORC'
    );

Step 3: Query your data. Use SQL queries to explore, join, and extract insights.

a. Simple query:

    SELECT * FROM s3.mydata.mytable LIMIT 10;

b. Join with another table:

    SELECT t1.name, t2.value
    FROM s3.mydata.table1 t1
    JOIN s3.mydata.table2 t2
        ON t1.id = t2.id;

c. Aggregation:

    -- Assumes the table also has a DATE column named date_column,
    -- which is not part of the Step 2 definition above.
    SELECT date_column, COUNT(*) AS count, SUM(value) AS total
    FROM s3.mydata.mytable
    WHERE date_column >= DATE '2024-01-01'
    GROUP BY date_column;

Use tagged metadata to organize unstructured data for future use.

This architecture allows your AI initiatives to take off faster, without rebuilding data pipelines or duplicating data across systems. The steps we’ve walked through above aren’t just about setting up queries; they’re about transforming your raw, distributed data into a foundation that AI can actually use.

Why this matters for AI

When your leadership says, “We want to do something around AI,” this is what they’re talking about.
A platform like Starburst on Scality RING checks all the boxes:
✅ Scalable, cost-efficient data lake foundation
✅ Real-time analytics without data movement
✅ Built-in governance and security
✅ Structured + semi-structured data support
✅ Metadata tagging for unstructured data
✅ Compatible with AI/ML model training workflows

For IT practitioners, this is more than a checkbox exercise; it’s a future-proof strategy. You’re not just standing up another analytics tool; you’re building an environment where data becomes fuel for innovation, ready for AI today and adaptable for what’s ahead.

Bringing it all together

The journey from warehouses to lakes to lakehouses has always been about balance: structure vs. flexibility, cost vs. performance, governance vs. freedom. With the validated integration of Starburst and Scality RING, enterprises don’t have to choose. You can unify sprawling data, make it queryable at scale, and do it with governance and security baked in, turning a complex data landscape into an AI-ready platform.

Next steps

If you’re ready to move from theory to practice:
- Explore part 1 of our AI series for a foundational primer on AI-ready infrastructure.
- Put part 2 into action by following this validated design to integrate Starburst with Scality RING.
- Connect with our team to see how this fits into your AI roadmap and to start accelerating your own initiatives.

The future of AI is being built on the data foundations we lay today. With Starburst and Scality, you’re not just keeping up; you’re getting ahead.
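If you follow the validated design, one practical detail in Step 1 is keeping the RING credentials out of a hand-edited file. The sketch below is one possible way to generate the catalog file contents from environment variables; it is not part of the validated design. The function names and the environment variable names (RING_S3_ENDPOINT, RING_ACCESS_KEY, RING_SECRET_KEY) are assumptions made for illustration, while the property keys and fixed values are the ones shown in Step 1.

```python
import os

# Property keys and fixed values copied from the Step 1 catalog example.
STATIC_PROPERTIES = {
    "connector.name": "hive",
    "hive.metastore": "file",
    "hive.s3.path-style-access": "true",
    "hive.s3.ssl.enabled": "true",
    "hive.s3.max-connections": "100",
    "hive.non-managed-table-writes-enabled": "true",
    "hive.s3select-pushdown.enabled": "false",
}

# Hypothetical environment variable names; adjust to your own conventions.
ENV_NAMES = ("RING_S3_ENDPOINT", "RING_ACCESS_KEY", "RING_SECRET_KEY")


def render_catalog_properties(endpoint, access_key, secret_key):
    """Return the text of a catalog properties file for a RING S3 endpoint."""
    props = dict(STATIC_PROPERTIES)
    props["hive.s3.endpoint"] = endpoint
    props["hive.s3.aws-access-key"] = access_key
    props["hive.s3.aws-secret-key"] = secret_key
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


def render_from_environment(env=os.environ):
    """Build the file text from environment variables, failing if any is unset."""
    missing = [name for name in ENV_NAMES if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return render_catalog_properties(*(env[name] for name in ENV_NAMES))
```

The returned text can then be written to etc/catalog/s3.properties by whatever deployment tooling you already use. Failing on a missing variable is a deliberate choice here: a catalog file with an empty key would only surface as an authentication error at query time.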
Other AI resources:
- A primer on the concepts of AI: ML, LLMs, DL, NLP, GenAI, and the rise of RAG
- Stop building dumb chatbots: The RAG + Scality RING solution
- Enterprise AI in action: 5 real-world use cases powered by object storage
- AI can’t wait for your data — How Hammerspace and Scality keep GPUs fed
- Multidimensional scale: 10 must-have data storage dimensions to power your AI workloads
- The AI storage problem you didn’t see coming — and how Scality RING already solved it