Ocient scales hyperscale data warehouse for machine learning

December 13, 2023


There’s big data, and then there is really big data, with trillions of rows. That’s the space Chicago-based Ocient occupies with its hyperscale data warehouse technology.

Today the company announced a series of new capabilities that expand the hyperscale data platform for geospatial data analytics as well as machine learning (ML) and artificial intelligence (AI). Embedded within Ocient’s Hyperscale Data Warehouse product, the new OcientGeo capability provides an extensive library of geospatial functions and a globally optimized spatial index. With OcientGeo, companies can now ingest and process massive volumes of historical and real-time geospatial data to generate actionable insights. Integrated ML tools allow businesses to further accelerate geospatial AI initiatives.

Ocient says its highly optimized storage and processing can handle hyperscale data requirements without the need for GPUs.

“Our focus is hyperscale workloads and I would say the average number of elements that are looked at in an average Ocient query, whether it’s SQL, machine learning, or geospatial is on the order of probably an average of a trillion things,” Ocient CEO Chris Gladwin told VentureBeat.



Hyperscale data analytics are about flow, not GPUs

For many accelerated computing use cases today, organizations lean on GPUs to improve performance. That is not, however, the path Ocient is taking with its data warehouse.

“The whole kind of secret sauce to making this actually deliver, is a level of parallelization that is just extreme,” Gladwin said. “It’s not at all unusual that at every layer in the stack, there’s a million parallel tasks in flight or more.”

To enable massive parallelization for the data warehouse, Gladwin said, it’s all about flow. He explained that with machine learning algorithms for clustering, regression and classification, the actual computational operations in a CPU are not the bottleneck. Rather, the bottleneck is often compute density: there is a need for more compute power for each terabyte of data.

Gladwin said the challenge is getting enough throughput across the computing stack, including storage and memory. That challenge is at the foundation of Ocient’s technical differentiation, as the company has built technology to optimize memory and fast solid-state drive (SSD) based data storage systems.
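To make the idea of parallel flow concrete, here is a minimal Python sketch of one generic way a regression can be spread across data partitions: each partition is scanned once to produce a handful of running sums, and those small summaries are merged at the end. This is an illustration only, not Ocient's implementation; the function names (`partition_stats`, `merge`, `fit_line`) and the use of a thread pool are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

# Hypothetical summary of one partition: (n, sum_x, sum_y, sum_xx, sum_xy).
Stats = Tuple[int, float, float, float, float]
Row = Tuple[float, float]


def partition_stats(rows: Sequence[Row]) -> Stats:
    """Scan one partition once and return its running sums."""
    n = len(rows)
    sum_x = sum(x for x, _ in rows)
    sum_y = sum(y for _, y in rows)
    sum_xx = sum(x * x for x, _ in rows)
    sum_xy = sum(x * y for x, y in rows)
    return n, sum_x, sum_y, sum_xx, sum_xy


def merge(a: Stats, b: Stats) -> Stats:
    """Combine two partition summaries by adding them element-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4])


def fit_line(partitions: List[Sequence[Row]], workers: int = 4) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by least squares from merged summaries."""
    # Each partition is summarized independently, so the work can be fanned out.
    # (Python threads only illustrate the structure; they do not give true CPU
    # parallelism for this kind of arithmetic.)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partition_stats, partitions))

    total: Stats = (0, 0.0, 0.0, 0.0, 0.0)
    for part in partials:
        total = merge(total, part)

    n, sum_x, sum_y, sum_xx, sum_xy = total
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        raise ValueError("x values are constant or the data is empty")
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


if __name__ == "__main__":
    # Ten points on the line y = 2x + 1, split across three partitions.
    data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
    parts = [data[0:3], data[3:7], data[7:10]]
    print(fit_line(parts))
```

The arithmetic per row is trivial; the cost is dominated by how quickly rows can be streamed past it, which is consistent with Gladwin's point that throughput across storage and memory matters more than raw compute.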

“Our engineers would love to work on GPUs, they’re super cool, but we just haven’t found a need,” Gladwin said.



Machine learning at hyperscale with OcientML

Ocient’s data warehouse got its start with SQL data queries. The same architecture that enables fast analytics queries on massive data sets is also the foundation of the OcientML and OcientGeo capabilities.

Gladwin said that the same advantages of hyperscale performance, real-time analytics and data loading that Ocient provides for SQL workloads are now available for ML. He said that OcientML allows customers to do machine learning on datasets with billions, hundreds of billions or trillions of data points at a level of price performance that is better than alternatives. It also includes features like workload management to ensure fair access to resources across different queries and analyses running at hyperscale. OcientML integrates the ML stack directly into the Ocient Hyperscale Data Warehouse, eliminating the need to extract, transform and load data to a separate platform.

The benefits of OcientML include increased model accuracy by allowing full interaction with historical and current data, faster iteration by removing data movement steps, and simplified operations by managing SQL and ML in one system.

The OcientGeo capability follows a similar pattern to OcientML in that it is part of the core Ocient Hyperscale Data Warehouse and benefits from the platform’s massive parallelization. Gladwin noted that with OcientGeo, customers can perform geospatial queries, analysis, and functions on massive datasets directly within the Ocient platform, without having to first extract large amounts of data. This allows queries and analyses involving trillions of data points with geospatial components to be run in seconds.

“We still are kind of beginning that journey of enabling all these new uses that only can be enabled by making the price and performance of hyperscale analytics 10 times or more better,” Gladwin said.

