Cloud Pak for Data Footnote

*Performance and price-performance claims based on data from IBM benchmarks running a workload derived from TPC-DS at 1 TB scale factor and Cognos workloads that model a typical Business Intelligence application based on a retail database with in-store, on-line, and catalog sales of merchandise categorized into analyst dashboards, sales reports, and deep-dive complex analytics.

The IBM AutoSQL configuration was a Flex Performance Basic with 48 cores and 864 GB of RAM at $7.76/hour running on IBM Cloud. The Azure Synapse SQL pool was a DW1000c with 2 compute nodes and 600 GB of RAM at $12/hour. The Snowflake Data Warehouse was a size Medium on Enterprise Tier at $12/hour. Serial tests and multi-user throughput tests at 8-user and 16-user concurrency were performed. There was no tuning; all benchmarks were run as is, in an "out of the box" configuration, loading and running with the same "partition/distribute/cluster key". Azure Synapse SQL pool and Snowflake runs were performed with persisted query results (result caching) disabled for a fair comparison of actual processing times measured.

IBM AutoSQL loaded data at 125 GB/hour for an 8-hour load time, while Azure Synapse SQL pool took about 17 hours at a rate of 58.8 GB/hour to complete the 1 TB load. The total times to complete both loading and running the 1 TB workloads were 10hrs 46min for IBM AutoSQL and 21hrs 6min for Azure Synapse SQL pool. IBM AutoSQL was 129% faster on serial query throughput, 18% faster on 8 concurrent stream query throughput, and 19% faster on 16 concurrent stream query throughput, and cost 43% less, with more included capabilities, including data encryption, full privacy and security features, and private virtual servers and storage. IBM AutoSQL delivered twice the performance of Azure Synapse SQL pool DW1000c, loading data 2x faster, and cost 1.8x less to load and run the benchmark.
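The load times quoted above follow from dividing the data volume by the load rate. The short sketch below reproduces that arithmetic; it assumes 1 TB is taken as 1000 GB and that each rate was constant over the load, and the function name `load_hours` is illustrative only.

```python
def load_hours(data_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to load data_gb at a constant rate."""
    return data_gb / rate_gb_per_hour


DATA_GB = 1000  # assumption: 1 TB taken as 1000 GB

autosql_hours = load_hours(DATA_GB, 125.0)  # rate quoted for IBM AutoSQL
synapse_hours = load_hours(DATA_GB, 58.8)   # rate quoted for Azure Synapse SQL pool
load_ratio = synapse_hours / autosql_hours

print(f"IBM AutoSQL load time: {autosql_hours:.1f} h")
print(f"Azure Synapse SQL pool load time: {synapse_hours:.1f} h")
print(f"Load-time ratio: {load_ratio:.2f}x")
```

Under these assumptions the result is 8.0 hours versus roughly 17.0 hours, in line with the "2x faster" load figure.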

A retail customer ELT benchmark was also run against Snowflake and IBM AutoSQL. The workload focused on extract, load, and transform queries that consisted of lookups, maintenance, aggregates, and temporary table queries. 3.5 TB of data was loaded from S3 in us-west, co-located with the database instance. The data consisted of 36 gzipped ASCII delimited text files. Each file was loaded serially for both Db2 Flex P and Snowflake.

IBM AutoSQL loaded data 8.3x faster than Snowflake, at 175.80 GB/hour in 19.91 hours, compared to Snowflake, which loaded at a rate of 21.02 GB/hour in 166.53 hours. Db2 AutoSQL outperformed Snowflake by 1.5X in ELT workloads, completing all queries in 2.94 hours vs 4.70 hours on Snowflake. AutoSQL outperformed Snowflake in all aggregate queries, in all lookup queries by 2x or more, in 18 out of 23 mixed queries, and in all temp table queries, again by over 2x. IBM AutoSQL incurred only 14% of the operating cost to load and run all workloads.
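As a rough cross-check of the ELT figures, the sketch below recomputes the data volume implied by each load rate and elapsed time, and the speedup ratios implied by the elapsed times. It uses only the numbers quoted above; the helper name `speedup` is illustrative, and the ratios are simple quotients of elapsed hours rather than the benchmark's own methodology.

```python
def speedup(slower_hours: float, faster_hours: float) -> float:
    """Ratio of elapsed times: how many times faster the quicker run was."""
    return slower_hours / faster_hours


# Data volume implied by rate (GB/hour) times elapsed time (hours).
autosql_loaded_gb = 175.80 * 19.91
snowflake_loaded_gb = 21.02 * 166.53

# Speedups implied by the quoted elapsed times.
load_speedup = speedup(166.53, 19.91)
query_speedup = speedup(4.70, 2.94)

print(f"IBM AutoSQL loaded about {autosql_loaded_gb:.0f} GB")
print(f"Snowflake loaded about {snowflake_loaded_gb:.0f} GB")
print(f"Load speedup: {load_speedup:.2f}x")
print(f"ELT query speedup: {query_speedup:.2f}x")
```

Both implied volumes come out near 3500 GB, matching the 3.5 TB data set, and the ratios land close to the rounded 8.3x and 1.5X figures.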

AutoSQL AI Cache improved performance of the workload derived from TPC-DS by up to 8X. IBM benchmarks connected to 5 data sources across multiple clusters and ran 103 SQL queries at 10 GB, 10 TB, and up to 500 TB scale factors, with zero SQL tuning and with table-based, query-based, and AI-recommended caching.

The IBM benchmark derived from TPC-DS is not comparable to officially published TPC-DS results, as the IBM results do not comply with the TPC-DS benchmark standard.