Abstract
As data becomes more and more important, so does the technology that holds it. With a change as large scale as the transition from Oracle to Snowflake, companies need to be informed just how much database technology has improved in order to determine if a shift is worth it. To thoroughly analyse the differences between Oracle and Snowflake, this project investigates one primary aspect of the databases: performance. The speeds of the databases were measured through insert, update, and query operations. After comparing Oracle and Snowflake performance, it was determined that Snowflake drastically increased speeds in all aspects (insertion, updation, and querying).
Background
Since the early adoption of PCs to the current enormous industry of automation and electronics, the use of technology has grown exponentially. Its growing usage in various fields ranging from security, such as securing your home digitally, to entertainment, for example, Netflix, show the importance technology has in modern life.[1] The increasing popularity of digitalisation has made the backbone of technology, data, more important to understand.[2] In the past, traditional data warehouses were sufficient for simpler data storage with less demand. However, due to the increase in volume of data and data becoming more varied in nature, the use of traditional data warehouses has grown limited and companies have transitioned to the use of the Cloud, an elastic system that holds near-infinite resources, has high availability, and is cost efficient (where one only pays for that which they use).[3]
Oracle and Snowflake Architecture
Oracle has been a leading database for running online transaction processing and analytical database workloads in the industry for over forty years. The three primary structures of an Oracle data server are memory structures, process structures, and storage structures. The purpose of the storage structure is simply to hold the data in databases. The storage architecture is more complex and will be discussed later in this report. The process is broken down into client processes (user execution), Oracle database processes (main compute), and Oracle daemons and application processes (background). Memory is used for storing shared data and program code and within the backend engine exist the Program Global Areas (PGA) and the System Global Areas (SGA). In simple terms, the PGA is a client specific memory allocation (dedicated space for the user) and SGA is a shared memory pool (connected space shared by all users).[4]
Snowflake is a newly introduced database technology in the Cloud. Snowflake’s basic design involves three primary components: storage, compute and services. Snowflake storage is similar to Oracle storage in that it stores data, including both relational data and semi-structured data, within databases. Since all the data is stored in one layer, updates are only performed once. Compute is the design component that manages query computation. Snowflake’s dynamic compute infrastructure allows users to submit large numbers of queries which Snowflake automatically assigns into virtual warehouses that are able to access the storage without contention. Services coordinates the execution of all the processes within Snowflake which include verifying and securing users, handling metadata, and maintaining optimised performance. Snowflake’s ability to handle a vast number of usage scenarios with high performance provides the availability and flexibility needed for the growing variety of data.[3]
Method
These following tests will measure the fundamental operations of database performance: insertion, updation, and querying. The insertion test calculates the speed with which the data warehouse ingests data into the databases. The test for updation measures how quickly the data can be modified. The query test evaluates the performance of a database in retrieving and answering a query with the desired information. To proceed with the testing, query code (included in the ‘Algorithms’ section) ran on both data warehouses. The operation time and speed comparison are listed in the table below. Server size is comparable in memory and disk size for both Oracle and Snowflake.
Algorithms
# of Rows Calculation (selects data sets that will be used for testing)
— 22,822,307 rows are selected
select count(*) from D_XXXXXX;
— 3,125,501 rows are selected
select count(*) from STG_XXXXXXX;
–720,122,654 rows are selected
select count(*) from PERF_TEST.FACT_SALES;
Query Performance Test Case #1 (calculations are run against ten years of sales data)
Execution Logic Flow
- select sales data
- filter sales data by only taking past ten years
- filter sales data by product class and territory
- finding summation, average, and standard deviation of the selected sales data
SET TIMING ON;
SELECT
ACCT_NO
,ACCT_NAME
,SUM(SALES)
,AVG(SALES)
,STDDEV(SALES)
FROM
PERF_TEST.FACT_SALES
WHERE
SALES_DATE BETWEEN TO_DATE(\’YYYYMMDD\’, 20100101) AND TO_DATE(\’YYYYMMDD\’, 20191231)
AND TRADE_CLASS_CODE = \’S\’
AND TERRITORY_CODE = \’NY04\’
AND SOURCE_CODE = \’WEB\’
GROUP BY
ACCT_NO
,ACCT_NAME
SET TIMING OFF;
Test Result and Discussion
Category | Test | Test Size | Oracle Execution Time (minutes:seconds) | Snowflake Execution Time (minutes:seconds) |
Performance comparison |
Insertion | Narrow table insertion (10 columns) |
22.8 million rows | 264:00 | 8:07 | 33 times faster |
Insertion | Wide table insertion (500 columns) |
3.1 million rows | 59:53 | 22:56 | 2.6 times faster |
Updation | Blind update (500 columns) |
22.8 million rows | 130:01 | 6:17 | 21 times faster |
Query | Aggregation | 720 million rows | 0:31 | 0:11 | 2.8 times faster |
Query | Filtering | 720 million rows | 0:45 | 0:10 | 4.9 times faster |
Discussion and Analysis
In the cases listed above, Snowflake completed the tasks at a much faster rate than Oracle. When given large data sets comparable to the modern processes required in large companies, Snowflake was able to outperform Oracle in regards to time taken. i.e, the speed of the operation. These results are reasonable when the architectures of both database technologies are compared (see the section “Why Snowflake is Faster: An Architectural Standpoint). Considering the fact that the tests were run on comparable memory and disk sizes and ran the same code, the research was controlled and the results could not have been caused by any confounding variables. These speed measurements provide companies with a more accurate and specific measure of how Oracle and Snowflake compare in terms of performance, allowing companies to determine if a switch is worth it.
Why Snowflake is Faster: An Architectural Standpoint
Oracle’s basic storage units are called data blocks that store chunks of persistent data that are connected to each other. On the other hand, Snowflake’s storage is organized into micro partitions rooted in a columnar format.[5] Utilising this structure, systems running Snowflake do not need to read unnecessary data but can pinpoint the columns or rows that are needed through micro partitioning (dividing data into smaller sections for optimised query processing).[3] By default, Oracle requires the processing of exponential amounts of undesired data as it has to read through full rows. In many cases, users only need to access information in a certain column which Snowflake can easily extract due to its columnar format, while Oracle fails to do this efficiently as it must read an entire row before finding that one column.[2]
Conclusion: Impact of the Cloud and Snowflake
In investigating the operations of insertion, updation, and querying on Oracle and Snowflake, it was found that Snowflake was indeed faster in all aspects. In a world centred on technology, these findings are crucial as even a slight improvement in performance could save large amounts of time and money once scaled. As a quantifiable method to compare performance in databases, this research will lead the way in exactly calculating the benefits (cost, time) in switching to the Cloud and will help companies make their decision. For over 40 years, traditional databases like Oracle have monitored and managed most of the data used in the industry. However, it has been estimated that by 2025, almost fifty percent of all data will be transferred over to the Cloud.[6]. Especially with the diverse loads of data, performance is key for maintaining the efficient functions of common technologies, like phones. In addition to managing the ever-increasing number of data applications in the current generation, cloud computing fuels the development of newer technologies. Even as a fairly new field, artificial intelligence has already required large amounts of compute power and data storage which, as emphasised by the findings, is more available in the Cloud. As such, the use of the Snowflake in the Cloud has provided the necessary tools to fuel a tech-heavy society and has much more to offer in the future.
Acknowledgements
Currently, at California State University Fullerton, Professor Kyoung Seop Shin teaches two database classes: CPSC 531:Advanced Database and CPSC 332: File Structures and Database. In these classes, Professor Kyoung teaches the ins and outs of traditional data architecture (ex. Oracle, SQL Server, MySQL) and techniques for query analysis and data allocation. He hopes to prepare his students for the next generation of technology advancements by including the Cloud as part of his school curriculum. To include more in-depth information in his curriculum, Professor Kyoung has begun to investigate the differences between a primary database platform that exists within the Cloud (Snowflake) and a traditional database (Oracle). As a part of that project, this article focuses on the contrasting performance levels between Snowflake and Oracle.
References
- Roser, Max, and Hannah Ritchie. “Technological Progress.” Our World in Data, May 11, 2013. https://ourworldindata.org/technological-progress.
- Hwang, Yitaek, Yitaek Hwanghttps, and Yitaek Hwang. “Data – The Modern World\’s Most Precious Resource.” IoT For All, July 26, 2018. https://www.iotforall.com/data-worlds-most-precious-resource/.
- “Key Concepts & Architecture.” Key Concepts & Architecture – Snowflake Documentation. Accessed March 15, 2020. https://docs.snowflake.net/manuals/user-guide/intro-key-concepts.html.
- “An Overview of Oracle Database Architecture.” Oracle Tutorial. Accessed March 15, 2020. https://www.oracletutorial.com/oracle-administration/oracle-database-architecture/.
- Ryan, John, and John Ryan. “Oracle vs. Snowflake – DZone Database.” dzone.com, June 25, 2019. https://dzone.com/articles/oracle-vs-snowflake.
- “DataAge 2025 – The Digitization of the World: Seagate US.” Seagate.com. Accessed March 15, 2020. https://www.seagate.com/our-story/data-age-2025/.
About the Author
Andrew Chung is a 15 year old student at Northwood High School in Irvine, California. He has a variety of interests such as computer science, mathematics and clarinet. Recently, Andrew has pursued his latest project in data science with Professor Kyoung Seop Shin at the California State University Fullerton, delving into the differences between traditional and modern databases.
I really liked this article and would like to work with you on something related to this. If you’re interested do email me at : [email protected]