
Snowflake’s architecture is a hybrid of traditional shared-disk and shared-nothing database architectures. 

Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the platform. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.

Its design and architecture enable businesses to take advantage of the following:

  • Performance: Large volumes of data can be loaded and queried faster by scaling virtual warehouses vertically (resizing a warehouse) and horizontally (adding clusters).
  • Data Sharing: Organizations can share data with data consumers, including those without a Snowflake account of their own, through reader accounts that can be created directly from the user interface.
  • Support for Structured and Semi-structured Data: Snowflake supports both structured and semi-structured data. Data can be loaded in its raw format without first passing through an ETL or ELT process.
  • Concurrency: Traditional data warehouse platforms often run into concurrency issues when many users run queries simultaneously. Under Snowflake’s multi-cluster architecture, queries running on one virtual warehouse do not affect other virtual warehouses.
  • Billing: Snowflake’s multi-cluster shared data architecture separates storage resources from compute resources, so organizations pay for compute per second, while storage is billed per terabyte per month. Because each virtual warehouse has its own compute resources, workloads run in parallel without contention.
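
The billing split described above can be sketched in a few lines of Python. This is only an illustration: the function names, the credit rate, the price per credit, the price per terabyte, and the 60-second minimum charge are all assumptions chosen for the example, not figures taken from this article, so check them against your own contract.

```python
import math

SECONDS_PER_HOUR = 3600


def compute_cost(seconds_running, credits_per_hour, price_per_credit, minimum_seconds=60):
    """Cost of compute billed per second.

    All rates are hypothetical inputs. minimum_seconds is an assumed
    minimum charge applied each time a warehouse starts running.
    """
    if seconds_running <= 0:
        return 0.0
    billable = max(seconds_running, minimum_seconds)
    return billable / SECONDS_PER_HOUR * credits_per_hour * price_per_credit


def storage_cost(terabytes, price_per_tb_month, months=1):
    """Cost of storage billed per terabyte per month (hypothetical rate)."""
    return terabytes * price_per_tb_month * months


if __name__ == "__main__":
    # Hypothetical workload: a warehouse consuming 2 credits per hour runs
    # for 15 minutes, and the account stores 5 TB for one month.
    compute = compute_cost(15 * 60, credits_per_hour=2, price_per_credit=3.00)
    storage = storage_cost(5, price_per_tb_month=23.00)
    print(f"compute: ${compute:.2f}, storage: ${storage:.2f}")
```

The two functions never reference each other, which mirrors the point of the bullet: compute and storage are metered separately, so a suspended warehouse costs nothing while the stored data continues to be billed.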

Architecture Overview

Snowflake’s unique architecture consists of three key layers:

  1. Database Storage
  2. Query Processing
  3. Cloud Services

Database Storage

Snowflake uses highly scalable and secure cloud storage to store structured data and semi-structured data such as JSON, Avro, and Parquet. The storage layer consists of tables, schemas, and databases.

When data is loaded into Snowflake, Snowflake reorganizes that data into its internal optimized, compressed, columnar format. Snowflake stores this optimized data in cloud storage.

Snowflake manages all aspects of how this data is stored: the organization, file size, structure, compression, metadata, statistics, and other aspects of data storage. The data objects stored by Snowflake are neither directly visible nor accessible to customers; they are only accessible through SQL query operations run using Snowflake.
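
To make the idea of a columnar layout with stored statistics more concrete, here is a minimal sketch. It is not Snowflake's internal format, which is not exposed to customers. The helper names `rows_to_columns` and `column_stats` and the sample rows are hypothetical, chosen only to show why storing values column by column, with per-column min/max metadata, helps analytical queries.

```python
def rows_to_columns(rows):
    """Pivot a list of row dicts (all with the same keys) into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def column_stats(columns):
    """Record (min, max) per column, the kind of statistic a storage layer can keep as metadata."""
    return {name: (min(values), max(values)) for name, values in columns.items()}


if __name__ == "__main__":
    # Hypothetical sample data.
    rows = [
        {"order_id": 1, "region": "EU", "amount": 120},
        {"order_id": 2, "region": "US", "amount": 75},
        {"order_id": 3, "region": "EU", "amount": 310},
    ]
    columns = rows_to_columns(rows)
    stats = column_stats(columns)

    # An aggregate over one column touches only that column's values.
    print(sum(columns["amount"]))

    # Min/max metadata can answer "could this block contain amount > 1000?"
    # without reading the data at all.
    print(stats["amount"][1] > 1000)
```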

Query Processing

Query execution is performed in the processing layer, using resources provisioned from a cloud provider. Snowflake processes queries using “virtual warehouses”. Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider.

Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. As a result, each virtual warehouse has no impact on the performance of other virtual warehouses.
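
The combination of shared storage and isolated compute can be modeled with a toy example. The classes below are hypothetical stand-ins written for this illustration, not Snowflake APIs: `SharedStorage` plays the role of the central data repository, and each `VirtualWarehouse` has its own pool of query slots that no other warehouse can consume.

```python
class SharedStorage:
    """Central repository visible to every warehouse (stands in for cloud storage)."""

    def __init__(self):
        self.tables = {}


class VirtualWarehouse:
    """Independent compute cluster with its own, unshared pool of query slots."""

    def __init__(self, name, storage, slots):
        self.name = name
        self.storage = storage
        self.slots = slots
        self.running = 0

    def start_query(self):
        """Occupy one slot; return False if this warehouse is saturated."""
        if self.running >= self.slots:
            return False
        self.running += 1
        return True

    def free_slots(self):
        return self.slots - self.running

    def insert(self, table, row):
        """Write a row to the shared storage layer."""
        self.storage.tables.setdefault(table, []).append(row)

    def select_all(self, table):
        """Read a table from the shared storage layer."""
        return list(self.storage.tables.get(table, []))


if __name__ == "__main__":
    storage = SharedStorage()
    etl = VirtualWarehouse("etl", storage, slots=2)
    bi = VirtualWarehouse("bi", storage, slots=2)

    # Saturate the ETL warehouse; the BI warehouse keeps all of its slots.
    etl.start_query()
    etl.start_query()
    print(etl.free_slots(), bi.free_slots())

    # A row written through one warehouse is readable from the other.
    etl.insert("orders", {"order_id": 1})
    print(bi.select_all("orders"))
```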

The following table describes some of the key advantages of virtual warehouses:

Advantage        Description
Scalability      Scale up and down without downtime or disruption
Auto-suspend     Automatically suspends a virtual warehouse when no queries are running on it
Auto-resume      Resumes automatically, typically within a few seconds, when a new SQL query needs to be executed
Pay as you go    Because storage and compute are decoupled in Snowflake, you pay only for the compute resources you use
Zero contention  Each virtual warehouse has dedicated compute resources, with no dependency on any other virtual warehouse
Data changes     Because data storage is shared, any data change is immediately available to all virtual warehouses
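
The interaction between auto-suspend, auto-resume, and pay-as-you-go billing can be sketched as a small state machine. The class name `AutoSuspendWarehouse`, the idle timeout, and the simplification that queries finish instantly are assumptions made for this illustration. Real warehouses also have startup latency and billing minimums that are not modeled here.

```python
class AutoSuspendWarehouse:
    """Toy model: the warehouse runs only while it has had recent query activity."""

    def __init__(self, auto_suspend_seconds):
        self.auto_suspend_seconds = auto_suspend_seconds
        self.running = False
        self.last_activity = None
        self.billed_seconds = 0
        self._started_at = None

    def tick(self, now):
        """Suspend if idle for at least the timeout; bill only the time spent running."""
        if self.running and now - self.last_activity >= self.auto_suspend_seconds:
            suspend_at = self.last_activity + self.auto_suspend_seconds
            self.billed_seconds += suspend_at - self._started_at
            self.running = False

    def run_query(self, now):
        """Auto-resume if suspended, then record activity at time `now` (in seconds)."""
        self.tick(now)
        if not self.running:
            self.running = True
            self._started_at = now
        self.last_activity = now


if __name__ == "__main__":
    wh = AutoSuspendWarehouse(auto_suspend_seconds=60)
    wh.run_query(0)      # first query resumes the warehouse
    wh.run_query(30)     # still running, so the idle timer resets
    wh.tick(200)         # idle since t=30, so it was suspended at t=90
    wh.run_query(1000)   # a new query resumes it again
    wh.tick(1100)        # idle since t=1000, so it was suspended at t=1060
    print(wh.running, wh.billed_seconds)
```

In this run the warehouse is billed for two short running windows rather than the full 1,100 seconds of wall-clock time, which is the practical effect of combining auto-suspend with per-second compute billing.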

Cloud Services

The cloud services layer is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch. The cloud services layer also runs on compute instances provisioned by Snowflake from the cloud provider.

Services managed in this layer include:

  1. Authentication
  2. Infrastructure management
  3. Metadata management
  4. Query parsing and optimization
  5. Access control

A vital component of the services layer is the metadata store of tables and micro-partitions, which powers several distinctive Snowflake features, including zero-copy cloning, Time Travel, and data sharing.
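
One way to see why a metadata store makes zero-copy cloning possible is to model a table as a list of pointers to immutable partitions. The sketch below is a simplified mental model, not Snowflake's implementation. The `MetadataStore` class and its methods are hypothetical names introduced for this example.

```python
import itertools


class MetadataStore:
    """Maps each table name to an ordered list of immutable micro-partition ids."""

    def __init__(self):
        self.partitions = {}  # partition id -> tuple of rows (never modified)
        self.tables = {}      # table name -> list of partition ids
        self._ids = itertools.count(1)

    def insert(self, table, rows):
        """Write rows as a new immutable partition and register it with the table."""
        pid = next(self._ids)
        self.partitions[pid] = tuple(rows)
        self.tables.setdefault(table, []).append(pid)

    def clone(self, source, target):
        """Zero-copy clone: copy only the list of partition ids, not the data."""
        self.tables[target] = list(self.tables[source])

    def scan(self, table):
        """Return all rows of a table by following its partition pointers."""
        return [row for pid in self.tables[table] for row in self.partitions[pid]]


if __name__ == "__main__":
    store = MetadataStore()
    store.insert("orders", [1, 2])
    store.clone("orders", "orders_dev")

    # Writing to the clone adds a new partition; the original is untouched.
    store.insert("orders_dev", [3])
    print(store.scan("orders"))
    print(store.scan("orders_dev"))
    print(len(store.partitions))
```

Because partitions are never modified in place, the clone and the original can safely share them, and only data written after the clone consumes additional storage.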
