The following are the primary data loading options offered by Snowflake.
- Bulk Loading
- Continuous/Incremental Loading
The volume of data to load and the frequency of loading determine which option is the better fit.
Bulk loading
Bulk loading is the fastest way to insert large numbers of rows into a Snowflake table. A bulk load moves vast amounts of data into a Snowflake table using a virtual warehouse, typically in a single session.
Phases of Bulk loading:
Two phases are involved in bulk loading. First, data files must be uploaded to a location where Snowflake can access them; this is referred to as staging. Next, the data is loaded from the staged files into the appropriate table.
Staging
Snowflake lets you stage files on internal locations called stages. Each table and each user has its own stage. Moreover, Snowflake allows for the creation of named stages, e.g. a demo stage. Staging works as follows:
- First, you upload your data files to a location where Snowflake can access your files. This is referred to as staging your files.
- Then you load your data from these staged files into your tables.
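The two steps above can be sketched in Snowflake SQL. The stage name, file format, and local path below are illustrative examples, not part of the original text:

```sql
-- Create a named internal stage (name and file format are examples)
CREATE STAGE my_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Stage a local file with PUT (PUT runs from a client such as SnowSQL,
-- not from the web UI worksheet)
PUT file:///tmp/sales.csv @my_stage;
```

Once the files are in the stage, they can be loaded into a table with the COPY command described below.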
Loading the Data
Loading data into Snowflake requires a running virtual warehouse. The warehouse extracts the data from each file and inserts it as rows in the table. Warehouse size can impact loading performance: when loading a large number of files, or very large files, you may want to choose a larger warehouse.
Bulk Loading Using the COPY Command
This option enables loading batches of data from files already available in cloud storage, or copying (i.e. staging) data files from a local machine to an internal (i.e. Snowflake) cloud storage location before loading the data into tables using the COPY command.
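A minimal COPY example, assuming a target table named sales and staged files in an internal stage called my_stage (both names are illustrative):

```sql
-- Load all staged files from the named stage into the target table
COPY INTO sales
  FROM @my_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
  ON_ERROR = 'ABORT_STATEMENT';  -- stop the load on the first error
```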
Compute Resources
Bulk loading relies on user-provided virtual warehouses, which are specified in the COPY statement. Users are required to size the warehouse appropriately to accommodate expected loads.
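Sizing and selecting the warehouse for a load might look like the following sketch; the warehouse name and size are placeholders chosen for illustration:

```sql
-- Create (or reuse) a warehouse sized for the expected load volume
CREATE WAREHOUSE IF NOT EXISTS load_wh
  WAREHOUSE_SIZE = 'LARGE'
  AUTO_SUSPEND = 60;   -- suspend after 60 s idle to save credits

-- Make it the active warehouse for the session that runs COPY
USE WAREHOUSE load_wh;
```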
Simple Transformations During a Load
Snowflake supports transforming data while loading it into a table using the COPY command. Options include:
- Column reordering
- Column omission
- Casts
- Truncating text strings that exceed the target column length
There is no requirement for your data files to have the same number and ordering of columns as your target table.
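These transformations are expressed with a SELECT inside the COPY statement. The example below reorders columns, omits one, and casts a value during the load; the table and stage names are assumptions for illustration:

```sql
-- Reorder columns, skip a file column, and cast during the load
COPY INTO sales (region, amount)
  FROM (
    SELECT t.$3,                -- column reordering: third file column first
           t.$1::NUMBER(10,2)   -- cast the first file column to a number
    FROM @my_stage t            -- $2 in the file is simply omitted
  )
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```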
Continuous/Incremental Loading
Another option for loading data into Snowflake is continuous loading, in which data is incrementally loaded into Snowflake in small batches across multiple sessions. Files are staged as they arrive and loaded shortly afterwards, making the data available for analysis almost as soon as it lands.
Continuous Loading Using Snowpipe
This option is intended to incrementally load small amounts of data (i.e., micro-batches) and make them accessible for analysis. After files are added to a stage and submitted for ingestion, Snowpipe loads the data within minutes. This ensures users obtain the most recent results as soon as the raw data is available.
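A Snowpipe definition wraps a COPY statement; the pipe, stage, and table names below are illustrative assumptions:

```sql
-- Define a pipe that ingests files as they land in the stage
CREATE PIPE sales_pipe
  AUTO_INGEST = TRUE    -- auto-ingest requires event notifications
                        -- on an external (cloud storage) stage
  AS
  COPY INTO sales
    FROM @my_ext_stage
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```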
Compute Resources
Snowpipe uses compute resources provided by Snowflake (i.e. a serverless compute model). These Snowflake-provided resources are scaled up or down automatically based on demand, and they are billed and itemised using per-second pricing. The cost of data ingestion is determined by the actual workloads.
Simple Transformations During a Load
The COPY statement in a pipe definition supports the same COPY transformation options as when bulk loading data.
In addition, data pipelines can leverage Snowpipe to continuously load micro-batches of data into staging tables for transformation and optimization using automated tasks and the change data capture (CDC) information in streams.
Data Pipelines for Complex Transformations
A data pipeline enables applying complex transformations to loaded data. This workflow generally leverages Snowpipe to load “raw” data into a staging table and then uses a series of table streams and tasks to transform and optimize the new data for analysis.
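A sketch of such a pipeline, using a stream to capture the CDC rows added to the staging table and a task to apply the transformation (all object names are illustrative):

```sql
-- The stream records change data capture (CDC) rows on the staging table
CREATE STREAM raw_sales_stream ON TABLE raw_sales;

-- The task periodically transforms and moves the new rows
CREATE TASK transform_sales
  WAREHOUSE = load_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('RAW_SALES_STREAM')
  AS
  INSERT INTO sales_clean
    SELECT region, amount::NUMBER(10,2)
    FROM raw_sales_stream;

-- Tasks are created suspended; resume to start the schedule
ALTER TASK transform_sales RESUME;
```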