Load CSV Data From Cloud Storage to BigQuery Via BQ CLI:
In this section, we will load CSV dataset from cloud storage to BigQuery console via bq CLI
- Solution Diagram:
The solution diagram above shows us how to load the CSV data obtained from github repository to BigQuery from the google cloud storage, via bq CLI tool.
Following are the steps required for this lab:
- Create a table inside dataset:
For this lab, we first need to create a table inside our dataset where we will load the data through bq command line utility from local machine.
- Go to BigQuery console from either the search bar or Navigation menu.
- We can see our dataset inside the project. Click on three vertical ellipses next to our dataset and click on Create Table
- Our source will be Empty table. We need to specify the name of the table, we will write covid19_vaccination_bq. Click on Create Table
- Now we can see the table inside dataset on the Big Query console
- Creating Google Cloud Storage Bucket and Uploading data
For this lab, we first need to create a google cloud storage bucket on GCP for this task:
- Go to Cloud Storage from search bar or Navigation menu.Click on Buckets from the left navigation menu.
- Click on +Create to create the bucket.
- This will open a window for creating a bucket. In the Name Your Bucket section, give a unique name to your bucket, I’ll choose demo-bucket-321.
- Next, we have the Choose where to store your data section. Here, we need to specify the Location Type and Location. There are three location types: Multi-region, Dual-region and Region. I’ll choose Region for this demo and the region will be us-east1.
- After that, we have the option to choose a storage class for our data. We can choose from Standard, Nearline, Coldline and Archive. We will go with Standard.
- Next, we need to choose the Access control, which can either be Uniform or Fine Grained. We are selecting Uniform. We also have the checkbox for Enforce public access prevention on this bucket. We will keep it checked. Leave the data protection options as-is, and click on Create.
- After creating of bucket, we will be redirected inside it.
- For this lab, we need to upload our Covid19 vaccination CSV dataset inside our storage bucket. For that, we will click on Upload Files inside our bucket, and choose the CSV file from local machine. After uploading, we can see our file.
2. Command to load the csv data from cloud storage:
Now we will write a bq command on CLI to load csv file from cloud storage to BigQuery. Here, we are first specifying the location. Our schema is set to auto-detect and our source format is csv. Then we are providing the name of dataset and table as well as the path to CSV file on google cloud storage bucket.