Setting up BigQuery Environment
- Read
- Discuss
In this project, we will look at how to load CSV data to bigquery using API.
Prerequisites
This project has the following prerequisites:
First we will set up our environment, which has following steps:
- Create a service account:
Before we start working with bigQuery, we need to grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this service. For performing any task in bigquery, it is advisable to get the IAM permissions of bigquery admin. For that, we will create a service account:
- Login to your google cloud account, and go to console
- From the search bar, search for IAM and Admin
- From the left navigation menu, select Service Accounts. This will take you to the service accounts page. Click on create service account
- Give the account details, including name and description and click on Create and Continue.
- Now we need to add a role to the service account. First, we will give it the Owner role
- Now we would give it the BigQuery Admin role:
- Now clicking on continue would create our service account.
- Download credentials
- To download the credentials for accessing the bigquery API from our local machine, we will first create the key. For that, click on the three ellipses next to the service account and click on Manage keys
- Now in the Keys tab, click on Add Key and then Create New Key
- This will give us the option of either downloading the key in JSON or P12. We will click on JSON and then click on Create
- This will download our key to the local machine in JSON format.
- Download CSV dataset
Now we will download the dataset we are using in this project. We are taking this data in CSV format from Github. This is the COVID19 vaccination dataset by Our World in Data. In this particular dataset, most recent official numbers from governments and health ministries worldwide are used.
- We will navigate to the github repository https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations :
- From here we will go to the locations.csv dataset:
- We will download this data in csv format. We can see that this data contains location (country), iso_code (country code), vaccines (vaccine name), last_observation_data, source_name, source_website.
Create dataset on BigQuery Console:
The next step is to create dataset inside our project in BigQuery. We will do that using Bigquery console
- Navigate to BigQuery from the google cloud console:
- Here, you can see the project name. Click on the three vertical ellipses next to project name, then click on Create dataset:
- This will prompt you to provide details of dataset such as Dataset ID (it will be covid_dataset in our case), Location type (we will choose Region for this demonstration, which refers to a single region), then we will specify the Region name which will be US-east1 in our case. Now click on create dataset.
- Now we can see our dataset inside the project.
Load data to BigQuery using Python API:
Now we are ready to load our CSV data to BigQuery dataset via Python API. We can do this in two ways:
- Load CSV data from local machine to bigquery via API
- Load CSV data from cloud storage to bigquery via API
Leave a Reply
You must be logged in to post a comment.