
Load CSV Data From Local Machine To BigQuery Via API

Loading CSV Data From Local Machine To BigQuery Via API


Here is the solution diagram for loading CSV data from the local machine to BigQuery:

Here, we get the Covid19 CSV dataset from GitHub and download it to the local machine. The data is then loaded into the BigQuery dataset that we have already created, via the BigQuery Python API.

Now we will head to Visual Studio Code and start writing our Python code for loading CSV data from the local machine to BigQuery via the API. We will create a new Python file called bqLoad.py.

Code Steps:

Our code has the following steps:

  • Import Google Cloud libraries
  • Specify the path to the GCP credentials
  • Create a BigQuery client object
  • Specify the BigQuery table ID
  • Provide the path to the CSV file on the local machine
  • Specify the load job configuration
  • Make the API request

Code:

#import libraries

from google.cloud import bigquery

#create a BigQuery client from the GCP service account credentials JSON

client = bigquery.Client.from_service_account_json(r'C:\Users\Sherry\Downloads\pelagic-earth-353304-5e9ab57c7481.json')

#function to load a local CSV file into a BigQuery table

def load_table_file(file_path: str, table_id: str) -> "bigquery.Table":
    # file_path is the path to the CSV file on the local machine, e.g.
    # r"C:\Users\Sherry\Downloads\locations.csv"
    # table_id is the fully qualified table ID, e.g.
    # "pelagic-earth-353304-5e9ab57c7481.covid_dataset.covid19_vaccination"

    # Load job configuration: CSV source, skip the header row, auto-detect the schema.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )

    # Open the local CSV file and start the load job (API request).
    with open(file_path, "rb") as source_file:
        job = client.load_table_from_file(source_file, table_id, job_config=job_config)

    job.result()  # Waits for the job to complete.

    table = client.get_table(table_id)  # Make an API request.

    # Print the number of loaded rows and columns.
    print(
        "Loaded {} rows and {} columns to {}".format(
            table.num_rows, len(table.schema), table_id
        )
    )

    return table
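Finally, we call the function with the path to the CSV file on the local machine and the fully qualified table ID (project.dataset.table) that we set up earlier:

#run the load for the Covid19 vaccination CSV
load_table_file(
    r"C:\Users\Sherry\Downloads\locations.csv",
    "pelagic-earth-353304-5e9ab57c7481.covid_dataset.covid19_vaccination",
)

If the load succeeds, the script prints the number of rows and columns loaded into the table.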

Load CSV data from Cloud Storage to BigQuery via API

Here is the solution diagram for loading CSV data from Google Cloud Storage to BigQuery:

Here, we get the Covid19 CSV dataset from GitHub and upload it to Google Cloud Storage. The data is then loaded into the BigQuery dataset that we have already created, via the BigQuery Python API. Following is the step-by-step implementation:

  • Creating Google Cloud Storage Bucket and Uploading data

First, we need to create a Google Cloud Storage bucket on GCP for this task. The console steps are listed below; a scripted alternative using the Python client follows the list.

  1. Go to Cloud Storage from the search bar or Navigation menu, then click on Buckets in the left navigation menu.
  2. Click on +Create to create the bucket.
  3. This will open a window for creating a bucket. In the Name Your Bucket section, give a unique name to your bucket; I’ll choose demo-bucket-321.
  4. Next, we have the Choose where to store your data section. Here, we need to specify the Location Type and Location. There are three location types: Multi-region, Dual-region and Region. I’ll choose Region for this demo, and the region will be us-east1.
  5. After that, we have the option to choose a storage class for our data. We can choose from Standard, Nearline, Coldline and Archive. We will go with Standard.
  6. Next, we need to choose the Access control, which can either be Uniform or Fine-grained. We are selecting Uniform. We also have the checkbox for Enforce public access prevention on this bucket. We will keep it checked. Leave the data protection options as-is, and click on Create.
  7. After creating the bucket, we will be redirected inside it.
  8. For this lab, we need to upload our Covid19 vaccination CSV dataset inside our storage bucket. For that, we will click on Upload Files inside our bucket, and choose the CSV file from the local machine. After uploading, we can see our file.
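If you would rather script this step than click through the console, here is a minimal sketch using the google-cloud-storage Python client. It reuses the same service account JSON, bucket name (demo-bucket-321), region, storage class and local CSV path from the walkthrough above; the rest of this post uses the bucket created in the console, so treat this purely as an optional alternative. Bucket names are globally unique, so you may need to pick a different name.

#optional: create the bucket and upload the CSV with the google-cloud-storage client
from google.cloud import storage

#reuse the same service account credentials JSON
storage_client = storage.Client.from_service_account_json(r'C:\Users\Sherry\Downloads\pelagic-earth-353304-5e9ab57c7481.json')

#create a regional bucket in us-east1 with the Standard storage class
bucket = storage_client.bucket("demo-bucket-321")
bucket.storage_class = "STANDARD"
bucket = storage_client.create_bucket(bucket, location="us-east1")

#upload the Covid19 vaccination CSV from the local machine
blob = bucket.blob("locations.csv")
blob.upload_from_filename(r"C:\Users\Sherry\Downloads\locations.csv")

print("Uploaded locations.csv to gs://{}".format(bucket.name))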

Load Data from Cloud Storage to BigQuery via Python API

Now we will head to Visual Studio Code and start writing our Python code for loading CSV data from Google Cloud Storage to BigQuery via the API. We will create a new Python file called bqStorageLoad.py.

Code Steps:

Our code has the following steps:

  • Import Google Cloud libraries
  • Specify the path to the GCP credentials
  • Create a BigQuery client object
  • Specify the BigQuery table ID
  • Specify the load job configuration
  • Provide the storage bucket path
  • Make the API request
Code:

#import libraries

from google.cloud import bigquery

#create a BigQuery client from the GCP service account credentials JSON

client = bigquery.Client.from_service_account_json(r'C:\Users\Sherry\Downloads\pelagic-earth-353304-5e9ab57c7481.json')

# Set the dataset
dataset_id = 'covid_dataset'

# Set the table
table_id = 'covid19_vaccination'

# Build a reference to the destination table
table_ref = bigquery.DatasetReference(client.project, dataset_id).table(table_id)

# Configure the load job
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV  # specify source format as CSV
job_config.skip_leading_rows = 1  # skip the header row
job_config.autodetect = True  # auto detect schema

# Path to the CSV file in the storage bucket
uri = "gs://demo-bucket-3211/locations.csv"

# Start the load job (API request)
load_job = client.load_table_from_uri(
    uri, table_ref, job_config=job_config
)

load_job.result()  # Waits for the table load to complete.

# Print the number of loaded rows
print("Loaded {} rows into {}:{}.".format(
    load_job.output_rows, dataset_id, table_id))
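As an optional sanity check (not part of the load script itself), you can run a quick query with the same client to confirm the row count in the destination table:

#optional sanity check: count the rows now in the destination table
query = "SELECT COUNT(*) AS total_rows FROM `{}.{}.{}`".format(
    client.project, dataset_id, table_id
)
for row in client.query(query).result():
    print("Table now contains {} rows".format(row.total_rows))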
