Running Python in Databricks

Once the data has been loaded into your table, the next step is to query it with Python from a notebook.

Create Notebook

Click Create in the navigation menu, then click Notebook.

Give your notebook a suitable name and select Python as the default language.

Run Queries

You can run SQL queries inside a Python notebook by using the %sql magic command, like so:

%sql
SELECT * FROM PetDB.Pets;

To run Python queries on the table data, first load it into a DataFrame object:

df = spark.sql("SELECT * FROM PetDB.Pets")
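
If you simply want the whole table, spark.table() is an equivalent, standard PySpark shortcut (the table name below is the one created earlier):

# Equivalent to the spark.sql() call above
df = spark.table("PetDB.Pets")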

To display the data inside the DataFrame, use:

display(df)
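
display() is a Databricks notebook helper. If you want a plain-text preview instead, the DataFrame's own show() method (standard PySpark) works too:

df.show(5)  # print the first 5 rows as plain text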

To display the number of rows in the table, use:

df.count()
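
count() returns a plain Python integer, so the result can be used directly in Python code, for example:

row_count = df.count()
print(f"The Pets table has {row_count} rows")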

To display summary statistics such as count, mean, standard deviation, min, and max, use the describe() function:

display(df.describe())
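
describe() also accepts column names if you only want statistics for specific columns; for example, restricting it to the Age column used later in this tutorial:

# Summary statistics for a single column
display(df.describe("Age"))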

To display a particular column, use select():

display(df.select("Name"))
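
select() accepts multiple columns, and you can chain filter() to narrow the rows. The column names and the age threshold below are illustrative and assume the Pets table has Name and Age columns:

# Show only the Name and Age columns for pets older than 3
display(df.filter(df.Age > 3).select("Name", "Age"))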

To display the number of pets in each age group, group by Age and count:

display(df.groupBy("Age").count().orderBy("Age"))
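
To see the largest age groups first, you can sort the same aggregation by the generated count column in descending order:

# Same aggregation, largest age groups first
display(df.groupBy("Age").count().orderBy("count", ascending=False))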

Visualize the Data

Click the visualization tab in the cell output and select a bar chart to view the result as a graph.
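
If you prefer to build the chart in code rather than through the visualization tab, one option (assuming pandas and matplotlib are available on the cluster, as they are on standard Databricks runtimes) is to convert the small aggregated result to a pandas DataFrame and plot it:

import matplotlib.pyplot as plt

# Convert the aggregated result to pandas and draw a bar chart
counts = df.groupBy("Age").count().orderBy("Age").toPandas()
counts.plot(kind="bar", x="Age", y="count", legend=False)
plt.xlabel("Age")
plt.ylabel("Number of pets")
plt.show()  # Databricks renders the figure inline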
