Running Python in Databricks
Once the data has been loaded into your table, the next step is to query it with Python from a notebook.
Create Notebook
Click Create in the navigation menu and then click Notebook.
Give your notebook a suitable name and select Python as the default language.
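Databricks notebooks automatically create a SparkSession named spark, so no setup code is needed. As a quick sanity check (a minimal sketch), you can run a first cell that prints the Spark version to confirm the notebook is attached to a cluster:
# The pre-created SparkSession is available as `spark` in Databricks notebooks
print(spark.version)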
Run Queries
You can run SQL queries inside a Python notebook by prefixing a cell with the %sql magic command, like so:
%sql
SELECT * FROM PetDB.Pets;
To run Python queries on the table data, first load it into a DataFrame:
df = spark.sql("SELECT * FROM PetDB.Pets")
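To confirm which columns and types the DataFrame picked up from the table, you can also print its schema (a small sketch using the same df):
# Print the column names and data types of the Pets table
df.printSchema()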
To display the data inside the DataFrame, use the following:
display(df)
Output:
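Note that display() is provided by the Databricks notebook environment; in plain PySpark scripts you can use show() instead, for example to print the first few rows (a sketch using the same df):
df.show(5)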
To display the number of rows in the table, use the following:
df.count()
Output:
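count() can also be combined with filter() to count only the rows that match a condition; for example, assuming Age is a numeric column in the Pets table (a sketch):
# Count pets older than 2
df.filter(df.Age > 2).count()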
To display additional statistics about the data, such as min, max, mean, and standard deviation, use the describe() function like so:
display(df.describe())
Output:
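describe() also accepts column names if you only want statistics for particular columns; for example, assuming the Age column (a sketch):
display(df.describe("Age"))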
To display a particular column, use the following:
display(df.select("Name"))
Output:
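select() accepts multiple column names as well, so you can display several columns at once; for example, assuming the table also has an Age column (a sketch):
display(df.select("Name", "Age"))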
To display the number of pets in each age group, use the following:
display(df.groupBy("Age").count().orderBy("Age"))
Output:
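If you want the largest age groups first, the same aggregation can be ordered by the generated count column instead (a sketch):
display(df.groupBy("Age").count().orderBy("count", ascending=False))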
Visualize the Data
Click the visualization tab in the output and select a bar chart to view the result as a graph.
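If you prefer to build the chart in code rather than through the visualization tab, one option is to convert the grouped result to pandas and plot it with matplotlib; a minimal sketch, assuming matplotlib is available on the cluster:
import matplotlib.pyplot as plt

# Convert the Spark aggregation to pandas for plotting
age_counts = df.groupBy("Age").count().orderBy("Age").toPandas()
age_counts.plot(kind="bar", x="Age", y="count", legend=False)
plt.xlabel("Age")
plt.ylabel("Number of Pets")
plt.show()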