How to Drop Null Values in Databricks?

For Python:

Spark provides the drop() function in the DataFrameNaFunctions class (reached in PySpark through the na attribute of a DataFrame), which drops rows that have null values in one or multiple columns (any or all) of a DataFrame/Dataset. When reading data from files, Spark APIs such as DataFrame and Dataset assign NULL to columns whose values are empty.

Sometimes, as part of data cleansing, you may need to remove the rows that contain null values. In traditional SQL you usually have to check every column for null before dropping a row; the Spark drop() function in the DataFrameNaFunctions class, by default, drops rows that have a null value in any column.

For SQL:

Comparison operators:
Databricks supports the standard comparison operators such as >, >=, =, < and <=. The result of these operators is unknown (NULL) when one or both of the operands are unknown (NULL). To compare NULL values for equality, Databricks provides a null-safe equal operator (<=>), which returns False when only one of the operands is NULL and True when both operands are NULL; when neither operand is NULL it behaves like =.
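
The difference between = and <=> can be illustrated with a small plain-Python model. This is only a sketch of the semantics described above, not Databricks code: None stands in for NULL, and the function names sql_equal and null_safe_equal are invented for the example.

```python
from typing import Any, Optional


def sql_equal(a: Any, b: Any) -> Optional[bool]:
    """Model of `a = b`: the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_equal(a: Any, b: Any) -> bool:
    """Model of `a <=> b`: always returns True or False, never NULL."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b


print(sql_equal(1, None))           # None  (unknown)
print(sql_equal(None, None))        # None  (unknown)
print(null_safe_equal(1, None))     # False
print(null_safe_equal(None, None))  # True
print(null_safe_equal(1, 1))        # True
```

Because a filter keeps only rows where the condition is true, a comparison that evaluates to NULL silently excludes the row, which is why the null-safe form matters when NULLs should match each other.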

Expressions
The comparison operators and logical operators are treated as expressions in Databricks. Databricks also supports other forms of expressions, which can be broadly classified as:

  1. Null intolerant expressions
  2. Expressions that can process NULL value operands. The result of these expressions depends on the expression itself.

Null intolerant expressions
Null intolerant expressions return NULL when one or more of their arguments are NULL. Most expressions fall into this category.
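
As a rough plain-Python model of this rule (an illustration only, with None standing in for NULL and the helper name null_intolerant invented here), any NULL argument short-circuits the whole expression to NULL:

```python
import operator
from typing import Any, Callable, Optional


def null_intolerant(fn: Callable[..., Any]) -> Callable[..., Optional[Any]]:
    """Wrap fn so that it returns None (NULL) if any argument is None."""
    def wrapper(*args: Any) -> Optional[Any]:
        if any(arg is None for arg in args):
            return None
        return fn(*args)
    return wrapper


# Hypothetical stand-ins for the SQL `+` operator and the upper() function.
add = null_intolerant(operator.add)
upper = null_intolerant(str.upper)

print(add(2, 3))     # 5
print(add(2, None))  # None
print(upper("abc"))  # ABC
print(upper(None))   # None
```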

Expressions that can process null value operands
This class of expressions is designed to handle NULL values. The result depends on the expression itself. For example, the function expression isnull returns true on NULL input and false on non-NULL input, whereas the function coalesce returns the first non-NULL value in its list of operands. However, coalesce returns NULL when all its operands are NULL.
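
The two functions named above can be modelled in a few lines of plain Python. This is a sketch of the described semantics only, with None standing in for NULL; it is not how Databricks implements them.

```python
from typing import Any, Optional


def isnull(value: Any) -> bool:
    """Model of isnull: True on NULL input, False on non-NULL input."""
    return value is None


def coalesce(*operands: Any) -> Optional[Any]:
    """Model of coalesce: first non-NULL operand, or NULL if all are NULL."""
    for operand in operands:
        if operand is not None:
            return operand
    return None


print(isnull(None))               # True
print(isnull(0))                  # False
print(coalesce(None, None, 7, 9)) # 7
print(coalesce(None, None))       # None
```

Note that the check is against NULL specifically, so values such as 0 or an empty string count as non-NULL and are returned by coalesce.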
