Creating Nodes from Loaded Data in Neo4j

Before you start writing Cypher statements, decide what the graph structure should look like once your file data is imported. Copying the existing tables and columns over unchanged will not give you the benefits of a graph, so you need a graph data model to take full advantage of the graph database.

We will use the below version of the model for this article, though there are many ways to organize products and orders.

We have two nodes: one for a product and one for an order. Each of those nodes has properties taken from the CSV files.

For the Product, we have ID, name, and unit cost. 

For the Order, we have ID, date/time, and country where it is going.

The order-details.csv file defines the relationship between those two nodes. It contains the product ID, the ID of the order the product belongs to, and the quantity of the product on that order. In the data model, this becomes the CONTAINS relationship from an Order node to a Product node. The relationship also carries a quantityOrdered property, because the quantity only has meaning when a product is related to an order.

Now that you know the types of nodes and relationships you will have and the properties involved, you can construct the Cypher statements to create the data for this model.
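Because the loads in this article look nodes up by their ID property, it can help to declare uniqueness constraints before importing; a uniqueness constraint is also backed by an index that MATCH and MERGE can use for those lookups. The following is a minimal sketch, assuming Neo4j 4.4 or later (older versions use ON ... ASSERT instead of FOR ... REQUIRE). The constraint names are illustrative and do not come from the source files.

```cypher
// Assumed constraint names; any unique name works.
CREATE CONSTRAINT product_id_unique IF NOT EXISTS
FOR (p:Product) REQUIRE p.productId IS UNIQUE;

CREATE CONSTRAINT order_id_unique IF NOT EXISTS
FOR (o:Order) REQUIRE o.orderId IS UNIQUE;
```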


To create nodes from the loaded data, you have two options: CREATE or MERGE.

Use CREATE when you are sure the CSV file contains no duplicate rows. CREATE always writes new data, so to update data that already exists you pair it with MATCH to find the existing nodes first.

Use MERGE when you cannot be sure, which is the common case, since it is difficult to fully clean data from any source before importing it. MERGE checks whether the data already exists before adding it. If the node or relationship already exists, Cypher matches it and returns it without creating anything; otherwise, Cypher creates it. MERGE carries some performance overhead, but it is frequently the preferred method for preserving data integrity.
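The difference is easiest to see on throwaway data. The sketch below uses a hypothetical :Demo label and id property that are not part of the product and order model; run the statements one at a time if your client does not accept several statements at once.

```cypher
// :Demo is a throwaway label used only for this illustration.
CREATE (:Demo {id: 1});
CREATE (:Demo {id: 1});          // writes a second, duplicate node

MERGE (d:Demo {id: 2})           // first run: no match, so the node is created
  ON CREATE SET d.created = datetime();
MERGE (d:Demo {id: 2})           // second run: matches the existing node
  ON MATCH SET d.lastSeen = datetime();

// id 1 appears on two nodes, id 2 on one.
MATCH (d:Demo)
RETURN d.id AS id, count(*) AS nodes
ORDER BY id;

// Clean up the demo nodes.
MATCH (d:Demo) DETACH DELETE d;
```

The optional ON CREATE SET and ON MATCH SET sub-clauses let a single MERGE behave differently depending on whether it created or found the node.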

Creating Nodes for Product

We will first load the products into the graph. Start with the LOAD CSV statement to read the CSV file, then add the Cypher that turns each row into data in your graph model. This file is read without headers, so its columns are addressed by position (row[0], row[1], row[2]).

The MERGE statement checks whether the Product already exists before creating it, and SET assigns the properties from the converted values.

LOAD CSV FROM 'file:///products.csv' AS row
WITH toInteger(row[0]) AS productId, row[1] AS productName, toFloat(row[2]) AS unitCost
MERGE (p:Product {productId: productId})
  SET p.productName = productName, p.unitCost = unitCost
RETURN count(p);

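When you run this statement, it returns the number of rows processed, one per Product node that was matched or created. If the database started out empty and the file contains no duplicate IDs, that is the number of Product nodes created, and you can cross-check it against the number of rows in the CSV file from earlier.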
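One way to make that cross-check explicit is to count both sides directly. This is a sketch that assumes, as the load above does, that products.csv has no header row; the aliases csvRows and productNodes are illustrative.

```cypher
// Rows in the source file (no header row is assumed, as in the load above).
LOAD CSV FROM 'file:///products.csv' AS row
RETURN count(row) AS csvRows;

// Product nodes now in the graph.
MATCH (p:Product)
RETURN count(p) AS productNodes;
```

If productNodes is lower than csvRows, the file probably contains repeated product IDs that MERGE collapsed into a single node.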

You can also run a validation query to return a sample of nodes and review that the properties look accurate.

The query will be:

MATCH (p:Product)
RETURN p LIMIT 20;

When you execute this query, Neo4j Browser displays the sampled Product nodes so you can inspect their properties.

Creating Nodes for Orders

Next, we will create the nodes for orders. Because you want to avoid creating duplicate Order nodes, use the MERGE statement again. As with products, start with the LOAD CSV command, then add the Cypher statements along with your data conversions.

LOAD CSV WITH HEADERS FROM 'file:///orders.csv' AS row
WITH toInteger(row.orderID) AS orderId, datetime(replace(row.orderDate,' ','T')) AS orderDate, row.shipCountry AS country
MERGE (o:Order {orderId: orderId})
  SET o.orderDateTime = orderDate, o.shipCountry = country
RETURN count(o);
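The replace() call is there because datetime() parses ISO 8601 strings, which separate the date and the time with a T rather than a space. You can try the conversion on its own before running the import. The literal below is a made-up sample in the format the load statement appears to expect, not a value copied from orders.csv.

```cypher
// The literal is an assumed sample value, not taken from orders.csv.
WITH '1996-07-04 00:00:00.000' AS raw
RETURN raw, datetime(replace(raw, ' ', 'T')) AS converted;
```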

You can also run a validation query, as before, to verify the graph data looks correct.

// Validate that the orders loaded correctly
MATCH (o:Order)
RETURN o LIMIT 20;

Neo4j Browser displays the sampled Order nodes along with their properties.

Creating Relationships from Order-details

Finally, we will create the relationships between the products and the orders. Because the nodes were loaded by the previous two queries, start with MATCH to find the existing Product and Order nodes. The MERGE statement then adds the new relationship or matches an existing one.

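There are 2,155 rows in the CSV. Although this is not a large number for a file import, you can have Cypher commit the data in batches to reduce the memory overhead of the transaction state. In Neo4j Browser, you do this by prefixing the statement with the :auto command, which runs it in an auto-commit transaction, and combining it with a batching clause: USING PERIODIC COMMIT in Neo4j 4.x, or CALL { ... } IN TRANSACTIONS in Neo4j 4.4 and later. The statement below has no batching clause and runs as a single transaction, which is fine for a file of this size.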

LOAD CSV WITH HEADERS FROM 'file:///order-details.csv' AS row
WITH toInteger(row.productID) AS productId, toInteger(row.orderID) AS orderId, toInteger(row.quantity) AS quantityOrdered
MATCH (p:Product {productId: productId})
MATCH (o:Order {orderId: orderId})
MERGE (o)-[rel:CONTAINS {quantityOrdered: quantityOrdered}]->(p)
RETURN count(rel);
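For larger files, the batched commits mentioned above can be sketched as follows. This assumes Neo4j 4.4 or later and Neo4j Browser (:auto is a Browser command, not Cypher); the batch size of 500 rows is an arbitrary choice for illustration.

```cypher
:auto LOAD CSV WITH HEADERS FROM 'file:///order-details.csv' AS row
CALL {
  // Each batch of rows is committed in its own transaction.
  WITH row
  MATCH (p:Product {productId: toInteger(row.productID)})
  MATCH (o:Order {orderId: toInteger(row.orderID)})
  MERGE (o)-[rel:CONTAINS {quantityOrdered: toInteger(row.quantity)}]->(p)
} IN TRANSACTIONS OF 500 ROWS;
```

The conversions move inside the subquery because the WITH that imports variables into a subquery may only list plain variables such as row.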

Next, validate the data with the query below.

MATCH (o:Order)-[rel:CONTAINS]->(p:Product)
RETURN p, rel, o LIMIT 50;

Neo4j Browser renders the result as a graph of Order nodes connected to Product nodes by CONTAINS relationships.
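A visual sample does not show whether every CSV row produced a relationship, so a count check can be a useful complement. The sketch below reuses the file and property names from the load statement; the result aliases are illustrative.

```cypher
// Relationships in the graph; compare with the 2,155 rows in the CSV.
MATCH (:Order)-[rel:CONTAINS]->(:Product)
RETURN count(rel) AS containsRelationships;

// CSV rows whose product or order could not be found in the graph.
LOAD CSV WITH HEADERS FROM 'file:///order-details.csv' AS row
OPTIONAL MATCH (p:Product {productId: toInteger(row.productID)})
OPTIONAL MATCH (o:Order {orderId: toInteger(row.orderID)})
WITH row, p, o
WHERE p IS NULL OR o IS NULL
RETURN count(row) AS unmatchedRows;
```

A nonzero unmatchedRows value would point to order-detail rows that were silently skipped, because the MATCH clauses in the load produce no result for them.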


You have successfully loaded 3 CSV files into a Neo4j graph database using Neo4j Desktop!

The LOAD CSV functionality, coupled with Cypher, is exceptionally useful for getting data from files into a graph structure. The best way to advance your skills in this area is to load a variety of files for various data sets and models. 
