
As time goes on, the needs and demands of the industry grow. Databases play a vital role in almost all software development, and over the past two decades handling data has become a big challenge because of its drastic growth in volume. For example, a widely quoted 2010 remark by Google's then-CEO Eric Schmidt held that we now create as much information every two days as was created from the dawn of civilization up to 2003. In any database, search performance is as important as storage. Conventional RDBMSs are showing their age, and we need a solid alternative that helps us retrieve data more efficiently. In addition, data is far more interconnected today than it was two decades ago. Graph databases answer these complex requirements and can traverse millions of records and relationships in milliseconds.

 

How does graph search differ from a normal RDBMS search?

Graph search relies on traversal, whereas an RDBMS relies on indexes. Whenever data is added to a graph database, its connections to other records are stored along with it, so the paths through the graph network are already in place. When we run a search using CQL, the paths that meet the search criteria are returned. In effect, graph search stores information as a series of paths, which is valuable when giving users options during a search. These paths act as a search cache: a repository of path addresses kept in a hierarchical data store.

When we search, the query is matched against these path addresses rather than against the actual data, and whichever addresses satisfy the query are returned. The result is less search overhead and better performance. See my diagram below.

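A toy sketch of the path idea, under our own assumptions: the graph is a plain Python dictionary mapping each node to its neighbours, paths from one start node are enumerated up to a fixed depth and stored as tuples (the "path addresses"), and a search simply filters the stored paths. The names `GRAPH`, `enumerate_paths` and `search` are hypothetical, and this is not a description of how any particular graph database stores its data.

```python
# Hypothetical toy graph: each node maps to the nodes it links to.
GRAPH = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave", "Eve"],
    "Dave": ["Frank"],
    "Eve": [],
    "Frank": [],
}


def enumerate_paths(graph, start, max_depth):
    """Return every cycle-free path (a tuple of node names) that begins at
    `start` and has at most `max_depth` hops."""
    paths = []
    stack = [(start,)]
    while stack:
        path = stack.pop()
        paths.append(path)
        if len(path) - 1 < max_depth:          # hops so far = nodes - 1
            for neighbour in graph.get(path[-1], []):
                if neighbour not in path:      # avoid revisiting a node
                    stack.append(path + (neighbour,))
    return paths


def search(paths, target):
    """Match a query against the stored paths only: keep those ending at `target`."""
    return [p for p in paths if p[-1] == target]


paths = enumerate_paths(GRAPH, "Alice", max_depth=3)
print(search(paths, "Dave"))
```

Here `search(paths, "Dave")` returns the two stored paths that end at Dave, one through Bob and one through Carol, without re-walking the graph.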

 

Scenarios where graph databases should be adopted and avoided:

Use it – when we have a large amount of interconnected data and the relationships go more than one level deep, a graph database is more relevant and suitable. A graph database traverses deep into the network by following the unique addresses of connected nodes, so the cost of a search depends on how much of the graph the query touches rather than on the total amount of data. In practice, finding a related node or relationship takes roughly the same time whether the database holds 10 records or 1 million.

 

Avoid it – in contrast, if the business needs to store plain data without relationships or interconnections, an RDBMS is preferable, since its lookups on such data can be considerably quicker than a graph database search.
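The "Use it" claim about data size can be illustrated with a breadth-first traversal over an in-memory adjacency list. This is a hypothetical Python sketch, not any product's engine: `neighbourhood` counts the nodes it actually visits, and the same small cluster is searched once on its own and once embedded in a graph with 100,000 unrelated nodes.

```python
from collections import deque


def neighbourhood(graph, start, max_depth):
    """Breadth-first traversal from `start`, following edges for at most
    `max_depth` hops. Returns (nodes reached, number of nodes visited)."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = 0
    while queue:
        node, depth = queue.popleft()
        visited += 1
        if depth == max_depth:
            continue                      # do not expand beyond the search perimeter
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen, visited


# Hypothetical data: a small connected cluster around "a" ...
small = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
# ... and the same cluster inside a graph with 100,000 unrelated nodes.
large = dict(small)
large.update({f"n{i}": [] for i in range(100_000)})

print(neighbourhood(small, "a", max_depth=2))
print(neighbourhood(large, "a", max_depth=2))
```

Both calls visit the same 4 nodes, because the traversal only follows edges out of the starting node and never touches the unrelated ones. Wall-clock time in a real database also depends on factors such as caching and storage layout, which this sketch ignores.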

Let’s look at a practical example:

In many e-commerce websites, one of the most important features is the recommendation engine. For example, when a customer searches for a product, the website tends to show products that are closely related to it, or products searched for by other users who also searched for the same product.

Implementing this functionality in an RDBMS means writing a very complex query that also takes a long time to run because of its multiple joins. With CQL (Cypher Query Language) in a graph database, the same requirement is much simpler to express. Compare the two queries below.

SQL Query (RDBMS)

SELECT product.product_name AS Recommendation, COUNT(1) AS Frequency
FROM product, customer_product_mapping,
  (SELECT cpm3.product_id, cpm3.customer_id
   FROM customer_product_mapping cpm, customer_product_mapping cpm2, customer_product_mapping cpm3
   WHERE cpm.customer_id = 'customer-one'
   AND cpm.product_id = cpm2.product_id
   AND cpm2.customer_id != 'customer-one'
   AND cpm3.customer_id = cpm2.customer_id
   AND cpm3.product_id NOT IN
     (SELECT DISTINCT cpm4.product_id FROM customer_product_mapping cpm4
      WHERE cpm4.customer_id = 'customer-one')
  ) recommended_products
WHERE customer_product_mapping.product_id = product.product_id
AND customer_product_mapping.product_id = recommended_products.product_id
AND customer_product_mapping.customer_id = recommended_products.customer_id
GROUP BY product.product_name
ORDER BY Frequency DESC
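To experiment with the SQL query, it can be run against an in-memory SQLite database from Python. The schema and sample rows below are hypothetical; the query is the one above with straight quotes, lowercase table names, a `?` placeholder for the customer id, and an `=` join on `recommended_products.product_id`.

```python
import sqlite3

# Hypothetical schema and sample data, assumed for illustration only.
SCHEMA = """
CREATE TABLE product (product_id TEXT PRIMARY KEY, product_name TEXT);
CREATE TABLE customer_product_mapping (customer_id TEXT, product_id TEXT);
"""
PRODUCTS = [("p1", "Laptop"), ("p2", "Mouse"), ("p3", "Keyboard"),
            ("p4", "Monitor"), ("p5", "Webcam")]
PURCHASES = [("customer-one", "p1"),
             ("customer-two", "p1"), ("customer-two", "p2"), ("customer-two", "p3"),
             ("customer-three", "p1"), ("customer-three", "p2"), ("customer-three", "p4"),
             ("customer-four", "p5")]

RECOMMEND_SQL = """
SELECT product.product_name AS Recommendation, COUNT(1) AS Frequency
FROM product, customer_product_mapping,
     (SELECT cpm3.product_id, cpm3.customer_id
      FROM customer_product_mapping cpm,
           customer_product_mapping cpm2,
           customer_product_mapping cpm3
      WHERE cpm.customer_id = ?                 -- the target customer
        AND cpm.product_id = cpm2.product_id    -- peers who bought the same product
        AND cpm2.customer_id != ?
        AND cpm3.customer_id = cpm2.customer_id -- everything those peers bought
        AND cpm3.product_id NOT IN
            (SELECT DISTINCT cpm4.product_id
             FROM customer_product_mapping cpm4
             WHERE cpm4.customer_id = ?)        -- minus what the target already has
     ) recommended_products
WHERE customer_product_mapping.product_id = product.product_id
  AND customer_product_mapping.product_id = recommended_products.product_id
  AND customer_product_mapping.customer_id = recommended_products.customer_id
GROUP BY product.product_name
ORDER BY Frequency DESC
"""


def recommend(conn, customer_id):
    """Run the recommendation query for one customer; returns (name, frequency) rows."""
    return conn.execute(RECOMMEND_SQL, (customer_id,) * 3).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO product VALUES (?, ?)", PRODUCTS)
conn.executemany("INSERT INTO customer_product_mapping VALUES (?, ?)", PURCHASES)
rows = recommend(conn, "customer-one")
print(rows)
```

For `customer-one`, who bought only the laptop, the two peers who also bought it produce Mouse (2), Keyboard (1) and Monitor (1). Even on this tiny data set the query needs four copies of the mapping table plus a nested subquery.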

CQL Query (Graph Database)

MATCH (u:Customer {customer_id:'customer-one'})-[:BOUGHT]->(p:Product)<-[:BOUGHT]-(peer:Customer)-[:BOUGHT]->(reco:Product)
WHERE NOT (u)-[:BOUGHT]->(reco)
RETURN reco AS Recommendation, count(*) AS Frequency
ORDER BY Frequency DESC LIMIT 5;
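For comparison, the same Cypher pattern can be followed directly over adjacency sets in Python. This is a hypothetical sketch, not Neo4j's execution engine: `bought` plays the role of outgoing `BOUGHT` relationships, `bought_by` the incoming ones, and each nested loop corresponds to one hop of the `MATCH` pattern. The purchase data is made up for illustration.

```python
from collections import Counter

# Hypothetical purchase data: (customer, product) pairs, one per BOUGHT relationship.
PURCHASES = [
    ("customer-one", "Laptop"),
    ("customer-two", "Laptop"), ("customer-two", "Mouse"), ("customer-two", "Keyboard"),
    ("customer-three", "Laptop"), ("customer-three", "Mouse"), ("customer-three", "Monitor"),
    ("customer-four", "Webcam"),
]

# Adjacency in both directions, so every hop is a direct lookup rather than a join.
bought = {}     # customer -> products it BOUGHT (outgoing edges)
bought_by = {}  # product  -> customers who BOUGHT it (incoming edges)
for customer, product in PURCHASES:
    bought.setdefault(customer, set()).add(product)
    bought_by.setdefault(product, set()).add(customer)


def recommend(customer_id, limit=5):
    """(u)-[:BOUGHT]->(p)<-[:BOUGHT]-(peer)-[:BOUGHT]->(reco)
    WHERE NOT (u)-[:BOUGHT]->(reco), counted once per matching path."""
    mine = bought.get(customer_id, set())
    frequency = Counter()
    for p in mine:                          # hop 1: u -> p
        for peer in bought_by[p]:           # hop 2: p <- peer
            if peer == customer_id:
                continue
            for reco in bought[peer]:       # hop 3: peer -> reco
                if reco not in mine:        # WHERE NOT (u)-[:BOUGHT]->(reco)
                    frequency[reco] += 1
    return frequency.most_common(limit)     # ORDER BY Frequency DESC LIMIT 5


print(recommend("customer-one"))
```

On this sample data the result is Mouse with frequency 2, and Keyboard and Monitor with frequency 1 each (the order of tied entries is not fixed).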

Reasons why graph databases stand tall

 

1)    It is whiteboard friendly, i.e. even a layperson can understand its data model, unlike an RDBMS schema. See below.


2)    Highly scalable: some implementations support up to 32 billion nodes and 64 billion relationships.

 

3)    Supports ACID transactions, including rollback.

 

4)    Fast deep traversal instead of slow SQL queries that join many tables.

 

5)    Ability to store properties in both node and relationship.


6)    Accessible from most popular programming languages, such as C#, Java, Python, Perl and Scala.

 

7)    Provides a human-friendly, SQL-like query language called Cypher Query Language (CQL).

 

8)    Open-source options and wide industry support. Neo4j (Community Edition) and HyperGraphDB are open source, and AllegroGraph offers a free edition.

 

9)    Leading research firms such as Forrester and Gartner (in its Hype Cycle reports) have predicted that more than 20% of enterprises would be using graph databases by 2017.

 

10)    And many more.
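Item 5 (properties on both nodes and relationships) is the core of the property-graph model. A minimal Python sketch with hypothetical class and property names: a relationship is an object in its own right, with a type, two endpoints and its own property map. In Cypher, comparable data could be created with `CREATE (c:Customer {customer_id: 'customer-one'})-[:BOUGHT {quantity: 1}]->(p:Product {product_id: 'p1'})`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                       # e.g. "Customer" or "Product"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    rel_type: str                                    # e.g. "BOUGHT"
    end: Node
    properties: dict = field(default_factory=dict)   # data stored on the edge itself


customer = Node("Customer", {"customer_id": "customer-one"})
laptop = Node("Product", {"product_id": "p1", "product_name": "Laptop"})

# Hypothetical edge properties: when and how many were bought.
purchase = Relationship(customer, "BOUGHT", laptop, {"date": "2017-01-15", "quantity": 1})

print(purchase.start.properties["customer_id"], purchase.rel_type,
      purchase.end.properties["product_name"], purchase.properties)
```

In a relational schema the date and quantity would typically live in a separate mapping table; here they sit directly on the `BOUGHT` relationship.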

 

Reality is a graph; let’s embrace it

The relational data model is now more than 30 years old. It works well for a number of scenarios and handles certain types of data very well, but it isn’t perfect. For data that is semi-structured or network-oriented, a relational database offers poor runtime performance.

By contrast, graph databases excel at managing highly connected data and complex queries, regardless of the total size of the dataset. Armed only with a pattern and a set of starting points, a graph database explores the neighborhood around those starting points, collecting and aggregating information from millions of nodes and relationships while leaving the billions outside the search perimeter untouched.

In short, a graph database is designed and built from the ground up for data that is naturally organised as a network or that is semi-structured. For any organisation with such data, it offers an elegant and flexible alternative that is robust, fast and scalable.

 

 
