The Graph Traversal Programming Pattern (Part 3) - Graph Traversal
Wednesday, July 21, 2010 at 12:01AM |
Derek Stainer This is the third and final post discussing Marko Rodriguez's presentation on the The Graph Traversal Programming Pattern.
Part 1: Graph Structures
Part 2: Graph Databases
In his presentation Marko sets up an experiment to test a graph database, in this case Neo4j, against a relation data store, MySQL. The purpose of the experiment is to traverse the graph five levels deep. The graph in the experiment contains 1 million vertices and 4 million edges.
- For the run of the experiment a traverser is placed on a single vertex
- For each step, the traverser moves to it's adjacent vertices
- Repeat each step five times
The results? Neo4j completed the experiment in 14 minutes vs. MySQL not completing the job. The full results of the experiment can be found here.
So why use a graph database? Marko provides us with three potential reasons:
- If solution to your problem can be represented as a local process within a larger global structure
- If solution to your problem can be represented as being with respect to a set of root elements
- If solution to your problem does not require a global analysis of your data
So this is great and dandy from a theoretical perspective but what about real life use cases? Marko provides the following examples:
- Local Searches - What is in the neighborhood around A?
- Local Recommendations - Given A, what should A include in their neighborhood?
- Local Ranks - Given A, how would you rank B relative to A?
- Collaborative Filtering - Find all the items that the person A likes. Then find all the people that like those same items. Then find which items those people like that are not already the items that are liked by person A
- Question expert identification - Find all the tags associated with question A. For all those tag, find all answers (for any questions) that are tagged by those tags. For those answers, find who created those answers
In conclusion, Marko offers these final points:
- Graph databases are efficient with respects to local data analysis
- Locality is defined by direct referent structures
- Frame all solutions to problems as a traversal over local regions of the graph
This is the Graph Traversal Pattern.

