Follow Us

Follow nosqldatabases on Twitter Follow nosqldatabases on Facebook Follow nosqldatabases on Google Buzz Follow nosqldatabases on LinkedIn Follow nosqldatabases on FeedBurner NoSQL presentations on slideshare


Become a sponsor of Contact us to find out how.

Featured Jobs


Follow On Facebook
Recent NoSQL News


Entries in Graph Data Store (14)


Graph Databases - What's So Different

Darren Wood, the Chief Architect at InfiniteGraph has a presentation that discusses what makes graph databases unique to their relational counterparts.

One very interesting concept discussed in this presentation is partitioning of graph data. Partitioning of graphs is a very complex and difficult problem namely because the graph is a living entity that is constantly changing i.e. new relationships between nodes.

Click to read more ...


Do Graph Databases Need a Standard API

In a recent post on ACM Michael Stonebraker made the argument that one of the reasons the enterprise is not interested in NoSQL is because it lacks standard API's that prevent vendor lock in. Now as an engineer I completely understand this argument and argue that vendor lock-in is not a good thing. However, what was missed in the post was that work is underway to solve that problem for a subset of NoSQL databases.

Click to read more ...


The Graph Database Landscape

Achim Friedland gave an excellent presentation in which he discussed the various graph databases that exist today. In addition, he has some excellent slides that show graph data model, languages and query methods.

Click to read more ...


Neo Technology announces Neo4j Spatial

NeoTechnologies, support organization and contributor to the Neo4j graph database have announced Neo4j Spatial

the first and only geospatial component for a graph database

Neo4j Spatial is a collection of utilities that "simplify and support advanced capabilities like"

  • Storage of geographic features like points, lines and polygons as graphs
  • Indexing and querying based on location with R-trees, Quad-trees and other structures
    Spatial operations for GIS and Location Based Services
  • Import/Export from existing industry standard formats like shapefiles
  • Exposure of any Neo4j traversal as dynamic layers with points, multilines or polygons
  • Construction of geographic operations from arbitrary combinations of traversals and relevant properties

Read more: Neo Technology announces Neo4j Spatial


Scaling the Social Graph in the Cloud with InfiniteGraph

In his presentation at Glue Conference 2010, Darren Wood lead architect of InfiniteGraph discusses "technical challenges to running queries traversing relationships to 4, 5 or more degrees of separation, across extremely large graph datasets in the cloud".

Click to read more ...


Gremlin: A Graph-Based Programming Language

In this presentation, Marko Rodriguez, discusses Gremlin, a graph programming language.

What is Gremlin?

Gremlin is a Turing-complete, graph-based programming language developed for key/value-pair multi-relational graphs called property graphs. Gremlin makes extensive use of XPath 1.0 to support complex graph traversals. Connectors exist to various graph databases and frameworks. This language has application in the areas of graph query, analysis, and manipulation.

There are several interesting facts about Gremlin. First, Gremlin is JSR-223 compliant which allows the language itself to be embedded in Java applications, much like Jython, JRuby or Groovy code fragments can. In addition, Gremlin is considered a DSL (domain specific language) and NOT an API. Gremlin is designed to work with several different types of graph databases via Blueprints.

Two important things that Gremlin provides:

  1. Gremlin provides a means to elegantly map single-relational graph analysis algorithms over to the multi-relational graph domain
  2. Gremlin provides an elegant way to do automated reasoning in multi-relational graphs using path expressions

Gremlin works with a type of graph called a property graph. A property graph has the following properties:

  • Vertices and edges are labeled with unique identifiers
  • Edges are directed, labeled, and can form loops
  • Multiple edges of the same label can exist for the same vertex pair
  • Vertices and edges can have any number of key/value pair properties/attributes

So as an example of a typical Gremlin expression you have the following:

./outE[@label=‘knows’]/inV[matches(@name,‘va.{3}’) and @age > 21]/@name

Here is how this expression would be evaluated:

  1. Get the current object(s)
  2. outE[@label=‘knows’]: Get the outgoing edges of the current object(s), where their labels equal ‘knows’
  3. inV[matches(@name,‘va.{3}’) and @age > 21]: Get the incoming vertices of those ‘knows’ edges, where the names of those vertices are 5 characters long, start with ‘va’, and whose age is greater than 21
  4. @name: get the name of those particular incoming vertices

The language is simple, easy to understand and very concise. Marko dives into several other examples of expressions, as well as, language specifics like types, expressions and variables.

If you are planning on working with graph databases, take a look at Gremlin.


InfiniteGraph Meets StackOverflow

InfiniteGraph, which just recently release version 1.0, has a post on their blog in which they model the popular Q+A site StackOverflow in a graph database. How do you model Q+A in a graph database?

The vertices in the graph are represented as the Users, Questions and Answers above while the edges are represented as the interactions between them (i.e. a User “Posts” a Question,  an Answer is “For” a Question, a User “Comments On” a Question or Answer).  Simple enough, and like most other social graphs, users seem to be the focal points with the majority of connected edges.

While a deviation from the traditional graph database example of Facebook or Twitter graphs, it's a useful deviation that demonstrates the power and flexibility of the graph database.

Read more: InfiniteGraph Meets StackOverflow


The Graph Traversal Programming Pattern (Part 3) - Graph Traversal

This is the third and final post discussing Marko Rodriguez's presentation on the The Graph Traversal Programming Pattern.

Part 1: Graph Structures
Part 2: Graph Databases

In his presentation Marko sets up an experiment to test a graph database, in this case Neo4j, against a relation data store, MySQL. The purpose of the experiment is to traverse the graph five levels deep. The graph in the experiment contains 1 million vertices and 4 million edges.

  • For the run of the experiment a traverser is placed on a single vertex
    • For each step, the traverser moves to it's adjacent vertices
    • Repeat each step five times

The results? Neo4j completed the experiment in 14 minutes vs. MySQL not completing the job. The full results of the experiment can be found here.

So why use a graph database? Marko provides us with three potential reasons:

  1. If solution to your problem can be represented as a local process within a larger global structure
  2. If solution to your problem can be represented as being with respect to a set of root elements
  3. If solution to your problem does not require a global analysis of your data

So this is great and dandy from a theoretical perspective but what about real life use cases? Marko provides the following examples:

  • Local Searches - What is in the neighborhood around A?
  • Local Recommendations - Given A, what should A include in their neighborhood?
  • Local Ranks - Given A, how would you rank B relative to A?
  • Collaborative Filtering - Find all the items that the person A likes. Then find all the people that like those same items. Then find which items those people like that are not already the items that are liked by person A
  • Question expert identification - Find all the tags associated with question A. For all those tag, find all answers (for any questions) that are tagged by those tags. For those answers, find who created those answers

In conclusion, Marko offers these final points:

  • Graph databases are efficient with respects to local data analysis
  • Locality is defined by direct referent structures
  • Frame all solutions to problems as a traversal over local regions of the graph

This is the Graph Traversal Pattern.


The Graph Traversal Programming Pattern (Part 2) - Graph Databases

In last Friday's post we explored a presentation by Marko Rodriguez about the Graph Traversal Programming Pattern. Specifically we explored the various graph structures that exist. In this post we are going to explore graph databases.

Most databases can model a graph. But how do you define a graph database?

  • A graph database is any storage system that provides index-free adjacency
  • Every element (i.e. vertex or edge) has a direct pointer to its adjacent element
  • No O(log2(n)) index lookup required to determine which vertex is adjacent to which other vertex
  • If the graph is connected, the graph as a whole is treated as an atomic data structure

Marko proceeds to demonstrate the difference between a graph database and a non-graph database with respect to index adjacency.

Graph Database

  • Direct references to its adjacent vertices
  • Constant time cost to navigate between vertices

Non-Graph Database

  • Must look at an index to locate adjacent vertices
  • log2(n) time cost to move between vertices

More about index adjacency:

  • While any database can implicitly represent a graph, only a graph database makes the graph structure explicit
  • In a graph database, each vertex serves as a “mini index” of its adjacent elements
  • As the graph grows in size, the cost of a local step remains the same

The final point with regard to graph databases and indices Marko has the following points to make:

  • Graph databases allows you to explicitly model indices endogenous to your domain model. Your indices and domain model are one atomic entity—a graph
  • This has benefits in designing special-purpose index structures for your data.
    • Think about all the numerous types of indices in the geo-spatial community
    • Think about all the indices that you have yet to think about

In the final post of this series we will explore graph traversals with both artificial and real life examples.


The Graph Traversal Programming Pattern (Part 1) - Graph Structures

In his presentation at WindyCityDB, Marko Rodriguez, discusses graph traversal patterns. This is the first part of a multi-part series that will discuss this presentation. Specifically in this posting we are going to discuss the various types of graph structures. We will be discussing graph databases and graph traversals in a following posts.

So what types of graph structures are there? It's an interesting question and one I did not know the answer to until this presentation. Before we can begin discussing the various graph structures we need a small primer.

Graph Primer

  • Dots are vertices
  • Lines are edges
  • Dots and Lines make a Graph

Undirected Graph

  • Vertices
    • All denote the same type of object
  • Edges
    • All edges denote the same type of relationship
    • All edges denote a symmetric relationship
  • Examples
    • Collaborator graph
    • Road graph

Directed Graph

  • Vertices
    • All denote the same type of object
  • Edges
    • All edges denote the same type of relationship
    • All edges denote a asymmetric relationship
  • Primer
    • Directed edge is a line with an arrow
  • Examples
    • Twitter follow graph
    • Web href-citation graph

Single-Relational Graphs

Both undirected and directed graphs are considered single relational graphs.

  • All edges have the same meaning/type
  • Perhaps the most common type of graph type

Limitations of a single relational graph:

  • Only can express a single type of vertex
  • Only can express a single type of edge
  • In general are very limiting graph types

Multi-Relational Graphs

Obviously the opposite of a single-relational graph.

What are the gains with multi-relational graphs?

  • Allows for explicit typing of edges
  • Explicit typing allows for
    • edges to have different meanings
    • vertices to have different types

Property Graph

  • Specialized graph which extends a multi-relational graph by adding key/value map to both edges and vertices
  • Properties useful for expressing non-relational data
  • Allows further refinement of the meaning of an edge
  • Property graphs are the basis for other types of graphs

The entire presentation is shown below. We'll discuss graph databases on Monday.


Links of the Day - 2010/07/14

Links of the Day for July 07, 2010



Links of the Day - 2010/07/08

Links of the day for July 08, 2010


Links of the Day - 2010/07/06

Links of the day for July 07, 2010


Need a graph database? Look no farther than Neo4j 

Robert Scoble and Emil Eifrem discuss Neo4j, graph databases and BBQ restaurants.

Couple of notes taken from the interview:

  • There are a lot of interesting integrations of social and geographic graphs that have yet to be explored.
  • Neo4j more general purpose and supports infinite depth. This is contrast to a specialized solution like Twitter's FlockDB which is geared towards Twitter's unique needs.
  • Graph databases excel whenever the value is in the connections.
  • How scalable is Neo4j?
    • It's really challenging to provide horizontally when dealing with a graph database.
    • Neo4j scales to 12 billion nodes and relationships.
    • Currently relies on replication of the entire graph across machines.
  • In Neo4j 2.0 they are implementing transparent partitioning. Transparent partitioning essentially attempts to place isolated clusters of a graph, on their own machine. It's a difficult problem to solve simply because two isolated clusters can become connected with just one link.
  • Neo4j uses AGPL license. Essentially if your service is open source, Neo4j is open source. If your service is going to be closed source you need to purchase a commercial license.