Thursday, July 17, 2008

Neo4j

I attended an interesting technical talk with the Proto Team yesterday down at the Santa Fe Complex. Emil Eifrem of Neo Technology shared with us their open source Neo4j project, a high-performance graph database implemented in Java.

A graph database is very different from a relational database: rather than storing data in tables of rows and columns, it stores data in a graph data structure (nodes, relationships, and properties), which is a more natural model for networks. Such a database is ideal for storing RDF, social networks, co-authorship networks, etc. Although relational databases can be used to represent graphs, answering a query like "Who are all the friends of everyone who likes ice cream?" requires joins across several tables, and the number of joins grows with each additional hop through the network, which takes a lot of processing time.
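To make the model concrete, here is a minimal in-memory sketch in Python of nodes, relationships, and properties, along with the ice cream query answered by following relationships instead of joining tables. This is not the Neo4j API; the `Graph` class, the `LIKES` and `FRIEND` relationship types, and the sample people are all made up for illustration, and friendships are treated as one-directional to keep it short.

```python
from collections import defaultdict


class Graph:
    """A toy property graph (hypothetical, not Neo4j's API)."""

    def __init__(self):
        self.properties = {}               # node id -> dict of properties
        self.outgoing = defaultdict(list)  # node id -> [(rel_type, target id)]

    def add_node(self, node_id, **props):
        self.properties[node_id] = props

    def add_relationship(self, source, rel_type, target):
        # Relationships are directed: source -[rel_type]-> target.
        self.outgoing[source].append((rel_type, target))

    def neighbors(self, node_id, rel_type):
        # Follow only the outgoing relationships of the given type.
        return [t for (r, t) in self.outgoing.get(node_id, []) if r == rel_type]


def friends_of_fans(graph, thing):
    """Return the friends of every node that LIKES `thing`."""
    fans = [n for n in graph.properties if thing in graph.neighbors(n, "LIKES")]
    friends = set()
    for fan in fans:
        # One hop along FRIEND relationships; no join needed.
        friends.update(graph.neighbors(fan, "FRIEND"))
    return friends


g = Graph()
for person in ("alice", "bob", "carol", "dave"):
    g.add_node(person, kind="person")
g.add_node("ice cream", kind="food")

g.add_relationship("alice", "LIKES", "ice cream")
g.add_relationship("bob", "LIKES", "ice cream")
g.add_relationship("alice", "FRIEND", "carol")
g.add_relationship("bob", "FRIEND", "dave")
g.add_relationship("bob", "FRIEND", "alice")
g.add_relationship("carol", "FRIEND", "dave")

print(sorted(friends_of_fans(g, "ice cream")))
```

Each hop is just a lookup in the node's own list of relationships, so extending the query to friends of friends means one more loop rather than one more join. A real graph database would also index the starting nodes instead of scanning all of them as this sketch does.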

Emil noted that although everyone he talks to says they know what a graph database is, graph databases practically don't exist. Wikipedia doesn't even have an entry entitled graph database, and the database article doesn't mention them at all (graph databases are distinct from the network model). Here are some slides that give a good overview of graph databases, and a survey paper by the same authors.