Wednesday, October 26, 2011

Counting triangles - new performance study, Kognitio triumphant




Following the challenge thrown down by Vertica, and then by a member of Oracle's staff, Kognitio decided to rise to the occasion and show how its WX2 in-memory analytic database would perform against an 86-million-row data set. The data was provided by Vertica and is publicly available. Vertica's original benchmark also included other technologies, namely Hadoop and PIG.

The tests centered on counting triangles in an undirected graph with reciprocal edges (every edge is stored in both directions), that is, counting every set of three vertices that are all connected to one another. Doing so means working out which edges share a vertex and which of those pairs are closed by a third edge, which is regarded as a computationally intensive exercise. Triangle counts can be used to understand social networks, for example how people are connected and how they interact with others.
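The counting step itself can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the SQL used by Vertica, Oracle or Kognitio. It assumes the input is whitespace-separated pairs of integer vertex IDs, one edge per line (our assumption about the edges.txt layout), and `load_edges` and `count_triangles` are hypothetical helper names. Requiring u < v < w means each triangle is counted exactly once, even though every edge appears in both directions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def load_edges(lines: Iterable[str]) -> List[Edge]:
    """Parse 'source dest' pairs, one edge per line (assumed file layout)."""
    edges: List[Edge] = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            edges.append((int(parts[0]), int(parts[1])))
    return edges


def count_triangles(edges: Iterable[Edge]) -> int:
    """Count each set of three mutually connected vertices exactly once."""
    # Adjacency sets: the reciprocal copies (a, b) and (b, a) collapse into one entry.
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    count = 0
    for u, neighbours in adj.items():
        for v in neighbours:
            if v <= u:
                continue  # visit each undirected edge once, as u < v
            # Any common neighbour w closes a triangle; w > v keeps u < v < w.
            for w in neighbours & adj[v]:
                if w > v:
                    count += 1
    return count


if __name__ == "__main__":
    # Tiny example: a square 1-2-3-4 with diagonal 1-3, edges stored both ways.
    sample = ["1 2", "2 1", "2 3", "3 2", "3 4", "4 3", "4 1", "1 4", "1 3", "3 1"]
    print(count_triangles(load_edges(sample)))  # prints 2
```

The downloaded edges.txt.gz file could be opened with `gzip.open(path, "rt")` and passed straight to `load_edges`, but this pure-Python version is meant for small examples; at the full 86-million-row scale it would need a great deal of memory.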

All the tests were run on the same hardware configuration, essentially a cluster of 4 nodes. The results were astounding and are as follows:

  • Kognitio WX2 - 11.48 seconds
  • Oracle, after a rewrite of the SQL - 14 seconds
  • Oracle, before the rewrite - 90 seconds
  • Vertica - 97 seconds
  • PIG - 2,151 seconds
  • Hadoop - 3,900 seconds
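To put the list in proportion, the short sketch below simply divides each reported runtime by WX2's 11.48 seconds, using only the figures listed here. `RESULTS` and `relative_to` are hypothetical names chosen for illustration.

```python
# Runtimes in seconds, as reported in the results list.
RESULTS = {
    "Kognitio WX2": 11.48,
    "Oracle (after SQL rewrite)": 14,
    "Oracle (before rewrite)": 90,
    "Vertica": 97,
    "PIG": 2151,
    "Hadoop": 3900,
}


def relative_to(baseline: str, results: dict) -> dict:
    """Express every runtime as a multiple of the baseline's runtime."""
    base = results[baseline]
    return {name: seconds / base for name, seconds in results.items()}


if __name__ == "__main__":
    ratios = relative_to("Kognitio WX2", RESULTS)
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:28s} {ratio:7.1f}x")
```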

We're extremely excited that Kognitio WX2 came out on top, and we would like to thank Vertica for setting the data warehousing community abuzz with this challenge, as well as for providing the data set and the SQL statements. For the sake of transparency, here are links to the research, along with details of the hardware and software tested on the 86-million-row data set.

Kognitio ran the tests using WX2 on 4 HP servers, each with 128GB of RAM and 24 cores.

Vertica ran the tests using Vertica, PIG and Hadoop on 4 HP servers, each with 96GB of RAM and 12 cores. Results can be found here: http://www.vertica.com/2011/09/21/counting-triangles/

Oracle's tests were run on Oracle Exadata 2-2 hardware with two-socket, 12-core Westmere-EP nodes, using Oracle Database 11.2.0.2. Results can be found here: http://structureddata.org/2011/10/17/counting-triangles-faster/

To download the data set used in all tests, go to:
http://www.vertica.com/benchmark/TriangleCounting/edges.txt.gz

To download the query statements, go to:
https://github.com/vertica/Graph-Analytics----Triangle-Counting

2 comments:

Susanne said...

This is pretty misleading! "All the tests were run on the same hardware configuration, essentially a cluster of 4 nodes". Small detail: Everybody else's nodes had 12 cores apiece, Kognitio's had 24 cores. Whoops!

Anonymous said...

The test environment was replicated based on what was available, and we were constrained in that regard. Bottom line: WX2's speed comes from its ability to use ALL of the available cores, and to use them very efficiently. Most technologies can only utilize a relatively small number of cores and certainly don't scale their core use linearly. If we wanted, we could double or triple the number of cores, or multiply it by 10 or 100, and continue to show a performance improvement. What applications exist that easily drive 24 cores in a single server with maximum efficiency?