The tests centered on counting triangles in an undirected graph with reciprocal edges. The idea is to run queries that determine how many vertices and edges there are, which edges are joined, and which vertices join which edges; this is regarded as a computationally intensive exercise. The same concept can be used to understand social networks: how people are connected and how they interact with others.
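To give a rough idea of what counting triangles involves, here is a minimal Python sketch. It is not the SQL used in any of the benchmark runs; the function name `count_triangles` and the tiny sample edge list are made up for illustration, and it assumes the whole edge list fits in memory.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count distinct triangles in an undirected graph.

    `edges` is an iterable of (source, dest) pairs. Reciprocal pairs
    such as (1, 2) and (2, 1) are treated as one undirected edge.
    """
    # Store each edge once, from the smaller vertex id to the larger one.
    forward = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        lo, hi = (a, b) if a < b else (b, a)
        forward[lo].add(hi)

    triangles = 0
    for a, neighbours in forward.items():
        for b in neighbours:
            # Every vertex c that is a higher neighbour of both a and b
            # closes exactly one triangle with a < b < c.
            triangles += len(neighbours & forward.get(b, set()))
    return triangles


# Hypothetical sample: one triangle (1, 2, 3) plus a dangling edge 3-4,
# with every edge listed in both directions.
sample_edges = [
    (1, 2), (2, 1),
    (2, 3), (3, 2),
    (1, 3), (3, 1),
    (3, 4), (4, 3),
]
print(count_triangles(sample_edges))
```

Ordering the vertices so that each triangle is only found as a < b < c avoids counting the same triangle several times, which is also why the reciprocal edges do not inflate the result.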
All the tests were run on the same hardware configuration, essentially a cluster of 4 nodes. The results were astounding and are as follows:
- Kognitio WX2 - 11.48 seconds
- Oracle, after a rewrite of the SQL - 14 seconds
- Oracle, before the rewrite - 90 seconds
- Vertica - 97 seconds
- PIG - 2,151 seconds
- Hadoop - 3,900 seconds
Kognitio ran the tests using WX2 on 4 HP servers, each with 128GB of RAM and 24 cores per server.
Vertica ran the tests using Vertica, PIG and Hadoop on 4 HP servers, each with 96GB of RAM and 12 cores per server. Results can be found here: http://www.vertica.com/2011/09/21/counting-triangles/
The Oracle tests were run on Oracle Exadata 2-2 hardware with 2-socket, 12-core Westmere-EP nodes and Oracle Database 11.2.0.2. Results can be found here: http://structureddata.org/2011/10/17/counting-triangles-faster/?utm_source=rss&utm_medium=rss&utm_campaign=counting-triangles-faster
To download the data set used in all tests, go to:
http://www.vertica.com/benchmark/TriangleCounting/edges.txt.gz
To download the query statements, go to:
https://github.com/vertica/Graph-Analytics----Triangle-Counting
2 comments:
This is pretty misleading! "All the tests were run on the same hardware configuration, essentially a cluster of 4 nodes". Small detail: Everybody else's nodes had 12 cores apiece, Kognitio's had 24 cores. Whoops!
The test environment was replicated based on what was available, and we were constrained in that regard. Bottom line: WX2's speed comes from its ability to use ALL of the available cores and to use them very efficiently. Most technologies can only utilize a relatively small number of cores and certainly don't scale their core use linearly. If we wanted, we could double or triple the number of cores, or multiply it by 10 or 100, and continue to show a performance improvement. What applications exist that easily drive 24 cores in a single server with maximum efficiency?