Wednesday, October 26, 2011

Counting triangles - new performance study, Kognitio triumphant




Following the challenge thrown down by Vertica and then by an Oracle member of staff, Kognitio decided to rise to the occasion and show how its WX2 in-memory analytic database would perform against an 86 million row data set. The data was provided by Vertica and is publicly available. The benchmark run using Vertica and Oracle also included other technologies such as Hadoop and PIG.

The tests themselves centered on counting triangles in an undirected graph with reciprocal edges. The idea is to run queries that find out the quantity of vertices and edges, which edges are joined and which vertices join which edges, all of which is regarded as a computationally intensive exercise. Such a concept can be used to understand social networks and how people are connected and interact with others.
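To give a feel for what "counting triangles" means, here is a minimal Python sketch. It is not the SQL used in the benchmark, just an illustration on a toy graph; the function name `count_triangles` and the sample edge list are made up for this example. It assumes the edge list contains reciprocal pairs, so (1, 2) and (2, 1) describe the same undirected edge.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count distinct triangles in an undirected graph.

    `edges` is an iterable of (source, target) pairs. Reciprocal pairs
    such as (1, 2) and (2, 1) are treated as one undirected edge.
    """
    # For each vertex, keep only the neighbours with a larger id,
    # so every undirected edge is stored exactly once.
    higher = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        lo, hi = (a, b) if a < b else (b, a)
        higher[lo].add(hi)

    triangles = 0
    # For every edge a-b (a < b), count the vertices c > b that are
    # adjacent to both a and b. Each triangle a < b < c is seen once.
    for a, neighbours in higher.items():
        for b in neighbours:
            triangles += len(neighbours & higher.get(b, set()))
    return triangles


# Toy data: one triangle (1, 2, 3) plus a dangling edge 3-4,
# with every edge listed in both directions.
toy_edges = [
    (1, 2), (2, 1),
    (2, 3), (3, 2),
    (1, 3), (3, 1),
    (3, 4), (4, 3),
]
print(count_triangles(toy_edges))
```

On a real data set the cost comes from the sheer number of edge pairs that have to be compared, which is why the problem makes a useful stress test.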

All the tests were run on broadly comparable hardware, essentially a cluster of 4 nodes (the exact configurations are listed further down). The results were astounding and are as follows:

  • Kognitio WX2 - 11.48 seconds
  • Oracle, post a re-write of SQL - 14 seconds
  • Oracle, pre the re-write - 90 seconds
  • Vertica - 97 seconds
  • PIG - 2,151 seconds
  • Hadoop - 3,900 seconds

We're extremely excited that Kognitio WX2 won out and would also like to extend our thanks to Vertica for setting the data warehousing community abuzz with this challenge, as well as for providing the data set and the SQL statements. For the sake of transparency, here are the links to the research, along with more details on the hardware and software tested on the 86 million row data set.

Kognitio ran the tests using WX2 on 4 HP servers, each with 128GB of RAM and 24 cores per server.

Vertica ran the tests using Vertica, PIG and Hadoop on 4 HP servers, each with 96GB of RAM and 12 cores per server. Results can be found here: http://www.vertica.com/2011/09/21/counting-triangles/

Tests were run using Oracle Exadata 2-2 hardware with 2 socket, 12 core Westmere-EP nodes and Oracle Database 11.2.0.2. Results can be found here: http://structureddata.org/2011/10/17/counting-triangles-faster/?utm_source=rss&utm_medium=rss&utm_campaign=counting-triangles-faster

To download the data set used in all tests, go to:
http://www.vertica.com/benchmark/TriangleCounting/edges.txt.gz

To download the query statements, go to:
https://github.com/vertica/Graph-Analytics----Triangle-Counting

Friday, October 07, 2011

New report on Big Data Analytics

We are proud co-sponsors of a new report from TDWI on the subject of Big Data Analytics. Published in late September 2011, the report was written by expert Philip Russom, who conducted a survey of 325 data warehousing professionals on the topic. There are some interesting statistics in the report, and they support the recent emergence of topics such as big data, analytics and the Cloud.

In particular:

- only 7% of those asked had never heard of, or did not have, a "big data" problem

- other terms used to describe the phenomenon include: big data analytics; advanced analytics; large data set analytics; a pain in the a... (seriously....)

- the barriers to adopting a solution for big data analytics include cost, skill set, staffing and scalability

- while 70% regard big data as an opportunity, 30% see it as a pain

- in 3 years' time, 31% of those surveyed believe their data warehousing environments will be 10-100 TB in size

- a third of those asked think they will replace their analytic platform in the next 3 years

- 30% would prefer a cloud-based analytic platform as opposed to an on-premises one

- drivers for replacing their platform include "cannot scale up with existing platform" (42%) and "need/want a cloud/SaaS platform" (15%)

- over the next 3 years, respondents will adopt: a private cloud (22%); in-memory database (26%); data marts for analytics (38%)

- "clouds", "SaaS" and "analytic DBMS" are deemed areas of "good potential growth"

Feel free to download the report from the Resource area of our website.