Recently I came across the following article about Hadoop:
HadoopVsGridGain. As I understand it, it's just an advertisement :) Or something close to one, because only someone who is scared would put a page like this on an official site. I think competitors should be polite and compete rather than compare.
Could you compare a jet and a submarine? I can't, and I think GridGain and Hadoop are just as different: every product has its own niche. The goal of this post is not to argue but to point out what the article above gets wrong about GridGain.
The main point is about 10,000 CPUs and petabytes of data. It looks like a "mine is longer..." contest. Do you, reader, have them? I don't. If our customers buy 10,000 CPUs, then GridGain will work on 10,000 CPUs without any issues; we just don't publish stupid numbers or run artificial tests to prove our scalability. We do our job: making grid computing simple and useful for the people who develop software.
Most people use Hadoop to process large datasets, usually files, whereas GridGain is primarily a computational grid that can even use Hadoop as its distributed file system. GridGain has flexible pluggable interfaces that allow using any kind of data grid (and we ship some out of the box). Complaining that GridGain doesn't have its own data grid is the same as asking why we didn't reinvent the wheel. The answer: someone else did it better than we would :) And if Hadoop is good at it, then we will provide integration with Hadoop :)
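To make the "pluggable data grid" idea concrete, here is a minimal Java sketch. The names `DataGrid`, `InMemoryDataGrid`, and `countWords` are hypothetical and are not GridGain's actual SPI; the point is only that the computation is written against an interface, so an in-memory map, a distributed cache, or an HDFS-backed store could sit behind it without touching the compute code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical data-access abstraction: the compute side only sees this interface.
interface DataGrid {
    String get(String key);
    void put(String key, String value);
}

// Simplest possible implementation: a local in-memory map.
// A distributed cache or an HDFS-backed store could implement the same interface.
final class InMemoryDataGrid implements DataGrid {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    @Override
    public String get(String key) { return store.get(key); }

    @Override
    public void put(String key, String value) { store.put(key, value); }
}

public final class PluggableGridDemo {
    // Computation written against the interface, not against a concrete storage product.
    static int countWords(DataGrid grid, String key) {
        String text = grid.get(key);
        if (text == null || text.isBlank()) {
            return 0;
        }
        return text.trim().split("\\s+").length;
    }

    public static void main(String[] args) {
        DataGrid grid = new InMemoryDataGrid();
        grid.put("doc1", "grid computing made simple");
        System.out.println(countWords(grid, "doc1")); // 4
    }
}
```

Swapping storage then means passing a different `DataGrid` implementation; `countWords` stays unchanged.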
Yes, we understand tasks and jobs differently, but that's not a point of comparison. From my point of view, a task is something bigger than a job, so we chose the task as the "primary entity", and a task is split into jobs. But again, this is no basis for comparing products.
Returning a single value from a GridGain task makes more sense than returning a list of items. We let the user decide what kind of data comes back: a single object, a List, or even a Map. It's up to the user, not to GridGain or Hadoop. Again, there is nothing to compare.
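A minimal sketch of the two ideas above, a task split into jobs and a result type chosen by the user. `SplitTask`, `RangeSumTask`, and `TaskRunner` are hypothetical names, not GridGain's API, and a local thread pool stands in for remote grid nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical task abstraction: a task is split into jobs, and the job
// results are reduced into whatever type R the user chooses.
interface SplitTask<A, J, R> {
    List<Callable<J>> split(A arg, int jobCount);
    R reduce(List<J> jobResults);
}

// Sums 1..n by splitting the range into chunks; each chunk is one job.
final class RangeSumTask implements SplitTask<Long, Long, Long> {
    @Override
    public List<Callable<Long>> split(Long n, int jobCount) {
        List<Callable<Long>> jobs = new ArrayList<>();
        long chunk = (n + jobCount - 1) / jobCount; // ceiling division
        for (long start = 1; start <= n; start += chunk) {
            final long from = start;
            final long to = Math.min(n, start + chunk - 1);
            jobs.add(() -> {
                long sum = 0;
                for (long i = from; i <= to; i++) sum += i;
                return sum;
            });
        }
        return jobs;
    }

    // The user decides the result shape; here it is a single value.
    @Override
    public Long reduce(List<Long> partialSums) {
        long total = 0;
        for (long s : partialSums) total += s;
        return total;
    }
}

public final class TaskRunner {
    // Local stand-in for a grid: jobs run on a thread pool instead of remote nodes.
    static <A, J, R> R execute(SplitTask<A, J, R> task, A arg, int jobCount) {
        ExecutorService pool = Executors.newFixedThreadPool(jobCount);
        try {
            List<Future<J>> futures = pool.invokeAll(task.split(arg, jobCount));
            List<J> results = new ArrayList<>();
            for (Future<J> f : futures) results.add(f.get());
            return task.reduce(results);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("task interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(execute(new RangeSumTask(), 100L, 4)); // 5050
    }
}
```

Declaring a task as `SplitTask<Long, Long, List<Long>>` with a `reduce` that returns the partial sums unchanged would hand back a List instead; the runner does not care which shape `R` takes.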
Combiners and counters. Hmm. If you process files, then yes, you can say that your task has already processed half of them. But what about computational math tasks that have no fixed, countable amount of work and instead rely on approximation or similar methods? You cannot say that you have calculated half of PI :)) Sometimes combiners and counters are useful, sometimes not.
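As an illustration of a task with no natural "half done" point, here is a sketch that approximates PI with the Nilakantha series (my choice of series for the example, not something from the article). The loop stops on a convergence test rather than after a fixed list of work items, so the only honest progress measure is the size of the last term, not a count of processed items.

```java
// Approximates PI with the Nilakantha series:
// PI = 3 + 4/(2*3*4) - 4/(4*5*6) + 4/(6*7*8) - ...
// The loop ends when a term drops below the tolerance, not after N items,
// so there is no "k of N done" counter to report along the way.
public final class PiApproximation {
    record Result(double pi, long terms) {}

    static Result approximate(double tolerance) {
        double pi = 3.0;
        double sign = 1.0;
        long terms = 0;
        for (long n = 2; ; n += 2) {
            double term = 4.0 / (n * (n + 1.0) * (n + 2.0));
            pi += sign * term;
            sign = -sign;
            terms++;
            // Alternating series with shrinking terms: the error is below the next term,
            // which is itself smaller than this one.
            if (term < tolerance) {
                return new Result(pi, terms);
            }
        }
    }

    public static void main(String[] args) {
        Result r = approximate(1e-9);
        System.out.printf("PI ~ %.9f after %d terms%n", r.pi(), r.terms());
    }
}
```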
Our reliance on standard java.io serialization is changing: in version 2.1 we will provide a new SerializationSpi to make serialization more flexible. Thanks to them for pointing this out; we have already implemented it.
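The idea behind a pluggable serialization SPI can be sketched as follows. This is not the real `SerializationSpi` signature (the post doesn't show it); `Serializer`, `JdkSerializer`, `Utf8StringSerializer`, and `SerializationDemo` are hypothetical names. The default uses standard java.io serialization, and a leaner implementation can be plugged in for types where the generic format is overkill.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical pluggable serialization contract (illustrative only).
interface Serializer<T> {
    byte[] serialize(T value);
    T deserialize(byte[] bytes);
}

// Default: standard java.io serialization, works for any Serializable type.
final class JdkSerializer<T extends Serializable> implements Serializer<T> {
    private final Class<T> type;

    JdkSerializer(Class<T> type) { this.type = type; }

    @Override
    public byte[] serialize(T value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    @Override
    public T deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return type.cast(in.readObject());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Plugged-in alternative for plain strings: raw UTF-8 bytes, no stream header or class metadata.
final class Utf8StringSerializer implements Serializer<String> {
    @Override
    public byte[] serialize(String value) { return value.getBytes(StandardCharsets.UTF_8); }

    @Override
    public String deserialize(byte[] data) { return new String(data, StandardCharsets.UTF_8); }
}

public final class SerializationDemo {
    static <T> T roundTrip(Serializer<T> serializer, T value) {
        return serializer.deserialize(serializer.serialize(value));
    }

    public static void main(String[] args) {
        String msg = "hello grid";
        Serializer<String> jdk = new JdkSerializer<>(String.class);
        Serializer<String> utf8 = new Utf8StringSerializer();
        System.out.println("java.io bytes: " + jdk.serialize(msg).length
                + ", UTF-8 bytes: " + utf8.serialize(msg).length);
    }
}
```

Callers only depend on `Serializer<T>`, so switching formats is a configuration choice rather than a code change.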
As for C++ and other languages: a few posts below, I wrote an example of using GridGain with shell scripts to process files the way Hadoop does, but that post is in Russian. I will translate it into English when I have time. And again, there is nothing to compare. The example in that post shows that GridGain can be used with any language, because every language can write to standard output and return an exit code.
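The "any language" argument boils down to this: the JVM side only needs to launch a process, read its standard output, and check its exit code. Here is a minimal sketch using the JDK's `ProcessBuilder`; the `ExternalJob` class is hypothetical, and the example command assumes a POSIX `sh` is available.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Runs an external program the way a grid job could: pass arguments in,
// read the program's standard output, and use its exit code as the status.
public final class ExternalJob {
    record Outcome(int exitCode, String stdout) {}

    static Outcome run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout for simplicity
        Process process = builder.start();
        String output;
        // Read all output before waiting, so a full pipe buffer cannot block the child.
        try (InputStream in = process.getInputStream()) {
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        return new Outcome(exitCode, output.strip());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumes a POSIX shell; the child could just as well be a C++ binary or a Perl script.
        Outcome result = run(List.of("sh", "-c", "echo processed 3 files; exit 0"));
        if (result.exitCode() == 0) {
            System.out.println("job succeeded: " + result.stdout());
        } else {
            System.out.println("job failed with exit code " + result.exitCode());
        }
    }
}
```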
And the last and funniest thing is the claim that GridGain costs something :) GridGain is an open source product licensed under the GPL and Apache 2 licenses, with public source code, bug tracker access, and a forum with an average response time of under 1 hour. The only thing that costs money is the management console, but in 90% of cases you don't need it, because every GridGain node exposes JMX beans that publish all node, task, and job information.
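For the monitoring side, plain JMX is enough to read whatever a node publishes. The post doesn't name GridGain's MBeans, so this sketch queries the JVM's own platform MBeans (the `java.lang` domain) with the standard `javax.management` API; the `JmxInspect` class is hypothetical, and a product's beans would be listed and read the same way under their own domain.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public final class JmxInspect {
    // Lists the JMX domains registered in this JVM, sorted for readability.
    static Set<String> domains() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return new TreeSet<>(Arrays.asList(server.getDomains()));
    }

    // Lists every MBean name registered under one domain, e.g. "java.lang".
    static Set<ObjectName> beansIn(String domain) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return server.queryNames(new ObjectName(domain + ":*"), null);
    }

    // Reads one attribute from a named MBean.
    static Object attribute(String objectName, String attribute) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        return server.getAttribute(new ObjectName(objectName), attribute);
    }

    public static void main(String[] args) throws JMException {
        System.out.println("Domains: " + domains());
        for (ObjectName name : beansIn("java.lang")) {
            System.out.println(name);
        }
        System.out.println("Uptime ms: " + attribute("java.lang:type=Runtime", "Uptime"));
    }
}
```

Any JMX client (JConsole, or code like this connected remotely) can do the same against a running node.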