Recently I noticed the following article about Hadoop: HadoopVsGridGain. To my understanding it's just an ad :) Or something close to one, because only someone who is scared would put things like this on an official site. I think competitors should be polite and compete instead of comparing.
Could you compare a jet and a submarine? I can't, and I think that GridGain and Hadoop are just as different. Every product has its own niche. The goal of this article is not to object to the comparison but to point out what it gets wrong about GridGain.
The major issue is the 10,000 CPUs and petabytes of data. It looks like "mine is longer...". Do you, reader, have them? I don't, and if our customers buy 10,000 CPUs then GridGain will work on 10,000 CPUs without any issues. We don't publish stupid numbers and don't run artificial tests to prove our scalability. We do our job, which is to make grid computing simple and useful for the people who develop software.
Most people use Hadoop to process large datasets, usually files, whereas GridGain is primarily a computational grid that can even use Hadoop as a distributed FS. GridGain has flexible pluggable interfaces that allow using any kind of data grid (and we ship some out of the box). Saying that GridGain does not have its own data grid is the same as asking why we did not reinvent the wheel. The answer is that someone did it better than we could :) And if Hadoop is good at it, then we will provide integration with Hadoop :)
Yes, we have a different understanding of tasks and jobs, but that is not a point of comparison. From my point of view a task is something more complicated than a job. Thus we chose the task as the "primary entity", and a task is split into jobs. But again, this is not a point on which to compare products.
Returning a single value from a GridGain task makes more sense than returning a list of items. We let the user define what kind of data should be returned. Use a single object, a List, or even a Map: it's up to the user, not GridGain or Hadoop. And there is nothing to compare.
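To make the task/job idea concrete, here is a minimal sketch in plain Java. It does not use the GridGain API: the `Task` interface, `execute` method, and `CharCountTask` class are hypothetical names made up for illustration, and a local thread pool stands in for grid nodes. It only shows the shape described above, a task that is split into jobs and reduced to whatever result type the user picks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TaskSketch {
    /** Hypothetical interface: a task splits its argument into jobs and reduces their results to R. */
    interface Task<A, J, R> {
        List<Callable<J>> split(A arg);
        R reduce(List<J> jobResults);
    }

    /** Runs every job on a local thread pool (a stand-in for grid nodes), then reduces the results. */
    static <A, J, R> R execute(Task<A, J, R> task, A arg) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<J>> futures = pool.invokeAll(task.split(arg));
            List<J> results = new ArrayList<>();
            for (Future<J> f : futures) {
                results.add(f.get());
            }
            return task.reduce(results);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Example task: one job per word, reduced to a single Integer (the total character count). */
    static class CharCountTask implements Task<String, Integer, Integer> {
        public List<Callable<Integer>> split(String phrase) {
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (String word : phrase.split(" ")) {
                jobs.add(() -> word.length());
            }
            return jobs;
        }

        public Integer reduce(List<Integer> lengths) {
            int total = 0;
            for (int n : lengths) {
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        // prints 14 (5 + 4 + 5)
        System.out.println(execute(new CharCountTask(), "hello grid world"));
    }
}
```

Because the result type `R` is a type parameter, the same sketch could just as well reduce to a `List` or a `Map`; that choice belongs to whoever writes the task.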
Combiners and counters. Hm. If you process files then yes, you can say that your task has already processed half of them. But what about computational math tasks that have no strict algorithm and are instead based on approximation or the like? You cannot say that you have calculated half of PI :)) Sometimes they are useful, sometimes not.
The use of java.io serialization is being changed, and in version 2.1 we will provide a new SerializationSpi to make it more flexible. Thanks to them for pointing it out. We have already implemented it.
As for C++ and other languages: several posts below I wrote an example of using GridGain with shell scripts to process files the way Hadoop does, but that post is in Russian. I will translate it into English when I have time, and again, there is nothing to compare. The example in that post clearly shows that GridGain can be used with any language, because every language supports standard output and exit codes.
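The idea that output and exit codes are enough to drive a program written in any language can be sketched in plain Java. This is not the example from the Russian post and it does not use the GridGain API: the `ShellJob` and `Result` names are hypothetical, and the sketch assumes a POSIX `sh` is available on the machine. The body of a job would simply launch the external program and read back what it printed and how it exited.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ShellJob {
    /** What a job can learn from any external program: its exit code and its output. */
    static class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** Starts the command, collects its output (stderr merged into stdout), and waits for it to exit. */
    static Result run(String... command) {
        try {
            ProcessBuilder builder = new ProcessBuilder(command);
            builder.redirectErrorStream(true);
            Process process = builder.start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            return new Result(process.waitFor(), out.toString());
        } catch (IOException e) {
            throw new RuntimeException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Assumes `sh` exists; the script itself is a made-up placeholder.
        Result result = run("sh", "-c", "echo processed; exit 3");
        System.out.println(result.exitCode + " " + result.output.trim());
    }
}
```

The same `run` call would work for a C++ binary, a Python script, or anything else that can be started from the command line.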
And the last and funniest thing is the claim that GridGain costs something :) GridGain is an open source product licensed under the GPL and Apache 2 licenses, and it comes with source code, bug tracker access, and a forum with an average response time of less than 1 hour. The only thing that costs money is the management console, but in 90% of cases you don't need it, because every GridGain node exposes JMX beans that publish the complete node/task/job information.
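For readers who have not used JMX, here is a minimal sketch of reading a bean attribute with the standard `javax.management` API. It reads one of the JVM's own built-in beans (`java.lang:type=OperatingSystem`), not a GridGain bean, because the names of GridGain's beans are not given here; the `JmxPeek` class name is made up for illustration. Any JMX client, such as jconsole, does essentially the same thing against a running node.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxPeek {
    /** Reads a single attribute from a bean registered in this JVM's platform MBean server. */
    static Object readAttribute(String objectName, String attribute) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.getAttribute(new ObjectName(objectName), attribute);
        } catch (JMException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A standard JVM bean is used as a stand-in for a node's published information.
        Object cpus = readAttribute("java.lang:type=OperatingSystem", "AvailableProcessors");
        System.out.println("Available processors: " + cpus);
    }
}
```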


7 comments:
It's not on the official site. It is on a wiki, where anyone can edit. If you disagree with it, you can add comments to that effect right there.
Right, but I would like the author to be more precise: if he compares these products, he should be accurate and ask GridGain about its implemented features when comparing them with Hadoop's.
Yes, they went too far here; it's not nice at all, and they were supposed to be decent people. They have clearly damaged their own image.
In general, though, both they and you have long been asked to write a comparison, and for some reason nobody wanted to. Now everyone has ended up with this comparison... here, whoever goes first wins.
We are not interested in a comparison. Starting with the next version we will simply release integration with any language and with shell scripts, after which we will beat Hadoop on ease of use. We will also write a Hadoop integration so that GridGain runs on top of it. Then we'll see.
Yes, exactly, and that is the problem: nobody needs a comparison except the end user. As a result, when choosing a technology, the user can judge only by the single comparison available.
This isn't some image-resizing library, after all; you can't figure out in a day which one better fits your specific needs. You can't figure it out in a week, either.
Of course, an independent comparison would be better here, but there is no point hoping for one.
I agree, GridGain and Hadoop are very different. That is why I spent time comparing them.
10,000 cores and petabytes aren't just random numbers. They are the scale at which Hadoop is currently used in production.
http://developer.yahoo.com/blogs/hadoop/2008/02/yahoo-worlds-largest-production-hadoop.html
And trust me, no tool scales to that level without a lot of work testing at that scale.
All of the comparisons that had been done previously used hard-to-understand terminology that is specific to GridGain. So I wanted to look specifically at the map/reduce implementation in each.
GridGain may have other features that are very useful, but it isn't a useful implementation of map/reduce for anything but toy examples. The fact that there is a single reduce and that all of the map outputs have to fit in memory makes it a non-starter for almost all of the applications we use map/reduce for.
A couple more points:
Combiners are roughly equivalent to GridGain's GridTask.result() method, except that they are used to reduce the volume of data that the map generated to cut down network traffic, which is the primary bottleneck for map/reduce computations.
Counters are for user level understanding and debugging. They help the user understand what the application is doing/did.
I agree that for computing pi, you don't need or want either. If, however, you are processing billions of records on a thousand nodes, they help a lot.