Wednesday, May 7, 2008

Simply in time.

Yesterday I read a post saying that the Xorg 7.4 release will ship without some major features because the developers are still fighting critical bugs. As a result, the Fedora Core 9 release I was looking forward to is delayed, and even when it is published it won't have those interesting Xorg features.

I'm really disappointed. The next FC release should come in 6 months, so I have to wait that long to get a stable, full-featured operating system, just because of Xorg.

So my question is: which do you think is better, to release on time with fewer features, or to delay and ship a fully featured release?

As I see it, there is usually something in between. Instead of having major releases only, they could publish a 9.5 with the new Xorg when it is ready, just to keep customers and users happy. I know that every release is a nightmare for developers and testers, and that it takes a lot of time to make it stable and reliable, but users are looking forward to the new features that were promised.

Tuesday, May 6, 2008

Example of two job scheduling approaches.

There are two different approaches to job scheduling on the grid.

The first approach is choosing the most suitable node for job execution, in other words load balancing. Before sending a job to a remote node, one has to decide which node has the resources to execute it. The strategy can range from a simple round-robin implementation to affinity load balancing, which selects the node where the data to be processed is located. But keep in mind that the situation may have changed by the time the job arrives at the node.
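To make the difference between strategies concrete, here is a minimal sketch that contrasts round-robin selection with picking the least loaded node. It is an illustration only: the class `NodeSelectionSketch`, its methods, and the node names are hypothetical and are not part of the GridGain API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only: none of these names are GridGain classes.
final class NodeSelectionSketch {
    // Counter that remembers which node is next in the cycle.
    private final AtomicInteger next = new AtomicInteger();

    /** Round-robin: ignore load and hand out nodes in a fixed cycle. */
    String pickRoundRobin(List<String> nodes) {
        int i = Math.floorMod(next.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }

    /** Load-based: pick the node whose last reported load is the lowest. */
    static String pickLeastLoaded(Map<String, Double> loadByNode) {
        String best = null;
        double bestLoad = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> e : loadByNode.entrySet()) {
            if (e.getValue() < bestLoad) {
                bestLoad = e.getValue();
                best = e.getKey();
            }
        }
        return best; // null when the map is empty
    }

    public static void main(String[] args) {
        NodeSelectionSketch sketch = new NodeSelectionSketch();
        List<String> nodes = Arrays.asList("node-a", "node-b", "node-c");
        for (int i = 0; i < 4; i++) {
            System.out.println("round-robin pick: " + sketch.pickRoundRobin(nodes));
        }

        Map<String, Double> load = new HashMap<>();
        load.put("node-a", 0.9);
        load.put("node-b", 0.2);
        load.put("node-c", 0.5);
        System.out.println("least loaded pick: " + pickLeastLoaded(load));
    }
}
```

Both strategies decide using information available before the job is sent, which is why the load seen at decision time may no longer hold when the job arrives.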

GridGain ships some implementations out-of-the-box and at the same time gives you a simple way to implement your own load balancer by creating "probes".
A probe implements the GridAdaptiveLoadProbe interface and is in charge of returning the load for the given node. Here is an example of a CPU probe that returns the node load based on the current CPU load.

GridAdaptiveCpuLoadProbe.java

public class GridAdaptiveCpuLoadProbe implements GridAdaptiveLoadProbe {
    /**
     * {@inheritDoc}
     */
    public double getLoad(GridNode node, int jobsSentSinceLastUpdate) {
        GridNodeMetrics metrics = node.getMetrics();

        // Number of processors available on the node.
        double k = metrics.getAvailableProcessors();

        // Current CPU load of the node divided by its processor count.
        return metrics.getCurrentCpuLoad() / k;
    }
}

"Simply clever" as Skoda says. And configuration file excerpt:

config.xml

...
<property name="loadBalancingSpi">
    <bean class="org.gridgain.grid.spi.loadbalancing.adaptive.GridAdaptiveLoadBalancingSpi">
        <property name="loadProbe">
            <bean class="GridAdaptiveCpuLoadProbe">
                <constructor-arg value="true"/>
            </bean>
        </property>
    </bean>
</property>
...

The other approach is runtime job scheduling or, as we call it in GridGain, collision resolution. Every new job collides with the others when it arrives at the target node. By "collide" we don't mean that jobs somehow beat each other :). A collision in this case just means that the node probably has to take some action about it.
GridGain has different collision resolutions. One that I have already written some posts about is "priority collision resolution", where all outstanding jobs are ordered according to their priority.
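The idea of ordering outstanding jobs by priority can be sketched with a plain priority queue. This is only an illustration under assumptions: `PriorityCollisionSketch`, `WaitingJob`, and the `slots` limit are hypothetical names invented here, and the code does not use or mirror the GridGain collision SPI.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: not the GridGain collision SPI.
final class PriorityCollisionSketch {
    /** A job waiting on the node, with a user-assigned priority. */
    static final class WaitingJob {
        final String name;
        final int priority;

        WaitingJob(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    /**
     * Returns the names of the jobs to activate: at most {@code slots}
     * jobs, highest priority first. The rest keep waiting.
     */
    static List<String> activate(List<WaitingJob> waiting, int slots) {
        PriorityQueue<WaitingJob> queue = new PriorityQueue<>(
            Comparator.comparingInt((WaitingJob j) -> j.priority).reversed());
        queue.addAll(waiting);

        List<String> activated = new ArrayList<>();
        while (activated.size() < slots && !queue.isEmpty()) {
            activated.add(queue.poll().name);
        }
        return activated;
    }

    public static void main(String[] args) {
        List<WaitingJob> waiting = new ArrayList<>();
        waiting.add(new WaitingJob("report", 1));
        waiting.add(new WaitingJob("billing", 10));
        waiting.add(new WaitingJob("cleanup", 5));

        // Only two execution slots are free, so the lowest priority job waits.
        System.out.println(activate(waiting, 2));
    }
}
```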

Another one is so-called "job stealing". Job stealing is a brand-new feature significantly influenced by the Java Fork/Join Framework authored by Doug Lea and planned for Java 7. The GridGain implementation took similar concepts and applied them to the grid (as opposed to the within-VM support planned for Java 7). Job stealing allows an underloaded node to take some jobs from an overloaded node and thus balances the load of grid nodes automatically at runtime. The developer does not even need to know about job stealing or do anything special about it.

You need to turn it on for it to work; the description and parameters can be found here:

config.xml

...
<property name="collisionSpi">
    <bean class="org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi">
        <property name="activeJobsThreshold" value="100"/>
        <property name="waitJobsThreshold" value="0"/>
        <property name="maximumStealingAttempts" value="10"/>
        <property name="stealingEnabled" value="true"/>
        <property name="messageExpireTime" value="1000"/>
    </bean>
</property>
...
<property name="failoverSpi">
    <bean class="org.gridgain.grid.spi.failover.jobstealing.GridJobStealingFailoverSpi">
        <property name="maximumFailoverAttempts" value="5"/>
    </bean>
</property>
...
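The job-stealing idea described above can be sketched in a few lines. This is a simplified, single-threaded illustration under assumptions, not the GridGain implementation: `JobStealingSketch`, `Node`, and the `wanted` and `keep` parameters are hypothetical names, and they are not meant to correspond to the SPI properties in the configuration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only: not how GridGain implements job stealing.
final class JobStealingSketch {
    /** A grid node with a queue of jobs that have not started yet. */
    static final class Node {
        final String id;
        final Deque<String> waiting = new ArrayDeque<>();

        Node(String id) {
            this.id = id;
        }
    }

    /**
     * Moves jobs from the tail of the victim's queue to the thief until the
     * thief holds {@code wanted} waiting jobs or the victim is down to
     * {@code keep} jobs. Returns the number of jobs moved.
     */
    static int steal(Node thief, Node victim, int wanted, int keep) {
        int moved = 0;
        while (thief.waiting.size() < wanted && victim.waiting.size() > keep) {
            thief.waiting.addLast(victim.waiting.pollLast());
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) {
        Node busy = new Node("busy");
        Node idle = new Node("idle");
        for (int i = 1; i <= 6; i++) {
            busy.waiting.addLast("job-" + i);
        }

        int moved = steal(idle, busy, 2, 3);
        System.out.println("moved " + moved + " jobs; idle now has " + idle.waiting);
    }
}
```

Stealing from the tail of the victim's queue takes the jobs that would have waited the longest, which is the same intuition that work-stealing deques in fork/join schedulers rely on.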

GridGain, as an enterprise-level grid, supports both load balancing and collision resolution, which makes it very flexible and at the same time easy to use.

Friday, May 2, 2008

"To compete" is not the same as "to compare"

Recently I noticed the following article about Hadoop: HadoopVsGridGain. To my mind it's just an ad :) Or something like one, because only those who are scared will put things like this on an official site. I think competitors should be polite and compete instead of comparing.

Could you compare a jet and a submarine? I can't, and I think GridGain and Hadoop are just as different. Every product has its own niche. The goal of this post is not to object but to point out what the article mentioned above gets wrong about GridGain.

The major point is about 10000 CPUs and petabytes of data. It looks like "mine is longer ...". Do you, reader, have them? I don't, and if our customers buy 10000 CPUs then GridGain will work on 10000 CPUs without any issues. We don't publish stupid numbers and don't run artificial tests to prove our scalability. We do our job, which is to make grid computing simple and useful for the people who develop software.

Most people use Hadoop to process large datasets, usually files, whereas GridGain is primarily a computational grid that can even use Hadoop as a distributed FS. GridGain has flexible pluggable interfaces that allow using any kind of data grid (and we ship some out-of-the-box). Saying that GridGain does not have its own data grid is the same as asking why we did not reinvent the wheel. The answer is: because someone did it better than we could :) And if Hadoop is good at it, then we will provide integration with Hadoop :)

Yes, we have a different understanding of tasks and jobs, but that is not a point of comparison. From my POV a task is something more complicated than a job. Thus we chose the task as the "primary entity", and a task is split into jobs. But again, this is not a point on which to compare products.

Returning a single value from a GridGain task makes more sense than returning a list of items. We let the user define what kind of data should be returned: a single object, a List, or even a Map. It's up to the user, not GridGain or Hadoop. And there is nothing to compare.
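As a rough sketch of this point, the example below splits a task into jobs and leaves the shape of the result to the caller. Everything in it is hypothetical and written for illustration: `SplitReduceSketch` and its `run` method are not GridGain classes, and the "jobs" run locally in a loop instead of on grid nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only: jobs run locally, not on a grid.
final class SplitReduceSketch {
    /**
     * Splits the phrase into one "job" per word, runs each job, and lets
     * the caller's reducer decide what the task returns.
     */
    static <R> R run(String phrase, Function<List<Integer>, R> reduce) {
        List<Integer> jobResults = new ArrayList<>();
        for (String word : phrase.trim().split("\\s+")) {
            jobResults.add(word.length()); // each job computes one word's length
        }
        return reduce.apply(jobResults);
    }

    public static void main(String[] args) {
        // Same task, reduced to a single value.
        int total = run("hello grid world",
            results -> results.stream().mapToInt(Integer::intValue).sum());
        System.out.println("total characters: " + total);

        // Same task, reduced to a list with one entry per job.
        List<Integer> perWord = run("hello grid world", results -> results);
        System.out.println("per-word lengths: " + perWord);
    }
}
```

The split step is identical in both calls; only the reducer supplied by the user changes the type of the result.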

Combiners and counters. Hm. If you process files, then yes, you can say that your task has already processed half of them. But what about computational math tasks that have no strict algorithm and are based on approximation or things like that? You cannot say that you have calculated half of PI :)) Sometimes they are useful, sometimes not.

The use of standard java.io serialization is being changed, and in version 2.1 we will provide a new SerializationSpi to make it more flexible. Thanks to them for pointing it out. We have already implemented it.

As for C++ and other languages: several posts below I wrote an example of using GridGain with shell scripts to process files the way Hadoop does, but that post is in Russian. I will translate it into English when I have time, and again, there is nothing to compare. The example in that post clearly shows that GridGain can be used with any language, because every language supports system output and error codes.

And the last and funniest thing is the claim that GridGain costs something :) GridGain is an open source product licensed under the GPL and Apache 2 licenses, and it comes with source code, bug tracker access, and a forum with an average response time of less than 1 hour. The only thing that costs money is the management console, but in 90% of cases you don't need it, because every GridGain node exposes JMX beans that publish all node/task/job information.