Sunday, June 22, 2008

Grids vs. Application server clusters (or maybe both)?

I won't write about GridGain tricks today. Instead, I want to talk about grids in general, both data grids and computational grids.

Recently, while we were talking about grids, someone asked me about their applicability: "Why should I use a grid? I don't think grids are for our application. We don't compute anything." That's when I realized how far we are from the people who actually write applications.

I think it's time to explain what grids are for. Regardless of the vendor (GridGain, GigaSpaces, Terracotta, Hadoop, ...), grids are intended to make applications robust, flexible, scalable and reliable.

When you start coding an application, you hardly ever consider that, in most cases, your requirements will have changed by the time you finish development. The number of users or requests will grow 10 or even 100 times, and you will have to change both your code and your hardware. You will also have to buy another license for your application server. Of course there is LAMP, and "cool" developers use it, but don't you think plenty of companies don't use Perl or MySQL?

So what do you do to make your application reliable? Scalable? You use EJB or other container-specific features, maybe web services. You pay for the license, you set up and administer an application server cluster, and you define load balancing, database access and a deployment scheme. But let me ask you: what would you do if you needed to execute your code on the same box where the data is located? You would deploy your code there, right? Not quite. Data can be moved from one database box to another, and then you have to spend a lot of time reconfiguring your cluster.

This is a nightmare, believe me. I have seen such systems and worked with them.

Of course, application servers provide standard access to features like JMS, JMX, JNDI, databases and so on, and grids don't implement them. But in most cases you don't need that stuff, and moreover, you can usually start a grid inside your favourite application server or web container (GridGain, for example, can do this).

Grids are the cure in most cases. They give us a different development model, especially if you use Spring or a Spring-like framework to configure the entire application. Whatever grid you use, it gives you a way to bring computational tasks (your code) and data (data grids) together and to execute the code on the same node where the data lives.
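To make "execute the code where the data lives" a bit more concrete, here is a minimal plain-Java sketch of affinity routing, assuming each data key is owned by exactly one node chosen by hashing the key. AffinityRouter, nodeFor and the node names are hypothetical, for illustration only; this is not GridGain's (or any other vendor's) API.

```java
import java.util.List;

// Hypothetical sketch, not a vendor API: pick the node that "owns" a data key
// by hashing the key, so a job for that key can be sent to where its data lives.
final class AffinityRouter {
    private final List<String> nodes; // hypothetical node IDs

    AffinityRouter(List<String> nodes) {
        if (nodes.isEmpty()) throw new IllegalArgumentException("no nodes");
        this.nodes = List.copyOf(nodes);
    }

    // The same key always maps to the same node while the node list is unchanged.
    String nodeFor(Object dataKey) {
        return nodes.get(Math.floorMod(dataKey.hashCode(), nodes.size()));
    }

    public static void main(String[] args) {
        AffinityRouter router = new AffinityRouter(List.of("node-A", "node-B", "node-C"));
        for (String key : List.of("customer-1", "customer-2", "customer-1")) {
            System.out.println(key + " -> " + router.nodeFor(key));
        }
    }
}
```

Plain modulo hashing reassigns most keys whenever a node joins or leaves, which is exactly the "data moved to another box" problem from the application-server story; data grids typically use partitioning or consistent hashing to limit that movement.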

In most cases you don't even have to think about deployment (GridGain, for example, doesn't require you to deploy your code; it is deployed at runtime).

So you need to define a set of jobs and, probably, tell the grid what data they need. That's all. The grid will deploy the code, balance the load, provide failover (and thus reliability) and scale your application, and in most cases it is free.

So what would you choose: application servers, grids, or both?

Friday, June 13, 2008

Push vs. Poll approaches with GridGain

When we discuss resource distribution (not necessarily on a grid), we can say that there are two different approaches: "push" and "poll".

"Push" means that, given some producers and consumers, the producers distribute resources across all consumers: the producer leads. In contrast, with "poll" the producers share resources somehow and the consumers request the resources they need, so the consumer leads.
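The practical difference shows up when consumers work at different speeds. Below is a small deterministic simulation, assuming each worker needs a fixed number of time units per job and an idle "poll" consumer asks for work immediately; PushVsPoll and its methods are hypothetical names, not GridGain code.

```java
import java.util.Arrays;

// Hypothetical simulation (not GridGain code): "push" means the producer assigns
// jobs round-robin up front; "poll" means each job goes to whichever consumer
// becomes free first. costPerJob[w] = time units worker w needs per job.
final class PushVsPoll {

    // Push: the producer decides; worker w gets every k-th job regardless of speed.
    static int pushMakespan(int jobs, int[] costPerJob) {
        requireWorkers(costPerJob);
        int k = costPerJob.length;
        int makespan = 0;
        for (int w = 0; w < k; w++) {
            int assigned = jobs / k + (w < jobs % k ? 1 : 0);
            makespan = Math.max(makespan, assigned * costPerJob[w]);
        }
        return makespan;
    }

    // Poll: the consumer decides; the next job is taken by the worker that is
    // free earliest (ties go to the lower index).
    static int pollMakespan(int jobs, int[] costPerJob) {
        requireWorkers(costPerJob);
        int[] freeAt = new int[costPerJob.length];
        for (int j = 0; j < jobs; j++) {
            int w = 0;
            for (int i = 1; i < freeAt.length; i++) {
                if (freeAt[i] < freeAt[w]) w = i;
            }
            freeAt[w] += costPerJob[w];
        }
        return Arrays.stream(freeAt).max().orElse(0);
    }

    private static void requireWorkers(int[] costPerJob) {
        if (costPerJob.length == 0) throw new IllegalArgumentException("no workers");
    }

    public static void main(String[] args) {
        int[] costs = {1, 3}; // a fast worker and a slow one
        System.out.println("push: " + pushMakespan(12, costs)); // push: 18
        System.out.println("poll: " + pollMakespan(12, costs)); // poll: 9
    }
}
```

With push, the slow worker gets half of the 12 jobs and finishes at t=18; with poll, the fast worker ends up taking 9 jobs and everything is done by t=9.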

If we apply these approaches to computational grids, we see that the pure "master-worker" pattern is a "push" one: the master lets workers know that new jobs exist and assigns them to workers. GigaSpaces, however, implemented "master-worker" on top of its cache, which makes it different. The master node just puts data into the cache, and worker nodes pick the data up and process it. This is nice and pretty flexible, because every node takes a new job as soon as it has completed its current one: a kind of automatic load balancing.
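The cache-based variant can be approximated with a shared BlockingQueue standing in for the space. This is a sketch using java.util.concurrent, not the GigaSpaces API; the class name, the "square the input" job and the STOP sentinel are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Stand-in for space-based master-worker using java.util.concurrent, not the
// GigaSpaces API: the master only writes jobs into a shared queue, and each
// worker takes its next job as soon as it has finished the previous one.
final class SharedQueueMasterWorker {
    private static final int STOP = -1; // sentinel; job inputs must be >= 0

    // Squares every input on 'workers' threads and returns the sum of the squares.
    static long run(List<Integer> inputs, int workers) {
        try {
            return runChecked(inputs, workers);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static long runChecked(List<Integer> inputs, int workers)
            throws InterruptedException {
        if (workers < 1) throw new IllegalArgumentException("need at least one worker");
        for (int x : inputs) {
            if (x < 0) throw new IllegalArgumentException("inputs must be >= 0");
        }
        BlockingQueue<Integer> jobs = new LinkedBlockingQueue<>();
        BlockingQueue<Long> results = new LinkedBlockingQueue<>();
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        int x = jobs.take();        // the worker pulls; nobody assigns it work
                        if (x == STOP) return;
                        results.put((long) x * x);  // "process" the job
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }

        // Master: drop all jobs into the shared queue, then one STOP marker per worker.
        for (int x : inputs) jobs.put(x);
        for (int w = 0; w < workers; w++) jobs.put(STOP);

        long sum = 0;
        for (int i = 0; i < inputs.size(); i++) sum += results.take();
        for (Thread t : threads) t.join();
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3, 4), 3)); // prints 30
    }
}
```

Nothing here decides which worker gets which job: a faster thread simply returns to take() more often, which is the automatic load balancing described above.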

By default GridGain uses the "push" approach and distributes all jobs across grid nodes. But it supports the other approach as well.

Let me show you how to "invert" the basic GridGain concept without even changing the code. First, a few facts that explain what I'm going to do.

GridGain's "job stealing" implementation allows nodes to steal jobs from each other (remember consumers and load balancing? :))

OK. All we need to do is stop sending jobs to remote nodes and "cache" them on the "master" node instead. Then we configure all "worker" nodes to steal jobs from the "master" node. That's it.
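Before looking at the real configuration, here is a toy, tick-based model of that inverted setup: every job sits on the master, and a worker whose local queue is empty steals up to a fixed batch of waiting jobs from it. The class, the speeds and the batch parameter are hypothetical simplifications; this is not the actual algorithm of GridJobStealingCollisionSpi.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy model: all jobs are mapped to the master, which runs nothing itself;
// an idle worker steals up to 'batch' waiting jobs from the master's queue.
// Hypothetical simplification, not GridJobStealingCollisionSpi's algorithm.
final class StealFromMaster {

    // speeds[w] = jobs worker w finishes per tick (>= 1).
    // Returns how many jobs each worker completed.
    static int[] simulate(int totalJobs, int[] speeds, int batch) {
        if (speeds.length == 0) throw new IllegalArgumentException("need at least one worker");
        if (batch < 1) throw new IllegalArgumentException("batch must be >= 1");
        for (int s : speeds) {
            if (s < 1) throw new IllegalArgumentException("speeds must be >= 1");
        }

        Deque<Integer> masterWaiting = new ArrayDeque<>();
        for (int j = 0; j < totalJobs; j++) masterWaiting.addLast(j);

        int[] local = new int[speeds.length]; // stolen but not yet finished
        int[] done = new int[speeds.length];

        while (!masterWaiting.isEmpty() || Arrays.stream(local).sum() > 0) {
            for (int w = 0; w < speeds.length; w++) {
                if (local[w] == 0) {
                    // Idle worker: steal waiting jobs from the tail of the master's queue.
                    while (local[w] < batch && !masterWaiting.isEmpty()) {
                        masterWaiting.pollLast();
                        local[w]++;
                    }
                }
                int finished = Math.min(speeds[w], local[w]);
                local[w] -= finished;
                done[w] += finished;
            }
        }
        return done;
    }

    public static void main(String[] args) {
        // A fast worker (3 jobs/tick) and a slow one (1 job/tick), 12 jobs.
        System.out.println(Arrays.toString(simulate(12, new int[] {3, 1}, 3))); // [9, 3]
    }
}
```

The master never pushes anything; the split of the 12 jobs (9 vs. 3) follows from how often each worker comes back to steal.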

Let's configure this approach.


Grid "master" node configuration (this node does not steal jobs)

...
<property name="collisionSpi">
<bean class="org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi">
<property name="activeJobsThreshold" value="0"/>
<property name="waitJobsThreshold" value="1000"/>
<property name="maximumStealingAttempts" value="10"/>
<property name="stealingEnabled" value="false"/>
<property name="messageExpireTime" value="10000"/>
</bean>
</property>


Grid "worker" node configuration (nodes steal jobs)

...
<property name="collisionSpi">
<bean class="org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi">
<property name="activeJobsThreshold" value="10"/>
<property name="waitJobsThreshold" value="10"/>
<property name="maximumStealingAttempts" value="10"/>
<property name="stealingEnabled" value="true"/>
<property name="messageExpireTime" value="10000"/>
</bean>
</property>


Finally, in your task, always map your jobs to the local "master" node, as shown below.

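import java.util.HashMap;
import java.util.List;
import java.util.Map;
// ... plus the GridGain imports (Grid, GridNode, GridJob, GridException,
// GridTaskAdapter, GridInstanceResource).

public class MyFooBarTask extends GridTaskAdapter<String, String> {
    // Inject the grid instance.
    @GridInstanceResource
    private Grid grid;

    // Map every job to the local ("master") node; the worker nodes
    // then steal the jobs from it.
    public Map<? extends GridJob, GridNode> map(List<GridNode> subgrid,
        String arg) throws GridException {
        Map<MyFooBarJob, GridNode> jobs =
            new HashMap<MyFooBarJob, GridNode>(1);

        jobs.put(new MyFooBarJob(arg), grid.getLocalNode());

        return jobs;
    }

    // ... rest of the task omitted
}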

Easy, isn't it?

Wednesday, June 4, 2008

Result? No cache.

OutOfMemoryError is a critical problem when you process large amounts of data on a grid.

What if you split your jobs among other boxes in the network and every box produced a result of 100K? That's fine if we are talking about a hundred jobs and the node that collects and processes all the execution results has enough memory installed.

But what would happen if there were 10,000 nodes in the grid and every node sent back 1M of data? I know that in most cases this is a rather hypothetical issue, but as usual there are people who always ask "what if...?"

Normally, when you build a grid, you have to take this into account and avoid sending such data back. I'm sure that in 90 cases out of 100 you will never send back more than 100K, and even if you know that a result is 1M at most, you should install as much memory on the "master" node as you could get back. In the case described above that is 10K (number of nodes) * 1M (maximum result size) = 10G. Not that much (considering 10K nodes :)).
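The sizing rule is a single multiplication: number of jobs times maximum result size. This tiny helper reproduces both cases from the text (100 jobs of 100K, and 10K nodes of 1M), assuming decimal units (1K = 1,000 bytes) to match the post; it ignores JVM object and serialization overhead, so real heap needs would likely be higher. ResultMemoryEstimate is a hypothetical name.

```java
// Back-of-the-envelope sizing for the node that collects all job results,
// assuming every result is kept in memory until the final reduce step.
final class ResultMemoryEstimate {
    static final long KB = 1_000L;          // decimal units, matching "100K", "1M", "10G"
    static final long MB = 1_000_000L;
    static final long GB = 1_000_000_000L;

    static long worstCaseBytes(long jobs, long maxResultBytes) {
        return Math.multiplyExact(jobs, maxResultBytes); // throws on overflow
    }

    public static void main(String[] args) {
        System.out.println(worstCaseBytes(100, 100 * KB) / MB + " MB"); // 10 MB
        System.out.println(worstCaseBytes(10_000, MB) / GB + " GB");    // 10 GB
    }
}
```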

But anyway, let's be lazy: instead of thinking about our grid, we'd rather waste our money and time and rely on a grid product that will probably solve this somehow ;)

Different products give you different ways to handle this case. At GridGain we solved it as follows: instead of processing results in parallel, we process them sequentially, without caching the received data, and discard each result as soon as it has been processed. Of course, this works only if your final result does not need all the interim results from the remote nodes at once (that's why I talked about wasting time and money above: 8G of additional memory doesn't cost much).
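Outside of any grid product, the same idea is an incremental fold: each result is combined into a running aggregate the moment it arrives and is then dropped, so memory use does not grow with the number of jobs. This is a plain-Java sketch assuming the final answer can be built incrementally (sum, max, count and so on); StreamingReducer and onResult are hypothetical names, not GridGain classes.

```java
import java.util.function.LongBinaryOperator;

// Plain-Java sketch of "no result cache": fold each incoming result into a
// running aggregate and keep nothing else. Only valid when the final answer
// can be computed incrementally.
final class StreamingReducer {
    private final LongBinaryOperator combine;
    private long aggregate;
    private long processed;

    StreamingReducer(long identity, LongBinaryOperator combine) {
        this.aggregate = identity;
        this.combine = combine;
    }

    // Called once per incoming result; the result is not retained afterwards.
    void onResult(long result) {
        aggregate = combine.applyAsLong(aggregate, result);
        processed++;
    }

    long aggregate() { return aggregate; }
    long processed() { return processed; }

    public static void main(String[] args) {
        StreamingReducer sum = new StreamingReducer(0L, Long::sum);
        // Simulate 10,000 job results arriving one by one.
        for (long r = 1; r <= 10_000; r++) sum.onResult(r);
        System.out.println(sum.processed() + " results, sum = " + sum.aggregate());
        // 10000 results, sum = 50005000
    }
}
```

A median or a sort over all results would not fit this pattern, because it needs every interim result at the same time.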

GridGain uses annotations throughout the product, and this issue is solved with the @GridTaskNoResultCache annotation, as below:


GridResultNoCacheTask.java

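import java.util.List;
// ... plus the GridGain imports.

@GridTaskNoResultCache
public class GridResultNoCacheTask extends GridTaskSplitAdapter<String, Object> {
    @Override
    public GridJobResultPolicy result(GridJobResult result,
        List<GridJobResult> received) throws GridException {
        assert result.getData() != null;
        assert received.contains(result);

        // Do something with the received result here. The other
        // entries in the "received" list have null data, because
        // results are not cached.

        // Keep waiting for the remaining job results.
        return GridJobResultPolicy.WAIT;
    }

    // split(...) and reduce(...) omitted.
}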