When we discuss resource distribution (not necessarily on a grid), we can say that there are two different approaches: "push" and "poll".
The "push" approach means that, given some consumers and producers, the producers distribute resources across all consumers. In other words, the producer leads. The "poll" concept, on the contrary, means that producers make resources available somehow and consumers request the resources they need, so in this case the consumer leads.
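To make the difference concrete, here is a minimal plain-Java sketch of both approaches. It has nothing to do with any grid product: the class name `PushVsPoll`, the round-robin rule, and the shared queue are all my own assumptions, chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration of "push" vs "poll"; not part of any grid API.
public class PushVsPoll {
    // "Push": the producer leads and decides which consumer gets which resource
    // (plain round-robin is assumed here).
    static List<List<String>> push(List<String> resources, int consumers) {
        List<List<String>> assigned = new ArrayList<>();
        for (int i = 0; i < consumers; i++) {
            assigned.add(new ArrayList<>());
        }
        for (int i = 0; i < resources.size(); i++) {
            assigned.get(i % consumers).add(resources.get(i));
        }
        return assigned;
    }

    // "Poll": the producer only publishes resources in a shared place;
    // the consumer leads and asks for the next one (null when nothing is left).
    static String poll(Queue<String> shared) {
        return shared.poll();
    }

    public static void main(String[] args) {
        List<String> resources = Arrays.asList("r1", "r2", "r3");
        System.out.println(push(resources, 2)); // prints [[r1, r3], [r2]]

        Queue<String> shared = new ArrayDeque<>(resources);
        System.out.println(poll(shared));       // prints r1
    }
}
```

In the first method the producer has to know about every consumer. In the second it knows about none of them, which is the property the rest of this post relies on.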
If we apply these approaches to computational grids, we can see that the pure "master-worker" concept is a "push" one: the master lets workers know that new jobs exist and assigns the jobs to them. But GigaSpaces, for example, implemented "master-worker" on top of their cache, which makes it different. The master node just puts some data into the cache, and worker nodes pick the data up and process it. This is nice and pretty flexible because every node takes a new job as soon as it has completed its existing ones, which gives a kind of automatic load balancing.
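The load-balancing effect is easy to see in a toy simulation. The sketch below is plain Java and uses neither the GigaSpaces nor the GridGain API. The class `PullLoadBalancing`, the job durations, and the round-robin "push" rule are assumptions made up for this example. It compares the time at which the last worker finishes under a static assignment with the time under "take the next job when you are free".

```java
import java.util.PriorityQueue;

// Hypothetical toy model: job durations in abstract time units, no real grid involved.
public class PullLoadBalancing {
    // Static "push": job i is assigned to worker i % workers up front.
    // Returns the time at which the busiest worker finishes.
    static int pushMakespan(int[] durations, int workers) {
        int[] busyUntil = new int[workers];
        for (int i = 0; i < durations.length; i++) {
            busyUntil[i % workers] += durations[i];
        }
        int max = 0;
        for (int t : busyUntil) {
            max = Math.max(max, t);
        }
        return max;
    }

    // "Pull": whichever worker becomes free first takes the next job from the shared store.
    static int pullMakespan(int[] durations, int workers) {
        PriorityQueue<Integer> freeAt = new PriorityQueue<>(); // time each worker becomes free
        for (int w = 0; w < workers; w++) {
            freeAt.add(0);
        }
        int max = 0;
        for (int d : durations) {
            int finish = freeAt.poll() + d; // earliest free worker runs this job
            freeAt.add(finish);
            max = Math.max(max, finish);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] durations = {9, 1, 1, 1, 1, 1}; // one long job, five short ones
        System.out.println("push: " + pushMakespan(durations, 2)); // prints push: 11
        System.out.println("pull: " + pullMakespan(durations, 2)); // prints pull: 9
    }
}
```

With a static assignment, the worker stuck with the long job also receives its share of short ones. With pulling, the other worker drains the short jobs in the meantime.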
By default GridGain uses the "push" way and distributes all jobs across grid nodes. But it supports another way as well.
Let me show you how to "invert" the base GridGain concept without even changing any code. First, let me make a statement that explains everything I'm going to do.
GridGain's "job stealing" implementation allows nodes to steal jobs from each other (do you still remember the consumers and the load balancing? :)
OK. All we need to do is stop sending jobs to remote nodes and "cache" them on the "master" node instead. Then we configure all "worker" nodes to steal jobs from the "master" node. That's it.
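Before touching any configuration, the idea can be sketched as a toy model in plain Java. This is not how GridGain implements job stealing internally. The `StealingSketch` class, its `Node` type, and the one-job-at-a-time stealing loop are hypothetical simplifications, meant only to show a "master" that holds jobs without running them and "workers" that come and take them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of "cache jobs on the master, let workers steal them".
public class StealingSketch {
    static class Node {
        final String name;
        final boolean stealingEnabled;
        final Deque<String> waiting = new ArrayDeque<>(); // jobs parked on this node
        final List<String> executed = new ArrayList<>();  // jobs this node has run

        Node(String name, boolean stealingEnabled) {
            this.name = name;
            this.stealingEnabled = stealingEnabled;
        }

        // Take one waiting job from the victim and run it, if this node may steal.
        boolean stealFrom(Node victim) {
            if (!stealingEnabled || victim.waiting.isEmpty()) {
                return false;
            }
            executed.add(victim.waiting.poll());
            return true;
        }
    }

    // Workers keep stealing in turn until the master has nothing left.
    static void run(Node master, List<Node> workers) {
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Node w : workers) {
                progress |= w.stealFrom(master);
            }
        }
    }

    public static void main(String[] args) {
        Node master = new Node("master", false); // only parks jobs, never steals
        master.waiting.addAll(Arrays.asList("j0", "j1", "j2", "j3", "j4"));
        Node w1 = new Node("worker-1", true);
        Node w2 = new Node("worker-2", true);

        run(master, Arrays.asList(w1, w2));
        System.out.println(w1.executed);      // prints [j0, j2, j4]
        System.out.println(w2.executed);      // prints [j1, j3]
        System.out.println(master.executed);  // prints []
    }
}
```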
Let's configure this approach.
Grid "master" node configuration (this node does not steal jobs)
...
<property name="collisionSpi">
    <bean class="org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi">
        <property name="activeJobsThreshold" value="0"/>
        <property name="waitJobsThreshold" value="1000"/>
        <property name="maximumStealingAttempts" value="10"/>
        <property name="stealingEnabled" value="false"/>
        <property name="messageExpireTime" value="10000"/>
    </bean>
</property>
Grid "worker" node configuration (nodes steal jobs)
...
<property name="collisionSpi">
    <bean class="org.gridgain.grid.spi.collision.jobstealing.GridJobStealingCollisionSpi">
        <property name="activeJobsThreshold" value="10"/>
        <property name="waitJobsThreshold" value="10"/>
        <property name="maximumStealingAttempts" value="10"/>
        <property name="stealingEnabled" value="true"/>
        <property name="messageExpireTime" value="10000"/>
    </bean>
</property>
And in your task you should always map your jobs to the local "master" node, as shown below.
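import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.gridgain.grid.Grid;
import org.gridgain.grid.GridException;
import org.gridgain.grid.GridJob;
import org.gridgain.grid.GridNode;
import org.gridgain.grid.GridTaskAdapter;
import org.gridgain.grid.resources.GridInstanceResource;

public class MyFooBarTask extends GridTaskAdapter<String, String> {
    // Inject grid instance.
    @GridInstanceResource
    private Grid grid;

    // Map jobs to the local node only; workers steal them from here.
    // MyFooBarJob is your own GridJob implementation (not shown).
    public Map<? extends GridJob, GridNode> map(List<GridNode> subgrid,
        String arg) throws GridException {
        Map<MyFooBarJob, GridNode> jobs =
            new HashMap<MyFooBarJob, GridNode>(1);
        jobs.put(new MyFooBarJob(arg), grid.getLocalNode());
        return jobs;
    }
    ....
}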
Easy, isn't it?


1 comment:
In the recovery case, is GridGain able to recover from the last failure state? (E.g. say I have 100 sub-jobs in total, with 50 completed and 50 pending. The master node fails at this point and is later restarted. Will the master node be able to resume from the last state?)