Execution on Clusters

In grid environments, a cluster, which connects many PCs and workstations in a network, is a typical computational resource. In OmniRPC, we can treat a cluster as a single remote host: we can run a remote executable module on each node in the cluster and execute them in parallel, achieving good performance.

Setting up the cluster environment
Selection of a job scheduler
Use of the built-in round-robin scheduler
Cluster in a private network
Cluster outside a firewall
Cluster inside a firewall

Setting up the cluster environment

When using a cluster, we can access at least one computer in the cluster from the client host. We call this computer the cluster server host, and we call the other computers in the cluster the cluster node hosts.

We assume the environment below.
  1. The client host is jones.tsukuba.ac.jp.
  2. The cluster server host is hpc-serv.hpcc.jp; hpc1, hpc2, and hpc3 are connected to it as cluster node hosts.
  3. The cluster server host and the cluster node hosts share the same file system.
  4. The client host can connect directly to the cluster server host and to all of the cluster node hosts. No ports are restricted.

The last item in the list assumes that the client host and the cluster are in the same network.

Selection of a job scheduler

In OmniRPC, the omrpc-agent is invoked first on the cluster server host, and this agent activates the remote executable modules on each cluster node host through the selected job scheduler. We can use one of the schedulers described below.

Use of the built-in round-robin scheduler

OmniRPC's built-in round-robin scheduler is a simple scheduler implemented in the agent itself. It activates remote executable modules on the cluster node hosts in round-robin order.

To use this scheduler, we create a nodes file which lists the cluster node hosts in the registry ("$HOME/.omrpc-register") of the cluster server host. Below is the nodes file for this example.

hpc1
hpc2
hpc3
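
For example, this file can be created on the cluster server host as follows (a minimal sketch; the file name "nodes" is an assumption here, so check the registry convention of your OmniRPC installation):

$ mkdir -p $HOME/.omrpc-register
$ cat > $HOME/.omrpc-register/nodes <<EOF
hpc1
hpc2
hpc3
EOF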

On the client host side, we create the following hostfile.

<?xml version="1.0" ?>
<OmniRpcConfig>
   <Host name="hpc-serv.hpcc.jp" arch="i386" os="linux">
   <JobScheduler type="rr" maxjob="4" /> 
   </Host>
</OmniRpcConfig>
Set the type attribute of the JobScheduler element to "rr" to select the round-robin scheduler. The default value of this attribute is "fork," which simply forks the process on the same host; the fork scheduler is suitable when the remote host is a single (possibly SMP) machine. The maxjob attribute specifies the maximum number of jobs run simultaneously; set it according to the number of jobs the cluster node hosts can execute at once, which is 4 in this example.
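
For comparison, a hostfile that relies on the default "fork" scheduler might look like the following sketch (the host name smp-serv.hpcc.jp is hypothetical). Here up to four jobs are forked as processes on that single host itself:

<?xml version="1.0" ?>
<OmniRpcConfig>
   <Host name="smp-serv.hpcc.jp" arch="i386" os="linux">
   <JobScheduler type="fork" maxjob="4" />
   </Host>
</OmniRpcConfig>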

The relationship between the agent and rex with this option is shown in the figure below.

Cluster in a private network

In the above example, the client host and the cluster hosts are in the same network, and the remote executable programs running on the cluster node hosts connect directly to the client host. Even when the cluster and the client host are in different networks, the programs on the cluster node hosts can communicate with the client host in the same way, as long as the nodes can reach it directly.

However, as the number of node hosts grows, it is increasingly common for clusters to be connected through a private (local-address) network. In this configuration, only the server host has a global IP address; the node hosts have only local IP addresses. OmniRPC requires each cluster node host to communicate with the client host, but here a node host cannot connect directly to a client host outside the cluster's network.

In this situation, there are two ways to use the cluster.

  1. Set up NAT so that the cluster node hosts can communicate with outside networks. Programs on each cluster node can then connect to anonymous ports on the client host. For details on configuring NAT, please refer to your system's NAT documentation; a minimal sketch follows this list.
  2. Use the agent's multiplexed-communication function, in which the agent relays communication between the remote executable programs running on the cluster node hosts and the client host.
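
For the first approach, the NAT configuration itself is outside the scope of OmniRPC. On a Linux gateway it might look roughly like the following minimal sketch, assuming the cluster server host acts as the NAT gateway and that eth0 is its external interface:

# enable IP forwarding on the gateway (the cluster server host)
echo 1 > /proc/sys/net/ipv4/ip_forward
# rewrite the source address of packets leaving through eth0 so that
# connections from the node hosts appear to come from the gateway
# (eth0 as the external interface is an assumption)
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE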

We show an example hostfile.xml based on the second approach.

<?xml version="1.0" ?>
<OmniRpcConfig>
   <Host name="hpc-serv.hpcc.jp" arch="i386" os="linux">
   <Agent invoker="rsh" mxio="on" />
   <JobScheduler type="rr" maxjob="4" />
   </Host>
</OmniRpcConfig>
Set the mxio attribute of the Agent element to "on." In this case, because we assume that the cluster server host and the client host are in the same network, we invoke the agent with "rsh." To invoke the agent over SSH, set the invoker attribute to "ssh"; to invoke it through the Globus gatekeeper, use "globus."

With this option, the relationship between the agent, rex, and the client is shown in the figure below.

The agent relays communication between the client and every rex executed on the cluster node hosts.

Cluster outside a firewall

No special preparation is required in this situation.

Cluster inside a firewall

We now explain how to use a cluster from outside its firewall. When a firewall is present, it must at least be possible to reach the cluster server host with SSH. If no anonymous ports other than the SSH port (#22) can be used, you can use the agent's multiplexed-communication function. The hostfile for this example is shown below.

<?xml version="1.0" ?>
<OmniRpcConfig>
   <Host name="hpc-serv.hpcc.jp" arch="i386" os="linux">
   <Agent invoker="ssh" mxio="on" />
   <JobScheduler type="rr" maxjob="4" />
   </Host>
</OmniRpcConfig>

Set the mxio attribute of the Agent element to "on" to use the multiplexed-communication function.

In environments that use Globus, there is usually no firewall, so no special preparation is needed. However, if the cluster nodes have private IP addresses, you must set the mxio attribute in the same manner.
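
A hostfile for such a Globus environment might look like the following sketch, which simply combines the invoker="globus" setting with mxio="on":

<?xml version="1.0" ?>
<OmniRpcConfig>
   <Host name="hpc-serv.hpcc.jp" arch="i386" os="linux">
   <Agent invoker="globus" mxio="on" />
   <JobScheduler type="rr" maxjob="4" />
   </Host>
</OmniRpcConfig>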

The relationship between the agent and rex is shown in the figure below.

Communication between the client and the rex processes executed on the cluster node hosts is relayed by the agent, and communication between the agent and the client passes through the firewall via SSH's port forwarding.
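
OmniRPC sets up this forwarding itself when invoker="ssh" and mxio="on" are given; conceptually, it corresponds to a manual tunnel such as the following (illustrative only; the port number 9999 and the user name are hypothetical):

$ ssh -L 9999:localhost:9999 user@hpc-serv.hpcc.jp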