Here we provide details of our implementation of SPIRIT. For simplicity, we assume a standard cluster configuration: a mother node and a heterogeneous set of computing nodes connected through a local network. All machines run Linux, and the cluster software is the Sun Grid Engine (SGE). Adapting SPIRIT to other cluster software should be straightforward. If you port SPIRIT to another cluster platform, please send us a copy so that we can make it available here, with the appropriate acknowledgements of course. We assume that all computations are performed via jobs submitted to a queue system. SPIRIT runs only on the mother node. By monitoring the queue system (in SGE, via the commands `qhost` and `qstat`), SPIRIT determines whether there are jobs waiting in the queue or idle computing nodes. Idle nodes are shut down. To do this, SPIRIT issues the command:
`ssh nodeXX "poweroff"`
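The shutdown pass can be sketched in C++, SPIRIT's own language. This is a minimal illustration, not SPIRIT's actual code. The `NodeState` struct and `shutdownCommands` function are hypothetical names. The sketch assumes that the node states have already been extracted from `qhost`/`qstat` output, and that "idle" means a powered-on node with no job slots in use. Nodes listed in exclude.list (see below) are skipped.

```cpp
#include <set>
#include <string>
#include <vector>

// HYPOTHETICAL snapshot of one computing node, assumed to be built
// from qhost/qstat output (that parsing is not shown here).
struct NodeState {
    std::string name;   // e.g. "tuga07"
    bool poweredOn;     // node is currently up
    int busySlots;      // job slots in use; 0 is treated as "idle"
};

// Build the remote shutdown command for every node that is up,
// runs no job, and is not in the exclusion set.
std::vector<std::string> shutdownCommands(const std::vector<NodeState>& nodes,
                                          const std::set<std::string>& excluded) {
    std::vector<std::string> commands;
    for (const auto& node : nodes) {
        if (node.poweredOn && node.busySlots == 0 &&
            excluded.count(node.name) == 0) {
            commands.push_back("ssh " + node.name + " \"poweroff\"");
        }
    }
    return commands;
}
```

Each returned string could be passed to `std::system`. This assumes that the mother node can ssh into the computing nodes non-interactively, with enough privileges to run `poweroff`.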
If there are jobs waiting in the queue and some computing nodes are shut down, SPIRIT sends Wake-on-LAN (WoL) signals to predefined nodes (in the current version; see below) so that enough CPUs become available to run the pending jobs. To do this, it invokes Donald Becker's open-source ether-wake utility on the appropriate network interface of the mother node (ethY), using the MAC address of the computing node:
`ether-wake -i ethY MAC_ADDRESS`
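SPIRIT delegates the actual transmission to ether-wake. For reference, here is a sketch of the Wake-on-LAN "magic packet" payload that such a tool puts on the wire: six `0xFF` bytes followed by the target MAC address repeated 16 times. The function names are hypothetical and this is not SPIRIT's code. Opening a raw socket and sending the frame are deliberately left out.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Value of one hexadecimal digit, or -1 if c is not a hex digit.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parse "aa:bb:cc:dd:ee:ff" into six bytes; throws on malformed input.
std::array<std::uint8_t, 6> parseMac(const std::string& mac) {
    std::array<std::uint8_t, 6> bytes{};
    if (mac.size() != 17) throw std::invalid_argument("bad MAC: " + mac);
    for (int i = 0; i < 6; ++i) {
        int high = hexValue(mac[3 * i]);
        int low = hexValue(mac[3 * i + 1]);
        // Every pair except the last must be followed by ':'.
        bool separatorOk = (i == 5) || mac[3 * i + 2] == ':';
        if (high < 0 || low < 0 || !separatorOk)
            throw std::invalid_argument("bad MAC: " + mac);
        bytes[i] = static_cast<std::uint8_t>(high * 16 + low);
    }
    return bytes;
}

// Magic packet payload: 6 x 0xFF, then the target MAC 16 times.
std::vector<std::uint8_t> magicPacket(const std::string& mac) {
    const auto target = parseMac(mac);
    std::vector<std::uint8_t> packet(6, std::uint8_t{0xFF});
    for (int rep = 0; rep < 16; ++rep)
        packet.insert(packet.end(), target.begin(), target.end());
    return packet;  // 6 + 16 * 6 = 102 bytes
}
```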
SPIRIT is a small, highly modular program written in C++. We believe its simplicity is its strength: it can easily be modified and customized to implement more specific or smarter policies. SPIRIT is also straightforward to install and configure in a given PCI.
1. Get the latest SPIRIT
2. Unpack SPIRIT in a system directory such as /usr/local/:
3. Compile SPIRIT (this requires the GNU C++ compiler):
`cd SPIRIT; ./MSPIRIT`
The main configuration files are located in the config directory.
4. Edit main.dat to reflect your cluster settings and preferences.
MachinePrefix is the hostname prefix shared by the computing nodes. In the example above, the cluster consists of a mother node called tuga connected to 10 (nmachines=10) computing nodes called tuga01, tuga02, ..., tuga10. The network interface name ifname identifies the network card of the mother node that is used to communicate with the computing nodes. UpdateTime is the interval, in seconds, between successive updates of SPIRIT's state. WakeTime is the delay, in seconds, between successive Wake-on-LAN instructions sent to different machines (this can prevent all machines from booting up simultaneously). Finally, setting Debug to 1 makes SPIRIT print diagnostic messages, while 0 suppresses all output.
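To make the main.dat settings concrete, here is a sketch of reading them and deriving the node hostnames (tuga01 ... tuga10). The on-disk layout is an assumption, because the example file is not reproduced here. The sketch assumes one whitespace-separated `Key value` pair per line, with `#` starting a comment. `MainConfig`, `parseMainConfig` and `hostnames` are hypothetical names, not SPIRIT's internals.

```cpp
#include <cstdio>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Settings described for main.dat (field names follow the text).
struct MainConfig {
    std::string machinePrefix;  // e.g. "tuga"
    int nmachines = 0;          // number of computing nodes
    std::string ifname;         // mother-node interface used for WoL
    int updateTime = 0;         // seconds between SPIRIT state updates
    int wakeTime = 0;           // seconds between successive WoL instructions
    int debug = 0;              // 1 = diagnostic messages, 0 = silent
};

// ASSUMED format: "Key value" per line, '#' starts a comment.
MainConfig parseMainConfig(std::istream& in) {
    std::map<std::string, std::string> kv;
    std::string line;
    while (std::getline(in, line)) {
        line = line.substr(0, line.find('#'));  // strip trailing comment
        std::istringstream fields(line);
        std::string key, value;
        if (fields >> key >> value) kv[key] = value;
    }
    MainConfig c;
    auto text = [&](const std::string& k, std::string& dst) {
        if (kv.count(k)) dst = kv.at(k);
    };
    auto number = [&](const std::string& k, int& dst) {
        if (kv.count(k)) dst = std::stoi(kv.at(k));
    };
    text("MachinePrefix", c.machinePrefix);
    number("nmachines", c.nmachines);
    text("ifname", c.ifname);
    number("UpdateTime", c.updateTime);
    number("WakeTime", c.wakeTime);
    number("Debug", c.debug);
    return c;
}

// Node hostnames: prefix + two-digit index (tuga01 ... tuga10).
std::vector<std::string> hostnames(const MainConfig& c) {
    std::vector<std::string> names;
    for (int i = 1; i <= c.nmachines; ++i) {
        char num[16];
        std::snprintf(num, sizeof num, "%02d", i);
        names.push_back(c.machinePrefix + num);
    }
    return names;
}
```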
5. Edit macs.dat.
This file should have nmachines entries. In the first column, set the node number; in the second, set the MAC address of that node's network card (see below for how to obtain it).
6. Edit priority.list.
This file should have nmachines entries specifying the order in which the computing nodes are woken up. This is useful in a heterogeneous PCI, where the computing nodes differ from one another.
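One way to combine priority.list with the goal of "enough CPUs for the pending jobs" is sketched below. The sketch walks the priority order, skips nodes that are already up, and stops once the selected nodes cover the CPU demand. The sketch assumes that priority.list holds node numbers (as in macs.dat) and that per-node CPU counts are known, for example from the queue system. `nodesToWake` is a hypothetical name, not SPIRIT's function.

```cpp
#include <map>
#include <set>
#include <vector>

// Select nodes to wake, in priority order, until the woken nodes
// provide at least cpusNeeded CPUs. Nodes already up are skipped.
std::vector<int> nodesToWake(const std::vector<int>& priority,
                             const std::map<int, int>& cpusPerNode,
                             const std::set<int>& poweredOn,
                             int cpusNeeded) {
    std::vector<int> wake;
    int gained = 0;
    for (int node : priority) {
        if (gained >= cpusNeeded) break;       // demand already covered
        if (poweredOn.count(node)) continue;   // node is already running
        auto it = cpusPerNode.find(node);
        if (it == cpusPerNode.end()) continue; // unknown node: skip it
        wake.push_back(node);
        gained += it->second;
    }
    return wake;
}
```

The caller would then send one WoL instruction per returned node and pause WakeTime seconds between them, for example with `std::this_thread::sleep_for`.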
7. Edit exclude.list.
If, for some reason, some nodes should never be powered off, list them in this file. Example:
8. Issue the command:
If everything is properly configured, SPIRIT is now up and running.
Tips & Tricks
1. What's inside the package?
After downloading, copy the file to a convenient system directory:
The above will unpack the following files:
2. The cluster at the ATP group.
Our cluster consists of a mother node plus 22 computing nodes interconnected via a 24-port Gigabit Ethernet switch. The computing nodes are quad-core machines running Linux, and the cluster software is the Sun Grid Engine.
These are the configuration files for "tuga":
3. How to get the MAC address of a machine
You can easily obtain it by issuing a command such as:
`ssh tuga10 "/sbin/ifconfig"`
For computing node tuga10, this gives something like:
Highlighted above is the MAC address of tuga10's network card.