"Parallel Computing Infrastructures (PCI), computer clusters with many CPUs interconnected via a dedicated network, constitute the preferred option for many institutions requiring massive computational power. PCI typically run 24 hours a day, even when not all CPUs are in use. This leads to unnecessary energy expenditure and associated CO2 emissions. Here we offer a solution that saves up to 90% of a PCI's energy consumption during idle time, in existing and future PCI. It is provided as a free, open source software tool (we call it SPIRIT) which activates requested CPUs on demand, leaving all others shut down. Given the widespread adoption of PCI across the globe, we anticipate significant reductions in CO2 emissions upon implementation of this protocol."

 

Here we provide details of our implementation of SPIRIT. For simplicity we assume a standard cluster configuration composed of a mother node and a heterogeneous set of computing nodes interconnected by network cards. All computers run Linux, and we assume that the cluster software is the Sun Grid Engine (SGE). Adaptation to other cluster software should be straightforward; we ask those who adapt SPIRIT to other cluster platforms to send us a copy so that we may make it available here, with the appropriate acknowledgements. We assume all computations are performed via jobs submitted to a queue system. SPIRIT runs only on the mother node. By monitoring the queue system (in SGE, via the commands `qhost` and `qstat`), SPIRIT determines whether there are jobs waiting in the queue or idle computing nodes. Idle nodes are shut down; for each of them SPIRIT issues the command:

`ssh nodeXX "poweroff"`

If there are jobs waiting in the queue and some computing nodes are shut down, SPIRIT sends Wake-on-LAN (WoL) signals to (currently, see below) predefined nodes so that enough CPUs become available for the pending jobs to be executed. To do so, SPIRIT invokes Donald Becker's open source ether-wake utility [1], sending the signal through the appropriate network card (ethY) to the MAC address of the computing node:

`ether-wake -i ethY MAC_ADDRESS`
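The decision taken in each monitoring cycle can be sketched as follows. This is a minimal Python illustration, not SPIRIT's actual C++ code. The function name `plan_cycle`, its arguments, and the simplification that every node offers the same number of CPUs are assumptions made for the example. It only builds the two command strings quoted above and does not execute them.

```python
def plan_cycle(pending_cpus, idle_nodes, off_nodes, cpus_per_node, ifname="ethY"):
    """Return the shell commands for one monitoring cycle (hypothetical helper).

    pending_cpus  -- CPUs requested by jobs waiting in the queue
    idle_nodes    -- host names of powered-on nodes that run no jobs
    off_nodes     -- (host, mac) pairs of powered-off nodes, in wake-up order
    cpus_per_node -- CPUs per node (assumed identical for all nodes)
    ifname        -- network card of the mother node used for Wake-on-LAN
    """
    commands = []
    if pending_cpus > 0:
        # Jobs are waiting: wake nodes until the requested CPUs are covered.
        needed = pending_cpus
        for host, mac in off_nodes:
            if needed <= 0:
                break
            commands.append(f"ether-wake -i {ifname} {mac}")
            needed -= cpus_per_node
    else:
        # Nothing is waiting: every idle node can be powered off.
        for host in idle_nodes:
            commands.append(f'ssh {host} "poweroff"')
    return commands


# Six CPUs requested, quad-core nodes: two nodes are woken, the third stays off.
print(plan_cycle(6, [], [("n01", "AA"), ("n02", "BB"), ("n03", "CC")], 4, "eth1"))
```

A real implementation would obtain `pending_cpus` and `idle_nodes` from the output of `qstat` and `qhost`, which is not attempted here.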

SPIRIT is a small, highly modular program written in C++. We believe its simplicity is its strength: it can easily be modified and customized to implement more specific or smarter configurations, and it is straightforward to install and configure in a given PCI.

Installation

1. Get the latest SPIRIT

2. Unpack SPIRIT in a system directory like /usr/local/ (see Tips & Tricks, item 1).

3. Compile SPIRIT (this requires the GNU C++ compiler [2]):

cd SPIRIT ; ./MSPIRIT

The main configuration files are located in the config directory.

Configuration

4. Edit main.dat to reflect your cluster settings and preferences.

MachinePrefix tuga
ifname eth1
nmachines 10
UpdateTime 60
WakeTime 5
Debug 0

MachinePrefix should be the network prefix of the computing nodes. In the example above the cluster consists of a mother node called tuga, connected to 10 (nmachines=10) computing nodes called tuga01, tuga02, ..., tuga10. The network interface id ifname identifies the network card (of the mother node) used to communicate with the computing nodes. UpdateTime is the interval, in seconds, at which SPIRIT updates its state. WakeTime is a delay, in seconds, between successive Wake-on-LAN instructions (this may be useful to prevent a synchronized boot of all machines). Finally, setting Debug to 1 makes SPIRIT print diagnostic messages, while 0 suppresses all output.
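To make the meaning of these settings concrete, here is a minimal Python sketch that reads a main.dat in the format shown above and derives the node names. It is an illustration only, not SPIRIT's own parser. The helper names `parse_main_dat` and `node_names` are hypothetical, and the two-digit numbering is inferred from the tuga01 ... tuga10 example.

```python
def parse_main_dat(text):
    """Parse 'key value' lines into a dict; numeric settings become int."""
    numeric = {"nmachines", "UpdateTime", "WakeTime", "Debug"}
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        settings[key] = int(value) if key in numeric else value
    return settings


def node_names(settings):
    """Build node names as prefix plus two-digit number (assumed convention)."""
    prefix = settings["MachinePrefix"]
    return [f"{prefix}{i:02d}" for i in range(1, settings["nmachines"] + 1)]


example = """MachinePrefix tuga
ifname eth1
nmachines 10
UpdateTime 60
WakeTime 5
Debug 0
"""

settings = parse_main_dat(example)
names = node_names(settings)
print(settings["ifname"], names[0], names[-1])  # eth1 tuga01 tuga10
```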

5. Edit macs.dat.

01 XX:YY:ZZ:WW:QQ:KK
02 XX:YY:ZZ:WW:QQ:KK
03 XX:YY:ZZ:WW:QQ:KK
04 XX:YY:ZZ:WW:QQ:KK
05 XX:YY:ZZ:WW:QQ:KK
06 XX:YY:ZZ:WW:QQ:KK
07 XX:YY:ZZ:WW:QQ:KK
08 XX:YY:ZZ:WW:QQ:KK
09 XX:YY:ZZ:WW:QQ:KK
10 XX:YY:ZZ:WW:QQ:KK

This file should have nmachines entries. The first column holds the node number and the second column the MAC address of the corresponding network card (for details, see Tips & Tricks, item 3).
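A quick consistency check of macs.dat before starting SPIRIT can catch typos early. The Python sketch below is one possible check, assuming the two-column format shown above. The function name `parse_macs_dat` is hypothetical, and the second address in the usage example is made up for illustration. Note that the XX:YY:... placeholders would be rejected, as intended.

```python
import re

# Six pairs of hexadecimal digits separated by colons.
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def parse_macs_dat(text, nmachines):
    """Return {node number: MAC address}, raising ValueError on bad input."""
    macs = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'node mac-address'")
        node, mac = parts
        if not MAC_RE.match(mac):
            raise ValueError(f"line {lineno}: invalid MAC address {mac!r}")
        macs[int(node)] = mac.upper()
    if len(macs) != nmachines:
        raise ValueError(f"expected {nmachines} entries, found {len(macs)}")
    return macs


# The second address is invented for this example.
table = parse_macs_dat("01 00:1C:C0:10:E6:F5\n02 00:1c:c0:10:e6:f6\n", nmachines=2)
print(table)
```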

6. Edit priority.list

This file should have nmachines entries which specify the order in which the computing nodes are woken up. This may be useful in heterogeneous PCI where computing nodes differ.

7. Edit exclude.list.

If for some reason some nodes should not be powered off, they should be listed in this file. Example:
1
2
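The roles of priority.list and exclude.list can be illustrated with a short Python sketch. It assumes both files contain one node number per line, as in the examples on this page. The helper names are hypothetical and this is not SPIRIT's actual code.

```python
def read_node_list(text):
    """Read node numbers (one per line) from the contents of a list file."""
    return [int(token) for token in text.split()]


def wake_order(priority, powered_off):
    """Powered-off nodes, sorted in the order given by priority.list."""
    return [node for node in priority if node in powered_off]


def shutdown_candidates(idle, exclude):
    """Idle nodes that may be powered off, i.e. those not in exclude.list."""
    excluded = set(exclude)
    return [node for node in idle if node not in excluded]


priority = read_node_list("9\n10\n1\n2\n3\n")
exclude = read_node_list("1\n2\n")

print(wake_order(priority, {1, 3, 9}))        # [9, 1, 3]
print(shutdown_candidates([1, 2, 5], exclude))  # [5]
```

With the exclude.list example above, nodes 1 and 2 stay powered on even when idle.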

Run

8. Issue the command:

./RunSPIRIT &

If all is properly configured SPIRIT is now up and running.


Tips & Tricks

1. What's inside the package?

After downloading, copy the file to a convenient system directory:

cp SPIRIT.tar.gz /usr/local
cd /usr/local
tar zxvf SPIRIT.tar.gz
rm SPIRIT.tar.gz

The above will unpack the following files:

SPIRIT/
SPIRIT/util.h
SPIRIT/SPIRIT
SPIRIT/RunSPIRIT
SPIRIT/SPIRIT04.cpp
SPIRIT/SPIRIT.cpp
SPIRIT/config/
SPIRIT/config/macs.dat
SPIRIT/config/priority.list
SPIRIT/config/main.dat
SPIRIT/config/exclude.list
SPIRIT/MSPIRIT

2. The cluster @ ATP group.

Our cluster is composed of a mother node plus 22 computing nodes interconnected via a 24-channel Gigabit Ethernet switch. The computing nodes are quad-core machines running Linux, and the cluster software is the Sun Grid Engine.

These are the configuration files for "tuga":

"main.dat"
MachinePrefix tuga
ifname eth1
nmachines 22
UpdateTime 60
WakeTime 5
Debug 0

"exclude.list"
2
3
21
11
9

"macs.dat"
01 XX:YY:ZZ:WW:QQ:KK
02 XX:YY:ZZ:WW:QQ:KK
03 XX:YY:ZZ:WW:QQ:KK
04 XX:YY:ZZ:WW:QQ:KK
05 XX:YY:ZZ:WW:QQ:KK
06 XX:YY:ZZ:WW:QQ:KK
07 XX:YY:ZZ:WW:QQ:KK
08 XX:YY:ZZ:WW:QQ:KK
09 XX:YY:ZZ:WW:QQ:KK
10 XX:YY:ZZ:WW:QQ:KK
11 XX:YY:ZZ:WW:QQ:KK
12 XX:YY:ZZ:WW:QQ:KK
13 XX:YY:ZZ:WW:QQ:KK
14 XX:YY:ZZ:WW:QQ:KK
15 XX:YY:ZZ:WW:QQ:KK
16 XX:YY:ZZ:WW:QQ:KK
17 XX:YY:ZZ:WW:QQ:KK
18 XX:YY:ZZ:WW:QQ:KK
19 XX:YY:ZZ:WW:QQ:KK
20 XX:YY:ZZ:WW:QQ:KK
21 XX:YY:ZZ:WW:QQ:KK
22 XX:YY:ZZ:WW:QQ:KK

"priority.list"
9
10
1
2
3
4
5
6
7
8
11
12
13
14
15
16
17
18
19
20
21
22

3. How to get the mac-address of a machine

You can obtain these addresses easily by issuing a command like:

ssh tuga10 "/sbin/ifconfig"

For computing node tuga10, this gives something like:

eth0      Link encap:Ethernet  HWaddr 00:1C:C0:10:E6:F5
          inet addr:192.168.50.22  Bcast:192.168.50.255  Mask:255.255.255.0
          inet6 addr: fe80::21c:c0ff:fe10:e6f5/64 Scope:Link

The value after HWaddr (here 00:1C:C0:10:E6:F5) is the MAC address of the tuga10 network card.
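When filling macs.dat for many nodes, the address can also be picked out of the ifconfig output automatically. The Python sketch below is a hypothetical helper (`hwaddrs` is not part of SPIRIT) that assumes the older ifconfig layout shown above, where the address follows the word HWaddr. Newer ifconfig versions may format this line differently, in which case the pattern would need adjusting.

```python
import re

# Matches 'HWaddr' followed by six colon-separated hexadecimal pairs.
HWADDR_RE = re.compile(r"HWaddr\s+((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})")


def hwaddrs(ifconfig_output):
    """Map each interface name to the MAC address found on its HWaddr line."""
    result = {}
    for line in ifconfig_output.splitlines():
        match = HWADDR_RE.search(line)
        if match:
            interface = line.split()[0]  # interface name starts the line
            result[interface] = match.group(1)
    return result


sample = """eth0      Link encap:Ethernet  HWaddr 00:1C:C0:10:E6:F5
          inet addr:192.168.50.22  Bcast:192.168.50.255  Mask:255.255.255.0
          inet6 addr: fe80::21c:c0ff:fe10:e6f5/64 Scope:Link
"""

print(hwaddrs(sample))  # {'eth0': '00:1C:C0:10:E6:F5'}
```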

[1] ftp://ftp.scyld.com/pub/diag/

[2] http://gcc.gnu.org/