Contents
- 1 Introduction
- 2 Motivation
- 2.1 Why build a Network Bootable Beowulf Cluster
- 2.2 Why use Gentoo as the Base Distribution
- 3 Material
- 3.1 Master Node
- 3.2 Slave Nodes
- 3.3 Network Cards
- 4 Network Topology
- 5 Creating the SSI
- 6 Configuring the Master Node
- 7 Passwordless SSH Logon to the Nodes and Node List Creation
- 7.1 Creating a passwordless key for all nodes
- 7.2 Auto Creating the HOSTFILE
- 8 What Works
- 9 What Doesn't Work
- 10 To Do
- 11 References
- 11.1 Diskless Solutions
- 11.2 Other
- 12 Footnotes
Introduction
There are already many Linux-based clustering solutions out there that claim to provide an easy way to obtain or build a Linux Beowulf Cluster[1], but most links are either dead or completely outdated. We will concentrate on a more specific class of solutions that take a diskless-node approach, where the nodes boot off a Single System Image (SSI) through the network interface (this process is explained in the Creating the SSI section below). The Clustermonkey web site[2] has an article[3] that discusses when such a configuration is appropriate, depending on the intended use of the cluster. One of the key situations where diskless nodes are useful is when file-based data must be shared between the nodes during runs. However, if all processes compute and manipulate their data independently, local storage becomes more interesting. In our case, the nodes do have a local disk that we could configure as a local "scratch pad" for such purposes.
Motivation
With the existence of commercial and non-commercial solutions (refer to the References at the end of the article), one may wonder why we would want to build our own cluster solution from the bottom up. We provide a few of the key answers here.
Why build a Network Bootable Beowulf Cluster
- The key element here is maintenance and management. A typical Beowulf Cluster is made of many identical nodes which run essentially copies of the same software. Duplicating the installation onto those nodes and managing each node individually becomes inefficient. For this reason, a Single System Image (SSI) booted off the network makes much more sense. Furthermore, some changes to the SSI can be propagated to all nodes instantly.
- Adding a node becomes a simple task of adding its MAC address to a config file and booting the node on the Beowulf network[4].
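As an illustration, with an ISC dhcpd/PXE boot server (the file name, addresses, and host name below are hypothetical; the exact layout depends on your boot server setup), registering a new node might only require a stanza such as:

```
# Hypothetical /etc/dhcp/dhcpd.conf entry for a new slave node
host thinkbig25 {
    hardware ethernet 00:11:22:33:44:55;   # the new node's MAC address
    fixed-address 10.0.1.35;               # its IP on the Beowulf network
    filename "pxelinux.0";                 # network boot loader served over TFTP
}
```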
Why use Gentoo as the Base Distribution
Gentoo is becoming more and more popular due to its flexibility and manageability. Recent IEEE journal articles have been written on this subject, so we won't debate it here[5]. Since hardware and software in the Linux and research worlds evolve at a frantic rate, a fast-evolving OS is more than necessary. Gentoo offers this technological bleeding edge and provides the means to easily integrate new packages into the system through Portage's ebuilds.
Material
In our proof of concept, we will be using the following material to build our mini-cluster:
Master Node
- Dual AMD Opteron(tm) Processor 244
- 2 GB of RAM
- ROOT (/) partition in RAID1
- "ScratchPad" area made up of 4 SATA 120 GB disks in RAID0
Slave Nodes
- AMD Athlon(TM) XP 2500+
- 1 GB of RAM
- ROOT (/) is NFS mounted
- "ScratchPad" (/ScratchPad) is NFS mounted
- 1 local disk, not used in the present test configuration
Network Cards
- 8 port 10/100 D-Link switch
- 3c905C-TX Ethernet NICs on both Master and Slave nodes
Network Topology
We will use the most basic and common network topology for building this cluster. All Slave Nodes connect to one switch, which in turn connects to the Master Node through a 100BaseT Ethernet network. The Master Node has two Ethernet interfaces to keep the Slave Nodes on an isolated network. In principle, the nodes should not be accessed directly by users; jobs are to be launched through the Master Node.
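On the Master Node, this could translate into an /etc/conf.d/net fragment along these lines (the interface names and addresses are hypothetical, and the exact syntax depends on your baselayout version):

```
# Hypothetical /etc/conf.d/net on the Master Node
# eth0: outside/lab network, eth1: isolated Beowulf network
config_eth0=( "192.168.0.10 netmask 255.255.255.0" )
config_eth1=( "10.0.1.1 netmask 255.255.255.0" )
```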
Creating the SSI
The steps for creating a Gentoo-based Single System Image are documented in the Gentoo Diskless Client section of this wiki. Although a little general, it contains the key elements for creating an SSI that will be usable by the Gentoo Headnode Configuration document. However, there are some applications we implicitly add to the nodes. Here is a short listing of some of the Gentoo packages:
sys-cluster/openmpi sys-cluster/torque sys-cluster/ganglia
Ganglia is used for monitoring the entire cluster. Its installation is detailed in the Gentoo Headnode Configuration document. Note that the openmpi and torque ebuilds come from the www.gentooscience.org overlay.
Configuring the Master Node
This section is detailed by the Gentoo Headnode Configuration Document. Please refer to it for details on how to configure the Master Node.
Passwordless SSH Logon to the Nodes and Node List Creation
We use SSH to launch commands on each node as an alternative to RSH. There are many arguments for using RSH instead and avoiding the overhead of SSH. The fact is that SSH is more portable than RSH when it comes to carrying the environment over to the other nodes. Since SSH is only used for launching commands and not for the actual communications, no overhead is added to the actual computation.
Creating a passwordless key for all nodes
Since your home directory is NFS-mounted across all nodes, you only need to create one key in your home directory and it will automatically be present on all nodes. Here is the sequence to perform:
cd ~/.ssh/
ssh-keygen -t dsa -b 1024 -f id_dsa
The ssh-keygen command will prompt for a passphrase; leave it empty (just press Enter), since we don't want to be asked for one when logging onto the nodes. We then add the newly generated public key to authorized_keys:
cat id_dsa.pub >> authorized_keys
Now we must log onto each node once so that its unique host signature is added to our SSH configuration (known_hosts). To make the process simpler, we can loop as such:
for Num in $(seq 1 24); do ssh thinkbig${Num} hostname; done
This will log you onto each node and get the hostname value (we use hostname so that ssh only launches a simple command and doesn't actually open an interactive session on the node). Here is an example output; note that some of the nodes aren't available (ssh: thinkbig20: Name or service not known) and some of them were already registered (they simply return their hostname):
eric@headless ~ $ for Num in $(seq 1 24); do ssh thinkbig${Num} hostname; done
thinkbig1
The authenticity of host 'thinkbig2 (10.0.1.12)' can't be established.
RSA key fingerprint is 22:c1:2a:28:44:f2:1d:a6:7e:57:72:16:ee:d5:28:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'thinkbig2,10.0.1.12' (RSA) to the list of known hosts.
thinkbig2
The authenticity of host 'thinkbig3 (10.0.1.13)' can't be established.
RSA key fingerprint is 22:c1:2a:28:44:f2:1d:a6:7e:57:72:16:ee:d5:28:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'thinkbig3,10.0.1.13' (RSA) to the list of known hosts.
thinkbig3
ssh: thinkbig4: Name or service not known
The authenticity of host 'thinkbig5 (10.0.1.15)' can't be established.
RSA key fingerprint is 22:c1:2a:28:44:f2:1d:a6:7e:57:72:16:ee:d5:28:4c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'thinkbig5,10.0.1.15' (RSA) to the list of known hosts.
thinkbig5
ssh: thinkbig6: Name or service not known
thinkbig7
ssh: thinkbig8: Name or service not known
thinkbig9
thinkbig10
ssh: thinkbig11: Name or service not known
thinkbig12
thinkbig13
ssh: thinkbig14: Name or service not known
thinkbig15
thinkbig16
thinkbig17
thinkbig18
thinkbig19
ssh: thinkbig20: Name or service not known
thinkbig21
ssh: thinkbig22: Name or service not known
thinkbig23
thinkbig24
Auto Creating the HOSTFILE
The loop described above can also be used to generate a list of available nodes as such:
for Num in $(seq 1 24); do ssh thinkbig${Num} hostname 2> /dev/null | grep -e '^think' >> hostfile ; done
Only run this after the hosts have been added to your known_hosts by the loop above. The file named hostfile now contains:
eric@headless ~ $ cat hostfile
thinkbig1
thinkbig2
thinkbig3
thinkbig5
thinkbig7
thinkbig9
thinkbig10
thinkbig12
thinkbig13
thinkbig15
thinkbig16
thinkbig17
thinkbig18
thinkbig19
thinkbig21
thinkbig23
thinkbig24
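The resulting hostfile can feed MPI launchers directly. As a sketch, one could also derive an OpenMPI-style hostfile with explicit slot counts (slots=1 here assumes the single-CPU Athlon slave nodes; the small stand-in hostfile is created only for illustration, in practice it is the file generated by the loop above):

```shell
# Stand-in for the hostfile generated above (illustration only)
printf 'thinkbig1\nthinkbig2\nthinkbig3\n' > hostfile

# Annotate each node with an explicit slot count
# (slots=1 since the slave nodes are single-CPU Athlons)
while read -r host; do
    printf '%s slots=1\n' "$host"
done < hostfile > mpi_hostfile

cat mpi_hostfile
```

OpenMPI's mpirun accepts such a file through its --hostfile option, e.g. mpirun -np 3 --hostfile mpi_hostfile hello.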
What Works
- Network booted nodes
- Ganglia monitoring of the nodes
- Passwordless login onto the nodes with SSH
- Local execution of OpenMPI on the Master Node:
kyron@headless ~ $ export LD_LIBRARY_PATH=/usr/lib/openmpi/1.0.2-gcc-4.1/lib64; /usr/lib/openmpi/1.0.2-gcc-4.1/bin/mpirun -np 2 hello
Hello, world. I am 1 of 2
Hello, world. I am 0 of 2
What Doesn't Work
Execution of OpenMPI jobs spanning the 32-bit nodes and the 64-bit head node... This is a heterogeneity issue.
To Do
- Configure OpenLDAP authentication for the nodes
- Install and configure a job management system such as Torque and/or Maui
References
Diskless Solutions
- Bootable Cluster CD (BCCD)[6] Based on a mix of BSD ports and Gentoo ebuilds, this bootable "cluster in a pocket" is geared towards educational use.
- Scyld Commercial, based on Red Hat Enterprise Linux, created by the founders of Beowulf.
- ClusterKnoppix Free, based on the Knoppix bootable CD, geared towards using OpenMosix for distributing processing (process migration).
Other
Here, in no particular order:
- LinuxHPC
- Beowulf homepage Don't judge the homepage, check out the mailing list, that is where all the knowledge and experience lies.
Footnotes
- ↑ We refer to a Beowulf Cluster in its classical sense, where the nodes can only be accessed through the head/master node.
- ↑ www.clustermonkey.net
- ↑ So Why Use Disks on Clusters?
- ↑ Depending on the management tools used, some other files might require modification and some daemons might need refreshing; we won't get into these dynamics at this point.
- ↑ Gentoo Linux: the next generation of Linux
- ↑ Something Wonderful this Way Comes