How to Install an OSCAR Cluster
Software Version 1.2
Documentation Version 1.2
http://oscar.sourceforge.net/
The Open Cluster Group
http://www.openclustergroup.org/
February 5, 2002
Contents
1 Introduction
2 Quick Start OSCAR Installation on a Private Subnet
2.1 Why you shouldn’t do a Quick Start install
2.2 Quick installation procedures
3 Overview of SIS
4 Outline of Cluster Installation Procedure
5 Detailed Cluster Installation Procedure
5.1 Server Installation and Configuration
5.2 Initial OSCAR Server Configuration
5.3 Cluster Definition
5.4 Client Installations
5.5 Cluster Configuration
A Network Booting Client Nodes
B What Happens During Client Installation
C Troubleshooting
C.1 Using LAM/MPI Instead of MPICH
C.2 Managing machines and images
C.3 Known Problems and Solutions
C.4 What to do about unknown problems?
C.5 Starting Over or Uninstalling OSCAR
D Security
D.1 Security layers
D.2 Router packet filtering
D.3 Network stack protections
D.4 Host based packet filtering
D.5 Tcpwrappers
D.6 Service paring
D.7 Service configuration
D.8 Secure communications
E Screen-by-Screen Walkthrough
List of Tables
1 OSCAR file directory layout.
List of Figures
1 OSCAR Wizard.
2 Build the image.
3 Define the Clients.
4 Collect client MAC addresses.
5 Setup cluster tests
6 Getting OSCAR.
7 Unpacking OSCAR.
8 Running the install_cluster script.
9 Running the install_cluster script.
10 The OSCAR installation wizard.
11 Beginning step 1.
12 Step 1: Building the image.
13 Step 1: Building the image, completed.
14 Beginning step 2.
15 Step 2: Defining the clients.
16 Step 2: Defining the clients, completed.
17 Beginning step 3.
18 Step 3: Setting up networking, scanning for MAC addresses.
19 Booting the client.
20 Client is broadcasting.
21 Step 3: Setting up networking, got MAC addresses, stop scanning.
22 Step 3: Setting up networking, assigning MAC address to client.
23 Step 3: Setting up networking, configuring DHCP server.
24 Booting the client.
25 Client broadcasting and is answered.
26 Client partitioning disk.
27 Client installing image.
28 Nodes have finished the install.
29 Beginning step 4.
30 Step 4: Post installation
31 Beginning step 5.
32 Step 5: Setting up tests
33 Step 5: Executing tests as a non-root user
34 Step 5: Executing tests as a non-root user
1 Introduction
The OSCAR cluster installation HowTo is provided as an installation guide to users, as well as a detailed
explanation of what is happening as you install. This document does not, however, describe what OSCAR is.
For an overview of OSCAR and the intentions behind it, see the oscar_introduction document located
in the docs subdirectory. A list of software and hardware requirements for OSCAR can also be found in the
oscar_requirements document. There are both “Quick Start” and detailed installation sections
in this guide. Due to the complicated nature of putting together a high-performance cluster, it is strongly
suggested that you read this document through, without skipping any sections, and then use the detailed
installation procedure to install your OSCAR cluster. Novice users will be comforted to know that anyone
who has installed and used Linux can successfully navigate through the OSCAR cluster install. If you are
too impatient to read manuals, and your cluster meets certain requirements, feel free to try your luck with
the “Quick Start” installation.
Let’s start with a few basic terms and concepts, so that everyone is starting out on the same level. The
most important is the term cluster, which when mentioned herein refers to a group of individual computers
bundled together using hardware and software in order to make them work as a single machine. Each
individual machine of a cluster is referred to as a node. Within the OSCAR cluster to be installed, there
are two types of nodes, server and client. A server node is responsible for servicing the requests of client
nodes. A client node is dedicated to computation. The OSCAR cluster to be installed will consist of one
server node and a number of client nodes, where all the client nodes have homogeneous hardware. The
software contained within OSCAR does support doing multiple cluster installs from the same server, but no
documentation is provided on how to do so. In addition, OSCAR does not support installation of additional
client nodes after the initial cluster installation is performed, although this functionality is planned for later
releases.
The rest of this document is organized as follows. First, a “Quick Start” section is provided for people
who hate manuals. The “Quick Start” section is not for the Linux or clustering novice. Then, an overview
is given for the installation software used in OSCAR, known as SIS. Third, an outline is given of the entire
cluster install procedure, so that users have a general understanding of what they will be doing. Next,
the cluster installation procedure is presented in much detail. The level of detail lies somewhere between
“the install will now update some files” and “the install will now replace the string ‘xyz’ with ‘abc’ in file
some_file.” The reasoning behind providing this level of detail is to allow users to fully understand what
it takes to put together a cluster, as well as to allow them to help troubleshoot any problems, should some
arise. Last, but certainly not least, are the appendices. Appendix A covers the topic of network booting
client nodes, which is so important that it deserved its own section. Appendix B provides curious users an
overview of what really happens during a client install. Appendix C tackles troubleshooting, providing fixes
to known problems and where to find help for unknown problems, as well as telling how to start over with a
new OSCAR install. And finally, Appendix D covers some security aspects of a Linux cluster.
2 Quick Start OSCAR Installation on a Private Subnet
2.1 Why you shouldn’t do a Quick Start install
If you meet the following criteria and are very brave, you can try to install your cluster using the brief,
mysterious, cryptic, terse, and obscure documentation in this section. Otherwise, please do the right thing
and read and use the detailed installation procedure section of this installation guide. If you don’t understand
any of the following criteria, then you probably don’t meet them:
1. You hate to read manuals, documentation, or other printed matter.
2. You are a Unix guru. (You can use netcfg or an editor to configure network interfaces.)
3. Your cluster client machines are on a private subnet.
4. Your cluster server machine has two network interfaces, one public, and one connected to the cluster
client machines.
5. The default RPM list supplied by OSCAR (oscar-1.2/oscarsamples/sample.rpmlist)
is acceptable for your clients.
6. You are using the Red Hat 7.1 distribution.
7. You know how to install the Red Hat 7.1 distribution on a machine.
8. You’ve never read completely through the installation instructions of anything, ever.
If you decide that the quick install isn’t for you, see the detailed installation instructions in Section 5.
Throughout the quick install instructions, each step ends with a parenthesized reference to the
appropriate section of the detailed installation instructions.
2.2 Quick installation procedures
Note: All actions specified herein should be performed by the root user on the server machine unless
noted otherwise.
1. Install (or already have installed) Linux on your server machine. The only requirements for your
Linux installation are:
(a) There should be approximately 2 gigabytes of free space in both the / and /var filesystems. It is
simplest if you just create one large partition on the entire drive. (Detail:Section 5.1.2)
(b) Some X environment such as GNOME or KDE must be installed.
(c) Networking must be set up and working on the public interface. (Do yourself a favor and install
some type of network security if your system is exposed to the general internet.)
(d) The second network interface for the private cluster network must be installed.
(Detail:Section 5.1.1)
2. After the installation is complete, log on as root to the server machine.
3. Download and unpack OSCAR with these commands:
# cd ~
# ncftp ftp.sourceforge.net
ncftp / > cd pub/sourceforge/oscar
ncftp /pub/sourceforge/oscar > get oscar-1.2.tgz
ncftp /pub/sourceforge/oscar > quit
# tar -zxf oscar-1.2.tgz
Ignore the configure script; it is not used in the installation process of OSCAR.
(Detail:Section 5.1.3)
4. Copy the rpms from all of the Red Hat 7.1 CDs to /tftpboot/rpm using these commands:
(insert cd)
# cp /mnt/cdrom/RedHat/RPMS/*.rpm /tftpboot/rpm
(Detail:Section 5.1.5)
5. Configure the second (private) cluster network adapter using the Linux /usr/sbin/netcfg command or your favorite editor. Set the interface IP address to 10.0.0.250, set the interface configuration
protocol to “none”, and set the interface to activate at boot time. Then reboot your machine and make
sure that the private cluster interface is properly set up and activated. (Detail:Section 5.1.4)
6. After the reboot is complete, log on as root to the server machine.
7. To start the OSCAR cluster installation, run the following commands in the X environment:
cd ~/oscar-1.2
./install_cluster eth1
In the above command, replace eth1 with the device name of your server’s private network ethernet adapter. After install_cluster successfully completes some configuration, it will display
the OSCAR wizard. (Detail:Section 5.2.1)
8. Press the Build OSCAR Client Image button. This pops up the “Create a SystemImager Image” window. Verify that the last part of the filename in the Disk Partition File field matches
the type of drives that are in your clients. OSCAR ships sample.disk.ide and sample.disk.scsi
in the oscarsamples directory. Press the BuildImage button to build a SystemImager image
for installation to the compute nodes. You will see a progress bar across the bottom of the window
as the image is built. A dialog will pop up when the build is complete. If it is successful, press the
Close button on the pop up window and then the Close button on the “Build a SystemImager Image” window. You have just built an installation image on the server, with the name oscarimage.
(Detail:Section 5.3.1)
9. Press the Define OSCAR Clients button on the OSCAR Wizard. On the window that is opened,
fill in the number of client nodes that your cluster contains. Verify that the IP information is correct,
and correct it if it is not. Pay special attention to the domain name. Make sure one is filled in, as it is
required. Press the Addclient button to define the clients. Once that is complete, a dialog will pop
up. If it is successful, press the Close button on the pop up window and then the Close button on
the “Define OSCAR Clients” window. (Detail:Section 5.3.2)
10. Press the Setup Networking button on the OSCAR Wizard. The window that is opened will
help you collect MAC addresses and configure the remote boot services. If your client nodes do not
support PXE booting, you will need to create a boot diskette. Put a diskette in the floppy drive and
press the Build Autoinstall Floppy button. You may create several diskettes if you like.
(Detail:Section 5.3.3)
11. Press the Collect MAC Addresses button. The wizard will start scanning the network as indicated in the message at the top of the window. (Detail:Section 5.3.3)
12. Now, you need to network boot your nodes. This is done in one of two ways:
• If your nodes support PXE booting, set the boot order in the BIOS to have the network adapter
first.
• If your nodes do not support PXE, insert the boot floppy created in step 10 and boot
off the floppy.
(Detail:Appendix A)
13. As the nodes broadcast over the network, the MAC addresses detected will show up on the left side of
the window. Select a MAC address and select its corresponding client. Press the Assign Mac to
Node button to give the node that MAC address. (Detail:Section 5.3.3)
14. When you have assigned a MAC address to each node, press the Stop Collecting button to stop
scanning the network. Then press the Configure DHCP Server button to set up the server to answer
the client requests. (Detail:Section 5.3.3)
15. If your clients support PXE booting, press the Setup Network Boot button to configure the
server to answer PXE requests. (Detail:Section 5.3.3)
16. Press Close to dismiss the network setup window.
17. Network boot the clients again, as you did when collecting the MAC addresses. This time the clients
will be answered by the server and will perform the installation. The install should only take about
3-5 minutes depending on your hardware and your network. You can install multiple clients simultaneously. (Detail:Section 5.4)
18. As each client finishes, it will start beeping and printing a message to the console. You should now
reboot them off of their newly installed hard disks.
• If you are using PXE boot, reboot the clients and set the BIOS boot order to boot to hard disk
before the network.
• If you are using auto install diskettes, just remove the diskette and reboot.
(Detail:Section 5.4.2)
19. To verify that the clients are installed properly, use the ping_clients script. In a terminal, as the root
user, issue ~/oscar-1.2/scripts/ping_clients. (Detail:Section 5.4.4)
20. Once all the clients have successfully booted, press the Complete Cluster Setup button. Your
OSCAR cluster is now installed and configured. (Detail:Section 5.5.1)
OSCAR cluster is now installed and configured. (Detail:Section 5.5.1)
21. To verify proper installation, you should test the cluster. OSCAR provides a test suite to verify the
basic cluster functions. To perform the test, do the following:
(a) Press the Test Cluster Setup button. A window will pop up that will ask for a non-root
user.
(b) Enter the user name of a non-root user. If this user doesn’t exist, it will be created. The OSCAR
test suite will be installed in the user’s home directory.
i. Open a new terminal and log in as the user given above.
ii. Change directory to ~/OSCAR_test.
iii. Run ./test_cluster.
iv. Enter the number of cluster nodes and the number of processors per node when prompted.
The test will submit simple jobs to the cluster and print the output. If there are any errors,
there is a problem in your installation.
(Detail:Section 5.5.2)
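Several of the steps above rely on conditions that are easy to check before running install_cluster: that you are root, that an X session is available, and that the rpms have been copied. The following is a minimal sketch of such a check and is not part of OSCAR. The function names has_rpms and preflight are hypothetical, and testing the DISPLAY variable is only an approximation of “running in the X environment”.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the quick-start procedure (not shipped with OSCAR).

# Succeed if directory $1 contains at least one *.rpm file.
has_rpms() {
    for f in "$1"/*.rpm; do
        if [ -f "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Print one line per unmet precondition; succeed only if all are met.
# $1 is the rpm directory (defaults to /tftpboot/rpm, the path used in this guide).
preflight() {
    rpm_dir=${1:-/tftpboot/rpm}
    status=0
    [ "$(id -u)" -eq 0 ] || { echo "not running as root"; status=1; }
    [ -n "${DISPLAY:-}" ] || { echo "DISPLAY is not set; start an X session first"; status=1; }
    has_rpms "$rpm_dir" || { echo "no rpms found in $rpm_dir"; status=1; }
    return "$status"
}

preflight /tftpboot/rpm || echo "Resolve the items above before running install_cluster."
```

The check only reports problems; it changes nothing on the server.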
3 Overview of SIS
The first question you may have is what is SIS. The System Installation Suite (SIS) is a cluster installation
tool developed by the collaboration of the IBM Linux Technology Center and the SystemImager team. The
main reason that SIS was chosen as the installation mechanism was that it does not require that client nodes
already have Linux installed. SIS also has many other distinguishing features that make it the mechanism of
choice. The most highly used quality of SIS in the OSCAR install is the cluster information database that it
maintains. The database contains all the information on each node needed to both install and configure the
cluster. A second desirable quality is that SIS makes use of the Red Hat Package Manager (RPM) standard
for software installation, which simplifies software installation tremendously. Another quality, which up to
this point has not been taken advantage of in OSCAR, is the heterogeneous nature of SIS, allowing clients
to contain not only heterogeneous hardware, but heterogeneous software as well. An obvious application of
the future use of this quality is in the installation of heterogeneous clusters, allowing certain clients to be
used for specialized purposes.
In order to understand some of the steps in the upcoming install, you will need knowledge of the main
concepts used within SIS. The first concept is that of an image. In SIS, an image is defined for use by the
cluster nodes. This image is a copy of the operating system files stored on the server. The client nodes
install by replicating this image to their local disk partitions. Another important concept from SIS is the
client definition. A SIS client is defined for each of your cluster nodes. These client definitions keep track
of the pertinent information about each client. The server machine is responsible for creating the cluster
information database and for servicing client installation requests. The information that is stored for each
client includes:
• IP information such as hostname, IP address, and route.
• Image name.
Each of these pieces of information will be discussed further as part of the detailed install procedure.
For additional information on the concepts in SIS and how to use it, you should refer to the SIS manpage.
In addition, you can visit the SIS web site at http://sisuite.org for recent updates.
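To make the idea of a client definition concrete, the sketch below prints one record per client containing a hostname, an IP address, and an image name. It only illustrates the shape of the information; it is not the format SIS uses for its database, and the function name gen_clients, the host prefix node, and the 10.0.0 network are made up for the example. The image name oscarimage is the one built by the OSCAR wizard.

```shell
#!/bin/sh
# Illustration only: print "hostname ip image" for a number of clients.
# Arguments: host prefix, client count, first three octets of the network, image name.
gen_clients() {
    prefix=$1
    count=$2
    network=$3
    image=$4
    i=1
    while [ "$i" -le "$count" ]; do
        printf '%s%d %s.%d %s\n' "$prefix" "$i" "$network" "$i" "$image"
        i=$((i + 1))
    done
}

gen_clients node 3 10.0.0 oscarimage
```

Every record names the same image here because the clients in an OSCAR cluster are installed from a single image.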
4 Outline of Cluster Installation Procedure
Notice: You do not perform many of the steps in this outline, as they are automated by the install. It will be
clear in the detailed procedure what exactly you are expected to do.
1. Server Installation and Configuration
(a) install Linux on server machine
(b) get OSCAR distribution
(c) configure ethernet adapter for cluster
(d) copy Red Hat rpms from cds
2. Initial OSCAR Server Configuration
(a) creates OSCAR directories
(b) installs necessary software
• Parallel Virtual Machine (PVM)
• C3
• LAM
• OpenPBS
• Maui
• SIS
• services
i. Network File System (NFS)
ii. Dynamic Host Configuration Protocol (DHCP)
iii. Rsync
iv. OpenSSH
(c) updates some system files
(d) updates system startup scripts
(e) starts/restarts affected services
3. Cluster Definition
(a) build and customize image
i. install Red Hat rpms
ii. install OSCAR rpms
iii. customize and copy system and user files
iv. setup nfs mount for /home
v. generate ssh keys
(b) define disk partitioning and filesystems
(c) define clients
i. update hosts files on server and in images.
(d) collect client MAC addresses
(e) setup remote booting, network or diskette
4. Client Installations
(a) boot clients to start install
(b) reboot each client when finished installing
5. Complete cluster setup
(a) Get number of processors from clients
6. Test cluster
(a) test PBS
(b) test MPICH
(c) test LAM
(d) test PVM
7. Cluster setup complete!
5 Detailed Cluster Installation Procedure
Note: All actions specified herein should be performed by the root user.
5.1 Server Installation and Configuration
During this phase, you will prepare the machine that will be used as the OSCAR server.
5.1.1 Install Linux on the server machine
If you have a machine you want to use that already has Linux installed, you may use it and continue with
the next section. When installing Linux, it is required that you use a distribution that is based upon the RPM
standard. Furthermore, it should be noted that all testing up to this point has been done using the Red Hat 7.1
distribution. As such, use of distributions other than Red Hat 7.1 will require a porting of OSCAR, as many
of the scripts and software within OSCAR are dependent on Red Hat. Do not worry about doing a custom
install, as OSCAR contains all the software on which it depends. The only other installation requirement
is that some X environment such as GNOME or KDE must be installed. Therefore, a typical workstation
install is sufficient.
If you install Red Hat 7.1 on the server machine, during the installation you should enable the ipchains-based firewall that is included with the Red Hat distribution, set to medium mode. Other firewalls that are stronger
and more versatile can be installed later, but this will offer some protection until that time. Note that OSCAR
currently assumes that only the server machine is exposed to the general network, with the server and the
rest of the cluster’s machines being on a private network. To keep the Red Hat firewall from interfering with
network traffic between the server machine and the other machines in the cluster, OSCAR automatically
disables portions of the Red Hat firewall. This may not have the intended results in the situation where
all the cluster machines are exposed on the general network. See Appendix D for more information about
firewalls and other security software that can be installed.
5.1.2 Disk space and directory considerations
OSCAR has certain requirements for server disk space. Space will be needed to store the Linux rpms and
to store the images. The rpms will be stored in /tftpboot/rpm. Approximately 1 gigabyte is required to store
the rpms. The images are stored in /var/lib/systemimager and will need approximately 1 gigabyte per image.
Only 1 image is required, although you may want to create more in the future.
If you are installing a new server, it is suggested that you allow for 2 gigabytes in both the /, which
contains /tftpboot, and /var filesystems when partitioning the disk on your server.
If you are using an existing server, you will need to verify that you have enough space on the disk
partitions. Again 2 gigabytes of free space is recommended in both the / and /var partitions.
You can check the amount of free space on your drive’s partitions by issuing the command df -h in a
terminal. The result for each file system is located below the Avail heading. If your root (/) partition has
enough free space, enter the following command in a terminal:
mkdir -p /tftpboot/rpm
If your root partition does not have enough free space, create the directories on a different partition
that does have enough free space and create links to them from the root (/) directory. For example, if the
partition containing /usr contains enough space, you could do so by using the following commands:
mkdir -p /usr/tftpboot/rpm
ln -s /usr/tftpboot /tftpboot
The same procedure should be repeated for the /var/lib/systemimager subdirectory.
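The free-space check described above can also be scripted. The sketch below reads the available space from df and compares it with the 2 gigabyte recommendation for / and /var. The function names free_kb and has_space are hypothetical, and the script only reports; it does not create directories or links.

```shell
#!/bin/sh
# Sketch: report whether / and /var have the recommended 2 GB of free space.

# Print the available space, in kilobytes, of the filesystem holding $1.
# df -P forces one line per filesystem; -k reports sizes in kilobytes.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem holding $1 has at least $2 kilobytes available.
has_space() {
    [ "$(free_kb "$1")" -ge "$2" ]
}

required_kb=$((2 * 1024 * 1024))   # 2 gigabytes expressed in kilobytes
for dir in / /var; do
    if has_space "$dir" "$required_kb"; then
        echo "$dir: enough free space"
    else
        echo "$dir: less than 2 GB free; consider placing the directories on another partition"
    fi
done
```

The fourth column of the df output corresponds to the Avail heading mentioned above.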
5.1.3 Get a copy of OSCAR and unpack on the server
If you are reading this, you probably already have a copy. If not, go to http://oscar.sourceforge.net/
and download the latest OSCAR tarball, which will be named something like oscar-1.2.tgz.
The version used in these instructions is 1.2, which you should replace with the version you download in
any of the sample commands. Copy the OSCAR tarball to a directory such as /root on your server. There
is no required installation directory, except that you may not use /usr/local/oscar, which is reserved
for special use. Do not unpack the tarball on a Windows based machine and copy the directories over to
the server, as this will convert all the scripts to the dreaded “DOS” format and will render them useless
under Linux. Assuming you placed the OSCAR tarball in /root, open a command terminal and issue the
following commands to unpack OSCAR:
cd /root
tar -zxvf oscar-1.2.tgz
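If you suspect that the scripts passed through a Windows machine at some point, you can look for the carriage returns that the “DOS” format adds to each line. This is a minimal sketch assuming the tarball was unpacked in /root; the function names has_crlf and report_dos_files are hypothetical.

```shell
#!/bin/sh
# Sketch: list files in a directory that contain DOS (CR LF) line endings.

# Succeed if file $1 contains at least one carriage return character.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Print the name of every regular file directly inside $1 that has DOS line endings.
# Binary files may also contain carriage returns, so point this at script directories.
report_dos_files() {
    for f in "$1"/*; do
        if [ -f "$f" ] && has_crlf "$f"; then
            echo "DOS line endings: $f"
        fi
    done
}

report_dos_files /root/oscar-1.2/scripts
```

No output means no file in that directory contains a carriage return.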
The result is the creation of an OSCAR directory structure that is laid out as shown in Table 1 (again
assuming /root).
Directory                          Contents
/root/oscar-1.2/                   the base OSCAR directory
/root/oscar-1.2/COPYING            GNU General Public License v2
/root/oscar-1.2/README.first       README first document
/root/oscar-1.2/packages           contains rpms and installation files for the OSCAR packages
/root/oscar-1.2/docs               OSCAR documentation directory
/root/oscar-1.2/install_cluster    main installation script
/root/oscar-1.2/oscarsamples       contains sample OSCAR files
/root/oscar-1.2/scripts            contains scripts that do most of the work
/root/oscar-1.2/testing            contains OSCAR Cluster Test software
Table 1: OSCAR file directory layout.
Notice that the base directory has a configure script; ignore it. The configure script is not used in the
installation process of OSCAR unless you need to build the documentation from its LaTeX source.
5.1.4 Configure the ethernet adapter for the cluster
Assuming you want your server to be connected to both an external network and the internal cluster subnet,
you will need to have two ethernet adapters installed in the server. It is preferred that you do this because
exposing your cluster may be a security risk, and certain software used in OSCAR such as DHCP may
conflict with your external network. Once both adapters have been physically installed and you have booted
Linux into an X environment, open a terminal and enter the command:
/usr/sbin/netcfg &
The network configuration utility will be started, which you will use to configure your network adapters.
At this point, the Names panel will be active. On this panel you will find the settings for the server’s
hostname, domain, additional search domains, and name servers. All of this information should have been
filled in by the standard Linux installation. To configure your ethernet adapters, first press
the Interfaces button to bring up the panel that allows you to update the configuration of all of your
server machine’s interfaces. Now select the interface that is connected to the cluster network
by clicking on the appropriate device. If your external adapter is configured on device “eth0”, then you
should most likely select “eth1” as the device, assuming you have no other adapters installed. After
selecting the appropriate interface, press the Edit button to update the information for the cluster network
adapter. Enter a private IP address (see footnote 1) and the associated netmask (see footnote 2) in their respective fields. Additionally,
be sure to press the Activate interface at boot time button and set the Interface
configuration protocol to “none”. After completing the updates, press the Done button to return
to the main utility window, then press the Save button in the Save current configuration menu that pops
up. Then press the Save button at the bottom of the main network configuration window to confirm your
changes, and then press the Quit button to leave the network configuration utility.
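If you prefer an editor to netcfg, the same settings live in a per-interface file, which on Red Hat systems is /etc/sysconfig/network-scripts/ifcfg-<device>. The sketch below assumes the cluster adapter is eth1 and reuses the address 10.0.0.250 from the Quick Start section and the netmask 255.255.255.0. It prints the settings rather than writing them, so you can review them before placing them in the file yourself. The function name write_ifcfg is hypothetical.

```shell
#!/bin/sh
# Sketch: print Red Hat style interface settings for a statically configured adapter.
# Arguments: device name, IP address, netmask.
write_ifcfg() {
    cat <<EOF
DEVICE=$1
BOOTPROTO=none
IPADDR=$2
NETMASK=$3
ONBOOT=yes
EOF
}

# Settings for the private cluster interface; adjust the device and address to your setup.
write_ifcfg eth1 10.0.0.250 255.255.255.0
```

BOOTPROTO=none corresponds to setting the interface configuration protocol to “none”, and ONBOOT=yes corresponds to activating the interface at boot time.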
Now reboot your machine to ensure that all the changes are propagated to the appropriate configuration
files. To confirm that all ethernet adapters are in the UP state, once the machine has rebooted, open another
terminal window and enter the following command:
/sbin/ifconfig -a
You should see UP as the first word on the third line of output for each adapter. If not, there is a problem
that you need to resolve before continuing. Typically, the problem is that the wrong module is specified for
the given device. Try using the network configuration utility again to resolve the problem.
Footnote 1: There are three private IP address ranges: 10.0.0.0 to 10.255.255.255; 172.16.0.0 to 172.31.255.255; and 192.168.0.0 to
192.168.255.255. Additional information on private intranets is available in RFC 1918. You should not use the IP addresses
10.0.0.0 or 172.16.0.0 or 192.168.0.0 for the server. If you use one of these addresses, the network installs of the client nodes will
fail (rpc has problems).
Footnote 2: The netmask 255.255.255.0 should be sufficient for most OSCAR clusters.
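The address rules for the server's cluster interface can be expressed as a small check: the address must fall in one of the RFC 1918 private ranges and must not be one of the three addresses this guide says to avoid. This is a sketch with hypothetical function names (is_private_ip, usable_server_ip); it assumes a well-formed dotted-quad address and does not validate the remaining octets.

```shell
#!/bin/sh
# Sketch: check a candidate server address against the RFC 1918 private ranges.

# Succeed if $1 lies in 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16.
is_private_ip() {
    case "$1" in
        10.*|192.168.*)
            return 0 ;;
        172.*)
            second=${1#172.}        # drop the leading "172."
            second=${second%%.*}    # keep only the second octet
            if [ "$second" -ge 16 ] 2>/dev/null && [ "$second" -le 31 ]; then
                return 0
            fi
            return 1 ;;
        *)
            return 1 ;;
    esac
}

# Succeed if $1 is private and is not one of the addresses to avoid for the server.
usable_server_ip() {
    case "$1" in
        10.0.0.0|172.16.0.0|192.168.0.0) return 1 ;;
    esac
    is_private_ip "$1"
}

for ip in 10.0.0.250 172.32.0.1 192.168.0.0; do
    if usable_server_ip "$ip"; then
        echo "$ip: usable"
    else
        echo "$ip: not usable"
    fi
done
```

Of the three sample addresses, only 10.0.0.250 passes: 172.32.0.1 lies outside the private ranges, and 192.168.0.0 is one of the addresses to avoid.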
5.1.5 Copy distribution rpms to /tftpboot/rpm
In this step, you need to copy the rpms included with your Linux distribution into the /tftpboot/rpm
directory. Insert each of the distribution CDs in turn. When each one is inserted, Linux will automatically
make the contents of the CD available in the /mnt/cdrom directory. Then for each CD locate the
directory that contains the rpms. In Red Hat 7.1, the rpms are located in the RedHat/RPMS directory,
which will appear on the system as the /mnt/cdrom/RedHat/RPMS directory. After locating the rpms
on each CD, copy them into /tftpboot/rpm with a command such as:
cp /mnt/cdrom/RedHat/RPMS/*.rpm /tftpboot/rpm
Be sure to repeat the above process for both CDs when using Red Hat 7.1. After using each CD you will
have to unmount it from the local file system by issuing these commands:
cd
umount /mnt/cdrom
If you wish to save space on your server’s hard drive and will be using the default RPM list supplied with
OSCAR (see Section 5.3), you should only copy over the rpms listed in the sample. For the Red Hat 7.1
distribution, this can be done for each CD with a command sequence like this:
cd /mnt/cdrom/RedHat/RPMS
cat /root/oscar-1.2/oscarsamples/sample.rpmlist | \
xargs -i sh -c "cd `pwd`;ls -1 {}*.rpm" 2>/dev/null | \
xargs -i -t cp -f '{}' /tftpboot/rpm
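The pipeline above is compact but hard to read. The same idea can be written as a loop: for each name in the list, copy every rpm whose filename starts with that name. The sketch below is an equivalent written for readability, not the command OSCAR documents. The function name copy_listed_rpms is hypothetical, and it assumes the list file holds one package name per line.

```shell
#!/bin/sh
# Sketch: copy only the rpms named in a list file.
# Arguments: list file (one package name per line), source directory, destination directory.
copy_listed_rpms() {
    list=$1
    src=$2
    dest=$3
    # The "|| [ -n ... ]" clause handles a final line that lacks a trailing newline.
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue          # skip blank lines
        for rpm in "$src"/"$name"*.rpm; do
            if [ -f "$rpm" ]; then          # the pattern stays literal when nothing matches
                cp -f "$rpm" "$dest"/
            fi
        done
    done < "$list"
}
```

With a Red Hat 7.1 CD mounted, it could be called as copy_listed_rpms /root/oscar-1.2/oscarsamples/sample.rpmlist /mnt/cdrom/RedHat/RPMS /tftpboot/rpm, once for each CD. Like the original pipeline, it matches by filename prefix, so a list entry can match more than one rpm.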
5.2
Initial OSCAR Server Configuration
During this phase, the software needed to run OSCAR will be installed on the server. In addition, some
initial server configuration will be performed. The steps from here forward should be run within the X
environment, due to the graphical nature of the OSCAR wizard.
5.2.1 Change to the OSCAR directory and run install_cluster
If the OSCAR directory was placed in /root, for example, you would issue the following commands:
cd /root/oscar-1.2
./install_cluster eth0
In the above command, substitute the device name (e.g., eth0) of your server’s internal ethernet
adapter. Also note that the install_cluster script must be run from within the OSCAR base directory
as shown above. The script will first run the part one server configuration script, which does the following:
1. copies OSCAR rpms to /tftpboot/rpm
2. installs all OSCAR server rpms
3. updates /etc/hosts with OSCAR aliases
4. updates /etc/exports
5. adds OSCAR paths to /etc/profile
6. updates system startup (/etc/rc.d/init.d) scripts
7. restarts affected services
8. if all the above is successful, launches oscar_wizard
The wizard, as shown in Figure 1, is provided to guide you through the rest of the cluster installation.
To use the wizard, you will complete a series of steps, with each step being initiated by the pressing of a
button on the wizard. Do not go on to the next step until the instructions say to do so, as there are times
when you must complete an action outside of the wizard before continuing on with the next step. For each
step, there is also a Help button located directly to the right of the step button. When pressed, the Help
button displays a message box describing the purpose of the step.
As each step is performed, the output it generates is displayed to the user.
5.3 Cluster Definition
During this phase, you will complete steps one through three of the OSCAR wizard to define your cluster.
If you encounter problems or wish to redo any of the SIS actions performed in wizard steps 1 or 2,
please refer to the SIS man pages.
5.3.1 Build the Image
Press the Step 1 button of the wizard entitled Build OSCAR Client Image. A dialog will be displayed. In most cases, the defaults will be sufficient. You should verify that the disk partition file is the
proper type for your client nodes. The sample files have the disk type as the last part of the filename. You
may also want to change the post installation action and the IP assignment methods. It is important to
note that if you wish to use automatic reboot, you should make sure the BIOS on each client is set to
boot from the local hard drive before attempting a network boot by default. If you have to change the
boot order to do a network boot before a disk boot to install your client machines, you should not use
automatic reboot. Once you are satisfied with the input, click the Build Image button.
Building the image will take a few minutes. The progress bar at the bottom shows the status, and
a small dialog will appear when the image is complete.
A sample dialog is shown in Figure 2.
Customizing your image The defaults of this panel use the sample disk partition and rpm package files
that can be found in the oscarsamples directory. At some point, you may want to customize these files
to make the image suit your particular requirements.
Figure 1: OSCAR Wizard.
Figure 2: Build the image.
Disk partitioning The disk partition file contains a line for each partition desired, where each line is
in the following format:
<partition> <size in megabytes> <type> <mount point> <options>
Here is a sample:
/dev/sda1    24      ext2    /boot   rw
/dev/sda5    128     swap
/dev/sda6    1000    ext2    /       rw
The last partition specified will grow to fill the entire disk. You can create your own partition files, but
make sure that you don’t exceed the physical capacity of your client hardware. The sample listed above, and
some others, are in the oscarsamples directory.
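To guard against exceeding the capacity of a client disk, the sizes in a partition file can be added up before the image is built. This is a minimal sketch under the format described above (size in megabytes in the second column); the function name partition_total_mb is hypothetical and not part of OSCAR or SIS.

```shell
# partition_total_mb FILE
# Hypothetical helper: print the sum of the size column (field 2) of a
# disk partition file, ignoring blank lines and non-numeric sizes.
partition_total_mb() {
    awk 'NF >= 2 && $2 ~ /^[0-9]+$/ { total += $2 } END { print total + 0 }' "$1"
}

# Example: for the sample above this prints 1152 (24 + 128 + 1000).
#   partition_total_mb sample.disk
```

Because the last partition grows to fill the disk, the total is the minimum space the layout needs; it should be smaller than the smallest client disk.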
Package lists The package list is simply a list of rpm file names, one per line. Be sure to include all
prerequisites of any packages you add. You do not need to specify the architecture portion
of the filename or the .rpm extension.
Custom kernels If you want to use a customized kernel, you can add it to the image after it is built.
Follow these steps:
1. Copy the kernel and associated files (System.map, module-info, etc.) into the
/var/lib/systemimager/images/<imagename>/boot directory.
2. Edit the systemconfig.conf file in the
/var/lib/systemimager/images/<imagename>/etc/systemconfig/ directory and
change the PATH parameter in the [KERNEL0] stanza to match your new kernel’s filename.
3. When the clients are installed, they will copy over and boot your new kernel.
Note that there is no lilo.conf file in the image. That file is created by System Configurator from the
contents of the systemconfig.conf file.
See the systemconfig.conf man page for full details on the contents of this file.
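Step 2 of the custom kernel procedure can be scripted. The sketch below is an illustration only: it assumes that systemconfig.conf uses "[NAME]" stanza headers and "KEY = value" lines, which this document does not show, so compare it with your own file first. The function name set_kernel_path and the kernel filename in the example are hypothetical.

```shell
# set_kernel_path CONF NEW_PATH
# Hypothetical helper: rewrite the PATH line inside the [KERNEL0] stanza
# of CONF. Assumes "KEY = value" lines and "[NAME]" stanza headers.
# NEW_PATH must not contain the characters | or &.
set_kernel_path() {
    conf=$1 new=$2
    # The address range runs from the [KERNEL0] header to the next header,
    # so PATH lines in other stanzas are left alone.
    sed "/^\[KERNEL0\]/,/^\[/ s|^\([[:space:]]*PATH[[:space:]]*=[[:space:]]*\).*|\1$new|" \
        "$conf" > "$conf.new" && mv "$conf.new" "$conf"
}

# Example (image name and kernel filename are placeholders):
#   img=/var/lib/systemimager/images/<imagename>
#   cp vmlinuz-custom System.map-custom "$img/boot"
#   set_kernel_path "$img/etc/systemconfig/systemconfig.conf" /boot/vmlinuz-custom
```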
5.3.2 Define your client machines
Press the Step 2 button of the wizard entitled Define OSCAR Clients. In the dialog box that is displayed, enter the appropriate information. Again, the defaults will be correct in most cases. At a minimum,
you will need to enter a value in the Number of Hosts field to specify how many clients you want to create.
1. The Image Name field should specify the image name that was used to create the image in Step 1.
2. The Domain Name field should be used to specify the client’s IP domain name. This field must
have a value. It should contain the server’s domain if it has one. If it is blank, enter a domain like
oscardomain.
3. The Base name field is used to specify the first part of the client name and hostname. It will have
an index appended to the end of it.
4. The Number of Hosts field specifies how many clients to create.
5. The Starting Number specifies the index to append to the Base Name to derive the first client
name. It will be incremented for each subsequent client.
6. The Starting IP specifies the IP address of the first client. It will be incremented for each subsequent client.
7. The Subnet Mask specifies the IP netmask for the clients.
8. The Default Gateway specifies the default route for the clients.
When finished entering information, press the Addclients button. A sample dialog is shown in
Figure 3.
After the clients are created, a dialog will pop up with the completion status. After closing it, you may
press the Close button and continue with the next step.
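To see how the Base name, Starting Number, Number of Hosts, and Starting IP fields combine, the following sketch prints the names and addresses they would produce. It is an illustration of the description above, not the wizard’s own code: the function name list_clients is hypothetical, and it assumes all clients fit within the last octet of the address, since the document does not say how the wizard carries into the next octet.

```shell
# list_clients BASE START_NUM COUNT START_IP
# Illustrative only: print "name ip" pairs as the dialog fields describe.
# Assumes the clients fit in the last octet (no carry into the third).
list_clients() {
    base=$1 num=$2 count=$3 ip=$4
    prefix=${ip%.*}      # first three octets, e.g. 192.168.0
    host=${ip##*.}       # last octet, e.g. 2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$base$((num + i)) $prefix.$((host + i))"
        i=$((i + 1))
    done
}

# Example (values are placeholders):
#   list_clients oscarnode 1 3 192.168.0.2
# prints:
#   oscarnode1 192.168.0.2
#   oscarnode2 192.168.0.3
#   oscarnode3 192.168.0.4
```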
5.3.3 Collect client MAC addresses and Setup Networking
The MAC address of a client is a twelve hex-digit hardware address embedded in the client’s ethernet
adapter. MAC addresses look like 00:0A:CC:01:02:03, as opposed to the familiar format of IP addresses.
These MAC addresses uniquely identify client machines on a network before they are assigned IP addresses.
DHCP uses the MAC address to assign IP addresses to the clients.
In order to collect the MAC addresses, press the Step 3 button of the wizard entitled Setup Networking. The OSCAR network utility dialog box will be displayed. To use this tool, you will need to
know how to network boot your client nodes. For instructions on doing so, see Appendix A. A sample
dialog is shown in Figure 4.
Figure 3: Define the Clients.
To start the collection, press the Start Collecting button and then network boot the first client.
As the clients broadcast, their MAC addresses will show up in the left hand window. Select a MAC address
and the appropriate client in the right side window. Click Assign MAC to node to associate that MAC
address with that node. If you would like to make specific nodes associated with specific client definitions,
you should boot them one at a time. If you don’t care which node gets associated with which client, you
may boot them all at once and randomly assign the MAC addresses.
When you have collected all of the MAC addresses, click the Stop Collecting button and then
click the Setup DHCP Server button to configure it.
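The file that the Setup DHCP Server button writes is not reproduced in this document. As a rough illustration of how a MAC address is tied to a fixed IP address, the sketch below prints a host declaration in ISC dhcpd.conf syntax and checks that a string has the MAC address format described above. The function names is_mac and dhcp_host_entry are hypothetical, as are the host name and address in the example.

```shell
# is_mac STRING
# Succeeds if STRING is six colon-separated pairs of hex digits.
is_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# dhcp_host_entry NAME MAC IP
# Print a host declaration in ISC dhcpd.conf syntax binding MAC to IP.
dhcp_host_entry() {
    is_mac "$2" || { echo "not a MAC address: $2" >&2; return 1; }
    printf 'host %s {\n    hardware ethernet %s;\n    fixed-address %s;\n}\n' "$1" "$2" "$3"
}

# Example (placeholder values):
#   dhcp_host_entry oscarnode1 00:0A:CC:01:02:03 192.168.0.2
```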
You may also configure your remote boot method from this panel. The Build Autoinstall
Floppy button will build a boot floppy for client nodes that do not support PXE booting. The Setup
Network Boot button will configure the server to answer PXE boot requests if your client hardware supports it. See Appendix A for more details. When you have collected the addresses for all your client nodes
and completed the network setup, press Close.
5.4 Client Installations
During this phase, you will network boot your client nodes and they will automatically be installed and
configured as specified in Section 5.3 above. For a detailed explanation of what happens during client
installation, see Appendix B.
5.4.1 Network boot the client nodes
See Appendix A for instructions on network booting clients.
Figure 4: Collect client MAC addresses.
5.4.2 Check completion status of nodes
After a few minutes, the clients should complete the installation. You can watch the client consoles to
monitor the progress. Depending on the Post Installation Action you selected when building the image, the
clients will either halt, reboot, or beep incessantly when the installation is completed. The time required
for installation depends on the capabilities of your server, your clients, your network, and the number of
simultaneous client installations.
5.4.3 Reboot the client nodes
After confirming that a client has completed its installation, you should reboot the node from its hard drive.
If you chose to have your clients reboot after installation, they will do this on their own. If the clients are
not set to reboot, you must manually reboot them. The filesystems will have been unmounted so it is safe to
simply reset or power cycle them. Note: If you had to change the BIOS boot order on the client to do a
network boot before booting from the local disk, you will need to reset the order to prevent the node
from trying to do another network install.
5.4.4 Check network connectivity to client nodes
In order to perform the final cluster configuration, the server must be able to communicate with the client
nodes over the network. If a client’s ethernet adapter is not properly configured upon boot, however, the
server will not be able to communicate with the client. A quick and easy way to confirm network connectivity is to do the following (assuming OSCAR is installed in ~root):
cd ~/oscar-1.2/scripts
./ping_clients
The above commands will run the ping_clients script, which will attempt to ping each defined client
and will print a message stating success or failure. If a client cannot be pinged, there was a problem configuring the ethernet adapter, and you will have to log in to the machine and manually configure the adapter.
Once all the clients have been installed, rebooted, and their network connections have been confirmed, you
may proceed with the next step.
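The script that OSCAR ships is not reproduced here. If you want to repeat the check by hand for a few nodes, a loop of the following kind is enough. This is a sketch, not the OSCAR script: the function name check_clients and the PING_CMD variable are hypothetical, and the default command assumes the Linux ping options -c (count) and -w (deadline in seconds).

```shell
# check_clients HOST...
# Ping each host once and report the result. Returns non-zero if any
# host did not answer. PING_CMD may be set to override the ping command.
check_clients() {
    failed=0
    for host in "$@"; do
        if ${PING_CMD:-ping -c 1 -w 2} "$host" >/dev/null 2>&1; then
            echo "$host: reachable"
        else
            echo "$host: NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}

# Example (client names are placeholders):
#   check_clients oscarnode1 oscarnode2 oscarnode3
```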
5.5 Cluster Configuration
During this phase, the server and clients will be configured to work together as a cluster.
5.5.1 Complete the cluster configuration
Press the Step 4 button of the wizard entitled Complete Cluster Setup. This will run the post install
script, which does the following:
1. Queries number of processors from the client nodes.
Note that any users created on the server after the OSCAR installation will not be in the password/group
files of the clients until those files have been synced with the server. You can accomplish this using the C3 cpush
tool.
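A sync of this kind might look like the sketch below. It rests on two assumptions that should be checked against the cpush man page on your server: that "cpush <file>" copies the file to the same path on every client, and that /etc/shadow needs to be pushed along with the password and group files. The function name sync_accounts and the CPUSH variable are hypothetical.

```shell
# sync_accounts
# Push the account files from the server to all clients with C3's cpush.
# Assumes "cpush <file>" copies <file> to the same path on every node.
# CPUSH may be set to override the command (e.g. CPUSH=echo for a dry run).
sync_accounts() {
    for f in /etc/passwd /etc/group /etc/shadow; do
        ${CPUSH:-cpush} "$f" || return 1
    done
}

# Example, run as root on the server after adding users:
#   sync_accounts
```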
5.5.2 Test your cluster using the OSCAR Cluster Test software
Provided along with OSCAR is a simple test to make sure the key cluster components (PBS, MPI, and PVM)
are functioning properly.
Press the Step 5 button of the wizard entitled Test Cluster Setup. This will open a window that
will test the server services and then ask for a non-root user to use for the testing. A sample dialog is shown
in Figure 5.
Figure 5: Setup cluster tests
Once the test setup completes, do the following:
1. open another terminal
2. log in as the user you gave in the setup
3. change to the OSCAR test directory
4. run the test_cluster script, answering the questions that are asked.
5.5.3 Congratulations!!
Your cluster setup is now complete. Your cluster nodes should be ready for work.
A Network Booting Client Nodes
There are two methods available for network booting your client nodes. The first is to use the Preboot
eXecution Environment (PXE) network boot option in the client’s BIOS, if available. If the option is not
available, you will need to create a network boot floppy disk using the SystemImager boot package. Each
method is described below.
1. Network booting using PXE. To use this method, your client machines’ BIOS and network adapter
will need to support PXE version 2.0 or later. The PXE specification is available at
http://developer.intel.com/ial/wfm/tools/pxepdk20/index.htm. Earlier versions may
work, but experience has shown that versions earlier than 2.0 are unreliable. As BIOS designs vary,
there is no standard procedure for network booting client nodes using PXE. More often than not,
the option is presented in one of two ways.
(a) The first is that the option can be specified in the BIOS boot order list. If presented in the
boot order list, you will need to set the client to have network boot as the first boot device.
In addition, when you have completed the client installation, remember to reset the BIOS and
remove network boot from the boot list so that the client will not attempt to do the installation
again.
(b) The second is that the user must watch the output of the client node while booting and press a
specified key such as “N” at the appropriate time. In this case, you will need to do so for each
client as it boots.
2. Network booting using a SystemImager boot floppy. The SystemImager boot package is provided
with OSCAR just in case your machines do not have a BIOS network boot option. You can create
a boot floppy through the oscar_wizard on the Setup Networking panel or by using the
mkautoinstalldiskette command.
Once you have created the SystemImager boot floppy, set your client’s BIOS to boot from the floppy
drive. Insert the floppy and boot the machine to start the network boot. Check the output for errors to make
sure your network boot floppy is working properly. Remember to remove the floppy when you reboot the
clients after installation.
B What Happens During Client Installation
Once the client is network booted, it either boots off the autoinstall diskette you created or uses PXE to
network boot and loads the install kernel. It then broadcasts a BOOTP/DHCP request to obtain the IP address
associated with its MAC address. The DHCP server provides the IP information and the client looks for its
auto-install script in /var/lib/systemimager/scripts/. The script is named <nodename>.sh and
is a symbolic link to the script for the desired image. The auto-install script is the installation workhorse,
and does the following:
1. partitions the disk as specified in the image in <imagedir>etc/systemimager/partitionschemes.
2. mounts the newly created partitions on /a
3. chroots to /a and uses rsync to bring over all the files in the image.
4. calls systemconfigurator to customize the image to the client’s particular hardware and configuration.
5. unmounts /a
Once the clone completes, the client will either reboot, halt, or beep as specified when defining the image.
C Troubleshooting
C.1 Using LAM/MPI Instead of MPICH
Both LAM/MPI and MPICH are installed on all nodes in an OSCAR cluster. As of OSCAR version 1.2,
MPICH is the default MPI implementation for all users. LAM/MPI can be made the default for all users, or
on a user-by-user basis.
Scripts in the /etc/profile.d directory add both LAM/MPI and MPICH to each user’s environment. Scripts in this directory are executed in sorted order. Four scripts in particular are relevant:
/etc/profile.d/mpi-00mpich.sh
/etc/profile.d/mpi-00mpich.csh
/etc/profile.d/mpi-01lam.sh
/etc/profile.d/mpi-01lam.csh
Each script adds the respective MPI implementation to the user’s environment by appending a directory
to the end of the user’s path. Since the MPICH scripts are run before the LAM scripts, the MPICH bin
directory is added to the user’s path before the LAM bin directory. Hence, the MPICH binaries are found
before the LAM binaries. For example, in the bash shell, the command:
% which mpirun
will show the path for the MPICH mpirun.
Please note that future versions of OSCAR will make the process of switching between LAM/MPI and
MPICH (for all users and for individual users) much simpler.
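Because the first match on the path wins, it can help to list every mpirun the shell could find, in search order. The sketch below does this with a plain loop over $PATH; the function name find_all_in_path is hypothetical and not part of OSCAR.

```shell
# find_all_in_path NAME
# Print every executable file called NAME found on $PATH, in search order.
# The first line printed is the one the shell will actually run.
# The body runs in a subshell so the IFS change does not leak out.
find_all_in_path() (
    IFS=:
    for dir in $PATH; do
        # An empty PATH element means the current directory.
        if [ -f "${dir:-.}/$1" ] && [ -x "${dir:-.}/$1" ]; then
            echo "${dir:-.}/$1"
        fi
    done
)

# Example:
#   find_all_in_path mpirun
```

With the default OSCAR profile scripts, the MPICH mpirun would be expected on the first line and the LAM mpirun on a later one.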
C.1.1 Making LAM/MPI the Default for All Users
To set the default environment for all users to use LAM/MPI instead of MPICH, either delete the mpi-00mpich* files, or rename them to have a number higher than 01 so that they will be executed after the
LAM scripts. For example:
% cd /etc/profile.d
% mv mpi-00mpich.sh mpi-05mpich.sh
% mv mpi-00mpich.csh mpi-05mpich.csh
C.1.2 Making LAM/MPI the Default for an Individual User
If individual users want to use LAM/MPI instead of MPICH, they should edit their shell-setup file to invoke
the LAM profile.d setup script before all the other /etc/profile.d scripts. For example, bash
users can typically edit their .bashrc file to add the following line before /etc/bashrc is invoked:
. /etc/profile.d/mpi-01lam.sh
Similarly, csh-style shell users can source the mpi-01lam.csh script in their .cshrc file.
C.2 Managing machines and images
During the life of your cluster, you may want to delete unused machines or images, create new images, or
change the image that a client uses. Currently OSCAR doesn’t have a direct interface to do this, but you can
use the SIS commands directly. Here are some useful examples:
• To list all defined machines, run:
mksimachine --List
• To list all defined images, run:
mksiimage --List
• To delete an image, run:
mksiimage --Delete --name <imagename>
• To delete a machine, run:
mksimachine --Delete --name <machinename>
• To delete all machines, run:
mksimachine --Delete --all
• To change which image a machine will install, run:
mksimachine --Update --name <machinename> --image <imagename>
There is also a SIS GUI available. Start it by running tksis. It doesn’t yet support the update
function, but it can make the other operations easier.
More details on these commands can be obtained from their respective man pages.
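When several machines need to be switched to a new image, the update command above can be wrapped in a loop. This is a minimal sketch: the function name set_image and the MKSIMACHINE variable are hypothetical, and the machine and image names in the example are placeholders. It uses only the mksimachine options shown in the list above.

```shell
# set_image IMAGE MACHINE...
# Point each named machine at IMAGE using the mksimachine --Update call
# shown above. MKSIMACHINE may be set to echo for a dry run.
set_image() {
    image=$1
    shift
    for m in "$@"; do
        ${MKSIMACHINE:-mksimachine} --Update --name "$m" --image "$image" || return 1
    done
}

# Example:
#   set_image newimage oscarnode1 oscarnode2
```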
C.3 Known Problems and Solutions
C.3.1 Client nodes fail to network boot
There are two causes of this problem. The first is that the DHCP server is not running on the server machine,
which probably means the /etc/dhcpd.conf file format is invalid. Check whether it is running by
issuing the command “service dhcpd status” in a terminal. If no output is returned, the DHCP
server is not running. See the problem solution for “DHCP server not running” below. If the DHCP server
is running, the client probably timed out when trying to download its configuration file. This may happen
when a client is requesting files from the server while multiple installs are taking place on other clients. If
this is the case, just try the network boot again when the server is less busy. Occasionally, restarting the
inet daemon also helps with this problem as it forces tftp to restart as well. To restart the daemon, issue the
following command:
service xinetd restart
C.3.2 DHCP server not running
Run the command “service dhcpd start” from the terminal and observe the output. If there are
error messages, the DHCP configuration is probably invalid. A few common errors are documented below.
For other error messages, see the dhcpd.conf man page.
1. If the error message produced reads something like “Can’t open lease database”, you need
to manually create the DHCP leases database, /var/lib/dhcp/dhcpd.leases, by issuing the
following command in a terminal:
touch /var/lib/dhcp/dhcpd.leases
2. If the error message produced reads something like “Please write a subnet declaration for the network segment to which interface ethx is attached”, you need to manually edit the DHCP configuration file, /etc/dhcpd.conf, to make it valid. A valid configuration file will have at
least one subnet stanza for each of your network adapters. To fix this, enter an empty stanza for the
interface mentioned in the error message, which should look like the following:
subnet subnet-number netmask subnet-mask { }
The subnet number and netmask you should use in the above stanza are the ones associated with
the network interface mentioned in the error message.
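The subnet number is the interface’s IP address with the host bits masked off, that is, the bitwise AND of the address and the netmask. The sketch below computes it and prints the empty stanza; the function name subnet_stanza is hypothetical, and the addresses in the example are placeholders for the values of the interface named in the error message.

```shell
# subnet_stanza IP NETMASK
# Print an empty dhcpd.conf subnet declaration for the network that IP
# belongs to. The subnet number is the bitwise AND of IP and NETMASK.
subnet_stanza() {
    IFS=. read -r i1 i2 i3 i4 <<EOF
$1
EOF
    IFS=. read -r m1 m2 m3 m4 <<EOF
$2
EOF
    echo "subnet $((i1 & m1)).$((i2 & m2)).$((i3 & m3)).$((i4 & m4)) netmask $2 { }"
}

# Example:
#   subnet_stanza 192.168.0.17 255.255.255.0
# prints:
#   subnet 192.168.0.0 netmask 255.255.255.0 { }
```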
C.3.3 PBS is not working
The PBS configuration done by OSCAR did not complete successfully and requires some manual tweaking.
Issue the following commands to configure the server and scheduler:
service pbs_server start
service maui start
cd /root/oscar-1.2/pbs/config
/usr/local/pbs/bin/qmgr < pbs_server.conf
Replace “/root” with the directory into which you unpacked OSCAR in the change directory command above.
C.3.4 Uni-processor P4 nodes fail to boot
This problem has occurred on some newer machines after the oscar_wizard has run and the nodes have
rsync’d their files to the local hard drive. The nodes then restart and begin to boot the newly installed kernel
and hang with a message similar to the following:
Getting VERSION: 0
Getting VERSION: ff00ff
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
calibrating APIC timer ...
..... CPU clock speed is 1995.0407 Mhz.
..... host bus clock speed is 0.0000 Mhz.
cpu: 0, clocks: 0, slice: 0
_
The cursor just blinks here forever.
This problem occurs because the SMP kernel is installed and the machine needs a standard uni-processor
kernel. Most machines will boot normally with the SMP kernel but a small number exhibit this issue.
Currently, the simplest fix is to rebuild the image with the uni-processor kernel (e.g. kernel-2.4.2-2) in the
rpmlist (e.g. oscarsamples/sample.rpmlist) and re-install the failing node.
We are currently investigating the problem further. If you are experiencing this problem, please check
the OSCAR web page at http://oscar.sourceforge.net/ for the latest information and solutions
to this problem.
C.4 What to do about unknown problems?
For help in solving problems not covered by this HowTo, send a detailed message describing the problem to
the OSCAR users mailing list at [email protected]. You may also wish to visit the OSCAR
web site, http://oscar.sourceforge.net, for updates on newly found and resolved problems.
C.5 Starting Over or Uninstalling OSCAR
If you feel that you want to start the cluster installation process over from scratch in order to recover from
irresolvable errors, you can do so with the start_over script located in the scripts subdirectory. This
script is interactive, and will prompt you when removing components installed by OSCAR that you may not
want to remove.
If you would like to remove all traces of OSCAR from your server, you may do so by running the
uninstall script located in the scripts subdirectory. This will run the start_over script and then
remove the OSCAR directory created by unpacking the tarball, but does not remove the tarball itself.
D Security
D.1 Security layers
Linux cluster security should, ideally, consist of multiple layers. Most people will not want to implement
more than one or two security layers, but done properly, this can yield reasonable levels of security. The main
security layers are router packet filtering, network stack protections, host based packet filtering, tcpwrappers,
service paring, service configuration, and secure communications.
D.2 Router packet filtering
This involves adding packet filtering rules to your border network router. Normally, this is not done because
of the difficulty in modifying router tables. Packet filtering involves looking at each network packet, and
deciding whether each packet should be allowed, dropped, or rejected, based on tables of rules.
D.3 Network stack protections
Linux kernels have security features built in that can help prevent outsiders from pretending that they are
part of your internal network. These features are enabled through /proc filesystem entries. A good firewall
will turn these on appropriately.
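This document does not name the individual /proc entries. One that exists on Linux and addresses address spoofing is the per-interface rp_filter (reverse-path filtering) setting under /proc/sys/net/ipv4/conf. The sketch below turns it on for every interface; the function name enable_spoof_protection is hypothetical, and the optional argument exists only so the function can be tried against a scratch directory instead of the live /proc.

```shell
# enable_spoof_protection [PROC_ROOT]
# Turn on reverse-path filtering for every interface by writing 1 to the
# rp_filter entries. PROC_ROOT defaults to /proc.
enable_spoof_protection() {
    root=${1:-/proc}
    for f in "$root"/sys/net/ipv4/conf/*/rp_filter; do
        # Skip the literal pattern if nothing matched, and read-only files.
        if [ -w "$f" ]; then
            echo 1 > "$f"
        fi
    done
}

# Example, run as root:
#   enable_spoof_protection
```

The setting does not survive a reboot, so a firewall package or startup script would normally apply it each time the system boots.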
D.4 Host based packet filtering
Like router packet filtering, host based network packet filtering involves examining each packet and deciding
what to do with it. But with host based filtering, each machine individually filters the network packets going
to, from, or through it. Linux kernels from 2.4 on include connection tracking and “stateful” packet filtering,
which keeps track of ongoing network connections, allowing better filtering decisions to be made based on
whether packets are part of an already allowed connection. The problem with packet filtering is that it
requires the administrator to generate filtering “rulesets” that the iptables and ipchains programs interpret
and store in the running kernel. Creating these rulesets is similar to writing software in assembly language,
and like writing software, there are now higher level “languages” and compilers that can be used to generate
the rulesets and provide firewalls. One ruleset compiler/firewall package is the pfilter package. For more
information, and to download pfilter, see http://pfilter.sourceforge.net/. This security
layer, when done using a good ruleset compiler, yields a large security return for little effort. For example,
using the pfilter package a configuration file like this:
OPEN    ssh
OPEN    ftp    121.122.0.0/16
on a server machine that has client machines hidden behind it on a private network would set up the following:
1. The server machine would have complete access to the internet.
2. The only access to the server machine from outside the cluster would be through ssh logins from
anywhere, and ftp access from the local domain.
3. Client machines in the server’s cluster would have complete access to the internet through IP masquerading, but would be hidden and protected from the internet.
4. Applicable network stack protections and packet forwarding would be turned on.
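For comparison, a hand-written iptables ruleset with roughly the same effect is sketched below. This is an approximation written for illustration, not the ruleset that pfilter generates, and it is simplified (for example, it ignores the extra handling that FTP data connections need). The function name firewall_sketch, the IPTABLES variable, and the interface names in the example are hypothetical.

```shell
# firewall_sketch EXT_IF INT_IF
# Hand-written approximation of the policy described above; this is NOT
# what pfilter generates. EXT_IF faces the Internet, INT_IF the cluster.
# IPTABLES may be set to echo to print the commands instead of running them.
firewall_sketch() {
    ext=$1 int=$2
    ipt=${IPTABLES:-iptables}
    # Drop everything that is not explicitly allowed.
    $ipt -P INPUT DROP &&
    $ipt -P FORWARD DROP &&
    # Trust the loopback and the private cluster network.
    $ipt -A INPUT -i lo -j ACCEPT &&
    $ipt -A INPUT -i "$int" -j ACCEPT &&
    # Stateful filtering: allow replies to connections already allowed.
    $ipt -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT &&
    $ipt -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT &&
    # ssh from anywhere, ftp from the local domain only.
    $ipt -A INPUT -p tcp --dport 22 -j ACCEPT &&
    $ipt -A INPUT -p tcp -s 121.122.0.0/16 --dport 21 -j ACCEPT &&
    # Clients reach the Internet through IP masquerading.
    $ipt -A FORWARD -i "$int" -o "$ext" -j ACCEPT &&
    $ipt -t nat -A POSTROUTING -o "$ext" -j MASQUERADE
    # Forwarding itself is enabled separately:
    #   echo 1 > /proc/sys/net/ipv4/ip_forward
}

# Example dry run (interface names are placeholders):
#   IPTABLES=echo firewall_sketch eth1 eth0
```

The two-line pfilter configuration replaces all of the above, which is the point the text makes about ruleset compilers.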
D.5 Tcpwrappers
Tcpwrappers is an access control system that allows control over which network addresses or address
ranges can access particular network services on a computer host. This is controlled by the /etc/hosts.allow
and /etc/hosts.deny files. This allows certain services to be accessible only from your local domain,
for instance. A common use of this would be to limit exported NFS filesystems to be accessible only from
your local domain, while allowing secure logins through ssh to come in from anywhere. This would be
done with a /etc/hosts.deny file that looks like this:
ALL: ALL
and a /etc/hosts.allow file that looks like this:
# allow nfs service to domain.net only
portmap: .domain.net
rpc.mountd: .domain.net
# allow ssh logins from anywhere
sshd: ALL
D.6 Service paring
This is probably the most used of all the security layers, since turning off un-needed network services gets rid
of opportunities for network breakins. To hunt down and turn off unwanted services, the lsof/chkconfig/service
system commands can be used. To display which network services are currently listening on a system, do
this:
lsof -i | grep LISTEN | awk '{print $1,$(NF-2),$(NF-1)}' | sort | uniq
To list the services that will be started by default at the current runlevel do this:
chkconfig --list | grep `grep :initdefault: /etc/inittab | \
awk -F: '{print $2}'`:on | awk '{print $1}' | sort | column
To find services started by xinetd do this:
chkconfig --list | awk 'NF==2&&$2=="on"{print}' | \
awk -F: '{print $1}' | sort | column
The nmap port scanning command is also useful to get a hacker’s-eye view of your systems. The chkconfig and service commands can be used to turn system services on and off.
D.7 Service configuration
Some network services have their own configuration files. These should be edited to tighten down outside
access. For instance, the NFS filesystem uses the /etc/exports file to determine which network addresses
can access individual file systems, and which have read-write or read-only access.
D.8 Secure communications
By all means, use ssh for network logins. There are also modified versions of the venerable ftp programs
that allow encryption and other improvements.
E Screen-by-Screen Walkthrough
The following is a screen-by-screen walkthrough of a simple installation. It is intended as supplementary
material to aid in providing a better feel for the general progression of the installation. For a detailed
discussion of the steps, please refer to the Detailed Cluster Installation Procedure.
Figure 6: Getting OSCAR.
Figure 7: Unpacking OSCAR.
Figure 8: Running the install_cluster script.
Figure 9: Running the install_cluster script.
Figure 10: The OSCAR installation wizard.
Figure 11: Beginning step 1.
Figure 12: Step 1: Building the image.
Figure 13: Step 1: Building the image, completed.
Figure 14: Beginning step 2.
Figure 15: Step 2: Defining the clients.
Figure 16: Step 2: Defining the clients, completed.
Figure 17: Beginning step 3.
Figure 18: Step 3: Setting up networking, scanning for MAC addresses.
Figure 19: Booting the client.
Figure 20: Client is broadcasting.
Figure 21: Step 3: Setting up networking, got MAC addresses, stop scanning.
Figure 22: Step 3: Setting up networking, assigning MAC address to client.
Figure 23: Step 3: Setting up networking, configuring DHCP server.
Figure 24: Booting the client.
Figure 25: Client broadcasting and is answered.
Figure 26: Client partitioning disk.
Figure 27: Client installing image.
Figure 28: Nodes have finished the install.
Figure 29: Beginning step 4.
Figure 30: Step 4: Post installation
Figure 31: Beginning step 5.
Figure 32: Step 5: Setting up tests
Figure 33: Step 5: Executing tests as a non-root user
Figure 34: Step 5: Executing tests as a non-root user