IBM RS/6000 SP2 Supercomputer |
---|
2002/03/05(23:37) from 218.149.65.204 | |
Author : 강줄기 (jkkang65@hanmail.net) | Views : 1967 , Lines : 194 |
Re: [LoadLeveler] Job Command File Explanation - 2 |
---|
Introduction to Using LoadLeveler

Contents

  Introduction
  Command Interface to LoadLeveler
  Classes
  Nodes vs Processors
  Submitting Requests
  Monitoring Queues and Requests
  Canceling Requests
  Examples
  More Information

Introduction

LoadLeveler is the batch scheduler running on the IBM SP. To run parallel jobs, you submit them to LoadLeveler, and you also use it to check the status of your jobs. Unlike interactive jobs, batch jobs are controlled by scripts, which you must write. Once you submit a job script, you can log out and go for coffee while LoadLeveler rounds up the resources required by the job, runs it, and handles the standard I/O streams for you.

Command Interface to LoadLeveler

Here are the frequently used commands ("man" pages are available for each):

  llclass    Tells you about the available classes (queues).
  llsubmit   Submits a job to be dispatched by LoadLeveler.
  llq        Returns information about jobs that have been dispatched.
  llcancel   Removes a job which you have previously submitted.

Classes

What other scheduling systems (like NQS) call "batch queues" are called "classes" in the LoadLeveler world. NQS pipe queues have no LoadLeveler analog: you submit directly to the destination class, and you must be sure that the specific resources you request (such as number of nodes and run-time hours) are within the range provided by the class.

The command "llclass" shows the names of the available classes. To see the resource limits of a particular class, use:

    llclass -l classname

where classname is the name of the class.

Nodes vs Processors

The IBM SP is a collection of symmetric multiprocessor (SMP) nodes. Each node consists of a small number of processors which share one memory (on icehawk, there are four processors per node, sharing 2 GB). LoadLeveler allocates nodes, not processors. Thus, if you need 12 processors, you should request three nodes of four processors each. Requesting more nodes would result in wasted processors and unnecessary charges to your project.

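The node arithmetic above amounts to rounding the processor count up to a whole number of nodes. Here is a minimal shell sketch of that calculation, assuming four processors per node as on icehawk. The function name nodes_needed is made up for illustration and is not a LoadLeveler command.

```shell
# Hypothetical helper (not part of LoadLeveler): number of SMP nodes
# needed for a given processor count, rounded up to whole nodes.
# The second argument is the number of processors per node and
# defaults to 4 (the icehawk value mentioned above).
nodes_needed() {
    procs=$1
    per_node=${2:-4}
    # Integer ceiling division: (procs + per_node - 1) / per_node
    echo $(( (procs + per_node - 1) / per_node ))
}

nodes_needed 12    # prints 3
nodes_needed 13    # prints 4 (three of the 16 processors would sit idle)
```

The second call shows why processor counts that are a multiple of the node size are the economical choice: the whole node is allocated either way.
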
Submitting Requests

The command:

    llsubmit scriptname

will submit the given script for processing. You must write a script specific to your job. It contains the information LoadLeveler needs to allocate the resources your job requires, to handle standard I/O streams, and to run the job.

The following sample script runs an MPI program on 8 CPUs (using all four processors on each of two nodes):

    #!/bin/ksh
    # @ output = $(Executable).$(jobid).out
    # @ error = $(Executable).$(jobid).err
    # @ environment = MP_SHARED_MEMORY=yes
    # @ notification = never
    # @ network.mpi = css0,not_shared,US
    # @ node_usage = not_shared
    # @ wall_clock_limit=1800
    # @ job_type = parallel
    # @ node = 2
    # @ tasks_per_node = 4
    # @ class = Qlarge
    # @ queue
    cd /u1/uaf/username/Progs/MLP
    ./mlp

See example #1 for a complete discussion of this script.

Monitoring Queues and Requests

The command:

    llq

will show all the jobs currently running or queued. For details about your particular job, issue the command:

    llq -l jobid

where jobid is the LoadLeveler id of your job.

Canceling Requests

The command:

    llcancel jobid

where jobid is the LoadLeveler id of the job, removes a job which you have previously submitted.

Examples

Example #1

This is a basic LoadLeveler script to run an 8 processor MPI job.

    #!/bin/ksh
    #
    # ---
    # --- Begin LoadLeveler Job Specifications ---
    # ---
    # @ output = $(Executable).$(jobid).out
    # @ error = $(Executable).$(jobid).err
    # @ environment = MP_SHARED_MEMORY=yes
    # @ notification = never
    # @ network.mpi = css0,not_shared,US
    # @ node_usage = not_shared
    # @ wall_clock_limit=1800
    # @ job_type = parallel
    # @ node = 2
    # @ tasks_per_node = 4
    # @ class = Qlarge
    # @ queue
    # ---
    # --- Begin executable shell commands ---
    # ---
    cd /u1/uaf/username/Progs/MLP
    ./mlp

The script consists of two main parts. Instructions to LoadLeveler come first in the file and appear on lines which begin with:

    # @

The LoadLeveler specifications end with

    # @ queue

which says to queue the request. The second part of the script consists of the shell commands which will be executed when the job runs.

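Because every LoadLeveler instruction sits on a line beginning with "# @" and the specification ends at "# @ queue", the settings in a script can be read back with ordinary text tools. The following is a rough sketch for checking your own script before submitting it, not a description of how LoadLeveler itself parses the file. The function names ll_value and ll_total_tasks and the file name mlp.ll are invented for illustration.

```shell
# Hypothetical helpers for inspecting a job script; not LoadLeveler commands.

# Print the value of one "# @ keyword = value" line, looking only at
# lines up to and including the first "# @ queue".
ll_value() {
    keyword=$1
    script=$2
    sed -n '1,/^#[ ]*@[ ]*queue[ ]*$/p' "$script" |
        sed -n "s/^#[ ]*@[ ]*$keyword[ ]*=[ ]*//p" |
        sed 's/[ ]*$//' | head -n 1
}

# Total number of MPI tasks requested: node x tasks_per_node.
ll_total_tasks() {
    nodes=$(ll_value node "$1")
    per_node=$(ll_value tasks_per_node "$1")
    echo $(( nodes * per_node ))
}

# Usage, assuming the sample script above was saved as mlp.ll:
#   ll_value class mlp.ll      # Qlarge
#   ll_total_tasks mlp.ll      # 8
```

For the sample script this reports the 8 MPI tasks (two nodes, four tasks each) that the example sets out to run.
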
Here's a line-by-line discussion of the sample script:

    #!/bin/ksh

Specifies the shell to be used when executing the command portion of the script. Korn shell is the default, but it doesn't hurt to specify it.

    # ---
    # --- Begin LoadLeveler Job Specifications ---
    # ---
    # @ output = $(Executable).$(jobid).out

Standard output from this job will be copied to this file. The file name is generated at run time according to the definition, and will contain the name of the script file and the unique LoadLeveler job id.

    # @ error = $(Executable).$(jobid).err

Standard error file, named in the same way.

    # @ environment = MP_SHARED_MEMORY=yes

Sets the given variable in the environment of the job. You could specify a number of variables this way. (MP_SHARED_MEMORY should always be set to "yes.")

    # @ notification = never

LoadLeveler will "never" send email notification of events regarding this job. Other notification options include "always," "error," and "start."

    # @ network.mpi = css0,not_shared,US

Sets the communication protocol and network. For MPI jobs, you should always use these settings.

    # @ node_usage = not_shared

Specifies that once allocated, your job will not share nodes with other jobs. This is the only mode possible on icehawk.

    # @ wall_clock_limit=1800

Specifies the wall clock time requirement of your job (here 1800 seconds, i.e. 30 minutes). This request must fall within the limits of the class you're requesting.

    # @ job_type = parallel

Declares that your job is parallel.

    # @ node = 2

Tells LoadLeveler that your job requires 2 SMP nodes. (This request must fall within the limits of the class you're requesting.)

    # @ tasks_per_node = 4

Together, the two specifications node and tasks_per_node determine the total number of MPI processes that will be created: node x tasks_per_node (here, 2 x 4 = 8). Often, tasks_per_node will equal the number of physical processors per node, but other scenarios exist. For instance, a hybrid MPI/OpenMP code might run with one MPI process per node and then spawn four OpenMP threads per node to use all four processors.

    # @ class = Qlarge

Specifies the LoadLeveler class.

    # @ queue

Required as the final LoadLeveler specification! Queues your job. Any LoadLeveler keywords following queue are ignored.

    # ---
    # --- Begin executable shell commands ---
    # ---
    cd /u1/uaf/username/Progs/MLP

The first shell command to be run when LoadLeveler starts the job. This changes the working directory to the one containing the user's executable file.

    ./mlp

Runs the user's executable file, mlp.

More Information

  "man" pages ("man llq", "man llsubmit", etc.)
  http://www.rs6000.ibm.com/resource/aix_resource/sp_books/loadleveler/index.html
  ARSC Consulting

Arctic Region Supercomputing Center * P.O. Box 756020 * Fairbanks, Alaska * 99775-6020
voice: (907) 474-6935 * fax: (907) 474-5494 * e-mail: info@arsc.edu |