Atos
From Physics for Students of Practical Mathematics 2007 - 2008
Current revision: 17:43, 3 February 2009
Queuing system
The queuing system supports two types of jobs. In a batch job there is no interaction between the user and the job while the job is running. If you need such interaction, ask the queuing system for an interactive job.
Batch jobs
As an example, let us write a program that waits for two minutes and then calculates the product and the quotient of two numbers. We first create a directory test where the program will be located:
rejec@atos:~> mkdir test
rejec@atos:~> cd test
rejec@atos:~/test>
Let's call the program programcek. The code, written in Perl, is:
rejec@atos:~/test> cat programcek
#!/usr/bin/perl
my ($a, $b) = @ARGV;
print "a = $a\n";
print "b = $b\n";
print "Waiting for two minutes ...\n";
sleep 120;
print "a * b = ", $a * $b, "\n";
print "a / b = ", $a / $b, "\n";
An example output:
rejec@atos:~/test> programcek 2 0
a = 2
b = 0
Waiting for two minutes ...
a * b = 0
Illegal division by zero at ./programcek line 8.
Now we'll run the program using the queuing system. First we need to write a shell script. Let's call it programcek.sh.
rejec@atos:~/test> cat programcek.sh
#!/bin/bash
#PBS -l walltime=00:03:00
cd test
programcek 2 0
The second line (#PBS -l walltime=00:03:00) tells the system that the job will run for at most three minutes. If the job takes longer to complete, it will be killed. If this line is missing from the shell script, the default limit of one minute is used. The script then changes to the directory where the program is located (cd test) and runs the program (programcek 2 0).
Now we are ready to put the program in the queue. We can do this with the qsub command:
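The limit has to cover the whole run, so it can help to compare it with the expected run time before submitting. The following is a minimal sketch, not part of the original tutorial: the helper name walltime_to_seconds is hypothetical, and the run time of 120 seconds is taken from the sleep call in programcek.

```shell
# Convert a walltime string of the form HH:MM:SS into seconds.
# awk reads the fields as decimal numbers, so "08" is not taken as octal.
walltime_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

limit=$(walltime_to_seconds 00:03:00)   # limit requested in programcek.sh
runtime=120                             # programcek sleeps for 120 seconds

if [ "$runtime" -lt "$limit" ]; then
    echo "OK: ${runtime} s fits within the ${limit} s limit"
else
    echo "Too long: ${runtime} s exceeds the ${limit} s limit"
fi
```

With the one-minute default (00:01:00) the same comparison would fail, which is why the walltime line is needed for this program.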
rejec@atos:~/test> qsub programcek.sh
1849.atos.ijs.si
Note that the id number 1849 was assigned to the job. From this point on, you can check the status of the job, stop it and access its results only through this id number, so don't forget it.
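Since everything that follows depends on the id number, it may be convenient to extract it from the text that qsub prints. The sketch below assumes the id has the form shown above (number, dot, server name); the function name job_id_number is hypothetical.

```shell
# Strip everything from the first dot onwards: 1849.atos.ijs.si -> 1849
job_id_number() {
    printf '%s\n' "${1%%.*}"
}

# Here the id is written out literally; in practice it could be captured
# with: full_id=$(qsub programcek.sh)
full_id="1849.atos.ijs.si"
job=$(job_id_number "$full_id")
echo "Job number: $job"
```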
Let's check what is going on in the queue with the showq command:
rejec@atos:~/test> showq
ACTIVE JOBS--------------------
JOBNAME  USERNAME  STATE    PROC  REMAINING   STARTTIME
1732     vilfan    Running     5  3:20:30:14  Tue Apr  1 10:32:35
1646     vilfan    Running    12  INFINITY    Thu Mar 27 17:09:44
1716     vilfan    Running     6  INFINITY    Mon Mar 31 17:52:49
1717     vilfan    Running     6  INFINITY    Mon Mar 31 18:01:39
1739     vilfan    Running    12  INFINITY    Tue Apr  1 13:36:11
1740     vilfan    Running    12  INFINITY    Tue Apr  1 14:14:54
1826     vilfan    Running     6  INFINITY    Wed Apr  2 09:37:38
1827     vilfan    Running     4  INFINITY    Wed Apr  2 13:32:32
1835     rejec     Running     4  INFINITY    Wed Apr  2 18:00:22
1836     rejec     Running    10  INFINITY    Wed Apr  2 18:00:35
1838     rejec     Running     1  INFINITY    Wed Apr  2 18:00:51
1839     rejec     Running    10  INFINITY    Wed Apr  2 18:00:57
1841     rejec     Running     4  INFINITY    Wed Apr  2 18:01:33
1845     rejec     Running     1  INFINITY    Wed Apr  2 18:01:54
1846     rejec     Running     1  INFINITY    Wed Apr  2 18:02:03
1847     rejec     Running     1  INFINITY    Wed Apr  2 18:02:14
1848     rejec     Running     1  INFINITY    Wed Apr  2 18:02:17
17 Active Jobs  96 of 96 Processors Active (100.00%)
                24 of 24 Nodes Active (100.00%)
IDLE JOBS----------------------
JOBNAME  USERNAME  STATE    PROC  WCLIMIT     QUEUETIME
1849     rejec     Idle        1  00:03:00    Wed Apr  2 18:02:20
1 Idle Job
BLOCKED JOBS----------------
JOBNAME  USERNAME  STATE    PROC  WCLIMIT     QUEUETIME
Total Jobs: 18  Active Jobs: 17  Idle Jobs: 1  Blocked Jobs: 0
Our job is listed under IDLE JOBS. This means that all the processors are currently occupied and our job has to wait until one of them is free before it can run. Let's check again a little later:
rejec@atos:~/test> showq
ACTIVE JOBS--------------------
JOBNAME  USERNAME  STATE    PROC  REMAINING   STARTTIME
1732     vilfan    Running     5  3:20:18:17  Tue Apr  1 10:32:35
1646     vilfan    Running    12  INFINITY    Thu Mar 27 17:09:44
1716     vilfan    Running     6  INFINITY    Mon Mar 31 17:52:49
1717     vilfan    Running     6  INFINITY    Mon Mar 31 18:01:39
1739     vilfan    Running    12  INFINITY    Tue Apr  1 13:36:11
1740     vilfan    Running    12  INFINITY    Tue Apr  1 14:14:54
1826     vilfan    Running     6  INFINITY    Wed Apr  2 09:37:38
1827     vilfan    Running     4  INFINITY    Wed Apr  2 13:32:32
1849     rejec     Running     1  00:03:00    Wed Apr  2 18:14:18
9 Active Jobs  64 of 96 Processors Active (66.67%)
               23 of 24 Nodes Active (95.83%)
IDLE JOBS----------------------
JOBNAME  USERNAME  STATE    PROC  WCLIMIT     QUEUETIME
0 Idle Jobs
BLOCKED JOBS----------------
JOBNAME  USERNAME  STATE    PROC  WCLIMIT     QUEUETIME
Total Jobs: 9  Active Jobs: 9  Idle Jobs: 0  Blocked Jobs: 0
Now the program is running: it is listed under ACTIVE JOBS. If in the meantime you have found a bug in your program, you should stop the job to make the resources available to other users. You can do this with the qdel command:
rejec@atos:~/test> qdel 1849
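When the queue is long, finding one job in the full showq listing by eye becomes tedious. The following sketch filters a listing of the form shown above and prints the STATE column for a single job. The function name job_state is hypothetical, and the code assumes that job rows start with the id number and end with a time of day, as they do in the listings in this section.

```shell
# Print the state of job $1; the showq-style listing is read from stdin.
# Summary lines such as "1 Idle Job" are skipped because their last
# field is not a time of the form HH:MM:SS.
job_state() {
    awk -v id="$1" '
        $1 == id && $NF ~ /^[0-9]+:[0-9]+:[0-9]+$/ { print $3; found = 1; exit }
        END { if (!found) print "not listed" }
    '
}

# Sample lines copied from the first listing; in practice one would run:
#   showq | job_state 1849
job_state 1849 <<'EOF'
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
1849 rejec Idle 1 00:03:00 Wed Apr 2 18:02:20
1 Idle Job
EOF
```

A job that has already finished or has been deleted no longer appears in the listing, in which case the function reports "not listed".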
Let's assume instead that everything was OK with your program and that you waited until it stopped running. The results of the calculation are available in the directory from which the job was submitted:
rejec@atos:~/test> ls
programcek programcek.sh programcek.sh.e1849 programcek.sh.o1849
rejec@atos:~/test> cat programcek.sh.o1849
a = 2
b = 0
Waiting for two minutes ...
a * b = 0
rejec@atos:~/test> cat programcek.sh.e1849
Illegal division by zero at ./programcek line 8.
The output and the errors of the job were saved in the programcek.sh.o1849 and programcek.sh.e1849 files, respectively.
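The file names above follow a simple pattern: the script name, then .o or .e, then the job number. A small sketch that builds both names from those two pieces is given below; the function name job_output_files is hypothetical, and the pattern is inferred only from the example in this section.

```shell
# Print the names of the output and error files of a job.
# $1 = name of the submitted script, $2 = numeric job id
job_output_files() {
    printf '%s.o%s\n%s.e%s\n' "$1" "$2" "$1" "$2"
}

job_output_files programcek.sh 1849
```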
Interactive jobs
Use the following command to ask the queuing system for a two-hour interactive job:
rejec@atos:~/test/> qsub -l walltime=02:00:00 -I
qsub: waiting for job 12813.atos.ijs.si to start
qsub: job 12813.atos.ijs.si ready
rejec@n19:~/>
The system found a free processor on the n19 node and opened a shell there. The shell can be used to do whatever you wish for the next two hours.
Additional information
You can find additional information in the user's manual (PBSProUG_5_4_0.pdf).