Slurm
Work in progress.
Important commands
sbatch   # starts a script job in the background
srun     # starts a job in the foreground (only for testing or from inside sbatch scripts, as an interactive srun is stopped on server reboot or connection loss)
squeue   # shows the current queue
sinfo    # shows cluster information
scancel  # cancels a job
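A minimal sketch of a batch script for sbatch, assuming a short test job. The job name, resource values and the helper run_step are illustrative assumptions, not a site-provided template. sbatch reads the #SBATCH lines, and srun inside the script starts the actual job step on the allocated node:

```bash
#!/bin/bash
#SBATCH --job-name=example      # hypothetical job name
#SBATCH --partition=short       # see "Queues / partitions" below
#SBATCH --time=00:10:00         # must not exceed the partition's time limit
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --output=slurm-%j.out   # %j is replaced by the job ID

# Hypothetical helper: run a command as a job step via srun when inside a
# Slurm allocation, or directly when the script is tried out with plain bash.
run_step() {
  if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
    srun "$@"
  else
    "$@"
  fi
}

run_step hostname
```

Submitted with sbatch example.sh, the job appears in squeue -u "$USER" and can be cancelled with scancel followed by the job ID that sbatch prints.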
The output of sinfo and squeue can be adjusted with -o $FORMAT (a custom output format) or -l (long format) to show more information on the command line. For example:
squeue -o "%.18i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %6C %6m %R"
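The format string above can be kept in a small shell function so it does not have to be retyped. This is a sketch. The names squeue_fields, SQUEUE_FORMAT and myjobs are made up, and the field descriptions follow the squeue man page. The -u "$USER" option limits the list to your own jobs:

```bash
# Field specifiers of the format string (a number sets the minimum column
# width; a leading dot right-justifies the column):
squeue_fields=(
  "%.18i"  # job ID
  "%.9P"   # partition
  "%.8j"   # job name
  "%.8u"   # user name
  "%.8T"   # job state, e.g. PENDING or RUNNING
  "%.10M"  # time used so far
  "%.9l"   # time limit
  "%.6D"   # number of nodes
  "%6C"    # number of CPUs
  "%6m"    # minimum memory requested
  "%R"     # reason a job is pending, or the node list once it runs
)
# "${array[*]}" joins the elements with single spaces (the default IFS)
SQUEUE_FORMAT="${squeue_fields[*]}"

# myjobs (hypothetical name): show only your own jobs in this format;
# any extra arguments are passed on to squeue
myjobs() {
  squeue -u "$USER" -o "$SQUEUE_FORMAT" "$@"
}
```

For example, myjobs -t RUNNING would list only your running jobs.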
Queues / partitions
❯ sinfo -s
PARTITION    AVAIL  TIMELIMIT   NODES(A/I/O/T)  NODELIST
normal*      up     4:00:00     6/9/1/16        compute[1-13],menten,michaelis,weierstrass
fastscratch  up     4:00:00     3/1/0/4         compute[5-8]
short        up     30:00       6/9/1/16        compute[1-13],menten,michaelis,weierstrass
verylong     up     20-00:00:0  5/0/0/5         compute[1,7-8,10-11]
long         up     5-00:00:00  6/8/1/15        compute[1-13],menten,michaelis
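The main difference between the partitions is their time limit. Slurm writes these as [days-]hours:minutes:seconds, omitting leading fields for short limits: 30:00 is 30 minutes, 4:00:00 is four hours and 5-00:00:00 is five days. Additionally, the longer the time limit, the fewer nodes are available to the partition. The special partition ‘fastscratch’ contains only the nodes on which the /scratch folder is accessible with higher throughput (compute[5-8]), and it schedules jobs exclusively on those nodes.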
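Since the partitions differ mainly in their time limit, choosing one amounts to picking the shortest limit that still covers the expected run time. The sketch below does that, assuming the limits from the sinfo -s output above (copied by hand, so they must be updated if the partitions change). Slurm time values are days-hours:minutes:seconds, and a value such as 30:00 means minutes:seconds. The function names are hypothetical:

```bash
# slurm_time_to_seconds TIME: convert a Slurm time value to seconds. Accepted
# forms: minutes, minutes:seconds, hours:minutes:seconds, days-hours,
# days-hours:minutes and days-hours:minutes:seconds.
slurm_time_to_seconds() {
  local t=$1 days=0 h=0 m=0 s=0
  local -a p
  if [[ $t == *-* ]]; then
    days=${t%%-*}
    IFS=: read -ra p <<< "${t#*-}"
    h=${p[0]:-0}; m=${p[1]:-0}; s=${p[2]:-0}
  else
    IFS=: read -ra p <<< "$t"
    case ${#p[@]} in
      1) m=${p[0]:-0} ;;
      2) m=${p[0]:-0}; s=${p[1]:-0} ;;
      *) h=${p[0]:-0}; m=${p[1]:-0}; s=${p[2]:-0} ;;
    esac
  fi
  # 10# forces base 10, so values such as "08" are not read as octal
  echo $(( ((10#$days * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}

# pick_partition TIME: print the partition with the shortest time limit that
# still fits TIME. fastscratch is left out because it is chosen for /scratch
# throughput, not for run time.
pick_partition() {
  local need entry name limit
  need=$(slurm_time_to_seconds "$1") || return 1
  for entry in short=30:00 normal=4:00:00 long=5-00:00:00 verylong=20-00:00:0; do
    name=${entry%%=*}
    limit=${entry#*=}
    if (( need <= $(slurm_time_to_seconds "$limit") )); then
      echo "$name"
      return 0
    fi
  done
  echo "no partition allows a time limit of $1" >&2
  return 1
}
```

For example, sbatch --partition="$(pick_partition 2:00:00)" --time=2:00:00 job.sh would submit to normal. The verylong limit is written exactly as sinfo shows it (20-00:00:0), which the function reads as 20 days.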
File locations
When using Slurm, it is best to work in the /scratch file system, but it is also possible to access files in the /home and /groups folders. (Just don’t overload the system there.)
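One common pattern is to copy the input from /home or /groups to /scratch once, do all reading and writing there, and copy the results back at the end, so the shared home and group folders see only two bulk copies. This is a sketch that assumes per-user directories below /scratch/$USER. That path layout, SCRATCH_BASE and run_in_scratch are hypothetical:

```bash
# Assumed layout: one working directory per job below /scratch/$USER.
SCRATCH_BASE=${SCRATCH_BASE:-/scratch/$USER}

# run_in_scratch SRC DEST CMD...: copy the contents of SRC to a per-job
# directory in scratch, run CMD there, then copy the contents of that
# directory (inputs and results) to DEST.
run_in_scratch() {
  local src=$1 dest=$2
  shift 2
  local work="$SCRATCH_BASE/${SLURM_JOB_ID:-manual-$$}"
  mkdir -p "$work" "$dest" || return 1
  cp -r "$src"/. "$work"/ || return 1
  (cd "$work" && "$@") || return 1
  cp -r "$work"/. "$dest"/
}
```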
Please be aware that if you run multiple tasks at the same time, they might write to the same file concurrently and overwrite earlier results.
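A minimal sketch for job arrays (sbatch --array), one common way to run many tasks at once. Every file name includes the array job ID and task ID, which Slurm provides as SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID, so no two tasks share an output file. The array size, the file names and the helper result_file are assumptions for illustration:

```bash
#!/bin/bash
#SBATCH --array=1-10              # ten tasks that may run at the same time
#SBATCH --output=slurm-%A_%a.out  # %A = array job ID, %a = task ID: one log per task

# result_file DIR (hypothetical helper): print a result path that is unique
# per array job and task, creating DIR if needed, so that concurrent tasks
# never share an output file. Outside Slurm it falls back to "local" and 0.
result_file() {
  local dir=$1
  mkdir -p "$dir" || return 1
  echo "$dir/result_${SLURM_ARRAY_JOB_ID:-${SLURM_JOB_ID:-local}}_${SLURM_ARRAY_TASK_ID:-0}.txt"
}
```

A task would then write its output with something like my_program > "$(result_file "$RESULT_DIR")", where my_program and RESULT_DIR stand for your own program and results folder.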
As the /extra folders are NOT shared between nodes, they are typically not the best place to store results. (If you use them for intermediate data, please remember to clean up afterwards.)
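A minimal sketch of automatic cleanup, assuming intermediate data goes to a per-job directory below /extra/$USER. The path, EXTRA_BASE, LOCAL_TMP and make_local_tmp are hypothetical. Because /extra is not shared, the data stays on whichever node ran the job, so the job script itself is the natural place to remove it. A bash EXIT trap does that when the script ends, whether it succeeded or failed. A trap cannot run if the job is killed outright (for example by SIGKILL), so an occasional manual check is still sensible:

```bash
# Assumed layout: a per-job directory on the node-local /extra disk.
EXTRA_BASE=${EXTRA_BASE:-/extra/$USER}

# make_local_tmp: create the directory, store its path in LOCAL_TMP and
# remove it automatically when the script exits. Call it directly, not
# inside $(...), because the trap belongs to the shell that sets it.
make_local_tmp() {
  LOCAL_TMP="$EXTRA_BASE/${SLURM_JOB_ID:-manual-$$}"
  mkdir -p "$LOCAL_TMP" || return 1
  trap 'rm -rf -- "$LOCAL_TMP"' EXIT
}
```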