Slurm

Work in progress.

Important commands

sbatch # submits a job script to the queue; the job then runs in the background
srun # runs a job in the foreground (use it only for quick tests, since it is stopped on server reboot or connection loss, or to launch steps inside sbatch scripts)
squeue # shows the current queue
sinfo # shows partition and node information
scancel # cancels a job
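Here is a minimal batch script that shows how these commands fit together. The job name, time, memory and log file name are placeholder assumptions, not recommended values for this cluster. The partition name comes from the partition list below.

```bash
#!/bin/bash
# Minimal sbatch job script (sketch). All values below are placeholders.
# Slurm reads the #SBATCH lines before the first command; bash treats them as comments.
#SBATCH --job-name=example
#SBATCH --partition=short
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
# %j in the log file name is replaced by the job ID.
#SBATCH --output=example_%j.out

# Outside of Slurm, SLURM_JOB_ID is unset and "none" is printed instead.
msg="Job ${SLURM_JOB_ID:-none} running on $(uname -n)"
echo "$msg"
```

If you save this as example.sh, sbatch example.sh queues it and prints the job ID. You can pass that ID to squeue -j or scancel. For a quick interactive test, srun --partition=short --time=00:10:00 --pty bash opens a shell on a compute node.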

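You can customize the output of sinfo and squeue with -o $FORMAT (--format), or show more columns with -l (--long). For example, the following squeue call lists the job ID, partition, job name, user, state, elapsed time, time limit, number of nodes, CPUs, requested memory, and either the node list or the reason a job is still pending: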

squeue -o "%.18i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %6C %6m %R"
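If you use the same layout often, you can make it the default. squeue reads the SQUEUE_FORMAT environment variable the same way it reads -o, and sinfo has the analogous SINFO_FORMAT. The myjobs function below is a hypothetical convenience wrapper, not part of Slurm.

```bash
# Make the layout above the default for squeue, e.g. by adding this to ~/.bashrc.
# squeue reads SQUEUE_FORMAT like -o/--format.
# Fields: %i job ID, %P partition, %j job name, %u user, %T state,
# %M time used, %l time limit, %D number of nodes, %C CPUs,
# %m requested memory, %R pending reason or node list.
# A number sets the column width; a leading "." right-aligns the column.
export SQUEUE_FORMAT="%.18i %.9P %.8j %.8u %.8T %.10M %.9l %.6D %6C %6m %R"

# Hypothetical shortcut: show only your own jobs; extra arguments go to squeue.
myjobs() {
  squeue -u "$USER" "$@"
}
```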

Queues / partitions

❯ sinfo -s
PARTITION   AVAIL  TIMELIMIT   NODES(A/I/O/T) NODELIST
normal*        up    4:00:00         6/9/1/16 compute[1-13],menten,michaelis,weierstrass
fastscratch    up    4:00:00          3/1/0/4 compute[5-8]
short          up      30:00         6/9/1/16 compute[1-13],menten,michaelis,weierstrass
verylong       up 20-00:00:0          5/0/0/5 compute[1,7-8,10-11]
long           up 5-00:00:00         6/8/1/15 compute[1-13],menten,michaelis

The partitions differ mainly in their time limit. The longer the time limit, the fewer nodes can be used at the same time. The default partition, normal, is marked with an asterisk. (sinfo cuts the verylong limit off at the column width; the limit is 20 days.) The special partition ‘fastscratch’ contains only the nodes that can access the /scratch folder with higher throughput, and it schedules jobs exclusively on those nodes.
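One possible rule of thumb is to use the partition with the shortest limit that still fits the job's runtime. The helper below is a hypothetical sketch of that rule, not a site policy. Its limits are read from the sinfo output above and may change, so check sinfo -s before relying on them.

```bash
#!/bin/bash
# Hypothetical helper: print the partition with the shortest time limit that
# still fits a runtime given in minutes. Limits copied from `sinfo -s` above.
choose_partition() {
  local minutes="$1"
  if ! [[ "$minutes" =~ ^[0-9]+$ ]]; then
    echo "usage: choose_partition MINUTES" >&2
    return 2
  fi
  minutes=$((10#$minutes))   # force base 10 so inputs like "08" are accepted
  if   (( minutes <= 30 ));    then echo short      # 30:00
  elif (( minutes <= 240 ));   then echo normal     # 4:00:00 (default partition)
  elif (( minutes <= 7200 ));  then echo long       # 5-00:00:00, i.e. 5 days
  elif (( minutes <= 28800 )); then echo verylong   # 20 days
  else
    echo "no partition allows $minutes minutes" >&2
    return 1
  fi
}
```

For example, sbatch --partition="$(choose_partition 90)" --time=90 job.sh would submit the placeholder script job.sh to normal, because sbatch accepts --time as plain minutes. Jobs that need faster /scratch access would use --partition=fastscratch instead, which has the same 4-hour limit as normal.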

File locations

When using Slurm, it is best to work in the /scratch file system. Jobs can also access files in the /home and /groups folders, but avoid heavy I/O there so you don't overload the system.

Be aware that if you run multiple tasks at the same time, they may write to the same file simultaneously and overwrite each other's results.
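One way to avoid such collisions is to include the job ID in every file name. Slurm replaces %j in the --output file name with the job ID. For job arrays, %A_%a gives the array job ID and the task index. Slurm also sets SLURM_JOB_ID and, for array jobs, SLURM_ARRAY_TASK_ID in the job's environment. The result_path function and the RESULTS_DIR variable are hypothetical names used only for illustration.

```bash
#!/bin/bash
# Give every job its own log file: %j becomes the job ID.
# (For job arrays, slurm_%A_%a.out uses the array job ID and the task index.)
#SBATCH --output=slurm_%j.out

# Hypothetical helper: build a result file name that is unique per job and
# per array task, so concurrent jobs never write to the same file.
# Outside of Slurm the fallbacks "local" and "0" are used.
result_path() {
  local dir="$1"
  printf '%s/result_%s_%s.txt\n' "$dir" "${SLURM_JOB_ID:-local}" "${SLURM_ARRAY_TASK_ID:-0}"
}

# RESULTS_DIR is a hypothetical setting; point it at your /scratch directory.
out="$(result_path "${RESULTS_DIR:-.}")"
echo "writing results to $out"
```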

The /extra folders are NOT shared, so they are usually not a good place to store results. (If you use them for intermediate data, please remember to clean up afterwards.)
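Here is a sketch of one way to make sure intermediate data on /extra gets cleaned up. It creates a private directory per job and removes it with an EXIT trap, so the directory is deleted even if the script fails. It assumes that a writable /extra/$USER directory exists on the compute node. make_job_tmpdir and JOB_TMPDIR are hypothetical names.

```bash
#!/bin/bash
# Hypothetical helper: create a private per-job directory for intermediate
# data (default base /extra/$USER, an assumed location) and register an
# EXIT trap that deletes it when the current shell exits, even after an error.
# Note: this replaces any EXIT trap that was set before.
make_job_tmpdir() {
  local base="${1:-/extra/$USER}"
  # JOB_TMPDIR is global on purpose: the trap runs after the function returns.
  JOB_TMPDIR="$(mktemp -d "$base/job_${SLURM_JOB_ID:-local}_XXXXXX")" || return 1
  trap 'rm -rf -- "$JOB_TMPDIR"' EXIT
}

# Usage inside a job script:
#   make_job_tmpdir || exit 1
#   ... write intermediate files to "$JOB_TMPDIR" ...
#   ... copy the final results to /scratch before the script ends ...
```

A trap cannot run if the shell is killed with SIGKILL, so it is still worth checking /extra for leftovers now and then.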