Biowulf

From 太極
Jump to navigation Jump to search
  • Biowulf User Guide. Note that Biowulf2 runs Centos (RedHat) 6.x. (Biowulf1 is at Centos 5.x)

Swarm fig 1.png

helix and data transfer

  • Data transfer and intensive file management tasks should not be performed on the Biowulf login node, biowulf.nih.gov. Instead such tasks should be performed on the dedicated interactive data transfer node (DTN), helix.nih.gov
  • Examples of such tasks include:
    • cp, mv, rm commands on large numbers of files or directories
    • file compression/uncompression (tar, zip, etc.)
    • file transfer (sftp, scp, rsync, etc.)
  • https://helix.nih.gov/ Helix is an interactive system for short jobs. Moving large data transfers to Helix, which is now designated for interactive data transfers. For instance, use Helix when transferring hundreds of gigabytes of data or more using any of these commands: cp, scp, rsync, sftp, ascp, etc.
  • https://hpc.nih.gov/docs/rhel7.html#helix Helix transitioned to becoming the dedicated interactive data transfer and file management node [1] and its hardware was later upgraded to support this role [2]. Running processes such as scp, rsync, tar, and gzip on the Biowulf login node has been discouraged ever since.

Linux distribution

$ ls /etc/*release  # login mode
$ cat /etc/redhat-release  
Red Hat Enterprise Linux Server release 6.8 (Santiago)

$ sinteractive      # switch to biowulf2 computing nodes
$ cat /etc/redhat-release 
CentOS release 6.6 (Final)
$ cat /etc/centos-release 
CentOS release 6.6 (Final)

Training notes

Storage

https://hpc.nih.gov/storage/

/scratch and /lscratch

The /scratch area on Biowulf is a large, low-performance shared area meant for the storage of temporary files.

  • Each user can store up to a maximum of 10 TB in /scratch. However, 10 TB of space is not guaranteed to be available at any particular time.
  • If the /scratch area is more than 80% full, the HPC staff will delete files as needed, even if they are less than 10 days old.
  • Files in /scratch are automatically deleted 10 days after last access.
  • Touching files to update their access times is inappropriate and the HPC staff will monitor for any such activity.
  • Use /lscratch (not /scratch) when data is to be accessed from large numbers of compute nodes or large swarms.
  • The central /scratch area should NEVER be used as a temporary directory for applications -- use /lscratch instead.
  • Running RStudio interactively. It is generally recommended to allocate at least a small amount of lscratch for temporary storage for R.
  • See the slides of Using the NIH HPC Storage Systems Effectively from NIH HPC classes

Transfer files

Alternatives:

User Dashboard

https://hpc.nih.gov/dashboard/

Dashboard

User dashboard Unlock account, disk usage, job info

Quota

checkquota

Environment modules

# What modules are available
module avail
module -d avail # default
module avail STAR
module spider bed # search by a case-insensitive keyword

# Load a module
module list # loaded modules
module load STAR
module load STAR/2.4.1a
module load plinkseq macs bowtie # load multiple modules
# if we try to load a module in a bash script, we can use the following
module load STAR || exit 1

# Unload a module
module unload STAR/2.4.1a

# Switch to a different version of an application
# If you load a module, then load another version of the same module, the first one will be unloaded.

# Examine a modulefile
$ module display STAR
-----------------------------------------------------------------------------
   /usr/local/lmod/modulefiles/STAR/2.5.1b.lua:
-----------------------------------------------------------------------------
help([[This module sets up the environment for using STAR.
Index files can be found in /fdb/STAR
]])
whatis("STAR: ultrafast universal RNA-seq aligner")
whatis("Version: 2.5.1b")
prepend_path("PATH","/usr/local/apps/STAR/2.5.1b/bin")

# Set up personal modulefiles

# Using modules in scripts

# Shared Modules

Single file - sbatch

Don't use the swarm command on a single script file since swarm will treat each line of the script file as an independent command.

sbatch --cpus-per-task=2 --mem=4g --time=24:00:00 MYSCRIPT
# Use --time=24:00:00 to increase the wall time from the default 2 hours

An example of the script file (Slurm environment variable $SLURM_CPUS_PER_TASK within your script was used to specify the number of threads to the program)

#!/bin/bash

module load novocraft
novoalign -c $SLURM_CPUS_PER_TASK  -f s_1_sequence.txt -d celegans -o SAM > out.sam

rslurm

rslurm package. Functions that simplify submitting R scripts to a Slurm workload manager, in part by automating the division of embarrassingly parallel calculations across cluster nodes.

Multiple files - swarm

swarm -f run.sh --time=24:00:00

swarm -t 3 -g 20 -f run_seqtools_vc.sh --module samtools,picard,bwa --verbose 1 --devel
# 3 commands run in 3 subjobs, each command requiring 20 gb and 3 threads, allocating 6 cores and 12 cpus
swarm -t 3 -g 20 -f run_seqtools_vc.sh --module samtools,picard,bwa --verbose 1

# To change the default walltime, use --time=24:00:00

swarm -t 8 -g 24 --module tophat,samtools,htseq -f run_master.sh
cat sw3n17156.o

Environment variables $SLURM

  • $SLURM_MEM_PER_NODE: 32768
  • $SLURM_JOB_ID
  • $SLURM_NNODES: 1
  • $SLURM_CPUS_ON_NODE: 2
  • $SLURM_JOB_CPUS_PER_NODE: 2
  • $SLURM_NODELIST: cn2448

Why a job is pending

Partition and freen

Biowulf nodes are grouped into partitions. A partition can be specified when submitting a job. The default partition is 'norm'. The freen command can be used to see free nodes and CPUs, and available types of nodes on each partition.

We may need to run swarm commands on non-default partitions. For example, not many free CPUs are available in 'norm' partition. Or Total time for bundled commands is greater than partition walltime limit. Or because the default partition norm has nodes with a maximum of 120GB memory.

We can run the swarm command on different partition (the default is 'norm'). For example, to run on b1 parition (the hardware in b1 looks inferior to norm)

swarm -f outputs/run_seqtools_dge_align -g 20 -t 16 \
         --module tophat,samtools,htseq \
         --time=6:00:00 --partition b1 --verbose 1

If we want to restrict the output of freen to the norm nodes, we can use freen | grep -E 'Partition|----|norm' ; see the User Guide.

Partition      FreeNds      FreeCPUs      FreeGPUs  Cores  CPUs  GPUs    Mem   Disk Features
-------------------------------------------------------------------------------------------------------
norm*           8 / 216    3702 / 14748                36    72         373g  3200g cpu72,core36,g384,ssd3200,x6140,ibhdr100
norm*         101 / 522   20030 / 29232                28    56         247g   800g cpu56,core28,g256,ssd800,x2680,ibfdr
norm*         497 / 529   16696 / 16928                16    32         121g   800g cpu32,core16,g128,ssd800,x2650,10g
norm*         504 / 539   29192 / 30184                28    56         247g   400g cpu56,core28,g256,ssd400,x2695,ibfdr
norm*           6 / 7       342 / 392                  28    56         247g  2400g cpu56,core28,g256,ssd2400,x2680,ibfdr

jobhist and track the resource usage

  • jobhist XXXXXXXX will show the resource usage. The Jobid Runtime, MemUsed columns can be used to identify the job that was using most resource
  • To identify the command run by a job, check out the log file <swarm_XXXXXX_Y.o>

Running R scripts

https://hpc.nih.gov/apps/R.html

Running a swarm of R batch jobs on Biowulf

$ cat Rjobs
R --vanilla < /data/username/R/R1  > /data/username/R/R1.out
R --vanilla < /data/username/R/R2  > /data/username/R/R2.out
# swarm -g 32 -t 10 -f /home/username/Rjobs --module R/3.6.0 --time=6:00:00

Pay attention to the default wall time (eg 2 hours) and various swarm options. See Swarm.

Parallelizing with parallel.

  • The following is modified from biowulf R's webpage. I change it so it works on non-biowulf system (like local Linux, Mac, or Windows).
  • The number of allocated CPUs and available memory are related.
    • Here I assume the Windows and Mac only has a modest RAM (eg 16GB) and the local Linux has enough RAM.
    • The RAM size on the local system can be obtained through the commented lines.
detectBatchCPUs <- function() {
  ncores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK"))
  if (is.na(ncores)) {
    ncores <- as.integer(Sys.getenv("SLURM_JOB_CPUS_PER_NODE"))
  }
  if (is.na(ncores)) {
    # the system is not biowulf
    if (grepl("linux", R.version$os)) {
      # if it is linux, we assume there are enough ram
      # mem <- system('grep MemTotal /proc/meminfo', intern = TRUE)
      # mem <- strsplit(mem, " ")[[1]]
      # mem <- as.integer(mem[length(mem) -1])
      ncores <- future::availableCores()
      # } else if (grepl("darwin", R.version$os)) {
      #   ncores <- 2
    } else ncores <- 2
  }
  return(ncores)
}
ncpus <- detectBatchCPUs() 

options(mc.cores = ncpus) 
mclapply(..., mc.cores = ncpus) 
makeCluster(ncpus)

Some experiences

freen command shows the maximum threads is 56 and the memory is 246GB.

When I run an R script (foreach is employed to loop over simulation runs), I find

  • Assign 56 threads can guarantee 56 simulations run at the same time (check by the jobload command).
  • We need to worry about the RAM size. The larger the threads, the more memory we need. If we don't assign enough memory, weird error message will be spit out.
  • Even assigning 56 threads can help to run 56 simulations at the same time, the actual execution time is longer than when I run fewer simulations.
allocated threads allocated memory number of runs memory used time (min)
56 64 10 30 10
56 64 20 36 13
56 64 56 58 27

Monitor jobs/Delete jobs

https://hpc.nih.gov/docs/userguide.html#monitor

sjobs
watch -n 30 jobload
scancel -u XXXXX
scancel NNNNN
squeue -u XXXX

jobhist 17500  # report the CPU and memory usage of completed jobs.

The other two commands are very useful too jobhist and swarmhist (temporary).

$ cat /usr/local/bin/swarmhist
#!/bin/bash
usage="usage: $0 jobid"
jobid=$1
[[ -n $jobid ]] || { echo $usage; exit 1; }
ret=$(grep "jobid=$jobid" /usr/local/logs/swarm_on_slurm.log)
[[ -n $ret ]] || { echo "no swarm found for jobid = $jobid"; exit; }
echo $ret | tr ';' '\n'

$ jobhist 22038972
$ swarmhist 22038972
date=(SKIP)
 host=(SKIP)
 jobid=22038972
 user=(SKIP)
 pwd=(SKIP)
 ncmds=1
 soptions=--array=0-0 --job-name=swarm --output(SKIP)
 njobs=1
 job-name=swarm
 command=/usr/local/bin/swarm -t 16 -g 20 -f outputs/run_seqtools_vc --module samtools,picard --verbose 1

Show how busy is one node: jobload -n cnXXXX

This will show many jobs running in a node. It'll show USER, JOBID, NODECPUS, TOTALCPUS and TOTALNODES (1).

Show properties of a node: freen -n

Use freen -n.

This is helpful if we want to know the node that is allocated from the output (Nodelist column) of the sjobs command.

$ freen -n
                                           ........Per-Node Resources........  
Partition  FreeNds       FreeCPUs     Cores CPUs   Mem    Disk    Features                                            Nodelist
----------------------------------------------------------------------------------------------------------------------------------
norm*      160/454      17562/25424       28    56    248g   400g   cpu56,core28,g256,ssd400,x2695,ibfdr            cn[1721-2203,2900-2955]
norm*      0/476        5900/26656        28    56    250g   800g   cpu56,core28,g256,ssd800,x2680,ibfdr            cn[3092-3631]       
norm*      278/309      8928/9888         16    32    123g   800g   cpu32,core16,g128,ssd800,x2650,10g              cn[0001-0310]       
norm*      281/281      4496/4496          8    16     21g   200g   cpu16,core8,g24,sata200,x5550,1g                cn[2589-2782,2799-2899]
norm*      10/10        160/160            8    16     68g   200g   cpu16,core8,g72,sata200,x5550,1g                cn[2783-2798]       
...

$ sjobs
User   JobId     JobNa   Part         St  Reason  Runtime     Walltime     Nodes  CPUs   Memory Dependey  Nodelist
=============================================================================================================================
XXX 51944300  sinteracti interactive  R           1:13:32      8:00:00      1     16   32 GB              cn0862
XXX 51946396  myjob      norm         R              2:57     12:00:00      1     10   32 GB              cn0925
=============================================================================================================================

Exit code

https://hpc.nih.gov/docs/b2-userguide.html#exitcodes

Local disk and temporary files

See https://hpc.nih.gov/docs/b2-userguide.html#local and https://hpc.nih.gov/storage/

Walltime limits

$ batchlim
Max jobs per user: 4000
Max array size:    1001

Partition        MaxCPUsPerUser     DefWalltime     MaxWalltime                
---------------------------------------------------------------
norm                     7360         02:00:00     10-00:00:00 
multinode                7560         08:00:00     10-00:00:00 
	turbo qos        15064                         08:00:00
interactive                64         08:00:00      1-12:00:00 (3 simultaneous jobs)
quick                    6144         02:00:00        04:00:00 
largemem                  512         02:00:00     10-00:00:00 
gpu                       728         02:00:00     10-00:00:00 (56 GPUs per user)
unlimited                 128        UNLIMITED       UNLIMITED 
student                    32         02:00:00        08:00:00 (2 GPUs per user)
ccr                      3072         04:00:00     10-00:00:00 
ccrgpu                    448         04:00:00     10-00:00:00 (32 GPUs per user)
forgo                    5760       1-00:00:00      3-00:00:00 

Interactive debugging

Default is 2 CPUs, 4G memory (too small) and 8 hours walltime.

Increase them to 60 GB and more cores if we run something like STAR for rna-seq reads alignment.

sinteractive --mem=32g -c 16 --gres=lscratch:100

The '--gres' option will allocate a local disk, 100GB in this case. The local disk directory will be /lscratch/$SLURM_JOBID.

For RStudio, the example shows to allocate 5GB of temporary space.

For R, it also recommends to allocate a minimal amount of lscratch of 1GB plus whatever lscratch storage is required by your code.

Parallel jobs

Parallel (MPI) jobs that run on more than 1 node: Use the environment variable $SLURM_NTASKS within the script to specify the number of MPI processes.

Reproduciblity/Pipeline

Singularity

Snakemake (and Singularity)

R program

https://hpc.nih.gov/apps/R.html

Find available R versions:

module -r avail '^R$'

where -r means to use regular expression match. This will match "R/3.5.2" or "R/3.5" but not "Rstudio/1.1.447".

(Self-installed) R package directory

On our systems, the default path to the library is ~/R/<ver>/library where where ver is the two digit version of R (e.g. 3.5). However, R won't automatically create that directory and in its absence will try to install to the central packge library which will fail. To install packages in your home directory manually create ~/R/<ver>/library first.

The directory ~/R/x86_64-pc-linux-gnu-library/ was not used anymore in Biowulf.

SSH tunnel

https://hpc.nih.gov/docs/tunneling/

The use of interactive application servers (such as Jupyter notebooks) on Biowulf compute nodes requires establishing SSH tunnels to make the service accessible to your local workstation.

Jupyter

Jupyter on Biowulf

Terminal customization

  • ssh add authorized_keys
  • ~/.bashrc:
    • change PS1
    • add an alias for nano
  • ~/.bash_profile: no change
  • ~/.nanorc, ~/r.nanorc and ~/bin/nano/bin/nano (4.2)
  • ~/.emacs: global-display-line-numbers-mode
  • ~/.vimrc: set number
  • .Rprofile: options(editor="emacs")
  • .bash_logout: no change

tmux for keeping SSH sessions

tmux