The Hyperion Cluster - Info for Users
NOTE: The old Hyperion Cluster, which this page refers to, no longer exists in the form described here;
it has been assimilated into the newer Quantum Hyperion cluster as a sub-cluster, and its modules and packages are now different!
This is but a remnant of its old documentation.
Table of Contents
- Hyperion Cluster Hardware
- Hyperion Cluster Statistics
- Hyperion Cluster Helpdesk
- Grid Engine User Basics
- Shell Modules
- Currently Available Modules
(binutils,
gaussian,
gcc,
gcc-5.4,
gmp,
hwloc/netloc,
isl,
knem,
mpc,
mpfr,
openmpi-1.10,
openmpi-2.0,
openucx,
libibverbs,
libnl,
numactl/libnuma,
vdpau,
xds,
xpmem)
- Other Currently Available Software Outside of Environment Modules
(CUDA,
nvidia-drivers,
Grid Engine)
- Resources
- Parallel Environments
(make,
orte,
smp,
quant4,
quant8,
quant16,
quant32,
gaussian,
gaussian-linda,
xds)
- Special Software Available on the Hyperion Cluster
Hyperion Cluster Hardware
Find the description of the Hyperion cluster hardware at the following link: Hyperion Cluster Hardware
Hyperion Cluster Statistics
You can see some usage statistics of the Hyperion cluster at the following link: Hyperion Cluster Ganglia statistics (no longer available)
Hyperion Cluster Helpdesk
If you have any trouble, problem, question, or request for an improvement, new software, or features that you would like to see on the Hyperion cluster, please file your issue under the appropriate component of the Hyperion Cluster product at the following Bugzilla site.
To access the site you need a faculty account, with which you must log into Bugzilla in order to enter a new issue, see issues currently being solved, and optionally join some of them to be informed of their progress and related discussions.
Grid Engine User Basics
How to login to the Hyperion cluster?
- from Linux: ssh username@hyperion.fjfi.cvut.cz
- from Windows: Use PuTTY or any other terminal-emulation client for Windows with SSH capability and connect to the server hyperion.fjfi.cvut.cz with your username and your FJFI domain password.
- You need to have an account at the Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague,
and you need to have permission from one of the Hyperion cluster administrators (Martin Drab or Pavel Strachota).
How to see the current status of the cluster and of the queues which are available to me?
- qstat -f -U $USER
- For more details and other arguments see: man qstat
How to see all queues currently available on the cluster?
- qstat -g c
- For more details and other arguments see: man qstat
How to see all cluster queues to which I currently have access?
- qstat -g c -U $USER
- For more details and other arguments see: man qstat
How to see all jobs currently running/scheduled on the cluster?
- qstat -u \*
- For more details and other arguments see: man qstat
How to see my jobs currently running/scheduled on the cluster?
- qstat -u $USER
- For more details and other arguments see: man qstat
How to launch a simple single-slot/process job on the cluster?
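- A single-slot job is submitted the same way as the multi-slot MPI job described in the next paragraph, just without the parallel environment request. A minimal sketch of a job script single_job.sh (the job name, queue and program name are illustrative):
# Job name
#$ -N Single_Test1
# Launch the script with the current working directory set to the directory from which the job was submitted
#$ -cwd
# Queue in which we want to be scheduled
#$ -q all.q
# Shell by which this script shall be launched
#$ -S /bin/bash
./my_single_process_program
- Then submit it by issuing
qsub single_job.sh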
How to launch a simple multi-slot/process MPI job on the cluster?
- Let us have a simple C program mpi_test1.c. The program does nothing fancy: it just uses the Open MPI environment to report which instance of the program it is out of how many were launched, where the scratch directory for that process points, how big the scratch space is, what permissions it has, and which UID and GID own the scratch directory; then it quits. Plain and simple. Let us compile the program by issuing
module load hyp-openmpi-1.10-mod.x86_64
mpicc -O2 -o mpi_test1 mpi_test1.c
- Let us construct a job script job.sh for launching the job in the all.q queue with the make
parallel environment and (say) 60 slots (which basically means processes in Grid Engine terms), and let the job be named "MPI_Test1". The file will contain the following:
# Job name
#$ -N MPI_Test1
# Launch the script with the current working directory set to the directory from which the job was submitted
#$ -cwd
# Queue in which we want to be scheduled
#$ -q all.q
# Shell by which this script shall be launched
#$ -S /bin/bash
# Parallel environment "make" with 60 slots to be used
#$ -pe make 60
module load hyp-openmpi-1.10-mod.x86_64
mpirun ./mpi_test1
Notice that we need to load the hyp-openmpi-1.10-mod.x86_64 module from within the job file, even though we have already loaded it during the compilation. This is because the compilation is done on the login node (or anywhere else), generally in a completely different shell instance than the one in which each individual process of the job is going to be launched. See the section "Shell Modules" below to learn about shell modules.
- Now we can just submit the job for running by issuing
qsub job.sh
- It is also a good practice to specify how long your job is supposed to (or may at most) run by passing the -l h_rt=time and
-l s_rt=time arguments to qsub, where time is a "real" time (also called "elapsed" or "wall clock" time), such as 1:02:03 for 1 hour, 2 minutes and 3 seconds.
The s_rt is a soft real-time limit, after which the job is gently notified by the SIGUSR1 signal that its time is up and it should probably think about ending itself.
The h_rt is a hard real-time limit, after which the job receives SIGKILL and is immediately killed by the system without any chance of saving results that were not already saved, or, simply put, without the slightest chance to do anything.
In theory you can specify any time you want (and if you do not specify a queue in which to schedule your job, only queues which allow the requested running times are considered); however, the requested time cannot exceed the limits set on particular queues, if any. If the request exceeds what a particular queue allows, the job is refused on that queue.
- Another good practice is to specify how much memory your job is going to use by passing the -l s_vmem=bytes and
-l h_vmem=bytes arguments, where bytes is the maximum number of bytes (optionally with size suffixes, such as "M" for megabytes) used by one process of the job.
Again, when the s_vmem limit is reached, the process receives a SIGXCPU signal to warn it that it is approaching the limit.
And when the h_vmem limit is reached, the process receives SIGKILL and is killed immediately.
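- For instance, a submission combining both kinds of limits, requesting at most 2 hours of wall-clock time (with a soft warning 5 minutes before that) and at most 2 GB of virtual memory per process, might look like this (a sketch; the limit values are illustrative and job.sh is the script from the example above):
qsub -l s_rt=1:55:00 -l h_rt=2:00:00 -l s_vmem=1900M -l h_vmem=2G job.sh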
How to delete my submitted job?
- qdel job_identifier
- Where job_identifier is either the job name given by the -N option to qsub or the job ID listed by the appropriate qstat command.
- For more details and other arguments see: man qdel
How to see how much of each resource is currently available on each node?
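- One way to check (a sketch; note that qstat reports resource values per queue instance, i.e. per queue and node):
qstat -F
- To show only one particular resource, for instance the scratch resource described in the section "Resources" below (the same command is shown there):
qstat -q all.q -F scratch
- For more details and other arguments see: man qstat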
Shell Modules
Some of the libraries and utilities on the cluster are accessed via the modules package.
By loading a specific module, the shell environment is modified so that the particular library or utility becomes available for use.
This makes it possible to have multiple versions of the same library or utility installed on the cluster at the same time and to let each job/script/user choose which one to use.
- To see the list of available modules, use
module avail
The modules with the libraries and utilities available on the cluster are listed in the section under "/etc/modulefiles". For libraries, the module name usually also incorporates the architecture for which the library is built (mostly x86_64).
- To show information about a specific module (which environment variables it sets and how), use
module show module_name
For instance, to show information about the default OpenMPI, use
module show hyp-openmpi-1.10-mod.x86_64
- To use a specific module in the current shell environment, use
module load module_name
For instance, to use OpenMPI (say, the well-tested Open MPI 1.10.4), use
module load hyp-openmpi-1.10-mod.x86_64
- After you are done using a specific module, you can unload it from the current shell environment by issuing
module unload module_name
However, when the script within which you have loaded the module ends, the module is automatically unloaded, as the whole environment of that shell is destroyed. So, unless you need to switch between modules within one script, you do not need to bother with unloading. (A short example session follows at the end of this list.)
- If you want to see the list of your currently loaded modules, use
module list
- For more details and functions on handling the shell modules, see man module.
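- For illustration, a typical session when compiling an MPI program might look like this (a sketch reusing the Open MPI module and the mpicc command shown elsewhere on this page):
module avail
module load hyp-openmpi-1.10-mod.x86_64
module list
mpicc -O2 -o mpi_test1 mpi_test1.c
module unload hyp-openmpi-1.10-mod.x86_64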
Currently Available Modules
- hyp-binutils-mod.x86_64
- Binutils 2.27 - A GNU collection of binary utilities (GNU Assembler, GNU Linker, ...).
- hyp-xpmem-mod.x86_64
- Cross Partition Memory (xpmem) GIT 20161228 - Enables a process to map the memory of another process into its virtual address space.
- This is an experimental version of XPMEM based on a version provided by Cray and uploaded to Google Code.
- The term partition on Cray systems originally referred to a single piece of software (or a process) running on the system. So, cross-partition sharing in fact means sharing between processes.
- This is a method which can be used to exchange data or messages between individual processes running on a single node via shared memory; a sort of fast RDMA within one node.
- Requires the xpmem kernel module to be loaded in order for this to work.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- gaussian-09d
- Gaussian 09D - A computing tool providing capabilities for electronic structure modelling (quantum mechanics, energy predictions, molecular structures, vibrational frequencies, and molecular properties and reactions in a variety of chemical environments), used by chemists, chemical engineers, biochemists, physicists and other scientists worldwide.
- hyp-gmp-mod.x86_64
- GMP 6.1.1 - GNU Multiple Precision arithmetic library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating-point numbers.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- hyp-gcc-mod-alias.x86_64
- hyp-gcc-5.4-mod.x86_64
- GNU Compiler Collection 5.4.0 - A collection of GNU compilers including C, C++, Fortran, Go, JIT, LTO, Objective-C, and Objective-C++.
- Automatically includes the following modules: hyp-binutils-mod.x86_64, hyp-gmp-mod.x86_64, hyp-mpfr-mod.x86_64, hyp-mpc-mod.x86_64, and hyp-isl-mod.x86_64.
- hyp-isl-mod.x86_64
- ISL 0.17.1 - A library for manipulating sets and relations of integer points bounded by linear constraints.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- Automatically includes the hyp-gmp-mod.x86_64 module as this ISL library depends on that specific installation of GMP 6.1.1.
- hyp-knem-mod.x86_64
- KNEM 1.1.2 - KNEM is a Linux kernel module enabling high-performance intra-node MPI communication for large messages.
- Offers support for asynchronous and vectorial data transfers as well as offloading memory copies on to Intel I/OAT hardware.
- MPI implementations usually offer a user-space double-copy based intra-node communication strategy. It's very good for small message latency, but it wastes many CPU cycles, pollutes the caches, and saturates memory busses. KNEM transfers data from one process to another through a single copy within the Linux kernel. The system call overhead (about 100ns these days) isn't good for small message latency but having a single memory copy is very good for large messages (usually starting from dozens of kilobytes).
- Some vendor-specific MPI stacks (such as Myricom MX, Qlogic PSM, ...) offer similar abilities but they may only run on specific hardware interconnect while KNEM is generic (and open-source). Also, none of these competitors offers asynchronous completion models, I/OAT copy offload and/or vectorial memory buffers support as KNEM does.
- Requires the knem kernel module to be loaded in order for this to work.
- hyp-mpc-mod.x86_64
- MPC 1.0.3 - C library for the arithmetic of complex numbers with arbitrarily high precision and correct rounding of the result.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- Automatically includes the hyp-gmp-mod.x86_64 and hyp-mpfr-mod.x86_64 modules, as this MPC library depends on those two.
- hyp-mpfr-mod.x86_64
- MPFR 3.1.4 - C library for multiple-precision floating-point computations with correct rounding.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- Automatically includes the hyp-gmp-mod.x86_64 module as this MPFR library depends on that specific installation of GMP 6.1.1.
- hyp-libnl-mod.x86_64
- Netlink Protocol Library Suite (libnl) 3.2.25 - The libnl suite is a collection of libraries providing APIs to netlink protocol based Linux kernel interfaces. Netlink is an IPC mechanism primarily between the kernel and user-space processes. It was designed to be a more flexible successor to ioctl, providing mainly networking-related kernel configuration and monitoring interfaces.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- hyp-numactl-mod.x86_64
- numactl/libnuma 2.0.11 - The numactl program allows you to run your application program on specific CPUs and memory nodes. It does this by supplying a NUMA memory policy to the operating system before running your program. The libnuma library provides convenient ways for you to add NUMA memory policies into your own program.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- hyp-libibverbs-mod.x86_64
- OFED InfiniBand/RDMA Verbs Library (libibverbs) 1.2.1 - A library for direct use of InfiniBand/RDMA verbs.
- This is the production implementation to be used in applications; it is compiled without Valgrind support. If you use Valgrind to debug software using this library, it will complain about accesses to uninitialized memory, which is not a problem, since the memory is initialized by the kernel and Valgrind has no way of knowing that. For debugging purposes please use the debugging version with Valgrind support instead (to be added later as a separate module ...).
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- Intel InfiniPath HCA Userspace Driver (libipathverbs) 1.3 - A userspace driver for the Intel InfiniBand HCAs (Host Channel Adapters).
- It works as a plug-in module for libibverbs that allows programs to use Intel HCAs directly from userspace.
- libipathverbs will be loaded and used automatically by programs linked with libibverbs. The ib_qib
kernel module, which is the kernel-space driver counterpart to libipathverbs, must be loaded for Intel HCA devices to be detected and used.
(NOTE: Intel's ib_qib is an updated version of and a replacement for the older QLogic ib_ipath, which is nowadays still maintained and kept around, but only to manage the old
HTX-based QLogic InfiniBand HCAs; all the latest PCI Express QLE-series SDR, DDR and QDR InfiniBand HCAs should use the new
ib_qib.)
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- OpenFabrics Alliance InfiniBand User Management Datagram Library (libibumad) 1.3.10.2 - OpenIB user management datagram library functions which sit on top of the user management datagram modules (like ib_umad) in the kernel.
- Its functions are used by the IB diagnostic and management tools, including OpenSM.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- This instance of the library is also compiled without Valgrind support; if you use Valgrind to debug software using this library, please use the debugging version with Valgrind support instead (to be added later as a separate module ...).
- OpenFabrics Alliance InfiniBand Management Datagram Library (libibmad) 1.3.12 - A convenience library to encode, decode, and dump InfiniBand management datagram (MAD) packets.
- It is implemented on top of and in conjunction with libibumad (the user MAD kernel interface library).
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- OpenFabrics Alliance InfiniBand Userspace RDMA Connection Manager (librdmacm) 1.1.0 - A library providing a userspace RDMA Communication Management API.
- General RDMA communication manager. Used to establish connections over any RDMA transport, including InfiniBand and iWARP.
- Applications that wish to run over any RDMA device should use this library.
- Includes an RDMA 'socket' API and protocol useful for developers that wish to take advantage of RDMA hardware, but desire a TCP/IP socket programming model.
- It is implemented directly on top of libibverbs.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- OpenFabrics Alliance User Space InfiniBand Connection Manager (libibcm) 1.0.5 - A userspace library that handles the majority of the low-level work required to open an RDMA connection between two machines.
- This is an InfiniBand specific communication manager. It is used to establish connections over InfiniBand.
- The librdmacm is the recommended library for most applications, since it is easier to use and takes advantage of IP based addressing.
- Applications that require greater control over the connection or are unable to use IP addresses should use the libibcm.
- It is implemented directly on top of libibverbs.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- User Space Direct Access Transport v2.0 Library / User Space Direct Access Programming Library and Tools (uDAT/uDAPL) 2.1.10 - Defines a single set of user-level transport-independent platform-standard APIs for all RDMA-capable Transports.
- DAPL is currently targeted at the following application domains: DAFS, homogeneous and heterogeneous clusters/databases, Sockets using RDMA capabilities (SDP), Message Passing Interface (MPI), SCSI RDMA Protocol (SRP) and iSCSI Extensions for RDMA (iSER).
- DAPL currently considers the following transport mechanisms providing RDMA capabilities: InfiniBand, Virtual Interface Architecture and iWARP.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- Intel Performance Scaled Messaging Libraries (infinipath-psm) 3.3-19_g67c0807_open - Intel's low-level user-level communications interface for the True Scale family of products.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- Intel Performance Scaled Messaging 2 Library (opa-psm2) 2.1 (git snapshot from 21.12.2016) - Intel's low-level user-level communications interface for the Intel Omni-Path Architecture family of products.
- Not really usable for us now, since we do not have any of the Intel Omni-Path Architecture family products. We should stick with older PSM(1).
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- OpenFabrics Interfaces Library (libfabric) 1.4.0 - A framework focused on exporting fabric communication services to applications.
- OFI is best described as a collection of libraries and applications used to export fabric services.
- Libfabric is a core component of OFI. It is the library that defines and exports the user-space API of OFI, and is typically the only software that applications deal with directly. It works in conjunction with provider libraries, which are often integrated directly into libfabric.
- The goal of OFI, and libfabric specifically, is to define interfaces that enable a tight semantic map between applications and underlying fabric services. Specifically, libfabric software interfaces have been co-designed with fabric hardware providers and application developers, with a focus on the needs of HPC users.
- Libfabric supports multiple interface semantics, is fabric and hardware implementation agnostic, and leverages and expands the existing RDMA open source community.
- Libfabric is designed to minimize the impedance mismatch between applications, including middleware such as MPI, SHMEM, and PGAS, and fabric communication hardware. Its interfaces target high-bandwidth, low-latency NICs, with a goal to scale to tens of thousands of nodes.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- This is the InfiniBand module. You'll need it when you want to use InfiniBand utilities or libraries.
- Automatically includes the hyp-libnl-mod.x86_64 and hyp-gcc-5.4-mod.x86_64 modules.
- hyp-openmpi-1.10-mod.x86_64
- Open MPI 1.10.4 - The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.
- Currently compiled without InfiniBand support and without other advanced features like OpenSHMEM, PSM, KNEM, XPMEM, UCX, etc. Hopefully, support for some (or most) of these features will be added later, one by one, as the underlying layers are built.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- Automatically includes the hyp-gcc-5.4-mod.x86_64, hyp-libnl-mod.x86_64, hyp-libibverbs-mod.x86_64, hyp-numactl-mod.x86_64 and hyp-hwloc-mod.x86_64 modules.
- hyp-openmpi-2.0-mod.x86_64
- Open MPI 2.0.1 - The Open MPI Project is an open source Message Passing Interface implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.
- Currently compiled without InfiniBand support and without other advanced features like OpenSHMEM, PSM, KNEM, XPMEM, UCX, etc. Hopefully, support for some (or most) of these features will be added later, one by one, as the underlying layers are built.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- NOTE: If you have problems compiling/running your software with Open MPI 2.0, please try using Open MPI 1.10 instead, since the 2.0 version is still rather new.
- Automatically includes the hyp-gcc-5.4-mod.x86_64, hyp-libnl-mod.x86_64, hyp-libibverbs-mod.x86_64, hyp-numactl-mod.x86_64 and hyp-hwloc-mod.x86_64 modules.
- hyp-openucx-mod.x86_64
- Open Unified Communication X (Open UCX) 8bf075a - An open-source, production-grade communication framework for data-centric and high-performance applications.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- Automatically includes the hyp-gcc-5.4-mod.x86_64, hyp-libnl-mod.x86_64, hyp-libibverbs-mod.x86_64, hyp-numactl-mod.x86_64 and hyp-xpmem-mod.x86_64 modules.
- hyp-hwloc-mod.x86_64
- Portable Hardware Locality (hwloc) 1.11.5 - Provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various system attributes such as cache and memory information as well as the locality of I/O devices such as network interfaces, InfiniBand HCAs or GPUs.
- Portable Network Locality (netloc) 0.5 - Provides network topology discovery tools, and an abstract representation of those network topologies for a range of network types and configurations. It is provided as a companion to the Portable Hardware Locality (hwloc) package.
- Optimized for particular CPU => do not do static linking, if you wish to run the program on nodes with a different CPU!
- Automatically includes the hyp-numactl-mod.x86_64 module.
- hyp-vdpau-mod.x86_64
- VDPAU library 1.1.1 (+ VDPAU Info 1.0) - VDPAU is the Video Decode and Presentation API for UNIX. It provides an interface to video decode acceleration and presentation hardware present in modern GPUs. The library used by applications that wish to use VDPAU is libvdpau. This is a wrapper library that loads the appropriate implementation backend. There is also a tracing library that can be used to debug VDPAU applications.
- hyp-xds-mod.x86_64
- XDS X-ray Detector Software for processing single-crystal monochromatic diffraction data recorded by the rotation method.
- The software is always licensed for just one year at a time, so the current installation should stop working after June 30, 2017, when it should again be replaced by a new version.
- Some additional modules
Other Currently Available Software Outside of Environment Modules
- nVidia CUDA Toolkit 8.0.44
- The nVidia CUDA Toolkit provides a comprehensive development environment for C and C++ developers building GPU-accelerated applications. The CUDA Toolkit includes a compiler for nVidia GPUs, math libraries, and tools for debugging and optimizing the performance of your applications. You’ll also find programming guides, user manuals, API reference, and other documentation to help you get started quickly accelerating your application with GPUs.
- NOTE: To use CUDA on a GPU or do any GPU-related operations (currently available on node22), one needs to be a part of the nvidia group! If you feel you need to be a part of that group and are not, turn to the administrators, preferably via the Hyperion helpdesk.
- nVidia Drivers 375.26
- Drivers for nVidia graphics cards. (Currently mainly for the GTX Titan available on node22 and to use CUDA.)
- Open Grid Scheduler/Grid Engine 2011.11 patch 1
- Open Grid Scheduler/Grid Engine is a commercially supported open-source batch-queuing system for distributed resource management. OGS/GE is based on Sun Grid Engine.
Resources
What follows is a list of specially defined resources on the Hyperion Cluster, which users can use either to
claim resources or to run in a queue which provides them. These resources can be requested with the -l resource=value
argument of the qsub command when submitting a job.
- scratch
- Size of the scratch partition assigned to each slot of a job.
- Scratch space is a fast-access space available to a job on the node's local disk for local temporary data storage. It is fast and is local to the node, so it does not affect the network speed or jobs on other nodes.
- Multiple slots of one job on one node aggregate the reserved space, which means that if multiple processes of one job are running on one node, they share one scratch space whose size is the sum of the scratch spaces assigned to each of their slots (minus the filesystem overhead, of course).
- Requestable consumable.
- Integer-type argument giving the partition size in LVM PE units (1 LVM PE = 4 MB).
- Each node has a defined maximum number of PEs available (for nodes 2,…,22 it is currently 108338 LVM PEs = approx. 423.19 GB per node).
- The default value is 4, i.e. a 16 MB partition.
- When the combined value for the job for any given node is below 4, the job shall not be executed and shall be placed in an error state, with the appropriate reason being printed out to the standard error output of the job.
- The directory of the job's scratch partition is pointed to by the TMP and TMPDIR environment variables, which the user should use for accessing the scratch (see the job-script sketch at the end of this list).
- The scratch for a job is created when the job is launched and is destroyed, together with all data stored there, right after the job terminates (for any reason, regular or not).
- During the lifespan of the job, the job can use the entire space of the scratch partition as it sees fit.
- Example: Submit a single-process job with scratch space on a partition of size 10 LVM PEs = 10 * 4 MB = 40 MB.
qsub -l scratch=10 single_thread_job.sh
- Example: Submit a 35-process parallel job with 10 LVM PEs = 10 x 4 MB = 40 MB of scratch partition size per process. If the job, for instance, spans two nodes, with 32 processes on one node and 3 processes on the other, it shall have a scratch partition of size 32 x 40 MB = 1280 MB on the first node and of size 3 x 40 MB = 120 MB on the other node. (Assuming the make parallel environment is set up to distribute jobs appropriately, as we propose.)
qsub -l scratch=10 -pe make 35 parallel_job.sh
- How to find the amount of currently available scratch space on each node? (We need to ask per queue, and the all.q queue covers all nodes, which is why we explicitly specify this queue; of course you can specify any other queue as well, if you wish.)
qstat -q all.q -F scratch
The resulting numbers for each node are again given in LVM PE units (see above).
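- Example (a sketch; the requested scratch size, the file names and the program name are illustrative): a job-script fragment that stages data through the scratch partition via TMPDIR:
#$ -l scratch=2560
cp input.dat $TMPDIR/
./my_program $TMPDIR/input.dat $TMPDIR/output.dat
cp $TMPDIR/output.dat ./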
To see the list of all available complexes (a superset of resources) use
qconf -sc
For an explanation of the listed columns, see man complex; the meaning of some of the standard complexes and resources is described in man queue_conf in the section "RESOURCE LIMITS". Also see the paragraph "How to launch a simple multi-slot/process MPI job on the cluster?" above for a description of several other important complexes.
Parallel Environments
What follows is a description of the general and special-purpose parallel environments defined on the Hyperion cluster. These are used with the -pe parallel-environment [slots] argument of qsub.
A parallel environment, in Grid Engine terms, is a configuration which tells the Grid Engine scheduler how to choose the cluster execution nodes that are going to be used for a job with more than one process, and how to distribute the individual processes of the job onto the nodes and their slots.
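For example, to submit the job script job.sh from the MPI example above with 16 slots spread over the nodes in a round-robin fashion, one would use the orte parallel environment described below (a sketch; the slot count is illustrative):
qsub -pe orte 16 job.sh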
- make
- General purpose parallel environment with the fill-up allocation rule.
- This means that it tries to concentrate the processes/slots of the submitted job on as few nodes as possible by first filling the currently allocated node with processes/slots before allocating processes/slots on the next available node.
- This parallel environment is good if your job benefits from having its processes/slots concentrated on a smaller number of nodes, for instance for SMP parallelism or for possibly faster intra-node communication.
- orte
- General purpose parallel environment with the round-robin allocation rule.
- This means that it tries to spread the processes/slots of the submitted job over as many nodes as possible by first putting just one process/slot of your job on each available node; only when there is no free node left on which your job is not yet scheduled does it go over the nodes again and put one more process/slot on each node with free slots, again in a round-robin fashion.
- This parallel environment is good if your job benefits from having its processes/slots spread across as many nodes as possible, ideally with one process/slot per node, for instance if each of your processes requires more scratch space or memory, or performs local-disk-intensive operations.
- smp
- General purpose parallel environment with the pe-slots allocation rule.
- This means that it tries to place all processes/slots of the submitted job on a single node. If no node is available which can run the entire job, the job is not run.
- This parallel environment is good if your job only does SMP parallelism, such as with the use of OpenMP, or if for some other reason you only want your job to run on one node (for instance, if you need all processes/threads to access the same scratch).
- quant4
- General purpose parallel environment with the 4-slot granularity allocation rule.
- This means that it allocates exactly 4 slots/processes on each node assigned to your job.
- It is also a good idea to request (and use) a number of slots that is a multiple of 4, if you really want to use this feature to the maximum.
- This parallel environment is good if your job benefits from 4-way SMP parallel processing on each node.
- quant8
- Same as the quant4, but for multiples of 8 instead of 4.
- quant16
- Same as the quant4, but for multiples of 16 instead of 4.
- quant32
- Same as the quant4, but for multiples of 32 (which currently is the maximum possible for our exec nodes) instead of 4.
- gaussian
- gaussian-linda
- Special parallel environment to be used with the Gaussian software in Linda multi-node 8-way shared-memory parallel (SMP) mode.
- NOTE: This method of using Gaussian does not really work at the moment, so please use the gaussian parallel environment for Gaussian instead. The reasons why it does not work are unknown; the suspicion is that Linda just does not work properly when launched like this, even though officially it should work. (??)
- Allocates Gaussian Linda workers of 8 SMP threads per worker across multiple nodes, one worker per node.
- BEWARE: In order to use Gaussian, you need to be a member of the gaussian group. If you feel you need to be a member of that group and are not, turn to the head of the Solid State Engineering Department for permission and to the administrators to have the permission put into effect.
- Example: Assume we have a Gaussian input file gaussian_test.gjf beginning with something like:
$RunGauss
...
but without the %NProcShared=... header line. The number of processes per worker and the individual worker nodes are automatically supplied to the Linda by the parallel environment via the GAUSS_WDEF and GAUSS_PDEF shell environment variables.
Then we can create a job file gaussian_linda_test.job looking like:
#$ -N gaussian_linda_test
#$ -cwd
#$ -q all.q
#$ -S /bin/bash
#$ -l scratch=3385
#$ -pe gaussian-linda 16
module load gaussian-09d
g09 < gaussian_test.gjf > gaussian_test.log
We can launch the job like this:
qsub gaussian_linda_test.job
The number given on the -pe gaussian-linda ...
line in the job file should be a multiple of 8!
- xds
Special Software Available on the Hyperion Cluster
The contents of this page were written and are maintained by Martin Dráb © 2016 KIPL FJFI CVUT