Quark Cluster: running larger/multiple jobs
For simple code development and reading/plotting data, you can use the desktop PCs and/or your laptop. However, if you need or produce larger data files, or need more computing power, you can use the quark cluster.
Important
To get access to the quark cluster, please ask your supervisor. They will ask one of the system administrators to add your account to the list of people who have access.
What is a computing cluster/batch farm?
A computing cluster is a set of computers that are used together to perform calculations. The quark cluster consists of a head node, to which you log in and where you can run code interactively, and 7 batch nodes on which you can run programs. Each batch node has 6 cores, so a total of 42 jobs can run in parallel. The idea of a batch farm is that you develop code on the head node and then submit jobs to a batch queue, so that many jobs run in parallel, for example to generate independent sets of events or to process data files that each contain a subset of events.
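As an illustration of dividing work over the 42 slots, the sketch below splits a list of input files into at most 42 chunks, one chunk per job. The names filelist.txt, chunks/ and data_NNN.root are hypothetical placeholders, and this is only one possible way to divide the work.

```shell
#!/bin/bash
# Sketch: divide a list of input files into at most 42 chunks
# (7 batch nodes x 6 cores), one chunk per job.
# filelist.txt, chunks/ and the file names are hypothetical placeholders.

n_slots=$((7 * 6))

# Build an example list of 100 input files
for i in $(seq 1 100); do
    printf 'data_%03d.root\n' "$i"
done > filelist.txt

n_files=$(wc -l < filelist.txt)
# Files per job, rounded up so that no more than n_slots chunks are made
per_job=$(( (n_files + n_slots - 1) / n_slots ))

mkdir -p chunks
split -l "$per_job" filelist.txt chunks/chunk_
echo "$n_files files, $per_job per job"
```

Each file in chunks/ then lists the inputs for one job.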
Specifics of the setup of quark
You can reach the quark cluster by logging into its head node:
ssh -Y quark.science.uu.nl
from any computer on the university network.
Note
The -Y option specifies that a graphical ('X11') connection also needs to be opened. You can also set this option in the ssh options file. Nikhef has a guide to help you configure ssh connections. If you are curious, you can find more information on ssh X11 forwarding.
To access the cluster from outside the university, you first need to ssh to the gateway machine gemini.science.uu.nl (or use VPN). Use your solis-id as username (no capital letter at the start) and your solis password to log in. Your account has to be activated for access to quark; if this is not the case, ask your project supervisor.
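A minimal sketch of an ssh options file that captures the settings above. The host aliases quark and quark-ext, the file name quark_ssh_config and the placeholder your-solis-id are assumptions for illustration. ForwardX11Trusted corresponds to the -Y option, and ProxyJump assumes a reasonably recent OpenSSH client (7.3 or later). The file is written separately here so that an existing ~/.ssh/config is not touched.

```shell
# Write a stand-alone ssh options file (hypothetical name: quark_ssh_config)
cat > quark_ssh_config <<'EOF'
# Direct connection from inside the university network
Host quark
    HostName quark.science.uu.nl
    User your-solis-id
    ForwardX11 yes
    ForwardX11Trusted yes

# Connection from outside, hopping through the gateway machine
Host quark-ext
    HostName quark.science.uu.nl
    User your-solis-id
    ForwardX11 yes
    ForwardX11Trusted yes
    ProxyJump your-solis-id@gemini.science.uu.nl
EOF
```

With this file, `ssh -F quark_ssh_config quark-ext` would connect via the gateway. Once the entries are copied into ~/.ssh/config, the -F option is no longer needed, and scp accepts the same host aliases.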
There is a graphical interface (only accessible from university computers) to see the main parameters of the cluster and find a user manual.
Some more technical information:
- Home directories are not mounted from the university network. You can copy files to and from the cluster with scp.
- There are two file systems:
  - /data1 (20 TB): contains all home directories and software (ROOT, AliRoot).
  - /data2 (100 TB): for larger (shared) data sets, like AODs (ALICE event data files) and simulation output.
- The batch system on the farm is SGE (more details below):
  - qsub -V -cwd <exec> to submit a job; <exec> is the name of the executable, usually a shell script that invokes your program with suitable arguments.
  - qstat to see your jobs in the queue.
  - qdel to delete a job from the queue.
Note
Not sure what a bash script is? You can check some examples or search for some tutorials.
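As a rough sketch of how these commands fit together, the example below writes a small job script and submits it once per input file. The names run_job.sh, my_analysis and file_00N.root are hypothetical, as is the DRY_RUN switch, which only prints the qsub commands by default so that the loop can be checked before anything is submitted.

```shell
#!/bin/bash
# Sketch: create a job script and submit one job per input file.
# run_job.sh, my_analysis and the file names are hypothetical placeholders.

cat > run_job.sh <<'EOF'
#!/bin/bash
# $1 is the input file given on the qsub command line
echo "Processing $1 on ${HOSTNAME}"
# ./my_analysis "$1"   # hypothetical program; replace with your own
EOF
chmod +x run_job.sh

# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to submit
DRY_RUN=${DRY_RUN:-1}
for f in file_001.root file_002.root file_003.root; do
    if [ "$DRY_RUN" = 1 ]; then
        echo "qsub -V -cwd run_job.sh $f"
    else
        # -V passes the current environment, -cwd runs in this directory
        qsub -V -cwd run_job.sh "$f"
    fi
done
```

After submitting, qstat shows the jobs in the queue, and qdel followed by a job ID removes one.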
- Software is installed locally in /data1/software/.
To set up a basic environment, add the following to your .bashrc file:
module load python/2.7
export PATH=/cm/local/apps/environment-modules/3.2.10/Modules/3.2.10/bin/:$PATH
export ALIBUILD_WORK_DIR=/data1/software/alisoft
unset PYTHONPATH
To load the environment before starting aliroot (you can also define an alias in your .bashrc file), use:
alienv enter --architecture slc6_x86-64 --shellrc AliPhysics/latest-master-root6
To define aliases, you can add, for example, the following to your .bashrc file:
alias alienv='alienv --architecture slc6_x86-64'
alias ali='alienv enter --shellrc AliPhysics/latest-master-root6,AliRoot-OCDB/latest-release'