{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Kamiak Cluster at WSU" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we document our experience using the Kamiak HPC cluster at WSU." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resources" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Kamiak Specific\n", "\n", "* [Kamiak Users Guide](https://hpc.wsu.edu/users-guide/): Read this.\n", "* [Service Requests](https://hpc.wsu.edu/support/service-requests/): Request access to Kamiak here and use this for other service requests (software installation, issues with the cluster, etc.)\n", "* [Queue List](https://hpc.wsu.edu/kamiak-hpc/queue-list/): List of queues.\n", "\n", "### General\n", "* [SLURM](https://slurm.schedmd.com): Main documentation for the current job scheduler.\n", "* [Lmod](http://lmod.readthedocs.org): Environment module system.\n", "* [Conda](https://conda.io/en/latest/): Package manager for python and other software.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## TL;DR" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have read everything below, then you can use this job script.\n", "\n", "Notes:\n", "\n", "* Make sure that you can clone everything without an SSH agent. (I.e. any pip-installable packages.)" ] }, { "cell_type": "markdown", "metadata": { "heading_collapsed": true }, "source": [ "### Python on a Single Node\n", "\n", "If you are running only on a single node, then it makes sense to create an environment that uses a `/local` scratch space, since this is the fastest storage available. Here we create the environment in our SLURM script, storing its location in `my_workspace`.\n", "\n", "\n", "```bash\n", "#!/bin/bash\n", "#SBATCH -n 1 # Number of cores\n", "#SBATCH -t 0-00:10 # Runtime in D-HH:MM\n", "\n", "# Local workspace for install environments.\n", "# This will be removed at the end of the job.\n", "my_workspace=\"$(mkworkspace --backend=/local --quiet)\"\n", "\n", "function clean_up { # Clean up. Remove temporary workspaces and the like.\n", " rm -rf \"${my_workspace}\"\n", " exit\n", "}\n", "trap 'clean_up' EXIT\n", "\n", "# TODO: Why does hg-conda not work here?\n", "module load conda mercurial\n", "conda activate base\n", "\n", "# TODO: Make this in /scratch for long-term use\n", "export CONDA_PKGS_DIRS=\"${my_workspace}/.conda\"\n", "conda_prefix=\"${my_workspace}/current_conda_env\"\n", "#conda env create -f environment.yml --prefix \"${conda_prefix}\"\n", "mamba env create -q -f environment.yml --prefix \"${conda_prefix}\"\n", "conda activate \"${conda_prefix}\"\n", "\n", "... 
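# NOTE (added sketch): everything under "${my_workspace}" is removed by
# clean_up when this job ends, so copy any results you need back to
# permanent storage (home or /scratch) before then, e.g. with a
# hypothetical results directory:
#
#     cp -r "${my_workspace}/results" ~/
#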
# Do your work.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the cluster requires understanding the following components:\n", "\n", "### Obtaining Access\n", "\n", "Request access by submitting a [service request](https://hpc.wsu.edu/support/service-requests/). Identify your advisor/supervisor.\n", "\n", "### Connecting\n", "\n", "To connect to the cluster, use SSH. I recommend generating and installing an SSH key so you can connect without a password.\n", "\n", "### Jobs and Queues\n", "\n", "All activity – including development, software installation, etc. – must be run on the compute nodes. You gain access to these by submitting a job to the appropriate job queue (scheduled with SLURM). There are two types of jobs:\n", "\n", "* Dedicated jobs: If you or your supervisor own nodes on the system, you can submit jobs to the appropriate queue and gain full access to these, kicking anyone else off. Once you have access to your nodes, you can do what you like. An example would be the [CAS queue `cas`](https://hpc.wsu.edu/kamiak-hpc/queue-list/#builder-section-1452018328352).\n", "* Backfill jobs: The default is to submit a job to the [Backfill queue `kamiak`](https://hpc.wsu.edu/kamiak-hpc/queue-list/). These will run on whatever nodes are not occupied, but can be preempted by the owners of the nodes. 
For this reason, you **must** implement a checkpoint-restart mechanism in your code so you can pick up where you left off when you get preempted.\n", "\n", "On top of these, you can choose either background jobs (for computation) or interactive jobs (for development and testing).\n", "\n", "### Resources\n", "\n", "When you submit a job, you must know:\n", "\n", "* How many nodes you need.\n", "* How many processes you will run.\n", "* Roughly how much memory you will need.\n", "* How long your job will take.\n", "\n", "Make sure that your actual usage matches your request. To do this, you must **profile your code**. Understand the expected memory and time usage before you run, then actually test this to make sure your code is doing what you expect. If you exceed the requested resources, you may slow down the cluster for other users. E.g., launching more processes than there are threads on a node will cause thread contention, significantly impacting the performance of your program and that of others.\n", "\n", "**Nodes are a shared resource - request only what you need and do not use more than you request.**\n", "\n", "### Software\n", "\n", "Much of the software on the system is managed by the [Lmod](http://lmod.readthedocs.org) module system. Custom software can be installed by sending service requests, or built in your own account. I maintain an up-to-date [conda](https://conda.io/en/latest/) installation and various environments." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Preliminary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SSH\n", "\n", "To connect to the cluster, I recommend configuring your local SSH client with something like this. 
(**Change `m.forbes` to your username!**)\n", "\n", "```bash\n", "# ~/.ssh/config\n", "Host kamiak\n", " HostName kamiak.wsu.edu\n", " User m.forbes\n", " ForwardAgent yes\n", "\n", "Host cn*\n", " ProxyCommand ssh kamiak nc %h %p\n", " User m.forbes\n", " ForwardAgent yes\n", " # The following are for jupyter notebooks. Run with:\n", " # jupyter notebook --port 18888 \n", " # and connect with \n", " # https://localhost:18888\n", " ######## PORT FORWARDING TO NODES DOES NOT WORK.\n", " #LocalForward 10001 localhost:10001\n", " #LocalForward 10002 localhost:10002\n", " #LocalForward 10003 localhost:10003\n", " #LocalForward 18888 localhost:18888\n", " # The following is for snakeviz\n", " #LocalForward 8080 localhost:8080 \n", "```\n", "\n", "This will allow you to connect with `ssh kamiak` rather than `ssh m.forbes@kamiak.wsu.edu`. Then use `ssh-keygen` to create a key and copy it to `kamiak:~/.ssh/authorized_keys`. The second entry allows you to directly connect to the compute nodes, forwarding ports so you can run Jupyter notebooks. **Only do this for nodes for which you have been granted control through the scheduler.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interactive Queue\n", "\n", "Before doing any work, be sure to start an interactive session on one of the nodes. (Do not do work on the login nodes, this is a violation of the Kamiak user policy.) Once you have tested and profiled your code, run it with a non-interactive job in the batch queue.\n", "\n", "```bash\n", "$ idev --partition=kamiak -t 60\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Home Setup" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I have included the following setup. 
This will cause your `~/.bashrc` file to load some environment variables, and create links to the data directory.\n", "\n", "```bash\n", "ln -s /data/lab/forbes ~/data\n", "ln -s ~/data/bashrc.d/inputrc ~/.inputrc # Up-arrow history for commands\n", "ln -s ~/data/bashrc.d/bash_alias ~/.bash_alias # Sets up environment\n", "```\n", "\n", "If you do not have a `.bashrc` file, then you can copy mine and similar related files.\n", "\n", "```bash\n", "cp ~/data/bashrc.d/bashrc ~/.bashrc\n", "cp ~/data/bashrc.d/bash_profile ~/.bash_profile\n", "cp ~/data/bashrc.d/hgrc ~/.hgrc\n", "cp ~/data/bashrc.d/hgignore ~/.hgignore\n", "```\n", "\n", "If you do have one, then you can append these commands using `cat` with a heredoc:\n", "\n", "```bash\n", "cat >> ~/.bashrc <<EOF\n", "...\n", "EOF\n", "```\n", "\n", "The `~/.hgrc` file configures Mercurial with a common global ignore file and some useful extensions:\n", "\n", "```\n", "[ui]\n", "\n", "# Common global ignores\n", "ignore.common = ~/.hgignore\n", "\n", "[extensions]\n", "graphlog =\n", "extdiff = \n", "rebase = \n", "record = \n", "histedit =\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Conda" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I do not have a good solution yet for working with Conda on Kamiak. Here are some goals and issues:\n", "\n", "**Goals**\n", "* Allow users to work with custom environments, ensuring reproducible computing.\n", "* Allow users to install software using `conda`. (The other option is to use `pip`, but I am migrating to make sure all of my packages are available on my `mforbes` anaconda channel.)\n", "\n", "**Issues**\n", "* Working with conda in the user's home directory (default) or on `/scratch` is very slow. For some timings, we install a minimal python3 two times in succession (so that the second time needs no downloads). 
We also compare the time required to copy the environment to the Home directory, and the time it takes to run `rm -r pkgs envs`:\n", "\n", "| Location | Fresh Install | Second Install | Copy Home | Removal |\n", "|----------|---------------|----------------|-----------|---------|\n", "| Home | 3m32s | 1m00s | N/A | 1m03s |\n", "| Scratch | 2m16s | 0m35s | 2m53s | 0m45s |\n", "| Local | 0m46s | 0m11s | 1m05s | 0m00s |\n", "\n", "**Recommendation**\n", "* If you need a custom environment, use the Local drive `/local` and build it at the start of your job. *A full anaconda installation takes about 5m24s on `/local`.*\n", "* If you need a persistent environment, build it in your Home directory, but keep the `pkgs` directory on Scratch or Local to avoid exceeding your quota. *(Note: conda environments are [not relocatable](https://github.com/conda/conda/issues/3097), so you can't just copy the one you built on Local to your home directory. With the copy speeds, it is faster just to build the environment again.)*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Playing with Folders" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will need to manage our own environment so we can install appropriate versions of the\n", "python software stack. In principle this should be possible with Anaconda 4.4 (see this\n", "issue – [Better support for conda envs accessed by multiple\n", "users](https://github.com/conda/conda/issues/1329) – for example), but Kamiak does not\n", "yet have this version of Conda. Until then, we maintain our own stack.\n", "\n", "## Conda Root Installation\n", "\n", "We do this under our lab partition `/data/forbes/apps/conda` so that others in our group\n", "can share these environments. To use these, do the following:\n", "\n", "1. `module load conda`: This will allow you to use our conda installation.\n", "2. `conda activate`: This activates the base environment with `hg` and `git-annex`.\n", "3. 
`conda env list`: This will show you which environments are available. Choose the\n", " appropriate one and then:\n", "4. `conda activate --stack `: This will activate the specified environment,\n", " stacking this on top of the base environment so that you can continue to use `hg` and\n", " `git-annex`.\n", "5. `conda deactivate`: Do this a couple of times when you are done to deactivate your\n", " environments.\n", "6. `module unload conda`: Optionally, unload the conda module.\n", "\n", "Note: you do not need to use the undocumented `--stack` feature for just running code:\n", "`conda activate ` will be fine." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Primary Conda Environments (OLD)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```bash\n", "conda create -y -n work2 python=2\n", "conda install -y -n work2 anaconda\n", "conda update -y -n work2 --all\n", "conda install -y -n work2 accelerate\n", "\n", "conda create -y -n work3 python=3\n", "conda install -y -n work3 anaconda\n", "conda update -y -n work3 --all\n", "conda install -y -n work3 accelerate\n", "\n", "for _e in work2 work3; \n", " do . activate $_e\n", " pip install ipdb \\\n", " line_profiler \\\n", " memory_profiler \\\n", " snakeviz \\\n", " uncertainties \\\n", " xxhash \\\n", " mmf_setup\n", "done\n", "\n", "module load cuda/8.0.44 # See below - install cuda and the module files first\n", "for _e in work2 work3; \n", " do . activate $_e\n", " pip install pycuda \\\n", " scikit-cuda\n", "done\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once these base environments are installed, we lock the directories so that they cannot be changed accidentally." 
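, "\n", "One way to do such locking (a sketch; the exact commands used are not shown above) is to clear the write bits with `chmod`. The demonstration below uses a throw-away directory so it can be run anywhere; on Kamiak the target would be a shared environment such as `/data/forbes/apps/conda/envs/work3`:\n", "\n", "```bash\n", "env_dir=\"$(mktemp -d)\"        # stand-in for the environment directory\n", "chmod -R a-w \"${env_dir}\"     # lock: clear all write bits\n", "stat -c '%A' \"${env_dir}\"     # permissions now read/execute only\n", "chmod -R u+w \"${env_dir}\"     # unlock again before updating\n", "rm -rf \"${env_dir}\"\n", "```"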
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use python, first load the module of your choice:\n", "\n", "```bash\n", "[cn14] $ module av\n", "...\n", " anaconda2/2.4.0\n", " anaconda2/4.2.0 (D)\n", " anaconda3/2.4.0\n", " anaconda3/4.2.0\n", " anaconda3/5.1.0 (D) \n", "[cn14] $ module load anaconda3\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you can create an environment in which to update everything.\n", "\n", "```bash\n", "[cn14] $ conda create -n work3 python=3\n", "Solving environment: done\n", "\n", "## Package Plan ##\n", "\n", " environment location: /home/m.forbes/.conda/envs/work3\n", "\n", " added / updated specs: \n", " - python=3\n", "\n", "\n", "The following packages will be downloaded:\n", "\n", " package | build\n", " ---------------------------|-----------------\n", " certifi-2018.11.29 | py37_0 146 KB\n", " wheel-0.33.1 | py37_0 39 KB\n", " pip-19.0.3 | py37_0 1.8 MB\n", " python-3.7.2 | h0371630_0 36.4 MB\n", " setuptools-40.8.0 | py37_0 643 KB\n", " ------------------------------------------------------------\n", " Total: 39.0 MB\n", "\n", "The following NEW packages will be INSTALLED:\n", "\n", " ca-certificates: 2019.1.23-0 \n", " certifi: 2018.11.29-py37_0 \n", " libedit: 3.1.20181209-hc058e9b_0\n", " libffi: 3.2.1-hd88cf55_4 \n", " libgcc-ng: 8.2.0-hdf63c60_1 \n", " libstdcxx-ng: 8.2.0-hdf63c60_1 \n", " ncurses: 6.1-he6710b0_1 \n", " openssl: 1.1.1b-h7b6447c_0 \n", " pip: 19.0.3-py37_0 \n", " python: 3.7.2-h0371630_0 \n", " readline: 7.0-h7b6447c_5 \n", " setuptools: 40.8.0-py37_0 \n", " sqlite: 3.26.0-h7b6447c_0 \n", " tk: 8.6.8-hbc83047_0 \n", " wheel: 0.33.1-py37_0 \n", " xz: 5.2.4-h14c3975_4 \n", " zlib: 1.2.11-h7b6447c_3 \n", "\n", "Proceed ([y]/n)? 
y\n", "\n", "\n", "Downloading and Extracting Packages\n", "certifi-2018.11.29 | 146 KB | ################################################################################################################################################################### | 100% \n", "wheel-0.33.1 | 39 KB | ################################################################################################################################################################### | 100% \n", "pip-19.0.3 | 1.8 MB | ################################################################################################################################################################### | 100% \n", "python-3.7.2 | 36.4 MB | ################################################################################################################################################################### | 100% \n", "setuptools-40.8.0 | 643 KB | ################################################################################################################################################################### | 100% \n", "Preparing transaction: done\n", "Verifying transaction: done\n", "Executing transaction: done\n", "#\n", "# To activate this environment, use:\n", "# > source activate work3\n", "#\n", "# To deactivate an active environment, use:\n", "# > source deactivate\n", "#\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you can activate `work3` and update anaconda etc.\n", "\n", "```bash\n", "[cn14] $ . 
activate work3\n", "(work3) [cn14] $ conda install anaconda\n", "Solving environment: done\n", "\n", "## Package Plan ##\n", "\n", " environment location: /home/m.forbes/.conda/envs/work3\n", "\n", " added / updated specs: \n", " - anaconda\n", "\n", "\n", "The following packages will be downloaded:\n", "\n", " package | build\n", " ---------------------------|-----------------\n", " anaconda-2018.12 | py37_0 11 KB\n", " keyring-17.0.0 | py37_0 49 KB\n", " dask-core-1.0.0 | py37_0 1.2 MB\n", " ...\n", " ------------------------------------------------------------\n", " Total: 559.3 MB\n", "\n", "The following NEW packages will be INSTALLED:\n", "\n", " alabaster: 0.7.12-py37_0 \n", " anaconda: 2018.12-py37_0 \n", " anaconda-client: 1.7.2-py37_0\n", " ...\n", "\n", "The following packages will be DOWNGRADED:\n", "\n", " ca-certificates: 2019.1.23-0 --> 2018.03.07-0 \n", " libedit: 3.1.20181209-hc058e9b_0 --> 3.1.20170329-h6b74fdf_2\n", " openssl: 1.1.1b-h7b6447c_0 --> 1.1.1a-h7b6447c_0 \n", " pip: 19.0.3-py37_0 --> 18.1-py37_0 \n", " python: 3.7.2-h0371630_0 --> 3.7.1-h0371630_7 \n", " setuptools: 40.8.0-py37_0 --> 40.6.3-py37_0 \n", " wheel: 0.33.1-py37_0 --> 0.32.3-py37_0 \n", "\n", "Proceed ([y]/n)? y\n", "\n", "\n", "Downloading and Extracting Packages\n", "anaconda-2018.12 | 11 KB | ################################################# | 100%\n", "...\n", "\n", "\n", "(work3) $ du -sh .conda/envs/*\n", "36M\t.conda\n", "(work2) $ du -sh /opt/apps/anaconda2/4.2.0/\n", "2.2G\t/opt/apps/anaconda2/4.2.0/\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some files are installed, but most are linked so this does not create much of a burden." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Issues" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The currently recommended approach for setting up conda is to source the file `.../conda/etc/profile.d/conda.sh`. 
This does not work well with the module system, so I had to write a custom module file that does what this file does. This may get better in the future if the following issues are dealt with:\n", "\n", "* [#6820: Consider shell-agnostic activate.d/deactivate.d mechanism](https://github.com/conda/conda/issues/6820): This one even suggests using Lmod for activation.\n", "* [#7407: Some conda environment variables are not being unset when you deactivate the virtual environment](https://github.com/conda/conda/issues/7407): Closed, but references [issue #7609](https://github.com/conda/conda/issues/7609).\n", "* [#7609: add conda deactivate --all flag](https://github.com/conda/conda/issues/7609): Might not help.\n", "\n", "\n", "## References\n", "* [Conda Docs: Multi-User support](https://docs.conda.io/projects/conda/en/latest/user-guide/configuration/admin-multi-user-install.html?highlight=multi-user): It seems like the Kamiak installations do not use a top-level `.condarc` file.\n", "* [Issue 1329: Better support for conda envs accessed by multiple users](https://github.com/conda/conda/issues/1329).\n", "* [PR 5159: support stacking environments](https://github.com/conda/conda/pull/5159)\n", "* [Constructor Issue 145: `conda --clone` surprised me by downloading a stack of files](https://github.com/conda/constructor/issues/145)." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Inspecting the Cluster" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes you might want to see what is happening with the cluster and various jobs." 
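, "\n", "For a quick overview of the partitions and their nodes, `sinfo` is useful (a sketch; which columns you want will vary):\n", "\n", "```bash\n", "# One summary line per partition: availability, time limit, node counts (A/I/O/T).\n", "sinfo -s\n", "\n", "# Per-node view of the backfill partition: node name, CPUs (alloc/idle/other/total),\n", "# memory, and state.\n", "sinfo -p kamiak -N -o \"%N %C %m %T\"\n", "```"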
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Queue" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see what jobs have been submitted, use the [`squeue`](https://slurm.schedmd.com/squeue.html) command.\n", "\n", "```bash\n", "squeue\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Nodes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose you are running on a node and performance seems to be poor. It might be that you are overusing the resources you have requested. To see this, you can log into the node and use the `top` command. For example:\n", "\n", "```bash\n", "$ squeue -u m.forbes\n", " JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)\n", " 661259 kamiak idv4807 m.forbes R 2:41 1 cn94\n", "$ squeue -o \"%.18i %.9P %.8j %.8u %.2t %.10M %.6D %C %R\" -w cn94\n", " JOBID PARTITION NAME USER ST TIME NODES CPUS NODELIST(REASON)\n", " 653445 kamiak SCR5 l... R 4-11:28:12 1 4 cn94\n", " 653448 kamiak SCR18 l... R 3-12:59:43 1 8 cn94\n", " 654674 kamiak SCR10 l... R 2-06:26:03 1 4 cn94\n", " 654675 kamiak SCR12 l... R 2-06:26:03 1 4 cn94\n", " 659459 kamiak meme1 e... R 2-06:26:03 1 1 cn94\n", " 660544 kamiak meme2 e... R 3-08:20:33 1 1 cn94\n", " 661259 kamiak idv4807 m... R 7:17 1 5 cn94\n", "```\n", "\n", "This tells us that I have one job running on node `cn94`, which requested 5 CPUs, while user `l...` is running 4 jobs, having requested a total of 20 CPUs, and user `e...` is running 2 jobs, having requested 1 CPU each. 
*(Note: to see the number of CPUs, I needed to manually adjust the format string as described in the [manual](https://slurm.schedmd.com/squeue.html).)*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Node Capabilities" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see what the compute capabilities of the node are, you can use the `lscpu` command:\n", "\n", "```bash\n", "[cn94] $ lscpu\n", "Architecture: x86_64\n", "CPU op-mode(s): 32-bit, 64-bit\n", "Byte Order: Little Endian\n", "CPU(s): 28\n", "On-line CPU(s) list: 0-27\n", "Thread(s) per core: 1\n", "Core(s) per socket: 14\n", "Socket(s): 2\n", "NUMA node(s): 2\n", "Vendor ID: GenuineIntel\n", "CPU family: 6\n", "Model: 79\n", "Model name: Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz\n", "Stepping: 1\n", "CPU MHz: 2404.687\n", "CPU max MHz: 3200.0000\n", "CPU min MHz: 1200.0000\n", "BogoMIPS: 3990.80\n", "Virtualization: VT-x\n", "L1d cache: 32K\n", "L1i cache: 32K\n", "L2 cache: 256K\n", "L3 cache: 35840K\n", "NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26\n", "NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27\n", "Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts\n", "```\n", "\n", "This tells us some information about the node, including that there are 14 cores per socket and 2 sockets, for a total of 28 cores on the node, so the 27 
requested CPUs above should run fine." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Node Usage\n", "\n", "To see what is actually happening on the node, we can log in and run top:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```bash \n", "$ ssh cn94\n", "$ top -n 1\n", "Tasks: 772 total, 14 running, 758 sleeping, 0 stopped, 0 zombie\n", "%Cpu(s): 46.5 us, 0.1 sy, 0.0 ni, 53.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st\n", "KiB Mem : 13172199+total, 10478241+free, 23872636 used, 3066944 buff/cache\n", "KiB Swap: 0 total, 0 free, 0 used. 10730780+avail Mem \n", "\n", " PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND \n", " 20936 e... 20 0 1335244 960616 1144 R 3.6 0.7 4769:39 meme\n", " 30839 l... 20 0 1350228 0.995g 7952 R 3.6 0.8 236:38.54 R\n", " 30853 l... 20 0 1350228 0.993g 7952 R 3.6 0.8 236:38.75 R\n", " 30862 l... 20 0 1350228 0.995g 7952 R 3.6 0.8 236:37.37 R\n", "122856 l... 20 0 1989708 1.586g 7988 R 3.6 1.3 1452:29 R\n", "122865 l... 20 0 1989704 1.585g 7988 R 3.6 1.3 1452:25 R\n", "124397 l... 20 0 1885432 1.514g 7988 R 3.6 1.2 1434:18 R\n", "124410 l... 20 0 1885428 1.514g 7988 R 3.6 1.2 1434:17 R\n", "124419 l... 20 0 1885428 1.514g 7988 R 3.6 1.2 1434:17 R\n", " 26811 l... 20 0 2710944 2.259g 7988 R 3.6 1.8 2595:41 R\n", " 26833 l... 20 0 2710940 2.262g 7988 R 3.6 1.8 2595:51 R\n", "122847 l... 20 0 1989700 1.585g 7988 R 3.6 1.3 1452:29 R\n", "170160 e... 20 0 1150992 776276 1140 R 3.6 0.6 3216:06 meme\n", " 50214 m.forbes 20 0 168700 3032 1612 S 0.0 0.0 0:02.60 top\n", "```\n", "\n", "Here I am just looking with `top`, but the other users are running 13 processes that are each using a full CPU on the node. The 3.6% = 1/28, since the node has 28 CPUs. *(To see this view, you might have to press \"Shift-I\" while running top to disable Irix mode. 
If you want to save this as the default, press \"Shift-W\" which will write the defaults to your `~/.toprc` file.)*\n", "\n", "Note: there are several key-stroke commands you can use while running `top` to adjust the display. When two options are available, the lower-case version affects the listing below for each process, while the upper-case version affects the top summary line:\n", "\n", "* `e/E`: Changes the memory units.\n", "* `I`: Irix mode - toggles between CPU usage as a % of node capability vs as a % of CPU capability." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Software" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Modules" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To find out which modules exist, run `module avail`:\n", "\n", "\n", "```bash\n", "[cn112] $ module avail\n", "\n", "----------------------------------------- Compilers ------------------------------------------\n", " StdEnv (L) gcc/6.1.0 intel/xe_2016_update3 (L,D)\n", " gcc/4.9.3 gcc/7.3.0 (D) intel/16.2\n", " gcc/5.2.0 intel/xe_2016_update2 intel/16.3\n", "\n", "------------------------------- intel/xe_2016_update3 Software -------------------------------\n", " bazel/0.4.2 espresso/5.3.0 (D) hdf5/1.10.2 nwchem/6.8 (D)\n", " cmake/3.7.2 espresso/6.3.0 lammps/16feb16 octave/4.0.1\n", " corset/1.06 fftw/3.3.4 mvapich2/2.2 siesta/4.0_mpi\n", " dmtcp/2.5.2 gromacs/2016.2_mdrun netcdf/4 (D) stacks/1.44\n", " eems/8ee979b gromacs/2016.2_mpi (D) netcdf/4.6.1 stacks/2.2 (D)\n", " elpa/2016.05.003 hdf5/1.8.16 (D) nwchem/6.6\n", "\n", "--------------------------------------- Other Software ---------------------------------------\n", " anaconda2/2.4.0 git/2.6.3 python/2.7.10 (D)\n", " anaconda2/4.2.0 (D) globus/6.0 python/2.7.15\n", " anaconda3/2.4.0 google_sparsehash/4cb9240 python2/2.7.10 (D)\n", " anaconda3/4.2.0 graphicsmagick/1.3.10 python2/2.7.15\n", " anaconda3/5.1.0 (D) grass/6.4.6 python3/3.4.3\n", " angsd/9.21 grass/7.0.5 
python3/3.5.0\n", " armadillo/8.5.1 grass/7.6.0 (D) python3/3.6.5 (D)\n", " arpack/3.6.0 gsl/2.1 qgis/2.14.15\n", " bamaddrg/1.0 hisat2/2.1.0 qgis/3.4.4 (D)\n", " bamtools/2.4.1 htslib/1.8 qscintilla/2.9.4\n", " bcftools/1.6 imagemagick/7.0.7-25 qscintilla/2.10 (D)\n", " beagle/3.0.2 interproscan/5.27.66 r/3.2.2\n", " beast/1.8.4 iperf/3.1.3 r/3.3.0\n", " beast/1.10.0 (D) java/oracle_1.8.0_92 (D) r/3.4.0\n", " bedtools/2.27.1 java/11.0.1 r/3.4.3\n", " binutils/2.25.1 jellyfish/2.2.10 r/3.5.1\n", " blast/2.2.26 jemalloc/3.6.0 r/3.5.2 (D)\n", " blast/2.7.1 (D) jemalloc/4.4.0 (D) rampart/0.12.2\n", " bonnie++/1.03e laszip/2.2.0 repeatmasker/4.0.7\n", " boost/1.59.0 ldhot/1.0 rmblast/2.2.28\n", " bowtie/1.1.2 libgeotiff/1.4.0 rmblast/2.6.0 (D)\n", " bowtie2/2.3.4 libint/1.1.4 rsem/1.3.1\n", " bowtie2/2.3.4.3 (D) libkml/1.3.0 salmon/0.11.3\n", " bwa/0.7.17 liblas/1.8.0 samtools/1.3.1\n", " canu/1.3 libspatialite/4.3.0a samtools/1.6\n", " cast/dbf2ec2 libxsmm/1.4.4 samtools/1.9 (D)\n", " ccp4/7.0 libzip/1.5.1 settarg/6.0.1\n", " cellranger/2.1.0 lmod/6.0.1 shelx/2016.1\n", " cellranger/3.0.2 (D) lobster/2.1.0 shore/0.9.3\n", " centrifuge/1.0.4 matlab/r2018a shoremap/3.4\n", " cp2k/4.1_pre_openmp matlab/r2018b (D) singularity/2.3.1\n", " cp2k/4.1_pre_serial mercurial/3.7.3-1 singularity/2.4.2\n", " cp2k/4.1 (D) mesa/17.0.0 singularity/3.0.0 (D)\n", " cuda/7.5 migrate/3.6.11 smbnetfs/0.6.0\n", " cuda/7.5.18 miniconda3/3.6 sqlite3/3.25.1\n", " cuda/8.0.44 mocat2/2.0 sratoolkit/2.8.0\n", " cuda/9.0.176 mothur/1.40.5 stringtie/1.3.5\n", " cuda/9.1.85 (D) music/4.0 superlu/4.3_dist\n", " cudnn/4_cuda7.0+ mysql/8.0.11 superlu/5.2.1\n", " cudnn/5.1_cuda7.5 mzmine/2.23 superlu/5.4_dist (D)\n", " cudnn/5.1_cuda8.0 namd/2.12_ib svn/2.7.10\n", " cudnn/6.0_cuda8.0 namd/2.12_smp swig/3.0.12\n", " cudnn/7.0_cuda9.1 namd/2.12 (D) tassel/3.0\n", " cudnn/7.1.2_cuda9.0 netapp/5.4p1 tcl-tk/8.5.19\n", " cudnn/7.1.2_cuda9.1 (D) netapp/5.5 (D) texinfo/6.5\n", " cufflinks/2.2.1 octave/4.2.0 
texlive/2018\n", " dislin/11.0 octave/4.4.0 tiff/3.9.4\n", " dropcache/master octave/4.4.1 (D) tophat/2.1.1\n", " eigan/3.3.2 openblas/0.2.18_barcelona towhee/7.2.0\n", " emboss/6.6.0 openblas/0.2.18_haswell trimmomatic/0.38\n", " exonerate/2.2 openblas/0.2.18 trinity/2.2.0\n", " exonerate/2.4 (D) openblas/0.3.0 (D) trinity/2.8.4 (D)\n", " fastqc/0.11.8 orangefs/2.9.6 underworld/1.0\n", " fastx_toolkit/0.0.14 parallel/3.22 underworld2/2.5.1\n", " freebayes/1.1.0 parallel/2018.10.22 (D) underworld2/2.6.0dev (D)\n", " freebayes/1.2.0 (D) parflow/3.2.0 valgrind/3.11.0\n", " freetype/2.7.1 parmetis/4.0.3 vcflib/1.0.0-rc2\n", " freexl/1.0.2 paxutils/2.3 vcftools/0.1.16\n", " gatk/3.8.0 perl/5.24.1 (D) vmd/1.9.3\n", " gdal/2.0.0 perl/5.28.0 workspace_maker/master (L,D)\n", " gdal/2.1.0 pexsi/0.9.2 workspace_maker/1.1b\n", " gdal/2.3.1 (D) phenix/1.13 workspace_maker/1.1\n", " gdb/7.10.1 picard/2.18.6 workspace_maker/1.2\n", " geos/3.5.0 proj/4.9.2 wrf/3.9.1\n", " geos/3.6.2 (D) proj/5.1.0 (D) zlib/1.2.11\n", "\n", "------------------------------------- Licensed Software --------------------------------------\n", " amber/16 clc_genomics_workbench/8.5.1 (D) green/1.0\n", " buster/17.1 dl_polly/4.08 stata/14\n", " clc_genomics_workbench/6.0.1 gaussian/09.d.01 vasp/5.4.4\n", "\n", " Where:\n", " L: Module is loaded\n", " D: Default Module\n", "\n", "Use \"module spider\" to find all possible modules.\n", "Use \"module keyword key1 key2 ...\" to search for all possible modules matching any of the\n", "\"keys\".\n", "```\n", "\n", "You can also use `module spider` for searching. 
For example, to find all the modules related to conda you could run:\n", "\n", "```bash\n", "[cn112] $ module -r spider \".*conda.*\"\n", "\n", "\n", "----------------------------------------------------------------------------\n", " anaconda2:\n", "----------------------------------------------------------------------------\n", " Description:\n", " Anaconda is a freemium distribution of the Python programming\n", " language for large-scale data processing, predictive analytics, and\n", " scientific computing.\n", "\n", " Versions:\n", " anaconda2/2.4.0\n", " anaconda2/4.2.0\n", "\n", "----------------------------------------------------------------------------\n", " For detailed information about a specific \"anaconda2\" module (including how to load the modules) use the module's full name.\n", " For example:\n", "\n", " $ module spider anaconda2/4.2.0\n", "----------------------------------------------------------------------------\n", "\n", "----------------------------------------------------------------------------\n", " anaconda3:\n", "----------------------------------------------------------------------------\n", " Description:\n", " Anaconda is a distribution of the Python programming language that\n", " includes the Python interpeter, as well as Conda which is a package\n", " and virtual environment manager, and a large collection of Python\n", " scientific packages. Anaconda3 uses python3, which it also calls\n", " python. 
Anaconda Navigator contains Jupyter Notebook and the Spyder\n", " IDE.\n", "\n", " Versions:\n", " anaconda3/2.4.0\n", " anaconda3/4.2.0\n", " anaconda3/5.1.0\n", "\n", "----------------------------------------------------------------------------\n", " For detailed information about a specific \"anaconda3\" module (including how to load the modules) use the module's full name.\n", " For example:\n", "\n", " $ module spider anaconda3/5.1.0\n", "----------------------------------------------------------------------------\n", "\n", "----------------------------------------------------------------------------\n", " conda: conda\n", "----------------------------------------------------------------------------\n", " Description:\n", " Michael Forbes custom Conda environment.\n", "\n", "\n", " This module can be loaded directly: module load conda\n", "\n", "\n", "----------------------------------------------------------------------------\n", " miniconda3: miniconda3/3.6\n", "----------------------------------------------------------------------------\n", " Description:\n", " Miniconda is a distribution of the Python programming language that\n", " includes the Python interpeter, as well as Conda which is a package\n", " and virtual environment manager. 
Miniconda3 uses python3, which it\n", " also calls python.\n", "\n", "\n", " You will need to load all module(s) on any one of the lines below before the \"miniconda3/3.6\" module is available to load.\n", "\n", " gcc/4.9.3\n", " gcc/5.2.0\n", " gcc/6.1.0\n", " gcc/7.3.0\n", " intel/16.2\n", " intel/16.3\n", " intel/xe_2016_update2\n", " intel/xe_2016_update3\n", " \n", " Help:\n", " For further information, see:\n", " https://conda.io/miniconda.html\n", " \n", " To create a local environment using the conda package manager:\n", " conda create -n myenv\n", " To use the local environment:\n", " source activate myenv\n", " To install packages into your local environment:\n", " conda install somePackage\n", " To install packages via pip:\n", " conda install pip\n", " pip install somePackage\n", " When installing, the \"Failed to create lock\" message can be ignored.\n", " \n", " Miniconda3 uses python3, which it also calls python.\n", " To use a different version for the name python:\n", " conda install python=2\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To inspect the actual module file (for example, if you would like to make your own based on this) you can use the `module show` command:\n", "\n", "```bash\n", "$ module show anaconda3\n", "------------------------------------------------------\n", " /opt/apps/modulefiles/Other/anaconda3/5.1.0.lua:\n", "------------------------------------------------------\n", "whatis(\"Description: Anaconda is a distribution of the Python programming language...\")\n", "help([[For further information...]])\n", "family(\"conda\")\n", "family(\"python2\")\n", "family(\"python3\")\n", "prepend_path(\"PATH\",\"/opt/apps/anaconda3/5.1.0/bin\")\n", "prepend_path(\"LD_LIBRARY_PATH\",\"/opt/apps/anaconda3/5.1.0/lib\")\n", "prepend_path(\"LIBRARY_PATH\",\"/opt/apps/anaconda3/5.1.0/lib\")\n", "prepend_path(\"CPATH\",\"/opt/apps/anaconda3/5.1.0/include\")\n", 
"prepend_path(\"MANPATH\",\"/opt/apps/anaconda3/5.1.0/share/man\")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Running Jobs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before you consider running a job, you need to profile your code to determine the following:\n", "\n", "* How many nodes and how many cores-per-node do you need?\n", "* How much memory do you need per node?\n", "* How long will your program run?\n", "* What modules do you need to load to run your code?\n", "* What packages need to be installed to run your code?\n", "\n", "Once you have this information, make sure that your code is committed to a repository, then clone this repository to Kamiak. Whenever you perform a serious calculation, you should make sure you are running from a clean checkout of a repository with a well-defined set of libraries installed so that your runs are reproducible. This information should be stored along side your data so that you know exactly what version of your code produced the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are my recommended steps. \n", "\n", "0. Run an interactive session.\n", "1. Log in directly to node so agent get forwarded.\n", "2. Checkout your code into a repository.\n", " \n", " ```bash\n", " mkdir ~/repositories\n", " cd repositories\n", " hg clone ...\n", " ```\n", "3. Link your run folder to `~/now`.\n", "4. 
Make a SLURM file in `~/runs`.\n", "\n", "```bash\n", "#!/bin/bash\n", "#SBATCH --partition=kamiak ### Partition (like a queue in PBS)\n", "#SBATCH --job-name=HiWorld ### Job Name\n", "#SBATCH --output=Hi.out ### File in which to store job output\n", "#SBATCH --error=Hi.err ### File in which to store job error messages\n", "#SBATCH --time=0-00:01:00 ### Wall clock time limit in Days-HH:MM:SS\n", "#SBATCH --nodes=1 ### Node count required for the job\n", "#SBATCH --ntasks-per-node=1 ### Number of tasks to be launched per Node\n", "./hello\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Issues" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interactive Jobs do not ForwardAgent" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Jupyter Notebook: Tunnel not working" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For some reason, trying to tunnel to compute nodes is failing. It might be that administrative settings disallow TCP forwarding through tunnels, or it might be an issue with the multi-hop connection." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mercurial and Conda" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I tried the usual approach of putting mercurial in the conda `base` environment, but when running conda, mercurial cannot be found. Instead, one needs to load the mercurial module. I need to see if this will work with `mmfhg`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Permissions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Building and Installing Software" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following describes how I have built and installed various pieces of software. You\n", "should not do this - just use the software as described above. 
However, this information\n", "may be useful if you need to install your own software.\n", "\n", "```bash\n", "#mkdir -p /data/lab/forbes # Provided by system.\n", "ln -s /data/lab/forbes ~/data\n", "mkdir -p ~/data/modules\n", "ln -s ~/data/modules ~/.modules\n", "mkdir -p ~/data/bashrc.d\n", "\n", "cat > ~/data/bashrc.d/inputrc <<EOF\n", "...\n", "EOF\n", "cat > ~/data/bashrc.d/bash_alias <<EOF\n", "...\n", "EOF\n", "cat > /data/lab/forbes/apps/conda/.condarc <<EOF\n", "...\n", "EOF\n", "cat > ~/.modules/conda.lua <<EOF\n", "...\n", "EOF\n", "cat > ~/.modules/mr.lua <<EOF\n", "...\n", "EOF\n", "cat >> \"${data}/bashrc.d/bash_alias\" <<EOF\n", "...\n", "EOF\n", "```\n", "\n", "```bash\n", "$ conda create -n mmf_stack\n", "Solving environment: done\n", "\n", "\n", "==> WARNING: A newer version of conda exists. <==\n", " current version: 4.5.12\n", " latest version: 4.6.14\n", "\n", "Please update conda by running\n", "\n", " $ conda update -n base conda\n", "\n", "\n", "\n", "## Package Plan ##\n", "\n", " environment location: /home/m.forbes/.conda/envs/mmf_stack\n", "\n", "\n", "Proceed ([y]/n)? y\n", "\n", "Preparing transaction: done\n", "Verifying transaction: done\n", "Executing transaction: done\n", "#\n", "# To activate this environment, use:\n", "# > source activate mmf_stack\n", "#\n", "# To deactivate an active environment, use:\n", "# > source deactivate\n", "#\n", "$ . /opt/apps/anaconda3/5.1.0/etc/profile.d/conda.sh # Source since module does not install anaconda properly.\n", "$ conda activate mmf_stack\n", "(mmf_stack) $ conda install -c conda-forge uncertainties\n", "Solving environment: done\n", "\n", "\n", "==> WARNING: A newer version of conda exists. 
<==\n", " current version: 4.5.12\n", " latest version: 4.6.14\n", "\n", "Please update conda by running\n", "\n", " $ conda update -n base conda\n", "\n", "\n", "\n", "## Package Plan ##\n", "\n", " environment location: /home/m.forbes/.conda/envs/mmf_stack\n", "\n", " added / updated specs: \n", " - uncertainties\n", "\n", "\n", "The following packages will be downloaded:\n", "\n", " package | build\n", " ---------------------------|-----------------\n", " libblas-3.8.0 | 8_openblas 6 KB conda-forge\n", " tk-8.6.9 | h84994c4_1001 3.2 MB conda-forge\n", " wheel-0.33.1 | py37_0 34 KB conda-forge\n", " liblapack-3.8.0 | 8_openblas 6 KB conda-forge\n", " setuptools-41.0.1 | py37_0 616 KB conda-forge\n", " uncertainties-3.0.3 | py37_1000 116 KB conda-forge\n", " libffi-3.2.1 | he1b5a44_1006 46 KB conda-forge\n", " bzip2-1.0.6 | h14c3975_1002 415 KB conda-forge\n", " numpy-1.16.3 | py37he5ce36f_0 4.3 MB conda-forge\n", " zlib-1.2.11 | h14c3975_1004 101 KB conda-forge\n", " pip-19.1 | py37_0 1.8 MB conda-forge\n", " openblas-0.3.6 | h6e990d7_1 15.8 MB conda-forge\n", " xz-5.2.4 | h14c3975_1001 366 KB conda-forge\n", " sqlite-3.26.0 | h67949de_1001 1.9 MB conda-forge\n", " openssl-1.1.1b | h14c3975_1 4.0 MB conda-forge\n", " certifi-2019.3.9 | py37_0 149 KB conda-forge\n", " libcblas-3.8.0 | 8_openblas 6 KB conda-forge\n", " readline-7.0 | hf8c457e_1001 391 KB conda-forge\n", " ncurses-6.1 | hf484d3e_1002 1.3 MB conda-forge\n", " python-3.7.3 | h5b0a415_0 35.7 MB conda-forge\n", " ------------------------------------------------------------\n", " Total: 70.2 MB\n", "\n", "The following NEW packages will be INSTALLED:\n", "\n", " bzip2: 1.0.6-h14c3975_1002 conda-forge\n", " ca-certificates: 2019.3.9-hecc5488_0 conda-forge\n", " certifi: 2019.3.9-py37_0 conda-forge\n", " libblas: 3.8.0-8_openblas conda-forge\n", " libcblas: 3.8.0-8_openblas conda-forge\n", " libffi: 3.2.1-he1b5a44_1006 conda-forge\n", " libgcc-ng: 8.2.0-hdf63c60_1 \n", " libgfortran-ng: 7.3.0-hdf63c60_0 
\n", " liblapack: 3.8.0-8_openblas conda-forge\n", " libstdcxx-ng: 8.2.0-hdf63c60_1 \n", " ncurses: 6.1-hf484d3e_1002 conda-forge\n", " numpy: 1.16.3-py37he5ce36f_0 conda-forge\n", " openblas: 0.3.6-h6e990d7_1 conda-forge\n", " openssl: 1.1.1b-h14c3975_1 conda-forge\n", " pip: 19.1-py37_0 conda-forge\n", " python: 3.7.3-h5b0a415_0 conda-forge\n", " readline: 7.0-hf8c457e_1001 conda-forge\n", " setuptools: 41.0.1-py37_0 conda-forge\n", " sqlite: 3.26.0-h67949de_1001 conda-forge\n", " tk: 8.6.9-h84994c4_1001 conda-forge\n", " uncertainties: 3.0.3-py37_1000 conda-forge\n", " wheel: 0.33.1-py37_0 conda-forge\n", " xz: 5.2.4-h14c3975_1001 conda-forge\n", " zlib: 1.2.11-h14c3975_1004 conda-forge\n", "\n", "Proceed ([y]/n)? \n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Presumably people can update installed software themselves.\n", "* Currently it seems I need to use my own conda (until anaconda 4.4.0)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Programming\n", "### How to profile simple GPU code?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```bash\n", "$ module load conda\n", "$ hg clone ssh://hg@bitbucket.org/mforbes/cugpe ~/work/mmfbb/cugpe\n", "$ cd current\n", "$ ln -s ~/work/mmfbb/cugpe cugpe\n", "$ cd cugpe\n", "$ module load cuda\n", "$ conda env update -f environment.cugpe.yml -p /data/lab/forbes/apps/conda/envs/cugpe\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Investigations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we include some experiments run on Kamiak to see how long various things take. These results may change as the system evolves, so this information may be out of date."
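, "\n", "\n", "The `real` times quoted below come from the shell's `time` builtin. For repeated measurements, a small helper along the following lines can be handy. This is only a sketch: the `timeit` function is hypothetical (not something Kamiak provides), and it assumes GNU `date` with `%N` nanosecond support:\n", "\n", "```bash\n", "# Hypothetical helper (not part of Kamiak): report wall-clock time\n", "# for a command, mimicking the `real` lines quoted below.\n", "timeit () {\n", "    local start end\n", "    start=$(date +%s.%N)           # GNU date: seconds.nanoseconds\n", "    \"$@\" > /dev/null\n", "    end=$(date +%s.%N)\n", "    awk -v s=\"$start\" -v e=\"$end\" 'BEGIN {printf \"real\\t%.3fs\\n\", e - s}'\n", "}\n", "\n", "timeit sleep 0.2    # prints the elapsed wall time (about 0.2s)\n", "```"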
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conda" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we investigate the timing of creating some conda environments using the user's home directory vs `/scratch`, vs `/local`:\n", "\n", "### Home\n", "```bash\n", "$ time conda create -y -n mmf0 python=3 # Includes downloading packages\n", "real\t3m32.787s\n", "$ time conda create -y -n mmf1 python=3 # Using downloaded packages\n", "real\t1m0.429s\n", "$ time conda create -y -n mmf1c --clone mmf0\n", "real\t0m56.507s\n", "```\n", "\n", "```bash\n", "$ du -sh ~/.conda/envs/*\n", "182M /home/m.forbes/.conda/mmf0\n", "59M /home/m.forbes/.conda/mmf1\n", "59M /home/m.forbes/.conda/mmf1c\n", "$ du -shl ~/.conda/envs/*\n", "182M /home/m.forbes/.conda/mmf0\n", "182M /home/m.forbes/.conda/mmf1\n", "182M /home/m.forbes/.conda/mmf1c\n", "$ du -sh ~/.conda/pkgs/\n", "341M /home/m.forbes/.conda/pkgs/\n", "```\n", "\n", "From this we see that there is some space saving from the use of hard-links. 
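(`du -sh` counts hard-linked files once, while `du -shl` counts every link separately.) The effect is easy to reproduce in a throw-away directory; this sketch assumes GNU coreutils (`stat -c`, `du -l`):\n", "\n", "```bash\n", "tmp=\"$(mktemp -d)\"\n", "dd if=/dev/zero of=\"$tmp/a\" bs=1M count=1 status=none\n", "ln \"$tmp/a\" \"$tmp/b\"   # hard link: a second name for the same data\n", "stat -c %h \"$tmp/a\"    # link count is now 2\n", "du -sh \"$tmp\"          # ~1M: hard-linked data counted once\n", "du -shl \"$tmp\"         # ~2M: -l counts each link separately\n", "rm -r \"$tmp\"\n", "```\n", "\n", "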
Note that the packages also take up quite a bit of space.\n", "\n", "```bash\n", "$ time rm -r envs pkgs/\n", "real\t1m2.734s\n", "```\n", "\n", "### Scratch\n", "\n", "```bash\n", "mkworkspace -n m.forbes_conda\n", "mkdir /scratch/m.forbes_conda/envs\n", "mkdir /scratch/m.forbes_conda/pkgs\n", "ln -s /scratch/m.forbes_conda/envs ~/.conda/\n", "ln -s /scratch/m.forbes_conda/pkgs ~/.conda/\n", "```\n", "\n", "```bash\n", "$ time conda create -y -n mmf0 python=3 # Includes downloading packages\n", "real\t2m16.052s\n", "$ time conda create -y -n mmf1 python=3 # Using downloaded packages\n", "real\t0m35.337s\n", "$ time conda create -y -n mmf1c --clone mmf0\n", "real\t0m27.982s\n", "```\n", "\n", "```bash\n", "$ time rm -r /scratch/m.forbes_conda/envs /scratch/m.forbes_conda/pkgs/\n", "real\t0m45.193s\n", "```\n", "\n", "### Local\n", "\n", "```bash\n", "mkworkspace -n m.forbes_conda --backend=/local\n", "mkdir /local/m.forbes_conda/envs\n", "mkdir /local/m.forbes_conda/pkgs\n", "ln -s /local/m.forbes_conda/envs ~/.conda/\n", "ln -s /local/m.forbes_conda/pkgs ~/.conda/\n", "```\n", "\n", "```bash\n", "$ time conda create -y -n mmf0 python=3 # Includes downloading packages\n", "real\t0m45.948s\n", "$ time conda create -y -n mmf1 python=3 # Using downloaded packages\n", "real\t0m10.670s\n", "$ time conda create -y -n mmf1c --clone mmf0\n", "real\t1m42.742s\n", "```\n", "\n", "```bash\n", "$ time rm -r /local/m.forbes_conda/envs/ /local/m.forbes_conda/pkgs/\n", "real\t0m0.387s\n", "```\n", "\n", "### Home/Local\n", "\n", "```bash\n", "mkworkspace -n m.forbes_conda --backend=/local\n", "mkdir /local/scratch/m.forbes_conda/pkgs\n", "ln -s /local/scratch/m.forbes_conda/pkgs ~/.conda/\n", "```\n", "\n", "```bash\n", "$ time conda create -y -n mmf0 python=3 # Includes downloading packages\n", "real\t1m58.410s\n", "$ time conda create -y -n mmf1 python=3 # Using downloaded packages\n", "real\t1m41.889s\n", "real\t1m39.003s\n", "$ time conda create -y -n mmf1c --clone mmf0\n", 
"real\t1m42.742s\n", "```\n", "\n", "```bash\n", "$ time rm -r /local/m.forbes_conda/envs/ /local/m.forbes_conda/pkgs/\n", "real\t0m0.387s\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Local -> Home\n", "\n", "```bash\n", "$ my_workspace=\"$(mkworkspace -n m.forbes_conda --backend=/local --quiet)\"\n", "$ export CONDA_PKGS_DIRS=\"${my_workspace}/pkgs\"\n", "$ conda_prefix=\"${my_workspace}/current_conda_env\"\n", "$ time conda create -y --prefix \"${conda_prefix}\" python=3\n", "real\t0m16.295s\n", "$ time conda create -y --prefix ~/clone_env --clone \"${conda_prefix}\"\n", "real\t0m49.573s\n", "$ time conda create -y --prefix ~/clone_env2 python=3\n", "real\t0m44.628s\n", "```\n", "\n", "```bash\n", "$ my_workspace=\"$(mkworkspace -n m.forbes_conda --backend=/local --quiet)\"\n", "$ export CONDA_PKGS_DIRS=\"${my_workspace}/pkgs\"\n", "$ conda_prefix=\"${my_workspace}/current_conda_env\"\n", "$ time conda env create --prefix \"${conda_prefix}\" mforbes/work\n", "real\t0m16.295s\n", "$ time conda create -y --prefix ~/clone_env_work --clone \"${conda_prefix}\"\n", "$ time conda env create --prefix ~/clone_env_work2 mforbes/work\n", "real\t14m21.985s\n", "$ time conda create -y --prefix ~/clone_env --clone \"${conda_prefix}\"\n", "\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "jupytext": { "formats": "ipynb,md:myst" }, "kernelspec": { "display_name": "Python 2 (Ubuntu, plain)", "language": "python", "name": "python2-ubuntu" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" }, "nav_menu": {}, "nikola": { "date": "2016-10-21 12:50:27 UTC-07:00", "description": "", "link": "", "slug": "kamiak", "tags": "", "title": "Kamiak", "type": "text" }, "toc": { 
"base_numbering": 1, "nav_menu": { "height": "138px", "width": "252px" }, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "236px" }, "toc_section_display": "block", "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 1 }