Draft Forbes Group Website (Build by Nikola). The official site is hosted at:
License: GPL3
Prerequisites
This post describes the prerequisites that I will generally assume you have if you want to work with me. It also contains a list of references where you can learn these prerequisites. Please let me know if you find any additional resources particularly useful so I can add them for the benefit of others. This list is by definition incomplete - you should regard it as a minimum.
Standards and Expectations
This section describes various coding standards, conventions, and expectations for your workflow. Some of these are just ideas I am developing: feedback is welcome.
Lab Notebook
Like an experimentalist, please keep a lab notebook where you describe each day what you are working on, and summarize at the end of the day what you have accomplished. This serves several purposes:
Quickly start working again: you have a summary of where you left off and what still needs to be done.
History of work: Sometimes you will remember that you solved a similar problem before. This is where you can see what you did and when. (The "when" might help in identifying corrupt simulation data that was made before a bug was fixed, for example.)
Introspection: by keeping track of what you do, you can see where you are effective and where you waste time. This can help you develop better work habits.
How you do this is your choice: I tend to keep a Notes.md file in the root of each project where I record project specific information. The only problem with this approach is that I must search each project to get an overall history of what I have done. Suggestions on how to get the benefit of both a global and local lab notebook are welcome. Here is an example:
Coding
Style
Please follow a coding style. Which guidelines you choose will depend on the language, but stick with them, and enforce them in your editor and tests.
Testing
Make sure your code is well tested using a framework like pytest, and run the tests frequently, ideally before every commit. Use a code coverage tool such as coverage, which integrates with pytest. Aim for 100% code coverage through automated tests (though this is difficult to achieve).
Get code working, write tests, then modify to make sure you don't break anything.
Separate fast tests from slow tests so you can run the basic things quickly and regularly. Run the slow tests, but don't let them slow you down.
Continuous integration (CI) is a great idea, but I have not implemented it yet, mostly due to issues with setup of Conda environments on the CI frameworks.
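One way to separate fast tests from slow tests can be sketched with the standard-library unittest module (which pytest can also collect). This is only an illustration: the RUN_SLOW environment variable and the toy mean function are assumptions made for the example, not part of any existing project. With pytest itself, a similar effect is usually obtained by marking tests (for example with a custom slow marker) and deselecting them with the -m option.

```python
import os
import time
import unittest

# Hypothetical switch: set RUN_SLOW=1 in the environment to include slow tests.
RUN_SLOW = os.environ.get("RUN_SLOW") == "1"


def mean(values):
    """Toy function under test."""
    return sum(values) / len(values)


class FastTests(unittest.TestCase):
    def test_mean(self):
        self.assertEqual(mean([1, 2, 3]), 2)


class SlowTests(unittest.TestCase):
    @unittest.skipUnless(RUN_SLOW, "slow test: set RUN_SLOW=1 to run")
    def test_mean_large(self):
        time.sleep(0.1)  # stand-in for an expensive computation
        self.assertEqual(mean(range(1_000_001)), 500_000)


def run_suite():
    """Run both test cases and return the unittest result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(FastTests))
    suite.addTests(loader.loadTestsFromTestCase(SlowTests))
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

Run normally, only the fast test executes and the slow one is reported as skipped; setting RUN_SLOW=1 (say, before pushing) runs everything.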
Version Control
Version control all of your work, and host it on a site like our Heptapod server (Mercurial and private work), GitLab (has an educational program, but WSU should do this as a whole: it can be done on more finely grained divisions if needed), or GitHub (good for public work).
Personally I recommend Mercurial as I think it is much easier to use than git. If using our Heptapod server, then you should follow their workflow recommendations:
A couple of important points and limitations.
Named branches can have only one head. If you want to work on an independent branch, you will need to either merge or rebase it before pushing.
Commit messages should start with a single line <~ 80 characters summarizing the commit, followed by details. E.g.:
I use the following acronyms to start messages – multiple acronyms can be combined such as ENH,API,TST:
CHK: Checkpoint of my work. Commit often to make sure you do not lose anything, but do not push these to public repos. In mercurial you can do the following:
I usually just do this manually with explicit revision numbers. Now I can continue working, but have a backup of everything in the CHK revision just in case I need to revert something or check. I try to follow the NumPy conventions.
WIP: Work in progress. Consider squashing these, but only if it makes sense.
STY: Spaces, lines, PEP8, etc. Cleanup that does not affect execution. (I sometimes use SPC but STY is consistent with NumPy.)
DOC: Update documentation. Please separate documentation updates from code updates. Update the code, then update the corresponding documentation. All notebook commits should be included here. Please strip out unneeded output before committing using nbstripout.
ENH: Enhancement. New features etc.
API: The public application programming interface has been changed. These revisions might require users to change their code.
BUG: Demonstrate, work on, or fix a bug.
TST: Add or update testing code.
BLD: Update configuration or build scripts such as setup.py, requirements etc.
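The commit message conventions above could be checked automatically, for instance in a pre-commit hook. The following is a minimal sketch under stated assumptions: the function check_commit_message is a hypothetical helper, the hard limit of 80 characters stands in for the approximate "<~ 80" guideline, and the "PREFIX: summary" layout with a colon is my assumption about how the acronyms are separated from the rest of the summary line.

```python
# Prefixes taken from the list above; the checker itself is a hypothetical helper.
PREFIXES = {"CHK", "WIP", "STY", "DOC", "ENH", "API", "BUG", "TST", "BLD"}
MAX_SUMMARY = 80  # assumption: treat the "<~ 80 characters" guideline as a hard limit


def check_commit_message(message):
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    lines = message.splitlines()
    summary = lines[0] if lines else ""
    if not summary.strip():
        return ["empty summary line"]
    if len(summary) > MAX_SUMMARY:
        problems.append(f"summary is {len(summary)} characters (> {MAX_SUMMARY})")
    # Assumed layout: comma-separated acronyms, a colon, then the summary text.
    tags, sep, _ = summary.partition(":")
    if not sep:
        problems.append("summary does not start with 'PREFIX:'")
    else:
        unknown = [t for t in tags.split(",") if t.strip() not in PREFIXES]
        if unknown:
            problems.append(f"unknown prefix(es): {', '.join(unknown)}")
    return problems


if __name__ == "__main__":
    print(check_commit_message("ENH,API,TST: Add adaptive solver\n\nDetails here."))
    print(check_commit_message("Fixed stuff"))
```

Returning a list of problems rather than raising makes it easy to report every issue at once.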
Administration, Shell, Etc.
Working with the Shell
You need to be comfortable using a Linux shell (I generally use bash). Even if you do not use a Linux computer, you will need to log in to other computers such as HPC clusters, where you will be presented with a shell. In particular, you should be able to do the following:
Running Programs
Run programs from the command line.
Know why one might need ./program rather than just program.
Capture the output and send it to a file and send errors to /dev/null.
Use tee.
Run a program in the background, bring it to the foreground, kill it etc.
See what processes are running with ps and jobs, and then be able to send signals to running programs using kill or kill -KILL.
Use GNU screen to run a program in a virtual terminal and reconnect if you are logged out of a server.
Use nohup appropriately.
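The same redirections are available when you launch programs from Python rather than from the shell. As a rough sketch (the child program here is a stand-in one-liner, and run_and_capture is a hypothetical helper name), the following sends standard output to a file and discards standard error, mirroring the shell form program > logfile 2> /dev/null:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# A stand-in child program that writes to both stdout and stderr.
CHILD = "import sys; print('result'); print('warning', file=sys.stderr)"


def run_and_capture(logfile):
    """Run the child, sending stdout to `logfile` and discarding stderr."""
    with open(logfile, "w") as out:
        completed = subprocess.run(
            [sys.executable, "-c", CHILD],
            stdout=out,                 # like `program > logfile`
            stderr=subprocess.DEVNULL,  # like `2> /dev/null`
            check=True,                 # raise if the child exits with an error
        )
    return completed.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "output.log"
        print(run_and_capture(log), log.read_text())
```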
Environment
Inspect and set environment variables.
Manage your environment with startup files .bashrc, .profile, etc. Know the difference between these. Have a good process for quickly getting started working on a new computer. (I store all my settings in my configurations project, so I can just clone this and run a command mmf_initial_setup to get going.)
Know how to use the module command (mostly for HPC clusters).
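Environment variables can also be inspected and set from inside a program, which helps in understanding how they propagate. Here is a small sketch: the variable name MY_PROJECT_DATA and the helper child_sees are made up for the illustration. The point it demonstrates is that a variable set in a process is inherited by the children it starts, but never flows back to the parent shell.

```python
import os
import subprocess
import sys

# Inspect: os.environ behaves like a dictionary of strings.
home = os.environ.get("HOME", "<unset>")
path_entries = os.environ.get("PATH", "").split(os.pathsep)

# Set: this affects the current process and any children started afterwards,
# but not the shell that launched this script.
os.environ["MY_PROJECT_DATA"] = "/tmp/data"  # hypothetical variable name


def child_sees(name):
    """Ask a child Python process for the value of environment variable `name`."""
    code = "import os, sys; print(os.environ.get(sys.argv[1], ''))"
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("HOME:", home)
    print("first PATH entry:", path_entries[0])
    print("child sees:", child_sees("MY_PROJECT_DATA"))
```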
Shell Tools
Use find, locate, grep etc. to look for information in files, or in the output of commands.
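To make concrete what a combined find and grep does, here is a rough Python equivalent. It is only a sketch (grep_tree is a hypothetical helper, and the shell tools are far faster and more flexible): it walks a directory tree, selects files by name, and reports the lines matching a regular expression.

```python
import re
import tempfile
from pathlib import Path


def grep_tree(root, pattern, glob="*.py"):
    """Yield (path, line number, line) for lines matching `pattern` under `root`."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob(glob)):  # like `find root -name '*.py'`
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):  # like `grep -n pattern`
                yield path, number, line


if __name__ == "__main__":
    # Build a tiny throwaway tree so the example is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.py").write_text("x = 1\n# TODO: fix\n")
        for path, number, line in grep_tree(root, r"TODO"):
            print(f"{path.name}:{number}: {line}")
```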
Networking
We often need to work on other computers, so you need to have some basic knowledge about how to connect to other computers on the internet. The standard and secure approach is to use SSH:
SSH
Use ssh to log into other computers.
Use ssh-keygen to set up passwordless login. (Know about ~/.ssh/authorized_keys.)
Use ssh to forward ports from one computer to another. For example, we might like to connect to a jupyter notebook server running on one computer (a remote server) using a browser on our local machine (laptop). One can use port forwarding to do this securely without having to expose the server to the world (which would leave it open to being hacked).
Use scp to copy files and directories securely from one machine to another.
Use rsync in combination with ssh to do the same, but only sending changed files.
Many resources are available on the world-wide web, so you should understand the following:
HTTP
Know what a URL is: e.g. https://www.google.com:443.
Know what an IP address is (note that there is a new standard IPv6 that is starting to be used more frequently).
Know the difference between http and https.
Know what a proxy is and how or why we might need to use one.
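The parts of a URL and the two IP address families can be explored with the Python standard library. This is a small sketch using the example URL from the list above; the helper ip_version and the sample addresses (taken from the ranges reserved for documentation) are my own additions.

```python
import ipaddress
from urllib.parse import urlsplit

# Split the example URL into its components.
url = urlsplit("https://www.google.com:443")
print(url.scheme)    # the protocol: "http" or "https"
print(url.hostname)  # the machine name, resolved to an IP address via DNS
print(url.port)      # 443 is the default port for https (80 for http)


def ip_version(address):
    """Return 4 or 6 depending on whether `address` is IPv4 or IPv6."""
    return ipaddress.ip_address(address).version


print(ip_version("192.0.2.1"))    # an IPv4 address
print(ip_version("2001:db8::1"))  # an IPv6 address
```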
Networking
Use ping to test if a server is up.
Find out the IP and MAC address of the network devices your machine is using to connect to the internet.
References
Editing Files
Get to know how to use a powerful text editor. I recommend Emacs or Vi. Whatever editor you choose, make sure you know how to:
Efficiently cut and paste text.
Search for content.
Perform a search-and-replace with patterns (i.e. regexp).
Use syntax highlighting, auto-indent, and electric parentheses.
Expand/hide sections of the file.
Define useful abbreviations and expand them quickly as needed.
Change the encoding of a file.
Run programs like pylint, pyflakes etc. to check your code.
Run a spell checker.
Programming
Be familiar with Best Practices for Scientific Computing and know how to apply them to your work. (DRY, version control, testing, profiling, debugging, etc.)
Compiling
How to compile software. (Even if you don't write in C++ or Fortran, there will be times you need to use a library and you need to know how to build and install it.)
The difference between static and dynamic libraries, where they go, how to link with them etc.
How to use Makefiles.
Python
Know how to use Python. See the following:
Python Language Tutorial: This is the place to start. Great tutorial written by the language author. Some people find that this is great up to and including section 5, but gets harder beyond this unless you are familiar with concepts like classes from other languages. At this point, Python for Dummies can be helpful.
Python for Dummies and Python for Data Science for Dummies are useful if you do not have much exposure to data types, classes etc.
A Student’s Guide to Python for Physical Modeling: Ome found this to be a good introduction to numpy arrays etc. and a useful place to start learning python for physics.
Debugging
When things go wrong, you need to know how to figure out where the problem lies. You should be able to:
Print or inspect various quantities at points in your code.
Instrument your code with debugging symbols (for compiled code).
Use a debugger to interactively inspect your code.
Use a debugger to determine where a program crashed based on a core dump.
Testing
Unit test your code.
Test your tests for code coverage.
Profiling and Optimization
Profile your code using a profiler.
Optimize the slow spots of your code based on the output of the profiler.
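The profile-then-optimize cycle can be sketched with the standard-library cProfile and pstats modules. The functions below are toy examples invented for the illustration: a deliberately naive loop plays the role of the slow spot, and a closed-form expression is the optimized replacement, checked against the original result.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop: the hot spot we expect the profiler to find."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_of_squares(n):
    """Closed form for the sum of i**2 over i = 0, ..., n-1."""
    return (n - 1) * n * (2 * n - 1) // 6


def profile(func, *args):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile(slow_sum_of_squares, 100_000)
    print(report)
    # Always confirm that the optimized version gives the same answer.
    assert result == fast_sum_of_squares(100_000)
```

Keeping the slow version around as a reference implementation for the tests is often worthwhile: it documents what the optimized code is supposed to compute.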
Version Control
Documentation
Documenting your code and your work is essential. I recommend you develop a strict and regular strategy for documenting your progress. You should establish and regularly record your progress in the equivalent of an experimentalist's laboratory notebook.
LaTeX
Know how to use LaTeX, including a good editing environment.
Install an up-to-date version of TeXLive.
Markup Languages (for wikis, notes, etc.)
ReStructuredText
Markdown
Physics
Here are some resources that students have found useful for learning various topics.
Quantum Fluids
A primer on quantum fluids: A gentle and practical introduction, but somewhat short on details.
Lagrangian (co-moving) and Eulerian formulation of fluids.
Chiral Perturbation Theory and Effective Field Theory
Chiral Effective Field Theory and Nuclear Forces: Good survey of the current state of affairs (as of 2011) with a very nice appendix with details about the expansion etc. Start here.
A Primer for Chiral Perturbation Theory: This provides some additional foundational details and is a good supplement to the previous review paper.