{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This page lists projects which might be suitable for undergraduate research.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# From Symbols to Numbers" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Often we have a set of equations that need to be implemented numerically. To quickly get code working, while ensuring that no mistakes are made, it is often convenient to start with the symbolic expressions and then generate code that performs the numerical computation. This is especially useful when working with minimizers or solvers that can utilize derivative information.\n", "\n", "However, when nested deeply, or highly differentiated, even relatively simple expression can become unweildy in their final form. The goal of this project is to provide an efficient and effective set of tools for getting from symbolic expressions to the robust code required to evaluate them numerically." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use Cases" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Lagrangian Dynamics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consider the problem of setting up the Euler-Lagrange equations for a triple pendulum. This generates pretty nasty expressions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Neutron Stars" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a use case, we have a project for modelling the physics of neutron stars. Here the star is described by a nuclear equation-of-state (EoS) defined by the function $\\mathcal{E}(n_n, n_p)$ which is the energy-density (energy per unit volume) of homogeneous nuclear matter as a function of neutron $n_n$ and proton $n_p$ densities. This expression on its own is rather large as it results from inverting a system of 5 linear equations, but then must be inserted into a further set of equations defining the so-called compressible liquid drop model (CLDM) for nuclei which approximates the construction of a nuclear lattice in the crust of the neutron star by minimizing the total energy including electromagnetic interaction and surface effects. Inserting $\\mathcal{E}(n_n, n_p)$ into these equations and taking derivatives as required to define the minimization conditions produces huge expressions that take hours to even simplify.\n", "\n", "We practically solved this problem by treating $\\mathcal{E}(n_n, n_p)$ as a numerical black box, and decoupling the form of these expressions from the CLDM model. This is not ideal, however, as it required defining an appropriate interface for calling $\\mathcal{E}(n_n, n_p)$ which was additional programming work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Existing Tools" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most of our code is in python, so we would like to start there. I am aware of the following tools which might be useful:\n", "\n", "* [SymPy](http://www.sympy.org/): This is the \"standard\" symbolic manipulation package for python. It has quite a few tools, and works reasonably well. Here are some notes, including serious limitations:\n", " * **Simplification:** Once large expressions are generated, the simplification tools are very slow. In particular, the default approach `expr.simplify()` tries many different strategies, then measures the outcomes. Some of these strategies are extremely expensive, so this can literally take hours. Unfortunately, no record is kept about which strategies worked and which did not, so one cannot easily issolate the best approach. 
One small community project might be to look at the `sympy` code and at least keep track of which simplification strategy worked best, so the user can call that directly in the future.\n", " * **Numerical Computation:** Once an expression is found, fairly efficient numerical implementations can be produced using [``sympy.lambdify``](http://docs.sympy.org/latest/modules/utilities/lambdify.html). Performance characteristics of various numerical approaches are [discussed here](http://docs.sympy.org/latest/modules/numeric-computation.html).\n", " * **Caching:** Once the work of simplifying and generating numerical functions is done, it would be great to cache the results to disk so they can be reused without repeating the expensive computations. Unfortunately, general expressions, and lambdified expressions in particular, have no good way to be archived. Expressions can often be converted to strings, but these take some time to process when loading, which can be a performance issue. Caching the lambdified functions is thus preferred from a performance standpoint, but runs into many issues:\n", " * [How to serialize sympy lambdified function?](http://stackoverflow.com/questions/31314517/how-to-serialize-sympy-lambdified-function)\n", " * [Cannot pickle lambdas from SymPy](https://github.com/uqfoundation/dill/issues/104)\n", " * [Python - write symbolic expression (sympy) to txt file](http://stackoverflow.com/questions/17517651/python-write-symbolic-expression-sympy-to-txt-file)\n", " * [Cannot pickle sympy.UndefinedFunction](https://github.com/cloudpipe/cloudpickle/issues/65)\n", " \n", " Many of these issues have been partially resolved using appropriate settings of advanced pickling libraries such as [dill](https://pypi.python.org/pypi/dill) or [cloudpickle](https://github.com/cloudpipe/cloudpickle), but they demonstrate that this approach can be brittle. Another issue is that pickles can \"go sour\": if you upgrade your code, or even the underlying Python interpreter, the pickles might no longer load.\n", " \n", " My preferred approach is to use human-readable (and writable) representations; I have had reasonable success using [`sympy.printing.lambdarepr.NumPyPrinter`](http://docs.sympy.org/latest/modules/printing.html?highlight=lambdarepr#module-sympy.printing.lambdarepr) to print an importable module with the following structure:\n", "\n", " ```python\n", "from __future__ import division\n", "import math\n", "import numpy\n", "import uncertainties.core\n", "import uncertainties.unumpy\n", "import sympy\n", "# Comment to keep nice formatting in\n", "# notebook... (not sure why needed)\n", "class Mixin(object):\n", "    _names = {names}\n", "\n", "    @staticmethod\n", "    def {name}(*v):\n", "        MutableDenseMatrix = numpy.array\n", "\n", "        if any([isinstance(_v, sympy.Expr) for _v in v]):\n", "            np = sympy\n", "            MutableDenseMatrix = sympy.MutableDenseMatrix\n", "        elif any([isinstance(_v, uncertainties.core.AffineScalarFunc) for _v in v]):\n", "            np = uncertainties.unumpy\n", "        elif any([isinstance(_v, numpy.ndarray) for _v in v]):\n", "            np = numpy\n", "        else:\n", "            np = math\n", "\n", "        exp, log, pi = np.exp, np.log, numpy.pi\n", "        {args} = v\n", "        return {expr}\n", "```\n", " \n", " This provides a human-friendly form of importable code with a minimal performance hit. It also allows the functions to be called symbolically, with numpy, or with numbers. 
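The generation side might look roughly like the sketch below, which fills a stripped-down version of this template using `NumPyPrinter` (the helper `write_module` and the module layout are illustrative assumptions, not an existing API).\n", "\n", " ```python\n", "import sympy\n", "from sympy.printing.lambdarepr import NumPyPrinter\n", "\n", "# Stripped-down module template; the full version above also dispatches on\n", "# the argument types (sympy expressions, uncertainties, arrays, or floats).\n", "_TEMPLATE = '''\n", "import numpy\n", "from numpy import exp, log, pi\n", "\n", "\n", "def {name}({args}):\n", "    return {expr}\n", "'''\n", "\n", "\n", "def write_module(filename, name, args, expr):\n", "    # Print expr in a numpy-evaluable form and write it as an importable module.\n", "    code = _TEMPLATE.format(name=name,\n", "                            args=', '.join(map(str, args)),\n", "                            expr=NumPyPrinter().doprint(expr))\n", "    with open(filename, 'w') as f:\n", "        f.write(code)\n", "\n", "\n", "# Hypothetical usage:\n", "# x, y = sympy.symbols('x, y')\n", "# write_module('_cached_f.py', 'f', (x, y), sympy.exp(-x**2 - y**2))\n", "```\n", "\n", " The generated file can then simply be imported like any other module, so the cache stays human-readable and easy to regenerate. 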
However, it is a bit of a hack and requires careful attention to where these \"cache\" files are stored.\n", " \n", "* [Theano](http://deeplearning.net/software/theano/): This project is designed to produce optimized code for symbolic expressions using the CPU, multiple threads, and GPU acceleration if possible. It works, supports CSE, and is quite fast, but unfortunately the expression manipulation is all done internally and the results are not easily extracted. It is also a somewhat large package and can be difficult to install (as opposed to SymPy, which is pure Python). I also find Theano less intuitive to use than SymPy.\n", "\n", " In 2013 there was some work on translation between SymPy and Theano. Here are some discussions about using the two together; this does not currently seem to be an active topic:\n", " \n", " * [Theano + Sympy for system of ODEs](https://groups.google.com/forum/#!topic/sympy/VtaxCRNO4sE/discussion)\n", " * [theano_sympy](https://github.com/nouiz/theano_sympy) GitHub project.\n", " * [Matthew Rocklin's blog](http://matthewrocklin.com/blog/) about [Sympy and Theano](http://matthewrocklin.com/blog/tags.html#SymPy-ref) (2013)\n", " * [Code Generation](http://matthewrocklin.com/blog/work/2013/03/19/SymPy-Theano-part-1)\n", " * [Scalar Simplification](http://matthewrocklin.com/blog/work/2013/03/28/SymPy-Theano-part-2)\n", " * [Matrix Expressions](http://matthewrocklin.com/blog/work/2013/04/05/SymPy-Theano-part-3)\n", " * [Using SymPy within Theano](http://matthewrocklin.com/blog/work/2013/08/14/SymPy-Theano-part-4)\n", " \n", "The crux of this project thus seems to be combining an efficient symbolic computation path for specifying non-linear equations with a CSE-aware simplification process for computing derivatives etc., and finally generating reasonably optimized numerical code. (Using [Numba](http://numba.pydata.org) would be great too.)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Numerical Differentiation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In many applications, one needs to compute derivatives numerically. In principle this is a straightforward application of the formula:\n", "\n", "$$\n", " f'(x) = \\frac{f(x + h) - f(x - h)}{2h} - h^2\\frac{f'''(x)}{6} + \\mathcal{O}(h^4).\n", "$$\n", "\n", "However, in practice, one must trade truncation errors (minimized by making $h$ as small as possible) against roundoff errors, which are amplified by the subtraction. To be explicit, assume that $f(x)$ can be computed to a relative precision of $\\epsilon$. The roundoff error in this expression is thus of the order:\n", "\n", "$$\n", " \\text{roundoff error (relative)} \\sim \\frac{\\epsilon}{\\sqrt{2}h}.\n", "$$\n", "\n", "Balancing these, we obtain an optimal step size of:\n", "\n", "$$\n", " h_c \\approx \\sqrt[3]{\\frac{6\\epsilon f(x)}{\\sqrt{2}f'''(x)}} \\approx 1.6x\\sqrt[3]{\\epsilon}\n", "$$\n", "\n", "assuming that $f'''(x)/f(x) \\sim x^{-3}$. Even with this optimal step size, the relative error will be of order $\\epsilon^{2/3}/2x$, which for double precision reduces the precision from 16 digits to about 10. Here we demonstrate these formulae for the function $\\sin'(x) = \\cos(x)$ at $x=1$:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n", "Max rel. err. in denom: 0.00256821005781\n" ] }, { "data": { "text/plain": [ "