Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
| Download
Project: Math 152 - Winter 2017
Views: 17363Kernel: SageMath (stable)
Math 152 - Intro to Mathematical Software
2017-03-13
Kiran Kedlaya; University of California, San Diego
adapted from lectures by William Stein, University of Washington
Guest lecturer (and worksheet author): Kartik Venkatram
** Lecture 26: Public key cryptography and number theory, part 1 **
Announcements:
Kiran will be monitoring the chat room this hour.
No sections today.
Additional guest lecture Wednesday: Alina Bucur
Office hours this week:
Zonglin: Thursday, March 16, 3:30-5:30 in APM 5748.
Kiran: Friday, March 17, 11-12 in APM 7202.
Peter: Friday, March 17, 12-1 and 3-4 in APM 6132.
Homework due Friday, 8pm. Peer grading due Sunday, 8pm.
No final exam!
Please do your course evaluation (CAPE)! Evaluations close Monday, March 20 at 8am.
In [0]:
%latex \section*{Introduction} In this lecture, we're going to talk about \emph{public key cryptography}, one of the most important applications of number theory in the modern world. Public key cryptography is the basis for essentially all internet security: whenever you see the little lock symbol in your browser, that means your communication with that website is being encrypted for security. There are three major public key techniques in use today: \begin{itemize} \item RSA (Rivest-Shamir-Adleman), based on the difficulty of factoring large numbers; \item DH (Diffie-Hellman), based on the difficulty of computing \emph{discrete logarithms} in the ring of integers modulo a large prime $p$; and \item ECDH (elliptic curve Diffie-Hellman), based on the difficulty of computing discrete logarithms in the group of point on an \emph{elliptic curve} \end{itemize} While these can all be used as honest cryptosystems (i.e. protocols for exchanging messages securely), they are all are typically used for \emph{secure key exchange}. Say Alice and Bob want to communicate with each other. \begin{enumerate} \item First, they use the key-exchange protocol to construct a \emph{shared secret key} $K$. \item Then they use \emph{symmetric key cryptography} (e.g. AES) to actually communicate. \end{enumerate} This talk will focus on constructing (and breaking) Diffie-Hellman key exchange. \section*{Discrete Logarithms} Last week, you looked at integers modulo a number $n$. If $n$ is prime, these integers form a \emph{field}, meaning you can add, subtract, multiply, and divide them without anything ever breaking. Let's try this out.
In [0]:
# Let's make the field of integers modulo 23. For historical reasons, it is called a "Galois field" p = 23 F = GF(p) print F print "13+17 = ", F(13)+F(17) print "13*17 = ", F(13)*F(17)
In [0]:
%latex How else can you get these values? Remember that \% mean "modulo" in python.
In [0]:
print ?, ?
In [0]:
%latex \noindent Now, the ordinary logarithm function is the inverse of exponentiation, i.e. $a^b = c \Leftrightarrow b = \log_a(c)$. The discrete logarithm has the exact same property, except in finite fields.
In [0]:
a = F(13) b = 5 c = a^b print a, "^", b, "=", c print "log_", a, c, "=", c.log(a)
In [0]:
%latex \noindent Recall that, if $m = \phi(n)$, where $\phi$ is the \emph{Euler phi function}, then $a^m \mod n = 1$ for any $a$. This means that there are at most $m$ distinct powers of any integer modulo $n$. If $n$ is prime, $\phi(n)$ is just $n-1$.
In [0]:
m = euler_phi(p) print "phi(p) =", m print "powers of 5:", [F(5)^a for a in range(2*m)] print "powers of 2:", [F(2)^a for a in range(2*m)]
In [0]:
%latex \noindent Observe that the various powers of $2$ only get to half the numbers between $0$ and $p$, but the various powers of $5$ reach all those values. Indeed, the powers of $2$ are just every other power of $5$, since $5^2 \mod p = 2$. We call $5$ a (multiplicative) \emph{generator} of the field. The main difference between the ordinary logarithm and the discrete logarithm is that, while the former is always easy to calculate, the latter is much more difficult. Let's try doing a discrete logarithm on a much larger field.
In [0]:
e = randint(1, 11) c = F(2)^e %timeit c.log(F(2)) p = 3408871799 F2 = GF(p) e = randint(1, (p-1)/2) c = F2(2)^e %timeit c.log(F2(2))
In [0]:
%latex \noindent Though this doesn't look so bad, it is still 1000x worse...practical implementations of Diffie-Hellman use 1000 bit primes (or larger), for which discrete logarithms are believed to be extremely difficult. \section*{Diffie-Hellman} Next, let's use our machinery to construct a Diffie-Hellman key exchange protocol: here are the steps. \begin{enumerate} \item Alice and Bob (publicly) agree on a large prime $p$, modulo which all computations will take place, and a generator $g$ of $GF(p)$. \item Alice chooses a secret, random integer $a$ between $0$ and $p-1$, while Bob chooses a secret random integer $b$ in the same range. \item Alice (publicly) transmits $A=g^a\mod p$ to Bob, while Bob (publicly) transmits $B=g^b\mod p$ to Alice. \item Alice and Bob both compute $g^{ab}\mod p = B^a\mod p = A^b\mod p$, and use that value as their shared secret. \end{enumerate} Now, if Eve wants to find their shared secret, she has to be able to compute $g^{ab}$ only knowing $g$, $g^a$, and $g^b$. This is called the \emph{Diffie-Hellman problem}, and is believed to be as hard as computing discrete logarithms in general. You should have all the tools to make this yourself: please fill in the outline below.
In [0]:
# Public stuff p = random_prime(2^64) Fp = GF(p) g = Fp.multiplicative_generator() # Alice's private and public keys a = ? A = ? # Bob's private and public keys b = ? B = ? # Alice computes the shared secret K_alice = ? # Bob computes the shared secret K_bob = ? # Finally, check that they are the same K_alice == K_bob
In [0]:
%latex \noindent And that's it! Alice and Bob have a shared secret, which they can use to send messages. \section*{Attacking Diffie-Hellman} We're going to consider two attacks on Diffie-Hellman key exchange: the first is a \emph{brute force} attack on the system, while the second is an attack which works if $p$ is a \emph{weak Diffie-Hellman prime}. In both cases, we're actually going to attack the discrete logarithm problem directly. That is, given $g^a$ for some unknown $a$, find $a$ The simplest brute force attack, of course, is simply \emph{exhaustion}. That is, just try every value until we find a hit. Let's try this on a particularly small example.
In [0]:
p = previous_prime(2^20) Fp = GF(p) g = Fp.multiplicative_generator() # Alice's public key, based on an "unknown" random number A = g^(randint(2, p-1)) # Now let's find it def exhaust(g, A): for a in range(2, g.multiplicative_order()): if g^a == A: return a %time print A, g^exhaust(g,A)
In [0]:
%latex \noindent Try the above with a larger prime, say one with 20 bits. Hint: you can get a random n-bit prime by doing {\tt random\_prime($2^n$)}. How much harder is it? There are better brute-force attacks which use memory to reduce the amount of computation required. We will talk about one particular one, called \emph{baby-step giant-step}, which works as follows: \begin{enumerate} \item Compute $m=\lfloor\sqrt{p-1}\rfloor$ \item For $e=0,\ldots,m-1$, compute and store $g^e\mod p$ along with the index $e$. \item For $f=0,\ldots,m-1$, compute $A*g^{-fm}\mod p$, and check if it's in the list above. If it is, stop, and output the pair $(e,f)$. \end{enumerate} If the above produces an output, we know that $g^e \equiv g^{a-fm}\mod p$, i.e. $e \equiv a-fm\mod p-1$, i.e. $a \equiv e+fm\mod p-1$. Moreover, since we can always write $a<p-1$ as $e+fm$ for $e,f < m$, this algorithm should always output. Now, let's try it out.
In [0]:
p=4294967291 # a 32-bit prime Fp = GF(p) g = Fp.multiplicative_generator() # Construct random public key as before A = g^randint(0,p-1) def bsgs(g, A): # Step 1 m = floor(sqrt(g.multiplicative_order())) # Step 2 small_g_powers = {g^e:e for e in range(m)} # Step 3 gpow = A mult = g^(-m) for f in range(m): if gpow in small_g_powers: return small_g_powers[gpow]+f*m gpow *= mult %time print A, g^bsgs(g, A)
In [0]:
%latex \noindent Note that we were able to break a 32-bit problem faster than the exhaustive approach broke a 20-bit problem. Try going a little higher, say a 40-bit prime. Finally, let's look at one attack involving a potential weakness of the prime $p$, namely that $p-1$ has smaller factors. Assume $p-1 = q_1*q_2$ for $q_1, q_2$ relatively prime factors of size $\approx \sqrt{p}$. Since $x^{p-1}\mod p = 1$ for any $x$, $(x^{q_1})^{q_2}\mod p = 1$ for any $x$. That is, if we only look at elements with are $q_1$-powers of something, they have order $q_2$. An analogous thing happens if we switch $q_1$ and $q_2$. We use this to recover $a\mod q_1$ and $a\mod q_2$, from which we can use the Chinese Remainder Theorem (which you learned earlier) to recover $a$. \begin{enumerate} \item Find the discrete logarithm of $A^{q_1}$ with respect to $g^{q_1}$ using one of the algorithms above, i.e. $a_2$ such that $g^{a_2*q_1} = A^{q_1}$. This implies that $a_2*q_1 \equiv a*q_1\mod q-1$, i.e. $a_2\equiv a\mod q_2$. \item Find the discrete logarithm of $A^{q_2}$ with respect to $g^{q_2}$ using one of the algorithms above, i.e. $a_1$ such that $g^{a_1*q_2} = A^{q_2}$. This implies that $a_1*q_2 \equiv a*q_2\mod q-1$, i.e. $a_1\equiv a\mod q_1$. \item Compute $a = CRT(a_1, a_2, q_1, q_2)$. \end{enumerate}
In [0]:
# Let's find a particularly nice p of the appropriate form q_1 = previous_prime(2**32) for q_2 in range(2**32-1000, 2**32): p = q_1*q_2+1 if is_prime(p): break print "p = ", p print "p-1 = ", factor(p-1)
In [0]:
Fp = GF(p) g = Fp.multiplicative_generator() # Make Alice's public key as usual A = g^(randint(1,p-1)) def crt_attack(g, A, q_1, q_2): a_2 = bsgs(g^q_1, A^q_1) a_1 = bsgs(g^q_2, A^q_2) return crt(a_1, a_2, q_1, q_2) %time print A, g^crt_attack(g, A, q_1, q_2)
In [0]:
%latex \noindent Now, it turns out that the Chinese remainder theorem actually works with any number of relatively prime moduli, i.e. we can work with all the factors of $p-1$ independently. In particular, no matter how large $p$ is, if $p-1$ can be factored into small values, we can use this technique to quickly attack the discrete logarithm problem modulo $p$.
In [0]: