Eric Rafaloff

Shamir's Secret Sharing Scheme

Consider a scenario in which you are tasked with managing the security of a bank's vault. The vault is considered impenetrable without a key, which you are given on your first day on the job. Your goal is to securely manage the key.

Let us suppose that you decide to keep the key on you at all times, granting access to the vault on an as-needed basis. You quickly learn that such a solution doesn't scale well in practice, as your physical presence is required whenever the vault needs to be opened. What about those vacation days you were promised? Also, perhaps more frighteningly, what if you lost the one and only key?

With your vacation days in mind, you decide to make a copy of the key and entrust it with another employee. However, you realize that this isn't ideal either. In doubling the number of keys, you have also doubled the number of opportunities for a thief to steal the vault's key.

Out of desperation, you destroy the duplicate and decide to split the original key in half. Now, you think, two trusted people must be physically present with each key shard to reassemble the key and unlock the vault. This means that a thief would need to steal two key shards to reassemble the original key, which is double the amount of work of stealing one key. However, you soon realize that this scheme isn't much better than simply having one key, because if either of you were to lose a key half then the full key cannot be recovered anyway.

This problem can be solved with a series of additional keys and locks, but can quickly require a lot of keys and locks when approached this way. You decide that an ideal key management scheme must involve splitting up a key, so that the security of the key isn't fully entrusted with one person. You also conclude that some threshold of key shards should be required for key reassembly, so that one person losing their key shard (or going on vacation) doesn't render the entire key unrecoverable.

How to Share a Secret

This is the type of key management scheme that Adi Shamir was thinking about in 1979 when he published How to Share a Secret.

The paper concisely explains a so-called (k,n) threshold scheme that can be used to efficiently split up a secret value (e.g. a cryptographic key) into n shares. Then, if and only if at least k of any n shares are collected, it should be easy to recover the secret, S. An important security goal of this scheme is that an attacker should learn absolutely nothing if they don't have at least k shares. Even k1 shares should provide no information. We call this property semantic security.

Polynomial Interpolation

Shamir's (k,n) threshold scheme is built around the concept of polynomial interpolation. If you're not familiar with this concept, it's actually quite simple. In fact, if you've ever plotted points on a graph and then drew lines or curves to connect them, you've already used it!

Consider a polynomial function with a degree of one, f(x)=x+2. If you wanted to plot this function on a graph, how many points would you need? Well, we know this is a linear function, which forms a line and therefore needs at least two points. Next let us consider a polynomial function with a degree of two, f(x)=x2+2x+1. This is a quadratic function, and therefore requires at least three points to plot on a graph. How about a polynomial with a degree of three? At least four points. And so on and so forth.

The really neat thing about this property is that given the degree of a polynomial function and at least degree+1 points, we can plot additional points for that polynomial function. We call extrapolating those additional points polynomial interpolation.

Plotting a Secret

You may already see Shamir's clever scheme beginning to play out here. Let us suppose that our secret, S, is 42. We can turn S into a point on a graph, (0,42), and come up with a polynomial function with a k1 degree that satisfies this point. Recall that k is going to be our threshold of required shares, so if we want a threshold of three shares, we should choose a polynomial function with a degree of two.

Our polynomial function is going to take the form f(x)=a0+a1x+a2x2+a3x3+...+ak1xk1, where a0=S and a1,...,ak1 are randomly chosen positive integers. All we're doing is constructing a polynomial function with a k1 degree, where the free coefficient, a0, is our secret, S, and the proceeding k1 terms each have a randomly chosen positive coefficient. If we go back to our original example and suppose that S=42,k=3,a1,...,ak1=[3,5], then we get the function f(x)=42+3x+5x2.

At this point we are free to generate our shares by plugging n unique integers into f(x)=42+3x+5x2 where x0 (since that's our secret). In this example we want to distribute four shares with a threshold of three, so we randomly generate the points (18,1716),(27,3768),(31,4940),(35,6272) and send one point to each of our trusted four individuals. We also inform the individuals that k=3, since this is considered public information and is needed to reconstruct S.

Reconstructing a Secret

We already discussed the concept of polynomial interpolation, and how it is the very foundation of Shamir's (k,n) threshold scheme. When any three of our four entrusted individuals want to reconstruct S, all they need to do is interpolate f(0) with their unique points. To achieve this they can define their points, (x1,y1),...,(xk,yk)=(18,1716),(27,3768),(31,4940), and calculate a Lagrange interpolating polynomial using the following formula. If you’re more comfortable with programming than mathematics, the pi is basically a for statement that multiplies all the results together, and the sigma a for statement that adds all of the results together.

P(x)=j=1kpj(x)

Pj(x)=yjm=1mjkxxmxjxm

When k=3, we can solve this like so and get back our original polynomial function:

P(x)=y1(xx2x1x2·xx3x1x3)+y2(xx1x2x1·xx3x2x3)+y3(xx1x3x1·xx2x3x2)

P(x)=1716(x271827·x311831)+3768(x182718·x312731)+4940(x183118·x273127)

P(x)=42+3x+5x2

Since we know S=P(0), reconstructing S is simple:

P(0)=42+3(0)+5(0)2

P(0)=42

Exploiting Insecure Integer Arithmetic

Although we've successfully applied the fundamental idea behind Shamir's (k,n) threshold scheme, we have a remaining problem that we've ignored up until now. Our polynomial function uses insecure integer arithmetic. Consider that for every additional point an attacker obtains on our function's graph, the smaller number of possibilities remain for what other points can be. You can begin to see this deduction visually as you graph an increasing number of points for a polynomial function using integer arithmetic. This is counterproductive to our stated security goal that an attacker should learn absolutely nothing if they don't have at least k shares.

To demonstrate how weak the scheme is when integer arithmetic is used, consider a scenario in which an attacker has obtained two points, (18,1716),(27,3768), and knows the public information that k=3. From this information they can derive the degree of f(x), which is two, and plug in known x and f(x) values.

f(x)=a0+a1x+a2x2+a3x3+...+ak1xk1

f(x)=S+a1x+a2x2

f(18)1716=S+a118+a2182

f(27)3768=S+a127+a2272

An attacker can then solve for a1 by evaluating f(27)f(18):

37681716=(SS)+(27a118a1)+(729a2324a2)

2052=9a1+405a2

9a1=2052405a2

a1=2052405a29

a1=22845a2

Since we defined a1,...,ak1 to be randomly chosen positive integers, there are only so many possibilities for what a2 can be. With this information, an attacker can deduce a2[1,2,3,4,5], since anything greater than 5 would make a1 negative. This turns out to be true since we defined a2=5.

From there an attacker can begin to deduce possible values of S, by replacing a1 in f(18):

1716=S+a118+a2182

1716=S+18(22845a2)+a2182

1716S=18(22845a2)+a2182

S=18(22845a2)+a21821716

S=4104810a2+a21821716

S=4104+810a2a2182+1716

S=810a2324a22388

S=486a22388

With the limited set of possibilities for what a2 can be, as deduced by the attacker earlier, it's clear how easy it would be to guess and check for values of S. There are only five possibilities.

Addressing Insecure Integer Arithmetic

In order to address this vulnerability, Shamir proposes that we use modulo arithmetic, changing f(x) to f(x)modp where p :p > S, p > n and is a set of all prime numbers.

A quick reminder of how modulo arithmetic works. Reading a clock is already a familiar concept, and uses hours that are mod12. As soon as the hour hand goes past twelve, it wraps back around to one. An interesting property of this system is that by simply looking at a clock, we can't deduce how many times the hour hand has wrapped around. That is to say, if 48 hours have passed, I wouldn't know that just from learning that it is 1:30. However, if we know that the hour hand has passed twelve 4 times, we can fully solve for the number of hours that have passed with a simple formula, a=mq+r, where m is our modulo divisor (here m=12), q is our quotient (the number of times that our divisor evenly goes into the original number, here q=4), and r is our remainder, which is usually what calling a modulo operator gives us (here r=1.5). Knowing all of these values allows us to solve for a=49.5, but if we were missing the quotient, we would never be able to reconstruct our original value.

We can demonstrate how this helps the security of our scheme by applying it to our previous example and using p=73. Our new polynomial function is f(x)=42+3x+5x2mod73, and our new points are (18,37),(27,45),(31,49),(35,67). Our entrusted individuals can once again use polynomial interpolation to recover our function, only this time addition and multiplication operations must be followed by reduction modulo p (e.g. 48+93mod73=68).

Using this new example, let us suppose that our attacker has obtained knowledge of two of these new points, (18,37),(27,45), and the public information k=3,p=73. This time the attacker deduces the following functions based on all of the information they have, where is a set of all positive integers, and qx represents the quotient of the modulo of f(x).

f(x)=S+a1x+a2x2mod73

f(x)=S+a1x+a2x273qx:qx

f(18)37=S+a118+a218273q18

f(27)45=S+a127+a227273q27

Now our attacker again solves for a1 by evaluating f(27)f(18):

4537=(SS)+(27a118a1)+(729a2324a2)+(73q27(73q18))

8=9a1+405a2+73(q18q27)

9a1=8+405a2+73(q18q27)

9a1=8405a273(q18q27)

a1=8405a273(q18q27)9

a1=8945a2739(q18q27)

From there our attacker again attempts to deduce S, by replacing a1 in f(18):

37=S+18(8945a2739(q18q27))+a218273q18

S=18(8945a2739(q18q27))+a218273q1837

S=18(8945a2739(q18q27))324a2+73q18+37

S=486a2+21+219q18146q27

This time around, our attacker has a serious problem. They are missing the values of a2, q18, and q27. Since there are an infinite number of valid combinations for these variables, there is no additional information that can be gained.

Security Considerations

Sharmir's secret sharing scheme offers information-theoretic security, meaning that the math we explored has been proven to be unbreakable, even against an attacker with unlimited computing power. However, the scheme still does contain a couple of known issues.

For example, Shamir's scheme does not produce verifiable shares, which means that individuals are free to submit fake shares and prevent the correct secret from being reconstructed. An adversarial share holder with enough information can even produce a different share such that S is reconstructed to a value of their choice. This issue is addressed by verifiable secret sharing schemes such as Feldman’s scheme.

Another issue is that because the length of any given share is equal to the length of an associated secret, the length of a secret is easily leaked. This issue is trivial to fix by simply padding the secret to a fixed length.

Finally, it is important to note that our concerns about security may extend beyond just the scheme itself. For real world cryptography applications, there is often the threat of side channel attacks, in which an attacker attempts to extract useful information from application timing, caching, fault, and more. If this is a concern, careful considerations should be made during development, such as using constant time functions and lookups, preventing memory from paging to disk, and a bunch of other things that are beyond the scope of this blog post.

#cryptography