Monday, February 27, 2017

Side-channel analysis

For many years, cryptographers and industry implemented crypto algorithms on microcontrollers and ASICs simply by translating the algorithms into assembly or RTL. The only thing that mattered was the correct functionality of the design. However, in the mid-to-late nineties, some interesting attacks on cryptographic implementations were shown that were able to completely break their security and reveal secret keys, while the underlying cryptographic algorithm remained mathematically unbroken. First, the timing attack was introduced: the attacker was able to discover information based on the time a server took to respond to a query, since this time depended on the secret key that was used. The simplest application of this targets the square-and-multiply algorithm used for RSA exponentiation. For every bit of the exponent, either a square, or a square and a multiply operation is performed, which results in different execution times for different exponents.
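To see why the execution time depends on the exponent, here is a small sketch of square-and-multiply that counts modular operations (a toy example with made-up numbers, not a real RSA implementation):

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right square-and-multiply, counting modular operations
    to expose the timing dependence on the exponent bits."""
    result = 1
    ops = 0
    for bit in bin(exponent)[2:]:               # scan exponent MSB-first
        result = (result * result) % modulus    # square on every bit
        ops += 1
        if bit == '1':                          # multiply only on one-bits
            result = (result * base) % modulus
            ops += 1
    return result, ops

# Same bit length, different Hamming weight -> different operation counts:
_, ops_low  = square_and_multiply(5, 0b10000001, 101)
_, ops_high = square_and_multiply(5, 0b11111111, 101)
print(ops_low, ops_high)   # 10 16
```

An exponent with more one-bits requires more multiplies, so an attacker who can time many exponentiations learns something about the secret exponent.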

Kocher et al. introduced a new type of side-channel attack by revealing that information obtained from power measurements can also be used to extract secret values from devices such as smart cards. We start by assuming that the power signal consists of random noise and a deterministic part dependent on the processed value $x$:

$P = L(x) + R$

The random noise is modeled as a Gaussian distribution with zero mean, and the deterministic, key-dependent value $L(x)$ is usually the Hamming weight or Hamming distance of the processed value. Next, we choose an intermediate value to target. A good target for symmetric designs is the S-box output, since a small Hamming weight difference at the input of the S-box can lead to a large Hamming weight difference at the output. In the original work the target was simply a one-bit value, which could be either one or zero. The simplest example thus uses one output bit of the S-box, with the assumption that the power consumption differs between one and zero values. For example, take one S-box operation of the first round of AES, calculated using the equation below:
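A quick sketch of this leakage model, using the small 4-bit PRESENT S-box as a stand-in for the AES S-box to keep the example readable (the function names are illustrative):

```python
# Hypothetical leakage model L(x) = HW(S(pt XOR key)), with the 4-bit
# PRESENT S-box standing in for the AES S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def hw(v):
    """Hamming weight of an integer."""
    return bin(v).count('1')

def leakage(pt, key):
    """Deterministic part of the power model for one S-box lookup."""
    return hw(SBOX[pt ^ key])

# Flipping a single input bit can change the output HW by several bits;
# this nonlinearity is what makes the S-box output a good target:
diffs = [abs(hw(SBOX[x]) - hw(SBOX[x ^ 1])) for x in range(16)]
print(max(diffs))   # 3
```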

$SO_i = S(pt_i \oplus key_i)$

Since we do not know the correct key value, we construct key guesses and calculate the targeted value, e.g. the first bit of the S-box output. Then, for every key guess, we separate the power traces into two groups: one containing the traces for which the key guess produces target value one, and another where the target value is zero. Lastly, we compute the averages of the two groups, reducing the influence of the noise, and subtract them. If our key guess is wrong, the difference of means will be negligible, since the traces in the two groups do not correspond to the correct values and are thus uncorrelated. The correct key guess, however, will produce a noticeable spike at the point in time where the targeted bit of the S-box output is computed.
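A minimal simulation of this one-bit DPA, again with the 4-bit PRESENT S-box in place of AES and the most significant output bit as the target; the key, noise level, and trace count are all illustrative choices:

```python
import random

# Toy one-bit DPA: simulate noisy HW traces, then recover the key
# by difference of means over a one-bit prediction.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SECRET_KEY = 0xA
random.seed(1)

def measure(pt):
    """Simulated power sample: HW of the S-box output plus Gaussian noise."""
    out = SBOX[pt ^ SECRET_KEY]
    return bin(out).count('1') + random.gauss(0, 0.5)

plaintexts = [random.randrange(16) for _ in range(2000)]
traces = [measure(pt) for pt in plaintexts]

def diff_of_means(guess):
    """Split traces on the predicted MSB of the S-box output and subtract
    the group means (under the HW model the 'ones' group should have the
    higher mean, so we keep the sign)."""
    ones  = [t for pt, t in zip(plaintexts, traces) if SBOX[pt ^ guess] >> 3 & 1]
    zeros = [t for pt, t in zip(plaintexts, traces) if not SBOX[pt ^ guess] >> 3 & 1]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

best = max(range(16), key=diff_of_means)
print(hex(best))   # recovers SECRET_KEY
```

With only 16 key candidates the full ranking is cheap; for a real AES byte the same loop simply runs over 256 guesses.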

This very simple, yet very effective attack opened a new area of research. Soon, more elaborate attacks were devised, attacking bytes instead of bits using correlation. More advanced attacks use the variance instead of the mean, or even higher statistical moments. These attacks are usually impractical for orders higher than two, since the influence of noise makes it much harder for the attacker to retrieve the secret value. Profiling of the device is sometimes performed in order to devise a more accurate model of the deterministic leakage function; these attacks are known as template attacks. Designers also came up with various ways of trying to thwart these attacks, such as increasing the noise level by adding redundant circuitry or operations, and shuffling the order of execution. Other methods include secure logic design styles such as WDDL, where logic cells are designed to consume the same amount of power regardless of their output. Methods on the algorithmic level, e.g. Threshold Implementations, try to ensure that no information leaks during the execution of the algorithm, regardless of the underlying leakage model of the design technology.
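The byte-wise (here, nibble-wise) correlation variant can be sketched in a few lines: instead of splitting traces on one bit, we rank key guesses by the Pearson correlation between a Hamming-weight hypothesis and the traces. The S-box, key, and noise parameters below are illustrative:

```python
import random

# Sketch of a correlation attack on the same toy 4-bit S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
HW = [bin(v).count('1') for v in range(16)]
SECRET_KEY = 0x6
random.seed(2)

plaintexts = [random.randrange(16) for _ in range(2000)]
traces = [HW[SBOX[pt ^ SECRET_KEY]] + random.gauss(0, 1.0) for pt in plaintexts]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def score(guess):
    """Correlate the HW hypothesis for this guess against the traces."""
    hypothesis = [HW[SBOX[pt ^ guess]] for pt in plaintexts]
    return abs(pearson(hypothesis, traces))

best = max(range(16), key=score)
```

Because the hypothesis predicts the full Hamming weight of the S-box output rather than a single bit, the correct guess typically stands out with fewer traces than in the difference-of-means attack.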

Since classical DPA and correlation analysis involve constructing guesses for all possible key bytes/nibbles, they can be very time consuming. This is where leakage tests became prominent. They tell whether a set of traces with randomly chosen inputs can be distinguished from a set of traces with a fixed input value, and they work in a very similar way to one-bit DPA. Two sets of traces are obtained, one with random inputs and one with fixed inputs, and Welch's t-test is used to measure whether the two sets can be distinguished from one another. If $\mu$, $s^2$, and $n$ are the mean, variance, and cardinality of a set, the t-value is calculated as follows:

$t = \frac{\mu_0 - \mu_1}{\sqrt{\frac{s_0^2}{n_0} + \frac{s_1^2}{n_1}}}$

Approximating the statistical degrees of freedom with $n = n_0 + n_1$, we can claim with confidence 0.99999 that the two sets are distinguishable when $|t| > 4.5$. This type of leakage test allows designers to be more efficient, since it reduces the testing time of the design. The downside of the approach is that leakage tests do not reveal how the secret values can be extracted, only that there is a possibility for an attacker to do so.
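A fixed-vs-random leakage test can be sketched as follows; the simulated "device" leaks the Hamming weight of its input, which is an assumed model for illustration:

```python
import random

# Sketch of a fixed-vs-random leakage test using Welch's t-statistic.
random.seed(3)

def sample(x):
    """Simulated power sample: HW of the input plus Gaussian noise."""
    return bin(x).count('1') + random.gauss(0, 1.0)

def welch_t(a, b):
    """Welch's t-statistic for two samples of possibly unequal variance."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / (va / na + vb / nb) ** 0.5

fixed_set  = [sample(0xF) for _ in range(5000)]                   # fixed input
random_set = [sample(random.randrange(16)) for _ in range(5000)]  # random inputs
t = welch_t(fixed_set, random_set)
print(abs(t) > 4.5)   # True: the sets are distinguishable, so the device leaks
```

Note that the test fires without constructing a single key guess, which is exactly why it is so much cheaper than a full key-recovery attack.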

In the CHES 2016 best paper, Differential Computation Analysis: Hiding Your White-Box Designs is Not Enough, side-channel analysis was successfully used to break many white-box implementations. This time the analysis was performed on software execution traces instead of power traces, but the principle remained the same. Considering that white-box cryptography is currently widely considered for many software-based designs, together with the plethora of IoT devices that require protection, side-channel analysis will remain an important aspect of the design process of cryptographic implementations. Researchers will thus continue to improve leakage detection and key recovery methods, further reducing the cost and time to market of new devices.