For many years cryptographers and industry implemented
crypto algorithms on microcontrollers and ASICs simply by translating the
algorithms into assembly or rtl language. Only thing that mattered was correct
functionality of the design. However, in mid to late nineties some interesting
attacks of cryptographic implementations were shown, that were able to
completely break the security of it, reveal secret keys, while the underlying cryptographic
algorithm remained mathematically unbroken. First, the timing attack was introduced. Namely the attacker was able discover information based on
the time server took to respond to a query since this time depended on the
secret key that was used. Most simple application of this is square and
multiply algorithm used for RSA exponentiation. Namely, for every bit of the
exponent square, or square and multiply operation is used. This will result in different execution times for different exponents.
Kocher et al. introduced new type of side-channel attack by revealing that
information obtained from the power measurements can be also used to extract
secret values from devices such as smart cards. We start from assuming that the power signal
consist of random noise and deterministic one dependent on the processed value $x$
$ P =
L(x) + R$
Random noise is modeled
as a Gaussian distribution with zero mean, and deterministic key dependant
value $L(x)$ is usually hamming weight or hamming distance of the value .
Following on, we choose intermediate value to be targeted. Good target of the
attack for symmetric designs is S-box output, since small hamming weight difference
in the input of the sbox leads to not so small hamming weight difference in the output. In
the original work it was simply a one bit value, that could be either one or
zero. The simplest example is thus by using one output bit of the S-box, with
the assumption that power consumption is different for one and zero value. For
example, take one S-box operation of the first round of AES. It is calculated using
equation below
$SO_i = S(pt_i \oplus key_i)$
Since we are unaware
of what is the correct key value, we construct key guesses and calculate the
targeted value, e.g. first bit of the S-box output. Now, for every key guess,
we separate power traces into two groups, one that contains power traces for
which the key guess produces target value one and other where target value is
zero. Lastly, we compute the average of two groups, reducing the influence of
the noise, and then we subtract these averages. If our key guess is wrong,
difference of means will be negligible, since the power traces in two groups
don’t correspond to correct values, and are thus uncorrelated. However, the
correct key guess will have a noticeable spike in the point in time where first
bit of S-box output is computed.
This very simple, yet very effective attack opened a new
area of research. Soon, more elaborate attacks were devised, attacking bytes
instead of bits using correlation. More advanced attacks include correlation
with variance instead of the mean, or even higher statistical moments. These
attacks are usually unpractical for orders higher than two, since the noise influence
hardens the attacker’s possibility to retrieve the secret value. Profiling of
the devices is sometimes performed in order to devise a more accurate model of
the deterministic leakage functions. These attacks are known as template
attacks. Designers also came up with various ways of trying to thwart these
attacks, such as increasing the noise level by adding redundant circuitry or
operations, and shuffling the execution operation. Other methods include secure
logic design styles such as WDDL, where logical cells are designed to consume
amount of power regardless of their output. Methods on the algorithmic level,
e.g. Threshold Implementations, try to ensure no information leakage happens during
the execution of the algorithm, regardless of the underlying leakage model of
the design technology.
Since classical DPA and correlation analysis involve
constructing guesses for all possible key bytes/nibbles, it can be very time
consuming. This is where leakage tests became prominent. They tell if a set of
randomly chosen traces can be distinguished for set of traces with fixed input
value. It works in a very similar way to one bit DPA. 2 Sets of traces are
obtained, one with random inputs and one with fixed inputs. Welsh t-test is
used to measure if two sets can be distinguished from one another. If $\mu, \
s$, and $n$ are mean, variance and cardinality of a set, t values are
calculated as follows
$ t =
\frac{\mu_0 - \mu_1}{\sqrt{\frac{s_0^2}{n_0} + \frac{s_1^2}{n_1}}}$
Approximating statistical degree of freedom with $n=n_0 +
n_1$ we can with confidence 0.99999 claim that two sets are distinguishable for
$t > 4.5$. This type of leakage tests
allows the designers to be more efficient since it reduces the testing time of
the design. Downside of the approach is that leakage tests do not reveal how
can the secret values be extracted, only that there is a possibility for an
attacker to do so.
In best paper of CHES 2016: Differential Computation Analysis: Hiding Your White-Box Designs is Not Enough, side-channel analysis has been successfully
used to break many white-box implementations. This time analysis has been
performed on stack layout, instead of power traces, but the principle remained
the same. Taking into account that white-box cryptography is currently being
widely considered to be added to many software based designs, together with plethora
of IoT devices that require protection, side-channel analysis importance will
continue to be an important aspect of design process of cryptographic implementation.
Thus, researchers will continue to work on improvement of leakage detection and
key recovery methods to further reduce cost and time to market of new devices.