Friday, December 28, 2018

Generic White-Box Attacks

Cryptography is deployed in almost all the objects (e.g., smartphones) to protect the communications between these objects connected in an open environment. However, the execution environments over them are usually untrustworthy, making cryptographic software implementations running in them are vulnerable to an attacker, who completely manipulates an execution platform and the software deployed on it. Specifically, an attack could try to extract the secret cryptographic keys by all kinds of means, for instance, by monitoring the accessed the memory while execution or by interfering in the execution and exploiting the leakage from erroneous outputs. This attacking model is called white-box model.

White-box cryptography


White-box cryptography is introduced by Chow et al. to protect against software cryptographic implementation against these white-box threats. In particular, it aims to make the key extraction infeasible to any malicious party that would gain full access to the program (and/or the At CHES 2016, Bos et al. proposed to use differential computation analysis (DCA) to attack white-box implementation. DCA is essentially an adaptation of the differential power analysis techniques in the white-box context. It exploits the fact that the variables appearing in the computation in some unknown encoded form might have a strong linear correlation with their original values. It works by first collecting some computation traces, composed runtime memory information through several executions through a dynamic instrumentation tool, such as Intel PIN. One then makes a key guess and predicts the value of a (supposedly) computed bit based on the guess and the computation input. The collected traces are then categorized into two groups according to the hypothetical bits, and a differential trace is calculated by subtracting the two average traces for each key guess. Finally, the key guess with the highest peak in the differential trace is selected as the key candidate.execution environment). Hence, white-box cryptography is considered as the last security frontier of the deployed software. Despite its practical interest, it has been widely acknowledged that no provably secure white-box implementation is put forward in the literature after almost 20 years exploration. Nevertheless, many different techniques have been proposed to mitigate this real-world security threat, but all these solutions have been broken by structural attacks. This situation has pushed the industry to deploy home-made white-box implementations, the designs of which are kept secret, to meet the increasing demands in the market. Although these implementations might not be not secure against a well-informed adversary, the security of their designs can make them practically hard to break since e.g. the known structural attacks do not apply as is. However, several generic attacks have been presented to break these obscure white-box implementations without gaining any knowledge of the designing principle behind.

Differential computation analysis


At CHES 2016, Bos et al. proposed to use differential computation analysis (DCA) to attack white-box implementation. DCA is essentially an adaptation of the differential power analysis techniques in the white-box context. It exploits the fact that the variables appearing in the computation in some unknown encoded form might have a strong linear correlation with their original values. It works by first collecting some computation traces, composed runtime memory information through several executions through a dynamic instrumentation tool, such as Intel PIN. One then makes a key guess and predicts the value of a (supposedly) computed bit based on the guess and the computation input. The collected traces are then categorized into two groups according to the hypothetical bits, and a differential trace is calculated by subtracting the two average traces for each key guess. Finally, the key guess with the highest peak in the differential trace is selected as the key candidate.

DCA has been shown especially effective to break many open white-box implementations and was extensively used as a white-box cryptanalytic technique in the recent WhibOx contest. At FSE 2016, Sasdrich et al. implemented Chow et al.'s white-box countermeasure in FPGA platform. They show that the classical DPA can reveal the secrets in hardware implementations of the white-box designs in the gray-box context, which extends the observation by Bos et al. Besides, the authors stress that the leakage of Chow et al.'s countermeasure comes from the imbalanceness of Boolean function modeling the intermediate variables appearing the computation. At ACNS 2018, Bock et al. give another reason why DCA works on previous countermeasures. However, their analysis is limited to nibble encodings, which could not be generalized to more complicated encoding techniques. A more in-depth analysis of DCA is expected by the community.

Summary


DCA is generic, automated and does not require any information about the design technique. But for a designer, there exist not too many ideas in the light of DCA attack. Besides, given that more DCA-like attacks appear in many following works and generic attacks, such as differential fault analysis are also destructive to practical white-box solutions, it becomes an even more complex problem. It is a challenging task to find a provably secure white-box scheme in the white-box context. However, to meet the needs of white-box secure solution from industry, we are forced to achieve security through obscurity for now. Although not perfect, it might be meaningful to consider to mitigate the gray-box attacks in the first place, for instance, by ideas inspired by the side-channel community.

Thursday, December 6, 2018

Design of Lightweight Linear Layers

Confusion and Diffusion

Confusion and diffusion introduced by Shannon are widely used two fundamental principles in the design of symmetric key primitives. Most modern block ciphers and hash functions have well-designed confusion and diffusion layers. Among many design strategies, substitution-permutation networks (SPN) have been popular in the design of block ciphers and hash functions. The best understood structure of SPN round function probably consists of a brick layer of local nonlinear permutations (usually S-boxes) followed by a multiplication with a diffusion matrix over a finite field (linear diffusion). Diffusion layers play a crucial role in providing resistance against two most powerful statistical attacks differential attack and linear attack .

AES, the most prominent example of SPNs, uses Maximum Distance Separable (MDS) matrix in the MixColumns operation. The MixColumns step employs the following circulant MDS matrix $$\small \left(\begin{array}{cccc} 2 & 3 & 1 & 1\\ 1 & 2 & 3 & 1\\ 1 & 1 & 2 & 3\\ 3 & 1 & 1 & 2 \end{array} \right)$$ where $1,2,3$ are element in the finite field GF($2^8$). It is shown that linear diffusion layers based on MDS matrices have optimal branch numbers and hence provide optimal diffusion properties for any AES-like ciphers in principle.


Lightweight Cryptography

The development of ubiquitous computing such as the Internet of Things (IoT) brings up the relevant security requirements. Specifically, the need for lightweight cryptography emerged from the lack of primitives that are suitable for constrained environments (e.g. RFID tags). Recent years have witnessed the development of various lightweight cipher designs. For more details on lightweight ciphers, the readers can refer to the blogpost Lightweight Cryptography by Ralph. In this post, we will focus on the design of the lightweight linear layer, i.e., lightweight diffusion matrices. We will show how the designers manage to achieve the tradeoff between security and performance.


Lightweight Recursive MDS Matrices

To reduce the hardware cost of MDS matrices, Guo et al. propose a novel design approach of recursive (or serial) MDS matrices. These matrices have been exploited in hardware-oriented lightweight block cipher LED and lightweight hash function PHOTON. The main idea of recursive MDS matrices is to represent an MDS matrix as a power of a very sparse matrix such as a companion matrix. In this way, the MDS matrix can be implemented by iterating the sparse matrix many times. A notable example is the PHOTON MDS matrix. $$\small M= \left(\begin{array}{cccc} 1 & 2 & 1 & 4\\ 4 & 9 & 6 & 17\\ 17 & 38 & 24 & 66\\ 66 & 149 & 100 & 11 \end{array} \right)= \left(\begin{array}{cccc} 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ 1 & 2 & 1 & 4 \end{array} \right)^4$$ where the entries are elements in GF($2^4$). Compared to direct constructions, recursive MDS matrices have a substantially lower hardware area at the cost of additional clock cycles. Therefore, recursive MDS matrices are not suitable for low-latency applications.


Optimizing Entries in the Matrices

In the last five years, many constructions of lightweight MDS have been proposed. Most of them are based on matrices in some special classes, such as circulant, Hadamard, or Toeplitz matrices, with entries that can be efficiently computed. A novel idea by Li and Wang is to construct MDS matrix with elements in the form of binary matrices instead of elements in a finite field. Consider the following MDS matrix $$\small \left(\begin{array}{cccc} A & I & I & I\\ I & I & B & A\\ I & A & I & B \\ I & B & A & I \end{array} \right),\,\,{\rm where}\,\,\,\tiny A=\left(\begin{array}{cccccccc} 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right)$$ and $B=A^{-2}$. This matrix can be implemented with only $106$ bitwise XOR, while a naive implementation of AES MDS matrix needs $152$ bitwise XOR.


Recent Works on Global Optimization

Most of the previous works consider the cost of an MDS matrix as the sum of costs of each coefficient and the cost of $nm(m-1)$ bit XOR operations, where the matrices are of order $n$ with entries from GF($2^m$). Actually this local optimization approach always overestimates the real cost. Indeed, many common intermediate values can be computed and then reused. In hardware implementations this can be done with merely wires. This implies that a globally optimized implementation can be significantly cheaper. Recently, Kranz et al. used Shorter Linear Straight-Line Programs to optimize hte previously proposed MDS matrices. They found a $4\times 4$ MDS matrix over GF($2^8$) (i.e., the same size of AES MDS matrix) that can be implemented with $72$ bitwise XORs. This number has been further reduced to $67$ in a recent work by Duval and Leurent following similar global optimization approach.


Near-MDS Matrices

It is well-known that any element of an MDS matrix over a finite field must be nonzero. Thus MDS matrices are very dense and hence costly in hardware implementation. MDS and recursive MDS matrices might not offer an optimal trade-off between security and efficiency. Near-MDS have sub-optimal branch numbers while they require less area than MDS matrices and they do not need additional clock cycles. Indeed, some diffusion layers constructed from near-MDS matrices outperform those based on MDS or recursive MDS matrices in terms of the FOAM framework proposed by Khoo et al.. Recently, near-MDS matrices have been adopted in some lightweight block ciphers, including PRINCE, PRIDE, and Midori. However, there is insufficient research on the construction and security properties of near-MDS matrices. These motivate us to present novel results on near-MDS matrices. Actually, We propose new designs of lightweight near-MDS matrices of order $5\sim9$. Further, it is shown that our linear layers with a well-chosen nonlinear layer can provide sufficient security against differential and linear cryptanalysis. The results have been published in ToSC 2017(1).


In sum, we already have pretty good solutions for various requirements in the design of lightweight diffusion matrices. Nevertheless, it remains open to achieve optimal implementation of linear layers in a specific cipher. It is also interesting to further investigate the impact of security on the adoption of non-MDS matrices.