Friday, December 28, 2018

Generic White-Box Attacks

Cryptography is deployed in almost every connected object (e.g., smartphones) to protect communications in an open environment. However, the execution environments on these devices are often untrustworthy, which makes cryptographic software implementations running on them vulnerable to an attacker who fully controls the execution platform and the software deployed on it. Specifically, an attacker could try to extract the secret cryptographic keys by all kinds of means, for instance by monitoring the memory accessed during execution, or by interfering with the execution and exploiting the leakage from erroneous outputs. This attack model is called the white-box model.

White-box cryptography


White-box cryptography was introduced by Chow et al. to protect software cryptographic implementations against these white-box threats. In particular, it aims to make key extraction infeasible for any malicious party that gains full access to the program (and/or the execution environment). Hence, white-box cryptography is considered the last security frontier of deployed software. Despite its practical interest, it is widely acknowledged that no provably secure white-box implementation has been put forward in the literature after almost 20 years of exploration. Nevertheless, many different techniques have been proposed to mitigate this real-world security threat, but all these solutions have been broken by structural attacks. This situation has pushed the industry to deploy home-made white-box implementations, whose designs are kept secret, to meet the increasing demand in the market. Although these implementations might not be secure against a well-informed adversary, the secrecy of their designs can make them practically hard to break, since e.g. the known structural attacks do not apply as is. However, several generic attacks have been presented that break these obscure white-box implementations without any knowledge of the design principles behind them.

Differential computation analysis


At CHES 2016, Bos et al. proposed differential computation analysis (DCA) to attack white-box implementations. DCA is essentially an adaptation of differential power analysis techniques to the white-box context. It exploits the fact that variables appearing in the computation in some unknown encoded form might have a strong linear correlation with their original values. It works by first collecting computation traces, composed of runtime memory information, over several executions through a dynamic instrumentation tool such as Intel PIN. One then makes a key guess and predicts the value of a (supposedly) computed bit based on the guess and the computation input. The collected traces are then split into two groups according to the hypothetical bit, and for each key guess a differential trace is computed by subtracting the two average traces. Finally, the key guess with the highest peak in its differential trace is selected as the key candidate.
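To illustrate the difference-of-means procedure, here is a minimal Python sketch on a toy target: a single 4-bit S-box (PRESENT's) keyed by XOR, with synthetic traces in which one sample leaks an S-box output bit. The leakage model, trace format, and all parameters are illustrative stand-ins for what a tool like Intel PIN would record from a real white-box binary.

```python
import random

# PRESENT's 4-bit S-box; the toy target computes v = S(p XOR k).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SECRET_KEY = 0xA                  # unknown to the attacker
N_TRACES, TRACE_LEN, LEAK_POS = 200, 64, 17

def record_trace(p):
    """Stand-in for dynamic instrumentation: noise samples, except one
    position that carries bit 0 of the S-box output."""
    t = [random.randint(0, 1) for _ in range(TRACE_LEN)]
    t[LEAK_POS] = SBOX[p ^ SECRET_KEY] & 1
    return t

plaintexts = [random.randrange(16) for _ in range(N_TRACES)]
traces = [record_trace(p) for p in plaintexts]

best = (None, -1.0)
for k in range(16):                        # enumerate all key guesses
    g0, g1 = [], []
    for p, t in zip(plaintexts, traces):   # split traces by predicted bit
        (g1 if SBOX[p ^ k] & 1 else g0).append(t)
    if not g0 or not g1:
        continue
    # differential trace = |mean(group 1) - mean(group 0)|, per sample
    peak = max(abs(sum(c1) / len(g1) - sum(c0) / len(g0))
               for c1, c0 in zip(zip(*g1), zip(*g0)))
    if peak > best[1]:
        best = (k, peak)
print("best key guess: 0x%X (peak %.2f)" % best)
```

For the correct guess, the predicted bit matches the leaked sample on every trace, so the differential trace shows a peak of 1.0 at the leaking position; wrong guesses only produce the weaker peaks caused by chance correlations between S-box output bits.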

DCA has been shown to be especially effective at breaking many open white-box implementations and was extensively used as a white-box cryptanalytic technique in the recent WhibOx contest. At FSE 2016, Sasdrich et al. implemented Chow et al.'s white-box countermeasure on an FPGA platform. They showed that classical DPA can reveal the secrets in hardware implementations of white-box designs in the gray-box context, which extends the observation by Bos et al. Besides, the authors stress that the leakage of Chow et al.'s countermeasure comes from the imbalance of the Boolean functions modeling the intermediate variables appearing in the computation. At ACNS 2018, Bock et al. gave another explanation of why DCA works against previous countermeasures. However, their analysis is limited to nibble encodings and does not generalize to more complicated encoding techniques. A more in-depth analysis of DCA is still expected by the community.

Summary


DCA is generic, automated, and does not require any information about the design technique. For a designer, however, few countermeasures against DCA are known. Moreover, given that DCA-like attacks have appeared in many follow-up works, and that other generic attacks such as differential fault analysis are also destructive to practical white-box solutions, the design problem becomes even harder. Finding a provably secure scheme in the white-box context remains a challenging task. Yet, to meet the industry's need for white-box secure solutions, we are forced to settle for security through obscurity for now. Although not perfect, it might be meaningful to mitigate the gray-box attacks in the first place, for instance with ideas inspired by the side-channel community.

Thursday, December 6, 2018

Design of Lightweight Linear Layers

Confusion and Diffusion

Confusion and diffusion, introduced by Shannon, are two fundamental principles widely used in the design of symmetric-key primitives. Most modern block ciphers and hash functions have well-designed confusion and diffusion layers. Among the many design strategies, substitution-permutation networks (SPNs) have been popular in the design of block ciphers and hash functions. The best-understood SPN round function probably consists of a brick layer of local nonlinear permutations (usually S-boxes) followed by a multiplication with a diffusion matrix over a finite field (linear diffusion). Diffusion layers play a crucial role in providing resistance against the two most powerful statistical attacks: differential and linear cryptanalysis.

AES, the most prominent example of an SPN, uses a Maximum Distance Separable (MDS) matrix in its MixColumns operation. The MixColumns step employs the following circulant MDS matrix $$\small \left(\begin{array}{cccc} 2 & 3 & 1 & 1\\ 1 & 2 & 3 & 1\\ 1 & 1 & 2 & 3\\ 3 & 1 & 1 & 2 \end{array} \right)$$ where $1,2,3$ are elements of the finite field GF($2^8$). It can be shown that linear diffusion layers based on MDS matrices have optimal branch numbers and hence, in principle, provide optimal diffusion for any AES-like cipher.
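To make the arithmetic concrete, here is a small Python sketch of applying this matrix to one column of the state. Multiplication by 2 is the usual xtime trick (shift, then conditionally reduce by $x^8+x^4+x^3+x+1$), and the result is checked against a well-known MixColumns test column.

```python
def xtime(a):
    """Multiply by 2 (i.e. by x) in GF(2^8) mod x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def mul3(a):              # 3*a = 2*a XOR a
    return xtime(a) ^ a

def mix_column(col):
    """Apply the circulant matrix circ(2, 3, 1, 1) to one state column."""
    a0, a1, a2, a3 = col
    return [xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
            a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
            a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
            mul3(a0) ^ a1 ^ a2 ^ xtime(a3)]

# Well-known test column: [db, 13, 53, 45] -> [8e, 4d, a1, bc]
assert mix_column([0xDB, 0x13, 0x53, 0x45]) == [0x8E, 0x4D, 0xA1, 0xBC]
```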


Lightweight Cryptography

The development of ubiquitous computing such as the Internet of Things (IoT) brings with it new security requirements. Specifically, the need for lightweight cryptography emerged from the lack of primitives suitable for constrained environments (e.g. RFID tags). Recent years have witnessed the development of various lightweight cipher designs. For more details on lightweight ciphers, readers can refer to the blog post Lightweight Cryptography by Ralph. In this post, we will focus on the design of lightweight linear layers, i.e., lightweight diffusion matrices. We will show how designers manage the tradeoff between security and performance.


Lightweight Recursive MDS Matrices

To reduce the hardware cost of MDS matrices, Guo et al. proposed a novel design approach: recursive (or serial) MDS matrices. Such matrices have been exploited in the hardware-oriented lightweight block cipher LED and the lightweight hash function PHOTON. The main idea is to represent an MDS matrix as a power of a very sparse matrix, such as a companion matrix, so that the MDS matrix can be implemented by iterating the sparse matrix several times. A notable example is the PHOTON MDS matrix $$\small M= \left(\begin{array}{cccc} 1 & 2 & 1 & 4\\ 4 & 9 & 6 & 17\\ 17 & 38 & 24 & 66\\ 66 & 149 & 100 & 11 \end{array} \right)= \left(\begin{array}{cccc} 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ 1 & 2 & 1 & 4 \end{array} \right)^4$$ where the entries are elements of GF($2^8$) with reduction polynomial $x^8+x^4+x^3+x+1$. Compared to direct constructions, recursive MDS matrices have a substantially lower hardware area, at the cost of additional clock cycles. They are therefore not suitable for low-latency applications.
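The decomposition is easy to check. Below is a minimal Python sketch (assuming, as the entries indicate, arithmetic in GF($2^8$) modulo $x^8+x^4+x^3+x+1$) that raises the sparse companion matrix to the fourth power and reproduces the dense matrix above.

```python
def gf_mul(a, b, mod=0x11B):
    """Carry-less multiplication in GF(2^8) mod x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= mod
        b >>= 1
    return r

def mat_mul(X, Y):
    n = len(X)
    Z = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                Z[i][j] ^= gf_mul(X[i][k], Y[k][j])
    return Z

# Sparse companion matrix: a shift plus one row of coefficients.
C = [[0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1],
     [1, 2, 1, 4]]

M = C
for _ in range(3):       # four applications of C in total: M = C^4
    M = mat_mul(M, C)

assert M == [[ 1,   2,   1,  4],
             [ 4,   9,   6, 17],
             [17,  38,  24, 66],
             [66, 149, 100, 11]]
```

In hardware, one only implements the single coefficient row of $C$ and clocks the circuit four times, which is exactly where the area saving comes from.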


Optimizing Entries in the Matrices

In the last five years, many constructions of lightweight MDS matrices have been proposed. Most of them are based on matrices in special classes, such as circulant, Hadamard, or Toeplitz matrices, with entries that can be computed efficiently. A novel idea by Li and Wang is to construct MDS matrices whose entries are binary matrices instead of finite-field elements. Consider the following MDS matrix $$\small \left(\begin{array}{cccc} A & I & I & I\\ I & I & B & A\\ I & A & I & B \\ I & B & A & I \end{array} \right),\,\,{\rm where}\,\,\,\tiny A=\left(\begin{array}{cccccccc} 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0\\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right)$$ and $B=A^{-2}$. This matrix can be implemented with only $106$ bitwise XORs, while a naive implementation of the AES MDS matrix needs $152$ bitwise XORs.
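Multiplication by such a binary-matrix entry is just a fixed pattern of bit-XORs. The sketch below (MSB-first bit ordering is an assumption made for this illustration) applies the matrix $A$ from the text to a byte; since seven of the eight rows of $A$ are unit vectors, applying $A$ costs only one XOR gate plus wiring.

```python
# The binary-matrix entry A from the text, written row by row.
A = ["00010000",
     "00001000",
     "00000100",
     "00000001",
     "00100000",
     "00010010",   # the only row with two ones: costs one XOR
     "10000000",
     "01000000"]

def apply_binary_matrix(rows, x):
    """Multiply the bit-vector of byte x by a binary matrix: each output
    bit is the XOR of the input bits selected by the ones in its row."""
    y = 0
    for row in rows:
        bit = 0
        for j, c in enumerate(row):
            if c == "1":
                bit ^= (x >> (7 - j)) & 1
        y = (y << 1) | bit
    return y

# The MSB of the input is routed to output bit 1, per row 7 of A.
assert apply_binary_matrix(A, 0x80) == 0x02
```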


Recent Works on Global Optimization

Most of the previous works consider the cost of an MDS matrix to be the sum of the costs of its coefficients plus $nm(n-1)$ bitwise XOR operations, where the matrices are of order $n$ with entries from GF($2^m$). Actually, this local optimization approach always overestimates the real cost: many common intermediate values can be computed once and then reused, and in hardware such reuse costs merely wires. This implies that a globally optimized implementation can be significantly cheaper. Recently, Kranz et al. used Shorter Linear Straight-Line Programs to optimize previously proposed MDS matrices. They found a $4\times 4$ MDS matrix over GF($2^8$) (i.e., the same size as the AES MDS matrix) that can be implemented with $72$ bitwise XORs. This number was further reduced to $67$ in a recent work by Duval and Leurent following a similar global optimization approach.
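A toy example of the idea behind straight-line-program optimization: two output bits that share the subexpression $x_0 \oplus x_1$ cost four XORs when each row is implemented separately, but only three when the common term is computed once and reused. (The linear map here is hypothetical and chosen only to show the effect.)

```python
from itertools import product

def naive(x0, x1, x2, x3):
    y0 = x0 ^ x1 ^ x2        # 2 XORs
    y1 = x0 ^ x1 ^ x3        # 2 XORs, 4 in total
    return y0, y1

def slp(x0, x1, x2, x3):
    t = x0 ^ x1              # shared intermediate, 1 XOR
    return t ^ x2, t ^ x3    # 2 more XORs, 3 in total

# Both straight-line programs compute the same linear map over GF(2).
assert all(naive(*bits) == slp(*bits) for bits in product((0, 1), repeat=4))
```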


Near-MDS Matrices

It is well known that every entry of an MDS matrix over a finite field must be nonzero. MDS matrices are therefore dense and costly in hardware, and MDS and recursive MDS matrices might not offer an optimal trade-off between security and efficiency. Near-MDS matrices have sub-optimal branch numbers, but they require less area than MDS matrices and need no additional clock cycles. Indeed, some diffusion layers constructed from near-MDS matrices outperform those based on MDS or recursive MDS matrices in terms of the FOAM framework proposed by Khoo et al. Recently, near-MDS matrices have been adopted in several lightweight block ciphers, including PRINCE, PRIDE, and Midori. However, the construction and security properties of near-MDS matrices are still insufficiently studied, which motivated us to present novel results on them. We propose new designs of lightweight near-MDS matrices of order $5\sim9$, and we show that our linear layers, combined with a well-chosen nonlinear layer, provide sufficient resistance against differential and linear cryptanalysis. The results have been published in ToSC 2017(1).
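For reference, the differential branch number of a matrix $M$ of order $n$ is $\min_{x \neq 0}\{wt(x) + wt(Mx)\}$, where $wt$ counts nonzero cells; MDS matrices attain the optimum $n+1$, while near-MDS matrices attain $n$. The brute-force sketch below (over GF($2^4$), reduction polynomial $x^4+x+1$ assumed) checks this for the binary near-MDS matrix used in Midori.

```python
from itertools import product

def gf_mul(a, b, mod=0x13):
    """Carry-less multiplication in GF(2^4) mod x^4 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x10:
            a ^= mod
        b >>= 1
    return r

def branch_number(M):
    n = len(M)
    best = n + 1                          # Singleton-type upper bound
    for x in product(range(16), repeat=n):
        if not any(x):
            continue
        y = [0] * n
        for i in range(n):
            for j in range(n):
                y[i] ^= gf_mul(M[i][j], x[j])
        w = sum(v != 0 for v in x) + sum(v != 0 for v in y)
        best = min(best, w)
    return best

# Midori's near-MDS matrix circ(0, 1, 1, 1): branch number 4, not 5.
M = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]
print(branch_number(M))                   # -> 4
```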


In sum, we already have fairly good solutions for the various requirements in the design of lightweight diffusion matrices. Nevertheless, achieving an optimal implementation of the linear layer of a specific cipher remains an open problem. It is also interesting to further investigate the security impact of adopting non-MDS matrices.


Wednesday, February 21, 2018

Fast Fully Homomorphic Evaluation of Neural Networks in the Cloud


Even though a theoretical solution has been around for almost 10 years, deploying Fully Homomorphic Encryption (FHE) in real-world applications still requires increasing the efficiency of the algorithms used. As the interactions between the parameters and components of today's lattice-based realizations of FHE are non-trivial, schemes set up to meet a multitude of design constraints often end up with requirements that are too high for a "killer" application, since run-times may pose a prohibitive hurdle.

In this blog post, I'd like to present a use case where an FHE scheme achieves unprecedentedly fast classification of encrypted data and makes scale-invariant homomorphic evaluation of neural networks (NNs) possible.

Are privacy-preserving services in the Cloud relevant?

At this point, I think, I can skip philosophizing about the ubiquitous utility of machine learning, as we can see its impact all around us every day. On the other hand, the quest for privacy-preserving application of machine learning algorithms to user data has recently become a central topic of discussion. It is expected that the General Data Protection Regulation (GDPR), a result of the call for a European law (which could serve as a paragon internationally) to protect its citizens and its economy, will push innovation in this direction too.

Simply put, users of Machine Learning as a Service (MLaaS) in the Cloud want to share and upload only encrypted images as input to the companies' powerful, pre-trained cognitive models.

Clearly, encrypting content ensures data confidentiality, assuming the associated private key of the public-key encryption scheme never leaves the user's trusted device. (As a side note: recent news reports suggest that this assumption about user-controlled devices is not always guaranteed. The Cloud operating on FHE-encrypted data, on the other hand, cannot leak private user data even through the whole class of cache attacks, e.g. Meltdown and Spectre.)

Let's briefly look at the problem setting.

To overcome the conflicting interests of confidentiality and utility of data in the Cloud-based scenario, Fully Homomorphic Encryption can help the user receive a useful answer to their encrypted question in a privacy-preserving way. The cloud then needs to support homomorphic computations on the FHE-encrypted inputs and send back the still-encrypted result of this delegated computation in reasonable time. In principle, only the legitimate user can decrypt the output using their secret key. The cloud service cannot deduce information from the random-looking inputs, intermediate values, or final results, but can still charge the user for providing the service, e.g. classifying an image in this example.


Let's briefly look at the task.

The first step when approaching a solution for how to use FHE for NNs is to define the minimal requirements of the concrete task and to know what can be considered practical FHE.
We want to showcase fast homomorphic evaluation of a pre-trained NN that classifies a depicted shape, e.g. images of handwritten digits from the MNIST dataset, without leaking any information about the input data, at an 80-bit security level.
The output, given in less than two seconds, shall be encrypted scores assigned to each possible label; the highest score, decrypted by the user, indicates the most probable label of the input image.
A depiction of how input is propagated in order to evaluate a discretized deep neural network with an arbitrary number $d$ of hidden layers and arrive at a classification: each neuron performs an operation $f_i$, a function depending linearly on the values of the incoming wires and the weights, followed by a non-linear operation. The latter is typically referred to as "activation".

As deep neural networks with $d$ hidden layers give good results in practice, we target this type with a scale-invariant FHE scheme.


Let's look at the problem solution.

In an attempt to bring FHE into practice, our C++ code builds on top of the existing Fast Fully Homomorphic Encryption Library over the Torus (TFHE) and introduces a new framework for homomorphic evaluation.

Increasing efficiency is an important step paving the way to practicality: the underlying FHE scheme needs to be parametrized only once for a given network.
Secondly, a security analysis is another crucial step in vetting the algorithms, ensuring that their use resists state-of-the-art cryptanalysis and fulfills the targeted security level.

The main capability of our scheme is that, when evaluating a single neuron, the output value can readily be used for the next operation, as it is bootstrapped to keep the error propagation low.
Close-up on a single neuron.
We apply the activation function directly to the weighted sum of the inputs according to the network's wires, i.e., we compute $y = f(x) = \mathrm{sign}(\langle x, w\rangle)$, with fixed weights $w$ for the neuron and sign as the activation.
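As a plaintext reference for what each neuron computes (the homomorphic version evaluates the same function on TFHE ciphertexts, extracting the sign during bootstrapping), here is a minimal sketch; the weights and inputs are illustrative, hypothetical values.

```python
def sign(v):
    """Sign activation, mapping the weighted sum to {-1, +1}."""
    return 1 if v >= 0 else -1

def neuron(x, w):
    """y = sign(<x, w>) for a discretized neuron with fixed weights."""
    return sign(sum(xi * wi for xi, wi in zip(x, w)))

x = [1, -1, 1, 1]       # discretized inputs on the incoming wires
w = [2, -1, 0, 3]       # fixed, pre-trained weights
print(neuron(x, w))     # <x, w> = 6, so the output is 1
```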

Scale-invariance means that privacy-preserving evaluation of deep neural networks no longer poses a hurdle: the computation carried out by each neuron is independent of the total number of neurons and layers, so the overall cost scales linearly with the size of the network.

With this approach, we can report the performance of an experiment classifying 10000 encrypted images from the MNIST dataset with more than 96% accuracy, taking on average less than 1.7 seconds per image, using the TFHE library as a starting point.
Running an experiment on a trained neural network with 784:100:10 topology deployed in the Cloud.
An uploaded encrypted test image is input to the homomorphic evaluation of our scheme, which classifies the depicted shape without leaking any information about the input data. The evaluation of the neural network outputs the encrypted scores $S_i$ assigned to each digit $i$; the highest score, decrypted by the user, indicates the most probable label of their image.

I'd like to stress that the good performance of this scalable approach is not limited to homomorphic evaluation of neural networks with one hidden layer, as depicted above; it can be applied to deep neural networks, which in practice may comprise a hundred hidden layers, or to an even broader class of cognitive models.

For a detailed, formal description I refer to the full version of the paper, or you may try out the proof-of-concept implementation code, available online, which shows how to obtain these research results by applying our generic framework to a trained NN and MNIST dataset inputs as a demonstration.

Let's look at directions for future work and open-questions.

Finally, mentioning the limits of our FHE scheme's functionality and pointing out its applicability to other well-specified domains rounds off this treatment.

One potential concern for service providers is that their users might send malicious requests; evaluating private networks with our framework is not dealt with at the moment, although it is in principle possible.
Such users could either try to learn the company's intellectual property (the weights and the topology of the neural network itself), or try to derive sensitive information encoded therein (which would be a breach of the privacy of the training dataset).
For the latter case, the techniques for statistical databases studied in the differential privacy literature can be used in the training phase.

An open question is how further performance gains can be achieved by refining the algorithms. Characterizing the general class of cognitive models that can be evaluated this way is also interesting.

So long, stay tuned for faster solutions and more general demonstrations!