tag:blogger.com,1999:blog-51495742096368402772024-03-11T08:48:26.361+01:00ECRYPT-EUA blog for the H2020 ECRYPT projects ECRYPT.NET and ECRYPT.CSANigel Smarthttp://www.blogger.com/profile/17681184541012804026noreply@blogger.comBlogger110125tag:blogger.com,1999:blog-5149574209636840277.post-82743643269075394522018-12-28T18:55:00.001+01:002018-12-28T19:03:00.695+01:00Generic White-Box AttacksCryptography is deployed in almost all the objects (e.g., smartphones) to protect the communications between these objects connected in an open environment. However, the execution environments over them are usually untrustworthy, making cryptographic software implementations running in them are vulnerable to an attacker, who completely manipulates an execution platform and the software deployed on it. Specifically, an attack could try to extract the secret cryptographic keys by all kinds of means, for instance, by monitoring the accessed the memory while execution or by interfering in the execution and exploiting the leakage from erroneous outputs. This attacking model is called <i>white-box</i> model.<br />
<br />
<h3>
White-box cryptography </h3>
<br />
<i>White-box cryptography</i> is introduced by <a href="https://link.springer.com/content/pdf/10.1007/3-540-36492-7_17.pdf">Chow <i>et al</i>.</a> to protect against software cryptographic implementation against these white-box threats. In particular, it aims to make the key extraction infeasible to any malicious party that would gain full access to the program (and/or the
At CHES 2016, <a href="https://eprint.iacr.org/2015/753">Bos <i>et al.</i></a> proposed to use <i>differential computation analysis </i>(DCA) to attack white-box implementation. DCA is essentially an adaptation of the <a href="https://link.springer.com/content/pdf/10.1007/3-540-48405-1_25.pdf" rel="">differential power analysis</a>
techniques in the white-box context. It exploits the fact that the
variables appearing in the computation in some unknown encoded form
might have a strong linear correlation with their original values. It
works by first collecting some computation traces, composed runtime
memory information through several executions through a dynamic
instrumentation tool, such as Intel PIN. One then makes a key guess and
predicts the value of a (supposedly) computed bit based on the guess and
the computation input. The collected traces are then categorized into
two groups according to the hypothetical bits, and a differential trace
is calculated by subtracting the two average traces for each key guess.
Finally, the key guess with the highest peak in the differential trace
is selected as the key candidate.execution environment). Hence, white-box cryptography is considered as the last security frontier of the deployed software. Despite its practical interest, it has been widely acknowledged that no provably secure white-box implementation is put forward in the literature after almost 20 years exploration. Nevertheless, many different techniques have been proposed to mitigate this real-world security threat, but all these solutions have been broken by structural attacks. This situation has pushed the industry to deploy home-made white-box implementations, the designs of which are kept secret, to meet the increasing demands in the market. Although these implementations might not be not secure against a well-informed adversary, the security of their designs can make them practically hard to break since e.g. the known structural attacks do not apply as is. However, several generic attacks have been presented to break these <i>obscure</i> white-box implementations without gaining any knowledge of the designing principle behind.<br />
<br />
<h3>
Differential computation analysis </h3>
<br />
At CHES 2016, <a href="https://eprint.iacr.org/2015/753">Bos <i>et al.</i></a> proposed to use <i>differential computation analysis </i>(DCA) to attack white-box implementation. DCA is essentially an adaptation of the <a href="https://link.springer.com/content/pdf/10.1007/3-540-48405-1_25.pdf" rel="">differential power analysis</a> techniques in the white-box context. It exploits the fact that the variables appearing in the computation in some unknown encoded form might have a strong linear correlation with their original values. It works by first collecting some computation traces, composed runtime memory information through several executions through a dynamic instrumentation tool, such as Intel PIN. One then makes a key guess and predicts the value of a (supposedly) computed bit based on the guess and the computation input. The collected traces are then categorized into two groups according to the hypothetical bits, and a differential trace is calculated by subtracting the two average traces for each key guess. Finally, the key guess with the highest peak in the differential trace is selected as the key candidate.<br />
<br />
DCA has been shown especially effective to break many open white-box
implementations and was extensively used as a white-box cryptanalytic
technique in the recent <a href="https://whibox-contest.github.io/">WhibOx</a> contest. At FSE 2016, <a href="https://eprint.iacr.org/2016/203">Sasdrich <i>et al.</i></a> implemented <a href="https://link.springer.com/content/pdf/10.1007/3-540-36492-7_17.pdf">Chow <i>et al</i>.</a>'s white-box countermeasure in FPGA platform. They show that the classical DPA can reveal the secrets in hardware implementations of the white-box designs in the gray-box context, which extends the observation by <a href="https://eprint.iacr.org/2015/753">Bos <i>et al.</i></a> Besides, the authors stress that the leakage of Chow <i>et al</i>.'s countermeasure comes from the imbalanceness of Boolean function modeling the intermediate variables appearing the computation. At ACNS 2018, <a href="https://eprint.iacr.org/2018/301" rel="">Bock <i>et al.</i></a> give another reason why DCA works on previous countermeasures. However, their analysis is limited to nibble encodings, which could not be generalized to more complicated encoding techniques. A more in-depth analysis of DCA is expected by the community.<br />
<br />
<h3>
Summary</h3>
<br />
DCA is generic, automated and does not require any information about the
design technique. But for a designer, there exist not too many ideas in
the light of DCA attack. Besides, given that more DCA-like attacks
appear in many following works and generic attacks, such as differential
fault analysis are also destructive to practical white-box solutions, it
becomes an even more complex problem. It is a challenging task to find a provably secure white-box scheme in the white-box context. However, to meet the needs of white-box secure solution from industry, we are forced to achieve <i>security through obscurity</i> for now. Although not perfect, it might be meaningful to consider to mitigate the gray-box attacks in the first place, for instance, by ideas inspired by the side-channel community.Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-83290121750391467082018-12-06T11:52:00.000+01:002018-12-06T11:52:53.413+01:00Design of Lightweight Linear Layers
<h3> Confusion and Diffusion </h3>
<p>
<a href="https://en.wikipedia.org/wiki/Confusion_and_diffusion"> <i> Confusion </i> and <i>diffusion </i> </a> introduced by Shannon are widely used two fundamental principles in the design of symmetric key primitives. Most modern block ciphers and hash functions have well-designed confusion and diffusion layers.
Among many design strategies, <a href="https://en.wikipedia.org/wiki/Substitution%E2%80%93permutation_network"> substitution-permutation networks </a> (SPN) have been popular in the design of block ciphers and hash functions. The best understood structure of SPN round function probably consists of a brick layer of local nonlinear permutations (usually S-boxes) followed by a multiplication with a diffusion matrix over a finite field (linear diffusion).
Diffusion layers play a crucial role in providing resistance against two most powerful statistical attacks <a href="https://en.wikipedia.org/wiki/Differential_cryptanalysis"> differential attack</a> and <a href="https://en.wikipedia.org/wiki/Linear_cryptanalysis">linear attack </a>.
</p>
<p>
<a href="https://en.wikipedia.org/wiki/Advanced_Encryption_Standard">AES</a>, the most prominent example of SPNs, uses Maximum Distance Separable (MDS) matrix in the MixColumns operation.
The MixColumns step employs the following circulant MDS matrix
$$\small \left(\begin{array}{cccc}
2 & 3 & 1 & 1\\
1 & 2 & 3 & 1\\
1 & 1 & 2 & 3\\
3 & 1 & 1 & 2
\end{array}
\right)$$
where $1,2,3$ are element in the finite field GF($2^8$).
It is shown that linear diffusion layers based on MDS matrices have optimal branch numbers and hence provide optimal diffusion properties for any AES-like ciphers in principle.
</p>
<br>
<h3> Lightweight Cryptography </h3>
<p>
The development of ubiquitous computing such as the Internet of Things (IoT) brings up the relevant security requirements. Specifically, the need for lightweight cryptography emerged from the lack of primitives that are suitable for constrained environments (e.g. RFID tags). Recent years have witnessed the development of various lightweight cipher designs. For more details on lightweight ciphers, the readers can refer to the blogpost <a href="http://ecrypt-eu.blogspot.com/search/label/lightweight%20cryptography"><i>Lightweight Cryptography</i> </a> by Ralph. In this post, we will focus on the design of the lightweight linear layer, i.e., lightweight diffusion matrices. We will show how the designers manage to achieve the tradeoff between security and performance.
</p>
<br>
<h3> Lightweight Recursive MDS Matrices </h3>
<p>
To reduce the hardware cost of MDS matrices, Guo <i> et al.</i> propose a novel design approach of recursive (or serial) MDS matrices.
These matrices have been exploited in hardware-oriented lightweight block cipher <a href="https://sites.google.com/site/ledblockcipher/">LED</a>
and lightweight hash function <a href="https://sites.google.com/site/photonhashfunction/home">PHOTON</a>.
The main idea of recursive MDS matrices is to represent
an MDS matrix as a power of a very sparse matrix such as a companion matrix.
In this way, the MDS matrix can be implemented by iterating the sparse
matrix many times. A notable example is the PHOTON MDS matrix.
$$\small M= \left(\begin{array}{cccc}
1 & 2 & 1 & 4\\
4 & 9 & 6 & 17\\
17 & 38 & 24 & 66\\
66 & 149 & 100 & 11
\end{array}
\right)=
\left(\begin{array}{cccc}
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1\\
1 & 2 & 1 & 4
\end{array}
\right)^4$$
where the entries are elements in GF($2^4$). Compared to direct constructions, recursive MDS matrices
have a substantially lower hardware area at the cost of additional clock cycles.
Therefore, recursive MDS matrices are not suitable for low-latency applications.
</p>
<br>
<h3> Optimizing Entries in the Matrices </h3>
<p>
In the last five years, many constructions of lightweight MDS have been proposed. Most of them are based on matrices in some
special classes, such as circulant, Hadamard, or Toeplitz matrices, with entries that can be efficiently computed.
A novel idea by <a href="https://eprint.iacr.org/2016/406">Li and Wang</a> is to construct MDS matrix with elements
in the form of binary matrices instead of elements in a finite field. Consider the following MDS matrix
$$\small \left(\begin{array}{cccc}
A & I & I & I\\
I & I & B & A\\
I & A & I & B \\
I & B & A & I
\end{array}
\right),\,\,{\rm where}\,\,\,\tiny A=\left(\begin{array}{cccccccc}
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 1 & 0\\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}
\right)$$
and $B=A^{-2}$. This matrix can be implemented with only $106$ bitwise XOR, while a naive implementation of
AES MDS matrix needs $152$ bitwise XOR.
</p>
<br>
<h3>Recent Works on Global Optimization</h3>
<p> Most of the previous works consider the cost of an MDS matrix as the sum of costs of each coefficient and the cost of
$nm(m-1)$ bit XOR operations, where the matrices are of order $n$ with entries from GF($2^m$). Actually this local optimization
approach always overestimates the real cost. Indeed, many common intermediate values can be computed and then reused.
In hardware implementations this can be done with merely wires. This implies that a globally optimized implementation
can be significantly cheaper. Recently, <a href="https://tosc.iacr.org/index.php/ToSC/article/view/813"> Kranz <i>et al.</i></a> used
Shorter Linear Straight-Line Programs to optimize hte previously proposed MDS matrices. They found a $4\times 4$ MDS matrix
over GF($2^8$) (i.e., the same size of AES MDS matrix) that can be implemented with $72$ bitwise XORs.
This number has been further reduced to $67$ in a recent work by <a href="https://tosc.iacr.org/index.php/ToSC/article/view/888">
Duval and Leurent</a> following similar global optimization approach.
</p>
<br>
<h3> Near-MDS Matrices </h3>
<p> It is well-known that any element of an MDS matrix over a finite field must be nonzero.
Thus MDS matrices are very dense and hence costly in hardware implementation.
MDS and recursive MDS matrices might not offer an optimal trade-off between security and efficiency.
Near-MDS have sub-optimal branch numbers while they require less area than MDS matrices and they do not need additional clock cycles.
Indeed, some diffusion layers constructed from near-MDS matrices outperform those based on MDS or recursive MDS matrices in terms of the FOAM framework proposed by
<a href="https://eprint.iacr.org/2014/530">Khoo <i>et al.</i></a>.
Recently, near-MDS matrices have been adopted in some lightweight block ciphers, including <a href="https://eprint.iacr.org/2012/529">PRINCE</a>, <a href="https://eprint.iacr.org/2014/453">PRIDE</a>, and <a href="https://eprint.iacr.org/2015/1142">Midori</a>. However, there is insufficient research on the construction and security properties of near-MDS matrices.
These motivate us to present novel results on near-MDS matrices. Actually, We propose new designs of lightweight near-MDS matrices of order $5\sim9$. Further, it is shown that our linear layers with a well-chosen nonlinear layer can provide sufficient security against differential and linear cryptanalysis. The results have been published in
<a href="https://tosc.iacr.org/index.php/ToSC/article/view/588">ToSC 2017(1)</a>.
</p>
<br>
<p> In sum, we already have pretty good solutions for various requirements in the design of lightweight diffusion matrices.
Nevertheless, it remains open to achieve optimal implementation of linear layers in a specific cipher.
It is also interesting to further investigate the impact of security on the adoption of non-MDS matrices.
</p>
<br>
Chaoyun Lihttp://www.blogger.com/profile/13709110456124646740noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-27545371493629741742018-02-21T18:28:00.000+01:002018-02-21T18:28:17.946+01:00Fast Fully Homomorphic Evaluation of Neural Networks in the Cloud<div class="separator" style="clear: both; text-align: center;">
<br /></div>
In order for Fully Homomorphic Encryption (FHE) to be deployed in real-world applications, still today --- even if a theoretical solution has been around for almost 10 years --- it is required to increase the efficiency of used algorithms. As the interactions of parameters and components of nowadays lattice-based realizations of FHE are non-trivial, schemes once set up to meet a multitude of design constraints, often end up having high requirements. Too high for some "killer"-application as run-times may pose a prohibitive hurdle.<br />
<br />
In this blog-post, I'd like to present a use-case where an Fully Homomorphic Encryption (FHE) scheme achieves unprecedentedly fast classification of encrypted data, and makes scale-invariant homomorphic evaluation of neural networks (NN) possible.<br />
<br />
<h3>
Are <i>privacy-preserving </i>services in the Cloud relevant?</h3>
At this point, I think, I can skip philosophizing about the ubiquitous utility of machine learning, as we can see its impact everyday all around us. On the other hand the quest for <i>privacy-preserving</i> application of machine learning algorithms to user data is becoming a central topic of discussions recently. It is expected that the General Data Protection Regulation (<a href="https://www.eugdpr.org/" target="_blank">GDPR</a>), a result of the call for European law (which could serve as paragon internationally) to protect its citizens, its economy, will push forward innovation in this direction too.<br />
<br />
Simply put, users of Machine Learning as a Service (MLaaS) in the Cloud, want to only share & upload encrypted images as input to the companies' powerful, pre-trained cognitive models.<br />
<br />
Clearly, encrypting content ensures data confidentiality, assuming the associated private key of the public-key encryption scheme never leaves the user’s trusted device. (As a side note; recent news reports suggest that such an assumption for user controlled devices are not always guaranteed. The Cloud operating on FHE encrypted data on the other hand is not possibly vulnerable to leak private user data through the whole class of cache attacks, i.e. Meltdown and Spectre.)<br />
<br />
<h3>
Let's briefly look at the problem setting.</h3>
<div style="text-align: left;">
To overcome conflicting interests of confidentiality and utility of data in the Cloud-based scenario, Fully Homomorphic Encryption can help the user to receive a useful answer to their encrypted question in a privacy-preserving way. Hence, the cloud needs to support homomorphic computations on the FHE encrypted inputs and send back the still encrypted result of this delegated operation in a reasonable time. In principal, only the legitimate user can decrypt the output using their secret key. The cloud service cannot deduce information from the random looking inputs, intermediate or final results, but can still charge the user for providing the service, e.g. classifying an image in this example.</div>
<h3>
</h3>
<h3>
<br />Let's briefly look at the task. </h3>
First step when approaching a solution of how to use FHE for NN is defining minimal requirements of the concrete task and knowing what can be considered practical FHE.<br />
We want to showcase fast homomorphic evaluation of a pre-trained NN to classify a depicted shape without leaking privacy of the input data at an 80 [bit] security level, e.g. images of handwritten digits from the MNIST dataset.<br />
The output, given in less than two seconds, shall be encrypted scores assigned to each possible output and the highest score, decrypted by the user, is the most probable label of the input image.<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3oNc7O21IT8tRkNLdK8nXTCrOTDog3y-ysH9yy-vC5Bkoqcibxu7h1WNZNqNyNO0X3X3eNw6B-uKQs5gwJFK8nqbEbp4wEpeQhVOazNApZkuh6LA1tBSeOCl5txREVIzc3mK6HrE8T08/s1600/TikZ_NN_general_IHO.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="796" data-original-width="1047" height="301" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3oNc7O21IT8tRkNLdK8nXTCrOTDog3y-ysH9yy-vC5Bkoqcibxu7h1WNZNqNyNO0X3X3eNw6B-uKQs5gwJFK8nqbEbp4wEpeQhVOazNApZkuh6LA1tBSeOCl5txREVIzc3mK6HrE8T08/s400/TikZ_NN_general_IHO.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">A depiction of how input is propagated in order to evaluate a
discretized deep neural networks with an arbitrary depth $d$ of hidden
layers to arrive at a classification. Each neuron performs operations
$f_i$; a function linearly depending on values of the incoming wires and
weights followed by a non-linear operations. The latter is typically
referred to as ``activation''. </td></tr>
</tbody></table>
<br />
As deep neural networks with $d$ hidden layers give good results in practice, we target this type with a scale-invariant FHE scheme.<br />
<br />
<br />
<h3>
Let's look at the problem solution.</h3>
In an attempt to bringing forth FHE in practice, our C++ code builds on top of an existing Fast Fully Homomorphic Encryption Library over the Torus (<a href="http://tfhe.github.io/" target="_blank">TFHE</a>) and introduces a new framework for homomorphic evaluation.<br />
<br />
To increase the efficiency, which is an important step paving the way to practicality, the underlying FHE scheme needs to be parametrized once for a given network.<br />
Secondly, a security analysis is another crucial step in vetting the algorithms, ensuring their use maturely resists state-of-the-art cryptanalysis and fulfills the targeted security level.<br />
<br />
The main capability of our scheme is that when evaluating a single neuron, the output value can readily be used for the next operation as it is bootstrapped to ensure low error propagation.<br />
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: left; text-align: left;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_gTGScCQFxWvQeLktfSra1MSvWu5nvRhiIRyXxnJ_ueml1S03vA5nHnMW_rVewWg4HExLbZp3KuQXY_KokXAFsk-Ge11eILwltDMRKzsUQeOGGdusSL5ulSPULhlUpn6RTY3L7MFMYSo/s1600/TikZ_NN_neuron.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" data-original-height="382" data-original-width="740" height="102" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_gTGScCQFxWvQeLktfSra1MSvWu5nvRhiIRyXxnJ_ueml1S03vA5nHnMW_rVewWg4HExLbZp3KuQXY_KokXAFsk-Ge11eILwltDMRKzsUQeOGGdusSL5ulSPULhlUpn6RTY3L7MFMYSo/s200/TikZ_NN_neuron.png" width="200" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Close-up on a single neuron.</td></tr>
</tbody></table>
We apply the activation function directly to the weighted sum of inputs according to the network's wires, i.e. computing $y = f(x) = sign( \langle x, w\rangle )$, with fixed weights for the neuron and sign as activation.<br />
<br />
Scale-invariance means that privacy-preserving evaluation of deep neural networks do not longer pose a hurdle, as computations carried out by every neuron in the network is independent of the total number of neurons and layers and hence scales linearly.<br />
<br />
With this approach, we can report the performance result of an experiment to classify 10000 encrypted images from the MNIST dataset with more than 96% accuracy on average taking less than 1.7 seconds, using the TFHE library as a starting point.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjktmMbuMB4MhJLd9yZRH_AiykIoS8iv78HZ8j1XOtAY8hCB9n0rnnYPJdUX-Z6QQw8wNKx_v4mFxLTn5H0FCedv6n3P5yBu3JAf6clJ46wOdJZq6GttS_jGO7EnH4E8yC9Nj0eNRv81N8/s1600/TikZ_NN_Seven_784_100_10_scores_x.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="684" data-original-width="1600" height="272" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjktmMbuMB4MhJLd9yZRH_AiykIoS8iv78HZ8j1XOtAY8hCB9n0rnnYPJdUX-Z6QQw8wNKx_v4mFxLTn5H0FCedv6n3P5yBu3JAf6clJ46wOdJZq6GttS_jGO7EnH4E8yC9Nj0eNRv81N8/s640/TikZ_NN_Seven_784_100_10_scores_x.png" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Running an experiment on a trained neural net with 784:100:10--topology deployed in the Cloud.</td></tr>
</tbody></table>
An uploaded encrypted test image is input to the homomorphic evaluation of our scheme that classifies a depicted shape (without leaking privacy of the input data). The evaluation of the neural network outputs the encrypted scores $S_i$ assigned to each digit $i$. The highest score, decrypted by the user, is the most probable label of their image.<br />
<br />
I'd like to stress that the good performance of this scale-able approach is not limited to homomorphic evaluation of neural networks with one hidden layer, as depicted above, but can be applied to deep neural networks, that in practice could be composed of possibly a hundred hidden layers or an even broader class of cognitive models.<br />
<br />
For a detailed, formal description I refer to the full version of the <a href="http://ia.cr/2017/1114" target="_blank">paper</a> or you may try out the proof-of-concept implementation code, available <a href="https://github.com/MatthiasMi/dinn" target="_blank">online</a> that shows how to obtain these research results, applying our generic framework to a trained NN and MNIST dataset inputs as a demonstration.<br />
<br />
<h3>
Let's look at directions for future work and open-questions.</h3>
Finally, mentioning limits on the functionality of our FHE scheme, and pointing out the applicability to other well-specified domains rounds off this treatment here.<br />
<br />
To comfort a potential concern for the service providers that their users might be sending malicious requests, to evaluate private networks with our framework is not dealt with at the moment, although it is in principal possible. <br />
They could either try to learn the company's intellectual property (the weights and the topology of the neural network itself), or try to derive sensitive information encoded therein (which could be a breach into the privacy of the training dataset).<br />
In this latter case a statistical databases studied in the differential privacy literature can be used in the training phase.<br />
<br />
An open question is how further performance gains can be achieved by refining the algorithms. Also listing all general cognitive models that are possible is interesting.<br />
<br />
<b>So long, stay tuned for faster solutions and more general demonstrations!</b>Matthiashttp://www.blogger.com/profile/08311091211169735623noreply@blogger.comtag:blogger.com,1999:blog-5149574209636840277.post-6241841545543764702017-11-02T13:03:00.000+01:002017-11-02T13:03:03.396+01:00Compressing ledgers of financial transactions<div dir="ltr" style="text-align: left;" trbidi="on">
<p>
The history of modern banking begins almost 550 years ago, with the establishment of Bank <i>Monte dei Paschi di Siena</i>, in nowadays Italy. However, it wasn't until 1980s (early introduction of home banking), when pioneering financial institutions started to make use of computing machines to automatically process part of their financial transactions, and thus replace the manual, more error-prone process. As the availability of the Internet increased, starting with early 2000s, many major banks began to offer Internet banking to their customers. This means that one can access his/her account's balance or history through a web-browser or smartphone and initiate or receive transactions.
</p>
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtyv5OsyhtMN44QuLf-axEMjYjXfBraFeVKT9k9FX8Ytk18BBdpqKYZDrz4g5eQmNNBRuaI_oVqbBW8Xsj1juXFIzRinQw4wUgkSNvrrXkwxml2aDm3LlYt4QV66sNvRgYvPycOAwP5DWw/s1600/size.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgtyv5OsyhtMN44QuLf-axEMjYjXfBraFeVKT9k9FX8Ytk18BBdpqKYZDrz4g5eQmNNBRuaI_oVqbBW8Xsj1juXFIzRinQw4wUgkSNvrrXkwxml2aDm3LlYt4QV66sNvRgYvPycOAwP5DWw/s320/size.png" width="320" height="213" data-original-width="889" data-original-height="591" /></a></div>
<p>
Nowadays, a number of transactions in the order of millions are processed on a daily basis by a large bank. As the time passes and more and more transactions are processed, the size of this set only increases. Thus, one should think at potential solutions for removing the old and useless ones, such that they can be safely archived. The story repeats for the case of crypto-currencies (with the corresponding modifications - multiple senders or receivers allowed, accounts replaced with addresses).
If one considers for instance the size of transactions processed by a well-known crypto-currency, in the first six years of its lifespan, the size of the ledger reached aound 95 GB, while only in the first 9 months of 2017, more than 40 GB were processed. Such a growth implies increased computational costs while verifying the actual balance of a specific address, as a traversal through the set of addresses needs to be employed.
</p>
<br />
<p>
A first easy step one can do is to try to switch from the representation of transactions as lists, to a more visual one, which represents accounts as nodes in a graph and transactions as edges. Since multiple edges are allowed, the graph becomes in fact a multigraph. In their <a href="http://eprint.iacr.org/2017/751.pdf">work</a>, recently presented at <a href="http://securecomm.org/2017/">SecureComm17</a>, Rémi Geraud, David Naccache and Răzvan Roșie put forward the problem of finding "<b>nilcatenations</b>" in sets of transactions. Loosely speaking, a nilcatenation is a subgraph in a given multigraph with a special property: the balances of the nodes are zero, for each existing account. Stated differently, every single user part of a nilcatenation receives the same amount of money that it gets. Since the balances in such components do not affect the <b>global</b> balance of the original multigraph, nilcatenations can be decoupled and archived.
</p>
<br />
<p>
Some interesting observations can be made about nilcatenations. Any occurrence of such a component can be only as a <i>part of</i> a strongly connected components (SCC): a maximal subgraph of a directed graph with paths between any two nodes. Another observation can be made regarding simple, obstructing nodes: if we identify nodes with the <i>in</i> and <i>out</i> degrees set to 1, but with different weights for incoming and outgoing edges, then such nodes cannot be included in a nilcatenation (we dub them "first order obstructions").
After clearly formalizing it, the problem turns out to be be NP-Complete, via a reduction to the 0-target SSP problem (which is equivalent with SSP).
</p>
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrzgfw_oKzdl7ZoNf2xHf9fj0dlUf_RCRBnuawgwEjVRCNnUse9gdaclCCuj_QM_5PahuFSwI5bHD2-3pc2awws4LsOgqHsieEIznAToJ1-zncCAnyb6zRSUd26K0OmMiZwrC1G5estK_Z/s1600/algo.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrzgfw_oKzdl7ZoNf2xHf9fj0dlUf_RCRBnuawgwEjVRCNnUse9gdaclCCuj_QM_5PahuFSwI5bHD2-3pc2awws4LsOgqHsieEIznAToJ1-zncCAnyb6zRSUd26K0OmMiZwrC1G5estK_Z/s320/algo.png" width="320" height="212" data-original-width="554" data-original-height="367" /></a></div>
<p>
After pruning the original graph into smaller components (by employing SCC-split and first-order obstruction removal steps until convergence), one can benefit from known techniques in attacking the (multi-dimensional) subset-sum problem for each component, independently. Particularly, we can see the problem of finding nilcatenations as a multi-dimensional version of the SSP, and tackle each component independently. The known techniques employing the usage of an SVP-oracle the density work on low density instances. An overview of our heuristic is given in the second picture. More details can be found in the <a href="http://eprint.iacr.org/2017/751.pdf">original work</a>.
</p>
</div>
Razvanhttp://www.blogger.com/profile/07352518857479000506noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-74747607429480542042017-09-03T00:23:00.006+02:002017-09-03T00:25:26.273+02:00Simulating a quantum computer in classical computer<h2 style="text-align: left;">
Quantum Simulator? Quantum Computing?</h2>
<div style="text-align: justify;">
<div style="text-align: left;">
<br /></div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
We heard a lot about quantum computers and what happens when one is build. We have several algorithms that can run on this computer and it will "shift" with cryptography. However, can we simulate a quantum computer? Can I run quantum algorithms already? </div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
The simple answer is: <b>yes, for both questions.</b></div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
<b><br /></b></div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
Yes, the simulation is possible. However, that does not mean that we can break the crypto. The simulators that are available to use have several restrictions. Such as:</div>
</div>
<div>
<ul>
<li style="text-align: left;">Number of operations;</li>
<li style="text-align: left;">Size of circuit;</li>
<li style="text-align: left;">Number of qubits that is possible to use;</li>
</ul>
<div style="text-align: justify;">
<div style="text-align: left;">
<br /></div>
</div>
</div>
<h3 style="text-align: left;">
Simulators, Special languages and etc...</h3>
<h4 style="text-align: left;">
IBM Q Experience</h4>
<div style="text-align: justify;">
<div style="text-align: left;">
IBM developed "<a href="https://quantumexperience.ng.bluemix.net/qx/community">IBM Q experience"</a> where you can design your circuit and run. The circuit will run on their computers and when it is ready, i.e., when your computation finish you will receive an email. In addition of the simulator, IBM added a nice introduction about quantum algorithms.</div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
<br /></div>
</div>
<h4 style="text-align: left;">
Limitations</h4>
<div style="text-align: justify;">
<div style="text-align: left;">
The problem with the IBM Q is that it just allows you to use 16 qubits as a member of the community or 17 qubits for commercial use. Also, if you use the community edition sometimes you could be "stuck" in the queue. The reason is that with the community edition you share the "quantum simulator" with the other users and it cannot run more than a certain number of circuits. </div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
<br /></div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
<br /></div>
</div>
<h4 style="text-align: left;">
Microsoft Liquid</h4>
<div style="text-align: justify;">
<div style="text-align: left;">
Microsoft is in the running of "quantum simulators", in fact, they developed their own language for quantum simulation, they created <a href="http://stationq.github.io/Liquid/">Liquid</a>. It is "cross-platform", i.e., you can run in Windows, Mac OS or Linux. However, if you decide to use in your Linux distro, you will need to download <a href="http://www.mono-project.com/docs/about-mono/supported-platforms/linux/">Mono</a> and then run an executable compiled for Windows. </div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
<br /></div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
The good side of <span style="font-family: inherit;">"</span><span style="background-color: white; text-align: center;"><span style="font-family: inherit;">The Language Integrated Quantum Operations Simulator" is that it is based in "F#" and it came with a lot of examples such as "Quantum </span>Teleportation<span style="font-family: inherit;">", "Shor's Algorithm" and others algorithms. </span></span></div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
<span style="background-color: white; text-align: center;"><span style="font-family: inherit;"><br /></span></span></div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
<span style="background-color: white; text-align: center;"><span style="font-family: inherit;">In fact, the paper </span></span><b style="background-color: white; text-align: start;"><a href="https://eprint.iacr.org/2017/598">Quantum Resource Estimates for Computing Elliptic Curve Discrete Logarithms</a> </b><span style="background-color: white; text-align: start;">uses liquid to implement an attack on ECC even with the following limitations. </span></div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
<span style="background-color: white; text-align: center;"><span style="font-family: inherit;"><br /></span></span></div>
</div>
<h4 style="text-align: left;">
<span style="background-color: white;">Limitations</span></h4>
<div>
<span style="background-color: white;"><br /></span></div>
<div>
<span style="background-color: white;">Liquid allow you to use just 23 qubits. However, the paper before used more than this amount, the reason (when I asked on github) was that the authors are from Microsoft and they can use as much as they need. </span></div>
<div>
<span style="background-color: white;"><br /></span></div>
<div>
<span style="background-color: white;"><br /></span></div>
<div>
<h3>
<span style="background-color: white;">LibQuantum</span></h3>
</div>
<div>
<span style="background-color: white;">The libquantum is a library written in C that you can extend and use to create your circuit. It is considerably fast and easy to use. The good side of this library is that you can use as many as qubits you want, the only problem is that if you use a big amount of qubits your "program" will be very slow and it will use a big quantity of memory. </span></div>
<div>
<span style="background-color: white;"><br /></span></div>
<div>
<span style="background-color: white;"><br /></span></div>
<div>
<span style="background-color: white;">When I am running tests, I am using this library. I am going to create a small tutorial to use this library. I am assuming that the reader/user is on linux distribution and has a little bit of knowledge of C. First step is to download the library: <a href="http://www.libquantum.de/">www.libquantum.de</a></span></div>
<div>
<br /></div>
<div>
After, downloading, compile the library and install the library in your system so you will be able to use it. </div>
<div>
<br /></div>
<div>
First, let's create a register and do some operations:<br />
<br />
<script src="https://gist.github.com/gbanegas/eccd79736314a472528029807a8badb8.js"></script>
</div>
<div>
<br /></div>
<div style="text-align: justify;">
<div style="text-align: left;">
<span style="background-color: white;">In the code, we first create a quatum register with initial value in 0 and with 2 qubits. After, we print the content of the register, in this example it will be "0" and finally we do some operations with the register. We perform a simple "CNOT" gate but because the value of the register is "|00>" nothing will change. If we change the value in the 4th line from 0 to 1, in the end we will have a register with "|11>".</span></div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
The library provides other operations such as: Toffoli, Sigma X, Sigma Y, Sigma Z, phase scale, phase kick and hadamard. </div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
In the code below we are going to have a more "complex" circuit:</div>
<div style="text-align: left;">
<br />
<script src="https://gist.github.com/gbanegas/d94ce42a45bc856036826e53adae96a6.js"></script></div>
</div>
<div style="text-align: justify;">
<div style="text-align: left;">
<span style="background-color: white; text-align: center;"><span style="font-family: inherit;"><br /></span></span>
<span style="background-color: white; text-align: center;"><span style="font-family: inherit;">Now, we initialize the quantum register with the value "|10>" and perform first a Hadamard transform to put the qubit 0 in superposition and after we aply a cnot gate with the value from qubit at position 1 to the target qubit at position 0. In the end, we are going to have a composed state as:</span></span><br />
<div style="text-align: center;">
0.707107 +0.000000i|3> (5.000000e-01) (|11>)</div>
<span style="background-color: white; text-align: center;"><span style="font-family: inherit;"></span></span><br />
<div style="text-align: center;">
0.707107 +0.000000i|0> (5.000000e-01) (|00>)</div>
<div style="text-align: left;">
<br /></div>
<div style="text-align: left;">
The library came with more examples that you can compile and run by yourself. I hope that you like this quick tutorial how to simulate quantum computers in classical computers. </div>
<div style="text-align: left;">
<br /></div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/16974975055367272535noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-77122334523189662972017-07-12T23:27:00.000+02:002017-07-13T09:43:29.233+02:00Looking for fast OpenSource algorithms on lattices? Try fpLLL!Dear ECRYPT-NET fellows and readers,<br />
I have some news from CWI @ Science Park in Amsterdam where <a href="https://github.com/fplll/fplll/wiki/fplll-days-3" target="_blank">fplll-days-3</a>, organized by <a href="http://homepages.cwi.nl/~ducas/" target="_blank">Leo Ducas</a> and <a href="https://marc-stevens.nl/research/" target="_blank">Marc Stevens</a>, are currently taking place!<br />
<br />
Previously held at ENS Lyon, this is the third time already for such a combined effort to enhance the fplll OpenSource project. fplll has become a lively project with many suggestions that help to debug and feature requests for continuously improving the code-base in various directions.<br />
<br />
As a brief history of fplll it can be noted that the first code was written by <a href="http://perso.ens-lyon.fr/damien.stehle/" target="_blank">Damien Stehlé</a>. It is now written by many active contributors (according to GitHub, the <a href="https://github.com/fplll/fplll#acknowledgments" target="_blank">most</a> <a href="https://github.com/fplll/fplll#contributors" target="_blank">active</a> developers are: Martin Albrecht, Shi Bai, Damien Stehlé, Guillaume Bonnoron, Marc Stevens and Koen de Boer) and maintained by <a href="http://malb.io/" target="_blank">Martin Albrecht</a> and <a href="http://maths-people.anu.edu.au/%7Ebai/" target="_blank">Shi Bai</a>.<br />
<br />
<h3>
What does fplll do?</h3>
What fplll does, depending on additionally specified parameters, is performing its implementation of the <a href="https://en.wikipedia.org/wiki/LLL_algorithm" target="_blank">LLL algorithm</a> using fast floating-point arithmetic under the hood. Other available lattice reduction algorithms are HKZ, BKZ reduction and variants. These algorithms can be applied on an input lattice represented by a matrix, given i.e. in a file as a set of row vectors, to obtain a reduced representation --- a versatile starting point of numerous applications. Furthermore, fplll allows to solve (arbitrarily adjustable approximate-) <a href="https://en.wikipedia.org/wiki/Lattice_problems#Shortest_vector_problem_.28SVP.29" target="_blank">SVP</a> and <a href="https://en.wikipedia.org/wiki/Lattice_problems#Closest_vector_problem_.28CVP.29" target="_blank">CVP</a> instances, when used to find a shortest lattice vector relative to a user-chosen center.<br />
<br />
To get started, one can not only use and compile the <a href="https://github.com/fplll" target="_blank">fplll</a> C++ sources to run experiments, but the often dubbed 'user-friendlier variant' <a href="https://github.com/fplll/fpylll" target="_blank">fpylll</a> which provides Python access to the underlying, fast C++ functions. Finally, every mathematician's dear, Sage, (at least for anyone who isn't fully satisfied by pure Python) benefits from an improved fpylll as well, because importing the fpylll module seamlessly allows direct usage within Sage. Soon a new Sage version, <a href="https://git.sagemath.org/sage.git/?h=8.0" target="_blank">SageMath 8.0</a>, will be released, which ships the current fpylll module that accesses said, fast C++ routines.<br />
<br />
The significance of lattice-based cryptography is repeatedly mentioned in paper's abstracts and has been explored in past ECRYPT-NET blog posts i.e. on <a href="http://ecrypt-eu.blogspot.com/2017/03/what-are-lattices.html" target="_blank">'What are lattices?'</a> and <a href="http://ecrypt-eu.blogspot.com/2016/01/learning-problems-and-cryptography-lpn.html" target="_blank">'Learning problems and cryptography: LPN, LWE and LWR'</a>.<br />
<br />
<h3>
Significance for cryptanalysis</h3>
From a cryptanalysts point of view, the significance lies in the fact that most security models of lattice-based cryptography typically assume lattice reduction to be the most promising attack-vector against the underlying lattice-based primitive. Some security models are able to immediately (and provably) rule out certain classes of attacks and, for instance, a few others can be argued to be less promising than known formulations as lattice problems. Such arguing hence leads to fplll basically representing the SVP/CVP-oracle and it's performance is deemed as a lower bound for the practical performance of an attack. Typically attacks require many calls to such an oracle function, thus such an approach of taking the time of a single run as a lower bound is used to set parameters in experimental cryptosystems, when commonly a more conservative
<br />
<div wrap="">
lower bound including a security margin is chosen. Specifically, many proposed lattice-based crypto-schemes have been parametrized such that these best known-attacks are taken into account.</div>
<br />
I suppose, I do not need to point out the numerous <a href="https://en.wikipedia.org/wiki/Free_and_open-source_software#Benefits_over_proprietary_software" target="_blank">advantages</a> of OpenSource software (over closed-source projects) but and its value to the research community the significance of having freely-available, fast lattice reduction routines is manifold.<br />
<br />
To begin with, there is a discrepancy between what theory predicts and algorithmic performance in practice. Techniques described in the literature, summarized as BKZ2.0, leave a broad range of implementation choices. Different groups using different software and metrics where their approach is supreme, naturally lead to results that are hard to compare. If there was software that comes with meaningful defaults for many standard lattice tasks, is customizable, and extensible to individual lattice solutions, then there is hope that the community can agree on problem instances. Ideally, problem instances should cover deployed or model experimental cryptosystems such that they embody a meaningful benchmark for new designs.<br />
<br />
Originally, fplll was trying to provide such algorithms with reasonable speed. Recently, developers broadened theirs goals and try to fill gaps of cryptanalytic research. Concretely, now
fplll strives for speed from low level optimizations, and by implementing
diverse techniques from the literature hence catching up with the state of
the art. Additionally, it can be easily tweaked on a high algorithmic level with the Python layer fpylll, yet easily exploiting all the available optimized routines boosting the performance. One can argue that together with diverse lattice challanges this project helps to benchmark and compare various efforts to cryptanalyze cryptographic primitives used in cryptosystem's constructions.<br />
<br />
A couple of <a href="https://latticechallenge.org/" target="_blank">Lattice Challenges</a> have been proposed (<a href="https://www.latticechallenge.org/svp-challenge/" target="_blank">SVP</a>-, <a href="https://latticechallenge.org/ideallattice-challenge/index.php#" target="_blank">Ideal-</a>, <a href="https://latticechallenge.org/lwe_challenge/challenge.php" target="_blank">LWE-</a> and <a href="http://web.eecs.umich.edu/~cpeikert/rlwe-challenges/" target="_blank">Ring-</a>Challenges) and it seems that researchers also test their code on these instances, which aids a comparison of approaches.<br />
<br />
Having them conveniently accessible and high-level, fast lattice operations allows to quickly try out a new idea, or slightly different approach which saves time and hopefully makes researchers willing to share their tweaks and algorithmic tricks more often in the future.<br />
<br />
<h3>
The workshop </h3>
To come back to the start, the fplll-days are meant to be a hands-on, work-oriented workshop that enables direct discussions with core developers and with the goal to improve existing functions and the many algorithms involved. The general idea behind this meeting is to optimize often used routines, make it user-friendlier and accessible to cryptanalysts, for example.<br />
<br />
By using code profiling tools, performance and memory usage bottlenecks can be spotted in a first overview, which allows to direct efforts where they might lead to significant speed-ups. After discussing know issues and useful features, this workshop tries to provide an implementation of numerically stable algorithmic variants to push the dimension LLL can handle (like Givens rotations while resorting only to machine floating point type), sophisticated pruning strategies to speed-up enumeration, and implementing sieving algorithms --- all as a promising new direction in finding short vectors faster.<br />
<br />
It is exciting to join and shape such a project so let's hope for many interesting projects that got started and delegated here to be completed during this week and further interested researchers turning into active users joining the party and coming up with meaningful, reproducible research results. Remember that Newton "<span class="js-about-module-abstr">has seen further, by standing on the shoulders of giants"</span> thus achieving progress, and so you too are encouraged to become active, using an already established framework!Matthiashttp://www.blogger.com/profile/08311091211169735623noreply@blogger.comtag:blogger.com,1999:blog-5149574209636840277.post-20996143252547653542017-07-06T11:02:00.000+02:002017-07-06T11:02:10.691+02:00A Brief Survey of Physical Attacks on Cryptographic Hardware<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; text-align: justify;">
</div>
<div class="page" title="Page 1">
<div class="section" style="background-color: rgb(100.000000%, 100.000000%, 100.000000%);">
<div class="layoutArea">
<div class="column">
<div style="text-align: left;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">Previously, the topic of <a href="http://ecrypt-eu.blogspot.nl/2017/02/side-channel-analysis.html">side-channel attacks</a> (SCA) was covered on this blog. These attacks are very </span><span style="font-family: HelveticaNeue; font-size: 11pt;">popular, for they can be mounted using very cheap equipment and do not necessarily
require high level of expertise. Hence, SCA are widely accessible and present a common
danger. As a result, they are well researched, and various countermeasures have been
developed. Still, they are just a small part of the stack of physical attacks. Figure 1. crudely
depicts the this colorful “stack”. The one thing all physical attacks have in common is that it is assumed
that the attacker must gain physical access to the target device, and attain it for a certain
amount of time. In the remainder of this post, a brief survey of these attacks will be given. More
detailed descriptions will be provided in a series of posts that will follow. </span></div>
<div style="text-align: justify;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"><br /></span></div>
</div>
</div>
</div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSL_8W5DMU1YdFFB_BKGL0TfYqvETlcXJz52DX_asNfLCvE9hCbGAxj7ejJutx2Eyv_EDp52xhnIlSSJpdqRTEZinHM9uwoMZ0kCbdGXRWUtRFlPVCct0gCWHoL9oTzicq86CSqgSBXCM/s1600/stack-crop.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="410" data-original-width="1120" height="234" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSL_8W5DMU1YdFFB_BKGL0TfYqvETlcXJz52DX_asNfLCvE9hCbGAxj7ejJutx2Eyv_EDp52xhnIlSSJpdqRTEZinHM9uwoMZ0kCbdGXRWUtRFlPVCct0gCWHoL9oTzicq86CSqgSBXCM/s640/stack-crop.png" width="640" /></a></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; text-align: justify;">
<span style="font-kerning: none;"><br /></span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 12px;">
<span style="font-kerning: none;"></span><br /></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; text-align: center;">
<span style="font-kerning: none;">Figure 1: Stack of Physical Attacks</span></div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; min-height: 12px; text-align: center;">
<span style="font-kerning: none;"></span><br /></div>
<div class="page" title="Page 1">
<div class="section" style="background-color: rgb(100.000000%, 100.000000%, 100.000000%);">
<div class="layoutArea">
<div class="column">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">Invasiveness</span><br />
<div style="text-align: justify;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">The first segregation is based on the “invasiveness”. Invasive attacks entail breach of target’s
packaging, or its surrounding enclosure. This is often a very delicate process which often requires
expensive equipment and a high level of expertise. Since the breach is destructive by nature, it can be easily detected by subsequent users — if the chip itself was not destroyed in the
process that is. The goal of this breach is to gain access to internal state of a chip. Commonly attackers target on-chip busses or storage elements, which may contain sensitive intermediaries of
cryptographic computations or keys themselves. Aforementioned enclosures are a privilege of
expensive devices, often called Hardware Security Modules (HSMs). HSMs may cost tens of
thousands of Euros, and are envisioned to provide secure computational environments at high
speeds. Apart from restricting access to the chip using sturdy build and “tamper-proof” latches
and locks, enclosures are frequently equipped with seals and coatings that are supposed to
witness any foul play that may have taken place. Additionally, tamper detection measures may
be built in, envisioned to void all sensitive information at the first glimpse of attacker’s activities.
Hence, invading these juggernauts is commonly more expensive and time consuming.
Unfortunately, market-share of HSMs compared to bare smart-cards and RFIDs is neighboring
negligible, especially with the rise of the IoT.
</span></div>
<div style="text-align: justify;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">On the contrary, non-invasive adversaries do not cause any structural damage to packaging
nor enclosures. They interact with the target device using its existing interfaces, and mediums
that require no mechanical interaction with the device. They are virtually free, but may require
significant expertise of attackers.
</span></div>
<div style="text-align: justify;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"><br /></span></div>
</div>
</div>
</div>
</div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; text-align: justify;">
</div>
<div class="page" title="Page 2">
<div class="section" style="background-color: rgb(100.000000%, 100.000000%, 100.000000%);">
<div class="layoutArea">
<div class="column">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">Activeness
</span><br />
<div style="text-align: justify;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">The second segregation is based on the “activeness” of the attacker. Active attacks entail
induction of computational (logical) or structural changes in the target chip. When we talk
about computational changes, a very common example are Fault Injection (FI) attacks. There
are two phases to FI attacks: fault injection during the execution of the targeted algorithm, and
the analysis based on the observations of faulty outputs. A common method for altering
device’s execution is called clock glitching. Namely, by introducing a premature edge on the
clock signal, attacker violates devices’s critical path. As a result, incorrect values are captured
in device’s registers. Alternatively, faults can be induced by shooting a laser beam with enough
power to change the state of the device, while allowing it to remain operational. Here, any
data or control register fall under “state of the device”. For example, round counter, commonly
used in implementations of block ciphers, is a very favored target for such faults. Active attacks
may require higher level of technical skill, and a more sophisticated setup.
</span></div>
<div style="text-align: justify;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">On the contrary, passive adversaries may only observe device’s execution, while interacting
through its predefined interfaces. Well-known SCA fall under this category. These attacks are
well researched, and can be mounted using very cheap equipment. Developed techniques
(e.g., Mutual Information Analysis) are extremely powerful, and once incorporated in the
attackers setup can be reproduced quite trivially. Consequently, although they entail only
limited exposure of the device, they pose a serious threat for they are very accessible even to
attackers with modest capabilities.
</span></div>
<div style="text-align: justify;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"><br /></span></div>
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">The Reality
</span><br />
<div style="text-align: justify;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">Activeness and invasiveness are two orthogonal properties, resulting in a total of four
possibilities (although I find that the existence of “invasive and passive attacks” calls for a
philosophical debate). Unfortunately, situation is much more complex than that in practice.
Firstly, attackers are likely to use combined attacks. For example, FI + SCA may be a very powerful combination.
Additionally, the distinction mentioned above is not as binary. Rather, along each of the two
orthogonal axes there are many shades. For example, faults can be injected in some chips by
applying laser beams to their packaging (non-invasive), while others may be shielded from
such beams (hence they have to be attacked invasively).
</span></div>
<div style="text-align: justify;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"><br /></span></div>
<div style="text-align: justify;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">Consequently, there exists a myriad of possible attack variations. Moreover, even if we lock on
a certain extreme — let us say passive, non-invasive, CPA — quality of the measurement setup
plays a very significant role. A 500 Euro oscilloscope can hardly match its 30000 Euro
counterpart. In hindsight, there are no upper bounds to the power of a skilled invasive attacker
performing a battery of active and passive attacks, apart from the temporal and financial
constraints.
</span></div>
<br />
<div style="text-align: justify;">
<span style="font-family: HelveticaNeue; font-size: 14.666666984558105px;"><br /></span></div>
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"><div style="text-align: justify;">
<span style="font-size: 11pt;">Taking all above into account, choosing a set of countermeasures is a di</span><span style="font-size: 11pt;">ffi</span><span style="font-size: 11pt;">cult task (let alone
implementing them properly). Bare in mind that these countermeasures are not for free. They
may significantly increase the price of devices, reducing the profit margins severely. Therefore,
there are no silver bullets in protection against physical attacks. In other words, in practice
security engineers work to demotivate attackers with high probability. They try to stop “the
attacker of interest”, rather then stopping all attacks. To achieve this, first step is identifying
potential attackers. This process is often called profiling, and in a nutshell I would describe it as
follows. Please note that this is a gross simplification of the problem, meant to depict the
general idea. No distinction is made between fixed (price of the setup) and recurring (every time
the attack is mounted) costs, nor between temporal and financial costs. Lastly, please note that
the </span><span style="font-size: 11pt; font-style: italic;">value </span><span style="font-size: 11pt;">of assets is heavily simplified as well, for the sake of avoiding a philosophical
discussion yet again.</span></div>
</span><br />
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"><br /></span>
</div>
</div>
</div>
</div>
<div style="-webkit-text-stroke-color: rgb(0, 0, 0); -webkit-text-stroke-width: initial; font-family: 'Helvetica Neue'; font-size: 11px; line-height: normal; text-align: justify;">
</div>
<div class="page" title="Page 3">
<div class="section" style="background-color: rgb(100.000000%, 100.000000%, 100.000000%);">
<div class="layoutArea">
<div class="column">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">Manufacturer’s Dilemma
</span><br />
<div style="text-align: justify;">
</div>
<div class="page" title="Page 3">
<div class="section">
<div class="layoutArea">
<div class="column">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">Assume that a device </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">D</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">, which costs </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">d</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"> </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">to manufacture, protects assets worth </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">x</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"> </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">, and features a
</span><br />
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">countermeasure </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">C </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">that costs </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">c</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-style: italic;"> </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">to deploy. We may consider </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">D </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">to be secure against an attacker
</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">A, </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">who can mount a successful attack at a cost </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-style: italic;">a </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">(which includes </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">A</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">’s investment in
</span><br />
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">development of expertise), as long as</span><br />
<div style="text-align: center;">
<span style="font-family: STIXGeneral; font-size: 13pt; font-style: italic;">x</span><span style="font-size: 11pt;"> </span><span style="font-family: STIXGeneral; font-size: 13pt;">≤</span><span style="font-family: STIXGeneral; font-size: 13pt; font-style: italic;">a</span><span style="font-family: STIXGeneral; font-size: 13pt;">+</span><span style="font-family: STIXGeneral; font-size: 13pt; font-style: italic;">μ</span><span style="font-family: STIXGeneral; font-size: 9pt; font-style: italic; vertical-align: -3pt;">A</span><span style="font-size: 11pt;">,</span></div>
<br />
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"> </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">μ</span><span style="font-family: 'STIXGeneral'; font-size: 9.000000pt; font-style: italic; vertical-align: -3.000000pt;">A </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">being the attackers profit margin. In other words, if the cost </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">a</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"> </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">is high enough attacker can
not obtain desired amount of profit for given assets. On the other hand a manufacturer </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">M </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">that
produces </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">D </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">wants to sell for a price </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"> </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">m </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">such that
</span><br />
<div style="text-align: center;">
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"> </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">m </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt;">≥ </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">d </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt;">+ </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">c </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt;">+ </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">μ</span><span style="font-family: 'STIXGeneral'; font-size: 9.000000pt; font-style: italic; vertical-align: -3.000000pt;">M</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">,
</span></div>
<span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">μ</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"> </span><span style="font-family: 'STIXGeneral'; font-size: 9.000000pt; font-style: italic; vertical-align: -3.000000pt;">M </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">being </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">M</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">’s profit margin. In other words, price of deploying countermeasures </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">c</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"> </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">directly cuts
into manufacturer’s profits. Looking at these inequalities, it seems that there is no dilemma at
all. Nevertheless, cost of attack depends on the selection of a countermeasure, i.e.,
</span><br />
<div style="text-align: center;">
<span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">a</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"> </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt;">=</span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">f</span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt;">(</span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">c</span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt;">)</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">.
</span></div>
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">Assuming that an increase in </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">c</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"> </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">leads to the increase in </span><span style="font-family: 'STIXGeneral'; font-size: 13.000000pt; font-style: italic;">a</span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;"> </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">, by applying some high school math
(readers are welcome to play with it), we see that the selection of </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-weight: 700;">C </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">must be performed based
on the value of </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt; font-style: italic;">assets </span><span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">it protects. A more detailed discussion on this topic will be given in one
of the following posts.
</span><br />
<span style="font-family: 'HelveticaNeue'; font-size: 11.000000pt;">In conclusion, physical attacks are a great threat. As IoT progresses, and the amount of
ubiquitous devices increases their potential impact may only grow. Deploying devices that
protect assets against physical attacks is a complex problem, which demands bespoke
solutions, tailored to individual use cases. </span><br />
</div>
</div>
</div>
</div>
</div>
</div>
<div class="layoutArea">
<div class="column">
</div>
</div>
</div>
</div>
Danilo Šijačićhttp://www.blogger.com/profile/05665001947121313797noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-67023248409255936932017-06-23T12:56:00.001+02:002017-06-23T13:21:54.983+02:00Algorithm and Key Size Document<div dir="ltr" style="text-align: left;" trbidi="on">
<br /><div style="text-align: justify;" wrap="">
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvAUU_c6YDv30uZclY2qz77NgEcFBwjFh7F1ObzT2w9l6YwohT13MEpSN2LKRVKg7Drg9NIkvsmGvWLGWlHmXw-Hkr88VbHKMd62NsGo1OcvC0TPwKNgrfLRV2efeS854TgA4Ks7NT-ok/s1600/Kitchener.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="289" data-original-width="560" height="165" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvAUU_c6YDv30uZclY2qz77NgEcFBwjFh7F1ObzT2w9l6YwohT13MEpSN2LKRVKg7Drg9NIkvsmGvWLGWlHmXw-Hkr88VbHKMd62NsGo1OcvC0TPwKNgrfLRV2efeS854TgA4Ks7NT-ok/s320/Kitchener.jpeg" width="320" /></a></div>
</div>
<div style="text-align: justify;" wrap="">
<div style="text-align: justify;" wrap="">
The
ECRYPT Algorithm and Key Size document is probably <b>the most high impact
output </b>from our ECRYPT projects. It is referenced and used throughout
the world, to guide the uses of cryptography in practice. The current
version of the document can be found here</div>
<div style="text-align: justify;" wrap="">
<br /></div>
<div style="text-align: center;" wrap="">
<a href="http://www.ecrypt.eu.org/csa/documents/D5.2-AlgKeySizeProt-1.0.pdf">Algorithms, Key Size and Protocols Report (2016)</a></div>
<div style="text-align: justify;" wrap="">
<br /></div>
We are requesting input for the next edition of this document. To do
this we have created a <a href="https://slack.com/">Slack</a> channel where people can debate
inputs.<br />
<br /></div>
<div style="text-align: justify;" wrap="">
We encourage everyone to get involved by sending
us your email so we can add you. Once added you can
add other people to the channel as you see fit. Please email <a href="mailto:nigel@cs.bris.ac.uk">Nigel Smart</a> or <a href="mailto:saartje.verheyen@kuleuven.be">Saartje Verheyen</a> to be added if you do not know someone who is already involved.</div>
<div style="text-align: justify;" wrap="">
<br /></div>
<div style="text-align: justify;" wrap="">
We ask you to<b> add comments to the slack channel of
corrections and new text to add</b> (including where you
think it should go). After you have presented some input, other people can then comment on
your text, and add further corrections. </div>
<div style="text-align: justify;" wrap="">
<br /></div>
<div style="text-align: justify;" wrap="">
At the end of September we will freeze the discussion and start the process of incorporating all the suggestions into the final document. </div>
<div style="text-align: justify;" wrap="">
<br /></div>
<div style="text-align: justify;" wrap="">
If you have contributed
to the Slack discussion in a positive manner we will<b>
include you as an author on the final
document as a contributor</b>. That way you get to claim
you have contributed to a high impact document
(carrot); if you do not contribute however then you
cannot complain if we say something you disagree
with (stick). </div>
<div style="text-align: justify;" wrap="">
<br /></div>
<div style="text-align: justify;" wrap="">
Of course in the end it is a community
effort, and in case of disagreement the editors
will need to take one side or another.
</div>
</div>
Nigel Smarthttp://www.blogger.com/profile/17681184541012804026noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-29683496602243385942017-06-16T15:22:00.001+02:002017-06-17T11:35:03.280+02:00Boomerang AttacksIn cryptography, a boomerang attack is a method of cryptanalysis that is based on differential cryptanalysis.<br />
<br />
Boomerang attacks were first introduced by <a href="https://link.springer.com/chapter/10.1007/3-540-48519-8_12">Wagner</a> and allow an adversary to concatenate two high probability differential trails to attack a cipher. This is especially useful if there is a lack of long differentials with sufficient probabilities. The adversary can therefore decompose the encryption function $F$ in two subciphers $f$ and $g$ such that $F = f \circ g$. Then the adversary can search for high probability trails $\Delta \rightarrow \Delta^*$ with probability $p$ for $f$ and $\nabla \rightarrow \nabla^*$ with probability $q$ for $g$. The differential trails can then be combined in a chosen plaintext/adaptive chosen ciphertext attack to mount a boomerang distinguisher and then a key recovery attack based on this distinguisher to recover the secret key.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgc2ZSTjB0D7pgb3uk15WtxWouzzSttI0bNDcXIejw3T8A6HIFOHGJkq__Y8mEQVH6XPCoseZvjrNPVgpgQ_GjUaZTuekhMWOSy2uQu1NglvbKFwcg2M2PsacR0EVN023A7ncDNlcquc7E/s1600/Screenshot+from+2017-06-16+14-45-54.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="865" data-original-width="694" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgc2ZSTjB0D7pgb3uk15WtxWouzzSttI0bNDcXIejw3T8A6HIFOHGJkq__Y8mEQVH6XPCoseZvjrNPVgpgQ_GjUaZTuekhMWOSy2uQu1NglvbKFwcg2M2PsacR0EVN023A7ncDNlcquc7E/s400/Screenshot+from+2017-06-16+14-45-54.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">A boomerang distinguisher</td><td class="tr-caption" style="text-align: center;"><br /></td><td class="tr-caption" style="text-align: center;"><br /></td><td class="tr-caption" style="text-align: center;"><br /></td></tr>
</tbody></table>
<br />
The basic attack works as follows:<br />
<br />
<ol>
<li>The adversary chooses a random plaintext $X_1$ and calculates $X_2 = X_1 \oplus \Delta$.</li>
<li>The adversary requests the ciphertexts for $X_1$ and $X_2$ from an encryption oracle which are $Y_1 = F(X_1)$ and $Y_2 = F(X_2)$</li>
<li>Calculate ciphertexts $Y_3 = Y_1 \oplus \nabla$ and $Y_4 = Y_2 \oplus \nabla$.</li>
<li>Request the decryptions of $Y_3$ and $Y_4$ to obtain $X_3 = F^{-1}(Y_3)$ and $X_4 = F^{-1}(Y_4)$.</li>
<li>If the difference between $X_3$ and $X_4$ is the same as between $X_1$ and $X_2$, namely $\Delta$ we obtain a correct quartett $(X_1, X_2, X_3, X_4)$.</li>
</ol>
<br />
Calculating a correct quartet requires an attacker to consider both plaintext pairs $(X_1, X_2)$ and $(X_3, X_4)$ and results in a total probability of (pq)^2.<br />
For an attack to succeed, for the probability of the boomerang distinguisher it must hold that $(pq) > 2^{n/2}$. For N plaintext pairs, an adversary expects about $N\cdot(pq)^2$ correct quartets in an attack, while there are only $N\cdot2^{-n}$ (where n is the blocksize) correct quartets for an ideal primitive. <br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />Ralph Ankelehttp://www.blogger.com/profile/12643681816558972378noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-10892583013546142982017-04-27T14:34:00.001+02:002017-04-28T10:48:53.798+02:00Yet Another "Ni Hao (Hello)" Hello, everyone. I am <a href="http://junwei.co/">Junwei Wang</a> from China. It’s pleasant to be an <a href="http://www.ecrypt.eu.org/net/">ECRYPT-NET</a> fellow and to receive funding from the European Union’s <a href="https://ec.europa.eu/programmes/horizon2020/">Horizon 2020</a> research and innovation programme under the Marie Skłodowska-Curie actions (MSCA).<br />
<br />
I received my bachelor degree and master degree from <a href="http://en.sdu.edu.cn/">Shandong University</a> in 2012 and the <a href="http://wwwen.uni.lu/">University of Luxembourg</a> in 2015, respectively. My master thesis is about the practical implementation of countermeasures resistant again high-order differential power analysis (DPA). Presently, I am doing my Ph.D. student at <a href="https://www.cryptoexperts.com/">CryptoExperts</a> under the supervisor of <a href="http://www.jscoron.fr/">Jean-Sébastien CORON</a>, <a href="http://www.math.univ-paris13.fr/~mesnager/index-en.html">Sihem MESNAGER</a>, <a href="http://dblp.uni-trier.de/pers/hd/p/Paillier:Pascal">Pascal PAILLIER</a>, and <a href="http://www.matthieurivain.com/">Matthieu RIVAIN</a>. My research interest is <a href="http://www.whiteboxcrypto.com/">white-box cryptography</a>.<br />
<br />
The advent of content-protected applications and mobile payment systems in recent years motivates the exploration of a solution to protect sensitive information from being extracted under the white-box context in which the attacker has the power of control the whole process of the execution of software. Though some theoretical evidence reveals the fact that no generic obfuscator can be constructed to shield algorithms to be attacked, and all existed practical schemes has been broken, it is still interesting to investigate new schemes which can withstand all the existing attacks, and to propose and analyse a new attacking method which is more generic than the existing ones. I hope somebody has similar interest with me, then we can deeply explore this area together.<br />
<br />
By the way, I am looking forward to meeting with you in Paris during <a href="https://eurocrypt2017.di.ens.fr/">EUROCRYPT</a>.<br />
<br />Anonymousnoreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-20304007102551782052017-04-17T19:47:00.002+02:002017-04-17T19:47:25.366+02:00What does privacy mean to you? A 5-day challengeI recently came across <a href="https://project.wnyc.org/privacy-paradox/">The Privacy Paradox</a> project, a call by WNYC Studios to take part in a five-day-long challenge on defining what does privacy mean to yourself.<br />
<br />
Amongst its best features, the process can be started anytime, by anyone, just by registering with an e-mail account. From that moment and during the five following days, you will receive a daily e-mail explaining you the topic addressed by that day's podcast, most of them finishing with an easy demand to the listener to reflect on the matter. <br />
<br />
With the audio's length ranging from 11 to 15 minutes, and the topics from cryptography --with Bruce Schneier starring on the first episode-- to psychology, there is something on it for everyone, and at a very accessible level. I personally found the format very nice and entertaining, and it is maybe something to learn about when we talk about privacy with people not so interested in the topic.<br />
<br />
Finally, the people behind the project also gathered a nice <a href="http://www.wnyc.org/story/privacy-paradox-tip-sheet/">list with things you can do</a> to face the current privacy situation.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://project.wnyc.org/privacy-paradox/images/facebook.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="168" src="https://project.wnyc.org/privacy-paradox/images/facebook.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><a href="https://project.wnyc.org/privacy-paradox/">https://project.wnyc.org/privacy-paradox/</a></td></tr>
</tbody></table>
<br />Eduardo Soria-Vázquezhttp://www.blogger.com/profile/16749852289271603693noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-3519034631039145542017-04-11T22:50:00.004+02:002017-04-18T15:31:28.090+02:00Robust encryption for symmetric primitives<div dir="ltr" style="text-align: left;" trbidi="on">
Robustness for encryption schemes has been introduced in the context of public-key encryption in the <a href="http://www.iacr.org/archive/tcc2010/59780477/59780477.pdf">work</a> of Abdalla, Bellare and Neven.
In a nutshell, it states that a ciphertext cannot be decrypted under two different keys.
Later, their work has been <a href="https://eprint.iacr.org/2012/673.pdf">extended,</a> by taking into account the cases where the keys are adversarially generated.
<br />
The <a href="https://eprint.iacr.org/2017/288.pdf">work</a> of Farshim, Orlandi and Roşie studies the robustness in the context of symmetric primitives
under the <i>incorrect</i> usage of keys.
Roughly speaking, a <i>key-robust</i> scheme does not output ciphertexts/tags
that are valid with respect to distinct keys. Key-robustness is a
notion that is often tacitly expected/assumed in protocol design
--- as is the case with anonymous auction, oblivious transfer, or public-key
encryption.
<br />
<br />
To motivate the new notion, "consider the following protocol, for constructing a ${3 \choose 2}$-OT protocol using only ${3 \choose 1}$-OTs: the sender picks $3$ random keys $k_1,k_2,k_3$ and inputs the message $x_1=(k_2,k_3),x_2=(k_1,k_3)$ and $x_3=(k_1,k_2)$ to the OT. At the same time, the sender sends encryptions of his messages under these keys, i.e., sends $c_i=E(k_i,m_i)$ for $i=1..3$. Now the receiver inputs the index of the message <i>he does not want to learn</i> to the ${3 \choose 1}$-OT and learns all keys except $k_i$. Intuitively the fact that the messages are sent only once (encrypted) should guarantee that the sender's choice of messages is uniquely defined. However, consider the following attack: the corrupt sender inputs $x^*_1=(k_2, k^*)$ (instead of $x_1$) such that $D(k^*,c_3)=m^*_3$ with $m^*_3\neq m_3$ and $m^*_3\neq \bot$. This means that the receiver will see two different versions of $m_3$ depending on whether the receiver asked for the pair $(2,3)$ or $(1,3)$. (This attack is an example of <i>input-dependence</i> and is a clear breach of security since it cannot be simulated in the ideal world)."
<br />
In terms of the results, the new <a href="https://eprint.iacr.org/2017/288.pdf">work</a> considers "both notions where the adversary has control over the keys and notions where the keys are generated honestly. The strongest notion that is formulated is called <i>complete robustness</i> and allows an adversary to generate the keys used in the system. The work shows that whether the adversary is in control of the keys or not makes a significant difference, by giving separations between the notions. While previous work in the public-key setting also had to deal with adversarially generated keys that were also invalid, this is not an issue in the setting, since in the symmetric world keys are often bit-strings of some
pre-specified length and can be easily checked for validity. By focusing on
correctly formed keys it can can shown equivalence between <i>complete robustness</i> and a syntactically simpler notion, which we call <i>full robustness</i>."
Then, it is shown that full robustness composes well: any fully robust symmetric encryption when
combined with a fully robust MAC results in a fully robust AE scheme. Analogous composition results also hold for MAC-then-Encrypt and Encrypt-and-MAC.
<br />
<br />
One of the most interesting question is if "feasibility results for robustness in the public-key setting can be translated
to the
symmetric setting. This turns out not to be the case. The main reason for this is that in the asymmetric setting the public key can be used as a mechanism to commit to its associated secret
key. In the symmetric case, on the other hand, there is no such public information. It might be tempting to think that one can just commit to the secret key and append it to the ciphertext. Unfortunately this approach cannot be proven secure due to a circular <i>key-dependency </i>between the encryption and the commitment components. To give a provably secure construction, with the authors constructing appropriate commitments that can be used in this setting. This requires a right-injective PRG, that can be in turn based on one-way permutations. This
result relies on the one-time security of the MAC and its collision-resistance, which once again is based on right-injective PRGs."
<br />
<br /></div>
Razvanhttp://www.blogger.com/profile/07352518857479000506noreply@blogger.com2tag:blogger.com,1999:blog-5149574209636840277.post-26435167618248966062017-04-03T14:00:00.001+02:002017-04-03T14:07:03.534+02:00Three everyday git scenarios<p>You may remember <a href="https://ecrypt-eu.blogspot.co.uk/2016/02/git-basics-for-cryptographers.html">Ralph's post</a> on git basics from a little over a year ago. In this post, I'll share three things I've learned are possible (and practically painless) to do with git that go beyond the basic add, merge, push, and pull. Everything in this post is done from the command line.</p>
<h2>Scenario 1</h2>
<p>You're working on a project with others. These hardworking colleagues of yours just pushed a bunch of commits and you want to see exactly what they changed since your last commit.</p>
<h3>What to do</h3>
<p>Use <code>git diff</code>. Besides using it to show unstaged changes, you can also use <code>git diff</code> to show changes between any two commits. Suppose that after your last commit, your colleagues made 4 commits. To see the changes, use <code>git diff HEAD~4 HEAD</code> or <code>git diff HEAD^^^^ HEAD</code>. If you don't want to count how many commits there were, you can look up the abbreviated commit IDs (SHA-1 hashes in hex) using <code>git log --oneline</code>. Then, you could use something like <code>git diff 54180a8 129ec44</code>.</p>
<p>There's a lot more to <code>git diff</code>: you can use it to compare specific files, you can show differences at the word level rather than at the line level, etc. If you want to learn more about it, see its page on the <a href="https://git-scm.com/docs/git-diff">git website</a>. If you're working on LaTeX files, <a href="https://gitlab.com/git-latexdiff/git-latexdiff">git-latexdiff</a> is convenient: given any two commits, it will produce a PDF file showing the changes, with removed text striked through and in red, and added text underlined and in blue.</p>
<h2>Scenario 2</h2>
<p>You were in the zone, hacking away at multiple issues and successfully completing all of them. Well done, you! Wait—each issue should have its own commit...</p>
<h3>What to do</h3>
<p>Use interactive staging: use <code>git add -p</code> or <code>git add --patch</code> to add the file or files. Git will look at each file you want to add and split the changes into hunks. For each hunk, it will show you the diff and ask you what to do with it:</p>
<p><code>Stage this hunk [y,n,q,a,d,/,s,e,?]?</code></p>
<p>Here is the full list of options. They're pretty easy to remember, especially if you're a vi user who is used to navigating with HJKL.</p>
<pre>y - stage this hunk
n - do not stage this hunk
q - do not stage this hunk or any of the remaining ones
a - stage this hunk and all later hunks in the file
d - do not stage this hunk or any of the later hunks in the file
g - select a hunk to go to
/ - search for a hunk matching the given regex
j - leave this hunk undecided, see next undecided hunk
J - leave this hunk undecided, see next hunk
k - leave this hunk undecided, see previous undecided hunk
K - leave this hunk undecided, see previous hunk
s - split the current hunk into smaller hunks
e - manually edit the current hunk
? - print help</pre>
<p><code>git add -p</code> is a powerful command that helps you keep your commits reasonably sized. It does require some care, though, since each individual commit should be consistent.</p>
<h2>Scenario 3</h2>
<p>You have to stop working, but you haven't finished fixing the issue you're working on. </p>
<h3>What to do</h3>
<p>You should do <i>something</i> because you don't want to have a dirty working directory.</p>
<p><b>Option 1:</b> commit now, then use <code>git commit --amend</code> (introduced in <a href="https://ecrypt-eu.blogspot.co.uk/2016/02/git-basics-for-cryptographers.html">Ralph's post</a>) once you've finished what you were working on. <code>git commit --amend</code> is useful for a bunch of other things, like adding files you forgot to stage to a commit and fixing typos in a commit message.</p>
<p>Commits are local, so what should you do if you're worried about hardware failure? If you're working on a personal project, it may be acceptable to push this commit and later push the amended commit. You'll need the <code>-f</code> (<code>--force</code>) flag for the latter push. If you're working on a project with others, however, it would be bad practice to amend a commit that you've already pushed. Instead, you could create a temporary personal branch, commit your changes to this branch, and push it to the remote repository. Then, you could push the amended commit to this branch without worrying about rewriting anyone else's history and merge it with the main branch when you've finished fixing the issue.</p>
<p><b>Option 2:</b> shelve your changes with <code>git stash</code>, which will sweep away both staged and unstaged changes, leaving you with a clean working directory. You can stash changes more than once. To see all stashes, use <code>git stash list</code>. To re-apply the most recent stash you made, use <code>git stash pop</code>. By default, <code>git stash</code> excludes new files (that haven't yet been staged) and ignored files. <code>git stash</code> is also useful when you want to see upstream changes made by someone else, but aren't ready to commit your work.</p>
<p>There's much more you can do with stashes: apply one stash to multiple branches, delete stashes you no longer need, stash parts of a file, etc. Stashes are always local; they can never be pushed to a remote repository. <a href="https://www.atlassian.com/git/tutorials/git-stash">Atlassian</a> has a good, detailed tutorial on <code>git stash</code>.</p>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr>
<td style="text-align: center;"><a href="https://github.com/thepracticaldev/orly-full-res"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvLgRMFcyCN39O12DTo3nIqecIj4_duYkrutGPohyphenhyphenITKqPArGXtZGZIfoNAuh-N69x8vEb0O5b6Hhuh2U-SHx08p-SXadNqScxzNdrMfDYFFVzcPKQpcZTPeXHU6jQkR2900oH-HR2Mk_O/s400/uninformativegitcommit-big.png" style="margin-left: auto; margin-right: auto;" width="244" /></a></td>
<td style="text-align: center;"><a href="https://github.com/thepracticaldev/orly-full-res"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhYLoEmJnB3OBRJQe-t5JMYGRtud6pngN0gtDt3fX2lGEcNL6IhcVaOVInExZhBsoqYYue9VHSsl6Vz73y4t8IcFK92EjKkjhVea7ynwtgxD78NskQIEKYyq57Ue1wdPVb2TKumuR5VDsId/s320/memorizingsixgitcommands-big.png" style="margin-left: auto; margin-right: auto;" width="244" /></a></td>
</tr>
<tr><td colspan="2">Some books you may (or may not) find useful.</td></tr>
</tbody></table>Marie-Sarahhttp://www.blogger.com/profile/09170706020176267783noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-43747697184675889782017-03-21T13:19:00.002+01:002017-03-22T10:59:27.996+01:00Learn You a GPU For Great Good! (Part 1?)<i>Side note: I stole the title from the most famous, most awesome Haskell book I know.</i>
<br />
<i><br /></i>
If you are reading this blog you are most likely interested in cryptography.
Today I want to convince you that GPUs are also, well, pretty awesome. I have
personally done a few crypto-related projects using GPUs and this post is my
attempt at crystallizing the knowledge and experience I built up during that time.
<br />
<br />
The purpose of this post is to provide a simple, meaningful introduction to
developing GPU-accelerated programs. We will discuss setup, the two primary
frameworks, basic code examples and development workflow as well as some
optimization tips. In the end, I want to show that developing this type of
application is not hard at all. If the post is successful I may do a follow-up
with a few more detailed and tricky examples. Throughout this post I will assume
you are familiar with basic C and/or C++, as the code examples will be in that
language. I will not focus too much on develop complicated kernels or how to
exploit multi-dimensional parallelism, I will leave that for a later post.
Instead, I will focus on a few things that may help you in making the firsts
steps towards GPU programming easier, as well as a few things that may help it
scale a bit better.
<br />
<br />
<div class="outline-2" id="outline-container-orga0472ee">
<h2 id="orga0472ee">
The Why & When</h2>
<div class="outline-text-2" id="text-orga0472ee">
<br />
GPU programming was originally designed for, and should be used for, large-scale
parallel computation problems. The more parallelism you can utilize, the better
GPUs will fit your problem. The most simple example is probably when you loop
over a very large collection of elements, performing on each a simple operation
independently.
<br />
<br />
For large-scale parallel computation problems I tend to think of three different
architectural setups that you can use (they also mix). The simplest is utilizing
multi-core CPUs (possibly over many machines). This has the shortest development
time due to its familiarity and easy-of-use and is suitable to many
applications. CPUs are of course trivially available. On the other end of the
spectrum is the development of custom hardware clusters, utilizing many FPGAs or
even ASICs. Development time is fairly long, even for experienced hardware
designers; the upside is that this very likely gives you optimal performance.
<br />
<br />
GPUs fall somewhere in the middle. Development time is very close to that for
CPUs; the primary constraint is availability. It is simply easier to get access
to CPU clusters. However, these days you can also rent all the GPU power you
need from Amazon EC2 instances, as was done for the <a href="https://shattered.io/">recent SHA1 collision</a>. If
you solve the availability issue, you can get a lot of bang out of your buck
performance-wise.
</div>
</div>
<div class="outline-2" id="outline-container-org54f4536">
<h2 id="org54f4536">
</h2>
<h2 id="org54f4536">
</h2>
<h2 id="org54f4536">
</h2>
<h2 id="org54f4536">
The How</h2>
<div class="outline-text-2" id="text-org54f4536">
<br />
First, you need to get your hands on a machine with a GPU, preferably a remote
machine or otherwise a machine with more than one GPU. The reason is that if
your GPU is also driving your desktop environment, programming errors may cause
your computer to hang or crash. It also allows you to more easily run
long-lasting kernels as well as giving you more reliable performance.
</div>
<div class="outline-3" id="outline-container-org92e3ea0">
<h3 id="org92e3ea0">
</h3>
<h3 id="org92e3ea0">
</h3>
<h3 id="org92e3ea0">
</h3>
<h3 id="org92e3ea0">
CUDA vs OpenCL</h3>
<div class="outline-text-3" id="text-org92e3ea0">
<br />
Assuming you have a GPU in your system, your next choice is between CUDA and
OpenCL, two programming environments for GPU programming. If you do not plan to
use an NVIDIA GPU you are stuck with OpenCL, whereas you otherwise have the
choice of using CUDA.
<br />
Having used both for different projects at different times, I can say that both
are perfectly usable and that the differences are mostly superficial. OpenCL is
more portable and integrates easier into existing projects; CUDA has the
superior tool-chain.
<br />
<br />
The examples in this post will be for CUDA, as it typically involves less
boilerplate. Also, we will use the more basic CUDA C++ implementation, as it
provides a better basis for understanding than special-purpose
libraries. This is particularly relevant if you want to computations that are
not a native part of these libraries, which is definitely true if you want to,
for instance, compute CPA-like correlations in parallel.
</div>
</div>
<div class="outline-3" id="outline-container-org7113637">
<h3 id="org7113637">
</h3>
<h3 id="org7113637">
</h3>
<h3 id="org7113637">
</h3>
<h3 id="org7113637">
Hello World</h3>
<div class="outline-text-3" id="text-org7113637">
<br />
I am not one to break tradition and thus we start the "Hello world" of classic
parallel programming, namely SAXPY. Or, more formally, given input vectors
\(\textbf{x}, \textbf{y}\) of length \(n\) and a scalar \(a\), compute the output
vector \(\textbf{z}\) where \(\textbf{z} = a\textbf{x} + \textbf{y}\).
<br />
First let us consider the basic C implementation of this function, where \(z =
y\), i.e., we update \(y\) using the scalar \(a\) and a vector \(x\).
<br />
<br />
<div class="org-src-container">
<pre class="src src-c"><span class="linenr"> 1: </span><span style="color: #ba2f59; font-weight: bold;">void</span> <span style="color: #6c3163; font-weight: bold;">saxpy</span><span style="color: #3a81c3;">(</span><span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">n</span>, <span style="color: #ba2f59; font-weight: bold;">float</span> <span style="color: #715ab1;">a</span>, <span style="color: #ba2f59; font-weight: bold;">float</span> * __restrict__ <span style="color: #715ab1;">x</span>, <span style="color: #ba2f59; font-weight: bold;">float</span> * __restrict__ <span style="color: #715ab1;">y</span><span style="color: #3a81c3;">)</span> <span style="color: #3a81c3;">{</span>
<span class="linenr"> 2: </span> <span style="color: #3a81c3; font-weight: bold;">for</span> <span style="color: #6c3163;">(</span><span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">i</span> = <span style="color: #4e3163;">0</span>; i < n; ++i<span style="color: #6c3163;">)</span> <span style="color: #6c3163;">{</span>
<span class="linenr"> 3: </span> y<span style="color: #2d9574;">[</span>i<span style="color: #2d9574;">]</span> = a*x<span style="color: #2d9574;">[</span>i<span style="color: #2d9574;">]</span> + y<span style="color: #2d9574;">[</span>i<span style="color: #2d9574;">]</span>;
<span class="linenr"> 4: </span> <span style="color: #6c3163;">}</span>
<span class="linenr"> 5: </span><span style="color: #3a81c3;">}</span>
<span class="linenr"> 6: </span>
<span class="linenr"> 7: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">...</span>
<span class="linenr"> 8: </span><span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">n</span> = <span style="color: #4e3163;">1</span> << <span style="color: #4e3163;">20</span>;
<span class="linenr"> 9: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">allocate vectors x,y of n elements each.</span>
<span class="linenr">10: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">...</span>
<span class="linenr">11: </span>
<span class="linenr">12: </span>saxpy<span style="color: #3a81c3;">(</span>n, <span style="color: #4e3163;">3.14</span>, x, y<span style="color: #3a81c3;">)</span>;
</pre>
<pre class="src src-c"></pre>
</div>
<br />
Nothing too special going on here. We simply iterate over every element and
perform our update with the scalar \(a=3.14\). Note the use of the __<code>restrict__</code>
keyword to indicate that <code>x</code> and <code>y</code> point to <i>different</i> objects in memory.
Just giving the compiler a helping hand, which is generally a useful thing to
do. Anything that makes it behave less like a random function, I say.
<br />
<br />
Conversion to CUDA is straightforward. In GPU programming you are always
defining what a single parallel unit of computation is doing, this is called a
<i>kernel</i>. When programming such a kernel, you are computing from the point of
view of the thread. Before delving in too deep, let us see what the
CUDA-equivalent code looks like.
<br />
<br />
<div class="org-src-container">
<pre class="src src-c++"><span class="linenr"> 1: </span>__global__
<span class="linenr"> 2: </span><span style="color: #ba2f59; font-weight: bold;">void</span> <span style="color: #6c3163; font-weight: bold;">saxpy</span><span style="color: #3a81c3;">(</span><span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">n</span>, <span style="color: #ba2f59; font-weight: bold;">float</span> <span style="color: #715ab1;">a</span>, <span style="color: #ba2f59; font-weight: bold;">float</span> * __<span style="color: #715ab1;">restrict__</span> x, <span style="color: #ba2f59; font-weight: bold;">float</span> * __<span style="color: #715ab1;">restrict__</span> y<span style="color: #3a81c3;">)</span> <span style="color: #3a81c3;">{</span>
<span class="linenr"> 3: </span> <span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">i</span> = blockIdx.x*blockDim.x + threadIdx.x;
<span class="linenr"> 4: </span> <span style="color: #3a81c3; font-weight: bold;">if</span> <span style="color: #6c3163;">(</span>i < n<span style="color: #6c3163;">)</span> <span style="color: #6c3163;">{</span>
<span class="linenr"> 5: </span> y<span style="color: #2d9574;">[</span>i<span style="color: #2d9574;">]</span> = a*x<span style="color: #2d9574;">[</span>i<span style="color: #2d9574;">]</span> + y<span style="color: #2d9574;">[</span>i<span style="color: #2d9574;">]</span>;
<span class="linenr"> 6: </span> <span style="color: #6c3163;">}</span>
<span class="linenr"> 7: </span><span style="color: #3a81c3;">}</span>
<span class="linenr"> 8: </span>
<span class="linenr"> 9: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">...</span>
<span class="linenr">10: </span><span style="color: #3a81c3; font-weight: bold;">const</span> <span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">n</span> = <span style="color: #4e3163;">1</span><<<span style="color: #4e3163;">20</span>;
<span class="linenr">11: </span>
<span class="linenr">12: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">allocate and initialize host-side buffers x,y</span>
<span class="linenr">13: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">...</span>
<span class="linenr">14: </span>
<span class="linenr">15: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">allocate device-side buffers x,y</span>
<span class="linenr">16: </span>cudaMalloc<span style="color: #3a81c3;">(</span><span style="color: #6c3163;">(</span><span style="color: #ba2f59; font-weight: bold;">void</span> **<span style="color: #6c3163;">)</span>&d_x, <span style="color: #3a81c3; font-weight: bold;">sizeof</span><span style="color: #6c3163;">(</span><span style="color: #ba2f59; font-weight: bold;">float</span><span style="color: #6c3163;">)</span> * n<span style="color: #3a81c3;">)</span>;
<span class="linenr">17: </span>cudaMalloc<span style="color: #3a81c3;">(</span><span style="color: #6c3163;">(</span><span style="color: #ba2f59; font-weight: bold;">void</span> **<span style="color: #6c3163;">)</span>&d_y, <span style="color: #3a81c3; font-weight: bold;">sizeof</span><span style="color: #6c3163;">(</span><span style="color: #ba2f59; font-weight: bold;">float</span><span style="color: #6c3163;">)</span> * n<span style="color: #3a81c3;">)</span>;
<span class="linenr">18: </span>
<span class="linenr">19: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">copy host-side buffers to the device</span>
<span class="linenr">20: </span>cudaMemcpy<span style="color: #3a81c3;">(</span>d_x, x, <span style="color: #3a81c3; font-weight: bold;">sizeof</span><span style="color: #6c3163;">(</span><span style="color: #ba2f59; font-weight: bold;">float</span><span style="color: #6c3163;">)</span> * n, cudaMemcpyHostToDevice<span style="color: #3a81c3;">)</span>;
<span class="linenr">21: </span>cudaMemcpy<span style="color: #3a81c3;">(</span>d_y, y, <span style="color: #3a81c3; font-weight: bold;">sizeof</span><span style="color: #6c3163;">(</span><span style="color: #ba2f59; font-weight: bold;">float</span><span style="color: #6c3163;">)</span> * n, cudaMemcpyHostToDevice<span style="color: #3a81c3;">)</span>;
<span class="linenr">22: </span>
<span class="linenr">23: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">compute the saxpy</span>
<span class="linenr">24: </span><span style="color: #3a81c3; font-weight: bold;">const</span> <span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">threads_per_block</span> = <span style="color: #4e3163;">256</span>;
<span class="linenr">25: </span><span style="color: #3a81c3; font-weight: bold;">const</span> <span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">number_of_blocks</span> = n / threads_per_block;
<span class="linenr">26: </span>saxpy<<<span style="color: #3a81c3;"><</span>number_of_blocks, threads_per_block<span style="color: #3a81c3;">></span>>><span style="color: #3a81c3;">(</span>n, <span style="color: #4e3163;">3.14</span>, d_x, d_y<span style="color: #3a81c3;">)</span>;
<span class="linenr">27: </span>
<span class="linenr">28: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">copy the output buffer from the device to the host</span>
<span class="linenr">29: </span>cudaMemcpy<span style="color: #3a81c3;">(</span>y, d_y, <span style="color: #3a81c3; font-weight: bold;">sizeof</span><span style="color: #6c3163;">(</span><span style="color: #ba2f59; font-weight: bold;">float</span><span style="color: #6c3163;">)</span> * n, cudaMemcpyDeviceToHost<span style="color: #3a81c3;">)</span>;
<span class="linenr">30: </span>
<span class="linenr">31: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">free the device buffers</span>
<span class="linenr">32: </span>cudaFree<span style="color: #3a81c3;">(</span>d_x<span style="color: #3a81c3;">)</span>;
<span class="linenr">33: </span>cudaFree<span style="color: #3a81c3;">(</span>d_y<span style="color: #3a81c3;">)</span>;
<span class="linenr">34: </span>
<span class="linenr">35: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">clean up the x, y host buffers</span>
<span class="linenr">36: </span><span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">...</span>
</pre>
</div>
<br />
Let us consider the kernel first, denoted by the simple fact of the function
definition starting with <code>__global__</code>. The parameters to the function are the
same as before, nothing special there. Line <code>3</code> is a key first step in any
kernel: we need to figure out the correct offset into our buffers <code>x and y</code>.
To understand this, we need to understand CUDA's notion of threads and blocks
(or work groups and work items in OpenCL).
<br />
<blockquote>
<b>The Grid</b>
<br />
The CUDA threading model is fairly straightforward to imagine. A thread essentially
computes a single instance of a kernel. These threads form groups called <i>blocks</i> that have
somewhat-more-efficient inter-thread communication primitives. The blocks
together form what is known as the <i>grid</i>. The grid can have up to three
dimensions, i.e., the blocks can be ordered into \((x,y, z)\) coordinates. The same
goes for threads inside a block, they can be addressed with \((x, y, z)\)
coordinates as well.</blockquote>
<blockquote>
Mostly though, I have tended to stick to 1-dimensional grids. This is simply
dividing a vector of \(n\) elements into $n/m$-sized sequential blocks (even
better if \(n\) is a multiple of \(m\)). </blockquote>
<blockquote>
A quick note about <i>warps</i> (or <i>wavefronts</i> in OpenCL), which is a related
concept. A <i>warp</i> is a unit of scheduling, it determines the amount of threads
that actually execute in lockstep. It is good practice to have your
<i>block size</i> as a multiple of the <i>warp size</i> but other than that you should not
worry overly much about warps.
</blockquote>
In this case we find our thread by multiplying the block <code>id</code> with the size of
block and then adding the offset of the thread <i>within the block</i>. The rest of
the kernel is straightforward, we simply perform the same computation as in the
original code but we omit the <code>for</code>-loop. The conditional at line <code>4</code> makes sure
we do not write outside the bounds of our vector, though that should not happen
if we choose our grid carefully.
<br />
<br />
The rest of the code is the standard boilerplate that you will find in most CUDA
programs. A key notion is that there is a distinction between buffers allocated
on the <i>device</i> (the GPU) and buffers allocated on the <i>host</i>.
Note that on line <code>26</code> we schedule the kernel for execution. The first two
weird-looking parameters (within angle brackets) are the number of blocks and
the block size respectively.
</div>
</div>
<div class="outline-3" id="outline-container-org906d583">
<h3 id="org906d583">
</h3>
<h3 id="org906d583">
</h3>
<h3 id="org906d583">
</h3>
<h3 id="org906d583">
Improving & Testing "Hello World"</h3>
<div class="outline-text-3" id="text-org906d583">
<br />
To showcase a few things that I found helpful we are going to improve this
simple code example. And because this is my blog post and I decide what is in
it, I get to talk to you about how to test your code. GPU code tends to be a bit
flaky: it breaks easily. Thus, I argue that creating simple tests for your code
is essential. These do not have to be very complicated but I recommend that you
use a proper framework for writing unit tests. For C++ I have had success with
<a href="https://github.com/philsquared/Catch">Catch</a> and <a href="https://github.com/onqtam/doctest">doctest</a>, both single-headers that you include into your project.
<br />
<br />
Before we include these tests however, I propose that we make two more changes
to the program. First of all, we are going to add better error checking. Most of
the <code>cudaFoo</code> functions return a value indicating whether the operation was
successful. Otherwise, we get something which we can use to determine the error.
<br />
<br />
<div class="org-src-container">
<pre class="src src-c++"><span class="linenr">1: </span><span style="color: #6c3163;">#define</span> <span style="color: #6c3163; font-weight: bold;">check</span><span style="color: #3a81c3;">(</span><span style="color: #715ab1;">e</span><span style="color: #3a81c3;">)</span> <span style="color: #3a81c3;">{</span> _check<span style="color: #6c3163;">(</span><span style="color: #2d9574;">(</span>e<span style="color: #2d9574;">)</span>, __FILE__, __LINE__<span style="color: #6c3163;">)</span>; <span style="color: #3a81c3;">}</span>
<span class="linenr">2: </span>
<span class="linenr">3: </span><span style="color: #3a81c3; font-weight: bold;">inline</span> <span style="color: #ba2f59; font-weight: bold;">cudaError_t</span> <span style="color: #6c3163; font-weight: bold;">_check</span><span style="color: #3a81c3;">(</span><span style="color: #ba2f59; font-weight: bold;">cudaError_t</span> <span style="color: #715ab1;">result</span>, <span style="color: #3a81c3; font-weight: bold;">const</span> <span style="color: #ba2f59; font-weight: bold;">char</span> *<span style="color: #715ab1;">file</span>, <span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">line</span><span style="color: #3a81c3;">)</span> <span style="color: #3a81c3;">{</span>
<span class="linenr">4: </span> <span style="color: #3a81c3; font-weight: bold;">if</span> <span style="color: #6c3163;">(</span>result != cudaSuccess<span style="color: #6c3163;">)</span> <span style="color: #6c3163;">{</span>
<span class="linenr">5: </span> fprintf<span style="color: #2d9574;">(</span>stderr, <span style="color: #2d9574;">"CUDA Runtime Error: %s (%s:%d)\n"</span>, cudaGetErrorString<span style="color: #67b11d;">(</span>result<span style="color: #67b11d;">)</span>, file, line<span style="color: #2d9574;">)</span>;
<span class="linenr">6: </span> assert<span style="color: #2d9574;">(</span>result == cudaSuccess<span style="color: #2d9574;">)</span>;
<span class="linenr">7: </span> <span style="color: #6c3163;">}</span>
<span class="linenr">8: </span> <span style="color: #3a81c3; font-weight: bold;">return</span> result;
<span class="linenr">9: </span><span style="color: #3a81c3;">}</span>
</pre>
<pre class="src src-c++"><span style="color: #3a81c3;">
</span></pre>
</div>
And then simply wrap the <code>cudaFoo</code> functions with this <code>check</code> macro.
Alternatively, you may want to rewrite this to use exceptions instead of
asserts. Pick your poison.
<br />
<br />
Another thing I would recommend adding if you are doing CUDA in C++ is wrapping
most of the allocation and de-allocation logic in a <code>class</code>. I generally take a
more utilitarian view of classes for simple pieces of code and thus the
following is not necessarily idiomatic or good C++ code.
<br />
<br />
<div class="org-src-container">
<pre class="src src-c++"><span class="linenr"> 1: </span><span style="color: #3a81c3; font-weight: bold;">class</span> <span style="color: #ba2f59; font-weight: bold;">Saxpy</span> <span style="color: #3a81c3;">{</span>
<span class="linenr"> 2: </span><span style="color: #3a81c3; font-weight: bold;">public</span>:
<span class="linenr"> 3: </span> <span style="color: #3a81c3; font-weight: bold;">const</span> <span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">n</span>;
<span class="linenr"> 4: </span> <span style="color: #ba2f59; font-weight: bold;">float</span> *<span style="color: #715ab1;">d_x</span>;
<span class="linenr"> 5: </span> <span style="color: #ba2f59; font-weight: bold;">float</span> *<span style="color: #715ab1;">d_y</span>;
<span class="linenr"> 6: </span> <span style="color: #ba2f59; font-weight: bold;">float</span> *<span style="color: #715ab1;">x</span>;
<span class="linenr"> 7: </span> <span style="color: #ba2f59; font-weight: bold;">float</span> *<span style="color: #715ab1;">y</span>;
<span class="linenr"> 8: </span>
<span class="linenr"> 9: </span> <span style="color: #6c3163; font-weight: bold;">Saxpy</span><span style="color: #6c3163;">(</span><span style="color: #3a81c3; font-weight: bold;">const</span> <span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">n</span><span style="color: #6c3163;">)</span> : n<span style="color: #6c3163;">(</span>n<span style="color: #6c3163;">)</span> <span style="color: #6c3163;">{</span>
<span class="linenr">10: </span> x = <span style="color: #3a81c3; font-weight: bold;">new</span> <span style="color: #ba2f59; font-weight: bold;">float</span><span style="color: #2d9574;">[</span>n<span style="color: #2d9574;">]</span>;
<span class="linenr">11: </span> y = <span style="color: #3a81c3; font-weight: bold;">new</span> <span style="color: #ba2f59; font-weight: bold;">float</span><span style="color: #2d9574;">[</span>n<span style="color: #2d9574;">]</span>;
<span class="linenr">12: </span>
<span class="linenr">13: </span> check<span style="color: #2d9574;">(</span>cudaMalloc<span style="color: #67b11d;">(</span><span style="color: #b1951d;">(</span><span style="color: #ba2f59; font-weight: bold;">void</span> **<span style="color: #b1951d;">)</span>&d_x, <span style="color: #3a81c3; font-weight: bold;">sizeof</span><span style="color: #b1951d;">(</span><span style="color: #ba2f59; font-weight: bold;">float</span><span style="color: #b1951d;">)</span> * n<span style="color: #67b11d;">)</span><span style="color: #2d9574;">)</span>;
<span class="linenr">14: </span> check<span style="color: #2d9574;">(</span>cudaMalloc<span style="color: #67b11d;">(</span><span style="color: #b1951d;">(</span><span style="color: #ba2f59; font-weight: bold;">void</span> **<span style="color: #b1951d;">)</span>&d_y, <span style="color: #3a81c3; font-weight: bold;">sizeof</span><span style="color: #b1951d;">(</span><span style="color: #ba2f59; font-weight: bold;">float</span><span style="color: #b1951d;">)</span> * n<span style="color: #67b11d;">)</span><span style="color: #2d9574;">)</span>;
<span class="linenr">15: </span> <span style="color: #6c3163;">}</span>
<span class="linenr">16: </span>
<span class="linenr">17: </span> ~<span style="color: #6c3163; font-weight: bold;">Saxpy</span><span style="color: #6c3163;">()</span> <span style="color: #6c3163;">{</span>
<span class="linenr">18: </span> check<span style="color: #2d9574;">(</span><span style="color: #ba2f59; font-weight: bold;">cudaFree</span><span style="color: #67b11d;">(</span><span style="color: #715ab1;">d_x</span><span style="color: #67b11d;">)</span><span style="color: #2d9574;">)</span>;
<span class="linenr">19: </span> check<span style="color: #2d9574;">(</span><span style="color: #ba2f59; font-weight: bold;">cudaFree</span><span style="color: #67b11d;">(</span><span style="color: #715ab1;">d_y</span><span style="color: #67b11d;">)</span><span style="color: #2d9574;">)</span>;
<span class="linenr">20: </span>
<span class="linenr">21: </span> <span style="color: #3a81c3; font-weight: bold;">delete</span><span style="color: #2d9574;">[]</span> x;
<span class="linenr">22: </span> <span style="color: #3a81c3; font-weight: bold;">delete</span><span style="color: #2d9574;">[]</span> y;
<span class="linenr">23: </span> <span style="color: #6c3163;">}</span>
<span class="linenr">24: </span>
<span class="linenr">25: </span> <span style="color: #ba2f59; font-weight: bold;">Saxpy</span>& <span style="color: #6c3163; font-weight: bold;">fill</span><span style="color: #6c3163;">()</span> <span style="color: #6c3163;">{</span>
<span class="linenr">26: </span> <span style="color: #3a81c3; font-weight: bold;">for</span> <span style="color: #2d9574;">(</span><span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">i</span> = <span style="color: #4e3163;">0</span>; i < n; ++i<span style="color: #2d9574;">)</span> <span style="color: #2d9574;">{</span>
<span class="linenr">27: </span> x<span style="color: #67b11d;">[</span>i<span style="color: #67b11d;">]</span> = i / <span style="color: #4e3163;">12.34</span>;
<span class="linenr">28: </span> y<span style="color: #67b11d;">[</span>i<span style="color: #67b11d;">]</span> = i / <span style="color: #4e3163;">56.78</span>;
<span class="linenr">29: </span> <span style="color: #2d9574;">}</span>
<span class="linenr">30: </span>
<span class="linenr">31: </span> check<span style="color: #2d9574;">(</span>cudaMemcpy<span style="color: #67b11d;">(</span>d_x, x, <span style="color: #3a81c3; font-weight: bold;">sizeof</span><span style="color: #b1951d;">(</span><span style="color: #ba2f59; font-weight: bold;">float</span><span style="color: #b1951d;">)</span> * n, cudaMemcpyHostToDevice<span style="color: #67b11d;">)</span><span style="color: #2d9574;">)</span>;
<span class="linenr">32: </span> check<span style="color: #2d9574;">(</span>cudaMemcpy<span style="color: #67b11d;">(</span>d_y, y, <span style="color: #3a81c3; font-weight: bold;">sizeof</span><span style="color: #b1951d;">(</span><span style="color: #ba2f59; font-weight: bold;">float</span><span style="color: #b1951d;">)</span> * n, cudaMemcpyHostToDevice<span style="color: #67b11d;">)</span><span style="color: #2d9574;">)</span>;
<span class="linenr">33: </span>
<span class="linenr">34: </span> <span style="color: #3a81c3; font-weight: bold;">return</span> *<span style="color: #3a81c3; font-weight: bold;">this</span>;
<span class="linenr">35: </span> <span style="color: #6c3163;">}</span>
<span class="linenr">36: </span>
<span class="linenr">37: </span> <span style="color: #ba2f59; font-weight: bold;">Saxpy</span>& <span style="color: #6c3163; font-weight: bold;">run</span><span style="color: #6c3163;">(</span><span style="color: #ba2f59; font-weight: bold;">float</span> <span style="color: #715ab1;">a</span><span style="color: #6c3163;">)</span> <span style="color: #6c3163;">{</span>
<span class="linenr">38: </span> <span style="color: #3a81c3; font-weight: bold;">const</span> <span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">threads_per_block</span> = <span style="color: #4e3163;">256</span>;
<span class="linenr">39: </span> <span style="color: #3a81c3; font-weight: bold;">const</span> <span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">number_of_blocks</span> = n / threads_per_block;
<span class="linenr">40: </span> saxpy<<<span style="color: #2d9574;"><</span>number_of_blocks, threads_per_block<span style="color: #2d9574;">></span>>><span style="color: #2d9574;">(</span><span style="color: #ba2f59; font-weight: bold;">n</span>, a, d_x, d_y<span style="color: #2d9574;">)</span>;
<span class="linenr">41: </span>
<span class="linenr">42: </span> <span style="color: #3a81c3; font-weight: bold;">return</span> *<span style="color: #3a81c3; font-weight: bold;">this</span>;
<span class="linenr">43: </span> <span style="color: #6c3163;">}</span>
<span class="linenr">44: </span>
<span class="linenr">45: </span> <span style="color: #ba2f59; font-weight: bold;">Saxpy</span>& <span style="color: #6c3163; font-weight: bold;">load</span><span style="color: #6c3163;">()</span> <span style="color: #6c3163;">{</span>
<span class="linenr">46: </span> check<span style="color: #2d9574;">(</span><span style="color: #ba2f59; font-weight: bold;">cudaDeviceSynchronize</span><span style="color: #67b11d;">()</span><span style="color: #2d9574;">)</span>;
<span class="linenr">47: </span> check<span style="color: #2d9574;">(</span>cudaMemcpy<span style="color: #67b11d;">(</span>y, d_y, <span style="color: #3a81c3; font-weight: bold;">sizeof</span><span style="color: #b1951d;">(</span><span style="color: #ba2f59; font-weight: bold;">float</span><span style="color: #b1951d;">)</span> * n, cudaMemcpyDeviceToHost<span style="color: #67b11d;">)</span><span style="color: #2d9574;">)</span>;
<span class="linenr">48: </span> <span style="color: #3a81c3; font-weight: bold;">return</span> *<span style="color: #3a81c3; font-weight: bold;">this</span>;
<span class="linenr">49: </span> <span style="color: #6c3163;">}</span>
<span class="linenr">50: </span><span style="color: #3a81c3;">}</span>;
</pre>
<pre class="src src-c++"></pre>
</div>
<br />
Why we went through all this trouble becomes clear if we put this in a test (I am using <code>doctest</code> syntax as an example).
<br />
<br />
<div class="org-src-container">
<pre class="src src-c++"><span class="linenr"> 1: </span><span style="color: #715ab1;">TEST_CASE</span><span style="color: #3a81c3;">(</span><span style="color: #2d9574;">"testing saxpy"</span><span style="color: #3a81c3;">)</span> <span style="color: #3a81c3;">{</span>
<span class="linenr"> 2: </span> <span style="color: #ba2f59; font-weight: bold;">float</span> <span style="color: #715ab1;">a</span> = <span style="color: #4e3163;">3.14</span>;
<span class="linenr"> 3: </span> <span style="color: #3a81c3; font-weight: bold;">const</span> <span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">n</span> = <span style="color: #4e3163;">1024</span>;
<span class="linenr"> 4: </span> <span style="color: #ba2f59; font-weight: bold;">Saxpy</span> <span style="color: #6c3163; font-weight: bold;">s</span><span style="color: #6c3163;">(</span>n<span style="color: #6c3163;">)</span>;
<span class="linenr"> 5: </span>
<span class="linenr"> 6: </span> s.fill<span style="color: #6c3163;">()</span>.run<span style="color: #6c3163;">(</span>a<span style="color: #6c3163;">)</span>.load<span style="color: #6c3163;">()</span>;
<span class="linenr"> 7: </span>
<span class="linenr"> 8: </span> <span style="color: #3a81c3; font-weight: bold;">for</span> <span style="color: #6c3163;">(</span><span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">i</span> = <span style="color: #4e3163;">0</span>; i < n; ++i<span style="color: #6c3163;">)</span> <span style="color: #6c3163;">{</span>
<span class="linenr"> 9: </span> <span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">we didn't keep the old y values so we recompute them here</span>
<span class="linenr">10: </span> <span style="color: #ba2f59; font-weight: bold;">float</span> <span style="color: #715ab1;">y_i</span> = i / <span style="color: #4e3163;">56.78</span>;
<span class="linenr">11: </span> <span style="background-color: #ecf3ec; color: #2aa1ae;">// </span><span style="background-color: #ecf3ec; color: #2aa1ae;">the approx is because floating point comparison is wacky</span>
<span class="linenr">12: </span> CHECK<span style="color: #2d9574;">(</span>s.y<span style="color: #67b11d;">[</span>i<span style="color: #67b11d;">]</span> == <span style="color: #4e3163;">doctest</span>::Approx<span style="color: #67b11d;">(</span>a * s.x<span style="color: #b1951d;">[</span>i<span style="color: #b1951d;">]</span> + y_i<span style="color: #67b11d;">)</span><span style="color: #2d9574;">)</span>;
<span class="linenr">13: </span> <span style="color: #6c3163;">}</span>
<span class="linenr">14: </span><span style="color: #3a81c3;">}</span>
</pre>
</div>
<br />
That is a, for C++ standards, pretty concise test. And indeed, our tests succeed. Yay.
<br />
<pre class="example">
</pre>
<pre class="example">===============================================================================
[doctest] test cases: 1 | 1 passed | 0 failed | 0 skipped
[doctest] assertions: 1024 | 1024 passed | 0 failed |
</pre>
</div>
</div>
<div class="outline-3" id="outline-container-orga210b35">
<h3 id="orga210b35">
</h3>
<h3 id="orga210b35">
</h3>
<h3 id="orga210b35">
</h3>
<h3 id="orga210b35">
A Final Improvement</h3>
<div class="outline-text-3" id="text-orga210b35">
<br />
Because this post is already too long I will conclude with one last really nice
tip that I absolutely did not steal from <a href="https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/">here</a>. Actually, the NVIDIA developer
blogs contain a lot of really good CUDA tips.
<br />
Our current kernel is perfectly capable of adapting to a situation where we give
it <i>less data</i> than the grid can support. However, if we give it <i>more data</i>,
things will break. This is where gride-stride loops come in. It works by looping
over the data <i>one grid at a time</i> while maintaining <i>coalesced memory access</i>
(which is something I will write about next time).
<br />
<br />
Here's our new kernel using these kinds of loops.
<br />
<br />
<div class="org-src-container">
<pre class="src src-c++"><span class="linenr">1: </span>__global__
<span class="linenr">2: </span><span style="color: #ba2f59; font-weight: bold;">void</span> <span style="color: #6c3163; font-weight: bold;">saxpy</span><span style="color: #3a81c3;">(</span><span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">n</span>, <span style="color: #ba2f59; font-weight: bold;">float</span> <span style="color: #715ab1;">a</span>, <span style="color: #ba2f59; font-weight: bold;">float</span> *<span style="color: #715ab1;">x</span>, <span style="color: #ba2f59; font-weight: bold;">float</span> *<span style="color: #715ab1;">y</span><span style="color: #3a81c3;">)</span> <span style="color: #3a81c3;">{</span>
<span class="linenr">3: </span> <span style="color: #3a81c3; font-weight: bold;">for</span> <span style="color: #6c3163;">(</span><span style="color: #ba2f59; font-weight: bold;">int</span> <span style="color: #715ab1;">i</span> = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += blockDim.x * gridDim.x<span style="color: #6c3163;">)</span> <span style="color: #6c3163;">{</span>
<span class="linenr">4: </span> y<span style="color: #2d9574;">[</span>i<span style="color: #2d9574;">]</span> = a * x<span style="color: #2d9574;">[</span>i<span style="color: #2d9574;">]</span> + y<span style="color: #2d9574;">[</span>i<span style="color: #2d9574;">]</span>;
<span class="linenr">5: </span> <span style="color: #6c3163;">}</span>
<span class="linenr">6: </span><span style="color: #3a81c3;">}</span>
</pre>
</div>
</div>
</div>
</div>
<div class="outline-2" id="outline-container-org6000988">
<h2 id="org6000988">
</h2>
<h2 id="org6000988">
</h2>
<h2 id="org6000988">
</h2>
<h2 id="org6000988">
Conclusion</h2>
<div class="outline-text-2" id="text-org6000988">
<br />
I hope this convinces you that GPU programming is actually pretty simple. The
kernel here is pretty trivial, but as long as you understand that within the
kernel you can basically write C/C++, you are going to do just fine.
<br />
<br />
If there is a next post I will write more about memory in GPUs, a very important
topic if you want your code to actually run fast. If you want to skip ahead you should
read about the different types of memory (global, local, shared, texture, etc.)
and what memory coalescing entails.
<br />
<br />
Until next time.
</div>
</div>
Erik Bosshttp://www.blogger.com/profile/09790534761609367831noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-80503131340787913732017-03-14T11:27:00.000+01:002017-03-14T18:59:30.068+01:00What are lattices?<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
Do all the cool post-quantum cryptographers around you talk about lattices and all you can think about is a good salad?<br />
<br />
Don't worry! You can be part of the hip crowd at the next cocktail party in just three easy steps!<br />
<br />
<br />
First, I will tell you what a lattice is. It is actually really easy.<br />
Second, we will prove a tiny fact about lattices so that those poor neurons that did all the work in the first step have some friends.<br />
Third, nothing. There isn't even a third step. That's how easy it is!<br />
<br />
So, let's tackle the first step. What, actually, is a lattice?<br />
The answer could not be easier. It is simply a grid of points. Like this:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzQ9E3_aE63HhYssJVtUciQNm6qiQPcbIkqU1e_0qCgE8cAYDoc1JF6Let7VXcLZ0lbM8T7gBcnBWjx-gqvb5BKxqMcgtDHQfXO4NuHoGFdZFkOzWNMRIBLXr9hr_jyZ9RDRTstPp28Mw/s1600/lattice.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="281" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgzQ9E3_aE63HhYssJVtUciQNm6qiQPcbIkqU1e_0qCgE8cAYDoc1JF6Let7VXcLZ0lbM8T7gBcnBWjx-gqvb5BKxqMcgtDHQfXO4NuHoGFdZFkOzWNMRIBLXr9hr_jyZ9RDRTstPp28Mw/s320/lattice.png" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
Easy, right?<br />
The formal definition is just as simple:<br />
<blockquote class="tr_bq">
<br />
Given $n$ linearly independent vectors $b_1, b_2, \ldots, b_n \in \mathbb{R}^m$, the lattice generated by them is defined as<br />
\[<br />
\mathcal{L}(b_1, b_2, \ldots, b_n) = \left\{ \sum x_i b_i \middle| x_i \in \mathbb{Z} \right\}.<br />
\]</blockquote>
<br />
We can add those so called basis vectors $b_i$ to our diagram and will readily see how each point is constructed.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLhQEnwWxTsFoPAWr0Sa9N4EBWlGmLwGN-YIEP0hWROpL3QUbM6Qa0ammpsa1X-L6U8ZijDuN0TI3Sd3-3i4L7-UhtmDFbIOW6Man3Mt3vd-nrFzbSpDU6GUGqHYpjyAp5N9lOzzsw3LI/s1600/latticefull.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="281" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLhQEnwWxTsFoPAWr0Sa9N4EBWlGmLwGN-YIEP0hWROpL3QUbM6Qa0ammpsa1X-L6U8ZijDuN0TI3Sd3-3i4L7-UhtmDFbIOW6Man3Mt3vd-nrFzbSpDU6GUGqHYpjyAp5N9lOzzsw3LI/s320/latticefull.png" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />
And now you already know what a lattice is! In the next step we will talk about some more definitions and show the proof of a theorem.<br />
<br />
We will be discussing something related to the question of what the length of the shortest non-zero vector in a given lattice $\mathcal{L}$ is.<br />
<br />
Definition: <br />
<blockquote class="tr_bq">
$\lambda_1(\mathcal{L})$ is the length of the shortest non-zero vector in $\mathcal{L}$. We use the euclidean norm for the definition of length.</blockquote>
And we also need the volume of a lattice which we are not going to define formally we will just take it to be the volume of that $n$ dimensional parallelogram in the picture:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1WuUleztl9Ispq7HfJ7VKOGoZzXEiE4s5aoSsyjs_DV5Kt194Iif61TJsi_L2gxB1ZFQBFE923bV0JBCxxWM_yI0OgcAENTe8esgk2RyXFA-6L8bN4QXE14yt1Z55xn5bi40aHqe9d4A/s1600/latticevolume.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1WuUleztl9Ispq7HfJ7VKOGoZzXEiE4s5aoSsyjs_DV5Kt194Iif61TJsi_L2gxB1ZFQBFE923bV0JBCxxWM_yI0OgcAENTe8esgk2RyXFA-6L8bN4QXE14yt1Z55xn5bi40aHqe9d4A/s1600/latticevolume.png" /></a></div>
<br />
<br />
Now we can already prove <i>Blichfeld's Theorem</i>! It makes a statement about where we can find a lattice point and roughly goes as follows:<br />
<blockquote class="tr_bq">
For a lattice $\mathcal{L}\subseteq \mathbb{R}^n$ and a set $S \subseteq \mathbb{R}^n$
where the volume of $S$ is bigger than the volume of the lattice there exist two nonequal points
$z_1, z_2 \in S$ such that $z_1 - z_2 \in \mathcal{L}$.</blockquote>
And here is the idea of the proof in a graphical way! First we simply draw an example for the set $S$ in blue:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgueUH6Ckw4ytAoQ7lV_ZqxvJD7Pre3F-o36Cl8rpWui11FGuj29pMf41K69hLDxDOrwcldv9tZ159rqM9zeDSqjHvvFTTnOYae97PSUyd2IsZjnVQqtRVs87YKnqpYs_yr0P3YPXGCGvo/s1600/set.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgueUH6Ckw4ytAoQ7lV_ZqxvJD7Pre3F-o36Cl8rpWui11FGuj29pMf41K69hLDxDOrwcldv9tZ159rqM9zeDSqjHvvFTTnOYae97PSUyd2IsZjnVQqtRVs87YKnqpYs_yr0P3YPXGCGvo/s1600/set.png" /></a></div>
<br />
Now we cut up our set and move all the parts into our first parallelogram!<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_rMN983C6p0nLQwbG83rdl_GoOxBAKwjSuR4vHdNmzME1pK8iD0AjbE53kjSK4ID5H17t7s78fs9Oltrv8uOK8d5auGAvPnh1G0LIMJpX9b0AgXs8ySKgB84BUZjHqvACH_C1D4bQjro/s1600/movedset.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="387" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_rMN983C6p0nLQwbG83rdl_GoOxBAKwjSuR4vHdNmzME1pK8iD0AjbE53kjSK4ID5H17t7s78fs9Oltrv8uOK8d5auGAvPnh1G0LIMJpX9b0AgXs8ySKgB84BUZjHqvACH_C1D4bQjro/s400/movedset.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
As you can see there is quite a bit of overlap and the fact that the full volume of the set is bigger than the volume of the lattice aka the parallelogram guarantees there will be at least some overlap somewhere. But if there is overlap then we have two points, that have been moved by
multiples of the base vectors and which are now at the same position!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjkQwCE26YfEMBRMZ-X8qRiJJn0Y7Zj7ZBHhpMhzbu0fO6nMBSP_J0cPTcXeyJzrFAcwC5c5DI3BzoXYEH9XQAhl6kqMxSUwJP_Y6fTFah9n1l9J4l9DvvZ9sj7bE-XK65cikaehoQ2_Mw/s1600/motion.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="310" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjkQwCE26YfEMBRMZ-X8qRiJJn0Y7Zj7ZBHhpMhzbu0fO6nMBSP_J0cPTcXeyJzrFAcwC5c5DI3BzoXYEH9XQAhl6kqMxSUwJP_Y6fTFah9n1l9J4l9DvvZ9sj7bE-XK65cikaehoQ2_Mw/s320/motion.png" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
And therefore their difference must be a multiple of base vectors as well and we have found a difference which is a lattice point! Hooray!<br />
<br />
Now we will use a little trick to expand this result and make a statement about an actual point from the set being on the lattice not just the difference of two points!<br />
<br />
What we will show is called <i>Minkowski's Convex Body Theorem</i> and it states roughly<br />
<blockquote class="tr_bq">
Let $\mathcal{L}$ be a lattice. Then any centrally-symmetric convex set $S$, with volume bigger than $2^n$ times the volume of the lattice contains a nonzero lattice point.</blockquote>
So after this we will know such a set it will contain a lattice point and using a simple sphere as the set allows us to put a bound on $\lambda_1(\mathcal{L})$.<br />
<br />
Let's get to it then!<br />
First we blow up our lattice to twice its size along every dimension!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSUiXUv0YhD6Fw4FbOx8vPqYfUsJ_oDfwLxkYayKYRZ7mWVFGRKlS8kLgzHjZAlJbXep0vwF5jzQUmTQdlRYxnNqgBshSD9sVRg9YsQ56L0dSasxgnk5RWBjCwkg_GHP01dxKAPgNoSwQ/s1600/scale.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="281" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSUiXUv0YhD6Fw4FbOx8vPqYfUsJ_oDfwLxkYayKYRZ7mWVFGRKlS8kLgzHjZAlJbXep0vwF5jzQUmTQdlRYxnNqgBshSD9sVRg9YsQ56L0dSasxgnk5RWBjCwkg_GHP01dxKAPgNoSwQ/s320/scale.png" width="320" /></a></div>
<br />
<br />
<br />
<br />
<br />
Now we add our centrally-symmetric convex set $S$. Again in blue.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhLYAmCKN0h8gf8SJB9tjKJUxvozShANMHU9YLDx7rhBraV9nicHIL6ZuGdD6Mz9TCaMiEnkEJNQBGz7C4wlVFZFuRUY7sSI0MNLihLsBqNWr15L0xobmInNggQZtJa_Dnusl8obHyXXI/s1600/sym.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhLYAmCKN0h8gf8SJB9tjKJUxvozShANMHU9YLDx7rhBraV9nicHIL6ZuGdD6Mz9TCaMiEnkEJNQBGz7C4wlVFZFuRUY7sSI0MNLihLsBqNWr15L0xobmInNggQZtJa_Dnusl8obHyXXI/s1600/sym.png" /></a></div>
<br />
And because we picked the volume of $S$ to be bigger than $2^n$ times the volume of the lattice we still get the colliding points from the last theorem EVEN in the double size lattice!<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMb4uR5aPq7o3VknbxaSfzwvZSK65ADArBxaqboTuAwgkeOuujF2ir1uoyIaK11dwfHel6SI5ptYoPIZCt_21k7f9TQ09WLVyMmR4Yy4mrU4Vzf8WXQs_fsgo3Rx6_sUHukZwvHMml3Bw/s1600/zets.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhMb4uR5aPq7o3VknbxaSfzwvZSK65ADArBxaqboTuAwgkeOuujF2ir1uoyIaK11dwfHel6SI5ptYoPIZCt_21k7f9TQ09WLVyMmR4Yy4mrU4Vzf8WXQs_fsgo3Rx6_sUHukZwvHMml3Bw/s1600/zets.png" /></a></div>
<br />
But since our set $S$ is symmetric about the origin if $z_2 \in S$ it follows that $-z_2 \in S$ and because it is convex $z_1, -z_2 \in S$ implies $\frac{z_1 - z_2}{2} \in S$. And because we've double the size of the lattice and $z_1 - z_2$ is on the bigger lattice it follows that $\frac{z_1 - z_2}{2} \in \mathcal{L}$ and we have shown that a point in our set is also in the lattice.<br />
<br />
You can see it quite clearly in the next picture.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJbWHiMRc1OazL3Z0wiUm_gY9Qpv2OXMC2BVYZmy1Qcg7tI4JWA5aEecQfIseaabXs0it07dSK_ZEY3Nqopxf7l4SKqxXcIOG4QJBQjQ5DEDhlcImHvE3wYvQLmTurthWCc10krq5QgF8/s1600/final.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJbWHiMRc1OazL3Z0wiUm_gY9Qpv2OXMC2BVYZmy1Qcg7tI4JWA5aEecQfIseaabXs0it07dSK_ZEY3Nqopxf7l4SKqxXcIOG4QJBQjQ5DEDhlcImHvE3wYvQLmTurthWCc10krq5QgF8/s1600/final.png" /></a></div>
<br />
As already stated above if we use a simple sphere for this set we can give a bound on $\lambda_1(\mathcal{L})$ based on the volume of $\mathcal{L}$. It is known as <i>Minkowski's First Theorem</i> and states:<br />
<blockquote class="tr_bq">
\[<br />
\lambda_1(\mathcal{L}) \leq \sqrt{n}(vol(\mathcal{L})^{1/n}.<br />
\] </blockquote>
Isn't that great?!<br />
<br />
If you have now completely fallen in love with lattices but would appreciate real math instead of pictures, worry not!<br />
You will find steps 3 - 14 on <a href="https://www.cims.nyu.edu/~regev/teaching/lattices_fall_2009/">Oded Regev's course website</a>! Enjoy!<br />
<br />
And in case you accidentally run into him at a conference here is the picture from <a href="https://www.cims.nyu.edu/~regev/">his website</a> so you don't miss the opportunity to say hello.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaR4kXt2Y3ZoxvbZ7JL3-n5a9oS2Bq4qzP4m8poTyRqD_1v-Lzf0js3j8YJNMb06kirmt8w9dJYedfcTPb8k58-PVU40IaH8bcwrswu3o1rYtfKo7fbJGAGnARSwTVNNrWCqnEOhGI2bc/s1600/oded_small_new.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgaR4kXt2Y3ZoxvbZ7JL3-n5a9oS2Bq4qzP4m8poTyRqD_1v-Lzf0js3j8YJNMb06kirmt8w9dJYedfcTPb8k58-PVU40IaH8bcwrswu3o1rYtfKo7fbJGAGnARSwTVNNrWCqnEOhGI2bc/s320/oded_small_new.png" width="262" /></a></div>
<br />
<br />Simon Friedbergerhttp://www.blogger.com/profile/11398138941386358772noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-15685890757150421442017-03-04T23:28:00.001+01:002017-03-06T14:41:53.887+01:00GMP - Arithmetic without limitations<span style="font-family: inherit;">Theoretical research in cryptography can be extremely exciting but, on the other hand, a lot of effort should be devoted to correctly and securely implementing cryptosystems that help protect people's privacy. The presence of a bug in a software can be very annoying but if this software deals with crypto and security, then the consequences can be catastrophic.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">For many cryptosystems based on number theoretic problems, one of the first issues that a developer faces is the <b>size of the numbers</b> involved in the computations. These numbers are composed of <b>thousands of bits</b> (or hundreds of digits) and they need to be added, multiplied, raised to some powers, etc. Performing these operations is complicated, and doing it efficiently is even more complicated. But thankfully there are tools (libraries) that come to the rescue.</span><br />
<br />
<span style="font-family: inherit;">In this post, I will talk about <b><a href="https://gmplib.org/">GMP</a></b>, The GNU Multiple Precision Arithmetic Library. Quoting from the library's homepage, "<i>GMP is a free library for arbitrary precision arithmetic, operating on
signed integers, rational numbers, and floating-point numbers. There is no
practical limit to the precision except the ones implied by the available
memory in the machine GMP runs on. GMP has a rich set of functions, and the
functions have a regular interface.</i>" This library is written in C but also offers an interface for C++ code, and we are going to focus on this particular one. The final goal of this post will be to implement a simple but complete example of an RSA cryptosystem, from key generation to encryption and decryption of messages.</span><br />
<span style="font-family: inherit;">But, before moving on, a little disclaimer: implementing cryptographic solutions is not an easy task and using homemade solutions is generally considered as not a good idea because of bugs and lack of scrutiny by other members of the community. The sole goal of this post is to show some of the potential of the GMP library.</span><br />
<br />
<ul>
<li><span style="font-family: inherit;"><b>Data type --</b> the basic data type that we will deal with is <span style="font-family: "courier new" , "courier" , monospace;">mpz_class</span>. We can think of it as an arbitrary-sized integer. Some of the functions we will talk about only accepts parameters of type <span style="font-family: "courier new" , "courier" , monospace;">mpz_t</span>, which is the corresponding C type. Luckily, we can always call the function <span style="font-family: "courier new" , "courier" , monospace;">get_mpz_t</span> on a <span style="font-family: "courier new" , "courier" , monospace;">mpz_class</span> object to obtain the corresponding <span style="font-family: "courier new" , "courier" , monospace;">mpz_t</span> object</span><span style="font-family: "courier new" , "courier" , monospace;">;</span></li>
<li><b>Generating random numbers --</b> first of all, we consider the problem of generating random numbers with a specific size (measured in bits). GMP offers a function called <span style="font-family: "courier new" , "courier" , monospace;">get_z_bits</span>, that takes in input a number of bits and returns a number with that many bits. But be careful! It means that the returned number takes <u>up to</u> that number of bits. So calling it with argument '3' means that the returned value will be a number between 0 and 7. Instead, what we want is a number that is represented <u>exactly</u> by 3 bits, so 4, 5, 6, or 7. Doing that is quite straightforward: given a number of bits k, we want a random number in the range $\left[2^{k-1}, 2^k\right)$. In order to handle this exponentiation (since $k$ can be very big), we are going to use the function <span style="font-family: "courier new" , "courier" , monospace;">mpz_ui_pow_ui</span>, that raises an <span style="font-family: "courier new" , "courier" , monospace;">unsigned int</span> basis to an <span style="font-family: "courier new" , "courier" , monospace;">unsigned int</span> power and writes the result to a <span style="font-family: "courier new" , "courier" , monospace;">mpz_t</span> object. In order to draw a random number, we are going to initialize a random number generator with this code:<span style="font-family: "courier new" , "courier" , monospace;"><br />gmp_randclass rng(gmp_randinit_mt)</span>.<br />Note: this example uses the Mersenne Twister PRNG, which is known to be insecure for cryptographic applications. After this is done, our function for generating a random number with a specific number of bits can look like the following<br /><span style="font-family: "courier new" , "courier" , monospace;">mpz_class generate_rand_exact_bits (const unsigned int size)<br />{<br /> // 2^(size-1)<br /> mpz_class out;<br /> mpz_ui_pow_ui(out.get_mpz_t(), 2, size-1);<br /><br /> // 2^(size)<br /> mpz_class bound;<br /> mpz_ui_pow_ui(bound.get_mpz_t(), 2, size);<br /><br /> // random offset<br /> mpz_class rand = rng.get_z_range(bound-out);<br /><br /> return (out + rand);<br />}</span></li>
<li><b>Generating prime numbers --</b> generating random numbers is nice but often not sufficient; we need to generate random-looking prime numbers. This task is obviously harder because there are more constraints that have to be satisfied. Thankfully, GMP comes to the rescue once again: we can in fact take advantage of the function <span style="font-family: "courier new" , "courier" , monospace;">mpz_probab_prime_p</span> that takes as input a <span style="font-family: "courier new" , "courier" , monospace;">mpz_t</span> object that represents the candidate prime and a number that specifies how many times the Miller-Rabin primality test has to be executed, and returns 0 if the candidate is not prime, 1 if it is probably prime and 2 if it is surely prime. Repeating the test 50 times gives a very high degree of confidence on the primality of the candidate, so we are going to generate a random number and test it until we find a candidate that passes all the tests. The code for a function that performs this task can be the following<br /><span style="font-family: "courier new" , "courier" , monospace;">mpz_class generate_prime_exact_bits (const unsigned int size)<br />{<br /> mpz_class candidate;<br /> do {<br /> candidate = generate_rand_exact_bits(size);<br /> } while (mpz_probab_prime_p(candidate.get_mpz_t(), REPEAT_MILLER_RABIN) == 0);<br /> return candidate;<br />}</span></li>
<li><b>Modular exponentiation --</b> the goal of this step is to take a basis $b$, an exponent $e$ and a modulus $m$ and compute $b^e \mod m$. GMP already offers a function called <span style="font-family: "courier new" , "courier" , monospace;">mpz_powm</span> that does what we need, but we can "beautify" it a little bit by wrapping it in another function like the following:<br /><span style="font-family: "courier new" , "courier" , monospace;">mpz_class mod_exp (const mpz_class base, const mpz_class exp, const mpz_class mod)<br />{<br /> mpz_class out;<br /> mpz_powm (out.get_mpz_t(), base.get_mpz_t(), exp.get_mpz_t(), mod.get_mpz_t());<br /> return out;<br />}</span></li>
<li><b>Encryption and decryption --</b> once we have the function <span style="font-family: "courier new" , "courier" , monospace;">mod_exp</span> for modular exponentiation, RSA encryption and decryption are trivial:<br /><span style="font-family: "courier new" , "courier" , monospace;">mpz_class rsa_encrypt (const mpz_class pk, const mpz_class m, const mpz_class mod)<br />{<br /> return mod_exp (m, pk, mod);<br />}<br /><br /><br />mpz_class rsa_decrypt (const mpz_class sk, const mpz_class c, const mpz_class mod)<br />{<br /> return mod_exp (c, sk, mod);<br />}</span></li>
</ul>
Now, we will put everything together and in the main function we will do the following:<br />
<ol>
<li>generate two numbers <span style="font-family: "courier new" , "courier" , monospace;">rsa_p</span> and <span style="font-family: "courier new" , "courier" , monospace;">rsa_q</span> which are prime and composed of an arbitrary (here 2048) number of bits;</li>
<li>multiply together <span style="font-family: "courier new" , "courier" , monospace;">rsa_p</span> and <span style="font-family: "courier new" , "courier" , monospace;">rsa_q</span> to obtain the modulus <span style="font-family: "courier new" , "courier" , monospace;">rsa_mod</span>;</li>
<li>calculate the Euler's totient function of the modulus by multiplying <span style="font-family: "courier new" , "courier" , monospace;">(rsa_p -1) </span>and <span style="font-family: "courier new" , "courier" , monospace;">(rsa_q - 1)</span>;</li>
<li>choose a public exponent <span style="font-family: "courier new" , "courier" , monospace;">rsa_pk</span>: here we are going to use 65537;</li>
<li>compute the secret exponent <span style="font-family: "courier new" , "courier" , monospace;">rsa_sk</span> by calculating the multiplicative inverse of <span style="font-family: "courier new" , "courier" , monospace;">rsa_pk</span> modulo <span style="font-family: "courier new" , "courier" , monospace;">rsa_totient</span>. For this task, we can use the function <span style="font-family: "courier new" , "courier" , monospace;">mpz_invert</span> offered by GMP;</li>
<li>generate a random 32-bits message;</li>
<li>encrypt it and decrypt it to show that the message is recovered correctly</li>
</ol>
The full code is available <a href="http://www.di.ens.fr/~minelli/docs/rsa_main.cpp">here</a>. Assuming that it is saved in a file called <span style="font-family: "courier new" , "courier" , monospace;">main.cpp</span>, it can be compiled with the following command:<br />
<span style="font-family: "courier new" , "courier" , monospace;">g++ -Wall -o main main.cpp -lgmp -lgmpxx</span><br />
<br />
Finally, here is a possible output of the program:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace;">p: 23792449500522212951496314702038682121347194317959904876008718825662914075875899470581320800653437713536069408253411676817620646430212407540403369694134781759107125478056393908946127803961090931948553568336876469309822272887045095561051090700123699901361196085633529272849875353866567000752182158071024473204684307004694765510326847369963753315267894351158697231078788807690227802109570252088084877486698404206015111529669790467025281249212356222609928349702116676081721161733842190421613247070310376995600752548098115546539048072846637620536666595226297844398938704443953343779702008604938853732596401295126782941987<br /><br />q: 29112805945439121371217246384724375309040486330485414315973357532238818896560861590487525988011993439077520550311524035984480970497774071062272700839286098068182559422588276697743896386738257138746524032717815962171954684796109782069096410172414189261142167684350607916360394902788649692558546060677368837462937413290555419196322614840219369972120722061013641236619226520826654042088554099043918184746705620357166310881550460726728460067235575345180739677700783783242815237362918214982959525838694224404066952951953880663045068810920848575666730532567285210572753317549500155052830379131918899635413209365556429526661<br /><br />Modulus: 692664965275363134868164310308043386530889977137116066460956916982191344093600913377382983586757778122293014614580541617686926517513948142377215870615527742653194700493457784251321237254925462486660424748287684799960005285232140262663908204076330275297015032277805385557802277182249107190208144962072627103333702166712562912412455374055377823609059561406201023023953754548932692770545184532804691011451566435370564027281954154745605054059335653411252856827668944053539165109486066934327992280592181117681540307774187890207024741310917968286855939570200842820225004987124097822236170460663883584621535528990367892742357001025439624939219952367312627679661863266398010079250545044069312072321631423895274591504669908947731669841676290437961150479342566664751624244101084250756141019398564481030304316812644414619975245318775213481846709606949913834082310037698177108248900032520946078124914227912956069612730720184002438971635446900598624230868985779777067182408067527262398836477646176180149265564531633256045959807480886640631812690951578507074608608075924577570127277599939833799849839997452314926854023853061044555560190656563707831534988797823714162801693363417381512465247240619960211197273430707134124687895683306648515432815407<br /><br />Totient: 692664965275363134868164310308043386530889977137116066460956916982191344093600913377382983586757778122293014614580541617686926517513948142377215870615527742653194700493457784251321237254925462486660424748287684799960005285232140262663908204076330275297015032277805385557802277182249107190208144962072627103333702166712562912412455374055377823609059561406201023023953754548932692770545184532804691011451566435370564027281954154745605054059335653411252856827668944053539165109486066934327992280592181117681540307774187890207024741310917968286855939570200842820225004987124097822236170460663883584621535528990367892742304095769993663604897238806225864622231475585749564760058562967711410339349194662834205744716004477795118079883111354725159048862414580186148948173567663370928851334497919810423614292621945066549280167717720521050364932649266758956452162536825639219086396668750961940935703957656300852919419991965254045660967825180303374046162336317566884059120678910850226498009948160851632383720333508904913956745247482616068631268540358255880854866759476646002336609572536933340525303598355554521449451080152039954160522951063655835325404680939946676605489966289587929410275548597966757698440898319397266934527673695987832220346760<br /><br />Message: 2699077189<br /><br />Ciphertext: 681377152725050864187889498396285584066530595533108416812276566730393956984527765118715661493472361295191987186484833223806930407591255557303290975217436524165953995313143542640632193731686512007342253568013661854913604989704338096439627987512846172004801196181232822308088685943243400345975426883909096147339045273787325984512823244957423398055174828695879849942493712502158108081782163024167101957978799078692054106581518638832961868215165854308314877242675212188837955873975174727647247990393606326403876490500815929414980002030276198625866328310973421532151059071628038306926281825502507926893439321472535762059281448571817796845306517579637703981694846774635322811199431875663174162751403156127612351378340405058549628085390124779083962094777489309709766752481986327599769351343828713992159924065615106068412838813886565270542810886921193120790895042994364186375596015716254331672407180666306582326345460867181975252903332881845979999145141098787286539117576791635044116865176716309063466576180971588107298159543491564320012605624880056428982408094160945322909734375254637560779872935691072801422766106105016077161030768880774211449289008906840655161792691129673110760644874000744631018536155622276779954270891913007605935768473<br /><br />Decrypted: 2699077189</span>Michele Minellihttp://www.blogger.com/profile/00455630004995030556noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-75035882178609235282017-02-27T00:00:00.000+01:002017-02-27T00:02:03.172+01:00Side-channel analysis<div dir="ltr" style="text-align: left;" trbidi="on">
<div class="MsoNormal">
For many years cryptographers and industry implemented
crypto algorithms on microcontrollers and ASICs simply by translating the
algorithms into assembly or rtl language. Only thing that mattered was correct
functionality of the design. However, in mid to late nineties some interesting
attacks of cryptographic implementations were shown, that were able to
completely break the security of it, reveal secret keys, while the underlying cryptographic
algorithm remained mathematically unbroken. First, the timing attack was introduced. Namely the attacker was able discover information based on
the time server took to respond to a query since this time depended on the
secret key that was used. Most simple application of this is square and
multiply algorithm used for RSA exponentiation. Namely, for every bit of the
exponent square, or square and multiply operation is used. This will result in different execution times for different exponents.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Kocher et al. introduced new type of side-channel attack by revealing that
information obtained from the power measurements can be also used to extract
secret values from devices such as smart cards. We start from assuming that the power signal
consist of random noise and deterministic one dependent on the processed value $x$<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
$ P =
L(x) + R$<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Random noise is modeled
as a Gaussian distribution with zero mean, and deterministic key dependant
value $L(x)$ is usually hamming weight or hamming distance of the value .
Following on, we choose intermediate value to be targeted. Good target of the
attack for symmetric designs is S-box output, since small hamming weight difference
in the input of the sbox leads to not so small hamming weight difference in the output. In
the original work it was simply a one bit value, that could be either one or
zero. The simplest example is thus by using one output bit of the S-box, with
the assumption that power consumption is different for one and zero value. For
example, take one S-box operation of the first round of AES. It is calculated using
equation below<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal" style="text-indent: .5in;">
$SO_i = S(pt_i \oplus key_i)$<o:p></o:p></div>
<div class="MsoNormal" style="text-indent: .5in;">
<br /></div>
<div class="MsoNormal">
Since we are unaware
of what is the correct key value, we construct key guesses and calculate the
targeted value, e.g. first bit of the S-box output. Now, for every key guess,
we separate power traces into two groups, one that contains power traces for
which the key guess produces target value one and other where target value is
zero. Lastly, we compute the average of two groups, reducing the influence of
the noise, and then we subtract these averages. If our key guess is wrong,
difference of means will be negligible, since the power traces in two groups
don’t correspond to correct values, and are thus uncorrelated. However, the
correct key guess will have a noticeable spike in the point in time where first
bit of S-box output is computed.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
This very simple, yet very effective attack opened a new
area of research. Soon, more elaborate attacks were devised, attacking bytes
instead of bits using correlation. More advanced attacks include correlation
with variance instead of the mean, or even higher statistical moments. These
attacks are usually unpractical for orders higher than two, since the noise influence
hardens the attacker’s possibility to retrieve the secret value. Profiling of
the devices is sometimes performed in order to devise a more accurate model of
the deterministic leakage functions. These attacks are known as template
attacks. Designers also came up with various ways of trying to thwart these
attacks, such as increasing the noise level by adding redundant circuitry or
operations, and shuffling the execution operation. Other methods include secure
logic design styles such as WDDL, where logical cells are designed to consume
amount of power regardless of their output. Methods on the algorithmic level,
e.g. Threshold Implementations, try to ensure no information leakage happens during
the execution of the algorithm, regardless of the underlying leakage model of
the design technology.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Since classical DPA and correlation analysis involve
constructing guesses for all possible key bytes/nibbles, it can be very time
consuming. This is where leakage tests became prominent. They tell if a set of
randomly chosen traces can be distinguished for set of traces with fixed input
value. It works in a very similar way to one bit DPA. 2 Sets of traces are
obtained, one with random inputs and one with fixed inputs. Welsh t-test is
used to measure if two sets can be distinguished from one another. If $\mu, \
s$, and $n$ are mean, variance and cardinality of a set, t values are
calculated as follows<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<o:p><br /></o:p></div>
<div class="MsoNormal">
$ t =
\frac{\mu_0 - \mu_1}{\sqrt{\frac{s_0^2}{n_0} + \frac{s_1^2}{n_1}}}$<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<o:p><br /></o:p></div>
<div class="MsoNormal">
Approximating statistical degree of freedom with $n=n_0 +
n_1$ we can with confidence 0.99999 claim that two sets are distinguishable for
$t > 4.5$. This type of leakage tests
allows the designers to be more efficient since it reduces the testing time of
the design. Downside of the approach is that leakage tests do not reveal how
can the secret values be extracted, only that there is a possibility for an
attacker to do so. <o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<br />
<div class="MsoNormal">
In best paper of CHES 2016: <a href="https://www.iacr.org/cryptodb/data/paper.php?pubkey=27856" style="background-color: white;">Differential Computation Analysis: Hiding Your White-Box Designs is Not Enough</a>, side-channel analysis has been successfully
used to break many white-box implementations. This time analysis has been
performed on stack layout, instead of power traces, but the principle remained
the same. Taking into account that white-box cryptography is currently being
widely considered to be added to many software based designs, together with plethora
of IoT devices that require protection, side-channel analysis importance will
continue to be an important aspect of design process of cryptographic implementation.
Thus, researchers will continue to work on improvement of leakage detection and
key recovery methods to further reduce cost and time to market of new devices. </div>
<div class="MsoNormal">
<o:p></o:p></div>
</div>
Dušan Božilovhttp://www.blogger.com/profile/01773687011832964482noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-21917897483397461932017-02-19T19:57:00.000+01:002017-02-27T08:16:16.364+01:00Differential privacy<script type="text/javascript">MathJax.Hub.Queue(["Typeset",MathJax.Hub]);</script>
<br />
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
<style type="text/css">
p, li { white-space: pre-wrap; }
</style>These days one reads a lot about the right to privacy. But what is it and how does it differ between the real and the digital description of the world? Briefly, it is a person's right to have and more importantly maintain control over information about oneself, which is fundamental to human's freedom of self-determination. In the digital context one seemingly lost this right in favor of using free services, handy tools that satisfy people's urge to communicate with each other and stay in touch and up-to-date at all times. Since more and more people realize that behind such apps and web pages naturally there are business models, society increasingly demands to reclaim control over collected personal data, which has been provided, unsolicited or involuntarily, to online merchants or telephony service providers at the time of use, for example. On the other hand collecting user information in databases is crucial as corporations offering named services see it. In order to make good products, tailored recommendations, and especially in the age of big data and machine learning, precise predictions by evaluating various functions on the data.</div>
<div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
<br /></div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
<a href="http://faculty.nps.edu/dedennin/publications/TrackerThreat.pdf" target="_blank">Statistical database queries</a> have been studied quite some time now, and in fact it turns out that often it is sufficient to allow query access only to a population's aggregate data, not individual records, to derive useful statistics and achieve desired functionality! A common approach is to merely allow aggregate queries (i.e. range, histogram, average, standard deviation, ...) and rather than returning exact answers about sensitive data fields to specify intervals or give imprecise, statistically noisy counts.</div>
<div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
<br /></div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
You might have asked yourself, don't cryptographic techniques that have been omnipresent in this blog such as FHE (Fully Homomorphic Encryption) or secure MPC (Multi-Party Computation) solve this problem? Wouldn't it be possible to encrypt the user data yet for the service provider to compute useful statistics on it?</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
Indeed, it could be realized with the general FHE, MPC toolkit, but currently it is inefficient to operate them at that scale in practice, such that statistics over huge quantities of data are useful to infer statements about a given database. Hence specific, more slender tools have been designed to overcome this gap. Whereas FHE avoids a trusted 3rd party to compute (i.e. arbitrary functions or sophisticated statistics) on users sensitive data, here typically one explicitly allows a trusted 3rd party to collect and aggregate data in a privacy-preserving fashion. Users might do so i.e. when installing an app and argue to have an advantage for themselves like good default settings, an overall performance gain; or it might be a requirement to share information in order to use the service for free in the first place.</div>
<div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
<br /></div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
Differential privacy (often abbreviated DP) is a framework for formalizing privacy in statistical databases. It can protect against so called <a href="https://www.cs.utexas.edu/~shmat/shmat_oak09.pdf" target="_blank">de-anonymization techniques</a> that try identifying an individual record by linking two separately released databases that have been stripped off (quasi-)identifiers and look innocuous. Especially apriori knowledge or known partial history can be leveraged to derive more information from a released "anonymized dataset" other than the purpose it was originally intended to serve.</div>
<div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
<br /></div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
Let's look at a mathematical definition that captures and formalizes the notion of privacy and which has been studied in cryptography in the past 10 years. Let $d,n$ be a positive integers and $f: X^n \rightarrow \mathbb {R} ^{d}$ some statistics on a database comprised of $n$ records.</div>
<div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
<br /></div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
An algorithm $\mathcal {A}: X^n \rightarrow \mathbb {R} ^{d}$ that computes $f$ is said to have the $(\epsilon, 0)$-differential private attribute or ($\mathcal {A}$ is $\epsilon$-DP, for short) if for all neighboring subsets of a given database $x_{1} \neq x_{2}$ and $x_{1} \sim x_{2} := x_{1} \sim_1 x_{2}$ (they differ in just 1 element), and all subsets $S \subseteq \mathbb {R} ^{d}:$</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
\begin{align}<br />
\mathbb{P}[\mathcal{A}(x_{1})\in S]\leq e^{\epsilon } \cdot \mathbb{P}[\mathcal{A}(x_{2})\in S]<br />
\end{align}</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
holds. Looking more closely at this definition $\forall x_{1}, x_{2} \in X^n, x_{1} \sim x_{2}:$</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
$$\mathbb{P}[{\mathcal{A}}(x_{1})\in S]<br />
\leq e^{\epsilon } \mathbb{P}[{\mathcal{A}}(x_{2})\in S] \Leftrightarrow \frac{\mathbb{P}[{\mathcal{A}}(x_{1})\in S]}{\mathbb{P}[\mathcal{A}(x_{2})\in S]} \leq e^{\epsilon }\\ \Leftrightarrow \log \left(\frac{\mathbb P[\mathcal{A}(x_{1})\in S]}{\mathbb{P}[{\mathcal{A}}(x_{2})\in S]}\right) \leq {\epsilon}$$</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
we can identify the so called "privay loss" of an algorithm (or in this context often called mechanism) $\mathcal {A}$. In this setting $\epsilon$ can be called the privacy budget. In less exact terms it captures the following: By specifying the privacy budget, it is possible to control the level of privacy and make an algorithm respect this additional constraint by techniques introduced below. <br />
For those familiar with the concept of max-divergence, the definition of privacy loss is in fact the definition of $$D_\infty( A(x_1) || A(x_2)) := \max_{S\subseteq {\textbf supp}(A(x_1))} \log \left(\frac{\mathbb P[\mathcal{A}(x_{1})\in S]}{\mathbb{P}[{\mathcal{A}}(x_{2})\in S]} \right).$$</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
Furthermore, the multiplicative factor $e^\epsilon$ can be -- using a common approximation for small $\epsilon<1$ -- viewed as $1+\epsilon$:</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
$$e^\epsilon = exp(\epsilon) = 1 + \epsilon + \epsilon^2 + \dots \approx 1 + \epsilon.$$</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
In less formal terms this definition says that a given result is approximately the same whether it is computed using the first or the second, neighboring database.</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
A more general definition, that adds flexibility -- but also makes the proofs less elegant and more technical -- is $(\epsilon, \delta)$ -differentially privacy, when</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
$$\mathbb P[{\mathcal {A}}(x_{1})\in S]\leq e^{\epsilon } \cdot \mathbb P[{\mathcal {A}}(x_{2})\in S] + \delta.$$</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
Interpreting the definition, the goal of DP is that the risk of violating one's privacy should not substantially increase as a result of either appearing in a statistical database or not. Thus an analyst should not be able to learn any information about a record (i.e. participating an online questionnaire) that couldn't have been learned if one had opted not to participate or answered the questions randomly by rolling a die or flipping a coin rather than answering truthfully.</div>
<div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
<br /></div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
To overcome the fundamental challenge -- the trade-off between utility of data or accuracy of returned answers and privacy of records -- the set goal is to learn as much as possible about a group's data while revealing as little as possible about any individual within the group. Transforming an algorithm into a DP-algorithm requires probabilistic tools. The sensitivity of a function of $f$ is a good measure of how much statistical noise is needed to mask an answer:$$\Delta f=\max_{x_1 \sim x_2} ||f(x_{1})-f(x_{2})||_{1} = \max_{x_1 \sim x_2} \sum_{i=1}^d |f(x_{1})_i-f(x_{2})_i|.$$Low sensitivity of a function, i.e. small change of output given two neighboring inputs, allows to add statistical noise to achieve privacy yet don't use utility. The <b>Laplace mechanism</b>, adds noise from the Laplace distribution $\mathcal L(\lambda)$, i.e. noise $\eta(x)\propto \exp(-|x|/\lambda)$ which has 0 mean and $\lambda$ standard deviation. Substituting Laplace noise with other probability distributions, such with a 0 mean Gaussian and $\lambda$ standard deviation would be possible, but influences proof details.</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
A typical construction now is, instead of computing $f(x)$ directly, to compute $\mathcal A(x) = f(x) + \eta(x)$ and obtain a $\epsilon = \frac{\Delta f}{\lambda} $-DP algorithm, since the noise of two neighboring databases doesn't exceed this $\epsilon$ even in the worst-case. In terms of composability, sequential respectively parallel composition leads to a sum of all occurring $\epsilon_i$ resp. maximum of all occurring $\epsilon_i$ differentially private steps within the composed mechanism. This allows to efficiently turn algorithms into DP-algorithm. </div>
<div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
<br /></div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
These basic construction, including detailed proofs, and much more were covered during the 7th BIU Winter School on Cryptography named "Differential Privacy: From Theory to Practice" featuring speakers, who defined and contributed already roughly 10 years ago to the field. Slides are already online and video recordings about to appear on the <a href="http://cyber.biu.ac.il/event/the-7th-biu-winter-school/" target="_blank">webpage</a>.</div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
Furthermore, relationships of DP to various, related scientific fields ranging from statistics to machine learning and finally game theory were explored.</div>
<div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
<br /></div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
Concluding, in wake of the growing awareness of privacy issues in the digital domain, together with stricter interpretation of legislation and finally the possibility to satisfy most interests by anonymized data anyways; several big players strive to provide differentially private collection of data. Some companies market themselves as quasi-pioneers in privacy topics for some reasons: It pays off to be perceived as the first one; they would be facing various problems in the near future anyways, if they don't respect these issues; and most importantly, they can continue their business model: creating value of user's data. The more information is queried from the database, the more statistical noise has to mask the correct answer in order to meet a predefined privacy budget bound. This total allowed, justifiable privacy leakage can be specified in the number of admissible queries or the answer accuracy.</div>
<div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
<br /></div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
Provable cryptography avoids the situation of mere obfuscation that can be undone by a clever enough attacker / strategy -- given the security assumption holds -- and provides bounds and thus a guideline on how to choose parameters to guarantee a desired level of privacy. Algorithms invest a given privacy budget at privacy-critical steps. With this in mind, differential privacy is an additional design paradigm for cryptographic algorithms and protocols to keep in mind.</div>
<div style="-qt-block-indent: 0; -qt-paragraph-type: empty; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
<br /></div>
<div style="-qt-block-indent: 0; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-indent: 0px;">
I'd like to end this cryptologic point of view on achieving privacy goals on the internet, as I started; with a fundamental sociological question. One thought that remains standing out is: Shall we collect as much data in the first place? Is it really necessary to predict individuals as online merchants? Do we want this ubiquitous tracking? As with advanced technologies, who's long-term effects cannot be predicted, maybe also in aggregating big data and tracking the only winning move seems to be not to collect data in the first place.</div>
Matthiashttp://www.blogger.com/profile/08311091211169735623noreply@blogger.comtag:blogger.com,1999:blog-5149574209636840277.post-57628163210477891672017-01-27T18:00:00.001+01:002017-01-27T18:00:49.987+01:00Symmetric Key Ciphers for FHE<!-- Intro -->
<p>
In recent years, small devices with limited storage and computing power have acquired more and more popularity. This phenomenon, together with the appearance of cloud services that offer extensive storage and high computing power, has made computation outsourcing a key application to study. This setting, where a user with limited resources stores data on a remote server and also asks this server to make some computation on his behalf, comes together with new challenges and concerns related to privacy and security.
</p>
<br>
<h3> A simple scenario </h3>
<p>
The user Alice encrypts her data under a homomorphic encryption scheme and sends the ciphertexts to the server (eg. cloud). Now the server computes some function (eg. mean value) on Alice's encrypted data and sends back an encryption of the result, which can be decrypted by Alice with the appropriate key. Despite its simplicity, this model still requires a lot of effort from the user. In fact, all the <a href="https://en.wikipedia.org/wiki/Homomorphic_encryption#Fully_homomorphic_encryption">FHE</a> schemes that we know of have very large keys and heavy encryption overhead. The most natural approach is then trying to free the user from the burden of complex homomorphic computations, by asking her to do only the key generation part, the symmetric encryption and the final decryption, while leaving all the rest to the powerful server.
</p>
<br>
<h3> <a href="https://eprint.iacr.org/2011/405.pdf">A hybrid framework</a> </h3>
<p>
The user Alice generates a secret key $sk^S$ for a symmetric encryption scheme, a pair $(pk^H, sk^H)$ for a homomorphic public key encryption scheme, and sends $pk^H$ to the server together with an encryption of $sk^S$ under $pk^H$. Then, Alice encrypts her data under the symmetric encryption scheme and sends the resulting ciphertexts to the server, which homomorphically evaluates the decryption circuit of the symmetric encryption scheme, thus obtaining ciphertexts under the homomorphic scheme. This last operation can be seen as a sort of bootstrapping (from one scheme to another one), but does not involve any circular security assumption. At this point, the server can homomorphically compute a certain function $f$ on the resulting ciphertexts and send back the result, encrypted under the homomorphic encryption scheme. Alice can finally recover the result thanks to her $sk^H$.
</p>
<p>
Despite numerous improvements in recent years FHE remains highly unpractical, and one of the main problems is <i>limited homomorphic capacity</i>. In fact, all the FHE schemes we know of contain a noise term that grows with homomorphic operations. The homomorphic capacity corresponds to the amount of computation that can be performed before having to ``refresh'' the ciphertext via the expensive operation of bootstrapping. It is thus of vital importance to manage the error growth during homomorphic operations.
</p>
<br>
<h3> State-of-the-art ciphers </h3>
<p>
<a href="https://en.wikipedia.org/wiki/Advanced_Encryption_Standard">AES</a> is a natural benchmark for FHE implementations. The homomorphic evaluations of some lightweight block ciphers such as SIMON have been investigated by <a href="https://eprint.iacr.org/2014/062.pdf">Lepoint and Naehrig</a> as well. It turns out that there is much room for improvement in optimizing the multiplicative depth. In this direction, many dedicated symmetric key ciphers with low-noise ciphertexts have been proposed.
</p>
<ul>
<li> <p>In 2015, Albrecht et al presented a flexible block cipher <a href="http://link.springer.com/chapter/10.1007/978-3-662-46800-5_17">LowMC</a>, which is based on an SPN structure with $80,128$ and $256$-bit key variants. The design philosophy is to minimize the multiplicative complexity, which is more relevant in the multi-party computation context. It also has very low multiplicative depth compared with existing block ciphers.
</p>
Later, <a href="https://eprint.iacr.org/2015/418.pdf">Dinur et al </a> mounted optimized interpolation attacks against some instances of LowMC. More precisely, some $80$-bit key instances could be broken $2^{23}$ times faster than exhaustive search and all instances that are claimed to provide $128$-bit security could be broken about $1000$ times faster than exhaustive search. These attacks show that the ciphers do not provide the expected security level. The cryptanalysis also led to a <a href="http://eprint.iacr.org/2016/687.pdf"> revised version of LowMC</a>.
</li>
<li> <a href="https://eprint.iacr.org/2015/113.pdf">Keyvrium</a> was designed by Canteaut et al and is a stream cipher aiming at $128$-bit security. It is based on the lightweight stream cipher <a href="https://en.wikipedia.org/wiki/Trivium_%28cipher%29">Trivium</a>, which has $80$-bit security. Trivium and Keyvrium both achieve similar multiplicative depth compared to instances of LowMC. They have a smaller latency than LowMC, but have a slightly smaller throughput.
</li>
<li> The <a href="https://eprint.iacr.org/2016/254.pdf">FLIP</a> family stream ciphers proposed by Méaux et al have the lowest depth compared with previous ciphers. Several versions are provided including $80$-bit and $128$-bit security instantiations. The main design principal is to filter a constant key register with a time-varying public bit permutation, which allows for small and constant noise growth. However, FLIP has a much higher number of ANDs per encrypted bit than the above ciphers.
</li>
</ul>
</p>
<br>
<p>
To sum up, blending symmetric ciphers and FHE is definitely an interesting idea that could prove fundamental for bringing homomorphic encryption in the real world.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEit85iiVfuOrSIWKH07JOuL6_NXgkieAjjooswyiCO1neKi4j5MhpLrCcc5KxAPx8j7OflrnM9OyYiyl-Ahuz9jIn9Cnqh9zdmNRU_6NeSj04fqOtGe4mrIhZ5kIOaNDCsrFJ5vmInuH6Q/s1600/chinese-new-year.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEit85iiVfuOrSIWKH07JOuL6_NXgkieAjjooswyiCO1neKi4j5MhpLrCcc5KxAPx8j7OflrnM9OyYiyl-Ahuz9jIn9Cnqh9zdmNRU_6NeSj04fqOtGe4mrIhZ5kIOaNDCsrFJ5vmInuH6Q/s400/chinese-new-year.jpg" width="400" height="150" /></a></div>
<i> The other fellows also made contributions to this blog post.</i>Chaoyun Lihttp://www.blogger.com/profile/13709110456124646740noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-5366239760600596582017-01-16T17:47:00.000+01:002017-01-16T18:09:27.328+01:00WhatsApp backdoor? Should I be worried? <h2>
<i>WhatsApp Backdoor!!! </i></h2>
Let's start this blog entry with that title and it is very easy to get reader's attention (Especially in the security community) with big bold letters and saying that there is a backdoor. I say this because it was exactly what happened on January 13th (Friday the 13th, maybe it is correlated) when The Guardian released <a href="https://www.theguardian.com/technology/2017/jan/13/whatsapp-backdoor-allows-snooping-on-encrypted-messages">the news</a>. After that, the security community/websites and others started a big debate about it. In fact, some hours after the release from the Guardian others websites started saying that it isn't a backdoor such as: <a href="https://gizmodo.com/theres-no-security-backdoor-in-whatsapp-despite-report-1791158247">Gizmodo</a>, <a href="https://whispersystems.org/blog/there-is-no-whatsapp-backdoor/">Open Whisper System blog</a> and others' websites.<br />
<br />
Well, we are going to give some "conclusions about the subject" later. First, let's explore more about it.<br />
<br />
<h3>
What is a backdoor?</h3>
If we talk about computer systems or cryptography, we are going to have something like this definition: "A backdoor is a method, often secret, of bypassing normal authentication in a product, computer system, cryptosystem or algorithm etc. Backdoors are often used for securing unauthorized remote access to a computer, or obtaining access to plaintext in cryptographic systems." In other words, it is an easy way to obtain access.<br />
<br />
<h4>
Are backdoors common?</h4>
Unfortunately, yes it appears in the news and computer systems more than we want. Also, after the Snowden revelations the term backdoor is very common in the non-academic area. However, it does not mean that people become more worried.<br />
<br />
<h3>
What is the real problem?</h3>
Perfect, we learned about backdoors. Now, let's explain the problem with WhatsApp. Let's see an hypothetical case, imagine that you have a smartphone and for some reason (fall in a lake, you have been abducted by E.T. and forget your phone in the spaceship) you don't have access to your smartphone anymore. Fortunately, you can recover your phone number, buy a new smartphone and reinstall Whatsapp. However, when you reinstall WhatsApp, it generates a new key. However, now we have a problem what the app should do with the messages that you received when you were offline. The app could just drop the messages and never deliver or it could ask for your contacts to encrypt with the new key. According to the Guardian this approach is the backdoor:<br />
"WhatsApp’s implementation automatically resends an undelivered message with a new key without warning the user in advance or giving them the ability to prevent it."<br />
<br />
I, personally, say that it is a mistake that WhatsApp just show a notification saying that the key change and automatically resends.<br />
<br />
<h3>
Is it exploitable?</h3>
I will say: Yes, it is.<br />
Let's see how it is possible to exploit.<br />
<br />
In fact, it consists in exploit another vulnerability, it is present in the SS7 protocol. The Signalling System No. 7 (SS7) is a set of telephony signaling protocols developed in 1975, which is used to set up and tear down most of the world's public switched telephone network (PSTN) telephone calls. It also performs number translation, local number portability, prepaid billing, Short Message Service (SMS), and other mass market services.<br />
<br />
The SS7 vulnerabilities are known since 2008. However, it had the attention from media in 2014, Tobias Engel at CCC showed how it is possible to exploit SS7 vulnerabilities. You can check exactly how in this <a href="https://media.ccc.de/v/31c3_-_6249_-_en_-_saal_1_-_201412271715_-_ss7_locate_track_manipulate_-_tobias_engel">link</a>. In the video, he shows that it is possible to change the phone number between cellphones. In the same congress and right after Tobias, Karsten Nohl showed how to read "others SMS", the talk can be followed in <a href="https://media.ccc.de/v/31c3_-_6122_-_en_-_saal_1_-_201412271830_-_mobile_self-defense_-_karsten_nohl">this link</a>.<br />
<br />
<h4>
What is the SS7 vulnerability?</h4>
Let see very quickly the structure of a GSM network, the next figure shows it.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKE8YKYVwDsKv4BOq_e0gzyqAFDNDZfh9OJciJkRSpCi5bFc8VBSQdIAug2FAVTSISxHuCvBPLoNNVg0Xe5sMUpBv3HYwovVSopW0fKz64u7ZsofyKw9uMLgRA9TZfknhJnMdrf9utfTQ/s1600/050416_2212_SS7protocol1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKE8YKYVwDsKv4BOq_e0gzyqAFDNDZfh9OJciJkRSpCi5bFc8VBSQdIAug2FAVTSISxHuCvBPLoNNVg0Xe5sMUpBv3HYwovVSopW0fKz64u7ZsofyKw9uMLgRA9TZfknhJnMdrf9utfTQ/s400/050416_2212_SS7protocol1.png" width="400" /></a></div>
<br />
So, if you are inside of the SS7 network, you can just start listen the network and as we saw in the videos it is possible to change somethings such as your number. <br />
<br />
Since it is possible to change your number and the verification from whatsApp can be done by call or a SMS verification. What happened if I have access to SS7 network and change my number to any other number and I reinstall WhatsApp and pass the verification? <br />
<br />
According to the policy of WhatsApp, my smartphone is going to generate a new key and all the new messages are going to be encrypted with this new key. I didn't create a new attack here, it has been done before as this youtube video shows:<br />
<div class="separator" style="clear: both; text-align: center;">
<iframe allowfullscreen="" class="YOUTUBE-iframe-video" data-thumbnail-src="https://i.ytimg.com/vi/fDJ-88e_06A/0.jpg" frameborder="0" height="266" src="https://www.youtube.com/embed/fDJ-88e_06A?feature=player_embedded" width="320"></iframe></div>
<br />
<br />
It is valid to say that this attack can be used in others messengers. However, others apps have another policy, e.g. Signal drops the messages until you "recheck" the new key.<br />
<br />
If you are interested to see the how messengers handle this key problem, there is a nice blog entry at <a href="https://medium.com/@pepelephew/a-look-at-how-private-messengers-handle-key-changes-5fd4334b809a#.uhubmhxx1">Medium</a>.<br />
<br />
<h2>
Conclusions? </h2>
The news from Guardian is not a real backdoor, it is a known problem about handling with keys and the news was more like a "click bait". However, the good thing about all of this history are the discussions in different communities and an alert to people take care about their privacy.<br />
<br />
All this incident makes us to start doing some social and technical questions, such as:<br />
Should we accept the news from the media as truth?<br />
Why the hell we continue to use a system from 1975?<br />
Can we create a better way to handle with keys? <br />
<br />
<br />
In addition, I should say that if you are looking for more privacy to your life. You should check <a href="https://www.privacytools.io/">Privacy Tools</a>, in this case, the section about "Encrypted Instant Messenger". <br />
Anonymoushttp://www.blogger.com/profile/16974975055367272535noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-50146060096277969342017-01-11T14:52:00.001+01:002017-01-11T14:52:40.551+01:00RWC 2017 - Post-quantum cryptography in the real-world<div class="MsoNormal" style="text-align: justify;">
A new year takes off and, along with it, thousands of resolutions are formulated. Although I am not the right person to talk about them (my diet will begin next Monday), I wish to discuss a resolution that the cryptographic community as a whole has set for itself in this 2017. Because that's what people do at Real World Crypto (RWC): they talk about new threads, topics could be worth exploring during the new year, directions for their researches and interests. This year, for the first time in RWC, post-quantum cryptography (PQC) was given an entire session, clear sign that time is changing and the moment has come to bring the discussion to the real world. The message is clear: even if quantum computers are not popping up in tomorrow's newspapers, we can't postpone any longer.<o:p></o:p></div>
<div class="MsoNormal" style="text-align: justify;">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
A very simple reason for this was given by Rene Peralta, of the NIST PQC team, during the overture of the session: standardisation takes time, up to seven years if we start right now, and full transition takes even longer. I found Rene's presentation to be neat and direct: our public-key cryptography fails against quantum computers and our symmetric one needs some (non-drastic) modifications. The resolution is to "start thinking about it this year, possibly by November 30<sup>th</sup>, 2017". However, a question arises quite naturally: are we ready?<o:p></o:p></div>
<div class="MsoNormal" style="text-align: justify;">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
The other three talks of the session tried to answer in the affirmative. Among the several PQC proposals that are around in theoretical papers, two made their ways into RWC: the well-stablished lattice-based cryptography and the new-born isogeny-based cryptography, which nevertheless carries the pride and sympathy of ECC.<o:p></o:p></div>
<div class="MsoNormal" style="text-align: justify;">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
<u>Lattices and funny names: NewHope and Frodo and Crystals<o:p></o:p></u></div>
<div class="MsoNormal" style="text-align: justify;">
Lattice-based cryptography has three representatives in the run for PQC schemes. Valeria Nikolaenko showed two: the first one is called NewHope and is a key agreement protocol based on the hardness of Ring-LWE. The latter is a problem very favourable to applications because it combines sound theoretical security (worst-case to average-case reduction) to fast implementations thanks to specific choices of parameters which allow for speed-ups in the computations: NewHope turns out to be even faster than ECC and RSA, but at the price of a larger communication. However, there are some concerns on the security of LWE when the ring structured is added. Thus, Frodo ("take off the ring") is designed to achieve the same goal using only standard LWE. The drawback is a degradation in performance, since the tricks hinted above cannot be used anymore and keys are generally bigger.<o:p></o:p></div>
<div class="MsoNormal" style="text-align: justify;">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
The third lattice-based scheme was presented by Tancrede Lepoint and is a suite called Crystals. This is based on yet another kind of lattices: module lattices, for which it is also known a worst-case to average-case reduction. These are less structured lattices (hence possibly calming down the detractors of ring structure) in which similar implementation speed-ups are possible: the timing is indeed comparable to NewHope's, while the communication is improved.<o:p></o:p></div>
<div class="MsoNormal" style="text-align: justify;">
<br /></div>
<div class="MsoNormal" style="text-align: justify;">
<u>"Make elliptic curves great again"</u></div>
Michael Naehrig presented a new proposal for PQC: do you remember curves with plenty of small subgroups where to easily solve the discrete logarithm problem? Now they come in handy again: all the subgroups (of order 2 and 3) are considered to be nodes of a graph, whose edges are the isogenies (a.k.a. bijetive homorphisms between curves). In this new context, given two curves in the graph, it is difficult to come up with the isogeny linking the two. However, such a new approach doesn't really stand against other solutions: keys are small but performance is not a pro (so to speak).<br />
<br />
[The same blog post appeared <a href="http://bristolcrypto.blogspot.be/2017/01/rwc-2017-post-quantum-cryptography-in.html">here</a>.]Anonymoushttp://www.blogger.com/profile/08942314440467161264noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-70726951712852263842016-12-29T23:34:00.001+01:002016-12-29T23:34:01.720+01:00[33C3 TLDR] PUFs, protection, privacy, PRNGs<i>Pol van Aubel</i> gave an excellent introduction to Physical(ly) Unclonable Functions from both a modern and historical perspective. If you want to know how to exploit intrinsic random properties of all kinds of things, give this talk a shot and don't be afraid of all the footnotes.<br />
<br />
<i>Takeaway: </i>PUFs are kinda neat and if you're interested check out this talk.Erik Bosshttp://www.blogger.com/profile/09790534761609367831noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-60100873095977104152016-12-29T22:05:00.003+01:002016-12-29T22:05:41.690+01:00[33c3 TLDR] Decoding the LoRa PHY<br />
Matt reverse engineered the PHY layer of the proprietary and patented LoRa techology and even wrote a GNUradio plug-in for it.<br />
<br />
He also gave an excellent introduction to basics of the LoRa technology.<br />
Actually, it was somehow an introduction to basics of radio technology in general and I highly recommend watching his talk.<br />
<br />
<a href="https://media.ccc.de/v/33c3-7945-decoding_the_lora_phy">Video</a>Simon Friedbergerhttp://www.blogger.com/profile/11398138941386358772noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-59181112566191557222016-12-29T17:27:00.000+01:002016-12-29T17:27:09.794+01:00[33c3 TLDR] Making Technology Inclusive Through Papercraft and SoundA talk by <i>bunnie</i> introducing their <i>Love to Code</i> program, a program for teaching children (adults too) to code using cheap papercraft-inspired devices and tools. The argument is that the diversity problem that exists in coding can maybe be fixed by starting at the earliest stages, i.e., starting with children.<br />
<br />
From a technical perspective they had to solve some interesting problems to make the technology fast, cheap and usable. In particular, they have a pretty awesome technique using sound to upload code to the microcontrollers they use.<br />
<br />
<i>Takeaway: </i>Pretty cool technology for a societally relevant goal. Check it out.Erik Bosshttp://www.blogger.com/profile/09790534761609367831noreply@blogger.com0tag:blogger.com,1999:blog-5149574209636840277.post-8495250772443611672016-12-29T17:07:00.000+01:002016-12-29T17:09:50.075+01:00[33c3 TLDR] Tapping into the coreAn interesting talk about how to abuse the USB to JTAG bridge available in newer Intel processors to attack computers through their USB port.<br />
<br />
It is not entirely clear to me if this is "just" another case of somebody forgetting to turn of a debug facility and leaving it as a security hole or if there are actually cases when it cannot be disabled.<br />
<br />
<a href="https://media.ccc.de/v/33c3-8069-tapping_into_the_core">Video</a> <br />
<br />
<br />
<br />
<br />Simon Friedbergerhttp://www.blogger.com/profile/11398138941386358772noreply@blogger.com0