Wednesday, July 12, 2017

Looking for fast open-source algorithms on lattices? Try fplll!

Dear ECRYPT-NET fellows and readers,
I have some news from CWI @ Science Park in Amsterdam, where fplll-days-3, organized by Leo Ducas and Marc Stevens, is currently taking place!

Previously held at ENS Lyon, this is already the third such combined effort to enhance the open-source fplll project. fplll has become a lively project, with many bug reports and feature requests continuously improving the code base in various directions.

As a brief history of fplll: the first code was written by Damien Stehlé. It is now written by many active contributors (according to GitHub, the most active developers are Martin Albrecht, Shi Bai, Damien Stehlé, Guillaume Bonnoron, Marc Stevens and Koen de Boer) and maintained by Martin Albrecht and Shi Bai.

What does fplll do?

At its core, and depending on additionally specified parameters, fplll runs its implementation of the LLL algorithm, using fast floating-point arithmetic under the hood. Other available lattice-reduction algorithms are HKZ and BKZ reduction and their variants. These algorithms can be applied to an input lattice represented by a matrix, given e.g. in a file as a set of row vectors, to obtain a reduced basis --- a versatile starting point for numerous applications. Furthermore, fplll can solve (arbitrarily adjustable approximate-)SVP and CVP instances, i.e., find a shortest lattice vector, or the lattice vector closest to a user-chosen target.

To get started, one can compile the fplll C++ sources and run experiments directly, or use the often-dubbed 'user-friendlier variant' fpylll, which provides Python access to the underlying fast C++ functions. Finally, every mathematician's darling, Sage (at least for anyone who isn't fully satisfied by pure Python), benefits from an improved fpylll as well, since importing the fpylll module allows direct usage within Sage. The upcoming SageMath 8.0 release will ship the current fpylll module, which accesses said fast C++ routines.
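To give a flavor, here is a minimal sketch using fpylll (the call names follow fpylll's documented interface, but the dimension and bit-size are arbitrary choices; treat the details as illustrative):

```python
# Minimal fpylll sketch: LLL-reduce a random lattice basis, then find
# a shortest vector by enumeration. Parameters are arbitrary.
from fpylll import IntegerMatrix, LLL, SVP

A = IntegerMatrix.random(30, "uniform", bits=30)  # rows = basis vectors
LLL.reduction(A)                                  # in-place LLL reduction
v = SVP.shortest_vector(A)                        # exact SVP via enumeration
print(v)
```

Within Sage, the same module can be imported directly (`from fpylll import IntegerMatrix, LLL`), so the sketch above carries over essentially unchanged.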

The significance of lattice-based cryptography is repeatedly mentioned in papers' abstracts and has been explored in past ECRYPT-NET blog posts, e.g., 'What are lattices?' and 'Learning problems and cryptography: LPN, LWE and LWR'.

Significance for cryptanalysis

From a cryptanalyst's point of view, the significance lies in the fact that most security models of lattice-based cryptography assume lattice reduction to be the most promising attack vector against the underlying lattice-based primitive. Some security models can immediately (and provably) rule out certain classes of attacks, and others can be argued to be less promising than known formulations as lattice problems. In such arguments, fplll essentially plays the role of the SVP/CVP oracle, and its performance is taken as a lower bound for the practical performance of an attack. Attacks typically require many calls to such an oracle, so taking the time of a single run as a lower bound is used to set parameters in experimental cryptosystems, though commonly a more conservative lower bound including a security margin is chosen. Specifically, many proposed lattice-based crypto-schemes have been parametrized so that these best known attacks are taken into account.
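As a back-of-the-envelope illustration of this reasoning (every number below is a hypothetical placeholder, not a measured value):

```python
# Hypothetical sketch: lower-bounding an attack by its oracle calls.
t_oracle = 3600.0   # assumed seconds per SVP-oracle call (e.g. timed with fplll)
n_calls = 2**20     # assumed number of oracle calls the attack needs
attack_seconds = t_oracle * n_calls   # optimistic lower bound on the attack
margin = 2**10      # extra security margin a designer might demand on top
print(f"attack >= {attack_seconds:.2e} s, "
      f"with margin >= {attack_seconds * margin:.2e} s")
```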

I suppose I do not need to point out the numerous advantages of open-source software over closed-source projects, but beyond its general value to the research community, the significance of having freely available, fast lattice-reduction routines is manifold.

To begin with, there is a discrepancy between what theory predicts and how the algorithms perform in practice. The techniques described in the literature, summarized as BKZ 2.0, leave a broad range of implementation choices, and different groups using different software, and metrics under which their own approach excels, naturally produce results that are hard to compare. If there were software with meaningful defaults for many standard lattice tasks, customizable and extensible to individual lattice solutions, then there is hope that the community could agree on common problem instances. Ideally, problem instances should cover deployed or model experimental cryptosystems, such that they embody a meaningful benchmark for new designs.

Originally, fplll aimed to provide such algorithms with reasonable speed. Recently, the developers broadened their goals and try to fill gaps in cryptanalytic research. Concretely, fplll now strives for speed through low-level optimizations and by implementing diverse techniques from the literature, hence catching up with the state of the art. Additionally, it can easily be tweaked at a high algorithmic level via the Python layer fpylll, while still exploiting all the available optimized routines for performance. One can argue that, together with the various lattice challenges, this project helps to benchmark and compare efforts to cryptanalyze the cryptographic primitives used in lattice-based constructions.
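For instance, a cryptanalyst might preprocess with LLL and then run the Python-level BKZ 2.0 implementation with a chosen block size, all without leaving Python (a sketch following fpylll's documented classes; the dimension and block size are arbitrary):

```python
# Sketch: high-level BKZ experimentation through fpylll.
from fpylll import IntegerMatrix, LLL, BKZ
from fpylll.algorithms.bkz2 import BKZReduction  # Python-level BKZ 2.0

A = IntegerMatrix.random(60, "qary", k=30, bits=20)  # random q-ary lattice
LLL.reduction(A)                                     # cheap preprocessing
BKZReduction(A)(BKZ.Param(block_size=20, strategies=BKZ.DEFAULT_STRATEGY))
```

Because BKZReduction lives in Python, its tours, preprocessing and pruning choices can be subclassed and modified without touching the C++ core.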

A couple of lattice challenges have been proposed (SVP-, Ideal-, LWE- and Ring-Challenges), and researchers do seem to test their code on these instances, which aids a comparison of approaches.

Having fast lattice operations conveniently accessible at a high level allows one to quickly try out a new idea or a slightly different approach. This saves time and will hopefully make researchers more willing to share their tweaks and algorithmic tricks in the future.

The workshop

To come back to the start: the fplll-days are meant to be a hands-on, work-oriented workshop that enables direct discussions with the core developers, with the goal of improving existing functions and the many algorithms involved. The general idea behind the meeting is to optimize frequently used routines and to make the library friendlier and more accessible to users such as cryptanalysts.

Using code-profiling tools, performance and memory-usage bottlenecks can be spotted in a first overview, which allows efforts to be directed where they might lead to significant speed-ups. After discussing known issues and useful features, this workshop aims to implement numerically stable algorithmic variants that push the dimensions LLL can handle (such as Givens rotations while resorting only to machine floating-point types), sophisticated pruning strategies to speed up enumeration, and sieving algorithms --- a promising new direction for finding short vectors faster.
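As a trivial illustration of that first profiling step (parameters are arbitrary; the serious work targets the C++ core with native profilers, this merely shows the idea of spotting hotspots before optimizing):

```python
# Sketch: profile a BKZ run to see which routines dominate the runtime.
import cProfile
from fpylll import IntegerMatrix, BKZ

A = IntegerMatrix.random(50, "qary", k=25, bits=20)
cProfile.run("BKZ.reduction(A, BKZ.Param(block_size=15))", sort="cumtime")
```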

It is exciting to join and shape such a project, so let's hope that the many interesting projects started and delegated here get completed during this week, and that further interested researchers turn into active users, join the party, and come up with meaningful, reproducible research results. Remember that Newton "saw further, by standing on the shoulders of giants" and thus achieved progress; you too are encouraged to become active, using an already established framework!

Thursday, July 6, 2017

A Brief Survey of Physical Attacks on Cryptographic Hardware

Previously, the topic of side-channel attacks (SCA) was covered on this blog. These attacks are very popular, for they can be mounted using very cheap equipment and do not necessarily require a high level of expertise. Hence, SCA are widely accessible and present a common danger. As a result, they are well researched, and various countermeasures have been developed. Still, they are just a small part of the stack of physical attacks. Figure 1 crudely depicts this colorful "stack". The one thing all physical attacks have in common is the assumption that the attacker gains physical access to the target device and retains it for a certain amount of time. In the remainder of this post, a brief survey of these attacks is given; more detailed descriptions will be provided in a series of posts to follow.



Figure 1: Stack of Physical Attacks

Invasiveness
The first distinction is based on "invasiveness". Invasive attacks entail a breach of the target's packaging or its surrounding enclosure. This is a very delicate process which often requires expensive equipment and a high level of expertise. Since the breach is destructive by nature, it can easily be detected by subsequent users (if the chip itself was not destroyed in the process, that is). The goal of the breach is to gain access to the internal state of a chip. Attackers commonly target on-chip buses or storage elements, which may contain sensitive intermediate values of cryptographic computations, or the keys themselves. The aforementioned enclosures are a privilege of expensive devices, often called Hardware Security Modules (HSMs). HSMs may cost tens of thousands of euros and are envisioned to provide secure computational environments at high speeds. Apart from restricting access to the chip by means of a sturdy build and "tamper-proof" latches and locks, enclosures are frequently equipped with seals and coatings that are supposed to witness any foul play that may have taken place. Additionally, tamper-detection measures may be built in, envisioned to erase all sensitive information at the first glimpse of an attacker's activity. Hence, invading these juggernauts is commonly more expensive and time-consuming. Unfortunately, the market share of HSMs compared to bare smart cards and RFIDs is next to negligible, especially with the rise of the IoT.

In contrast, non-invasive attackers cause no structural damage to packaging or enclosures. They interact with the target device through its existing interfaces, and via mediums that require no mechanical contact with the device. Such attacks are virtually free to mount, but may require significant expertise.

Activeness
The second distinction is based on the "activeness" of the attacker. Active attacks entail inducing computational (logical) or structural changes in the target chip. Regarding computational changes, a very common example is Fault Injection (FI) attacks. FI attacks have two phases: injecting a fault during the execution of the targeted algorithm, and analyzing the observed faulty outputs. A common method of altering a device's execution is clock glitching: by introducing a premature edge on the clock signal, the attacker violates the device's critical path, so that incorrect values are captured in the device's registers. Alternatively, faults can be induced by shooting a laser beam with enough power to change the state of the device while allowing it to remain operational. Here, any data or control register falls under "state of the device". For example, the round counter commonly used in implementations of block ciphers is a favored target for such faults. Active attacks may require a higher level of technical skill and a more sophisticated setup.
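To make the round-counter example concrete, here is a deliberately toy simulation (the 8-bit "cipher" is invented purely for illustration, not a real design): if a fault forces only one round to run, the key falls out algebraically.

```python
# Toy fault-injection illustration on a made-up 8-bit iterated "cipher".
def rotl8(x, r):
    return ((x << r) | (x >> (8 - r))) & 0xFF

def encrypt(p, k, rounds=8):
    s = p
    for _ in range(rounds):
        s = rotl8(s ^ k, 3)          # one round: key mix, then diffusion
    return s

key, plain = 0x5A, 0x3C
faulty = encrypt(plain, key, rounds=1)   # fault: round counter forced to 1
# A single round is s = rotl8(p ^ k, 3), so k = rotr(faulty, 3) ^ p:
recovered = rotl8(faulty, 5) ^ plain     # rotr by 3 == rotl by 5 on 8 bits
assert recovered == key
```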

In contrast, passive attackers may only observe the device's execution while interacting with it through its predefined interfaces. Well-known SCA fall under this category. These attacks are well researched and can be mounted using very cheap equipment. The developed techniques (e.g., Mutual Information Analysis) are extremely powerful and, once incorporated into an attacker's setup, can be reproduced quite trivially. Consequently, although they entail only limited exposure of the device, they pose a serious threat, for they are accessible even to attackers with modest capabilities.
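As a taste of how trivially such analyses can be reproduced once scripted, here is a toy correlation-style attack (in the spirit of the CPA mentioned below) on simulated traces; the leakage model, noise level and key byte are all invented for illustration:

```python
# Toy CPA-style sketch: correlate Hamming-weight predictions against
# simulated noisy leakage to recover a single key byte.
import numpy as np

rng = np.random.default_rng(0)
key = 0xB7
pts = rng.integers(0, 256, size=2000)                   # known plaintext bytes
hw = np.array([bin(v).count("1") for v in range(256)])  # Hamming-weight table
traces = hw[pts ^ key] + rng.normal(0, 1.0, size=2000)  # leakage + noise

# Rank key guesses by correlation between predicted and observed leakage.
scores = [abs(np.corrcoef(hw[pts ^ g], traces)[0, 1]) for g in range(256)]
print(hex(int(np.argmax(scores))))                      # prints 0xb7
```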

The Reality
Activeness and invasiveness are two orthogonal properties, resulting in a total of four combinations (although I find that the existence of "invasive and passive attacks" calls for a philosophical debate). Unfortunately, the situation in practice is much more complex than that. Firstly, attackers are likely to use combined attacks; FI + SCA, for example, can be a very powerful combination. Additionally, the distinction above is not binary: along each of the two orthogonal axes there are many shades. For example, some chips can be faulted by applying laser beams through their packaging (non-invasive), while others may be shielded from such beams and hence have to be attacked invasively.

Consequently, there exists a myriad of possible attack variations. Moreover, even if we lock onto a certain extreme (say passive, non-invasive CPA), the quality of the measurement setup plays a very significant role: a 500-euro oscilloscope can hardly match its 30000-euro counterpart. Ultimately, there are no upper bounds on the power of a skilled invasive attacker performing a battery of active and passive attacks, apart from temporal and financial constraints.


Taking all of the above into account, choosing a set of countermeasures is a difficult task (let alone implementing them properly). Bear in mind that these countermeasures are not free: they may significantly increase the price of a device, cutting severely into profit margins. Therefore, there are no silver bullets in protection against physical attacks. In other words, in practice security engineers work to demotivate attackers with high probability; they try to stop "the attacker of interest" rather than all attacks. The first step towards this is identifying potential attackers, a process often called profiling, which in a nutshell I would describe as follows. Please note that what follows is a gross simplification of the problem, meant to depict the general idea: no distinction is made between fixed costs (the price of the setup) and recurring costs (incurred every time the attack is mounted), nor between temporal and financial costs. Lastly, the value of the assets is heavily simplified as well, again for the sake of avoiding a philosophical discussion.


Manufacturer’s Dilemma
Assume that a device D, which costs d to manufacture, protects assets worth x, and features a countermeasure C that costs c to deploy. We may consider D to be secure against an attacker A, who can mount a successful attack at a cost a (which includes A's investment in the development of expertise), as long as

x < a + μA,

μA being the attacker's profit margin. In other words, if the cost a is high enough, the attacker cannot obtain the desired amount of profit from the given assets. On the other hand, a manufacturer M that produces D wants to sell it at a price m such that

m ≥ d + c + μM,

μM being M's profit margin. In other words, the cost c of deploying the countermeasure directly cuts into the manufacturer's profit. Looking at these inequalities, it seems that there is no dilemma at all. Nevertheless, the cost of an attack depends on the selection of the countermeasure, i.e.,

a = f(c).

Assuming that an increase in c leads to an increase in a, applying some high-school math (readers are welcome to play with it) shows that the selection of C must be based on the value of the assets it protects. A more detailed discussion of this topic will be given in one of the following posts.
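To make the two inequalities concrete, a toy computation (every figure below is hypothetical):

```python
# Hypothetical numbers for the two inequalities above.
x = 100_000.0    # value of the protected assets
a = 80_000.0     # attacker's cost once countermeasure C is deployed
mu_A = 30_000.0  # attacker's required profit margin
d, c, mu_M = 5.0, 2.0, 3.0   # per device: production, countermeasure, margin

secure = x < a + mu_A        # True: the attack is not profitable for A
min_price = d + c + mu_M     # the least M can charge and keep its margin
print(secure, min_price)
```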
In conclusion, physical attacks are a great threat. As the IoT progresses and the number of ubiquitous devices increases, their potential impact can only grow. Deploying devices that protect assets against physical attacks is a complex problem, which demands bespoke solutions tailored to individual use cases.