## Wednesday, February 21, 2018

### Fast Fully Homomorphic Evaluation of Neural Networks in the Cloud

For Fully Homomorphic Encryption (FHE) to be deployed in real-world applications, the efficiency of the algorithms involved still has to improve, even though a theoretical solution has been around for almost 10 years. Because the parameters and components of today's lattice-based realizations of FHE interact in non-trivial ways, schemes that are set up to meet a multitude of design constraints often end up with high requirements; too high for some "killer" applications, where run-times may pose a prohibitive hurdle.

In this blog post, I'd like to present a use case where a Fully Homomorphic Encryption (FHE) scheme achieves unprecedentedly fast classification of encrypted data and makes scale-invariant homomorphic evaluation of neural networks (NN) possible.

### Are privacy-preserving services in the Cloud relevant?

At this point, I think I can skip philosophizing about the ubiquitous utility of machine learning, as we can see its impact every day all around us. On the other hand, the quest for privacy-preserving application of machine learning algorithms to user data has recently become a central topic of discussion. The General Data Protection Regulation (GDPR), a result of the call for European law (which could serve as a paragon internationally) to protect its citizens and its economy, is expected to push innovation in this direction too.

Simply put, users of Machine Learning as a Service (MLaaS) in the Cloud want to share and upload only encrypted images as input to the companies' powerful, pre-trained cognitive models.

Clearly, encrypting content ensures data confidentiality, assuming the associated private key of the public-key encryption scheme never leaves the user’s trusted device. (As a side note, recent news reports suggest that such an assumption is not always guaranteed for user-controlled devices. The Cloud operating on FHE-encrypted data, on the other hand, cannot leak private user data through the whole class of cache attacks, e.g. Meltdown and Spectre.)

### Let's briefly look at the problem setting.

To overcome the conflicting interests of confidentiality and utility of data in the Cloud-based scenario, Fully Homomorphic Encryption can help the user to receive a useful answer to their encrypted question in a privacy-preserving way. Hence, the cloud needs to support homomorphic computations on the FHE-encrypted inputs and send back the still encrypted result of this delegated operation in a reasonable time. In principle, only the legitimate user can decrypt the output using their secret key. The cloud service cannot deduce information from the random-looking inputs, intermediate or final results, but can still charge the user for providing the service, e.g. classifying an image in this example.

### Let's briefly look at the task.

The first step when approaching a solution for using FHE with NNs is to define the minimal requirements of the concrete task and to know what can be considered practical FHE.
We want to showcase fast homomorphic evaluation of a pre-trained NN that classifies a depicted shape, e.g. images of handwritten digits from the MNIST dataset, without leaking private information about the input data, at an 80-bit security level.
The output, delivered in less than two seconds, shall consist of encrypted scores assigned to each possible label; the highest score, decrypted by the user, indicates the most probable label of the input image.
A depiction of how the input is propagated in order to evaluate a discretized deep neural network with an arbitrary number $d$ of hidden layers to arrive at a classification. Each neuron performs an operation $f_i$: a function depending linearly on the values of the incoming wires and the weights, followed by a non-linear operation. The latter is typically referred to as "activation".

As deep neural networks with $d$ hidden layers give good results in practice, we target this type with a scale-invariant FHE scheme.

### Let's look at the problem solution.

In an attempt to bring FHE into practice, our C++ code builds on top of an existing library, Fast Fully Homomorphic Encryption over the Torus (TFHE), and introduces a new framework for homomorphic evaluation.

First, to increase the efficiency, which is an important step in paving the way to practicality, the underlying FHE scheme needs to be parametrized once for a given network.
Secondly, a security analysis is another crucial step in vetting the algorithms, ensuring that their use resists state-of-the-art cryptanalysis and meets the targeted security level.

The main capability of our scheme is that, when evaluating a single neuron, the output value can readily be used for the next operation, as it is bootstrapped to ensure low error propagation.
Close-up on a single neuron.
We apply the activation function directly to the weighted sum of inputs according to the network's wires, i.e. we compute $y = f(x) = \operatorname{sign}( \langle x, w\rangle )$, with fixed weights $w$ for the neuron and the sign function as activation.
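In the clear, i.e. without any encryption, the neuron computation described here is a one-liner. The following is only a plaintext sketch of $y = \operatorname{sign}(\langle x, w\rangle)$; in the homomorphic setting the inputs would be ciphertexts and the sign would be extracted during bootstrapping. The function names and the convention $\operatorname{sign}(0) = +1$ are assumptions made for illustration.

```python
def sign(value):
    """Sign activation with outputs in {-1, +1}; sign(0) is taken as +1 (assumption)."""
    return 1 if value >= 0 else -1


def neuron(inputs, weights):
    """Evaluate y = sign(<x, w>) for integer inputs and fixed integer weights."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return sign(weighted_sum)


# <x, w> = 1*2 + (-1)*3 + 1*(-4) = -5, so the neuron outputs -1.
y = neuron([1, -1, 1], [2, 3, -4])
print(y)
```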

Scale-invariance means that privacy-preserving evaluation of deep neural networks no longer poses a hurdle: the computation carried out by every neuron is independent of the total number of neurons and layers in the network, and hence the overall cost scales linearly.

With this approach, we can report the performance of an experiment that classifies 10000 encrypted images from the MNIST dataset with more than 96% accuracy on average, taking less than 1.7 seconds per classification, using the TFHE library as a starting point.
Running an experiment on a trained neural net with 784:100:10 topology deployed in the Cloud.
An uploaded encrypted test image is the input to the homomorphic evaluation of our scheme, which classifies the depicted shape (without leaking private information about the input data). The evaluation of the neural network outputs the encrypted scores $S_i$ assigned to each digit $i$. The highest score, decrypted by the user, indicates the most probable label of their image.
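To make the data flow concrete, here is a plaintext sketch of a network with one hidden layer of sign neurons followed by a linear output layer, with the label chosen as the index of the highest score. It uses a tiny 4:3:2 topology with made-up weights instead of the 784:100:10 network, and it works on unencrypted integers; all names and numbers are hypothetical.

```python
def sign(value):
    """Sign activation with outputs in {-1, +1}; sign(0) is taken as +1 (assumption)."""
    return 1 if value >= 0 else -1


def classify(pixels, hidden_weights, output_weights):
    """Return (scores, label) for one input vector.

    hidden_weights holds one weight row per hidden neuron,
    output_weights one weight row per output score S_i.
    """
    hidden = [sign(sum(x * w for x, w in zip(pixels, row))) for row in hidden_weights]
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in output_weights]
    label = max(range(len(scores)), key=scores.__getitem__)  # index of highest score
    return scores, label


# Toy 4:3:2 example with hypothetical weights.
pixels = [1, 0, 1, 1]
hidden_weights = [[1, -1, 1, 0], [-2, 1, 0, 0], [0, 0, 1, -3]]
output_weights = [[1, 1, 1], [2, -1, -1]]
scores, label = classify(pixels, hidden_weights, output_weights)
print(scores, label)
```

In the encrypted setting only the user would perform the final `max` step, after decrypting the scores.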

I'd like to stress that the good performance of this scalable approach is not limited to homomorphic evaluation of neural networks with one hidden layer, as depicted above, but also applies to deep neural networks, which in practice could be composed of possibly a hundred hidden layers, or to an even broader class of cognitive models.

For a detailed, formal description, I refer to the full version of the paper. You may also try out the proof-of-concept implementation, available online, which shows how to obtain these research results by applying our generic framework to a trained NN and MNIST dataset inputs as a demonstration.

### Let's look at directions for future work and open-questions.

Finally, mentioning the limits on the functionality of our FHE scheme and pointing out its applicability to other well-specified domains rounds off this treatment.

Service providers may be concerned that their users send malicious requests. Evaluating private networks with our framework, which would address this concern, is not dealt with at the moment, although it is in principle possible.
Malicious users could either try to learn the company's intellectual property (the weights and the topology of the neural network itself), or try to derive sensitive information encoded therein (which could be a breach of the privacy of the training dataset).
In this latter case, techniques for statistical databases studied in the differential privacy literature can be used in the training phase.

An open question is how further performance gains can be achieved by refining the algorithms. It would also be interesting to list all the general cognitive models that can be supported.

So long, stay tuned for faster solutions and more general demonstrations!

## Thursday, November 2, 2017

### Compressing ledgers of financial transactions

The history of modern banking begins almost 550 years ago, with the establishment of Bank Monte dei Paschi di Siena, in present-day Italy. However, it wasn't until the 1980s (with the early introduction of home banking) that pioneering financial institutions started to use computers to automatically process part of their financial transactions and thus replace the manual, more error-prone process. As the availability of the Internet increased, starting in the early 2000s, many major banks began to offer Internet banking to their customers. This means that customers can access their account's balance or history through a web browser or smartphone and initiate or receive transactions.

Nowadays, a large bank processes on the order of millions of transactions every day. As time passes and more and more transactions are processed, the size of this set only increases. Thus, one should think about potential solutions for removing the old and no longer useful ones, so that they can be safely archived. The story repeats for crypto-currencies (with the corresponding modifications: multiple senders or receivers are allowed, and accounts are replaced with addresses). Consider, for instance, the size of the transactions processed by a well-known crypto-currency: in the first six years of its lifespan, the size of the ledger reached around 95 GB, while in the first 9 months of 2017 alone, more than 40 GB were processed. Such growth implies increased computational costs when verifying the actual balance of a specific address, as a traversal through the set of addresses needs to be employed.

A first easy step is to switch from representing transactions as lists to a more visual representation, in which accounts are nodes in a graph and transactions are edges. Since multiple edges are allowed, the graph is in fact a multigraph. In their work, recently presented at SecureComm17, Rémi Geraud, David Naccache and Răzvan Roșie put forward the problem of finding "nilcatenations" in sets of transactions. Loosely speaking, a nilcatenation is a subgraph of a given multigraph with a special property: the balance of every node in it is zero. Stated differently, every user that is part of a nilcatenation receives the same amount of money that it sends. Since the balances in such components do not affect the global balance of the original multigraph, nilcatenations can be decoupled and archived.
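The zero-balance property is easy to check once a candidate set of transactions is given. The sketch below assumes a hypothetical representation of the multigraph as a list of `(sender, receiver, amount)` triples; it only verifies a candidate and does not search for one.

```python
from collections import defaultdict


def balances(transactions):
    """Net balance (received minus sent) per account for (sender, receiver, amount) edges."""
    balance = defaultdict(int)
    for sender, receiver, amount in transactions:
        balance[sender] -= amount
        balance[receiver] += amount
    return dict(balance)


def is_nilcatenation(transactions):
    """True if the set is non-empty and every involved account has balance zero."""
    return bool(transactions) and all(v == 0 for v in balances(transactions).values())


cycle = [("A", "B", 5), ("B", "C", 5), ("C", "A", 5)]
parallel = [("A", "B", 2), ("A", "B", 3), ("B", "A", 5)]  # multiple edges are allowed
unbalanced = [("A", "B", 5), ("B", "A", 3)]
print(is_nilcatenation(cycle), is_nilcatenation(parallel), is_nilcatenation(unbalanced))
```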

Some interesting observations can be made about nilcatenations. Any such component can only occur as part of a strongly connected component (SCC): a maximal subgraph of a directed graph in which there is a path between any two nodes. Another observation concerns simple, obstructing nodes: a node whose in-degree and out-degree are both 1, but whose incoming and outgoing edges carry different weights, cannot be included in a nilcatenation (we dub such nodes "first-order obstructions"). Once clearly formalized, the problem turns out to be NP-complete, via a reduction from the 0-target subset-sum problem (SSP), which is equivalent to SSP.

After pruning the original graph into smaller components (by applying SCC-split and first-order obstruction removal steps until convergence), one can view the problem of finding nilcatenations as a multi-dimensional version of the SSP and apply known techniques for attacking the (multi-dimensional) subset-sum problem to each component independently. The known techniques, which rely on an SVP oracle, work on low-density instances. An overview of our heuristic is given in the second picture. More details can be found in the original work.
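The pruning phase can be sketched as follows. This is a simplified illustration, not the authors' implementation: edges are hypothetical `(sender, receiver, amount)` triples, and strongly connected components are found with a naive reachability computation that is fine for small examples (a linear-time method such as Tarjan's algorithm would be the natural choice for real ledgers). The surviving components are only candidates; the subset-sum search still has to be run on each of them.

```python
from collections import defaultdict


def reachable(start, adjacency):
    """Set of nodes reachable from start (including start) by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def scc_labels(edges):
    """Map each node to a representative (smallest member) of its SCC."""
    adjacency, nodes = defaultdict(set), set()
    for u, v, _ in edges:
        adjacency[u].add(v)
        nodes.update((u, v))
    reach = {n: reachable(n, adjacency) for n in nodes}
    return {n: min(m for m in reach[n] if n in reach[m]) for n in nodes}


def first_order_obstructions(edges):
    """Nodes with in-degree 1 and out-degree 1 whose two edge weights differ."""
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for u, v, w in edges:
        outgoing[u].append(w)
        incoming[v].append(w)
    return {n for n in set(incoming) & set(outgoing)
            if len(incoming[n]) == 1 and len(outgoing[n]) == 1
            and incoming[n][0] != outgoing[n][0]}


def prune(edges):
    """Alternate SCC-split and obstruction removal until nothing changes."""
    edges = list(edges)
    while True:
        label = scc_labels(edges)
        kept = [e for e in edges if label[e[0]] == label[e[1]]]   # SCC-split
        bad = first_order_obstructions(kept)
        kept = [e for e in kept if e[0] not in bad and e[1] not in bad]
        if len(kept) == len(edges):
            break
        edges = kept
    components = defaultdict(list)
    label = scc_labels(edges)
    for e in edges:
        components[label[e[0]]].append(e)
    return sorted(components.values())


ledger = [("A", "B", 5), ("B", "C", 5), ("C", "A", 5),
          ("C", "D", 1), ("D", "E", 2), ("E", "D", 3)]
print(prune(ledger))
```

In this toy ledger the edge from C to D links two different components and is dropped by the SCC-split, after which D and E are first-order obstructions; only the balanced cycle through A, B and C survives.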

## Quantum Simulator? Quantum Computing?

We have heard a lot about quantum computers and about what happens when one is built. We have several algorithms that can run on such a computer, and they will cause a "shift" in cryptography. However, can we simulate a quantum computer? Can I run quantum algorithms already?
The simple answer is yes, to both questions.

Yes, simulation is possible. However, that does not mean that we can break the crypto. The simulators that are available have several restrictions, such as:
• Number of operations;
• Size of the circuit;
• Number of qubits that can be used.

### Simulators, special languages, etc.

#### IBM Q Experience

IBM developed the "IBM Q Experience", where you can design your circuit and run it. The circuit runs on their computers, and when your computation finishes you receive an email. In addition to the simulator, IBM added a nice introduction to quantum algorithms.

#### Limitations

The problem with IBM Q is that it only allows you to use 16 qubits as a member of the community, or 17 qubits for commercial use. Also, if you use the community edition, you can sometimes get "stuck" in the queue. The reason is that with the community edition you share the "quantum simulator" with other users, and it cannot run more than a certain number of circuits.

#### Microsoft Liquid

Microsoft is also in the race for "quantum simulators"; in fact, they developed their own language for quantum simulation, called Liquid. It is "cross-platform", i.e., you can run it on Windows, Mac OS or Linux. However, if you decide to use it on your Linux distro, you will need to download Mono and then run an executable compiled for Windows.

The good side of "The Language Integrated Quantum Operations Simulator" is that it is based on "F#" and comes with a lot of examples, such as "Quantum Teleportation", "Shor's Algorithm" and other algorithms.

In fact, the paper Quantum Resource Estimates for Computing Elliptic Curve Discrete Logarithms uses Liquid to implement an attack on ECC, even with the following limitations.

#### Limitations

Liquid allows you to use just 23 qubits. However, the paper mentioned above used more than this amount; the reason (when I asked on GitHub) was that the authors are from Microsoft and can use as many as they need.

#### LibQuantum

libquantum is a library written in C that you can extend and use to create your circuit. It is considerably fast and easy to use. The good side of this library is that you can use as many qubits as you want; the only problem is that if you use a large number of qubits, your "program" will be very slow and will use a large amount of memory.

When I run tests, I use this library, so I am going to give a small tutorial on how to use it. I am assuming that the reader is on a Linux distribution and has a little knowledge of C. The first step is to download the library: www.libquantum.de

First, let's create a register and do some operations:

In the code, we first create a quantum register with 2 qubits and initial value 0. Next, we print the content of the register, which in this example is "0", and finally we perform some operations on the register. We apply a simple "CNOT" gate, but because the value of the register is "|00>", nothing changes. If we change the value in the 4th line from 0 to 1, we end up with a register in state "|11>".
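The effect of the CNOT gate on these two starting values can also be checked without the library. The following plain-Python state-vector sketch is not libquantum code; it assumes that qubit 0 is the least significant bit of the register value, that the CNOT is controlled by qubit 0 and targets qubit 1, and all function names are made up for this illustration.

```python
def basis_state(value, width):
    """State vector of a `width`-qubit register initialised to the integer `value`."""
    state = [0j] * (1 << width)
    state[value] = 1 + 0j
    return state


def cnot(state, control, target):
    """Flip bit `target` of every basis state whose bit `control` is set."""
    new_state = [0j] * len(state)
    for basis, amplitude in enumerate(state):
        if (basis >> control) & 1:
            basis ^= 1 << target
        new_state[basis] += amplitude
    return new_state


def describe(state, width):
    """Non-zero amplitudes keyed by the basis state written as a bit string."""
    return {format(i, f"0{width}b"): amp
            for i, amp in enumerate(state) if abs(amp) > 1e-12}


print(describe(cnot(basis_state(0, 2), control=0, target=1), 2))  # register stays |00>
print(describe(cnot(basis_state(1, 2), control=0, target=1), 2))  # |01> becomes |11>
```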

The library provides other operations such as: Toffoli, Sigma X, Sigma Y, Sigma Z, phase scale, phase kick and Hadamard.

In the code below we are going to have a more "complex" circuit:

Now, we initialize the quantum register with the value "|10>" and first perform a Hadamard transform to put qubit 0 in superposition; afterwards we apply a CNOT gate controlled by the qubit at position 1, with the qubit at position 0 as target. In the end, we have a composed state:
0.707107 +0.000000i|3> (5.000000e-01) (|11>)

0.707107 +0.000000i|0> (5.000000e-01) (|00>)
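An equal superposition of |0> and |3> like the one printed above can be reproduced with a few lines of plain Python. This is a state-vector sketch, not libquantum code. It assumes that the register starts in |00>, that the Hadamard acts on qubit 0 (the least significant bit) and that the CNOT is controlled by qubit 0 with qubit 1 as target; these choices are assumptions picked because they yield the two amplitudes shown above.

```python
import math


def hadamard(state, target):
    """Apply a Hadamard gate to qubit `target` of a state vector."""
    s = 1 / math.sqrt(2)
    new_state = [0j] * len(state)
    for basis, amplitude in enumerate(state):
        flipped = basis ^ (1 << target)
        if (basis >> target) & 1:
            new_state[flipped] += s * amplitude   # |1> -> (|0> - |1>) / sqrt(2)
            new_state[basis] -= s * amplitude
        else:
            new_state[basis] += s * amplitude     # |0> -> (|0> + |1>) / sqrt(2)
            new_state[flipped] += s * amplitude
    return new_state


def cnot(state, control, target):
    """Flip bit `target` of every basis state whose bit `control` is set."""
    new_state = [0j] * len(state)
    for basis, amplitude in enumerate(state):
        if (basis >> control) & 1:
            basis ^= 1 << target
        new_state[basis] += amplitude
    return new_state


state = [1 + 0j, 0j, 0j, 0j]                 # two qubits in |00>
state = hadamard(state, target=0)
state = cnot(state, control=0, target=1)
for basis, amplitude in enumerate(state):
    if abs(amplitude) > 1e-12:
        print(f"{amplitude.real:.6f} |{basis}> (probability {abs(amplitude) ** 2:.2f})")
```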

The library comes with more examples that you can compile and run yourself. I hope you liked this quick tutorial on how to simulate quantum computers on classical computers.

## Wednesday, July 12, 2017

### Looking for fast OpenSource algorithms on lattices? Try fpLLL!

I have some news from CWI @ Science Park in Amsterdam where fplll-days-3, organized by Leo Ducas and Marc Stevens, are currently taking place!

Previously held at ENS Lyon, this is already the third time such a combined effort to enhance the fplll OpenSource project takes place. fplll has become a lively project, with many suggestions that help with debugging and many feature requests for continuously improving the code base in various directions.

As a brief history of fplll it can be noted that the first code was written by Damien Stehlé. It is now written by many active contributors (according to GitHub, the most active developers are: Martin Albrecht, Shi Bai, Damien Stehlé, Guillaume Bonnoron, Marc Stevens and Koen de Boer) and maintained by Martin Albrecht and Shi Bai.

### What does fplll do?

Depending on additionally specified parameters, fplll runs its implementation of the LLL algorithm, using fast floating-point arithmetic under the hood. Other available lattice reduction algorithms are HKZ and BKZ reduction and their variants. These algorithms can be applied to an input lattice represented by a matrix, given e.g. in a file as a set of row vectors, to obtain a reduced representation, which is a versatile starting point for numerous applications. Furthermore, fplll can solve (arbitrarily adjustable approximate) SVP instances, and CVP instances when it is used to find a shortest lattice vector relative to a user-chosen center.
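To illustrate what "reducing" a basis given as row vectors means, here is a minimal textbook LLL in plain Python with exact rational arithmetic. It is not fplll's implementation, which relies on floating-point arithmetic and is far faster; the function names, the parameter $\delta = 3/4$ and the example basis are assumptions of this sketch, and the rows must be linearly independent.

```python
from fractions import Fraction


def dot(u, v):
    """Exact inner product of two vectors."""
    return sum(Fraction(a) * b for a, b in zip(u, v))


def gram_schmidt(basis):
    """Return the orthogonalised vectors and the coefficients mu[i][j]."""
    ortho, mu = [], [[Fraction(0)] * len(basis) for _ in basis]
    for i, b in enumerate(basis):
        v = [Fraction(x) for x in b]
        for j in range(i):
            mu[i][j] = dot(b, ortho[j]) / dot(ortho[j], ortho[j])
            v = [x - mu[i][j] * y for x, y in zip(v, ortho[j])]
        ortho.append(v)
    return ortho, mu


def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL reduction of a basis given as a list of integer rows."""
    basis = [list(row) for row in basis]
    ortho, mu = gram_schmidt(basis)
    k = 1
    while k < len(basis):
        for j in range(k - 1, -1, -1):                 # size-reduce row k
            q = round(mu[k][j])
            if q:
                basis[k] = [a - q * b for a, b in zip(basis[k], basis[j])]
                ortho, mu = gram_schmidt(basis)
        lhs = dot(ortho[k], ortho[k])
        rhs = (delta - mu[k][k - 1] ** 2) * dot(ortho[k - 1], ortho[k - 1])
        if lhs >= rhs:                                 # Lovász condition holds
            k += 1
        else:                                          # swap and step back
            basis[k], basis[k - 1] = basis[k - 1], basis[k]
            ortho, mu = gram_schmidt(basis)
            k = max(k - 1, 1)
    return basis


reduced = lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]])
print(reduced)  # -> [[0, 1, 0], [1, 0, 1], [-1, 0, 2]]
```

Recomputing the Gram-Schmidt data after every change keeps the sketch short at the price of speed; avoiding exactly this kind of overhead is one of the things a library such as fplll is built for.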

To get started, one can not only compile and use the fplll C++ sources to run experiments, but also use fpylll, often dubbed the 'user-friendlier variant', which provides Python access to the underlying fast C++ functions. Finally, every mathematician's darling, Sage, (at least for anyone who isn't fully satisfied by pure Python) benefits from an improved fpylll as well, because importing the fpylll module seamlessly allows direct usage within Sage. Soon a new Sage version, SageMath 8.0, will be released, which ships the current fpylll module that accesses said fast C++ routines.

The significance of lattice-based cryptography is repeatedly mentioned in papers' abstracts and has been explored in past ECRYPT-NET blog posts, e.g. 'What are lattices?' and 'Learning problems and cryptography: LPN, LWE and LWR'.

### Significance for cryptanalysis

From a cryptanalyst's point of view, the significance lies in the fact that most security models of lattice-based cryptography assume lattice reduction to be the most promising attack vector against the underlying lattice-based primitive. Some security models can immediately (and provably) rule out certain classes of attacks, and a few other attacks can be argued to be less promising than known formulations as lattice problems. Such arguments lead to fplll essentially representing the SVP/CVP oracle, and its performance is deemed a lower bound for the practical performance of an attack. Attacks typically require many calls to such an oracle, so taking the time of a single run as a lower bound is used to set parameters in experimental cryptosystems, although commonly a more conservative lower bound including a security margin is chosen. Specifically, many proposed lattice-based crypto-schemes have been parametrized such that these best known attacks are taken into account.

I suppose I do not need to point out the numerous advantages of OpenSource software (over closed-source projects) and its value to the research community, but the significance of having freely available, fast lattice reduction routines is manifold.

To begin with, there is a discrepancy between what theory predicts and algorithmic performance in practice. Techniques described in the literature, summarized as BKZ2.0, leave a broad range of implementation choices. Different groups using different software, and metrics under which their own approach is superior, naturally produce results that are hard to compare. If there is software that comes with meaningful defaults for many standard lattice tasks, and is customizable and extensible to individual lattice solutions, then there is hope that the community can agree on problem instances. Ideally, problem instances should cover deployed cryptosystems or model experimental ones, such that they embody a meaningful benchmark for new designs.

Originally, fplll tried to provide such algorithms with reasonable speed. Recently, the developers broadened their goals and now try to fill gaps in cryptanalytic research. Concretely, fplll now strives for speed through low-level optimizations and by implementing diverse techniques from the literature, hence catching up with the state of the art. Additionally, it can easily be tweaked on a high algorithmic level with the Python layer fpylll, while still exploiting all the available optimized routines that boost performance. One can argue that, together with diverse lattice challenges, this project helps to benchmark and compare various efforts to cryptanalyze cryptographic primitives used in the construction of cryptosystems.

A couple of Lattice Challenges have been proposed (SVP-, Ideal-, LWE- and Ring-Challenges) and it seems that researchers also test their code on these instances, which aids a comparison of approaches.

Having these conveniently accessible, together with high-level, fast lattice operations, makes it possible to quickly try out a new idea or a slightly different approach, which saves time and hopefully makes researchers willing to share their tweaks and algorithmic tricks more often in the future.

### The workshop

To come back to the start, the fplll-days are meant to be a hands-on, work-oriented workshop that enables direct discussions with core developers, with the goal of improving existing functions and the many algorithms involved. The general idea behind this meeting is to optimize often-used routines and to make the library more user-friendly and accessible, for example to cryptanalysts.

By using code profiling tools, performance and memory usage bottlenecks can be spotted in a first overview, which makes it possible to direct efforts to where they might lead to significant speed-ups. After discussing known issues and useful features, this workshop tries to provide implementations of numerically stable algorithmic variants that push the dimension LLL can handle (like Givens rotations while resorting only to machine floating-point types), of sophisticated pruning strategies to speed up enumeration, and of sieving algorithms, all of them promising new directions for finding short vectors faster.

It is exciting to join and shape such a project, so let's hope that many of the interesting projects that were started and delegated here will be completed during this week, and that further interested researchers will turn into active users, join the party and come up with meaningful, reproducible research results. Remember that Newton achieved progress because he "has seen further, by standing on the shoulders of giants"; you too are encouraged to become active, using an already established framework!

## Thursday, July 6, 2017

### A Brief Survey of Physical Attacks on Cryptographic Hardware

Previously, the topic of side-channel attacks (SCA) was covered on this blog. These attacks are very popular, for they can be mounted using very cheap equipment and do not necessarily require a high level of expertise. Hence, SCA are widely accessible and present a common danger. As a result, they are well researched, and various countermeasures have been developed. Still, they are just a small part of the stack of physical attacks. Figure 1 crudely depicts this colorful “stack”. The one thing all physical attacks have in common is the assumption that the attacker must gain physical access to the target device and retain it for a certain amount of time. In the remainder of this post, a brief survey of these attacks will be given. More detailed descriptions will be provided in a series of posts that will follow.

Figure 1: Stack of Physical Attacks

#### Invasiveness
The first segregation is based on the “invasiveness”. Invasive attacks entail breach of target’s packaging, or its surrounding enclosure. This is often a very delicate process which often requires expensive equipment and a high level of expertise. Since the breach is destructive by nature, it can be easily detected by subsequent users — if the chip itself was not destroyed in the process that is. The goal of this breach is to gain access to internal state of a chip. Commonly attackers target on-chip busses or storage elements, which may contain sensitive intermediaries of cryptographic computations or keys themselves. Aforementioned enclosures are a privilege of expensive devices, often called Hardware Security Modules (HSMs). HSMs may cost tens of thousands of Euros, and are envisioned to provide secure computational environments at high speeds. Apart from restricting access to the chip using sturdy build and “tamper-proof” latches and locks, enclosures are frequently equipped with seals and coatings that are supposed to witness any foul play that may have taken place. Additionally, tamper detection measures may be built in, envisioned to void all sensitive information at the first glimpse of attacker’s activities. Hence, invading these juggernauts is commonly more expensive and time consuming. Unfortunately, market-share of HSMs compared to bare smart-cards and RFIDs is neighboring negligible, especially with the rise of the IoT.

On the contrary, non-invasive adversaries do not cause any structural damage to packaging nor enclosures. They interact with the target device using its existing interfaces, and mediums that require no mechanical interaction with the device. They are virtually free, but may require significant expertise of attackers.

#### Activeness
The second segregation is based on the “activeness” of the attacker. Active attacks entail the induction of computational (logical) or structural changes in the target chip. When we talk about computational changes, a very common example is Fault Injection (FI) attacks. There are two phases to FI attacks: fault injection during the execution of the targeted algorithm, and the analysis based on the observations of faulty outputs. A common method for altering a device’s execution is called clock glitching. Namely, by introducing a premature edge on the clock signal, the attacker violates the device’s critical path. As a result, incorrect values are captured in the device’s registers. Alternatively, faults can be induced by shooting a laser beam with enough power to change the state of the device, while allowing it to remain operational. Here, any data or control register falls under “state of the device”. For example, the round counter, commonly used in implementations of block ciphers, is a favored target for such faults. Active attacks may require a higher level of technical skill and a more sophisticated setup.

On the contrary, passive adversaries may only observe the device’s execution, while interacting through its predefined interfaces. Well-known SCA fall under this category. These attacks are well researched, and can be mounted using very cheap equipment. Developed techniques (e.g., Mutual Information Analysis) are extremely powerful, and once incorporated in the attacker’s setup can be reproduced quite trivially. Consequently, although they entail only limited exposure of the device, they pose a serious threat, for they are very accessible even to attackers with modest capabilities.

#### The Reality
Activeness and invasiveness are two orthogonal properties, resulting in a total of four possibilities (although I find that the existence of “invasive and passive attacks” calls for a philosophical debate). Unfortunately, the situation in practice is much more complex than that. Firstly, attackers are likely to use combined attacks. For example, FI + SCA may be a very powerful combination. Additionally, the distinction mentioned above is not that binary. Rather, along each of the two orthogonal axes there are many shades. For example, faults can be injected in some chips by applying laser beams to their packaging (non-invasive), while others may be shielded from such beams (hence they have to be attacked invasively).

Consequently, there exists a myriad of possible attack variations. Moreover, even if we fix a certain extreme (let us say a passive, non-invasive CPA), the quality of the measurement setup plays a very significant role. A 500 Euro oscilloscope can hardly match its 30000 Euro counterpart. Ultimately, there are no upper bounds to the power of a skilled invasive attacker performing a battery of active and passive attacks, apart from temporal and financial constraints.

Taking all of the above into account, choosing a set of countermeasures is a difficult task (let alone implementing them properly). Bear in mind that these countermeasures do not come for free. They may significantly increase the price of devices, reducing the profit margins severely. Therefore, there are no silver bullets in protection against physical attacks. In other words, in practice security engineers work to demotivate attackers with high probability. They try to stop “the attacker of interest”, rather than stopping all attacks. To achieve this, the first step is identifying potential attackers. This process is often called profiling, and in a nutshell I would describe it as follows. Please note that this is a gross simplification of the problem, meant to depict the general idea. No distinction is made between fixed (price of the setup) and recurring (every time the attack is mounted) costs, nor between temporal and financial costs. Lastly, please note that the value of assets is heavily simplified as well, for the sake of avoiding a philosophical discussion yet again.

#### Manufacturer’s Dilemma
Assume that a device $D$, which costs $d$ to manufacture, protects assets worth $x$, and features a countermeasure $C$ that costs $c$ to deploy. We may consider $D$ to be secure against an attacker $A$, who can mount a successful attack at a cost $a$ (which includes $A$’s investment in development of expertise), as long as

$$x < a + \mu_A,$$

$\mu_A$ being the attacker’s profit margin. In other words, if the cost $a$ is high enough, the attacker cannot obtain the desired amount of profit for the given assets. On the other hand, a manufacturer $M$ that produces $D$ wants to sell it for a price $m$ such that

$$m > d + c + \mu_M,$$

$\mu_M$ being $M$’s profit margin. In other words, the price $c$ of deploying countermeasures directly cuts into the manufacturer’s profits. Looking at these inequalities, it seems that there is no dilemma at all. Nevertheless, the cost of an attack depends on the selection of a countermeasure, i.e.,

$$a = f(c).$$

Assuming that an increase in $c$ leads to an increase in $a$, by applying some high school math (readers are welcome to play with it), we see that the selection of $C$ must be performed based on the value of the assets it protects. A more detailed discussion on this topic will be given in one of the following posts.
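The "high school math" can be sketched as follows, assuming strict inequalities $x < a + \mu_A$ and $m > d + c + \mu_M$, and assuming that $f$ is strictly increasing and therefore invertible (both are assumptions of this sketch). Substituting $a = f(c)$ into the attacker's condition and solving both conditions for $c$ gives

$$f(c) > x - \mu_A \;\Longleftrightarrow\; c > f^{-1}(x - \mu_A), \qquad c < m - d - \mu_M,$$

so a suitable countermeasure exists only if

$$f^{-1}(x - \mu_A) < c < m - d - \mu_M.$$

The lower bound grows with the value $x$ of the protected assets, while the upper bound is fixed by the price the market accepts; the more valuable the assets, the narrower the window in which the manufacturer can choose $C$.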
In conclusion, physical attacks are a great threat. As the IoT progresses and the number of ubiquitous devices increases, their potential impact may only grow. Deploying devices that protect assets against physical attacks is a complex problem, which demands bespoke solutions, tailored to individual use cases.

## Friday, June 23, 2017

### Algorithm and Key Size Document

The ECRYPT Algorithm and Key Size document is probably the most high impact output from our ECRYPT projects. It is referenced and used throughout the world, to guide the uses of cryptography in practice. The current version of the document can be found here

We are requesting input for the next edition of this document. To do this we have created a Slack channel where people can debate inputs.

We encourage everyone to get involved by sending us your email so we can add you. Once added you can add other people to the channel as you see fit. Please email Nigel Smart or Saartje Verheyen to be added if you do not know someone who is already involved.

At the end of September we will freeze the discussion and start the process of incorporating all the suggestions into the final document.

If you have contributed to the Slack discussion in a positive manner, we will list you as a contributing author on the final document. That way you get to claim you have contributed to a high impact document (carrot); if you do not contribute, however, then you cannot complain if we say something you disagree with (stick).

Of course in the end it is a community effort, and in case of disagreement the editors will need to take one side or another.

## Friday, June 16, 2017

### Boomerang Attacks

In cryptography, a boomerang attack is a method of cryptanalysis that is based on differential cryptanalysis.

Boomerang attacks were first introduced by Wagner and allow an adversary to concatenate two high probability differential trails to attack a cipher. This is especially useful if there is a lack of long differentials with sufficient probabilities. The adversary can therefore decompose the encryption function $F$ in two subciphers $f$ and $g$ such that $F = f \circ g$. Then the adversary can search for high probability trails $\Delta \rightarrow \Delta^*$ with probability $p$ for  $f$ and  $\nabla \rightarrow \nabla^*$ with probability $q$ for $g$. The differential trails can then be combined in a chosen plaintext/adaptive chosen ciphertext attack to mount a boomerang distinguisher and then a key recovery attack based on this distinguisher to recover the secret key.

 A boomerang distinguisher

The basic attack works as follows:

1. The adversary chooses a random plaintext $X_1$ and calculates $X_2 = X_1 \oplus \Delta$.
2. The adversary requests the ciphertexts for $X_1$ and $X_2$ from an encryption oracle, which are $Y_1 = F(X_1)$ and $Y_2 = F(X_2)$.
3. Calculate ciphertexts $Y_3 = Y_1 \oplus \nabla$ and $Y_4 = Y_2 \oplus \nabla$.
4. Request the decryptions of $Y_3$ and $Y_4$ to obtain $X_3 = F^{-1}(Y_3)$ and $X_4 = F^{-1}(Y_4)$.
5. If the difference between $X_3$ and $X_4$ is the same as between $X_1$ and $X_2$, namely $\Delta$, we obtain a correct quartet $(X_1, X_2, X_3, X_4)$.
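The five steps above can be sketched as a small counting experiment. The toy 16-bit cipher used here is hypothetical and deliberately weak: it only XORs keys and rotates bits, so every differential holds with probability 1 and every quartet comes back correct. It serves to show the oracle queries, not a realistic target; all names, keys and differences are made up for illustration.

```python
import random

BLOCK_BITS = 16
MASK = (1 << BLOCK_BITS) - 1
K1, K2 = 0x3A5C, 0xC0DE          # hypothetical round keys


def rotl(x, r):
    """Rotate a 16-bit word left by r positions."""
    return ((x << r) | (x >> (BLOCK_BITS - r))) & MASK


def encrypt(x):
    """Toy cipher F: key XOR, rotation, key XOR (affine over GF(2))."""
    return rotl(x ^ K1, 3) ^ K2


def decrypt(y):
    """Inverse of the toy cipher."""
    return rotl(y ^ K2, BLOCK_BITS - 3) ^ K1


def count_right_quartets(enc, dec, delta, nabla, trials, seed=0):
    """Run the boomerang procedure `trials` times and count correct quartets."""
    rng = random.Random(seed)
    right = 0
    for _ in range(trials):
        x1 = rng.getrandbits(BLOCK_BITS)      # step 1: random plaintext pair
        x2 = x1 ^ delta
        y1, y2 = enc(x1), enc(x2)             # step 2: encryption oracle
        y3, y4 = y1 ^ nabla, y2 ^ nabla       # step 3: shift both ciphertexts
        x3, x4 = dec(y3), dec(y4)             # step 4: decryption oracle
        if x3 ^ x4 == delta:                  # step 5: did the boomerang return?
            right += 1
    return right


hits = count_right_quartets(encrypt, decrypt, delta=0x0001, nabla=0x8000, trials=100)
print(hits)
```

For a real cipher the returned count would be compared with the number of correct quartets expected from an ideal primitive.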

Calculating a correct quartet requires an attacker to consider both plaintext pairs $(X_1, X_2)$ and $(X_3, X_4)$ and results in a total probability of $(pq)^2$.
For an attack to succeed, the probability of the boomerang distinguisher must satisfy $pq > 2^{-n/2}$, where $n$ is the block size. For $N$ plaintext pairs, an adversary expects about $N\cdot(pq)^2$ correct quartets in an attack, while there are only $N\cdot2^{-n}$ correct quartets for an ideal primitive.