Thursday, December 29, 2016

[33C3 TLDR] PUFs, protection, privacy, PRNGs

Pol van Aubel gave an excellent introduction to Physical(ly) Unclonable Functions from both a modern and historical perspective. If you want to know how to exploit intrinsic random properties of all kinds of things, give this talk a shot and don't be afraid of all the footnotes.

Takeaway: PUFs are kinda neat and if you're interested check out this talk.

[33c3 TLDR] Decoding the LoRa PHY


Matt reverse engineered the PHY layer of the proprietary and patented LoRa technology and even wrote a GNUradio plug-in for it.

He also gave an excellent introduction to the basics of the LoRa technology.
In fact, it was in many ways an introduction to the basics of radio technology in general, and I highly recommend watching his talk.

Video

[33c3 TLDR] Making Technology Inclusive Through Papercraft and Sound

A talk by bunnie introducing their Love to Code program, which teaches children (and adults too) to code using cheap papercraft-inspired devices and tools. The argument is that the diversity problem that exists in coding can perhaps be fixed by starting at the earliest stages, i.e., starting with children.

From a technical perspective they had to solve some interesting problems to make the technology fast, cheap and usable. In particular, they have a pretty awesome technique using sound to upload code to the microcontrollers they use.

Takeaway: Pretty cool technology for a societally relevant goal. Check it out.

[33c3 TLDR] Tapping into the core

An interesting talk about how to abuse the USB-to-JTAG bridge available in newer Intel processors to attack computers through their USB port.

It is not entirely clear to me whether this is "just" another case of somebody forgetting to turn off a debug facility and leaving it as a security hole, or whether there are actually cases in which it cannot be disabled.

Video




[33c3 TLDR] Wheel of Fortune

Another talk about bad RNGs in embedded devices, this time discussing the challenges of embedded systems a bit more generally.

The takeaway might be that these devices really need a hardware RNG even if it costs a little bit extra.

(Similar to Mathy's talk, maybe only watch one of them.)

Video

[33c3 TLDR] Do as I Say not as I Do: Stealth Modification of Programmable Logic Controllers I/O by Pin Control Attack

This talk was about how to abuse a design peculiarity of PLCs that can be used to change the behavior of the controllers simply by changing the configuration of their I/O pins.

For example, switching a pin from output to input will suppress all writing to that pin without any feedback.

Video

[33c3 TLDR] A Tale of Two Skype Calls

Earlier today we had two great talks focusing on the aftermath of the Snowden revelations:
- 3 Years After Snowden: Is Germany fighting State Surveillance?
- The Untold Story of Edward Snowden’s Escape from Hong Kong

First of all, a chilling report on the state of the German surveillance machine, detailing how the German intelligence agencies have gotten more money, more capabilities (under the law) and are processing more data, often without good supervision. It is only because of Germany's attempt at an inquiry, which the two journalists giving the talk recorded, that we know some of the details of the mass surveillance apparatus.

At the end, instead of the regular Q&A, key witness #1 came and joined us over Skype: Edward Snowden.

Takeaway: things are getting worse, not better and we need to do more to combat this.

--

Second, a talk detailing the story of the people that kept the very same Edward Snowden safe during his stay as a refugee in Hong Kong, detailing their particular brand of self-sacrifice. Being refugees in Hong Kong, they are forced to live in squalor without rights. Government support was stripped away once their identities were known. These people, these guardian angels, now subsist on the donation efforts of third parties. Further attempts are being made to get these 7 people out of the country and to a safer place.

For the Q&A, Vanessa, one of the people that helped Snowden, skyped in and answered some of the questions pertaining to her recollection of the events as well as her current status.

Takeaway: the seven refugees that helped Edward Snowden stay hidden during his stay in Hong Kong need assistance. If you want to you can donate at the following places:
https://fundrazr.com/snowdensguardians
https://www.gofundme.com/snowdenguardians

Tuesday, December 27, 2016

[33c3 TLDR] How Do I Crack Satellite and Cable Pay TV?

Chris had too much time on his hands and, over a span of three years, reverse engineered set-top boxes (the Digicipher 2 system) with relatively low-cost equipment. In the end he was able to extract the long-term keys and understand the crypto in use (DES with XOR pre-processing of the data/key inputs), so that he can now watch satellite and cable pay TV for free (though according to him there is nothing interesting to see there).

Takeaway: Reverse engineering was made simpler by the relatively old design with few countermeasures. Never underestimate the effort people are willing to invest to break your system.

Watch his talk on YouTube.

[33c3 TLDR] Everything you always wanted to know about Certificate Transparency -- Martin Schmiedecker

Martin gave a short review of problems with CAs in recent history and then presented the basics of Certificate Transparency, Google's proposed solution to the problem.

The takeaway message is that Certificate Transparency is great but to make it really effective clients need to check the public logs before accepting a certificate.

Martin tried to introduce the technical concepts but could not go into more detail. For a nice start, watch the Video.

[33c3 TLDR] Predicting and Abusing WPA2/802.11 Group Keys -- Mathy Vanhoef

Mathy gave an excellent talk showing ways to attack wireless security (WPA2) via a group key used for broadcast transmissions. This leads to decryption of all traffic. It included introductions to all the relevant parts of the system and a demo.

The take-away message is actually rather short:
people standardize and implement bad RNGs, and if those are used to generate cryptographic secrets, this leads to vulnerabilities.

It included a few more tricks and gimmicks and I recommend you watch the Video.

Sunday, December 18, 2016

Lightweight Cryptography

The need for lightweight cryptography emerged from the lack of primitives capable of running in constrained environments (e.g. embedded systems). In recent years, there has been an increased focus on the development of small computing devices with limited resources. Current state-of-the-art cryptographic primitives provide the necessary security when put into constrained environments, but their performance may not be acceptable.
Over the last decade, numerous lightweight cryptographic primitives have been proposed that provide a performance advantage over conventional cryptographic primitives. These primitives include block ciphers, hash functions, stream ciphers and authenticated ciphers.

Lightweight Block Ciphers


A block cipher provides a keyed pseudo-random permutation that can be used in a more complex protocol. It should be impossible for an adversary with realistic computing power to retrieve the key, even if the adversary has access to a black-box model of the cipher where she is able to encrypt/decrypt plaintexts of her choice. Block ciphers are normally based on either Substitution-Permutation Networks or Feistel networks.

The components and operations in a lightweight block cipher are typically simpler than in conventional block ciphers like AES. To compensate for the simpler round function, the number of rounds is simply increased to achieve the same security. As memory is very expensive, implementing an S-Box as a look-up table can lead to a large hardware footprint; therefore, lightweight block ciphers usually have small (e.g. 4-bit) S-Boxes. To save further memory, lightweight block ciphers use small block sizes (e.g. 64 or 80 bits rather than 128). Another option is to reduce the key size to less than 96 bits for efficiency. Simpler key schedules improve the memory, latency and power consumption of lightweight block ciphers.
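
As an illustration of these choices (a 4-bit S-box, a cheap bit permutation and many simple rounds), here is a toy SPN sketch in Python. The S-box is the one specified for PRESENT, but the bit permutation, key handling and number of rounds are simplified placeholders, so this is only a sketch of the structure and not a real cipher.

    # Toy substitution-permutation network on a 64-bit state, illustrating
    # typical lightweight design choices: a 4-bit S-box and a cheap bit
    # permutation, iterated over many rounds.
    SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # PRESENT's 4-bit S-box

    def sub_nibbles(state):
        # apply the 4-bit S-box to each of the 16 nibbles of the state
        out = 0
        for i in range(16):
            out |= SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
        return out

    def permute_bits(state):
        # toy bit permutation: bit i moves to position (13 * i) mod 64
        out = 0
        for i in range(64):
            out |= ((state >> i) & 1) << ((13 * i) % 64)
        return out

    def toy_spn_encrypt(block, round_keys):
        state = block
        for rk in round_keys[:-1]:        # many cheap rounds instead of few complex ones
            state = permute_bits(sub_nibbles(state ^ rk))
        return state ^ round_keys[-1]     # final key whitening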

Recently, Banik et al. showed an implementation of AES requiring just 2227 GE, with a latency of 246/326 cycles per byte for encryption and decryption respectively. In 2007, Bogdanov et al. proposed PRESENT, an ultra-lightweight block cipher based on a Substitution-Permutation Network that is optimized for hardware and can be implemented with just 1075 GE. PRESENT is bit-oriented and has a hardwired diffusion layer. In 2011, Guo et al. designed LED, an SPN cipher that is heavily based on AES. Interesting in that design is the lack of a key schedule: LED-64 applies the same 64-bit key to the state every four rounds, while the 128-bit version simply divides the key into two 64-bit sub-keys and adds them to the state alternately.

Reducing latency is the main goal of the block cipher PRINCE. There is no real key schedule in PRINCE either, as it derives three 64-bit keys from a 128-bit master key. PRINCE is a reflection cipher, meaning that the first rounds are the inverse of the last rounds, so that decryption under a key $k$ is identical to encryption under the key $k\oplus\alpha$, where $\alpha$ is a constant based on $\pi$. The block cipher Midori was designed to reduce energy consumption when implemented in hardware. It has an AES-like structure and a very lightweight, almost-MDS involution matrix M in the MixColumn step.

In 2013, Simon and Speck were designed by the NSA. Both ciphers perform exceptionally well in both hardware and software and were recently considered for standardization; however, in contrast to all the other ciphers, no security analysis or design rationale was given by the designers. Simon is hardware-oriented and based on a Feistel network with only the following operations: and, rotation, xor. Speck is software-oriented and based on an ARX construction with the typical operations: addition, rotation, xor. Very recently, SKINNY has been published to compete with Simon. The main idea behind the design is to be as efficient as possible without sacrificing security. SKINNY is a tweakable block cipher based on the Tweakey framework, with components chosen for the good compromise they provide between cryptographic properties and hardware cost.
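
To make the ARX structure concrete, the sketch below shows a Speck-style round on 32-bit words using the rotation amounts 8 and 3 published for the larger Speck variants; the key schedule and the specified number of rounds are omitted, so this illustrates only the round function and is not a usable cipher.

    # One round of a Speck-style ARX cipher on 32-bit words: the only
    # operations are modular addition, rotation and xor.
    MASK = 0xFFFFFFFF

    def rotr(w, r):
        return ((w >> r) | (w << (32 - r))) & MASK

    def rotl(w, r):
        return ((w << r) | (w >> (32 - r))) & MASK

    def speck_like_round(x, y, k):
        x = (rotr(x, 8) + y) & MASK   # modular addition provides the non-linearity
        x ^= k                        # round-key addition
        y = rotl(y, 3) ^ x            # cheap linear mixing
        return x, y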

Lightweight Hash Functions


The desired properties of a hash function are:
  •  Collision resistance: it should not be feasible to find $x$ and $y$ such that $H(x) = H(y)$
  •  Preimage resistance: given a hash $h$, it should be infeasible to find a message $x$ such that $H(x) =  h$
  •  Second preimage resistance: given a message $y$, it should be infeasible to find $x\neq y$ such that $H(x) = H(y)$
Conventional hash functions such as SHA-1, SHA-2 (i.e. SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224 and SHA-512/256) and SHA-3 (i.e. Keccak) may not be suitable for constrained environments due to their large internal state sizes and high power consumption. Lightweight hash functions differ in various aspects, as they are optimized for smaller message sizes and/or have smaller internal states and output sizes.

PHOTON is a P-Sponge based, AES-like hash function with an internal state size of 100 to 288 bits and an output of 80 to 256 bits. Its state update function is close to the LED cipher. In 2011, Bogdanov et al. designed SPONGENT, a P-Sponge whose permutation is a modified version of the block cipher PRESENT. SipHash has an ARX structure, is inspired by BLAKE and Skein, and has a digest size of 64 bits.
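
Since both PHOTON and SPONGENT are P-Sponges, a minimal sketch of the sponge mode itself may help. The permutation, rate, capacity and padding rule below are placeholders chosen for brevity and do not correspond to either design.

    # Minimal sponge-mode sketch: absorb blocks into the rate part of the state,
    # permute, then squeeze out the digest. The permutation is a stand-in for
    # the real (AES-like or PRESENT-like) P-sponge permutation.
    RATE, CAPACITY = 1, 7            # toy parameters: 8-byte state, 1-byte rate

    def toy_permutation(state):
        x = int.from_bytes(state, "big")
        for _ in range(12):          # placeholder mixing, NOT a secure permutation
            x = ((x * 0x9E3779B97F4A7C15) ^ (x >> 29)) & ((1 << 64) - 1)
        return x.to_bytes(8, "big")

    def sponge_hash(message, out_len=8):
        state = bytes(RATE + CAPACITY)
        message += b"\x01" + bytes((-len(message) - 1) % RATE)   # simple padding
        for i in range(0, len(message), RATE):                   # absorbing phase
            block = message[i:i + RATE] + bytes(CAPACITY)
            state = toy_permutation(bytes(a ^ b for a, b in zip(state, block)))
        digest = b""
        while len(digest) < out_len:                             # squeezing phase
            digest += state[:RATE]
            state = toy_permutation(state)
        return digest[:out_len]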

Lightweight Stream Ciphers


A stream cipher generates a key stream from a given key $k$ and an initialization vector $IV$, which is then simply xored with the plaintext to generate the ciphertext. It must be infeasible for an attacker to retrieve the key, even if a large part of the keystream is available to the attacker. The eSTREAM competition, which concluded in 2008, aimed to identify a portfolio of stream ciphers suitable for widespread adoption. Three of the finalists are suitable for hardware applications in restricted environments.

Grain was designed by Hell et al. and is built from two feedback shift registers, one linear and one non-linear, combined through a non-linear output function. Grain requires 3239 GE in hardware. MICKEY v2 is based on two LFSRs (linear feedback shift registers) that are irregularly clocked and requires 3600 GE in hardware. Trivium, another finalist of the eSTREAM competition, has three shift registers of different lengths and requires 3488 GE in hardware.
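
To illustrate the common building block, here is a toy Fibonacci LFSR used as a keystream generator. A single LFSR is linear and therefore cryptographically weak on its own, which is why the designs above add non-linearity through irregular clocking, non-linear feedback or non-linear output functions. The tap positions below are an illustrative maximal-length choice, not taken from any of these ciphers.

    # Toy 16-bit Fibonacci LFSR keystream generator and xor-based encryption.
    def lfsr_keystream(seed, nbits):
        state = seed & 0xFFFF
        for _ in range(nbits):
            bit = state & 1
            feedback = bit ^ ((state >> 2) & 1) ^ ((state >> 3) & 1) ^ ((state >> 5) & 1)
            state = (state >> 1) | (feedback << 15)
            yield bit

    def xor_encrypt(key16, plaintext):
        ks = lfsr_keystream(key16, 8 * len(plaintext))
        out = bytearray()
        for byte in plaintext:
            k = 0
            for _ in range(8):            # assemble 8 keystream bits per byte
                k = (k << 1) | next(ks)
            out.append(byte ^ k)
        return bytes(out)

    # encryption and decryption are the same operation
    ct = xor_encrypt(0xACE1, b"hello")
    assert xor_encrypt(0xACE1, ct) == b"hello"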

Lightweight Authenticated Ciphers


The aim of authenticated encryption is to provide confidentiality and integrity (i.e. data authenticity) simultaneously. In 2014, the CAESAR competition (Competition for Authenticated Encryption: Security, Applicability and Robustness) started with the aim of identifying a portfolio of authenticated ciphers that offer advantages over AES-GCM and are suitable for widespread adoption.
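
As a point of reference for the interface these candidates target, the snippet below performs authenticated encryption with AES-GCM using the third-party Python package cryptography (assumed to be installed); a CAESAR candidate exposes essentially the same encrypt/decrypt-with-associated-data API.

    # AES-GCM baseline: encrypt with associated data, decrypt verifies the tag.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=128)
    nonce = os.urandom(12)                    # 96-bit nonce, never reuse per key
    aesgcm = AESGCM(key)

    ct = aesgcm.encrypt(nonce, b"attack at dawn", b"header")   # ciphertext || tag
    pt = aesgcm.decrypt(nonce, ct, b"header")                  # raises if forged
    assert pt == b"attack at dawn"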

ACORN is based on six LFSRs and has a state size of 293 bits. ACORN provides full security for both encryption and authentication, and its hardware cost should be close to that of Trivium according to the designers. SCREAM is a tweakable block cipher in the TAE (Tweakable Authenticated Encryption) mode and is based on the LS-designs Robin and Fantomas. Bertoni et al. designed Ketje, a lightweight variant of SHA-3 (i.e. Keccak) that relies on the sponge construction in the MonkeyWrap mode; the internal state size is only 200 bits for Ketje-Jr and 400 bits for Ketje-Sr. Ascon is an easy-to-implement, sponge-based authenticated cipher with a custom-tailored SPN network. It is fast in both hardware and software, even with added countermeasures against side-channel attacks. Another CAESAR candidate is the 64-bit tweakable block cipher Joltik, which is based on the Tweakey framework. Joltik is AES-like and uses the S-Box of Piccolo and the round constants of LED. Its MDS matrix is involutory and non-circulant.

Tuesday, December 13, 2016

Living in a Data Obsessed Society (Part II)


This post is about the event “Living in a Data Obsessed Society” that took part in Bristol on Friday the 2nd of December. If you missed the first part, you can find it here.
The view that decisions based on data are neutral, efficient and always desirable was probably the most challenged during the evening. A very disturbing example was the result of a recent investigation by ProPublica, which found that machine learning software used in the United States to assess the risk that past offenders will recidivate was twice as likely to mistakenly flag black defendants as being more prone to repeat an offence. The same AI-led software was also twice as likely to incorrectly flag white defendants as low risk.
The reasons for these biases remain mostly unknown, as the company responsible for these algorithms keeps them as a trade secret. Even assuming racism was not explicitly hardcoded in the algorithms, Charlesworth and Ladyman reminded the audience that not only humans, but also algorithms, make decisions in conditions that are far from ideal. Machines learn on datasets that are chosen by engineers, and the choice of which data is used in this step also teaches the underlying biases to the algorithms. Predictive programs are at most as good as the data on which they are trained, and that data has a convoluted history that does include discrimination.
There is, then, a big risk of perpetuating a vicious cycle in which people keep being discriminated against because they were in the past and are in the present. Moreover, the entrenching of these biases could be seen as something correct, simply because of the unquestioned ideological authority that we grant to the rule of data and algorithms in these processes. Because, as the speakers pointed out at several moments, algorithms in general, and machine learning in particular, do not eliminate the human component: humans and human-related data take part in the design steps, but the centralized power behind these choices remains mostly hidden and unchallenged.
Whereas existing social institutions usually have a better or worse designed procedure for objecting when errors happen, it is hard to dispute the outputs of Artificial Intelligence. In the words of Ladyman, the usual answer would be something like "we used our proprietary algorithms, trained with our private dataset X of this many terabytes, and on your input the answer was Bla". He also pointed out the huge shift of power this means from the classical institutions to computer scientists working at big companies, and their probable lack of exposure to the social sciences during their studies and careers.
Even setting aside the important aspect of dispute resolution, the question should be framed as whether these algorithms make "better" decisions than humans, as biases are inherent to both. The meaning of "better" is, though, another choice that has to be made, and it is a moral one. As such, society should be well informed about the principles embedded in these new forms of authority, in order to establish what aligns with its goals. A very simple example was given: is an algorithm that declares guilty people guilty with 99% probability, while putting a lot of innocent people in jail, more desirable than one that almost never condemns the innocent but lets numerous criminals go free? The same kind of reasoning can be applied to the discrimination between defendants discussed above.
Someone might conclude from this text that the speakers were some kind of dataphobes who would not like to see any automation or data ever involved in decision making, but that was not the case. At several points, all of them praised in one way or another the benefits of data and its use in our societies, from healthcare to science to the welfare state. The questioning was about the idealization of data gathering and its algorithmic processing as a ubiquitous, ultimate goal for all aspects of our lives; a critique of its authority based on objectivity claims and, ultimately, a review of the false dichotomy between the moral and empiricist traditions. Charlesworth celebrated the dialogue that has taken place between lawyers and computer scientists during the last years. Nevertheless, he urged expanding the horizons of these discussions to include philosophers and the rest of the social sciences as well. Cristianini, being a professor of Artificial Intelligence himself, concluded his intervention with a concise formulation: whatever we use these technologies for, humans should be at the centre of them.

The moral dimension of cryptography, and its role in rearranging what can be done, by whom and from which data, might come to mind for several readers as well. Since the Snowden revelations, this has become a more and more central debate within the community, which I find best crystallized in the IACR Distinguished Lecture given by Rogaway at Asiacrypt 2015 and its accompanying paper. Technologies such as Fully Homomorphic Encryption, Secure Multiparty Computation or Differential Privacy can help mitigate some of the problems related to data gathering while retaining its processing utility. All in all, one of Ladyman's conclusions applies here: we should still question who benefits from this processing, and how. A privacy-preserving algorithm that incurs (unintended) discrimination is not a desirable one. As members of society, and as researchers, it is important that we understand the future roles and capabilities of data gathering, storage and processing, from mass surveillance, to biased news, to other forms of decision making.

Tuesday, December 6, 2016

Living in a Data Obsessed Society (Part I)

It’s Friday evening and inside the Wills Memorial Building in Bristol almost two hundred people have gathered to hear about and discuss our lives in a data obsessed society. The opportunities, as well as some of the risks we are sleepwalking into by living in the current data-driven world, are discussed by three people with three different perspectives: Nello Cristianini, professor of Artificial Intelligence; Andrew Charlesworth, reader in IT and Law; and James Ladyman, professor of the Philosophy of Science.

Dr. Abigail Fraser, a researcher in Epidemiology, introduces the event: it is commonly agreed that different data infrastructures are mediating, or even leading, a broad range of aspects of our daily lives, from communications to transactions to decision-making. Fraser herself recognizes the crucial role of big volumes of data for her work, without which she would not be able to conduct much of her research, if any. But there is also another side to data gathering and to the decisions being made based on it, whether directly by humans or with the mediation of human-designed algorithms. There seems to be an unquestioned acceptance of a data ideology, according to which all decisions made based on data are inherently neutral, objective and effective. Nevertheless, the growing evidence provided by the speakers disproved many of these assumptions.

The evolution of the data infrastructure

Cristianini is the first to mention during the evening the enormous amount of research that has been put into hardware and software since his teenage years in the early 80s. At that time, he would have had to instruct his computer beforehand for it to be able to answer who, for example, Alexander V was. Today, he just has to take out his smartphone and speak to it. Various internals based on Artificial Intelligence (AI) then recognize his speech, look for the answer amongst huge volumes of data and, in a matter of seconds, produce an answer for anyone listening.

Computers can now learn, and what Cristianini demonstrated live was possible not only because of our technological evolution, but also because of the state of our social institutions. We have overcome great challenges thanks to AI and the availability of large volumes of data, but we have not developed our laws, learning and morals at the same pace. A series of mostly unquestioned social consequences of the gathering and use of data then became the focus of the event, especially in cases where there is neither individual consent nor an informed social consensus. Concrete cases included people being progressively fed only the kind of news they want to hear, the exercise of mass surveillance by several governments, staff selection processes, and banks and insurance companies making decisions using data from sources as personal as social network profiles.

What can go wrong with massive data gathering?

Charlesworth elaborated the most on the matter of massive data gathering and storage. One of his main concerns was the evolution of law within the United Kingdom, particularly now that the Investigatory Powers Act has been passed by the House of Lords. One of the most infamous parts of this law requires internet providers to keep a full record of every site their customers have visited, for the government's use.

Different aspects discussed in relation to this issue were the lack of evidence for the effectiveness of mass surveillance, its potential to cause alienation in communities, and the lack of adequate safeguards to protect against misuse or exfiltration of the surveillance data. Evidence supporting these views, amongst others, was presented to the House of Commons by one of the Public Bill Committees earlier this year, but without much success.

On other grounds, Charlesworth was also concerned by the oft-repeated line that there is nothing to worry about, as these kinds of practices were already taking place outside the rule of law and public scrutiny. The legitimization of mass state surveillance and the extension of powers recently approved are, he explained, something to worry about. After enumerating some of the forty-seven agencies that can access this data without any warrant, he pointed out that the large number of data breaches suffered by UK institutions in the past does not help with trusting this data collection and storage.

Ladyman expanded on the chilling effect on people of being, or feeling, watched. Society is damaged when individuals have to think constantly about the consequences of their communications or web browsing, as it has been shown that people tend to avoid researching or debating matters that challenge established social norms and factual powers. A number of sociologists, philosophers and lawyers have addressed this question, but perhaps one of the most famous experiments is "The Chilling Effects of Surveillance", conducted by Gregory White and Philip Zimbardo.

During White and Zimbardo’s study, participants were divided into two groups: one that was told their statements would be shared with the police for training purposes, and one that was not. Fear of authorities equating dissent with wrongdoing, a threat, or any kind of punishable behaviour made the former group more likely to condemn marijuana usage and to use second- and third-person pronouns in their language. Whereas 77% of the group not given a surveillance warning advocated for legalization, only 44% of those who were told that their data would be shared with the police did so. Moreover, 31% of the monitored participants looked for approval from the people conducting the experiment when talking, whereas only 7% of the other group did so.

Last but not least, Ladyman remarked that the way in which we gather, store and represent data, as well as how we decide which of this information is to be retained, is not neutral. One of his side comments, of which the cryptological community is very aware, was that removing a potentially sensitive or discriminatory entry from a database does not mean that it cannot be inferred through correlations within the same dataset or with the help of some external information.

The next post will continue with the other main concern of the discussion: are decisions based exclusively on data always objective and desirable?

Monday, November 28, 2016

Verifiable Random Functions

Pseudorandom functions (PRFs) are a central concept in modern cryptography. A PRF is a deterministic keyed primitive with the guarantee that a computationally bounded adversary, given access to the PRF's outputs at chosen points, cannot distinguish the PRF from a truly random function mapping between the same domain and range. The pseudorandomness property of the well-known candidates follows from various computational hardness assumptions. The first number-theoretic pseudorandom function was proposed in the seminal work of Goldreich, Goldwasser and Micali [1]. Since then, PRFs have found applications in the construction of both symmetric and public-key primitives, and various number-theoretic constructions have targeted better efficiency or enhanced security guarantees. Recent developments include key-homomorphic PRFs as well as functional PRFs and their variants.


A related, and more powerful, concept is the notion of verifiable random functions (VRFs), proposed in 1999 by Micali, Rabin and Vadhan [2]. VRFs are in some sense comparable to their simpler counterparts (PRFs), but in addition to the output values, a VRF also produces a publicly verifiable proof $\pi$ (therefore, there is also a public verification key). The purpose of the proofs $\pi$ is to efficiently validate the correctness of the computed outputs. The pseudorandomness property must hold exactly as in the case of a PRF, with the notable difference that no proof is released for the challenge input during the security experiment. Since the introduction of VRFs, constructions achieving adaptive security, exponentially large input spaces or security under standard assumptions have been proposed. However, constructing a VRF that meets all of these constraints at the same time proved to be a challenging academic exercise. Finally, progress in this direction was made with the work of Hofheinz and Jager [3], who solved the open problem via a construction meeting all the requirements. A major obstacle to achieving adaptive security under a static assumption was the lack of techniques for removing "q-type assumptions" (assumptions whose "size" is parameterized by "q" rather than being fixed) from the security proofs of the previous constructions.


An adaptive-secure VRF from standard assumptions

The scheme by Hofheinz and Jager has its roots in the VRF proposed by Lysyanskaya [4]. In Lysyanskaya's construction, for an input point $x$ in the domain of the VRF, represented in binary as $x = (x_1,\dots, x_n)$, the corresponding output is the encoding $y = g^{\prod_{i=1}^{n} a_{i, x_i}}$, which for brevity we denote $[\prod_{i=1}^{n} a_{i, x_i}]$. The pseudorandomness proof requires a q-type assumption. To remove it, the technique proposed in the Hofheinz and Jager paper replaces the set of scalar exponents $\{a_{1,0}, \dots, a_{n,1}\}$ with corresponding matrix exponents; a pairing is also needed for verifiability. Therefore, a point $x = (x_1,\dots, x_n)$ in the domain of the VRF is mapped to a vector of group elements. Informally, the construction samples $\vec{u} \leftarrow \mathbb{Z}_p^{k}$ ($p$ prime) and a set of $2n$ square matrices over $\mathbb{Z}_p^{k \times k}$: \( \left\{ \begin{array}{cccc} M_{1,0} & M_{2,0} & \dots & M_{n,0}\\ M_{1,1} & M_{2,1} & \dots & M_{n,1} \end{array} \right\} \). The secret key consists of the plain values $\{ \vec{u}, M_{1,0}, \dots, M_{n,1} \}$, while the verification key consists of the (element-wise) encodings of the entries of the secret key. To evaluate at a point $x$, one computes $VRF(sk, x = (x_1,\dots, x_n)) = \Big[ \vec{u}^t \cdot \prod_{i=1}^{n}M_{i,x_i} \Big]$. The complete construction requires an extra step that post-processes the output of the chained matrix multiplications with a randomness extractor; we omit this detail. A vital observation is that the multi-dimensional form of the secret key makes it possible to discard the q-type assumption and replace it with a static one.
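
To make the evaluation step concrete, here is a toy sketch of the chained matrix product over $\mathbb{Z}_p$; the group encoding, the pairing-based proof and the randomness extractor are omitted, and all parameters are tiny, made-up values.

    # Toy sketch of the evaluation skeleton: u^T * M_{1,x_1} * ... * M_{n,x_n} mod p.
    import random

    p, k, n = 1_000_003, 3, 8                      # toy prime, matrix dimension, input bits
    random.seed(0)

    u = [random.randrange(p) for _ in range(k)]    # secret vector u
    M = [[[[random.randrange(p) for _ in range(k)] for _ in range(k)]
          for _ in range(2)] for _ in range(n)]    # secret matrices M[i][b]

    def vec_mat(v, A):                             # row vector times matrix mod p
        return [sum(v[r] * A[r][c] for r in range(k)) % p for c in range(k)]

    def eval_skeleton(x_bits):
        v = u
        for i, b in enumerate(x_bits):             # chain one matrix per input bit
            v = vec_mat(v, M[i][b])
        return v                                   # published as the encoding [v], plus a proof

    print(eval_skeleton([0, 1, 1, 0, 1, 0, 0, 1]))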




Proof intuition

The intuition for the proof can be summarized as follows:

  • during the adaptive pseudorandomness game, a property called "well-distributed outputs" ensures that the evaluation queries produce encoded vectors $[\vec{v}] = [\vec{u}^t \cdot (\prod_{i=1}^{n}M_{i,x_i})]$ such that every vector except the one corresponding to the challenge belongs to a special designated rowspace. This is depicted in the figure, where the right side shows the evaluation of the challenge input $x^*$, while the left side shows the evaluation at $x\ne x^*$.
  • to enforce well-distributed outputs, the matrices $M_{i,x_i}$ must have special forms; for simplicity, consider $x^* = (0, 1, \dots, 0)$ of Hamming weight 1 and the corresponding secret key \( \vec{u}^t, \left\{ \begin{array}{cccc} U_{1,0} & L_{2,0} & \dots & U_{n,0} \\ L_{1,1} & U_{2,1} & \dots & L_{n,1} \end{array} \right\} \), where $L_i$ stands for a matrix of rank $n-1$ (lower rank), while $U_i$ denotes a full-rank matrix that maps between RowSpace($L_{i-1}$) and RowSpace($L_{i}$). RowSpace($L_0$) is a randomly chosen subspace of dimension $n-1$, and $\vec{u} \not\in$ RowSpace($L_0$) with overwhelming probability. Also, notice that the full-rank matrices occur in the positions corresponding to $x^*$, in order to ensure well-distributed outputs.
  • finally, and maybe most importantly, one must take into account that the distribution of the matrices used to ensure well-distributed outputs must be indistinguishable from the distribution of uniformly sampled square matrices. A hybrid argument is required for this part of the proof, with the transitions between the games based on the $n$-Rank assumption (from the Matrix-DDH family of assumptions).

References

1. Goldreich, O., Goldwasser, S., & Micali, S. (1986). How to construct random functions. Journal of the ACM (JACM), 33(4), 792-807.

2. Micali, S., Rabin, M., & Vadhan, S. (1999). Verifiable random functions. In 40th Annual Symposium on Foundations of Computer Science (FOCS 1999) (pp. 120-130). IEEE.

3. Hofheinz, D., & Jager, T. (2016, January). Verifiable random functions from standard assumptions. In Theory of Cryptography Conference (pp. 336-362). Springer Berlin Heidelberg.

4. Lysyanskaya, A. (2002, August). Unique signatures and verifiable random functions from the DH-DDH separation. In Annual International Cryptology Conference (pp. 597-612). Springer Berlin Heidelberg.


Thursday, November 24, 2016

Recent research on attacks that "use a little leakage"

In this post, I'll summarize three fantastic talks from what was one of my favourite sessions of the ACM CCS Conference last month (session 11B: "Attacks using a little leakage"). The setting common to the three papers is a client-server set-up, where the client outsources the storage of its documents or data to a server that's not entirely trusted. Instead of using the server just for storage, the client wants to outsource some computations on this data too—keyword searches or database queries, for example. The problem is to find the right cryptographic algorithms that allow efficiently making these searches and queries while minimizing the information leaked from communication between the client and server, or from the server's computations.

  1. Generic Attacks on Secure Outsourced Databases (paper, talk)
    Georgios Kellaris (Harvard University), George Kollios (Boston University), Kobbi Nissim (Ben-Gurion University) and Adam O'Neill (Georgetown University)
  2. The Shadow Nemesis: Inference Attacks on Efficiently Deployable, Efficiently Searchable Encryption (paper, talk)
    David Pouliot and Charles V. Wright (Portland State University)
  3. Breaking Web Applications Built On Top of Encrypted Data (paper, talk)
    Paul Grubbs (Cornell University), Richard McPherson (University of Texas, Austin), Muhammed Naveed (University of Southern California), Thomas Ristenpart and Vitaly Shmatikov (Cornell Tech)

1. Generic Attacks on Secure Outsourced Databases

This paper presents two attacks, one exploiting communication volume, and one exploiting the access pattern. "Generic" means that they apply to any kind of encryption, not necessarily deterministic, or order-preserving, or even property-preserving.

  • Setting: outsourced relational database (a collection of records, where each record has some number of attributes)
  • Adversary's goal: reconstruction of the attribute values for all records
  • Model: the database system is static (read-only; queries don't modify records), atomic (each record is encrypted separately), non-storage-inflating (no dummy records), and has records of a fixed length
  • Assumptions about data: attribute values are ordered (say, numerically or alphabetically)
  • Assumptions about queries: uniform range/interval queries (for an attribute with $N$ possible values, there are $\binom{N}{2}+N$ possible queries)
  • Adversary's knowledge: the set of possible attribute values
  • Adversary's capabilities: passive; observes queries and either the access pattern (which encrypted records are returned) or the communication volume (how many encrypted records are returned)

The big (possibly unreasonable) assumption is that the range queries must be uniform. However, as the authors point out, the attack model is otherwise weak and the security of an outsourced database shouldn't depend on the query distribution.

Attack using access pattern leakage

The adversary observes at least $N^2\cdot \log(N)$ queries, so with high probability, all of the $\binom{N}{2} + N$ queries have occurred (proof: ask a coupon collector). For each query, it sees which encrypted records were returned. Suppose the database has $n$ records, and assign a binary indicator vector $\vec{v} \in \{0,1\}^n$ to each query. A 1 in position $i$ means that the $i$th encrypted record was returned as part of the query results. The Hamming weight of this vector is the number of matching records.

The attack works as follows.

  1. Find one of the endpoints. Pick a query that returned all but one of the records (i.e., a query whose indicator vector has Hamming weight $n-1$). Let $i_1$ be the index of the 0 in that vector.
  2. For $j=2$ to $n$, find a query whose indicator vector has Hamming weight $j$, with $j-1$ of the 1's in positions $i_1,\ldots,i_{j-1}$. Let $i_j$ be the index of the remaining 1 in the vector.

This algorithm puts the encrypted records in order, up to reflection, which is enough for the adversary to reconstruct the plaintext! The paper also describes a reconstruction attack for the case where not all values of the domain occur in the database; it requires seeing more queries, about $N^4\cdot \log(N)$.
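
The following toy simulation sketches this reconstruction. It assumes all attribute values are distinct and, instead of observing roughly $N^2\cdot\log(N)$ uniform queries, simply enumerates every possible range query; the parameter values are made up for illustration.

    # Simulate the access-pattern leakage of all range queries over a small
    # database, then recover the record order (up to reflection) from it.
    import random

    N, n = 20, 8
    values = random.sample(range(1, N + 1), n)          # hidden attribute values
    results = set()
    for lo in range(1, N + 1):                          # all N(N+1)/2 range queries
        for hi in range(lo, N + 1):
            results.add(frozenset(i for i, v in enumerate(values) if lo <= v <= hi))

    def reconstruct(results, n):
        full = frozenset(range(n))
        # step 1: a query matching all but one record reveals an endpoint
        near_full = next(r for r in results if len(r) == n - 1)
        order = [next(iter(full - near_full))]
        # step 2: grow the order one record at a time
        for j in range(2, n + 1):
            r = next(r for r in results
                     if len(r) == j and set(order).issubset(r))
            order.append(next(iter(r - set(order))))
        return order

    order = reconstruct(results, n)
    print([values[i] for i in order])                   # sorted or reverse-sorted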

Attack using only communication volume

The main idea of this attack is to determine the distance between "adjacent" values. The adversary observes $q \geq N^4\cdot \log(N)$ queries and, for each query, sees how many encrypted records were returned. In the case where not all of the $N$ possible values occur in the database, the attack works as follows (it is much simpler when they do). Let $r_i$ be the hidden value of record $i$ (i.e., its position in the range 1 to $N$).

  1. Determine the approximate number of distinct queries that returned each possible number of records. Let $c_j$ be a count of the number of queries that returned $j$ records, for $0 \leq j \leq n$, so $\sum_{j=0}^n c_j = q$. Scale all of the $c_j$s by $\frac{N(N+1)}{2}\cdot \frac{1}{q}$.
  2. Let $d_i = r_{i+1} - r_i$ be the difference in position between the $(i+1)$st record and the $i$th record, for $i=1$ to $n-1$, when the $n$ records are sorted. To keep the notation simple, define $d_0 = r_1$ and $d_n = N + 1 - r_n$. Note that $c_j = \sum_{i=1}^{n+1-j} d_{i-1}\cdot d_{j + i-1}$ for $j=1$ to $n$, and $c_0 = \frac{1}{2} ( \sum_{i=0}^n {d_i}^2 - (N+1) )$.
  3. Factor a cleverly-constructed polynomial to recover the $d_i$s. Replace $c_0$ by $2\cdot c_0 + N + 1$ and let $F(x) = \sum_{i=0}^{n-1} c_{n-i}\cdot x^{i} + \sum_{i=0}^{n} c_{i}\cdot x^{n+i}$. Then $F(x)$ factors as $d(x) \cdot d^R(x)$, where $d(x) = \sum_{i=0}^n d_{i}\cdot x^i$ and $d^R(x) = \sum_{i=0}^n d_{n-i}\cdot x^i$.
  4. Compute the attribute values from the $d_i$s: $r_1 = d_0$ and $r_i = r_{i-1} + d_{i-1}$ for $i=2$ to $n$.

The success of this algorithm depends on $F(x)$ having only one factorization into two such polynomials. Also, since factorization can be slow when there are many records in the database, the authors also tested a simple brute-force algorithm for checking the $d_i$s, and it performed better than factorization in their experiments.
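
As a sanity check of step 3 (with the summation limits as given above), the sketch below computes the counts $c_j$ directly from the gaps $d_i$ of a made-up toy database and verifies that $F(x)$ equals $d(x)\cdot d^R(x)$; sympy is assumed to be available.

    # Verify the polynomial identity behind the volume attack on a toy example.
    from sympy import symbols, expand, factor_list

    x = symbols("x")
    N = 10
    r = [2, 3, 7, 9]                                    # hidden (sorted) values
    n = len(r)
    d = [r[0]] + [r[i] - r[i - 1] for i in range(1, n)] + [N + 1 - r[-1]]

    c = [0] * (n + 1)
    c[0] = sum(di * di for di in d)                     # already the adjusted 2*c_0 + N + 1
    for j in range(1, n + 1):
        c[j] = sum(d[i - 1] * d[j + i - 1] for i in range(1, n + 2 - j))

    F = sum(c[n - i] * x**i for i in range(n)) + sum(c[i] * x**(n + i) for i in range(n + 1))
    dp = sum(d[i] * x**i for i in range(n + 1))
    dr = sum(d[n - i] * x**i for i in range(n + 1))
    assert expand(F - dp * dr) == 0                     # F factors as d(x) * d^R(x)
    print(factor_list(F))                               # factoring F recovers the gaps d_i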

2. The Shadow Nemesis: Inference Attacks on Efficiently Deployable, Efficiently Searchable Encryption

This paper presents attacks on two efficiently deployable, efficiently searchable encryption (EDESE) schemes that support full-text search. The first scheme they attack is ShadowCrypt, a browser extension that transparently encrypts text fields in web applications without relying on something like client-side JavaScript code. The second is Mimesis Aegis, a privacy-preserving system for mobile platforms that fits between the application layer and the user layer.

These two EDESE schemes work by appending a list of tags (also called "opaque identifiers") to each message or document, corresponding to the keywords it contains. You can think of each tag as a PRF with key $k$ applied to the keyword $w$, so $t=PRF_k(w)$.
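
For intuition, such a tag could be instantiated with HMAC-SHA256 playing the role of the PRF; the schemes under attack define their own constructions, so this is only an illustration.

    # A keyword tag as a PRF output: t = PRF_k(w), here instantiated with HMAC-SHA256.
    import hmac, hashlib

    def tag(key: bytes, keyword: str) -> bytes:
        return hmac.new(key, keyword.encode(), hashlib.sha256).digest()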

  • Setting: (web) applications that store sets of documents/messages in the cloud and allow keyword search on these documents
  • Adversary's goal: determine which keywords are in a document
  • Model: each document/message has a set of tags corresponding to the keywords it contains
  • Assumptions about tags: the same keyword occurring in multiple documents yields the same tag
  • Adversary's knowledge: an auxiliary dataset providing keyword frequencies and keyword co-occurrence statistics
  • Adversary's capabilities: passive; sees encrypted documents/messages and lists of tags

What I found most interesting about this work was that the problem of determining which keywords are associated with which documents was reduced to problems on graphs!

The weighted graph matching problem is the following: given two graphs $G=(V_G,E_G)$ and $H=(V_H,E_H)$ on $n$ vertices, each with a set of edge weights $w(E): E \rightarrow \mathbb{R}^{\geq 0}$, determine the mapping $\sigma: V_G \rightarrow V_H$ that makes the graphs most closely resemble each other. (This type of "matching" is about matching a vertex in one graph to a vertex in another graph; it has nothing to do with maximal independent sets of edges.) There are a few different possibilities for what it means for the graphs to "most closely resemble each other"; the one used in the paper is minimizing the Euclidean distance between the adjacency matrix of $G$ and the permuted adjacency matrix of $H$.
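
For intuition, here is a brute-force sketch of that objective on toy co-occurrence matrices: it enumerates all permutations and keeps the one minimizing the Frobenius (Euclidean) distance, which is only feasible for tiny graphs; the paper relies on scalable approximation algorithms instead, and the matrices below are made-up values.

    # Exhaustive weighted graph matching: minimize ||A - P B P^T||_F over permutations P.
    from itertools import permutations
    import numpy as np

    def match(A, B):
        n = len(A)
        best, best_perm = float("inf"), None
        for perm in permutations(range(n)):           # try every mapping sigma
            P = np.eye(n)[list(perm)]                 # permutation matrix for sigma
            dist = np.linalg.norm(A - P @ B @ P.T)
            if dist < best:
                best, best_perm = dist, perm
        return best_perm, best

    # toy co-occurrence graphs: A from auxiliary keyword data, B from observed tags
    A = np.array([[0, .5, .1], [.5, 0, .3], [.1, .3, 0]])
    B = np.array([[0, .3, .5], [.3, 0, .1], [.5, .1, 0]])
    print(match(A, B))                                # best vertex correspondence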

The labelled graph matching problem is just an extension of the weighted graph matching problem where each vertex also has a weight.

The two graphs that will be matched to learn which keywords are in which documents are $G$, whose vertices correspond to the $n$ most frequent keywords in the auxiliary data, and $H$, whose vertices correspond to the $n$ most frequent tags in the target data. The weight of an edge between two vertices is the probability that those two tags (or keywords) occur in the same encrypted document (or document in the auxiliary data set). To form an instance of the labelled graph matching problem, the vertices are additionally assigned weights corresponding to their frequencies in the target data set or in the auxiliary data set.

The authors implemented their weighted graph matching and labelled graph matching attacks on two data sets, based on the 2000-2002 Enron email corpus and the Ubuntu IRC chat logs from 2004-2012. Their attacks accurately recovered hundreds of the most frequent keywords—see the paper for more details about the results. And while you're checking it out, read the authors' observation about how critical it is to properly choose Bloom filter parameters when using them to replace the usual inverted index structure in a searchable encryption scheme.

3. Breaking Web Applications Built On Top of Encrypted Data

This paper is cheekily titled to reflect the particular system that it attacks—it's from the paper "Building Web Applications on Top of Encrypted Data Using Mylar". Mylar is an extension to Meteor, a JavaScript web application platform. The result is a complete client-server system that claims to protect users' data on the way to and at the server. Mylar's back-end is an instance of MongoDB, a non-relational database where the data is a collection of documents, and each document has a number of key-value pairs.

The main ingredient in Mylar is multi-key searchable encryption (MKSE), which allows users to share data. The MKSE scheme used in Mylar was built to satisfy two properties: data hiding and token hiding. One of the results of this paper is proving by counterexample that a scheme that's both data-hiding and token-hiding does not necessarily provide indistinguishable keyword encryptions and keyword tokens.

One of the things I like about this paper is the taxonomy it introduces for real-world adversarial models. A snapshot passive attack is a one-time, well, snapshot of the data stored on a server. A persistent passive attack involves observing all data stored on a server and all operations the server performs during a certain time period. An active attack is one where anything goes—the server can misbehave or even collude with users.

The main part of the paper evaluates the security of a few Mylar apps—one that was already available (kChat, a chat app), and three open-source Meteor apps that were ported to Mylar. The three apps are MDaisy, a medical appointment app, OpenDNA, an app that analyzes genomic data to identify risk groups, and MeteorShop, an e-commerce app. Before summarizing some of the paper's results, it's important to understand principals, which in Mylar are units of access control and have a name and a key pair. Every document and every user has a principal, and a principal can also apply to multiple documents.

The paper's main findings are grouped into three categories: exploiting metadata, exploiting access patterns, and active attacks. First, here are some examples of exploiting metadata in Mylar:

  • The names of principals, which are unencrypted to facilitate verifying keys, can leak sensitive information. For example, in kChat, the names of user principals and chat room principals are simply usernames or email addresses and the chat room name.
  • Mylar's access graph, which records the relationships between users, access control levels, and encrypted items, can leak sensitive information. For example, in MDaisy, this access graph could reveal that a particular user (a doctor or other health care professional) regularly creates appointments and shares them with the same other user (a patient). A passive snapshot attacker could combine this leakage with knowledge of the doctor's speciality to infer that a patient is being treated for a certain condition.
  • The size of a MongoDB document associated to a user principal can leak sensitive information. In MDaisy, each user, whether staff or patient, has its own document. However, staff have only their names stored, while patients have additional information stored, such as date of birth.

Exploiting access patterns of searchable encryption is not new, and Mylar didn't claim to hide them, so I won't say anything further about this. The active attacks, however, are interesting, because Mylar claimed to protect data against an actively malicious server as long as none of the users who can access it use a compromised machine. This claim is false, and the paper describes attacks that arise from properties such as the server being able to forcibly add users to a "tainted" principal. After a user is added to a principal, it automatically computes and sends to the server a "delta" value that adjusts search tokens so documents encrypted with different keys can be searched. Once the malicious server receives a user's delta value for a tainted principal (whose keys it knows), it can then search for any keyword in any of the user's documents!

These three talks are evidence that we still have a long way to go to get secure, usable encryption that still preserves some functionality, whether in web applications, or outsourced databases or documents. It is hard to get things right. Even then, as the authors of the Shadow Nemesis paper point out, careful tuning of the Bloom Filter parameters thwarts their attacks on Mimesis Aegis, but there's no reason to believe that it's enough to defend against any other attack. I hope that these recent papers will inspire more secure constructions, not only more attacks.

P.S. There was some good discussion about attack vs. defence papers in the CCS panel discussion on the impact of academic security research (video).

Sunday, November 6, 2016

Hardware Canaries: Arbiters of Proper Randomness

Random numbers are widely used in cryptographic systems. Often they are essential in the very first steps of protocols (e.g., the generation of session keys and challenges), or used repeatedly throughout the entire communication (e.g., as random nonces). Therefore, failure of random number generators (RNGs) may obliterate security measures. The scale and severity of such failures is illustrated by a study conducted by Bernstein et al., which shows how improper operation of RNGs that had passed FIPS 140-2 Level 2 certification led to the factorization of 184 distinct 1024-bit RSA keys used in Taiwan's national "Citizen Digital Certificate" scheme.

Citizen Digital Certificate is your internet ID for bilateral identification while you are exchanging information on the internet. 

In other words, a team of several researchers could easily track, surveil, or impersonate 184 Taiwanese citizens, simply because random numbers were not generated properly. 

This is only one of many examples in the public-key setting. As alarming as it is, it may not seem of great importance to those who believe their governments will hire capable security experts, or simply to those who trust their browsers while on the internet. Moreover, the generation of these random numbers is often out of the hands of users, in well guarded server rooms, with considerable computational power available to check "the quality" (statistical properties) of the obtained numbers. And should all of that fail, there is an organization behind these numbers that can be held liable.

On the other hand, emerging Internet of Things (IoT) technologies are bringing numerous devices (e.g., wireless sensors, drones) into our environment. While hand-held "smart" devices have become the most ordinary, maybe even mundane, objects, the "smart" hype is expanding to cars and houses. In this shroud of "smart" that surrounds us, we start to perceive secure communication more personally: once you can use your smartphone to control the heating furnace in your home, the consequences of unauthorized access may cause more fire than some leaked photos.

Additionally, unlike in the traditional IT setting where ample computational power is available, IoT devices are often constrained in terms of resources, have to be very cheap, and must operate reliably. Lastly, attackers can easily obtain physical access to deployed IoT devices. Hence, attackers may perform numerous side-channel attacks, and may manipulate a device's environment (e.g., temperature) or tamper with its operation (e.g., shoot lasers at it).

Various mitigation techniques are studied to protect against these attacks. For example, sturdy casings, protective meshes and seals for tamper detection provide a mechanical layer of security; they are expensive, and easily avoided by skilled adversaries. On the other end of the spectrum, various circuit techniques, often called "secure logic styles", such as Sense-Amplifier Based Logic (SABL), are used. These are very difficult to implement in practice due to variations in silicon manufacturing processes. Based on secret sharing, Threshold Implementations (TI) provide provable security against the widespread Differential Power Analysis (DPA). The only downfall of TI, as of all masking schemes, is that the masks must be random and uniformly distributed.

Therefore, the security of a device depends on cryptographic primitives (algorithms and protocols), which rely on random numbers, as well as on their secure implementation. Since many countermeasures rely on the use of random masks, the secure implementation also often depends on the RNG. These dependencies are depicted in Figure 1.

Figure 1: Security dependencies.

Practical problems stem from the circular dependency between the RNG and the secure implementation. Namely, should an attacker target the RNG, causing it to malfunction or becoming able to predict its output, all physical security would be circumvented.

Consequently, the outputs of RNGs must be reliably unpredictable. To ensure this, RNG outputs must be evaluated using prescribed statistical tests. Furthermore, in order to fit the IoT framework, these tests have to be performed in a lightweight, low-latency and highly reliable manner; storing gigabytes of data and computing over them is not a valid strategy.
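
As an illustration of such lightweight, on-the-fly checks, the sketch below implements the two continuous health tests described in NIST SP 800-90B, the repetition count test and the adaptive proportion test; the cutoff values are placeholders rather than values derived from a concrete entropy claim and false-alarm rate.

    # Continuous health tests in the spirit of NIST SP 800-90B.
    def repetition_count_test(samples, cutoff=8):
        run, last = 0, None
        for s in samples:
            run = run + 1 if s == last else 1
            last = s
            if run >= cutoff:             # the same value repeated too many times
                return False              # raise an alarm
        return True

    def adaptive_proportion_test(samples, window=512, cutoff=400):
        for i in range(0, len(samples) - window + 1, window):
            w = samples[i:i + window]
            if w.count(w[0]) >= cutoff:   # one value dominates the window
                return False
        return True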

Figure 2 depicts two solutions proposed by Rožić et al., based on the NIST SP 800-90B model of the entropy source. Both are variations of the same idea: to use canary numbers. This concept is already used in software security, and has been used much longer in the mining industry for early warning of threats: caged canaries were brought along to warn miners of poisonous gasses in coal mines, since the birds were much more susceptible to them. Hence small doses of gas, fatal to the canaries, could be detected and the miners warned. Similarly, Rožić et al. propose an additional output of the RNG that has significantly worse characteristics. Whether the canary numbers stem from a different noise source, or are obtained by weaker processing of the same noise as the random numbers, their quality will decline abruptly once the device is tampered with. In their experiments the authors tampered with the chip temperature, cooling it down quickly to change the physical processes that produce the entropy.

Figure 2: RNG architectures with canary numbers.

Figure 3 depicts experimental results obtained on an elementary ring-oscillator based RNG. It clearly shows that the quality of the canary numbers decays much more significantly, hence an attack can be detected while the RNG is still producing reliably unpredictable random numbers.


Figure 3: Test results of the elementary ring-oscillator based RNG (from the original paper).



Friday, October 28, 2016

soft skills, hard facts

Within the ECRYPT-NET program we have the chance to learn effective communication of scientific results. Although these methods are often regarded as "soft skills", suggesting a lesser value compared to the classical "hard skills", they are versatile: obtaining them means being able to apply the techniques to both written and oral presentations.

In particular, at the School on Design for a secure Internet of Things our oral presentation training started with practical sessions. Classical topics like how to deal with stage fright and non-verbal communication such as body language were demonstrated, and different ways to get to know and connect with the audience were addressed.
Among other things, we spoke in detail about posture, use of voice and articulation on the one hand, and about style, the amount of information per presentation slide, and how to present in a lively way while still giving a memorable take-home message on the other.

Later this year, during the Cloud Summer School in Leuven, we had the chance to actively participate in a practical training session on academic writing, learning to convey future research findings concisely yet accurately. Our trainer taught us how to chop up long, unwieldy sentences into more easily digestible ones by reorganizing competing foci and effectively conveying the essence. Worked examples from our area, and even some excerpts of our own texts, definitely helped in understanding the concepts.
There, all of us ECRYPT-NET early stage researchers gave our (possibly first, big) presentations and received feedback from the board and our individual supervisors. Both the combined and, especially, the individual feedback highlighted good choices as well as less fortunate formulations.

As we will have more and more tasks to work on as a group over the course of our careers, developing an efficient work-flow for joint documents is not to be considered wasted effort and time. Thus, soon we will have the chance to obtain more soft skills that can turn out to be useful in a dynamic setting.
I want to take the opportunity to organize a workshop for us ESRs as a follow-up event to the upcoming Passwords2016 conference. The plan is to have a day of sessions and meetings in Bochum on Thursday, 8.12.:

It will be co-located with the 11th International Conference on Passwords, where you can listen to the Long Story of Password (Re-)use, learn Password Hardening techniques, debate whether new Password Management strategies based on Episodic Memories or Through Use of Mnemonics are revolutionary, and find out about the Hopes and Fears of Mobile Device Security.
About 100 international participants from academia, companies, ... will come and follow the conference program, which is a mix of classical talks and more practical "hackers sessions", showing attacks in a real-world setting.

Unfortunately there is no detailed program available yet, but there will be soon.
However, registration is open now until 2016-11-30.
For those interested in joining this event, try to save some € by taking advantage of the early bird offer (only until 2016-10-31).

Ruhr-University seen from Beckmanns Hof
All other fellows are warmly invited to come on Wednesday, 7.12, after the Passwords2016 talks, for a first ECRYPT-NET get-together in Bochum and our program tailored to the upcoming writing project.
Thursday, 8.12, is the main day, which we will start off with a workshop followed by project-oriented brainstorming and group discussions.
We will have expert training on collaborative creative writing and group communication (in English), working in a nice atmosphere close to Bochum's botanical gardens.

In short, the focus of this workshop lies on how to get things done, collaboratively. The technical side will be covered by a git repository containing the project's skeleton, which I set up on a RUB server; one can get a git account there even as a RUB-external.
If you need a refresher, reread Ralph's blog post about git basics.

I am looking forward to welcoming all of you here.

Monday, October 17, 2016

Supersingular isogeny Diffie-Hellman

Supersingular isogeny Diffie-Hellman (SIDH) is a relatively recent proposal for a post-quantum secure key-exchange method which I will briefly outline here.

Most post-quantum cryptography is based on lattices, codes, multivariate quadratics or hashes. See Gustavo's post for more.

A fifth category seems to be slowly establishing itself: isogeny-based crypto.


These schemes are based on the difficulty of finding an isogeny between two supersingular elliptic curves. Isogenies are particular rational maps between elliptic curves that are also group homomorphisms with respect to the groups of points on the curves.

The original proposal [Stolbunov et al., 2006] was to use the problem of finding isogenies between ordinary elliptic curves but this system was shown to be breakable with a quantum computer [Childs et al., 2010]. After that it was proposed to use supersingular elliptic curves instead [De Feo et al., 2011].

SIDH currently has significantly worse performance than lattice-based key-exchange schemes but offers much smaller key sizes. Compared to Frodo it is 15 times slower, but the key size is only one twentieth; compared to NewHope it is over 100 times slower at less than one third of the key size. This can be relevant in scenarios like IoT, where cryptographic computations require orders of magnitude less energy than actually transmitting the data via the radio.

Although finding isogenies between curves is difficult, Vélu's formulas allow calculating an isogeny with a given finite subgroup as its kernel. All such isogenies are identical up to isomorphism.

Now, starting with a public curve \(E\) that is a system parameter, both parties, Alice and Bob, generate an isogeny with kernel \(\langle r_a \rangle\) and \(\langle r_b\rangle\) respectively. For now, let \(r_a, r_b\) be arbitrary generators of subgroups. This gives us two isogenies
$$ \phi_a: E \rightarrow E_a\\
\phi_b: E \rightarrow E_b.$$
Now we would like to exchange $E_a, E_b$ between the two parties and somehow derive a common $E_{ab}$ using the kernels we already used. Unfortunately, $r_a$ does not even lie on $E_b$, so we have a problem.

The solution that was proposed by De Feo et al. is to use 4 more points $P_a, P_b, Q_a, Q_b$ on $E$ as public parameters, two for each party. This allows constructing
$$r_a = m_aP_a + n_aQ_a\\
r_b = m_bP_b + n_bQ_b$$ using random integers $m_a, n_a, m_b, n_b$ appropriate for the order.
Now, after calculating the isogenies $\phi_a, \phi_b$ the parties not only exchange the curves $E_a, E_b$ but also $\phi_a(P_b), \phi_a(Q_b)$ and $\phi_b(P_a), \phi_b(Q_a)$.
Looking at Alice's side, she can now calculate
$$m_a\phi_b(P_a)+n_a\phi_b(Q_a) = \phi_b(m_aP_a + n_aQ_a) = \phi_b(r_a)$$ and Bob can perform the analogous computation. Constructing another isogeny with kernel $\langle \phi_b(r_a) \rangle$ and $\langle \phi_a(r_b) \rangle$ respectively gives Alice and Bob two curves $E_{ba}, E_{ab}$ which are isomorphic, so their common j-invariant can be used as a shared secret.
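To recap the message flow in one place, here is my own condensed summary of the steps above, writing $E/\langle r \rangle$ for the image curve of the isogeny with kernel $\langle r \rangle$:
$$\begin{aligned}
\text{public parameters:}\quad & E,\ P_a, Q_a, P_b, Q_b\\
\text{Alice:}\quad & r_a = m_aP_a + n_aQ_a,\quad \phi_a: E \rightarrow E_a = E/\langle r_a\rangle\\
\text{Bob:}\quad & r_b = m_bP_b + n_bQ_b,\quad \phi_b: E \rightarrow E_b = E/\langle r_b\rangle\\
\text{Alice sends:}\quad & E_a,\ \phi_a(P_b),\ \phi_a(Q_b)\\
\text{Bob sends:}\quad & E_b,\ \phi_b(P_a),\ \phi_b(Q_a)\\
\text{Alice computes:}\quad & E_{ba} = E_b/\langle m_a\phi_b(P_a) + n_a\phi_b(Q_a)\rangle\\
\text{Bob computes:}\quad & E_{ab} = E_a/\langle m_b\phi_a(P_b) + n_b\phi_a(Q_b)\rangle\\
\text{shared secret:}\quad & j(E_{ab}) = j(E_{ba})
\end{aligned}$$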

I will leave you with this wonderfully formula-laden image from the De Feo et al. paper showing the protocol.


Monday, October 10, 2016

Quantum computation, algorithms and some walks.. pt.1

During the last meeting of the ECRYPT-NET project in Leuven, I gave an introduction to Quantum Algorithms. In the first part I explained quantum computation, since it is natural to build up a base first, and in this post I will explain it in more detail. First, we need to understand the concept of superposition in quantum physics. To do this, we go back to Schrödinger's idea, i.e., a cat that may be simultaneously both alive and dead. This phenomenon is called superposition, and quantum computers take advantage of it. If you are interested in how a quantum processor is physically implemented, have a look at [2].

Well, computers aren't made with cats inside. Computers use bits and registers to work. So, how is it possible to compute with a quantum computer? How is it possible to represent bits? How is it possible to take advantage of the superposition?

Quantum Notation

First, let's learn about the Dirac notation, that is commonly used in Quantum physics and in most of the literature about quantum computers.  Since Blogger doesn't allow me to create mathematical equations, I will get some help from physciense blog and pick some images from them:
 
Conventions for Dirac notation can differ slightly from source to source. If you want to read more about it, I selected some nice lecture notes in the following links: Lecture 1 and Lecture 2.

So, bra-ket notation is just vectors, and we are going to use it to represent our qubit (yes, that is what we call the bit of a quantum computer). In classical computers a bit is represented by 0 or 1, but in quantum computers it is a little different. We can represent the states as follows:
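Written out in this notation (a standard summary, independent of the images), the general state of a single qubit is
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle,\qquad \alpha, \beta \in \mathbb{C},\qquad |\alpha|^2 + |\beta|^2 = 1,$$
where $|\alpha|^2$ and $|\beta|^2$ are the probabilities of measuring 0 and 1 respectively.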

As the image shows, a qubit is 0, 1, or a superposition of 0 and 1. However, if we measure it, i.e., if we look at the qubit, we lose the superposition. In other words, the state collapses and we can no longer take advantage of the superposition. Just as classical computers have gates, quantum computers have gates too. One very famous gate is the Hadamard gate; it has the property of putting a qubit into a superposition state. We can see the action of this gate in the following image:
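To make the effect of the Hadamard gate concrete, here is a minimal numpy sketch (my own illustration, not code from the original post) of the gate acting on the basis states:

    import numpy as np

    # Computational basis states |0> and |1> as vectors.
    ket0 = np.array([1.0, 0.0])
    ket1 = np.array([0.0, 1.0])

    # The Hadamard gate.
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

    plus = H @ ket0    # (|0> + |1>)/sqrt(2)
    minus = H @ ket1   # (|0> - |1>)/sqrt(2)

    # Measuring in the computational basis: probabilities are squared amplitudes.
    print(np.abs(plus) ** 2)    # [0.5 0.5] -- equal chance of observing 0 or 1
    print(np.abs(minus) ** 2)   # [0.5 0.5]

    # H is its own inverse, so applying it twice restores the original state.
    print(H @ plus)             # approximately [1, 0], i.e. back to |0>

Note that the two superposition states give the same measurement statistics even though they differ in the sign (the phase) of one amplitude; this phase difference is exactly what the interference step below exploits.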

Quantum Algorithms

Now we know what a qubit is and how to operate on it, so we can move on to the next step and create some algorithms to solve problems. The most common and well-known example comes from Deutsch and Jozsa. It is known as the Deutsch-Jozsa problem and consists of:

  • Input: a function f: {0,1}^n → {0,1} that is promised to be either constant or balanced
  • Output: 0 if and only if f is constant
  • Constraint: f is given only as a black box
On a classical computer we can represent this problem as follows; note that a deterministic classical algorithm needs up to 2^(n-1)+1 queries to f in the worst case:
If we solve the problem with a quantum computer, we make exactly one query. The function f is implemented as a black box (an oracle) in the quantum computer:
However, the oracle alone is not enough to solve the problem. To solve it, we build a quantum circuit around it using Hadamard gates; the Hadamard gates are what let us take advantage of superposition. The circuit looks like this:

Now we have the circuit to determine whether a function is constant or balanced. But how do we evaluate this circuit? How does it work? To answer these questions, we work out the state of the qubit after each gate and then look at the final measurement. Suppose we have just one qubit, start it in state 0 and pass it through the first gate. We get this:
After this step the qubit is in a superposition state. Next we call our black box; its effect can be seen as:
In the third step we use interference to "reconstruct" the qubit. In the last step we apply the final gate and see what to expect when we measure:
In the end we have the final state: if the function is constant (f(0) = f(1)) the measurement yields 0, and if the function is balanced (f(0) ≠ f(1)) it yields 1. The algorithm generalizes to a system of n qubits. You can find more information in [1], and you can also check this class.
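For the curious, the whole single-qubit walkthrough above fits in a few lines of numpy. This is my own sketch using the phase-oracle formulation (the oracle maps $|x\rangle$ to $(-1)^{f(x)}|x\rangle$), which is equivalent to the usual circuit with an extra ancilla qubit via phase kickback:

    import numpy as np

    ket0 = np.array([1.0, 0.0])                   # |0>
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate

    def deutsch(f):
        # Phase oracle: |x> -> (-1)^f(x) |x>.
        oracle = np.diag([(-1.0) ** f(0), (-1.0) ** f(1)])
        state = H @ ket0             # 1. put the qubit into superposition
        state = oracle @ state       # 2. a single query to the black box
        state = H @ state            # 3. interference
        prob0 = abs(state[0]) ** 2   # 4. probability of measuring 0
        return "constant" if prob0 > 0.5 else "balanced"

    print(deutsch(lambda x: 0))  # constant -> measures 0
    print(deutsch(lambda x: 1))  # constant -> measures 0
    print(deutsch(lambda x: x))  # balanced -> measures 1

One call to the oracle suffices, whereas a classical algorithm needs two evaluations of f in this one-bit case.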

In my next blog entry, I will talk about Shor's and Grover's algorithms and random quantum walks. So far we have only seen the beginning of quantum algorithms; if you want more information, you can check the slides from my presentation in Leuven on my website.

[1] David Deutsch and Richard Jozsa (1992). "Rapid solutions of problems by quantum computation". Proceedings of the Royal Society of London A.
[2] Gambetta, Jay M., Jerry M. Chow, and Matthias Steffen. "Building logical qubits in a superconducting quantum computing system." arXiv preprint arXiv:1510.04375 (2015).