Sunday, February 28, 2016

Paris Crypto Day (Part I)

Following the tradition of organizing Crypto Days in New York, Boston, DC, Tel Aviv and the Bay Area, the crypto community of Paris has decided to start organizing similar events. The first one [1] took place on 16 February 2016, was hosted by ENS Paris and consisted of six talks given by senior researchers and PhD students.


The morning session was opened by Antoine Joux, whose presentation surveyed the key historical and recent approaches to solving the Discrete Logarithm Problem (DLP), emphasizing the progress that has been made on this problem over finite fields of small characteristic. The motivation for such a study resides in the large number of schemes based on the DLP (e.g. the Diffie-Hellman key exchange). In short, the DLP asks to find $x$ such that $g^x=h$, where $g$, $h$ and a description of a group $G$ are given as input. Here, the underlying group $G$ can be, for example, the multiplicative group of invertible elements in a finite field or the group of points on an elliptic curve.

The audience was guided through the following general and index-calculus methods of solving the DLP:

  1. Pohlig–Hellman [2]
  2. Baby-Step/Giant-Step [2]
  3. Pollard's rho
  4. Hellman–Reyneri
  5. Coppersmith's algorithm
before the quasi-polynomial heuristic for finite fields $F_{q^k}$ of small characteristic was described. When $q$ is of size at most comparable with $k$, the time complexity of the heuristic is $q^{O(\log q)}$. More details about the techniques used can be accessed here. The result has important implications for cryptographic schemes based on pairings over fields of small characteristic. It will be interesting to see whether this major breakthrough on the DLP will have implications for the problem of integer factorization.
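To make the generic methods above concrete, here is a minimal baby-step/giant-step sketch (method 2 in the list) in Python, for the multiplicative group of a small prime field; the prime, generator and exponent are toy values chosen purely for illustration.

from math import isqrt

def bsgs(g, h, p, order):
    # find x with g^x = h (mod p) in roughly sqrt(order) time and memory
    m = isqrt(order) + 1
    baby = {pow(g, j, p): j for j in range(m)}   # baby steps: g^j for j < m
    giant = pow(g, -m, p)                        # g^(-m) via modular inverse (Python 3.8+)
    y = h
    for i in range(m):                           # giant steps: compare h * g^(-i*m) against the table
        if y in baby:
            return (i * m + baby[y]) % order
        y = (y * giant) % p
    return None                                  # h is not a power of g

p = 1019                      # toy prime
g, x = 2, 345                 # 2 generates the full multiplicative group mod 1019
h = pow(g, x, p)
print(bsgs(g, h, p, p - 1))   # prints 345

Generic methods of this kind cost on the order of the square root of the group order; the index-calculus family and the quasi-polynomial heuristic exploit the structure of the field to do much better.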


Pierrick Meaux - PhD student at ENS Paris - brought a different topic to our attention during his morning session presentation. The subject of the talk was the EuroCrypt 2016 paper Pierrick worked on:

Towards Stream Ciphers for Efficient FHE with Low-Noise Ciphertexts (a joint work with A. Journault, F.-X. Standaert, C. Carlet)
One of the most interesting aspects of this work is that it combines the (mostly) theoretical paradigm of fully homomorphic encryption (FHE) with concrete symmetric constructions, aiming to bring FHE into the practical realm. The main result is a new symmetric construction designed to allow homomorphic operations on ciphertexts. The FHE paradigm is motivated by the ability to outsource storage and processing of encrypted data to a more computationally powerful third party, but first-generation FHE systems lack efficiency, so an inherent goal is to make such constructions time- and memory-efficient. A comparison between previous symmetric constructions used with FHE and the current work was given (summarized in the following table):

Algorithm       Reference         Multiplicative depth
AES-128         [GHS12]           40
SIMON-64/128    [LN14]            44
Prince          [DSE+14]          24
Kreyvium-12     [CCF+15]          12
LowMC-12        [ARS+15]          12
FLIP            presented work     4

The cipher, FLIP, is based on a new framework called a filter permutator. It has three main parts: a PRNG, a permutation generator and a filtering function. The proposed construction instantiates the PRNG with AES-128, uses the Knuth shuffle as the permutation generator and introduces a new filtering function. The filtering function was designed with minimality and analysability in mind, and is built from linear, quadratic and triangular functions.
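To give a feel for the filter permutator structure (not the actual FLIP instantiation), here is a toy Python sketch. The register size, the PRNG (Python's random module standing in for the AES-128-based generator) and the filtering function are all illustrative stand-ins; the toy filter merely mimics the linear/quadratic/triangular shape described above.

import random

N = 16                                   # toy key-register size

def knuth_shuffle(indices, rng):
    # Fisher-Yates/Knuth shuffle driven by the (toy) PRNG
    idx = list(indices)
    for i in range(len(idx) - 1, 0, -1):
        j = rng.randrange(i + 1)
        idx[i], idx[j] = idx[j], idx[i]
    return idx

def toy_filter(bits):
    # toy filter with a linear part, a quadratic part and a "triangular" part
    linear = bits[0] ^ bits[1] ^ bits[2]
    quadratic = (bits[3] & bits[4]) ^ (bits[5] & bits[6])
    triangular = bits[7] ^ (bits[8] & bits[9]) ^ (bits[10] & bits[11] & bits[12])
    return linear ^ quadratic ^ triangular

def keystream(key_bits, iv, nbits):
    rng = random.Random(iv)              # stand-in for the AES-based PRNG
    for _ in range(nbits):
        perm = knuth_shuffle(range(N), rng)
        yield toy_filter([key_bits[pos] for pos in perm])

key = [random.getrandbits(1) for _ in range(N)]
print(list(keystream(key, iv=2016, nbits=8)))

Note that the key register never changes; only the wiring between the register and the filter is re-randomized for every keystream bit. Keeping the filter at a fixed, small algebraic degree is what gives the construction its low multiplicative depth and, in turn, low noise growth when evaluated homomorphically.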

Another blog entry will follow soon and will cover the afternoon talks!


Notes

1. Paris Crypto Day Website

2. A link to a talk by Antoine Joux, CryptoAction School on Cryptographic Attacks, Porto, 2014

Sunday, February 21, 2016

Git Basics for Cryptographers

This is the first blog post in a series of posts on privacy-enhancing tools and tools for collaborating with others.

This post is about the version control system git and is split into two parts: first, we introduce version control and motivate its use; second, we cover the basics of using git.

1. Getting Started

In this section, we motivate the use of version control. Furthermore, we show where to download and how to install git. At the end, we explain how to first set up your git environment.

About Version Control: 

A Version Control System (VCS) is a tool that records changes to a set of files over time, so that you can restore the files to a previous state later. Moreover, it gives you the opportunity to compare changes over time, and, if used in collaboration with others, it shows who changed what, when and why.

Git in a nutshell:

The major difference between git and any other Version Control System (e.g. SVN, CVS, ...) is how git handles the data. Most of these systems store data as a list of file-based changes over time. Git is different. Git handles the data in "snapshots", and every time you commit something it takes a snapshot of what all your files look like and stores a reference to that snapshot.
An important thing to remember is how git works internally. A file in git can be in one of three states: committed, modified and staged. Committed means that your data is stored safely in your local repository. Modified means that you have changed a file, but have not committed it yet to your local repository. Staged means that you have marked a modified file to go into your next commit snapshot.

(Figure: the three states of git; image from http://git-scm.com/book/en/v2/Getting-Started-Git-Basics)


A typical work flow in git is as follows:
  1. You modify some data/file in your working directory.
  2. You stage the files, adding a snapshot to your staging area.
  3. You do a commit, which adds your snapshot to your local repository.
Furthermore, nearly every git command operates locally: git keeps a full copy of the repository on your machine, and that is where you work. To synchronize your local repository with a remote one, you use git pull and git push.

Download & Install:

For Linux: 
$ sudo apt-get install git-all
For Mac OS: install Xcode Command Line tools or go to http://git-scm.com/download/mac and follow the instructions.
For Windows: go to http://git-scm.com/download/win and follow the instructions.
  

First time setup:

After git is installed, you can configure some settings in your git config file. It is normally located in your home directory as ~/.gitconfig. The first thing you should do is set up your identity:
$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com
You can also set up your preferred editor and many other things:
$ git config --global core.editor emacs
You should now have a basic understanding of how git works. Now let's get started with the fun part.

2. Git Basics

In this section, we show the basic commands in git. 

git init

If you want to set up a new repository, you first have to run: 
$ git init
which will set up a .git folder with all the necessary git files.

git clone

If you want to get a copy of an existing repository, you have to run: 
$ git clone username@host:/path/to/project.git exampleProject
which will use SSH to clone the remote repository into a local directory called exampleProject.

git status

To check the status of your files, to see which ones are modified, staged or committed, you can use: 
$ git status
On branch master
nothing to commit, working directory clean
This means that none of your files are untracked or modified. If you now, for example, add a new file README, the output of git status will look like:
$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

    README

nothing added to commit but untracked files present (use "git add" to track)
which means you have a new untracked file called README.

git add

If you now want to add the previously created README file to your repository you have to use:
$ git add README
which will add the file README to the staging area. If you run git status now you will see:
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    new file:   README

.gitignore

You will often also have files which you do not want to add to the git repository. These could, for example, be PDF files when you collaborate on LaTeX documents, or output and executable files when you program together. For this purpose, git offers the .gitignore file.
# no .a files
*.a

# ignore all files in the build/ directory
build/

# ignore all .pdf files in the doc/ directory
doc/**/*.pdf
Just add a file .gitignore with the previous content to your repository and git will ignore the defined files.

git commit

Now your staging area is set up. If you now want to commit your files to your local repository, you can use: 
$ git commit -m "Add feature X"
[master 666abcd] Add feature X
 2 files changed, 42 insertions(+)
 create mode 100644 README
which will add a snapshot of your staged files to your repository. Be aware that files you have not added will not be part of the snapshot. The option -m attaches the message "Add feature X" to your commit, so that you can later tell what the commit was about. Always use short and meaningful commit messages!
If you discover that you got the commit message wrong, or forgot to stage something, you can also amend your last commit by running:
$ git commit --amend
which will open the commit message in your editor and replace the previous commit with a new one that also includes anything you have staged since.

git rm

If you want to remove a file that git is tracking (and delete it from your working directory), you have to run:
$ git rm README2
rm 'README2'
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    deleted:    README2
which will delete the file README2 and stage the deletion. If you only want git to stop tracking the file while keeping it on disk, use git rm --cached README2 instead.

git log

To view your commit history you can use: 
$ git log
commit ca82a6dff817ec66f44342007202690a93763949
Author: Jane Doe <jane.doe@gmail.com>
Date:   Mon Mar 17 21:52:11 2008 -0700

    add the fancy feature Y

commit 085bb3bcb608e1e8451d4b2432f8ecbe6306e7e7
Author: John Doe <john.doe@gmail.com>
Date:   Fri Feb 25 13:37:00 2000 -0100

    initial commit
which will list your commit history in reverse chronological order. Be aware that committed files are not necessarily in the remote repository: when you commit a snapshot, you only add files to your local repository.

git push

If you want to collaborate with others, or store your files in a remote repository, you have to connect your local repository with a remote repository. All of this is normally done by git itself, but you can also do it manually. To view your remotes run: 
$ git remote -v
origin https://github.com/johndoe/exampleProject (fetch)
origin https://github.com/johndoe/exampleProject (push)
which will show your configured remotes.

If you want now to push the changes in your local repository to the remote repository you have to run:
$ git push origin master
which will synchronize the changes in your local repository (master branch) to your remote repository. As a shortcut, you can also just type git push when you are on the correct branch and it tracks a remote branch.

git pull

To synchronize your local repository with your remote repository and to fetch data, you have to run:
$ git pull origin master
which will fetch the data from the remote repository, and merge it with the data in your local repository.

git tags

Git also offers you the ability to tag certain commits, which can be used, for example, to label a release point. To create a new tag just type:
$ git tag -a release1 -m "these are some files"
which will create a tag called release1 with a short description. To list tags you can run:
$ git tag
release1
release2
If you want to have more details about a tag, you can also run:
$ git show release1
tag release1
Tagger: Edward Snowden <citizenfour@gmail.com>
Date:   Sat May 23 20:19:12 2013 +0800

these are some files

commit ca82a6dff817ec66f44342007202690a93763949
Author: Edward Snowden <citizenfour@gmail.com>
Date:   Mon Jan 01 21:52:11 2013 -1000

    copy all the data
You can also share tags:
$ git push origin release1
and create a new branch at exactly this tag:
$ git checkout -b release1-branch release1
Switched to a new branch 'release1-branch'

gitk

There are also some tools that visualize your git repository. One of the most useful is gitk: https://git-scm.com/docs/gitk 

(Screenshot of the gitk main window; image from http://wiki.siduction.de/images/8/81/Gitk-main1.png)

This post is mainly based on the git book, which is a good source for further information: http://git-scm.com/book/en/v2
There are also some video lectures available online at: http://git-scm.com/videos

These are the basics of git. Stay tuned for further posts about the advanced use of git and about other privacy enhancing and collaboration tools.







Sunday, February 14, 2016

Cryptocurrencies: addressing the problem of solvency


A cryptocurrency (or crypto currency) is a medium of exchange using cryptography to secure the transactions and to control the creation of new units.
(taken from Wikipedia)


The first decentralized cryptocurrency, bitcoin, was created in 2009 by Satoshi Nakamoto, a pseudonym behind which the developer (or group of developers) hides their identity. Despite many efforts to discover the real identity of Mr. Nakamoto (e.g. by analysing his perfect British English, his coding style, and even tracing the timestamps of his online activities to recover the time zone he lives in), nothing is known for sure and the particulars of the creator(s) of bitcoin remain a mystery.

Three of the main concepts related to Bitcoin are addresses, keys and wallets:

- an address is an identifier of 26-35 alphanumeric characters, beginning with the number 1 or 3, that represents a possible destination for a bitcoin payment. Unlike a common postal or e-mail address, a bitcoin address is designed to be used only once: while it is technically possible to reuse it, doing so can harm one's privacy and even that of other users

- bitcoin is based on public-key cryptography: the secret key is used to sign transactions (and thus spend bitcoins), while the public key is used to verify these transactions. A bitcoin address is essentially a 160-bit hash of the public key (see the sketch after this list)

- a wallet is a file that contains a collection of private keys
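As a small illustration of the hashing step mentioned above, the 160-bit digest behind an address is RIPEMD-160 applied to the SHA-256 of the public key. The Python sketch below omits the version byte and the Base58Check encoding that turn this digest into the familiar address string, and assumes that your Python/OpenSSL build still exposes RIPEMD-160; the example "public key" is a dummy value.

import hashlib

def hash160(pubkey: bytes) -> bytes:
    # HASH160(pubkey) = RIPEMD-160(SHA-256(pubkey))
    sha = hashlib.sha256(pubkey).digest()
    return hashlib.new('ripemd160', sha).digest()

# dummy 33-byte "compressed public key", purely for illustration
fake_pubkey = b'\x02' + bytes(32)
print(hash160(fake_pubkey).hex())        # 40 hex characters = 160 bits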


Among the many interesting features of this cryptocurrency, the bitcoin system is completely peer-to-peer, meaning that users can perform transactions directly, without any intermediary (like a bank). Transactions are verified by network nodes and recorded in a public distributed ledger, called the block chain. Besides being exchanged through transactions, new bitcoins can be generated through mining:

Mining is the process of adding transaction records to bitcoin's public ledger of past transactions. This ledger [...] serves to confirm transactions to the rest of the network as having taken place. Bitcoin nodes use the block chain to distinguish legitimate bitcoin transactions from attempts to re-spend coins that have already been spent elsewhere.
Mining is intentionally designed to be resource-intensive and difficult so that the number of blocks found each day by miners remains steady. [...]
 The primary purpose of mining is to allow bitcoin nodes to reach a secure, tamper-resistant consensus. Mining is also the mechanism used to introduce bitcoins into the system: miners are paid any transaction fees as well as a "subsidy" of newly created coins. This both serves the purpose of disseminating new coins in a decentralized manner as well as motivating people to provide security for the system.
(taken from bitcoin wiki)


The use of bitcoin as a means to pay for goods and services keeps growing but, because of its untraceability, it is also used by criminals (e.g. for payments in black markets on the darknet): for this reason, several governments are starting to take action in order to regulate virtual currencies.


Managing cryptographic keys can be difficult for many users, so they usually prefer to keep their money with online exchanges, which can provide a simple user experience similar to online banking. Even so, things can still go terribly wrong! The story of Mt. Gox is quite enlightening: this bitcoin exchange was created in 2010 and within three years it was already handling 70% of all bitcoin transactions. In 2014 it was announced that the equivalent of US$450M was "missing", likely stolen, and that the theft had gone undetected for years.
This event suggests a very simple lesson: it is important for users to be able to verify that an exchange still holds their money. For this purpose it is possible to use proofs of solvency.
Improving upon previous work (e.g. [1]), Dagher et al. [2] proposed Provisions, a proof of solvency which is able to (1) reveal nothing about customer holdings, (2) keep the value of the exchange's total holdings secret, (3) maintain unlinkability between an exchange and its Bitcoin addresses and (4) prevent collusion between different exchanges.

Citing from [2], "a proof of solvency consists of two components. In the first, the proof of liabilities, the exchange commits to the total quantity of bitcoin it owes to all of its users. In the second, the proof of assets, the exchange commits to the total value of bitcoin it has signing authority over. If the latter committed value is greater than or equal to the former, the exchange is considered solvent".

Provisions consists of three main protocols:

1. proof of assets --- the exchange commits to its total assets and proves in zero-knowledge that the sum of balances held by the public keys it owns (i.e. those for which it knows the secret key) is equal to the committed value

2. proof of liabilities --- the exchange publishes a commitment to each user's account balance, then homomorphically sums the committed values and produces a commitment to the total liabilities. The exchange enables each user to privately verify that the commitment to his balance is correct

3. proof of solvency --- the exchange homomorphically computes a commitment to the difference of the assets and liabilities (using the results of the first two protocols) and then proves in zero-knowledge that it is a commitment to zero (a toy sketch of the commitments involved follows below)
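To illustrate the homomorphic commitments these protocols rely on, here is a toy Python sketch of Pedersen commitments over a small multiplicative group (Provisions itself works over the elliptic-curve group used by Bitcoin, and all parameters below are illustrative only). The product of two commitments is a commitment to the sum of the committed balances, and a commitment to zero is just h^r, which can be opened by revealing r.

import secrets

p = 2**127 - 1                 # a Mersenne prime; toy size, far too small for real use
g, h = 3, 7                    # toy "generators"; in practice nobody may know log_g(h)

def commit(value, rand):
    # Pedersen commitment: hiding (thanks to rand) and additively homomorphic
    return (pow(g, value, p) * pow(h, rand, p)) % p

r1, r2 = secrets.randbelow(p - 1), secrets.randbelow(p - 1)
c1 = commit(1000, r1)          # commitment to one user's balance
c2 = commit(2500, r2)          # commitment to another user's balance

# homomorphic sum: multiplying commitments commits to the sum of the balances
assert (c1 * c2) % p == commit(3500, (r1 + r2) % (p - 1))

# a commitment to zero reduces to h^r; proving solvency boils down to proving,
# in zero-knowledge, knowledge of such an r for the assets-minus-liabilities commitment
assert commit(0, 5) == pow(h, 5, p)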


A final remark on the benefits introduced by proofs of solvency: it is certainly useful for users to be able to periodically verify that an exchange still holds their money, but this does not solve all the problems, as the risk of a catastrophic loss always remains. Moreover, the simple fact that the exchange holds the bitcoins (i.e. the money) does not guarantee that it will return them when asked to do so! For instance, one could imagine a situation in which a mysterious party agrees to give the exchange its secret key(s), so that the exchange can demonstrate that it owns a certain amount of money and "pass" the proof of solvency. This still does not guarantee that the money will actually be available for withdrawal or spending. This probably has to be considered an intrinsic danger of such systems, as it is impossible to prove one's intentions. In [2] the authors sketch a possible way to mitigate this kind of issue: an exchange could send a customer's money to an address redeemable by either the exchange or the user. This way, the user would have a "window" in which to redeem the money if they wish. Of course, such a strategy would be rather impractical for obvious reasons.

In conclusion, while many people might still be scared of using cryptocurrencies (because of the complex mechanisms they are based on and, possibly, because of their intrinsically intangible nature), steps are being taken to make the system more secure and "tempting" for the masses.


---
References:
[1] Z. Wilcox. Proving your bitcoin reserves, Feb. 2014
[2] G. G. Dagher, B. Bünz, J. Bonneau, J. Clark, D. Boneh. Provisions: Privacy-preserving proofs of solvency for Bitcoin exchanges, Oct. 2015

Monday, February 8, 2016

CAESAR Candidates and RFID for IOT

Authenticated Encryption (AE) schemes provide data authenticity in addition to confidentiality of the message content. Achieving these two security goals is crucial for enabling communication in the presence of an active adversary – one that may tamper with message content in addition to observing the communication. This problem is common to many real-life scenarios, from traditional ones (e.g., securing protocols for financial transactions) to the rising number of Internet of Things (IOT) applications. For example, data acquired in a wireless sensor network should be available only to the owners of the network. Additionally, an adversary who can modify valid sensor data, or spoof false sensor data into the system, would cause a complete loss of credibility of the sensor readings – rendering the system useless.


In spite of many relevant use-cases, AE schemes remained unformalized for many years. Instead, the cryptographic community – as well as “crypto-enthusiasts” – relied on generic compositions of existing cryptographic constructs (e.g., AES-CCM). Unfortunately, this approach often led to different types of exploitable weaknesses. In 2002, Rogaway made one of the first steps towards a more reliable design of AE schemes. Today, the Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR) has been running since 2014 with the goal of selecting a portfolio of AE ciphers. The CAESAR competition is now in its second round, with 29 candidates remaining after the selection committee eliminated nearly half of them. At this point one may therefore consider that the remaining candidates provide a satisfactory level of security and robustness. Additionally, many of the authors discuss use-cases that involve both constrained and high-end devices, supporting them with hardware and software implementation results.

Considering IOT applications of CAESAR candidates, the boundary for high-end implementations is simply what fits into a server rack. The constrained domain, on the other hand, imposes many limitations. The swarm of low-end devices that forms the frontier of IOT must provide sufficient data throughput, in both its computational and its communication aspects, at a very low price. This price is most commonly expressed as a maximum allowed chip area, equal to the size of 2000 two-input NAND gates (2000 GE) in the technology of choice. Furthermore, the majority of these devices are either passively powered or battery-powered, and must therefore meet low-power or low-energy criteria, respectively. The contradictory nature of all these requirements makes it very difficult to find a single optimal approach for all use-cases. Hence, we will focus on the devices that are most constrained in terms of available resources – passively powered Radio Frequency IDentification (RFID) circuits.

RFID systems have infiltrated the IOT world in many applications, from electronic identification documents to access management and tracking. More importantly, they can handle medical and payment applications. Whether they sit in passports, credit cards, or tags that mark shipping crates, RFID devices are often international travellers. Consequently, interoperability across different parts of the world is crucial, and their use is commonly standardized. For example, the ISO/IEC 18000 family of standards specifies RFID devices for a number of different applications. Compliant with this standard is the EPCglobal Gen2 air interface protocol, published in 2004 and widely deployed since. The cryptographic community often overlooks these standards and ones akin to them. Nevertheless, they include various overheads and constraints (e.g., a maximum latency of the response), compliance with which makes the difference between usability and actual real-life applicability. For example, Table 1 gives an overview of the timing constraints of the EPCglobal Gen2 protocol. A paper from 2012 gives an example of an RFID security study within the boundaries of this protocol. Here, Tari determines the length of the logic-0 symbol that initiates the communication, and T1 denotes the amount of time an RFID tag has to begin its response after successfully receiving the last symbol of a command; T1 depends on the Tari value.

Table 1: Timing constraints of the EPCglobal Gen2 protocol.

Tari (µs)    T1 (µs)      #Cycles @ 1.5 MHz    #Cycles @ 2 MHz    #Cycles @ 2.5 MHz
6.25         39.06        58                   78                 97
12.5         78.125       117                  156                195
25           187.5        281                  375                468
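To make explicit where the cycle counts in Table 1 come from: they are simply the T1 response deadline multiplied by the tag's clock frequency, rounded down. A quick Python sanity check, with the values from the table:

# cycle budget = floor(T1 [µs] * f_clk [MHz])
for t1_us in (39.06, 78.125, 187.5):
    print(t1_us, [int(t1_us * f_mhz) for f_mhz in (1.5, 2.0, 2.5)])
# 39.06   -> [58, 78, 97]
# 78.125  -> [117, 156, 195]
# 187.5   -> [281, 375, 468]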

Since the most straightforward way of minimizing power consumption is to reduce the area of the circuit, implementers strive for both low area and low power by serializing the computation. This often leads to a prohibitively large number of cycles. For example, a 1000 GE serial implementation of the lightweight block cipher PRESENT fits the general constraint of 2000 GE rather elegantly. Nevertheless, its computational latency is 563 clock cycles, which makes it rather unfit for the purpose at hand. Since this implementation takes only 50% of the area budget, it leaves a large margin that can be traded in order to meet the needs of the EPCglobal Gen2 protocol; this is not the case for a number of CAESAR candidates (see below). Lastly, note that this standard is only one real-life example, chosen to depict, not dictate, requirements. Nevertheless, due to the very scarce resources and the interoperability imperative, we believe it would be most beneficial to consider concrete use-cases when evaluating hardware implementations of the remaining candidates.

Continuing our overview of the remaining candidates, we notice that 15 of them either mandate or recommend the use of AES or of the AES round function. While these are sound design decisions given the knowledge base on AES and its ubiquity in the modern world, relying on AES may bring hardship in lightweight applications. Namely, the smallest implementation of AES, by Moradi et al., requires 2400 GE in the UMCL18G212T3 standard-cell library (and 226 clock cycles) and consumes 3.7 µW of power at an operating frequency of 100 kHz. Since the AES computation dominates the area and largely determines the required resources, we will not discuss these candidates separately.

Below, listed in alphabetical order, we overview candidates that claim to be lightweight in hardware and present the relevant implementation results. Designs intended for high-end applications are therefore not considered. Furthermore, since results are not available for all candidates, we allow certain estimates. In addition, some authors have abstained from providing power figures, and results are reported for different technology libraries; hence these implementations should not be compared among themselves, since that might yield an unfair comparison.


  • ACORN. A stream-based cipher built on a novel construction, using a 128-bit key. The authors estimate its hardware implementation to be slightly larger than that of the TRIVIUM cipher, which requires at least 2599 GE in 130 nm CMOS technology and 1333 cycles for initialization, followed by one cycle per encrypted bit. We found no power estimates for this implementation.
  • Ascon. A sponge-based mode of operation built on a novel SPN permutation, using a 128-bit key. A variety of hardware implementations are available, the smallest requiring 2570 GE (3750 GE with interface) in UMC 90 nm technology and 512 cycles per round, consuming 1.5 µW at a 100 kHz operating frequency. Note that 6 or 12 rounds are required for a complete permutation.
  • Ketje. Based on round-reduced versions of the SHA-3 winner's permutations Keccak-f[400] (Ketje-Sr) and Keccak-f[200] (Ketje-Jr), using 128- and 96-bit keys, respectively. The authors did not include any hardware implementation results in the submission document. Nevertheless, this candidate has the advantage of being able to share hardware with SHA-3 circuitry, and the Keccak-f[200] permutation can be implemented in as little as 1270 GE.
  • Minalpher. Based on a novel SPN permutation employed in a tweakable Even-Mansour construction, using a 128-bit key. The smallest hardware implementation results presented are only for the underlying Minalpher-P permutation: 2700 GE in the NanGate 45 nm standard-cell library and 288 clock cycles to finish the computation. The authors did not present any power figures.
  • PRIMATEs. Another sponge-based design, with a proprietary SPN permutation, using 80- or 120-bit keys. PRIMATEs offer three modes of operation for each key size, allowing trade-offs between security and performance. The authors did not include any hardware implementation results in the submission document. Still-unpublished work of ours shows that the core permutation for the 80-bit key size can be implemented in as little as 1200 GE using the UMC 90 nm standard-cell library, requiring 16 cycles per round and consuming 0.68 µW of power at an operating frequency of 100 kHz. Note that 6 or 12 rounds are required for a complete permutation.
  • TriviA-Ck. Based on the Trivia-SC stream cipher, a variant of TRIVIUM; uses a 128-bit key.

Tuesday, February 2, 2016

Parallelization of Pollard's rho Method

Pollard's rho method is probably the most common and well-known algorithm for computing discrete logarithms. However, the standard version is not very useful for large problems. Thankfully, we understand rather well how to adapt the algorithm so that it can be run in parallel. We will focus on discrete logarithms in the elliptic curve setting.

To give credit where credit is due, the explanations were primarily adapted from those available in [1]. There are earlier explanations of the concept, but I feel this is the most readily understandable.

The Standard Version


The discrete logarithm problem for elliptic curves (ECDLP) is defined as finding a scalar \(n\) such that \(nP = Q\) for points \(P\) and \(Q \in G\), where \(\langle P \rangle = G\). Standard Pollard rho works by finding a collision between two distinct linear combinations of \(P\) and \(Q\), that is, finding a collision in the map
\[ (a,b) \mapsto aP + bQ. \]
Given such a collision, i.e. \((a, b) \neq (a', b')\) with \(aP + bQ = a'P + b'Q\), we can, with high probability, compute the scalar as \(n = (a - a')(b' - b)^{-1} \bmod l\), where \(l\) is the order of \(P\). Finding this collision is done by iterating a function \(f\) that, from one linear combination of \(P, Q\), outputs another valid linear combination of \(P, Q\). The standard version simply computes one long chain of iterations until it detects a cycle in this chain, yielding the necessary collision.

If the output of \(f\) produces random linear combinations of \(P\) and \(Q\), we expect to need \(\sqrt{\pi l / 2}\) iterations to succeed, as per the standard birthday bound. This is known as a random walk.


Figure 1: A collision in the standard version of Pollard's rho method.

The Parallel Version


You could theoretically build a parallel version of Pollard rho by simply defining \(m\) distinct starting points for \(m\) chains. You then would run the algorithm in parallel on all the chains and wait for one to finish. However, we can do better.

While Pollard's rho method is often framed in terms of finding a cycle, what we are actually finding is a collision. This means we can apply van Oorschot and Wiener's [2] techniques for parallel collision search. Intuitively, we still take \(m\) different starting points \(W_0^i\) and iterate the function \(f\) from each of them. However, every walk now sends its points to the same central location, where we simply keep track of all incoming points; when a collision is found we are done.

Figure 2: A collision in the parallel version of Pollard's rho method.
There is a key secondary consideration, though. For non-trivial problem sizes we cannot simply store every point generated in each of these walks. Consider the very realistic figure of 100 million points generated per second on a single machine. Then consider the fact that for a 100-bit problem we need approximately \[ \frac{\sqrt{\pi 2^{100}/4}}{10^8 \cdot 3600 \cdot 24} \approx 115 \] days. This is far too much to store: supposing we can store a point for such a problem in only 100 bits, we would still require well over 10000 TiB of storage. This estimate even assumes the use of the negation map, hence the division by 4 in the earlier calculation. The negation map is an optimization technique that halves the search space, but we will not discuss it in detail.

Thankfully, we know how to solve this: we simply output only those points which have a particular property, one that occurs with a certain probability. We call these distinguished points and the related property the distinguished point property. The computation of this property should be efficient, and it is very convenient if the probability can be tweaked.
For example, given a point \(P = (x,y)\), only output \(P\) if the binary representation of \(x\) has at least \(t\) leading zeroes. This is very fast to compute and can be trivially adjusted by changing \(t\).

Note that we can do this without loss of functionality. Once a collision occurs between two walks, every subsequent point also collides. This means we find a collision at the next distinguished point the converged walks reach, assuming we do not enter a cycle before then.
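To make the whole procedure concrete, here is a toy Python sketch of parallel collision search with distinguished points. For simplicity it simulates the parallel walks round-robin in a single process and replaces the elliptic-curve group by a small prime-order subgroup of the integers modulo \(p\), so that "adding \(P\)" becomes "multiplying by \(g\)"; the group parameters, walk rules and distinguished point property are illustrative choices, not those of any real system.

import secrets

p = 2039                     # toy prime, p = 2*l + 1
l = 1019                     # prime order of the subgroup of squares mod p
g = 4                        # generator of that subgroup (plays the role of P)
secret = 777                 # the discrete log we pretend not to know
Q = pow(g, secret, p)        # plays the role of Q = nP

def step(X, a, b):
    # one application of the iteration function f to the state X = g^a * Q^b
    if X % 3 == 0:
        return (X * X) % p, (2 * a) % l, (2 * b) % l
    elif X % 3 == 1:
        return (X * g) % p, (a + 1) % l, b
    else:
        return (X * Q) % p, a, (b + 1) % l

def distinguished(X):
    # distinguished point property: X divisible by 8 (probability roughly 1/8)
    return X % 8 == 0

def random_start():
    a, b = secrets.randbelow(l), secrets.randbelow(l)
    return pow(g, a, p) * pow(Q, b, p) % p, a, b

table = {}                                    # central server: point -> (a, b)
walks = [random_start() for _ in range(8)]    # eight "parallel" walks, run round-robin
count = [0] * len(walks)
MAX_SEGMENT = 500                             # guard against walks trapped in a cycle with no distinguished point
found = None

while found is None:
    for i in range(len(walks)):
        X, a, b = step(*walks[i])
        count[i] += 1
        if distinguished(X):
            prev = table.get(X)
            if prev is not None and prev != (a, b) and (prev[1] - b) % l != 0:
                found = (prev, (a, b))        # a useful collision between two walks
            table[X] = (a, b)
            X, a, b = random_start()          # report the point and restart this walk
            count[i] = 0
        elif count[i] > MAX_SEGMENT:
            X, a, b = random_start()
            count[i] = 0
        walks[i] = (X, a, b)

(a1, b1), (a2, b2) = found
# g^a1 * Q^b1 = g^a2 * Q^b2  =>  a1 + n*b1 = a2 + n*b2 (mod l)
n = (a2 - a1) * pow(b1 - b2, -1, l) % l
print("recovered n =", n, "(secret was", secret, ")")

A real attack would of course use the actual elliptic-curve group, a better iteration function, the negation map mentioned above, and genuinely parallel clients that transmit only their distinguished points to the central server.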

References


[1] Daniel J. Bernstein, Tanja Lange, and Peter Schwabe. On the correct use of the negation map in the Pollard rho method. In Dario Catalano, Nelly Fazio, Rosario Gennaro, and Antonio Nicolosi, editors, Public Key Cryptography -- PKC 2011, volume 6571 of Lecture Notes in Computer Science, pages 128--146. Springer-Verlag Berlin Heidelberg, 2011. http://cr.yp.to/papers.html#negation.
[2] Paul C. Van Oorschot and Michael J. Wiener. Parallel collision search with cryptanalytic applications. Journal of cryptology, 12(1):1--28, 1999.