Thursday, November 2, 2017

Compressing ledgers of financial transactions

The history of modern banking begins almost 550 years ago, with the establishment of Bank Monte dei Paschi di Siena, in nowadays Italy. However, it wasn't until 1980s (early introduction of home banking), when pioneering financial institutions started to make use of computing machines to automatically process part of their financial transactions, and thus replace the manual, more error-prone process. As the availability of the Internet increased, starting with early 2000s, many major banks began to offer Internet banking to their customers. This means that one can access his/her account's balance or history through a web-browser or smartphone and initiate or receive transactions.


Nowadays, a number of transactions in the order of millions are processed on a daily basis by a large bank. As the time passes and more and more transactions are processed, the size of this set only increases. Thus, one should think at potential solutions for removing the old and useless ones, such that they can be safely archived. The story repeats for the case of crypto-currencies (with the corresponding modifications - multiple senders or receivers allowed, accounts replaced with addresses). If one considers for instance the size of transactions processed by a well-known crypto-currency, in the first six years of its lifespan, the size of the ledger reached aound 95 GB, while only in the first 9 months of 2017, more than 40 GB were processed. Such a growth implies increased computational costs while verifying the actual balance of a specific address, as a traversal through the set of addresses needs to be employed.


A first easy step one can do is to try to switch from the representation of transactions as lists, to a more visual one, which represents accounts as nodes in a graph and transactions as edges. Since multiple edges are allowed, the graph becomes in fact a multigraph. In their work, recently presented at SecureComm17, Rémi Geraud, David Naccache and Răzvan Roșie put forward the problem of finding "nilcatenations" in sets of transactions. Loosely speaking, a nilcatenation is a subgraph in a given multigraph with a special property: the balances of the nodes are zero, for each existing account. Stated differently, every single user part of a nilcatenation receives the same amount of money that it gets. Since the balances in such components do not affect the global balance of the original multigraph, nilcatenations can be decoupled and archived.


Some interesting observations can be made about nilcatenations. Any occurrence of such a component can be only as a part of a strongly connected components (SCC): a maximal subgraph of a directed graph with paths between any two nodes. Another observation can be made regarding simple, obstructing nodes: if we identify nodes with the in and out degrees set to 1, but with different weights for incoming and outgoing edges, then such nodes cannot be included in a nilcatenation (we dub them "first order obstructions"). After clearly formalizing it, the problem turns out to be be NP-Complete, via a reduction to the 0-target SSP problem (which is equivalent with SSP).


After pruning the original graph into smaller components (by employing SCC-split and first-order obstruction removal steps until convergence), one can benefit from known techniques in attacking the (multi-dimensional) subset-sum problem for each component, independently. Particularly, we can see the problem of finding nilcatenations as a multi-dimensional version of the SSP, and tackle each component independently. The known techniques employing the usage of an SVP-oracle the density work on low density instances. An overview of our heuristic is given in the second picture. More details can be found in the original work.