Tuesday, December 13, 2016

Living in a Data Obsessed Society (Part II)

This post is about the event “Living in a Data Obsessed Society” that took place in Bristol on Friday the 2nd of December. If you missed the first part, you can find it here.
The view that decisions based on data are neutral, efficient and always desirable was probably the most challenged during the evening. A very disturbing example was the result of a recent investigation by ProPublica, which found that machine learning software used in the United States to assess the risk that defendants would reoffend was twice as likely to mistakenly flag black defendants as prone to repeat an offence. The same software was also twice as likely to incorrectly label white defendants as low risk.
The reasons for these biases remain mostly unknown, as the company responsible for the algorithms keeps them a trade secret. Even assuming racism was not explicitly hardcoded into them, Charlesworth and Ladyman reminded the audience that algorithms, like humans, make decisions under conditions that are far from ideal. Machines learn from datasets chosen by engineers, and the choice of which data is used at this step passes the underlying biases on to the algorithms. Predictive programs are at best as good as the data they are trained on, and that data has a convoluted history that does include discrimination.
There is, then, a serious risk of perpetuating a vicious cycle in which people keep being discriminated against because they were in the past and are in the present. Moreover, the entrenching of these biases could come to be seen as correct, simply because of the unquestioned ideological authority we grant to data and algorithms in these processes. As the speakers pointed out at several moments, neither algorithms in general nor machine learning in particular eliminates the human component: humans and human-related data take part in the design steps, but the centralized power behind these choices remains mostly hidden and unchallenged.
Whereas existing social institutions usually have some procedure, better or worse designed, for objecting when errors happen, it is hard to dispute the outputs of Artificial Intelligence. In Ladyman's words, the usual answer would be something like “we used our proprietary algorithms, trained on our private dataset X of this many terabytes, and on your input the answer was Bla”. He also pointed out the huge shift of power this implies, from classical institutions to computer scientists working at big companies, who have probably had little exposure to the social sciences during their studies and careers.
Even setting aside the important aspect of dispute resolution, the question should be framed as whether these algorithms make “better” decisions than humans, since biases are inherent to both. What “better” means is, though, another choice that has to be made, and it is a moral one. As such, society should be well informed about the principles embedded in these new forms of authority, in order to establish whether they align with its goals. A very simple example was given: which is more desirable, an algorithm that convicts the guilty with 99% probability while also putting many innocent people in jail, or one that almost never condemns the innocent but lets numerous criminals go free? The same kind of reasoning applies to the discrimination between defendants discussed above.
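The trade-off in this example is, at bottom, the choice of a decision threshold on a risk score. A minimal sketch (the score distributions and numbers below are entirely made up, purely for illustration):

```python
import random

random.seed(0)

# Hypothetical risk scores: guilty defendants tend to score higher,
# but the distributions overlap, so no threshold separates them perfectly.
guilty = [random.gauss(0.7, 0.15) for _ in range(1000)]
innocent = [random.gauss(0.4, 0.15) for _ in range(1000)]

def rates(threshold):
    """Fraction of guilty and of innocent people convicted
    (i.e. scored at or above the threshold)."""
    tpr = sum(s >= threshold for s in guilty) / len(guilty)
    fpr = sum(s >= threshold for s in innocent) / len(innocent)
    return tpr, fpr

# A lenient threshold convicts almost all the guilty -- and many innocents.
# A strict one spares the innocent but lets more guilty people go free.
for t in (0.45, 0.60, 0.75):
    tpr, fpr = rates(t)
    print(f"threshold {t:.2f}: convicts {tpr:.0%} of guilty, {fpr:.0%} of innocent")
```

Which threshold is “better” is not a statistical question: it encodes a moral judgment about the relative cost of the two kinds of error.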
One might conclude from this text that the speakers were dataphobes who would not like to see any automation or data ever involved in decision making, but that was not the case. At several points, all of them praised in one way or another the benefits of data and its use in our societies, from healthcare to science to the welfare state. The questioning was about the idealization of data gathering and its algorithmic processing as a ubiquitous, ultimate goal for all aspects of our lives: a critique of its authority based on objectivity claims and, ultimately, a review of the false dichotomy between the moral and empiricist traditions. Charlesworth celebrated the dialogue that has taken place between lawyers and computer scientists in recent years. Nevertheless, he urged that the horizons of these discussions be expanded to include philosophers and the rest of the social sciences as well. Cristianini, himself a professor of Artificial Intelligence, concluded his intervention with a concise formulation: whatever we use these technologies for, humans should be at the centre of them.

The moral dimension of cryptography, and its role in rearranging what can be done, by whom and from which data, might come to mind for several readers as well. Since the Snowden revelations, this has become a more and more central debate within the community, which I find best crystallized in the IACR Distinguished Lecture given by Rogaway at Asiacrypt 2015 and its accompanying paper. Technologies such as Fully Homomorphic Encryption, Secure Multiparty Computation and Differential Privacy can help mitigate some of the problems related to data gathering while retaining its processing utility. All in all, one of Ladyman's conclusions applies here: we should still question who benefits from this processing, and how. A privacy-preserving algorithm that incurs (unintended) discrimination is not a desirable one. As members of society, and as researchers, it is important that we understand the future roles and capabilities of data gathering, storage and processing, from mass surveillance, to biased news, to other forms of decision making.
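As one concrete illustration of these techniques, Differential Privacy can protect a simple counting query with the classic Laplace mechanism: noise calibrated to the query's sensitivity is added before the result is released. A minimal sketch (the function names and toy dataset are my own, not from the event):

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon):
    """Release a count under epsilon-differential privacy.
    A counting query has sensitivity 1 (adding or removing one
    person changes the count by at most 1), so Laplace noise of
    scale 1/epsilon suffices."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Toy example: how many people in a hypothetical dataset are 40 or older?
ages = [23, 35, 41, 29, 52, 38, 44, 61]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5)
```

A smaller epsilon means stronger privacy but noisier answers; choosing it is, once again, a value judgment rather than a purely technical one.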
