This post is about the event “Living in a Data Obsessed Society” that took place in Bristol on Friday the 2nd of December. If you missed the first part, you can find it here.
The view that decisions based on data are neutral, efficient and always desirable was probably the one most challenged during the evening. A very disturbing example was the result of a recent investigation by ProPublica, which found that machine learning software used in the United States to assess defendants' risk of reoffending was twice as likely to mistakenly flag black defendants as likely to reoffend. The same software was also twice as likely to incorrectly flag white defendants as low risk.
The reasons for these biases remain mostly unknown, as the company responsible for the algorithms keeps them as a trade secret. Even assuming racism was not explicitly hardcoded into the algorithms, Charlesworth and Ladyman reminded us that not only humans, but also algorithms, make decisions in conditions that are far from ideal. Machines learn from datasets chosen by engineers, and the choice of which data is used at this step also teaches the underlying biases to the algorithms. Predictive programs are at most as good as the data they are trained on, and that data has a convoluted history that includes discrimination.
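To make this concrete, here is a minimal, hypothetical sketch, entirely my own illustration and unrelated to the ProPublica system: a model trained on historically biased labels reproduces that bias through a correlated proxy feature, even though the protected attribute itself is never given to it.

    # Hypothetical illustration of bias learned from historical labels (assumed
    # synthetic data, not a real system).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 10_000

    group = rng.integers(0, 2, n)               # protected attribute (0/1); never shown to the model
    zipcode = group + rng.normal(0, 0.3, n)     # proxy feature correlated with the group
    merit = rng.normal(0, 1, n)                 # the signal we would actually like to predict from

    # Historical labels: partly real signal, partly past discrimination against group 1.
    labels = (merit + 0.8 * group + rng.normal(0, 0.5, n) > 0.5).astype(int)

    model = LogisticRegression().fit(np.c_[zipcode, merit], labels)
    pred = model.predict(np.c_[zipcode, merit])

    for g in (0, 1):
        print(f"group {g}: flagged high-risk {pred[group == g].mean():.0%} of the time")
    # The model flags group 1 far more often, simply because the historical labels did.

Dropping the protected attribute from the inputs does not help here: the proxy carries the discrimination into the predictions anyway.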
There is, then, a serious risk of perpetuating a vicious cycle in which people keep being discriminated against because they were in the past and are in the present. Moreover, the entrenchment of these biases could come to be seen as correct, simply because of the unquestioned ideological authority we grant to data and algorithms in these processes. As the speakers pointed out at several moments, algorithms in general, and machine learning in particular, do not eliminate the human component: humans and human-related data take part in the design steps, but the centralized power behind these choices remains mostly hidden and unchallenged.
Whereas in existing social institutions there is usually a better or worse designed procedure for objecting when errors happen, it is hard to dispute the outputs of Artificial Intelligence. In Ladyman's words, the usual answer would be something like “we used our proprietary algorithms, trained with our private dataset X of this many terabytes, and on your input, the answer was ‘Bla’”. He also pointed out the huge shift of power this represents, from classical institutions to computer scientists working at big companies, and their probable lack of exposure to the social sciences during their studies and careers.
If we were to set aside the important aspect of dispute resolution, the question could then be framed as whether these algorithms make “better” decisions than humans, since biases are inherent to both. What “better” means is, though, another choice that has to be made, and it is a moral one. As such, society should be well informed about the principles embedded in these new forms of authority, in order to establish what aligns with its goals. A very simple example was given: Is it more desirable to have an algorithm that convicts the guilty with 99% probability while also putting many innocent people in jail, or one that almost never condemns the innocent but lets numerous criminals go free? The same kind of reasoning can be applied to the disparity between defendants discussed above.
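To illustrate with made-up numbers (my own sketch, not one given by the speakers): the same scoring model, applied with two different thresholds, trades false negatives, criminals going free, for false positives, innocent people jailed.

    # Hypothetical scores; the distributions and thresholds are assumptions.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    guilty = rng.random(n) < 0.3                      # assume 30% are actually guilty
    score = np.where(guilty, rng.normal(1.5, 1, n),   # the guilty tend to score higher,
                     rng.normal(0.0, 1, n))           # but the distributions overlap

    for threshold in (0.0, 2.0):                      # "aggressive" vs "cautious" policy
        convicted = score > threshold
        caught = (convicted & guilty).sum() / guilty.sum()               # true positive rate
        wrongly_jailed = (convicted & ~guilty).sum() / (~guilty).sum()   # false positive rate
        print(f"threshold {threshold}: convicts {caught:.0%} of the guilty, "
              f"and also {wrongly_jailed:.0%} of the innocent")

Neither threshold is “correct” in a technical sense; choosing between them is precisely the moral decision the speakers were pointing at.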
One might conclude from this text that the speakers were some kind of dataphobes who would not like to see any automation or data ever involved in decision making, but that was not the case. At several points, all of them praised in one way or another the benefits of data and its use in our societies, from healthcare to science to the welfare state. The questioning was about the idealization of data gathering and its algorithmic processing as a ubiquitous, ultimate goal for all aspects of our lives: a critique of its authority based on claims of objectivity and, ultimately, a review of the false dichotomy between the moral and empiricist traditions. Charlesworth celebrated the dialogue that has taken place between lawyers and computer scientists over the last few years. Nevertheless, he urged that the horizons of these discussions be expanded to include philosophers and the rest of the social sciences as well. Cristianini, himself a professor of Artificial Intelligence, concluded his intervention with a concise formulation: whatever we use these technologies for, humans should be at the centre of them.
The moral dimension of Cryptography, and its role in rearranging what can be done, by whom and with which data, might come to mind for several readers as well. Since the Snowden revelations, it has become a more and more central debate within the community, which I find best crystallized in the IACR Distinguished Lecture given by Rogaway at Asiacrypt 2015 and its accompanying paper. Technologies such as Fully Homomorphic Encryption, Secure Multiparty Computation or Differential Privacy can help mitigate some of the problems related to data gathering while retaining its processing utility. All in all, one of Ladyman's conclusions applies here: we should still question who benefits from this processing, and how. A privacy-preserving algorithm that incurs (unintended) discrimination is not a desirable one. As members of society, and as researchers, it is important that we understand the future roles and capabilities of data gathering, storing and processing, from mass surveillance, to biased news, to other forms of decision making.
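As one concrete illustration, entirely my own and with assumed parameters, a differentially private count using the Laplace mechanism lets an analyst learn an aggregate while bounding what is revealed about any single individual.

    # Minimal sketch of the Laplace mechanism; epsilon and the dataset are assumptions.
    import numpy as np

    def dp_count(records, predicate, epsilon, rng):
        # A counting query has sensitivity 1 (adding or removing one person changes
        # the count by at most 1), so Laplace noise with scale 1/epsilon gives
        # epsilon-differential privacy.
        true_count = sum(1 for r in records if predicate(r))
        return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

    rng = np.random.default_rng(42)
    ages = rng.integers(18, 90, size=1_000)            # made-up dataset
    noisy = dp_count(ages, lambda a: a >= 65, epsilon=0.5, rng=rng)
    print(f"noisy count of people aged 65 or over: {noisy:.1f}")

The epsilon value here is arbitrary, and deciding how much privacy to trade for accuracy is, again, the kind of choice that should not be left hidden inside the machinery.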