dagstuhl-ceIs there an intermediate ethical domain, suitable for machines?

The thought is prompted by this summary of an interesting seminar on Engineering Moral Agents, one of the ongoing series hosted at Schloss Dagstuhl. It seems to have been an exceptionally good session which got into some of the issues in a really useful way – practically oriented but not philosophical naive. It noted the growing need to make autonomous robots – self-driving cars, drones, and so on – able to deal with ethical issues. On the one hand it looked at how ethical theories could be formalised in a way that would lend itself to machine implementation, and on the other how such a formalisation could in fact be implemented. It identified two broad approaches: top-down, where in essence you hard-wire suitable rules into the machine, and bottom-up, where the machine learns for itself from suitable examples. The approaches are not necessarily exclusive, of course.

The seminar thought that utilitarian or Kantian theories of morality were both prima facie candidates for formalisation. Utilitarian or more broadly, consequentialist theories look particularly promising because calculating the optimal value (such as the greatest happiness of the greatest number) achievable from the range of alternatives on offer looks like something that can be reduced to arithmetic fairly straightforwardly. There are problems in that consequentialist theories usually yield at least some results that look questionable in common sense terms (finding the initial values to slot into your sums is also a non-trivial challenge – how do you put a clear numerical value on people’s probable future happiness?)

A learning system eases several of these problems. You don’t need a fully formalised system (so long as you can agree on a database of examples). But you face the same problems that arise for learning systems in other contexts; you can’t have the assurance of knowing why the machine behaves as it does, and if your database had unnoticed gaps or bias you may suffer from sudden catastrophic mistakes.  The seminar summary rightly notes that a machine that learned its ethics will not be able to explain its behaviour; but I don’t know that that means it lacks agency; many humans would struggle to explain their moral decisions in a way that would pass muster philosophically. Most of us could do no more than point to harms avoided or social rules observed at best.

The seminar looked at some interesting approaches, mentioned here with tantalising brevity: Horty’s default logic, Sergot’s STIT (See To It That) logic; and the possibility of drawing on the decision theory already developed in the context of micro-economics. This is consequentialist in character and there was an examination of whether in fact all ethical theories can be restated in consequentialist terms (yes, apparently, but only if you’re prepared to stretch the idea of a consequence to a point where the idea becomes vacuous). ‘Reason-based’ formalisations presented by List and Dietrich interestingly get away from narrow consequentialisms and their problems using a rightness function which can accommodate various factors.

The seminar noted that society will demand high, perhaps precautionary standards of safety from machines, and floated the idea of an ethical ‘black box’ recorder. It noted the problem of cultural neutrality and the risk of malicious hacking. It made the important point that human beings do not enjoy complete ethical agreement anyway, but argue vigorously about real issues.

The thing that struck me was how far it was possible to go in discussing morality when it is pretty clear that the self-driving cars and so on under discussion actually have no moral agency whatever. Some words of caution are in order here. Some people think moral agency is a delusion anyway; some maintain that on the contrary, relatively simple machines can have it. But I think for the sake of argument we can assume that humans are moral beings, and that none of the machines we’re currently discussing is even a candidate for moral agency – though future machines with human-style general understanding may be.

The thing is that successful robots currently deal with limited domains. A self-driving car can cope with an array of entities like road, speed, obstacle, and so on; it does not and could not have the unfettered real-world understanding of all the concepts it would need to make general ethical decisions about, for example, what risks and sacrifices might be right when it comes to actual human lives. Even Asimov’s apparently simple Laws of Robotics required robots to understand and recognise correctly and appropriately the difficult concept of ‘harm’ to a human being.

One way of squaring this circle might be to say that, yes, actually, any robot which is expected to operate with any degree of autonomy must be given a human-level understanding of the world. As I’ve noted before, this might actually be one of the stronger arguments for developing human-style artificial general intelligence in the first place.

But it seems wasteful to bestow consciousness on a roomba, both in terms of pure expense and in terms of the chronic boredom the poor thing would endure (is it theoretically possible to have consciousness without the capacity for boredom?). So really the problem that faces us is one of making simple robots, that operate on restricted domains, able to deal adequately with occasional issues from the unrestricted domain of reality. Now clearly ‘adequate’ is an important word there. I believe that in order to make robots that operate acceptably in domains they cannot understand, we’re going to need systems that are conservative and tend towards inaction. We would not, I think, accept a long trail of offensive and dangerous behaviour in exchange for a rare life-saving intervention. This suggests rules rather than learning; a set of rules that allow a moron to behave acceptably without understanding what is going on.

Do these rules constitute a separate ethical realm, a ‘sub-ethics’ that substitute for morality when dealing with entities that have autonomy but no agency? I rather think they might.

3 Comments

  1. 1. Paul Torek says:

    If anyone thinks Utilitarianism is an obvious candidate for formalization, they must have got stuck on “greatest” and not noticed the profound difficulties of “happiness”.

    On the bottom-up learning approach: there is a result from image classification machine learning that is very worrying. The authors write that “Given a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations, and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, albeit being quasi-imperceptible to the human eye.” To put this in human terms, imagine a pair of cellophane glasses with a light pattern printed in them that made you perceive *everything* around you as a different thing (hat tip: JPNunez).

    We want our ethics-learners to be less vulnerable than that.

  2. 2. SelfAwarePatterns says:

    “is it theoretically possible to have consciousness without the capacity for boredom?”

    That depends on exactly how we define consciousness. If a robot is able to predictively model its environment and its own systems as a guide to achieving its goals, but those goals include no impulse to do something when its not in use, is it conscious? In other words, does consciousness require that the entity have its own independent agenda, one that isn’t a subset of our agenda?

    I can’t see that there’s a fact of the matter answer here. It’s a matter of philosophical definition, although I suspect most people’s intuition of consciousness won’t be triggered without the system having that independent self concerned agenda.

  3. 3. David Duffy says:

    Paul Torek mentions one paper on vulnerabilities in a particular style of classifier to a particular attack (which presumes no ongoing learning). But in the human domain, “observers often do not notice when two people in a photograph exchange heads, provided that the change occurs during an eye movement..observers who were giving directions to an experimenter often did not notice that the experimenter was replaced by a different person during an interruption caused by a door being carried between them” [Simons and Chabris 1999, introduction to their gorilla experiment]. Hopefully, self-driving cars will be slightly more likely to “see” cyclists than they are here. It seems to me that once rules have been set up by us agential characters, much of this reduces to engineering – how often is a rule broken by this particular robot across a range of real life situations.

Leave a Reply