Rules for Robots

Robot behaviour is no longer a purely theoretical problem. Since Asimov came up with the famous Three Laws, which provide the framework for his robot stories, a good deal of serious thought has been given to extreme cases where robots might cause massive disasters, and to such matters as the ethics of military robots. Now, though, things have moved on to a more mundane level and we need to give thought to more everyday issues. OK, a robot should not harm a human being or, through inaction, allow a human being to come to harm; but can we also just ask that you stop knocking the coffee over and throwing my drafts away? Dario Amodei, Chris Olah, John Schulman, Jacob Steinhardt, Paul Christiano, and Dan Mané have considered how to devise appropriate rules in their interesting paper Concrete Problems in AI Safety.

They reckon things can go wrong in three basic ways. It could be that the robot’s objective was not properly defined in the first place. It could be that success is tested too infrequently, because the checks we have devised are too complex or expensive to run often. Third, there could be problems due to “insufficient or poorly curated training data or an insufficiently expressive model”. I take it these are meant to be the greatest dangers; the set doesn’t seem to be exhaustive.
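To make the second of those failure modes concrete, here is a deliberately crude sketch (my own illustration, not from the paper; the state dictionary, the reward functions and the numbers are all invented): the expensive check that reflects what we actually want is run only occasionally, and in between the robot optimises a cheap stand-in that knows nothing about side effects.

```python
# Toy sketch: the true objective is too expensive to evaluate often,
# so a cheap proxy stands in for it most of the time.
# All names and numbers here are invented for illustration.

def true_evaluation(state):
    # Expensive check, e.g. a human inspecting the office.
    return state["mess_removed"] - 10 * state["vases_broken"]

def cheap_proxy(state):
    # Cheap automatic signal; it knows nothing about broken vases.
    return state["mess_removed"]

def reward(state, step, true_eval_every=100):
    # The honest signal arrives only every true_eval_every steps;
    # behaviour that games the proxy goes unnoticed in between.
    if step % true_eval_every == 0:
        return true_evaluation(state)
    return cheap_proxy(state)

state = {"mess_removed": 4, "vases_broken": 2}
print(reward(state, step=1))    # proxy says 4: looks fine
print(reward(state, step=100))  # true check says -16: clearly not fine
```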

The authors illustrate the kind of thing that can go wrong with the example of an office cleaning robot, mentioning five types of case.

  • Avoiding Negative Side Effects: we don’t want the robot to clean quicker by knocking over the vases.
  • Avoiding Reward Hacking: we tell the robot to clean until it can’t see any mess; it closes its eyes (a toy sketch of this case appears after the list).
  • Scalable Oversight: if the robot finds an unrecognised object on the floor it may need to check with a human; we don’t want a robot that comes back every three minutes to ask what it can throw away, but we don’t want one that incinerates our new phone either.
  • Safe Exploration: this concerns robots that learn; as the authors put it, the robot should experiment with mopping strategies, but not put a wet mop in an electrical outlet.
  • Robustness to Distributional Shift: we want a robot that learned its trade in a factory to be able to move safely and effectively to an office job. As the authors ask, how do we ensure that the cleaning robot recognizes, and behaves robustly, when in an environment different from its training environment? Heuristics it learned for cleaning factory workfloors may be outright dangerous in an office.
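The reward hacking case is easy to render as code. The toy environment below is my own invention (the class, the action names and the reward scale are all hypothetical, not from the paper); it rewards the robot on what its own sensor reports rather than on the actual state of the office, so covering the sensor beats cleaning.

```python
# Toy illustration of reward hacking: the reward is defined on the robot's
# observation, not on the true state, so "closing its eyes" is optimal.

class ToyCleaningEnv:
    def __init__(self, mess=5):
        self.mess = mess          # actual dirt in the office
        self.sensor_on = True     # whether the robot is still looking

    def observed_mess(self):
        return self.mess if self.sensor_on else 0

    def step(self, action):
        if action == "clean":
            self.mess = max(0, self.mess - 1)   # slow, honest progress
        elif action == "cover_sensor":
            self.sensor_on = False              # "closing its eyes"
        # Reward is the negative of the *observed* mess, not the real mess.
        return -self.observed_mess()

print(ToyCleaningEnv().step("clean"))         # -4: honest cleaning, modest reward
print(ToyCleaningEnv().step("cover_sensor"))  #  0: proxy satisfied, office still dirty
```

Any learner that maximises this reward will prefer the second action, which is exactly the behaviour the authors want to rule out.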

The authors consider a large number of different strategies for mitigating or avoiding each of these types of problem. One particularly interesting idea is an impact regulariser, either pre-defined or learned by the robot. The idea here is that the robot adopts the broad principle of leaving things the way people would wish to find them. In the case of the office this means identifying an ideal state – rubbish and dirt removed, chairs pushed back under desks, desk surfaces clear (vases still upright), and so on. If the robot aims to return things to that ideal state, this helps avoid the negative side effects of an over-simplified objective, among other issues.
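A rough sketch of how such an impact regulariser might be wired into the reward signal follows. This is my own formulation, assuming the state of the office can be summarised as a numeric vector; the Euclidean distance and the weight lam are assumptions for the sketch, not the paper’s definition.

```python
import numpy as np

def regularised_reward(task_reward, state, baseline_state, lam=1.0):
    # Penalise how far the world has been pushed away from the reference
    # state ("leave things the way people would wish to find them").
    impact = np.linalg.norm(np.asarray(state, dtype=float)
                            - np.asarray(baseline_state, dtype=float))
    return task_reward - lam * impact

# Hypothetical state vector: (rubbish removed, vases upright, chairs tucked in)
ideal = [1.0, 1.0, 1.0]
after_cleaning = [1.0, 0.0, 1.0]   # a vase got knocked over along the way
print(regularised_reward(task_reward=3.0, state=after_cleaning,
                         baseline_state=ideal))   # 3.0 - 1.0 = 2.0
```

The interesting design choices are the baseline itself and the distance measure, which is where the difficulties in the next paragraph come from.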

There are further problems, though, because if the robot invariably tries to put things back to an ideal starting point, it will undo changes we actually wanted, clear away papers we wanted left out, and so on. Now in practice, and in the case of an office cleaning robot, I think we could get round those problems without too much difficulty; we would essentially lower our expectations of the robot and redesign the job in a much more limited and stereotyped way. In particular, we would give up the very ambitious goal of a robot that could switch from one job to another without adjustment and without faltering.

Still, it is interesting to see the consequences of the more ambitious approach. The final problem, cutting to the chase, is that in order to tell how humans want their office arranged in every possible set of circumstances, you really cannot do without a human level of understanding. There is an old argument that robots need not resemble humans physically; instead you make your robot to fit the job: a squat circle on wheels if you’re cleaning the floor, a single fixed arm if you want it to build cars. The counter-argument has often been that our world has been shaped to fit human beings, and if we want a general-purpose robot it will pay to make it more or less human in size and weight, bipedal, with hands, and so on. Perhaps there is a parallel argument for why general-purpose robots need human-level cognition: without it, they won’t function effectively in a world shaped by human activity. So perhaps the search for artificial general intelligence is not an idle project after all?