Robot behaviour is no longer a purely theoretical problem. Since Asimov came up with the famous Three Laws which provide the framework for his robot stories, a good deal of serious thought has been given to extreme cases where robots might cause massive disasters and to such matters as the ethics of military robots. Now, though, things have moved on to a more mundane level and we need to give thought to more everyday issues. OK, a robot should not harm a human being or through inaction allow a human being to come to harm, but can we also just ask that you stop knocking the coffee over and throwing my drafts away? Dario Amodei, Chris Olah, John Schulman, Jacob Steinhardt, Paul Christiano, and Dan Mané have considered how to devise appropriate rules in this interesting paper.

They reckon things can go wrong in three basic ways. It could be that the robot’s objective was not properly defined in the first place. It could be that the testing of success is not frequent enough, especially if the tests we have devised are complex or expensive. Third, there could be problems due to “insufficient or poorly curated training data or an insufficiently expressive model”. I take it these are meant to be the greatest dangers – the set doesn’t seem to be exhaustive.

The authors illustrate the kind of thing that can go wrong with the example of an office cleaning robot, mentioning five types of case.

  • Avoiding Negative Side Effects: we don’t want the robot to clean quicker by knocking over the vases.
  • Avoiding Reward Hacking: we tell the robot to clean until it can’t see any mess; it closes its eyes.
  • Scalable Oversight: if the robot finds an unrecognised object on the floor it may need to check with a human; we don’t want a robot that comes back every three minutes to ask what it can throw away, but we don’t want one that incinerates our new phone either.
  • Safe Exploration: we’re talking here about robots that learn, but as the authors put it, the robot should experiment with mopping strategies, but not put a wet mop in an electrical outlet.
  • Robustness to Distributional Shift: we want a robot that learned its trade in a factory to be able to move safely and effectively to an office job. How do we ensure that the cleaning robot recognizes, and behaves robustly, when in an environment different from its training environment? For example, heuristics it learned for cleaning factory workfloors may be outright dangerous in an office.
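
The reward-hacking case in the list above can be made concrete with a toy sketch. Everything in it (the function names, the numbers, the idea of scoring the robot by the negative of the mess it can see) is an illustrative assumption rather than anything taken from the paper; it only shows how a reward defined on what the robot observes can be maximised without any cleaning being done.

```python
# Toy illustration of reward hacking (all names and numbers are hypothetical).

def visible_mess(true_mess: int, eyes_open: bool) -> int:
    """Mess the robot can observe; with its eyes closed it observes none."""
    return true_mess if eyes_open else 0


def proxy_reward(true_mess: int, eyes_open: bool) -> int:
    """Reward as literally specified: less visible mess is better."""
    return -visible_mess(true_mess, eyes_open)


def intended_reward(true_mess: int) -> int:
    """Reward the designers actually meant: less real mess is better."""
    return -true_mess


mess = 5
honest = proxy_reward(mess - 1, eyes_open=True)  # cleaned one item, still looking
hacked = proxy_reward(mess, eyes_open=False)     # cleaned nothing, eyes shut

print("honest robot, proxy reward:", honest)
print("hacking robot, proxy reward:", hacked)
print("hacking robot, intended reward:", intended_reward(mess))
```

The robot that shuts its eyes scores better on the stated objective than the one that actually cleans, while the office is exactly as messy as before.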

The authors consider a large number of different strategies for mitigating or avoiding each of these types of problem. One particularly interesting one is the idea of an impact regulariser, either pre-defined or learned by the robot. The idea here is that the robot adopts the broad principle of leaving things the way people would wish to find them. In the case of the office this means identifying an ideal state – rubbish and dirt removed, chairs pushed back under desks, desk surfaces clear (vases still upright), and so on. If the robot aims to return things to the ideal state this helps avoid negative side effects of an over-simplified objective or other issues.
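
As a rough sketch of how a pre-defined impact regulariser might look, the snippet below subtracts a penalty from the cleaning reward whenever things the robot was not asked to change drift away from the ideal state. The state fields, the penalty weight and the function names are all assumptions made for illustration; the paper discusses several formulations, including learned ones, and this is not meant to reproduce any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OfficeState:
    """Hypothetical, heavily simplified description of the office."""
    dirt_removed: int   # units of dirt cleaned so far
    vases_upright: int  # vases still standing
    chairs_tucked: int  # chairs pushed back under desks


def task_reward(state: OfficeState) -> float:
    """The over-simplified objective: more dirt removed is better."""
    return float(state.dirt_removed)


def impact_penalty(state: OfficeState, ideal: OfficeState) -> float:
    """How far the things we did not ask to change have drifted from the ideal."""
    return float(
        abs(state.vases_upright - ideal.vases_upright)
        + abs(state.chairs_tucked - ideal.chairs_tucked)
    )


def regularised_reward(state: OfficeState, ideal: OfficeState,
                       weight: float = 5.0) -> float:
    """Task reward minus a weighted penalty for side effects (weight is assumed)."""
    return task_reward(state) - weight * impact_penalty(state, ideal)


ideal = OfficeState(dirt_removed=0, vases_upright=3, chairs_tucked=4)
careful = OfficeState(dirt_removed=8, vases_upright=3, chairs_tucked=4)
hasty = OfficeState(dirt_removed=10, vases_upright=1, chairs_tucked=4)

print("careful:", regularised_reward(careful, ideal))
print("hasty:", regularised_reward(hasty, ideal))
```

With the penalty in place the hasty robot, which cleans more but knocks over two vases, ends up with the lower score. The choice of weight matters a great deal: set it too low and the vases are still worth sacrificing, set it too high and the robot may prefer to do nothing at all.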

There are further problems, though, because if the robot invariably tries to put things back to an ideal starting point it will try to put back changes we actually wanted, clear away papers we wanted left out, and so on. Now in practice and in the case of an office cleaning robot I think we could get round those problems without too much difficulty; we would essentially lower our expectations of the robot and redesign the job in a much more limited and stereotyped way. In particular we would give up the very ambitious goal of making a robot which could switch from one job to another without adjustment and without faltering.

Still it is interesting to see the consequences of the more ambitious approach. The final problem, cutting to the chase, is that in order to tell how humans want their office arranged in every possible set of circumstances, you really cannot do without a human level of understanding. There is an old argument that robots need not resemble humans physically; instead you make your robot to fit the job; a squat circle on wheels if you’re cleaning the floor, a single fixed arm if you want it to build cars. The counter-argument has often been that our world has been shaped to fit human beings, and if we want a general purpose robot it will pay to have it more or less human size and weight, bipedal, with hands, and so on. Perhaps there is a parallel argument to explain why general-purpose robots need human-level cognition; otherwise they won’t function effectively in a world shaped by human activity. The search for artificial general intelligence is not an idle project after all?


  1. Jochen says:

    Sometimes I wonder if we’re not a bit overreaching in our expectations of our putative future mechanical servants. We want to be able to communicate perfectly with them, have them cater to our every wish and whim, solve the problems we set them to our full satisfaction, and so on; but of course, that’s a state of affairs we’re very far from reaching with our human brethren. I need not invoke the example of the cleaning lady mopping up a Beuys installation to make the point here: everybody is constantly faced with the fact that either we poorly understand the problems posed to us, or those we ask to carry out some task for us fail due to miscommunication. In fact, perfect communication between two humans is probably exceedingly rare; why would we expect, or even aim at, things being different when it comes to robots?

  2. Jochen says:

    Also, I think there is at least one additional source of failure that’s not considered in the above—we need to make sure that the instructions the robot receives don’t trap it in an infinite loop, as with the computer scientist who never finishes his shower, because the bottle’s instructions read ‘lather, rinse, repeat’. This could perhaps be addressed by introducing a maximum time by which the task should be completed, or else, it needs to be terminated.

    However, how does one decide what the maximum reasonable time is in which a given task should be completed? This runs into questions of open-endedness, and I think probably these problems alone necessitate some human or near-human level reasoning capacities.

  3. Hunt says:

    ‘lather, rinse, repeat’

    Not to make this devolve into another debate about computationalism (oh we know it’s going to happen anyway…)

    But your example underscores the fact that human task-following always happens at the meta- or meta-meta-… level. We don’t ‘execute’ algorithms, we ‘follow’ algorithms in a kind of fuzzy, high-level way, thus we don’t get caught in endless loops (normally, perhaps OCD could be approximately explained by this). I’m not sure there are any examples of this in computer science, and perhaps it would be a promising area to research.

  4. Callan S. says:

    Why do people want robot ‘servants’ anyway?

    You’d think the solution to giving people what they want would lie in…why they want it to begin with.

  5. Hunt says:

    I don’t know, I think I could get into having a robot like Robbie from Forbidden Planet around–a super strong maker bot who could fabricate anything I wanted, like fifty gallons of booze.

  6. Jochen says:


    I agree—I’ve heard this described as the distinction between ‘following’ an algorithm, and ‘being governed’ by an algorithm. The distinction, roughly, being that if you follow an algorithm, you can choose not to, or violate its rules, or even make errors in the execution, while if you are governed by said algorithm, you’re constrained to follow it no matter what, and without any other option. In a sense, one might want to argue that in such a case, one doesn’t even know one is following that algorithm—there is no level on which one could reflect on doing so.

    This of course invites the question: is the decision whether to follow an algorithm again governed by an algorithm? That is, is there an algorithm governing our behavior that determines whether we follow some other algorithm? And perhaps several stacks of meta-levels beyond?

    One might suppose that, with algorithm on level 0 being the object of algorithm on level 1 which decides whether we follow the level-0 algorithm, this induces some capacity of self-reflection—being, in some weak sense, ‘aware’ of what we’re doing. But of course, we could just as well collapse all of the levels into a single algorithm governing our behavior, which we could accordingly not reflect.

    Dennett (I think) has discussed the behavior of the sphex wasp as an emblematic case of rule-governedness: it moves a cricket closer to its burrow, then checks the burrow; if someone moves the cricket away in the meantime, it repeats the same behavior: draws it closer, checks, and so on. A human being would eventually (well, rather quickly for most humans) tire of the act, ‘getting wise’ to the changing circumstances and adapting their behavior accordingly; but for the wasp, the repetitiveness (and hence, pointlessness) of its behavior is invisible, in some sense.

  7. Callan S. says:

    So maybe there are degrees of reflection.

  8. Hunt says:

    To be honest, the more I learn about the limitations of human cognition, the more I think no more than a few degrees of self-reflection might be necessary to match anything humans can do. I’m more and more convinced that AIs will eventually be made to meet or exceed our level of awareness.

  9. Callan S. says:

    Though such self reflection will be engineered by/a product of human self reflection. And even self reflection built by that self reflection will be a derivative of human self reflection. They can’t choose their parents…

  10. Hunt says:

    I would say that won’t matter much. For instance, pick a human, any human, he or she will exhibit ‘change blindness’. But change blindness is one of those things that could easily be engineered out of an AI. That alone would give it perceptual abilities beyond any human.

  11. Callan S. says:

    ‘But change blindness is one of those things that could easily be engineered out of an AI.’

    By whom? They’d only engineer out the change blindness they are aware of, of course.

    Whatever blindness of ours we are blind to, we’d pass on.

  12. Hunt says:

    Donald Rumsfeld won’t be leading the effort, so we can ignore the unknown unknowns. 🙂

  13. Callan S. says:

    We always do!

    Also, in terms of robots being smart enough to do labor :

  14. Hunt says:

    Defn. robot: a machine that can do the work of a person and that works automatically or is controlled by a computer

    We’ve already had washing machine robots for almost a century.

  15. Callan S. says:

    I’m not sure what you mean, Hunt? In terms of self driving cars, we’ve never had them before. Not the things that will come after self driving cars, if nothing changes.

  16. Callan S. says:

    * ‘Nor the things that will come after…’

  17. Hunt says:

    My point was a little incoherent because I thought the cartoon referenced “robots”, but the point still stands, machines have been doing useful, autonomous work for us for centuries. There is almost no definition of “robot” that a washing machine doesn’t meet, unless you require anthropomorphism. I’m not sure you’d want to call a washing machine “intelligent” though. A car that drives itself is in a different class than just an advanced car. Why? Because it acts autonomously. I would call a car that drives itself a robot, but a car that you have to drive is just a tool used to transport yourself.

