Rules for Robots

axebot Robot behaviour is no longer a purely theoretical problem. Since Asimov came up with the famous Three Laws which provide the framework for his robot stories, a good deal of serious thought has been given to extreme cases where robots might cause massive disasters and to such matters as the ethics of military robots. Now, though, things have moved on to a more mundane level and we need to give thought to more everyday issues. OK, a robot should not harm a human being or through inaction allow a human being to come to harm, but can we also just ask that you stop knocking the coffee over and throwing my drafts away? Dario Amodei, Chris Olah, John Schulman, Jacob Steinhardt, Paul Christiano, and Dan Mane have considered how to devise appropriate rules in this interesting paper.

They reckon things can go wrong in three basic ways. It could be that the robot’s objective was not properly defined in the first place. It could be that the testing of success is not frequent enough, especially if the tests we have devised are complex or expensive. Third, there could be problems due to “insufficient or poorly curated training data or an insufficiently expressive model”. I take it these are meant to be the greatest dangers – the set doesn’t seem to be exhaustive.

The authors illustrate the kind of thing that can go wrong with the example of an office cleaning robot, mentioning five types of case.

  • Avoiding Negative Side Effects: we don’t want the robot to clean quicker by knocking over the vases.
  • Avoiding Reward Hacking: we tell the robot to clean until it can’t see any mess; it closes its eyes.
  • Scalable Oversight: if the robot finds an unrecognised object on the floor it may need to check with a human; we don’t want a robot that comes back every three minutes to ask what it can throw away, but we don’t want one that incinerates our new phone either.
  • Safe Exploration: we’re talking here about robots that learn, but as the authors put it, the robot should experiment with mopping strategies, but not put a wet mop in an electrical outlet.
  • Robustness to Distributional Shift: we want a robot that learned its trade in a factory to be able to move safely and effectively to an office job.How do we ensure that the cleaning robot recognizes, and behaves robustly, when in an environment different from its training environment? For example, heuristics it learned for cleaning factory workfloors may be outright dangerous in an office.

The authors consider a large number of different strategies for mitigating or avoiding each of these types of problem. One particularly interesting one is the idea of an impact regulariser, either pre-defined or learned by the robot. The idea here is that the robot adopts the broad principle of leaving things the way people would wish to find them. In the case of the office this means identifying an ideal state – rubbish and dirt removed, chairs pushed back under desks, desk surfaces clear (vases still upright), and so on. If the robot aims to return things to the ideal state this helps avoid negative side effects of an over-simplified objective or other issues.

There are further problems, though, because if the robot invariably tries to put things back to an ideal starting point it will try to put back changes we actually wanted, clear away papers we wanted left out, and so on. Now in practice and in the case of an office cleaning robot I think we could get round those problems without too much difficulty; we would essentially lower our expectations of the robot and redesign the job in a much more limited and stereotyped way. In particular we would give up the very ambitious goal of making a robot which could switch from one job to another without adjustment and without faltering.

Still it is interesting to see the consequences of the more ambitious approach. The final problem, cutting to the chase, is that in order to tell how humans want their office arranged in every possible set of circumstances, you really cannot do without a human level of understanding. There is an old argument that robots need not resemble humans physically; instead you make your robot to fit the job; a squat circle on wheels if you’re cleaning the floor, a single fixed arm if you want it to build cars. The counter-argument has often been that our world has been shaped to fit human beings, and if we want a general purpose robot it will pay to have it more or less human size and weight, bipedal, with hands, and so on. Perhaps there is a parallel argument to explain why general-purpose robots need human-level cognition; otherwise they won’t function effectively in a world shaped by human activity. The search for artificial general intelligence is not an idle project after all?

Here’s a thought…

Where do thoughts come from? Alva Noë provides a nice commentary here on an interesting paper by Melissa Ellamil et al. The paper reports on research into the origin of spontaneous thoughts.

The research used subjects trained in Mahasi Vipassana mindfulness techniques. They were asked to report the occurrence of thoughts during sessions when they were either left alone or provided with verbal stimuli. As well as reporting the occurrence of a thought, they were asked to categorise it as image, narrative, emotion or bodily sensation (seems a little restrictive to me – I can imagine having two at once or a thought that doesn’t fit any of the categories). At the same time brain activity was measured by fMRI scan.

Overall the study found many regions implicated in the generation of spontaneous thought; the researchers point to the hippocampus as a region of particular interest, but there were plenty of other areas involved. A common view is that when our attention is not actively engaged with tasks or challenges in the external world the brain operates the Default Mode Network (DMN); a set of neuronal areas which appear to produce detached thought (we touched on this a while ago); the new research complicates this picture somewhat or at least suggests that the DMN is not the unique source of spontaneous thoughts. Even when we’re disengaged from real events we may be engaged with the outside world via memory or in other ways.

Noë’s short commentary rightly points to the problem involved in using specially trained subjects. Normal subjects find it difficult to report their thoughts accurately; the Vipassana techniques provide practice in being aware of what’s going on in the mind, and this is meant to enhance the accuracy of the results. However, as Noë says, there’s no objective way to be sure that these reports are really more accurate. The trained subjects feel more confidence in their reports, but there’s no way to confirm that the confidence is justified. In fact we could go further and suggest that the special training they have undertaken may even make their experience particularly unrepresentative of most minds; it might be systematically changing their experience. These problems echo the methodological ones faced by early psychologists such as Wundt and Titchener with trained subjects. I suppose Ellamil et al might retort that mindfulness is unlikely to have changed the fundamental neural architecture of the brain and that their choice of subject most likely just provided greater consistency.

Where do ‘spontaneous’ thoughts come from? First we should be clear what we mean by a spontaneous thought. There are several kinds of thought we would probably want to exclude. Sometimes our thoughts are consciously directed; if for example we have set ourselves to solve a problem we may choose to follow a particular strategy or procedure. There are lots of different ways to do this, which I won’t attempt to explore in detail: we might hold different aspects of the problem in mind in sequence; if we’re making a plan we might work through imagined events; or we might even follow a formal procedure of some kind. We could argue that even in these cases what we usually control is the focus of attention, rather than the actual generation of thoughts, but it seems clear enough that this kind of thinking is not ‘spontaneous’ in the expected sense. It is interesting to note in passing that this ability to control our own thoughts implies an ability to divide our minds into controller and executor, or at least to quickly alternate those roles.

Also to be excluded are thoughts provoked directly by outside events. A match is struck in a dark theatre; everyone’s eyes saccade involuntarily to the point of light. Less automatically a whole variety of events can take hold of our attention and send our thoughts in a new direction. As well as purely external events, the sources in such cases might include interventions from non-mental parts of our own bodies; a pain in the foot, an empty stomach.

Third, we should exclude thoughts that are part of a coherent ongoing chain of conscious cogitation. These ‘normal’ thoughts are not being directed like our problem-solving efforts, but they follow a thread of relevance; by some connection one follows on from the next.

What we’re after then is thoughts that appear unbidden, unprompted, and with no perceivable connection with the thoughts that recently preceded them. Where do they come from? It could be that mere random neuronal noise sometimes generates new thoughts, but it seems unlikely to be a major contributor to me: such thoughts would be likely to resemble random nonsense and most of our spontaneous thought seem to make a little more sense than that.

We noticed above that when directing our thoughts we seem to be able to split ourselves into controller and controlled. As well as passing control up to a super-controller we sometimes pass it down, for example to the part of our mind that gets on with the details of driving along a route while the surface of our mind us engaged with other things. Clearly some part of our mind goes on thinking about which turnings to take; is it possible that one or more parts of our mind similarly goes on thinking about other topics but then at some trigger moment inserts a significant thought back into the main conscious stream? A ‘silent’ thinking part of us like this might be a permanent feature, a regular sub- or unconscious mind; or it might be that we occasionally drop threads of thought that descend out of the light of attention for a while but continue unheard before popping back up and terminating. We might perhaps have several such threads ruminating away in the background; ordinary conscious thought often seems rather multi-threaded. Perhaps we keep dreaming while awake and just don’t know it?

There’s a basic problem here in that our knowledge of these processes, and hence all our reports, rely on memory. We cannot report instantaneously; if we think a thought was spontaneous it’s because we don’t remember any relevant antecedents; but how can we exclude the possibility that we merely forgot them? I think this problem radically undermines our certainty about spontaneous thoughts. Things get worse when we remember the possibility that instead of two separate thought processes, we have one that alternates roles. Maybe when driving we do give conscious attention to all our decisions; but our mind switches back and forth between that and other matters that are more memorable; after the journey we find we have instantly forgotten all the boring stuff about navigating the route and are surprised that we seem to have done it thoughtlessly. Why should it not be the same with other thoughts? Perhaps we have a nagging worry about X which we keep spending a few moments’ thought on between episodes of more structured and memorable thought about something else; then everything but our final alarming conclusion about X gets forgotten and the conclusion seems to have popped out of nowhere.

We can’t, in short, be sure that we ever have any spontaneous thoughts: moreover, we can’t be sure that there are any subconscious thoughts. We can never tell the difference, from the inside, between a thought presented by our subconscious, and one we worked up entirely in intermittent and instantly-forgotten conscious mode. Perhaps whole areas of our thought never get connected to memory at all.

That does suggest that using fMRI was a good idea; if the problem is insoluble in first-person terms maybe we have to address it on a third-person basis. It’s likely that we might pick up some neuronal indications of switching if thought really alternated the way I’ve suggested. Likely but not guaranteed; after all a novel manages to switch back and forth between topics and points of view without moving to different pages. One thing is definitely clear; when Noë pointed out that this is more difficult than it may appear he was absolutely right.

Flatlanders

Wrong again: just last week I was saying that Roger Penrose’s arguments seemed to have drifted off the radar a bit. Immediately, along comes this terrific post from Scott Aaronson about a discussion with Penrose.

In fact it’s not entirely about Penrose; Aaronson’s main aim was to present an interesting theory of his own as to why a computer can’t be conscious, which relies on non-copyability. He begins by suggesting that the onus is on those who think a computer can’t be conscious to show exactly why. He congratulates Penrose on doing this properly, in contrast to say, John Searle who merely offers hand-wavy stuff about unknown biological properties. I’m not really sure that Searle’s honest confession of ignorance isn’t better than Penrose’s implausible speculations about unknown quantum mechanics, but we’ll let that pass.

Aaronson rests his own case not on subjectivity and qualia but on identity. He mentions several examples where the limitless copyability of a program seems at odds with the strong sense of a unique identity we have of ourselves – including Star Trek style teleportation and the fact that a program exists in some Platonic sense forever, whereas we only have one particular existence. He notes that at the moment one of the main differences between brain and computer is our ability to download, amend and/or re-run programs exactly; we can’t do that at all with the brain. He therefore looks for reasons why brain states might be uncopyable. The question is, how much detail do we need before making a ‘good enough’ copy? If it turns out that we have to go down to the quantum level we run into the ‘no-cloning’ theorem; the price of transferring the quantum state of your brain is the destruction of the original. Aaronson makes a good case for the resulting view of our probably uniqueness being an intuitively comfortable one, in tune with our intuitions about our own nature. It also offers incidentally a sort of reconciliation between the Everett many-worlds view and the Copenhagen interpretation of quantum physics: from a God’s eye point of view we can see the world as branching, while from the point of view of any conscious entity (did I just accidentally call God unconscious?) the relevant measurements are irreversible and unrealised branches can be ‘lopped off’. Aaronson, incidentally, reports amusingly that Penrose absolutely accepts that the Everett view follows from our current understanding of quantum physics; he just regards that as a reductio ad absurdum – ie, the Everett view is so absurd the link proves there must be something wrong with our current understanding of quantum physics.

What about Penrose? According to Aaronson he now prefers to rest his case on evolutionary factors and downplay his logical argument based on Godel. That’s a shame in my view. The argument goes something like this (if I garble it someone will perhaps offer a better version).

First we set up a formal system for ourselves. We can just use the letters of the alphabet, normal numbers, and normal symbols of formal logic, with all the usual rules about how they can be put together. Then we make a list consisting of all the valid statements that can be made in this system. By ‘valid’, we don’t mean they’re true, just that they comply with the rules about how we put characters together (eg, if we use an opening bracket, there must be a closing one in an appropriate place). The list of valid statements will go on forever, of course, but we can put them in alphabetical order and number them. The list obviously includes everything that can be said in the system.

Some of the statements, by pure chance, will be proofs of other statements in the list. Equally, somewhere in our list will be statements that tell us that the list includes no proof of statement x. Somewhere else will be another statement – let’s call this the ‘key statement’ – that says this about itself. Instead of x, the number of that very statement itself appears. So this one says, there is no proof in this system of this statement.

Is the key statement correct – is there no proof of the key statement in the system? Well, we could look through the list, but as we know it goes on indefinitely; so if there really is no proof there we’d simply be looking forever. So we need to take a different tack. Could the key statement be false? Well, if it is false, then what it says is wrong, and there is a proof somewhere in the list. But that can’t be, because if there’s a proof of the key statement anywhere,the key statement must be true! Assuming the key statement is false leads us unavoidably to the conclusion that it is true, in the light of what it actually says. We cannot have a contradiction, so the key statement must be true.

So by looking at what the key statement says, we can establish that it is true; but we also establish that there is no proof of it in the list. If there is no proof in the list, there is no possible proof in our system, because we know that the list contains everything that can be said within our system; there is therefore a true statement in our system that is not provable within it. We have something that cannot be proved in an arbitrary formal system, but which human reasoning can show to be true; ergo, human reasoning is not operating within any such formal system. All computers work in a formal system, so it follows that human reasoning is not computational.

As Aaronson says, this argument was discussed to the point of exhaustion when it first came out, which is probably why Penrose prefers other arguments now. Aaronson rejects it, pointing out that he himself has no magic ability to see “from the outside” whether a given formal system is consistent; why should an AI do any better – he suggests Turing made a similar argument. Penrose apparently responded that this misses the point, which is not about a mystical ability to perceive consistency but the human ability to transcend any given formal system and move up to an expanded one.

I’ll leave that for readers to resolve to their own satisfaction. Let’s go back to Aaronson’s suggestion that the burden of proof lies on those who argue for the non-computability of consciousness. What an odd idea that is!  How would that play  at the Patent Office?

“So this is your consciousness machine, Mr A? It looks like a computer. How does it work?”

“All I’ll tell you is that it is a computer. Then it’s up to you to prove to me that it doesn’t work – otherwise you have to give me rights over consciousness! Bwah ha ha!”

Still, I’ll go along with it. What have I got? To begin with I would timidly offer my own argument that consciousness is really a massive development of recognition, and that recognition itself cannot be algorithmic.

Intuitively it seems clear to me that the recognition of linkages and underlying entities is what powers most of our thought processes. More formally, both of the main methods of reasoning rely on recognition; induction because it relies on recognising a real link (eg a causal link) between thing a and thing b; deduction because it reduces to the recognition of consistent truth values across certain formal transformations. But recognition itself cannot operate according to rules. In a program you just hand the computer the entities to be processed; in real world situations they have to be recognised. But if recognition used rules and rules relied on recognising the entities to which the rules applied, we’d be caught in a vicious circularity. It follows that this kind of recognition cannot be delivered by algorithms.

The more general case rests on, as it were, the non-universality of computation. It’s argued that computation can run any algorithm and deliver, to any required degree of accuracy, any set of physical states of affairs. The problem is that many significant kinds of states of affairs are not describable in purely physical or algorithmic terms. You cannot list the physical states of affairs that correspond to a project, a game, or a misunderstanding. You can fake it by generating only sets of states of affairs that are already known to correspond with examples of these things, but that approach misses the point. Consciousness absolutely depends on intentional states that can’t be properly specified except in intentional terms. That doesn’t contradict physics or even add to it the way new quantum mechanics might; it’s just that the important aspects of reality are not exhausted by physics or by computation.

The thing is, I think long exposure to programmable environments and interesting physical explanations for complex phenomena has turned us all increasingly into flatlanders who miss a dimension; who naturally suppose that one level of explanation is enough, or rather who naturally never even notice the possibility of other levels; but there are more things in heaven and earth than are dreamt of in that philosophy.

No digital consciousness

no botsI liked this account by Bobby Azarian of why digital computation can’t do consciousness. It has several virtues; it’s clear, identifies the right issues and is honest about what we don’t know (rather than passing off the author’s own speculations as the obvious truth or the emerging orthodoxy). Also, remarkably, I almost completely agree with it.

Azarian starts off well by suggesting that lack of intentionality is a key issue. Computers don’t have intentions and don’t deal in meanings, though some put up a good pretence in special conditions.  Azarian takes a Searlian line by relating the lack of intentionality to the maxim that you can’t get meaning-related semantics from mere rule-bound syntax. Shuffling digital data is all computers do, and that can never lead to semantics (or any other form of meaning or intentionality). He cites Searle’s celebrated Chinese Room argument (actually a thought experiment) in which a man given a set of rules that allow him to provide answers to questions in Chinese does not thereby come to understand Chinese. But, the argument goes, if the man, by following rules, cannot gain understanding, then a computer can’t either. Azarian mentions one of the objections Searle himself first named, the ‘systems response’: this says that the man doesn’t understand, but a system composed of him and his apparatus, does. Searle really only offered rhetoric against this objection, and in my view it is essentially correct. The answers the Chinese Room gives are not answers from the man, so why should his lack of understanding show anything?

Still, although I think the Chinese Room fails, I think the conclusion it was meant to establish – no semantics from syntax – turns out to be correct, so I’m still with Azarian. He moves on to make another  Searlian point; simulation is not duplication. Searle pointed out that nobody gets wet from digitally simulated rain, and hence simulating a brain on a computer should not be expected to produce consciousness. Azarian gives some good examples.

The underlying point here, I would say, is that a simulation always seeks to reproduce some properties of the thing simulated, and drops others which are not relevant for the purposes of the simulation. Simulations are selective and ontologically smaller than the thing simulated – which, by the way, is why Nick Bostrom’s idea of indefinitely nested world simulations doesn’t work. The same thing can however be simulated in different ways depending on what the simulation is for. If I get a computer to simulate me doing arithmetic by calculating, then I get the correct result. If it simulates me doing arithmetic by operating a humanoid writing random characters on a board with chalk, it doesn’t – although the latter kind of simulation might be best if I were putting on a play. It follows that Searle isn’t necessarily exactly right, even about the rain. If my rain simulation program turns on sprinklers at the right stage of a dramatic performance, then that kind of simulation will certainly make people wet.

Searle’s real point, of course, is really that the properties a computer has in itself, of running sets of rules, are not the relevant ones for consciousness, and Searle hypothesises that the required properties are biological ones we have yet to identify. This general view, endorsed by Azarian, is roughly correct, I think. But it’s still plausibly deniable. What kind of properties does a conscious mind need? Alright we don’t know, but might not information processing be relevant? It looks to a lot of people as if it might be, in which case that’s what we should need for consciousness in an effective brain simulator. And what properties does a digital computer, in itself have – the property of doing information processing? Booyah! So maybe we even need to look again at whether we can get semantics from syntax. Maybe in some sense semantic operations can underpin processes which transcend mere semantics?

Unless you accept Roger Penrose’s proof that human thinking is not algorithmic (it seems to have drifted off the radar in recent years) this means we’re still really left with a contest of intuitions, at least until we find out for sure what the magic missing ingredient for consciousness is. My intuitions are with Azarian, partly because the history of failure with strong AI looks to me very like a history of running up against the inadequacy of algorithms. But I reckon I can go further and say what the missing element is. The point is that consciousness is not computation, it’s recognition. Humans have taken recognition to a new level where we recognise not just items of food or danger, but general entities, concepts, processes, future contingencies, logical connections, and even philosophical ontologies. The process of moving from recognised entity to recognised entity by recognising the links between them is exactly the process of thought. But recognition, in us, does not work by comparing items with an existing list, as an algorithm might do; it works by throwing a mass of potential patterns at reality and seeing what sticks. Until something works, we can’t tell what are patterns at all; the locks create their own keys.

It follows that consciousness is not essentially computational (I still wonder whether computation might not subserve the process at some level). But now I’m doing what I praised Azarian for avoiding, and presenting my own speculations…