The Turing Test – is it all over?

So, was the brouhaha over the Turing Test justified? It was widely reported last week that the test had been passed for the first time by a chatbot named ‘Eugene Goostman’. I think the name itself is a little joke: it sort of means ‘well-made ghost man’.

This particular version of the test was not the regular Loebner Prize contest which we have discussed before (Hugh Loebner must be grinding his teeth in frustration at the apparent ease with which Warwick garnered international media attention), but a session at the Royal Society apparently organised by Kevin Warwick. The bar was set unusually low in this case: to succeed, the chatbot only had to convince 30% of the judges that it was human. This was based on the key sentence in the paper by Turing which started the whole thing:

I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10⁹, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.

Fair enough, perhaps (if the interrogator is right no more than 70 per cent of the time, the machine must be fooling at least 30 per cent), but less impressive than the 50% demanded by other versions; and if 30% is the benchmark, this wasn’t actually the first ever pass, because other chatbots like Cleverbot have scored higher in the past. Goostman doesn’t seem to be all that special: in an “interview” on BBC radio, with only the softest of questioning, it used one response twice over, word for word.

The softness of the questioning does seem to be a crucial variable in Turing tests. If the judges stick to standard small talk and allow the chatbot to take the initiative, quite a reasonable dialogue may result: if the judges are tough it is easy to force the success rate down to zero by various tricks and traps.  Iph u spel orl ur werds rong, fr egsampl, uh bott kanot kope, but a human generally manages fine.
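To be fair, the misspelling trap is not unbeatable. Here is a minimal sketch – assuming an invented vocabulary and similarity cutoff, not any actual chatbot’s method – of how a bot might fuzzy-match garbled words back to known ones using only Python’s standard library:

```python
# A minimal sketch of fuzzy-matching misspelled input back to known words.
# The vocabulary and the 0.6 similarity cutoff are illustrative choices only.
from difflib import get_close_matches
import string

VOCAB = ["if", "you", "spell", "all", "your", "words", "wrong",
         "for", "example", "a", "bot", "cannot", "cope"]

def normalise(text):
    """Map each (possibly misspelled) word to its closest vocabulary entry."""
    result = []
    for raw in text.lower().split():
        word = raw.strip(string.punctuation)
        match = get_close_matches(word, VOCAB, n=1, cutoff=0.6)
        result.append(match[0] if match else raw)
    return " ".join(result)

print(normalise("Iph u spel orl ur werds rong, fr egsampl, uh bott kanot kope"))
# Recovers much of the sentence ("spel" -> "spell", "kanot" -> "cannot"),
# though the shortest garblings ("iph", "u", "uh") slip through, and "orl"
# can even latch onto the wrong word -- crude string similarity has limits.
```

Whether such tricks would survive a genuinely hostile interrogator is another matter.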

Perhaps that wouldn’t have worked for Goostman, though, because he was presented as a relatively ignorant young boy whose first language was not English, giving him some excuse for not understanding things. This stratagem attracted some criticism, but really it is of a piece with chatbot strategy in general; faking and gaming is what it’s all about. No-one remotely supposes that Goostman, or Cleverbot, or any of the others, has actually attained consciousness, or is doing anything that could properly be called thinking. Many years ago I believe there were serious efforts to write programs that to some degree imitated the probable mental processes of a human being: they identified a topic, accessed a database of information about it, retained a set of ‘attitudes’ towards things and tried to construct utterances that made sense in relation to them. It is a weakness of the Turing test that it does not reward that kind of effort; a robot with poor grammar and general knowledge might be readily detectable even though it gave signs of some nascent understanding, while a bot which generates smooth responses without any attempt at understanding has a much better chance of passing.
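To make the contrast concrete, here is a toy sketch of the two styles; the topics, facts and attitudes are invented for illustration, not a reconstruction of any particular program:

```python
# A toy contrast between an 'understanding-flavoured' bot that identifies a
# topic, consults a small database and applies an attitude, and a bot that
# just produces smooth canned responses. All tables are invented examples.
TOPIC_KEYWORDS = {"weather": {"rain", "sunny", "cold"},
                  "football": {"match", "goal", "team"}}
FACTS = {"weather": "it has been cold lately",
         "football": "the season has just ended"}
ATTITUDES = {"weather": "I rather enjoy the cold, myself.",
             "football": "I find it all a bit dull, frankly."}

def topic_bot(utterance):
    """Identify a topic, retrieve a fact about it, and colour it with an attitude."""
    words = set(utterance.lower().replace("?", "").replace(",", "").split())
    for topic, keywords in TOPIC_KEYWORDS.items():
        if words & keywords:
            return f"On {topic}: {FACTS[topic]}. {ATTITUDES[topic]}"
    return "I'm afraid I know nothing about that."

def smooth_bot(utterance):
    """Produce a plausible-sounding response with no attempt at understanding."""
    return "Ha! That is so funny. My grandmother used to say the very same thing."

print(topic_bot("Lovely sunny day, isn't it?"))   # consults its little world model
print(smooth_bot("Lovely sunny day, isn't it?"))  # smooth, and more likely to pass
```

The irony is that in casual small talk a judge may well find the second bot the more convincing conversationalist.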

So perhaps the curtain should be drawn down on the test; not because it has been passed, but because it’s no use.

17 thoughts on “The Turing Test – is it all over?”

  1. I’ve suspected that it’s of no use for quite some time. A few years ago I was listening to a weather radio that was quite obviously a synthesized human voice “reading” a weather feed. I was talking to my brother at the time, and it became clear that he was convinced it was actually a real voice, while to me it was obviously fake. Not exactly a Turing test, technically or in spirit, but it got me thinking how little humans really need to go on to be satisfied they are interacting with another person. A month or two ago I was listening to a humorous recording of a phone solicitation (won’t link it because it was quite vulgar) on YouTube. I’m about 80 percent sure that the phone solicitor was actually computer generated, with voice recognition and response, but that wasn’t what the recording was about at all. I think we are quickly approaching the point where we will routinely interact with agents of unknown conscious status, and gradually we will grow to accept the ambiguity and not even care. A hundred years ago it would probably have seemed odd to have a person’s image talking to you on a screen in your home. Today we realize the newsman is not “talking TO” us at all. A similar kind of shift will happen with interactive agents.

  2. a robot with poor grammar and general knowledge might be readily detectable even though it gave signs of some nascent understanding, while a bot which generates smooth responses without any attempt at understanding has a much better chance of passing.

    Sounds like a job interview!

    One of the things that strikes me is how the bot cannot say no. I mean you could be ‘tough’ on it by swearing at it and calling it all sorts of names and it would not disconnect from the chat. Where is understanding if it cannot reject?

    Sometimes I think it’s a lack of practical context that’s the problem – say we had a bot with a virtual maze (à la Pac-Man) and perhaps a synaptic set-up that reinforces behaviour towards getting some dot at the end. Now say the initial chat is somewhat like a trainer with an animal – the chatter can give text and also supply positive feedback. Eventually the trainer could talk the Pac-Man through the maze to the target.

    Okay, that’s just giving directions. But it might be the link to actually having a conversation about where the dot is, or if we complicate the environment and the needed dots, where the bread is, where the water is, how do we get a job to get money to buy the bread, etc.
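    Something like that reward-driven training is easy to mock up. A minimal sketch, assuming a tiny grid, a single rewarded goal cell, and a standard tabular Q-learning rule (all parameters invented for illustration):

    ```python
    # A toy version of the "trainer rewards the bot through a maze" idea:
    # a tabular Q-learning agent on a 4x4 grid with one rewarded goal cell.
    import random

    GRID_W, GRID_H = 4, 4
    GOAL = (3, 3)                               # the "dot" at the end
    MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # the four possible steps
    ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2       # learning / discount / exploration

    # Q-table: estimated future reward for every (state, action) pair.
    Q = {((x, y), a): 0.0
         for x in range(GRID_W) for y in range(GRID_H) for a in range(4)}

    def step(state, action):
        dx, dy = MOVES[action]
        nx = min(max(state[0] + dx, 0), GRID_W - 1)  # walls clamp movement
        ny = min(max(state[1] + dy, 0), GRID_H - 1)
        new_state = (nx, ny)
        # The "trainer" supplies positive feedback only at the target.
        return new_state, (1.0 if new_state == GOAL else 0.0)

    for episode in range(500):
        state = (0, 0)
        while state != GOAL:
            if random.random() < EPSILON:       # sometimes explore
                action = random.randrange(4)
            else:                               # otherwise exploit what's learned
                action = max(range(4), key=lambda a: Q[(state, a)])
            new_state, reward = step(state, action)
            best_next = max(Q[(new_state, a)] for a in range(4))
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                           - Q[(state, action)])
            state = new_state

    # After training, following the highest-Q action from (0, 0) heads
    # straight for the goal -- learned from feedback, not from directions.
    ```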

    Iph u spel orl ur werds rong, fr egsampl, uh bott kanot kope,

    The interesting thing is how much effort it takes to spell that badly.

    But also it’s not really fair – we translate the symbols into an audio simulation. ‘orl’, as a series of pixels, is not very similar to ‘all’, but in terms of sound waves it might have a very similar structure. Just because we don’t write down wobbly soundwaves for our communication doesn’t mean a bot is somehow missing something for not getting our symbol-to-audio-simulation translation. It’s just one translation. If I wrote out some communication in Morse code in the chat and it’s actually another human in the chat but they don’t get it, does that make him a bot? It’s valid communication, after all. You can be ‘tough’ on humans as well.
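    The Morse point is easy to make concrete. A tiny sketch of the same English message in a different surface encoding (only the handful of letters needed are included):

    ```python
    # The same English message in a different surface encoding. Only the
    # letters used below are included; an illustration, not a library.
    MORSE = {"a": ".-", "b": "-...", "i": "..",
             "m": "--", "o": "---", "t": "-"}

    def to_morse(text):
        """Encode letters to Morse: spaces separate letters, ' / ' separates words."""
        return " / ".join(" ".join(MORSE[c] for c in word)
                          for word in text.lower().split())

    print(to_morse("am i a bot"))   # .- -- / .. / .- / -... --- -
    ```

    A human who shrugs at the dots and dashes hasn’t failed at communication; they’ve just declined one translation.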

    I don’t think we should focus too much on translation issues as the main disproof method.

  3. The thing is that all these tests take place in particular lab conditions, very different from ordinary life.

    Regardless of the mechanical problems, just set the bot free in society, or in a family, and see how long it takes for someone to spot it. I bet not more than one day (or one minute).

    We are tuned to detect very subtle changes in standard behaviour. Think how many times you have met people who made you feel that there was something wrong about them (psychologically) – not really easy to tell what – and then you found out they were suffering from some psychological condition.

    To me, this is the test: if the bot can go through a whole week in ordinary social conditions, and nobody suspects and reports that there is something really weird about it, it has passed!

    I think that the weak point of the test will not be missing bots, but that probably many humans will be labelled as bots.

    After all, the human body has been wired as a bot by evolution (with a mission uploaded onto its firmware), and as such will respond in many cases.

  4. “One of the things that strikes me is how the bot cannot say no.”

    The answer is very simple: ask the programmer why. He’s the one communicating with the user, not the computer. When your mother rings you on the telephone, do you assume you’re speaking with the telephonic computer systems that deliver the message, or with your mother? The source of meaning for the physical deliverable of sound waves is your mother. Likewise the source of meaning for a ‘chatbot’ is the programmer who produced it. The programmer determines what conversational state to arbitrarily map to whatever digital state the computer is in. There is no necessary link between conversational state and digital state, and the only source of information as to that link is the programmer, who is in principle and in fact the only person you’re talking to.

    Chatbots exist no more than chatphones – ignore them. A fantasy.

  5. The Turing test (TT) is about the question ‘can machines think?’ One can look at that approach to artificial intelligence (AI) by showing that it addresses the possibility for artificial agents (AAs) to generate meaningful information (meanings) as we humans do. The initial question about thinking machines can then be reformulated into ‘can AAs generate meanings like humans do?’
    The TT is then about generation of human-like meanings. This brings us to use an existing model: the meaning generator system (MGS) where a system submitted to an internal constraint generates a meaning in order to satisfy the constraint. The system approach of the MGS allows comparing meaning generations in animals, humans, and AAs. The comparison shows that in order to have AAs capable of generating human-like meanings, we need the AAs to carry human constraints. And transferring human constraints to AAs raises concerns coming from the unknown natures of life and human mind which are at the root of human constraints.
    Consequently, today AAs cannot think like we humans think. They cannot pass the TT. Strong AI is not possible today. Only weak AI is possible. Imitation performances can be almost perfect and make us believe that AAs generate human-like meanings, but there is no such meaning generation, as AAs do not carry human constraints.
    Another consequence is that it is not possible today to design living machines. Artificial agents cannot generate meanings like animals do because we do not know the nature of life and cannot transfer animal constraints to AAs. Strong AL (artificial life) is not possible today. At this level of analysis the blocking points for strong AI and strong AL come more from our lack of understanding of life and the human mind than from computer performance. We need progress in these understandings to design AAs capable of behaving more like animals and thinking like humans.
    Such an approach to the TT via meaning generation can also be used for the Chinese Room Argument and for the Symbol Grounding Problem.
    If you are interested, a more detailed presentation is available at
    http://philpapers.org/rec/MENTTC-2 (‘Turing Test, Chinese Room Argument, Symbol Grounding Problem. Meanings in Artificial Agents’).

  6. John Davey,

    Perhaps just as much I could consider that I am not actually speaking with my mother on the phone, but with my grandmother? That she is the only person I’m talking to when I speak to my mother? How many umbilical-like wires or wire-like umbilical cords should we ignore?

    Christophe Menant,

    I suppose it raises a question of whether Turing was a bit humanocentric. As if an environmentally adaptive and profiting system would somehow need to be recognisable as human in order to adapt and profit in an environmental niche. As if the only spectrum of intelligence were the human spectrum, when plausibly it’s just one band amongst a wider range. Some of them, like the ultraviolet, perhaps unrecognisable.

  7. Christophe,

    Maybe the first thing is to identify and isolate those factors that make humans human. Maybe some features can be shared with AA systems, if sufficiently developed, but others can’t be transferred.

    To me, meaning (a concept not easy to define – try to find a definition of meaning that is not tautological) is rooted in conscious experience. No consciousness, no meaning. So, before the nature of consciousness is understood, meaning can’t be properly addressed by the technological domain.

    Humans can operate on a dual basis, like conscious entities or like AA bots. Consider any process that requires symbolic inputs and can be automated: driving, reading a musical score, playing video games. All these activities, with enough practice, can be carried out in both ways: consciously understanding the meaning of the symbols, or automatically reacting to them.

    What happens in the first case? That meaning requires conscious attention; the latter is just pattern recognition, like any OCR application can do, and no meaning is involved. But mind you!! – and this is where I wanted to get to – the observable behaviour can be identical, and for this reason most tests will fail.

  8. Vicente,
    Starting with meaning as rooted in human conscious experience indeed makes the subject difficult.
    I feel it is easier to adopt an evolutionary approach and begin by trying to model meaning generation in animals. All animals have to satisfy a ‘stay alive’ constraint. For a mouse, the sight of a cat, related to the ‘stay alive’ constraint, generates a meaning ‘danger’. And that meaning triggers an action like hiding or running away. The animal case allows the build-up of a simple model of meaning generation by a system submitted to an internal constraint, the Meaning Generator System (see a short introduction on that subject at http://philpapers.org/archive/MENITA-2.pdf).
    Regarding factors that make humans human, we can look at using the MGS with specific and generic human constraints like ‘look for happiness’, ‘avoid anxiety’, ‘valorize ego’ (see book chapter at http://philpapers.org/archive/MENCOI.pdf; this subject deserves more work).
    Regarding observable behaviors, you are right. We have no reliable test to discriminate between those coming from humans and those coming from AAs.
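    For what it’s worth, the bare skeleton of that model fits in a few lines. A toy sketch – the constraint tables and wiring are illustrative guesses, not the MGS itself:

    ```python
    # A toy sketch of the meaning-generator idea: an incoming event connects
    # to an internal constraint, which generates a "meaning" that selects an
    # action. The tables are illustrative guesses, not Menant's actual model.
    CONSTRAINTS = {
        "stay alive":         {"cat": ("danger", "run away")},
        "look for happiness": {"cheese": ("food nearby", "approach")},
    }

    def generate_meaning(event):
        """Return (constraint, meaning, action) if the event connects to a constraint."""
        for constraint, table in CONSTRAINTS.items():
            if event in table:
                meaning, action = table[event]
                return constraint, meaning, action
        return None, "no meaning", "ignore"   # event unrelated to any constraint

    print(generate_meaning("cat"))    # ('stay alive', 'danger', 'run away')
    print(generate_meaning("stone"))  # (None, 'no meaning', 'ignore')
    ```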

  9. Wooden deliberate concentrated activity, or automated practised activity like driving etc., both still require conscious initiation. Autonomic activity like circulation, breathing, digestion, etc. generally does not. This is the dual interaction which humans have that bots at present do not. When the various brain projects realise the brain exists because of the body, they will progress, as will AAs, and not before.

  10. Christophe,

    I find it difficult to understand that approach. Taking advantage of your example: the sight of a cat induces an emotion in the mouse. Emotions are at the opposite end to meaning. In fact, emotions usually have a deep subconscious component. Reactions to fear take place a long time before the stimulus reaches the conscious level, i.e. the mouse jumps before consciously realising that there was a cat, so no meaning can ever stem from these processes. Even more, these situations are real, with no symbolic content. The cat jumping is a real event, not a warning sign of jumping-cat danger. If you see a piano falling on you from a third floor, and you jump aside to dodge it, you didn’t process any meaningful symbols. These reactions can be learned, causing big trouble to humans, who many times suffer unpleasant emotional reactions as an important source of psychological discomfort. One of the concerns of psychiatry is how to add some MEANING (rationalise) to these emotions, in order to control them. The evidence that these reactions are supported by very basic neuronal structures, set up by evolution long before the cognitive areas (required for meaning) appeared, is that they can be inherited: we humans feel an innate fear of spiders and snakes, for example.

    In summary, instincts and meaning don’t mix well.

  11. Richard,

    If they require conscious initiation, why do we say they are automated? When you walk, do you consciously take each step? If you are driving and you see a red light, how many times do you start braking without thinking about it?

    The point is that these activities can be carried out both ways, consciously or not. Digestion or blood filtering by the kidneys can never be consciously controlled, to begin with because the computational capabilities required to consciously control these physiological processes are out of range.

    Now I am wondering why we kept some conscious control over breathing. Maybe to be able to take a deep breath and smell the breeze to see who and what was around?

  12. Vicente,
    Regarding emotions, I feel they can be considered as meanings. Anger, happiness or sadness can be looked at as generated by connections between internal or external events and constraints.
    The sight of a piano falling from the third floor close to me generates meanings ‘danger’ and ‘fear’ as it gets connected to my ‘stay alive’ constraint.
    The sight of a piano falling from the third floor at distance generates a meaning ‘interesting experience’ as it gets connected to my ‘look for happiness’ and ‘valorize ego’ constraints (happy to be safe and having a story to tell).

  13. Christophe, from your comments and papers, I see that the understanding, applicability and scope of the concept of meaning that you consider is much broader than mine. The gap seems too wide to allow a meaningful discussion.

  14. Vicente, nature/nurture habitual automated somatic activity, like walking or driving, does require conscious initiation, although it may not be memorable: you never think you are just sitting in a chair but then find yourself somehow driving or walking. If you do, big problems for you, I think. Digestion or blood filtering are part of the autonomic nervous system and not the somatic nervous system, but they do rely on the consciously initiated somatic nervous system to supply sustenance and remove waste. When you say ‘why we kept some conscious control on breathing’, I think it was an evolved advantage, especially for swimming. Whilst we can consciously somatically affect our breathing, it is still fortunately an autonomically controlled activity, allowing isolation of the consciously initiated somatic activity when sleeping.

  15. Richard, yes diving, holding your breath in a smoky or dusty environment… even to relax… many advantages.

    It is necessary to distinguish between the action initiation ‘per se’, e.g. I decide to move over there now, which is usually conscious (or not), and the action implementation, standing up and walking, which can be initiated and carried out unconsciously.

    Furthermore, what is meant by conscious initiation is a very thorny question. To be fully conscious you need to be aware of all the real motivations that triggered the action, and this, unfortunately, seldom happens. There are many unconscious inputs driving what you call “conscious initiation”.

    Also, CNS and sympathetic/parasympathetic interactions are bidirectional, through a complex hormonal chain modulated by conscious events…

    It’s not that simple: you stand up and walk because of some ridiculous subliminal stimulus on TV, and then you go to the toilet three times in ten minutes because you have to sit an exam the next day…

  16. Vicente, Conscious automatic somatic is a different part of the nervous spectrum compared to the autonomic which is unconscious. Of course they do affect and effect each other. This is the physical duality interaction which Descartes struggled with.
    I agree there are many variations of activities that occur during consciousness which of course varies as well but to k.i.s (keep it simple), any somatic activity is initiated by a conscious aware decision, no matter how slight, or why it was caused.
    Autonomic is not the same as repetitively learnt nature/nurture automatic conscious somatic activity, which is how the somatic nervous system can become isolated when sleeping, allowing the autonomic nervous system to function undisturbed.
    When talking I may move my hands or body, but am consciously aware, no matter how slightly, that this is occurring – even pausing whilst talking to eat or drink, all of which requires conscious aware initiation, again no matter how slight.
    The body is the driver of the driver which is the mind. When conscious somatic initiation occurs, it may not be memorable for various reasons.

  17. Callan

    “John Davey,

    Perhaps just as much I could consider that I am not actually speaking with my mother on the phone, but with my grandmother? That she is the only person I’m talking to when I speak to my mother? How many umbilical-like wires or wire-like umbilical cords should we ignore?”

    Unless in your part of the world ventriloquism is the norm, no. Are you seriously suggesting that there is no causal link between the voice of the user of a telephone and the sounds it generates on the other end? If not, then what on earth is the point you are making here?
