The Turing Test – is it all over?

So, was the brouhaha over the Turing Test justified? It was widely reported last week that the test had been passed for the first time by a chatbot named ‘Eugene Goostman’.  I think the name itself is a little joke: it sort of means ‘well-made ghost man’.

This particular version of the test was not the regular Loebner Prize contest, which we have discussed before (Hugh Loebner must be grinding his teeth in frustration at the apparent ease with which Warwick garnered international media attention), but a session at the Royal Society apparently organised by Kevin Warwick.  The bar was set unusually low in this case: to succeed, the chatbot only had to convince 30% of the judges that it was human. This was based on the key sentence in the paper by Turing which started the whole thing:

I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10⁹, to make them play the imitation game so well that an average interrogator will not have more than 70 per cent chance of making the right identification after five minutes of questioning.

Fair enough, perhaps, but less impressive than the 50% demanded by other versions; and if 30% is the benchmark, this wasn’t actually the first ever pass, because other chatbots like Cleverbot have scored higher in the past. Goostman doesn’t seem to be all that special: in an “interview” on BBC radio, with only the softest of questioning, it used one response twice over, word for word.

The softness of the questioning does seem to be a crucial variable in Turing tests. If the judges stick to standard small talk and allow the chatbot to take the initiative, quite a reasonable dialogue may result; if the judges are tough it is easy to force the success rate down to zero by various tricks and traps.  Iph u spel orl ur werds rong, fr egsampl, uh bott kanot kope, but a human generally manages fine.

Perhaps that wouldn’t have worked for Goostman, though, because he was presented as a relatively ignorant young boy whose first language was not English, giving him some excuse for not understanding things. This stratagem attracted some criticism, but really it is of a piece with chatbot strategy in general; faking and gaming is what it’s all about. No-one remotely supposes that Goostman, or Cleverbot, or any of the others, has actually attained consciousness, or is doing anything that could properly be called thinking. Many years ago I believe there were serious efforts to write programs that to some degree imitated the probable mental processes of a human being: they identified a topic, accessed a database of information about it, retained a set of ‘attitudes’ towards things and tried to construct utterances that made sense in relation to them.  It is a weakness of the Turing test that it does not reward that kind of effort; a robot with poor grammar and general knowledge might be readily detectable even though it gave signs of some nascent understanding, while a bot which generates smooth responses without any attempt at understanding has a much better chance of passing.
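To make that older style of design concrete, here is a purely hypothetical toy sketch of the topic-and-attitude architecture described above: spot a topic, fetch a fact about it, and let a stored ‘attitude’ colour the phrasing. Every name and fact in it is invented for illustration.

```python
# Toy sketch of an "understanding-oriented" chatbot of the older kind:
# identify a topic, consult a database of facts, and frame a reply
# according to a stored attitude. All data here is invented.

FACTS = {
    "weather": "rain is forecast for the weekend",
    "music": "the new album came out on Friday",
}

ATTITUDES = {
    "weather": "gloomy",        # colours how the reply is phrased
    "music": "enthusiastic",
}

def identify_topic(utterance: str):
    """Crude topic spotting by keyword match."""
    for topic in FACTS:
        if topic in utterance.lower():
            return topic
    return None

def reply(utterance: str) -> str:
    topic = identify_topic(utterance)
    if topic is None:
        return "I'm not sure what you mean."
    fact = FACTS[topic]
    if ATTITUDES[topic] == "gloomy":
        return f"I'm afraid {fact}."
    return f"Great news: {fact}!"

print(reply("What do you think of the weather?"))
# prints "I'm afraid rain is forecast for the weekend."
```

Crude as it is, the point is the architecture: the reply is constructed from a representation of the topic and the bot’s stance towards it, rather than retrieved ready-made.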

So perhaps the curtain should be drawn down on the test; not because it has been passed, but because it’s no use.

Most Human

In The Most Human Human: A Defence of Humanity in the Age of the Computer, Brian Christian gives an entertaining and generally sensible account of his campaign to win the ‘most human human’ award at the Loebner prize.

As you probably know, the Loebner prize is an annual staging of the Turing test: judges conduct online conversations with real humans and with chatbots and try to determine which is which.  The main point of the contest is for the chatbot creators to deceive as many judges as they can, but in order to encourage the human subjects there’s also an award for the participant considered to be least like a computer.  Christian set himself to win this prize, and intersperses the story of his effort with reflections on the background and significance of the enterprise.

He tries to ramp up the drama a bit by pointing out that in 2008 a chatbot came close to succeeding, persuading 3 of the judges that it was human: a fourth would have given it the first-ever win. This year, therefore, he suggests, might in some sense prove to be humanity’s last stand.  I think it’s true that Loebner entrants have improved over the years. In the early days none of the chatbots was at all convincing and few of their efforts at conversation rose above the ludicrous – in fact, many early transcripts are worth reading for the surreal humour they inadvertently generated. Nowadays, if they’re not put under particular pressure the leading chatbots can produce a lengthy stream of fairly reasonable responses, mixed with occasional touches of genius and – still – the inevitable periodic lapses into the inapposite or plain incoherent. But I don’t think the Turing Test is seriously about to fall. The variation in success at the Loebner prize has something to do with the quality of the bots, but more with the variability of the judges. I don’t think it’s made very clear to the judges what their task is, and they seem to divide into hawks and doves: some appear to feel they should be sporting and play along with the bots, while others approach the conversations inquisitorially and do their best to catch their ‘opponents’ out. The former approach sometimes lets the bots look good; the latter, I’m afraid, never really fails to unmask them.  I suspect that in 2008 there just happened to be a lot of judges who were ready to give the bots a fighting chance.

How do you demonstrate your humanity?  In the past some human contenders have tried to signal their authenticity with displays of emotional affect, but I should have thought that approach was susceptible to fakery. However, compared to any human being, the bots have little information and no understanding. They can therefore be thrown off either by allusions to matters of fact that a human participant would certainly know (the details of the hotel breakfast; a topical story from the same day’s news) but would not be in the bots’ databases; or they can be stumped by questions that require genuine comprehension (prhps w cld sk qstns wth n vwls nd sk fr rpls n the sm frmt?). In one way or another they rely on scripts, so as Christian deduces, it is basically a matter of breaking the pattern.

I’m not altogether sure how it comes about that humans can break pattern so easily while remaining confident that another human will readily catch their drift. Sometimes it’s a matter of human thought operating on more than one level, so that where two topics intersect we can leap from one to the other (a feature it might be worth trying to build into bots to some extent). In the case of the Loebner, though, hawkish judges are likely to make a point of leaving no thread of relevance whatever between successive inputs, confident that any human being will be able to pick up a new narrative instantly from a standing start. I think it has something to do with the un-named human faculty that allows us to deal with pragmatics in language, evade the frame problem, and effortlessly catch and attribute meanings (at least, I think all those things rely at least partly on a common underlying faculty, or perhaps on an unidentified common property of mental processes).

Christian quotes an example of a bot which appeared to be particularly impoverished, having only one script: if it could persuade the judge to talk about Bill Clinton it looked very good, but as soon as the subject was changed it was dead meat. The best bots, like Rollo Carpenter’s Jabberwacky, seem to have a very large repertoire of examples of real human responses to the kind of thing real humans say in a conversation with chatbots (helpfully real humans are not generally all that original in these circumstances, so it’s almost possible to treat chatbot conversation as a large but limited domain in itself). They often seem to make sense, but still fall down on consistency, being liable to give random and conflicting answers, for example about their own supposed gender and marital status.
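The retrieval approach can be sketched in a few lines. This is a minimal, invented illustration of the general idea, not anything Jabberwacky actually does: store pairs of things humans have said and the replies other humans gave, and answer with the stored reply whose prompt most resembles the new input.

```python
import re

# Invented corpus of (human prompt, human reply) pairs. A real bot of
# this kind would have millions, harvested from past conversations.
CORPUS = [
    ("hi how are you", "Fine thanks, you too?"),
    ("are you married", "No, I live alone with my cat."),
    ("what is your name", "My name is Eugene."),
]

def words(text: str) -> set:
    # Lower-case and strip punctuation so "married?" matches "married".
    return set(re.findall(r"\w+", text.lower()))

def reply(utterance: str) -> str:
    # Answer with the stored reply whose prompt shares the most words
    # with the new input.
    _, best_reply = max(
        CORPUS, key=lambda pair: len(words(utterance) & words(pair[0]))
    )
    return best_reply

print(reply("So, are you married?"))
# prints "No, I live alone with my cat."
```

Note that each reply is chosen independently, with no persistent model of the bot’s own situation, which is exactly why such bots can claim to be married in one answer and single in the next.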

Reflecting on this, Christian notes that a great deal of ordinary everyday human interaction effectively follows scripts, too. In the routine of small talk, shopping, or ordering food, there tend to be ritual formulas to be followed and only a tiny exchange of real information. Where communication is poor, for example where there is no shared language, it’s still often relatively easy to get the tiny nuggets of actual information across and complete the transaction successfully (though not always: Christian tells against himself the story of a small disaster he suffered in Paris through assuming, by analogy with Spanish, that ‘station est’ must mean ‘this station’).

Doesn’t this show, Christian asks, that half the time we’re wasting time? Wouldn’t it be better if we dropped the stereotyped phatic exchanges and cut to the chase? In speed-dating it is apparently necessary to have rules forbidding participants to ask certain standard questions (Where do you live? What do you do for a living?) which eat up scarce time without people getting any real feel for each other’s personality. Wouldn’t it be more rewarding if we applied similar rules to all our conversations?

This,  Christian thinks, might be the gift which artificial intelligence ultimately bestows on us. Unlike some others, he’s not worried that dialogue with computers will make us start to think of ourselves as machines – the difference, he thinks, is too obvious. On the contrary, the experience of dealing with robots will bring home to us for the first time how much of our own behaviour is needlessly stereotyped and robotic and inspire us to become more original – more human – than we ever were before.

In some ways this makes sense. A similar point has sometimes occurred to me in the past: I have wondered whether our time is best spent by so many of us watching the same films and reading the same books. Too often I have had conversations sharing identical memories of the same television programme and quoting the same passages from reviews in the same newspapers. Mightn’t it be more productive, mightn’t we cover more ground, if we all had different experiences of different things?

Maybe, but it would be hard work. If Christian had his way, we should no longer be saying things like this.

– Hi, how are you?

– Fine, thanks, you too?

– Yeah, not so bad. I see the rain’s stopped.

– Mm, hope it stays fine for the weekend.

– Oh, yeah, the weekend.

– Well, it’s Friday tomorrow – at last!

– That’s right. One more day to go.

Instead I suppose our conversations would be earnest, informative, intense, and personal.

– Tell me something important.

– Sometimes I’m secretly glad my father died young rather than living to see his hopes crushed.

– Mithridates, he died old.

– Ugh: Housman’s is the only poetry which is necessarily improved by parody.

– I’ve never respected your taste or your intellect, but I’ve still always felt protective towards you.

– There’s a useful sociological framework theory of amorance I can summarise if it would help?

– What are you really thinking?

Perhaps the second kind of exchange is more interesting than the first, but all day every day it would be tough to sustain and wearing to endure. It seems to me there’s something peculiarly Western about the idea that even our small talk should be made to yield a profit. I believe historically most civilisations have been inclined to believe that the world was gradually deteriorating from a previous Golden Age, and that keeping things the way they had been in the past was the most anyone could generally aspire to. Since the Renaissance, perhaps, we have become more accustomed to the idea of improvement and tend to look restlessly for progress: a culture constantly gearing up and apparently preparing itself for some colossal future undertaking the nature of which remains obscure. This driven quality clearly yields its benefits in prosperity for us, but when it gets down to the personal level it has its dangers: at worst it may promote slave-like levels of work, degrade friendship into networking and reinterpret leisure as mere recuperation. I’m not sure I want to see self-help books about leveraging those moments of idle chat. (In fairness, that’s not what Christian has in mind either.)

Christian may be right, in any case, that human interaction with machines will tend to emphasise the differences more than the similarities. I won’t reveal whether he ultimately succeeded in his quest to be Most Human Human (or perhaps was pipped at the post when a rival and his judge stumbled on a common and all-too-human sporting interest?), but I can tell you that this was not on any view humanity’s last stand:  the bots were routed.