Archive for June, 2011

In The Most Human Human: A Defence of Humanity in the Age of the Computer, Brian Christian gives an entertaining and generally sensible account of his campaign to win the ‘most human human’ award at the Loebner prize.

As you probably know, the Loebner prize is an annual staging of the Turing test: judges conduct online conversations with real humans and with chatbots and try to determine which is which.  The main point of the contest is for the chatbot creators to deceive as many judges as they can, but in order to encourage the human subjects there’s also an award for the participant considered to be least like a computer.  Christian set himself to win this prize, and intersperses the story of his effort with reflections on the background and significance of the enterprise.

He tries to ramp up the drama a bit by pointing out that in 2008 a chatbot came close to succeeding, persuading three of the judges that it was human; four would have given it the first-ever win. This year, therefore, he suggests, might in some sense prove to be humanity’s last stand. I think it’s true that Loebner entrants have improved over the years. In the early days none of the chatbots was at all convincing and few of their efforts at conversation rose above the ludicrous – in fact, many early transcripts are worth reading for the surreal humour they inadvertently generated. Nowadays, if they’re not put under particular pressure, the leading chatbots can produce a lengthy stream of fairly reasonable responses, mixed with occasional touches of genius and – still – the inevitable periodic lapses into the inapposite or plain incoherent. But I don’t think the Turing Test is seriously about to fall. The variation in success at the Loebner prize has something to do with the quality of the bots, but more to do with the variability of the judges. I don’t think it’s made very clear to the judges what their task is, and they seem to divide into hawks and doves: some appear to feel they should be sporting and play along with the bots, while others approach the conversations inquisitorially and do their best to catch their ‘opponents’ out. The former approach sometimes lets the bots look good; the latter, I’m afraid, never really fails to unmask them. I suspect that in 2008 there just happened to be a lot of judges who were ready to give the bots a fighting chance.

How do you demonstrate your humanity? In the past some human contenders have tried to signal their authenticity with displays of emotional affect, but I should have thought that approach was susceptible to fakery. However, compared to any human being, the bots have little information and no understanding. They can therefore be thrown off either by allusions to matters of fact that a human participant would certainly know but that would not be in the bots’ databases (the details of the hotel breakfast; a topical story from the same day’s news), or by questions that require genuine comprehension (prhps w cld sk qstns wth n vwls nd sk fr rpls n th sm frmt?). In one way or another they rely on scripts, so, as Christian deduces, it is basically a matter of breaking the pattern.

I’m not altogether sure how it comes about that humans can break pattern so easily while remaining confident that another human will readily catch their drift. Sometimes it’s a matter of human thought operating on more than one level, so that where two topics intersect we can leap from one to the other (a feature it might be worth trying to build into bots to some extent). In the case of the Loebner, though, hawkish judges are likely to make a point of leaving no thread of relevance whatever between one input and the next, confident that any human being will be able to pick up a new narrative instantly from a standing start. I think it has something to do with the unnamed human faculty that allows us to deal with pragmatics in language, evade the frame problem, and effortlessly catch and attribute meanings (at least, I think all those things rely partly on a common underlying faculty, or perhaps on an unidentified common property of mental processes).

Christian quotes an example of a bot which appeared to be particularly impoverished, having only one script: if it could persuade the judge to talk about Bill Clinton it looked very good, but as soon as the subject was changed it was dead meat. The best bots, like Rollo Carpenter’s Jabberwacky, seem to have a very large repertoire of examples of real human responses to the kind of thing real humans say in a conversation with chatbots (helpfully, real humans are not generally all that original in these circumstances, so it’s almost possible to treat chatbot conversation as a large but limited domain in itself). They often seem to make sense, but still fall down on consistency, being liable to give random and conflicting answers, for example about their own supposed gender and marital status.
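To make the idea of a script a little more concrete, here is a toy sketch in Python of a retrieval bot that answers by finding the stored human prompt sharing the most words with the input. It is an illustration under loose assumptions only: the corpus, the word-overlap scoring and the fallback line are all invented for the example, and it is not how Jabberwacky or any real Loebner entrant actually works. It does, though, show both weaknesses described above: input off the script gets only a generic deflection, and because each turn is matched independently, nothing stops the bot contradicting itself.

```python
import re

# Hypothetical corpus of (human prompt, human reply) pairs, invented for
# illustration; a real retrieval bot would draw on a vastly larger log.
CORPUS = [
    ("what do you think of bill clinton", "Clinton was a gifted politician, whatever else you think of him."),
    ("did you vote for clinton", "I did, both times."),
    ("are you a man or a woman", "I'm a woman, last time I checked."),
    ("are you male", "Yes, I'm a guy."),
    ("are you married", "Married for ten years now."),
    ("do you have a wife", "No, I've never married."),
]

# Generic deflection used when nothing in the corpus matches at all.
FALLBACK = "Interesting. Tell me more."


def words(text: str) -> set[str]:
    """Lower-case set of words, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def reply(user_input: str) -> str:
    """Return the stored reply whose prompt shares the most words with the input.

    There is no representation of meaning and no memory of earlier turns:
    every input is matched on its own, so answers can conflict.
    """
    query = words(user_input)
    best_score, best_reply = 0, FALLBACK
    for prompt, response in CORPUS:
        score = len(query & words(prompt))
        if score > best_score:
            best_score, best_reply = score, response
    return best_reply
```

Asked "Are you a man or a woman?" and then "Are you male?", this bot answers first as a woman and then as a man, since the two replies come from different stored exchanges. A vowel-less question shares no words with any stored prompt and simply gets the fallback.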

Reflecting on this, Christian notes that a great deal of ordinary everyday human interaction effectively follows scripts, too. In the routine of small talk, shopping, or ordering food, there tend to be ritual formulas to be followed and only a tiny exchange of real information. Where communication is poor, for example where there is no shared language, it’s still often relatively easy to get the tiny nuggets of actual information across and complete the transaction successfully (though not always: Christian tells against himself the story of a small disaster he suffered in Paris through assuming, by analogy with Spanish, that ‘station est’ must mean ‘this station’).

Doesn’t this show, Christian asks, that half the time we’re wasting time? Wouldn’t it be better if we dropped the stereotyped phatic exchanges and cut to the chase? In speed-dating it is apparently necessary to have rules forbidding participants to ask certain standard questions (Where do you live? What do you do for a living?) which eat up scarce time without people getting any real feel for each other’s personality. Wouldn’t it be more rewarding if we applied similar rules to all our conversations?

This, Christian thinks, might be the gift which artificial intelligence ultimately bestows on us. Unlike some others, he’s not worried that dialogue with computers will make us start to think of ourselves as machines – the difference, he thinks, is too obvious. On the contrary, the experience of dealing with robots will bring home to us for the first time how much of our own behaviour is needlessly stereotyped and robotic and inspire us to become more original – more human – than we ever were before.

In some ways this makes sense. In a similar vein, I have sometimes wondered in the past whether our time is best spent by so many of us watching the same films and reading the same books. Too often I have had conversations in which we shared identical memories of the same television programme and quoted the same passages from reviews in the same newspapers. Mightn’t it be more productive, mightn’t we cover more ground, if we all had different experiences of different things?

Maybe, but it would be hard work. If Christian had his way, we should no longer be saying things like this.

- Hi, how are you?

- Fine, thanks, you too?

- Yeah, not so bad. I see the rain’s stopped.

- Mm, hope it stays fine for the weekend.

- Oh, yeah, the weekend.

- Well, it’s Friday tomorrow – at last!

- That’s right. One more day to go.

Instead I suppose our conversations would be earnest, informative, intense, and personal.

- Tell me something important.

- Sometimes I’m secretly glad my father died young rather than living to see his hopes crushed.

- Mithridates, he died old.

- Ugh: Housman’s is the only poetry which is necessarily improved by parody.

- I’ve never respected your taste or your intellect, but I’ve still always felt protective towards you.

- There’s a useful sociological framework theory of amorance I can summarise if it would help?

- What are you really thinking?

Perhaps the second kind of exchange is more interesting than the first, but all day every day it would be tough to sustain and wearing to endure. It seems to me there’s something peculiarly Western about the idea that even our small talk should be made to yield a profit. I believe that historically most civilisations have been inclined to think the world was gradually deteriorating from a previous Golden Age, and that keeping things the way they had been in the past was the most anyone could generally aspire to. Since the Renaissance, perhaps, we have become more accustomed to the idea of improvement and tend to look restlessly for progress: a culture constantly gearing up and apparently preparing itself for some colossal future undertaking whose nature remains obscure. This driven quality clearly yields its benefits in prosperity, but at the personal level it has its dangers: at worst it may promote slave-like levels of work, degrade friendship into networking and reinterpret leisure as mere recuperation. I’m not sure I want to see self-help books about leveraging those moments of idle chat. (In fairness, that’s not what Christian has in mind either.)

Christian may be right, in any case, that human interaction with machines will tend to emphasise the differences more than the similarities. I won’t reveal whether he ultimately succeeded in his quest to be Most Human Human (or perhaps was pipped at the post when a rival and his judge stumbled on a common and all-too-human sporting interest?), but I can tell you that this was not on any view humanity’s last stand: the bots were routed.

There has been quite a furore over some remarks made by Chomsky at the recent MIT symposium Brains, Minds and Machines; Chomsky apparently criticised researchers in machine learning who use purely statistical methods to produce simulations without addressing the meaning of the behaviour being simulated – there’s a brief account of the discussion in Technology Review. The wider debate was launched when Google’s Director of Research, Peter Norvig, issued a response to Chomsky. So far as I can tell, the point being made by Chomsky was that applications like Google Translate, which uses a huge corpus of parallel texts to produce equivalents in different languages, do not tell us anything about the way human beings use language.

My first reaction was surprise that anyone would disagree with that. The weakness of translation programmes is that they don’t deal with meanings, whereas human language is all about meanings. Google is more successful than earlier attempts mainly because it uses a much larger database. We always knew this was possible in principle, and that there is no theoretical upper limit to how good these translations can get (in theory we can have a database of everything ever written); it’s just that we (or I, anyway) underestimated how much would be practical and how soon.
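The general idea of getting equivalents out of parallel texts without dealing in meanings can be illustrated with a deliberately crude sketch. This is not Google Translate’s actual method, which is far more elaborate; it simply assumes a tiny invented French–English corpus and scores each word pair by how often the two words turn up in matching sentences (the Dice coefficient, a standard co-occurrence measure swapped in here for illustration).

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus of (French, English) sentence pairs,
# invented for illustration only.
PARALLEL = [
    ("la maison", "the house"),
    ("la maison bleue", "the blue house"),
    ("la fleur", "the flower"),
    ("la fleur bleue", "the blue flower"),
]


def dice_table(pairs):
    """Score every (source word, target word) pair by the Dice coefficient
    of their co-occurrence across aligned sentence pairs."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        # Every source word co-occurs with every target word in the pair.
        joint.update(product(s_words, t_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def best_equivalent(word, table):
    """Return the target word with the highest score for `word`.

    Raises ValueError if `word` never appeared in the corpus.
    """
    candidates = {t: score for (s, t), score in table.items() if s == word}
    return max(candidates, key=candidates.get)
```

With `table = dice_table(PARALLEL)`, `best_equivalent("bleue", table)` comes out as `"blue"` and `best_equivalent("maison", table)` as `"house"`, although nothing in the program represents colour, houses, or anything else the words are about. Adding more sentence pairs sharpens the counts, which is the sense in which a bigger database gives better results.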

As an analogy, we could say it’s like collecting exhaustive data about the movement of heavenly bodies. With big enough tables we could make excellent predictions about eclipses and other developments without it ever dawning on us that the whole thing was in fact about gravity. Norvig reads into Chomsky’s remarks a reassertion of views expressed earlier that ‘statistical’ research methods are no more than ‘butterfly collecting’. It would be a little odd to dismiss the collection of data as not really proper science – but it’s true that we celebrate Newton, who set out the law of gravity, and not poor Flamsteed and the other astronomers who supplied the data.

But hang on there. A huge corpus of parallel texts doesn’t tell us anything about the way human beings use language? That can’t be right, can it? Surely it tells us a lot about the way certain words are used, and their interpretation in other languages? Norvig makes the point that all interpretation is probabilistic – we pick out what people most likely meant. So probabilistic models may well enlighten us about how human beings process language.  He goes on to point out that there may be several different ways of capturing essentially the same theory, some more useful for some purposes than others: why rule out statistical descriptions?
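Norvig’s suggestion that interpretation means picking out what someone most likely meant can be sketched as a simple Bayesian choice: for each candidate meaning, multiply how likely that meaning is in general by how likely the surrounding context is given that meaning, and take the largest. The two senses of "bank", the context words and every number below are invented assumptions for illustration, not data from any real system.

```python
# Hypothetical prior probabilities of two intended meanings of "bank".
PRIOR = {"river bank": 0.3, "money bank": 0.7}

# Hypothetical P(context word | intended meaning), invented for illustration.
LIKELIHOOD = {
    "river bank": {"fishing": 0.20, "loan": 0.01},
    "money bank": {"fishing": 0.01, "loan": 0.30},
}

# Small default likelihood for context words the model has never seen.
UNSEEN = 1e-6


def most_likely_meaning(context_word: str) -> str:
    """Return the meaning that maximises P(meaning) * P(context | meaning)."""
    return max(
        PRIOR,
        key=lambda m: PRIOR[m] * LIKELIHOOD[m].get(context_word, UNSEEN),
    )
```

Here "fishing" selects the river bank (0.3 × 0.20 beats 0.7 × 0.01) and "loan" the money bank. Note that the labels naming the meanings are supplied by whoever writes the tables; the arithmetic itself only compares numbers.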

Hm, I don’t know. Certainly there are times when we have to choose between rival interpretations, but does that make interpretation essentially probabilistic? There are times when we have to choose between two alternative jigsaw pieces, but I wouldn’t say that solving a jigsaw was a statistical matter. Even if we concede that human interpretation of language is probabilistic, that glosses over the substantial point that in human beings the probabilistic judgements are about likely meanings, not about likely equivalent sentences. Human language is animated by intentionality, by meaning things: and unfortunately at the moment we have little idea of how intentionality works.

But then, if we don’t know how meaning works, isn’t it possible that it might turn out to be statistical (or probabilistic, or at any rate numerical in some way)? I don’t think it is possible to rule this out. If we had some sort of neural network which was able to do translations, or perhaps talk to us intelligently, we might be tempted to conclude that its inner weightings effectively encoded a set of probabilities about sentences and/or a set of meanings – mightn’t we? And who knows, by examining those weightings might we not finally glean some useful insight into how leaden numbers get transmuted into the gold of semantics?

As I say, I don’t think it’s possible to rule that out – but we know how Google Translate works and it isn’t really anything like that. Norvig wouldn’t claim – would he? – that Google Translate or other programs written on a similar basis display any actual understanding of the sentences they process. So it still seems to me that, whether or not his wider attitudes are well-founded, Chomsky was essentially right.

To me the discussion has several curious echoes of the Chinese Room. Would Norvig, I wonder, say that the man in the room understands Chinese?