Turing Test tactics

2012 was Alan Turing Year, marking the centenary of his birth. The British Government, a little late perhaps, announced recently that it would support a Bill giving Turing a posthumous pardon; Gordon Brown, then the Prime Minister, had already issued an official apology in 2009. As you probably know, Turing, who was gay, was threatened with blackmail by one of his lovers (homosexuality still being illegal at the time) and reported the matter to the authorities; he was then tried and convicted and offered a choice between going to jail and taking hormones, effectively a course of oestrogen. He chose the latter, but subsequently died of cyanide poisoning in what is generally believed to have been suicide, leaving by his bed a partly-eaten apple, thought by many to be a poignant allusion to the story of Snow White. In fact it is not clear that the apple had any significance, or that his death was actually suicide.

The pardon was widely but not universally welcomed: some thought it an empty gesture; some asked why Turing alone should be pardoned; and some even saw it as an insult, confirming by implication that Turing’s homosexuality was indeed an offence that needed to be forgiven.

Turing is generally celebrated for wartime work at Bletchley Park, the code-breaking centre, and for his work on the Halting Problem: on the latter he was pipped at the post by Alonzo Church, but his solution included the elegant formalisation of the idea of digital computing embodied in the Turing Machine, recognised as the foundation stone of modern computing. In a famous paper from 1950 he also effectively launched the field of Artificial Intelligence, and it is here that we find what we now call the Turing Test, a much-debated proposal that the ability of machines to think might be tested by having a short conversation with them.

Turing’s optimism about artificial intelligence has not been justified by developments since: he thought the Test would be passed by the end of the twentieth century. For many years the Loebner Prize contest has invited contestants to provide computerised interlocutors to be put through a real Turing Test by a panel of human judges, who attempt to tell which of their conversational partners, communicating remotely by text on a screen, is human and which machine. None of the ‘chat-bots’ has succeeded in passing itself off as human so far. But then, so far as I can tell, none of the candidates was ever presented as a genuinely thinking machine; they are simply designed to scrape through the test by means of various cunning tricks, so on Turing’s own terms none of them should have succeeded.

One lesson which has emerged from the years of trials – often inadvertently hilarious – is that success depends strongly on the judges. If the judge allows the chat-bot to take the lead and steer the conversation, a good impression is likely to be possible; but judges who set out to make things difficult for the computer never fail to expose it. So how do you go about tripping up a chat-bot?

Well, we could try testing its general knowledge. Human beings have a vast repository of facts, which even the largest computer finds difficult to match. The first problem with this approach is that human beings cannot be relied on to know any particular thing – not knowing the year of the Battle of Hastings, for example, does not prove that you’re not human. The second problem is that computers have been getting much better at this. Some clever chat-bots these days are permanently accessible online; they save the inputs made by casual visitors and later discreetly feed them back to another subject, noting the response for future use. Over time they accumulate a large database of what humans say in these circumstances and what other humans say in response. The really clever part of this strategy is that not only does it provide good responses, it also means the database is automatically weighted towards the most likely topics and queries. It turns out that human beings are fairly predictable, and so the chat-bot can come back with responses that are sometimes eerily good, embodying human-style jokes, finishing quotations, apparently picking up web-culture references, and so on.
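For concreteness, here is a minimal Python sketch of that harvesting loop. Everything in it is assumed for illustration – the class name, the dictionary storage, the crude verbatim replay; real chat-bots are far subtler, but the cycle of saving, replaying and recording is the one just described.

```python
import random
from collections import defaultdict

class HarvestingBot:
    """Toy sketch (assumed, not any real system) of the input-harvesting
    strategy: save what visitors say, feed it back to later visitors,
    and record what humans say in reply."""

    def __init__(self):
        self.saved_inputs = []            # every input a visitor has made
        self.replies = defaultdict(list)  # utterance -> human replies seen
        self.last_output = None           # what the bot said last turn

    def respond(self, user_input: str) -> str:
        key = user_input.strip().lower()
        # Whatever the visitor just said is a genuine human response to
        # our previous output: note it for future use.
        if self.last_output is not None:
            self.replies[self.last_output].append(user_input)
        self.saved_inputs.append(key)
        if self.replies[key]:
            # Best case: replay a reply some earlier human gave to this
            # very input; over time these sound eerily natural.
            reply = random.choice(self.replies[key])
            self.last_output = reply.strip().lower()
            return reply
        # Otherwise, harvest: discreetly feed back a previously saved
        # input and see what the current visitor says to it next turn.
        self.last_output = random.choice(self.saved_inputs)
        return self.last_output
```

Because common inputs are saved, replayed and answered most often, the database weights itself towards the likeliest topics automatically, with no curation needed.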

If we’re subtle we might try to turn this tactic of saving real human input against the chat-bot, looking for responses that seem more appropriate for someone speaking to a chat-bot than for someone engaged in normal conversation, or for references to earlier phases of a conversation that never happened. But this is a tricky strategy to rely on, and it generally requires some luck.

Perhaps rather than trying established facts, it might be better to ask the chat-bot questions which have never been asked before in the entire history of the world, but which any human can easily answer. When was the last time a mole fought an octopus? How many emeralds were in the crown worn by Shakespeare during his visit to Tashkent?

It might be possible to make things a little more difficult for the chat-bot by asking questions that require an answer in a specific format; but it’s hard to do that effectively in a Turing Test because normal usage is generally extremely flexible about what it will accept as an answer, and failing to match the prescribed format might seem more human rather than less. Moreover, rephrasing is another field where the computers have come on a lot: we only have to think of the Watson system’s performance at the quiz game Jeopardy!, which besides rapid retrieval of facts required just this kind of reformulation.

So it might be better to move away from general stuff and ask the chat-bot about specifics that any human would know but which are unlikely to be in a database – the weather outside, which hotel it is supposedly staying at. Perhaps we should ask it about its mother, as they did in similar circumstances in Blade Runner, though probably not for her maiden name.

On a different tack, we might try to exploit the weakness of many chat-bots when it comes to holding a context: instead of falling into the standard rhythm of one input, one response, we can allude to something we mentioned three inputs ago. Although they have got a little better, most chat-bots still seem to have great difficulty maintaining a topic across several inputs or ensuring consistency of response. Being cruel, we might deliberately introduce oddities that the bot needs to remember: we tell it our cat is called Fish and then a little later ask whether it thinks the Fish we mentioned likes to swim.
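If we wanted to automate the cruelty, the probe is easy to script. A throwaway sketch in Python, assuming only that the bot under test exposes some respond(text) -> str call – a made-up interface for illustration, not any real chat-bot API:

```python
def context_probe(bot) -> bool:
    """Plant an odd fact, pad the dialogue, then refer back to it.
    Returns True if the bot's answer engages with the planted fact."""
    bot.respond("By the way, our cat is called Fish.")
    bot.respond("What did you have for breakfast today?")  # padding turns, so
    bot.respond("Do you follow the football at all?")      # the fact is no
                                                           # longer the last input
    answer = bot.respond("Do you think the Fish we mentioned likes to swim?")
    # A human engages with the oddity (a cat called Fish presumably hates
    # water); a stateless bot typically chats about actual fish. A crude
    # automatic check for the former:
    return "cat" in answer.lower()
```

A human judge would of course read the answer directly; the string check merely stands in for that judgment.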

Wherever possible we should fall back on Gricean implicature and provide good enough clues without spelling things out. Perhaps we could observe to the chat-bot that poor grammar is very human – which to a human more or less invites an ungrammatical response, although of course we can never rely on a real human’s getting the point. The same thing is true, alas, in the case of some of the simplest and deadliest strategies, which involve changing the rules of discourse. We tell the chat-bot that all our inputs from now on lliw eb delleps tuo sdrawkcab and ask it to reply in the same way, or we js mss t ll th vwls.
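Both rule changes, incidentally, are trivial to compute – the hard part for the chat-bot is noticing that the rules have changed at all. A quick Python sketch of the two transformations (the function names are mine):

```python
import re

def backwards(text: str) -> str:
    """Spell each word out backwards, as in the ploy above."""
    return " ".join(word[::-1] for word in text.split())

def strip_vowels(text: str) -> str:
    """Miss out all the vowels."""
    return re.sub(r"[aeiouAEIOU]", "", text)

print(backwards("all our inputs will be spelled out backwards"))
# -> lla ruo stupni lliw eb delleps tuo sdrawkcab
print(strip_vowels("just miss out all the vowels"))
# -> jst mss t ll th vwls
```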

Devising these strategies makes us think in potentially useful ways about the special qualities of human thought. If we bring all our insights together, can we devise an Ultra-Turing Test? That would be a single question which no computer ever answers correctly and all reasonably alert and intelligent humans get right. We’d have to make some small allowance for chance, as there is obviously no answer that couldn’t be generated at random in some tiny number of cases. We’d also have to allow for the fact that as soon as any question was known, artful chat-bot programmers would seek to build in an answer; the question would have to be such that they couldn’t do that successfully.

Perhaps the question would allude to some feature of the local environment which would be obvious but not foreseeable (perhaps just the time?) but pick it out in a non-specific allusive way which relied on the ability to generate implications quickly from a vast store of background knowledge. It doesn’t sound impossible…

 

19 thoughts on “Turing Test tactics”

  1. I have a vivid memory of watching a news spot on an early handheld calculator when I was a preschooler. Now, a couple of times a week my 4-year-old daughter will climb on my lap while I’m doing my early morning roundup of tech and science news sites, asking to see robots. So we watch robots riding bicycles, walking tightropes, writing their names, doing Broadway routines, and so on, all the while wondering what her child will be seeing 40 years hence. And it occurs to me that Minsky is right: she’ll be lucky if her ‘wonder machines’ keep her and her child as pets.

    The Turing Test is an indicator of many things, I’m sure, the END OF HUMANITY AS WE KNOW IT not the least of them. Science has consistently shown that we’re less special than we are inclined to think, no matter what the domain. It’s difficult to imagine why ‘intelligence’ should prove to be any different.

  2. The Turing Test was just a thought experiment proposing some type of benchmark to judge the performance of a computer within the specific domain of the imitation game as described in Turing’s paper. Granted, the imitation game is a specific domain with more generality than most computational applications, hence the comparison to human intelligence; however, it is definitely not a substitute for actually understanding human intelligence, let alone consciousness. Even Marvin Minsky thinks it is a joke. Unfortunately, too many people have been suckered into what is just neo-behaviorism, with all the nihilistic implications that come along with it – just look at the previous comment by Scott.

  3. haig: I agreed with everything you said, right up to “Unfortunately, too many people have been suckered into what is just neo-behaviorism, with all the nihilistic implications that come along with it, just look at the previous comment by Scott.”

    I don’t see any nihilistic implications to TT whatsoever. I’m with Walmsley: it’s a great heuristic for sorting between what’s relevant and irrelevant vis-à-vis domain-general intelligence. But it has other uses as well.

    I have been accused of ‘neobehaviourism’ before, though! It seems to be the noocentrists’ new pejorative of choice 😉 . I just wish I knew what it meant!

    Otherwise, do you really think human intelligence represents some kind of pinnacle, haig?

  4. Scott:

    The Turing Test alone is not the problem; the problem is the procrustean mindset of fitting the human mind into an impoverished abstract model for the sake of epistemic closure. Cognitive science was supposed to be a refutation of the Skinnerian school of behaviorism; however, it appears that the computational paradigm has reinstated the behaviorist fallacy, albeit at a deeper level than simple external behaviors, but behaviorist nonetheless.

    I don’t think humanity is the pinnacle of intelligence by any means, but I don’t share the common sentiment that humanity is nothing special either. I think Nagel was right in his latest book where he criticizes the modern scientific worldview for ignoring or outright denying several crucial pieces to the puzzle. The first is teleology, the second is subjective experience or qualia, and the third is objective axiology. Reasoning about intelligence or consciousness or meaning in general without these pieces leads to the bankrupt models which pervade modern scientific consensus and which I’m railing against.

  5. haig: Why should humans be anything special? If the choice is between understanding the difficulties pertaining to things like phenomenality, intentionality, normativity, teleology, and formal semantics as the artifact of cognitive/metacognitive shortcomings versus ontological surpluses (of some kind), surely the former is the more parsimonious (if less flattering and more troubling) approach.

    And which figures would you take to be emblematic of this ‘consensus’? Not all that many ‘eliminativists’ publishing out there!

    I still don’t understand how the ‘behaviourist’ tag is supposed to work.

  6. Scott:

    Presupposing metacognitive deficits as a catch-all solution is not a parsimonious explanation of these problems at all; it is just a convenient way to close off further lines of inquiry. One could (and many did) argue that physics was leading up to a fully complete and parsimonious Newtonian model of the universe right up until the revolutions of the early 20th century; that didn’t make it any more true than what physicists eventually settled on.

    And the consensus I’m referring to is not strictly eliminativist; it is the overall paradigm of denying any scientific basis for direction and value as a fundamental part of the universe. Whether one is eliminativist or emergentist or functionalist or whatever really doesn’t matter; my critique is directed at any of them so long as they think cognitive architecture and mental phenomena are simply arbitrary heuristics of random adaptations, without conforming to a deeper organizational logic in which formal and final causation play just as much of a role in shaping things as do efficient and material causes.

    Lastly, behaviorism attempted to describe most if not all of the relevant aspects of psychology through external behaviors in relation to environment, without placing any emphasis on internal cognition – perhaps for practical, instrumental reasons due to the state of technology at the time; astoundingly, some even denied that cognition played a role at all. The modern incarnation of this thinking, which I’ve labeled neo-behaviorism, sees the mind as a logical machine, or a Bayesian learning machine, based on the behavior, this time, of heuristic algorithms. If you can replicate those algorithms computationally, then you’ve captured all of what the mind or brain does. I’m saying that, like the behaviorism of the past, the new-style behaviorism stops short at the algorithms and misses the fundamental physics.

  7. Scott:

    (continued)

    Finally, what makes humans special is that, within the perspective I advocate of value and direction inherent in the universe, humanity becomes the latest stage of development of an unfolding biosphere, itself a part of an unfolding universe, where humans can be seen both from the materialistic perspective of adapted organisms and from the cybernetic perspective of a continuation of the increasing generality of meta-controllers with expanded consciousness.

  8. haig: Well I think ‘neobehaviourism’ is clearly a misnomer, a rhetorical attempt to map the negative associations belonging to ‘behaviourism’ onto what is in fact a radically different and far more diverse research paradigm. Personally, I think emergentism and functionalism are every bit as spooky as what you’re proposing – attempts to shoehorn nature into shapes pleasing to our metacognitive conceits. But I’m willing to admit that I have my chips stacked at the extreme (if opposite) end of the table the same as you. Which is why I’m sympathetic to your worries regarding paradigm closure. Marcus Arvan has just started a blog, ‘Underappreciated Philosophy’, that is meant to address this very problem.

    So you literally think that apparent ‘purposive thinking’ is not fundamentally heuristic, that it actually cognizes some fundamental feature of physical reality that physics can actually or potentially describe?

  9. Scott:

    >” So you literally think that apparent ‘purposive thinking’ is not fundamentally heuristic, that it actually cognizes some fundamental feature of physical reality that physics can actually or potentially describe?”

    Yes, ‘purposive thinking’ is a continuation of the purposive organizational structure of the universe, and therein lies the problem of trying to understand the mind without first changing our understanding of cosmology and the rest of the special sciences by applying complex systems thinking to what have been overly reductionistic models of natural phenomena.

    These are bold claims, I know, but I didn’t arrive at my conclusions easily or without justifications. I’d be happy to expand on any of it if you want.

  10. Scott, I believe one of your books implies that a nearly perfect Turing test query would be to intensely question the “computer” about the nature and humorous value of a paradox. All you would need is to come up with a semi-novel paradox on the fly, and see if the computer understands the fundamental clash of intuitions at the heart of it.

    I actually think that’s a promising tactic… hey computer, I’ll only join a blog that won’t let me register, what do you think about that?

    (This is similar to a trope called a Logic Bomb, except the purpose isn’t to shut the computer down, but rather get it to falter in its human mimicry.)

    Peter, I eagerly await your thoughts on the rat memory Inception and the rat Mind Control that have gotten headlines recently.

  11. jorge: Where Mimara tells the skin-spy that only a soul can comprehend contradictory truths? If BBT is true, then a machine actually shouldn’t have a problem with this, so long as it faces the same set of metacognitive constraints. That’s a big irony of the position: it explains many of the mysterious things we take to be signatures of how special human cognition is as the result of various incapacities.

    I’m with you on the rats: I’m very curious to read what Peter thinks!

    haig: I recall being impressed with your arguments several months ago, haig. I find the whole teleonomy project very interesting – and not without hope. I hate to say it, but I’m just pessimistic about science confirming our prescientific intuitions. The way I see it, we’ve just opened our Mount Wilson Observatory on the brain and are presently grappling with the likelihood that we’re as small and insignificant psychically as we are physically.

  12. While I don’t think a brain is a computer, I think we can still learn a lot from the comparison. Seeing the limitations of a computer’s processing methods tells us something about the abilities of our own. That is the lesson I get from tests like the Turing Test. This sort of game setting has been a great testing ground for programmers.

    We all remember Kasparov, but perhaps the more interesting ‘game’ testing has been not in Chess but in the ancient game of Go. In Go the middle game, the part where the winner is usually determined, is extremely flexible and open-ended. There has been some incredible progress in recent years, but it took a different approach from that taken with Chess.

    In short, our ‘database’ isn’t as quick or reliable; we have a kind of ‘judgment’ computers have issues with.

    It is these kinds of studies that give me the feeling that, even if we were actually able to make a ‘sentient’ computer, it would work much differently from the human brain. And determining whether or not it was sentient would probably take a lot more than a Turing test. I mean, we aren’t even agreed on how to determine whether other human beings are sentient.

  13. Scott:

    Your pessimism is not misplaced; it is the prudent position and the better bet when calculating the expected value of classes of theories based on the body of knowledge you are working with. I’m not denying that, but I am saying something that I admit should draw skeptical ire: that we are on the verge of expanding that body of knowledge and overturning critical foundational assumptions on which such theories are based. Until these newer ideas are taken more seriously and we undergo a paradigm shift, my ideas will strike most people as incredible. To that end, my work is threefold:

    1.) mathematical formalizations of certain ideas I have at the intersection of computational complexity theory and complex systems theory with specific applications in physics, biology, and brain-cognitive science.

    2.) biophysics of single and population neurons using 2-photon laser microscopy and spectroscopy, as well as optogenetic experiments on simple organisms, starting with C. elegans and working up to more complex invertebrates. Those results will later be built upon using cognitive neuroscience experiments on vertebrates, as well as neuromorphic devices, to add empirical weight to theories of neural coding and cognitive architectures.

    3.) Modernizing process philosophy to integrate these newer scientific findings and clean up and expand on Whitehead’s obscurantist prose.

    Daunting, but can’t say I don’t like a challenge.

  14. Scott: I realize that if BBT is true, sooner or later a computer should be able to pass any Turing Test gauntlet. My suggestion was merely one to suss out even very advanced statistical brute-force machines (i.e. “Chinese rooms” rather than a machine that “actually thinks”).

    Haig: good luck with that 3rd one. (Isn’t optogenetics just the coolest? I think the inventors have a good shot at a Nobel prize within 5 years)

  15. Jorge:

    Yes, optogenetics is a major advancement and Deisseroth et al deserve all the accolades they can get.

  16. haig: Do you have anything I could read regarding (1)?

    My gut has been rumbling about self-similarity and scale independence ever since I saw the first photograph of the ‘Sloan Wall’ and mistook it for a neural representation. Any ideas where I should root around to discover more on this particular topic?

    I’ve had a date with Whitehead that I’ve been postponing for twenty years now!

  17. Jorge, what if you had a Chinese room about a prior Chinese room? The Chinese of Chinese? Or more concretely: one processor attempts to brute-force learn Chinese, and a second processor brute-force learns the first processor’s learning about Chinese?

    In the end though, I’m not sure about Turing Test gauntlets, as I’m sure it’s possible (perhaps with a little bias seeding, or perhaps no bias seeding at all) for real people at one end of the terminal to fail a Turing Test as judged by another human at the other end. Indeed, a classic case of that is war…

    Please don’t fail my post…because quixloxal albequerky smurf…damn, ‘I’ gave it away… >:)

  18. @Scott

    There are a lot of books on fractals, natural computing, graph theory, etc. that deal with self-similarity, and some newer papers like Geoffrey West’s fascinating work on scale invariance in both biological and social systems, but I haven’t seen anything that takes a universal interpretation of these ideas (i.e. applying them to cosmology as a whole) except for some popular science/philosophy books that are wishy-washy.

    The neglect of Whitehead’s process philosophy, and of similar thinkers like Charles Peirce, the later William James, Teilhard de Chardin (with a grain of salt), even some Integral philosophy (again, grain of salt), is an incredible blind spot in modern scientific and philosophical thought. I think it reconciles the differences between analytic and continental philosophy, and formulates a *constructive* postmodernist picture (as opposed to the deconstructionists’) that is actually useful (and IMO absolutely necessary).

  19. More damning criticism of the Turing Test and the established AI research paradigm: http://www.newyorker.com/online/blogs/elements/2013/08/why-cant-my-computer-understand-me.html

    Winograd schemas are a much harder threshold for AI programs to pass, and they show why we need to truly understand the theory of intelligence instead of just hacking together engineering projects which merely appear intelligent. The linguistic phenomenon of anaphora cleverly reveals how language is not simply a procedure for manipulating symbols according to grammar rules, even when augmented with learned patterns of association. Nope: language is an interface that sits atop an experientially learned common-sense knowledge base, with representations foreign to most AI researchers.
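    To make the point concrete, here is Winograd’s original schema, laid out as data in a short Python snippet purely for illustration: flipping a single word flips the referent of the pronoun, and nothing in the surface text settles which reading is right without common-sense knowledge.

    ```python
    # Winograd's original schema: one word flips the referent of 'they'.
    schema = {
        "sentence": "The city councilmen refused the demonstrators a permit "
                    "because they [feared/advocated] violence.",
        "question": "Who [feared/advocated] violence?",
        "answers": {
            "feared": "the city councilmen",    # requires world knowledge
            "advocated": "the demonstrators",   # about councils and protests
        },
    }
    ```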
