Picture: Thomas J Watson.

So IBM is at it again: first chess, now quizzes? In the new year their AI system ‘Watson’ (named after the founder of the company – not the partner of a system called ‘Crick’, nor yet of a vastly cleverer system called ‘Holmes’) is to be pitted against human contestants in the TV game Jeopardy: it has already demonstrated its remarkable ability to produce correct answers frequently enough to win against human opposition. There is certainly something impressive in the sight of a computer buzzing in and enunciating a well-formed, correct answer.

However, if you launch your new technological breakthrough on a TV quiz, rather than describing it in a peer-reviewed paper, or releasing it so that the world at large can kick it around a bit, I think you have to accept that people are going to suspect your discovery is more a matter of marketing than of actual science; and much of the stuff IBM has put out tends to confirm this impression. It’s long on hoopla; here and there it has that patronising air large businesses often seem to adopt for their publicity (“Imagine if a box could talk!”); and it’s rather short on details of how Watson actually works. This video seems to give a reasonable summary: there doesn’t seem to be anything very revolutionary going on, just a canny application of known techniques on a very large, massively parallel machine.

Not a breakthrough, then? But it looks so good! It’s worth remembering that a breakthrough in this area might be of very high importance. One of the things which computers have never been much good at is tasks that call for a true grasp of meaning, or for the capacity to deal with open-ended real environments. This is why the Turing test seems (in principle, anyway) like a good idea – to carry on a conversation reliably you have to be able to work out what the other person means; and in a conversation you can talk about anything in any way. If we could crack these problems, we should be a lot closer to the kind of general intelligence which at present robots only have in science fiction.

Sceptically, there are a number of reasons to think that Watson’s performance is actually less remarkable than it seems. First, a problem of fair competition is that the game requires contestants to buzz first in order to answer a question. It’s no surprise that Watson should be able to buzz in much faster than human contestants, which amounts to giving the machine the large advantage of having first pick of whatever questions it likes.

Second, and more fundamental, is Jeopardy really a restricted domain after all? This is crucial because AI systems have always been able to perform relatively well in ‘toy worlds’ where the range of permutations could be kept under control. It’s certainly true that the interactions involved in the game are quite rigidly stylised, eliminating at a stroke many of the difficult problems of pragmatics which crop up in the Turing Test. In a real conversation the words thrown at you might require all sorts of free-form interpretation, and have all kinds of conative, phatic and inferential functions; in the quiz you know they’re all going to be questions which just require answers in a given form. On the other hand, so far as topics go, quiz questions do appear to be unrestricted ones which can address any aspect of the world (I note that Jeopardy questions are grouped under topics, but I’m not quite sure whether Watson will know in advance the likely categories, or the kinds of categories, it will be competing in). It may be interesting in this connection that Watson does not tap into the Internet for its information, but draws on its own large corpus of data. The Internet to some degree reflects the buzzing chaos of reality, so it’s not really surprising or improper that Watson’s creators should prefer something a little more structured, but it does raise a slight question as to whether the vast database involved has been customised for the specifics of Jeopardy-world.

I said the quiz questions were a stylised form of discourse; but we’re asked to note in this connection that Jeopardy questions are peculiarly difficult: they’re not just straight factual questions with a straight answer, but allusive, referential, clever ones that require some intelligence to see through. Isn’t it all the more surprising that Watson should be able to deal with them? Well, no, I don’t think so:  it’s no more impressive than a blind man offering to fight you in the dark. Watson has no idea whether the questions are ‘straight’ or not; so long as enough clues are in there somewhere, it doesn’t matter how contorted or even nonsensical they might be; sometimes meanings can be distracting as well as helpful, but Watson has the advantage of not being bothered by that.

Another reason to withhold some of our admiration is that Watson is, in fact, far from infallible. It would be interesting to see more of Watson’s failures. The wrong answers mentioned by IBM tend to be good misses: answers that are incorrect, but make some sort of sense. We’re more used to AIs that fail disastrously, suddenly producing responses that are bizarre or unintelligible.  This will be important for IBM if they want to sell Watson technology, since buyers are much less likely to want a system that works well most of the time but abysmally every now and then.

Does all this matter? If it really is mainly a marketing gimmick, why should we pay attention? IBM make absolutely no claims that Watson is doing human-style thought or has anything approaching consciousness, but they do speak rather loosely of it dealing with meanings. There is a possibility that a famous victory by Watson would lead to AI claiming another tranche of vocabulary as part of its legitimate territory.  Look, people might say; there’s no point in saying that Watson and similar machines can’t deal with meaning and intentionality, any more than saying planes can’t fly because they don’t do it the way birds do. If machines can answer questions as well as human beings, it’s pointless to claim they can’t understand the questions: that’s what understanding is.  OK, they might say, you can still have your special ineffable meat-world kind of understanding, but you’re going to have to redefine that as a narrower and frankly less important business.


  1. Paul Bello says:

    Another great post! I wholeheartedly endorse your series of criticisms. But I tend to think the big question here is whether or not we all agree on what constitutes “meaning.” Even if you don’t in principle agree with Searle’s conclusions about the Chinese Room, the thought experiment at least serves to remind us to think hard when we throw around terms like “meaning.” In this case, it seems as if “meaning” is roughly defined as being able to take textual inputs and produce entailing statements (e.g. answers). As you rightly note, it’s possible to peel this layer off of the more general layer of pragmatic meaning. Worse than this, question answering is probably one of three or four major domains of language use. Without facilities to predict how its interlocutor will process its responses, Watson likely won’t be able to do many of the linguistic tasks that typify our interactions with others. Similarly, there are certain domains of questioning which I suspect will be weaknesses for these methods. For example, let’s take mathematics. If a Jeopardy question asks for the next term in a sequence like 122333444455555…, it’s unlikely that 666666 will be found on an internet search or would be pre-computed in some way or another. Even though I know (and have published with) Watson’s chief architect, this seems to be more show than substance. I just really wish that big corporations like IBM would invest some of their considerable resources in actually trying to solve the problem for real.

  2. Peter says:

    Thanks, Paul. I forgot to say – Merry Christmas, everyone!

  3. Douglas says:

    I think that’s exactly right: it turns out that a lot of “meaning” can be captured in a reference book or a computer program. What can’t be captured is the aspects of meaning that have to do with qualia, the grounding in qualia.

  4. Christophe says:

    Interesting post indeed.
    I agree with what you write, Peter.
    Let me just comment on your point about computers not being good at tasks that call for a true grasp of meaning. Perhaps we can go a bit further by trying to understand what the “true grasp” can be. Implicitly, “true meaning” refers to humans and animals, where the meaning of the information is intrinsic to the human or animal nature (the meaning of a perceived prey for an animal, the meanings of a glass of Bordeaux wine for you or me). Meanings for computers and robots are implicitly assumed to be “derived” from the designer or user. They are not “true”/intrinsic.
    As Paul Bello says: the big question here is whether or not we all agree on what constitutes “meaning.”
    But can’t we put all this together by talking about “meaning for a system”, be it animal, human or robot and say that a meaning for a system is specific to the system as related to the constraints that the system has to satisfy? The constraint and the meaning being intrinsic or derived.
    The meaning of a cat for a mouse is “danger relative to survival constraints” (intrinsic/true). The meaning of an obstacle for a robot programmed to avoid them is “obstacle to be avoided” (derived). The case of humans is much more complex as we have to cope with a huge number of inter-related needs and constraints, and with our lack of understanding of human consciousness. But I feel that a constraint satisfaction approach can still make sense
    (see http://www.idt.mdh.se/ECAP-2005/INFOCOMPBOOK/CHAPTERS/10-Menant.pdf)
    Also, you are right to remind us that “a breakthrough in this area might be of very high importance”. To understand what the meaning of information can be for humans, we need an understanding of human consciousness. Still a long (and interesting) way to go …

  5. Craig from Az says:

    Douglas – Sorry if this is a dumb question that everybody here is already familiar with, but can you give an example of “aspects of meaning that have to do with qualia, the grounding in qualia”?


  6. Alex Rudnick says:

    Hello Peter,

    While it doesn’t tell us much about meaning or experience or anything, I think Watson is pretty impressive. Whatever approaches they used, it must have taken a bunch of engineering elbow-grease, something often lacking in academic CS. We often act like the actual applications should be left to the grunt technicians.

    Irrespective of the search/language-processing algorithms they use, Watson has this interesting estimation problem to solve: it has to quickly decide whether to buzz in or not, which requires a heuristic guess as to whether it will be able to answer (unless it tries to formulate a complete answer before buzzing).

  7. Paul Bello says:

    If I had perfect access to Wikipedia, among hundreds of other resources, I’d always buzz in. No decision necessary.

  8. Alex Rudnick says:


    Yes, if by “perfect access” you mean that you were confident that you could reliably locate the right piece of information in the unstructured text, maybe parse that text, extract the information into the right data structures, and then generate the Jeopardy-appropriate response.

    And if you could do all that stuff, I’d say that was a serious accomplishment!

  9. Paul Bello says:

    I wasn’t implying the whole thing was trivial or anything. I am suggesting that Watson likely hasn’t any sense of how difficult any given Jeopardy question will be relative to any other one, so I highly doubt it needs to decide if or when to buzz in. My understanding is that Watson doesn’t do any deep semantics past the kinds of stuff you might get from WordNet, FrameNet, or other large-scale lexical resources. Most of what’s going on is statistical. More than that, the kind of question-answering going on in Jeopardy is divorced from all of the rich pragmatics inherent in normal natural language interaction.
