Lies, damned lies, and statistics

There has been quite a furore over some remarks made by Chomsky at the recent MIT symposium Brains, Minds and Machines; Chomsky apparently criticised researchers in machine learning who use purely statistical methods to produce simulations without addressing the meaning of the behaviour being simulated – there’s a brief account of the discussion in Technology Review. The wider debate was launched when Google’s Director of Research, Peter Norvig, issued a response to Chomsky. So far as I can tell, the point being made by Chomsky was that applications like Google Translate, which use a huge corpus of parallel texts to produce equivalents in different languages, do not tell us anything about the way human beings use language.

My first reaction was surprise that anyone would disagree with that. The weakness of translation programs is that they don’t deal with meanings, whereas human language is all about meanings. Google is more successful than earlier attempts mainly because it uses a much larger database. We always knew this was possible in principle, and that there is no theoretical upper limit to how good such translations can get (in theory we could have a database of everything ever written); it’s just that we (or I, anyway) underestimated how much would be practical, and how soon.
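
Just to make the contrast concrete, here is a minimal sketch of the kind of statistics involved; the parallel corpus is invented, and this is nothing like Google’s actual system, which works on phrases and vastly more data:

    from collections import Counter

    # Toy parallel corpus of aligned sentence pairs (invented; a real system
    # would use millions of such pairs).
    corpus = [
        ("the house", "la maison"),
        ("the blue house", "la maison bleue"),
        ("the car", "la voiture"),
        ("the blue car", "la voiture bleue"),
    ]

    src_count, tgt_count, cooc = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        s_words, t_words = src.split(), tgt.split()
        src_count.update(s_words)
        tgt_count.update(t_words)
        cooc.update((s, t) for s in s_words for t in t_words)

    def translate_word(s):
        """Pick the target word with the highest Dice association score."""
        scored = [(2 * cooc[s, t] / (src_count[s] + tgt_count[t]), t)
                  for t in tgt_count if cooc[s, t] > 0]
        return max(scored)[1] if scored else s

    print([translate_word(w) for w in "the blue house".split()])
    # ['la', 'bleue', 'maison'] -- right words, wrong order; real systems add
    # a target-language model for ordering, still without touching meanings.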

As an analogy, we could say it’s like collecting exhaustive data about the movement of heavenly bodies. With big enough tables we could make excellent predictions about eclipses and other celestial events without it ever dawning on us that the whole thing was in fact about gravity. Norvig reads into Chomsky’s remarks a reassertion of views expressed earlier that ‘statistical’ research methods are no more than ‘butterfly collecting’. It would be a little odd to dismiss the collection of data as not really proper science – but it’s true that we celebrate Newton, who set out the law of gravity, and not poor Flamsteed and the other astronomers who supplied the data.
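
To push the analogy one step further: suppose all we had was a table of observed orbits (the figures below are real, rounded). A pure lookup predicts well inside the data; Kepler’s third law, one line of theory, generalises far beyond it. A minimal sketch:

    # Observed (semi-major axis in AU -> orbital period in years), Earth..Saturn.
    observed = {1.0: 1.0, 1.52: 1.88, 5.2: 11.86, 9.54: 29.45}

    def predict_from_table(a):
        """'Statistical' prediction: return the period of the nearest known case."""
        nearest = min(observed, key=lambda known: abs(known - a))
        return observed[nearest]

    def predict_from_law(a):
        """Kepler's third law, T^2 = a^3: the whole table in one formula."""
        return a ** 1.5

    print(predict_from_table(30.07), predict_from_law(30.07))  # Neptune
    # 29.45 vs ~164.9 years: the table parrots Saturn; the law gets it right.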

But hang on there. A huge corpus of parallel texts doesn’t tell us anything about the way human beings use language? That can’t be right, can it? Surely it tells us a lot about the way certain words are used, and their interpretation in other languages? Norvig makes the point that all interpretation is probabilistic – we pick out what people most likely meant. So probabilistic models may well enlighten us about how human beings process language.  He goes on to point out that there may be several different ways of capturing essentially the same theory, some more useful for some purposes than others: why rule out statistical descriptions?
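
Norvig’s picture, as I understand it, is Bayesian: pick the reading that maximises prior times likelihood. A minimal sketch, with made-up numbers standing in for corpus statistics:

    # Two candidate readings of "bank", with invented corpus-style statistics.
    readings = {
        "bank = riverside": {"prior": 0.3, "given": {"water": 0.40, "money": 0.01}},
        "bank = finance":   {"prior": 0.7, "given": {"water": 0.01, "money": 0.50}},
    }

    def interpret(context_word):
        """Most probable reading: argmax over prior * P(context | reading)."""
        return max(readings, key=lambda r: readings[r]["prior"] *
                                           readings[r]["given"][context_word])

    print(interpret("water"))  # bank = riverside
    print(interpret("money"))  # bank = finance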

Hm, I don’t know. Certainly there are times when we have to choose between rival interpretations, but does that make interpretation essentially probabilistic? There are times when we have to choose between two alternative jigsaw pieces, but I wouldn’t say that solving a jigsaw was a statistical matter. Even if we concede that human interpretation of language is probabilistic, that glosses over the substantial point that in human beings the probabilistic judgements are about likely meanings, not about likely equivalent sentences. Human language is animated by intentionality, by meaning things: and unfortunately at the moment we have little idea of how intentionality works.

But then, if we don’t know how meaning works, isn’t it possible that it might turn out to be statistical (or probabilistic, or at any rate numerical in some way)? I don’t think it is possible to rule this out. If we had some sort of neural network which was able to do translations, or perhaps talk to us intelligently, we might be tempted to conclude that its inner weightings effectively encoded a set of probabilities about sentences and/or a set of meanings – mightn’t we? And who knows, by examining those weightings might we not finally glean some useful insight into how leaden numbers get transmuted into the gold of semantics?
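
Purely as a toy version of that speculation, here is about the smallest ‘network’ imaginable: a bigram model whose only weights are log-probabilities learned by counting an invented corpus. The weights really do encode probabilities of word sequences; whether inspecting them could ever yield semantic insight is exactly the open question:

    import math
    from collections import Counter

    tokens = "the cat sat . the dog sat . the cat ran .".split()
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigrams = Counter(tokens[:-1])
    vocab = len(set(tokens))

    def weight(prev, word):
        """'Connection weight': log P(word | prev), with add-one smoothing."""
        return math.log((bigrams[prev, word] + 1) / (unigrams[prev] + vocab))

    def score(sentence):
        words = sentence.split()
        return sum(weight(p, w) for p, w in zip(words, words[1:]))

    print(score("the cat sat"), score("sat the cat"))
    # The first scores higher: probabilities of sentences, but no meanings.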

As I say, I don’t think it’s possible to rule that out – but we know how Google Translate works, and it isn’t really anything like that. Norvig wouldn’t claim (would he?) that Google Translate or other programs written on a similar basis display any actual understanding of the sentences they process. So it still seems to me that, whether or not his wider attitudes are well-founded, Chomsky was essentially right.

To me the discussion has several curious echoes of the Chinese Room. I wonder whether Norvig would say that the man in the room understands Chinese?

8 thoughts on “Lies, damned lies, and statistics”

  1. Great post, Peter.
    I work quite a bit with this issue in my professional life, and I was lucky enough to be at the MIT MBM symposium to hear the Chomsky/Norvig exchange first-hand. I think what all of the developmental and psycholinguistic data show us is that neither one is 100% correct. Imagine seeing the following sentence in a corpus:

    “The couple went for a walk with their daughter. They held her hand as they crossed the street.”

    There is a tremendous amount of inference underlying the interpretation of this sentence that probably cannot be reached via co-occurrence relationships between words, phrases, and so on. For example, what binds to “their” or “they”? Does “her” refer to the daughter or to the female member of the couple? If there aren’t antecedent mentions of couples typically containing males and females (which is, by the way, only a default interpretation anyhow), how could one ever expect a system to make the appropriate inferences here?

    On the other hand, one probably needs to generate multiple interpretations and rank them according to their likelihood (a toy sketch of this appears at the end of this comment). For example, what do we do with the attachment ambiguity in an example such as “John saw a man with his binoculars”? Well, if we happen to know that John doesn’t have binoculars, or that the man John saw doesn’t have binoculars, we can rule out at least one of the interpretations.

    It seems clear as day that some aspects of language comprehension (and acquisition, for that matter) are statistical in nature. I think what muddles the discussion is that, as researchers, we often treat either “statistical inference” or “logical inference” as the sine qua non of human cognition. I’m of the mind that whatever representational resources we have, they are rich enough to accommodate the advantages of both. We’re starting to sink serious resources into developing representations and inference mechanisms that blend aspects of both logical and probabilistic inference, and even though they are in their infancy, I for one see great potential in their application. There’s a long way to go: we still have to consider pragmatics, speech acts, and the associated mental-state attributions. Current developments in AI are really only focused on combining extensional (e.g. first-order) logical representations with probability; we need to start thinking about intensional (modal) representations as well. We’re a long way off, but these are exciting times.
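
    Here is a toy sketch of that generate-rank-filter idea (the readings, scores, and “facts” are all invented; no real parser is involved):

        # Candidate readings of "John saw a man with his binoculars", each with
        # an invented corpus-style prior and the world-knowledge it depends on.
        interpretations = [
            ("John used his binoculars to see the man", 0.6, "John has binoculars"),
            ("The man John saw was carrying binoculars", 0.4, "the man has binoculars"),
        ]
        known_false = {"John has binoculars"}  # suppose we happen to know this

        viable = [(prior, reading)
                  for reading, prior, presupposes in interpretations
                  if presupposes not in known_false]
        print(max(viable)[1])  # -> "The man John saw was carrying binoculars"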

  2. For a brain theory of sentential propositions and pronominal reference see *The Cognitive Brain*, Ch. 6, “Building a Semantic Network”, here:

    http://people.umass.edu/trehub/thecognitivebrain/chapter6.pdf

    and Ch. 13, “Narrative Comprehension and Other Aspects of the Semantic Network”, here:

    http://people.umass.edu/trehub/thecognitivebrain/chapter13.pdf

    The semantic network can organize a lexicon and signal logical relations among lexical items, but printed character strings or phonological sequences of speech convey no adaptive MEANING unless they are systematically linked to images of objects, events, and their relations in the phenomenal world (1pp).

  3. Arnold,
    I basically agree that words need to be grounded, but I guess I’m a little unclear about where you stand on the source of the grounding relation. I can’t believe that every word anyone ever uses is somehow grounded in direct perceptual experience (although I think most of them are). After all, we’re very creative with words. We’ve described many fundamental physical particles well before they were ever experimentally observed.

    Paul, you say: “I can’t believe that every word anyone ever uses is somehow grounded in direct perceptual experience (although I think most of them are).”

    May I ask: do you believe that even if some words are not grounded in direct perceptual experience, they can always be decomposed into elements or concepts that are?

    For example, democracy or dictatorship: I don’t think there is a direct perceptual counterpart of these terms, but we can decompose them into a set of images and ideas of the kind of things people experience when living under those kinds of regimes.

    But what about absolute nothingness: no space, no time, nothing existing at all? This concept really puzzles and baffles me, because I know what I am referring to, and yet I can’t have the slightest perceptual approach to it.

    Arnold,

    I have gone through your chapters, which are very interesting, but my question is this: even if you create those semantic networks, even if the computer dictionary has pointers to images and videos for each entry (to supply the meaning side), and even if you identify the neural networks and structures that account for the semantic networks, at the end of the day all you have are currents flowing around (to put it somehow) in different configurations.

    For computers I have it clear: they are just machines. If there is no conscious observer or operator, there is no difference between a computer memory and a brick as far as meaning is concerned.

    For us, I just don’t see how brain tissue can hold my inner life. I don’t see how the semantic neural network accounts for the joy and satisfaction a scientist experiences when he comes to understand a certain system. To me that is meaning, that is understanding, not just the cross-linking of terms. I don’t know; maybe networks of networks and overarching networks are created to support this understanding of broad, complex systems.

  5. Chomsky compared such researchers to scientists who might study the dance made by a bee returning to the hive, and who could produce a statistically based simulation of such a dance without attempting to understand why the bee behaved that way.

    I understand Chomsky’s point as saying that true understanding of language is impossible outside the context of human experience in a three-dimensional world. However, while bees dance in only limited circumstances, human beings write about everything. The human experience has been translated into symbols a billion times over, so maybe there is enough there for a computer program to discern real meaning.

  6. To follow on the astronomy-without-gravity path, maybe the huge corpus does tell us about the way we use language, but not the “why”.

    Perhaps a huge enough pile of correlation could end up significantly predictive, but wouldn’t novelties (absent or under-represented in the corpus) remain mysterious without a good theory? I’m grasping for a reason for the importance of the “why” in my own mind. Perhaps a good “why” is a radical form of organic compression, necessary for those of us who cannot process a Google-scale corpus. That is, as humans with very poor, lossy data-storage capabilities, and lacking server-farm processing power to sift and sort it, a good explanation is the smallest, most efficient way to reap the benefits of amassed data.

    An example of a sentence to translate where you can separate a statistical approach from an approach that understands intrinsic meaning:

    “Susanne likes five words with S. She likes Salt, Sausages, Sitting, Santa Claus and silly jokes.”

    Now a statistical approach would never carry the same meaning into other languages, as it would likely just translate the words, not the reason they were chosen. In fact, here the words have no specific reason behind them other than starting with S.

    A good translation (especially when translating movies and TV shows into other languages) must attend to the meaning more than to the closest (previous) equivalent; think of poems, or popular-culture references. Without understanding the intrinsic meaning, the translation would not replicate the same meaning in the target language, at least not in several special circumstances.

    A perfect translator would understand the meaning first, and then reconstruct the text on that basis, even if the text’s structure deviates from the original.
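
    To make this concrete, here is a toy sketch (the mini-dictionary and synonym list are invented): a word-for-word lookup breaks the S-pattern, and even a crude fix preserves it only because the pattern has been hand-coded, not understood:

        # Invented English-to-German mini-dictionary and synonym list.
        to_german = {"salt": "Salz", "sausages": "Würstchen", "sitting": "Sitzen"}
        synonyms = {"Würstchen": ["Würstchen", "Salamis"]}

        def literal(word):
            """Word-for-word 'statistical' lookup."""
            return to_german[word]

        def alliterative(word, letter="S"):
            """Prefer a synonym starting with the required letter, if any."""
            for candidate in synonyms.get(to_german[word], [to_german[word]]):
                if candidate.startswith(letter):
                    return candidate
            return to_german[word]

        print([literal(w) for w in ("salt", "sausages", "sitting")])
        # ['Salz', 'Würstchen', 'Sitzen'] -- 'Würstchen' breaks the S-pattern
        print([alliterative(w) for w in ("salt", "sausages", "sitting")])
        # ['Salz', 'Salamis', 'Sitzen'] -- pattern kept, but only by hand-coding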
