Chomsky on AI

There’s an interesting conversation here with Noam Chomsky. The introductory piece mentions Chomsky’s 1959 review of Skinner’s Verbal Behavior, which is often regarded as having dealt the death-blow to behaviourism, and leaves us with the implication that Chomsky has dominated thinking about AI ever since. That’s overstating the case a bit, though it’s true the prevailing outlook has been mainly congenial to those with Chomskian views. What’s generally taken to have happened is that behaviourism was succeeded by functionalism, the view that mental states arise from the functioning of a system – most often seen as a computational system. Functionalism has taken a few hits since then, and a few rival theories have emerged, but in essence I think it’s still the predominant paradigm, the idea you have to address one way or another if you’re putting forward a view about consciousness. I suspect in fact that the old days, in which one dominant psychological school – associationism, introspectionism, behaviourism – ruled the roost more or less totally until overturned and replaced equally completely in a revolution, are over, and that we now live in a more complex and ambivalent world.

Be that as it may, it seems the old warrior has taken up arms again to vanquish a resurgence of behaviourism, or at any rate of ideas from the same school: statistical methods, notably those employed by Google. The article links to Peter Norvig’s rebuttal of Chomsky’s criticisms from last year, which we talked about at the time. At first glance I would have said that this is all a non-issue, because nobody at Google is trying to bring back behaviourism. Behaviourism was explicitly a theory about human mentality (or the lack of it); Google Translate was never meant to emulate the human brain or tell us anything about how human cognition works. It was just meant to be useful software. That difference of aim may perhaps tell us something about the way AI has tended to go in recent years, which is sort of recognised in Chomsky’s suggestion that it’s mere engineering, not proper science. Norvig’s response was reasonable, but in a way it partly validated Chomsky’s criticism by taking it head-on, claiming serious scientific merit for ‘engineering’ projects and for statistical techniques.

In the interview, Chomsky again attacks statistical approaches. Actually ‘attack’ is a bit strong: what he says is yes, you can legitimately apply statistical techniques if you like, and you’ll get results of some kind – but they’ll generally be somewhere between not very interesting and meaningless. Really, he says, it’s like pointing a camera out of the window and then using the pictures to make predictions about what the view will be like next week: you might get some good predictions, you might do a lot better than trying to predict the scene by using pure physics, but you won’t really have any understanding of anything and it won’t really be good science. In the same way it’s no good collecting linguistic inputs and outputs and matching everything up (which does sound a bit behaviouristic, actually), and equally it’s no good drawing statistical inferences about the firing of millions of neurons. What you need to do is find the right level of interpretation, where you can identify the functional bits – the computational units – and work out the algorithms they’re running. Until you do that, you’re wasting your time. I think what this comes down to is that although Chomsky speaks slightingly of its forward version, reverse engineering is pretty much what he’s calling for.
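Chomsky’s camera analogy is easy to caricature in code. Here’s a minimal sketch (my own illustration, with made-up numbers – not anything Chomsky or Google actually does): a purely statistical ‘forecaster’ that predicts the view from past snapshots, with no physics anywhere in the loop.

```python
# A caricature of Chomsky's camera-out-the-window example (my own
# sketch, made-up numbers): forecast tomorrow's view purely from the
# statistics of past snapshots. No physics, no understanding.

readings = [0.21, 0.80, 0.75, 0.22, 0.78, 0.74, 0.20]  # daily brightness out the window

def predict_tomorrow(history):
    # The whole 'theory' is an average of what the camera saw before.
    return sum(history) / len(history)

print(predict_tomorrow(readings))  # often a decent forecast; zero insight
```

The forecast can be respectable – better, perhaps, than a from-first-principles simulation – while telling us nothing about why the scene looks the way it does, which is exactly Chomsky’s complaint.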

This is, it seems to me, exactly right and entirely wrong in different ways at the same time. It’s right, first of all, that we should be looking to understand the actual principles, the mechanisms of cognition, and that statistical analysis is probably never going to be more than suggestive in that respect. It’s right that we should be looking carefully for the right level of description on which to tackle the problem – although that’s easier said than done. Not least, it’s right that we shouldn’t despair of our ability to reverse engineer the mind.

But looking for the equivalent of parts of a Turing machine? It seems pretty clear that if those were recognisable we should have hit on them by now, and that in fact they’re not there in any readily recognisable form. It’s still an open question, I think, as to whether in the end the brain is basically computational, functionalist but in some way that’s at least partly non-computational, or non-functionalist in some radical sense; but we do know that discrete formal processes sealed off in the head are not really up to the job.

I would say this has proved true even of Chomsky’s own theories of language acquisition. Chomsky, famously, noted that the sample of language that children are exposed to – the ‘poverty of the stimulus’ – simply does not provide enough data for them to be able to work out the syntactic principles of the language spoken around them as quickly as they do (I wonder if he relied on a statistical analysis, btw?). They must, therefore, be born with some built-in expectations about the structure of any language, and a language acquisition module which picks out which of the limited set of options has actually been implemented in their native tongue.
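The flavour of that idea can be caricatured in a few lines (a minimal sketch of ‘parameter-setting’, my illustration rather than Chomsky’s actual model): if the child already knows that languages come in only a couple of shapes, a handful of examples settles the question.

```python
# Toy 'parameter-setting' acquisition (my caricature, not Chomsky's
# actual model): the learner is born knowing languages are either
# head-initial (verb before object) or head-final (object before verb),
# so a handful of examples suffices to set the switch.

def set_head_direction(sample):
    """sample: sentences as two-word tuples in surface order,
    with words tagged -V (verb) or -O (object)."""
    votes = {"head-initial": 0, "head-final": 0}
    for first, _second in sample:
        if first.endswith("-V"):
            votes["head-initial"] += 1
        else:
            votes["head-final"] += 1
    return max(votes, key=votes.get)

english_like  = [("eat-V", "apple-O"), ("see-V", "dog-O")]
japanese_like = [("apple-O", "eat-V"), ("dog-O", "see-V")]
print(set_head_direction(english_like))   # head-initial
print(set_head_direction(japanese_like))  # head-final
```

Two examples do the work that unconstrained learning could not – but only because the space of hypotheses was fixed in advance.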

But this tends to make language very much a matter of encoding and decoding within a formal system, and the critiques offered by John Macnamara and Margaret Donaldson (in fact I believe Vygotsky had some similar insights even pre-Chomsky) make a persuasive case that it isn’t really like that. Whereas in Chomsky the child decodes the words in order to pick out the meaning, it often seems in fact to be the other way round: understanding the meaning from context and empathy allows the child to work out the proper decoding. Syntactic competence is probably not formalised and boxed off from general comprehension after all: and chances are, the basic functions of consciousness are equally messy and equally integrated with the perception of context and intention.

You could hardly call Chomsky an optimist: ‘It’s worth remembering that with regard to cognitive science, we’re kind of pre-Galilean,’ he says; but in some respects his apparently unreconstructed computationalism is curiously upbeat and even encouraging.


Lies, damned lies, and statistics

There has been quite a furore over some remarks made by Chomsky at the recent MIT symposium Brains, Minds and Machines; Chomsky apparently criticised researchers in machine learning who use purely statistical methods to produce simulations without addressing the meaning of the behaviour being simulated – there’s a brief account of the discussion in Technology Review. The wider debate was launched when Google’s Director of Research, Peter Norvig, issued a response to Chomsky. So far as I can tell the point being made by Chomsky was that applications like Google Translate, which uses a huge corpus of parallel texts to produce equivalents in different languages, do not tell us anything about the way human beings use language.

My first reaction was surprise that anyone would disagree with that. The weakness of translation programmes is that they don’t deal with meanings, whereas human language is all about meanings. Google is more successful than earlier attempts mainly because it uses a much larger database. We always knew this was possible in principle, and that there is no theoretical upper limit to how good these translations can get (in theory we can have a database of everything ever written); it’s just that we (or I, anyway) underestimated how much would be practical and how soon.
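To make the contrast concrete, here is a deliberately crude sketch of the corpus-driven approach (my illustration – real systems, Google’s included, use far more sophisticated alignment and language models): translation as counting co-occurrences in parallel text, with meaning nowhere in sight.

```python
from collections import Counter, defaultdict

# Crude corpus-based 'translation' (my illustration, nothing like the
# real Google Translate pipeline): count which target words line up
# with which source words, then translate by most frequent pairing.

parallel_corpus = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats",   "le chat mange"),
]

counts = defaultdict(Counter)
for src, tgt in parallel_corpus:
    # Naive position-by-position alignment; real systems infer alignments.
    for s_word, t_word in zip(src.split(), tgt.split()):
        counts[s_word][t_word] += 1

def translate(sentence):
    return " ".join(counts[w].most_common(1)[0][0] for w in sentence.split())

print(translate("the dog eats"))  # 'le chien mange' - right, for no reason it knows
```

Scale the corpus up by several orders of magnitude and the outputs get impressively good, without the program ever dealing in meanings – which is exactly the point at issue.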

As an analogy, we could say it’s like collecting exhaustive data about the movement of heavenly bodies. With big enough tables we could make excellent predictions about eclipses and other developments without it ever dawning on us that the whole thing was in fact about gravity. Norvig reads into Chomsky’s remarks a reassertion of views expressed earlier that ‘statistical’ research methods are no more than ‘butterfly collecting’. It would be a little odd to dismiss the collection of data as not really proper science – but it’s true that we celebrate Newton, who set out the law of gravity, and not poor Flamsteed and the other astronomers who supplied the data.

But hang on there. A huge corpus of parallel texts doesn’t tell us anything about the way human beings use language? That can’t be right, can it? Surely it tells us a lot about the way certain words are used, and their interpretation in other languages? Norvig makes the point that all interpretation is probabilistic – we pick out what people most likely meant. So probabilistic models may well enlighten us about how human beings process language. He goes on to point out that there may be several different ways of capturing essentially the same theory, some more useful for some purposes than others: why rule out statistical descriptions?
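Norvig’s probabilistic picture can be put in a single line of Bayes: the hearer picks the meaning m that maximises P(m) · P(utterance | m). A toy version (my gloss on the idea, with invented numbers – not Norvig’s own formulation):

```python
# Toy probabilistic interpreter (my gloss, invented numbers): choose the
# meaning that maximises P(meaning) * P(utterance | meaning).

priors = {            # how plausible each reading is a priori
    "TIME_PASSES_QUICKLY": 0.9,
    "INSECTS_CALLED_TIME_FLIES": 0.1,
}
likelihoods = {       # P("time flies" | meaning)
    "TIME_PASSES_QUICKLY": 0.5,
    "INSECTS_CALLED_TIME_FLIES": 0.7,
}

def interpret():
    return max(priors, key=lambda m: priors[m] * likelihoods[m])

print(interpret())    # TIME_PASSES_QUICKLY - the strong prior wins
```

Note that even in this caricature the probabilities range over candidate meanings, not over candidate sentences – a distinction that matters below.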

Hm, I don’t know. Certainly there are times when we have to choose between rival interpretations, but does that make interpretation essentially probabilistic? There are times when we have to choose between two alternative jigsaw pieces, but I wouldn’t say that solving a jigsaw was a statistical matter. Even if we concede that human interpretation of language is probabilistic, that glosses over the substantial point that in human beings the probabilistic judgements are about likely meanings, not about likely equivalent sentences. Human language is animated by intentionality, by meaning things: and unfortunately at the moment we have little idea of how intentionality works.

But then, if we don’t know how meaning works, isn’t it possible that it might turn out to be statistical (or probabilistic, or at any rate numerical in some way)? I don’t think it is possible to rule this out. If we had some sort of neural network which was able to do translations, or perhaps talk to us intelligently, we might be tempted to conclude that its inner weightings effectively encoded a set of probabilities about sentences and/or a set of meanings – mightn’t we? And who knows, by examining those weightings might we not finally glean some useful insight into how leaden numbers get transmuted into the gold of semantics?
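To see how weights could encode sentence probabilities at all, here is the smallest possible sketch (my illustration, with random weights standing in for trained ones): a single matrix of bigram scores, pushed through a softmax, yields a genuine probability for any sentence.

```python
import numpy as np

# Minimal illustration of weights encoding sentence probabilities (my
# sketch; random weights stand in for trained ones). One matrix W holds
# bigram scores; softmax turns each row into a next-word distribution.

vocab = ["<s>", "the", "cat", "sleeps"]
idx = {w: i for i, w in enumerate(vocab)}
W = np.random.randn(len(vocab), len(vocab))

def next_word_probs(word):
    scores = W[idx[word]]
    e = np.exp(scores - scores.max())
    return e / e.sum()                      # a genuine distribution

def sentence_prob(words):
    p = 1.0
    for prev, cur in zip(["<s>"] + words[:-1], words):
        p *= next_word_probs(prev)[idx[cur]]
    return p

print(sentence_prob(["the", "cat", "sleeps"]))
```

Reading semantics off such weightings is of course the hard part; the sketch shows only that ‘numbers encoding probabilities of sentences’ is a perfectly coherent idea.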

As I say, I don’t think it’s possible to rule that out – but we know how Google Translate works and it isn’t really anything like that. Norvig wouldn’t claim – would he? – that Google Translate or other programs written on a similar basis display any actual understanding of the sentences they process. So it still seems to me that, whether or not his wider attitudes are well-founded, Chomsky was essentially right.

To me the discussion has several curious echoes of Searle’s Chinese Room. I wonder whether Norvig would say that the man in the room understands Chinese?