Chomsky’s Mysterianism

Or perhaps Chomsky’s endorsement of Isaac Newton’s mysterianism. We tend to think of Newton as bringing physics to a triumphant state of perfection, one that lasted until Einstein and, with qualifications, still stands. Chomsky says that in fact Newton shattered the ambitions of mechanical science, which have never recovered; and in doing so he placed permanent limits on the human mind. He quotes Hume:

While Newton seemed to draw off the veil from some of the mysteries of nature, he shewed at the same time the imperfections of the mechanical philosophy; and thereby restored her ultimate secrets to that obscurity, in which they ever did and ever will remain.

What are they talking about? Above all, the theory of gravity, which relies on the unexplained notion of action at a distance. Contemporary thinkers regarded this as nonsensical, almost logically absurd: how could object A affect object B without contacting it and without any intermediating substance? Newton, according to Chomsky, agreed in essence, but defended himself by saying that there was nothing occult in his own work, which stopped short where the funny stuff began. Newton, you might say, described gravity precisely and provided solid evidence to back up his description; what he didn’t do at all was explain it.

The acceptance of gravity, according to Chomsky, involved a permanent drop in the standard of intelligibility that scientific theories required. This has large implications for the mind: it suggests there might be matters beyond our understanding, and provides a particular example. It may well be that the mind itself is, or involves, similar intractable difficulties.

Chomsky reckons that Darwin reinforced this idea. We are not angels, after all, only apes; all other creatures suffer cognitive limitations; why should we be able to understand everything? In fact our limitations are as important as our abilities in making us what we are; if we were bound by no physical limitations we should become shapeless globs of protoplasm instead of human beings, and the same goes for our minds. Chomsky distinguishes between problems and mysteries. What is forever a mystery to a dog or rat may be a solvable problem for us, but we are bound to have mysteries of our own.

I think some care is needed over the idea of permanent mysteries. We should recognise that in principle there may be several kinds of question that look mysterious, notably the following.

  1. Questions that are, as it were, out of scope: not correctly definable as questions at all; these are unanswerable even by God.
  2. Mysterian mysteries: questions that are not in themselves unanswerable, but which are permanently beyond the human mind.
  3. Questions that are answerable by human beings, but very difficult indeed.
  4. Questions that would be answerable by human beings if we had further information, which (a) we just don’t happen to have, or (b) we could never have in principle.

I think it’s just an assumption that the problem of mind, and indeed the problem of gravity, fall into category 2. There has been a bit of movement in recent decades, I think, and the possibility of 3 or 4(a) remains open.

I don’t think the evolutionary argument is decisive either. Implicitly Chomsky assumes an indefinite scale of cognitive abilities matched by an indefinite scale of problems. Creatures that are higher up the first scale get higher up the second, but there’s always a higher problem. Maybe, though, there’s a top to the scale of problems? Maybe we are already clever enough in principle to tackle them all.

If this seems optimistic, think of Chomsky the Lizard, millions of years ago. Some organisms, he opines, can stick their noses out of the water. Some can leap out, briefly. Some crawl out on the beach for a while. Amphibians have to go back to reproduce. But all creatures have a limit to how far they can go from the sea. We lizards, we’ve got legs, lungs, and the right kind of eggs; we can go further than any other. That does not mean we can go all over the island. Evolution guarantees that there will always be parts of the island we can’t reach.

Well, depending on the island, there may be inaccessible parts, but that doesn’t mean legs and lungs have inbuilt limits. So just because we are products of evolution, it doesn’t mean there are necessarily questions of type 2 for us.

Chomsky mocks those who claim that the idea of reducing the mind to activity of the brain is new and revolutionary; it has been widely espoused for centuries, he says. He mentions remarks of Locke which I don’t know, but which resemble the famous analogy of Leibniz’s mill.

If we imagine that there is a machine whose structure makes it think, sense, and have perceptions, we could conceive it enlarged, keeping the same proportions, so that we could enter into it, as one enters a mill. That being so, on inspecting its interior we would find only parts that push one another, and never anything to explain a perception.

The thing about that is, we’ll never find anything to explain a mill, either. Honestly, Gottfried, all I see is pieces of wood and metal moving around; none of them has any milliness! How on earth could a collection of pieces of wood – just by virtue of being arranged in some functional way, you say – acquire completely new, distinctively molational qualities?

Chomsky on AI

There’s an interesting conversation here with Noam Chomsky. The introductory piece mentions the review by Chomsky which is often regarded as having dealt the death-blow to behaviourism, and leaves us with the implication that Chomsky has dominated thinking about AI ever since. That’s overstating the case a bit, though it’s true the prevailing outlook has been mainly congenial to those with Chomskian views. What’s generally taken to have happened is that behaviourism was succeeded by functionalism, the view that mental states arise from the functioning of a system – most often seen as a computational system. Functionalism has taken a few hits since then, and a few rival theories have emerged, but in essence I think it’s still the predominant paradigm, the idea you have to address one way or another if you’re putting forward a view about consciousness. I suspect in fact that the old days, in which one dominant psychological school – associationism, introspectionism, behaviourism – ruled the roost more or less totally until overturned and replaced equally completely in a revolution, are over, and that we now live in a more complex and ambivalent world.

Be that as it may, it seems the old warrior has taken up arms again to vanquish a resurgence of behaviourism, or at any rate of ideas from the same school: statistical methods, notably those employed by Google. The article links to a rebuttal last year by Peter Norvig of Chomsky’s criticisms, which we talked about at the time. At first glance I would have said that this is all a non-issue, because nobody at Google is trying to bring back behaviourism. Behaviourism was explicitly a theory about human mentality (or the lack of it); Google Translate was never meant to emulate the human brain or tell us anything about how human cognition works. It was just meant to be useful software. That difference of aim may perhaps tell us something about the way AI has tended to go in recent years, which is sort of recognised in Chomsky’s suggestion that it’s mere engineering, not proper science. Norvig’s response then was reasonable but in a way it partly validated Chomsky’s criticism by taking it head-on, claiming serious scientific merit for ‘engineering’ projects and for statistical techniques.

In the interview, Chomsky again attacks statistical approaches. Actually ‘attack’ is a bit strong: he says yes, you can legitimately apply statistical techniques if you like, and you’ll get results of some kind – but they’ll generally be somewhere between not very interesting and meaningless. Really, he says, it’s like pointing a camera out of the window and then using the pictures to make predictions about what the view will be like next week: you might get some good predictions, you might do a lot better than trying to predict the scene by using pure physics, but you won’t really have any understanding of anything and it won’t really be good science. In the same way it’s no good collecting linguistic inputs and outputs and matching everything up (which does sound a bit behaviouristic, actually), and equally it’s no good drawing statistical inferences about the firing of millions of neurons. What you need to do is find the right level of interpretation, where you can identify the functional bits – the computational units – and work out the algorithms they’re running. Until you do that, you’re wasting your time. I think what this comes down to is that although Chomsky speaks slightingly of its forward version, reverse engineering is pretty much what he’s calling for.

This is, it seems to me, exactly right and entirely wrong in different ways at the same time. It’s right, first of all, that we should be looking to understand the actual principles, the mechanisms of cognition, and that statistical analysis is probably never going to be more than suggestive in that respect. It’s right that we should be looking carefully for the right level of description on which to tackle the problem – although that’s easier said than done. Not least, it’s right that we shouldn’t despair of our ability to reverse engineer the mind.

But looking for the equivalent of parts of a Turing machine? It seems pretty clear that if those were recognisable we should have hit on them by now, and that in fact they’re not there in any readily recognisable form. It’s still an open question, I think, as to whether in the end the brain is basically computational, functionalist but in some way that’s at least partly non-computational, or non-functionalist in some radical sense; but we do know that discrete formal processes sealed off in the head are not really up to the job.

I would say this has proved true even of Chomsky’s own theories of language acquisition. Chomsky, famously, noted that the sample of language that children are exposed to simply does not provide enough data for them to be able to work out the syntactic principles of the language spoken around them as quickly as they do (I wonder if he relied on a statistical analysis, btw?). They must, therefore, be born with some built-in expectations about the structure of any language, and a language acquisition module which picks out which of the limited set of options has actually been implemented in their native tongue.

But this tends to make language very much a matter of encoding and decoding within a formal system, and the critiques offered by John Macnamara and Margaret Donaldson (in fact I believe Vygotsky had some similar insights even pre-Chomsky) make a persuasive case that it isn’t really like that. Whereas in Chomsky the child decodes the words in order to pick out the meaning, it often seems in fact to be the other way round; understanding the meaning from context and empathy allows the child to work out the proper decoding. Syntactic competence is probably not formalised and boxed off from general comprehension after all: and chances are, the basic functions of consciousness are equally messy and equally integrated with the perception of context and intention.

You could hardly call Chomsky an optimist: ‘It’s worth remembering that with regard to cognitive science, we’re kind of pre-Galilean,’ he says; but in some respects his apparently unreconstructed computationalism is curiously upbeat and even encouraging.


Lies, damned lies, and statistics

There has been quite a furore over some remarks made by Chomsky at the recent MIT symposium Brains, Minds and Machines; Chomsky apparently criticised researchers in machine learning who use purely statistical methods to produce simulations without addressing the meaning of the behaviour being simulated – there’s a brief account of the discussion in Technology Review. The wider debate was launched when Google’s Director of Research, Peter Norvig, issued a response to Chomsky. So far as I can tell, the point being made by Chomsky was that applications like Google Translate, which uses a huge corpus of parallel texts to produce equivalents in different languages, do not tell us anything about the way human beings use language.

My first reaction was surprise that anyone would disagree with that. The weakness of translation programmes is that they don’t deal with meanings, whereas human language is all about meanings. Google is more successful than earlier attempts mainly because it uses a much larger database. We always knew this was possible in principle, and that there is no theoretical upper limit to how good these translations can get (in theory we can have a database of everything ever written); it’s just that we (or I, anyway) underestimated how much would be practical and how soon.
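To make that concrete, here is a cartoon of the purely corpus-driven approach – my own toy sketch, nothing like Google’s actual system, which involves alignment models, language models and a great deal more. The point is just that translation falls out of counting which target phrases co-occur with which source phrases; meaning never enters into it.

```python
from collections import Counter, defaultdict

# A made-up miniature 'parallel corpus' of source/target phrase pairs.
parallel_corpus = [
    ("the house", "la maison"),
    ("the house", "la maison"),
    ("the house", "la demeure"),
    ("the cat", "le chat"),
]

# Count how often each target phrase appears opposite each source phrase.
table = defaultdict(Counter)
for src, tgt in parallel_corpus:
    table[src][tgt] += 1

def translate(phrase):
    """Pick the most frequently co-occurring equivalent - frequency, not meaning."""
    candidates = table.get(phrase)
    return candidates.most_common(1)[0][0] if candidates else phrase

print(translate("the house"))  # 'la maison', simply because it was commonest
```

Scale the corpus up by orders of magnitude and the output gets better, just as it has in practice; but at no point does anything in the program know what a house is.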

As an analogy, we could say it’s like collecting exhaustive data about the movement of heavenly bodies. With big enough tables we could make excellent predictions about eclipses and other developments without it ever dawning on us that the whole thing was in fact about gravity. Norvig reads into Chomsky’s remarks a reassertion of views expressed earlier that ‘statistical’ research methods are no more than ‘butterfly collecting’. It would be a little odd to dismiss the collection of data as not really proper science – but it’s true that we celebrate Newton, who set out the law of gravity, and not poor Flamsteed and the other astronomers who supplied the data.

But hang on there. A huge corpus of parallel texts doesn’t tell us anything about the way human beings use language? That can’t be right, can it? Surely it tells us a lot about the way certain words are used, and their interpretation in other languages? Norvig makes the point that all interpretation is probabilistic – we pick out what people most likely meant. So probabilistic models may well enlighten us about how human beings process language.  He goes on to point out that there may be several different ways of capturing essentially the same theory, some more useful for some purposes than others: why rule out statistical descriptions?
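In miniature, the probabilistic picture might look like this – my illustration, not anything of Norvig’s, with all the numbers invented: weigh each candidate sense of an ambiguous word by its prior frequency and by how well the surrounding words fit it, then pick the most probable.

```python
# Toy Bayes-style disambiguation of 'bank' (all figures invented for illustration).
priors = {"riverside": 0.3, "financial": 0.7}  # how common each sense is
likelihoods = {                                # P(context word | sense)
    "riverside": {"water": 0.4, "fish": 0.3, "money": 0.01},
    "financial": {"water": 0.01, "fish": 0.01, "money": 0.5},
}

def interpret(context_words):
    """Choose the sense with the highest posterior score given the context."""
    scores = {}
    for sense, prior in priors.items():
        score = prior
        for w in context_words:
            score *= likelihoods[sense].get(w, 0.05)  # small default for unmodelled words
        scores[sense] = score
    return max(scores, key=scores.get)

print(interpret(["water", "fish"]))  # 'riverside'
print(interpret(["money"]))          # 'financial'
```

On this view, interpreting a sentence just is choosing the likeliest candidate meaning.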

Hm, I don’t know. Certainly there are times when we have to choose between rival interpretations, but does that make interpretation essentially probabilistic? There are times when we have to choose between two alternative jigsaw pieces, but I wouldn’t say that solving a jigsaw was a statistical matter. Even if we concede that human interpretation of language is probabilistic, that glosses over the substantial point that in human beings the probabilistic judgements are about likely meanings, not about likely equivalent sentences. Human language is animated by intentionality, by meaning things: and unfortunately at the moment we have little idea of how intentionality works.

But then, if we don’t know how meaning works, isn’t it possible that it might turn out to be statistical (or probabilistic, or at any rate numerical in some way)? I don’t think it is possible to rule this out. If we had some sort of neural network which was able to do translations, or perhaps talk to us intelligently, we might be tempted to conclude that its inner weightings effectively encoded a set of probabilities about sentences and/or a set of meanings – mightn’t we? And who knows, by examining those weightings might we not finally glean some useful insight into how leaden numbers get transmuted into the gold of semantics?

As I say, I don’t think it’s possible to rule that out – but we know how Google Translate works and it isn’t really anything like that. Norvig wouldn’t claim – would he? – that Google Translate or other programs written on a similar basis display any actual understanding of the sentences they process. So it still seems to me that, whether or not his wider attitudes are well-founded, Chomsky was essentially right.

To me the discussion has several curious echoes of the Chinese Room. I wonder whether Norvig would say that the man in the room understands Chinese?

Colourless Green Gavagai

Blandula Strange, really, that the best known sentence Noam Chomsky ever wrote is probably the one which wasn’t supposed to mean anything. In ‘Syntactic Structures’ (1957) he pointed out that while neither

  • Colorless green ideas sleep furiously, or
  • Furiously sleep ideas green colorless

means anything, we can easily see that the first is a valid sentence, while the second is not. Since neither sentence had ever appeared in any text until then, statistical analysis of language won’t help us tell which of them is more likely to occur in normal discourse, he said.

Blandula Strictly, the statistical point appears to be wrong – we can, in fact, assess the relative probability of sentences which have never occurred; a statistical model with smoothing will happily score them, as in the sketch below.
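Here is the standard trick in miniature – add-one smoothing over bigrams, run on a made-up toy corpus (the demonstration has been done properly with large corpora; this sketch only shows the idea). Because every word pair keeps a small non-zero probability, the model can score both of the never-before-seen sentences, and it ranks the grammatical one higher:

```python
from collections import Counter

# Tiny made-up corpus standing in for 'normal discourse'.
corpus = ("green ideas are new . new ideas sleep in old books . "
          "colorless gases fill the flask . the dog sleeps furiously well . "
          "ideas sleep until someone wakes them .").split()

vocab = set(corpus) | {"colorless", "green", "ideas", "sleep", "furiously"}
V = len(vocab)
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def prob(sentence):
    """Add-one-smoothed bigram probability: unseen word pairs get a small
    but non-zero score, so never-before-seen sentences can still be ranked."""
    words = sentence.split()
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)
    return p

print(prob("colorless green ideas sleep furiously"))  # higher
print(prob("furiously sleep ideas green colorless"))  # lower
```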

Be that as it may, the thing that really caught people’s imagination was the grammatical but meaningless sentence. Was it really meaningless? Some thought they could see a kind of poetic meaning in it. Some people at Stanford had a competition which produced a number of poetic examples, and there is at least one other piece of verse. Resorting to poetry makes things too easy, though. A more challenging exercise might be to reinterpret the sentence as part of a crossword clue…

Clue: Wow, awful colorless, green ideas sleep furiously (3,4,8)

Solution: Gee, dire paleness.

(‘Furiously’ indicates an anagram of ‘green ideas sleep’, and the result means (more or less) the same as ‘Wow, awful colorless’.)

Alright,  maybe not the best crossword clue ever. Without going to those lengths, we can easily imagine that the sentence might be part of a political essay…

Even before the fall of the Berlin Wall, it was known that ‘colorless Reds’ – covert Soviet agents – had taken up places as ‘sleepers’ in the Dubitanian government. With the benefit of hindsight, we can see that these agents were in fact among the government’s most reliable and least politicised employees. Disruption, direct action, and sabotage were far more likely to be the work of the extremist ecological factions who also infiltrated certain government departments. Once securely lodged inside the state, it seems, Communist ideology waits calmly for its chance. Colorless green ideas sleep furiously.

‘Green’ has so many relatively normal uses that it lends itself to different interpretations. The sentence could be about refurbishment of a golf course, abstract painting, or something new and half-baked. But we’re not limited to ringing the changes on ‘green’. With the right context, we can make other words – ‘idea’ for example – mean something slightly different, too.

The Studge advertising agency was desperate to win the Kumfypillo account. The creative department decided on a ‘rainbow’ workshop. In these sessions, each person was assigned a color and a corresponding role. When the box of equipment was opened, however, the green badge was missing, so Jenkins, the junior member of the team, had to be green without a color: he also got the hardest job, which was to come up with new creative angles, a process which at Studge was called ‘ideaing’. Imagine the scene: in the new chairs by the window, blue and red are, respectively, critiquing and relating the concept of repose; at the front of the room yellow catalogues the properties of head-support, while standing awkwardly at the back, poor colorless green ideas sleep furiously.

Without wanting to labour the point, one can imagine an interpretation in which none of the words have their usual meanings, and all are used in a different grammatical role from the one you would expect. It’s actually a bit of a challenge to hold that many novel meanings in your mind at once, but…

‘This is the strangest editorial office I’ve ever worked in,’ said John. ‘I can’t understand half of what people say.’

‘You don’t have to be mad to work here, son,’ observed Smith, ‘we can teach you all that. Let me run you through some of the slang. Now “color” is pictures, so “colorless” means text, or words, as in “Mark my colorless, son”. “Green” is for green light, means “OK”, “can do”, as in “I green lunch today”. OK so far? Now at one time the boss had a habit of picking up some piece and saying “But what does it mean? What are the ideas?”. So if you want to ask what somebody means, you can say “what ideas there, then?” Now “sleep” means “relax”, “it’s correct”. So if someone asks me if I’m going out today, I can say “hey, sleep”, meaning “certainly”. One more. When we’re in a desperate hurry to finish, we generally just furiously chuck in whatever stuff we’ve got. So “furiously” means “whatever you’ve got”, or “anything”; like, if somebody asks me what I’m drinking and I don’t care, I’ll just say “Oh, furiously.”’

‘Good grief,’ exclaimed John. ‘Words can really mean anything.’

‘That’s what I’m saying,’ answered Smith. ‘Colorless green ideas sleep furiously.’

Bitbucket Gibberish. Alright, I admire your imagination, or perhaps it would be nearer the mark to say your dull persistence in wringing out permutations. But so what? Inventing a code in which each word stands for a completely different one is a futile pursuit, and tells us nothing about Chomsky.

Blandula I’m not trying to make a point about Chomsky. What I’m actually doing is suggesting that Quine was right with his story about ‘gavagai’ – the word which seemed to mean ‘rabbit’, but could have meant virtually anything. For any word or set of words, there really are an infinite number of possible interpretations, and it follows that the meaning is always impossible to decode with any certainty.

Bitbucket Then how does anyone ever understand anyone else? Quine’s view is one of those theories that even the author doesn’t believe when it comes to real life.

Blandula Ah, but you see, the thing is, we don’t decode words mechanically, the way a computer would have to do it. We just see what somebody means – it’s a process of recognition, not calculation. Perhaps that has some implications for Chomsky after all.