Explaining to humans

We’ve discussed a few times recently how deep learning systems are sometimes inscrutable. They work out how to do things for themselves, and there may be no way to tell what method they are using. I’ve suggested that it’s worse than that; they are likely to be using methods which, though sound, are in principle incomprehensible to human beings. These unknowable methods may well be a very valuable addition to our stock of tools, but without understanding them we can’t be sure they won’t fail; and the worrying thing about that is that unlike humans, who often make small or recoverable mistakes, these systems have a way of failing that is sudden, unforgiving, and catastrophic.

So I was interested to see this short interview with Been Kim, who works for Google Brain (is there a Google Eyes?) and has the interesting job of building a system that can explain to humans what the neural networks are up to.

That ought to be impossible, you’d think. If you use the same deep learning techniques to generate the explainer, you won’t be able to understand how it works, and the same worries will recur on another level. The other approach is essentially the old-fashioned one of directly writing an algorithm to do the job; but to write an explicit algorithm to explain system X you surely need to understand system X to begin with?

I suspect there’s an interesting general problem here about the transmission of understanding. Perhaps data can be transferred, but understanding just has to happen, and sometimes, no matter how helpfully the relevant information is transmitted, understanding just fails to occur (I feel teachers may empathise with this).

Suppose the explainer is in place; how do we know that its explanations are correct? We can see over time whether it successfully picks out systems that work reliably and those that are going to fail, but that is only ever going to be a provisional answer. The risk of sudden unexpected failure remains. For real certainty, the only way is for us to validate it by understanding it, and that is off the table.

So is Been Kim wasting her time on a quixotic project – perhaps one that Google has cynically created to reassure the public and politicians while knowing the goal is unattainable? No; her goal is actually a more modest, negative one. Her interpreter is not meant to provide an assurance that a given system is definitely reliable; rather, it is supposed to pick out ones that are definitely dodgy; and this is much more practical. After all, we may not always understand how a given system executes a particular complex task, but we do know in general how neural networks and deep learning work. We know that the output decisions come from factors in the input data, and the interpreter ought to be able to tell us what factors are being taken into account. Then, using the unique human capacity to identify relevance, we may be able to spot some duds – cases where the system is using a variable that tracks the relevant stuff only unreliably, or where there was some unnoticed problem with the corpus of examples the system learnt from.
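
To make that a little more concrete, the crude end of the interpretability spectrum can be shown in a few lines of Python; this is only my own sketch, with an invented model and data, and has nothing to do with Kim’s actual tools, but it gives the flavour: scramble one input factor at a time and see how much the system’s performance cares.

  # A toy 'interpreter': permutation importance. It doesn't explain how the
  # model computes its decision, only which input factors the decision
  # appears to depend on. Model and data are invented for illustration.
  import numpy as np
  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import accuracy_score

  X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                             random_state=0)
  model = LogisticRegression(max_iter=1000).fit(X, y)
  baseline = accuracy_score(y, model.predict(X))

  rng = np.random.default_rng(0)
  for j in range(X.shape[1]):
      X_scrambled = X.copy()
      X_scrambled[:, j] = rng.permutation(X_scrambled[:, j])  # break factor j's link to the output
      drop = baseline - accuracy_score(y, model.predict(X_scrambled))
      print(f"factor {j}: accuracy drops by {drop:+.3f} when scrambled")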

Is that OK? Well, in principle there’s the further risk that the system is actually cleverer than we realise; that it is using features (perhaps very complex ones) that actually work fine, but which we’re too dim to grasp. Our best reassurance here is again understanding; if we can see how things seem to be working, we have to be very unlucky to hit a system which is actually superior but just happens, in all the examined cases, to look like a dodgy one. We may not always understand the system, but if we understand something that’s going wrong, we’re probably on firm ground.

Of course, weeding out egregiously unreliable systems does not solve the basic problem of efficient but inscrutable systems. Without accusing Google of a cunning sleight of hand after all, I can well imagine that the legislators and bureaucrats who are gearing up to make rules about this issue might mistake interpreter systems like Kim’s for a solution, require them in all cases, and assume that the job is done and dusted…

New paths to AI disaster

I’ve never believed that robots dream of killing all humans. I don’t think paperclip maximisers are ever going to rule the world. And I don’t believe in the Singularity. But is AI heading in some dangerous directions? Oh yes.

In Forbes, Bernard Marr recently offered five predictions for the year ahead. They mostly strike me as pretty believable, though I’m less optimistic than he is about digital assistants and the likelihood of other impressive breakthroughs; he’s surely right that there will be more hype.

It’s his first two that prompted some apprehension on my part. He says…

  1. AI increasingly becomes a matter of international politics
  2. A Move Towards “Transparent AI”

Those are surely right; we’ve already seen serious discussion papers emerging from the EU and elsewhere, and one of the main concerns to have emerged recently is the matter of ‘transparency’ – the auditability of software. How is the computer making its decisions?

This is a legitimate, indeed a necessary concern. Once upon a time we could write out the algorithm embodied in any application and check how it worked. This is getting more difficult with software that learns for itself, and we’ve already seen disastrous cases where the AI picked up and amplified the biases of the organisation it was working for. Noticing that most top executives were white middle-aged men, it might decide to downgrade the evaluation of everyone else, for example. Cases like that need to be guarded against and managed; that ought to be feasible in such circumstances by studying results, even if it isn’t possible to look inside the ‘black box’.

But it starts to get difficult, because as machine learning moves on into more complex decision making, it increasingly becomes impossible to understand how the algorithms are playing out, and the desired outcomes may not be so clear. In fact it seems to me that full transparency may be impossible in principle, due to human limitations. How could that be? I’m not sure I can say – I’m no expert, and explaining something you don’t, by definition, understand, is a bit of a challenge anyway. In part the problem might be to do with how many items we can hold in mind, for example. It’s generally accepted that we can only hang on to about seven items (plus or minus a couple) in short-term memory. (There’s scope for discussion about such matters as what amounts to an item, and so on, but let’s not worry about the detail.) This means there is a definite limit to how many possible paths we can mentally follow at once, or to put it another way, how large a set of propositional disjunctions we can hang on to (‘either a or b, and if a, either c, d, or e, while if b, f or g…’ and there we go). Human brains can deal with this by structuring decisions to break them into smaller chunks, using a pencil and paper, and so on. Perhaps, though, there are things that you can only understand by grasping twenty alternatives simultaneously. Very likely there are other cognitive issues we simply can’t recognise; we just see a system doing a great job in ways we can’t fathom.
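
Just to put rough figures on the disjunction point (the numbers below are purely illustrative):

  # Purely illustrative arithmetic: a chain of k binary disjunctions generates
  # 2**k complete alternatives to keep track of. Seven is manageable on paper;
  # twenty already runs past a million.
  for k in (7, 20):
      print(f"{k} binary choices -> {2 ** k:,} distinct paths")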

Still, I said we could monitor success by just looking at results, didn’t I? We know that our recruitment exercise ought to yield appointments whose ethnic composition is the same as that of the population (or at any rate, of the qualified candidates).  OK, sometimes it may be harder to know what the desired outcome is, exactly, and there may be issues about whether ongoing systems need to be able to yield sub-optimal results temporarily, but those are tractable issues.
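
For what it’s worth, a results-only check of that kind needs no access to the model at all; here is a sketch, with invented counts, of asking whether selection rates differ across groups by more than chance would allow.

  # A results-only audit, with invented counts: do selection rates differ
  # across groups more than chance would allow? No access to the model's
  # internals is needed, only outcomes and group labels.
  from scipy.stats import chi2_contingency

  #              group A  group B  group C
  qualified  =  [   400,     350,     250 ]    # qualified applicants
  appointed  =  [    60,      25,      15 ]    # those the system selected
  rejected   =  [q - a for q, a in zip(qualified, appointed)]

  chi2, p, dof, expected = chi2_contingency([appointed, rejected])
  print(f"chi-squared = {chi2:.2f}, p = {p:.4f}")
  if p < 0.01:
      print("selection rates differ across groups beyond chance: investigate")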

Alas, we also have to worry about brittleness and how things break. It turns out that systems using advanced machine learning may be prone to sudden disastrous failure. A change of a few unimportant pixels in a graphic may make an image recognition system that usually performs reliably draw fantastic conclusions instead. In one particular set of circumstances a stock market system may suddenly go ape. This happens because however machine learning systems are doing what they do, they are doing something radically different from what we do, and we might suspect that, like simpler computer systems, they take no true account of relevance, only its inadequate proxy, correlation. Nobody, I think, has any good theoretical analysis of relevance, and it is strongly linked with Humean problems philosophers have never cracked.
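
The pixel example is easy to reproduce in miniature. The sketch below uses a plain linear classifier on synthetic ‘images’ rather than a real deep network, but the trick is the same in spirit as the fast gradient sign method: nudge every input a little in whichever direction the model finds most damaging.

  # A toy version of the few-pixels failure, using a plain linear classifier
  # on synthetic 8x8 'images' rather than a real deep network.
  import numpy as np
  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression

  X, y = make_classification(n_samples=500, n_features=64, random_state=1)
  model = LogisticRegression(max_iter=1000).fit(X, y)

  x = X[0]
  score = model.decision_function([x])[0]
  w = model.coef_[0]
  eps = 1.1 * abs(score) / np.abs(w).sum()        # just enough, per pixel, to cross the boundary
  x_adv = x - eps * np.sign(w) * np.sign(score)   # push every 'pixel' slightly the wrong way

  print("original:  class", model.predict([x])[0],
        " confidence", round(model.predict_proba([x])[0].max(), 3))
  print("perturbed: class", model.predict([x_adv])[0],
        " confidence", round(model.predict_proba([x_adv])[0].max(), 3),
        " (per-pixel change:", round(eps, 3), ")")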

That’s bad, but it could be made worse if legislative bodies either fail to understand why these risks arise, or decide that on a precautionary basis we must outlaw anything that cannot be fully audited and understood by human beings. Laws along those lines seem very likely to me, but they might throw away huge potential benefits – perhaps major economic advantage – or even suppress the further research and development which might ultimately lead to solutions and to further, as yet unforeseeable, gains.

That’s not all, either; laws constrain compliant citizens, but not necessarily everyone. Suppose we can build machine learning systems that retain a distinct risk of catastrophic failure, but outclass ordinary human or non-learning systems most of the time. Will anyone try to build and use such systems? Might there be a temptation for piratical types to try them out in projects that are criminal, financial, political or even military? Don’t the legitimate authorities have to develop the same systems pre-emptively in self-defence? Otherwise we’re left in a position where it’s not clear whether we should hope that the ‘pirate’ systems fail or work, because either way it’s catastrophe.

What on earth is the answer? What regulatory regime or other measures would be appropriate? I don’t know, and I strongly doubt that any of the regulatory bodies who are casting a thoughtful eye over this territory know either.

Irritating Robots

Machine learning and neurology; the perfect match?

Of course there is a bit of a connection already in that modern machine learning draws on approaches which were distantly inspired by the way networks of neurons seemed to do their thing. Now though, it’s argued in this interesting piece that machine learning might help us cope with the vast complexity of brain organisation. This complexity puts brain processes beyond human comprehension, it’s suggested, but machine learning might step in and decode things for us.

It seems a neat idea, and a couple of noteworthy projects are mentioned: the ‘atlas’ which mapped words to particular areas of cortex, and an attempt to reconstruct seen faces from fMRI data alone (actually with rather mixed success, it seems). But there are surely a few problems too, as the piece acknowledges.

First, fMRI isn’t nearly good enough. Existing scanning techniques just don’t provide the neuron-by-neuron data that is probably required, and never will. It’s as though the only camera we had was permanently out of focus. Really good processing can do something with dodgy images, but if your lens was rubbish to start with, there are limits to what you can get. This really matters for neurology where it seems very likely that a lot of the important stuff really is in the detail. No matter how good machine learning is, it can’t do a proper job with impressionistic data.

We also don’t have large libraries of results from many different subjects. A lot of studies really just ‘decode’ activity in one context in one individual on one occasion. Now it can be argued that that’s the best we’ll ever be able to do, because brains do not get wired up in identical ways. One of the interesting results alluded to in the piece is that the word ‘poodle’ in the brain ‘lives’ near the word ‘dog’. But it’s hardly possible that there exists a fixed definite location in the brain reserved for the word ‘poodle’. Some people never encounter that concept, and can hardly have pre-allocated space for it. Did Neanderthals have a designated space for thinking about poodles that presumably was never used throughout the history of the species? Some people might learn of ‘poodle’ first as a hairstyle, before knowing its canine origin; others, brought up to hard work in their parent’s busy grooming parlour from an early age, might have as many words for poodle as the eskimos were supposed to have for snow. Isn’t that going to affect the brain location where the word ends up? Moreover, what does it mean to say that the word ‘lives’ in a given place? We see activity in that location when the word is encountered, but how do we tell whether that is a response to the word, the concept of the word, the concept of poodles, poodles, a particular known poodle, or any other of the family of poodle-related mental entities? Maybe these different items crop up in multiple different places?

Still, we’ll never know what can be done if we don’t try. One piquant aspect of this is that we might end up with machines that can understand brains, but can never fully explain them to us, both because the complexity is beyond us and because machine learning often works in inscrutable ways anyway. Maybe we can have a second level of machine that explains the first level machines to us – or a pair of machines that each explain the brain and can also explain each other, but not themselves?

It all opens the way for a new and much more irritating kind of robot. This one follows you around and explains you to people. For some of us, some of the time, that would be quite helpful. But it would need some careful constraints, and the fact that it was basically always right about you could become very annoying. You don’t want a robot that says “nah, he doesn’t really want that, he’s just being polite”, or “actually, he’s just not that into you”, let alone “ignore him; he thinks he understands hermeneutics, but actually what he’s got in mind is a garbled memory of something else about Derrida he read once in a magazine”.

Happy New Year!

A Case for Human Thinking

Interesting piece here reviewing the way some modern machine learning systems are unfathomable. This is because they learn how to do what they do, rather than being set up with a program, so there is no reassuring algorithm – no set of instructions that tells us how they work. In fact the way they make their decisions may be impossible to grasp properly, even if we know all the details, because it just exceeds in brute complexity what we can ever get our minds around.

This is not really new. Neural nets that learn for themselves have always been a bit inscrutable. One problem with this is brittleness: when the system fails it may not fail in ways that are safe and manageable, but disastrously. This old problem is becoming more important mainly because new approaches to deep machine learning are doing so well; all of a sudden we seem to be getting a rush of new systems that really work effectively at quite complex real world tasks. The problems are no longer academic.

Brittle behaviour may come about when the system learns its task from a limited data set. It does not understand the data and is simply good at picking out correlations, so sometimes it may pick out features of the original data set that work well within that set, and perhaps even work well on quite a lot of new real world data, but don’t really capture what’s important. The program is meant to check whether a station platform is dangerously full of people, for example; in the set of pictures provided it finds that all it needs to do is examine the white platform area and check how dark it is. The more people there are, the darker it looks. This turns out to work quite well in real life, too. Then summer comes and people start wearing light coloured clothes…
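
That platform story is easy to simulate with invented numbers; in the sketch below the classifier only ever sees overall darkness, which tracks crowding nicely in winter and then quietly stops doing so.

  # The platform example with invented numbers: the classifier only ever sees
  # overall image darkness, which tracks crowding while people wear dark
  # winter clothes, and quietly stops tracking it when summer arrives.
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)

  def platform_scenes(n, clothing_darkness):
      crowd = rng.uniform(0, 1, n)                        # true fraction of platform occupied
      darkness = crowd * clothing_darkness + rng.normal(0, 0.05, n)
      crowded = (crowd > 0.7).astype(int)                 # the thing we actually care about
      return darkness.reshape(-1, 1), crowded

  X_winter, y_winter = platform_scenes(2000, clothing_darkness=0.9)
  X_summer, y_summer = platform_scenes(2000, clothing_darkness=0.3)

  model = LogisticRegression().fit(X_winter, y_winter)    # learns 'dark platform means crowded platform'
  print("winter accuracy:", round(model.score(X_winter, y_winter), 3))
  print("summer accuracy:", round(model.score(X_summer, y_summer), 3))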

There are ways to cope with this. We could build in various safeguards. We could make sure we use big and realistic datasets for training or perhaps allow learning to continue in real world contexts. Or we could just decide never to use a system that doesn’t have an algorithm we can examine; but there would be a price to pay in terms of efficiency for that; it might even be that we would have to give up on certain things that can only be effectively automated with relatively sophisticated deep learning methods. We’re told that the EU contemplates a law embodying a right to explanations of how software works. To philosophers I think this must sound like a marvellous new gravy train, as there will obviously be a need to adjudicate what counts as an adequate explanation, a notoriously problematic issue. (I am available as a witness in any litigation for reasonable hourly fees.)

The article points out that the incomprehensibility of neural network-based systems is in some ways really quite like the incomprehensibility of the good old human brain. Why wouldn’t it be? After all, neural nets were based on the brain. Now it’s true that even in the beginning they were very rough approximations of real neurology and in practical modern systems the neural qualities of neural nets are little more than a polite fiction. Still, perhaps there are properties shared by all learning systems?

One reason deep learning may run into problems is the difficulty AI always has in dealing with relevance.  The ability to spot relevance no doubt helps the human brain check whether it is learning about the right kind of thing, but it has always been difficult to work out quite how our brains do it, and this might mean an essential element is missing from AI approaches.

It is tempting, though, to think that this is in part another manifestation of the fact that AI systems get trained on limited data sets. Maybe the radical answer is to stop feeding them tailored data sets and let our robots live in the real world; in other words, if we want reliable deep learning perhaps our robots have to roam free and replicate the wider human experience of the world at large? To date the project of creating human-style cognition has been in some sense motivated by mere curiosity (and yes, by the feeling that it would be pretty cool to have a robot pal); are we seeing here the outline of an argument that human-style AGI might actually be the answer to genuine engineering problems?

What about those explanations? Instead of retaining philosophers and lawyers to argue the case, could we think about building in a new module to our systems, one that keeps overall track of the AI and can report the broad currents of activity within it? It wouldn’t be perfect but it might give us broad clues as to why the system was making the decisions it was, and even allow us to delicately feed in some guidance. Doesn’t such a module start to sound like, well, consciousness? Could it be that we are beginning to see the outline of the rationales behind some of God’s design choices?
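
A very humble first pass at such a module might be nothing grander than hooks that log coarse activation statistics as the network runs; here is a minimal sketch in PyTorch, with a placeholder network standing in for the real system.

  # Minimal sketch of a 'reporting module': forward hooks that record coarse
  # activation statistics for each layer, a crude running commentary on what
  # the network is doing. The network itself is just a placeholder.
  import torch
  import torch.nn as nn

  net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(),
                      nn.Linear(32, 8),  nn.ReLU(),
                      nn.Linear(8, 2))

  report = {}

  def make_hook(name):
      def hook(module, inputs, output):
          report[name] = {"mean": output.mean().item(),
                          "frac_active": (output > 0).float().mean().item()}
      return hook

  for name, layer in net.named_modules():
      if isinstance(layer, nn.ReLU):
          layer.register_forward_hook(make_hook(name))

  _ = net(torch.randn(4, 16))      # one forward pass fills in the report
  for name, stats in report.items():
      print("layer", name, stats)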