New paths to AI disaster

I’ve never believed that robots dream of killing all humans. I don’t think paperclip maximisers are ever going to rule the world. And I don’t believe in the Singularity. But is AI heading in some dangerous directions? Oh yes.

In Forbes, Bernard Marr recently offered five predictions for the year ahead. They mostly strike me as pretty believable, though I’m less optimistic than he is about digital assistants and the likelihood of other impressive breakthroughs; he’s surely right that there will be more hype.

It’s his first two that prompted some apprehension on my part. He says…

  1. AI increasingly becomes a matter of international politics
  2. A move towards “transparent AI”

Those are surely right; we’ve already seen serious discussion papers from the EU and elsewhere, and one of the main concerns to surface recently is ‘transparency’ – the auditability of software. How is the computer making its decisions?

This is a legitimate, indeed a necessary, concern. Once upon a time we could write out the algorithm embodied in any application and check how it worked. That is getting more difficult with software that learns for itself, and we’ve already seen disastrous cases where the AI picked up and amplified the biases of the organisation it was working for. Noticing that most top executives were white middle-aged men, it might decide to downgrade the evaluation of everyone else, for example. Cases like that need to be guarded against and managed; that ought to be feasible by studying the results, even if it isn’t possible to look inside the ‘black box’.
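To make ‘studying the results’ a little more concrete, here is a minimal sketch in Python of the sort of outcome audit I have in mind. Everything in it – the group names, the counts, the four-fifths threshold – is an illustrative assumption, not a description of any real system.

```python
# Minimal sketch of auditing a model's outcomes without opening the 'black box'.
# The groups, counts, and the 80% threshold are illustrative assumptions only.

def selection_rates(outcomes):
    """outcomes maps group -> (number selected, number of applicants)."""
    return {group: selected / applicants
            for group, (selected, applicants) in outcomes.items()}

def flag_disparities(outcomes, threshold=0.8):
    """Flag any group whose selection rate falls below `threshold` times
    the highest group's rate (a common rule-of-thumb check)."""
    rates = selection_rates(outcomes)
    best = max(rates.values())
    return [group for group, rate in rates.items() if rate < threshold * best]

# Hypothetical audit data: (selected, applicants) per group.
audit = {
    "group_a": (40, 100),   # 40% selected
    "group_b": (12, 100),   # 12% selected -- well below 80% of 40%
}

print(selection_rates(audit))
print(flag_disparities(audit))   # ['group_b']
```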

But it starts to get difficult, because as machine learning moves into more complex decision making, it increasingly becomes impossible to understand how the algorithms are playing out, and the desired outcomes may not be so clear. In fact it seems to me that full transparency may be impossible in principle, due to human limitations. How could that be? I’m not sure I can say – I’m no expert, and explaining something you, by definition, don’t understand is a bit of a challenge anyway.

In part the problem might be to do with how many items we can hold in mind, for example. It’s generally accepted that we can only hang on to about seven items (plus or minus a couple) in short-term memory. (There’s scope for discussion about such matters as what counts as an item, and so on, but let’s not worry about the detail.) This means there is a definite limit to how many possible paths we can mentally follow at once, or to put it another way, how large a set of propositional disjunctions we can hang on to (‘either a or b, and if a, either c, d, or e, while if b, f or g…’ – and there we go). Human brains can deal with this by structuring decisions to break them into smaller chunks, using a pencil and paper, and so on. Perhaps, though, there are things that you can only understand by grasping twenty alternatives simultaneously. Very likely there are other cognitive issues we simply can’t recognise; we just see a system doing a great job in ways we can’t fathom.
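Just to make the arithmetic vivid, here is a toy calculation – the branching factors are invented purely for the purpose – of how quickly nested either/or choices outrun the seven-or-so items we can hold in mind.

```python
# Sketch: count the distinct paths through a nested either/or decision.
# Each entry is the number of options at one level; the values are invented.

from math import prod

def alternatives(branching):
    """Number of distinct paths through successive disjunctions,
    e.g. 'a or b', then 'c, d or e', ... -> 2 * 3 * ..."""
    return prod(branching)

print(alternatives([2]))           # 2  -- 'either a or b'
print(alternatives([2, 3]))        # 6  -- still just about holdable in mind
print(alternatives([2, 3, 2, 3]))  # 36 -- already far past the 7 +/- 2 limit
```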

Still, I said we could monitor success just by looking at results, didn’t I? We know that our recruitment exercise ought to yield appointments whose ethnic composition matches that of the population (or at any rate, of the qualified candidates). OK, sometimes it may be harder to know exactly what the desired outcome is, and there may be questions about whether ongoing systems need room to yield sub-optimal results temporarily, but those are tractable issues.
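For what it’s worth, a results-only check of that kind is easy to sketch. The figures below are invented, and I’m assuming a standard statistics library (scipy) is to hand.

```python
# Sketch of a results-only check: do appointments mirror the qualified pool?
# All numbers are invented; scipy is assumed to be available.

from scipy.stats import chisquare

qualified_pool = {"group_a": 600, "group_b": 300, "group_c": 100}  # candidates
appointments   = {"group_a": 70,  "group_b": 20,  "group_c": 10}   # hires

total_hires = sum(appointments.values())
total_pool = sum(qualified_pool.values())

# Expected hires per group if appointments simply mirrored the pool.
expected = [qualified_pool[g] / total_pool * total_hires for g in qualified_pool]
observed = [appointments[g] for g in qualified_pool]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-squared = {stat:.2f}, p = {p_value:.3f}")
# A small p-value suggests the outcomes drift from the pool's composition
# and deserve a closer look -- no access to the model's internals required.
```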

Alas, we also have to worry about brittleness and how things break. It turns out that systems using advanced machine learning may be prone to sudden disastrous failure. A change to a few unimportant pixels may make an image recognition system that usually performs reliably draw fantastic conclusions instead. In one particular set of circumstances a stock market system may suddenly go ape. This happens because, however machine learning systems are doing what they do, they are doing something radically different from what we do; we might suspect that, like simpler computer systems, they take no true account of relevance, only its inadequate proxy, correlation. Nobody, I think, has any good theoretical analysis of relevance, and it is strongly linked with Humean problems philosophers have never cracked.
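To illustrate the pixel point without pretending to reproduce a real vision system: the toy sketch below uses a plain linear scorer, with random numbers standing in for an image, and shows how a tiny nudge to every pixel, chosen in the right direction, can swing a confident decision. It borrows the idea behind the well-known ‘fast gradient sign’ attacks, applied to the simplest possible model.

```python
# Toy sketch of brittleness: for a linear scorer, nudging every input by a
# tiny amount in the direction opposing the weights (the 'gradient sign'
# idea applied to a linear model) can flip a confident decision.
# Weights and the 'image' are random stand-ins, not a real vision system.

import numpy as np

rng = np.random.default_rng(0)
n_pixels = 10_000

w = rng.normal(size=n_pixels)   # classifier weights
x = rng.normal(size=n_pixels)   # an 'image' the model classifies confidently

score = w @ x                   # decision: positive class vs negative class
epsilon = 0.05                  # small per-pixel change (pixels have std 1)

# Worst-case small perturbation for a linear model: move against the weights.
x_adv = x - epsilon * np.sign(w) * np.sign(score)
adv_score = w @ x_adv

flipped = np.sign(adv_score) != np.sign(score)
print(f"original score:  {score:+.1f}")
print(f"perturbed score: {adv_score:+.1f}")
print(f"decision flipped: {flipped}  (no pixel moved by more than {epsilon})")
```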

That’s bad, but it could be made worse if legislative bodies either fail to understand why these risks arise, or decide that on a precautionary basis we must outlaw anything that cannot be fully audited and understood by human beings. Laws along those lines seem very likely to me, but they might throw away huge potential benefits – perhaps major economic advantage – or even suppress the further research and development which might ultimately lead to solutions and to further, as yet unforeseeable, gains.

That’s not all, either; laws constrain compliant citizens, but not necessarily everyone. Suppose we can build machine learning systems that retain a distinct risk of catastrophic failure, but outclass ordinary human or non-learning systems most of the time. Will anyone try to build and use such systems? Might there be a temptation for piratical types to try them out in projects that are criminal, financial, political or even military? Don’t the legitimate authorities have to develop the same systems pre-emptively in self-defence? Otherwise we’re left in a position where it’s not clear whether we should hope that the ‘pirate’ systems fail or work, because either way it’s catastrophe.

What on earth is the answer – what regulatory regime or other measures would be appropriate? I don’t know, and I strongly doubt that any of the regulatory bodies casting a thoughtful eye over this territory know either.