New paths to AI disaster

I’ve never believed that robots dream of killing all humans. I don’t think paperclip maximisers are ever going to rule the world. And I don’t believe in the Singularity. But is AI heading in some dangerous directions? Oh yes.

In Forbes, Bernard Marr recently offered five predictions for the year ahead. They mostly strike me as pretty believable, though I’m less optimistic than he is about digital assistants and the likelihood of other impressive breakthroughs; he’s surely right that there will be more hype.

It’s his first two that prompted some apprehension on my part. He says…

  1. AI increasingly becomes a matter of international politics
  2. A Move Towards “Transparent AI”

Those are surely right; we’ve already seen serious discussion papers from the EU and elsewhere, and one of the main concerns to emerge recently is ‘transparency’ – the auditability of software. How is the computer making its decisions?

This is a legitimate, indeed a necessary, concern. Once upon a time we could write out the algorithm embodied in any application and check how it worked. This is getting more difficult with software that learns for itself, and we’ve already seen disastrous cases where the AI picked up and amplified the biases of the organisation it was working for. Noticing that most top executives were white middle-aged men, it might decide to downgrade the evaluation of everyone else, for example. Cases like that need to be guarded against and managed; in such circumstances it ought to be feasible to catch the problem by studying results, even if it isn’t possible to look inside the ‘black box’.

But it starts to get difficult, because as machine learning moves on into more complex decision making, it increasingly becomes impossible to understand how the algorithms are playing out, and the desired outcomes may not be so clear. In fact it seems to me that full transparency may be impossible in principle, due to human limitations. How could that be? I’m not sure I can say – I’m no expert, and explaining something you don’t, by definition, understand is a bit of a challenge anyway. In part the problem might be to do with how many items we can hold in mind, for example. It’s generally accepted that we can only hang on to about seven items (plus or minus a couple) in short-term memory. (There’s scope for discussion about such matters as what counts as an item, and so on, but let’s not worry about the detail.) This means there is a definite limit to how many possible paths we can mentally follow at once, or to put it another way, how large a set of propositional disjunctions we can hang on to (‘either a or b; and if a, either c, d, or e, while if b, f or g…’ and there we go). Human brains can deal with this by breaking decisions into smaller chunks, by using pencil and paper, and so on. Perhaps, though, there are things that you can only understand by grasping twenty alternatives simultaneously. Very likely there are other cognitive issues we simply can’t recognise; we just see a system doing a great job in ways we can’t fathom.
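
Just to put a rough number on that (my arithmetic, purely illustrative): the count of distinct paths through a chain of either/or choices doubles with every choice, so it leaves ‘seven items, plus or minus a couple’ behind almost immediately. A tiny Python sketch:

```python
# Back-of-the-envelope: how many ways a chain of binary choices can play out.
from itertools import product

for n in (3, 7, 20):
    paths = sum(1 for _ in product("ab", repeat=n))  # enumerate every a-or-b path
    print(f"{n:>2} binary choices -> {paths:,} possible paths")

#  3 binary choices -> 8 possible paths
#  7 binary choices -> 128 possible paths
# 20 binary choices -> 1,048,576 possible paths
```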

Still, I said we could monitor success by just looking at results, didn’t I? We know that our recruitment exercise ought to yield appointments whose ethnic composition is the same as that of the population (or at any rate, of the qualified candidates). OK, it may sometimes be harder to know exactly what the desired outcome is, and there may be questions about whether a system in ongoing use needs to be allowed to yield sub-optimal results temporarily, but those are tractable issues.
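
To make that concrete, here is a minimal, purely illustrative Python sketch of such a results-only audit; the groups, numbers and the ‘four-fifths’ threshold are assumptions for the example, not anything taken from a real system. It simply compares the selection rate for each group of qualified candidates and flags any group falling well behind the best-treated one.

```python
# Illustrative only: audit a black-box recruiter purely by its outcomes.
from collections import defaultdict

def selection_rates(outcomes):
    """outcomes: iterable of (group, was_hired) pairs for qualified candidates."""
    hired, total = defaultdict(int), defaultdict(int)
    for group, was_hired in outcomes:
        total[group] += 1
        hired[group] += int(was_hired)
    return {g: hired[g] / total[g] for g in total}

def flag_disparities(rates, threshold=0.8):
    """Flag groups whose selection rate is below 80% of the best group's rate."""
    best = max(rates.values())
    return [g for g, rate in rates.items() if rate < threshold * best]

# Toy data: a skewed outcome shows up without ever opening the black box.
rates = selection_rates([("A", True), ("A", True), ("A", False),
                         ("B", False), ("B", False), ("B", True)])
print(rates)                    # roughly {'A': 0.67, 'B': 0.33}
print(flag_disparities(rates))  # ['B']
```

A real audit would need proper statistical tests and much larger samples, but the principle – judging the system by what comes out of it – is the same.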

Alas, we also have to worry about brittleness and how things break. It turns out that systems using advanced machine learning may be prone to sudden, disastrous failure. A change to a few unimportant pixels may make an image recognition system that usually performs reliably draw fantastic conclusions instead; in one particular set of circumstances a stock market system may suddenly go ape. This happens because, however machine learning systems are doing what they do, they are doing something radically different from what we do, and we might suspect that, like simpler computer systems, they take no true account of relevance, only its inadequate proxy, correlation. Nobody, I think, has any good theoretical analysis of relevance, and it is strongly linked with Humean problems philosophers have never cracked.
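
The pixel example is the much-studied phenomenon of adversarial examples. By way of a hedged illustration (the model, tensors and epsilon below are assumptions, not anything from this post), this is roughly what the classic ‘fast gradient sign’ attack looks like in PyTorch: every pixel is nudged by a tiny amount in whichever direction most increases the classifier’s loss.

```python
# Sketch of the Fast Gradient Sign Method (Goodfellow et al.); illustrative only.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, true_label, epsilon=0.01):
    """Return `image` with an almost invisible nudge that can derail `model`.

    image: tensor of shape (1, C, H, W); true_label: tensor of shape (1,).
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), true_label)
    loss.backward()
    # Shift every pixel by +/- epsilon in the direction that increases the loss.
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0, 1).detach()
```

A shift of around one per cent per pixel is imperceptible to a human viewer, yet perturbations of this kind have repeatedly been shown to flip otherwise reliable classifiers to confidently wrong answers.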

That’s bad, but it could be made worse if legislative bodies either fail to understand why these risks arise, or decide that on a precautionary basis we must outlaw anything that cannot be fully audited and understood by human beings. Laws along those lines seem very likely to me, but they might throw away huge potential benefits – perhaps major economic advantage – or even suppress the further research and development which might ultimately lead to solutions and to further, as yet unforeseeable, gains.

That’s not all, either; laws constrain compliant citizens, but not necessarily everyone. Suppose we can build machine learning systems that retain a distinct risk of catastrophic failure, but outclass ordinary human or non-learning systems most of the time. Will anyone try to build and use such systems? Might there be a temptation for piratical types to try them out in projects that are criminal, financial, political or even military? Don’t the legitimate authorities have to develop the same systems pre-emptively in self-defence? Otherwise we’re left in a position where it’s not clear whether we should hope that the ‘pirate’ systems fail or work, because either way it’s catastrophe.

What on earth is the answer – what regulatory regime or other measures would be appropriate? I don’t know, and I strongly doubt that any of the regulatory bodies now casting a thoughtful eye over this territory know either.

6 thoughts on “New paths to AI disaster”

  1. A good companion to this would be Tim Urban’s two-part piece on the dangers of AI over at his long-form essay blog, Wait But Why.

    Part 1
    Part 2

    Urban, by contrast, does take the Singularity seriously (although, as he writes, he hesitates to use the term because it’s falling into disuse for various reasons), and he does worry about well-“intended” machines that end up paper-clipping us to death.

  2. Understandings getting beyond their origins, like ‘software learning for-from itself’…
    …is not unlike wave particle duality getting beyond observation-an observation…

    Will our universe, as it appears to be separating into infinity, remember-maintain itself as a singularity…

  3. Speaking of military applications… If your country doesn’t build self-driving robot infantry, how can its generals guarantee that the Chinese or the Russians or (etc) won’t? Treaties? Nuclear non-proliferation has been somewhere in the range of [extremely hard] to [abject failure], and nukes are a lot easier to detect than robots. Trust but verify – but how to verify?

    Also, human soldiers have an unfortunate tendency to refuse to fire on official enemies, just because those targets are citizens of their own society! “Unfortunate” from certain perspectives, that is. I happen to think it’s extremely fortunate, but my opinion might not matter so much.

    Be afraid. Be very afraid.

  4. “Nobody, I think, has any good theoretical analysis of relevance, and it is strongly linked with Humean problems philosophers have never cracked.”

    You should look up Judea Pearl’s The Book of Why. When AI starts using this modeling to get a handle on causation, things will get … interesting.

  5. You say that “in fact it seems to me that full transparency may be impossible in principle, due to human limitations” and that “very likely there are other cognitive issues we simply can’t recognise; we just see a system doing a great job in ways we can’t fathom”.

    We can’t fathom what systems based on machine learning – neural networks, for example – are actually doing, i.e. “where” a decision is made, not because we do not recognize their “cognitive issues”, but because they do not do what they do on a cognitive level at all, I would say. Take neural networks, which can approximate any mathematical function, can be used to make decisions, and are used in self-driving cars, for example. That they “learn” to recognize objects (a relevant “cognitive issue” here, I assume you would say) means that certain weights between certain (software) nodes increase or decrease, making the activation of those nodes given certain inputs more or less likely. Thousands of such connections together can make very accurate predictions, much as brains have cognitive capabilities while nevertheless, on a micro-scale, consisting of neurons firing electrical signals at other connected neurons. This is a “black box”, as you mention, since nowhere in this system (in a particular weight, say, or in any particular handful of connections) can we localize the “decision”. My point is… The fact that we can’t fathom what these systems do is not due to the sheer stupidity of humans (you suggest, for example, that our working memory is too limited). The issue is more fundamental: these AI systems no longer operate on a cognitive level at all, and we therefore have little hope of reconstructing “reasons” to explain and account for what they do and why they do it.
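
    To make the point concrete, here is a minimal toy sketch (NumPy, with made-up weights rather than anything trained): the “decision” is just the largest of some output scores produced by all the weights acting together; there is no single weight or node where it lives.

    ```python
    # Toy two-layer network with made-up weights; illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(4, 8))   # input -> hidden weights (learned, in a real system)
    W2 = rng.normal(size=(8, 2))   # hidden -> output weights

    def decide(x):
        hidden = np.maximum(0, x @ W1)   # each activation depends on many weights
        scores = hidden @ W2             # each score depends on all of them
        return int(scores.argmax())      # the "decision" is just the largest score

    print(decide(np.array([0.2, -1.0, 0.5, 0.3])))
    ```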
