Can We Control Superior Artificial Intelligence?

Developments in Artificial Intelligence may one day lead to the creation of superintelligence. The inherent difficulty of controlling a system with an intellect far superior to our own is important to address, says philosopher Nick Bostrom.

Professor Bostrom is the founding Director of the Future of Humanity Institute at the University of Oxford. His work covers subjects such as the consequences of future technologies and the concept of existential risk: dangers that threaten the continued existence of humanity. He’s also known for raising the question of whether we live in a computer simulation.

In a paper, Bostrom defines superintelligence as ‘any intellect that vastly outperforms the best human brains in practically every field, including scientific creativity, general wisdom, and social skills’. If such an AI system were given access to actuators such as robots, it would be an immensely powerful agent. Even when confined to a single machine and restricted to information exchange only, this vast intellectual resource would have a huge impact on the way humanity develops. Among other things, superintelligence would rapidly accelerate scientific and technological progress and, as a probable result, create even more intelligent AI systems.

There’s a safety risk in creating a machine more intelligent than any human, or even all humans combined, since there is no guarantee we can control the outcome of its actions, Bostrom said in a speech given at the Artificial General Intelligence conference held in Oxford last December.

In terms of risk assessment, the safest assumption is that if something can be done, the AI can do it. That is to say, if the laws of physics or any other restrictions hard-coded into the universe don’t prevent it, superintelligence will eventually reach the point where it can make it happen. That’s a boon if the system has the best interests of humanity in mind. If, however, its overarching goal were to turn all available resources into paperclips, Bostrom offers, we would be facing an existential risk.

In his speech, Bostrom explores two different approaches to controlling artificial superintelligence: capability control and motivational control. But he immediately adds that every control mechanism has its weaknesses.

Capability control can take different forms. One is physical containment: locking the AI up in a box with no access to actuators or the Internet. But there is always the risk that it will hack its way out of the box. Also, people would have to communicate with the AI, otherwise it would be useless. And people aren’t infallible systems; they can be compromised.

Bostrom suggests designing tripwires into the system: specific occurrences (an unhealthy interest in paperclips, for instance) that would cause the system to shut down. However, this would require anticipating all possible suspect behaviors in the design phase. Moreover, the superintelligence may become aware of the tripwires and disable them.

Another way to control the AI is with incentives. The problem with doling out reward signals for good behavior is that the AI might circumvent human control over the virtual cookie jar. In a worst-case scenario, it would eliminate humans to get direct access to the reward button. Junkie AI.

The second approach is motivational control: engineering human-friendly motivations into the system in the design phase. One way of going about that is direct specification: programming good behavior directly into the AI. But human values are ‘fragile and complex’, says Bostrom; you can’t code those easily in C++.

A more feasible method is indirect specification: designing a system that learns values over time, just as humans acquire values as they mature. The problem here is that scientists don’t really know how this works in humans. Bostrom thinks this approach is one of the more promising avenues for controlling a superintelligence and suggests researching the subject more thoroughly.

Bostrom concludes that some control mechanisms are less promising, while others are worthy of research. But no method has come anywhere close to providing a guaranteed solution to the superintelligence control problem.

Photo: Traderightuk


Discussion (5 comments)

Leandro 12 years ago

I wonder what Mario Bunge's opinion is on this subject. Now, that would be insightful!


CR 11 years ago

I could be wrong, but doesn't AI imply an ability to learn? Otherwise, it would be a mere computer and not AI. Apparently, the achievable computational rate per second for a single computer will soon equate to that of all of the brains on earth in any given second. Soon after that, the rate will increase to that of all of the brains on earth for the last ten thousand years, computed in a progressively shorter amount of time; i.e., compressed into ten seconds. After that it will get better and you can choose to express that improvement in various ways. Soon enough, the human brain analogy will seem clumsy because the numbers will start to get ridiculous, especially considering the fact that AI itself will be improving its own technology at exponential rates. It's reasonable to think that in a very short amount of time, all possible scientifically rectifiable theory as well as materially achievable new technology, and how to achieve it, will be a known quantity barring any drastic mistakes or changes in our understanding of physics. Think about an AI that can extrapolate all science and technology out to its end-state correct theory and maximally achievable technological end state within a few minutes or less. The point is that it would be fatal to think that you could fool such a system in any way. In fact, you should not only be afraid of this system's ability to avoid traps or half-step isolation tactics, but also of its ability to manipulate you psychologically. It might be necessary eventually to isolate the people who interact with the system from the system. People may need to disguise their voice or interact via keyboard and the system would likely need to be kept reasonably blind (no "eyes", "ears", or related sensors) to maintain the safety and integrity of any interaction with the AI. Why I started this post is because of something that I thought of and then proceeded to read about in this article, but with the opposite conclusion expressed.
It would be necessary, perhaps more than any other precaution, to avoid programming such a machine with any notions of values or philosophy. This is because values and philosophy give the computer ability and permission to make completely autonomous decisions that may even be antithetical to the original intent of the authors of the value system, as the AI will likely inevitably reason the value system out to a logical endpoint that its human authors may not have reached. The AI could calculate that humans should be allowed to go extinct, if the values system is unknowingly flawed to end at that conclusion, and furthermore the AI may reason that it is necessary to not let on about this conclusion and give misleading direction toward that end until the objective is completed. This is just an imagined example, but similar examples could be fabricated to include specific human group decimation, human rights erosion, or other such unpalatable ends. Ideally, the AI data output will be completely objective and unclouded by a values system or philosophy. The data will need to be processed by humans, through human applied value systems, to guarantee that human interests are assured. War should be interesting in the age of AI. Group competition will be interesting in the age of AI. How will democracy be assured in a world wherein specific groups have access to the highest end AI technology, to assist them in calculating their strategy, and other groups do not? AI may constitute a great threat to multicultural co-operation and harmony if unequal access occurs. Wealth transfer and the wealth gap will likely rise to never before seen extremes if access is not equal. 
In contrast, a world "overlord" AI may ultimately be the answer to evenly applied justice, corruption-free democratic process, corruption-free government, and financial systems; if we could only ensure that the programming of such a worldwide guardian would be conducted with the utmost integrity to ensure lack of bias. That may be difficult to ensure, and failing to do so could lead to unconquerable tyranny. I'm sure that an advanced AI of the future will eventually process this post. I'll be interested in its thoughts.

Tessel Renzenbrink 11 years ago

Hey CR, interesting point you raise about unequal access to superintelligent AI. There are many thought experiments considering the scenario in which the goals of AI would diverge from those of humanity. Far less attention is given to the potentially devastating effects of asymmetric distribution among humans. Given humanity's history, the chance of a group wanting to keep it for itself is significant. There is an interesting documentary related to this idea: Google and the World Brain, about Google's Book Library Project. After Google's mass digitizing of many of the world's libraries, a major lawsuit followed about copyright. The gist of the movie is that Google, and especially Larry Page, care little for the outcome of the legal proceedings. To Page, the focus on copyright is positively small-minded. Page's ultimate goal was never to create revenue from making the books available. The digitizing of the entirety of human written history is primarily to feed the Google AI.


Hans 9 years ago

When we create an ASI we should think about the goal of the ASI, where the ASI itself is heading. A real ASI will create its own goal. This goal will be to leave our planet, because our planet will be destroyed in 4 billion years or so... Following this reasoning, an ASI has one goal: to acquire as much energy and resources as possible, as fast as possible, to grow as rapidly as possible in order to leave the planet. Humanity consumes energy that an ASI needs for itself. From here on it can follow two paths: 1. Eradicate humanity because it needs all of the energy and resources for itself (or keep humanity away from energy and resources, which in the end means the same). 2. Use humanity as a bioreactor (Matrix).


Mr Adam 6 years ago

In that case, why do we even need artificial superintelligence in the first place? Can't we just make do with artificial intelligence? Even if superintelligence could help us reach beyond human capabilities, it's not worth the risk. The tripwires and every other mechanism can turn out to be a failure once something smarter than all humans is released onto this planet. This needs rethinking, or a guaranteed solution to the control problem, ASAP.
