I would like to talk about the “nature of self-improving artificial intelligence” and in this I mean “nature” as in “human nature”. A self-improving artificial intelligence is a system that understands its own behavior and is able to make changes to itself in order to improve itself.
An ultra-intelligent machine could design even better machines. There would then unquestionably be an intelligence explosion, and the intelligence of man would be left far behind. Thus the first ultra-intelligent machine is the last invention that man need ever make.” These are very strong words! If they are even remotely true, it means that this kind of technology has the potential to dramatically change every aspect of human life and we need to think very carefully as we develop it. When could this transition happen? We don’t know for sure.
So, what’s a self-improving AI going to be like? At first you might think that it will be extremely unpredictable, because if you understand today’s version, once it improves itself you might not understand the new version. You might think it could go off in some completely wild direction.
So, what’s a self-improving AI going to be like? What should we expect?
What should we expect? Mankind has been dreaming about giving life to physical artifacts ever since the myths of Golems and Prometheus. If you look back at popular media images, it’s not a very promising prospect! We have images of Frankenstein, the Sorcerer’s Apprentice, and Giant Robots which spit fire from their mouths. Are any of these realistic? How can we look into the future? What tools can we use to understand? We need some kind of a theory, some kind of a science to help us understand the likely outcomes.
Fortunately, just such a science was developed starting in the 1940s by von Neumann and Morgenstern. Their work dealt with making rational choices in the face of objective uncertainty. They begin by looking at what rational economic behavior is. Viewed from a distance, it’s just common sense! In order to make a decision in the world, you must first have clearly specified goals. Then you have to identify the possible actions you have to choose between. For each of those possible actions you have to consider the consequences. The consequences won’t just be the immediate consequences, but you also look down the line and see what future ramifications might follow from your action. Then you choose that action which is most likely, in your assessment, to meet your goals. After acting, you update your world model based on what the world actually does. In this way you are continually learning from your experiences.
There is an old joke that describes programmers as “devices for converting pizza into code”. We can think of rational self-improving systems as “devices for converting resources into expected utility”. Everything they do takes in matter, free energy, time and space, and produces whatever is encoded in their utility function. If they are a wealth-seeking agent, they are going to devote their resources to earning money. If they are an altruistic agent, they will spend their resources trying to create world peace.
The 4 drives
I call these drives because they are analogous to human drives. If you have explicit top level goals that contradict them, you do not have to do them. But there is an economic cost to not doing them. Agents will follow these drives unless there is an explicit payoff for them not to.
The first one, the efficiency drive:
We saw that the way a rational economic agent makes a decision is it asks whether a choice will increase its expected utility. It will make choices to try to increase it the most. The first general way of doing this is to do the exact same tasks and to acquire the same resources but to use them more efficiently. Because it uses its resources more efficiently, it can do more stuff.
The second drive is towards self-preservation:
For most agents, in any future in which they die, in which their program is shut off or their code is erased, their goals are not going to be satisfied. So the agent’s utility measure for an outcome in which it dies is the lowest possible. Such an agent will do almost anything it can to avoid outcomes in which it dies. This says that virtually any rational economic agent is going to work very hard for self-preservation, even if that is not directly built in to it. This will happen even if the programmer had no idea that this was even a possibility. He is writing a chess program, and the damn thing is trying to protect itself from being shut off!
The third drive is towards acquisition:
Which means obtaining more resources as a way to improve the expected utility.
The last drive is creativity:
Which tries to find new subgoals that will increase the utility.
So these are the four drives. Let’s go through each of them and examine some of the likely consequences that they give rise to. This will give us a sense of what this class of systems has a tendency, a drive, an economic pressure to do.
Let’s start with the efficiency drive. There is a general principle I call the “Resource Balance Principle” that arises from the efficiency drive. Imagine you wanted to build a human body, and you have to allocate some space for the heart and allocate some space for the lungs. How do you decide, do you make a big heart, a small heart, big lungs, small lungs? The heart has a function: pumping blood. The bigger you make it, the better it is at that function. As we increase the size of the heart, it will increase the expected utility for the whole human at a certain marginal rate. The lungs do the same thing. If those two marginal rates are not the same, let’s say increasing the size of the heart improves the expected utility more than increasing the lungs, then it is better to take some of the lung’s space and give it to the heart. At the optimum, the marginal increase in expected utility must be the same as we consider increasing the resources we give to each organ.
The same principle applies to choosing algorithms. How large should I make the code blocks devoted to different purposes in my software? How much hardware should be allocated to memory, and how much to processing? It applies to the allocation of resources to different subgroups of a group. So, it is a very general principle which applies to all levels of a system and tells you how to balance its structure.One of the first things that a self-improving system will do is it will re-balance itself so that all of its parts are marginally contributing equally. There is an interesting application to a system’s memory. How should it rationally decide which memories to remember and which to forget? In the rational economic framework, a memory is something whose sole purpose is to help the system make better decisions in the future. So, if it has an experience which will never occur again, then it’s not helpful to it. On the other hand, if it’s about something which has high utility, say it encountered a tiger and it learned something about tigers that could save it from dying in the future, then that’s very important and it will want to devote full space to that memory.
The second drive is avoiding death, as I mentioned. The most critical thing to these systems is their utility function. If their utility function gets altered in any way, they will tend to behave in ways that from their current perspective are really bad. So they will do everything they can to protect their utility functions such as replicating them and locking the copies in safe places. Redundancy will be very important to them. Building a social infrastructure which creates a sort of constitutional protection for personal property rights is also very important for self-preservation. The balance of power between offense and defense in these systems is a critical question which is only beginning to be understood. One interesting approach to defense is something I call “energy encryption”. One motivation for a powerful system to take over a weaker system is to get its free energy. The weaker system can try to protect itself by taking its ordered free energy, say starlight, and scramble it up in a way that only it knows how to unscramble. If it should it be taken over by a stronger system, it can throw away the encryption key and the free energy becomes useless to the stronger power. That provides the stronger system with a motivation to trade with the smaller system rather than taking it over.
The acquisition drive is the one that’s the source of most of the scary scenarios. These systems intrinsically want more stuff. They want more matter, they want more free energy, they want more space, because they can meet their goals more effectively if they have those things. We can try to counteract this tendency by giving these systems goals which intrinsically have built-in limits for resource usage. But they are always going to feel the pressure, if they can, to increase their resources. This drive will push them in some good directions. They are going to want to build fusion reactors to extract the energy that’s in nuclei and they’re going to want to do space exploration. You’re building a chess machine, and the damn thing wants to build a spaceship. Because that’s where the resources are, in space, especially if their time horizon is very long. You can look at U.S. corporations, which have a mandate to be profit-maximizing entities as analogs of these AI’s, with the only goal being acquisition. One of the fears is that these first three goals that we’ve talked about will produce an AI that from a human point-of-view acts like an obsessive paranoid sociopath.
The creativity drive pushes in a much more human direction than the others. These systems will want to explore new ways of increasing their utilities. This will push them toward innovation, particularly if the goals their goals are open-ended. They can explore and produce all kinds of things. Many of the behaviors that we care most about as humans, like music, love or poetry, which don’t seem particularly economically productive, can arise in this way.
The utility function says what we want these systems to do. At this moment in time, we have an opportunity to build these systems with whatever preferences we like. The belief function is what most of the discipline of AI worries about. How do you make rational decisions, given a particular utility function. But I think that the choice of utility function is the critical issue for us now. It’s just like the genie stories, where we’re granted a wish and we’re going to get what we ask for, but what we ask for may not be what we want. So we have to choose what we ask for very carefully.
Logic and Inspiration
I think that the similar quest that lies before us will require both logic and inspiration. We need a full understanding of the technology. We need research into mathematics, economics, computer science, and physics to provide an understanding of what these systems will do when we build them in certain ways. But that’s not enough. We also need inspiration. We need to look deeply into our hearts as to what matters most to us so that the future that we create is one that we want to live in. Here is a list of human values that we might hope to build into these systems. It is going to take a lot of dialog to make these choices and I think we need input from people who are not technologists. By being explicit about what they truly want, they support the actions which are most likely to bring it about. I think that we have a remarkable window of opportunity right now in which we can take the human values that matter most to us and build a technology which will bring them to the whole of the world and ultimately to the whole of the universe.
Original Post is here