The incredible amount of hype that Artificial Intelligence is currently receiving is mystifying to those of us who have spent our lives as AI researchers. "Novel written by AI wins contest in Japan" is the headline you see until later you find out it's not quite true. Or "AI wins at Go" until later you find out that this was just a computationally intensive neural net, which is quite different from true artificial intelligence. Or "Watson wins Jeopardy" until later you find out that Watson can barely understand any English sentences.
The main issue in AI has always been the same: there are those who think that any program that beats a chess or Go master, no matter how, is "intelligent," and there are others who think that being able to do massive computation does not have much to do with intelligence. The latter group, to which we belong, would like to build systems that really do display some intelligence. What would that look like? For one thing, intelligent entities can explain what they did and why they did it. For another, intelligent entities can communicate through natural language and can at least act as if they understand what they are being told. Intelligent four-year-olds don't understand everything you say to them, but they do have goals of their own, plans to achieve those goals, ways of explaining their actions, and ways of understanding what you might want from them. Let's take this as our basis for defining what a real AI system should be.
Now let’s ask the question: What can we build now that would be both intelligent and useful?
Whatever we build must have three distinct capabilities that correspond to three integral parts of human intelligence. These capabilities are:
1. Comprehending natural language input.
2. Maintaining a model of the world it operates in, including a model of the user.
3. Accessing a memory of expert advice and experience.
Let's talk about each in a bit more detail.
AI researchers have worked on natural language comprehension for many years – because it is hard. People who have taken the problem seriously have come to the conclusion that natural language comprehension by an AI system is possible only if the domain is highly constrained. Constrained domains (e.g., talking only about moving blocks, or understanding texts about terrorism) enabled good progress more than 30 years ago (when AI was an honest field). On the other hand, trying to build an AI that you can say anything to leads to systems like Siri, Cortana, Alexa, and Watson, which really have no idea what is said to them but can make statistically derived guesses that sometimes result in a useful response.
Constraining the domain is essential because the biggest problem in natural language comprehension is knowing the context an utterance is referring to. An elevator door opens, and someone inside says to someone standing outside "Down?" What did they just ask? The answer is easy but only if you already know the elevator script. "When you visit Mary in the hospital, take her flowers" is not a command to steal flowers, but you have to know the hospital visit script to understand that sentence correctly.
We always leave out the "obvious" stuff when we speak, but what's obvious to us is completely unknown to the computer. By constraining the domain of conversation to specific contexts in advance, the computer has a better chance of correctly interpreting natural language inputs.
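The elevator and hospital examples above can be sketched in code. The sketch below is a minimal, hypothetical illustration of how a pre-selected "script" (a stereotyped situation model) lets a system expand a terse utterance into its intended meaning; the `Script` class, its fields, and the expansions are our own illustrative assumptions, not an existing API.

```python
# Minimal sketch: resolving terse utterances against a known situation script.
# Script names, fields, and expansions are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Script:
    name: str
    roles: list        # participants the script expects
    utterances: dict   # terse utterance -> expanded meaning in this context

ELEVATOR = Script(
    name="elevator",
    roles=["rider_inside", "person_outside"],
    utterances={"down?": "Are you waiting to travel down? This car is going down."},
)

HOSPITAL_VISIT = Script(
    name="hospital_visit",
    roles=["visitor", "patient"],
    utterances={"take her flowers": "Bring flowers with you as a gift for the patient."},
)

def interpret(utterance: str, active_script: Script) -> str:
    """Resolve a fragmentary utterance; only the active script supplies the context."""
    return active_script.utterances.get(
        utterance.lower(),
        f"(no reading of {utterance!r} in the {active_script.name} script)",
    )

print(interpret("Down?", ELEVATOR))
```

The point of the sketch: the same word "Down?" is uninterpretable without the script, and trivially interpretable with it, which is why constraining the domain in advance matters.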
If the world we are talking about is highly constrained, it is possible for a system to learn and employ a model as a basis for understanding the domain’s nuances and people’s goal-directed behaviors within it. For example if we constrain the world to “air travel,” it would be possible for a system to know a great deal about that domain and to use that knowledge to help a person to optimally plan a trip in line with their needs and preferences. The user is also a key part of the system’s world. To optimally satisfy the user’s needs, the system must also have a model of the user – his or her level of expertise in the domain, previous experiences, preferences, and so on.
When we have a problem, we want help, preferably from an expert. Just like we sometimes needed our parents to give us advice, we need someone who knows about Iceland to tell us about where to stay and what to see if we are planning a trip. If we want to learn to program we need an expert to tell us when we make a mistake and to offer helpful hints. If we are trying to choose a school to attend, we would like to hear from people who have gone there and liked it, and others who have hated it. We would like to hear from both experts and people like us who can help us think about making better choices. Therefore, an intelligent system needs a memory of advice that is indexed in a way that enables the system to retrieve a piece of advice when it is likely to be relevant to what the user is doing.
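A memory of advice "indexed in a way that enables retrieval when relevant" could look something like the following sketch. The tagging scheme and overlap scoring are illustrative assumptions on our part; the idea is only that each piece of advice is stored under situation tags and surfaces when the user's current activity matches enough of them.

```python
# Hypothetical sketch of an advice memory indexed by situation tags.
# A piece of advice is retrieved when its index overlaps the user's
# current situation; the scoring scheme here is an illustrative assumption.

from collections import defaultdict

class AdviceMemory:
    def __init__(self):
        self.by_tag = defaultdict(list)  # tag -> advice items indexed under it

    def store(self, advice: str, tags: set):
        for tag in tags:
            self.by_tag[tag].append((frozenset(tags), advice))

    def retrieve(self, situation: set, min_overlap: int = 1):
        """Return advice whose index tags overlap the situation, best match first."""
        scored = {}
        for tag in situation:
            for tags, advice in self.by_tag[tag]:
                scored[advice] = len(tags & situation)
        return [a for a, s in sorted(scored.items(), key=lambda kv: -kv[1])
                if s >= min_overlap]

memory = AdviceMemory()
memory.store("Book Iceland lodging early; summer fills up fast.",
             {"iceland", "travel", "lodging"})
memory.store("Visit the school before accepting an offer.",
             {"school", "choice"})

print(memory.retrieve({"travel", "iceland", "flights"}))
```

A user planning an Iceland trip gets the lodging advice, while the school advice stays silent because its index shares no tags with the current situation.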
With these capabilities in mind, we propose to build an Interactive Intelligent Advisor that would work in one domain and one domain only (although we could use the same underlying architecture to build Advisors in many domains).
Here are some examples:
All of these systems (and many more like them) would:
All of these Advisors will be based on a general AI architecture comprising mechanisms for comprehending natural language, maintaining a world model, and accessing expert advice and experience. This architecture can be instantiated to create specific Advisors for a range of domains. Through use, each Advisor will learn more about its users and its domain, improving the quality of help and advice it provides.
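The one-shell, many-domains idea above can be sketched as follows. Everything here is an illustrative assumption — the component interfaces, the toy `comprehend` function, and the air-travel data are placeholders showing only how one general `Advisor` shell is instantiated with domain-specific parts.

```python
# Hypothetical sketch of the proposed architecture: one general Advisor shell
# instantiated with domain-specific components (language comprehension,
# world model, advice memory). All interfaces are illustrative assumptions.

class Advisor:
    def __init__(self, domain, comprehend, world_model, advice_memory):
        self.domain = domain
        self.comprehend = comprehend        # utterance -> interpreted goal
        self.world_model = world_model      # goal -> relevant situation tags
        self.advice_memory = advice_memory  # situation tags -> advice string

    def respond(self, utterance: str) -> str:
        goal = self.comprehend(utterance)
        situation = self.world_model.get(goal, set())
        advice = self.advice_memory.get(frozenset(situation))
        return advice or f"I need to know more about your {goal} plans."

# Instantiating the same shell for an air-travel Advisor:
air_travel = Advisor(
    domain="air travel",
    comprehend=lambda u: "booking" if "flight" in u.lower() else "unknown",
    world_model={"booking": {"flights", "dates"}},
    advice_memory={frozenset({"flights", "dates"}):
                   "Midweek departures are usually cheaper."},
)

print(air_travel.respond("I need a flight to Reykjavik."))
```

A medical or college-choice Advisor would reuse the same shell with different components; only the domain knowledge changes, not the architecture.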