Robot Learning from Human Teachers. Sonia Chernova
CHAPTER 1
Introduction
Machine Learning techniques have had great success in many robotics applications. Today’s robots are able to study the depths of the Earth’s oceans, carry equipment while following soldiers through mountainous terrain, and explore the peaks and valleys of Mars. Robots build (and will soon drive) our cars, gather the items for shopping orders in busy warehouses, and keep hospital shelves stocked with supplies. Robots can vacuum your floor, mow your lawn, and clean your pool. Yet robots, or more specifically the algorithms that control them, are still unable to handle many of the complexities of the real world. Today, and for the foreseeable future, it is not possible to go to a store and bring home a robot that will clean your house, cook your breakfast, and do your laundry. These everyday tasks, while seemingly simple, contain many variations and complexities that pose insurmountable challenges for today’s machine learning algorithms.
What separates impossible domains from challenging-yet-achievable ones for today’s autonomous technologies is the degree of structure and consistency within the problem domain. Vacuuming robots require a flat floor to operate and not much else. Since the vast majority of house floors meet this requirement, the deployment of robotic vacuum cleaners has been highly successful. The new owner of such a robot simply needs to press the Clean button and in most cases the robot performs its function as expected (until it gets stuck on that sock you left on the floor, but that’s why it’s not a room cleaning robot).
Now consider the scenario of bringing a new house cleaning robot home, putting it in the kitchen, and pressing a similar Clean button for the first time. Such a robot might be expected to load the dishwasher with all the dirty dishes, toss waste into the trash, and return clean items to their correct locations. The level of complexity of these tasks is higher not only in terms of perception and manipulation capabilities, but also in the required degree of adaptation to the new environment. Each house is unique, with custom layouts, preferred object locations, and rules (e.g., “never put the knife with the red handle in the dishwasher or it will rust”). Just like a human house guest arriving at a home for the first time, the robot needs to adapt to the customs of a particular household. This means that a single Clean button is no longer sufficient for such a system; instead, the platform and its underlying algorithms must allow the user to customize the robot’s behavior policy.
Robot Learning from Demonstration (LfD) explores techniques for learning a task policy from examples provided by a human teacher. The field of LfD has grown into an extensive body of literature over the past 30+ years, with a wide variety of approaches for encoding human demonstrations and modeling skills and tasks. In this book we provide an introduction to the field with a focus on the unique technical challenges associated with designing robots that learn from human instruction. The book is written for AI researchers who are interested in developing Learning from Demonstration methods, or who would like to learn more about different modes of interaction or evaluation methods for such systems.
1.1 MACHINE LEARNING FOR END-USERS
The above household robot scenario describes one possible application area for LfD techniques. More generally, this scenario is motivated by the challenge of enabling a novice user, a non-programmer, to customize existing robot behaviors or develop new ones through intuitive teaching methods. The motivation for tackling this challenge centers on the belief that it is impossible to pre-program all the necessary knowledge into a robot operating in a diverse, dynamic, and unstructured environment. Instead, end-users must have the ability to customize the functionality of these robotic systems. Since it is impractical to assume that every end-user will have programming experience, natural and intuitive methods of interaction must be developed to enable non-roboticists to use such systems effectively.
LfD techniques build upon many standard Machine Learning methods that have had great success in a wide range of applications. However, learning from a human teacher poses additional challenges, such as limited human patience and inconsistent user input. Traditional Machine Learning techniques have not been designed for learning from ordinary human teachers in a real-time interaction, resulting in a need for new, or modified, methods. Figuring out at which level of the algorithm to involve the user is also a challenge, with different approaches being applicable to different aspects of the learning problem. Some of the design choices that go into structuring a learning problem include the following.
• Data collection. In any Supervised Learning process, collecting the training and testing data sets is critical to success. The data must be representative of the states and actions that the robot will encounter in the future. The size and diversity of the training and testing data sets determine the speed and accuracy of learning and the quality of the resulting system, including its generalization characteristics. How can the teacher decide what training data to include? Can the robot make the selection or influence the decision process?
• Selecting the feature space and its structure. Deciding which input features and similarity metrics matter most for discriminating between situations in the task and environment at hand is a critical step. The designer must be careful to include input features that are in fact discriminatory, and the algorithm will learn faster if redundant or non-discriminatory features are excluded. Who is responsible for performing feature selection when learning a new task through LfD?
• Defining a reward signal. In many learning systems, such as Reinforcement Learning (RL) [245], the reward function serves a central role in the learning process. How can the teacher effectively define a reward or objective function that accurately represents the task to be learned?
• Subtasking the problem. Learning speed can often be dramatically improved by splitting a task into several less complicated subtasks, although determining the subtask structure can be challenging in some domains. Should the teacher determine the task structure, or will it be determined automatically by the robot? Can the robot guide the teacher’s choices and provide feedback?
These are some of the design choices that developers face in implementing interactive machine learning methods. While in many cases the answers to these questions are predetermined by the target application domain, in other situations the choice is left up to the developer.
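To make the data collection and feature selection choices above more concrete, the following is a minimal sketch, not a method taken from this book. It assumes demonstrations arrive as (state, action) pairs and uses a simple 1-nearest-neighbor lookup as a stand-in for the learner. The kitchen features, action names, and the class and function names (Demonstration, NearestNeighborPolicy, select_features) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import dist
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Demonstration:
    """One teacher-provided example: the observed state and the action taken."""
    state: Tuple[float, ...]
    action: str


def select_features(state: Sequence[float], indices: Sequence[int]) -> Tuple[float, ...]:
    """Keep only the state dimensions chosen by the designer or teacher."""
    return tuple(state[i] for i in indices)


class NearestNeighborPolicy:
    """Hypothetical policy: copy the action of the closest demonstrated state."""

    def __init__(self, demos: Sequence[Demonstration], feature_indices: Sequence[int]):
        if not demos:
            raise ValueError("at least one demonstration is required")
        self.feature_indices = tuple(feature_indices)
        self.examples = [
            (select_features(d.state, self.feature_indices), d.action) for d in demos
        ]

    def act(self, state: Sequence[float]) -> str:
        query = select_features(state, self.feature_indices)
        # Euclidean distance is the (assumed) similarity metric.
        _, action = min(self.examples, key=lambda ex: dist(ex[0], query))
        return action


# Assumed state layout: (is_dirty, is_dishwasher_safe, lighting_level).
# lighting_level is deliberately irrelevant to the task.
demos = [
    Demonstration((1.0, 1.0, 0.2), "load_dishwasher"),
    Demonstration((1.0, 0.0, 1.5), "wash_by_hand"),
    Demonstration((0.0, 1.0, 0.5), "return_to_shelf"),
    Demonstration((0.0, 0.0, 0.1), "return_to_shelf"),
]

# Policy restricted to the two task-relevant features.
policy = NearestNeighborPolicy(demos, feature_indices=(0, 1))
# Policy that also uses the non-discriminatory lighting feature.
naive_policy = NearestNeighborPolicy(demos, feature_indices=(0, 1, 2))

new_state = (1.0, 0.0, 0.1)  # a dirty, non-dishwasher-safe item in a dim room
print(policy.act(new_state))        # wash_by_hand
print(naive_policy.act(new_state))  # return_to_shelf
```

In this toy setting, the policy that includes the irrelevant lighting feature matches the wrong demonstration, while the policy restricted to the two discriminatory features reproduces the teacher's choice. The same four demonstrations are used in both cases, so the difference comes entirely from the feature selection decision, which in an LfD system may fall to the developer, the teacher, or the robot.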
Additionally, it’s important to note that working with novice users is not the only motivation for LfD; some techniques are designed specifically with expert users in mind. Most such application areas focus on techniques for generating control strategies that would be very difficult or time-consuming to program through traditional means, such as when the dynamics of the underlying system are poorly modeled or understood. In this scenario the user is often assumed to be at the very least a trained task expert, if not a roboticist. Potential application areas span a wide variety of professional fields, including manufacturing and the military.
1.2 THE LEARNING FROM DEMONSTRATION PIPELINE
Regardless of whether the target user is a novice or an expert, all Learning from Demonstration techniques share certain key properties. Figure 1.1 illustrates the LfD pipeline. The figure is an oversimplification, but it is a useful abstraction with which to frame the design process for building an LfD system. In this book, we explore the field of Learning from Demonstration from both algorithmic and Human-Robot Interaction (HRI) perspectives, by stepping through each stage of this pipeline.
Figure 1.1: A simplified illustration of the Learning from Demonstration pipeline. This also serves as a roadmap for this book, in which chapters are devoted to each stage of the pipeline.
The assumption in all LfD work is that there exists a Human Teacher who demonstrates execution of a desired behavior. In