In Chapter 2, we consider the learning process from the human’s point of view. We look at the social learning mechanisms used by humans, particularly children, in order to gain possible insights into how LfD systems might be developed and to better understand how learning robots might one day fit within established human social norms. Then in Chapter 3, we address the Demonstrations component, reviewing common modes of human-robot interaction that are used to provide demonstrations.
The learner is provided with these demonstrations, and from them derives a policy—a mapping from perceived state to desired behavior—that is able to reproduce the demonstrated behavior. The ability to generalize across states is considered critical, since it is impractical, and often impossible, for the teacher to demonstrate the correct behavior for every possible situation that the robot might encounter. Our goal in this book is to present an overview of state-of-the-art techniques for this policy derivation process. We do this by organizing the field into those algorithms focused on Low-level Skill Learning (Chapter 4) and those focused on High-level Task Learning (Chapter 5).
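As a point of reference before turning to specific algorithms, the policy derivation problem can be stated compactly. The notation below follows a common convention in the LfD literature and is offered only as an illustrative sketch; the algorithms in later chapters introduce their own, more specialized formulations.

\begin{align*}
  D &= \{(s_i, a_i)\}_{i=1}^{N} && \text{demonstrations: recorded state--action pairs}\\
  \pi &: S \rightarrow A && \text{policy: a mapping from perceived states to actions}\\
  \hat{\pi} &= \operatorname*{arg\,min}_{\pi} \; \sum_{i=1}^{N} \ell\big(\pi(s_i), a_i\big) && \text{the policy that best reproduces the demonstrations}
\end{align*}

Here $\ell$ is any loss that penalizes disagreement between the policy’s chosen action and the teacher’s demonstrated action, and the generalization requirement discussed above corresponds to asking that $\hat{\pi}$ behave sensibly in states that never appear in $D$.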
In Chapter 6 we address the ways in which this process can be made into a loop, such that an initially learned model is further refined. The ability to perform incremental learning or refinement over time, as well as the ability to generalize from a small number of demonstrations, will be crucial in many domains. Factors such as the interpretability or transparency of the policy, and techniques for enabling the user to understand what knowledge the robot possesses and why it behaves the way it does, will be critical to the success of LfD methods in real-world applications.
After stepping through each aspect of the LfD pipeline, in Chapter 7 we turn the focus to evaluation. In particular, we argue for the importance of validating LfD algorithms with HRI studies. As such, this chapter contains guidelines for conducting such experiments to evaluate LfD methods with end-users. Finally, Chapter 8 discusses where we see the field heading and what we consider the most crucial directions for future work in this exciting area.
1.3 A NOTE ON TERMINOLOGY
This book builds on an extensive collection of research literature, and one of the goals of the book is to familiarize the reader with many of the seminal works in this area. Within this research literature, LfD techniques are described by a variety of terms, such as Learning by Demonstration (LbD), Learning from Demonstration (LfD), Programming by Demonstration (PbD), Learning by Experienced Demonstrations, Assembly Plan from Observation, Learning by Showing, Learning by Watching, Learning from Observation, behavioral cloning, imitation and mimicry. While the definitions for some of these terms, such as imitation, have been loosely borrowed from other sciences, the overall use of these terms is often inconsistent or contradictory across articles. Within this book, we refer to the general category of algorithms in which a policy is derived based on demonstrated data as Learning from Demonstration, and we reference other terms as appropriate in the coming chapters.
CHAPTER 2
Human Social Learning
When a machine learner is in the presence of a human who is motivated to help, social interaction can be a key element in the success of the learning process. Although robots can also learn, albeit less efficiently, from observing demonstrations that are not directed at them, the scenario we address here is primarily one in which a person is explicitly trying to teach the robot something in particular.
In this chapter, we review some key insights from human psychology that can influence the design of learning robots. We focus our discussion on findings in situated learning, a field of study that looks at the social world of a child and how it contributes to the child’s development. In a situated learning interaction, a good instructor maintains a mental model of the learner’s understanding and structures the learning task appropriately with timely feedback and guidance. The learner contributes to the process by expressing their internal state via communicative acts (e.g., expressing understanding, confusion, or attention). This reciprocal and tightly coupled interaction enables the learner to leverage instruction to build the appropriate representations and associations.
The situated learning process stands in contrast to typical machine learning scenarios, which are often neither interactive nor intuitive for a non-expert human partner. Since the social learning mechanisms used by humans are both demonstrably effective and naturally occurring across society, enabling robots to engage in social interaction with the user can lead to more flexible, efficient, personable and teachable machines whose behavior more closely matches the user’s expectations.
It is worth noting that despite its reliance on human teachers, the field of Learning from Demonstration has not focused much attention on the interactivity of the learning system. As we will see in Chapters 4 and 5, it is quite typical to first collect demonstrations in batch and then have a learning algorithm use this data to model a skill or task. What the work highlighted in this chapter points out is the distinction between a typical batch process and the interactivity of a social learning process. We will return to this topic in Chapter 6, where we consider how to make an LfD process interactive through online learning, high-level critiques of the robot’s exploration, and the incorporation of Active Learning.
Figure 2.1: In this chapter we start with a look at the Human Teacher component of the LfD pipeline. A survey of human social learning provides insight into biases and expectations that a human may bring to the LfD process.
Figure 2.2: Starting at an early age, children use the information around them to learn from observation, experience, and instruction, striving to imitate the adults around them.
In this chapter, we highlight characteristics of human social learning in the first three sections. We look at human motivation for learning, how human teachers scaffold the learning process, and what feedback human learners provide. All of these topics have implications for the technical design of robot learners, which is the focus of the remaining chapters of this book (Figure 2.1).
2.1 LEARNING IS A PART OF ALL ACTIVITY
In most Machine Learning scenarios, learning is an explicit activity. The system is designed to learn a particular thing at a particular time. With humans, on the other hand, there is an ever-present motivation for learning, a drive to improve oneself, and an ability to seek out the expertise of others. Some inspiring characteristics of a motivated learner include: a curiosity about new environments and experiences; the ability to recognize and exploit good sources of information, and to adopt such an information source as a role model; the desire to “be more like” that role model, which underlies all activity; and a sense of one’s level of mastery with acquired skills, further driving the motivation to explore and learn about the world at opportune times.
Self-Determination Theory seeks to understand the mechanisms behind both intrinsic and extrinsic motivation in human behavior in general [224]. Here our focus is on situated learning interactions rather than self-motivated learning. We summarize two types of human motivation that lay the foundation for social learning interactions.
Motivated to Interact
A critical part of learning is gaining the ability to exploit the expertise of others [203]. Children