Active Learning. Burr Settles

Active Learning

links to video lectures, software implementations, and other resources for active learning can be found online at http://active-learning.net.

Burr Settles

May 2012

Acknowledgments

This book has a roundabout history, and there are a lot of people to thank along the way. It grew out of an informal literature survey I wrote on active learning (Settles, 2009) which in turn began as a chapter in my PhD thesis. During that phase of my career I am indebted to my committee, Mark Craven, Jude Shavlik, Xiaojin “Jerry” Zhu, David Page, and Lewis Friedland, who encouraged me to expand on my review and make it publicly available. There has been a lot of work in active learning over the past two decades, from simple heuristics to complex and crazy ideas coming from a variety of subfields in AI and statistics. The survey was my attempt to curate, organize, and make sense of it for myself; to help me understand how my work fit into the overall landscape.

Thanks to John Langford, who mentioned the survey on his popular machine learning blog¹. As a result, many other people found it and found it helpful as well. Several people encouraged me to write this book. To that end, Jude Shavlik and Edith Law (independently) introduced me to Michael Morgan. Thanks to Michael, William Cohen, Tom Dietterich, and others at Morgan & Claypool for doing their best to keep things on schedule, and for patiently encouraging me through the process of expanding what was a literature review into more of a tutorial or textbook. Thanks also to Tom Mitchell for his support and helpful advice on how to organize and write a book.

Special thanks to Steve Hanneke and Sanjoy Dasgupta for the detailed feedback on both the original survey and the expanded manuscript. Chapter 6 is particularly indebted to their comments as well as their research. I also found Dasgupta’s review of active learning from a theoretical perspective (Dasgupta, 2010) quite helpful. The insights and organization of ideas presented here are not wholly my own, but draw on conversations I have had with numerous people. In addition to the names mentioned above, I would like to thank Josh Attenberg, Jason Baldridge, Carla Brodley, Aron Culotta, Pinar Donmez, Miroslav Dudík, Gregory Druck, Jacob Eisenstein, Russ Greiner, Carlos Guestrin, Robbie Haertel, Ashish Kapoor, Percy Liang, Andrew McCallum, Prem Melville, Clare Monteleoni, Ray Mooney, Foster Provost, Soumya Ray, Eric Ringger, Teddy Seidenfeld, Kevin Small, Partha Talukdar, Katrin Tomanek, Byron Wallace, and other colleagues for turning me on to papers, ideas, and perspectives that I might have otherwise overlooked. I am sure there are other names I have forgotten to list here, but know that I appreciate all the ongoing discussions on active learning (and machine learning in general), both online and in person. Thanks also to Daniel Hsu, Eric Baum, Nicholas Roy, and their coauthors (some listed above) for kindly allowing me to reuse figures from their publications.

I would like to thank my parents for getting me started, and my wife Natalie for keeping me going. She remained amazingly supportive during my long hours of writing (and re-writing). Whenever I was stumped or frustrated, she was quick to offer a fresh perspective: “Look at you, you’re writing a book!” Lo and behold, I have written a book. I hope you enjoy the book.

While writing this book, I was supported by the Defense Advanced Research Projects Agency (under contracts FA8750-08-1-0009 and AF8750-09-C-0179), the National Science Foundation (under grant IIS-0968487), and Google. The text also includes material written while I was supported by a grant from National Human Genome Research Institute (HGRI). Any opinions, findings and conclusions, or recommendations expressed in this material are mine and do not necessarily reflect those of the sponsors.

Burr Settles May

2012

¹ http://hunch.net

CHAPTER 1

Automating Inquiry

“Computers are useless. They can only give you answers.”

— Pablo Picasso (attributed)

1.1 A THOUGHT EXPERIMENT

Imagine that you are the leader of a colonial expedition from Earth to an extrasolar planet. Luckily, this planet is habitable and has a fair amount of vegetation suitable for feeding your group. Importantly, the most abundant source of food comes from a plant whose fruits are sometimes smooth and round, but sometimes bumpy and irregular. See Figure 1.1 for some examples.

Figure 1.1: Several alien fruits, which vary in shape from round to irregular.

Almost immediately, physicians on the expedition notice that colonists who eat the smooth fruits find them delicious, while those who eat the irregular fruits tend to fall ill. Naturally, you want to be able to distinguish between which fruits are safe to eat and which ones are harmful. If you can accurately “classify” these fruits, it will be much easier to keep the colony healthy and well-fed while finding important alternative uses for the noxious fruits (such as processing them into dyes and fuels). The physicians assure you that the shape of a fruit is the only feature that seems related to its safety. The problem, though, is that a wide variety of fruit shapes from these plants exist: almost a continuous range from round to irregular. Since the colony has essential uses for both safe and noxious fruits, you want to be able to classify them as accurately as possible.

Supposing we have a way to quantify the “irregularity” of a fruit’s shape, we can formalize this classification task using a simple function. Let X be a range of data instances, e.g., fruits that have been harvested, where each x ∊ R represents the irregularity measure of a particular fruit. Each data instance has a hidden label y, which in this example comes from the finite set Y = {safe, noxious}. Our classifier, then, is a function mapping h : X → Y, parameterized by a threshold θ:

The remaining challenge is to figure out what this threshold really is.

One way to pick the threshold is by supervised learning techniques, where we estimate θ from a set of labeled data instances

. Here, 〈x, y〉 denotes a fruit x which has been tested and assigned a label y, e.g., by having someone eat it and then observing whether or not he or she becomes ill. A simple learning algorithm in this case isered set of fruits, we only need to te to harvest a large number of fruits, arrange them in order from round to irregular in shape, and have them all tested. The point at which the fruits switch from being safe to noxious is the best threshold θ* (assuming for the moment that everyone responds to a fruit in the same way). Figure 1.2 illustrates this method for a handful of labeled instances.

Figure 1.2: Supervised learning for the alien fruits example. Given a set of 〈x, y〉 instance-label pairs, we want to choose the threshold θ* that classifies them most accurately.

According to the PAC learning¹ framework (Valiant, 1984), if the underlying data distribution can be perfectly classified by some hypothesis h in the hypothesis class H (in this case, the set of all values for θ), then it is enough to test O(1/∊) randomly selected instances, where ∊ is the maximum desired error rate. This means that if we want to achieve 99% accuracy in our “fruit safety” classifier, we might need to harvest and test on the order of hundreds of fruits using this approach. That is a lot of fruits to test, though, and many of your family and friends might get sick in the process!

Can we do better? While we certainly want a threshold that classifies these fruits as accurately as possible, we also want to discover this

Скачать книгу