Deep Learning Approaches to Text Production. Shashi Narayan
biography from Wikipedia infobox
6.7 Example of word-property alignments for the Wikipedia abstract and facts
6.8 The exposure bias in cross-entropy trained models
6.9 Text production as a reinforcement learning problem
6.10 The curriculum learning in application
6.11 Deep reinforcement learning for sentence simplification
6.12 Extractive summarisation with reinforcement learning
6.13 Inconsistent responses generated by a sequence-to-sequence model
6.14 Single-speaker model for response generation
6.15 Examples of speaker consistency and inconsistency in the speaker model
6.16 Responses to “Do you love me?” from the speaker-addressee model
7.1 Infobox/text example from the WikiBio data set
7.2 Example data-document pair from the extended WikiBio data set
7.4 Example data-document pair from the RotoWire data set
7.5 Example input and output from the SemEval AMR-to-Text Generation Task
7.6 Example shallow input from the SR’18 data set
7.7 Example instance from the E2E data set
7.8 Example summary from the NewsRoom data set
7.9 An abridged example from the XSum data set
7.10 PWKP complex and simplified example pairs
7.11 Newsela example simplifications
7.12 GigaWord sentence compression or summarisation example
7.13 Sentence compression example
7.14 Example of abstractive compression from Toutanova et al. [2016]
7.15 Example of abstractive compression from Cohn and Lapata [2008]
7.16 Example paraphrase pairs from ParaNMT-50
7.17 Examples from the Twitter News URL Corpus
7.18 Paraphrase examples from PIT-
7.19 Paraphrase examples from the MSR corpus
List of Tables
6.1 An abridged CNN article and its story highlights
7.1 Summary of publicly available large corpora for summarisation
7.2 Data statistics assessing extractiveness of summarisation data sets
7.3 Summary of publicly available large sentential paraphrase corpora
Preface
Neural methods have triggered a paradigm shift in text production by supporting two key features. First, recurrent neural networks allow for the learning of powerful language models which can be conditioned on arbitrarily long input and are not limited by the Markov assumption. In practice, this has made it possible to generate highly fluent, natural-sounding text. Second, the encoder-decoder architecture provides a natural and unifying framework for all generation tasks, independent of the input type (data, text, or meaning representation). As shown by the dramatic increase in the number of conference and journal submissions on this topic, these two features have led to a veritable explosion of the field.
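The contrast drawn above can be written down as a sketch. The notation is introduced here for illustration only and is an assumption, not taken from the rest of the book: $y_1,\dots,y_T$ are the output words, $x$ is the input (data, text, or meaning representation), and $n$ is the order of an n-gram model.

```latex
\begin{align*}
\text{n-gram (Markov) model:} \quad
  & p(y_1,\dots,y_T) \approx \prod_{t=1}^{T} p(y_t \mid y_{t-n+1},\dots,y_{t-1}) \\
\text{recurrent language model:} \quad
  & p(y_1,\dots,y_T) = \prod_{t=1}^{T} p(y_t \mid y_1,\dots,y_{t-1}) \\
\text{encoder-decoder model:} \quad
  & p(y_1,\dots,y_T \mid x) = \prod_{t=1}^{T} p(y_t \mid y_1,\dots,y_{t-1},\, x)
\end{align*}
```

In the recurrent case the whole history is summarised in a hidden state rather than truncated to the last $n-1$ words, which is what lifts the Markov restriction. The encoder-decoder case adds the input $x$ as a further conditioning variable, whatever its type.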
In this book, we introduce the basics of early neural text-production models and contrast them with pre-neural approaches. We begin by briefly reviewing the main characteristics of pre-neural text-production models, emphasising the stark contrast with early neural approaches, which mostly modeled text-production tasks independently of the input type and of the communicative goal. We then introduce the encoder-decoder framework where, first, a continuous representation is learned for the input and, second, an output text is incrementally generated conditioned on the input representation and on the representation of the previously generated words. We discuss the attention, copy, and coverage mechanisms that were introduced to improve the quality of generated texts. We show how text production can benefit from better input representations when the input is a long document or a graph. Finally, we motivate the need for neural models that are sensitive to the current communicative goal. We describe different variants of neural models with task-specific objectives and architectures which directly optimise task-specific communicative goals. We discuss generation from text, data, and meaning representations, bringing various text-production scenarios under one roof to study them all together. Throughout the book, we provide an extensive list of references to support further reading.
As we were writing this book, the field had already moved on to new architectures and models (Transformer, pre-training, and fine-tuning have now become the dominant approach), and we discuss these briefly in the conclusion. We hope that this book will provide a useful introduction to the workings of neural text production and that it will help newcomers from both academia and industry quickly get acquainted with this rapidly expanding field.
We would like to thank several people who provided data or images, and authorization to use them in this book. In particular, we would like to thank Abigail See for the pointer-generator model, Asli Celikyilmaz for the diagrams of deep communicating paragraph encoders, Bayu Distiawan Trisedya for graph-triple encoders, Bernd Bohnet for an example from the 2018 surface realisation challenge, Diego Marcheggiani for graph convolutional network (GCN) diagrams, Jiwei Tan for hierarchical document encoders and the graph-based attention mechanism that uses them, Jonathan May for an abstract meaning representation (AMR) graph, Laura Perez-Beltrachini for an extended RotoWire example, Linfeng Song for graph-state long short-term memories (LSTMs)