Automatic Text Simplification. Horacio Saggion
Sentence Transformations
5.3 Optimizing Rule Application
5.4 Learning from a Semantic Representation
5.5 Conclusion
5.6 Further Reading
6 Full Text Simplification Systems
6.1 Text Simplification in PSET
6.2 Text Simplification in Simplext
6.2.1 Rule-based “Lexical” Simplification
6.2.2 Computational Grammars for Simplification
6.2.3 Evaluating Simplext
6.3 Text Simplification in PorSimples
6.3.1 An Authoring Tool with Simplification Capabilities
6.4 Conclusion
6.5 Further Reading
7 Applications of Automatic Text Simplification
7.1 Simplification for Specific Target Populations
7.1.1 Automatic Text Simplification for Reading Assistance
7.1.2 Simplification for Dyslexic Readers
7.1.3 Simplification-related Techniques for People with Autism Spectrum Disorder
7.1.4 Natural Language Generation for Poor Readers
7.2 Text Simplification as NLP Facilitator
7.2.1 Simplification for Parsing
7.2.2 Simplification for Information Extraction
7.2.3 Simplification in and for Text Summarization
7.2.4 Simplifying Medical Literature
7.2.5 Retrieving Facts from Simplified Sentences
7.2.6 Simplifying Patent Documents
7.3 Conclusion
7.4 Further Reading
8 Text Simplification Resources and Evaluation
8.1 Lexical Resources for Simplification Applications
8.2 Lexical Simplification Resources
8.3 Corpora
8.4 Non-English Text Simplification Datasets
8.5 Evaluation
8.6 Toward Automatically Measuring the Quality of Simplified Output
8.7 Conclusion
8.8 Further Reading
Acknowledgments
I am indebted to my fellow colleagues Stefan, Sanja, Biljana, Susana, Luz, Daniel, Simon, and Montserrat for sharing their knowledge and expertise with me.
Horacio Saggion
January 2017
CHAPTER 1
Introduction
Automatic text simplification is a research field in computational linguistics that studies methods and techniques to simplify textual content. Text simplification methods should facilitate, or at least speed up, the adaptation of available and future textual material, making accessible information for all a reality. Usually (but not necessarily), adapted texts lose some information and have a simplistic style, which is not necessarily a bad thing if the message of the text, which was complicated to begin with, can in the end be understood by the target reader. Text simplification has also been suggested as a potential pre-processing step that makes texts easier to handle for generic text processors such as parsers, or for specific information access tasks such as information extraction. Simplifying for people is more challenging than simplifying for machines, because readers may perceive the output of an automatic system as inadequate in the presence of the slightest error.
Interest in automatic text simplification has grown in recent years but, in spite of the many approaches and techniques proposed, automatic text simplification is, as of today, far from perfect. The growing interest in text simplification is evidenced by the number of languages targeted by researchers worldwide. Simplification systems and simplification studies exist at least for English [Carroll et al., 1998, Chandrasekar et al., 1996, Siddharthan, 2002], Brazilian Portuguese [Aluísio and Gasperin, 2010], Japanese [Inui et al., 2003], French [Seretan, 2012], Italian [Barlacchi and Tonelli, 2013, Dell’Orletta et al., 2011], Basque [Aranzabe et al., 2012], and Spanish [Saggion et al.].
1.1 TEXT SIMPLIFICATION TASKS
Although there are many text characteristics which can be modified in order to make a text more readable or understandable, including the way in which the text is presented, automatic text simplification has usually concentrated on two different tasks—lexical simplification and syntactic simplification—each addressing different sub-problems.
Lexical simplification attempts either to modify the vocabulary of the text by choosing words which are thought to be more appropriate for the reader (e.g., transforming the sentence “The book was magnificent” into “The book was excellent”) or to include appropriate definitions (e.g., transforming the sentence “The boy had tuberculosis.” into “The boy had tuberculosis, a disease of the lungs.”). Changing words in context is not an easy task because the replacement is very likely to distort the original meaning.
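The two lexical operations just described can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions, not a method taken from any of the systems cited: the synonym list, the frequency counts, and the definition table are invented for the example, and the function names `substitute` and `add_definition` are hypothetical. A word is replaced only when one of its listed synonyms has a higher (assumed) frequency, and context is ignored entirely, which is exactly why real systems risk distorting meaning.

```python
import re

# Toy resources invented for this illustration; they are assumptions,
# not data from any real lexicon or corpus.
SYNONYMS = {"magnificent": ["excellent", "splendid"]}
FREQUENCY = {"magnificent": 12, "excellent": 85, "splendid": 9}
DEFINITIONS = {"tuberculosis": "a disease of the lungs"}


def substitute(sentence):
    """Replace each word by its most frequent listed synonym, if one is more frequent."""
    def pick(match):
        word = match.group(0)
        key = word.lower()
        candidates = SYNONYMS.get(key, []) + [key]
        best = max(candidates, key=lambda w: FREQUENCY.get(w, 0))
        if best == key:
            return word  # no simpler candidate known: keep the original word
        return best.capitalize() if word[0].isupper() else best
    return re.sub(r"[A-Za-z]+", pick, sentence)


def add_definition(sentence):
    """Append a short definition right after each word found in DEFINITIONS."""
    def explain(match):
        word = match.group(0)
        definition = DEFINITIONS.get(word.lower())
        return f"{word}, {definition}" if definition else word
    return re.sub(r"[A-Za-z]+", explain, sentence)


print(substitute("The book was magnificent"))
print(add_definition("The boy had tuberculosis."))
```

The sketch inserts the definition as a trailing apposition, so it only reads well when the defined word ends the sentence, as in the tuberculosis example; a word in mid-sentence would also need a closing comma.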
Syntactic simplification tries to identify syntactic phenomena in sentences which may hinder readability and comprehension, in order to transform the sentence into more readable or understandable equivalents. Relative or subordinate clauses and passive constructions, which may be very difficult for certain readers, could be transformed into simpler sentences or into active form. For example, the sentence “The festival was held in New Orleans, which was recovering from Hurricane Katrina” could be transformed, without altering the original too much, into “The festival was held in New Orleans. New Orleans was recovering from Hurricane Katrina.”
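The relative-clause transformation in the New Orleans example can be sketched with a single pattern. This is a deliberately naive illustration, not the approach of any system discussed in this book: it assumes the antecedent of “which” is the run of capitalized words immediately before the comma, and it handles only a sentence-final clause. The pattern and the function name `split_relative_clause` are hypothetical; practical systems rely on syntactic analysis rather than surface patterns.

```python
import re

# Assumption: the antecedent is the run of capitalized words just before
# ", which", and the relative clause runs to the end of the sentence.
RELATIVE = re.compile(
    r"^(?P<main>.*?(?P<antecedent>(?:[A-Z][\w'-]*\s?)+))"
    r",\s+which\s+(?P<clause>[^,.]+)\.?$"
)


def split_relative_clause(sentence):
    """Split 'X, which Y.' into two sentences; return the input unchanged otherwise."""
    match = RELATIVE.match(sentence.strip())
    if match is None:
        return [sentence]
    main, antecedent, clause = match.group("main", "antecedent", "clause")
    # Repeat the antecedent as the subject of the new, second sentence.
    return [f"{main}.", f"{antecedent} {clause.strip()}."]


for part in split_relative_clause(
    "The festival was held in New Orleans, which was recovering from Hurricane Katrina"
):
    print(part)
```

Sentences that do not fit the pattern are returned untouched, which errs on the side of leaving the text as it is rather than producing a faulty rewrite.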
As we shall see later, automatic text simplification is related to other natural language processing tasks such as text summarization and machine translation. The objective of text summarization is to reduce a text to its essential content, which can be useful in simplification when the text to simplify has too many unnecessary details. The objective of machine translation is to translate a text into a semantically equivalent text in another language. A number of recent automatic text simplification approaches cast text simplification as statistical machine translation; however, this approach to simplification is currently limited by the scarcity of parallel simplification data.
There is an important point to mention here: although lexical and syntactic simplification usually have been addressed separately, they are naturally related. If during syntactic simplification a particular syntactic