Natural Language Processing for Social Media. Diana Inkpen
http://www.cs.technion.ac.il/~gabr/resources/data/ne_datasets.html
8
LDA is a method that assumes a number of hidden topics for a corpus, and discovers a cluster of words for each topic, with associated probabilities. Then, for each document, LDA can estimate a probability distribution over the topics. The topics (word clusters) do not have names, but names can be given, for example, by choosing the word with the highest probability in each cluster.
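The naming step described in this note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not run LDA itself, and the topic-word probabilities, the per-document topic distribution, and the helper names `name_topics` and `dominant_topic` are all hypothetical, invented for the example.

```python
# Hypothetical topic-word probabilities, of the kind LDA might estimate.
# The words and numbers are invented for illustration only.
topic_word_probs = [
    {"game": 0.30, "team": 0.25, "score": 0.20, "coach": 0.25},
    {"vote": 0.40, "party": 0.35, "election": 0.25},
]

# Hypothetical topic distribution for one document (one value per topic).
doc_topic_probs = [0.8, 0.2]


def name_topics(topics):
    """Label each topic with its highest-probability word."""
    return [max(words, key=words.get) for words in topics]


def dominant_topic(doc_probs, names):
    """Return the name of the most probable topic for a document."""
    best = max(range(len(doc_probs)), key=lambda i: doc_probs[i])
    return names[best]


topic_names = name_topics(topic_word_probs)
print(topic_names)
print(dominant_topic(doc_topic_probs, topic_names))
```

Choosing the single most probable word is the simplest naming scheme mentioned in the note. In practice a topic is often easier to interpret from its top few words.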
9
http://nlp.stanford.edu/downloads/
10
http://opennlp.apache.org/
11
http://nlp.lsi.upc.edu/freeling/
12
http://nltk.org/
13
http://gate.ac.uk/
14
http://php-nlp-tools.com/
15
https://gate.ac.uk/wiki/twitie.html
16
http://www.ark.cs.cmu.edu/TweetNLP/
17
https://github.com/aritter/twitter_nlp
18
https://github.com/saffsd/langid.py
19
http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html
20
http://www.google.com/chrome
21
https://code.google.com/p/language-detection/
22
https://github.com/martin-majlis/YALI
23
http://odur.let.rug.nl/~vannoord/TextCat/
24
https://github.com/shuyo/ldig
25
http://en.wikipedia.org/wiki/Trie
26
http://www.win.tue.nl/~mpechen/projects/smm/
27
http://people.eng.unimelb.edu.au/tbaldwin/data/lasm2014-twituser-v1.tgz
28
http://en.wikipedia.org/wiki/Geographic_distribution_of_Arabic#Population
29
We will describe the concept of Naïve Bayes classifiers in detail in this section because they tend to work well on textual data and are fast to train and test.
CHAPTER 3
Semantic Analysis of Social Media Texts
3.1 INTRODUCTION
In this chapter, we discuss current NLP methods for social media applications that aim to extract useful information from social media data. Examples of such applications include geo-location detection, opinion mining, emotion analysis, event and topic detection, summarization, and machine translation. We survey the current techniques and briefly define the evaluation measures used for each application, followed by examples of results.
Section 3.2 presents geo-location detection techniques. Section 3.3 discusses entity linking and disambiguation, a task that links detected entities to a database of known entities. Section 3.4 discusses the methods for opinion mining and sentiment analysis, including emotion and mood analysis. Section 3.5 presents event and topic detection. Section 3.6 highlights the various issues in automatic summarization in social media. Section 3.7 presents the adaptation of statistical machine translation for social media text. Section 3.8 summarizes this chapter.
3.2 GEO-LOCATION DETECTION
One of the important topics in semantic analysis in social media is the identification of geo-location information for social content such as blog posts or tweets. By geo-location we mean a real location in the world, such as a region, a city, or a point described by longitude and latitude. Automatic detection of event locations for individuals or groups of individuals with common interests is important for marketing purposes and for detecting potential threats to public safety.