To Normalize, or Not to Normalize: The Impact of Normalization on Part-of-Speech Tagging
Does normalization help Part-of-Speech (POS) tagging accuracy on noisy,
non-canonical data? To the best of our knowledge, little is known on the actual
impact of normalization in a real-world scenario, where gold error detection is
not available. We investigate the effect of automatic normalization on POS
tagging of tweets. We also compare normalization to strategies that leverage
large amounts of unlabeled data kept in its raw form. Our results show that
normalization helps, but does not add consistently beyond just word embedding
layer initialization. The latter approach yields a tagging model that is
competitive with a Twitter state-of-the-art tagger. Comment: In WNUT 201
MoNoise: Modeling Noise Using a Modular Normalization System
We propose MoNoise, a normalization model focused on generalizability and
efficiency; it aims to be easily reusable and adaptable. Normalization is
the task of translating texts from a non-canonical domain to a more canonical
domain, in our case from social media data to standard language. Our proposed
model is based on a modular candidate generation in which each module is
responsible for a different type of normalization action. The most important
generation modules are a spelling correction system and a word embeddings
module. Depending on the definition of the normalization task, a static lookup
list can be crucial for performance. We train a random forest classifier to
rank the candidates, which generalizes well to all different types of
normalization actions. Most features for the ranking originate from the
generation modules; besides these features, N-gram features prove to be an
important source of information. We show that MoNoise beats the
state-of-the-art on different normalization benchmarks for English and Dutch,
which all define the task of normalization slightly differently. Comment: Source code: https://bitbucket.org/robvanderg/monois
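The generate-then-rank design described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy vocabulary, the lookup list, the module names, and the hand-set weights are hypothetical, and the weighted vote stands in for the random forest ranker that the abstract mentions. It is not the MoNoise implementation.

```python
from difflib import get_close_matches

# Hypothetical toy resources; none of these come from MoNoise itself.
VOCAB = ["you", "your", "tomorrow", "see", "great", "later", "are"]
LOOKUP = {"u": ["you"], "2moro": ["tomorrow"], "gr8": ["great"], "l8r": ["later"]}


def lookup_module(token):
    """Static lookup list: known noisy form -> canonical replacements."""
    return LOOKUP.get(token, [])


def spelling_module(token):
    """Crude spelling correction: vocabulary words similar to the token."""
    return get_close_matches(token, VOCAB, n=3, cutoff=0.7)


def identity_module(token):
    """Always propose the original token, so 'no change' can win."""
    return [token]


# Each module handles one type of normalization action.
MODULES = {
    "lookup": lookup_module,
    "spelling": spelling_module,
    "identity": identity_module,
}

# Assumed hand-set weights; a trained classifier would replace these.
WEIGHTS = {"lookup": 3.0, "spelling": 2.0, "identity": 1.0}


def generate_candidates(token):
    """Return {candidate: set of module names that proposed it}."""
    candidates = {}
    for name, module in MODULES.items():
        for cand in module(token):
            candidates.setdefault(cand, set()).add(name)
    return candidates


def rank(token):
    """Order candidates by summed module weight, ties broken alphabetically."""
    cands = generate_candidates(token)
    return sorted(cands, key=lambda c: (-sum(WEIGHTS[m] for m in cands[c]), c))


def normalize(tokens):
    """Replace each token with its top-ranked candidate."""
    return [rank(t)[0] for t in tokens]


print(normalize(["u", "tomorow", "see"]))
```

Because each module only returns a list of candidates, a new type of normalization action can be added by registering one more function in `MODULES`; the features passed to the ranker here are just the names of the modules that proposed each candidate.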
Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data
In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable to extract opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocessing and rarely investigate its impact on the task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model’s ability to generalise to other data genres.
Named Entity Recognition in Twitter using Images and Text
Named Entity Recognition (NER) is an important subtask of information
extraction that seeks to locate and recognise named entities. Despite recent
achievements, we still face limitations with correctly detecting and
classifying entities, prominently in short and noisy text, such as Twitter. An
important negative aspect of most NER approaches is the high dependency on
hand-crafted features and domain-specific knowledge, necessary to achieve
state-of-the-art results. Thus, devising models to deal with such
linguistically complex contexts is still challenging. In this paper, we propose
a novel multi-level architecture that does not rely on any specific linguistic
resource or encoded rule. Unlike traditional approaches, we use features
extracted from images and text to classify named entities. Experimental tests
against state-of-the-art NER for Twitter on the Ritter dataset present
competitive results (0.59 F-measure), indicating that this approach may lead
towards better NER models. Comment: The 3rd International Workshop on Natural Language Processing for
Informal Text (NLPIT 2017), 8 page
An In-depth Analysis of the Effect of Lexical Normalization on the Dependency Parsing of Social Media
Existing natural language processing systems have often been designed with standard texts in mind. However, when these tools are used on the substantially different texts from social media, their performance drops dramatically. One solution is to translate social media data to standard language before processing; this is also called normalization. It is well-known that this improves performance for many natural language processing tasks on social media data. However, little is known about which types of normalization replacements have the most effect. Furthermore, it is unknown what the weaknesses of existing lexical normalization systems are in an extrinsic setting. In this paper, we analyze the effect of manual as well as automatic lexical normalization for dependency parsing. After our analysis, we conclude that for most categories, automatic normalization scores close to manually annotated normalization and that small annotation differences are important to take into consideration when exploiting normalization in a pipeline setup.
Concept Extraction Challenge: University of Twente at #MSM2013
Twitter messages are a potentially rich source of continuously and instantly updated information. Shortness and informality of such messages are challenges for Natural Language Processing tasks. In this paper we present a hybrid approach for Named Entity Extraction (NEE) and Classification (NEC) for tweets. The system combines Conditional Random Fields (CRF) and Support Vector Machines (SVM) in a hybrid way to achieve better results. For named entity type classification we used the AIDA \cite{YosefHBSW11} disambiguation system to disambiguate the extracted named entities and hence find their type.
Semantic Sentiment Analysis of Twitter Data
The Internet and the proliferation of smart mobile devices have changed the way
information is created, shared, and spread, e.g., microblogs such as Twitter,
weblogs such as LiveJournal, social networks such as Facebook, and instant
messengers such as Skype and WhatsApp are now commonly used to share thoughts
and opinions about anything in the surrounding world. This has resulted in the
proliferation of social media content, thus creating new opportunities to study
public opinion at a scale that was never possible before. Naturally, this
abundance of data has quickly attracted business and research interest from
various fields including marketing, political science, and social studies,
among many others, which are interested in questions like these: Do people like
the new Apple Watch? Do Americans support ObamaCare? How do the Scottish feel about
the Brexit? Answering these questions requires studying the sentiment of
opinions people express in social media, which has given rise to the fast
growth of the field of sentiment analysis in social media, with Twitter being
especially popular for research due to its scale, representativeness, variety
of topics discussed, as well as ease of public access to its messages. Here we
present an overview of work on sentiment analysis on Twitter. Comment: Microblog sentiment analysis; Twitter opinion mining; In the
Encyclopedia on Social Network Analysis and Mining (ESNAM), Second edition.
201