A Deep Network Model for Paraphrase Detection in Short Text Messages
This paper is concerned with paraphrase detection. The ability to detect
similar sentences written in natural language is crucial for several
applications, such as text mining, text summarization, plagiarism detection,
authorship authentication and question answering. Given two sentences, the
objective is to detect whether they are semantically identical. An important
insight from this work is that existing paraphrase systems perform well when
applied to clean texts, but they do not necessarily deliver good performance
on noisy texts. Challenges with paraphrase detection on user-generated
short texts, such as Twitter, include language irregularity and noise. To cope
with these challenges, we propose a novel deep neural network-based approach
that relies on coarse-grained sentence modeling using a convolutional neural
network and a long short-term memory model, combined with a specific
fine-grained word-level similarity matching model. Our experimental results
show that the proposed approach outperforms existing state-of-the-art
approaches on user-generated noisy social media data, such as Twitter texts,
and achieves highly competitive performance on a cleaner corpus.
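The fine-grained word-level similarity matching mentioned in the abstract above can be illustrated with a pairwise similarity matrix between the tokens of two sentences. The following is only a minimal sketch, not the paper's model: the toy embedding table `TOY_EMBEDDINGS`, the helper names, and the rule that identical tokens score 1.0 while unknown tokens score 0.0 are all assumptions made for illustration.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for this example.
# A real system would load pretrained word vectors instead.
TOY_EMBEDDINGS = {
    "great": [0.9, 0.1, 0.0],
    "awesome": [0.8, 0.2, 0.1],
    "movie": [0.1, 0.9, 0.2],
    "film": [0.2, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_similarity(a, b, embeddings):
    """Assumed rule: identical tokens score 1.0, out-of-vocabulary tokens 0.0."""
    if a == b:
        return 1.0
    if a in embeddings and b in embeddings:
        return cosine(embeddings[a], embeddings[b])
    return 0.0


def similarity_matrix(tokens_a, tokens_b, embeddings):
    """Entry [i][j] is the similarity between tokens_a[i] and tokens_b[j]."""
    return [[word_similarity(a, b, embeddings) for b in tokens_b] for a in tokens_a]


def mean_best_match(matrix):
    """Average, over the rows, of the best match each word finds in the other sentence."""
    return sum(max(row) for row in matrix) / len(matrix)


matrix = similarity_matrix(["great", "movie"], ["awesome", "film"], TOY_EMBEDDINGS)
score = mean_best_match(matrix)
```

In a neural model, a matrix like this would typically be fed to further layers rather than averaged; the averaging here only shows how word-level matches can be summarized into a single sentence-pair score.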
Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media
Sentiment analysis has been emerging recently as one of the major natural
language processing (NLP) tasks in many applications. As social
media channels (e.g. social networks or forums) have become significant sources
for brands to observe user opinions about their products, this task has become
increasingly crucial. However, real data obtained from social
media contain a high volume of short and informal messages
posted by users on those channels. Existing approaches, especially those
based on deep learning, have difficulty handling this kind of data.
In this paper, we propose an approach to handle this problem. This
work extends our previous work, in which we proposed combining
Convolutional Neural Networks, a typical deep learning technique, with domain
knowledge. The combination is used to obtain additional training data through
augmentation and a more reasonable loss function. In this work, we further
improve our architecture with several substantial enhancements, including
negation-based data augmentation, transfer learning for word embeddings, the
combination of word-level embeddings and character-level embeddings, and a
multitask learning technique for incorporating domain knowledge rules into the
learning process. Those enhancements, specifically aimed at handling short and
informal messages, yield a significant improvement in performance in
experiments on real datasets.
Comment: A preprint of an article accepted for publication by Inderscience in
IJCVR on September 201
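The abstract above names negation-based data augmentation without describing the procedure. One plausible reading, sketched here purely as an assumption, is to insert a negation word before a polarity-bearing word and flip the label of the example. The word list `POLARITY_WORDS`, the label names, and both function names are hypothetical and are not taken from the paper.

```python
# Hypothetical polarity lexicon, invented for this example.
POLARITY_WORDS = {"good", "bad", "great", "terrible"}

# Assumed label scheme: only these two labels are flipped.
FLIPPED = {"positive": "negative", "negative": "positive"}


def negate_example(text, label):
    """Return a negated copy of (text, label), or None if no rule applies.

    Inserts "not" before the first polarity word and flips the label.
    """
    if label not in FLIPPED:
        return None
    tokens = text.split()
    for i, token in enumerate(tokens):
        if token.lower() in POLARITY_WORDS:
            negated = tokens[:i] + ["not"] + tokens[i:]
            return " ".join(negated), FLIPPED[label]
    return None


def augment(dataset):
    """Return the original examples followed by any negated variants."""
    augmented = list(dataset)
    for text, label in dataset:
        variant = negate_example(text, label)
        if variant is not None:
            augmented.append(variant)
    return augmented


data = [("the phone is good", "positive"), ("just arrived", "positive")]
augmented = augment(data)
```

A rule this simple will produce some unnatural or mislabeled sentences (for example when the text is already negated), so any real use would need filtering or a more careful linguistic rule.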
A Machine Learning Approach For Opinion Holder Extraction In Arabic Language
Opinion mining aims at extracting useful subjective information from reliable
amounts of text. Opinion holder recognition is a task that has not yet been
considered for the Arabic language. This task essentially requires a deep
understanding of clause structures. Unfortunately, the lack of a robust,
publicly available Arabic parser further complicates the research. This paper
presents pioneering research on opinion holder extraction in Arabic news,
independent of any lexical parsers. We investigate constructing a
comprehensive feature set to compensate for the lack of structural parsing
output. The proposed feature set is adapted from previous work on English, coupled
with our proposed semantic field and named entity features. Our feature
analysis is based on Conditional Random Fields (CRF) and semi-supervised
pattern recognition techniques. Different research models are evaluated via
cross-validation experiments, achieving an F-measure of 54.03. We publicly
release our resulting corpus and lexicon to the opinion mining community to
encourage further research.
Basic tasks of sentiment analysis
Subjectivity detection is the task of identifying objective and subjective
sentences. Objective sentences are those which do not exhibit any sentiment.
It is therefore desirable for a sentiment analysis engine to find and separate out
the objective sentences before further analysis, e.g., polarity detection. In
subjective sentences, opinions can often be expressed on one or multiple
topics. Aspect extraction is a subtask of sentiment analysis that consists in
identifying opinion targets in opinionated text, i.e., in detecting the
specific aspects of a product or service the opinion holder is either praising
or complaining about.
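The subjectivity detection step described in the abstract above can be sketched as a filter that separates objective sentences from subjective ones before any polarity analysis. The cue-word lexicon `SUBJECTIVE_CUES` and the function names are assumptions made for this example; practical systems usually train a classifier rather than rely on a fixed word list.

```python
# Hypothetical cue words, invented for this example.
SUBJECTIVE_CUES = {
    "love", "hate", "great", "terrible", "awful", "excellent", "disappointing",
}


def is_subjective(sentence):
    """Treat a sentence as subjective if it contains at least one cue word."""
    words = {word.strip(".,!?").lower() for word in sentence.split()}
    return bool(words & SUBJECTIVE_CUES)


def split_by_subjectivity(sentences):
    """Return (subjective, objective) lists, preserving input order."""
    subjective = [s for s in sentences if is_subjective(s)]
    objective = [s for s in sentences if not is_subjective(s)]
    return subjective, objective


reviews = ["The battery lasts ten hours.", "I love the screen!"]
subjective, objective = split_by_subjectivity(reviews)
```

Only the sentences in `subjective` would then be passed on to later steps such as polarity detection or aspect extraction.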