On the Impact of Entity Linking in Microblog Real-Time Filtering
Microblogging is a model of content sharing in which the temporal locality of
posts with respect to important events, either of foreseeable or unforeseeable
nature, makes applications of real-time filtering of great practical
interest. We propose the use of Entity Linking (EL) in order to improve the
retrieval effectiveness, by enriching the representation of microblog posts and
filtering queries. EL is the process of recognizing in an unstructured text the
mention of relevant entities described in a knowledge base. EL of short pieces
of text is a difficult task, but it is also a scenario in which the information
EL adds to the text can have a substantial impact on the retrieval process. We
implement a state-of-the-art filtering method, based on the best systems from
the TREC Microblog track real-time ad hoc retrieval and filtering tasks, and
extend it with a Wikipedia-based EL method. Results show that the use of EL
significantly improves over non-EL based versions of the filtering methods.
Comment: 6 pages, 1 figure, 1 table. SAC 2015, Salamanca, Spain - April 13 -
17, 201
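The enrichment idea in this abstract can be illustrated with a toy sketch. It is not the paper's Wikipedia-based EL method, whose details are not given here. It assumes a tiny hand-made mention dictionary (MENTION_TO_ENTITY, hypothetical data) and a plain Jaccard overlap as the matching score (also an assumption), and shows how adding linked entity identifiers to the terms of a post and a query can raise their overlap.

```python
import re

# Hypothetical toy knowledge base: surface form -> entity identifier.
MENTION_TO_ENTITY = {
    "big apple": "New_York_City",
    "nyc": "New_York_City",
    "obama": "Barack_Obama",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def link_entities(text, max_len=2):
    """Return sorted entity ids whose surface form matches a token n-gram."""
    tokens = tokenize(text)
    found = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            mention = " ".join(tokens[i:i + n])
            if mention in MENTION_TO_ENTITY:
                found.add(MENTION_TO_ENTITY[mention])
    return sorted(found)

def enrich(text):
    """Represent a post or query as its words plus its linked entity ids."""
    terms = set(tokenize(text))
    terms.update("ENTITY:" + entity for entity in link_entities(text))
    return terms

def jaccard(a, b):
    """Jaccard overlap between two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With these toy data, the post "Traffic in the Big Apple today" and the query "NYC traffic" share only the word "traffic" before enrichment, while after enrichment they also share the entity identifier for New York City.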
Neural Collective Entity Linking
Entity Linking aims to link entity mentions in texts to knowledge bases, and
neural models have achieved recent success in this task. However, most existing
methods rely on local contexts to resolve entities independently, which often
fails due to the sparsity of local information. To address this
issue, we propose a novel neural model for collective entity linking, named
NCEL. NCEL applies a Graph Convolutional Network to integrate both local
contextual features and global coherence information for entity linking. To
improve computational efficiency, we approximate graph convolution by performing it
on a subgraph of adjacent entity mentions instead of on the entire text.
We further introduce an attention scheme to improve the robustness of NCEL to
data noise and train the model on Wikipedia hyperlinks to avoid overfitting and
domain bias. In experiments, we evaluate NCEL on five publicly available
datasets to verify the linking performance as well as generalization ability.
We also conduct an extensive analysis of time complexity, the impact of key
modules, and qualitative results, which demonstrate the effectiveness and
efficiency of our proposed method.
Comment: 12 pages, 3 figures, COLING201
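As a rough illustration of the graph convolution mentioned in this abstract, the sketch below implements one layer of the widely used GCN propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), in plain Python. It is an assumption that NCEL uses a rule of this general shape; the abstract does not give the exact formulation, and the attention scheme and subgraph construction are not modelled here. All function names are hypothetical.

```python
import math

def normalize_adjacency(adj):
    """Add self-loops, then apply symmetric degree normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj, features, weights):
    """One GCN layer: mix each node with its neighbours, project, apply ReLU."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weights)
    return [[max(0.0, value) for value in row] for row in projected]
```

In an entity linking setting, each node would stand for a candidate entity, adj would connect candidates of neighbouring mentions only (the subgraph idea), and features would hold local context scores.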
Sentiment Analysis on Financial News and Microblogs
Sentiment analysis is useful for multiple tasks, including measuring customer satisfaction, identifying market trends for an industry or product, and analyzing reviews from social media comments. This thesis highlights the importance of sentiment analysis and summarizes seminal works and different approaches to it. It addresses sentiment analysis on financial news and microblogs by classifying textual data from these sources as positive or negative. Sentiment analysis is performed using paragraph vectors and logistic regression, and the results are compared with previous approaches to help researchers in this field. This approach achieves state-of-the-art results on the dataset used in this research. The thesis also presents an insightful analysis of the results of this approach.
An integrated semantic-based framework for intelligent similarity measurement and clustering of microblogging posts
Twitter, the most popular microblogging platform, is gaining rapid prominence as a source of
information sharing and social awareness due to its popularity and massive user-generated
content. Applications of this content include tailoring advertisement campaigns, event
detection, trends analysis, and prediction of micro-populations. These
applications are generally conducted through cluster analysis of tweets to generate a more
concise and organized representation of the massive raw tweets. However, current approaches
perform traditional cluster analysis using conventional proximity measures, such as Euclidean
distance, and the sheer volume, noise, and dynamism of Twitter impose challenges that
hinder the efficacy of traditional clustering algorithms in detecting meaningful clusters within
microblogging posts. The research presented in this thesis sets out to design and develop a
novel short text semantic similarity (STSS) measure, named TREASURE, which captures the
semantic and structural features of microblogging posts for intelligently predicting the
similarities. TREASURE is utilised in the development of an innovative semantic-based
cluster analysis algorithm (SBCA) that contributes to generating more accurate and
meaningful granularities within microblogging posts. The integrated semantic-based
framework incorporating TREASURE and the SBCA algorithm both tackles the problem of
microblogging cluster analysis and contributes to the success of a variety of natural language
processing (NLP) and computational intelligence research.
TREASURE utilises word embedding neural network (NN) models to capture the semantic
relationships between words based on their co-occurrences in a corpus. Moreover,
TREASURE analyses the morphological and lexical structure of tweets to predict the syntactic
similarities. An intrinsic evaluation of TREASURE was performed with reference to a reliable
similarity benchmark generated through an experiment to gather human ratings on a Twitter
political dataset. A further evaluation was performed with reference to the SemEval-2014
similarity benchmark in order to validate the generalizability of TREASURE. The intrinsic
evaluation and statistical analysis demonstrated a strong positive linear correlation between
TREASURE and human ratings for both benchmarks. Furthermore, TREASURE achieved a
significantly higher correlation coefficient compared to existing state-of-the-art STSS
measures.
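For orientation, the sketch below shows a common embedding-based baseline for short text similarity: average the word vectors of each text and compare the averages with cosine similarity. This is not TREASURE itself, which also uses structural features not described in enough detail here. The three-dimensional vectors in EMBEDDINGS are hypothetical toy values, not output from a trained model.

```python
import math

# Hypothetical toy word vectors; a real system would load a trained model.
EMBEDDINGS = {
    "vote": [0.9, 0.1, 0.0],
    "ballot": [0.8, 0.2, 0.1],
    "football": [0.0, 0.1, 0.9],
}

def sentence_vector(tokens):
    """Average the vectors of known tokens; None if no token is known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity(text_a, text_b):
    """Similarity of two short texts via averaged word embeddings."""
    vec_a = sentence_vector(text_a.lower().split())
    vec_b = sentence_vector(text_b.lower().split())
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)
```

A measure of this kind can then be validated in the way the abstract describes, by correlating its scores with human similarity ratings on a benchmark.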
The SBCA algorithm incorporates TREASURE as the proximity measure. Unlike
conventional partition-based clustering algorithms, the SBCA algorithm is fully unsupervised
and dynamically determines the number of clusters rather than requiring it beforehand. Subjective evaluation criteria
were employed to evaluate the SBCA algorithm with reference to the SemEval-2014 similarity
benchmark. Furthermore, an experiment was conducted to produce a reliable multi-class
benchmark on the European Referendum political domain, which was also utilised to evaluate
the SBCA algorithm. The evaluation results provide evidence that the SBCA algorithm
undertakes highly accurate combining and separation decisions and can generate pure clusters
from microblogging posts.
The contributions of this thesis to knowledge are mainly demonstrated as: 1) Development
of a novel STSS measure for microblogging posts (TREASURE). 2) Development of a new
SBCA algorithm that incorporates TREASURE to detect semantic themes in microblogs. 3)
Generation of a pre-trained word embedding model learned from a large corpus of political
tweets. 4) Production of a reliable similarity-annotated benchmark and a reliable multi-class
benchmark in the domain of politics.
Terminology-based Text Embedding for Computing Document Similarities on Technical Content
We propose in this paper a new, hybrid document embedding approach in order
to address the problem of document similarities with respect to the technical
content. To do so, we employ state-of-the-art graph techniques to first
extract the keyphrases (composite keywords) of documents and, then, use them to
score the sentences. Using the ranked sentences, we propose two approaches to
embed documents and show their performances with respect to two baselines. With
domain expert annotations, we illustrate that the proposed methods can find
more relevant documents and outperform the baselines by up to 27% in terms of
NDCG.
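For readers unfamiliar with the metric, the following minimal sketch computes NDCG (normalized discounted cumulative gain) for one ranked list of graded relevance labels. It uses the linear-gain variant, rel / log2(rank + 1); the abstract does not say which variant or cutoff the authors used, so both are assumptions here.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances, k=None):
    """NDCG of a ranking; relevances are listed in the order the system ranked them.

    k is an optional cutoff; None means the whole list is scored.
    """
    ranked = relevances[:k]
    ideal = sorted(relevances, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that already lists documents in decreasing order of relevance scores 1.0, and any other ordering of the same labels scores lower.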
Text Summarization Techniques: A Brief Survey
In recent years, there has been an explosion in the amount of text data from a
variety of sources. This volume of text is an invaluable source of information
and knowledge which needs to be effectively summarized to be useful. In this
review, the main approaches to automatic text summarization are described. We
review the different processes for summarization and describe the effectiveness
and shortcomings of the different methods.
Comment: Some of references format have update