183 research outputs found
Tweet Contextualization Based on Wikipedia and Dbpedia
National audienceBound to 140 characters, tweets are short and not written maintaining formal grammar and proper spelling. These spelling variations increase the likelihood of vocabulary mismatch and make them difficult to understand without context. This paper falls under the tweet contextualization task that aims at providing, automatically, a summary that explains a given tweet, allowing a reader to understand it. We propose different tweet expansion approaches based on Wikipeda and Dbpedia as external knowledge sources. These proposed approaches are divided into two steps. The first step consists in generating the candidate terms for a given tweet, while the second one consists in ranking and selecting these candidate terms using asimilarity measure. The effectiveness of our methods is proved through an experimental study conducted on the INEX 2014 collection
Modeling Paragraph-Level Vision-Language Semantic Alignment for Multi-Modal Summarization
Most current multi-modal summarization methods follow a cascaded manner,
where an off-the-shelf object detector is first used to extract visual
features, then these features are fused with language representations to
generate the summary with an encoder-decoder model. The cascaded way cannot
capture the semantic alignments between images and paragraphs, which are
crucial to a precise summary. In this paper, we propose ViL-Sum to jointly
model paragraph-level \textbf{Vi}sion-\textbf{L}anguage Semantic Alignment and
Multi-Modal \textbf{Sum}marization. The core of ViL-Sum is a joint multi-modal
encoder with two well-designed tasks, image reordering and image selection. The
joint multi-modal encoder captures the interactions between modalities, where
the reordering task guides the model to learn paragraph-level semantic
alignment and the selection task guides the model to selected summary-related
images in the final summary. Experimental results show that our proposed
ViL-Sum significantly outperforms current state-of-the-art methods. In further
analysis, we find that two well-designed tasks and joint multi-modal encoder
can effectively guide the model to learn reasonable paragraphs-images and
summary-images relations
Sentiment Analysis on Financial News and Microblogs
Sentiment analysis is useful for multiple tasks including customer satisfaction metrics, identifying market trends for any industry or products, analyzing reviews from social media comments. This thesis highlights the importance of sentiment analysis, provides a summary of seminal works and different approaches towards sentiment analysis. It aims to address sentiment analysis on financial news and microblogs by classifying textual data from financial news and microblogs as positive or negative. Sentiment analysis is performed by making use of paragraph vectors and logistic regression in this thesis and it aims to compare it with previously performed approaches to performing analysis and help researchers in this field. This approach achieves state of the art results for the dataset used in this research. It also presents an insightful analysis of the results of this approach
A multimodal feature learning approach for sentiment analysis of social network multimedia
Investigation of the use of a multimodal feature learning approach, using neural network based models such as Skip-gram and Denoising Autoencoders, to address sentiment analysis of micro-blogging content, such as Twitter short messages, that are composed by a short text and, possibly, an image
Microblogging Temporal Summarization: Filtering Important Twitter Updates for Breaking News
While news stories are an important traditional medium to broadcast and consume news, microblogging has recently emerged as a place where people can dis- cuss, disseminate, collect or report information about news. However, the massive information in the microblogosphere makes it hard for readers to keep up with these real-time updates. This is especially a problem when it comes to breaking news, where people are more eager to know “what is happening”. Therefore, this dis- sertation is intended as an exploratory effort to investigate computational methods to augment human effort when monitoring the development of breaking news on a given topic from a microblog stream by extractively summarizing the updates in a timely manner.
More specifically, given an interest in a topic, either entered as a query or presented as an initial news report, a microblog temporal summarization system is proposed to filter microblog posts from a stream with three primary concerns: topical relevance, novelty, and salience. Considering the relatively high arrival rate of microblog streams, a cascade framework consisting of three stages is proposed to progressively reduce quantity of posts. For each step in the cascade, this dissertation studies methods that improve over current baselines.
In the relevance filtering stage, query and document expansion techniques are applied to mitigate sparsity and vocabulary mismatch issues. The use of word embedding as a basis for filtering is also explored, using unsupervised and supervised modeling to characterize lexical and semantic similarity. In the novelty filtering stage, several statistical ways of characterizing novelty are investigated and ensemble learning techniques are used to integrate results from these diverse techniques. These results are compared with a baseline clustering approach using both standard and delay-discounted measures. In the salience filtering stage, because of the real-time prediction requirement a method of learning verb phrase usage from past relevant news reports is used in conjunction with some standard measures for characterizing writing quality.
Following a Cranfield-like evaluation paradigm, this dissertation includes a se- ries of experiments to evaluate the proposed methods for each step, and for the end- to-end system. New microblog novelty and salience judgments are created, building on existing relevance judgments from the TREC Microblog track. The results point to future research directions at the intersection of social media, computational jour- nalism, information retrieval, automatic summarization, and machine learning
Extracting News Events from Microblogs
Twitter stream has become a large source of information for many people, but
the magnitude of tweets and the noisy nature of its content have made
harvesting the knowledge from Twitter a challenging task for researchers for a
long time. Aiming at overcoming some of the main challenges of extracting the
hidden information from tweet streams, this work proposes a new approach for
real-time detection of news events from the Twitter stream. We divide our
approach into three steps. The first step is to use a neural network or deep
learning to detect news-relevant tweets from the stream. The second step is to
apply a novel streaming data clustering algorithm to the detected news tweets
to form news events. The third and final step is to rank the detected events
based on the size of the event clusters and growth speed of the tweet
frequencies. We evaluate the proposed system on a large, publicly available
corpus of annotated news events from Twitter. As part of the evaluation, we
compare our approach with a related state-of-the-art solution. Overall, our
experiments and user-based evaluation show that our approach on detecting
current (real) news events delivers a state-of-the-art performance
VIRAL TOPIC PREDICTION AND DESCRIPTION IN MICROBLOG SOCIAL NETWORKS
Ph.DDOCTOR OF PHILOSOPH
Sentiment analysis and real-time microblog search
This thesis sets out to examine the role played by sentiment in real-time microblog search. The recent prominence of the real-time web is proving both challenging and disruptive for a number of areas of research, notably information retrieval and web data mining. User-generated content on the real-time web is perhaps best epitomised by content on microblogging platforms, such as Twitter. Given the substantial quantity of microblog posts that may be relevant to a user query at a given point in time, automated methods are required to enable users to sift through this information. As an area of research reaching maturity, sentiment analysis offers a promising direction for modelling the text content in microblog streams.
In this thesis we review the real-time web as a new area of focus for sentiment analysis, with a specific focus on microblogging. We propose a system and method for evaluating the effect of sentiment on perceived search quality in real-time microblog search scenarios. Initially we provide an evaluation of sentiment analysis using supervised learning for classi- fying the short, informal content in microblog posts. We then evaluate our sentiment-based filtering system for microblog search in a user study with simulated real-time scenarios. Lastly, we conduct real-time user studies for the live broadcast of the popular television programme, the X Factor, and for the Leaders Debate during the Irish General Election. We find that we are able to satisfactorily classify positive, negative and neutral sentiment in microblog posts. We also find a significant role played by sentiment in many microblog search scenarios, observing some detrimental effects in filtering out certain sentiment types. We make a series of observations regarding associations between document-level sentiment and user feedback, including associations with user profile attributes, and users’ prior topic sentiment
- …