774 research outputs found
Winter is here: summarizing Twitter streams related to pre-scheduled events
Pre-scheduled events, such as TV shows and sports games, usually garner considerable attention from the public. Twitter captures large volumes of discussions and messages related to these events, in real-time. Twitter streams related to pre-scheduled events are characterized by the following: (1) spikes in the volume of published tweets reflect the highlights of the event and (2) some of the published tweets make reference to the characters involved in the event, in the context in which they are currently portrayed in a subevent. In this paper, we take advantage of these characteristics to identify the highlights of pre-scheduled events from tweet streams and we demonstrate a method to summarize these highlights. We evaluate our algorithm on tweets collected around 2 episodes of a popular TV show, Game of Thrones, Season 7.
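The spike-based highlight detection described in the abstract above can be sketched as follows. This is an illustrative approximation, not the paper's actual method: the sliding-window size and the standard-deviation threshold are assumed parameters.

```python
import statistics

def detect_spikes(counts, window=5, k=2.0):
    """Flag time bins whose tweet volume exceeds the trailing mean
    by k standard deviations -- a simple proxy for event highlights.
    counts: per-minute (or per-bin) tweet counts."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.pstdev(history)
        # max(stdev, 1.0) avoids flagging tiny fluctuations on flat streams
        if counts[i] > mean + k * max(stdev, 1.0):
            spikes.append(i)
    return spikes

# A sudden jump in volume marks a candidate highlight bin:
print(detect_spikes([10, 12, 11, 9, 10, 50, 12, 11]))  # → [5]
```

Tweets published inside each flagged bin would then be summarized to describe that highlight.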
A System for Crowd Oriented Event Detection, Tracking, and Summarization in Social Media
In this research thesis, we investigate new methods for crowd-oriented event detection in social media (specifically Twitter). We describe and evaluate content-based methods for extracting features that define events occurring in social media streams. Content-based methods examine the appearance of event-describing keywords (topical words) in a stream of tweets. With these aggregated features, tweets are then clustered using a parallelized version of canopy and k-means clustering in order to find groups of "similar" tweets which represent events. Events are tracked through time by evaluating the similarity of events in consecutive time periods. The effectiveness of the feature extraction stage is determined by the relevance of tweets to one another in event summaries. Our experiments aim toward finding the optimal parameters for feature extraction and event clustering.
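The clustering step of the abstract above can be sketched as a minimal k-means over tweet feature vectors. This is a toy, single-threaded sketch, not the thesis's parallelized canopy + k-means pipeline; it assumes tweets have already been mapped to dense keyword-count vectors.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, k, iters=20, seed=0):
    """Minimal k-means: each resulting cluster of 'similar' tweet
    vectors is treated as one candidate event."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            j = min(range(k), key=lambda c: dist2(centroids[c], v))
            clusters[j].append(v)
        # Keep the old centroid if a cluster empties out
        centroids = [centroid(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

# Two well-separated groups of tweet vectors split into two events:
clusters = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
print(sorted(len(c) for c in clusters))  # → [2, 2]
```

Tracking events across time windows would then reduce to comparing the centroids of clusters from consecutive windows.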
OntoDSumm: Ontology-based Tweet Summarization for Disaster Events
The huge popularity of social media platforms like Twitter attracts a large
fraction of users who share real-time information and short situational
messages during disasters. Government organizations, agencies, and volunteers
need a summary of these tweets for an efficient and quick disaster response.
However, the huge influx of tweets makes it difficult to manually obtain a
precise overview of ongoing events. To handle this challenge, several tweet
summarization approaches have been proposed. Most of the existing literature
breaks tweet summarization into a two-step process: the first step categorizes
tweets, and the second step chooses representative tweets from each category.
Both supervised and unsupervised approaches have been proposed for the first
step. Supervised approaches require a huge amount of labelled data, which
incurs both cost and time. Unsupervised approaches, on the other hand, fail to
cluster tweets properly due to overlapping keywords, large vocabulary size,
lack of semantic understanding, etc. For the second step, existing approaches
apply generic ranking methods that fail to compute the proper importance of a
tweet with respect to a disaster. Both problems can be handled far better with
proper domain knowledge. In this paper, we exploit existing domain knowledge
by means of an ontology in both steps and propose a novel disaster
summarization method, OntoDSumm. We evaluate the proposed method against 4
state-of-the-art methods on 10 disaster datasets. Evaluation results reveal
that OntoDSumm outperforms existing methods by approximately 2-66% in terms of
ROUGE-1 F1 score.
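The ROUGE-1 F1 score used for evaluation in the abstract above can be sketched as a unigram-overlap measure. This is a minimal illustration with naive whitespace tokenization, not the official ROUGE implementation used by the authors.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a system summary and a reference summary, with counts
    clipped by the multiset intersection of the two token bags."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# 2 of 3 unigrams overlap, so P = R = F1 = 2/3:
print(round(rouge1_f1("the cat sat", "the cat ran"), 4))  # → 0.6667
```

ROUGE-N generalizes this by replacing unigrams with n-grams; a reported gain of "2-66% ROUGE-1 F1" means the absolute F1 difference between systems on this measure.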
Enhanced web-based summary generation for search.
After a user types in a search query on a major search engine, they are presented with a number of search results. Each search result is made up of a title, brief text summary and a URL. It is then the user's job to select documents for further review. Our research aims to improve the accuracy of users selecting relevant documents by improving the way these web pages are summarized. Improvements in accuracy will lead to time improvements and user experience improvements. We propose ReClose, a system for generating web document summaries. ReClose generates summary content by combining summarization techniques from query-biased and query-independent summary generation. Query-biased summaries generally provide query terms in context. Query-independent summaries focus on summarizing documents as a whole. Combining these summary techniques led to a 10% improvement in user decision making over Google generated summaries. Color-coded ReClose summaries provide keyword usage depth at a glance and also alert users to topic departures. Color-coding further enhanced ReClose results and led to a 20% improvement in user decision making over Google generated summaries. Many online documents include structure and multimedia of various forms such as tables, lists, forms and images. We propose to include this structure in web page summaries. We found that the expert user was not significantly slowed in decision making, while the majority of average users made decisions more quickly using summaries including structure, without any decrease in decision accuracy. We additionally extended ReClose for use in summarizing large numbers of tweets in tracking flu outbreaks in social media. The resulting summaries have variable length and are effective at summarizing flu related trends. Users of the system obtained an accuracy of 0.86 labeling multi-tweet summaries.
This showed that the basis of ReClose is effective outside of web documents and that variable length summaries can be more effective than fixed length. Overall the ReClose system provides unique summaries that contain more informative content than current search engines produce, highlight the results in a more meaningful way, and add structure when meaningful. The applications of ReClose extend far beyond search and have been demonstrated in summarizing pools of tweets
PORTRAIT: a hybrid aPproach tO cReate extractive ground-TRuth summAry for dIsaster evenT
Disaster summarization approaches provide an overview of the important
information posted on social media platforms, such as Twitter, during disaster
events. However, the type of information posted varies significantly across
disasters depending on several factors, such as location, type, and severity.
Verifying the effectiveness of disaster summarization approaches still suffers
from the lack of a broad spectrum of datasets with ground-truth summaries.
Existing approaches for ground-truth summary generation (ground truth for
extractive summarization) rely on the wisdom and intuition of the annotators:
annotators are provided with the complete set of input tweets, from which they
select a subset for the summary. This process requires immense human effort
and significant time. Additionally, this intuition-based selection of tweets
may lead to high variance among the summaries generated by different
annotators. To handle these challenges, we propose a hybrid (semi-automated)
approach, PORTRAIT, in which we partly automate the ground-truth summary
generation procedure. This approach reduces the effort and time of the
annotators while ensuring the quality of the created ground-truth summary. We
validate the effectiveness of PORTRAIT on 5 disaster events through
quantitative and qualitative comparisons of ground-truth summaries generated
by existing intuitive approaches, a semi-automated approach, and PORTRAIT. We
prepare and release ground-truth summaries for 5 disaster events, covering
both natural and man-made disasters from 4 different countries. Finally, we
study the performance of various state-of-the-art summarization approaches on
the ground-truth summaries generated by PORTRAIT using ROUGE-N F1 scores.
Microblog Contextualization Using Continuous Space Vectors: Multi-Sentence Compression of Cultural Documents
In this paper we describe our work for the MC2 CLEF 2017 lab. We participated in the content analysis task, which involves filtering, language recognition, and summarization. We combine Information Retrieval with Multi-Sentence Compression methods to contextualize microblogs using Wikipedia's pages.
Investigating Rumor Propagation with TwitterTrails
Social media have become part of modern news reporting, used by journalists
to spread information and find sources, or as a news source by individuals.
The quest for prominence and recognition on social media sites like Twitter
can sometimes eclipse accuracy and lead to the spread of false information. As
a way to study and react to this trend, we introduce TwitterTrails, an
interactive, web-based tool (twittertrails.com) that allows users to
investigate the origin and propagation characteristics of a rumor and its
refutation, if any, on Twitter. Visualizations of burst activity, propagation
timeline, and retweet and co-retweeted networks help users trace the spread of
a story. Within minutes, TwitterTrails will collect relevant tweets and
automatically answer several important questions regarding a rumor: its
originator, burst characteristics, propagators, and main actors according to
the audience. In addition, it will compute and report the rumor's level of
visibility and, as an example of the power of crowdsourcing, the audience's
skepticism towards it, which correlates with the rumor's credibility. We
envision TwitterTrails as a valuable tool for individual use, but especially
for amateur and professional journalists investigating recent and breaking
stories. Further, its expanding collection of investigated rumors can be used
to answer questions regarding the amount and success of misinformation on
Twitter.