1,432 research outputs found
Live Blog Corpus for Summarization
Live blogs are an increasingly popular news format to cover breaking news and
live events in online journalism. Online news websites around the world are
using this medium to give their readers a minute by minute update on an event.
Good summaries enhance the value of the live blogs for a reader but are often
not available. In this paper, we study a way of collecting corpora for
automatic live blog summarization. In an empirical evaluation using well-known
state-of-the-art summarization systems, we show that live blogs corpus poses
new challenges in the field of summarization. We make our tools publicly
available to reconstruct the corpus to encourage the research community and
replicate our results.Comment: To appear in the Proceedings of LREC 201
The Library & Generative AI
A demonstration of several AI tools, including ChatGPT, ChatPDF, Consensus, and more. The focus of the session is on potential student uses of the tools and related library initiatives, so we address the limits of ChatGPT as an information source. Librarians can help students learn how to use these tools responsibly and provide leadership on campus as AI is integrated into assignments
Comparison and Adaptation of Automatic Evaluation Metrics for Quality Assessment of Re-Speaking
Re-speaking is a mechanism for obtaining high quality subtitles for use in
live broadcast and other public events. Because it relies on humans performing
the actual re-speaking, the task of estimating the quality of the results is
non-trivial. Most organisations rely on humans to perform the actual quality
assessment, but purely automatic methods have been developed for other similar
problems, like Machine Translation. This paper will try to compare several of
these methods: BLEU, EBLEU, NIST, METEOR, METEOR-PL, TER and RIBES. These will
then be matched to the human-derived NER metric, commonly used in re-speaking.Comment: Comparison and Adaptation of Automatic Evaluation Metrics for Quality
Assessment of Re-Speaking. arXiv admin note: text overlap with
arXiv:1509.0908
HunSum-1 : an abstractive summarization dataset for Hungarian
We introduce HunSum-1 : a dataset for Hungarian abstractive summarization, consisting of 1.14M news articles. The dataset is built by collecting, cleaning and deduplicating data from 9 major Hungarian news sites through CommonCrawl. Using this dataset, we build abstractive summarizer models based on huBERT and mT5. We demonstrate the value of the created dataset by performing a quantitative and qualitative analysis on the modelsâ results. The HunSum-1 dataset, all models used in our experiments and our code1 are available open source
Crowdsourced real-world sensing: sentiment analysis and the real-time web
The advent of the real-time web is proving both challeng-
ing and at the same time disruptive for a number of areas of research,
notably information retrieval and web data mining. As an area of research reaching maturity, sentiment analysis oers a promising direction for modelling the text content available in real-time streams. This paper reviews the real-time web as a new area of focus for sentiment analysis
and discusses the motivations and challenges behind such a direction
Adaptive Representations for Tracking Breaking News on Twitter
Twitter is often the most up-to-date source for finding and tracking breaking
news stories. Therefore, there is considerable interest in developing filters
for tweet streams in order to track and summarize stories. This is a
non-trivial text analytics task as tweets are short, and standard retrieval
methods often fail as stories evolve over time. In this paper we examine the
effectiveness of adaptive mechanisms for tracking and summarizing breaking news
stories. We evaluate the effectiveness of these mechanisms on a number of
recent news events for which manually curated timelines are available.
Assessments based on ROUGE metrics indicate that an adaptive approaches are
best suited for tracking evolving stories on Twitter.Comment: 8 Pag
Entity Ranking on Graphs: Studies on Expert Finding
Todays web search engines try to offer services for finding various information in addition to simple web pages, like showing locations or answering simple fact queries. Understanding the association of named entities and documents is one of the key steps towards such semantic search tasks. This paper addresses the ranking of entities and models it in a graph-based relevance propagation framework. In particular we study the problem of expert finding as an example of an entity ranking task. Entity containment graphs are introduced that represent the relationship between text fragments on the one hand and their contained entities on the other hand. The paper shows how these graphs can be used to propagate relevance information from the pre-ranked text fragments to their entities. We use this propagation framework to model existing approaches to expert finding based on the entity's indegree and extend them by recursive relevance propagation based on a probabilistic random walk over the entity containment graphs. Experiments on the TREC expert search task compare the retrieval performance of the different graph and propagation models
A survey of data mining techniques for social media analysis
Social network has gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook LinkedIn and Google+ through the internet and the web 2.0 technologies has become more affordable. People are becoming more interested in and relying on social network for information, news and opinion of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues namely; size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge from massive datasets like trends, patterns and rules [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in the Table.1 including the tools employed as well as names of their authors
Sentiment analysis and real-time microblog search
This thesis sets out to examine the role played by sentiment in real-time microblog search. The recent prominence of the real-time web is proving both challenging and disruptive for a number of areas of research, notably information retrieval and web data mining. User-generated content on the real-time web is perhaps best epitomised by content on microblogging platforms, such as Twitter. Given the substantial quantity of microblog posts that may be relevant to a user query at a given point in time, automated methods are required to enable users to sift through this information. As an area of research reaching maturity, sentiment analysis offers a promising direction for modelling the text content in microblog streams.
In this thesis we review the real-time web as a new area of focus for sentiment analysis, with a specific focus on microblogging. We propose a system and method for evaluating the effect of sentiment on perceived search quality in real-time microblog search scenarios. Initially we provide an evaluation of sentiment analysis using supervised learning for classi- fying the short, informal content in microblog posts. We then evaluate our sentiment-based filtering system for microblog search in a user study with simulated real-time scenarios. Lastly, we conduct real-time user studies for the live broadcast of the popular television programme, the X Factor, and for the Leaders Debate during the Irish General Election. We find that we are able to satisfactorily classify positive, negative and neutral sentiment in microblog posts. We also find a significant role played by sentiment in many microblog search scenarios, observing some detrimental effects in filtering out certain sentiment types. We make a series of observations regarding associations between document-level sentiment and user feedback, including associations with user profile attributes, and usersâ prior topic sentiment
- âŠ