19 research outputs found

    Building a Test Collection for Significant-Event Detection in Arabic Tweets

    With the increasing popularity of microblogging services like Twitter, researchers discovered a rich medium for tackling real-life problems like event detection. However, event detection in Twitter is often obstructed by the lack of public evaluation mechanisms such as test collections (sets of tweets, labels, and queries to measure the effectiveness of an information retrieval system). The problem is more evident when non-English languages, e.g., Arabic, are concerned. With the recent surge of significant events in the Arab world, news agencies and decision makers rely on Twitter's microblogging service to obtain recent information on events. In this thesis, we address the problem of building a test collection of Arabic tweets (named EveTAR) for the task of event detection. To build EveTAR, we first adopted an adequate definition of an event, which is a significant occurrence that takes place at a certain time. An occurrence is significant if there are news articles about it. We collected Arabic tweets using Twitter's streaming API. Then, we identified a set of events from the Arabic data collection using Wikipedia's current events portal. Corresponding tweets were extracted by querying the Arabic data collection with a set of manually-constructed queries. To obtain relevance judgments for those tweets, we leveraged CrowdFlower's crowdsourcing platform. Over a period of 4 weeks, we crawled over 590M tweets, from which we identified 66 events that cover 8 different categories and gathered more than 134k relevance judgments. Each event contains an average of 779 relevant tweets. Over all events, we got an average Kappa of 0.6, which is a substantially acceptable value. EveTAR was used to evaluate three state-of-the-art event detection algorithms. The best performing algorithms achieved 0.60 in F1 measure and 0.80 in both precision and recall. We plan to make our test collection available for research, including event descriptions, manually-crafted queries to extract potentially-relevant tweets, and all judgments per tweet. EveTAR is the first Arabic test collection built from scratch for the task of event detection. Additionally, we show in our experiments that it supports other tasks like ad-hoc search.
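
    The agreement figure quoted in this abstract can be illustrated with a small sketch. The abstract does not say which kappa variant was computed, so the two-annotator Cohen's kappa below is an assumption, and the function name and the toy labels are hypothetical rather than taken from EveTAR.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two raters and nominal labels. The thesis may have used
    a different variant (for example, one designed for many raters).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy relevance judgments (1 = relevant, 0 = not relevant), invented for illustration.
annotator_1 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
annotator_2 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
```

    In this toy case the annotators agree on 8 of 10 tweets while chance agreement is 0.5, so kappa comes out at approximately 0.6.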

    Data analytics 2016: proceedings of the fifth international conference on data analytics

    Modelling Social Media Popularity of News Articles Using Headline Text

    The way we formulate headlines matters -- this is the central tenet of this thesis. Headlines play a key role in attracting and engaging online audiences. With the increasing usage of mobile apps and social media to consume news, headlines are the most prominent -- and often the only -- part of the news article visible to readers. Earlier studies examined how readers' preferences and their social network influence which headlines are clicked or shared on social media. However, there is limited research on the impact of the headline text on social media popularity. To address this research gap we pose the following question: how to formulate a headline so that it reaches as many readers as possible on social media. To answer this question we adopt an experimental approach to model and predict the popularity of news articles on social media using headlines. First, we develop computational methods for an automatic extraction of two types of headline characteristics. The first type is news values: Prominence, Sentiment, Magnitude, Proximity, Surprise, and Uniqueness. The second type is linguistic style: Brevity, Simplicity, Unambiguity, Punctuation, Nouns, Verbs, and Adverbs. We then investigate the impact of these features on popularity using social media popularity on Twitter and Facebook, and perceived popularity obtained from a crowdsourced survey. Finally, using these features and headline metadata we build prediction models for global and country-specific social media popularity. For the country-specific prediction model we augment several news values features with country relatedness information using knowledge graphs. Our research established that computational methods can be reliably used to characterise headlines in terms of news values and linguistic style features; and that most of these features significantly correlate with social media popularity and to a lesser extent with perceived popularity. 
Our prediction model for global social media popularity outperformed state-of-the-art baselines, showing that headline wording has an effect on social media popularity. With the country-specific prediction model we showed that adding data from knowledge graphs improved the feature implementations. These findings indicate that formulating a headline in a certain way can lead to wider readership engagement. Furthermore, our methods can be applied to other types of digital content similar to headlines, such as titles for blog posts or videos. More broadly, our results signify the importance of content analysis for popularity prediction.

    Detecting New, Informative Propositions in Social Media

    The ever-growing quantity of online text produced makes it increasingly challenging to find new important or useful information. This is especially so when topics of potential interest are not known a priori, such as in “breaking news stories”. This thesis examines techniques for detecting the emergence of new, interesting information in Social Media. It sets the investigation in the context of a hypothetical knowledge discovery and acquisition system, and addresses two objectives. The first objective addressed is the detection of new topics. The second is filtering of non-informative text from Social Media. A rolling time-slicing approach is proposed for discovery, in which daily frequencies of nouns, named entities, and multiword expressions are compared to their expected daily frequencies, as estimated from previous days using a Poisson model. Trending features in Social Media, those showing a significant surge in use, are potentially interesting. Features that have not shown a similar recent surge in News are selected as indicative of new information. It is demonstrated that surges in nouns and news entities can be detected that predict corresponding surges in mainstream news. Co-occurring trending features are used to create clusters of potentially topic-related documents. Those formed from co-occurrences of named entities are shown to be the most topically coherent. Machine learning based filtering models are proposed for finding informative text in Social Media. News/Non-News and Dialogue Act models are explored using the News-annotated Redites corpus of Twitter messages. A simple 5-act Dialogue scheme, used to annotate a small sample thereof, is presented. For both News/Non-News and Informative/Non-Informative classification tasks, using non-lexical message features produces more discriminative and robust classification models than using message terms alone. The combination of all investigated features yields the most accurate models.
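
    The rolling Poisson comparison described in this abstract can be sketched as follows. This is a minimal illustration only: estimating the expected rate as the plain mean of earlier daily counts, the significance threshold `alpha`, and the function names are all assumptions, not details given in the thesis.

```python
import math


def poisson_tail(observed, rate):
    """Return P(X >= observed) for X ~ Poisson(rate)."""
    if observed <= 0:
        return 1.0
    # Accumulate P(X = k) for k = 0 .. observed - 1, then take the complement.
    term = math.exp(-rate)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= rate / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def is_trending(history, today, alpha=0.001):
    """Flag a feature whose count today is improbably high.

    history: daily counts of the feature on previous days.
    today:   the feature's count on the current day.
    alpha:   hypothetical significance threshold (an assumption).
    """
    if not history:
        raise ValueError("need at least one previous day to estimate the rate")
    # Assumed estimator: expected daily frequency is the mean of past days.
    rate = sum(history) / len(history)
    return poisson_tail(today, rate) < alpha


# Invented daily counts for one named entity over five previous days.
previous_days = [3, 4, 2, 5, 3]
surge = is_trending(previous_days, today=20)
steady = is_trending(previous_days, today=4)
```

    With a mean of 3.4 mentions per day, a count of 20 is flagged as a surge while a count of 4 is not. A fuller version would apply the same test to the News stream and keep only features that surge in Social Media but not in News.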

    Real-time event detection using Twitter

    Twitter has become the social network of news and journalism. Monitoring what is said on Twitter is a frequent task for anyone who requires timely access to information: journalists, traders, and the emergency services have all invested heavily in monitoring Twitter in recent years. Given this, there is a need to develop systems that can automatically monitor Twitter to detect real-world events as they happen, and alert users to novel events. However, this is not an easy task due to the noise and volume of data that is produced from social media streams such as Twitter. Although a range of approaches have been developed, many are unevaluated, cannot scale past low volume streams, or can only detect specific types of event. In this thesis, we develop novel approaches to event detection, and enable the evaluation and comparison of event detection approaches by creating a large-scale test collection called Events 2012, containing 120 million tweets and with relevance judgements for over 500 events. We use existing event detection approaches and Wikipedia to generate candidate events, then use crowdsourcing to gather annotations. We propose a novel entity-based, real-time, event detection approach that we evaluate using the Events 2012 collection, and show that it outperforms existing state-of-the-art approaches to event detection whilst also being scalable. We examine and compare automated and crowdsourced evaluation methodologies for the evaluation of event detection. Finally, we propose a Newsworthiness score that is learned in real-time from heuristically labelled data. The score is able to accurately classify individual tweets as newsworthy or noise in real-time. We adapt the score for use as a feature for event detection, and find that it can easily be used to filter out noisy clusters and improve existing event detection techniques. We conclude with a summary of our research findings and answers to our research questions. 
We discuss some of the difficulties that remain to be solved in event detection on Twitter and propose some possible future directions for research into real-time event detection on Twitter.

    Cartoons as interdiscourse : a quali-quantitative analysis of social representations based on collective imagination in cartoons produced after the Charlie Hebdo attack

    The attacks against Charlie Hebdo in Paris at the beginning of 2015 prompted many cartoonists – mostly professionals but some laymen as well – to create cartoons as a reaction to this tragedy. The main goal of this article is to show how traumatic events like this one can converge in a rather limited set of metaphors, ranging from easily recognizable topoi to rather vague interdiscourses that circulate in contemporary societies. To do so, we analyzed 450 cartoons that were produced as a reaction to the Charlie Hebdo attacks, and took a quali-quantitative approach that draws both on discourse analysis and semiotics. In this paper, we identified eight main themes and analyzed the five that are anchored in collective imagination (the pen against the sword, the journalist as a modern hero, etc.). Then, we studied the cartoons at figurative, narrative and thematic levels using Greimas’ model of the semiotic square. This paper shows the ways in which these cartoons build upon a memory-based network of events from the recent past (particularly 9/11), and more generally on a collective imagination which can be linked to Western values.

    Bioinformatics

    This book is divided into different research areas relevant in Bioinformatics such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each book section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here.