An in-Browser Microblog Ranking Engine
Microblogs, although extremely peculiar pieces of data, constitute a very rich source of information that has been widely exploited recently, thanks to the liberal access Twitter offers through its API. Nevertheless, computing relevant answers to general queries is still a very challenging task. We propose a new engine, the Twittering Machine, which evaluates SQL-like queries on streams of tweets, using ranking techniques computed at query time. Our algorithm is real-time (it produces streams of results that are refined progressively), adaptive (the queries continuously adapt to new trends), and invasive (it interacts with Twitter by suggesting relevant users to follow and query results to publish as tweets). Moreover, it works in a decentralized environment, directly in the browser on the client side, making it easy to use and server-independent.
Exploratory Analysis of Highly Heterogeneous Document Collections
We present an effective multifaceted system for exploratory analysis of
highly heterogeneous document collections. Our system is based on intelligently
tagging individual documents in a purely automated fashion and exploiting these
tags in a powerful faceted browsing framework. Tagging strategies employed
include both unsupervised and supervised approaches based on machine learning
and natural language processing. As one of our key tagging strategies, we
introduce the KERA algorithm (Keyword Extraction for Reports and Articles).
KERA extracts topic-representative terms from individual documents in a purely
unsupervised fashion and is revealed to be significantly more effective than
state-of-the-art methods. Finally, we evaluate our system in its ability to
help users locate documents pertaining to military critical technologies buried
deep in a large heterogeneous sea of information. Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining
Evaluation Measures for Relevance and Credibility in Ranked Lists
Recent discussions on alternative facts, fake news, and post truth politics
have motivated research on creating technologies that allow people not only to
access information, but also to assess the credibility of the information
presented to them by information retrieval systems. Whereas technology is in
place for filtering information according to relevance and/or credibility, no
single measure currently exists for evaluating the accuracy or precision (and
more generally effectiveness) of both the relevance and the credibility of
retrieved results. One obvious way of doing so is to measure relevance and
credibility effectiveness separately, and then consolidate the two measures
into one. There are at least two problems with such an approach: (I) it is not
certain that the same criteria are applied to the evaluation of both relevance
and credibility (and applying different criteria introduces bias to the
evaluation); (II) many more and richer measures exist for assessing relevance
effectiveness than for assessing credibility effectiveness (hence risking
further bias).
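The "obvious way" described above, measuring relevance and credibility separately and then consolidating the two, can be sketched as follows. This is only an illustration of that baseline, not the measures the paper proposes. The choice of precision at k, the weighted arithmetic mean, the binary labels, and the names `precision_at_k`, `combined_score`, and `weight` are all assumptions made for this sketch.

```python
from typing import Sequence


def precision_at_k(labels: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked results whose binary label is 1.

    If fewer than k results are available, the missing positions
    count as 0 (an assumption of this sketch).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(labels[:k]) / k


def combined_score(relevance: Sequence[int],
                   credibility: Sequence[int],
                   k: int,
                   weight: float = 0.5) -> float:
    """Consolidate two separately computed measures into one number.

    `weight` is a hypothetical parameter giving the share of the
    relevance measure; the rest goes to the credibility measure.
    """
    rel = precision_at_k(relevance, k)
    cred = precision_at_k(credibility, k)
    return weight * rel + (1 - weight) * cred


# Hypothetical labels for one ranked list of four results (1 = yes, 0 = no).
relevance = [1, 1, 0, 1]
credibility = [1, 0, 0, 1]
score = combined_score(relevance, credibility, k=4)
```

Here the same criterion (precision at k) is applied to both label sets. The abstract's first concern is that this cannot be taken for granted in practice.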
Motivated by the above, we present two novel types of evaluation measures
that are designed to measure the effectiveness of both relevance and
credibility in ranked lists of retrieval results. Experimental evaluation on a
small human-annotated dataset (that we make freely available to the research
community) shows that our measures are expressive and intuitive in their
interpretation.
ANALYZING IMAGE TWEETS IN MICROBLOGS
Ph.D., Doctor of Philosophy
A Service Oriented Framework for Analysing Social Network Activities
Analysing and monitoring Social Networking activities raise multiple challenges for the evolution of Service Oriented Systems Engineering. This is particularly evident for event detection in social networks and, more generally, for large-scale Social Analytics, which require continuous processing of data. In this paper we present a service oriented framework exploring effective ways to leverage the opportunities coming from innovations and evolutions in computational power, storage, and infrastructures, with particular focus on modern architectures including in-memory database technology, in-database computation, massive parallel processing, Open Data Services, and scalability with multi-node clusters in the Cloud. A prototype of this system was tested in the context of a specific kind of social event, an art exhibition of sculptures, where the system collected and analyzed in real time the tweets issued in an entire region, including exhibition sites, and continuously updated analytical dashboards placed in one of the exhibition rooms.
Meta-information censorship and the creation of the Chinanet Bubble
The question of who controls meta-information online has become
a hot-button issue with profound political implications. The present
article explores how state-led online censorship in the People’s
Republic of China can create information bubbles, and how it is
possible to analyze them. The article is based on a systematic
comparison between 3,000 Google.com and Baidu.com image
search results on a series of selected, potentially sensitive,
keywords. This allows us to discern how censorship and
information bubbles are connected, and how it is possible to
detect and analyze them. To facilitate this, we offer a typology for
conceptualizing the different dimensions of internet censorship.
Our analysis points to the importance of censorship on meta-information and suggests that generally censored internet
contents can also spill over to a liberal context through the
Sinophone internet.