
    An in-Browser Microblog Ranking Engine

    Microblogs, although extremely peculiar pieces of data, constitute a very rich source of information, which has been widely exploited recently thanks to the liberal access Twitter offers through its API. Nevertheless, computing relevant answers to general queries is still a very challenging task. We propose a new engine, the Twittering Machine, which evaluates SQL-like queries on streams of tweets, using ranking techniques computed at query time. Our algorithm is real-time (it produces streams of results that are refined progressively), adaptive (the queries continuously adapt to new trends), and invasive (it interacts with Twitter by suggesting relevant users to follow and query results to publish as tweets). Moreover, it works in a decentralized environment, directly in the browser on the client side, making it easy to use and server-independent.

    Exploratory Analysis of Highly Heterogeneous Document Collections

    We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. Tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is shown to be significantly more effective than state-of-the-art methods. Finally, we evaluate our system's ability to help users locate documents pertaining to military critical technologies buried deep in a large heterogeneous sea of information.
    Comment: 9 pages; KDD 2013: 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

    Evaluation Measures for Relevance and Credibility in Ranked Lists

    Recent discussions on alternative facts, fake news, and post-truth politics have motivated research on creating technologies that allow people not only to access information, but also to assess the credibility of the information presented to them by information retrieval systems. Whereas technology is in place for filtering information according to relevance and/or credibility, no single measure currently exists for evaluating the accuracy or precision (and more generally effectiveness) of both the relevance and the credibility of retrieved results. One obvious way of doing so is to measure relevance and credibility effectiveness separately, and then consolidate the two measures into one. There are at least two problems with such an approach: (I) it is not certain that the same criteria are applied to the evaluation of both relevance and credibility (and applying different criteria introduces bias to the evaluation); (II) many more and richer measures exist for assessing relevance effectiveness than for assessing credibility effectiveness (hence risking further bias). Motivated by the above, we present two novel types of evaluation measures that are designed to measure the effectiveness of both relevance and credibility in ranked lists of retrieval results. Experimental evaluation on a small human-annotated dataset (that we make freely available to the research community) shows that our measures are expressive and intuitive in their interpretation.

    ANALYZING IMAGE TWEETS IN MICROBLOGS

    Ph.D (Doctor of Philosophy)

    A Service Oriented Framework for Analysing Social Network Activities

    Analysing and monitoring Social Networking activities raise multiple challenges for the evolution of Service Oriented Systems Engineering. This is particularly evident for event detection in social networks and, more generally, for large-scale Social Analytics, which require continuous processing of data. In this paper we present a service oriented framework exploring effective ways to leverage the opportunities coming from innovations and evolutions in computational power, storage, and infrastructures, with particular focus on modern architectures including in-memory database technology, in-database computation, massive parallel processing, Open Data Services, and scalability with multi-node clusters in the cloud. A prototype of this system was tested in the context of a specific kind of social event, an art exhibition of sculptures, where the system collected and analyzed in real time the tweets issued in an entire region, including exhibition sites, and continuously updated analytical dashboards placed in one of the exhibition rooms.

    Meta-information censorship and the creation of the Chinanet Bubble

    The question of who controls meta-information online has become a hot-button issue with profound political implications. The present article explores how state-led online censorship in the People’s Republic of China can create information bubbles, and how it is possible to analyze them. The article is based on a systematic comparison between 3,000 Google.com and Baidu.com image search results on a series of selected, potentially sensitive, keywords. This allows us to discern how censorship and information bubbles are connected, and how it is possible to detect and analyze them. To facilitate this, we offer a typology for conceptualizing the different dimensions of internet censorship. Our analysis points to the importance of censorship on meta-information and suggests that generally censored internet contents can also spill over to a liberal context through the Sinophone internet.