Decoding Information from noisy, redundant, and intentionally-distorted sources
Advances in information technology reduce barriers to information
propagation, but they also induce the problem of information overload. Merely
digesting the information relevant to a decision has become a daunting task
due to the sheer amount available. This information, such as that generated by evaluation systems
developed by various web sites, is in general useful but may be noisy and may
also contain biased entries. In this study, we establish a framework to
systematically tackle the challenging problem of information decoding in the
presence of massive and redundant data. When applied to a voting system, our
method simultaneously ranks the raters and the ratees using only the evaluation
data, consisting of an array of scores each of which represents the rating of a
ratee by a rater. Not only is our approach effective in decoding information,
it is also shown to be robust against various hypothetical types of noise as
well as intentional abuses.
Comment: 19 pages, 5 figures, accepted for publication in Physica
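The simultaneous ranking of raters and ratees described above can be illustrated with a minimal sketch. The update rule below (ratee quality as a weighted mean, rater weight as the inverse of mean squared deviation from consensus) is an illustrative assumption, not necessarily the authors' exact iteration:

```python
def decode_ratings(scores, n_iter=50):
    """scores[i][j] = rating of ratee j by rater i (None if missing).

    Alternates between estimating ratee quality and rater reliability,
    so noisy or abusive raters are progressively down-weighted.
    """
    n_raters, n_ratees = len(scores), len(scores[0])
    weights = [1.0] * n_raters          # rater reliabilities
    quality = [0.0] * n_ratees          # ratee quality estimates
    for _ in range(n_iter):
        # Ratee quality: reliability-weighted mean of received ratings.
        for j in range(n_ratees):
            num = den = 0.0
            for i in range(n_raters):
                if scores[i][j] is not None:
                    num += weights[i] * scores[i][j]
                    den += weights[i]
            quality[j] = num / den if den else 0.0
        # Rater weight: inverse mean squared deviation from consensus.
        for i in range(n_raters):
            devs = [(scores[i][j] - quality[j]) ** 2
                    for j in range(n_ratees) if scores[i][j] is not None]
            weights[i] = 1.0 / (sum(devs) / len(devs) + 1e-6) if devs else 0.0
    return weights, quality
```

With two agreeing raters and one dissenter, the dissenter's weight collapses and the consensus follows the majority.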
A Puff of Steem: Security Analysis of Decentralized Content Curation
Decentralized content curation is the process through which uploaded posts are ranked and filtered based exclusively on users' feedback. Platforms such as the blockchain-based Steemit employ this type of curation while providing monetary incentives to promote the visibility of high quality posts according to the perception of the participants. Despite the wide adoption of the platform, very little is known regarding its performance and resilience characteristics. In this work, we provide a formal model for decentralized content curation that identifies salient complexity and game-theoretic measures of performance and resilience to selfish participants. Armed with our model, we provide a first analysis of Steemit, identifying the conditions under which the system can be expected to correctly converge to curation, while we demonstrate its susceptibility to selfish participant behaviour. We validate our theoretical results with system simulations in various scenarios.
Equality of Voice: Towards Fair Representation in Crowdsourced Top-K Recommendations
To help their users to discover important items at a particular time, major
websites like Twitter, Yelp, TripAdvisor or NYTimes provide Top-K
recommendations (e.g., 10 Trending Topics, Top 5 Hotels in Paris or 10 Most
Viewed News Stories), which rely on crowdsourced popularity signals to select
the items. However, different sections of a crowd may have different
preferences, and there is a large silent majority who do not explicitly express
their opinion. Also, the crowd often consists of actors like bots, spammers, or
people running orchestrated campaigns. Recommendation algorithms today largely
do not consider such nuances, hence are vulnerable to strategic manipulation by
small but hyper-active user groups.
To fairly aggregate the preferences of all users while recommending top-K
items, we borrow ideas from prior research on social choice theory, and
identify a voting mechanism called Single Transferable Vote (STV) as having
many of the fairness properties we desire in top-K item (s)elections. We
develop an innovative mechanism to attribute the preferences of the silent
majority, which also makes STV fully operational. We show the generalizability of our
approach by implementing it on two different real-world datasets. Through
extensive experimentation and comparison with state-of-the-art techniques, we
show that our proposed approach provides maximum user satisfaction, and cuts
down drastically on items disliked by most but hyper-actively promoted by a few
users.
Comment: In the proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19). Please cite the conference version.
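The STV tallying the abstract builds on can be sketched as follows. This minimal implementation (Droop quota, fractional Gregory surplus transfer, lowest-candidate elimination) is a textbook variant, not the paper's exact mechanism, and it omits the silent-majority preference attribution:

```python
def stv(ballots, seats):
    """Elect `seats` winners from ranked-preference ballots by STV."""
    quota = len(ballots) // (seats + 1) + 1        # Droop quota
    weights = [1.0] * len(ballots)                 # shrink on surplus transfer
    active = {c for b in ballots for c in b}       # candidates still in the race
    elected = []

    def first_active(ballot):
        for c in ballot:
            if c in active and c not in elected:
                return c
        return None

    while len(elected) < seats and active - set(elected):
        tallies = {c: 0.0 for c in active if c not in elected}
        for b, w in zip(ballots, weights):
            c = first_active(b)
            if c is not None:
                tallies[c] += w
        if len(tallies) + len(elected) <= seats:   # everyone left gets a seat
            elected.extend(sorted(tallies))
            break
        winner = max(tallies, key=tallies.get)
        if tallies[winner] >= quota:
            # Gregory method: pass the surplus on at a reduced weight.
            factor = (tallies[winner] - quota) / tallies[winner]
            for i, b in enumerate(ballots):
                if first_active(b) == winner:
                    weights[i] *= factor
            elected.append(winner)
        else:
            loser = min(tallies, key=tallies.get)  # eliminate; votes transfer
            active.discard(loser)
    return elected
```

For example, with nine ballots and two seats, a candidate reaching the Droop quota of 4 is seated and the eliminated candidate's votes transfer to later preferences.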
Detecting Policy Preferences and Dynamics in the UN General Debate with Neural Word Embeddings
Foreign policy analysis has been struggling to find ways to measure policy
preferences and paradigm shifts in international political systems. This paper
presents a novel, potential solution to this challenge, through the application
of a neural word embedding (Word2vec) model on a dataset featuring speeches by
heads of state or government in the United Nations General Debate. The paper
provides three key contributions based on the output of the Word2vec model.
First, it presents a set of policy attention indices, synthesizing the semantic
proximity of political speeches to specific policy themes. Second, it
introduces country-specific semantic centrality indices, based on topological
analyses of countries' semantic positions with respect to each other. Third, it
tests the hypothesis that there exists a statistical relation between the
semantic content of political speeches and UN voting behavior, falsifying it
and suggesting that political speeches contain information of a different
nature than that underlying voting outcomes. The paper concludes with a discussion of
the practical use of its results and consequences for foreign policy analysis,
public accountability, and transparency.
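A policy attention index of the kind described, the semantic proximity of a speech to a policy theme, might be computed roughly as below. The toy two-dimensional vectors are placeholders for embeddings a real Word2vec model would learn from the General Debate corpus:

```python
import math

def mean_vec(words, emb):
    """Average the embedding vectors of the words present in `emb`."""
    vecs = [emb[w] for w in words if w in emb]
    dim = len(next(iter(emb.values())))
    return [sum(v[k] for v in vecs) / len(vecs) for k in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def attention_index(speech_tokens, theme_words, emb):
    """Cosine similarity between a speech centroid and a theme centroid."""
    return cosine(mean_vec(speech_tokens, emb), mean_vec(theme_words, emb))
```

A speech dominated by conflict vocabulary should score higher against a security theme than against a climate theme.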
Towards Data-Driven Autonomics in Data Centers
Continued reliance on human operators for managing data centers is a major
impediment to their ever reaching extreme dimensions. Large computer
systems in general, and data centers in particular, will ultimately be managed
using predictive computational and executable models obtained through
data-science tools, and at that point, the intervention of humans will be
limited to setting high-level goals and policies rather than performing
low-level operations. Data-driven autonomics, where management and control are
based on holistic predictive models that are built and updated using generated
data, opens one possible path towards limiting the role of operators in data
centers. In this paper, we present a data-science study of a public Google
dataset collected in a 12K-node cluster with the goal of building and
evaluating a predictive model for node failures. We use BigQuery, the big data
SQL platform from the Google Cloud suite, to process massive amounts of data
and generate a rich feature set characterizing machine state over time. We
describe how an ensemble classifier can be built out of many Random Forest
classifiers each trained on these features, to predict if machines will fail in
a future 24-hour window. Our evaluation reveals that if we limit false positive
rates to 5%, we can achieve true positive rates between 27% and 88% with
precision varying between 50% and 72%. We discuss the practicality of including
our predictive model as the central component of a data-driven autonomic
manager and operating it on-line with live data streams (rather than off-line
on data logs). All of the scripts used for BigQuery and classification analyses
are publicly available from the authors' website.
Comment: 12 pages, 6 figures
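The evaluation strategy described, capping the false positive rate at 5% and reading off the achievable true positive rate, can be sketched as follows. The ensemble members are stubbed as plain scoring callables rather than trained Random Forests:

```python
def ensemble_score(models, x):
    """Average the scores of several classifiers (e.g. Random Forests)."""
    return sum(m(x) for m in models) / len(models)

def threshold_for_fpr(scores, labels, max_fpr):
    """Pick a decision threshold whose false positive rate is <= max_fpr."""
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0),
                       reverse=True)
    allowed = int(max_fpr * len(negatives))   # false positives we may admit
    if allowed >= len(negatives):
        return float("-inf")                  # any threshold satisfies the cap
    return negatives[allowed] + 1e-9          # just above the cut-off score

def tpr_at(scores, labels, threshold):
    """True positive rate when predicting positive at score >= threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= threshold for s in positives) / len(positives)
```

On a held-out set one would call `threshold_for_fpr(scores, labels, 0.05)` and then report `tpr_at` at that threshold, mirroring the 5%-FPR operating points quoted in the abstract.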
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbating those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
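One of the five challenges listed, class imbalance, can be illustrated with the simplest possible remedy: random oversampling of the minority class before training. The review surveys far more sophisticated approaches; this is only a baseline sketch:

```python
import random

def oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes are equal."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        Xb.extend(rows)
        yb.extend([label] * len(rows))
        for _ in range(target - len(rows)):   # top up the minority class
            Xb.append(rng.choice(rows))
            yb.append(label)
    return Xb, yb
```

Applied before fitting any classifier, this prevents the majority class from dominating the loss, at the cost of repeated samples.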