From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles
The inference of network topologies from relational data is an important
problem in data analysis. Exemplary applications include the reconstruction of
social ties from data on human interactions, the inference of gene
co-expression networks from DNA microarray data, or the learning of semantic
relationships based on co-occurrences of words in documents. Solving these
problems requires techniques to infer significant links in noisy relational
data. In this short paper, we propose a new statistical modeling framework to
address this challenge. It builds on generalized hypergeometric ensembles, a
class of generative stochastic models that give rise to analytically tractable
probability spaces of directed, multi-edge graphs. We show how this framework
can be used to assess the significance of links in noisy relational data. We
illustrate our method on two data sets capturing spatio-temporal proximity
relations between actors in a social system. The results show that our
analytical framework provides a new approach to infer significant links from
relational data, with interesting perspectives for the mining of data on social
systems.
Comment: 10 pages, 8 figures, accepted at SocInfo201
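The significance test described in the abstract can be illustrated with a plain (non-generalized) hypergeometric null model; the function and the toy numbers below are our own illustration, not the paper's generalized ensemble:

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for a hypergeometric draw: N multi-edges sampled without
    replacement from M possible edge slots, of which n belong to the node
    pair of interest."""
    denom = comb(M, N)
    return sum(comb(n, x) * comb(M - n, N - x)
               for x in range(k, min(n, N) + 1)) / denom

# Toy check: a node pair holds 10 of 1000 possible edge slots and receives
# 8 of the 200 observed multi-edges -- far more than the ~2 expected.
p = hypergeom_sf(8, 1000, 10, 200)
print(p < 0.01)  # the link would be flagged as significant at the 1% level
```

A link is then kept in the inferred graph only when this tail probability falls below a chosen significance threshold.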
An Accuracy-Assured Privacy-Preserving Recommender System for Internet Commerce
Recommender systems, tools that predict users' potential preferences from historical data and users' interests, are of increasing importance in various Internet applications such as online shopping. As a well-known recommendation method, neighbourhood-based collaborative filtering has attracted considerable attention recently, and the risk of revealing users' private information during the filtering process has drawn notable research interest. Among current solutions, probabilistic techniques have shown a strong privacy-preserving effect; however, when facing a Nearest Neighbour (NN) attack, all existing methods provide no data-utility guarantee, because they introduce global randomness. In this paper, to overcome the problem of recommendation accuracy loss, we propose a novel approach, Partitioned Probabilistic Neighbour Selection, which ensures a required prediction accuracy while maintaining high security against the NN attack. We define the sum of the neighbours' similarities as the accuracy metric alpha, and the number of user partitions across which we select the neighbours as the security metric beta. We generalise the Nearest Neighbour attack to a beta k Nearest Neighbours attack. Unlike the existing approach, which selects neighbours randomly across the entire candidate list, our method selects neighbours from each exclusive partition with a decreasing probability. Theoretical and experimental analysis shows that, to provide an accuracy-assured recommendation, our Partitioned Probabilistic Neighbour Selection method yields a better trade-off between recommendation accuracy and system security.
Comment: replacement for the previous version
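The partition-then-sample idea can be sketched as follows; the decay rule, the parameter values, and the fill-in-rank-order policy are our own assumptions for illustration, not the paper's exact scheme:

```python
import random

def partitioned_neighbour_selection(candidates, k, beta, decay=0.5, seed=0):
    """Illustrative sketch: sort candidates by similarity, split them into
    `beta` exclusive contiguous partitions, then fill the k-neighbour set
    partition by partition, keeping each candidate with a probability that
    decays for later (less similar) partitions."""
    rng = random.Random(seed)
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)  # (user, sim)
    size = max(1, len(ranked) // beta)
    partitions = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    chosen, p = [], 1.0
    for part in partitions:
        for user, sim in part:
            if len(chosen) < k and rng.random() < p:
                chosen.append((user, sim))
        p *= decay  # later partitions contribute with lower probability
    return chosen

cands = [("u%d" % i, 1.0 - 0.01 * i) for i in range(40)]
picked = partitioned_neighbour_selection(cands, k=5, beta=4)
```

Restricting the random selection to ranked partitions is what bounds the loss in the similarity-sum metric alpha, while the spread across partitions supplies the randomness measured by beta.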
Uses of the Hypergeometric Distribution for Determining Survival or Complete Representation of Subpopulations in Sequential Sampling
This thesis explores the hypergeometric probability distribution from many different angles. These include, but are not limited to: its history and origin; its derivation and elementary applications; its properties; its relationships to other probability models; kindred hypergeometric distributions; and elements of statistical inference associated with the hypergeometric distribution. Once the above are established, we investigate and extend work done by Walton (1986) and Charalambides (2005). Here, we apply the hypergeometric distribution to sequential sampling in order to determine a surviving subcategory, and we study the problem of complete representation of the subcategories within the population.
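The complete-representation question has a closed form by inclusion-exclusion over the subcategories that a sample might miss; the small routine below (our own illustration of the standard identity, not the thesis's notation) computes it exactly:

```python
from math import comb
from itertools import combinations

def prob_complete_representation(sizes, N):
    """P(a simple random sample of N items drawn without replacement from an
    urn of sum(sizes) items contains at least one member of every
    subcategory), via inclusion-exclusion over the missed subcategories."""
    M = sum(sizes)
    total = 0.0
    for r in range(len(sizes) + 1):
        for S in combinations(sizes, r):
            missing = sum(S)
            if M - missing >= N:  # otherwise the sample cannot avoid S
                total += (-1) ** r * comb(M - missing, N) / comb(M, N)
    return total

# Urn with subcategories of sizes 5, 3 and 2; sample 4 of the 10 items:
p = prob_complete_representation([5, 3, 2], 4)
```

Each term is the hypergeometric probability that the sample avoids a particular set of subcategories, so the whole computation stays inside the distribution the thesis studies.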
First-Come-First-Served for Online Slot Allocation and Huffman Coding
Can one choose a good Huffman code on the fly, without knowing the underlying
distribution? Online Slot Allocation (OSA) models this and similar problems:
There are n slots, each with a known cost. There are n items. Requests for
items are drawn i.i.d. from a fixed but hidden probability distribution p.
After each request, if the item, i, was not previously requested, then the
algorithm (knowing the slot costs and the requests so far, but not p) must
place the item in some vacant slot j(i). The goal is to minimize the sum, over
the items, of the probability of the item times the cost of its assigned slot.
The optimal offline algorithm is trivial: put the most probable item in the
cheapest slot, the second most probable item in the second cheapest slot, etc.
The optimal online algorithm is First Come First Served (FCFS): put the first
requested item in the cheapest slot, the second (distinct) requested item in
the second cheapest slot, etc. The optimal competitive ratios for any online
algorithm are 1+H(n-1) ~ ln n for general costs and 2 for concave costs. For
logarithmic costs, the ratio is, asymptotically, 1: FCFS gives cost opt + O(log
opt).
For Huffman coding, FCFS yields an online algorithm (one that allocates
codewords on demand, without knowing the underlying probability distribution)
that guarantees asymptotically optimal cost: at most opt + 2 log(1+opt) + 2.
Comment: ACM-SIAM Symposium on Discrete Algorithms (SODA) 201
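The FCFS rule and the objective from the abstract are simple enough to state directly in code; the item names and probabilities below are a toy instance of our own choosing:

```python
def fcfs_allocate(requests, slot_costs):
    """First-Come-First-Served: the i-th *distinct* requested item gets the
    i-th cheapest vacant slot. Returns {item: slot_index}."""
    order = sorted(range(len(slot_costs)), key=lambda j: slot_costs[j])
    assignment = {}
    for item in requests:
        if item not in assignment:
            assignment[item] = order[len(assignment)]
    return assignment

def expected_cost(assignment, probs, slot_costs):
    """Objective: sum over items of p(item) * cost(assigned slot)."""
    return sum(probs[i] * slot_costs[j] for i, j in assignment.items())

slots = [1.0, 2.0, 4.0]          # known slot costs
reqs = ["b", "a", "b", "c"]      # i.i.d. draws from the hidden distribution
alloc = fcfs_allocate(reqs, slots)   # b -> slot 0, a -> slot 1, c -> slot 2
probs = {"a": 0.3, "b": 0.5, "c": 0.2}
cost = expected_cost(alloc, probs, slots)
```

On this instance FCFS happens to match the offline optimum, since the first arrival ("b") is also the most probable item; in general it only approximates it, within the competitive ratios stated above.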
Wallenius Naive Bayes
Traditional event models underlying naive Bayes classifiers assume probability distributions that are not appropriate for binary data generated by human behaviour. In this work, we develop a new event model based on a somewhat forgotten distribution created by Kenneth Ted Wallenius in 1963. We show that it achieves superior performance using less data on a collection of Facebook datasets, where the task is to predict personality traits based on likes.
Faculty of Applied Economics, University of Antwerp, Belgium; Department of Information, Operations & Management Sciences, NYU Stern School of Business
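The sampling process behind Wallenius's distribution is easy to simulate, which makes its bias intuitive: balls are drawn one at a time without replacement, with odds tilted by per-colour weights. The simulation below is a generic illustration of that process, not the classifier from the abstract:

```python
import random

def wallenius_draw(counts, weights, n, rng):
    """One Wallenius-style sample: draw n balls sequentially without
    replacement, each ball taken with probability proportional to the
    remaining mass (count * weight) of its colour. Returns per-colour
    tallies of the balls taken."""
    remaining = list(counts)
    taken = [0] * len(counts)
    for _ in range(n):
        mass = [c * w for c, w in zip(remaining, weights)]
        x = rng.uniform(0, sum(mass))
        for i, m in enumerate(mass):
            if x < m:
                remaining[i] -= 1
                taken[i] += 1
                break
            x -= m
    return taken

rng = random.Random(1)
# Two colours, 10 balls each; colour 0 is weighted 3x, so it dominates draws.
draws = [wallenius_draw([10, 10], [3.0, 1.0], 5, rng) for _ in range(2000)]
mean0 = sum(d[0] for d in draws) / len(draws)
```

The sequential depletion is what distinguishes Wallenius's distribution from Fisher's noncentral hypergeometric, where all draws compete simultaneously.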
Modelling Preference Data with the Wallenius Distribution
The Wallenius distribution is a generalisation of the Hypergeometric distribution in which weights are assigned to balls of different colours. This naturally defines a model for ranking categories which can be used for classification purposes. Since, in general, the resulting likelihood is not analytically available, we adopt an approximate Bayesian computation (ABC) approach for estimating the importance of the categories. We illustrate the performance of the estimation procedure on simulated datasets. Finally, we use the new model to analyse two datasets concerning movie ratings and Italian academic statisticians' journal preferences. The latter is a novel dataset collected by the authors.
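The ABC idea of the abstract can be sketched with the simplest rejection scheme: propose weights from a prior, simulate data, and keep proposals whose summaries land near the observed ones. Everything below (the stand-in choice model, the prior, the tolerance) is our own simplified illustration, not the paper's algorithm:

```python
import random

def simulate_top_choice(weights, rng):
    """Pick one category with probability proportional to its weight
    (a crude stand-in for the Wallenius ranking model)."""
    x = rng.uniform(0, sum(weights))
    for i, w in enumerate(weights):
        if x < w:
            return i
        x -= w
    return len(weights) - 1

def abc_estimate(observed_freq, n_draws=100, n_proposals=3000, tol=0.1, seed=0):
    """ABC rejection: propose weight vectors, simulate choice frequencies,
    keep proposals whose frequencies are within `tol` of the observed
    frequencies, and average the accepted weights."""
    rng = random.Random(seed)
    k = len(observed_freq)
    accepted = []
    for _ in range(n_proposals):
        w = [rng.random() for _ in range(k)]
        s = sum(w)
        w = [wi / s for wi in w]          # normalised proposal from the prior
        counts = [0] * k
        for _ in range(n_draws):
            counts[simulate_top_choice(w, rng)] += 1
        freq = [c / n_draws for c in counts]
        if max(abs(f - o) for f, o in zip(freq, observed_freq)) < tol:
            accepted.append(w)
    if not accepted:                       # degenerate fallback: uniform weights
        return [1.0 / k] * k
    return [sum(w[i] for w in accepted) / len(accepted) for i in range(k)]

est = abc_estimate([0.6, 0.3, 0.1])
```

Rejection ABC needs nothing but the ability to simulate from the model, which is exactly why it suits the Wallenius likelihood, whose normalising integral is not analytically available.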
Some Objects Are More Equal Than Others: Measuring and Predicting Importance
We observe that everyday images contain dozens of objects, and that humans, in describing these images, give different priority to these objects. We argue that a goal of visual recognition is, therefore, not only to detect and classify objects but also to associate with each a level of priority which we call 'importance'. We propose a definition of importance and show how this may be estimated reliably from data harvested from human observers. We conclude by showing that a first-order estimate of importance may be computed from a number of simple image region measurements and does not require access to image meaning