From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles

Author: A Fog
A Vidmer
B Karrer
C Aicher
D Liben-Nowell
G Robins
I Scholtes
J Jacod
JD Wilson
K Anand
M Domenico De
M Kivelä
M Molloy
M Rosvall
M Szell
MEJ Newman
N Eagle
P Erdös
P Holme
TP Peixoto
WW Zachary
Y Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/07/2017

The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method on two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.

Comment: 10 pages, 8 figures, accepted at SocInfo201
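The idea of testing links against a hypergeometric null model can be sketched in a few lines. This is only an illustrative simplification, not the authors' implementation: it assumes uniform edge propensities, assumes the number of possible edges between i and j is the product of i's out-degree and j's in-degree, and uses the hypothetical helper names `hypergeom_sf` and `link_p_values`.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P[X >= k] for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    # Exact integer arithmetic for the numerator, one float division at the end.
    numerator = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(max(k, 0), upper + 1)
    )
    return numerator / comb(population, draws)


def link_p_values(counts):
    """Map each observed pair (i, j) to a p-value under the simplified null.

    counts: dict {(i, j): number of observed interactions from i to j}.
    Assumption: possible edges for (i, j) = out_degree(i) * in_degree(j).
    """
    out_deg, in_deg = {}, {}
    for (i, j), a in counts.items():
        out_deg[i] = out_deg.get(i, 0) + a
        in_deg[j] = in_deg.get(j, 0) + a
    m = sum(counts.values())  # observed number of multi-edges
    total_possible = m * m    # sum over all pairs of out_deg[i] * in_deg[j]
    return {
        (i, j): hypergeom_sf(a, total_possible, out_deg[i] * in_deg[j], m)
        for (i, j), a in counts.items()
    }


if __name__ == "__main__":
    toy = {("a", "b"): 5, ("a", "c"): 1, ("b", "c"): 1, ("c", "a"): 1}
    for pair, p in sorted(link_p_values(toy).items()):
        print(pair, round(p, 4))
```

A small p-value marks a pair that interacts more often than its degrees alone would suggest under this null; the threshold to apply is left to the reader.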

arXiv.org e-Print Archive

Crossref

Evaluating Overfit and Underfit in Models of Network Community Structure

Author: Clauset Aaron
Ghasemian Amir
Hosseinmardi Homa
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will overfit or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of overfitting and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluate overfitting and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared.

Comment: 22 pages, 13 figures, 3 tables

arXiv.org e-Print Archive

Crossref

Monte Carlo optimization approach for decentralized estimation networks under communication constraints

Author: Çetin Müjdat
Üney Murat
Publication venue: 'Sabanci University Information Center'
Publication date: 25/11/2010

We consider designing decentralized estimation schemes over bandwidth-limited communication links, with a particular interest in the tradeoff between the estimation accuracy and the cost of communications due to, e.g., energy consumption. We take into account two classes of in-network processing strategies, which yield graph representations by modeling the sensor platforms as vertices and the communication links as edges, as well as a tractable Bayesian risk that comprises the cost of transmissions and a penalty for the estimation errors. This approach captures a broad range of possibilities for “online” processing of observations as well as the constraints imposed, and enables a rigorous design setting in the form of a constrained optimization problem. Similar schemes, as well as the structures exhibited by the solutions to the design problem, have been studied previously in the context of decentralized detection. Under reasonable assumptions, the optimization can be carried out in a message passing fashion. We adopt this framework for estimation; however, the corresponding optimization schemes involve integral operators that cannot be evaluated exactly in general. We develop an approximation framework using Monte Carlo methods and obtain particle representations and approximate computational schemes for both classes of in-network processing strategies and their optimization. The proposed Monte Carlo optimization procedures operate in a scalable and efficient fashion and, owing to their non-parametric nature, can produce results for any distributions provided that samples can be produced from the marginals. In addition, this approach exhibits graceful degradation of the estimation accuracy asymptotically as the communication becomes more costly, through a parameterized Bayesian risk.

Sabanci University Research Database

Final report of the GARNet Advisory Committee on Arabidopsis Systems Biology in the UK, June 2006.

Author: Millar Andrew
Publication date: 01/01/2006

Edinburgh Research Explorer

Catalog Matching with Astrometric Correction and its Application to the Hubble Legacy Archive

Author: Budavári
Heinis
Hogg
Jenkner
Kerekes
Kunszt
Lasker
Lindsay
Rots
Stephen H. Lubow
Tamás Budavári
Whitmore
York
Publication venue: 'IOP Publishing'
Publication date: 31/10/2012

Object cross-identification in multiple observations is often complicated by the uncertainties in their astrometric calibration. Due to the lack of standard reference objects, an image with a small field of view can have significantly larger errors in its absolute positioning than the relative precision of the detected sources within. We present a new general solution for the relative astrometry that quickly refines the World Coordinate System of overlapping fields. The efficiency is obtained through the use of infinitesimal 3-D rotations on the celestial sphere, which do not involve trigonometric functions. They also enable an analytic solution to an important step in making the astrometric corrections. In cases with many overlapping images, the correct identification of detections that match across different images is difficult to determine. We describe a new greedy Bayesian approach for selecting the best object matches across a large number of overlapping images. The methods are developed and demonstrated on the Hubble Legacy Archive, one of the most challenging data sets today. We describe a novel catalog compiled from many Hubble Space Telescope observations, where the detections are combined into a searchable collection of matches that link the individual detections. The matches provide descriptions of astronomical objects involving multiple wavelengths and epochs. High relative positional accuracy of objects is achieved across the Hubble images, often sub-pixel precision on the order of just a few milli-arcseconds. The result is a reliable set of high-quality associations that are publicly available online.

Comment: 9 pages, 9 figures, accepted for publication in the Astrophysical Journal
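As a rough illustration of why infinitesimal rotations avoid trigonometric functions: for a small rotation vector ω, a unit direction r maps to approximately r + ω × r, which needs only multiplications and additions (plus a square root if the result is renormalized). The sketch below is not the paper's code; the function names `cross` and `rotate_small` are hypothetical, and the first-order formula is assumed to be accurate only when |ω| is much smaller than 1 radian.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def rotate_small(r, omega):
    """First-order rotation of unit vector r by the small rotation vector omega.

    Computes r + omega x r and renormalizes so the result stays on the
    unit sphere. No sine or cosine is evaluated.
    """
    c = cross(omega, r)
    v = (r[0] + c[0], r[1] + c[1], r[2] + c[2])
    norm = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / norm, v[1] / norm, v[2] / norm)


if __name__ == "__main__":
    # Rotate the x-axis by one microradian about the z-axis and compare
    # with the exact trigonometric result.
    approx = rotate_small((1.0, 0.0, 0.0), (0.0, 0.0, 1e-6))
    exact = (math.cos(1e-6), math.sin(1e-6), 0.0)
    print(max(abs(a - e) for a, e in zip(approx, exact)))
```

Because the correction is linear in ω, fitting the best small rotation between two sets of matched directions reduces to a linear least-squares problem, which is consistent with the analytic solution the abstract mentions.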

arXiv.org e-Print Archive

Crossref