
    Unbiased sampling of network ensembles

    Sampling random graphs with given properties is a key step in the analysis of networks, as random ensembles represent basic null models required to identify patterns such as communities and motifs. An important requirement is that the sampling process is unbiased and efficient. The main approaches are microcanonical, i.e. they sample graphs that match the enforced constraints exactly. Unfortunately, when applied to strongly heterogeneous networks (like most real-world examples), the majority of these approaches become biased and/or time-consuming. Moreover, the algorithms defined in the simplest cases, such as binary graphs with given degrees, are not easily generalizable to more complicated ensembles. Here we propose a solution to the problem via the introduction of a "Maximize and Sample" ("Max & Sam" for short) method to correctly sample ensembles of networks where the constraints are 'soft', i.e. realized as ensemble averages. Our method is based on exact maximum-entropy distributions and is therefore unbiased by construction, even for strongly heterogeneous networks. It is also more computationally efficient than most microcanonical alternatives. Finally, it works for both binary and weighted networks with a variety of constraints, including combined degree-strength sequences and full reciprocity structure, for which no alternative method exists. Our canonical approach can in principle be turned into an unbiased microcanonical one, via a restriction to the relevant subset. Importantly, the analysis of the fluctuations of the constraints suggests that the microcanonical and canonical versions of all the ensembles considered here are not equivalent. We show various real-world applications and provide a code implementing all our algorithms.
    Comment: MatLab code available at http://www.mathworks.it/matlabcentral/fileexchange/46912-max-sam-package-zi
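
    For the simplest ensemble treated by the paper, binary undirected graphs with a given degree sequence, the canonical ("soft") recipe can be sketched directly: fit hidden variables x_i so that the expected degrees match the constraints, then draw every edge independently with probability p_ij = x_i x_j / (1 + x_i x_j). The Python sketch below is only illustrative (it is not the authors' MatLab package; the fixed-point solver and the toy degree sequence are assumptions):

        import numpy as np

        rng = np.random.default_rng(0)

        def fit_hidden_variables(degrees, n_iter=5000, tol=1e-10):
            """Fit x_i so that the *expected* degree of each node equals the target
            (soft constraints), via a simple fixed-point iteration."""
            k = np.asarray(degrees, dtype=float)
            x = k / np.sqrt(k.sum())                   # common starting point
            for _ in range(n_iter):
                ratio = x / (1.0 + np.outer(x, x))     # ratio[i, j] = x_j / (1 + x_i x_j)
                np.fill_diagonal(ratio, 0.0)           # no self-loops
                x_new = k / ratio.sum(axis=1)
                if np.max(np.abs(x_new - x)) < tol:
                    return x_new
                x = x_new
            return x

        def sample_graph(x):
            """Draw one graph from the canonical ensemble: edge (i, j) appears
            independently with probability p_ij = x_i x_j / (1 + x_i x_j)."""
            xx = np.outer(x, x)
            p = xx / (1.0 + xx)
            np.fill_diagonal(p, 0.0)
            upper = np.triu(rng.random(p.shape) < p, k=1)
            return (upper | upper.T).astype(int)

        degrees = [1, 2, 2, 3, 4, 4]                   # illustrative toy sequence
        x = fit_hidden_variables(degrees)
        xx = np.outer(x, x)
        p = xx / (1.0 + xx)
        np.fill_diagonal(p, 0.0)
        print("target degrees:  ", degrees)
        print("expected degrees:", np.round(p.sum(axis=1), 2))
        print("sampled degrees: ", sample_graph(x).sum(axis=1))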

    Exponential Family Matrix Completion under Structural Constraints

    We consider the matrix completion problem of recovering a structured matrix from noisy and partial measurements. Recent works have proposed tractable estimators with strong statistical guarantees for the case where the underlying matrix is low-rank, and the measurements consist of a subset, either of the exact individual entries, or of the entries perturbed by additive Gaussian noise, which is thus implicitly suited for thin-tailed continuous data. Arguably, common applications of matrix completion require estimators for (a) heterogeneous data types, such as skewed-continuous, count, binary, etc., (b) heterogeneous noise models (beyond Gaussian), which capture varied uncertainty in the measurements, and (c) heterogeneous structural constraints beyond low rank, such as block-sparsity, or a superposition structure of low-rank plus elementwise sparseness, among others. In this paper, we provide a vastly unified framework for generalized matrix completion by considering a matrix completion setting wherein the matrix entries are sampled from any member of the rich family of exponential family distributions; and impose general structural constraints on the underlying matrix, as captured by a general regularizer $\mathcal{R}(\cdot)$. We propose a simple convex regularized M-estimator for the generalized framework, and provide a unified and novel statistical analysis for this general class of estimators. We finally corroborate our theoretical results on simulated datasets.
    Comment: 20 pages, 9 figures
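
    As one concrete instance of this framework (not the authors' code), the Gaussian member with a nuclear-norm regularizer can be solved by proximal gradient descent, where the proximal step is singular-value soft-thresholding; other exponential-family members would only change the smooth loss term. The solver, step size, and toy data below are illustrative assumptions:

        import numpy as np

        def svt(M, tau):
            """Singular-value soft-thresholding: the prox operator of tau * nuclear norm."""
            U, s, Vt = np.linalg.svd(M, full_matrices=False)
            return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

        def complete(Y, mask, lam=0.5, step=1.0, n_iter=500):
            """Proximal gradient for  min_T  0.5 * ||mask * (T - Y)||_F^2 + lam * ||T||_*,
            i.e. the Gaussian exponential-family loss plus a nuclear-norm regularizer."""
            T = np.zeros_like(Y)
            for _ in range(n_iter):
                grad = mask * (T - Y)                 # gradient of the Gaussian log-loss
                T = svt(T - step * grad, step * lam)  # proximal (shrinkage) step
            return T

        # toy example: a rank-1 matrix observed on ~60% of its entries, with noise
        rng = np.random.default_rng(0)
        truth = np.outer(rng.standard_normal(20), rng.standard_normal(15))
        mask = rng.random(truth.shape) < 0.6
        Y = mask * (truth + 0.1 * rng.standard_normal(truth.shape))
        est = complete(Y, mask)
        print("relative error:", np.linalg.norm(est - truth) / np.linalg.norm(truth))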

    Task Selection for Bandit-Based Task Assignment in Heterogeneous Crowdsourcing

    Task selection (picking an appropriate labeling task) and worker selection (assigning the labeling task to a suitable worker) are two major challenges in task assignment for crowdsourcing. Recently, worker selection has been successfully addressed by the bandit-based task assignment (BBTA) method, while task selection has not been thoroughly investigated yet. In this paper, we experimentally compare several task selection strategies borrowed from the active learning literature, and show that the least confidence strategy significantly improves the performance of task assignment in crowdsourcing.
    Comment: arXiv admin note: substantial text overlap with arXiv:1507.0580
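
    The least confidence rule itself is a one-line criterion from active learning: among the tasks that still need labels, pick the one whose currently most probable label has the lowest estimated probability, then hand that task to the bandit-chosen worker. A small illustrative sketch (the posterior estimates and task indices are made up; this is not the BBTA implementation):

        import numpy as np

        def least_confidence_task(posteriors, unlabeled):
            """Pick the task whose most likely label currently has the lowest probability.
            `posteriors` is an (n_tasks, n_labels) array of current label estimates."""
            confidence = posteriors.max(axis=1)          # P(most likely label) per task
            return min(unlabeled, key=lambda t: confidence[t])

        # toy example: 4 tasks, 3 possible labels, current aggregated label estimates
        posteriors = np.array([
            [0.90, 0.05, 0.05],   # task 0: nearly settled
            [0.40, 0.35, 0.25],   # task 1: very uncertain
            [0.70, 0.20, 0.10],   # task 2
            [0.55, 0.30, 0.15],   # task 3
        ])
        print(least_confidence_task(posteriors, unlabeled=[0, 1, 2, 3]))   # -> 1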

    A Robust Information Source Estimator with Sparse Observations

    In this paper, we consider the problem of locating the information source with sparse observations. We assume that a piece of information spreads in a network following a heterogeneous susceptible-infected-recovered (SIR) model and that a small subset of infected nodes is reported, from which we need to find the source of the information. We adopt the sample path based estimator developed in [1], and prove that on infinite trees, the sample path based estimator is a Jordan infection center with respect to the set of observed infected nodes. In other words, the sample path based estimator minimizes the maximum distance to the observed infected nodes. We further prove that, with high probability on infinite trees, the distance between the estimator and the actual source is upper bounded by a constant independent of the number of infected nodes. Our simulations on tree networks and real-world networks show that the sample path based estimator is closer to the actual source than several other algorithms.
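
    The Jordan infection center criterion that the sample path estimator reduces to can be computed directly: pick the node whose maximum shortest-path distance to the reported infected nodes is smallest. A minimal networkx sketch (the tree and the observed set are illustrative; "[1]" above refers to the paper's own reference list):

        import networkx as nx

        def jordan_infection_center(G, observed_infected):
            """Return the node minimizing the maximum shortest-path distance to the
            observed infected nodes, together with that distance."""
            best_node, best_dist = None, float("inf")
            for v in G.nodes:
                dist = max(nx.shortest_path_length(G, v, o) for o in observed_infected)
                if dist < best_dist:
                    best_node, best_dist = v, dist
            return best_node, best_dist

        # toy example: a binary tree of height 3, infection reported at three nodes
        G = nx.balanced_tree(r=2, h=3)
        observed = [7, 9, 12]
        print(jordan_infection_center(G, observed))

    On large graphs one would instead run a single breadth-first search from each observed node and take the elementwise maximum of the resulting distance arrays, rather than calling a shortest-path routine per candidate node.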

    Learning the Joint Representation of Heterogeneous Temporal Events for Clinical Endpoint Prediction

    The availability of large amounts of electronic health records (EHR) provides huge opportunities to improve health care services by mining these data. One important application is clinical endpoint prediction, which aims to predict whether a disease, a symptom or an abnormal lab test will happen in the future according to patients' history records. This paper develops deep learning techniques for clinical endpoint prediction, which are effective in many practical applications. However, the problem is very challenging since patients' history records contain multiple heterogeneous temporal events such as lab tests, diagnoses, and drug administrations. The visiting patterns of different types of events vary significantly, and there exist complex nonlinear relationships between different events. In this paper, we propose a novel model for learning the joint representation of heterogeneous temporal events. The model adds a new gate to control the visiting rates of different events, which effectively models the irregular patterns of different events and their nonlinear correlations. Experimental results with real-world clinical data on the tasks of predicting death and abnormal lab tests demonstrate the effectiveness of our proposed approach over competitive baselines.
    Comment: 8 pages, this paper has been accepted by AAAI 201
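
    The abstract does not spell out the new gate, so the numpy sketch below only illustrates the general idea of gating each event type's embedding by how regularly that event is observed before feeding it into a recurrent update; all names, shapes, and the exact form of the gate here are assumptions, not the authors' architecture:

        import numpy as np

        rng = np.random.default_rng(0)

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        # illustrative sizes: E event types, D embedding size, H hidden size
        E, D, H = 3, 8, 16
        W_e = rng.standard_normal((E, D)) * 0.1       # one embedding per event type
        W_h = rng.standard_normal((H, H + D)) * 0.1   # recurrent update weights
        W_g = rng.standard_normal((E, E)) * 0.1       # event-rate ("visiting") gate weights

        def step(h, event_counts, time_gaps):
            """One recurrent update over a single visit (illustrative, not the published cell).
            event_counts: events observed at this visit (length E).
            time_gaps: time since each event type was last observed (length E).
            The extra gate g down-weights event types that are seen irregularly."""
            g = sigmoid(W_g @ (-time_gaps))           # long gaps -> smaller gates
            x = (g * event_counts) @ W_e              # gated mixture of event embeddings
            return np.tanh(W_h @ np.concatenate([h, x]))

        h = np.zeros(H)
        h = step(h,
                 event_counts=np.array([1.0, 0.0, 2.0]),
                 time_gaps=np.array([0.5, 30.0, 1.0]))
        print(h.shape)   # (16,)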