86 research outputs found

    Fidelity-Weighted Learning

    Full text link
    Training deep neural networks requires many training samples, but in practice training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality versus-quantity trade-off in the learning process. Do we learn from the small amount of high-quality data or the potentially large amount of weakly-labeled data? We argue that if the learner could somehow know and take the label-quality into account when learning the data representation, we could get the best of both worlds. To this end, we propose "fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network (trained on the task we care about) on a per-sample basis according to the posterior confidence of its label-quality estimated by a teacher (who has access to the high-quality labels). Both student and teacher are learned from the data. We evaluate FWL on two tasks in information retrieval and natural language processing where we outperform state-of-the-art alternative semi-supervised methods, indicating that our approach makes better use of strong and weak labels, and leads to better task-dependent data representations.Comment: Published as a conference paper at ICLR 201

    Distributed Computing in the Asynchronous LOCAL model

    Full text link
    The LOCAL model is among the main models for studying locality in the framework of distributed network computing. This model is however subject to pertinent criticisms, including the facts that all nodes wake up simultaneously, perform in lock steps, and are failure-free. We show that relaxing these hypotheses to some extent does not hurt local computing. In particular, we show that, for any construction task TT associated to a locally checkable labeling (LCL), if TT is solvable in tt rounds in the LOCAL model, then TT remains solvable in O(t)O(t) rounds in the asynchronous LOCAL model. This improves the result by Casta\~neda et al. [SSS 2016], which was restricted to 3-coloring the rings. More generally, the main contribution of this paper is to show that, perhaps surprisingly, asynchrony and failures in the computations do not restrict the power of the LOCAL model, as long as the communications remain synchronous and failure-free

    Learning from Samples of Variable Quality

    Get PDF
    Training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality-versus-quantity trade-off in the learning process. Do we learn from the small amount of high-quality data or the potentially large amount of weakly-labeled data? We argue that if the learner could somehow know and take the label-quality into account, we could get the best of both worlds. To this end, we introduce “fidelity-weighted learning” (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network, trained on the task we care about on a per-sample basis according to the posterior confidence of its label-quality estimated by a teacher, who has access to limited samples with high-quality labels

    Learning from Samples of Variable Quality

    Get PDF
    Training labels are expensive to obtain and may be of varying quality, as some may be from trusted expert labelers while others might be from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality-versus-quantity trade-off in the learning process. Do we learn from the small amount of high-quality data or the potentially large amount of weakly-labeled data? We argue that if the learner could somehow know and take the label-quality into account, we could get the best of both worlds. To this end, we introduce “fidelity-weighted learning” (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network, trained on the task we care about on a per-sample basis according to the posterior confidence of its label-quality estimated by a teacher, who has access to limited samples with high-quality labels

    Use of Enumerative Combinatorics for proving the applicability of an asymptotic stability result on discrete-time SIS epidemics in complex networks

    Full text link
    In this paper, we justify by the use of Enumerative Combinatorics, that the results obtained in \cite{Alarcon1}, where is analysed the complex dynamics of an epidemic model to identify the nodes that contribute the most to the propagation process and because of that are good candidates to be controlled in the network in order to stabilize the network to reach the extinction state, is applicable in almost all the cases. The model analysed was proposed in \cite{Gomez1} %et al. [Phys.Rev.E 84, 036105(2011)] and results obtained in \cite{Alarcon1} implies that it is not necessary to control all nodes, but only a minimal set of nodes if the topology of the network is not regular. This result could be important in the spirit of considering policies of isolation or quarantine of those nodes to be controlled. Simulation results were presented in \cite{Alarcon1} for large free-scale and regular networks, that corroborate the theoretical findings. In this article we justify the applicability of the controllability result obtained in \cite{Alarcon1} in almost all the cases by means of the use of Combinatorics. {\em Mathematics Subjects Classification}: 05A16,34H20,58E25 {\em Keywords}: Asymptotic Graph Enumeration Problems; Network control; virus spreading

    Normalization Techniques for Sequential and Graphical Data

    Get PDF
    Normalization methods have proven to be an invaluable tool in the training of deep neural networks. In particular, Layer and Batch Normalization are commonly used to mitigate the risks of exploding and vanishing gradients. This work presents two methods which are related to these normalization techniques. The first method is Batch Normalized Preconditioning (BNP) for recurrent neural networks (RNN) and graph convolutional networks (GCN). BNP has been suggested as a technique for Fully Connected and Convolutional networks for achieving similar performance benefits to Batch Normalization by controlling the condition number of the Hessian through preconditioning on the gradients. We extend this work by applying it to Recurrent Neural Networks and Graph Convolutional Networks, two architectures which are prone to high computational costs and therefore benefit from the training acceleration provided by BNP. The second method is Assorted-Time Normalization (ATN). ATN is a normalization technique designed for use in sequential problems. It combines information from the hidden layers of the model with temporal data across the sequence dimension, which remedies a weakness of Layer Normalization in these applications

    Enumerating molecules.

    Full text link

    Efficient Frequent Subtree Mining Beyond Forests

    Get PDF
    A common paradigm in distance-based learning is to embed the instance space into some appropriately chosen feature space equipped with a metric and to define the dissimilarity between instances by the distance of their images in the feature space. If the instances are graphs, then frequent connected subgraphs are a well-suited pattern language to define such feature spaces. Identifying the set of frequent connected subgraphs and subsequently computing embeddings for graph instances, however, is computationally intractable. As a result, existing frequent subgraph mining algorithms either restrict the structural complexity of the instance graphs or require exponential delay between the output of subsequent patterns. Hence distance-based learners lack an efficient way to operate on arbitrary graph data. To resolve this problem, in this thesis we present a mining system that gives up the demand on the completeness of the pattern set to instead guarantee a polynomial delay between subsequent patterns. Complementing this, we devise efficient methods to compute the embedding of arbitrary graphs into the Hamming space spanned by our pattern set. As a result, we present a system that allows to efficiently apply distance-based learning methods to arbitrary graph databases. To overcome the computational intractability of the mining step, we consider only frequent subtrees for arbitrary graph databases. This restriction alone, however, does not suffice to make the problem tractable. We reduce the mining problem from arbitrary graphs to forests by replacing each graph by a polynomially sized forest obtained from a random sample of its spanning trees. This results in an incomplete mining algorithm. However, we prove that the probability of missing a frequent subtree pattern is low. We show empirically that this is true in practice even for very small sized forests. As a result, our algorithm is able to mine frequent subtrees in a range of graph databases where state-of-the-art exact frequent subgraph mining systems fail to produce patterns in reasonable time or even at all. Furthermore, the predictive performance of our patterns is comparable to that of exact frequent connected subgraphs, where available. The above method considers polynomially many spanning trees for the forest, while many graphs have exponentially many spanning trees. The number of patterns found by our mining algorithm can be negatively influenced by this exponential gap. We hence propose a method that can (implicitly) consider forests of exponential size, while remaining computationally tractable. This results in a higher recall for our incomplete mining algorithm. Furthermore, the methods extend the known positive results on the tractability of exact frequent subtree mining to a novel class of transaction graphs. We conjecture that the next natural extension of our results to a larger transaction graph class is at least as difficult as proving whether P = NP, or not. Regarding the graph embedding step, we apply a similar strategy as in the mining step. We represent a novel graph by a forest of its spanning trees and decide whether the frequent trees from the mining step are subgraph isomorphic to this forest. As a result, the embedding computation has one-sided error with respect to the exact subgraph isomorphism test but is computationally tractable. Furthermore, we show that we can leverage a partial order on the pattern set. This structure can be used to reduce the runtime of the embedding computation dramatically. For the special case of Jaccard-similarity between graph embeddings, a further substantial reduction of runtime can be achieved using min-hashing. The Jaccard-distance can be approximated using small sketch vectors that can be computed fast, again using the partial order on the tree patterns
    corecore