    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo

    Distance Metric Learning using Graph Convolutional Networks: Application to Functional Brain Networks

    Evaluating similarity between graphs is of major importance in several computer vision and pattern recognition problems, where graph representations are often used to model objects or interactions between elements. The choice of a distance or similarity metric is, however, not trivial and can be highly dependent on the application at hand. In this work, we propose a novel metric learning method to evaluate distance between graphs that leverages the power of convolutional neural networks, while exploiting concepts from spectral graph theory to allow these operations on irregular graphs. We demonstrate the potential of our method in the field of connectomics, where neuronal pathways or functional connections between brain regions are commonly modelled as graphs. In this problem, the definition of an appropriate graph similarity function is critical to unveil patterns of disruptions associated with certain brain disorders. Experimental results on the ABIDE dataset show that our method can learn a graph similarity metric tailored for a clinical application, improving the performance of a simple k-nn classifier by 11.9% compared to a traditional distance metric. Comment: International Conference on Medical Image Computing and Computer-Assisted Interventions (MICCAI) 201

    Generalizing Kronecker graphs in order to model searchable networks

    This paper describes an extension to stochastic Kronecker graphs that provides the special structure required for searchability, by defining a “distance”-dependent Kronecker operator. We show how this extension of Kronecker graphs can generate several existing social network models, such as the Watts-Strogatz small-world model and Kleinberg’s lattice-based model. We focus on a specific example of an expanding hypercube, reminiscent of recently proposed social network models based on a hidden hyperbolic metric space, and prove that a greedy forwarding algorithm can find very short paths of length O((log log n)^2) for graphs with n nodes.

    Ramsey expansions of metrically homogeneous graphs

    We discuss the Ramsey property, the existence of a stationary independence relation and the coherent extension property for partial isometries (coherent EPPA) for all classes of metrically homogeneous graphs from Cherlin's catalogue, which is conjectured to include all such structures. We show that, with the exception of tree-like graphs, all metric spaces in the catalogue have precompact Ramsey expansions (or lifts) with the expansion property. With two exceptions we can also characterise the existence of a stationary independence relation and the coherent EPPA. Our results can be seen as a new contribution to Nešetřil's classification programme of Ramsey classes and as empirical evidence of the recent convergence in techniques employed to establish the Ramsey property, the expansion (or lift or ordering) property, EPPA and the existence of a stationary independence relation. At the heart of our proof is a canonical way of completing edge-labelled graphs to metric spaces in Cherlin's classes. The existence of such a "completion algorithm" then allows us to apply several strong results in the areas that imply EPPA and respectively the Ramsey property. The main results have numerous corollaries on the automorphism groups of the Fraïssé limits of the classes, such as amenability, unique ergodicity, existence of universal minimal flows, ample generics, small index property, 21-Bergman property and Serre's property (FA). Comment: 57 pages, 14 figures. Extends results of arXiv:1706.00295. Minor revisio