22 research outputs found

    Geometric Pattern Matching Reduces to k-SUM

    Get PDF
    We prove that some exact geometric pattern matching problems reduce in linear time to k-SUM when the pattern has a fixed size k. This holds in the real RAM model for searching for a similar copy of a set of k ≥ 3 points within a set of n points in the plane, and for searching for an affine image of a set of k ≥ d+2 points within a set of n points in d-space. As corollaries, we obtain improved real RAM algorithms and decision trees for the two problems. In particular, they can be solved by algebraic decision trees of near-linear height.
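    To make the problem statement concrete, here is a minimal brute-force sketch in Python (the function name has_similar_copy and the tolerance EPS are my own, not the paper's): it searches for a similar copy of a small planar pattern by guessing the images of two pattern points. It is only a naive baseline illustrating the problem, not the linear-time reduction to k-SUM proved in the paper.

```python
# Naive baseline: find a *similar* copy (translation + rotation + uniform
# scaling) of a k-point pattern inside an n-point set in the plane.
from itertools import permutations

EPS = 1e-9  # tolerance for floating-point membership tests

def has_similar_copy(pattern, points):
    """pattern, points: lists of (x, y) tuples; pattern points assumed distinct."""
    P = [complex(x, y) for x, y in pattern]
    S = [complex(x, y) for x, y in points]

    def in_S(z):
        return any(abs(z - s) <= EPS for s in S)

    # A direct similarity of the plane is z -> a*z + b with complex a != 0, b.
    # Guessing the images of the first two pattern points determines (a, b);
    # mirror copies could be covered by also trying the conjugated pattern.
    for s1, s2 in permutations(S, 2):
        a = (s2 - s1) / (P[1] - P[0])
        b = s1 - a * P[0]
        if all(in_S(a * p + b) for p in P[2:]):
            return True
    return False

print(has_similar_copy([(0, 0), (1, 0), (0, 1)],
                       [(2, 2), (4, 2), (2, 4), (7, 7)]))  # True: copy scaled by 2
```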

    Reduction Algorithms for Persistence Diagrams of Networks: CoralTDA and PrunIT

    Full text link
    Topological data analysis (TDA) delivers invaluable and complementary information on the intrinsic properties of data that is inaccessible to conventional methods. However, high computational costs remain the primary roadblock hindering the successful application of TDA in real-world studies, particularly with machine learning on large complex networks. Indeed, most modern networks such as citation, blockchain, and online social networks often have hundreds of thousands of vertices, making the application of existing TDA methods infeasible. We develop two new, remarkably simple but effective algorithms to compute the exact persistence diagrams of large graphs to address this major TDA limitation. First, we prove that the $(k+1)$-core of a graph $\mathcal{G}$ suffices to compute its $k^{th}$ persistence diagram, $PD_k(\mathcal{G})$. Second, we introduce a pruning algorithm for graphs to compute their persistence diagrams by removing the dominated vertices. Our experiments on large networks show that our novel approach can achieve computational gains of up to 95%. The developed framework provides the first bridge between graph theory and TDA, with applications in machine learning of large complex networks. Our implementation is available at https://github.com/cakcora/PersistentHomologyWithCoralPrunit. Comment: Spotlight paper at NeurIPS 2022.
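    A hedged sketch of how the core-reduction result could be used in practice, assuming networkx for the graph side; the persistence computation itself is left to whatever TDA backend one prefers (the authors' own implementation lives in the repository linked above), and the helper name coral_reduce is mine.

```python
# CoralTDA-style reduction: since the (k+1)-core of a graph suffices to compute
# its k-th persistence diagram, shrink the graph before running persistence.
import networkx as nx

def coral_reduce(G, k):
    """Return the (k+1)-core of G, which is enough to compute PD_k(G)."""
    return nx.k_core(G, k + 1)

G = nx.erdos_renyi_graph(10_000, 0.0002, seed=0)   # toy sparse network
H = coral_reduce(G, k=1)                           # enough for the 1st persistence diagram
print(G.number_of_nodes(), "->", H.number_of_nodes())
# pd_1 = some_persistence_backend(H)  # same diagram, much smaller input (stub)
```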

    Computation of Cycle Bases in Surface Embedded Graphs

    Get PDF

    LIPIcs, Volume 258, SoCG 2023, Complete Volume

    Get PDF

    Space Efficient Two-Dimensional Orthogonal Colored Range Counting

    Get PDF
    In the two-dimensional orthogonal colored range counting problem, we preprocess a set, $P$, of $n$ colored points in the plane, such that given an orthogonal query rectangle, the number of distinct colors of the points contained in this rectangle can be computed efficiently. For this problem, we design three new solutions, and the bounds of each can be expressed in some form of time-space tradeoff. By setting appropriate parameter values for these solutions, we can achieve the following new specific results (space bounds are in words and $\epsilon$ is an arbitrary constant in $(0,1)$):

    * $O(n\lg^3 n)$ space and $O(\sqrt{n}\lg^{5/2} n \lg\lg n)$ query time;
    * $O(n\lg^2 n)$ space and $O(\sqrt{n}\lg^{4+\epsilon} n)$ query time;
    * $O(n\frac{\lg^2 n}{\lg\lg n})$ space and $O(\sqrt{n}\lg^{5+\epsilon} n)$ query time;
    * $O(n\lg n)$ space and $O(n^{1/2+\epsilon})$ query time.

    A known conditional lower bound for this problem based on Boolean matrix multiplication gives some evidence of the difficulty of achieving near-linear space solutions with query time better than $\sqrt{n}$ by more than a polylogarithmic factor using purely combinatorial approaches. Thus the time and space bounds in all these results are efficient. Previously, among solutions with similar query times, the most space-efficient solution uses $O(n\lg^4 n)$ space to answer queries in $O(\sqrt{n}\lg^8 n)$ time (SIAM J. Comput. 2008). Thus the new results listed above all achieve improvements in space efficiency, while all but the last result achieve a speed-up in query time as well. Comment: full version of an ESA 2021 paper.
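    For concreteness, here is a naive linear-scan baseline for the query described above (the helper name distinct_colors_in_rect is mine); it only pins down the input/output convention, while the paper's data structures answer the same query in roughly $\sqrt{n}$ times polylogarithmic time after preprocessing.

```python
# Count the distinct colors among the points lying in an axis-aligned rectangle.
def distinct_colors_in_rect(points, x1, x2, y1, y2):
    """points: list of (x, y, color); query rectangle is [x1, x2] x [y1, y2]."""
    return len({c for x, y, c in points if x1 <= x <= x2 and y1 <= y <= y2})

pts = [(1, 1, "red"), (2, 3, "blue"), (4, 4, "red"), (5, 1, "green")]
print(distinct_colors_in_rect(pts, 0, 4, 0, 4))  # -> 2 (red and blue)
```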

    Stable topological summaries for analyzing the organization of cells in a packed tissue

    Get PDF
    We use topological data analysis tools for studying the inner organization of cells in segmented images of epithelial tissues. More specifically, for each segmented image, we compute different persistence barcodes, which codify the lifetime of homology classes (persistent homology) along different filtrations (increasing nested sequences of simplicial complexes) that are built from the regions representing the cells in the tissue. We use a complete and well-grounded set of numerical variables over those persistence barcodes, also known as topological summaries. A novel combination of normalization methods for both the set of input segmented images and the produced barcodes allows us to prove stability results for those variables with respect to small changes in the input, as well as invariance to image scale. Our study provides new insights into this problem, such as a possible novel indicator for the development of the Drosophila wing disc tissue, or the importance of the centroids' distribution in differentiating some tissues from their CVT-path counterparts (a mathematical model of epithelia based on Voronoi diagrams). We also show how the use of topological summaries may improve the classification accuracy of epithelial images using a Random Forest algorithm. Ministerio de Ciencia e Innovación PID2019-107339GB-I0
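    A generic, hedged sketch of the barcode-computation step, assuming the gudhi library and a toy point cloud standing in for cell centroids; the paper's actual filtrations are built from the segmented cell regions themselves and its normalization and summary pipeline is not reproduced here.

```python
# Compute a persistence barcode from a small point cloud of "cell centroids".
import gudhi

centroids = [(0.0, 0.0), (1.0, 0.1), (0.9, 1.0), (0.1, 0.9), (3.0, 3.0)]

rips = gudhi.RipsComplex(points=centroids, max_edge_length=2.0)
st = rips.create_simplex_tree(max_dimension=2)

# Each entry is (homology dimension, (birth, death)) -- one bar of the barcode.
for dim, (birth, death) in st.persistence():
    print(f"H{dim}: born {birth:.2f}, dies {death:.2f}")
```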

    Learning Neural Graph Representations in Non-Euclidean Geometries

    Get PDF
    The success of Deep Learning methods is heavily dependent on the choice of the data representation. For that reason, much of the actual effort goes into Representation Learning, which seeks to design preprocessing pipelines and data transformations that can support effective learning algorithms. The aim of Representation Learning is to facilitate the task of extracting useful information for classifiers and other predictor models. In this regard, graphs arise as a convenient data structure that serves as an intermediary representation in a wide range of problems. The predominant approach to working with graphs has been to embed them in a Euclidean space, due to the power and simplicity of this geometry. Nevertheless, data in many domains exhibit non-Euclidean features, making embeddings into Riemannian manifolds with a richer structure necessary. The choice of a metric space in which to embed the data imposes a geometric inductive bias, with a direct impact on the performance of the models. This thesis is about learning neural graph representations in non-Euclidean geometries and showcasing their applicability in different downstream tasks. We introduce a toolkit formed by different graph metrics with the goal of characterizing the topology of the data. In that way, we can choose a suitable target embedding space aligned to the shape of the dataset. By virtue of the geometric inductive bias provided by the structure of the non-Euclidean manifolds, neural models can achieve higher performance with a reduced parameter footprint.

    As a first step, we study graphs with hierarchical structures. We develop different techniques to derive hierarchical graphs from large label inventories. Noticing the capacity of hyperbolic spaces to represent tree-like arrangements, we incorporate this information into an NLP model through hyperbolic graph embeddings and showcase the higher performance that they enable.

    Second, we tackle the question of how to learn hierarchical representations suited for different downstream tasks. We introduce a model that jointly learns task-specific graph embeddings from a label inventory and performs classification in hyperbolic space. The model achieves state-of-the-art results on very fine-grained labels, with a remarkable reduction in parameter size.

    Next, we move to matrix manifolds to work on graphs with diverse structures and properties. We propose a general framework to implement the mathematical tools required to learn graph embeddings on symmetric spaces. These spaces are of particular interest given that they have a compound geometry that simultaneously contains Euclidean as well as hyperbolic subspaces, allowing them to automatically adapt to dissimilar features in the graph. We demonstrate a concrete implementation of the framework on Siegel spaces, showcasing their versatility on different tasks.

    Finally, we focus on multi-relational graphs. We devise the means to translate Euclidean and hyperbolic multi-relational graph embedding models into the space of symmetric positive definite (SPD) matrices. To do so, we develop gyrocalculus in this geometry and integrate it with the aforementioned framework.
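    As a small illustration of the non-Euclidean geometries involved, here is the standard Poincaré-ball distance in plain numpy (the thesis also works with Siegel and SPD manifolds, which are not shown); points near the boundary of the unit ball are exponentially far apart, which is what makes hyperbolic space a good fit for tree-like, hierarchical graphs.

```python
# Geodesic distance in the Poincare ball model of hyperbolic space.
import numpy as np

def poincare_distance(u, v):
    """Distance between points u, v inside the open unit ball."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / denom)

root, child = [0.0, 0.0], [0.5, 0.0]
deep1, deep2 = [0.95, 0.0], [0.0, 0.95]
print(poincare_distance(root, child))   # modest distance near the origin
print(poincare_distance(deep1, deep2))  # much larger: both points sit near the boundary
```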

    LIPIcs, Volume 277, GIScience 2023, Complete Volume

    Get PDF

    LIPIcs, Volume 248, ISAAC 2022, Complete Volume

    Get PDF