22 research outputs found
Geometric Pattern Matching Reduces to k-SUM
We prove that some exact geometric pattern matching problems reduce in linear time to k-SUM when the pattern has a fixed size k. This holds in the real RAM model for searching for a similar copy of a set of k ≥ 3 points within a set of n points in the plane, and for searching for an affine image of a set of k ≥ d+2 points within a set of n points in d-space.
As corollaries, we obtain improved real RAM algorithms and decision trees for the two problems. In particular, they can be solved by algebraic decision trees of near-linear height.
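For readers unfamiliar with the target problem, k-SUM asks whether some k of n given numbers sum to zero. The following is a minimal brute-force sketch of that decision problem only; it is not the reduction or the improved algorithm from the paper, and the function name `has_k_sum` is a hypothetical choice made for illustration.

```python
from itertools import combinations


def has_k_sum(values, k):
    """Return True if some k entries of `values` (at distinct positions) sum to zero.

    Brute force over all k-subsets, so the running time is O(n^k).
    Integers are assumed here to avoid floating-point comparison issues.
    """
    return any(sum(combo) == 0 for combo in combinations(values, k))


if __name__ == "__main__":
    print(has_k_sum([5, -2, -3, 9], 3))  # 5 + (-2) + (-3) == 0
    print(has_k_sum([1, 2, 4], 3))       # no zero-sum triple
```

The interest of the reduction is that any faster k-SUM algorithm or decision tree then carries over to the pattern matching problems.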
Reduction Algorithms for Persistence Diagrams of Networks: CoralTDA and PrunIT
Topological data analysis (TDA) delivers invaluable and complementary
information on the intrinsic properties of data inaccessible to conventional
methods. However, high computational costs remain the primary roadblock
hindering the successful application of TDA in real-world studies, particularly
with machine learning on large complex networks.
Indeed, most modern networks such as citation, blockchain, and online social
networks often have hundreds of thousands of vertices, making the application
of existing TDA methods infeasible. We develop two new, remarkably simple but
effective algorithms to compute the exact persistence diagrams of large graphs
to address this major TDA limitation. First, we prove that the (k+1)-core of a
graph G suffices to compute its k-th persistence diagram,
PD_k(G). Second, we introduce a pruning algorithm for graphs to
compute their persistence diagrams by removing the dominated vertices. Our
experiments on large networks show that our novel approach can achieve
computational gains up to 95%.
The developed framework provides the first bridge between graph theory
and TDA, with applications in machine learning of large complex networks. Our
implementation is available at
https://github.com/cakcora/PersistentHomologyWithCoralPrunit
Comment: Spotlight paper at NeurIPS 202
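The first result in this abstract relies on graph cores. As a rough illustration, the standard k-core (the maximal subgraph in which every vertex has degree at least k) can be obtained by repeatedly peeling low-degree vertices. The sketch below assumes an undirected graph stored as a dictionary of neighbour sets; the name `k_core` is hypothetical, and this is not the authors' implementation from the repository above.

```python
def k_core(adj, k):
    """Return the vertex set of the k-core of an undirected graph.

    adj: dict mapping each vertex to the set of its neighbours.
    Vertices of degree < k are removed repeatedly until none remain.
    """
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in adj if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already peeled via an earlier stack entry
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return alive


if __name__ == "__main__":
    # Triangle a-b-c with a pendant vertex d attached to a.
    graph = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    print(sorted(k_core(graph, 2)))
```

Running the persistence computation on the core instead of the full graph is where the reported savings would come from, since the core is often much smaller than the original network.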
LIPIcs, Volume 258, SoCG 2023, Complete Volume
Space Efficient Two-Dimensional Orthogonal Colored Range Counting
In the two-dimensional orthogonal colored range counting problem, we
preprocess a set, P, of n colored points on the plane, such that given an
orthogonal query rectangle, the number of distinct colors of the points
contained in this rectangle can be computed efficiently.
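To make the query concrete, here is a naive baseline that answers one query by scanning every point. It uses no preprocessing and takes linear time per query, so it only illustrates the problem statement, not any of the data structures in the paper. The tuple layout `(x, y, color)` and the name `count_colors` are assumptions made for this sketch.

```python
def count_colors(points, x1, x2, y1, y2):
    """Count distinct colors among points inside the closed rectangle
    [x1, x2] x [y1, y2].

    points: iterable of (x, y, color) tuples.
    """
    return len({
        color
        for x, y, color in points
        if x1 <= x <= x2 and y1 <= y <= y2
    })


if __name__ == "__main__":
    pts = [(1, 1, "red"), (2, 3, "blue"), (2, 2, "red"), (5, 5, "green")]
    print(count_colors(pts, 0, 3, 0, 3))
```

Note that the answer is the number of distinct colors, not the number of points, which is what makes the problem harder than ordinary range counting.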
For this problem, we design three new solutions, and the bounds of each can
be expressed in some form of time-space tradeoff.
By setting appropriate parameter values for these solutions, we can achieve
four new specific space and query-time tradeoffs (space is measured in words).
A known conditional lower bound to this problem based on Boolean matrix
multiplication gives some evidence on the difficulty of achieving near-linear
space solutions with query time better than √n by more than a
polylogarithmic factor using purely combinatorial approaches. Thus the time and
space bounds in all these results are efficient.
Previously, the most space-efficient solution with similar query times was
that of an earlier work (SIAM J. Comp. 2008).
Thus the new results all achieve improvements in space
efficiency, while all but the last result achieve speed-up in query time as
well.
Comment: full version of an ESA 2021 paper
Stable topological summaries for analyzing the organization of cells in a packed tissue
We use topological data analysis tools for studying the inner organization of cells in
segmented images of epithelial tissues. More specifically, for each segmented image, we compute
different persistence barcodes, which codify the lifetime of homology classes (persistent homology)
along different filtrations (increasing nested sequences of simplicial complexes) that are built from
the regions representing the cells in the tissue. We use a complete and well-grounded set of numerical
variables over those persistence barcodes, also known as topological summaries. A novel combination
of normalization methods for both the set of input segmented images and the produced barcodes
allows for the proven stability results for those variables with respect to small changes in the input,
as well as invariance to image scale. Our study provides new insights into this problem, such as a
possible novel indicator for the development of the Drosophila wing disc tissue or the importance of
centroids’ distribution to differentiate some tissues from their CVT-path counterpart (a mathematical
model of epithelia based on Voronoi diagrams). We also show how the use of topological summaries
may improve the classification accuracy of epithelial images using a Random Forest algorithm.
Ministerio de Ciencia e Innovación PID2019-107339GB-I0
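As an illustration of what a topological summary of a barcode can look like, the sketch below computes persistent entropy, one commonly used numerical variable. The abstract does not say which variables the authors use, so this choice is an assumption, and the function name `persistent_entropy` is hypothetical. A barcode is taken to be a list of finite `(birth, death)` pairs.

```python
import math


def persistent_entropy(bars):
    """Shannon entropy of the normalized bar lengths of a barcode.

    bars: iterable of (birth, death) pairs with finite death values.
    Bars of zero or negative length are ignored.
    """
    lengths = [death - birth for birth, death in bars if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in lengths)


if __name__ == "__main__":
    barcode = [(0.0, 1.0), (0.0, 3.0)]
    scaled = [(0.0, 10.0), (0.0, 30.0)]
    # Normalizing by the total length makes the value independent of scale.
    print(persistent_entropy(barcode), persistent_entropy(scaled))
```

Because the bar lengths are divided by their sum, rescaling every bar by the same factor leaves the value unchanged, which is the kind of scale invariance the abstract mentions.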
Learning Neural Graph Representations in Non-Euclidean Geometries
The success of Deep Learning methods is heavily dependent on the choice of the data representation. For that reason, much of the actual effort goes into Representation Learning, which seeks to design preprocessing pipelines and data transformations that can support effective learning algorithms. The aim of Representation Learning is to facilitate the task of extracting useful information for classifiers and other predictor models. In this regard, graphs arise as a convenient data structure that serves as an intermediary representation in a wide range of problems. The predominant approach to working with graphs has been to embed them in a Euclidean space, due to the power and simplicity of this geometry. Nevertheless, data in many domains exhibit non-Euclidean features, making embeddings into Riemannian manifolds with a richer structure necessary. The choice of the metric space in which to embed the data imposes a geometric inductive bias, with a direct impact on the performance of the models.
This thesis is about learning neural graph representations in non-Euclidean geometries and showcasing their applicability in different downstream tasks. We introduce a toolkit formed by different graph metrics with the goal of characterizing the topology of the data. In that way, we can choose a suitable target embedding space aligned to the shape of the dataset. By virtue of the geometric inductive bias provided by the structure of the non-Euclidean manifolds, neural models can achieve higher performances with a reduced parameter footprint.
As a first step, we study graphs with hierarchical structures. We develop different techniques to derive hierarchical graphs from large label inventories. Noticing the capacity of hyperbolic spaces to represent tree-like arrangements, we incorporate this information into an NLP model through hyperbolic graph embeddings and showcase the higher performance that they enable.
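To give a sense of why hyperbolic spaces suit tree-like data, the sketch below computes the geodesic distance in the Poincaré ball, one standard model of hyperbolic space. The abstract does not name the model used in the thesis, so the Poincaré ball is an assumption here, and `poincare_distance` is a hypothetical helper name.

```python
import math


def poincare_distance(x, y):
    """Geodesic distance between two points of the open unit ball
    under the Poincare ball model of hyperbolic space.

    d(x, y) = arcosh(1 + 2 * |x - y|^2 / ((1 - |x|^2) * (1 - |y|^2)))
    """
    def sq_norm(v):
        return sum(c * c for c in v)

    nx, ny = sq_norm(x), sq_norm(y)
    if nx >= 1.0 or ny >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq_norm([a - b for a, b in zip(x, y)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nx) * (1.0 - ny)))


if __name__ == "__main__":
    origin = (0.0, 0.0)
    print(poincare_distance(origin, (0.5, 0.0)))
    print(poincare_distance(origin, (0.99, 0.0)))
```

Distances grow without bound as points approach the boundary of the ball, which leaves room for the exponentially many leaves of a tree to sit far apart from one another.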
Second, we tackle the question of how to learn hierarchical representations suited for different downstream tasks. We introduce a model that jointly learns task-specific graph embeddings from a label inventory and performs classification in hyperbolic space. The model achieves state-of-the-art results on very fine-grained labels, with a remarkable reduction of the parameter size.
Next, we move to matrix manifolds to work on graphs with diverse structures and properties. We propose a general framework to implement the mathematical tools required to learn graph embeddings on symmetric spaces. These spaces are of particular interest given that they have a compound geometry that simultaneously contains Euclidean as well as hyperbolic subspaces, allowing them to automatically adapt to dissimilar features in the graph. We demonstrate a concrete implementation of the framework on Siegel spaces, showcasing their versatility on different tasks.
Finally, we focus on multi-relational graphs. We devise the means to translate Euclidean and hyperbolic multi-relational graph embedding models into the space of symmetric positive definite (SPD) matrices. To do so, we develop gyrocalculus in this geometry and integrate it with the aforementioned framework.
LIPIcs, Volume 277, GIScience 2023, Complete Volume
LIPIcs, Volume 248, ISAAC 2022, Complete Volume