Classical and quantum algorithms for scaling problems
This thesis is concerned with scaling problems, which have a plethora of connections to different areas of mathematics, physics and computer science. Although many structural aspects of these problems are understood by now, we only know how to solve them efficiently in special cases. We give new algorithms for non-commutative scaling problems with complexity guarantees that match the prior state of the art. To this end, we extend the well-known (self-concordance based) interior-point method (IPM) framework to Riemannian manifolds, motivated by its success in the commutative setting. Moreover, the IPM framework does not obviously suffer from the same obstructions to efficiency as previous methods. It also yields the first high-precision algorithms for other natural geometric problems in non-positive curvature. For the (commutative) problems of matrix scaling and balancing, we show that quantum algorithms can outperform the (already very efficient) state-of-the-art classical algorithms. Their time complexity can be sublinear in the input size; in certain parameter regimes they are also optimal, whereas in others we show no quantum speedup over the classical methods is possible. Along the way, we provide improvements over the long-standing state of the art for searching for all marked elements in a list, and computing the sum of a list of numbers. We identify a new application in the context of tensor networks for quantum many-body physics. We define a computable canonical form for uniform projected entangled pair states (as the solution to a scaling problem), circumventing previously known undecidability results. We also show, by characterizing the invariant polynomials, that the canonical form is determined by evaluating the tensor network contractions on networks of bounded size.
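For intuition, the commutative matrix scaling problem that the quantum speedups target can already be solved to moderate precision by the classical Sinkhorn iteration. A minimal sketch of that baseline (not the thesis's IPM or quantum algorithms):

```python
import numpy as np

def sinkhorn_scale(A, iters=500):
    """Alternately normalize rows and columns of a positive matrix A;
    the iterates converge to a doubly stochastic matrix D1 @ A @ D2."""
    B = np.array(A, dtype=float)
    for _ in range(iters):
        B /= B.sum(axis=1, keepdims=True)  # make row sums 1
        B /= B.sum(axis=0, keepdims=True)  # make column sums 1
    return B

B = sinkhorn_scale(np.random.rand(4, 4) + 0.1)
```

After enough iterations both the row and column sums of `B` are close to 1; the high-precision regime is exactly where the IPM-based methods discussed above take over.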
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Analog Photonics Computing for Information Processing, Inference and Optimisation
This review presents an overview of the current state-of-the-art in photonics
computing, which leverages photons, photons coupled with matter, and
optics-related technologies for effective and efficient computational purposes.
It covers the history and development of photonics computing and modern
analogue computing platforms and architectures, focusing on optimization tasks
and neural network implementations. The authors examine special-purpose
optimizers, mathematical descriptions of photonics optimizers, and their
various interconnections. Disparate applications are discussed, including
direct encoding, logistics, finance, phase retrieval, machine learning, neural
networks, probabilistic graphical models, and image processing, among many
others. The main directions of technological advancement and associated
challenges in photonics computing are explored, along with an assessment of its
efficiency. Finally, the paper discusses prospects and the field of optical
quantum computing, providing insights into the potential applications of this
technology.
Comment: Invited submission by Journal of Advanced Quantum Technologies;
accepted version 5/06/202
Automated detection of tumoural cells with graph neural networks
The detection of tumoural cells from whole-slide images is an essential task in medical diagnosis and research. In this thesis, we propose and analyse a novel approach that combines computer-vision-based models with graph neural networks to improve the accuracy of automated tumoural cell detection. Our proposal leverages the inherent structure of, and relationships between, cells in the tissue. Experimental results on our own curated dataset show that several metrics improve by up to 15% compared to using the computer vision approach alone. The method has been shown to work with H&E-stained lung tissue and HER2-stained breast tissue. We believe that our proposed method has the potential to improve the accuracy of automated tumoural cell detection, which can lead to faster diagnoses and accelerated research in the field by reducing the workload of histopathologists.
Geometric Learning on Graph Structured Data
Graphs provide a ubiquitous and universal data structure that can be applied in many domains such as social networks, biology, chemistry, physics, and computer science. In this thesis we focus on two fundamental paradigms in graph learning: representation learning and similarity learning over graph-structured data. Graph representation learning aims to learn embeddings for nodes by integrating topological and feature information of a graph. Graph similarity learning employs similarity functions that compute the similarity between pairs of graphs in a vector space. We address several challenging issues in these two paradigms, designing powerful, yet efficient and theoretically guaranteed machine learning models that can leverage the rich topological structural properties of real-world graphs.
This thesis is structured into two parts. In the first part of the thesis, we will present how to develop powerful Graph Neural Networks (GNNs) for graph representation learning from three different perspectives: (1) spatial GNNs, (2) spectral GNNs, and (3) diffusion GNNs. We will discuss the model architecture, representational power, and convergence properties of these GNN models. Specifically, we first study how to develop expressive, yet efficient and simple message-passing aggregation schemes that can go beyond the Weisfeiler-Leman test (1-WL). We propose a generalized message-passing framework by incorporating graph structural properties into an aggregation scheme. Then, we introduce a new local isomorphism hierarchy on neighborhood subgraphs. We further develop a novel neural model, namely GraphSNN, and theoretically prove that this model is more expressive than the 1-WL test. After that, we study how to build an effective and efficient graph convolution model with spectral graph filters. In this study, we propose a spectral GNN model, called DFNets, which incorporates a novel spectral graph filter, namely feedback-looped filters. As a result, this model can provide better localization on neighborhoods while achieving fast convergence and linear memory requirements. Finally, we study how to capture the rich topological information of a graph using graph diffusion. We propose a novel GNN architecture with dynamic PageRank, based on a learnable transition matrix. We explore two variants of this GNN architecture: a forward-Euler solution and an invariable-feature solution, and theoretically prove that our forward-Euler GNN architecture is guaranteed to converge to a stationary distribution.
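For context, the 1-WL test against which these expressiveness claims are measured can be sketched in a few lines: iterated color refinement that relabels each vertex by the multiset of its neighbors' colors (GraphSNN itself is more involved than this baseline):

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """1-WL color refinement on an adjacency-list graph.
    Returns the final color histogram; two graphs with different
    histograms are guaranteed non-isomorphic."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        # Signature = own color plus sorted multiset of neighbor colors.
        sig = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
               for v in adj}
        # Deterministic relabeling instead of hashing.
        palette = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        colors = {v: palette[sig[v]] for v in adj}
    return Counter(colors.values())

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(wl_colors(triangle) != wl_colors(path))  # → True
```

The limitation motivating "beyond 1-WL" models: a 6-cycle and two disjoint triangles are both 2-regular, so 1-WL assigns them identical histograms and cannot tell them apart.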
In the second part of this thesis, we will introduce a new optimal transport distance metric on graphs in a regularized learning framework for graph kernels. This optimal transport distance metric can preserve both local and global structures between graphs during the transport, in addition to preserving features and their local variations. Furthermore, we propose two strongly convex regularization terms to theoretically guarantee the convergence and numerical stability in finding an optimal assignment between graphs. One regularization term is used to regularize a Wasserstein distance between graphs in the same ground space. This helps to preserve the local clustering structure on graphs by relaxing the optimal transport problem to a cluster-to-cluster assignment between locally connected vertices. The other regularization term is used to regularize a Gromov-Wasserstein distance between graphs across different ground spaces based on degree-entropy KL divergence. This helps to improve the matching robustness of an optimal alignment to preserve the global connectivity structure of graphs. We have evaluated our optimal-transport-based graph kernel on different benchmark tasks. The experimental results show that our models considerably outperform all the state-of-the-art methods in all benchmark tasks.
Taking shape: The data science of elastic shape analysis with practical applications
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
A mathematical curve can represent many different objects, both physical and abstract,
from the outline curve of an artefact in an image to the weight of a growing animal to
the set of frequencies used in a sound. Regardless of these variations, the curves can
almost always vary non-linearly. One way to study shapes and their potential variations
is elastic shape analysis, a rich theory of which has developed over the past twenty years.
However, methods of elastic shape analysis are seldom utilized in practical applications
on real-world data, especially outside of the mathematical shape analysis community.
Our aim in this thesis is to explore some practical applications of elastic shape analysis.
To do this, we work with various types of shape data, the majority of which are based on
image datasets. As our focus is on two-dimensional curves, it is important to be able to
robustly extract contours from images, before we can apply elastic shape analysis tools.
In order to analyse the shapes in a dataset, we turn to methods of machine learning, to
investigate the applications of elastic shape analysis in classification.
In this thesis, we introduce an anthology of projects, in order to emphasise and understand
the potential of elastic shape analysis in practical applications. There are four main
projects in this thesis: (i) Classification of objects using outlines and the comparisons
between methods of elastic shape analysis, geometric morphometrics, and human experts,
with a focus on ancient Greek vases, (ii) Mussel species identification and a demonstration
that shape may not be enough in some applications, (iii) A novel tool to monitor
the development of kākāpō chicks, and (iv) Classifying individual kiwi based on acoustic
data from their calls.
By combining tools from computer vision and machine learning with methods of elastic
shape analysis, we introduce a practical framework for the application of elastic shape
analysis, through a data science lens.
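To make the underlying machinery concrete (the abstract does not fix a specific representation, so this is an assumption): elastic shape analysis commonly works with the square-root velocity function (SRVF), under which the elastic metric reduces to the ordinary L2 metric. A minimal numpy sketch that ignores the reparametrization optimization step:

```python
import numpy as np

def srvf(curve):
    """Square-root velocity function q(t) = c'(t) / sqrt(|c'(t)|)
    of a discretely sampled curve, shape (N, d)."""
    vel = np.gradient(curve, axis=0)
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    return vel / np.sqrt(np.maximum(speed, 1e-12))  # guard against zero speed

def elastic_distance(c1, c2):
    """L2 distance between SRVFs. A full elastic distance would also
    minimize over rotations and reparametrizations; omitted here."""
    q1, q2 = srvf(c1), srvf(c2)
    return np.sqrt(np.mean(np.sum((q1 - q2) ** 2, axis=1)))
```

The distance from a curve to itself is zero, while rescaled or reshaped curves are separated; classifiers such as those used in the four projects can then operate on these pairwise distances.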
Euler Characteristic Tools For Topological Data Analysis
In this article, we study Euler characteristic techniques in topological data
analysis. Pointwise computing the Euler characteristic of a family of
simplicial complexes built from data gives rise to the so-called Euler
characteristic profile. We show that this simple descriptor achieves
state-of-the-art performance in supervised tasks at a very low computational
cost. Inspired by signal analysis, we compute hybrid transforms of Euler
characteristic profiles. These integral transforms mix Euler characteristic
techniques with Lebesgue integration to provide highly efficient compressors of
topological signals. As a consequence, they show remarkable performance in
unsupervised settings. On the qualitative side, we provide numerous heuristics
on the topological and geometric information captured by Euler profiles and
their hybrid transforms. Finally, we prove stability results for these
descriptors as well as asymptotic guarantees in random settings.
Comment: 39 pages
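A minimal sketch of the Euler characteristic profile for a filtered simplicial complex, evaluated pointwise as described above (the vectorized implementations behind the article's benchmarks are more efficient):

```python
def euler_profile(filtration, thresholds):
    """Euler characteristic curve of a filtered simplicial complex.
    `filtration` maps each simplex (tuple of vertices) to its appearance
    time; chi(t) sums (-1)^dim over simplices present at time t."""
    return [sum((-1) ** (len(s) - 1) for s, v in filtration.items() if v <= t)
            for t in thresholds]

# Filtered triangle: vertices appear at t=0, edges at t=1, the 2-face at t=2.
filt = {(0,): 0, (1,): 0, (2,): 0,
        (0, 1): 1, (0, 2): 1, (1, 2): 1,
        (0, 1, 2): 2}
print(euler_profile(filt, [0, 1, 2]))  # → [3, 0, 1]
```

The resulting vector (here: three components, one loop, then a filled disk) is the kind of cheap descriptor that the hybrid transforms further compress.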
Parallel and Flow-Based High Quality Hypergraph Partitioning
Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits.
Given a hypergraph and an integer k, the task is to divide the vertices into k disjoint blocks with bounded size, while minimizing an objective function on the hyperedges that span multiple blocks.
In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge.
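The connectivity metric can be sketched directly, assuming the usual (λ − 1) formulation where λ(e) is the number of distinct blocks touched by hyperedge e:

```python
def connectivity_objective(hyperedges, partition):
    """Sum over hyperedges of (lambda(e) - 1), where lambda(e) is the
    number of distinct blocks that e's vertices are assigned to.
    A hyperedge fully inside one block contributes 0."""
    return sum(len({partition[v] for v in e}) - 1 for e in hyperedges)

# Example: 4 vertices in 2 blocks; only the large hyperedge is cut.
part = {0: 0, 1: 0, 2: 1, 3: 1}
edges = [(0, 1), (2, 3), (0, 1, 2, 3)]
print(connectivity_objective(edges, part))  # → 1
```

Every refinement algorithm in the following chapters (FM, label propagation, flow-based) is ultimately evaluated against this objective while respecting the block-size balance constraint.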
The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases.
In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs.
Once sufficiently small, an initial partition is computed.
Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level.
An important aspect in designing practical heuristics for optimization problems is the trade-off between solution quality and running time.
The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available to solve the problem.
Existing algorithms are either slow, sequential and offer high solution quality, or are simple, fast, easy to parallelize, and offer low quality.
While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible.
We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, but employed in two different ways.
Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines.
In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach, and develop highly efficient shared-memory parallel implementations thereof.
We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic, and one based on label propagation.
For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements.
For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on-the-fly.
Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, as well as recursive partitioning with work-stealing, we obtain our first parallel multilevel framework.
It is the fastest partitioner known and achieves medium-high solution quality: it beats all parallel partitioners and comes close to the highest-quality sequential partitioner.
Our second contribution is a parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level.
This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential.
We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous uncoarsening, and later fully asynchronous uncoarsening.
In addition, we adapt our refinement algorithms, and also use the preprocessing and portfolio.
This scheme is highly scalable, and achieves the same quality as the highest quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to fine-grained uncoarsening.
The last ingredient for high quality is an iterative improvement algorithm based on maximum flows.
In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and is faster due to engineering efforts.
Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel.
Beyond striving for the highest quality, we present a deterministically parallel partitioning framework.
We develop deterministic versions of the preprocessing, coarsening, and label propagation refinement.
Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small.
All of our claims are validated through extensive experiments, comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets.
To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar.
While it seems inevitable that, with ever-increasing problem sizes, we must transition to distributed-memory algorithms, the study of shared-memory techniques is not in vain.
With the multilevel approach, even the inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense.
Similarly, techniques for shared-memory parallelism are important, both as soon as a coarse graph fits into memory, and as local building blocks in the distributed algorithm.
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Data analysis with merge trees
Today’s data are increasingly complex, and classical statistical techniques need ever more refined mathematical tools to model and investigate them. Paradigmatic situations are represented by data which need to be considered up to some kind of transformation, and all those circumstances in which the analyst needs to define a general concept of shape. Topological Data Analysis (TDA) is a field which is fundamentally contributing to such challenges by extracting topological information from data with a plethora of interpretable and computationally accessible pipelines. We contribute to this field by developing a series of novel tools, techniques and applications to work with a particular topological summary called the merge tree. To analyze sets of merge trees, we introduce a novel metric structure along with an algorithm to compute it, define a framework to compare different functions defined on merge trees, and investigate the metric space obtained with the aforementioned metric. Different geometric and topological properties of the space of merge trees are established, with the aim of obtaining a deeper understanding of such trees. To showcase the effectiveness of the proposed metric, we develop an application in the field of Functional Data Analysis, working with functions up to homeomorphic reparametrization, and in the field of radiomics, where each patient is represented via a clustering dendrogram.
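To make the central object concrete: the merge tree of the sublevel sets of a 1D sequence can be sketched with a union-find sweep. This toy version only records branch births (leaves, at local minima) and merges (at separating local maxima); the thesis's metric and algorithms operate on the full tree structure:

```python
def merge_events(values):
    """Sweep values in increasing order: each point either starts a new
    sublevel-set component (a leaf of the merge tree) or joins/merges
    its already-active neighbors (tracked with union-find)."""
    parent = {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    events = []
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        parent[i] = i
        roots = {find(j) for j in (i - 1, i + 1) if j in parent}
        if not roots:
            events.append(("leaf", values[i]))   # new branch is born
        elif len(roots) == 2:
            events.append(("merge", values[i]))  # two branches join
        for r in roots:
            parent[r] = i
    return events

print(merge_events([1, 3, 0, 4, 2, 5]))
# → [('leaf', 0), ('leaf', 1), ('leaf', 2), ('merge', 3), ('merge', 4)]
```

Here the three local minima become leaves and the two separating maxima become internal merge nodes; comparing such trees across functions or patients is exactly what the proposed metric enables.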