A Unifying Theory for Graph Transformation
The field of graph transformation studies the rule-based transformation of graphs. An important branch is the algebraic graph transformation tradition, in which approaches are defined and studied using the language of category theory. Most algebraic graph transformation approaches (such as DPO, SPO, SqPO, and AGREE) are opinionated about the local contexts that are allowed around matches for rules, and about how replacement in context should work exactly. The approaches also differ considerably in their underlying formal theories and their general expressiveness (e.g., not all frameworks allow duplication). This dissertation proposes an expressive algebraic graph transformation approach, called PBPO+, which is an adaptation of PBPO by Corradini et al. The central contribution is a proof that PBPO+ subsumes (under mild restrictions) DPO, SqPO, AGREE, and PBPO in the important categorical setting of quasitoposes. This result allows for a more unified study of graph transformation metatheory, methods, and tools. A concrete example of this is found in the second major contribution of this dissertation: a graph transformation termination method for PBPO+, based on decreasing interpretations, and defined for general categories. By applying the proposed encodings into PBPO+, this method can also be applied for DPO, SqPO, AGREE, and PBPO
Computational Analyses of Metagenomic Data
Metagenomics studies the collective microbial genomes extracted from a particular environment without requiring the culturing or isolation of individual genomes, addressing questions revolving around the composition, functionality, and dynamics of microbial communities. The intrinsic complexity of metagenomic data and the diversity of applications call for efficient and accurate computational methods in data handling. In this thesis, I present three primary projects that collectively focus on the computational analysis of metagenomic data, each addressing a distinct topic.
In the first project, I designed and implemented an algorithm named Mapbin for reference-free genomic binning of metagenomic assemblies. Binning aims to group a mixture of genomic fragments based on their genome origin. Mapbin enhances binning results by building a multilayer network that combines the initial binning, assembly graph, and read-pairing information from paired-end sequencing data. The network is further partitioned by the community-detection algorithm, Infomap, to yield a new binning result. Mapbin was tested on multiple simulated and real datasets. The results indicated an overall improvement in the common binning quality metrics.
The second and third projects are both derived from ImMiGeNe, a collaborative and multidisciplinary study investigating the interplay between gut microbiota, host genetics, and immunity in stem-cell transplantation (SCT) patients. In the second project, I conducted microbiome analyses for the metagenomic data. The workflow included the removal of contaminant reads and multiple taxonomic and functional profiling. The results revealed that the SCT recipients' samples yielded significantly fewer reads with heavy contamination of the host DNA, and their microbiomes displayed evident signs of dysbiosis. Finally, I discussed several inherent challenges posed by extremely low levels of target DNA and high levels of contamination in the recipient samples, which cannot be rectified solely through bioinformatics approaches.
The primary goal of the third project is to design a set of primers that can be used to cover bacterial flagellin genes present in the human gut microbiota. Considering the notable diversity of flagellins, I incorporated a method to select representative bacterial flagellin gene sequences, a heuristic approach based on established primer design methods to generate a degenerate primer set, and a selection method to filter genes unlikely to occur in the human gut microbiome. As a result, I successfully curated a reduced yet representative set of primers that would be practical for experimental implementation
Covering and Separation for Permutations and Graphs
This is a thesis of two parts, focusing on covering and separation topics of extremal combinatorics and graph theory, two major themes in this area. They entail the existence and properties of collections of combinatorial objects which together either represent all objects (covering) or can be used to distinguish all objects from each other (separation). We will consider a range of problems which come under these areas. The first part will focus on shattering k-sets with permutations. A family of permutations is said to shatter a given k-set if the permutations cover all possible orderings of the k elements. In particular, we investigate the size of permutation families which cover t orders for every possible k-set as well as study the problem of determining the largest number of k-sets that can be shattered by a family with given size. We provide a construction for a small permutation family which shatters every k-set. We also consider constructions of large families which do not shatter any triple. The second part will be concerned with the problem of separating path systems. A separating path system for a graph is a family of paths where, for any two edges, there is a path containing one edge but not the other. The aim is to find the size of the smallest such family. We will study the size of the smallest separating path system for a range of graphs, including complete graphs, complete bipartite graphs, and lattice-type graphs. A key technique we introduce is the use of generator paths - constructed to utilise the symmetric nature of Kn. We continue this symmetric approach for bipartite graphs and study the limitations of the method. We consider lattice-type graphs as an example of the most efficient possible separating systems for any graph
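The two definitions in this abstract (shattering a k-set, and a separating path system) can be checked by brute force on small instances. The sketch below is illustrative only: the function names `shatters` and `is_separating` are hypothetical, permutations are assumed to be given as sequences of elements, paths as vertex lists, and "one edge but not the other" is read as "exactly one of the two edges".

```python
from itertools import permutations
from math import factorial


def shatters(family, k_set):
    """True if the permutations in `family` induce every ordering of `k_set`.

    Assumes each permutation is a sequence containing all elements of `k_set`.
    """
    k_set = set(k_set)
    # Restrict each permutation to the k-set and collect the induced orderings.
    induced = {tuple(x for x in perm if x in k_set) for perm in family}
    return len(induced) == factorial(len(k_set))


def path_edges(path):
    """Edges of a path given as a vertex list, as unordered pairs."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def is_separating(paths, edges):
    """True if every two distinct edges are told apart by some path.

    Two edges are told apart when some path contains exactly one of them,
    which holds for all pairs iff the edges have pairwise distinct
    'signatures' (the set of path indices containing the edge).
    """
    edge_sets = [path_edges(p) for p in paths]
    signatures = set()
    for e in edges:
        sig = frozenset(i for i, es in enumerate(edge_sets) if frozenset(e) in es)
        if sig in signatures:
            return False
        signatures.add(sig)
    return True


# All 6 permutations of {0, 1, 2} shatter the full set; two of them do not.
all_perms = list(permutations(range(3)))
two_perms = [(0, 1, 2), (2, 1, 0)]

# Path graph a-b-c-d: two overlapping paths separate its three edges.
edges = [("a", "b"), ("b", "c"), ("c", "d")]
good_paths = [["a", "b", "c"], ["b", "c", "d"]]
bad_paths = [["a", "b", "c", "d"]]
```

Here `two_perms` still shatters the pair {0, 1}, since a permutation and its reverse give both orderings, but not the triple {0, 1, 2}.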
Probabilistic Programming Interfaces for Random Graphs: Markov Categories, Graphons, and Nominal Sets
We study semantic models of probabilistic programming languages over graphs, and establish a connection to graphons from graph theory and combinatorics. We show that every well-behaved equational theory for our graph probabilistic programming language corresponds to a graphon, and conversely, every graphon arises in this way. We provide three constructions for showing that every graphon arises from an equational theory. The first is an abstract construction, using Markov categories and monoidal indeterminates. The second and third are more concrete. The second is in terms of traditional measure theoretic probability, which covers 'black-and-white' graphons. The third is in terms of probability monads on the nominal sets of Gabbay and Pitts. Specifically, we use a variation of nominal sets induced by the theory of graphs, which covers Erdős-Rényi graphons. In this way, we build new models of graph probabilistic programming from graphons
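For readers unfamiliar with graphons, the standard way a graphon W (a symmetric function from [0,1]² to [0,1]) generates a random graph can be sketched as follows. This is not the paper's programming interface: `sample_graph` and the example graphons are hypothetical names, and 'black-and-white' is taken here to mean {0,1}-valued, which is an assumption.

```python
import random


def sample_graph(W, n, seed=None):
    """Sample an n-vertex random graph from the graphon W.

    Each vertex i gets a latent uniform value u[i] in [0, 1); the edge
    {i, j} is then included independently with probability W(u[i], u[j]).
    Returns the edge set as pairs (i, j) with i < j.
    """
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < W(u[i], u[j]):
                edges.add((i, j))
    return edges


def erdos_renyi(p):
    """Constant graphon: every edge appears independently with probability p."""
    return lambda x, y: p


def bipartite(x, y):
    """A {0,1}-valued graphon: connect vertices on opposite halves of [0, 1)."""
    return 1.0 if (x < 0.5) != (y < 0.5) else 0.0


complete = sample_graph(erdos_renyi(1.0), 5, seed=0)
empty = sample_graph(erdos_renyi(0.0), 5, seed=0)
two_sided = sample_graph(bipartite, 8, seed=0)
```

With a {0,1}-valued graphon the only randomness is in the latent values, whereas a constant graphon ignores the latent values entirely.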
Parameter Setting in Quantum Approximate Optimization of Weighted Problems
Quantum Approximate Optimization Algorithm (QAOA) is a leading candidate algorithm for solving combinatorial optimization problems on quantum computers. However, in many cases QAOA requires computationally intensive parameter optimization. The challenge of parameter optimization is particularly acute in the case of weighted problems, for which the eigenvalues of the phase operator are non-integer and the QAOA energy landscape is not periodic. In this work, we develop parameter setting heuristics for QAOA applied to a general class of weighted problems. First, we derive optimal parameters for QAOA with depth applied to the weighted MaxCut problem under different assumptions on the weights. In particular, we rigorously prove the conventional wisdom that in the average case the first local optimum near zero gives globally-optimal QAOA parameters. Second, for we prove that the QAOA energy landscape for weighted MaxCut approaches that for the unweighted case under a simple rescaling of parameters. Therefore, we can use parameters previously obtained for unweighted MaxCut for weighted problems. Finally, we prove that for the QAOA objective sharply concentrates around its expectation, which means that our parameter setting rules hold with high probability for a random weighted instance. We numerically validate this approach on general weighted graphs and show that on average the QAOA energy with the proposed fixed parameters is only percentage points away from that with optimized parameters. Third, we propose a general heuristic rescaling scheme inspired by the analytical results for weighted MaxCut and demonstrate its effectiveness using QAOA with the XY Hamming-weight-preserving mixer applied to the portfolio optimization problem. Our heuristic improves the convergence of local optimizers, reducing the number of iterations by 7.4x on average
Robustness, Heterogeneity and Structure Capturing for Graph Representation Learning and its Application
Graph neural networks (GNNs) are potent methods for graph representation learning (GRL), which extract knowledge from complicated (graph) structured data in various real-world scenarios. However, GRL still faces many challenges. Firstly, GNN-based node classification may deteriorate substantially by overlooking the possibility of noisy data in graph structures, as models wrongly treat the relations among nodes in the input graphs as ground truth. Secondly, nodes and edges have different types in the real world, and it is essential to capture this heterogeneity in graph representation learning. Next, relations among nodes are not restricted to pairwise relations, and it is necessary to capture the complex relations accordingly. Finally, the absence of structural encodings, such as positional information, deteriorates the performance of GNNs. This thesis proposes novel methods to address the aforementioned problems:
1. Bayesian Graph Attention Network (BGAT): Developed for situations with scarce data, this method addresses the influence of spurious edges. Incorporating Bayesian principles into the graph attention mechanism enhances robustness, leading to competitive performance against benchmarks (Chapter 3).
2. Neighbour Contrastive Heterogeneous Graph Attention Network (NC-HGAT): By enhancing a cutting-edge self-supervised heterogeneous graph neural network model (HGAT) with neighbour contrastive learning, this method addresses heterogeneity and uncertainty simultaneously. Extra attention to edge relations in heterogeneous graphs also aids in subsequent classification tasks (Chapter 4).
3. A novel ensemble learning framework is introduced for predicting stock price movements. It captures both group-level and pairwise relations, leading to notable advancements over the existing state of the art. The integration of hypergraph and graph models, coupled with the utilisation of auxiliary data via GNNs before a recurrent neural network (RNN), provides a deeper understanding of long-term dependencies between similar entities in multivariate time series analysis (Chapter 5).
4. A novel framework for graph structure learning is introduced, segmenting graphs into distinct patches. By harnessing the capabilities of transformers and integrating other position encoding techniques, this approach robustly captures intricate structural information within a graph. This results in a more comprehensive understanding of its underlying patterns (Chapter 6)
Large cliques or cocliques in hypergraphs with forbidden order-size pairs
The well-known Erdős-Hajnal conjecture states that for any graph H, there exists ε > 0 such that every n-vertex graph that contains no induced copy of H has a homogeneous set of size at least n^ε. We consider a variant of the Erdős-Hajnal problem for hypergraphs where we forbid a family of hypergraphs described by their orders and sizes. For graphs, we observe that if we forbid induced subgraphs on vertices and edges for any positive and , then we obtain large homogeneous sets. For triple systems, in the first nontrivial case , for every , we give bounds on the minimum size of a homogeneous set in a triple system where the number of edges spanned by every four vertices is not in . In most cases the bounds are essentially tight. We also determine, for all , whether the growth rate is polynomial or polylogarithmic. Some open problems remain
Classical and quantum algorithms for scaling problems
This thesis is concerned with scaling problems, which have a plethora of connections to different areas of mathematics, physics and computer science. Although many structural aspects of these problems are understood by now, we only know how to solve them efficiently in special cases. We give new algorithms for non-commutative scaling problems with complexity guarantees that match the prior state of the art. To this end, we extend the well-known (self-concordance based) interior-point method (IPM) framework to Riemannian manifolds, motivated by its success in the commutative setting. Moreover, the IPM framework does not obviously suffer from the same obstructions to efficiency as previous methods. It also yields the first high-precision algorithms for other natural geometric problems in non-positive curvature. For the (commutative) problems of matrix scaling and balancing, we show that quantum algorithms can outperform the (already very efficient) state-of-the-art classical algorithms. Their time complexity can be sublinear in the input size; in certain parameter regimes they are also optimal, whereas in others we show no quantum speedup over the classical methods is possible. Along the way, we provide improvements over the long-standing state of the art for searching for all marked elements in a list, and computing the sum of a list of numbers. We identify a new application in the context of tensor networks for quantum many-body physics. We define a computable canonical form for uniform projected entangled pair states (as the solution to a scaling problem), circumventing previously known undecidability results. We also show, by characterizing the invariant polynomials, that the canonical form is determined by evaluating the tensor network contractions on networks of bounded size
Algorithms and complexity for approximately counting hypergraph colourings and related problems
The past decade has witnessed advancements in designing efficient algorithms for approximating the number of solutions to constraint satisfaction problems (CSPs), especially in the local lemma regime. However, the phase transition for computational tractability is not known. This thesis is dedicated to the prototypical problem for this kind of CSP, hypergraph colouring. Parameterised by the number of colours q, the arity of each hyperedge k, and the maximum vertex degree Δ, this problem falls into the regime of the Lovász local lemma when Δ ≲ q^k. Previously, however, fast approximate counting algorithms were known when Δ ≲ q^(k/3), and there was no known inapproximability result. Our contribution is twofold, stated as follows.
• When q, k ≥ 4 are even and Δ ≥ 5·q^(k/2), approximating the number of hypergraph colourings is NP-hard.
• When the input hypergraph is linear and Δ ≲ q^(k/2), a fast approximate counting algorithm does exist
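To make the counted quantity concrete, the following brute-force sketch enumerates colourings of a small hypergraph. It is exponential in the number of vertices and unrelated to the thesis's algorithms. It assumes the usual definitions, which the abstract does not spell out: a colouring is proper when no hyperedge is monochromatic, and a hypergraph is linear when any two hyperedges share at most one vertex. All function names are hypothetical.

```python
from itertools import combinations, product


def count_colourings(num_vertices, hyperedges, q):
    """Count q-colourings of vertices 0..num_vertices-1 with no monochromatic hyperedge."""
    count = 0
    for colouring in product(range(q), repeat=num_vertices):
        # A hyperedge is fine as soon as it sees at least two colours.
        if all(len({colouring[v] for v in e}) > 1 for e in hyperedges):
            count += 1
    return count


def max_degree(num_vertices, hyperedges):
    """Maximum number of hyperedges containing any single vertex (the parameter Δ)."""
    return max(sum(v in e for e in hyperedges) for v in range(num_vertices))


def is_linear(hyperedges):
    """True if every two hyperedges intersect in at most one vertex."""
    return all(len(set(e) & set(f)) <= 1 for e, f in combinations(hyperedges, 2))


single = [(0, 1, 2)]              # one hyperedge of arity k = 3
overlapping = [(0, 1, 2), (1, 2, 3)]  # shares two vertices, so not linear
chain = [(0, 1, 2), (2, 3, 4)]    # shares one vertex, so linear
```

For the single hyperedge with q = 2, the count is 2³ − 2 = 6, since only the two constant colourings are excluded.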
- …