
    Computational Analyses of Metagenomic Data

    Metagenomics studies the collective microbial genomes extracted from a particular environment without requiring the culturing or isolation of individual genomes, addressing questions about the composition, functionality, and dynamics of microbial communities. The intrinsic complexity of metagenomic data and the diversity of applications call for efficient and accurate computational methods for data handling. In this thesis, I present three primary projects that collectively focus on the computational analysis of metagenomic data, each addressing a distinct topic. In the first project, I designed and implemented an algorithm named Mapbin for reference-free genomic binning of metagenomic assemblies. Binning aims to group a mixture of genomic fragments according to their genomes of origin. Mapbin enhances binning results by building a multilayer network that combines the initial binning, the assembly graph, and read-pairing information from paired-end sequencing data. The network is then partitioned by the community-detection algorithm Infomap to yield a new binning result. Mapbin was tested on multiple simulated and real datasets, and the results indicated an overall improvement in common binning-quality metrics. The second and third projects are both derived from ImMiGeNe, a collaborative and multidisciplinary study investigating the interplay between gut microbiota, host genetics, and immunity in stem-cell transplantation (SCT) patients. In the second project, I conducted microbiome analyses of the metagenomic data. The workflow included the removal of contaminant reads and multiple taxonomic and functional profiling analyses. The results revealed that the SCT recipients' samples yielded significantly fewer reads with heavy contamination by host DNA, and that their microbiomes displayed evident signs of dysbiosis. Finally, I discussed several inherent challenges posed by extremely low levels of target DNA and high levels of contamination in the recipient samples, which cannot be rectified solely through bioinformatic approaches. The primary goal of the third project was to design a set of primers covering the bacterial flagellin genes present in the human gut microbiota. Given the notable diversity of flagellins, I combined a method to select representative bacterial flagellin gene sequences, a heuristic approach based on established primer-design methods to generate a degenerate primer set, and a selection method to filter out genes unlikely to occur in the human gut microbiome. As a result, I curated a reduced yet representative set of primers that would be practical for experimental implementation.
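
    The abstract describes Mapbin only at a high level; the sketch below is a minimal, hypothetical illustration of the general idea: accumulate evidence from the initial binning, the assembly graph, and read-pair links into one weighted contig graph, then re-partition that graph with a community-detection routine. It uses networkx's greedy modularity communities as a stand-in for Infomap, and all contig names, weights, and edge lists are made up for illustration; this is not Mapbin's actual data structure or parameterization.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Illustrative inputs (hypothetical contig IDs and links).
initial_bins = {"c1": "bin_A", "c2": "bin_A", "c3": "bin_B", "c4": "bin_B", "c5": "bin_B"}
assembly_edges = [("c1", "c2"), ("c2", "c5"), ("c3", "c4")]   # contigs adjacent in the assembly graph
read_pair_links = [("c1", "c2"), ("c4", "c5"), ("c4", "c5")]  # contigs bridged by read pairs

G = nx.Graph()
G.add_nodes_from(initial_bins)

def add_weighted(u, v, w):
    """Accumulate edge weight across the network layers."""
    if G.has_edge(u, v):
        G[u][v]["weight"] += w
    else:
        G.add_edge(u, v, weight=w)

# Layer 1: connect contigs placed in the same initial bin.
nodes = list(initial_bins)
for i, u in enumerate(nodes):
    for v in nodes[i + 1:]:
        if initial_bins[u] == initial_bins[v]:
            add_weighted(u, v, 1.0)

# Layer 2: assembly-graph adjacency.
for u, v in assembly_edges:
    add_weighted(u, v, 1.0)

# Layer 3: read-pair evidence, one increment per supporting pair.
for u, v in read_pair_links:
    add_weighted(u, v, 0.5)

# Re-partition the combined network (stand-in for Infomap).
for idx, members in enumerate(greedy_modularity_communities(G, weight="weight")):
    print(f"new_bin_{idx}: {sorted(members)}")
```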

    Algorithms and complexity for approximately counting hypergraph colourings and related problems

    The past decade has witnessed advancements in designing efficient algorithms for approximating the number of solutions to constraint satisfaction problems (CSPs), especially in the local lemma regime. However, the phase transition for computational tractability is not known. This thesis is dedicated to the prototypical problem of this kind: hypergraph colouring. Parameterised by the number of colours q, the arity of each hyperedge k, and the maximum vertex degree Δ, this problem falls into the regime of the LovĂĄsz local lemma when Δ â‰Č q^k. Prior to this work, however, fast approximate counting algorithms existed only when Δ â‰Č q^(k/3), and there was no known inapproximability result. In pursuit of this, our contribution is twofold, stated as follows.
    ‱ When q, k ≄ 4 are even and Δ ≄ 5·q^(k/2), approximating the number of hypergraph colourings is NP-hard.
    ‱ When the input hypergraph is linear and Δ â‰Č q^(k/2), a fast approximate counting algorithm does exist.
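
    For concreteness, here is a small brute-force counter for the quantity being approximated: the number of proper q-colourings of a hypergraph, i.e., colour assignments under which no hyperedge is monochromatic. This only illustrates the counting problem on a toy instance (exact enumeration is exponential); it is not the thesis's approximate counting algorithm, and the example hypergraph is arbitrary.

```python
from itertools import product

def count_hypergraph_colourings(num_vertices, hyperedges, q):
    """Count assignments of q colours to the vertices such that
    no hyperedge has all of its vertices the same colour."""
    total = 0
    for colouring in product(range(q), repeat=num_vertices):
        if all(len({colouring[v] for v in edge}) > 1 for edge in hyperedges):
            total += 1
    return total

# Toy 3-uniform hypergraph on 5 vertices (illustrative instance).
edges = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
print(count_hypergraph_colourings(5, edges, q=3))
```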

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume.

    The success probability in Levine's hat problem, and independent sets in graphs

    Lionel Levine's hat challenge has t players, each with a (very large, or infinite) stack of hats on their head, each hat independently colored at random black or white. The players are allowed to coordinate before the random colors are chosen, but not after. Each player sees all hats except for those on her own head. The players then simultaneously try to each pick a black hat from their respective stacks, and they are proclaimed successful only if they are all correct. Levine's conjecture is that the success probability tends to zero as the number of players grows. We prove that this success probability is strictly decreasing in the number of players, and present some connections to problems in graph theory: relating the size of the largest independent set in a graph to that in a random induced subgraph of it, and bounding the size of a set of vertices intersecting every maximum-size independent set in a graph. Comment: arXiv admin note: substantial text overlap with arXiv:2103.01541, arXiv:2103.0599
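
    The success probability is defined over strategies that map what a player sees (everyone else's stacks) to an index into her own stack. As a rough illustration of the setup, the sketch below estimates, by Monte Carlo simulation with truncated stacks, the success probability of one simple and certainly non-optimal strategy: each player points at the position of the first black hat on the next player's stack. All parameters are illustrative; the paper's results concern optimal strategies, not this one.

```python
import random

def simulate(num_players, depth=64, trials=100_000, seed=0):
    """Monte Carlo estimate of the success probability of an illustrative
    strategy: each player picks the position of the first black hat on the
    NEXT player's (visible) stack and hopes her own hat there is black.
    Stacks are truncated to `depth` hats."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # stacks[i][j] is True if hat j on player i's head is black.
        stacks = [[rng.random() < 0.5 for _ in range(depth)] for _ in range(num_players)]
        ok = True
        for i in range(num_players):
            visible = stacks[(i + 1) % num_players]
            pick = next((j for j, black in enumerate(visible) if black), 0)
            if not stacks[i][pick]:
                ok = False
                break
        successes += ok
    return successes / trials

for t in (2, 3, 4, 5):
    print(t, simulate(t))
```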

    Nonlocal games and their device-independent quantum applications

    Device-independence is a property of certain protocols that allows one to ensure their proper execution given only classical interaction with devices and assuming the correctness of the laws of physics. This scenario describes the most general form of cryptographic security, in which no trust is placed in the hardware involved; indeed, one may even take it to have been prepared by an adversary. Many quantum tasks have been shown to admit device-independent protocols by augmentation with "nonlocal games". These are games in which noncommunicating parties jointly attempt to fulfil some conditions imposed by a referee. We introduce examples of such games and examine the optimal strategies of players who are allowed access to different possible shared resources, such as entangled quantum states. We then study their role in self-testing, private random number generation, and secure delegated quantum computation. Hardware imperfections are naturally incorporated in the device-independent scenario as adversarial, and we thus also perform noise robustness analysis where feasible. We first study a generalization of the Mermin–Peres magic square game to arbitrary rectangular dimensions. After exhibiting some general properties, these "magic rectangle" games are fully characterized in terms of their optimal win probabilities for quantum strategies. We find that for m×n magic rectangle games with dimensions m,n≄3, there are quantum strategies that win with certainty, while for dimensions 1×n quantum strategies do not outperform classical strategies. The final case of dimensions 2×n is richer, and we give upper and lower bounds that both outperform the classical strategies. As an initial usage scenario, we apply our findings to quantum certified randomness expansion to find noise tolerances and rates for all magic rectangle games. To do this, we use our previous results to obtain the winning probabilities of games with a distinguished input for which the devices give a deterministic outcome and follow the analysis of C. A. Miller and Y. Shi [SIAM J. Comput. 46, 1304 (2017)]. Self-testing is a method to verify that one has a particular quantum state from purely classical statistics. For practical applications, such as device-independent delegated verifiable quantum computation, it is crucial that one self-tests multiple Bell states in parallel while keeping the quantum capabilities required of one side to a minimum. We use our 3×n magic rectangle games to obtain a self-test for n Bell states where one side needs only to measure single-qubit Pauli observables. The protocol requires small input sizes [constant for Alice and O(log n) bits for Bob] and is robust with robustness O(n⁔/ÂČ√Δ), where Δ is the closeness of the ideal (perfect) correlations to those observed. To achieve the desired self-test, we introduce a one-side-local quantum strategy for the magic square game that wins with certainty, we generalize this strategy to the family of 3×n magic rectangle games, and we supplement these nonlocal games with extra check rounds (of single and pairs of observables). Finally, we introduce a device-independent two-prover scheme in which a classical verifier can use a simple untrusted quantum measurement device (the client device) to securely delegate a quantum computation to an untrusted quantum server. 
To do this, we construct a parallel self-testing protocol to perform device-independent remote state preparation of n qubits and compose this with the unconditionally secure universal verifiable blind quantum computation (VBQC) scheme of J. F. Fitzsimons and E. Kashefi [Phys. Rev. A 96, 012303 (2017)]. Our self-test achieves a multitude of desirable properties for the application we consider, giving rise to practical and fully device-independent VBQC. It certifies parallel measurements of all cardinal and intercardinal directions in the XY-plane as well as the computational basis, uses few input questions (of size logarithmic in n for the client and a constant number communicated to the server), and requires only single-qubit measurements to be performed by the client device.
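
    As a concrete companion to the magic rectangle discussion above, the following sketch brute-forces the classical value of the standard 3×3 Mermin–Peres magic square game: Alice receives a row and answers three bits of even parity, Bob receives a column and answers three bits of odd parity, and they win when they agree on the shared cell. The enumeration recovers the well-known classical optimum of 8/9 (quantum strategies win with certainty). This is only an illustration of the game itself, not of the self-testing or VBQC protocols in the thesis.

```python
from itertools import product
from fractions import Fraction

# All 3-bit answers with even parity (Alice) and odd parity (Bob).
even = [bits for bits in product((0, 1), repeat=3) if sum(bits) % 2 == 0]
odd = [bits for bits in product((0, 1), repeat=3) if sum(bits) % 2 == 1]

best = 0
# A deterministic strategy fixes one answer per question (row for Alice, column for Bob).
for alice in product(even, repeat=3):        # alice[r] = Alice's answer to row r
    for bob in product(odd, repeat=3):       # bob[c]  = Bob's answer to column c
        wins = sum(alice[r][c] == bob[c][r] for r in range(3) for c in range(3))
        best = max(best, wins)

print(Fraction(best, 9))  # 8/9, the classical value of the magic square game
```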

    Faster Deterministic Distributed MIS and Approximate Matching

    We present an Õ(logÂČ n)-round deterministic distributed algorithm for the maximal independent set problem. By known reductions, this round complexity extends also to maximal matching, Δ+1 vertex coloring, and 2Δ−1 edge coloring. These four problems are among the most central problems in distributed graph algorithms and have been studied extensively for the past four decades. This improved round complexity comes closer to the Ω̃(log n) lower bound for maximal independent set and maximal matching [Balliu et al. FOCS '19]. The previous best known deterministic complexity for all of these problems was Θ(logÂł n). Via the shattering technique, the improvement permeates also to the corresponding randomized complexities, e.g., the new randomized complexity of Δ+1 vertex coloring is now Õ(logÂČ log n) rounds. Our approach is a novel combination of the two previously known methods for developing deterministic algorithms for these problems, namely global derandomization via network decomposition (see e.g., [Rozhon, Ghaffari STOC '20; Ghaffari, Grunau, Rozhon SODA '21; Ghaffari et al. SODA '23]) and local rounding of fractional solutions (see e.g., [Fischer DISC '17; Harris FOCS '19; Fischer, Ghaffari, Kuhn FOCS '17; Ghaffari, Kuhn FOCS '21; Faour et al. SODA '23]). We consider a relaxation of the classic network decomposition concept, where instead of requiring the clusters in the same block to be non-adjacent, we allow each node to have a small number of neighboring clusters. We also show a deterministic algorithm that computes this relaxed decomposition faster than standard decompositions. We then use this relaxed decomposition to significantly improve the integrality of certain fractional solutions, before handing them to the local rounding procedure, which now has to do fewer rounding steps.
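
    The deterministic algorithm itself is involved; as background, the sketch below centrally simulates the classic randomized, Luby-style MIS algorithm in a synchronous round model, which illustrates the problem and the notion of round complexity being measured. It is a toy illustration, not the paper's deterministic or derandomized algorithm, and the example graph is arbitrary.

```python
import random

def luby_mis(adj, seed=0):
    """Simulate Luby's randomized MIS in synchronous rounds.
    adj: dict mapping each vertex to the set of its neighbours.
    Returns (independent_set, number_of_rounds)."""
    rng = random.Random(seed)
    active = set(adj)                       # vertices not yet decided
    mis, rounds = set(), 0
    while active:
        rounds += 1
        # Each active vertex draws a random priority.
        priority = {v: rng.random() for v in active}
        # A vertex joins the MIS if it beats all of its active neighbours.
        joined = {v for v in active
                  if all(priority[v] > priority[u] for u in adj[v] if u in active)}
        mis |= joined
        # Joined vertices and their neighbours become inactive.
        removed = set(joined)
        for v in joined:
            removed |= adj[v]
        active -= removed
    return mis, rounds

# Toy graph: a 5-cycle (illustrative).
adj = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
mis, rounds = luby_mis(adj)
print("MIS:", sorted(mis), "rounds:", rounds)
```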

    A Theory of Link Prediction via Relational Weisfeiler-Leman on Knowledge Graphs

    Graph neural networks are prominent models for representation learning over graph-structured data. While the capabilities and limitations of these models are well-understood for simple graphs, our understanding remains incomplete in the context of knowledge graphs. Our goal is to provide a systematic understanding of the landscape of graph neural networks for knowledge graphs pertaining to the prominent task of link prediction. Our analysis entails a unifying perspective on seemingly unrelated models and unlocks a series of other models. The expressive power of various models is characterized via a corresponding relational Weisfeiler-Leman algorithm. This analysis is extended to provide a precise logical characterization of the class of functions captured by a class of graph neural networks. The theoretical findings presented in this paper explain the benefits of some widely employed practical design choices, which are validated empirically. Comment: Proceedings of the Thirty-Seventh Annual Conference on Advances in Neural Information Processing Systems (NeurIPS 2023). Code available at: https://github.com/HxyScotthuang/CMPN
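
    To make the relational Weisfeiler-Leman idea concrete, here is a minimal sketch of a relational colour-refinement step on a toy knowledge graph: each entity's colour is repeatedly re-derived from its current colour together with the multisets of (relation, neighbour colour) pairs over its outgoing and incoming edges. This is a generic illustration of WL-style refinement over typed edges, not the specific algorithms or GNN architectures characterized in the paper; the triples are made up.

```python
def relational_wl(entities, triples, rounds=3):
    """Iteratively refine entity colours from (relation, neighbour colour) multisets.
    triples: iterable of (head, relation, tail)."""
    colour = {e: 0 for e in entities}  # start from a uniform colouring
    for _ in range(rounds):
        signatures = {}
        for e in entities:
            out_msgs = sorted((r, colour[t]) for (h, r, t) in triples if h == e)
            in_msgs = sorted((r, colour[h]) for (h, r, t) in triples if t == e)
            signatures[e] = (colour[e], tuple(out_msgs), tuple(in_msgs))
        # Relabel distinct signatures with small integers to get the next colouring.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colour = {e: palette[signatures[e]] for e in entities}
    return colour

# Toy knowledge graph (illustrative triples).
entities = ["alice", "bob", "carol", "paris", "rome"]
triples = [("alice", "lives_in", "paris"),
           ("bob", "lives_in", "rome"),
           ("carol", "lives_in", "rome"),
           ("alice", "knows", "bob")]
print(relational_wl(entities, triples))
```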

    Parallel and Flow-Based High Quality Hypergraph Partitioning

    Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits. Given a hypergraph and an integer k, the task is to divide the vertices into k disjoint blocks with bounded size, while minimizing an objective function on the hyperedges that span multiple blocks. In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge. The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases. In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs. Once sufficiently small, an initial partition is computed. Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level. An important aspect in designing practical heuristics for optimization problems is the trade-off between solution quality and running time. The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available to solve the problem. Existing algorithms are either slow, sequential and offer high solution quality, or are simple, fast, easy to parallelize, and offer low quality. While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible. We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, but employed in two different ways. Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines. In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach, and develop highly efficient shared-memory parallel implementations thereof. We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic, and one based on label propagation. For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements. For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on-the-fly. Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, as well as recursive partitioning with work-stealing, we obtain our first parallel multilevel framework. It is the fastest partitioner known and achieves medium-high quality, beating all parallel partitioners and coming close to the highest-quality sequential partitioner. Our second contribution is a parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level. This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential.
We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous uncoarsening, and later a fully asynchronous uncoarsening. In addition, we adapt our refinement algorithms, and also use the preprocessing and portfolio. This scheme is highly scalable, and achieves the same quality as the highest-quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to fine-grained uncoarsening. The last ingredient for high quality is an iterative improvement algorithm based on maximum flows. In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and is faster due to engineering efforts. Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel. Beyond the pursuit of the highest quality, we present a deterministically parallel partitioning framework. We develop deterministic versions of the preprocessing, coarsening, and label propagation refinement. Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small. All of our claims are validated through extensive experiments, comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets. To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar. While it seems inevitable that, with ever-increasing problem sizes, we must transition to distributed-memory algorithms, the study of shared-memory techniques is not in vain. With the multilevel approach, even the inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense. Similarly, techniques for shared-memory parallelism are important, both as soon as a coarse graph fits into memory, and as local building blocks in the distributed algorithm.
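
    To make the objective concrete: the connectivity metric is commonly defined as the sum over hyperedges of λ(e) − 1, where λ(e) is the number of distinct blocks containing the pins of hyperedge e, so an edge fully contained in one block costs nothing. Below is a minimal sketch evaluating this objective and a standard balance constraint on a toy hypergraph; it is illustrative only and unrelated to Mt-KaHyPar's actual data structures or weighting.

```python
from math import ceil

def connectivity_objective(hyperedges, block_of):
    """Sum of (lambda(e) - 1) over hyperedges, where lambda(e) is the
    number of distinct blocks containing the pins of hyperedge e."""
    return sum(len({block_of[v] for v in e}) - 1 for e in hyperedges)

def is_balanced(block_of, k, epsilon):
    """Check a standard balance constraint for unit vertex weights:
    every block holds at most (1 + epsilon) * ceil(n / k) vertices."""
    n = len(block_of)
    limit = (1 + epsilon) * ceil(n / k)
    sizes = {}
    for b in block_of.values():
        sizes[b] = sizes.get(b, 0) + 1
    return all(size <= limit for size in sizes.values())

# Toy hypergraph with 6 vertices and a 2-way partition (illustrative).
hyperedges = [(0, 1, 2), (2, 3), (3, 4, 5), (0, 5)]
block_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(connectivity_objective(hyperedges, block_of))  # edges (2,3) and (0,5) span 2 blocks -> 2
print(is_balanced(block_of, k=2, epsilon=0.03))      # 3 <= (1.03 * 3) -> True
```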

    Vertex-critical graphs far from edge-criticality

    Let r be any positive integer. We prove that for every sufficiently large k there exists a k-chromatic vertex-critical graph G such that χ(G−R) = k for every set R ⊆ E(G) with |R| ≀ r. This partially solves a problem posed by Erdős in 1985, who asked whether the above statement holds for k ≄ 4. Comment: 6 pages.

    Dedekind's problem in the hypergrid

    Consider the partially ordered set on [t]ⁿ := {0, 
, t−1}ⁿ equipped with the natural coordinate-wise ordering. Let A(t,n) denote the number of antichains of this poset. The quantity A(t,n) has a number of combinatorial interpretations: it is precisely the number of (n−1)-dimensional partitions with entries from {0, 
, t}, and by a result of Moshkovitz and Shapira, A(t,n)+1 is equal to the n-color Ramsey number of monotone paths of length t in 3-uniform hypergraphs. This has led to significant interest in the growth rate of A(t,n). A number of results in the literature show that logâ‚‚ A(t,n) = (1+o(1))·α(t,n), where α(t,n) is the width of [t]ⁿ, and the o(1) term goes to 0 for t fixed and n tending to infinity. In the present paper, we prove the first bound that is close to optimal in the case where t is arbitrarily large compared to n, as well as improve all previous results for sufficiently large n. In particular, we prove that there is an absolute constant c such that for every t, n ≄ 2, logâ‚‚ A(t,n) ≀ (1 + c·(log n)Âł/n)·α(t,n). This resolves a conjecture of Moshkovitz and Shapira. A key ingredient in our proof is the construction of a normalized matching flow on the cover graph of the poset [t]ⁿ in which the distribution of weights is close to uniform, a result that may be of independent interest. Comment: 28 pages + 4-page appendix, 3 figures.
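
    For very small t and n, A(t,n) can be computed directly by enumerating subsets of the grid and keeping those whose elements are pairwise incomparable; the sketch below does this for toy sizes, recovering for example A(2,2) = 6 (the antichains of the Boolean lattice on two coordinates, including the empty antichain). It only illustrates the definition; the paper's results concern the asymptotic growth of logâ‚‚ A(t,n), which brute force cannot reach.

```python
from itertools import combinations, product

def leq(x, y):
    """Coordinate-wise order on the grid."""
    return all(a <= b for a, b in zip(x, y))

def count_antichains(t, n):
    """Brute-force A(t, n): the number of antichains of [t]^n.
    Exponential in t**n, so only sensible for tiny t and n."""
    points = list(product(range(t), repeat=n))
    total = 0
    for size in range(len(points) + 1):
        for subset in combinations(points, size):
            if all(not leq(x, y) and not leq(y, x)
                   for x, y in combinations(subset, 2)):
                total += 1
    return total

print(count_antichains(2, 2))  # 6
print(count_antichains(3, 2))  # antichains of the 3x3 grid poset
```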
    • 
