99 research outputs found

    Hypergraph Partitioning With Embeddings

    Get PDF
    Problems in scientific computing, such as distributing large sparse matrix operations, have analogous formulations as hypergraph partitioning problems. A hypergraph is a generalization of a traditional graph wherein "hyperedges" may connect any number of nodes. As a result, hypergraph partitioning is an NP-Hard problem to both solve or approximate. State-of-the-art algorithms that solve this problem follow the multilevel paradigm, which begins by iteratively "coarsening" the input hypergraph to smaller problem instances that share key structural features. Once identifying an approximate problem that is small enough to be solved directly, that solution can be interpolated and refined to the original problem. While this strategy represents an excellent trade off between quality and running time, it is sensitive to coarsening strategy. In this work we propose using graph embeddings of the initial hypergraph in order to ensure that coarsened problem instances retrain key structural features. Our approach prioritizes coarsening within self-similar regions within the input graph, and leads to significantly improved solution quality across a range of considered hypergraphs. Reproducibility: All source code, plots and experimental data are available at https://sybrandt.com/2019/partition

    NP-completeness results for partitioning a graph into total dominating sets

    Get PDF
    A total domatic k-partition of a graph is a partition of its vertex set into k subsets such that each intersects the open neighborhood of each vertex. The maximum k for which a total domatic k-partition exists is known as the total domatic number of a graph G, denoted by d(t) (G). We extend considerably the known hardness results by showing it is NP-complete to decide whether d(t) (G) >= 3 where G is a bipartite planar graph of bounded maximum degree. Similarly, for every k >= 3, it is NP-complete to decide whether d(t) (G) >= k, where G is split or k-regular. In particular, these results complement recent combinatorial results regarding d(t) (G) on some of these graph classes by showing that the known results are, in a sense, best possible. Finally, for general n-vertex graphs, we show the problem is solvable in 2(n)n(O(1)) time, and derive even faster algorithms for special graph classes. (C) 2018 Elsevier B.V. All rights reserved.Peer reviewe

    Efficient Flow-based Approximation Algorithms for Submodular Hypergraph Partitioning via a Generalized Cut-Matching Game

    Full text link
    In the past 20 years, increasing complexity in real world data has lead to the study of higher-order data models based on partitioning hypergraphs. However, hypergraph partitioning admits multiple formulations as hyperedges can be cut in multiple ways. Building upon a class of hypergraph partitioning problems introduced by Li & Milenkovic, we study the problem of minimizing ratio-cut objectives over hypergraphs given by a new class of cut functions, monotone submodular cut functions (mscf's), which captures hypergraph expansion and conductance as special cases. We first define the ratio-cut improvement problem, a family of local relaxations of the minimum ratio-cut problem. This problem is a natural extension of the Andersen & Lang cut improvement problem to the hypergraph setting. We demonstrate the existence of efficient algorithms for approximately solving this problem. These algorithms run in almost-linear time for the case of hypergraph expansion, and when the hypergraph rank is at most O(1)O(1). Next, we provide an efficient O(logn)O(\log n)-approximation algorithm for finding the minimum ratio-cut of GG. We generalize the cut-matching game framework of Khandekar et. al. to allow for the cut player to play unbalanced cuts, and matching player to route approximate single-commodity flows. Using this framework, we bootstrap our algorithms for the ratio-cut improvement problem to obtain approximation algorithms for minimum ratio-cut problem for all mscf's. This also yields the first almost-linear time O(logn)O(\log n)-approximation algorithms for hypergraph expansion, and constant hypergraph rank. Finally, we extend a result of Louis & Makarychev to a broader set of objective functions by giving a polynomial time O(logn)O\big(\sqrt{\log n}\big)-approximation algorithm for the minimum ratio-cut problem based on rounding 22\ell_2^2-metric embeddings.Comment: Comments and feedback welcom

    Graph Learning and Its Applications: A Holistic Survey

    Full text link
    Graph learning is a prevalent domain that endeavors to learn the intricate relationships among nodes and the topological structure of graphs. These relationships endow graphs with uniqueness compared to conventional tabular data, as nodes rely on non-Euclidean space and encompass rich information to exploit. Over the years, graph learning has transcended from graph theory to graph data mining. With the advent of representation learning, it has attained remarkable performance in diverse scenarios, including text, image, chemistry, and biology. Owing to its extensive application prospects, graph learning attracts copious attention from the academic community. Despite numerous works proposed to tackle different problems in graph learning, there is a demand to survey previous valuable works. While some researchers have perceived this phenomenon and accomplished impressive surveys on graph learning, they failed to connect related objectives, methods, and applications in a more coherent way. As a result, they did not encompass current ample scenarios and challenging problems due to the rapid expansion of graph learning. Different from previous surveys on graph learning, we provide a holistic review that analyzes current works from the perspective of graph structure, and discusses the latest applications, trends, and challenges in graph learning. Specifically, we commence by proposing a taxonomy from the perspective of the composition of graph data and then summarize the methods employed in graph learning. We then provide a detailed elucidation of mainstream applications. Finally, based on the current trend of techniques, we propose future directions.Comment: 20 pages, 7 figures, 3 table

    Learning from Partially Labeled Data: Unsupervised and Semi-supervised Learning on Graphs and Learning with Distribution Shifting

    Get PDF
    This thesis focuses on two fundamental machine learning problems:unsupervised learning, where no label information is available, and semi-supervised learning, where a small amount of labels are given in addition to unlabeled data. These problems arise in many real word applications, such as Web analysis and bioinformatics,where a large amount of data is available, but no or only a small amount of labeled data exists. Obtaining classification labels in these domains is usually quite difficult because it involves either manual labeling or physical experimentation. This thesis approaches these problems from two perspectives: graph based and distribution based. First, I investigate a series of graph based learning algorithms that are able to exploit information embedded in different types of graph structures. These algorithms allow label information to be shared between nodes in the graph---ultimately communicating information globally to yield effective unsupervised and semi-supervised learning. In particular, I extend existing graph based learning algorithms, currently based on undirected graphs, to more general graph types, including directed graphs, hypergraphs and complex networks. These richer graph representations allow one to more naturally capture the intrinsic data relationships that exist, for example, in Web data, relational data, bioinformatics and social networks. For each of these generalized graph structures I show how information propagation can be characterized by distinct random walk models, and then use this characterization to develop new unsupervised and semi-supervised learning algorithms. Second, I investigate a more statistically oriented approach that explicitly models a learning scenario where the training and test examples come from different distributions. This is a difficult situation for standard statistical learning approaches, since they typically incorporate an assumption that the distributions for training and test sets are similar, if not identical. To achieve good performance in this scenario, I utilize unlabeled data to correct the bias between the training and test distributions. A key idea is to produce resampling weights for bias correction by working directly in a feature space and bypassing the problem of explicit density estimation. The technique can be easily applied to many different supervised learning algorithms, automatically adapting their behavior to cope with distribution shifting between training and test data

    Exploiting Latent Features of Text and Graphs

    Get PDF
    As the size and scope of online data continues to grow, new machine learning techniques become necessary to best capitalize on the wealth of available information. However, the models that help convert data into knowledge require nontrivial processes to make sense of large collections of text and massive online graphs. In both scenarios, modern machine learning pipelines produce embeddings --- semantically rich vectors of latent features --- to convert human constructs for machine understanding. In this dissertation we focus on information available within biomedical science, including human-written abstracts of scientific papers, as well as machine-generated graphs of biomedical entity relationships. We present the Moliere system, and our method for identifying new discoveries through the use of natural language processing and graph mining algorithms. We propose heuristically-based ranking criteria to augment Moliere, and leverage this ranking to identify a new gene-treatment target for HIV-associated Neurodegenerative Disorders. We additionally focus on the latent features of graphs, and propose a new bipartite graph embedding technique. Using our graph embedding, we advance the state-of-the-art in hypergraph partitioning quality. Having newfound intuition of graph embeddings, we present Agatha, a deep-learning approach to hypothesis generation. This system learns a data-driven ranking criteria derived from the embeddings of our large proposed biomedical semantic graph. To produce human-readable results, we additionally propose CBAG, a technique for conditional biomedical abstract generation

    Quantum and Classical Multilevel Algorithms for (Hyper)Graphs

    Get PDF
    Combinatorial optimization problems on (hyper)graphs are ubiquitous in science and industry. Because many of these problems are NP-hard, development of sophisticated heuristics is of utmost importance for practical problems. In recent years, the emergence of Noisy Intermediate-Scale Quantum (NISQ) computers has opened up the opportunity to dramaticaly speedup combinatorial optimization. However, the adoption of NISQ devices is impeded by their severe limitations, both in terms of the number of qubits, as well as in their quality. NISQ devices are widely expected to have no more than hundreds to thousands of qubits with very limited error-correction, imposing a strict limit on the size and the structure of the problems that can be tackled directly. A natural solution to this issue is hybrid quantum-classical algorithms that combine a NISQ device with a classical machine with the goal of capturing “the best of both worlds”. Being motivated by lack of high quality optimization solvers for hypergraph partitioning, in this thesis, we begin by discussing classical multilevel approaches for this problem. We present a novel relaxation-based vertex similarity measure termed algebraic distance for hypergraphs and the coarsening schemes based on it. Extending the multilevel method to include quantum optimization routines, we present Quantum Local Search (QLS) – a hybrid iterative improvement approach that is inspired by the classical local search approaches. Next, we introduce the Multilevel Quantum Local Search (ML-QLS) that incorporates the quantum-enhanced iterative improvement scheme introduced in QLS within the multilevel framework, as well as several techniques to further understand and improve the effectiveness of Quantum Approximate Optimization Algorithm used throughout our work
    corecore