35 research outputs found

    On the complexity of enumerating pseudo-intents

    Get PDF
    AbstractWe investigate whether the pseudo-intents of a given formal context can efficiently be enumerated. We show that they cannot be enumerated in a specified lexicographic order with polynomial delay unless P=NP. Furthermore we show that if the restriction on the order of enumeration is removed, then the problem becomes at least as hard as enumerating minimal transversals of a given hypergraph. We introduce the notion of minimal pseudo-intents and show that recognizing minimal pseudo-intents is polynomial. Despite their less complicated nature, surprisingly it turns out that minimal pseudo-intents cannot be enumerated in output-polynomial time unless P=NP

    Data Tiling for Sparse Computation

    Get PDF
    Many real-world data contain internal relationships. Efficient analysis of these relationship data is crucial for important problems including genome alignment, network vulnerability analysis, ranking web pages, among others. Such relationship data is frequently sparse and analysis on it is called sparse computation. We demonstrate that the important technique of data tiling is more powerful than previously known by broadening its application space. We focus on three important sparse computation areas: graph analysis, linear algebra, and bioinformatics. We demonstrate data tiling's power by addressing key issues and providing significant improvements---to both runtime and solution quality---in each area. For graph analysis, we focus on fast data tiling techniques that can produce well-structured tiles and demonstrate theoretical hardness results. These tiles are suitable for graph problems as they reduce data movement and ultimately improve end-to-end runtime performance. For linear algebra, we introduce a new cache-aware tiling technique and apply it to the key kernel of sparse matrix by sparse matrix multiplication. This technique tiles the second input matrix and then uses a small, summary matrix to guide access to the tiles during computation. Our approach results in the fastest known implementation across three distinct CPU architectures. In bioinformatics, we develop a tiling based de novo genome assembly pipeline. We start with reads and develop either a graph or hypergraph that captures internal relationships between reads. This is then tiled to minimize connections while maintaining balance. We then treat each resulting tile independently as the input to an existing, shared-memory assembler. Our pipeline improves existing state-of-the-art de novo genome assemblers and brings both runtime and quality improvements to them on both real-world and simulated datasets.Ph.D

    Fast Parallel Algorithms on a Class of Graph Structures With Applications in Relational Databases and Computer Networks.

    Get PDF
    The quest for efficient parallel algorithms for graph related problems necessitates not only fast computational schemes but also requires insights into their inherent structures that lend themselves to elegant problem solving methods. Towards this objective efficient parallel algorithms on a class of hypergraphs called acyclic hypergraphs and directed hypergraphs are developed in this thesis. Acyclic hypergraphs are precisely chordal graphs and their subclasses, and they have applications in relational databases and computer networks. In this thesis, first, we present efficient parallel algorithms for the following problems on graphs. (1) determining whether a graph is strongly chordal, ptolemaic, or a block graph. If the graph is strongly chordal, determine the strongly perfect vertex elimination ordering. (2) determining the minimal set of edges needed to make an arbitrary graph strongly chordal, ptolemaic, or a block graph. (3) determining the minimum cardinality dominating set, connected dominating set, total dominating set, and the domatic number of a strongly chordal graph. Secondly, we show that the query implication problem (Q\sb1\ \to\ Q\sb2) on two queries, which is to determine whether the data retrieved by query Q\sb1 is always a subset of the data retrieved by query Q\sb2, is not even in NP and in fact complete in \Pi\sb2\sp{p}. We present several \u27fine-grain\u27 analyses of the query implication problem and show that the query implication can be solved in polynomial time given chordal queries. Thirdly, we develop efficient parallel algorithms for manipulating directed hypergraphs H such as finding a directed path in H, closure of H, and minimum equivalent hypergraph of H. We show that finding a directed path in a directed hypergraph is inherently sequential. For directed hypergraphs with fixed degree and diameter we present NC algorithms for manipulations. Directed hypergraphs are representation schemes for functional dependencies in relational databases. Finally, we also present an efficient parallel algorithm for multi-dimensional range search. We show that a set of points in a rectangular parallelepiped can be obtained in O(logn) time with only 2.log\sp2 n −- 10.logn + 14 processors on a EREW-PRAM. A nontrivial implementation technique on the hypercube parallel architecture is also presented. Our method can be easily generalized to the case of d-dimensional range search

    Hyperbolic Deep Neural Networks: A Survey

    Full text link
    Recently, there has been a rising surge of momentum for deep representation learning in hyperbolic spaces due to theirhigh capacity of modeling data like knowledge graphs or synonym hierarchies, possessing hierarchical structure. We refer to the model as hyperbolic deep neural network in this paper. Such a hyperbolic neural architecture potentially leads to drastically compact model withmuch more physical interpretability than its counterpart in Euclidean space. To stimulate future research, this paper presents acoherent and comprehensive review of the literature around the neural components in the construction of hyperbolic deep neuralnetworks, as well as the generalization of the leading deep approaches to the Hyperbolic space. It also presents current applicationsaround various machine learning tasks on several publicly available datasets, together with insightful observations and identifying openquestions and promising future directions

    New Directions for Contact Integrators

    Get PDF
    Contact integrators are a family of geometric numerical schemes which guarantee the conservation of the contact structure. In this work we review the construction of both the variational and Hamiltonian versions of these methods. We illustrate some of the advantages of geometric integration in the dissipative setting by focusing on models inspired by recent studies in celestial mechanics and cosmology.Comment: To appear as Chapter 24 in GSI 2021, Springer LNCS 1282

    Bridging Data Management and Machine Learning: Case Studies on Index, Query Optimization, and Data Acquisition

    Get PDF
    Data management tasks and techniques can be observed in a variety of real world scenarios, including web search, business analysis, traffic scheduling, and advertising, to name a few. While data management as a research area has been studied for decades, recent breakthroughs in Machine Learning (ML) provide new perspectives to define and tackle problems in the area, and at the same time, the wisdom integrated in data management techniques also greatly helps to accelerate the advancement of Machine Learning. In this work, we focus on the intersection area of data management and Machine Learning, and study several important, interesting, and challenging problems. More specifically, our work mainly concentrates on the following three topics: (1) leveraging the ability of ML models in capturing data distribution to design lightweight and data-adaptive indexes and search algorithms to accelerate similarity search over large-scale data; (2) designing robust and trustworthy approaches to improve the reliability of both conventional query optimizer and learned query optimizer, and boost the performance of DBMS; (3) developing data management techniques with statistical guarantees to acquire the most useful training data for ML models with a budget limitation, striving to maximize the accuracy of the model. We conduct detailed theoretical and empirical study for each topic, establishing these fundamental problems as well as developing efficient and effective approaches for the tasks

    Bridging Data Management and Machine Learning: Case Studies on Index, Query Optimization, and Data Acquisition

    Get PDF
    Data management tasks and techniques can be observed in a variety of real world scenarios, including web search, business analysis, traffic scheduling, and advertising, to name a few. While data management as a research area has been studied for decades, recent breakthroughs in Machine Learning (ML) provide new perspectives to define and tackle problems in the area, and at the same time, the wisdom integrated in data management techniques also greatly helps to accelerate the advancement of Machine Learning. In this work, we focus on the intersection area of data management and Machine Learning, and study several important, interesting, and challenging problems. More specifically, our work mainly concentrates on the following three topics: (1) leveraging the ability of ML models in capturing data distribution to design lightweight and data-adaptive indexes and search algorithms to accelerate similarity search over large-scale data; (2) designing robust and trustworthy approaches to improve the reliability of both conventional query optimizer and learned query optimizer, and boost the performance of DBMS; (3) developing data management techniques with statistical guarantees to acquire the most useful training data for ML models with a budget limitation, striving to maximize the accuracy of the model. We conduct detailed theoretical and empirical study for each topic, establishing these fundamental problems as well as developing efficient and effective approaches for the tasks

    Towards Performance Portable Graph Algorithms

    Get PDF
    In today's data-driven world, our computational resources have become heterogeneous, making the processing of large-scale graphs in an architecture agnostic manner crucial. Traditionally, hand-optimized high-performance computing (HPC) solutions have been studied and used to implement highly efficient and scalable graph algorithms. In recent years, several graph processing and management systems have also been proposed. Hand optimized HPC approaches require high levels of expertise and graph processing frameworks suffer from expressibility and performance. Portability is a major concern for both approaches. The main thesis of this work is that block-based graph algorithms offer a compromise between efficient parallelism and architecture agnostic algorithm design for a wide class of graph problems. This dissertation seeks to prove this thesis by focusing the work on the three pillars; data/computation partitioning, block-based algorithm design, and performance portability. In this dissertation, we first show how we can partition the computation and the data to design efficient block-based algorithms for solving graph merging and triangle counting problems. Then, generalizing from our experiences, we propose an algorithmic framework, for shared-memory, heterogeneous machines for implementing block-based graph algorithms; PGAbB. PGAbB aims to maximally leverage different architectures by implementing a task-based execution on top of a block-based programming model. In this talk we will discuss PGAbB's programming model, algorithmic optimizations for scheduling, and load-balancing strategies for graph problems on real-world and synthetic inputs.Ph.D

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    Get PDF
    LIPIcs, Volume 244, ESA 2022, Complete Volum
    corecore