4,044 research outputs found

    Bringing Linear Algebra Objects to Life in a Column-Oriented In-Memory Database

    Get PDF
    Large numeric matrices and multidimensional data arrays appear in many science domains, as well as in applications of financial and business warehousing. Common applications include eigenvalue determination of large matrices, which decompose into a set of linear algebra operations. With the rise of in-memory databases it is now feasible to execute these complex analytical queries directly in a relational database system without the need of transfering data out of the system and being restricted by hard disc latencies for random accesses. In this paper, we present a way to integrate linear algebra operations and large matrices as first class citizens into an in-memory database following a two-layered architectural model. The architecture consists of a logical component receiving manipulation statements and linear algebra expressions, and of a physical layer, which autonomously administrates multiple matrix storage representations. A cost-based hybrid storage representation is presented and an experimental implementation is evaluated for matrix-vector multiplications

    PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development

    Full text link
    This paper describes PlinyCompute, a system for development of high-performance, data-intensive, distributed computing tools and libraries. In the large, PlinyCompute presents the programmer with a very high-level, declarative interface, relying on automatic, relational-database style optimization to figure out how to stage distributed computations. However, in the small, PlinyCompute presents the capable systems programmer with a persistent object data model and API (the "PC object model") and associated memory management system that has been designed from the ground-up for high performance, distributed, data-intensive computing. This contrasts with most other Big Data systems, which are constructed on top of the Java Virtual Machine (JVM), and hence must at least partially cede performance-critical concerns such as memory management (including layout and de/allocation) and virtual method/function dispatch to the JVM. This hybrid approach---declarative in the large, trusting the programmer's ability to utilize PC object model efficiently in the small---results in a system that is ideal for the development of reusable, data-intensive tools and libraries. Through extensive benchmarking, we show that implementing complex objects manipulation and non-trivial, library-style computations on top of PlinyCompute can result in a speedup of 2x to more than 50x or more compared to equivalent implementations on Spark.Comment: 48 pages, including references and Appendi

    SLACID - Sparse Linear Algebra in a Column-Oriented In-Memory Database System

    Get PDF
    Scientific computations and analytical business applications are often based on linear algebra operations on large, sparse matrices. With the hardware shift of the primary storage from disc into memory it is now feasible to execute linear algebra queries directly in the database engine. This paper presents and compares different approaches of storing sparse matrices in an in-memory column-oriented database system. We show that a system layout derived from the compressed sparse row representation integrates well with a columnar database design and that the resulting architecture is moreover amenable to a wide range of non-numerical use cases when dictionary encoding is used. Dynamic matrix manipulation operations, like online insertion or deletion of elements, are not covered by most linear algebra frameworks. Therefore, we present a hybrid architecture that consists of a read-optimized main and a write-optimized delta structure and evaluate the performance for dynamic sparse matrix workloads by applying workflows of nuclear science and network graphs

    Julia: A Fresh Approach to Numerical Computing

    Get PDF
    Bridging cultures that have often been distant, Julia combines expertise from the diverse fields of computer science and computational science to create a new approach to numerical computing. Julia is designed to be easy and fast. Julia questions notions generally held as "laws of nature" by practitioners of numerical computing: 1. High-level dynamic programs have to be slow. 2. One must prototype in one language and then rewrite in another language for speed or deployment, and 3. There are parts of a system for the programmer, and other parts best left untouched as they are built by the experts. We introduce the Julia programming language and its design --- a dance between specialization and abstraction. Specialization allows for custom treatment. Multiple dispatch, a technique from computer science, picks the right algorithm for the right circumstance. Abstraction, what good computation is really about, recognizes what remains the same after differences are stripped away. Abstractions in mathematics are captured as code through another technique from computer science, generic programming. Julia shows that one can have machine performance without sacrificing human convenience.Comment: 37 page

    Density-Aware Linear Algebra in a Column-Oriented In-Memory Database System

    Get PDF
    Linear algebra operations appear in nearly every application in advanced analytics, machine learning, and of various science domains. Until today, many data analysts and scientists tend to use statistics software packages or hand-crafted solutions for their analysis. In the era of data deluge, however, the external statistics packages and custom analysis programs that often run on single-workstations are incapable to keep up with the vast increase in data volume and size. In particular, there is an increasing demand of scientists for large scale data manipulation, orchestration, and advanced data management capabilities. These are among the key features of a mature relational database management system (DBMS). With the rise of main memory database systems, it now has become feasible to also consider applications that built up on linear algebra. This thesis presents a deep integration of linear algebra functionality into an in-memory column-oriented database system. In particular, this work shows that it has become feasible to execute linear algebra queries on large data sets directly in a DBMS-integrated engine (LAPEG), without the need of transferring data and being restricted by hard disc latencies. From various application examples that are cited in this work, we deduce a number of requirements that are relevant for a database system that includes linear algebra functionality. Beside the deep integration of matrices and numerical algorithms, these include optimization of expressions, transparent matrix handling, scalability and data-parallelism, and data manipulation capabilities. These requirements are addressed by our linear algebra engine. In particular, the core contributions of this thesis are: firstly, we show that the columnar storage layer of an in-memory DBMS yields an easy adoption of efficient sparse matrix data types and algorithms. Furthermore, we show that the execution of linear algebra expressions significantly benefits from different techniques that are inspired from database technology. In a novel way, we implemented several of these optimization strategies in LAPEG’s optimizer (SpMachO), which uses an advanced density estimation method (SpProdest) to predict the matrix density of intermediate results. Moreover, we present an adaptive matrix data type AT Matrix to obviate the need of scientists for selecting appropriate matrix representations. The tiled substructure of AT Matrix is exploited by our matrix multiplication to saturate the different sockets of a multicore main-memory platform, reaching up to a speed-up of 6x compared to alternative approaches. Finally, a major part of this thesis is devoted to the topic of data manipulation; where we propose a matrix manipulation API and present different mutable matrix types to enable fast insertions and deletes. We finally conclude that our linear algebra engine is well-suited to process dynamic, large matrix workloads in an optimized way. In particular, the DBMS-integrated LAPEG is filling the linear algebra gap, and makes columnar in-memory DBMS attractive as efficient, scalable ad-hoc analysis platform for scientists

    Streamlining Temporal Formal Verification over Columnar Databases

    Get PDF
    Recent findings demonstrate how database technology enhances the computation of formal verification tasks expressible in linear time logic for finite traces (LTLf). Human-readable declarative languages also help the common practitioner to express temporal constraints in a straightforward and accessible language. Notwithstanding the former, this technology is in its infancy, and therefore, few optimization algorithms are known for dealing with massive amounts of information audited from real systems. We, therefore, present four novel algorithms subsuming entire LTLf expressions while outperforming previous state-of-the-art implementations on top of KnoBAB, thus postulating the need for the corresponding, leading to the formulation of novel xtLTLf-derived algebraic operators
    • …
    corecore