8,000 research outputs found

    bdbms -- A Database Management System for Biological Data

    Full text link
    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) Annotation and provenance management including storage, indexing, manipulation, and querying of annotation and provenance as first class objects in bdbms, (2) Local dependency tracking to track the dependencies and derivations among data items, (3) Update authorization to support data curation via content-based authorization, in contrast to identity-based authorization, and (4) New access methods and their supporting operators that support pattern matching on various types of compressed biological data types. This paper presents the design of bdbms along with the techniques proposed to support these functionalities including an extension to SQL. We also outline some open issues in building bdbms.Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/.) You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but, you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR) January 710, 2007, Asilomar, California, US

    Geometry-Oblivious FMM for Compressing Dense SPD Matrices

    Full text link
    We present GOFMM (geometry-oblivious FMM), a novel method that creates a hierarchical low-rank approximation, "compression," of an arbitrary dense symmetric positive definite (SPD) matrix. For many applications, GOFMM enables an approximate matrix-vector multiplication in Nlog⁥NN \log N or even NN time, where NN is the matrix size. Compression requires Nlog⁥NN \log N storage and work. In general, our scheme belongs to the family of hierarchical matrix approximation methods. In particular, it generalizes the fast multipole method (FMM) to a purely algebraic setting by only requiring the ability to sample matrix entries. Neither geometric information (i.e., point coordinates) nor knowledge of how the matrix entries have been generated is required, thus the term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme for hierarchical matrix computations that reduces synchronization barriers. We present results on the Intel Knights Landing and Haswell architectures, and on the NVIDIA Pascal architecture for a variety of matrices.Comment: 13 pages, accepted by SC'1

    Data Streams from the Low Frequency Instrument On-Board the Planck Satellite: Statistical Analysis and Compression Efficiency

    Get PDF
    The expected data rate produced by the Low Frequency Instrument (LFI) planned to fly on the ESA Planck mission in 2007, is over a factor 8 larger than the bandwidth allowed by the spacecraft transmission system to download the LFI data. We discuss the application of lossless compression to Planck/LFI data streams in order to reduce the overall data flow. We perform both theoretical analysis and experimental tests using realistically simulated data streams in order to fix the statistical properties of the signal and the maximal compression rate allowed by several lossless compression algorithms. We studied the influence of signal composition and of acquisition parameters on the compression rate Cr and develop a semiempirical formalism to account for it. The best performing compressor tested up to now is the arithmetic compression of order 1, designed for optimizing the compression of white noise like signals, which allows an overall compression rate = 2.65 +/- 0.02. We find that such result is not improved by other lossless compressors, being the signal almost white noise dominated. Lossless compression algorithms alone will not solve the bandwidth problem but needs to be combined with other techniques.Comment: May 3, 2000 release, 61 pages, 6 figures coded as eps, 9 tables (4 included as eps), LaTeX 2.09 + assms4.sty, style file included, submitted for the pubblication on PASP May 3, 200

    Shift Aggregate Extract Networks

    Get PDF
    We introduce an architecture based on deep hierarchical decompositions to learn effective representations of large graphs. Our framework extends classic R-decompositions used in kernel methods, enabling nested "part-of-part" relations. Unlike recursive neural networks, which unroll a template on input graphs directly, we unroll a neural network template over the decomposition hierarchy, allowing us to deal with the high degree variability that typically characterize social network graphs. Deep hierarchical decompositions are also amenable to domain compression, a technique that reduces both space and time complexity by exploiting symmetries. We show empirically that our approach is competitive with current state-of-the-art graph classification methods, particularly when dealing with social network datasets

    On optimally partitioning a text to improve its compression

    Full text link
    In this paper we investigate the problem of partitioning an input string T in such a way that compressing individually its parts via a base-compressor C gets a compressed output that is shorter than applying C over the entire T at once. This problem was introduced in the context of table compression, and then further elaborated and extended to strings and trees. Unfortunately, the literature offers poor solutions: namely, we know either a cubic-time algorithm for computing the optimal partition based on dynamic programming, or few heuristics that do not guarantee any bounds on the efficacy of their computed partition, or algorithms that are efficient but work in some specific scenarios (such as the Burrows-Wheeler Transform) and achieve compression performance that might be worse than the optimal-partitioning by a Ω(log⁥n)\Omega(\sqrt{\log n}) factor. Therefore, computing efficiently the optimal solution is still open. In this paper we provide the first algorithm which is guaranteed to compute in O(n \log_{1+\eps}n) time a partition of T whose compressed output is guaranteed to be no more than (1+Ï”)(1+\epsilon)-worse the optimal one, where Ï”\epsilon may be any positive constant
    • 

    corecore