
    Hamming-shifting graph of genomic short reads: Efficient construction and its application for compression

    Graphs such as de Bruijn graphs and OLC (overlap-layout-consensus) graphs have been widely adopted for the de novo assembly of genomic short reads. This work studies another important problem in the field: how graphs can be used for high-performance compression of large-scale sequencing data. We present a novel graph definition, the Hamming-shifting graph, to address this problem. The definition originates from the technological characteristics of next-generation sequencing machines and aims to link all pairs of distinct reads that have a small Hamming distance, a small shifting offset, or both. We compute multiple lexicographically minimal k-mers to index the reads for an efficient search of the lightest-weight edges, and we prove a very high probability of successfully detecting these edges. The resulting graph creates a full mutual reference among the reads to cascade a code-minimized transfer of every child read for an optimal compression. We conducted compression experiments on the minimum spanning forest of this extremely sparse graph and achieved a 10-30% greater file-size reduction than the best compression results obtained with existing algorithms. As future work, the separation and connectivity degrees of these giant graphs could be used as economical measurements or protocols for quick quality assessment of wet-lab machines, for sufficiency control of genomic library preparation, and for accurate de novo genome assembly.
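
    To make the construction concrete, here is a minimal Python sketch, under our own assumptions, of the two mechanisms the abstract names: bucketing reads by their lexicographically minimal k-mers so that only bucket-mates are ever compared, and testing each candidate pair for a small Hamming distance or a small shifting offset. The function names, the number of minimal k-mers kept per read, and the thresholds are illustrative choices, not the paper's code.

```python
# Illustrative sketch of Hamming/shift edge detection via minimal k-mer
# indexing. All names and parameters are assumptions for exposition.
from collections import defaultdict

def minimal_kmers(read: str, k: int, n: int = 2) -> list[str]:
    """Return the n lexicographically smallest k-mers of a read."""
    kmers = {read[i:i + k] for i in range(len(read) - k + 1)}
    return sorted(kmers)[:n]

def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))

def shift_offset(a: str, b: str, max_shift: int) -> int | None:
    """Smallest s <= max_shift such that a's suffix matches b's prefix."""
    for s in range(1, max_shift + 1):
        if a[s:] == b[:len(a) - s]:
            return s
    return None

def light_edges(reads: list[str], k: int, max_h: int = 3, max_s: int = 3):
    """Yield (i, j, hamming, shift) for read pairs that share a minimal
    k-mer and are close under Hamming distance or shifting offset."""
    buckets = defaultdict(list)
    for idx, r in enumerate(reads):
        for km in minimal_kmers(r, k):
            buckets[km].append(idx)
    seen = set()
    for ids in buckets.values():
        for i in ids:
            for j in ids:
                if i < j and (i, j) not in seen:
                    seen.add((i, j))
                    h = hamming(reads[i], reads[j])
                    s = shift_offset(reads[i], reads[j], max_s)
                    if h <= max_h or s is not None:
                        yield i, j, h, s
    # A minimum spanning forest over these weighted edges then drives the
    # reference-based compression described in the abstract.
```

    Only pairs that share a minimal k-mer are compared, which is what keeps edge detection tractable on an otherwise quadratic number of read pairs.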

    Bridging the gap between algorithmic and learned index structures

    Index structures such as B-trees and Bloom filters are the well-established petrol engines of database systems. However, these structures do not fully exploit patterns in the data distribution. To address this, researchers have suggested using machine learning models as electric engines that entirely replace index structures. Such a paradigm shift in data-system design, however, opens many unsolved design challenges: more research is needed to understand the theoretical guarantees and to design efficient support for insertion and deletion. In this thesis, we adopt a different position: index algorithms are good enough, and instead of going back to the drawing board to fit data systems with learned models, we should develop lightweight hybrid engines that build on the benefits of both algorithmic and learned index structures. The indexes that we suggest provide the theoretical performance guarantees and updatability of algorithmic indexes while using position-prediction models to leverage the data distribution and thereby improve the performance of the index structure. We investigate the potential for minimal modifications to algorithmic indexes so that they can leverage the data distribution in the way learned indexes do. In this regard, we propose and explore the use of helping models that boost classical index performance using techniques from machine learning. Our suggested approach inherits performance guarantees from its algorithmic baseline index while considering the data distribution to improve performance considerably. We study single-dimensional range indexes, spatial indexes, and stream indexing, and show that the suggested approach yields range indexes that outperform the algorithmic indexes and have performance comparable to read-only, fully learned indexes, and hence can be reliably used as a default index structure in a database engine. We also consider the updatability of the indexes and suggest solutions for updating the index, notably when the data distribution changes drastically over time (e.g., for indexing data streams). In particular, we propose a specific learning-augmented index for indexing a sliding window with timestamps in a data stream. Additionally, we highlight the limitations of learned indexes for low-latency lookup on real-world data distributions. To tackle this issue, we suggest adding an algorithmic enhancement layer to a learned model to correct the prediction error with a small memory latency. This approach enables efficient modelling of the data distribution and resolves the local biases of a learned model at the cost of roughly one memory lookup.
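
    The "helping model plus algorithmic correction" idea can be pictured with a small sketch: a least-squares linear model predicts a key's position in a sorted array, and a bounded binary search repairs the prediction within the model's maximum observed error, so lookups keep an algorithmic worst-case guarantee. The class and its parameters are our own illustration, not the thesis's implementation.

```python
# Minimal sketch of a hybrid learned index: model prediction plus a
# bounded algorithmic correction step. Illustrative, not the thesis code.
import bisect

class HybridIndex:
    def __init__(self, keys: list[int]):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ slope * key + intercept by least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys) or 1.0
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var
        self.intercept = mean_p - self.slope * mean_k
        # Maximum prediction error on the data = correction radius.
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key: int) -> int:
        return round(self.slope * key + self.intercept)

    def lookup(self, key: int) -> int | None:
        """Predict, then binary-search only within the error bound."""
        p = self._predict(key)
        lo = max(0, p - self.err)
        hi = min(len(self.keys), p + self.err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < hi and self.keys[i] == key else None
```

    Because the correction radius is the maximum error observed at build time, a lookup inspects at most 2*err + 1 slots; a badly fitting model degrades gracefully toward a plain binary search rather than losing correctness.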

    A novel approach for the hardware implementation of a PPMC statistical data compressor

    This thesis aims to understand how to design high-performance compression algorithms suitable for hardware implementation and to provide hardware support for an efficient compression algorithm. Lossless data compression techniques have been developed to exploit the available bandwidth of applications in data communications and computer systems by reducing the amount of data they transmit or store. As the amount of data to handle is ever increasing, traditional methods for compressing data become insufficient. To overcome this problem, more powerful methods have been developed, among them the so-called statistical data compression methods, which compress data based on its statistics. However, their high complexity and space requirements have prevented their hardware implementation and the full exploitation of their potential benefits. This thesis examines the feasibility of implementing one of these statistical data compression methods (PPMC) in hardware by exploring how the method can be reorganised and restructured for hardware implementation and by investigating ways of achieving an efficient and cost-effective design. [Continues.]
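
    As a rough illustration of what "compressing data based on its statistics" means for the PPM family, the sketch below maintains order-1 context counts with a PPMC-style escape estimate and accumulates the ideal code length an arithmetic coder would achieve. It is a software toy under our own assumptions, not the thesis's hardware design.

```python
# Toy order-1 context model with PPMC-style escape counting. The
# probabilities it produces are what an arithmetic coder would consume;
# here we only sum the ideal code lengths (-log2 p) in bits.
import math
from collections import defaultdict

class Order1Model:
    def __init__(self):
        self.ctx = defaultdict(lambda: defaultdict(int))  # context -> symbol counts

    def code_length(self, prev: str, sym: str) -> float:
        counts = self.ctx[prev]
        total = sum(counts.values())
        distinct = len(counts)  # PPMC: escape weight = number of distinct symbols
        c = counts.get(sym, 0)
        if c > 0:
            p = c / (total + distinct)
        else:
            # Escape, then fall back to a uniform model over 256 byte values.
            esc = distinct / (total + distinct) if total else 1.0
            p = esc * (1 / 256)
        return -math.log2(p)

    def update(self, prev: str, sym: str):
        self.ctx[prev][sym] += 1

def ideal_size_bits(data: str) -> float:
    model, bits, prev = Order1Model(), 0.0, "\0"
    for ch in data:
        bits += model.code_length(prev, ch)
        model.update(prev, ch)
        prev = ch
    return bits
```

    Weighting the escape symbol by the number of distinct symbols already seen in a context is the defining choice of method C; the hardware challenge the thesis studies is reorganising this kind of context-count maintenance, which is memory-hungry in software, into an efficient and cost-effective structure.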

    Quantum Chemistry in the Age of Quantum Computing

    Practical challenges in simulating quantum systems on classical computers have been widely recognized in the quantum physics and quantum chemistry communities over the past century. Although many approximation methods have been introduced, the complexity of quantum mechanics remains hard to appease. The advent of quantum computation brings new pathways to navigate this challenging complexity landscape. By manipulating quantum states of matter and taking advantage of their unique features, such as superposition and entanglement, quantum computers promise to efficiently deliver accurate results for many important problems in quantum chemistry, such as the electronic structure of molecules. In the past two decades, significant advances have been made in developing algorithms and physical hardware for quantum computing, heralding a revolution in the simulation of quantum systems. This article is an overview of the algorithms and results that are relevant for quantum chemistry. The intended audience is both quantum chemists who seek to learn more about quantum computing and quantum computing researchers who would like to explore applications in quantum chemistry.

    Codificação digital de áudio baseada em retroadaptação perceptual (Digital audio coding based on perceptual backward adaptation)

    Doctorate in Electronic Engineering. The problem of digital coding of high-quality audio signals is analysed, and the principle of perceptual coding is identified as the most satisfactory approach. We present a synthesis of the perceptual coding systems found in the literature, and we identify, compare, and relate the techniques used in each one. Given their relevance for audio coding, transforms and multifrequency filter banks, as well as quantization, lossless coding, and mathematical models of auditory perception, are subject to a more thorough study. We propose a coding system consisting of a multirate filter bank, adaptive logarithmic quantizers, arithmetic entropy coding, and an explicit psychoacoustic model that adapts the quantizers according to perceptual criteria. Unlike other perceptual coders, the proposed system is backward-adaptive, that is, adaptation depends exclusively on already-quantized samples, not on the original signal. We discuss the advantages of backward adaptation and show that it can be successfully applied to perceptual coding.
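
    The backward-adaptation principle the abstract highlights can be shown in a few lines: a Jayant-style quantizer whose step size is adapted only from the transmitted codes, so the decoder reproduces the encoder's step-size sequence exactly without side information. The multipliers and thresholds below are illustrative assumptions, not the thesis's design.

```python
# Minimal sketch of backward (decoder-synchronized) step-size adaptation.
# Parameters are illustrative; the thesis's coder also includes a filter
# bank, arithmetic coding, and a psychoacoustic model not shown here.
class BackwardAdaptiveQuantizer:
    def __init__(self, step: float = 0.1, grow: float = 1.25, shrink: float = 0.9):
        self.step = step      # current quantizer step size
        self.grow = grow      # expand step after large codes
        self.shrink = shrink  # contract step after small codes

    def encode(self, x: float) -> int:
        code = round(x / self.step)
        self._adapt(code)
        return code

    def decode(self, code: int) -> float:
        y = code * self.step
        self._adapt(code)     # same rule, same inputs -> same step sequence
        return y

    def _adapt(self, code: int):
        # Jayant-style one-word-memory adaptation on the transmitted code only.
        self.step *= self.grow if abs(code) > 1 else self.shrink
        self.step = min(max(self.step, 1e-6), 1e6)
```

    An encoder and a decoder constructed with the same initial parameters remain synchronized because each applies the identical adaptation rule to the identical code stream, so no step-size side information has to be transmitted.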