
    Compression algorithms for biomedical signals and nanopore sequencing data

    The massive generation of biological digital information creates various computing challenges, such as its storage and transmission. For example, biomedical signals, such as electroencephalograms (EEG), are recorded by multiple sensors over long periods of time, resulting in large volumes of data. Another example is genome DNA sequencing data, where the amount of data generated globally is seeing explosive growth, leading to increasing needs for processing, storage, and transmission resources. In this thesis we investigate the use of data compression techniques for this problem, in two different scenarios where computational efficiency is crucial. First we study the compression of multi-channel biomedical signals. We present a new lossless data compressor for multi-channel signals, GSC, which achieves compression performance similar to the state of the art while being more computationally efficient than other available alternatives. The compressor uses two novel integer-based implementations of the predictive coding and expert advice schemes for multi-channel signals. We also develop a version of GSC optimized for EEG data. This version significantly lowers compression times while attaining similar compression performance for that specific type of signal. In a second scenario we study the compression of DNA sequencing data produced by nanopore sequencing technologies. We present two novel lossless compression algorithms specifically tailored to nanopore FASTQ files. ENANO is a reference-free compressor that mainly focuses on the compression of quality scores. It achieves state-of-the-art compression performance while being fast and using little memory compared to other popular FASTQ compression tools. RENANO, on the other hand, is a reference-based compressor that improves on ENANO by providing a more efficient base call sequence compression component. For RENANO two algorithms are introduced, corresponding to the following scenarios: (i) a reference genome is available without cost to both the compressor and the decompressor; and (ii) the reference genome is available only on the compressor side, and a compacted version of the reference is included in the compressed file. Both algorithms of RENANO significantly improve the compression performance of ENANO, with similar compression times but higher memory requirements.
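As a rough illustration of what integer-based predictive coding means in a lossless setting, the sketch below predicts each sample from the two previous ones and stores only the integer prediction error. It is not the GSC algorithm: the fixed second-order predictor and the names `predict`, `encode`, and `decode` are assumptions made for this example, and a real compressor would follow this step with an entropy coder.

```python
def predict(history):
    """Fixed second-order linear predictor using integer arithmetic only.

    Hypothetical choice for illustration; adaptive schemes would mix
    several predictors instead of using a single fixed one.
    """
    if not history:
        return 0
    if len(history) == 1:
        return history[-1]
    return 2 * history[-1] - history[-2]


def encode(samples):
    """Map integer samples to integer prediction residuals."""
    residuals = []
    for i, x in enumerate(samples):
        residuals.append(x - predict(samples[:i]))
    return residuals


def decode(residuals):
    """Invert encode() exactly; no information is lost."""
    samples = []
    for r in residuals:
        samples.append(r + predict(samples))
    return samples


signal = [100, 102, 104, 107, 109]
residuals = encode(signal)
restored = decode(residuals)
```

For a smooth signal the residuals are small numbers clustered around zero, which is what makes them cheaper to entropy-code than the raw samples.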

    Non-invasive Detection and Compression of Fetal Electrocardiogram

    Noninvasive detection of the fetal electrocardiogram (FECG) from abdominal ECG recordings depends heavily on statistical signal processing techniques such as independent component analysis (ICA), adaptive noise filtering, and multichannel blind deconvolution. In contrast to these earlier multichannel FECG extraction methods, several recent schemes for single-channel FECG extraction, such as the extended Kalman filter (EKF), extended Kalman smoother (EKS), template subtraction (TS), and support vector regression (SVR) for detecting R waves in the ECG, are evaluated using quantitative metrics such as sensitivity (SE), positive predictive value (PPV), F-score, detection error rate (DER), and range of accuracy. A correlation predictor combined with a multivariable gray model (GM) is also proposed for sequential ECG data compression; it achieves a better percent root-mean-square difference (PRD) than Sabah's scheme for fixed and predicted compression ratios (CR). Automatic calculation of the fetal heart rate (FHR) from the FECG reconstructed from mixed abdominal ECG signals is also tested on sample synthetic ECG data. Sample FHR and T/QRS data for both physiological and pathological cases are simulated over a 10-min time sequence.
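The evaluation metrics named in this abstract have common textbook definitions, sketched below. The abstract does not state which variants the authors use (PRD in particular is sometimes computed after subtracting the signal mean), so the formulas and the function names `prd`, `compression_ratio`, and `detection_scores` are assumptions for illustration.

```python
import math


def prd(original, reconstructed):
    """Percent root-mean-square difference, non-normalized variant.

    PRD = 100 * sqrt(sum((x - y)^2) / sum(x^2)); lower is better.
    """
    err = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    energy = sum(x ** 2 for x in original)
    return 100.0 * math.sqrt(err / energy)


def compression_ratio(original_bits, compressed_bits):
    """CR as original size over compressed size; higher is better."""
    return original_bits / compressed_bits


def detection_scores(tp, fp, fn):
    """Sensitivity, positive predictive value and F-score (F1)
    from counts of true positive, false positive and false negative
    R-wave detections."""
    se = tp / (tp + fn)
    ppv = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return se, ppv, f1


se, ppv, f1 = detection_scores(tp=90, fp=10, fn=10)
```

PRD and CR pull in opposite directions: a lossy scheme is usually compared at a fixed CR, where the lower PRD wins.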

    Tensor Decompositions for Signal Processing Applications From Two-way to Multiway Component Analysis

    The widespread use of multi-sensor technology and the emergence of big datasets has highlighted the limitations of standard flat-view matrix models and the necessity to move towards more versatile data analysis tools. We show that higher-order tensors (i.e., multiway arrays) enable such a fundamental paradigm shift towards models that are essentially polynomial and whose uniqueness, unlike the matrix methods, is guaranteed under very mild and natural conditions. Benefiting from the power of multilinear algebra as their mathematical backbone, data analysis techniques using tensor decompositions are shown to have great flexibility in the choice of constraints that match data properties, and to find more general latent components in the data than matrix-based methods. A comprehensive introduction to tensor decompositions is provided from a signal processing perspective, starting from the algebraic foundations, via basic Canonical Polyadic and Tucker models, through to advanced cause-effect and multi-view data analysis schemes. We show that tensor decompositions enable natural generalizations of some commonly used signal processing paradigms, such as canonical correlation and subspace techniques, signal separation, linear regression, feature extraction and classification. We also cover computational aspects, and point out how ideas from compressed sensing and scientific computing may be used for addressing the otherwise unmanageable storage and manipulation problems associated with big datasets. The concepts are supported by illustrative real world case studies illuminating the benefits of the tensor framework, as efficient and promising tools for modern signal processing, data analysis and machine learning applications; these benefits also extend to vector/matrix data through tensorization. Keywords: ICA, NMF, CPD, Tucker decomposition, HOSVD, tensor networks, Tensor Train
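To make the Canonical Polyadic model mentioned above concrete, the sketch below rebuilds a third-order tensor from its factor matrices as a sum of rank-one terms, T[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r]. It only shows the model itself, not how the factors are fitted; the function name `cp_reconstruct` and the small example factors are made up for illustration.

```python
def cp_reconstruct(A, B, C):
    """Build a third-order tensor from CP factor matrices.

    A is I x R, B is J x R, C is K x R (nested lists); the result is an
    I x J x K nested list holding the sum of R rank-one terms.
    """
    rank = len(A[0])
    return [
        [
            [
                sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
                for k in range(len(C))
            ]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


# Hypothetical rank-2 factors for a 2 x 2 x 2 tensor.
A = [[1, 0], [0, 1]]
B = [[1, 2], [3, 4]]
C = [[1, 1], [2, 0]]
T = cp_reconstruct(A, B, C)
```

The factors hold (I + J + K) * R numbers while the full tensor holds I * J * K, which is one reason such models help with the storage problems the abstract mentions.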

    A Deep Learning Approach for Vital Signs Compression and Energy Efficient Delivery in mhealth Systems

    © 2013 IEEE. Due to the increasing number of chronic disease patients, continuous health monitoring has become the top priority for health-care providers and a major stimulus for the development of scalable and energy-efficient mobile health systems. Data collected in such systems are highly critical and can be affected by wireless network conditions, which in turn motivates the need for a preprocessing stage that optimizes data delivery adaptively with respect to network dynamics. In this paper we present adaptive single and multiple modality data compression schemes based on a deep learning approach, which consider the characteristics of the acquired data and the network dynamics to provide energy-efficient data delivery. Results indicate that: 1) the proposed adaptive single modality compression scheme outperforms conventional compression methods by 13.24% and 43.75% reductions in distortion and processing time, respectively; 2) the proposed adaptive multiple modality compression further decreases the distortion by 3.71% and 72.37% when compared with the proposed single modality scheme and conventional methods, respectively, by leveraging inter-modality correlations; and 3) adaptive multiple modality compression demonstrates its efficiency in terms of energy consumption, computational complexity, and responding to different network states. Hence, our approach is suitable for mobile health (mHealth) applications, where the smart preprocessing of vital signs can improve energy efficiency, reduce storage, and cut down transmission delays to the mHealth cloud. This work was supported by NPRP through the Qatar National Research Fund (a member of the Qatar Foundation) under Grant 7-684-1-127.