    MEDICAL SIGNALS ALIGNMENT AND PRIVACY PROTECTION USING BELIEF PROPAGATION AND COMPRESSED SENSING

    Advances in human genome sequencing technology have significantly reduced the cost of data generation, and the resulting data volume overwhelms the computing capability available for sequence analysis. Efficiency, efficacy, and scalability remain challenging in sequence alignment, which is an important and foundational operation for genome data analysis. In this dissertation, I propose a two-stage approach to tackle this problem. In the preprocessing step, I match blocks of reference and target genome sequences based on the similarities between their empirical transition probability distributions using belief propagation (BP). I then conduct a refined match using our recently published SCoBeP technique. I extract features from the neighbors of an input nucleotide (a sequence of neighboring nucleotides with the input nucleotide in the middle) and leverage sparse coding to find a set of candidate nucleotides, then use BP to rank these candidates. Our experimental results demonstrate robustness in nucleotide sequence alignment and are competitive with those of the SOAP aligner and the BWA algorithm. In addition, most genomic datasets are not publicly accessible due to privacy concerns. Patients' genomic data contain identifiable markers and can be used to determine the presence of an individual in a dataset. Prior research shows that re-identification is possible even when a very small set of genomic data is released. To protect patients, data owners impose an application and evaluation procedure that often takes months to complete and limits researchers. One solution is to let each data owner publish a set of pilot data to help data users choose the right datasets for their needs. The data owners release these pilot data together with the noise parameters and the mechanism they used. A data user can run any kind of association test and compare the outcomes with the outputs of other datasets to get an idea of which datasets may be useful. I present a privacy-preserving genomic data dissemination algorithm based on compressed sensing. In my proposed method, I add noise to the sparse representation of the input vector to make it differentially private: I find the sparse representation using Subspace Pursuit and then perturb it with sufficient Laplacian noise. I compare my method with a state-of-the-art compressed sensing privacy protection method.
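
The noise step described in the abstract above can be illustrated with a minimal sketch. It is not the author's implementation. It assumes the signal is already sparse in the standard basis and swaps Subspace Pursuit for plain top-k selection, and the names `top_k_support`, `perturb_sparse`, `sensitivity`, and `epsilon` are hypothetical. Whether such a scheme meets a formal differential privacy guarantee depends on a sensitivity analysis that the abstract does not give.

```python
import random


def top_k_support(x, k):
    """Indices of the k largest-magnitude entries of x.

    Stand-in for Subspace Pursuit (an assumption of this sketch).
    """
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    return set(order[:k])


def laplace_sample(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_sparse(x, k, sensitivity, epsilon, seed=None):
    """Keep k coefficients of x and add Laplace noise to each of them.

    `sensitivity` and `epsilon` are hypothetical parameters; the noise
    scale sensitivity / epsilon follows the usual Laplace mechanism.
    Coefficients outside the retained support are set to zero.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    support = top_k_support(x, k)
    return [
        x[i] + laplace_sample(scale, rng) if i in support else 0.0
        for i in range(len(x))
    ]


if __name__ == "__main__":
    signal = [0.1, 5.0, -0.2, -4.0, 0.05]
    print(perturb_sparse(signal, k=2, sensitivity=1.0, epsilon=0.5, seed=0))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger perturbation of the released coefficients. Note that this sketch leaves the retained support itself unperturbed.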

    Compressive Sensing DNA Microarrays

    sensors that operate using group testing and compressive sensing (CS) principles. In contrast to conventional DNA microarrays, in which each genetic sensor is designed to respond to a single target, in a compressive sensing microarray (CSM) each sensor responds to a set of targets. We study the problem of designing CSMs that simultaneously account for both the constraints from compressive sensing theory and the biochemistry of probe-target DNA hybridization. An appropriate cross-hybridization model is proposed for CSMs, and several methods are developed for probe design and CS signal recovery based on the new model. Our lab experiments suggest that, in order to achieve accurate hybridization profiling, consensus probe sequences are required to have sequence homology of at least 80% with all targets to be detected. Furthermore, out-of-equilibrium datasets are usually as accurate as those obtained from equilibrium conditions. Consequently, one can use CSMs in applications for which only short hybridization times are allowed. Index Terms: compressive sensing, DNA microarray, group testing, hybridization affinity, probe design
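
The idea that each sensor responds to a set of targets can be illustrated with a toy group-testing sketch. It is not the recovery method of the paper. It assumes a noiseless binary model (a probe reads positive if any of its targets is present), whereas the abstract describes real-valued hybridization affinities and CS recovery. The probe design and the names `measure`, `comp_decode`, and `bit_design` are hypothetical. The decoder is the standard COMP rule from group testing: any target covered by a negative probe is ruled out.

```python
def bit_design(n_bits):
    """Hypothetical probe design over 2**n_bits targets.

    For every bit position there are two probes: one responds to
    targets whose index has that bit set, the other to targets whose
    index has it clear. This uses 2 * n_bits probes in total.
    """
    n_targets = 2 ** n_bits
    design = []
    for b in range(n_bits):
        design.append({t for t in range(n_targets) if (t >> b) & 1})
        design.append({t for t in range(n_targets) if not (t >> b) & 1})
    return design


def measure(design, present):
    """Noiseless binary readout: a probe is positive if it covers a present target."""
    return [bool(targets & present) for targets in design]


def comp_decode(design, readings, n_targets):
    """COMP decoding: discard every target covered by a negative probe."""
    candidates = set(range(n_targets))
    for targets, positive in zip(design, readings):
        if not positive:
            candidates -= targets
    return candidates


if __name__ == "__main__":
    design = bit_design(3)            # 6 probes for 8 targets
    readings = measure(design, {5})   # only target 5 is present
    print(comp_decode(design, readings, 8))
```

With this design, six probes suffice to identify a single present target among eight, compared with eight single-target probes in a conventional layout. When several targets are present, COMP returns a superset of the true set, which is why sparsity matters.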

    A Tutorial on Coding Methods for DNA-based Molecular Communications and Storage

    The exponential growth of data has motivated advances in data storage technologies. As a promising storage medium, deoxyribonucleic acid (DNA) storage provides much higher data density and superior durability than state-of-the-art media. In this paper, we provide a tutorial on DNA storage and its role in molecular communications. First, we introduce the fundamentals of DNA-based molecular communications and storage (MCS), discussing the basic process of performing DNA storage in MCS. Furthermore, we provide tutorials on how conventional coding schemes used in wireless communications can be applied to DNA-based MCS, along with numerical results. Finally, promising research directions on DNA-based data storage in molecular communications are introduced and discussed.

    Deep Learning for Genomics: A Concise Overview

    Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into "big data" disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics poses unique challenges to deep learning, since we expect from deep learning a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, and point out potential opportunities and obstacles for future genomics applications. Comment: Invited chapter for Springer Book: Handbook of Deep Learning Application

    Artificial intelligence used in genome analysis studies

    Next Generation Sequencing (NGS), or deep sequencing technology, enables parallel reading of multiple individual DNA fragments, thereby enabling the identification of millions of base pairs in several hours. Recent research has clearly shown that machine learning technologies can efficiently analyse large sets of genomic data and help to identify novel gene functions and regulatory regions. A deep artificial neural network consists of a group of artificial neurons that mimic the properties of living neurons. These mathematical models, termed Artificial Neural Networks (ANNs), can be used to solve artificial intelligence engineering problems in several different technological fields (e.g., biology, genomics, proteomics, and metabolomics). In practical terms, neural networks are non-linear statistical structures that are organized as modelling tools and are used to simulate complex genomic relationships between inputs and outputs. To date, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been demonstrated to be the best tools for improving performance in problem-solving tasks within the genomic field.

    Genomic Detection Using Sparsity-inspired Tools

    Genome-based detection methods provide the most conclusive means for establishing the presence of microbial species. A prime example of their use is in the detection of bacterial species, many of which are naturally vital or dangerous to human health, or can be genetically engineered to be so. However, current genomic detection methods are cost-prohibitive and inevitably use unique sensors that are specific to each species to be detected. In this thesis we advocate the use of combinatorial and non-specific identifiers for detection, made possible by exploiting the sparsity inherent in the species detection problem in a clinical or environmental sample. By modifying the sensor design process, we have developed new molecular biology tools with advantages that were not possible in their previous incarnations. Chief among these advantages are a universal species detection platform, the ability to discover unknown species, and the elimination of PCR, an expensive and laborious amplification step that is a prerequisite in every molecular biology detection technique. Finally, we introduce a sparsity-based model for analyzing the millions of raw sequencing reads generated during whole genome sequencing for species detection, and achieve significant reductions in computation time together with high accuracy.