MEDICAL SIGNALS ALIGNMENT AND PRIVACY PROTECTION USING BELIEF PROPAGATION AND COMPRESSED SENSING
Advances in human genome sequencing technology have significantly reduced the
cost of data generation, and the resulting data volume overwhelms the computing capability of sequence analysis.
Efficiency, efficacy and scalability remain challenging in sequence alignment,
which is an important and foundational operation for genome data analysis. In this
dissertation, I propose a two-stage approach to tackle this problem. In the preprocessing
step, I match blocks of reference and target genome sequences based on the
similarities between their empirical transition probability distributions using belief
propagation. I then conduct a refined match using our recently published SCoBeP
technique. I extract features from the neighbors of an input nucleotide (the genome
subsequence of neighboring nucleotides centered on the input nucleotide)
and leverage sparse coding to find a set of candidate nucleotides, followed by using
Belief Propagation (BP) to rank these candidates. Our experimental results demonstrated
robustness in nucleotide sequence alignment, and our results are competitive
with those of the SOAP aligner and the BWA algorithm.
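The preprocessing idea above (comparing blocks by their empirical transition probability distributions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's method: it uses first-order transitions, a pseudocount, total-variation distance, and a greedy nearest-block search in place of belief propagation. All function names are hypothetical.

```python
import numpy as np

NUCLEOTIDES = "ACGT"
INDEX = {base: i for i, base in enumerate(NUCLEOTIDES)}


def transition_matrix(block, pseudocount=1.0):
    """Row-normalized 4x4 matrix of empirical first-order transition probabilities."""
    # The pseudocount (an assumption) keeps every row a valid distribution.
    counts = np.full((4, 4), pseudocount, dtype=float)
    for a, b in zip(block, block[1:]):
        if a in INDEX and b in INDEX:  # skip ambiguous symbols such as N
            counts[INDEX[a], INDEX[b]] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


def block_distance(block_a, block_b):
    """Mean total-variation distance between corresponding rows of the two matrices."""
    pa, pb = transition_matrix(block_a), transition_matrix(block_b)
    return 0.5 * np.abs(pa - pb).sum(axis=1).mean()


def best_matching_block(target_block, reference, block_len):
    """Start offset of the non-overlapping reference block closest to the target.

    Greedy stand-in for the belief-propagation matching described in the text.
    """
    starts = range(0, len(reference) - block_len + 1, block_len)
    return min(
        starts,
        key=lambda s: block_distance(target_block, reference[s:s + block_len]),
    )
```

For example, `best_matching_block("ACACACAC", reference, 8)` scans the reference in steps of 8 and returns the offset of the block whose transition statistics are closest to the target's.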
In addition, most genomic datasets are not publicly accessible due to privacy
concerns. Patients' genomic data contain identifiable markers and can be used to
determine the presence of an individual in a dataset. Prior research shows that
re-identification can be possible even when a very small set of genomic data is released.
To protect patients, the data owners impose an application and evaluation procedure
which often takes months to complete and limits the researchers. One solution to
the problem is to let each data owner publish a set of pilot data to help data users
choose the right datasets based on their needs. The data owners release these pilot
data with the noise parameters and the mechanism that they used. A data user can
run any kind of association test and compare the outcomes with the outputs of other
datasets to get an idea of which datasets may be useful. I present a privacy-preserving
genomic data dissemination algorithm based on compressed sensing. In my
proposed method, I add noise to the sparse representation of the input
vector to make it differentially private. That is, I find the sparse representation
using Subspace Pursuit and then perturb it with sufficient Laplace noise.
I compare my method with a state-of-the-art compressed sensing privacy protection
method.
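The perturbation step described above can be sketched as follows. This is a minimal illustration, assuming the sparse coefficient vector has already been computed (for example by Subspace Pursuit, which is not implemented here) and that a sensitivity bound is known. The names `laplace_perturb_sparse` and `reconstruct`, and the choice to add noise only on the support, are assumptions for illustration. Whether this yields a formal differential privacy guarantee depends on a sensitivity analysis that the abstract does not give.

```python
import numpy as np


def laplace_perturb_sparse(coeffs, epsilon, sensitivity, rng=None):
    """Add Laplace noise to the nonzero entries of a sparse coefficient vector.

    The noise scale follows the usual Laplace-mechanism form sensitivity / epsilon.
    Both parameters are assumed inputs; the source does not specify them.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = np.random.default_rng() if rng is None else rng
    coeffs = np.asarray(coeffs, dtype=float)
    noisy = coeffs.copy()
    support = np.flatnonzero(coeffs)  # indices of the nonzero coefficients
    noisy[support] += rng.laplace(loc=0.0, scale=sensitivity / epsilon,
                                  size=support.size)
    return noisy


def reconstruct(dictionary, coeffs):
    """Map (possibly perturbed) coefficients back to the signal domain."""
    return np.asarray(dictionary, dtype=float) @ np.asarray(coeffs, dtype=float)
```

A data owner would release `reconstruct(dictionary, noisy)` together with the noise parameters, as the text describes. A smaller `epsilon` gives a larger noise scale.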
Compressive Sensing DNA Microarrays
Compressive sensing microarrays (CSMs) are sensors that operate using group testing and compressive sensing (CS) principles. In contrast to conventional DNA microarrays, in which each genetic sensor is designed to respond to a single target, in a CSM each sensor responds to a set of targets. We study the problem of designing CSMs that simultaneously account for both the constraints from compressive sensing theory and the biochemistry of probe-target DNA hybridization. An appropriate cross-hybridization model is proposed for CSMs, and several methods are developed for probe design and CS signal recovery based on the new model. Our lab experiments suggest that, in order to achieve accurate hybridization profiling, consensus probe sequences are required to have sequence homology of at least 80% with all targets to be detected. Furthermore, out-of-equilibrium datasets are usually as accurate as those obtained from equilibrium conditions. Consequently, one can use CSMs in applications for which only short hybridization times are allowed. Index Terms: Compressive sensing, DNA microarray, group testing, hybridization affinity, probe design
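The measurement model implied above, in which each sensor responds to a set of targets, can be written as y = Φx, where x is a sparse vector of target concentrations and each row of Φ holds one probe's affinities. The sketch below recovers x with orthogonal matching pursuit (OMP), a generic CS recovery algorithm used here as a stand-in; the paper's own cross-hybridization model and recovery methods are not reproduced, and the function name `omp` is hypothetical.

```python
import numpy as np


def omp(phi, y, sparsity):
    """Orthogonal matching pursuit: estimate a `sparsity`-sparse x with y ≈ phi @ x."""
    phi = np.asarray(phi, dtype=float)
    y = np.asarray(y, dtype=float)
    col_norms = np.linalg.norm(phi, axis=0)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Pick the column most correlated with the current residual.
        corr = np.abs(phi.T @ residual) / col_norms
        if support:
            corr[support] = 0.0  # never select the same column twice
        support.append(int(np.argmax(corr)))
        # Re-fit all selected columns jointly by least squares.
        coef, *_ = np.linalg.lstsq(phi[:, support], y, rcond=None)
        residual = y - phi[:, support] @ coef
    x = np.zeros(phi.shape[1])
    if support:
        x[support] = coef
    return x
```

With a random Gaussian `phi` standing in for a probe-affinity matrix (an assumption), `omp(phi, phi @ x_true, k)` returns an estimate with at most `k` nonzero entries. Recovery quality in a real CSM would depend on the actual hybridization affinities.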
A Tutorial on Coding Methods for DNA-based Molecular Communications and Storage
The exponential increase in data has motivated advances in data storage
technologies. As a promising storage medium, deoxyribonucleic acid (DNA) storage
provides a much higher data density and superior durability compared with
state-of-the-art media. In this paper, we provide a tutorial on DNA storage and
its role in molecular communications. Firstly, we introduce fundamentals of
DNA-based molecular communications and storage (MCS), discussing the basic
process of performing DNA storage in MCS. Furthermore, we provide tutorials on
how conventional coding schemes that are used in wireless communications can be
applied to DNA-based MCS, along with numerical results. Finally, promising
research directions on DNA-based data storage in molecular communications are
introduced and discussed.
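As a point of reference for the basic process of DNA storage mentioned above, the sketch below maps binary data to nucleotides at two bits per base and back. The specific mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration, not taken from the tutorial. No error-correcting code or biochemical constraint (such as homopolymer limits) is applied.

```python
# Hypothetical two-bits-per-nucleotide mapping, for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bytes(data: bytes) -> str:
    """Encode bytes as a nucleotide string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_strand(strand: str) -> bytes:
    """Invert encode_bytes; the strand length must be a multiple of four."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

A channel code from wireless communications would typically be applied to the bit stream before this mapping, so that substitution errors in the strand can be detected or corrected after decoding.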
Deep Learning for Genomics: A Concise Overview
Advancements in genomic research such as high-throughput sequencing
techniques have driven modern genomic studies into "big data" disciplines. This
data explosion is constantly challenging conventional methods used in genomics.
In parallel with the urgent demand for robust algorithms, deep learning has
succeeded in a variety of fields such as vision, speech, and text processing.
Yet genomics poses unique challenges to deep learning, since we expect
deep learning to provide a superhuman intelligence that explores beyond our knowledge
to interpret the genome. A powerful deep learning model should rely on
insightful utilization of task-specific knowledge. In this paper, we briefly
discuss the strengths of different deep learning models from a genomic
perspective so as to fit each particular task with a proper deep architecture,
and remark on practical considerations of developing modern deep learning
architectures for genomics. We also provide a concise review of deep learning
applications in various aspects of genomic research, as well as pointing out
potential opportunities and obstacles for future genomics applications.
Comment: Invited chapter for Springer Book: Handbook of Deep Learning Application
Artificial intelligence used in genome analysis studies
Next Generation Sequencing (NGS), or deep sequencing technology, enables parallel reading of multiple individual DNA fragments, thereby enabling the identification of millions of base pairs in several hours. Recent research has clearly shown that machine learning technologies can efficiently analyse large sets of genomic data and help to identify novel gene functions and regulatory regions. A deep artificial neural network consists of a group of artificial neurons that mimic the properties of living neurons. These mathematical models, termed Artificial Neural Networks (ANNs), can be used to solve artificial intelligence engineering problems in several different fields (e.g., biology, genomics, proteomics, and metabolomics). In practical terms, neural networks are non-linear statistical structures that are organized as modelling tools and are used to simulate complex genomic relationships between inputs and outputs. To date, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been demonstrated to be the best tools for improving performance in problem-solving tasks within the genomic field.
Genomic Detection Using Sparsity-inspired Tools
Genome-based detection methods provide the most conclusive means for establishing the presence of microbial species. A prime example of their use is in the detection of bacterial species, many of which are naturally vital or dangerous to human health, or can be genetically engineered to be so. However, current genomic detection methods are cost-prohibitive and inevitably use unique sensors that are specific to each species to be detected. In this thesis we advocate the use of combinatorial and non-specific identifiers for detection, made possible by exploiting the sparsity inherent in the species detection problem in a clinical or environmental sample. By modifying the sensor design process, we have developed new molecular biology tools with advantages that were not possible in their previous incarnations. Chief among these advantages are a universal species detection platform, the ability to discover unknown species, and the elimination of PCR, an expensive and laborious amplification step prerequisite in every molecular biology detection technique. Finally, we introduce a sparsity-based model for analyzing the millions of raw sequencing reads generated during whole genome sequencing for species detection, and achieve significant reductions in computation time together with high accuracy.
Hidden Markov models and other machine learning approaches in computational molecular biology
This tutorial was one of eight tutorials selected to be presented at the Third International Conference on Intelligent Systems for Molecular Biology, which was held in the United Kingdom from July 16 to 19, 1995. Computational tools are increasingly needed to process the massive amounts of data, to organize and classify sequences, to detect weak similarities, to separate coding from non-coding regions, and to reconstruct the underlying evolutionary history. The fundamental problem in machine learning is the same as in scientific reasoning in general, as well as statistical modeling: to come up with a good model for the data. In this tutorial four classes of models are reviewed: hidden Markov models, artificial neural networks, belief networks, and stochastic grammars. When dealing with DNA and protein primary sequences, hidden Markov models are among the most flexible and powerful tools for alignments and database searches. In this tutorial, attention is focused on the theory of hidden Markov models and how to apply them to problems in molecular biology.
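To make the hidden Markov model idea above concrete, the sketch below implements Viterbi decoding, the standard algorithm for finding the most likely hidden state path given an observed sequence. It is a generic textbook formulation rather than code from the tutorial. The function name `viterbi` and the convention of passing observations as integer symbol indices are assumptions.

```python
import numpy as np


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state path for a non-empty sequence of symbol indices.

    start_p[i]    : probability of starting in state i
    trans_p[i, j] : probability of moving from state i to state j
    emit_p[i, k]  : probability that state i emits symbol k
    All probabilities are assumed strictly positive (logs are taken).
    """
    log_start = np.log(start_p)
    log_trans = np.log(trans_p)
    log_emit = np.log(emit_p)
    n_steps, n_states = len(obs), len(log_start)
    score = np.empty((n_steps, n_states))
    back = np.zeros((n_steps, n_states), dtype=int)
    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n_steps):
        # cand[i, j]: best log-score of a path ending in i, then moving to j.
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]
    # Trace back from the best final state.
    path = [int(score[-1].argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

For instance, with two hypothetical states (AT-rich and GC-rich) and symbols indexed as A=0, C=1, G=2, T=3, the returned path labels each position of a DNA sequence with the state most likely to have produced it. This is the same mechanism used to separate regions with different composition.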