
751 research outputs found

MEDICAL SIGNALS ALIGNMENT AND PRIVACY PROTECTION USING BELIEF PROPAGATION AND COMPRESSED SENSING

Author: Roozgard Aminmohammad
Publication date: 10/11/2014

Advances in human genome sequencing technology have significantly reduced the cost of data generation, and the resulting volume of data overwhelms the computing capacity available for sequence analysis. Efficiency, efficacy, and scalability remain challenging in sequence alignment, an important and foundational operation in genome data analysis. In this dissertation, I propose a two-stage approach to this problem. In the preprocessing step, I use belief propagation to match blocks of the reference and target genome sequences based on the similarity between their empirical transition probability distributions. I then conduct a refined match using our recently published SCoBeP technique: I extract features from the neighborhood of an input nucleotide (a window of neighboring nucleotides with the input nucleotide at its center), use sparse coding to find a set of candidate nucleotides, and then use belief propagation (BP) to rank these candidates. Our experimental results demonstrate robustness in nucleotide sequence alignment, and our results are competitive with those of the SOAP aligner and the BWA algorithm.

In addition, most genomic datasets are not publicly accessible because of privacy concerns. Patients' genomic data contain identifiable markers and can be used to determine whether an individual is present in a dataset. Prior research shows that re-identification is possible even when only a very small set of genomic data is released. To protect patients, data owners impose an application and evaluation procedure that often takes months to complete and limits researchers' access. One solution is to let each data owner publish a set of pilot data that helps data users choose the right datasets for their needs. The data owners release these pilot data together with the noise parameters and the mechanism they used.
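The block-matching preprocessing step compares the empirical transition probability distributions of reference and target blocks. A minimal Python sketch of that comparison follows, assuming first-order (dinucleotide) transitions, a pseudocount of 1, fixed-length non-overlapping blocks, and an L1 distance between transition matrices; the abstract specifies none of these choices. It also replaces the belief-propagation matching with a plain nearest-neighbour assignment, so it illustrates only the similarity computation, not the author's method. The names `transition_matrix`, `split_blocks`, and `match_blocks` are hypothetical.

```python
import numpy as np

BASES = "ACGT"
INDEX = {base: i for i, base in enumerate(BASES)}


def transition_matrix(block: str, pseudocount: float = 1.0) -> np.ndarray:
    """Empirical first-order transition probabilities P(next base | current base)."""
    counts = np.full((4, 4), pseudocount)
    for current, nxt in zip(block, block[1:]):
        if current in INDEX and nxt in INDEX:  # skip ambiguous bases such as 'N'
            counts[INDEX[current], INDEX[nxt]] += 1
    return counts / counts.sum(axis=1, keepdims=True)  # each row sums to 1


def split_blocks(seq: str, size: int) -> list[str]:
    """Non-overlapping blocks of exactly `size` bases; a short tail is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def match_blocks(reference: str, target: str, size: int) -> list[int]:
    """Index of the most similar reference block for each target block.

    Nearest-neighbour stand-in for the belief-propagation matching; the
    similarity is an assumed L1 distance between transition matrices.
    """
    ref_mats = [transition_matrix(b) for b in split_blocks(reference, size)]
    matches = []
    for block in split_blocks(target, size):
        t = transition_matrix(block)
        distances = [np.abs(t - r).sum() for r in ref_mats]
        matches.append(int(np.argmin(distances)))
    return matches
```

A target block copied verbatim from the reference has distance 0 to its source block and is matched back to it. In the dissertation this matching is done with belief propagation instead; the abstract does not say how the BP potentials are built from these distributions.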
A data user can run any kind of association test on the pilot data and compare the outcomes with those from other datasets to judge which datasets may be useful. I present a privacy-preserving genomic data dissemination algorithm based on compressed sensing. In my proposed method, I add noise to the sparse representation of the input vector to make it differentially private: I find the sparse representation using Subspace Pursuit and then perturb it with sufficient Laplacian noise. I compare my method with a state-of-the-art compressed sensing privacy protection method.
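The privacy step described above can be sketched in two parts: recover a K-sparse representation with Subspace Pursuit, then apply the Laplace mechanism with noise scale Δ/ε. The Python below is an illustrative sketch, not the author's implementation. The sensing matrix `A`, the sparsity level `K`, the privacy budget `epsilon`, and the L1 sensitivity `sensitivity` are assumed inputs, and noise is added to every coefficient of the recovered vector. A formal differential-privacy guarantee depends on a sensitivity analysis that the abstract does not give.

```python
import numpy as np


def subspace_pursuit(A: np.ndarray, y: np.ndarray, K: int, max_iter: int = 50) -> np.ndarray:
    """Estimate a K-sparse x with y ≈ A @ x using Subspace Pursuit."""

    def least_squares(support: np.ndarray) -> np.ndarray:
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        return coef

    # Initial support: the K columns most correlated with y.
    support = np.argsort(np.abs(A.T @ y))[-K:]
    coef = least_squares(support)
    residual = y - A[:, support] @ coef
    for _ in range(max_iter):
        # Expand with the K columns most correlated with the current residual.
        candidates = np.union1d(support, np.argsort(np.abs(A.T @ residual))[-K:])
        # Prune back to the K largest least-squares coefficients, then refit.
        new_support = candidates[np.argsort(np.abs(least_squares(candidates)))[-K:]]
        new_coef = least_squares(new_support)
        new_residual = y - A[:, new_support] @ new_coef
        if np.linalg.norm(new_residual) >= np.linalg.norm(residual):
            break  # no improvement: keep the previous estimate
        support, coef, residual = new_support, new_coef, new_residual
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x


def laplace_perturb(x: np.ndarray, epsilon: float, sensitivity: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Laplace mechanism: add i.i.d. Laplace(0, sensitivity / epsilon) noise to every coefficient.

    `sensitivity` is assumed to be the L1 sensitivity of the released vector;
    deriving it for real genomic data is outside this sketch.
    """
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    return x + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=x.shape)
```

Smaller `epsilon` means a larger noise scale and stronger privacy at the cost of accuracy in downstream association tests. Publishing `epsilon`, `sensitivity`, and the mechanism alongside the pilot data matches the release practice described above.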

SHAREOK repository

Compressive Sensing DNA Microarrays

Author: Baraniuk Richard G
Dai Wei
Milenkovic Olgica
Sheikh Mona A
Publication venue: BioMed Central
Publication date: 01/01/2008

Compressive sensing microarrays (CSMs) are DNA-based sensors that operate using group testing and compressive sensing (CS) principles. In contrast to conventional DNA microarrays, in which each genetic sensor is designed to respond to a single target, in a CSM each sensor responds to a set of targets. We study the problem of designing CSMs that simultaneously account for both the constraints of compressive sensing theory and the biochemistry of probe-target DNA hybridization. An appropriate cross-hybridization model is proposed for CSMs, and several methods are developed for probe design and CS signal recovery based on the new model. Our lab experiments suggest that, to achieve accurate hybridization profiling, consensus probe sequences must have sequence homology of at least 80% with all targets to be detected. Furthermore, out-of-equilibrium datasets are usually as accurate as those obtained under equilibrium conditions. Consequently, CSMs can be used in applications that allow only short hybridization times.

Index Terms: Compressive sensing, DNA microarray, group testing, hybridization affinity, probe design
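To make "each sensor responds to a set of targets" concrete, here is a simplified Python sketch. It uses a noiseless binary group-testing model (a sensor reads positive if any target in its set is present) and the COMP decoder, which rules out every target that appears in a negative sensor. This is a named stand-in, not the paper's method: the paper works with real-valued hybridization intensities, a cross-hybridization model, and CS signal recovery. The function names `sensor_readout` and `comp_decode` are hypothetical.

```python
import numpy as np


def sensor_readout(pools: np.ndarray, present: np.ndarray) -> np.ndarray:
    """Noiseless group-testing readout.

    pools[i, j] is True if sensor i responds to target j; a sensor reads
    positive if at least one target in its set is present.
    """
    return (pools.astype(int) @ present.astype(int)) > 0


def comp_decode(pools: np.ndarray, readout: np.ndarray) -> np.ndarray:
    """COMP decoder: rule out every target that belongs to a negative sensor."""
    ruled_out = pools[~readout].any(axis=0)
    return ~ruled_out  # remaining candidates; never misses a present target
```

Under this noiseless model COMP can return false positives but never false negatives; with enough sensors and a sparse set of present targets, the candidate set shrinks to the true set. The paper's real-valued readout and cross-hybridization model are what allow CS recovery to go further than this binary picture.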

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Spiral - Imperial College Digital Repository

Novel Methods for Approximate Bayesian Inference of Independent and Evolutionarily Dependent Data

Author: Fearn James A
Publication date: 02/12/2021

Explore Bristol Research

A Tutorial on Coding Methods for DNA-based Molecular Communications and Storage

Author: Chen Sirong
Liu Qiang
Wu Wenfeng
Xiang Luping
Yan Kang
Yang Kun
Publication date: 13/11/2023

The exponential growth of data has motivated advances in data storage technologies. As a promising storage medium, deoxyribonucleic acid (DNA) provides much higher data density and superior durability than state-of-the-art media. In this paper, we provide a tutorial on DNA storage and its role in molecular communications. First, we introduce the fundamentals of DNA-based molecular communications and storage (MCS), discussing the basic process of performing DNA storage in MCS. Next, we provide tutorials, along with numerical results, on how conventional coding schemes used in wireless communications can be applied to DNA-based MCS. Finally, we introduce and discuss promising research directions on DNA-based data storage in molecular communications.
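As a baseline for the coding schemes discussed above, the simplest source mapping stores 2 bits per nucleotide. The Python sketch below assumes the mapping 00→A, 01→C, 10→G, 11→T (an arbitrary choice here) and adds two checks that DNA storage codes commonly constrain: GC content and homopolymer run length. It applies no error-correcting code, so it is an illustration rather than any scheme from the tutorial.

```python
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}  # assumed mapping
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, two bits per base, most significant bits first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert `encode`; assumes len(strand) is a multiple of 4."""
    bits = "".join(BITS_FOR_BASE[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def gc_content(strand: str) -> float:
    """Fraction of G and C bases."""
    return (strand.count("G") + strand.count("C")) / len(strand) if strand else 0.0


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of one repeated base."""
    longest = run = 0
    previous = None
    for base in strand:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


strand = encode(b"Hi")
print(strand, gc_content(strand), longest_homopolymer(strand))  # CAGACGGC 0.75 2
```

A constrained code would reject or re-encode strands whose GC content falls outside a target window or whose homopolymer runs exceed a limit; the window and limit are application-specific and not taken from the tutorial.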

arXiv.org e-Print Archive

Deep Learning for Genomics: A Concise Overview

Author: Wang Haohan
Yue Tianwei
Publication date: 08/05/2018

Advancements in genomic research, such as high-throughput sequencing techniques, have driven modern genomic studies into "big data" disciplines. This data explosion constantly challenges conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics poses unique challenges to deep learning, since we expect deep learning to provide a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful use of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective, so as to fit each particular task with a proper deep architecture, and remark on practical considerations in developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out potential opportunities and obstacles for future genomics applications.

Comment: Invited chapter for Springer Book: Handbook of Deep Learning Application

arXiv.org e-Print Archive

Artificial intelligence used in genome analysis studies

Author: D'Agaro Edo
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2018

Next Generation Sequencing (NGS), or deep sequencing, technology enables parallel reading of multiple individual DNA fragments, thereby enabling the identification of millions of base pairs in several hours. Recent research has clearly shown that machine learning technologies can efficiently analyse large sets of genomic data and help identify novel gene functions and regulatory regions. A deep artificial neural network consists of a group of artificial neurons that mimic the properties of living neurons. These mathematical models, termed Artificial Neural Networks (ANNs), can be used to solve artificial intelligence engineering problems in several technological fields (e.g., biology, genomics, proteomics, and metabolomics). In practical terms, neural networks are non-linear statistical modelling tools used to simulate complex relationships between genomic inputs and outputs. To date, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have been demonstrated to be the best tools for improving performance in problem-solving tasks within the genomic field.
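The abstract singles out CNNs. The sketch below shows, under simplifying assumptions, what a single first-layer convolutional filter computes on DNA: the sequence is one-hot encoded over A, C, G, T, and one filter slides along it, followed by a ReLU. The filter is hand-set to the motif TATA purely for illustration; in a trained CNN the filter weights are learned, and a real model would use many filters and a deep-learning framework rather than this NumPy loop. `one_hot` and `conv1d_scan` are hypothetical helper names.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """(len(seq), 4) one-hot matrix; bases outside ACGT (e.g. 'N') become all-zero rows."""
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        j = BASES.find(base)
        if j >= 0:
            x[i, j] = 1.0
    return x


def conv1d_scan(x: np.ndarray, kernel: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """One filter of a 1-D convolutional layer ('valid' padding, stride 1) plus ReLU.

    As in CNN libraries, this is a cross-correlation: the kernel is not flipped.
    """
    width = kernel.shape[0]
    scores = np.array([np.sum(x[i:i + width] * kernel)
                       for i in range(x.shape[0] - width + 1)])
    return np.maximum(scores + bias, 0.0)


# Hand-set filter that scores +1 for each base matching the motif TATA.
kernel = one_hot("TATA")
print(conv1d_scan(one_hot("GCTATAAGC"), kernel))  # [2. 0. 4. 1. 2. 1.]
```

The score peaks at position 2, where the window is exactly TATA. A negative bias (e.g. `bias=-3.0`) turns the filter into a near-exact motif detector, which is one way to read what learned first-layer filters do.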

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Directory of Open Access Journals

Genomic Detection Using Sparsity-inspired Tools

Author: Sheikh Mona A.
Publication date: 01/01/2011

Genome-based detection methods provide the most conclusive means of establishing the presence of microbial species. A prime example of their use is the detection of bacterial species, many of which are naturally vital or dangerous to human health, or can be genetically engineered to be so. However, current genomic detection methods are cost-prohibitive and inevitably use unique sensors, each specific to one species to be detected. In this thesis we advocate the use of combinatorial and non-specific identifiers for detection, made possible by exploiting the sparsity inherent in the species detection problem for a clinical or environmental sample. By modifying the sensor design process, we have developed new molecular biology tools with advantages that were not possible in their previous incarnations. Chief among these advantages are a universal species detection platform, the ability to discover unknown species, and the elimination of PCR, an expensive and laborious amplification step that is a prerequisite of every molecular biology detection technique. Finally, we introduce a sparsity-based model for analyzing the millions of raw sequencing reads generated during whole genome sequencing for species detection, achieving significant reductions in computation time with high accuracy.

DSpace at Rice University


Hidden Markov models and other machine learning approaches in computational molecular biology

Author: Baldi P.
Publication venue: Stanford University Press
Publication date: 31/12/1995

This tutorial was one of eight tutorials selected for presentation at the Third International Conference on Intelligent Systems for Molecular Biology, held in the United Kingdom from July 16 to 19, 1995. Computational tools are increasingly needed to process massive amounts of data, to organize and classify sequences, to detect weak similarities, to separate coding from non-coding regions, and to reconstruct the underlying evolutionary history. The fundamental problem in machine learning is the same as in scientific reasoning in general, and in statistical modeling: to come up with a good model for the data. This tutorial reviews four classes of models: Hidden Markov models, artificial neural networks, belief networks, and stochastic grammars. When dealing with primary DNA and protein sequences, Hidden Markov models are among the most flexible and powerful tools for alignment and database searches. The tutorial focuses on the theory of Hidden Markov models and on how to apply them to problems in molecular biology.
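The forward algorithm is the standard way to compute the likelihood of a sequence under a Hidden Markov model: it sums over all hidden state paths in O(T·N²) time instead of enumerating all Nᵀ paths. A minimal NumPy sketch follows. The two-state model (labelled "GC-rich" and "background") and all of its probabilities are made up for illustration and do not come from the tutorial.

```python
import numpy as np

BASES = "ACGT"

# Hypothetical two-state model: state 0 = "GC-rich", state 1 = "background".
start = np.array([0.5, 0.5])                  # P(first state)
trans = np.array([[0.9, 0.1],                 # trans[i, j] = P(next state j | state i)
                  [0.2, 0.8]])
emit = np.array([[0.15, 0.35, 0.35, 0.15],    # emit[i, k] = P(base k | state i), k over ACGT
                 [0.30, 0.20, 0.20, 0.30]])


def forward_likelihood(seq: str, start: np.ndarray, trans: np.ndarray,
                       emit: np.ndarray) -> float:
    """P(seq) under the HMM, summed over all hidden state paths (forward algorithm)."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    obs = [BASES.index(base) for base in seq]
    alpha = start * emit[:, obs[0]]  # alpha[i] = P(first base, state i)
    for o in obs[1:]:
        # alpha_new[j] = sum_i alpha[i] * trans[i, j] * emit[j, o]
        alpha = (alpha @ trans) * emit[:, o]
    return float(alpha.sum())


print(forward_likelihood("CGA", start, trans, emit))
```

For long sequences the repeated products underflow, so practical implementations rescale `alpha` at each step or work in log space. Replacing the sum over previous states with a max (and keeping back-pointers) gives the Viterbi algorithm for the most likely state path.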

UNT Digital Library