The EM Algorithm and the Rise of Computational Biology

Author: Jun S. Liu
Xiaodan Fan
Yuan Yuan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010

In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

Crossref

Malware Detection Using Dynamic Analysis

Author: Vemparala Swapna
Publication venue: SJSU ScholarWorks
Publication date: 25/05/2015

In this research, we explore the field of dynamic analysis, which has shown promising results in the field of malware detection. Here, we extract dynamic software birthmarks during malware execution and apply machine learning based detection techniques to the resulting feature set. Specifically, we consider Hidden Markov Models and Profile Hidden Markov Models. To determine the effectiveness of this dynamic analysis approach, we compare our detection results to the results obtained by using static analysis. We show that in some cases, significantly stronger results can be obtained using our dynamic approach.

SJSU ScholarWorks

Profile Context-Sensitive HMMs for Probabilistic Modeling of Sequences With Complex Correlations

Author: Vaidyanathan P. P.
Yoon Byung-Jun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

The profile hidden Markov model is a specific type of HMM that is well suited for describing the common features of a set of related sequences. It has been extensively used in computational biology, where it is still one of the most popular tools. In this paper, we propose a new model called the profile context-sensitive HMM. Unlike traditional profile-HMMs, the proposed model is capable of describing complex long-range correlations between distant symbols in a consensus sequence. We also introduce a general algorithm that can be used for finding the optimal state-sequence of an observed symbol sequence based on the given profile-csHMM. The proposed model has an important application in RNA sequence analysis, especially in modeling and analyzing RNA pseudoknots.

CiteSeerX

Caltech Authors

DeepSF: deep convolutional neural network for mapping protein sequences to folds

Author: Jie Hou
Badri Adhikari
Jianlin Cheng
Publication date: 03/06/2017

Motivation: Protein fold recognition is an important problem in structural bioinformatics. Almost all traditional fold recognition methods use sequence (homology) comparison to indirectly predict the fold of a target protein based on the fold of a template protein with known structure, which cannot explain the relationship between sequence and fold. Only a few methods have been developed to classify protein sequences into a small number of folds due to methodological limitations, and these are not generally useful in practice. Results: We develop a deep 1D-convolution neural network (DeepSF) to directly classify any protein sequence into one of 1195 known folds, which is useful for both fold recognition and the study of the sequence-structure relationship. Unlike traditional sequence alignment (comparison) based methods, our method automatically extracts fold-related features from a protein sequence of any length and maps it to the fold space. We train and test our method on the datasets curated from SCOP1.75, yielding a classification accuracy of 80.4%. On the independent testing dataset curated from SCOP2.06, the classification accuracy is 77.0%. We compare our method with a top profile-profile alignment method, HHSearch, on hard template-based and template-free modeling targets of CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is 14.5%-29.1% higher than HHSearch on template-free modeling targets and 4.5%-16.7% higher on hard template-based modeling targets for top 1, 5, and 10 predicted folds. The hidden features extracted from sequence by our method are robust against sequence mutation, insertion, deletion and truncation, and can be used for other protein pattern recognition problems such as protein clustering, comparison and ranking. Comment: 28 pages, 13 figures

arXiv.org e-Print Archive

Crossref

University of Missouri, St. Louis

Genetic Barcode Identification With Profile Hidden Markov Models

Author: Sharma Vishrut
Publication venue: SJSU ScholarWorks
Publication date: 01/04/2018

DNA barcoding is a method that uses an organism’s DNA to identify its species. The gene cytochrome c oxidase I (COI) has been used effectively as a DNA barcode to identify organisms and elucidate relationships among species [1]. There also exists a database, BOLD (Barcode Of Life Database), that contains COI sequences used for DNA barcoding for more than 1 million different species. Using BOLD to identify samples that have a match in the database is an uncomplicated process. However, this method fails to identify samples that are absent from the database. Given a sample that is not represented in BOLD but is similar to a represented sequence, it would be valuable to describe the sample at a higher taxonomic classification. Since COI is represented as long character sequences of amino acids, Hidden Markov Models (HMMs) can be used to associate an unknown DNA sequence with a taxonomic rank. In this work, I show that dynamically created Profile HMMs are an effective tool for such identification.

SJSU ScholarWorks