MRFalign: Protein Homology Detection through Alignment of Markov Random Fields

Author: Ma Jianzhu
Wang Sheng
Wang Zhiyong
Xu Jinbo
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

Sequence-based protein homology detection has been extensively studied, and so far the most sensitive methods are based on comparison of protein sequence profiles, which are derived from a multiple sequence alignment (MSA) of sequence homologs in a protein family. A sequence profile is usually represented as a position-specific scoring matrix (PSSM) or a hidden Markov model (HMM), and accordingly PSSM-PSSM or HMM-HMM comparison is used for homolog detection. This paper presents a new homology detection method, MRFalign, consisting of three key components: 1) a Markov Random Field (MRF) representation of a protein family; 2) a scoring function measuring the similarity of two MRFs; and 3) an efficient ADMM (Alternating Direction Method of Multipliers) algorithm for aligning two MRFs. Compared to an HMM, which can only model very short-range residue correlation, an MRF can model long-range residue interaction patterns and thus encode information about the global 3D structure of a protein family. Consequently, MRF-MRF comparison should be much more sensitive for remote homology detection than HMM-HMM or PSSM-PSSM comparison. Experiments confirm that MRFalign outperforms several popular HMM-based or PSSM-based methods in terms of both alignment accuracy and remote homology detection, and that MRFalign works particularly well for mainly-beta proteins. For example, tested on the benchmark SCOP40 (8353 proteins) for homology detection, PSSM-PSSM and HMM-HMM succeed on 48% and 52% of proteins, respectively, at the superfamily level, and on 15% and 27% of proteins, respectively, at the fold level. In contrast, MRFalign succeeds on 57.3% and 42.5% of proteins at the superfamily and fold levels, respectively. This study implies that long-range residue interaction patterns are very helpful for sequence-based homology detection. The software is available for download at http://raptorx.uchicago.edu/download/. Comment: Accepted by both RECOMB 2014 and PLOS Computational Biology
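To make the profile idea in this abstract concrete, here is a minimal sketch of building a PSSM from a gapless toy alignment and scoring a sequence against it. It is not MRFalign and is not taken from the paper: the toy alignment, the uniform background frequency, the pseudocount of 1, and the names `build_pssm` and `score_sequence` are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(msa, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumes a gapless alignment and a uniform background distribution.
    """
    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in range(len(msa[0])):
        counts = Counter(seq[col] for seq in msa if seq[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        column_scores = {}
        for aa in ALPHABET:
            freq = (counts[aa] + pseudocount) / total
            column_scores[aa] = math.log2(freq / background)
        pssm.append(column_scores)
    return pssm


def score_sequence(pssm, seq):
    """Sum the per-column log-odds scores of an ungapped sequence."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Hypothetical four-sequence family, for illustration only.
toy_msa = ["ACDE", "ACDF", "ACEE", "GCDE"]
pssm = build_pssm(toy_msa)
family_like = score_sequence(pssm, "ACDE")
unrelated = score_sequence(pssm, "WWWW")
```

Each column is scored independently of every other column, which is exactly the limitation the abstract contrasts with the pairwise terms of an MRF.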

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

Learning a Hybrid Architecture for Sequence Regression and Annotation

Author: Carin Lawrence
Hartemink Alexander J.
Henao Ricardo
Zhang Yizhe
Zhong Jianling
Publication date: 16/12/2015

When learning a hidden Markov model (HMM), sequential observations can often be complemented by real-valued summary response variables generated from the path of hidden states. Such settings arise in numerous domains, including many applications in biology, like motif discovery and genome annotation. In this paper, we present a flexible framework for jointly modeling both latent sequence features and the functional mapping that relates the summary response variables to the hidden state sequence. The algorithm is compatible with a rich set of mapping functions. Results show that the availability of additional continuous response variables can simultaneously improve the annotation of the sequential observations and yield good prediction performance in both synthetic data and real-world datasets. Comment: AAAI 201

arXiv.org e-Print Archive

DukeSpace

Association for the Advancement of Artificial Intelligence: AAAI Publications

Identification of functionally related enzymes by learning-to-rank methods

Author: Airola Antti
De Baets Bernard
Fober Thomas
Glinca Serghei
Hüllermeier Eyke
Klebe Gerhard
Pahikkala Tapio
Stock Michiel
Waegeman Willem
Publication date: 01/01/2014

Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually starts from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Parametric Inference for Biological Sequence Analysis

Author: B. Sturmfels
Gardiner-Garden
Gusfield
L. Pachter
Lander
Waterman
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2004

One of the major successes in computational biology has been the unification, using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems associated with different statistical models. This paper introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models. Comment: 15 pages, 4 figures. See also companion paper "Tropical Geometry of Statistical Models" (q-bio.QM/0311009)
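As a rough illustration of the sum-product algorithm mentioned in this abstract, the sketch below runs the standard forward recursion on a small hidden Markov model and then reuses the same recursion with `max` in place of `sum` to get the probability of the most likely hidden path. It does not implement polytope propagation. The two-state model, its parameters, and the names `hmm_recursion` and `path_probabilities` are assumptions chosen for the example.

```python
from itertools import product


def hmm_recursion(init, trans, emit, obs, combine):
    """Message passing along the HMM chain.

    combine=sum gives the sum-product (forward) algorithm, i.e. P(obs).
    combine=max gives max-product, i.e. the probability of the best path.
    """
    n = len(init)
    msg = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        msg = [
            combine(msg[r] * trans[r][s] for r in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return combine(msg)


def path_probabilities(init, trans, emit, obs):
    """Brute force: joint probability of obs with every hidden path."""
    for path in product(range(len(init)), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        yield p


# Hypothetical two-state model with two observation symbols.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]

likelihood = hmm_recursion(init, trans, emit, obs, sum)
best_path_probability = hmm_recursion(init, trans, emit, obs, max)
```

The recursion touches each pair of states once per position, whereas the brute-force enumeration grows exponentially with the sequence length; both give the same values on this toy input.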

arXiv.org e-Print Archive

A chemogenomic screening identifies CK2 as a target for pro-senescence therapy in PTEN-deficient tumours

Author: Alajati Abdullah
Alimonti Andrea
Brambilla Lara
Carbone Giuseppina
Catapano Carlo V.
Chen Jingjing
Cozza Giorgio
Danzer Baltzer Claudia
Delaleu Nicolas
Di Mitri Diletta
Frew Ian J.
Garcia Escudero R.
Guccini Ilaria
Kalathur Madhuri
Kalathur Ravi Kiran Reddy
Maccari Laura
Magnoni Letizia
Malusa Federico
Padova Alessandro
Pinna Lorenzo A.
Pinton Sandra
Revandkar Ajinkya
Ruzzene Maria
Sarti Manuela
Tarditi Alessia
Toso Alberto
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Enhancement of cellular senescence in tumours triggers a stable cell growth arrest and activation of an antitumour immune response that can be exploited for cancer therapy. Currently, there are only a limited number of targeted therapies that act by increasing senescence in cancers, and the majority of them are not selective and also target healthy cells. Here we developed a chemogenomic screening to identify compounds that enhance senescence in PTEN-deficient cells without affecting normal cells. By using this approach, we identified casein kinase 2 (CK2) as a pro-senescent target. Mechanistically, we show that Pten loss increases CK2 levels by activating STAT3. CK2 upregulation in Pten null tumours affects the stability of Pml, an essential regulator of senescence. However, CK2 inhibition stabilizes Pml levels, enhancing senescence in Pten null tumours. Taken together, our screening strategy has identified a novel STAT3-CK2-PML network that can be targeted for pro-senescence therapy for cancer.

Crossref

Archivio istituzionale della ricerca - Università di Padova

Euclidean distance geometry and applications

Author: Lavor Carlile
Liberti Leo
Maculan Nelson
Mucherino Antonio
Publication date: 02/05/2012

Euclidean distance geometry is the study of Euclidean geometry based on the concept of distance. This is useful in several applications where the input data consists of an incomplete set of distances, and the output is a set of points in Euclidean space that realizes the given distances. We survey some of the theory of Euclidean distance geometry and some of the most important applications: molecular conformation, localization of sensor networks and statics. Comment: 64 pages, 21 figures
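The simplest version of the problem described here can be sketched in a few lines. The example below assumes something much easier than the survey's setting: all pairwise distances are known and exact, the points lie in the plane, and the first two points are distinct. Under those assumptions each point is found by trilateration from the first two, and a reference point off their line resolves the mirror ambiguity. The function name `realize_in_plane` and the sample coordinates are hypothetical.

```python
import math


def realize_in_plane(D, tol=1e-9):
    """Place points in the plane so that they reproduce distance matrix D.

    Assumes D is complete, exact, planar, and D[0][1] > 0. The result is
    unique only up to translation, rotation, and reflection.
    """
    a = D[0][1]
    pts = [(0.0, 0.0), (a, 0.0)]  # fix point 0 at the origin, point 1 on the x-axis
    ref = None  # index of the first point found off the x-axis
    for k in range(2, len(D)):
        # Intersect the circles around points 0 and 1.
        x = (D[0][k] ** 2 - D[1][k] ** 2 + a ** 2) / (2 * a)
        y = math.sqrt(max(D[0][k] ** 2 - x ** 2, 0.0))
        if ref is not None:
            # Keep whichever mirror image matches the distance to the reference.
            err_up = abs(math.dist((x, y), pts[ref]) - D[ref][k])
            err_down = abs(math.dist((x, -y), pts[ref]) - D[ref][k])
            if err_down < err_up:
                y = -y
        elif y > tol:
            ref = k
        pts.append((x, y))
    return pts


# Hypothetical input: distances computed from known coordinates.
original = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0), (1.0, -2.0)]
D = [[math.dist(p, q) for q in original] for p in original]
recovered = realize_in_plane(D)
```

With missing or noisy distances this construction no longer applies directly, which is where the methods surveyed in the paper come in.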

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

INRIA a CCSD electronic archive server

Repositorio da Producao Cientifica e Intelectual da Unicamp

HAL-Polytechnique

HAL-Rennes 1

7th German Conference on Chemoinformatics: 25 CIC-Workshop : Goslar, Germany, 6 - 8 November 2011 ; meeting abstracts / Edited by Frank Oellien, Uli Fechner and Thomas Engel

Author: Engel Thomas
Fechner Uli
Oellien Frank
Publication date: 01/05/2012

Hochschulschriftenserver - Universität Frankfurt am Main