Search CORE

5,621 research outputs found

Protein multi-scale organization through graph partitioning and robustness analysis: Application to the myosin-myosin light chain interaction

Author: A Delmotte
Aynaud T
Blondel V D
Delano W L
Delmotte A
E W Tate
Hettmann C
Lambiotte R Delvenne J C Barahona M
M Barahona
Meliga S
Meliga S Yaliraki S N Barahona M
Newman M E J
S N Yaliraki
Schiaffino S
Publication venue: 'IOP Publishing'
Publication date: 20/09/2011
Field of study

Despite the recognized importance of the multi-scale spatio-temporal organization of proteins, most computational tools can only access a limited spectrum of time and spatial scales, thereby ignoring the effects on protein behavior of the intricate coupling between the different scales. Starting from a physico-chemical atomistic network of interactions that encodes the structure of the protein, we introduce a methodology based on multi-scale graph partitioning that can uncover partitions and levels of organization of proteins that span the whole range of scales, revealing biological features occurring at different levels of organization and tracking their effect across scales. Additionally, we introduce a measure of robustness to quantify the relevance of the partitions through the generation of biochemically-motivated surrogate random graph models. We apply the method to four distinct conformations of myosin tail interacting protein, a protein from the molecular motor of the malaria parasite, and study properties that have been experimentally addressed such as the closing mechanism, the presence of conserved clusters, and the identification through computational mutational analysis of key residues for binding.Comment: 13 pages, 7 Postscript figure

arXiv.org e-Print Archive

Crossref

Spiral - Imperial College Digital Repository

Efficient seeding techniques for protein similarity search

Author: Furletova Eugenia
Gambin Anna
Kucherov Gregory
Lasota Slawomir
Noé Laurent
Roytberg Mihkail
Szczurek Ewa
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We propose several different design methods and use them to construct several alphabets.We then perform an analysis of seeds built over those alphabet and compare them with the standard Blastp seeding method [2,3], as well as with the family of vector seeds proposed in [4]. While the formalism of subset seed is less expressive (but less costly to implement) than the accumulative principle used in Blastp and vector seeds, our seeds show a similar or even better performance than Blastp on Bernoulli models of proteins compatible with the common BLOSUM62 matrix

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

Latent tree models

Author: Zwiernik Piotr
Publication venue
Publication date: 02/08/2017
Field of study

Latent tree models are graphical models defined on trees, in which only a subset of variables is observed. They were first discussed by Judea Pearl as tree-decomposable distributions to generalise star-decomposable distributions such as the latent class model. Latent tree models, or their submodels, are widely used in: phylogenetic analysis, network tomography, computer vision, causal modeling, and data clustering. They also contain other well-known classes of models like hidden Markov models, Brownian motion tree model, the Ising model on a tree, and many popular models used in phylogenetics. This article offers a concise introduction to the theory of latent tree models. We emphasise the role of tree metrics in the structural description of this model class, in designing learning algorithms, and in understanding fundamental limits of what and when can be learned

arXiv.org e-Print Archive

Crossref

Efficient seeding techniques for protein similarity search

Author: Roytberg Mihkail
Gambin Anna
Noé Laurent
Lasota Slawomir
Furletova Eugenia
Szczurek Ewa
Kucherov Gregory
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

arXiv.org e-Print Archive

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

Automatic identification of highly conserved family regions and relationships in genome wide datasets including remote protein sequences

Author: Doğan Tunca
Karaçalı Bilge
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013
Field of study

Identifying shared sequence segments along amino acid sequences generally requires a collection of closely related proteins, most often curated manually from the sequence datasets to suit the purpose at hand. Currently developed statistical methods are strained, however, when the collection contains remote sequences with poor alignment to the rest, or sequences containing multiple domains. In this paper, we propose a completely unsupervised and automated method to identify the shared sequence segments observed in a diverse collection of protein sequences including those present in a smaller fraction of the sequences in the collection, using a combination of sequence alignment, residue conservation scoring and graph-theoretical approaches. Since shared sequence fragments often imply conserved functional or structural attributes, the method produces a table of associations between the sequences and the identified conserved regions that can reveal previously unknown protein families as well as new members to existing ones. We evaluated the biological relevance of the method by clustering the proteins in gold standard datasets and assessing the clustering performance in comparison with previous methods from the literature. We have then applied the proposed method to a genome wide dataset of 17793 human proteins and generated a global association map to each of the 4753 identified conserved regions. Investigations on the major conserved regions revealed that they corresponded strongly to annotated structural domains. This suggests that the method can be useful in predicting novel domains on protein sequences

Crossref

Directory of Open Access Journals

PubMed Central

Dokuz Eylul University Research Information System

Probabilistic Clustering of Time-Evolving Distance Data

Author: AK Jain
AY Ng
C Leslie
CP Robert
D Blei
DD Lee
DM Blei
Gunnar Rätsch
H Saigo
J Pitman
Julia E. Vogt
M Bilodeau
Marius Kloft
MB Eisen
MS Srivastava
P McCullagh
P McCullagh
RM Neal
S Sonnenburg
Sandhya Prabhakaran
SN MacEachern
Stefan Stark
Sudhir S. Raman
SVN Vishwanathan
TS Ferguson
TW Anderson
Volker Roth
WJ Ewens
Publication venue
Publication date: 01/01/2015
Field of study

We present a novel probabilistic clustering model for objects that are represented via pairwise distances and observed at different time points. The proposed method utilizes the information given by adjacent time points to find the underlying cluster structure and obtain a smooth cluster evolution. This approach allows the number of objects and clusters to differ at every time point, and no identification on the identities of the objects is needed. Further, the model does not require the number of clusters being specified in advance -- they are instead determined automatically using a Dirichlet process prior. We validate our model on synthetic data showing that the proposed method is more accurate than state-of-the-art clustering methods. Finally, we use our dynamic clustering model to analyze and illustrate the evolution of brain cancer patients over time

arXiv.org e-Print Archive

Crossref

edoc