405 research outputs found

Phylogenetic correlations can suffice to infer protein partners from sequences

Author: Bitbol Anne-Florence, Marmier Guillaume, Weigt Martin
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among paralogous proteins from sequence data. This success of DCA at predicting protein-protein interactions could be mainly based on its known ability to identify pairs of residues that are in contact in the three-dimensional structure of protein complexes and that coevolve to remain physicochemically complementary. However, interacting proteins possess similar evolutionary histories. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involve phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that share evolutionary history. While phylogenetic correlations confound the identification of contacting residues by DCA, they are thus useful to predict interacting partners among paralogs. We find that DCA performs as well as phylogenetic methods to this end, and slightly better than them with large and accurate training sets. Employing DCA or phylogenetic methods within an Iterative Pairing Algorithm (IPA) makes it possible to predict pairs of evolutionary partners without a training set. We further demonstrate the ability of these various methods to correctly predict pairings among real paralogous proteins with genome proximity but no known direct physical interaction, illustrating the importance of phylogenetic correlations in natural data. However, for physically interacting and strongly coevolving proteins, DCA and mutual information outperform phylogenetic methods. We finally discuss how to distinguish physically interacting proteins from proteins that only share a common evolutionary history.
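To make the idea of "synthetic data that only involve phylogeny" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' procedure: the binary alphabet, the binary branching tree, the fixed number of mutations per branch, and the function name `evolve_on_binary_tree` are all hypothetical choices made for this example. Each leaf is split into two halves that stand in for two "partner" sequences sharing one evolutionary history and no couplings.

```python
import random


def evolve_on_binary_tree(length, generations, mutations_per_branch, rng):
    """Return 2**generations sequences of +1/-1 sites that share one phylogeny.

    Assumption for illustration: every sequence duplicates at each generation
    and each copy receives a fixed number of random site flips. No couplings
    between sites are modeled, so all correlations come from shared ancestry.
    """
    population = [[rng.choice((-1, 1)) for _ in range(length)]]
    for _ in range(generations):
        next_population = []
        for parent in population:
            for _ in range(2):  # each parent leaves two children
                child = list(parent)
                for site in rng.sample(range(length), mutations_per_branch):
                    child[site] = -child[site]  # flip the chosen site
                next_population.append(child)
        population = next_population
    return population


rng = random.Random(0)
leaves = evolve_on_binary_tree(
    length=200, generations=5, mutations_per_branch=5, rng=rng
)

# Split every leaf into two halves: hypothetical "protein A" and "protein B"
# that evolved along the same tree but never interact.
half = 100
seqs_a = [leaf[:half] for leaf in leaves]
seqs_b = [leaf[half:] for leaf in leaves]
```

Any pairing method can then be tested on `seqs_a` and a shuffled copy of `seqs_b`: a method that recovers the original pairing is exploiting shared ancestry alone.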

Infoscience - École polytechnique fédérale de Lausanne

arXiv.org e-Print Archive

Directory of Open Access Journals

Inferring interaction partners from protein sequences using mutual information

Author: Bitbol Anne-Florence
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/11/2018

Functional protein-protein interactions are crucial in most cellular processes. They enable multi-protein complexes to assemble and to remain stable, and they allow signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interacting partners, and thus in correlations between their sequences. Pairwise maximum-entropy based models have enabled successful inference of pairs of amino-acid residues that are in contact in the three-dimensional structure of multi-protein complexes, starting from the correlations in the sequence data of known interaction partners. Recently, algorithms inspired by these methods have been developed to identify which proteins are functional interaction partners among the paralogous proteins of two families, starting from sequence data alone. Here, we demonstrate that a slightly higher performance for partner identification can be reached by an approximate maximization of the mutual information between the sequence alignments of the two protein families. Our mutual information-based method also provides signatures of the existence of interactions between protein families. These results stand in contrast with structure prediction of proteins and of multi-protein complexes from sequence data, where pairwise maximum-entropy based global statistical models substantially improve performance compared to mutual information. Our findings entail that the statistical dependences allowing interaction partner prediction from sequence data are not restricted to the residue pairs that are in direct contact at the interface between the partner proteins.
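As a rough illustration of the quantity this abstract refers to, the Python sketch below computes a plug-in estimate of mutual information between two alignment columns and sums it over all inter-family column pairs. It is a sketch under stated assumptions, not the paper's algorithm: it uses raw frequencies with no correction for small samples, reports values in nats, and the function names are hypothetical. The approximate maximization over pairings described in the abstract is not implemented here.

```python
from collections import Counter
from math import log


def column_mutual_information(col_a, col_b):
    """Plug-in mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    joint = Counter(zip(col_a, col_b))
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    mi = 0.0
    for (a, b), c in joint.items():
        # (c/n) * log( p(a,b) / (p(a) p(b)) ), with counts in place of p
        mi += (c / n) * log(c * n / (count_a[a] * count_b[b]))
    return mi


def alignment_mutual_information(seqs_a, seqs_b):
    """Sum of column-pair mutual information for row-wise paired alignments."""
    cols_a = list(zip(*seqs_a))
    cols_b = list(zip(*seqs_b))
    return sum(
        column_mutual_information(ca, cb) for ca in cols_a for cb in cols_b
    )


# Toy example: the first pairing keeps partners together, the second mixes them.
family_a = ["A", "A", "C", "C"]
paired_b = ["D", "D", "E", "E"]
mixed_b = ["D", "E", "D", "E"]
score_paired = alignment_mutual_information(family_a, paired_b)
score_mixed = alignment_mutual_information(family_a, mixed_b)
```

In this toy case the correctly paired alignment has a higher score than the mixed one, which is the signal a pairing search would try to maximize.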

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

Transkingdom Networks: A Systems Biology Approach to Identify Causal Members of Host-Microbiota Interactions

Improvements in sequencing technologies and reduced experimental costs have resulted in a vast number of studies generating high-throughput data. Although the number of methods to analyze these "omics" data has also increased, computational complexity and lack of documentation hinder researchers from analyzing their high-throughput data to its true potential. In this chapter we detail our data-driven, transkingdom network (TransNet) analysis protocol to integrate and interrogate multi-omics data. This systems biology approach has allowed us to successfully identify important causal relationships between different taxonomic kingdoms (e.g. mammals and microbes) using diverse types of data.

arXiv.org e-Print Archive

Crossref

Information Theory in Molecular Evolution: From Models to Structures and Dynamics

Publication venue: 'MDPI AG'
Publication date: 11/01/2022

This Special Issue collects novel contributions from scientists in the interdisciplinary field of biomolecular evolution. Works listed here use information-theoretic concepts as a core but are tightly integrated with the study of molecular processes. Applications include the analysis of phylogenetic signals to elucidate biomolecular structure and function, the study and quantification of structural dynamics and allostery, as well as models of molecular interaction specificity inspired by evolutionary cues.

Directory of Open Access Books (DOAB)

Protein 3D Structure Computed from Evolutionary Sequence Variation

Author: Debora S. Marks, Lucy J. Colwell, Robert Sheridan, Thomas A. Hopf, Andrea Pagnani, Riccardo Zecchina, Chris Sander
Publication venue: Public Library of Science
Publication date: 01/01/2011

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Unsupervised inference methods for protein sequence data

Author: SESTA LUCA
Publication venue: country:Italy
Publication date: 12/05/2023

The abstract is in the attachment.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Assessing the utility of mutual information stored in protein-protein interfaces to infer specific protein partners

Author: Pontes Camila Ferreira Thé
Publication date: 16/03/2021

Thesis (doctorate), Universidade de Brasília, Instituto de Ciências Biológicas, Departamento de Biologia Celular, Programa de Pós-Graduação em Biologia Molecular, 2021. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).

Proteins are essential for several cellular processes. Hence, one of the central objectives in Biology is to understand the relationships between sequence, structure and function of these macromolecules. In this context, marks left by the coevolutionary process in interacting protein sequences are an important source of structural information. In fact, statistical correlations between amino acid sites in protein sequences are at the basis of state-of-the-art methods for prediction of inter- and intra-protein contacts, template-free structure prediction, identification of functional sites and specificity-determining residues, and inference of interacting paralogs, among other applications. In line with that, the present work conveys a set of theoretical results on how specific protein partners can be recovered based on sequence information alone. In the first chapter, a decomposition of the mutual information (MI) present in protein-protein complexes is carried out, considering the hypothesis that MI in proteins originates from a combination of coevolutionary, evolutionary and stochastic sources. It was observed that the interface contains on average, per contact, more information than the rest of the protein complex, a result that holds when considering both Shannon and Tsallis MI as a measure of information. This observation led to the conclusion that the interface contains the strongest information signal for distinguishing the correct set of protein partners in interacting protein families. Building on that, the utility of using MI encoded in protein-protein interfaces to recover the correct set of protein partners is assessed in the second chapter. A genetic algorithm (GA) was developed to explore the space of possible concatenations between a pair of interacting protein families using the interface MI as objective function. Using the GA, interface MI maximization was performed for 26 different pairs of interacting protein families and it was observed that optimized concatenations corresponded to degenerate solutions with two distinct error sources, arising from mismatches among (i) similar and (ii) non-similar sequences. When mistakes made among similar sequences were disregarded, type-(i) solutions were found to resolve correct pairings at best true positive (TP) rates of 70%, far above the very same estimates in type-(ii) solutions. These results hold when the optimizations are made based on Tsallis MI. These findings raise further questions about the mechanisms behind protein partner coevolution and help rationalize literature data showing a sharp deterioration of TP rates with increasing sequence number in MI-based approaches.
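The search over "possible concatenations" described in this abstract can be pictured as a search over permutations that assign each sequence of one family to a sequence of the other. The Python sketch below is a simplified stand-in, not the thesis's genetic algorithm: it is a mutation-only evolutionary search with elitist selection and no crossover, and the names `swap_mutation` and `evolve_pairing`, the population size, and the toy objective are all hypothetical. In the thesis the objective is the interface MI; here any scoring function over a permutation can be plugged in.

```python
import random


def swap_mutation(pairing, rng):
    """Return a copy of the pairing with two partner assignments exchanged."""
    child = list(pairing)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def evolve_pairing(n, objective, rng, pop_size=20, generations=200):
    """Search permutations of range(n) for a high value of objective.

    pairing[i] = j means sequence i of family A is concatenated with
    sequence j of family B. Requires n >= 2.
    """
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [swap_mutation(p, rng) for p in population]
        # Elitist selection: keep the best pop_size among parents and offspring,
        # so the best score never decreases from one generation to the next.
        ranked = sorted(population + offspring, key=objective, reverse=True)
        population = ranked[:pop_size]
    return population[0]


def count_correct(pairing):
    """Toy objective (assumption): number of positions paired with themselves."""
    return sum(1 for i, j in enumerate(pairing) if i == j)


best = evolve_pairing(8, count_correct, random.Random(1))
```

Replacing `count_correct` with a function that concatenates the two alignments according to `pairing` and returns their mutual information would bring the sketch closer to the setting the abstract describes.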

Repositório Institucional da Universidade de Brasília

Assessing Microbial Diversity Through Nucleotide Variation

Author: Eren Ahmet
Publication venue: ScholarWorks@UNO
Publication date: 20/05/2011

Microbes are the most abundant and most diverse form of life on Earth, constituting the largest portion of the total biomass of the entire planet. They are present in every niche in nature, including very extreme environments, and they govern biogeochemical transformations in ecosystems. The human body is home to a diverse assemblage of microbial species as well. In fact, the number of microbial cells in the gastrointestinal tract, oral cavity, skin, airway passages and urogenital system is approximately an order of magnitude greater than the number of cells that make up the human body itself, and changes in the composition and relative abundance of these microbial communities are highly associated with intestinal and respiratory disorders and diseases of the skin and mucous membranes. In the early 1990s, cultivation-independent methods, especially those based on PCR amplification and sequences of phylogenetically informative 16S rRNA genes, made it possible to assess the composition of microbial species in natural environments, and advances in high-throughput sequencing technologies in recent years have increased sequencing capacity and microbial detection by orders of magnitude. However, the effectiveness of current computational methods available to analyze the vast amounts of sequence data is poor and investigating the diversity within microbial communities remains challenging. In addition to offering an easy-to-use visualization and statistical analysis framework for microbial community analyses, the study described herein aims to present a biologically relevant computational approach for assessing microbial diversity at finer scales of microbial communities through nucleotide variation in 16S rRNA genes.
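One common way to quantify nucleotide variation in aligned 16S rRNA reads is the Shannon entropy of each alignment column. The Python sketch below illustrates that measure only; whether the thesis uses exactly this quantity is an assumption, and the function name `column_entropies` and the toy reads are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropies(aligned_reads):
    """Shannon entropy (in bits) of every column of equal-length aligned reads."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    n = len(aligned_reads)
    entropies = []
    for column in zip(*aligned_reads):
        counts = Counter(column)
        entropy = -sum((c / n) * log2(c / n) for c in counts.values())
        entropies.append(entropy)
    return entropies


# Toy alignment: positions 0 and 1 are conserved, positions 2 and 3 vary.
reads = ["ACGT", "ACGA", "ACTT", "ACTA"]
entropies = column_entropies(reads)
```

Columns with entropy near zero are conserved across reads, while high-entropy columns mark positions where closely related organisms may differ.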

University of New Orleans

The Rough Guide to In Silico Function Prediction, or How To Use Sequence and Structure Information To Predict Protein Function

Author: Marco Punta, Yanay Ofran
Editor: Fran Lewitter
Publication venue: Public Library of Science
Publication date: 01/10/2008

Crossref

Directory of Open Access Journals

PubMed Central