734 research outputs found

CodonTest: Modeling Amino Acid Substitution Preferences in Coding Sequences

Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of the amino acids being exchanged. Recent developments have shown that models which allow amino acid pairs to have independent rates of substitution offer improved fit over single-rate models. However, these approaches have been limited by the need for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, with the number of classes dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene- and organism-specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Cronfa at Swansea University

Stellenbosch University SUNScholar Repository

Predicting DNA-Binding Specificities of Eukaryotic Transcription Factors

Author: A Juncker
A Kel
A Moll
A Prakash
A Sandelin
A Sarai
Adrian Schröder
AM Leontovich
AM Waterhouse
Andreas Zell
BC Foat
BE Engelhardt
C Bock
C Wrzodek
Carsten Henneges
CJ Harrison
CJ Mungall
CM Bergman
CS Leslie
D Alamanova
D Wilson
D Zhou
DA Rodionov
DE Newburger
Dierk Wanke
DL Wheeler
E Boutet
E Kretschmann
E Wingender
G Badis
H Hegyi
H Li
H Saigo
HG Roider
J Kilian
J Kopp
J Supper
J Zhu
JA Gerlt
JC Bryne
JL Risler
Jochen Supper
Johannes Eichner
Jonas Eichner
JV Turatsinze
K Higo
K Liolios
K Niefind
K Pearson
L Liao
L Narlikar
L Wei
LJ Jensen
M Akerfelt
M Piipari
MA Andrade
MC Teixeira
MO Dayhoff
N Shental
P Baldi
P Bork
P Flicek
P Stegmaier
PH von Hippel
PK Mehta
PV Loo
R Bonneau
R Lüthy
RCG Holland
RV Davuluri
S Aerts
S Henikoff
S Kawashima
S Mahony
S Miyazawa
SB Needleman
SJ Maerkl
T Miyata
Tim J. Hubbard
TM Alleyne
U Gerland
UJ Pape
V Matys
XD Liu
Publication venue: Public Library of Science
Publication date: 01/01/2010

Today, annotated amino acid sequences of more and more transcription factors (TFs) are readily available. Quantitative information about their DNA-binding specificities, however, is hard to obtain. Position frequency matrices (PFMs), the most widely used models to represent binding specificities, have been experimentally characterized for only a small fraction of all TFs. Even for some of the most intensively studied eukaryotic organisms (i.e., human, rat and mouse), only about one-sixth of all proteins with an annotated DNA-binding domain have been characterized experimentally. Here, we present a new method based on support vector regression for predicting quantitative DNA-binding specificities of TFs in different eukaryotic species. This approach estimates a quantitative measure for the PFM similarity of two proteins, based on various features derived from their protein sequences. The method is trained and tested on a dataset containing 1,239 TFs with known DNA-binding specificity, and used to predict specific DNA target motifs for 645 TFs with high accuracy.
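To make the notion of a PFM and of "PFM similarity" concrete, here is a minimal Python sketch. It is not the method of the paper: the example binding sites are invented, the function names are hypothetical, and the similarity score (mean column-wise Pearson correlation) is only one simple choice assumed here for illustration.

```python
from collections import Counter
from math import sqrt

ALPHABET = "ACGT"  # column order used for every PFM row below


def position_frequency_matrix(sites):
    """Count each base at each position of equal-length aligned binding sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pfm.append([counts.get(base, 0) for base in ALPHABET])
    return pfm


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def pfm_similarity(pfm_a, pfm_b):
    """Assumed toy score: mean column-wise Pearson correlation of two PFMs."""
    if len(pfm_a) != len(pfm_b):
        raise ValueError("PFMs must have the same width")
    return sum(pearson(a, b) for a, b in zip(pfm_a, pfm_b)) / len(pfm_a)


# Invented example sites, aligned and of equal length.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
pfm = position_frequency_matrix(sites)
```

Each row of `pfm` holds the A, C, G, T counts for one motif position. A regression model such as the one described above would be trained to predict a similarity score of this general kind from protein-sequence features alone.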

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

First insights into the microbiology of three antarctic briny systems of the northern Victoria land

Author: Azzaro M.
Caruso G.
Guglielmin M.
La Ferla R.
Lo Giudice A.
Maimone G.
Papale M.
Rizzo C.
Publication date: 01/01/2021

Various polar environments (lakes and glaciers), including some in Antarctica, encapsulate brine pools characterized by a unique combination of extreme conditions, mainly high salinity and low temperature. Since 2014, we have been focusing our attention on the microbiology of brine pockets from three lakes in the Northern Victoria Land (NVL), lying in the Tarn Flat (TF) and Boulder Clay (BC) areas. The microbial communities have been analyzed for community structure by next generation sequencing, extracellular enzyme activities, metabolic potentials, and microbial abundances. In this study, we aim to reconsider all available data to analyze the influence exerted by environmental parameters on community composition and activities. Additionally, the prediction of metabolic functions was attempted with the phylogenetic investigation of communities by reconstruction of unobserved states (PICRUSt2) tool, highlighting that prokaryotic communities were presumably involved in methane metabolism, aromatic compound biodegradation, and organic compound (proteins, polysaccharides, and phosphates) decomposition. The analyzed cryoenvironments differed in terms of prokaryotic diversity, abundance, and retrieved metabolic pathways. Based on the analysis of DNA sequences, common operational taxonomic units ranged from 2.2% to 22.0%. The bacterial community was dominated by Bacteroidetes. In both BC and TF brines, sequences of the most thermally tolerant and methanogenic Archaea were detected, some of them related to hyperthermophiles.

Archivio istituzionale della ricerca - Università dell'Insubria

Computationally Comparing Biological Networks and Reconstructing Their Evolution

Author: Patro Robert
Publication date: 01/01/2012

Biological networks, such as protein-protein interaction, regulatory, or metabolic networks, provide information about biological function, beyond what can be gleaned from sequence alone. Unfortunately, most computational problems associated with these networks are NP-hard. In this dissertation, we develop algorithms to tackle numerous fundamental problems in the study of biological networks. First, we present a system for classifying the binding affinity of peptides to a diverse array of immunoglobulin antibodies. Computational approaches to this problem are integral to virtual screening and modern drug discovery. Our system is based on an ensemble of support vector machines and exhibits state-of-the-art performance. It placed 1st in the 2010 DREAM5 competition. Second, we investigate the problem of biological network alignment. Aligning the biological networks of different species allows for the discovery of shared structures and conserved pathways. We introduce an original procedure for network alignment based on a novel topological node signature. The pairwise global alignments of biological networks produced by our procedure, when evaluated under multiple metrics, are both more accurate and more robust to noise than those of previous work. Next, we explore the problem of ancestral network reconstruction. Knowing the state of ancestral networks allows us to examine how biological pathways have evolved, and how pathways in extant species have diverged from that of their common ancestor. We describe a novel framework for representing the evolutionary histories of biological networks and present efficient algorithms for reconstructing either a single parsimonious evolutionary history, or an ensemble of near-optimal histories. Under multiple models of network evolution, our approaches are effective at inferring the ancestral network interactions. 
Additionally, the ensemble approach is robust to noisy input, and can be used to impute missing interactions in experimental data. Finally, we introduce a framework, GrowCode, for learning network growth models. While previous work focuses on developing growth models manually, or on procedures for learning parameters for existing models, GrowCode learns fundamentally new growth models that match target networks in a flexible and user-defined way. We show that models learned by GrowCode produce networks whose target properties match those of real-world networks more closely than existing models

CiteSeerX

Digital Repository at the University of Maryland

ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability

Author: Arenas Busto Miguel
Bastolla Ugo
Publication venue: Xenómica e Biomedicina
Publication date: 19/02/2024

Ancestral sequence reconstruction (ASR) is a molecular evolution technique with applications in a variety of fields such as biotechnology and biomedicine. To infer ancestral sequences with realistic biological properties, the accuracy of ASR methods is crucial. We previously developed an ASR framework for proteins, called ProtASR, which is based on our site‐specific stability‐constrained substitution (SCS) model with selection on protein folding stability against both unfolding and misfolding. This model improved on the empirical substitution models traditionally applied in ASR without increasing the computational complexity. However, it adopted a global exchangeability matrix, an approximation that we overcome here by considering site‐specific exchangeability matrices based on the Halpern–Bruno approach. Here we present ProtASR2, a new version of our ASR framework that implements novel SCS models of protein evolution, namely mean‐field (MF) and wild‐type (WT). ProtASR2 under MF and WT SCS models outperforms empirical models and previous SCS models in terms of goodness of fit and site‐specific distributions of amino acids. Importantly, the framework infers ancestral sequences with more realistic predicted folding stability with respect to simulated sequences, while empirical, CAT and other SCS models tend to overestimate the folding stability. We applied ProtASR2 to explore the evolution of two protein families present in diverse Prokaryota and found fluctuations of protein stability over time in both families. ProtASR2 is available from https://github.com/miguelarenas/protasr and the new SCS models are also available from https://github.com/ugobas/protevol. Use of ProtASR2 will allow more realistic inferences of ancestral proteins in terms of folding stability with respect to those based on traditional empirical and CAT substitution models of protein evolution.

Agencia Estatal de Investigación | Ref. RYC-2015-18241
Agencia Estatal de Investigación | Ref. BIO2016-79043-P
Xunta de Galicia | Ref. ED431F 2018/08
Fundación Ramón Arece

Investigo

Computational Approaches to Understanding the Structure, Dynamics, Functions, and Mechanisms of Various Bacterial Proteins

Author: Cooper Connor
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2020

The 3D structure of a protein can be fundamentally useful for understanding protein function. In the absence of an experimentally determined structure, the most common way to obtain protein structures is to use homology modeling, or the mapping of the target sequence onto a closely related homolog with an available structure. However, despite recent efforts in structural biology, the 3D structures of many proteins remain unknown. Recent advances in genomic and metagenomic sequencing coupled with coevolution analysis and protein structure prediction have allowed for highly accurate models of proteins that were previously considered intractable to model due to the lack of suitable templates. Structural models obtained from homology modeling, coevolution-based modeling, or crystallography can then be used with other computational tools such as small molecule docking or molecular dynamics (MD) simulations to help understand protein function, dynamics, and mechanism. Here coevolution-based modeling was used to build a structural model of the HgcAB complex involved in mercury methylation (Chapter I). Based on the model it was proposed that conserved cysteines in HgcB are involved in shuttling mercury, methylmercury, or both. MD simulations and docking to a homology model of E. coli inosine monophosphate dehydrogenase (IMPDH) provided insights into how a single amino acid mutation could relieve inhibition by altering protein structure and dynamics (Chapter II). Coevolution-based structure prediction was also combined with docking and experimental activity data to generate machine learning models that predict enzyme substrate scope for a series of bacterial nitrilases (Chapter III). Machine learning was also used to identify physicochemical properties that describe outer membrane permeability and efflux in E. coli and P. aeruginosa, and new efflux pump inhibitors for the E. coli AcrAB-TolC efflux pump were identified using existing physicochemical guidelines in combination with small molecule docking to a homology model of AcrA (Chapter IV). Lastly, quantum mechanical/molecular mechanical simulations were used to study the mechanism of a key proton transfer step in Toho-1 beta-lactamase using experimentally determined structures of both the apo and cefotaxime-bound forms. These simulations revealed that substrate binding promotes catalysis by enhancing the favorability of this initial proton transfer step (Chapter V).

University of Tennessee, Knoxville: Trace

The Influence of Structural Constraints on Protein Evolution

Author: Perron Umberto
Publication venue: University of Cambridge
Publication date: 01/05/2020

Few mathematical models of sequence evolution incorporate parameters describing protein structure, despite its high conservation, essential functional role and the increasing availability of structural data. The primary goal of my PhD project was to create a structurally aware amino acid substitution model in which proteins are represented using an expanded alphabet that relays both amino acid identity and structural information. Each character in this alphabet specifies an amino acid as well as information about the rotamer configuration of its side chain: the discrete geometric pattern of permitted side chain atomic positions, as defined by the dihedral angles between covalently linked atoms. I generated a 55-state “Dayhoff-like” substitution model (RAM55) by assigning rotamer states in 79,558 structures (∼50% of all PDBe entries) and identifying substitutions between closely related sequences. RAM55’s rotamer state exchange patterns clearly show that the evolutionary properties of amino acids depend strongly upon side chain geometry. Exploiting knowledge of these patterns assists in phylogenetic analyses: I show that RAM55 performs as well as or better than traditional 20-state models on simulated and empirical data for divergence time estimation, tree inference, side chain configuration prediction and ancestral sequence reconstruction. Further, encoding observed characters in an alignment as ambiguous representations of characters in a larger state-space allows the application of RAM55 to 20-state amino acid data for which structures are not known. Adding structural information to as few as 12.5% of the sequences in an amino acid alignment results in excellent ancestral reconstruction performance compared to a benchmark that considers the full rotamer state information. This strategy significantly expands the applicability of RAM55 to real-world scenarios where structure might only be available for some of the sequences of interest. Thus, not only is rotamer configuration a valuable source of information for phylogenetic studies, but modelling the concomitant evolution of sequence and structure may have important implications for understanding protein folding and function.

Apollo (Cambridge)

The investigation of type-specific features of the copper coordinating AA9 proteins and their effect on the interaction with crystalline cellulose using molecular dynamics studies

Author: Moses Vuyani
Publication venue: Faculty of Science, Biochemistry and Microbiology
Publication date: 01/01/2018

AA9 proteins are metallo-enzymes which are crucial for the early stages of cellulose degradation. AA9 proteins have been suggested to cleave the glycosidic bonds linking cellulose through the use of their Cu2+ coordinating active site. AA9 proteins possess different regioselectivities depending on the cleavage they produce and, as a result, are grouped accordingly. Type 1 AA9 proteins cleave at the C1 carbon of cellulose, Type 2 AA9 proteins cleave at the C4 carbon, and Type 3 AA9 proteins cleave at either the C1 or the C4 carbon. The steric congestion of the AA9 active site has been proposed to be a contributor to the observed regioselectivity. As such, a bioinformatics characterisation of type-specific sequence and structural features was performed. Initially, AA9 protein sequences were obtained from the Pfam database and multiple sequence alignment was performed. The sequences were phylogenetically characterised and grouped into their respective types, and sub-groups were identified. A selection analysis was performed on AA9 LPMO types to determine the selective pressure acting on AA9 protein residues. Motif discovery was then performed to identify conserved sequence motifs in AA9 proteins. Once type-specific sequence features were identified, structural mapping was performed to assess possible effects on substrate interaction. Physicochemical property analysis was also performed to assess biochemical differences between AA9 LPMO types. Molecular dynamics (MD) simulations were then employed to dynamically assess the consequences of the discovered type-specific features on AA9-cellulose interaction. Due to the absence of AA9-specific force field parameters, MD simulations were not readily applicable. As a result, Potential Energy Surface (PES) scans were performed to evaluate the force field parameters for the AA9 active site using the PM6 semi-empirical approach and least squares fitting. A Type 1 AA9 active site was constructed from the crystal structure 4B5Q, encompassing only the Cu2+ coordinating residues, the Cu2+ ion and two water residues. Due to the similarity in AA9 active sites, the Type 1 force field parameters were validated on all three AA9 LPMO types. Two MD simulations for each AA9 LPMO type were conducted using two separate Lennard-Jones parameter sets. Once completed, the MD trajectories were analysed for various features including the RMSD, RMSF, radius of gyration, coordination during simulation, hydrogen bonding, secondary structure conservation and overall protein movement. Force field parameters were successfully evaluated and validated for AA9 proteins. MD simulations of AA9 proteins were able to reveal the presence of unique type-specific binding modes of AA9 active sites to cellulose. These binding modes were characterised by the presence of unique type-specific loops which were present in Type 2 and 3 AA9 proteins but not in Type 1 AA9 proteins. The loops were found to result in steric congestion that affects how the Cu2+ ion interacts with cellulose. As a result, Cu2+ binding to cellulose was observed for Type 1 and not Type 2 and 3 AA9 proteins. In this study force field parameters have been evaluated for the Type 1 active site of AA9 proteins, and these parameters were evaluated on all three types and binding. Future work will focus on identifying the nature of the reactive oxygen species and performing QM/MM calculations to elucidate the reactive mechanism of all three AA9 LPMO types.
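Two of the trajectory measures named in the abstract above, RMSD and radius of gyration, have standard definitions that can be sketched in a few lines of Python. This is a minimal illustration with invented two-atom coordinates and hypothetical function names. It performs no structural superposition before computing the RMSD, and it is not the analysis pipeline used in the thesis.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two frames (no superposition applied)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("frames must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(total / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted RMS distance of atoms from their centre of mass."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal masses when none are given
    total_mass = sum(masses)
    centre = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[k] - centre[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return sqrt(weighted_sq / total_mass)


# Invented toy frames: two atoms, second frame shifted by 1 unit along y.
frame0 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame1 = [(0.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
```

For these toy frames, `rmsd(frame0, frame1)` and `radius_of_gyration(frame0)` both evaluate to 1.0. In practice such quantities are computed per frame over a whole trajectory, usually after aligning each frame to a reference structure.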

South East Academic Libraries System (SEALS)

Rhodes Repository (SEALS)

Beyond structural genomics: computational approaches for the identification of ligand binding sites in protein structures

Author: Ghersi Dario
Sanchez Roberto
Publication venue: DigitalCommons@UNO
Publication date: 03/05/2011

Structural genomics projects have revealed structures for a large number of proteins of unknown function. Understanding the interactions between these proteins and their ligands would provide an initial step in their functional characterization. Binding site identification methods are a fast and cost-effective way to facilitate the characterization of functionally important protein regions. In this review we describe our recently developed methods for binding site identification in the context of existing methods. The advantage of energy-based approaches is emphasized, since they provide flexibility in the identification and characterization of different types of binding sites.

Crossref

PubMed Central

The University of Nebraska, Omaha

Bioinformatic methods for detecting protein coevolution (Bioinformatické metody detekce koevoluce proteinů)

Author: Pařízková Hana
Publication venue: Univerzita Karlova, Přírodovědecká fakulta
Publication date: 01/01/2018

The term coevolution describes the situation in which two or more species or biomolecules reciprocally affect each other's evolution. On the protein level, it is thought to be the main mechanism ensuring correct folding, interactions and function of a protein, and it can be observed both on the level of interacting protein families and on the level of individual amino acid residues. Coevolution studies have proved to be a powerful tool for prediction of protein structure, function, interaction partners, etc. In this thesis, different algorithms used for detection of protein coevolution are described, as well as their applications and limitations.

Keywords: coevolution, protein family, protein structure prediction, interaction partners, correlated mutations, mirrortree, mutual information, direct coupling analysis

Department of Cell Biology, Faculty of Science
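Mutual information, one of the coevolution measures listed in the keywords, can be illustrated with a short Python sketch. The four-sequence alignment is invented and the function name is hypothetical. Real analyses use far larger alignments and typically add corrections (for phylogenetic bias or finite sample size) that are omitted here.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


# Invented toy alignment: positions 0 and 1 vary together, position 2 varies
# independently of position 0, and position 3 is fully conserved.
alignment = ["AKLE", "AKIE", "GRLE", "GRIE"]
columns = list(zip(*alignment))
```

Here `mutual_information(columns[0], columns[1])` is 1 bit, because the two positions are perfectly correlated, while `mutual_information(columns[0], columns[2])` is 0. A fully conserved column such as `columns[3]` also scores 0, which is one reason raw mutual information cannot detect coupling at invariant sites.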

CU Digital Repository