Harnessing the evolutionary information on oxygen binding proteins through Support Vector Machines based modules
© The Author(s) 2018. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Abstract
Objectives
The arrival of free oxygen on the globe made aerobic life possible. It has since become clear that oxygen-binding proteins are widespread in the biosphere and are found in all groups of organisms, prokaryotes as well as eukaryotes, including fungi, plants, and animals. The exponential growth and availability of newly annotated protein sequences in the databases motivated us to develop an improved version of “Oxypred” for identifying oxygen-binding proteins.
Results
In this study, we propose a method for identifying oxy-proteins at two sequence similarity cutoffs, 50 and 90%. Different amino acid composition-based Support Vector Machine models were developed, including models that use evolutionary profiles in the form of position-specific scoring matrices (PSSM). Fivefold cross-validation was applied to evaluate prediction performance. We also compared our models with existing methods, which show nearly 97% recognition, whereas our newly developed models recognized almost 99.99 and 100% for the 50% (oxy-50) and 90% similarity models, respectively. Our results show that our approaches are faster and achieve better prediction performance than the existing methods. The web server Oxypred2 was developed as an alternative method for identifying oxy-proteins, with additional modules including PSSM, and is available at
http://bioinfo.imtech.res.in/servers/muthu/oxypred2/home.html
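As a rough illustration of the amino acid composition features and fivefold evaluation mentioned in the abstract, the sketch below computes a 20-dimensional composition vector and a simple fold assignment. It is a minimal sketch under stated assumptions, not the Oxypred2 implementation: the function names, the residue ordering, and the round-robin fold split are all hypothetical choices made for this example, and the SVM training step itself is omitted.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so vectors are comparable.
# (The ordering is an assumption of this sketch.)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard symbols (e.g. X, B, Z) are ignored in both the counts
    and the total.
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def five_fold_indices(n_samples: int, n_folds: int = 5) -> list[list[int]]:
    """Assign sample indices to folds round-robin (no shuffling).

    Each fold serves once as the test set while the remaining folds
    are used for training.
    """
    return [list(range(fold, n_samples, n_folds)) for fold in range(n_folds)]
```

Each sequence would be mapped to one such vector before being passed to a classifier; a real evaluation would usually shuffle or stratify the folds rather than assign them round-robin.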
The evolution of protein kinase specificity
All research was conducted at EMBL-EBI under the supervision of Dr. Pedro Beltrao. Work on the PhD project was paused temporarily in the Spring of 2017 for me to undertake a 3-month internship at EMBO Press (in Heidelberg).
Protein phosphorylation represents one of the most important post-translational
modifications (PTMs) for cell signalling, and is catalysed by a group of enzymes called protein
kinases. Through this activity they serve as key regulators of almost all cellular processes.
This is achieved at any time by a network of different kinases that are transiently active. The
fidelity of cell systems control therefore requires that each kinase targets only a restricted set
of substrates. This specificity is achieved partly by contextual factors that separate kinases
spatially and temporally, but also by sequence features that are encoded in the kinase domain
itself.
For this thesis I focus on elements of kinase specificity that are encoded in the active
site of the enzyme. During these investigations I have tried to address three main questions:
1) How is specificity for residues surrounding the phosphorylation site determined in the
kinase? 2) How did these specificities evolve? and 3) To what extent does kinase evolution
correlate with the evolution of its substrates?
First, I developed a sequence-based method for the automated detection of kinase
specificity determining residues (SDRs). The putative determinants were then rationalised using
available structural data, and in two specific cases were validated experimentally. I also used
mutation data from The Cancer Genome Atlas (TCGA) to demonstrate that kinase SDRs are
often targeted during cancer.
Second, a global analysis of SDR evolution was performed for kinases following gene
duplication and speciation, revealing that SDRs often diverge between paralogues but not
between orthologues. This global analysis is followed by a detailed case study of G-protein
coupled receptor kinase (GRK) evolution using ancestral sequence reconstructions.
Third, I inferred global substrate preferences in a taxonomically broad range of species
using phosphoproteome data. I then related the evolution of substrate motif sequences to
that of their cognate effector kinases where possible. The results strongly suggest that many
of the motifs emerged in a universal eukaryotic ancestor.
I finish by summarising the major findings of this doctoral research, which to my
knowledge represents the most comprehensive analysis to date of protein kinase specificity and its
evolution.
BBSR
Mutations in the protein kinase superfamily
Unpublished doctoral thesis. Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 25-11-201
Nuevas aproximaciones computacionales para el estudio y la predicción funcional de dominios de proteínas (New computational approaches for the study and functional prediction of protein domains)
Unpublished doctoral thesis. Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 23-09-2013
Obtaining experimental information on the structure, function and
important residues for the proteins of a given organism is very
time-consuming and expensive. For that reason, developing computational
techniques for assigning functional features to protein sequences is an active
area of research.
Almost all resources for predicting protein function assign functional
terms to whole chains, and do not distinguish which particular domain is
responsible for the allocated function. This is due to the fact that in the
databases of functional annotations these methods use, these annotations
are done on a whole-chain basis. Nevertheless, domains are the basic
evolutionary and often functional units of proteins. Moreover, in many
cases the domains of a protein chain have distinct molecular functions,
independent from each other. For this reason, resources with functional
annotations at the domain level, as well as methodologies for predicting
function for individual domains adapted to these resources are required.
The main aim of this thesis was to generate two such resources. We
generated the first large-scale functional annotation at the domain level by
annotating the SCOP structural domains with gene ontology terms.
Additionally, we performed a large-scale comparison of these annotations
with the ones implicit in the functional annotations of InterPro signatures,
showing that the performance of this method is globally better.
Based on this database of functional annotations at the domain level,
we developed a methodology for predicting the molecular function of
individual domains and showed that this approach outperforms a standard
method based on sequence searches in assigning functions. Additionally, we
implemented this methodology on a web server for the concomitant
prediction of fold, molecular function and functional sites at the domain
level.
Although it is clear that the amino acid types are by far the main
determinants of the functional features of proteins, several studies
suggested that translational speed may also be playing a role in some cases.
However, a large-scale comparative study on its relationship with a
comprehensive, diverse set of annotated functional features was missing. For
that reason, we performed the first large-scale analysis of the relationship
between three experimental proxies of mRNA translation speed and the
local features of the corresponding encoded proteins. We found that a
number of protein functional and structural features are related to these
mRNA properties. These results support the idea that the genome not only
codes the protein functional features as sequences of amino acids, but also
as subtle patterns of mRNA properties which, probably through local
effects on the translation speed, have some consequence on the final
polypeptide. Although the patterns found so far are in general very subtle,
for particular cases with very clear patterns these could be used for
predicting protein functional sites using single gene sequences. These results
might also have implications for the heterologous expression of proteins.
Computational Approaches to Drug Profiling and Drug-Protein Interactions
Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a
long period of stagnation in drug approvals. Due to the extreme costs associated with
introducing a drug to the market, locating and understanding the reasons for clinical failure
is key to future productivity. As part of this PhD, three main contributions were made in
this respect. First, the web platform LigNFam enables users to interactively explore
similarity relationships between ‘drug-like’ molecules and the proteins they bind. Secondly,
two deep-learning-based binding site comparison tools were developed that are competitive with
the state of the art on benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the
open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold
relationships and has already been used in multiple projects, including integration into a
virtual screening pipeline to increase the tractability of ultra-large screening experiments.
Together, and with existing tools, the contributions made will aid in the understanding of
drug-protein relationships, particularly in the fields of off-target prediction and drug
repurposing, helping to design better drugs faster.
Fragment Based Protein Active Site Analysis Using Markov Random Field Combinations of Stereochemical Feature-Based Classifications
Recent improvements in structural genomics efforts have greatly increased the
number of hypothetical proteins in the Protein Data Bank. Several computational
methodologies have been developed to determine the function of these proteins but
none of these methods have been able to account successfully for the diversity in
the sequence and structural conformations observed in proteins that have the same
function. An additional complication is the flexibility in both the protein active site
and the ligand.
In this dissertation, novel approaches to deal with both the ligand flexibility
and the diversity in stereochemistry have been proposed. The active site analysis
problem is formalized as a classification problem in which, for a given test protein,
the goal is to predict the class of ligand most likely to bind the active site based
on its stereochemical nature and thereby define its function. Traditional methods
that have adopted a similar methodology have struggled to account for the flexibility
observed in large ligands. Therefore, I propose a novel fragment-based approach to
dealing with larger ligands. The advantage of the fragment-based methodology is
that considering the protein-ligand interactions in a piecewise manner does not affect
the active site patterns, and it also provides for a way to account for the problems
associated with flexible ligands.
I also propose two feature-based methodologies to account for the diversity observed
in sequences and structural conformations among proteins with the same function.
The feature-based methodologies provide detailed descriptions of the active site
stereochemistry and are capable of identifying stereochemical patterns within the
active site despite the diversity.
Finally, I propose a Markov Random Field approach to combine the individual
ligand fragment classifications (based on the stereochemical descriptors) into a single
multi-fragment ligand class. This probabilistic framework combines the information
provided by stereochemical features with the information regarding geometric constraints
between ligand fragments to make a final ligand class prediction.
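The idea of combining per-fragment classifications with geometric constraints can be sketched as a small pairwise Markov Random Field. The sketch below is illustrative only and is not the dissertation's implementation: the Gaussian form of the distance potential, the exhaustive search over labelings, the function names, and all fragment labels and numbers in the usage example are assumptions made for this example.

```python
from itertools import product
import math


def gaussian_log_potential(observed, expected, sigma=1.0):
    """Log of an unnormalised Gaussian compatibility between two distances."""
    return -((observed - expected) ** 2) / (2.0 * sigma ** 2)


def map_labeling(unary, observed_dist, expected_dist, sigma=1.0):
    """Find the highest-scoring joint labeling of all fragment sites.

    unary:         list of dicts; unary[i][label] is the classifier
                   probability (assumed > 0) that site i holds `label`.
    observed_dist: dict mapping a site pair (i, j) to the measured distance.
    expected_dist: dict mapping frozenset({label_a, label_b}) to the
                   typical distance between those two fragment types.
    Label pairs with no expected distance are treated as incompatible.
    Returns (labels, log_score); labels is None if nothing is compatible.
    """
    label_sets = [sorted(u) for u in unary]
    best_labels, best_score = None, -math.inf
    # Exhaustive enumeration: only practical for a handful of fragments.
    for labels in product(*label_sets):
        # Unary term: evidence from the per-fragment classifiers.
        score = sum(math.log(unary[i][lab]) for i, lab in enumerate(labels))
        # Pairwise term: agreement with expected inter-fragment distances.
        for (i, j), d_obs in observed_dist.items():
            key = frozenset((labels[i], labels[j]))
            if key not in expected_dist:
                score = -math.inf
                break
            score += gaussian_log_potential(d_obs, expected_dist[key], sigma)
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


# Hypothetical example: two sites, made-up probabilities and distances.
unary = [
    {"adenine": 0.55, "nicotinamide": 0.45},
    {"ribose": 0.9, "phosphate": 0.1},
]
observed_dist = {(0, 1): 6.0}
expected_dist = {
    frozenset(("adenine", "ribose")): 3.0,
    frozenset(("nicotinamide", "ribose")): 6.0,
    frozenset(("adenine", "phosphate")): 6.0,
    frozenset(("nicotinamide", "phosphate")): 9.0,
}
labels, log_score = map_labeling(unary, observed_dist, expected_dist)
```

In this made-up example the classifier slightly prefers "adenine" at the first site, but the observed distance fits the "nicotinamide" and "ribose" pair better, so the joint labeling selects that pair. For more than a few fragments, exhaustive search would need to be replaced by an approximate inference method.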
The feature-based fragment identification methodology had an accuracy of 84%
across a diverse set of ligand fragments and the MRF analysis was able to successfully
combine the various ligand fragments (identified by feature-based analysis) into one
final ligand based on statistical models of ligand fragment distances. This novel
approach to protein active site analysis was additionally tested on 3 proteins with very
low sequence and structural similarity to other proteins in the PDB (a challenge for
traditional methods) and in each of these cases, this approach successfully identified
the cognate ligand. This approach addresses the two main issues that affect the
accuracy of current automated methodologies in protein function assignment.
- …