Search CORE

3,196 research outputs found

Supervised multivariate analysis of sequence groups to identify specificity determining residues

Author: A Carro
A del Sol Mesa
AC Culhane
AR Fersht
CD Livingstone
CL Tucker
D Charif
Desmond G Higgins
DG Higgins
DH Morgan
E Beitz
F Pazos
G Casari
G Zhang
H Yao
HM Wilks
Iain M Wallace
J Thioulouse
JC Gower
JD Thompson
JG Henikoff
KM Mayer
L Yuan
LA Mirny
M Clamp
N Saitou
O Lichtarge
OV Kalinina
RC Gentleman
RD Finn
RJ Edwards
S Dolédec
S Henikoff
SJ Hubbard
SS Hannenhalli
TD Schneider
V Vacic
W Pirovano
WR Atchley
X Gu
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract Background Proteins that evolve from a common ancestor can change functionality over time, and it is important to be able to identify the residues that cause this change. In this paper we show how a supervised multivariate statistical method, Between Group Analysis (BGA), can be used to identify these residues from families of proteins with different substrate specificities using multiple sequence alignments. Results We demonstrate the usefulness of this method on three different test cases. Two of these test cases, the Lactate/Malate dehydrogenase family and Nucleotidyl Cyclases, consist of two functional groups. The third family, Serine Proteases, consists of three groups. BGA was used to analyse and visualise these three families using two different encoding schemes for the amino acids. Conclusion The combination of methods presented in this paper is powerful and flexible while being computationally very fast and simple. BGA is especially useful because it can analyse any number of functional classes. The examples in this paper use only 2 or 3 classes for demonstration purposes, but any number can be used and visualised.
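As a rough illustration of the idea, the sketch below applies a PCA-style between-group analysis to a binary (one-hot) encoding of an alignment. It is restricted to two groups, where BGA's single between-group axis reduces to the normalised difference of the two group centroids. The alphabet, the function names (`one_hot`, `centroid`, `bga_two_groups`) and the column-ranking rule are illustrative assumptions, not the paper's implementation, and the paper's encoding schemes may differ.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"  # assumed alphabet: 20 residues plus the gap symbol


def one_hot(seq):
    """Binary encoding: one indicator per residue type per alignment column."""
    n = len(AMINO_ACIDS)
    vec = [0.0] * (len(seq) * n)
    for j, aa in enumerate(seq):
        vec[j * n + AMINO_ACIDS.index(aa)] = 1.0
    return vec


def centroid(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def bga_two_groups(group_a, group_b):
    """Two-group BGA sketch: project sequences onto the axis joining the group centroids."""
    xa = [one_hot(s) for s in group_a]
    xb = [one_hot(s) for s in group_b]
    diff = [a - b for a, b in zip(centroid(xa), centroid(xb))]
    norm = sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("groups have identical centroids; no between-group axis")
    axis = [d / norm for d in diff]
    centre = centroid(xa + xb)

    def score(x):
        # Coordinate of a centred sequence vector on the between-group axis
        return sum((xi - ci) * wi for xi, ci, wi in zip(x, centre, axis))

    # Rank alignment columns by their largest absolute loading on the axis
    n = len(AMINO_ACIDS)
    col_weight = [max(abs(w) for w in axis[j * n:(j + 1) * n]) for j in range(len(axis) // n)]
    ranked = sorted(range(len(col_weight)), key=lambda j: -col_weight[j])
    return [score(x) for x in xa], [score(x) for x in xb], ranked


scores_a, scores_b, ranked = bga_two_groups(["ADL", "ADL", "ADV"], ["AEL", "AEL", "AEV"])
print(ranked[0])  # 1: the D/E column is the one that separates the groups
```

With more than two groups, the same idea needs an ordination (for example PCA) of the full centroid matrix, which yields up to k - 1 axes for k groups.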

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

An entropy based heuristic model for predicting functional sub-type divisions of protein families

Author: Bakis Yasin
Bakış Yasin
Sezerman Ugur
Sezerman Uğur
Yorukoglu Deniz
Yörükoğlu Deniz
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/07/2009

Multiple sequence alignments of protein families are often used to locate residues that lie far apart in the sequence but are considered influential in determining the functional specificity of proteins towards various substrates, ligands, DNA and other proteins. In this paper, we propose an entropy-score based heuristic model for predicting functional sub-family divisions of protein families, taking the multiple sequence alignment of the protein family as input without any functional sub-type or key-site information for any protein sequence. Two of the test cases are reported in this paper. The first is the Nucleotidyl Cyclase protein family, consisting of guanylate and adenylate cyclases. The second is a dataset of proteins taken from six superfamilies in the Structure-Function Linkage Database (SFLD). Results from these test cases are reported in terms of sub-type divisions confirmed by phylogenetic relations from previous studies in the literature.
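The entropy score behind heuristics of this kind can be illustrated with per-column Shannon entropy. This is only the scoring ingredient, under our own assumptions (gaps counted as an ordinary residue type, no sequence weighting); the paper's procedure for turning scores into sub-family divisions is not reproduced here.

```python
from collections import Counter
from math import log2


def column_entropies(alignment):
    """Shannon entropy (in bits) of every column of a multiple sequence alignment."""
    n = len(alignment)
    entropies = []
    for column in zip(*alignment):
        freqs = [c / n for c in Counter(column).values()]
        entropies.append(sum(p * log2(1 / p) for p in freqs))
    return entropies


print(column_entropies(["AC", "AD", "AC", "AD"]))
```

A fully conserved column scores 0 bits, while a column split evenly between two residues scores 1 bit, which is the kind of signal a sub-type division might leave in an alignment.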

Sabanci University Research Database

Statistical deconvolution of enthalpic energetic contributions to MHC-peptide binding affinity

Author: Davies M.N.
Drew M.G.B.
Flower D.R.
Hattotuwagama C.K.
Moss David S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

Background: MHC Class I molecules present antigenic peptides to cytotoxic T cells, a process that forms an integral part of the adaptive immune response. Peptides are bound within a groove formed by the MHC heavy chain. Previous approaches to MHC Class I-peptide binding prediction have largely concentrated on the peptide anchor residues located at the P2 and C-terminus positions. Results: A large dataset of MHC-peptide structural complexes was created by re-modelling previously determined X-ray crystallographic structures. Static energetic analysis, following energy minimisation, was performed on the dataset to characterise interactions between bound peptides and the MHC Class I molecule, partitioning the interactions within the groove into van der Waals, electrostatic and total non-bonded energy contributions. Conclusion: The QSAR techniques of Genetic Function Approximation (GFA) and Genetic Partial Least Squares (G/PLS) were used to identify key interactions between the two molecules by comparing the calculated energy values with experimentally determined BL50 data. Although the peptide termini binding interactions help ensure the stability of the MHC Class I-peptide complex, the central region of the peptide is also important in defining the specificity of the interaction. As thermodynamic studies indicate that peptide association and dissociation may be driven entropically, it may be necessary to incorporate entropic contributions into future calculations.

Central Archive at the University of Reading

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Aston Publications Explorer

Birkbeck Institutional Research Online

Multi-Harmony: detecting functional specificity from sequence alignment

Author: Attisano
B. W. Brandt
Chakrabarti
del Sol Mesa
Donald
Feng
Georgi
Hannenhalli
Herraez
J. Heringa
K. A. Feenstra
Kalinina
Reva
Wallace
Whisstock
Publication venue: Oxford University Press
Publication date: 01/01/2010

Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein–protein interactions. A small number of amino acids generally determine functional specificity. Identifying these residues can aid the understanding of protein function and help in finding targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods, Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww
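Reporting Z-scores puts the raw scores of both methods on a common scale. A minimal sketch, assuming each position's Z-score is taken against the mean and population standard deviation of all positions in the alignment; multi-Harmony's actual background distribution and sign conventions may differ, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardise per-position scores against the alignment-wide mean and spread."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        # All positions score the same: none stands out
        return [0.0] * len(scores)
    return [(s - mu) / sd for s in scores]
```

Positions whose Z-score lies far from zero stand out against the rest of the alignment, whichever raw scale the underlying method uses.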

Crossref

VU Research Portal

PubMed Central

Ensemble approach to predict specificity determinants: benchmarking and validation

Author: A Carro
A del Sol
AA Schäffer
Anna R Panchenko
B Reva
DP Brown
E Marchiori
HM Berman
I Kononenko
IM Wallace
J Pei
JA Capra
JE Donald
K Mizuguchi
K Ye
L Mirny
N Krishnamurthy
O Lichtarge
OV Kalinina
P Marttinen
RF Doolittle
RM Ward
S Chakrabarti
S Ohno
Saikat Chakrabarti
SS Hannenhalli
W Pirovano
WL DeLano
X Gu
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract Background It is extremely important, and challenging, to identify the sites responsible for functional specification or diversification in protein families. In this study, a rigorous comparative benchmarking protocol was employed to provide a reliable evaluation of methods that predict specificity determining sites. Subsequently, the three best-performing methods were applied to identify new potential specificity determining sites through an ensemble approach based on the common agreement of their predictions. Results It was shown that analysing the structural characteristics of predicted specificity determining sites might provide a means to validate their prediction accuracy. For example, we found that, at smaller distances, the more reliable the prediction method, the closer the predicted specificity determining sites are to each other and to the ligand. Conclusion We observed certain similarities in structural features between predicted and actual subsites, which might point to their functional relevance. We speculate that the majority of the identified potential specificity determining sites might be indirectly involved in specific interactions and could be ideal targets for mutagenesis experiments.
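The "common agreement" step can be pictured as voting over the sites each method predicts. The sketch assumes each method's output is a list of alignment positions and that a site needs at least `min_votes` supporting methods; the input format, the threshold and the function name are illustrative, not the study's protocol.

```python
from collections import Counter


def consensus_sites(predictions, min_votes=2):
    """Positions predicted by at least `min_votes` of the methods.

    predictions: one iterable of predicted alignment positions per method.
    """
    # set() so that a method listing a position twice still casts one vote
    votes = Counter(pos for sites in predictions for pos in set(sites))
    return sorted(pos for pos, n in votes.items() if n >= min_votes)


print(consensus_sites([[1, 5, 9], [5, 9, 12], [9, 20]]))  # [5, 9]
```

Raising `min_votes` to the number of methods keeps only sites on which all methods agree.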

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Clustering of protein domains for functional and evolutionary studies

Author: Basrak Bojan
Cullum John
Etchebest Catherine
Goldstein Pavle
Hranueli Daslav
Kriško Anita
Long Paul F
Vujaklija Dušica
Zucko Jurica
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The number of protein family members defined by DNA sequencing is usually much larger than the number characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering. Results: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top-ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well-characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families. Conclusion: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
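To make the two stages concrete, the sketch below builds motifs from the top-ranked columns and computes a tempered posterior for one motif against per-subtype residue profiles. Deterministic annealing EM alternates tempered E-steps like this with profile re-estimation while raising the exponent `beta` towards 1; only a single E-step is shown, and the profile format, the `floor` probability and the function names are assumptions for illustration. Reporting the winning posterior as a specificity-style score mirrors, but is not claimed to equal, the paper's specificity score.

```python
from math import exp, log


def build_motifs(alignment, column_scores, n_top):
    """Concatenate each sequence's residues at the n_top highest-scoring columns."""
    ranked = sorted(range(len(column_scores)), key=lambda j: -column_scores[j])
    keep = sorted(ranked[:n_top])  # restore alignment order
    return ["".join(seq[j] for j in keep) for seq in alignment]


def tempered_assignment(motif, profiles, beta=1.0, floor=1e-3):
    """Posterior subtype membership with likelihoods raised to the power beta.

    profiles: one list per subtype of {residue: probability} dicts, one dict per motif position.
    Residues missing from a profile get the (hypothetical) probability `floor`.
    """
    logs = [beta * sum(log(p[j].get(aa, floor)) for j, aa in enumerate(motif)) for p in profiles]
    top = max(logs)
    weights = [exp(x - top) for x in logs]  # subtract the max for numerical stability
    total = sum(weights)
    posterior = [w / total for w in weights]
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return best, posterior[best]  # subtype index and its posterior (specificity-style score)


motifs = build_motifs(["MADKL", "MAEHL", "MVDKL"], [0.0, 0.1, 2.0, 1.5, 0.0], 2)
profiles = [
    [{"D": 0.9, "E": 0.1}, {"K": 0.9, "H": 0.1}],  # hypothetical subtype 0
    [{"D": 0.1, "E": 0.9}, {"K": 0.1, "H": 0.9}],  # hypothetical subtype 1
]
print(motifs, [tempered_assignment(m, profiles) for m in motifs])
```

At `beta=0` every motif is split evenly across subtypes; as `beta` grows the assignments sharpen, which is what annealing exploits to avoid poor local optima.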

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

King's Research Portal

Development of New Bioinformatic Approaches for Human Genetic Studies

Author: Coto Jose Andres Guevara
Publication venue: Clemson University Libraries
Publication date: 01/12/2017

The development of bioinformatics methods for human genetic studies draws on vast amounts of data to generate valuable new information. Machine learning and statistical coupling analysis can be used in the study of human diseases. These diseases include intellectual disabilities (ID), which affect 1-3% of the population and are caused primarily by genetic factors. Although many cases of ID are caused by mutations in protein-coding genes, the possible involvement of long non-coding RNAs (lncRNAs) in ID has also been explored because of their role in regulating gene expression. In this study, we used machine learning to develop a new expression-based model trained on ID genes encoded with developing-brain transcriptome data. The model was fine-tuned using a class-balancing approach, synthetic over-sampling of the minority class, which improved its performance. We used the model to predict candidate ID-associated lncRNAs. Our model identified several candidates that overlapped with previously reported ID-associated lncRNAs, were enriched in neurodevelopmental functions, and were highly expressed in brain tissues. Machine learning was also used to predict protein stability changes caused by missense mutations, which can lead to disease conditions including ID. We tested Random Forests, Support Vector Machines (SVM) and Naïve Bayes to find the best-performing algorithm for a multi-class classifier. We developed an SVM model using relevant physico-chemical features after feature selection. Our work identified new features for predicting the effect of amino acid substitutions on protein stability and produced a well-performing multi-class classifier based solely on sequence information. Statistical approaches were used to analyze the association between mutations and phenotypes. In this study, we used statistical coupling analysis (SCA) to cluster disease-causing mutations and ID phenotypes.
Using SCA, we identified groups of co-evolving residues, known as protein sectors, in ID protein families. Within each distinct sector, we identified mutations associated with different phenotypic manifestations of a syndromic ID. Our results suggest that protein sector analysis can be used to associate mutations with phenotypic manifestations in human diseases. The bioinformatic methods developed in this dissertation can be used in human genetic research to understand the role of new genes and proteins in human disease.
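Synthetic over-sampling of the minority class is commonly done with SMOTE, which creates new samples by interpolating between nearby minority-class samples. A minimal sketch, assuming numeric feature vectors and Euclidean distance; the function name and parameters are hypothetical, it is not the dissertation's pipeline, and in practice a tested library implementation would normally be used.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


new_points = smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=5, k=2)
```

Because each synthetic point lies between two real minority samples, the class is enlarged without simply duplicating existing examples.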

Clemson University: TigerPrints

Combining specificity determining and conserved residues improves functional site prediction

Author: A Carro
A del Sol Mesa
A Shulman-Peleg
A Stark
A Teplyakov
AE Todd
ATR Laurie
B Ma
B Mirkin
B Reva
B Zambelli
BJ Polacco
C Romier
C Yeats
CT Porter
DA Rodionov
EA Gaucher
G Dodson
G Koczyk
G Wu
GJ Kleywegt
H Yao
IM Wallace
IN Shindyalov
J Capra
J Dundas
J Pei
J-M Chandonia
JA Capra
JE Donald
JR Manning
K Ye
KA Feenstra
KM Mayer
L Aravind
L Holm
LA Mirny
M Hendlich
M Landau
MA Willis
Mikhail S Gelfand
O Lichtarge
Olga V Kalinina
OV Kalinina
P Aloy
PP Khil
R Landgraf
RD Finn
RJ Edwards
Robert B Russell
S Ahmad
S Chakrabarti
S Sankararaman
S Whelan
SS Hannenhalli
T Maier
T Pupko
WR Taylor
WSJ Valdar
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract Background Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches make use of sequence conservation, assuming that amino acid residues conserved within a protein family are most likely to be functionally important. Most often these approaches do not consider the many residues that define specific sub-functions within a family, or they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, meaning that conserved residues often only bind a common ligand sub-structure or perform general catalytic activities. Results Here we present a novel method for functional site prediction based on the identification of conserved positions, as well as those responsible for determining ligand specificity. We define Specificity-Determining Positions (SDPs) as positions occupied by conserved residues within sub-groups of proteins in a family that share a common specificity, but that differ between groups, and are thus likely to account for specific recognition events. We benchmark the approach on enzyme families of known 3D structure with bound substrates, and find that in nearly all families residues predicted by SDPsite are in contact with the bound substrate, and that the addition of SDPs significantly improves functional site prediction accuracy. We apply SDPsite to various families of proteins with known three-dimensional structures but lacking clear functional annotations, and discuss several illustrative examples. Conclusion The results suggest a better means to predict functional details for the thousands of protein structures determined prior to a clear understanding of molecular function.
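The SDP definition translates into a simple column filter: a position is a candidate SDP when it is conserved within every specificity group but the groups' consensus residues differ, whereas a position with the same consensus in every group is a conventionally conserved one. This threshold rule (`min_conservation`) and the function name are illustrative assumptions; SDPsite's own statistical scoring is not reproduced here.

```python
from collections import Counter


def classify_columns(groups, min_conservation=0.8):
    """Split alignment columns into conserved positions (same consensus in every group)
    and candidate SDPs (conserved within each group, consensus differs between groups).

    groups: mapping from a specificity-group label to that group's aligned sequences.
    """
    length = len(next(iter(groups.values()))[0])
    conserved, sdps = [], []
    for j in range(length):
        consensus = []
        for seqs in groups.values():
            residue, count = Counter(s[j] for s in seqs).most_common(1)[0]
            if count / len(seqs) < min_conservation:
                break  # not conserved inside this group, so the column is neither kind
            consensus.append(residue)
        else:
            # Every group is conserved at column j
            if len(set(consensus)) == 1:
                conserved.append(j)
            else:
                sdps.append(j)
    return conserved, sdps


groups = {"A": ["MDKL", "MDKV", "MDKL"], "B": ["MEKL", "MEKV", "MEKL"]}
print(classify_columns(groups))  # ([0, 2], [1])
```

Keeping both lists reflects the idea of combining conserved positions with specificity-determining ones rather than relying on conservation alone.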

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The C-Terminal Fragment of Prostate-Specific Antigen, a 2331 Da Peptide, as a New Urinary Pathognomonic Biomarker Candidate for Diagnosing Prostate Cancer

Author: Goto Takayuki
Ikawa Kuniko
Inoue Takahiro
Iwamoto Shinichi
Kajihara Shigeki
Kawabata Shin-Ichiro
Miyazaki Yu
Nakayama Kenji
Ogawa Osamu
Oosaga Junko
Sekiya Sadanori
Tanaka Koichi
Terada Naoki
Tsuji Hiroaki
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

Background and Objectives: Prostate cancer (PCa) is one of the most common cancers and a leading cause of cancer-related deaths in men. Mass screening has been carried out since the 1990s using serum prostate-specific antigen (PSA) levels as a PCa biomarker. However, although PSA is an excellent organ-specific marker, it is not a cancer-specific marker. Therefore, the aim of this study was to discover new biomarkers for the diagnosis of PCa. Materials and Methods: We focused on urine samples voided following prostate massage (digital rectal examination [DRE]) and conducted a peptidomic analysis of these samples using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF/MSn). Urinary biomaterials were concentrated and desalted using CM-Sepharose before the following analyses were performed by MALDI-TOF/MSn: 1) differential analyses of mass spectra; 2) determination of amino acid sequences; and 3) quantitative analyses using a stable isotope-labeled internal standard. Results: Multivariate analysis of the MALDI-TOF/MS mass spectra of urinary extracts revealed a 2331 Da peptide in urine samples following DRE. This peptide was identified as a C-terminal PSA fragment composed of 19 amino acid residues. Moreover, quantitative analysis of the relationship between isotope-labeled synthetic and intact peptides using MALDI-TOF/MS revealed that this peptide may be a new pathognomonic biomarker candidate that can differentiate PCa patients from non-cancer subjects. Conclusion: The results of the present study indicate that the 2331 Da peptide fragment of PSA may become a new pathognomonic biomarker for the diagnosis of PCa. A further large-scale investigation is currently underway to assess the possibility of using this peptide in the early detection of PCa.
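Quantitation against a stable isotope-labeled internal standard rests on simple arithmetic: a labeled copy of the peptide is spiked in at a known amount and, because it behaves almost identically during ionization, the ratio of native to labeled peak intensities scales that amount. A minimal sketch, assuming equal ionization efficiency and a single-point ratio rather than a calibration curve; the function name is hypothetical, and the result is in whatever unit the spike amount uses.

```python
def isotope_dilution_amount(analyte_intensity, standard_intensity, standard_amount):
    """Estimate the native peptide amount from its peak-intensity ratio to a spiked,
    stable isotope-labeled copy, assuming both ionize with equal efficiency."""
    if standard_intensity <= 0:
        raise ValueError("standard peak intensity must be positive")
    return analyte_intensity / standard_intensity * standard_amount


print(isotope_dilution_amount(3000.0, 1500.0, 10.0))  # 20.0 (same unit as the 10.0 spike)
```

The labeled standard shifts the mass by a known number of daltons, so native and labeled peaks appear side by side in one spectrum and can be compared directly.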

Directory of Open Access Journals

PubMed Central

Kyoto University Research Information Repository

Robust Algorithms for Detecting Hidden Structure in Biological Data

Author: Sloutsky Roman
Publication venue: Washington University Open Scholarship
Publication date: 15/08/2017

Biological data, such as molecular abundance measurements and protein sequences, harbor complex hidden structure that reflects their underlying biological mechanisms. For example, high-throughput abundance measurements provide a snapshot of the global state of a living cell, while homologous protein sequences encode the residue-level logic of the proteins' function and provide a snapshot of the evolutionary trajectory of the protein family. In this work I describe algorithmic approaches and analysis software I developed for uncovering hidden structure in both kinds of data. Clustering is an unsupervised machine learning technique commonly used to map the structure of data collected in high-throughput experiments, such as quantification of gene expression by DNA microarrays or short-read sequencing. Clustering algorithms always yield a partitioning of the data, but relying on a single partitioning solution can lead to spurious conclusions. In particular, noise in the data can cause objects to fall into the same cluster by chance rather than through meaningful association. In the first part of this thesis I demonstrate approaches to clustering data robustly in the presence of noise and apply robust clustering to analyze the transcriptional response to injury in a neuronal cell. In the second part of this thesis I describe identifying hidden specificity determining residues (SDPs) from alignments of protein sequences descended through gene duplication from a common ancestor (paralogs) and apply the approach to identify numerous putative SDPs in bacterial transcription factors of the LacI family. Finally, I describe and demonstrate a new algorithm for reconstructing the history of duplications by which paralogs descended from their common ancestor.
This algorithm addresses the complexity of such reconstruction arising from indeterminate or erroneous homology assignments made by sequence alignment algorithms and from the vast prevalence of divergence through speciation over divergence through gene duplication in protein evolution.
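One common way to avoid relying on a single partitioning is consensus (co-association) analysis: cluster many resampled or perturbed versions of the data and keep only the pairs of objects that land together in most runs. The sketch assumes the partitions have already been produced (one label list per run); the threshold and the function names are hypothetical, and this is not claimed to be the thesis's algorithm.

```python
from itertools import combinations


def co_association(partitions, n_items):
    """Fraction of clustering runs in which each pair of items shares a cluster."""
    counts = {}
    for labels in partitions:
        for i, j in combinations(range(n_items), 2):
            if labels[i] == labels[j]:
                counts[(i, j)] = counts.get((i, j), 0) + 1
    return {pair: counts.get(pair, 0) / len(partitions) for pair in combinations(range(n_items), 2)}


def stable_pairs(partitions, n_items, threshold=0.9):
    """Pairs co-clustered in at least `threshold` of the runs."""
    matrix = co_association(partitions, n_items)
    return [pair for pair, frac in matrix.items() if frac >= threshold]


runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 1, 1]]
print(stable_pairs(runs, 4))  # [(0, 1)]: items 2 and 3 co-cluster in only 3 of 4 runs
```

Pairs that co-cluster only occasionally, such as those pulled together by noise in a single run, fall below the threshold and are not reported.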

Washington University St. Louis: Open Scholarship