Detailed estimation of bioinformatics prediction reliability through the Fragmented Prediction Performance Plots
BACKGROUND
An important yet rather neglected question in bioinformatics prediction is how much data is needed to allow reliable predictions. Bioinformatics predictions are usually validated through a series of figures of merit, such as sensitivity and precision, and little attention is paid to the fact that their performance may depend on the amount of data used to make the predictions themselves.
RESULTS
Here I describe a tool, named the Fragmented Prediction Performance Plot (FPPP), which monitors the relationship between the prediction reliability and the amount of information underlying the predictions themselves. Three examples of FPPPs are presented to illustrate their principal features. In one example, the reliability becomes independent, over a certain threshold, of the amount of data used to predict protein features, and the intrinsic reliability of the predictor can be estimated. In the other two cases, on the contrary, the reliability strongly depends on the amount of data used to make the predictions and, thus, the intrinsic reliability of the two predictors cannot be determined. Only in the first example is it thus possible to fully quantify the prediction performance.
CONCLUSION
It is thus highly advisable to use FPPPs to determine the performance of any new bioinformatics prediction protocol, in order to fully quantify its prediction power and to allow comparisons between two or more predictors based on different types of data.
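The abstract does not spell out how an FPPP is constructed, so the following is only a minimal sketch of one plausible reading: predictions are grouped into bins by the amount of data behind each one, and the figures of merit are computed per bin rather than once overall. The function name `fragmented_performance`, the record layout, and the bin edges are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical record layout: (amount_of_data, predicted_positive, actual_positive).
Record = Tuple[float, bool, bool]


def fragmented_performance(records: List[Record], edges: List[float]) -> List[dict]:
    """Compute precision and sensitivity separately for each data-amount bin.

    `edges` holds ascending bin boundaries; a record falls into the bin
    [lo, hi) that contains its amount of data.
    """
    bins = [
        {"lo": lo, "hi": hi, "tp": 0, "fp": 0, "fn": 0, "tn": 0}
        for lo, hi in zip(edges[:-1], edges[1:])
    ]
    for amount, predicted, actual in records:
        for b in bins:
            if b["lo"] <= amount < b["hi"]:
                if predicted:
                    key = "tp" if actual else "fp"
                else:
                    key = "fn" if actual else "tn"
                b[key] += 1
                break

    def ratio(num: int, den: int) -> Optional[float]:
        # None marks a bin where the figure of merit is undefined.
        return num / den if den else None

    return [
        {
            "range": (b["lo"], b["hi"]),
            "precision": ratio(b["tp"], b["tp"] + b["fp"]),
            "sensitivity": ratio(b["tp"], b["tp"] + b["fn"]),
        }
        for b in bins
    ]


# Toy data, invented purely for illustration.
toy = [(1, True, True), (2, True, False), (12, True, True), (15, False, True)]
curve = fragmented_performance(toy, edges=[0, 10, 20])
```

Plotting each figure of merit against the bin ranges would then show whether reliability levels off above some amount of data or keeps changing with it.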
Deriving a mutation index of carcinogenicity using protein structure and protein interfaces
With the advent of Next Generation Sequencing the identification of mutations in the genomes of healthy and diseased tissues has become commonplace. While much progress has been made to elucidate the aetiology of disease processes in cancer, the contributions to disease that many individual mutations make remain to be characterised and their downstream consequences on cancer phenotypes remain to be understood. Missense mutations commonly occur in cancers and their consequences remain challenging to predict. However, this knowledge is becoming more vital, for both assessing disease progression and for stratifying drug treatment regimes. Coupled with structural data, comprehensive genomic databases of mutations such as the 1000 Genomes project and COSMIC give an opportunity to investigate general principles of how cancer mutations disrupt proteins and their interactions at the molecular and network level. We describe a comprehensive comparison of cancer and neutral missense mutations; by combining features derived from structural and interface properties we have developed a carcinogenicity predictor, InCa (Index of Carcinogenicity). Upon comparison with other methods, we observe that InCa can predict mutations that might not be detected by other methods. We also discuss general limitations shared by all predictors that attempt to predict driver mutations and discuss how this could impact high-throughput predictions. A web interface to a server implementation is publicly available at http://inca.icr.ac.uk/
NetTurnP – Neural Network Prediction of Beta-turns by Use of Evolutionary Information and Predicted Protein Sequence Features
β-turns are the most common type of non-repetitive structures, and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP, for prediction of two-class β-turns and prediction of the individual β-turn types, by use of evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on a two-class prediction of β-turn and not-β-turn. Furthermore, NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset comprising all β-turn types. The performance is evaluated using a golden set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC=0.50, Qtotal=82.1%, sensitivity=75.6%, PPV=68.8% and AUC=0.864. We have compared our performance to eleven other prediction methods that obtain Matthews correlation coefficients in the range of 0.17-0.47. For the type-specific β-turn predictions, only types I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain performance values of 0.36 and 0.31, respectively. CONCLUSION: The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences.
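For readers unfamiliar with the figures of merit quoted above, the sketch below computes MCC, Qtotal (overall accuracy), sensitivity and PPV from the four cells of a two-class confusion matrix, using their standard definitions. The function name `two_class_metrics` and the example counts are illustrative assumptions; they are not taken from NetTurnP or from the BT426 evaluation.

```python
import math


def two_class_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Standard two-class figures of merit from confusion-matrix counts."""
    total = tp + tn + fp + fn
    # MCC denominator: product of the four marginal sums.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        # By convention MCC is reported as 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "q_total": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Invented counts: a perfect predictor and one no better than chance.
perfect = two_class_metrics(tp=5, tn=5, fp=0, fn=0)
chance = two_class_metrics(tp=1, tn=1, fp=1, fn=1)
```

Unlike Qtotal, MCC stays informative when the classes are unbalanced, which matters here because β-turn residues are a minority of all residues.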
Prediction of peptide and protein propensity for amyloid formation
Our understanding of which peptides and proteins have the potential to undergo amyloid formation, and of the driving forces responsible for amyloid-like fiber formation and stabilization, remains limited. This is mainly because proteins that can undergo structural changes, which lead to amyloid formation, are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔGº values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone, with a prediction accuracy of 84.9% against an external validation dataset of sequences with experimental in vitro evidence of amyloid formation.
PSP_MCSVM: brainstorming consensus prediction of protein secondary structures using two-stage multiclass support vector machines
Secondary structure prediction is a crucial task for understanding the variety of protein structures and the biological functions they perform. Prediction of secondary structures for new proteins using their amino acid sequences is of fundamental importance in bioinformatics. We propose a novel technique to predict protein secondary structures based on position-specific scoring matrices (PSSMs) and physicochemical properties of amino acids. It is a two-stage approach involving multiclass support vector machines (SVMs) as classifiers for three different structural conformations, viz., helix, sheet and coil. In the first stage, PSSMs obtained from PSI-BLAST and five specially selected physicochemical properties of amino acids are fed into SVMs as features for sequence-to-structure prediction. Confidence values for forming helix, sheet and coil that are obtained from the first-stage SVM are then used in the second-stage SVM for performing structure-to-structure prediction. The two-stage cascaded classifiers (PSP_MCSVM) are trained with proteins from the RS126 dataset. The classifiers are finally tested on target proteins of the ninth Critical Assessment of protein Structure Prediction experiment (CASP9). PSP_MCSVM with the brainstorming consensus procedure performs better than prediction servers such as Predator, DSC and SIMPA96 for randomly selected proteins from CASP9 targets. The overall performance is found to be comparable with the current state of the art. PSP_MCSVM source code, train-test datasets and supplementary files are freely available in the public domain at: http://sysbio.icm.edu.pl/secstruct and http://code.google.com/p/cmater-bioinfo
Genotype and functional correlates of disease phenotype in deficiency of adenosine deaminase 2 (DADA2)
BACKGROUND
Deficiency of adenosine deaminase 2 (DADA2) is a syndrome with pleiotropic manifestations including vasculitis and hematologic compromise. A systematic definition of the relationship between ADA2 mutations and clinical phenotype remains unavailable.
OBJECTIVE
We tested whether the impact of ADA2 mutations on enzyme function correlates with clinical presentation.
METHODS
DADA2 patients with severe hematologic manifestations were compared with vasculitis-predominant patients. Enzymatic activity was assessed using expression constructs reflecting all 53 missense, nonsense, insertion and deletion genotypes from 152 patients across the DADA2 spectrum.
RESULTS
We identified DADA2 patients presenting with pure red cell aplasia (PRCA, n = 5) or bone marrow failure syndrome (BMF, n = 10). Most patients did not exhibit features of vasculitis. Recurrent infection, hepatosplenomegaly and gingivitis were common in patients with BMF, of whom half died from infection. Unlike DADA2 patients with vasculitis, patients with PRCA and BMF proved largely refractory to tumor necrosis factor inhibitors. ADA2 variants associated with vasculitis predominantly reflected missense mutations with at least 3% residual enzymatic activity. By contrast, PRCA and BMF were associated with missense mutations with minimal residual enzyme activity, nonsense variants, and insertions / deletions resulting in complete loss of function.
CONCLUSION
Functional interrogation of ADA2 mutations reveals an association of subtotal function loss with vasculitis, typically responsive to TNF blockade, whereas more extensive loss is observed in hematologic disease, which may be refractory to treatment. These findings establish a genotype-phenotype spectrum in DADA2.
Decreased Level of Nurr1 in Heterozygous Young Adult Mice Leads to Exacerbated Acute and Long-Term Toxicity after Repeated Methamphetamine Exposure
The abuse of psychostimulants, such as methamphetamine (METH), is prevalent in young adults and could lead to long-term adaptations in the midbrain dopamine system in abstinent human METH abusers. Nurr1 is a gene that is critical for the survival and maintenance of dopaminergic neurons and has been implicated in dopaminergic neuron related disorders. In this study, we examined the synergistic effects of repeated early exposure to methamphetamine in adolescence and reduction in Nurr1 gene levels. METH binge exposure in adolescence led to greater damage in the nigrostriatal dopaminergic system when mice were exposed to METH binge later in life, suggesting a long-term adverse effect on the dopaminergic system. Compared to naïve mice that received METH binge treatment for the first time, mice pretreated with METH in adolescence showed a greater loss of tyrosine hydroxylase (TH) immunoreactivity in striatum, loss of THir fibers in the substantia nigra reticulata (SNr) as well as decreased dopamine transporter (DAT) level and compromised DA clearance in striatum. These effects were further exacerbated in Nurr1 heterozygous mice. Our data suggest that a prolonged adverse effect exists following adolescent METH binge exposure, which may lead to greater damage to the dopaminergic system upon repeated METH exposure later in life. Furthermore, our data support the view that Nurr1 mutations or deficiency could be a genetic predisposition that leads to higher vulnerability in some individuals.