267 research outputs found
Unsupervised Integration of Multiple Protein Disorder Predictors: The Method and Evaluation on CASP7, CASP8 and CASP9 Data
Abstract. Background: Studies of intrinsically disordered proteins, which lack a stable tertiary structure but still have important biological functions, rely critically on computational methods that predict this property from sequence information. Although a number of fairly successful models for predicting protein disorder have been developed over the last decade, the quality of their predictions is limited by the number of available cases of confirmed disorder. Results: To estimate protein disorder from protein sequences more reliably, an iterative algorithm is proposed that integrates the predictions of multiple disorder models without relying on any protein sequences with confirmed disorder annotation. The iterative method alternates between maximum a posteriori (MAP) estimation of the disorder prediction and maximum-likelihood (ML) estimation of the quality of the individual disorder predictors. Experiments on data used at CASP7, CASP8, and CASP9 have shown the effectiveness of the proposed algorithm. Conclusions: The proposed algorithm can potentially be used to predict protein disorder and to provide helpful suggestions on choosing suitable disorder predictors for unknown protein sequences.
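The alternation described in this abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified, hard-assignment variant in the spirit of label-free ensemble integration, assuming binary per-residue calls, a fixed prior, and predictor quality summarized as sensitivity and specificity. The function name `integrate_predictors` and all parameters are hypothetical.

```python
import math

def integrate_predictors(votes, n_iter=20, prior=0.5):
    """Combine binary disorder calls without any labeled residues.

    votes[k][i] is predictor k's call (1 = disordered) for residue i.
    Returns (labels, quality), where quality[k] = (sensitivity, specificity).
    Illustrative sketch only; the names and the fixed prior are assumptions.
    """
    n_pred, n_res = len(votes), len(votes[0])
    # Initialize the consensus with a simple majority vote.
    labels = [1 if 2 * sum(v[i] for v in votes) > n_pred else 0
              for i in range(n_res)]
    quality = []
    for _ in range(n_iter):
        # ML step: estimate each predictor's quality against the current consensus.
        n_pos = sum(labels)
        n_neg = n_res - n_pos
        quality = []
        for v in votes:
            tp = sum(1 for i in range(n_res) if labels[i] == 1 and v[i] == 1)
            tn = sum(1 for i in range(n_res) if labels[i] == 0 and v[i] == 0)
            # Laplace smoothing keeps both rates strictly inside (0, 1).
            quality.append(((tp + 1) / (n_pos + 2), (tn + 1) / (n_neg + 2)))
        # MAP step: re-label each residue given the estimated qualities.
        new_labels = []
        for i in range(n_res):
            log_pos, log_neg = math.log(prior), math.log(1 - prior)
            for v, (sens, spec) in zip(votes, quality):
                if v[i] == 1:
                    log_pos += math.log(sens)
                    log_neg += math.log(1 - spec)
                else:
                    log_pos += math.log(1 - sens)
                    log_neg += math.log(spec)
            new_labels.append(1 if log_pos > log_neg else 0)
        if new_labels == labels:
            break  # consensus is stable
        labels = new_labels
    return labels, quality
```

In this sketch the estimated sensitivity and specificity play the role of the "quality" of each predictor, so predictors that agree more often with the consensus receive more weight in the next labeling pass.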
Intrinsic disorder in putative protein sequences
Abstract: Intrinsically disordered proteins perform a variety of crucial biological functions despite lacking stable tertiary structure under physiological conditions in vitro. State-of-the-art sequence-based predictors of intrinsic disorder achieve per-residue accuracies of over 80%. In a genome-wide study we observed a large difference in predicted disorder content between confirmed and putative human proteins, and suspected that this is due to large errors introduced by the gene-finding algorithms used to annotate putative sequences. To test this hypothesis we trained a predictor to discriminate sequences of real proteins from synthetic sequences that mimic the errors of gene-finding algorithms. Its application to putative human protein sequences shows that they contain a substantial fraction of incorrectly assigned regions. These regions are predicted to have higher disorder content than correctly assigned regions. Our finding provides the first evidence that the current practice of predicting disorder content in putative sequences should be reconsidered, as such estimates are biased.
Systematic Framework for Integration of Weather Data into Prediction Models for the Electric Grid Outage and Asset Management Applications
This paper describes a Weather Impact Model (WIM) capable of serving a variety of predictive applications, ranging from real-time operation and day-ahead operation planning to asset and outage management. The proposed model can combine various weather parameters into different weather impact features of interest to a specific application. This work focuses on the development of a universal weather impact model based on logistic regression embedded in a Geographic Information System (GIS). It can merge massive data sets, from historical outage and weather data to real-time weather forecasts and network monitoring measurements, into a feature known as weather hazard probability. Examples from outage and asset management applications illustrate the model's capabilities.
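As a rough illustration of how logistic regression can turn weather parameters into a hazard probability, the sketch below fits a tiny model by gradient descent. It is not the paper's WIM: the GIS component is omitted, and the function names, the learning rate, the two features (scaled wind speed and precipitation), and the toy data in the usage note are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def hazard_probability(features, weights, bias):
    """Map weather features to a probability of an outage-type event."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def fit_logistic(rows, outcomes, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on the logistic log-loss.

    rows: feature vectors (hypothetical, scaled to roughly 0..1)
    outcomes: 1 if an outage was recorded for that row, else 0
    """
    n_feat = len(rows[0])
    weights, bias = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for x, y in zip(rows, outcomes):
            err = hazard_probability(x, weights, bias) - y
            grad_b += err
            for j in range(n_feat):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias
```

A possible usage, with invented toy values: `w, b = fit_logistic([[0.1, 0.0], [0.2, 0.1], [0.8, 0.7], [0.9, 0.9]], [0, 0, 1, 1])`, followed by `hazard_probability([0.9, 0.9], w, b)` to score a new forecast.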
- …