513 research outputs found

    Detailed estimation of bioinformatics prediction reliability through the Fragmented Prediction Performance Plots

    Background: An important and yet rather neglected question related to bioinformatics predictions is the estimation of the amount of data that is needed to allow reliable predictions. Bioinformatics predictions are usually validated through a series of figures of merit, such as sensitivity and precision, and little attention is paid to the fact that their performance may depend on the amount of data used to make the predictions themselves. Results: Here I describe a tool, named Fragmented Prediction Performance Plot (FPPP), which monitors the relationship between the prediction reliability and the amount of information underlying the predictions themselves. Three examples of FPPPs are presented to illustrate their principal features. In one example, the reliability becomes independent, over a certain threshold, of the amount of data used to predict protein features, and the intrinsic reliability of the predictor can be estimated. In the other two cases, on the contrary, the reliability strongly depends on the amount of data used to make the predictions and, thus, the intrinsic reliability of the two predictors cannot be determined. Only in the first example is it thus possible to fully quantify the prediction performance. Conclusion: It is thus highly advisable to use FPPPs to determine the performance of any new bioinformatics prediction protocol, in order to fully quantify its prediction power and to allow comparisons between two or more predictors based on different types of data.
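The abstract above names sensitivity and precision as typical figures of merit and describes tracking reliability against the amount of underlying data. The sketch below illustrates that idea only; it is not the author's FPPP procedure, which the abstract does not specify. The binning by a fixed `bin_width`, the record layout, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def sensitivity(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def precision(tp, fp):
    """Fraction of positive predictions that were actually positive."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


def reliability_by_data_amount(records, bin_width):
    """Group predictions by the amount of data behind them (hypothetical layout).

    records: iterable of (data_amount, predicted, actual), with boolean labels.
    Returns {bin_start: (sensitivity, precision)} in increasing bin order.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for amount, predicted, actual in records:
        c = counts[(amount // bin_width) * bin_width]
        if predicted and actual:
            c["tp"] += 1
        elif predicted and not actual:
            c["fp"] += 1
        elif actual and not predicted:
            c["fn"] += 1
    return {
        b: (sensitivity(c["tp"], c["fn"]), precision(c["tp"], c["fp"]))
        for b, c in sorted(counts.items())
    }
```

Plotting the per-bin values against `bin_start` would show whether reliability levels off above some amount of data, which is the kind of behaviour the abstract describes for its first example.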

    PSP_MCSVM: brainstorming consensus prediction of protein secondary structures using two-stage multiclass support vector machines

    Secondary structure prediction is a crucial task for understanding the variety of protein structures and the biological functions they perform. Prediction of secondary structures for new proteins using their amino acid sequences is of fundamental importance in bioinformatics. We propose a novel technique to predict protein secondary structures based on position-specific scoring matrices (PSSMs) and physico-chemical properties of amino acids. It is a two-stage approach involving multiclass support vector machines (SVMs) as classifiers for three different structural conformations, viz., helix, sheet and coil. In the first stage, PSSMs obtained from PSI-BLAST and five specially selected physico-chemical properties of amino acids are fed into SVMs as features for sequence-to-structure prediction. Confidence values for forming helix, sheet and coil that are obtained from the first-stage SVM are then used in the second-stage SVM for performing structure-to-structure prediction. The two-stage cascaded classifiers (PSP_MCSVM) are trained with proteins from the RS126 dataset. The classifiers are finally tested on target proteins of the Critical Assessment of protein Structure Prediction experiment 9 (CASP9). PSP_MCSVM with the brainstorming consensus procedure performs better than prediction servers such as Predator, DSC and SIMPA96 on randomly selected proteins from the CASP9 targets. The overall performance is found to be comparable with the current state of the art. PSP_MCSVM source code, train-test datasets and supplementary files are freely available in the public domain at: http://sysbio.icm.edu.pl/secstruct and http://code.google.com/p/cmater-bioinfo

    Novel mutations in the VKORC1 gene of wild rats and mice – a response to 50 years of selection pressure by warfarin?

    Background: Coumarin derivatives have been in world-wide use for rodent pest control for more than 50 years. Due to their delayed action as inhibitors of blood coagulation by repression of the vitamin K reductase (VKOR) activity, they are the rodenticides of choice against several species. Resistance to these compounds has been reported for rodent populations from many countries around the world and poses a considerable problem for the efficacy of pest control. Results: In the present study, we have sequenced the VKORC1 genes of more than 250 rats and mice trapped in anticoagulant-exposed areas from four continents, and identified 18 novel and five published missense mutations, as well as eight neutral sequence variants, in a total of 178 animals. Mutagenesis in VKORC1 cDNA constructs and their recombinant expression revealed that these mutations reduced VKOR activities as compared to the wild-type protein. However, the in vitro enzyme assay used was not suited to convincingly demonstrate the warfarin resistance of all mutant proteins. Conclusion: Our results corroborate the VKORC1 gene as the main target for spontaneous mutations conferring warfarin resistance. The mechanism(s) of how mutations in the VKORC1 gene mediate insensitivity to coumarins in vivo has still to be elucidated.

    Holographic Metamagnetism, Quantum Criticality, and Crossover Behavior

    Using high-precision numerical analysis, we show that 3+1 dimensional gauge theories holographically dual to 4+1 dimensional Einstein-Maxwell-Chern-Simons theory undergo a quantum phase transition in the presence of a finite charge density and magnetic field. The quantum critical theory has dynamical scaling exponent z=3, and is reached by tuning a relevant operator of scaling dimension 2. For magnetic field B above the critical value B_c, the system behaves as a Fermi liquid. As the magnetic field approaches B_c from the high field side, the specific heat coefficient diverges as 1/(B-B_c), and non-Fermi liquid behavior sets in. For B<B_c the entropy density s becomes non-vanishing at zero temperature, and scales according to s \sim \sqrt{B_c - B}. At B=B_c, and for small non-zero temperature T, a new scaling law sets in for which s \sim T^{1/3}. Throughout a small region surrounding the quantum critical point, the ratio s/T^{1/3} is given by a universal scaling function which depends only on the ratio (B-B_c)/T^{2/3}. The quantum phase transition involves non-analytic behavior of the specific heat and magnetization but no change of symmetry. Above the critical field, our numerical results are consistent with those predicted by the Hertz/Millis theory applied to metamagnetic quantum phase transitions, which also describe non-analytic changes in magnetization without change of symmetry. Such transitions have been the subject of much experimental investigation recently, especially in the compound Sr_3 Ru_2 O_7, and we comment on the connections. Comment: 23 pages, 8 figures; v2: added ref
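For readability, the scaling relations stated in the abstract above can be collected in display form. This is only a restatement of the abstract's own claims. The symbols $\gamma$ (for the specific heat coefficient, taken here to mean $C/T$) and $f$ (for the universal scaling function) are notation assumed for this summary and do not appear in the source.

```latex
\begin{align}
  \gamma \equiv \frac{C}{T} &\sim \frac{1}{B - B_c},
    && B \to B_c \text{ from above}, \\
  s &\sim \sqrt{B_c - B},
    && B < B_c,\ T = 0, \\
  s &\sim T^{1/3},
    && B = B_c,\ \text{small } T > 0, \\
  \frac{s}{T^{1/3}} &= f\!\left(\frac{B - B_c}{T^{2/3}}\right),
    && \text{near the quantum critical point}.
\end{align}
```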

    PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

    Background: Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one-dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification. Results: Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems, and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at http://bioinf.sce.carleton.ca/PCISS. In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input protein sequence data and also to encode the resulting structure prediction in a machine-readable format. To our knowledge, this represents the only publicly available SOAP interface for a protein secondary structure prediction service with a published WSDL interface definition. Conclusion: Relative to the 9 contemporary methods included in the comparison, cascaded PCI classifiers perform well; however, PCI finds its greatest application as a consensus classifier. When PCI is used to combine a sequence-to-structure PCI-based classifier with the current leading ANN-based method, PSIPRED, the overall error rate (Q3) is maintained while the rate of occurrence of a particularly detrimental error is reduced by up to 25%. This improvement in BAD score, combined with the machine-readable SOAP web service interface, makes PCI-SS particularly useful for inclusion in a tertiary structure prediction pipeline.
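The abstract above reports results in terms of Q3. As conventionally defined in the secondary structure literature, Q3 is the percentage of residues assigned the correct one of three states. The sketch below computes it under that conventional definition; it is not taken from PCI-SS. The function name and the one-letter state strings (H, E, C) are assumptions for illustration.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted three-state label matches the observed one.

    predicted, observed: equal-length strings of per-residue states,
    e.g. 'H' (helix), 'E' (strand), 'C' (coil/other).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, `q3("HHHECC", "HHEECC")` counts five matching residues out of six.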

    Responses of marine benthic microalgae to elevated CO2

    Increasing anthropogenic CO2 emissions to the atmosphere are causing a rise in pCO2 concentrations in the ocean surface and lowering pH. To predict the effects of these changes, we need to improve our understanding of the responses of marine primary producers since these drive biogeochemical cycles and profoundly affect the structure and function of benthic habitats. The effects of increasing CO2 levels on the colonisation of artificial substrata by microalgal assemblages (periphyton) were examined across a CO2 gradient off the volcanic island of Vulcano (NE Sicily). We show that periphyton communities altered significantly as CO2 concentrations increased. CO2 enrichment caused significant increases in chlorophyll a concentrations and in diatom abundance although we did not detect any changes in cyanobacteria. SEM analysis revealed major shifts in diatom assemblage composition as CO2 levels increased. The responses of benthic microalgae to rising anthropogenic CO2 emissions are likely to have significant ecological ramifications for coastal systems. © 2011 Springer-Verlag

    Gene Function Classification Using Bayesian Models with Hierarchy-Based Priors

    We investigate the application of hierarchical classification schemes to the annotation of gene function based on several characteristics of protein sequences including phylogenic descriptors, sequence based attributes, and predicted secondary structure. We discuss three Bayesian models and compare their performance in terms of predictive accuracy. These models are the ordinary multinomial logit (MNL) model, a hierarchical model based on a set of nested MNL models, and an MNL model with a prior that introduces correlations between the parameters for classes that are nearby in the hierarchy. We also provide a new scheme for combining different sources of information. We use these models to predict the functional class of Open Reading Frames (ORFs) from the E. coli genome. The results from all three models show substantial improvement over previous methods, which were based on the C5 algorithm. The MNL model using a prior based on the hierarchy outperforms both the non-hierarchical MNL model and the nested MNL model. In contrast to previous attempts at combining these sources of information, our approach results in a higher accuracy rate when compared to models that use each data source alone. Together, these results show that gene function can be predicted with higher accuracy than previously achieved, using Bayesian models that incorporate suitable prior information.

    Organizational factors and depression management in community-based primary care settings

    Background: Evidence-based quality improvement models for depression have not been fully implemented in routine primary care settings. To date, few studies have examined the organizational factors associated with depression management in real-world primary care practice. To successfully implement quality improvement models for depression, there must be a better understanding of the relevant organizational structure and processes of the primary care setting. The objective of this study is to describe these organizational features of routine primary care practice, and the organization of depression care, using survey questions derived from an evidence-based framework. Methods: We used this framework to implement a survey of 27 practices comprising 49 unique offices within a large primary care practice network in western Pennsylvania. Survey questions addressed practice structure (e.g., human resources, leadership, information technology (IT) infrastructure, and external incentives) and process features (e.g., staff performance, degree of integrated depression care, and IT performance). Results: The results of our survey demonstrated substantial variation across the practice network in organizational factors pertinent to implementation of evidence-based depression management. Notably, quality improvement capability and IT infrastructure were widespread, but specific application to depression care differed between practices, as did coordination and communication tasks surrounding depression treatment. Conclusions: The primary care practices in the network that we surveyed are at differing stages in their organization and implementation of evidence-based depression management. Practical surveys such as this may serve to better direct implementation of these quality improvement strategies for depression by improving understanding of the organizational barriers and facilitators that exist within both practices and practice networks. In addition, survey information can inform efforts of individual primary care practices in customizing intervention strategies to improve depression management. http://deepblue.lib.umich.edu/bitstream/2027.42/78269/1/1748-5908-4-84.xml http://deepblue.lib.umich.edu/bitstream/2027.42/78269/2/1748-5908-4-84-S1.PDF http://deepblue.lib.umich.edu/bitstream/2027.42/78269/3/1748-5908-4-84.pdf Peer Reviewed

    FLORA: a novel method to predict protein function from structure in diverse superfamilies

    Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2–3 fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures by purely using patterns of structural conservation of all residues

    Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information

    Background: Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio. Results: Here we develop high-throughput machine learning systems for the prediction of protein secondary structure and solvent accessibility that exploit homology to proteins of known structure, where available, in the form of simple structural frequency profiles extracted from sets of PDB templates. We compare these systems to their state-of-the-art ab initio counterparts, and with a number of baselines in which secondary structures and solvent accessibilities are extracted directly from the templates. We show that structural information from templates greatly improves secondary structure and solvent accessibility prediction quality, and that, on average, the systems significantly enrich the information contained in the templates. For sequence similarity exceeding 30%, secondary structure prediction quality is approximately 90%, close to its theoretical maximum, and 2-class solvent accessibility roughly 85%. Gains are robust with respect to template selection noise, and significant for marginal sequence similarity and for short alignments, supporting the claim that these improved predictions may prove beneficial beyond the case in which clear homology is available. Conclusion: The predictive systems are publicly available at http://distill.ucd.ie. Science Foundation Ireland; Irish Research Council for Science, Engineering and Technology; Health Research Board; UCD President's Award 2004.