
    Genome wide prediction of protein function via a generic knowledge discovery approach based on evidence integration

    BACKGROUND: The automation of many common molecular biology techniques has resulted in the accumulation of vast quantities of experimental data. One of the major challenges now facing researchers is how to process this data to yield useful information about a biological system (e.g. knowledge of genes and their products, and the biological roles of proteins, their molecular functions, localizations and interaction networks). We present a technique called Global Mapping of Unknown Proteins (GMUP) which uses the Gene Ontology Index to relate diverse sources of experimental data by creating an abstraction layer of evidence data. This abstraction layer is used as input to a neural network which, once trained, can be used to predict function from the evidence data of unannotated proteins. The method allows us to add almost any experimental data set that is related to protein function and incorporates the Gene Ontology to our evidence data, in order to seek relationships between the different sets. RESULTS: We have demonstrated the capabilities of this method in two ways. We first collected various experimental datasets associated with yeast (Saccharomyces cerevisiae) and applied the technique to a set of previously annotated open reading frames (ORFs). These ORFs were divided into training and test sets and were used to examine the accuracy of the predictions made by our method. Then we applied GMUP to previously unannotated ORFs and made 1980, 836 and 1969 predictions corresponding to the GO Biological Process, Molecular Function and Cellular Component sub-categories respectively. We found that GMUP was particularly successful at predicting ORFs with functions associated with the ribonucleoprotein complex, protein metabolism and transportation. CONCLUSION: This study presents a global and generic gene knowledge discovery approach based on evidence integration of various genome-scale data. It can be used to provide insight into how certain biological processes are implemented through the interaction and coordination of proteins, which may serve as a guide for future analysis. New data can be readily incorporated as they become available to provide more reliable predictions or further insights into processes and interactions.

    An Improved, Bias-Reduced Probabilistic Functional Gene Network of Baker's Yeast, Saccharomyces cerevisiae

    Background: Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations. Methodology/Principal Findings: We report a significantly improved version (v. 2) of a probabilistic functional gene network [1] of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. Conclusions/Significance: YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org. This work was supported by grants from the N.S.F. (IIS-0325116, EIA-0219061), N.I.H. (GM06779-01, GM076536-01), Welch (F-1515), and a Packard Fellowship (EMM). These agencies were not involved in the design and conduct of the study, in the collection, analysis, and interpretation of the data, or in the preparation, review, or approval of the manuscript. Cellular and Molecular Biolog

    Speed, Variability, and Timing of Motor Output in ADHD: Which Measures are Useful for Endophenotypic Research?

    Attention-Deficit/Hyperactivity Disorder (ADHD) shares a genetic basis with motor coordination problems and probably motor timing problems. In line with this, comparable problems in motor timing should be observed in first-degree relatives and might, therefore, form a suitable endophenotypic candidate. This hypothesis was investigated in 238 ADHD families (545 children) and 147 control families (271 children). A motor timing task was administered, in which children had to produce a 1,000 ms interval. In addition to this task, two basic motor tasks were administered to examine speed and variability of motor output when no timing component was required. Results indicated that variability in motor timing is a useful endophenotypic candidate: it was clearly associated with ADHD, it was also present in non-affected siblings, and it correlated within families. Accuracy (under- versus over-production) in motor timing appeared less useful: even though accuracy was associated with ADHD (probands and affected siblings had a tendency to under-produce the 1,000 ms interval compared to controls), non-affected siblings did not differ from controls and sibling correlations were only marginally significant. Slow and variable motor output without a timing component also appeared to be present in ADHD, but not in non-affected siblings, suggesting that these deficits are not related to a familial vulnerability for ADHD. Deficits in motor timing could not be explained by deficits already present in basic motor output without a timing component. This suggests that abnormalities in motor timing were predominantly related to deficient motor timing processes and not to generally deficient motor functioning. The finding that deficits in motor timing run in ADHD families suggests that this is a fruitful domain for further exploration in relation to the genetic underpinnings of ADHD.

    Accounting for Redundancy when Integrating Gene Interaction Databases

    In recent years, gene interaction networks have increasingly been used for the assessment and interpretation of biological measurements. Knowledge of the interaction partners of an unknown protein allows scientists to understand the complex relationships between genetic products, helps to reveal unknown biological functions and pathways, and gives a more detailed picture of an organism's complexity. Measuring all protein interactions under all relevant conditions is virtually impossible. Hence, computational methods that integrate different datasets to predict gene interactions are needed. However, when integrating different sources, one has to account for the fact that some of the information may be redundant, which may lead to an overestimation of the true likelihood of an interaction. Our method integrates information derived from three different databases (Bioverse, HiMAP and STRING) for predicting human gene interactions. A Bayesian approach was implemented in order to integrate the different data sources on a common quantitative scale. An important assumption of the Bayesian integration is independence of the input data (features). Our study shows that the conditional dependency cannot be ignored when combining gene interaction databases that rely on partially overlapping input data. In addition, we show how the correlation structure between the databases can be detected, and we propose a linear model to correct for this bias. Benchmarking the results against two independent reference data sets shows that the integrated model outperforms the individual datasets. Our method provides an intuitive strategy for weighting the different features while accounting for their conditional dependencies.
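
    The overestimation described in this abstract can be illustrated with a minimal sketch. It assumes a plain naive Bayes combination of likelihood ratios, which is a generic textbook formulation and not the authors' implementation or their linear correction model. The prior, the likelihood ratio and the function name are hypothetical values chosen only for illustration.

```python
import math


def naive_bayes_posterior(prior, likelihood_ratios):
    """Combine evidence under the conditional independence assumption.

    Each likelihood ratio is P(evidence | interaction) / P(evidence | no interaction).
    Under independence, the log-ratios simply add to the prior log-odds.
    """
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical numbers, not taken from the paper.
PRIOR = 0.01            # assumed prior probability that two genes interact
LIKELIHOOD_RATIO = 10.0  # assumed support from one underlying experiment

# One database reporting the experiment once.
single = naive_bayes_posterior(PRIOR, [LIKELIHOOD_RATIO])

# Three databases that all re-report the same experiment: the information
# is fully redundant, but naive Bayes counts it three times.
triple_counted = naive_bayes_posterior(PRIOR, [LIKELIHOOD_RATIO] * 3)

print(f"evidence counted once:        {single:.3f}")
print(f"same evidence counted thrice: {triple_counted:.3f}")
```

    In this toy case the three sources carry no more information than one, yet the combined posterior rises sharply, which is the bias a dependency-aware weighting would have to remove.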

    Hubs with Network Motifs Organize Modularity Dynamically in the Protein-Protein Interaction Network of Yeast

    BACKGROUND: It has been recognized that modular organization pervades biological complexity. Based on network analysis, 'party hubs' and 'date hubs' were proposed to explain the basic principle of module organization in biomolecular networks. However, a recent study on hubs has suggested that there is no clear evidence for the coexistence of 'party hubs' and 'date hubs'. Thus, an open question has been raised as to whether or not 'party hubs' and 'date hubs' truly exist in the yeast interactome. METHODOLOGY: In contrast to previous studies focusing on the partners of a hub or the individual proteins around the hub, our work aims to study the network motifs of a hub, that is, the interactions among individual proteins including the hub and its neighbors. Depending on the relationship between a hub's network motifs and protein complexes, we define two new types of hubs, 'motif party hubs' and 'motif date hubs', which have the same characteristics as the original 'party hubs' and 'date hubs' respectively. The network motifs of these two types of hubs display significantly different features in spatial distribution (or cellular localization), co-expression in microarray data, control of the topological structure of the network, and organization of modularity. CONCLUSION: By means of network motifs, we largely resolved the open question about 'party hubs' and 'date hubs' that was raised by previous studies. Specifically, at the level of network motifs instead of individual proteins, we found two types of hubs, motif party hubs (mPHs) and motif date hubs (mDHs), whose network motifs display distinct characteristics in terms of biological function. In addition, in this paper we studied network motifs from a different viewpoint. That is, we show that a network motif should be considered not merely as an interaction pattern but as an essential functional unit in organizing the modules of networks.

    Biological Process Linkage Networks

    BACKGROUND. The traditional approach to studying complex biological networks is based on the identification of interactions between internal components of signaling or metabolic pathways. By comparison, little is known about interactions between higher order biological systems, such as biological pathways and processes. We propose a methodology for gleaning patterns of interactions between biological processes by analyzing protein-protein interactions, transcriptional co-expression and genetic interactions. At the heart of the methodology are the concept of Linked Processes and the resultant network of biological processes, the Process Linkage Network (PLN). RESULTS. We construct, catalogue, and analyze different types of PLNs derived from different data sources and different species. When applied to the Gene Ontology, many of the resulting links connect processes that are distant from each other in the hierarchy, even though the connection makes eminent sense biologically. Some others, however, carry an element of surprise and may reflect mechanisms that are unique to the organism under investigation. In this aspect our method complements the link structure between processes inherent in the Gene Ontology, which by its very nature is species-independent. As a practical application of the linkage of processes we demonstrate that it can be effectively used in protein function prediction, having the power to increase both the coverage and the accuracy of predictions, when carefully integrated into prediction methods. CONCLUSIONS. Our approach constitutes a promising new direction towards understanding the higher levels of organization of the cell as a system which should help current efforts to re-engineer ontologies and improve our ability to predict which proteins are involved in specific biological processes. Lynn and William Frankel Center for Computer Science; the Paul Ivanier center for robotics research and production; National Science Foundation (ITR-048715); National Human Genome Research Institute (1R33HG002850-01A1, R01 HG003367-01A1); National Institute of Health (U54 LM008748

    Structure-Based Rational Design of a Toll-like Receptor 4 (TLR4) Decoy Receptor with High Binding Affinity for a Target Protein

    Repeat proteins are attracting increasing attention as alternative scaffolds to immunoglobulin antibodies due to their unique structural features. Nonetheless, engineering the interaction interface and understanding the molecular basis for affinity maturation of repeat proteins remain a challenge. Here, we present a structure-based rational design of a repeat protein with high binding affinity for a target protein. As a model repeat protein, a Toll-like receptor 4 (TLR4) decoy receptor composed of leucine-rich repeat (LRR) modules was used, and its interaction interface was rationally engineered to increase the binding affinity for myeloid differentiation protein 2 (MD2). Based on the complex crystal structure of the decoy receptor with MD2, we first designed single amino acid substitutions in the decoy receptor, and obtained three variants showing a binding affinity (KD) one order of magnitude higher than the wild-type decoy receptor. The interacting modes and contributions of individual residues were elucidated by analyzing the crystal structures of the single variants. To further increase the binding affinity, single positive mutations were combined, and two double mutants were shown to have about 3000- and 565-fold higher binding affinities than the wild-type decoy receptor. Molecular dynamics simulations and energetic analysis indicate that an additive effect of two mutations occurring at nearby modules was the major contributor to the remarkable increase in the binding affinities.

    Interrogating domain-domain interactions with parsimony based approaches

    Background: The identification and characterization of interacting domain pairs is an important step towards understanding protein interactions. In the last few years, several methods to predict domain interactions have been proposed. Understanding the power and the limitations of these methods is key to the development of improved approaches and better understanding of the nature of these interactions. Results: Building on the previously published Parsimonious Explanation method (PE) to predict domain-domain interactions, we introduced a new Generalized Parsimonious Explanation (GPE) method, which (i) adjusts the granularity of the domain definition to the granularity of the input data set and (ii) permits domain interactions to have different costs. This allowed for preferential selection of the so-called "co-occurring domains" as possible mediators of interactions between proteins. The performance of both variants of the parsimony method is competitive with the performance of the top algorithms for this problem even though parsimony methods use less information than some of the other methods. We also examined possible enrichment of co-occurring domains and homo-domains among domain interactions mediating the interaction of proteins in the network. The corresponding study was performed by surveying domain interactions predicted by the GPE method as well as by using a combinatorial counting approach independent of any prediction method. Our findings indicate that, while there is a considerable propensity towards these special domain pairs among predicted domain interactions, this overrepresentation is significantly lower than in the iPfam dataset. Conclusion: The Generalized Parsimonious Explanation approach provides a new means to predict and study domain-domain interactions. We showed that, under the assumption that all protein interactions in the network are mediated by domain interactions, there exists a significant deviation of the properties of domain interactions mediating interactions in the network from those of the iPfam data.

    WNP: A Novel Algorithm for Gene Products Annotation from Weighted Functional Networks

    Predicting the biological function of all the genes of an organism is one of the fundamental goals of computational systems biology. In the last decade, high-throughput experimental methods for studying the functional interactions between gene products (GPs) have been combined with computational approaches based on Bayesian networks for data integration. The result of these computational approaches is an interaction network with weighted links representing the likelihood of connectivity between two functionally related GPs. The weighted network generated by these computational approaches can be used to predict annotations for functionally uncharacterized GPs. Here we introduce Weighted Network Predictor (WNP), a novel algorithm for function prediction of biologically uncharacterized GPs. Tests conducted on simulated data show that WNP outperforms five other state-of-the-art methods in terms of both specificity and sensitivity and that it is better able to exploit and propagate the functional and topological information of the network. We apply our method to Saccharomyces cerevisiae and Arabidopsis thaliana networks and predict Gene Ontology functions for about 500 and 10,000 uncharacterized GPs respectively.

    Parental rating of sleep in children with attention deficit/hyperactivity disorder

    Objective: Sleep problems have often been associated with attention deficit/hyperactivity disorder (ADHD). Children with ADHD and their parents report sleep difficulties more frequently than healthy children and their parents. The primary objective of this paper is to describe sleep patterns and problems of 5- to 11-year-old children suffering from ADHD as described by parental reports and sleep questionnaires. Method: The study included 321 children aged 5–11 years (average age 8.4 years); 45 were diagnosed with ADHD, 64 had other psychiatric diagnoses, and 212 were healthy. One hundred and ninety-six of the test subjects were boys and 125 were girls. A semi-structured interview (Kiddie-SADS-PL) was used to diagnose ADHD and comorbidity according to DSM-IV in the clinical group. Sleep difficulties were rated using a structured sleep questionnaire (Children Sleep Behaviour Scale). Results: Children diagnosed with ADHD had a significantly increased occurrence of sleep problems. Difficulties relating to bedtime and unsettled sleep were significantly more frequent in the ADHD group than in the other groups. Children with ADHD showed prolonged sleep onset latency, but no difference was shown regarding the number of awakenings per night and total sleep time per night. Comorbid oppositional defiant disorder appeared not to have an added effect on problematic behaviour around bedtime. Conclusion: Parents of children with ADHD report that their children do not sleep properly more often than other parents. The ADHD group report problems with bedtime resistance, problems with sleep onset latency, unsettled sleep and nightmares more often than the control groups. It may therefore be relevant for clinicians to initiate a closer examination of those cases reporting sleep difficulties.