15,148 research outputs found
Global Functional Atlas of \u3cem\u3eEscherichia coli\u3c/em\u3e Encompassing Previously Uncharacterized Proteins
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphansâ biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a âsystems-wideâ functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins
Automated data integration for developmental biological research
In an era exploding with genome-scale data, a major challenge for developmental biologists is how to extract significant clues from these publicly available data to benefit our studies of individual genes, and how to use them to improve our understanding of development at a systems level. Several studies have successfully demonstrated new approaches to classic developmental questions by computationally integrating various genome-wide data sets. Such computational approaches have shown great potential for facilitating research: instead of testing 20,000 genes, researchers might test 200 to the same effect. We discuss the nature and state of this art as it applies to developmental research
Enhancing the functional content of protein interaction networks
Protein interaction networks are a promising type of data for studying
complex biological systems. However, despite the rich information embedded in
these networks, they face important data quality challenges of noise and
incompleteness that adversely affect the results obtained from their analysis.
Here, we explore the use of the concept of common neighborhood similarity
(CNS), which is a form of local structure in networks, to address these issues.
Although several CNS measures have been proposed in the literature, an
understanding of their relative efficacies for the analysis of interaction
networks has been lacking. We follow the framework of graph transformation to
convert the given interaction network into a transformed network corresponding
to a variety of CNS measures evaluated. The effectiveness of each measure is
then estimated by comparing the quality of protein function predictions
obtained from its corresponding transformed network with those from the
original network. Using a large set of S. cerevisiae interactions, and a set of
136 GO terms, we find that several of the transformed networks produce more
accurate predictions than those obtained from the original network. In
particular, the measure proposed here performs particularly well for
this task. Further investigation reveals that the two major factors
contributing to this improvement are the abilities of CNS measures, especially
, to prune out noisy edges and introduce new links between
functionally related proteins
Integration of molecular network data reconstructs Gene Ontology.
Motivation: Recently, a shift was made from using Gene Ontology (GO) to evaluate molecular network data to using these data to construct and evaluate GO. Dutkowski et al. provide the first evidence that a large part of GO can be reconstructed solely from topologies of molecular networks. Motivated by this work, we develop a novel data integration framework that integrates multiple types of molecular network data to reconstruct and update GO. We ask how much of GO can be recovered by integrating various molecular interaction data. Results: We introduce a computational framework for integration of various biological networks using penalized non-negative matrix tri-factorization (PNMTF). It takes all network data in a matrix form and performs simultaneous clustering of genes and GO terms, inducing new relations between genes and GO terms (annotations) and between GO terms themselves. To improve the accuracy of our predicted relations, we extend the integration methodology to include additional topological information represented as the similarity in wiring around non-interacting genes. Surprisingly, by integrating topologies of bakersâ yeasts proteinâprotein interaction, genetic interaction (GI) and co-expression networks, our method reports as related 96% of GO terms that are directly related in GO. The inclusion of the wiring similarity of non-interacting genes contributes 6% to this large GO term association capture. Furthermore, we use our method to infer new relationships between GO terms solely from the topologies of these networks and validate 44% of our predictions in the literature. In addition, our integration method reproduces 48% of cellular component, 41% of molecular function and 41% of biological process GO terms, outperforming the previous method in the former two domains of GO. Finally, we predict new GO annotations of yeast genes and validate our predictions through GIs profiling. Availability and implementation: Supplementary Tables of new GO term associations and predicted gene annotations are available at http://bio-nets.doc.ic.ac.uk/GO-Reconstruction/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online
Modelling signaling networks underlying plant defence
Transcriptional reprogramming plays a significant role in governing plant responses to pathogens. The underlying regulatory networks are complex and dynamic, responding to numerous input signals. Most network modelling studies to date have used large-scale expression data sets from public repositories but defence network models with predictive ability have also been inferred from single time series data sets, and sophisticated biological insights generated from focused experiments containing multiple network perturbations. Using multiple network inference methods, or combining network inference with additional data, such as promoter motifs, can enhance the ability of the model to predict gene function or regulatory relationships. Network topology can highlight key signaling components and provides a systems level understanding of plant defence
Strategies for increasing the applicability of biological network inference
The manipulation of cellular state has many promising applications, including stem cell biology and regenerative medicine, biofuel production, and stress resistant crop development. The construction of interaction maps promises to enhance our ability to engineer cellular behavior. Within the last 15 years, many methods have been developed to infer the structure of the gene regulatory interaction map from gene abundance snapshots provided by high-throughput experimental data. However, relatively little research has focused on using gene regulatory network models for the prediction and manipulation of cellular behavior. This dissertation examines and applies strategies to utilize the predictive power of gene network models to guide experimentation and engineering efforts. First, we developed methods to improve gene network models by integrating interaction evidence sources, in order to utilize the full predictive power of the models. Next, we explored the power of networks models to guide experimental efforts through inference and analysis of a regulatory network in the pathogenic fungus Cryptococcus neoformans. Finally, we develop a novel, network-guided algorithm to select genetic interventions for engineering transcriptional state. We apply this method to select intervention strains for improving biofuel production in a mixed glucose-xylose environment. The contributions in this dissertation provide the first thorough examination, systematic application, and quantitative evaluation of the utilization of network models for guiding cellular engineering
- âŠ