Inferring gene ontologies from pairwise similarity data.
Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Recognizing that an ontology is a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: analyze a full matrix of gene-gene pairwise similarities from -omics data; infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher-level terms. Methods addressing these requirements are just beginning to emerge; none has been evaluated for GO inference. Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements, compared with methods including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method's ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast. Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods using two -omics datasets and achieves ∼30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20-25% precision, recall). Conclusion: This study provides an algorithmic foundation for building gene ontologies by capturing hierarchical and pleiotropic structure embedded in biomolecular data.
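The core idea named in the abstract above, finding maximal cliques in networks induced by progressively lowering a similarity threshold, can be illustrated with a small sketch. This is not the CliXO algorithm itself (the abstract does not give its details); it is a minimal illustration under assumed inputs. The function names `maximal_cliques` and `threshold_cliques`, the toy gene labels, and the threshold values are all hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting).

    adj is an adjacency dict {node: set_of_neighbors} without self-loops.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed nodes
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def threshold_cliques(sim, thresholds):
    """Return (threshold, clique) pairs found while lowering the threshold.

    sim maps unordered gene pairs (a, b) to a similarity score.
    Each new maximal clique of size >= 2 is treated as a candidate term.
    """
    genes = sorted({g for pair in sim for g in pair})
    seen = set()
    terms = []
    for t in sorted(thresholds, reverse=True):
        # Build the network induced by keeping only pairs with score >= t.
        adj = {g: set() for g in genes}
        for (a, b), s in sim.items():
            if s >= t:
                adj[a].add(b)
                adj[b].add(a)
        for clique in maximal_cliques(adj):
            if len(clique) >= 2 and clique not in seen:
                seen.add(clique)
                terms.append((t, clique))
    return terms


# Toy similarity data (made up for illustration).
toy_sim = {("A", "B"): 0.9, ("A", "C"): 0.5, ("B", "C"): 0.5, ("C", "D"): 0.5}
toy_terms = threshold_cliques(toy_sim, [0.9, 0.5])
```

In this toy input, gene `C` ends up in two different cliques at the lower threshold, which is the kind of overlapping membership that a strict tree from hierarchical clustering cannot represent.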
SANA NetGO: A combinatorial approach to using Gene Ontology (GO) terms to score network alignments
Gene Ontology (GO) terms are frequently used to score alignments between
protein-protein interaction (PPI) networks. Methods exist to measure the GO
similarity between two proteins in isolation, but pairs of proteins in a
network alignment are not isolated: each pairing is implicitly dependent upon
every other pairing via the alignment itself. Current methods fail to take into
account the frequency of GO terms across the networks, and attempt to account
for common GO terms in an ad hoc fashion by imposing arbitrary rules on when to
"allow" GO terms based on their location in the GO hierarchy, rather than using
readily available frequency information in the PPI networks themselves. Here we
develop a new measure, NetGO, that naturally weighs infrequent, informative GO
terms more heavily than frequent, less informative GO terms, without requiring
arbitrary cutoffs. In particular, NetGO down-weights the score of frequent GO
terms according to their frequency in the networks being aligned. This is a
global measure applicable only to alignments, independent of pairwise GO
measures, in the same sense that the edge-based EC or S3 scores are global
measures of topological similarity independent of pairwise topological
similarities. We demonstrate the superiority of NetGO by creating alignments of
predetermined quality based on homologous pairs of nodes and show that NetGO
correlates with alignment quality much better than any existing GO-based
alignment measures. We also demonstrate that NetGO provides a measure of
taxonomic similarity between species, consistent with existing taxonomic
measures--a feature not shared with existing GO-based network alignment
measures. Finally, we re-score alignments produced by almost a dozen aligners
from a previous study and show that NetGO does a better job than existing
measures at separating good alignments from bad ones.
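The abstract above describes down-weighting frequent GO terms according to their frequency in the networks being aligned, but it does not state the formula. The sketch below is therefore not NetGO; it only illustrates the general idea with an assumed weight of 1 / (number of proteins annotated with the term across both networks). The function names, protein labels, and term labels are hypothetical.

```python
from collections import Counter


def term_frequencies(*annotation_maps):
    """Count how many proteins carry each term across all given networks."""
    freq = Counter()
    for annotations in annotation_maps:
        for terms in annotations.values():
            freq.update(set(terms))
    return freq


def weighted_alignment_score(alignment, ann1, ann2):
    """Sum, over aligned pairs, the inverse-frequency weight of shared terms.

    Assumed weighting (not the published measure): a term shared by an
    aligned pair contributes 1 / freq[term], so rare terms count for more.
    """
    freq = term_frequencies(ann1, ann2)
    score = 0.0
    for u, v in alignment:
        shared = set(ann1.get(u, ())) & set(ann2.get(v, ()))
        score += sum(1.0 / freq[t] for t in shared)
    return score


# Toy annotations (made-up labels, not real GO identifiers).
net1 = {"a1": {"T_common", "T_rare"}, "a2": {"T_common"}}
net2 = {"b1": {"T_common", "T_rare"}, "b2": {"T_common"}}

good = weighted_alignment_score([("a1", "b1"), ("a2", "b2")], net1, net2)
bad = weighted_alignment_score([("a1", "b2"), ("a2", "b1")], net1, net2)
```

Because the weights come from the annotation frequencies of the two networks, the score depends on the alignment as a whole and needs no cutoff on which terms to allow; here the alignment that pairs the two proteins sharing the rare term scores higher.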
Integration of molecular network data reconstructs Gene Ontology.
Motivation: Recently, a shift was made from using Gene Ontology (GO) to evaluate molecular network data to using these data to construct and evaluate GO. Dutkowski et al. provide the first evidence that a large part of GO can be reconstructed solely from topologies of molecular networks. Motivated by this work, we develop a novel data integration framework that integrates multiple types of molecular network data to reconstruct and update GO. We ask how much of GO can be recovered by integrating various molecular interaction data. Results: We introduce a computational framework for integration of various biological networks using penalized non-negative matrix tri-factorization (PNMTF). It takes all network data in a matrix form and performs simultaneous clustering of genes and GO terms, inducing new relations between genes and GO terms (annotations) and between GO terms themselves. To improve the accuracy of our predicted relations, we extend the integration methodology to include additional topological information represented as the similarity in wiring around non-interacting genes. Surprisingly, by integrating topologies of baker's yeast protein-protein interaction, genetic interaction (GI) and co-expression networks, our method reports as related 96% of GO terms that are directly related in GO. The inclusion of the wiring similarity of non-interacting genes contributes 6% to this large GO term association capture. Furthermore, we use our method to infer new relationships between GO terms solely from the topologies of these networks and validate 44% of our predictions in the literature. In addition, our integration method reproduces 48% of cellular component, 41% of molecular function and 41% of biological process GO terms, outperforming the previous method in the former two domains of GO. Finally, we predict new GO annotations of yeast genes and validate our predictions through GI profiling.
Availability and implementation: Supplementary Tables of new GO term associations and predicted gene annotations are available at http://bio-nets.doc.ic.ac.uk/GO-Reconstruction/. Supplementary information: Supplementary data are available at Bioinformatics online.
Using Ontology Fingerprints to evaluate genome-wide association study results
We describe an approach to characterize genes or phenotypes via ontology fingerprints, which are composed of Gene Ontology (GO) terms overrepresented among the PubMed abstracts linked to those genes or phenotypes. We then quantify the biological relevance between genes and phenotypes by comparing their ontology fingerprints to calculate a similarity score. We validated this approach by correctly identifying genes belonging to their biological pathways with high accuracy, and applied it to evaluate a GWA study by ranking genes associated with lipid concentrations in plasma as well as to prioritize genes within linkage disequilibrium (LD) blocks. We found that the genes with the highest scores were: ABCA1, LPL, and CETP for HDL; LDLR, APOE and APOB for LDL; and LPL, APOA1 and APOB for triglyceride. In addition, we identified some top-ranked genes linked to lipid metabolism in the literature even in cases where such knowledge was not reflected in the current annotation of these genes. These results demonstrate that ontology fingerprints can be used effectively to prioritize genes from GWA studies for experimental validation.
Analysis of the human diseasome reveals phenotype modules across common, genetic, and infectious diseases
Phenotypes are the observable characteristics of an organism arising from its
response to the environment. Phenotypes associated with engineered and natural
genetic variation are widely recorded using phenotype ontologies in model
organisms, as are signs and symptoms of human Mendelian diseases in databases
such as OMIM and Orphanet. Exploiting these resources, several computational
methods have been developed for integration and analysis of phenotype data to
identify the genetic etiology of diseases or suggest plausible interventions. A
similar resource would be highly useful not only for rare and Mendelian
diseases, but also for common, complex and infectious diseases. We apply a
semantic text-mining approach to identify the phenotypes (signs and symptoms)
associated with over 8,000 diseases. We demonstrate that our method generates
phenotypes that correctly identify known disease-associated genes in mice and
humans with high accuracy. Using a phenotypic similarity measure, we generate a
human disease network in which diseases that share signs and symptoms cluster
together, and we use this network to identify phenotypic disease modules.
PTOMSM: A modified version of Topological Overlap Measure used for predicting Protein-Protein Interaction Network
A variety of methods have been developed to integrate diverse biological data to predict novel interaction relationships between proteins. However, traditional integration can only generate protein interaction pairs within existing relationships. Therefore, we propose a modified version of the Topological Overlap Measure to identify not only extant direct PPI links, but also novel protein interactions that can be indirectly inferred from various relationships between proteins. Our method is more powerful than a naïve Bayesian-network-based integration in PPI prediction, and could generate more reliable candidate PPIs. Furthermore, we examined the influence of the sizes of training and test datasets on prediction, and further demonstrated the effectiveness of PTOMSM in predicting PPIs. More importantly, this method can be extended naturally to predict other types of biological networks, and may be combined with Bayesian methods to further improve the prediction.
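For orientation, the standard unweighted Topological Overlap Measure that the abstract above builds on can be sketched as follows. The abstract does not describe the modification used in PTOMSM, so this shows only the common textbook form, TOM(i, j) = (shared neighbors + a_ij) / (min(k_i, k_j) + 1 - a_ij), where a_ij is 1 if i and j are directly linked and k is node degree. The function name and the toy protein labels are hypothetical.

```python
def topological_overlap(adj, i, j):
    """Standard unweighted topological overlap between nodes i and j.

    adj is an adjacency dict {node: set_of_neighbors} without self-loops.
    This is the common textbook form, not the modified measure (PTOMSM).
    """
    if i == j:
        return 1.0
    a_ij = 1 if j in adj[i] else 0
    # Neighbors that i and j have in common.
    shared = len(adj[i] & adj[j])
    k_i, k_j = len(adj[i]), len(adj[j])
    return (shared + a_ij) / (min(k_i, k_j) + 1 - a_ij)


# Toy network (made up): P1 and P2 are not linked but share both neighbors.
toy_adj = {
    "P1": {"P3", "P4"},
    "P2": {"P3", "P4"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2"},
}

tom_unlinked = topological_overlap(toy_adj, "P1", "P2")
tom_linked = topological_overlap(toy_adj, "P1", "P3")
```

In this toy network the unlinked pair `P1`, `P2` receives a non-zero overlap because the two proteins share all of their neighbors, which illustrates how an overlap measure can suggest interactions that are only indirectly supported by existing links.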