SANA NetGO: A combinatorial approach to using Gene Ontology (GO) terms to score network alignments
Gene Ontology (GO) terms are frequently used to score alignments between
protein-protein interaction (PPI) networks. Methods exist to measure the GO
similarity between two proteins in isolation, but pairs of proteins in a
network alignment are not isolated: each pairing is implicitly dependent upon
every other pairing via the alignment itself. Current methods fail to take into
account the frequency of GO terms across the networks, and attempt to account
for common GO terms in an ad hoc fashion by imposing arbitrary rules on when to
"allow" GO terms based on their location in the GO hierarchy, rather than using
readily available frequency information in the PPI networks themselves. Here we
develop a new measure, NetGO, that naturally weighs infrequent, informative GO
terms more heavily than frequent, less informative GO terms, without requiring
arbitrary cutoffs. In particular, NetGO down-weights the score of frequent GO
terms according to their frequency in the networks being aligned. This is a
global measure applicable only to alignments, independent of pairwise GO
measures, in the same sense that the edge-based EC or S3 scores are global
measures of topological similarity independent of pairwise topological
similarities. We demonstrate the superiority of NetGO by creating alignments of
predetermined quality based on homologous pairs of nodes and show that NetGO
correlates with alignment quality much better than any existing GO-based
alignment measures. We also demonstrate that NetGO provides a measure of
taxonomic similarity between species, consistent with existing taxonomic
measures--a feature not shared with existing GO-based network alignment
measures. Finally, we re-score alignments produced by almost a dozen aligners
from a previous study and show that NetGO does a better job than existing
measures at separating good alignments from bad ones.
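To illustrate the idea of down-weighting frequent GO terms, here is a minimal sketch. It is not the NetGO formula, which this abstract does not give. It assumes a plain inverse-frequency weight, and the names `term_frequencies` and `frequency_weighted_score` and the dictionary inputs are hypothetical.

```python
from collections import Counter


def term_frequencies(annotations_a, annotations_b):
    """Count how many proteins across both networks carry each GO term."""
    counts = Counter()
    for annotations in (annotations_a, annotations_b):
        for terms in annotations.values():
            counts.update(set(terms))
    return counts


def frequency_weighted_score(alignment, annotations_a, annotations_b):
    """Sum, over aligned pairs, the shared GO terms weighted by 1 / frequency.

    Assumption: inverse-frequency weighting stands in for whatever
    weighting the actual measure uses.
    """
    counts = term_frequencies(annotations_a, annotations_b)
    total = 0.0
    for u, v in alignment.items():
        shared = set(annotations_a.get(u, ())) & set(annotations_b.get(v, ()))
        total += sum(1.0 / counts[t] for t in shared)
    return total


# Toy data: GO:1 is common (4 proteins), GO:2 is rarer (2 proteins).
annotations_a = {"a1": ["GO:1", "GO:2"], "a2": ["GO:1"]}
annotations_b = {"b1": ["GO:1", "GO:2"], "b2": ["GO:1"]}
good = frequency_weighted_score({"a1": "b1", "a2": "b2"}, annotations_a, annotations_b)
bad = frequency_weighted_score({"a1": "b2", "a2": "b1"}, annotations_a, annotations_b)
```

In this toy case the alignment that pairs the two proteins sharing the rarer term scores higher, because the rare term contributes more per match than the common one. The score depends on the whole alignment and both networks, not on any pair in isolation.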
The Inferred Cardiogenic Gene Regulatory Network in the Mammalian Heart
Cardiac development is a complex, multiscale process encompassing cell fate adoption, differentiation and morphogenesis. To elucidate pathways underlying this process, a recently developed algorithm to reverse engineer gene regulatory networks was applied to time-course microarray data obtained from the developing mouse heart. Approximately 200 genes of interest were input into the algorithm to generate putative network topologies that are capable of explaining the experimental data via model simulation. To cull specious network interactions, thousands of putative networks are merged and filtered to generate scale-free, hierarchical networks that are statistically significant and biologically relevant. The networks are validated with known gene interactions and used to predict regulatory pathways important for the developing mammalian heart. The areas under the precision-recall curve and the receiver operating characteristic curve are 9% and 58%, respectively. Of the top 10 ranked predicted interactions, 4 have already been validated. The algorithm is further tested using a network enriched with known interactions and another depleted of them. The inferred networks contained more interactions for the enriched network than for the depleted network. In all test cases, maximum performance of the algorithm was achieved when the purely data-driven method of network inference was combined with a data-independent, functional-based association method. Lastly, the network generated from the list of approximately 200 genes of interest was expanded using gene-profile uniqueness metrics to include approximately 900 additional known mouse genes and to form the most likely cardiogenic gene regulatory network. The resultant network supports known regulatory interactions and contains several novel cardiogenic regulatory interactions. The method outlined herein provides an informative approach to network inference and leads to clear testable hypotheses related to gene regulation.
Inferring gene ontologies from pairwise similarity data.
Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Recognizing that ontologies are a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: analyze a full matrix of gene-gene pairwise similarities from -omics data; infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher level terms. Methods addressing these requirements are just beginning to emerge; none has been evaluated for GO inference.
Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements, compared with methods including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method's ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast.
Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods using two -omics datasets and achieves ∼30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20-25% precision, recall).
Conclusion: This study provides algorithmic foundation for building gene ontologies by capturing hierarchical and pleiotropic structure embedded in biomolecular data.
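The core step described for CliXO, finding maximal cliques in networks induced by progressively thresholding a similarity matrix, can be sketched as follows. This is only an illustration of that single step under simple assumptions, not the CliXO algorithm itself. The function names, the pair-keyed `similarity` dictionary, and the fixed list of thresholds are hypothetical, and clique enumeration uses the basic Bron-Kerbosch procedure without pivoting.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting).

    adj maps each node to the set of its neighbours.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def cliques_by_threshold(similarity, thresholds):
    """For each threshold (strictest first), keep edges with similarity >= t
    and report the maximal cliques of size two or more."""
    nodes = sorted({gene for pair in similarity for gene in pair})
    result = {}
    for t in sorted(thresholds, reverse=True):
        adj = {n: set() for n in nodes}
        for (a, b), s in similarity.items():
            if s >= t:
                adj[a].add(b)
                adj[b].add(a)
        result[t] = [c for c in maximal_cliques(adj) if len(c) > 1]
    return result


# Toy similarity data: A and B are tightly related; C joins at a looser cutoff.
similarity = {("A", "B"): 0.9, ("A", "C"): 0.6, ("B", "C"): 0.6, ("C", "D"): 0.3}
levels = cliques_by_threshold(similarity, [0.8, 0.5])
```

In the toy data the clique {A, B} found at the strict threshold is contained in the clique {A, B, C} found at the looser one. Nesting of this kind is what lets cliques be read as terms in a hierarchy. Because cliques can overlap, a gene set may sit under more than one larger clique, which is how a clique-based approach can accommodate pleiotropy.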
NET-GE: a novel NETwork-based Gene Enrichment for detecting biological processes associated to Mendelian diseases
Enrichment analysis is a widely applied procedure for shedding light on the molecular mechanisms and functions at the basis of phenotypes, for enlarging the dataset of possibly related genes/proteins and for helping interpretation and prioritization of newly determined variations. Several standard and network-based enrichment methods are available. Both approaches rely on the annotations that characterize the genes/proteins included in the input set; network-based ones also include, in different ways, physical and functional relationships among different genes or proteins that can be extracted from the available biological networks of interactions.
Analysis of the human diseasome reveals phenotype modules across common, genetic, and infectious diseases
Phenotypes are the observable characteristics of an organism arising from its
response to the environment. Phenotypes associated with engineered and natural
genetic variation are widely recorded using phenotype ontologies in model
organisms, as are signs and symptoms of human Mendelian diseases in databases
such as OMIM and Orphanet. Exploiting these resources, several computational
methods have been developed for integration and analysis of phenotype data to
identify the genetic etiology of diseases or suggest plausible interventions. A
similar resource would be highly useful not only for rare and Mendelian
diseases, but also for common, complex and infectious diseases. We apply a
semantic text-mining approach to identify the phenotypes (signs and symptoms)
associated with over 8,000 diseases. We demonstrate that our method generates
phenotypes that correctly identify known disease-associated genes in mice and
humans with high accuracy. Using a phenotypic similarity measure, we generate a
human disease network in which diseases that share signs and symptoms cluster
together, and we use this network to identify phenotypic disease modules.
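As a rough illustration of building a disease network from shared signs and symptoms, the sketch below links two diseases when their phenotype sets are similar enough. The abstract does not name its similarity measure, so Jaccard similarity is an assumption here. The cutoff `min_similarity`, the function names, and the toy data are likewise hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two phenotype collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disease_network(phenotypes, min_similarity=0.5):
    """Return edges {(disease1, disease2): similarity} for pairs at or above the cutoff.

    Assumption: Jaccard similarity and a fixed cutoff stand in for the
    unspecified phenotypic similarity measure.
    """
    edges = {}
    for d1, d2 in combinations(sorted(phenotypes), 2):
        s = jaccard(phenotypes[d1], phenotypes[d2])
        if s >= min_similarity:
            edges[(d1, d2)] = s
    return edges


# Toy phenotype annotations, invented purely for illustration.
phenotypes = {
    "flu": {"fever", "cough"},
    "cold": {"cough", "fever", "sneeze"},
    "fracture": {"pain"},
}
edges = disease_network(phenotypes)
```

Connected groups in the resulting edge set are a simple stand-in for the disease modules mentioned above. A semantic measure that accounts for the phenotype ontology hierarchy would treat related but non-identical phenotypes as partially similar, which plain set overlap does not.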
Automated Generation of Cross-Domain Analogies via Evolutionary Computation
Analogy plays an important role in creativity, and is extensively used in
science as well as art. In this paper we introduce a technique for the
automated generation of cross-domain analogies based on a novel evolutionary
algorithm (EA). Unlike existing work in computational analogy-making restricted
to creating analogies between two given cases, our approach, for a given case,
is capable of creating an analogy along with the novel analogous case itself.
Our algorithm is based on the concept of "memes", which are units of culture,
or knowledge, undergoing variation and selection under a fitness measure, and
represents evolving pieces of knowledge as semantic networks. Using a fitness
function based on Gentner's structure mapping theory of analogies, we
demonstrate the feasibility of spontaneously generating semantic networks that
are analogous to a given base network.
Comment: Conference submission, International Conference on Computational
Creativity 2012 (8 pages, 6 figures)