43 research outputs found
Enrichment Map: A Network-Based Method for Gene-Set Enrichment Visualization and Interpretation
Gene-set enrichment analysis is a useful technique to help functionally characterize large gene lists, such as the results of gene expression experiments. This technique finds functionally coherent gene-sets, such as pathways, that are statistically over-represented in a given gene list. Ideally, the number of resulting sets is smaller than the number of genes in the list, thus simplifying interpretation. However, the increasing number and redundancy of gene-sets used by many current enrichment analysis software works against this ideal. To overcome gene-set redundancy and help in the interpretation of large gene lists, we developed "Enrichment Map", a network-based visualization method for gene-set enrichment results. Gene-sets are organized in a network, where each set is a node and edges represent gene overlap between sets. Automated network layout groups related gene-sets into network clusters, enabling the user to quickly identify the major enriched functional themes and more easily interpret the enrichment results.
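The network construction described in this abstract (gene-sets as nodes, edges for gene overlap) can be sketched roughly as follows. This is a minimal illustration, not the Enrichment Map implementation: the abstract does not say how overlap is scored, so the overlap coefficient, the 0.5 threshold, the function names, and the toy gene-sets are all assumptions made for the example.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """Intersection size divided by the size of the smaller set (assumed metric)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def build_enrichment_network(gene_sets, threshold=0.5):
    """Return edges (name_a, name_b, score) for gene-set pairs meeting the threshold.

    gene_sets maps a gene-set name to a set of gene symbols. Each gene-set is a
    node; an edge is kept only when the overlap score is at least `threshold`.
    """
    edges = []
    for (name_a, genes_a), (name_b, genes_b) in combinations(sorted(gene_sets.items()), 2):
        score = overlap_coefficient(genes_a, genes_b)
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


# Hypothetical toy input: two largely redundant sets and one unrelated set.
gene_sets = {
    "cell_cycle": {"CDK1", "CCNB1", "CDC20", "PLK1"},
    "mitosis": {"CDK1", "CCNB1", "PLK1", "AURKA"},
    "apoptosis": {"CASP3", "BAX", "BCL2"},
}
edges = build_enrichment_network(gene_sets, threshold=0.5)
print(edges)
```

In this toy input the two redundant sets end up connected while the unrelated set stays isolated, which is the property a layout algorithm would then use to group related gene-sets into clusters.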
SeqHound: biological sequence and structure database as a platform for bioinformatics research
BACKGROUND: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. RESULTS: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. CONCLUSIONS: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit
Structure-Templated Predictions of Novel Protein Interactions from Sequence Information
The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information.
WordCloud: a Cytoscape plugin to create a visual semantic summary of networks
BACKGROUND: When biological networks are studied, it is common to look for clusters, i.e. sets of nodes that are highly inter-connected. To understand the biological meaning of a cluster, the user usually has to sift through many textual annotations that are associated with biological entities. FINDINGS: The WordCloud Cytoscape plugin generates a visual summary of these annotations by displaying them as a tag cloud, where more frequent words are displayed using a larger font size. Word co-occurrence in a phrase can be visualized by arranging words in clusters or as a network. CONCLUSIONS: WordCloud provides a concise visual summary of annotations which is helpful for network analysis and interpretation. WordCloud is freely available at http://baderlab.org/Software/WordCloudPlugin
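The tag-cloud idea in this abstract (more frequent words drawn in a larger font) can be illustrated with a short sketch. It is not the plugin's code: the linear scaling, the 10 to 40 point range, the tokenization rule, and the function name `word_font_sizes` are assumptions chosen for the example, and the annotations are made up.

```python
import re
from collections import Counter


def word_font_sizes(annotations, min_size=10, max_size=40):
    """Map each word to a font size that grows linearly with its frequency.

    The linear mapping is an assumption; the abstract only states that more
    frequent words are displayed larger.
    """
    words = []
    for text in annotations:
        # Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer).
        words.extend(re.findall(r"[a-z]+", text.lower()))
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for word, n in counts.items():
        if hi == lo:
            sizes[word] = float(max_size)
        else:
            sizes[word] = min_size + (max_size - min_size) * (n - lo) / (hi - lo)
    return sizes


# Hypothetical node annotations from one network cluster.
annotations = ["DNA repair", "DNA replication", "mismatch repair", "DNA damage response"]
sizes = word_font_sizes(annotations)
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.0f}pt")
```

A real tag cloud would usually also filter common stop words before counting, which this sketch leaves out for brevity.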
netDx: Software for building interpretable patient classifiers by multi-'omic data integration using patient similarity networks
Patient classification based on clinical and genomic data will further the goal of precision medicine. Interpretability is of particular relevance for models based on genomic data, where sample sizes are relatively small (in the hundreds), increasing overfitting risk. netDx is a machine learning method to integrate multi-modal patient data and build a patient classifier. Patient data are converted into networks of patient similarity, which is intuitive to clinicians who also use patient similarity for medical diagnosis. Features passing selection are integrated, and new patients are assigned to the class with the greatest profile similarity. netDx has excellent performance, outperforming most machine-learning methods in binary cancer survival prediction. It handles missing data, a common problem in real-world data, without requiring imputation. netDx also has excellent interpretability, with native support to group genes into pathways for mechanistic insight into predictive features. The netDx Bioconductor package provides multiple workflows for users to build custom patient classifiers. It provides turnkey functions for one-step predictor generation from multi-modal data, including feature selection over multiple train/test data splits.
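The final step described in this abstract, assigning a new patient to the class with the greatest profile similarity, can be illustrated with a deliberately simplified sketch. It is not the netDx algorithm or the API of its Bioconductor package (which is written in R): cosine similarity, averaging over training patients, the function names, and the toy profiles are all assumptions standing in for the network-based similarity the method actually uses.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(new_profile, training):
    """Return (label, scores), where label has the highest mean similarity.

    training maps a class label to a list of patient profiles. The mean
    similarity to each class is a stand-in for a class-level similarity score.
    """
    scores = {}
    for label, profiles in training.items():
        total = sum(cosine_similarity(new_profile, p) for p in profiles)
        scores[label] = total / len(profiles)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical three-feature profiles for two survival classes.
training = {
    "good_survival": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "poor_survival": [[0.0, 0.2, 1.0], [0.1, 0.1, 0.9]],
}
label, scores = classify([0.8, 0.1, 0.1], training)
print(label, scores)
```

The sketch omits feature selection and the integration of multiple data types, both of which the abstract lists as parts of the method.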
The Biomolecular Interaction Network Database in PSI-MI 2.5
The Biomolecular Interaction Network Database (BIND) is a major source of curated biomolecular interactions, which has been unmaintained for the last few years, a trend which will eventually result in the loss of a significant amount of unique biomolecular interaction information, mostly as database identifiers become out of date. To help reverse this trend, we converted BIND to a standard format, Proteomics Standard Initiative-Molecular Interaction 2.5, starting from the last curated data release (from 2005) available in a custom XML format and made the core components (interactions and complexes) plus additional valuable curated information available for download (http://download.baderlab.org/BINDTranslation/). Major work during the conversion process was required to update out of date molecule identifiers resulting in a more comprehensive conversion of BIND, by measures including number of species and interactor types covered, than what is currently accessible elsewhere. This work also highlights issues of data modeling, controlled vocabulary adoption and data cleaning that can serve as a general case study on the future compatibility of interaction databases
Mammary molecular portraits reveal lineage-specific features and progenitor cell vulnerabilities.
The mammary epithelium depends on specific lineages and their stem and progenitor function to accommodate hormone-triggered physiological demands in the adult female. Perturbations of these lineages underpin breast cancer risk, yet our understanding of normal mammary cell composition is incomplete. Here, we build a multimodal resource for the adult gland through comprehensive profiling of primary cell epigenomes, transcriptomes, and proteomes. We define systems-level relationships between chromatin-DNA-RNA-protein states, identify lineage-specific DNA methylation of transcription factor binding sites, and pinpoint proteins underlying progesterone responsiveness. Comparative proteomics of estrogen and progesterone receptor-positive and -negative cell populations, extensive target validation, and drug testing lead to discovery of stem and progenitor cell vulnerabilities. Top epigenetic drugs exert cytostatic effects; prevent adult mammary cell expansion, clonogenicity, and mammopoiesis; and deplete stem cell frequency. Select drugs also abrogate human breast progenitor cell activity in normal and high-risk patient samples. This integrative computational and functional study provides fundamental insight into mammary lineage and stem cell biology
miR-126 Regulates Distinct Self-Renewal Outcomes in Normal and Malignant Hematopoietic Stem Cells
To investigate miRNA function in human acute myeloid leukemia (AML) stem cells (LSC), we generated a prognostic LSC-associated miRNA signature derived from functionally validated subpopulations of AML samples. For one signature miRNA, miR-126, high bioactivity aggregated all in vivo patient sample LSC activity into a single sorted population, tightly coupling miR-126 expression to LSC function. Through functional studies, miR-126 was found to restrain cell cycle progression, prevent differentiation, and increase self-renewal of primary LSC in vivo. Compared with prior results showing miR-126 regulation of normal hematopoietic stem cell (HSC) cycling, these functional stem effects are opposite between LSC and HSC. Combined transcriptome and proteome analysis demonstrates that miR-126 targets the PI3K/AKT/MTOR signaling pathway, preserving LSC quiescence and promoting chemotherapy resistance.
Delineation of Two Clinically and Molecularly Distinct Subgroups of Posterior Fossa Ependymoma
Despite the histological similarity of ependymomas from throughout the neuroaxis, the disease likely comprises multiple independent entities, each with a distinct molecular pathogenesis. Transcriptional profiling of two large independent cohorts of ependymoma reveals the existence of two demographically, transcriptionally, genetically, and clinically distinct groups of posterior fossa (PF) ependymomas. Group A patients are younger, have laterally located tumors with a balanced genome, and are much more likely to exhibit recurrence, metastasis at recurrence, and death compared with Group B patients. Identification and optimization of immunohistochemical (IHC) markers for PF ependymoma subgroups allowed validation of our findings on a third independent cohort, using a human ependymoma tissue microarray, and provides a tool for prospective prognostication and stratification of PF ependymoma patients