115 research outputs found

    Enrichment Map: A Network-Based Method for Gene-Set Enrichment Visualization and Interpretation

    Gene-set enrichment analysis is a useful technique to help functionally characterize large gene lists, such as the results of gene expression experiments. This technique finds functionally coherent gene-sets, such as pathways, that are statistically over-represented in a given gene list. Ideally, the number of resulting sets is smaller than the number of genes in the list, thus simplifying interpretation. However, the increasing number and redundancy of gene-sets used by many current enrichment analysis tools work against this ideal. To overcome gene-set redundancy and help in the interpretation of large gene lists, we developed “Enrichment Map”, a network-based visualization method for gene-set enrichment results. Gene-sets are organized in a network, where each set is a node and edges represent gene overlap between sets. Automated network layout groups related gene-sets into network clusters, enabling the user to quickly identify the major enriched functional themes and more easily interpret the enrichment results.
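
    The node-and-edge construction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the published implementation: the Jaccard coefficient as the overlap measure, the 0.25 cutoff, the function names, and the toy gene-set memberships are all assumptions made for the example.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient: shared genes divided by all genes in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_overlap_edges(gene_sets, threshold=0.25):
    """Return (set_a, set_b, score) edges for pairs whose overlap meets the threshold.

    Each gene-set is a node; an edge is kept only when the overlap score
    reaches the (assumed) threshold.
    """
    edges = []
    # Sort by name so the edge order is deterministic.
    for (name_a, genes_a), (name_b, genes_b) in combinations(sorted(gene_sets.items()), 2):
        score = jaccard(genes_a, genes_b)
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges


# Hypothetical enriched gene-sets; memberships are toy values for illustration.
gene_sets = {
    "cell_cycle": {"CDK1", "CCNB1", "CDC20", "PLK1"},
    "mitosis": {"CDK1", "CCNB1", "PLK1", "AURKA"},
    "apoptosis": {"CASP3", "BAX", "BCL2"},
}
edges = build_overlap_edges(gene_sets)
for name_a, name_b, score in edges:
    print(f"{name_a} -- {name_b}: overlap {score:.2f}")
```

    In this toy input only the two redundant sets are linked, which is the property a layout algorithm would then use to draw them as one cluster.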

    The human genome and drug discovery after a decade. Roads (still) not taken

    The draft sequence of the human genome became available almost a decade ago but the encoded proteome is not being explored to its fullest. Our bibliometric analysis of several large protein families, including those known to be "druggable", reveals that, even today, most papers focus on proteins that were known prior to 2000. It is evident that one or more aspects of the biomedical research system severely limits the exploration of the proteins in the 'dark matter' of the proteome, despite unbiased genetic approaches that have pointed to their functional relevance. It is perhaps not surprising that relatively few genome-derived targets have led to approved drugs. Comment: 14 pages, 5 figures.

    Mammary molecular portraits reveal lineage-specific features and progenitor cell vulnerabilities.

    The mammary epithelium depends on specific lineages and their stem and progenitor function to accommodate hormone-triggered physiological demands in the adult female. Perturbations of these lineages underpin breast cancer risk, yet our understanding of normal mammary cell composition is incomplete. Here, we build a multimodal resource for the adult gland through comprehensive profiling of primary cell epigenomes, transcriptomes, and proteomes. We define systems-level relationships between chromatin-DNA-RNA-protein states, identify lineage-specific DNA methylation of transcription factor binding sites, and pinpoint proteins underlying progesterone responsiveness. Comparative proteomics of estrogen and progesterone receptor-positive and -negative cell populations, extensive target validation, and drug testing lead to discovery of stem and progenitor cell vulnerabilities. Top epigenetic drugs exert cytostatic effects; prevent adult mammary cell expansion, clonogenicity, and mammopoiesis; and deplete stem cell frequency. Select drugs also abrogate human breast progenitor cell activity in normal and high-risk patient samples. This integrative computational and functional study provides fundamental insight into mammary lineage and stem cell biology

    SeqHound: biological sequence and structure database as a platform for bioinformatics research

    BACKGROUND: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. RESULTS: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. CONCLUSIONS: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit

    Structure-Templated Predictions of Novel Protein Interactions from Sequence Information

    The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information

    WordCloud: a Cytoscape plugin to create a visual semantic summary of networks

    Background: When biological networks are studied, it is common to look for clusters, i.e. sets of nodes that are highly inter-connected. To understand the biological meaning of a cluster, the user usually has to sift through many textual annotations that are associated with biological entities. Findings: The WordCloud Cytoscape plugin generates a visual summary of these annotations by displaying them as a tag cloud, where more frequent words are displayed using a larger font size. Word co-occurrence in a phrase can be visualized by arranging words in clusters or as a network. Conclusions: WordCloud provides a concise visual summary of annotations which is helpful for network analysis and interpretation. WordCloud is freely available at http://baderlab.org/Software/WordCloudPlugin
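
    The tag-cloud idea in this abstract, where more frequent words get a larger font, might be sketched as below. This is an illustrative assumption rather than the plugin's actual scaling rule: the linear mapping, the 10 to 40 point range, the tokenizer, and the function name `word_font_sizes` are all hypothetical.

```python
import re
from collections import Counter


def word_font_sizes(annotations, min_size=10, max_size=40):
    """Map each word to a font size that grows linearly with its frequency."""
    words = []
    for text in annotations:
        # Simple tokenizer (assumed): lowercase runs of ASCII letters.
        words.extend(re.findall(r"[a-z]+", text.lower()))
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # If every word is equally frequent, give them all the largest size.
        scale = (n - lo) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


# Hypothetical annotations attached to the nodes of one network cluster.
annotations = ["DNA repair", "DNA replication", "DNA damage repair"]
sizes = word_font_sizes(annotations)
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size}pt")
```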

    netDx: Software for building interpretable patient classifiers by multi-'omic data integration using patient similarity networks

    Patient classification based on clinical and genomic data will further the goal of precision medicine. Interpretability is of particular relevance for models based on genomic data, where sample sizes are relatively small (in the hundreds), increasing overfitting risk. netDx is a machine learning method to integrate multi-modal patient data and build a patient classifier. Patient data are converted into networks of patient similarity, which is intuitive to clinicians who also use patient similarity for medical diagnosis. Features passing selection are integrated, and new patients are assigned to the class with the greatest profile similarity. netDx has excellent performance, outperforming most machine-learning methods in binary cancer survival prediction. It handles missing data, a common problem in real-world data, without requiring imputation. netDx also has excellent interpretability, with native support to group genes into pathways for mechanistic insight into predictive features. The netDx Bioconductor package provides multiple workflows for users to build custom patient classifiers. It provides turnkey functions for one-step predictor generation from multi-modal data, including feature selection over multiple train/test data splits.
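
    The similarity-based assignment step mentioned in this abstract can be illustrated with a toy sketch. It is not netDx's algorithm or API: the similarity measure (1 / (1 + Euclidean distance)), the mean-similarity rule, the function names, and the patient profiles are assumptions chosen only to show the general idea of assigning a new patient to the most similar class.

```python
import math


def similarity(a, b):
    """Toy similarity: 1 / (1 + Euclidean distance); identical profiles score 1."""
    return 1.0 / (1.0 + math.dist(a, b))


def classify(new_profile, labelled_patients):
    """Assign the class whose known patients are most similar on average."""
    by_class = {}
    for profile, label in labelled_patients:
        by_class.setdefault(label, []).append(similarity(new_profile, profile))
    means = {label: sum(scores) / len(scores) for label, scores in by_class.items()}
    return max(means, key=means.get), means


# Hypothetical two-feature profiles with known survival class.
labelled_patients = [
    ((1.0, 2.0), "good"),
    ((1.2, 1.8), "good"),
    ((5.0, 6.0), "poor"),
    ((5.5, 6.5), "poor"),
]
predicted, means = classify((1.1, 2.1), labelled_patients)
print(predicted, means)
```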

    The Biomolecular Interaction Network Database in PSI-MI 2.5

    The Biomolecular Interaction Network Database (BIND) is a major source of curated biomolecular interactions, which has been unmaintained for the last few years, a trend which will eventually result in the loss of a significant amount of unique biomolecular interaction information, mostly as database identifiers become out of date. To help reverse this trend, we converted BIND to a standard format, Proteomics Standard Initiative-Molecular Interaction 2.5, starting from the last curated data release (from 2005) available in a custom XML format, and made the core components (interactions and complexes) plus additional valuable curated information available for download (http://download.baderlab.org/BINDTranslation/). Major work during the conversion process was required to update out-of-date molecule identifiers, resulting in a more comprehensive conversion of BIND, by measures including number of species and interactor types covered, than what is currently accessible elsewhere. This work also highlights issues of data modeling, controlled vocabulary adoption and data cleaning that can serve as a general case study on the future compatibility of interaction databases.

    Developing a confidence measure for protein-protein interactions in Saccharomyces cerevisiae

    Advances in high-throughput techniques for the generation of protein-protein interactions have produced a wealth of data. Inherent to these techniques is a large number of false-positive results, necessitating careful analysis of the data. In order to pursue further analysis of the interaction data without propagating false-positive data into subsequent studies, it is vital that researchers are able to filter interaction data to retrieve only the high-quality records. Taking a subset of curated yeast interactions from BIND, DIP, MINT and IntAct as a high-confidence set of interactions, we collated a set of features that characterize each interaction. A randomly generated protein interaction subset serves as our low-confidence or negative set. We developed a probabilistic framework for assigning confidence to binary protein interactions by training an SVM to deduce the likelihood, or confidence score, of a binary interaction based on its set of features. M.Sc.