    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing

    Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding basic cellular function in model organisms, to the elucidation of the biological events that govern the development and progression of human diseases, and the exploration of the mechanisms of survival, drug resistance and virulence of pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated bioinformatic workflow system and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for the exploration of differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum) as well as the prediction and prioritization of essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, including for researchers with limited bioinformatic expertise. The custom-written Perl, Python and Unix shell scripts used can be readily modified or adapted to suit many different applications. This system is now utilized routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomics data sets from any organism.

    Functional analysis of BARD1 missense variants in homology-directed repair and damage sensitivity

    The BARD1 protein, which heterodimerizes with BRCA1, is encoded by a known breast cancer susceptibility gene. While several BARD1 variants have been identified as pathogenic, many more missense variants exist that do not occur frequently enough to assign a clinical risk. In this paper, whole exome sequencing of over 10,000 cancer samples from 33 cancer types identified, from somatic mutations and loss of heterozygosity in tumors, 76 potentially cancer-associated BARD1 missense and truncation variants. These variants were tested in a functional assay for homology-directed repair (HDR), as HDR deficiencies have been shown to correlate with clinical pathogenicity for BRCA1 variants. From these 76 variants, 4 in the ankyrin repeat domain and 5 in the BRCT domain were found to be non-functional in HDR. Two known benign variants were found to be functional in HDR, and three known pathogenic variants were non-functional, supporting the notion that the HDR assay can be used to predict the clinical risk of BARD1 variants. The identification of HDR-deficient variants in the ankyrin repeat domain indicates there are DNA repair functions associated with this domain that have not been closely examined. In order to examine whether BARD1-associated loss of HDR function results in DNA damage sensitivity, cells expressing non-functional BARD1 variants were treated with ionizing radiation or cisplatin. These cells were found to be more sensitive to DNA damage, and variations in the residual HDR function of non-functional variants did not correlate with variations in sensitivity. These findings improve the understanding of BARD1 functional domains in DNA repair and support the use of this functional assay for predicting the cancer association of BARD1 variants.

    The G protein-coupled receptor heterodimer network (GPCR-HetNet) and its hub components

    G protein-coupled receptor (GPCR) oligomerization has emerged as a vital characteristic of receptor structure. Substantial experimental evidence supports the existence of GPCR-GPCR interactions in a coordinated and cooperative manner. However, despite the current development of experimental techniques for large-scale detection of GPCR heteromers, in order to understand their connectivity it is necessary to develop novel tools to study the global heteroreceptor networks. To provide insight into the overall topology of the GPCR heteromers and identify key players, a collective interaction network was constructed. Experimental interaction data for each of the individual human GPCR protomers were obtained manually from the STRING and SCOPUS databases. The interaction data were used to build and analyze the network using Cytoscape software. The network was treated as undirected throughout the study. It comprises 156 nodes and 260 edges and has a scale-free topology. Connectivity analysis reveals a significant dominance of intrafamily versus interfamily connections. Most of the receptors within the network are linked to each other by a small number of edges. DRD2, OPRM, ADRB2, AA2AR, AA1R, OPRK, OPRD and GHSR are identified as hubs. In a network representation, 10 modules/clusters also appear as highly interconnected groups of nodes. Information on this GPCR network can improve our understanding of molecular integration. GPCR-HetNet has been implemented in Java and is freely available at http://www.iiia.csic.es/~ismel/GPCR-Nets/index.html
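
    As a rough illustration of the kind of analysis the abstract describes (an undirected interaction network whose best-connected nodes are called hubs), the following minimal Python sketch builds an adjacency map from an edge list and ranks nodes by degree. It is not the authors' Cytoscape or Java workflow. The node names (`R1` to `R5`), the toy edge list and the `min_degree` cutoff are all hypothetical; the abstract does not state how hubs were defined, so a plain degree threshold is assumed here.

```python
from collections import defaultdict


def build_undirected(edges):
    """Return an adjacency map in which every edge is stored in both directions."""
    adj = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-loops are ignored in this sketch
        adj[a].add(b)
        adj[b].add(a)  # undirected: mirror the edge
    return dict(adj)


def degrees(adj):
    """Degree of a node = number of distinct neighbours."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def hubs(adj, min_degree):
    """Nodes whose degree is at least min_degree (an assumed hub criterion),
    sorted by descending degree, then by name."""
    deg = degrees(adj)
    selected = [n for n, d in deg.items() if d >= min_degree]
    return sorted(selected, key=lambda n: (-deg[n], n))


# Hypothetical toy data: placeholder node names, not real receptor interactions.
TOY_EDGES = [
    ("R1", "R2"), ("R1", "R3"), ("R1", "R4"),
    ("R2", "R3"), ("R5", "R1"),
    ("R2", "R1"),  # duplicate of R1-R2 in reverse order; counted once
]

adj = build_undirected(TOY_EDGES)
deg = degrees(adj)
top = hubs(adj, min_degree=3)
```

    Storing neighbours in sets means a pair reported twice, or in either order, contributes a single edge, which matches treating the network as undirected. In the toy data only `R1` reaches the assumed cutoff of three neighbours.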

    A mitochondrial-focused genetic interaction map reveals a scaffold-like complex required for inner membrane organization in mitochondria.

    To broadly explore mitochondrial structure and function as well as the communication of mitochondria with other cellular pathways, we constructed a quantitative, high-density genetic interaction map (the MITO-MAP) in Saccharomyces cerevisiae. The MITO-MAP provides a comprehensive view of mitochondrial function including insights into the activity of uncharacterized mitochondrial proteins and the functional connection between mitochondria and the ER. The MITO-MAP also reveals a large inner membrane-associated complex, which we term MitOS for mitochondrial organizing structure, composed of Fcj1/Mitofilin, a conserved inner membrane protein, and five additional components. MitOS physically and functionally interacts with both outer and inner membrane components and localizes to extended structures that wrap around the inner membrane. We show that MitOS acts in concert with ATP synthase dimers to organize the inner membrane and promote normal mitochondrial morphology. We propose that MitOS acts as a conserved mitochondrial skeletal structure that differentiates regions of the inner membrane to establish the normal internal architecture of mitochondria.

    bdbms -- A Database Management System for Biological Data

    Biologists are increasingly using databases for storing and managing their data. Biological databases typically consist of a mixture of raw data, metadata, sequences, annotations, and related data obtained from various sources. Current database technology lacks several functionalities that are needed by biological databases. In this paper, we introduce bdbms, an extensible prototype database management system for supporting biological data. bdbms extends the functionalities of current DBMSs to include: (1) Annotation and provenance management including storage, indexing, manipulation, and querying of annotation and provenance as first class objects in bdbms, (2) Local dependency tracking to track the dependencies and derivations among data items, (3) Update authorization to support data curation via content-based authorization, in contrast to identity-based authorization, and (4) New access methods and their supporting operators that support pattern matching on various types of compressed biological data types. This paper presents the design of bdbms along with the techniques proposed to support these functionalities including an extension to SQL. We also outline some open issues in building bdbms. Comment: This article is published under a Creative Commons License Agreement (http://creativecommons.org/licenses/by/2.5/). You may copy, distribute, display, and perform the work, make derivative works and make commercial use of the work, but you must attribute the work to the author and CIDR 2007. 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, USA