2,808 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., the genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advances in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
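
    The review surveys families of methods rather than a single algorithm, so the following is only an illustrative Python sketch, not anything taken from the paper, of a simple "early fusion" pipeline that touches four of the five named challenges: per-omics feature matrices are concatenated, missing values are imputed, dimensionality is reduced, and class imbalance is handled by class weighting. All data are synthetic, the layer names (X_rna, X_methyl) and model choices are placeholder assumptions, and scikit-learn and NumPy are assumed to be installed.

    # Minimal early-fusion sketch for multi-omics classification (illustrative only).
    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_samples = 100

    # Two synthetic omics layers: far more features than samples (curse of
    # dimensionality) and some missing entries in one layer (missing data).
    X_rna = rng.normal(size=(n_samples, 2000))      # e.g. transcriptome
    X_methyl = rng.normal(size=(n_samples, 5000))   # e.g. epigenome
    X_methyl[rng.random(X_methyl.shape) < 0.05] = np.nan

    # Early fusion: concatenate the per-omics matrices sample-wise.
    X = np.hstack([X_rna, X_methyl])

    # Imbalanced binary phenotype (15 cases vs. 85 controls).
    y = np.array([1] * 15 + [0] * 85)

    pipeline = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),    # missing data
        ("scale", StandardScaler()),                   # heterogeneous feature scales
        ("reduce", PCA(n_components=20)),              # curse of dimensionality
        ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),  # class imbalance
    ])

    scores = cross_val_score(pipeline, X, y, cv=5, scoring="roc_auc")
    print("Mean cross-validated AUC:", scores.mean())

    Concatenation-based early fusion is only the simplest integration strategy; the scalability challenge discussed in the review typically pushes real analyses toward out-of-core, distributed, or model-based variants of these same steps.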

    Dataremix: Aesthetic Experiences of Big Data and Data Abstraction

    This PhD by published work expands on the contribution to knowledge in two recent large-scale transdisciplinary artistic research projects: ATLAS in silico and INSTRUMENT | One Antarctic Night and their exhibited and published outputs. The thesis reflects upon this practice-based artistic research that interrogates data abstraction: the digitization, datafication and abstraction of culture and nature, as vast and abstract digital data. The research is situated in digital arts practices that engage a combination of big (scientific) data as artistic material, embodied interaction in virtual environments, and poetic recombination. A transdisciplinary and collaborative artistic practice, x-resonance, provides a framework for the hybrid processes, outcomes, and contributions to knowledge from the research. These are purposefully and productively situated at the objective | subjective interface, have the potential to convey multiple meanings simultaneously to a variety of audiences, and resist disciplinary definition. In the course of the research, a novel methodology emerges, dataremix, which is employed and iteratively evolved through artistic practice to address the research questions: 1) How can a visceral and poetic experience of data abstraction be created? and 2) How would one go about generating an artistically-informed (scientific) discovery? Several interconnected contributions to knowledge arise through the first research question: creation of representational elements for artistic visualization of big (scientific) data that includes four new forms (genomic calligraphy, algorithmic objects as natural specimens, scalable auditory data signatures, and signal objects); an aesthetic of slowness that contributes an extension to the operative forces in Jevbratt’s inverted sublime of looking down and in to also include looking fast and slow; an extension of Corby’s objective and subjective image consisting of “informational and aesthetic components” to novel virtual environments created from big (scientific) data that extend Davies’ poetic virtual spatiality to poetic objective | subjective generative virtual spaces; and an extension of Seaman’s embodied interactive recombinant poetics through embodied interaction in virtual environments as a recapitulation of scientific (objective) and algorithmic processes through aesthetic (subjective) physical gestures. These contributions holistically combine in the artworks ATLAS in silico and INSTRUMENT | One Antarctic Night to create visceral poetic experiences of big data abstraction. Contributions to knowledge from the first research question develop artworks that are visceral and poetic experiences of data abstraction, and which manifest the objective | subjective through art. Contributions to knowledge from the second research question occur through the process of the artworks functioning as experimental systems in which experiments using analytical tools from the scientific domain are enacted within the process of creation of the artwork. The results are “returned” into the artwork. These contributions are: elucidating differences in DNA helix bending and curvature along regions of gene sequences specified as either introns or exons, revealing nuanced differences in BLAST results in relation to genomic sequence metadata, and cross-correlating astronomical data to identify putative variable signals from astronomical objects for further scientific evaluation.

    Analysis methods for studying the 3D architecture of the genome


    Big Data Proteogenomics and High Performance Computing: Challenges and Opportunities

    Proteogenomics is an emerging field of systems biology research at the intersection of proteomics and genomics. Two high-throughput technologies, Mass Spectrometry (MS) for proteomics and Next Generation Sequencing (NGS) machines for genomics, are required to conduct proteogenomics studies. Independently, both MS and NGS technologies are afflicted with a data deluge that creates problems of storage, transfer, analysis, and visualization. Integrating these big data sets (NGS + MS) for proteogenomics studies compounds all of the associated computational problems. Existing sequential algorithms for analyzing these proteogenomics data sets are inadequate for big data, and high-performance computing (HPC) solutions are almost non-existent. The purpose of this paper is to introduce the big data problem of proteogenomics and the associated challenges in analyzing, storing, and transferring these data sets. Further, opportunities for the high-performance computing research community are identified and possible future directions are discussed.
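
    The paper frames challenges rather than prescribing an algorithm, so the sketch below is only an illustrative, hedged example, not the paper's method, of the data-parallel pattern that HPC proteogenomics pipelines typically exploit: spectra from an MS run are independent of one another, so scoring them against candidate peptides can be partitioned across cores or nodes. The spectrum format, the toy scoring function, and the worker count are placeholder assumptions.

    # Illustrative data-parallel scoring of synthetic MS spectra (not the paper's method).
    from multiprocessing import Pool
    import random

    def make_synthetic_spectra(n_spectra, n_peaks=50, seed=42):
        """Generate toy spectra: a scan id plus (m/z, intensity) peak pairs."""
        rng = random.Random(seed)
        return [
            {
                "scan_id": i,
                "peaks": [(rng.uniform(100, 2000), rng.random()) for _ in range(n_peaks)],
            }
            for i in range(n_spectra)
        ]

    def score_spectrum(spectrum):
        """Placeholder peptide-spectrum match score: count high-intensity peaks."""
        score = sum(1 for _mz, intensity in spectrum["peaks"] if intensity > 0.5)
        return spectrum["scan_id"], score

    if __name__ == "__main__":
        spectra = make_synthetic_spectra(10_000)
        # Each spectrum is scored independently, so the work partitions cleanly
        # across local cores; chunking amortizes inter-process communication.
        with Pool(processes=4) as pool:
            results = pool.map(score_spectrum, spectra, chunksize=256)
        best_scan, best_score = max(results, key=lambda pair: pair[1])
        print(f"Scored {len(results)} spectra; best scan {best_scan} scored {best_score}")

    Because each spectrum-level task is independent, the same decomposition scales from a multiprocessing pool on one node to distributed workers on a cluster, which is the kind of scaling that HPC solutions for these data sets would target.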