12,324 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.

    PTOMSM: A modified version of Topological Overlap Measure used for predicting Protein-Protein Interaction Network

    A variety of methods have been developed to integrate diverse biological data and predict novel interactions between proteins. However, traditional integration can only generate protein interaction pairs within existing relationships. Therefore, we propose a modified version of the Topological Overlap Measure to identify not only existing direct PPI links, but also novel protein interactions that can be indirectly inferred from various relationships between proteins. Our method is more powerful than a naïve Bayesian-network-based integration in PPI prediction, and could generate more reliable candidate PPIs. Furthermore, we examined the influence of the sizes of the training and test datasets on prediction, and further demonstrated the effectiveness of PTOMSM in predicting PPIs. More importantly, this method can be extended naturally to predict other types of biological networks, and may be combined with Bayesian methods to further improve the prediction.

    Learning over Knowledge-Base Embeddings for Recommendation

    State-of-the-art recommendation algorithms, especially the collaborative filtering (CF) based approaches with shallow or deep models, usually work with various unstructured information sources for recommendation, such as textual reviews, visual images, and various kinds of implicit or explicit feedback. Though structured knowledge bases were considered in content-based approaches, they have been largely neglected recently due to the availability of vast amounts of data and the learning power of many complex models. However, structured knowledge bases exhibit unique advantages in personalized recommendation systems. When the explicit knowledge about users and items is considered for recommendation, the system could provide highly customized recommendations based on users' historical behaviors. A great challenge for using knowledge bases for recommendation is how to integrate large-scale structured and unstructured data while taking advantage of collaborative filtering for highly accurate performance. Recent achievements in knowledge base embedding shed light on this problem, making it possible to learn user and item representations while preserving the structure of their relationship with external knowledge. In this work, we propose to reason over knowledge base embeddings for personalized recommendation. Specifically, we propose a knowledge base representation learning approach to embed heterogeneous entities for recommendation. Experimental results on real-world data verified the superior performance of our approach compared with state-of-the-art baselines.

    How to access ancient landscapes? Field survey and legacy data integration for research on Greek and Roman settlement patterns in Eastern Sicily

    The integration of field survey data from Eastern Sicily (the Plain of Catania) with legacy data available for the region will expand our knowledge of ancient Mediterranean rural landscapes. With an extent of 430 km², the area is a perfect case study due to its geographical unity and the number of archaeological projects (both excavations and surveys) carried out within it in recent decades. Indeed, combining data from earlier research projects with new archaeological survey data allows us to conduct a settlement pattern analysis of the project's study area. Heterogeneous datasets have been integrated through their implementation into a geo-database, featuring the management and integration of topographical units and archaeological entities through semantic relations. Through geospatial data analysis based on the complete gazetteer of archaeological sites (whether sherd scatters, ruins, cave dwellings, tombs or tracks), a new image of rural landscapes for this area of Eastern Sicily from the Greek Archaic to the Late Roman Age can be visualized, beyond the traditional Sicilia frumentaria narrative.

    GPU-Based Volume Rendering of Noisy Multi-Spectral Astronomical Data

    Traditional analysis techniques may not be sufficient for astronomers to make the best use of the data sets that current and future instruments, such as the Square Kilometre Array and its Pathfinders, will produce. By utilizing the incredible pattern-recognition ability of the human mind, scientific visualization provides an excellent opportunity for astronomers to gain valuable new insight and understanding of their data, particularly when used interactively in 3D. The goal of our work is to establish the feasibility of a real-time 3D monitoring system for data going into the Australian SKA Pathfinder archive. Based on CUDA, an increasingly popular development tool, our work utilizes the massively parallel architecture of modern graphics processing units (GPUs) to provide astronomers with an interactive 3D volume rendering for multi-spectral data sets. Unlike other approaches, we are targeting real-time interactive visualization of datasets larger than GPU memory while giving special attention to data with low signal-to-noise ratio, two critical aspects for astronomy that are missing from most existing scientific visualization software packages. Our framework enables the astronomer to interact with the geometrical representation of the data, and to control the volume rendering process to generate a better representation of their datasets.
    Comment: 4 pages, 1 figure, to appear in the proceedings of ADASS XIX, Oct 4-8 2009, Sapporo, Japan (ASP Conf. Series