Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
PTOMSM: A modified version of Topological Overlap Measure used for predicting Protein-Protein Interaction Network
A variety of methods have been developed to integrate diverse biological data to predict novel interactions between proteins. However, traditional integration can only generate protein interaction pairs within existing relationships. Therefore, we propose a modified version of the Topological Overlap Measure to identify not only extant direct PPI links, but also novel protein interactions that can be indirectly inferred from various relationships between proteins. Our method is more powerful than a naïve Bayesian-network-based integration in PPI prediction, and could generate more reliable candidate PPIs. Furthermore, we examined the influence of the sizes of training and test datasets on prediction, and further demonstrated the effectiveness of PTOMSM in predicting PPIs. More importantly, this method can be extended naturally to predict other types of biological networks, and may be combined with Bayesian methods to further improve the prediction.
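The abstract does not spell out how PTOMSM modifies the measure, so the following is only a minimal sketch of the standard, unmodified topological overlap on a symmetric adjacency matrix, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij counts shared neighbors and k_i is the degree of node i. The function name `topological_overlap` and the toy network are illustrative assumptions, not part of the paper.

```python
import numpy as np

def topological_overlap(adj):
    """Standard topological overlap (not the PTOMSM variant) for a
    symmetric adjacency matrix with entries in [0, 1]."""
    a = np.asarray(adj, dtype=float).copy()
    np.fill_diagonal(a, 0.0)                      # ignore self-loops
    shared = a @ a                                # l_ij: shared-neighbor weight
    degree = a.sum(axis=1)                        # k_i
    min_deg = np.minimum.outer(degree, degree)    # min(k_i, k_j)
    tom = (shared + a) / (min_deg + 1.0 - a)
    np.fill_diagonal(tom, 1.0)                    # by convention TOM_ii = 1
    return tom

# Hypothetical toy network: proteins 0 and 1 are not directly linked,
# but both interact with proteins 2 and 3.
adj = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
])
tom = topological_overlap(adj)
print(tom[0, 1])  # 2/3: a high overlap despite no direct edge
print(tom[0, 2])  # 0.5: a direct edge with no shared neighbors
```

The pair (0, 1) scores higher than the directly linked pair (0, 2), which illustrates how an overlap-based score can surface candidate interactions that are only indirectly supported.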
Learning over Knowledge-Base Embeddings for Recommendation
State-of-the-art recommendation algorithms -- especially the collaborative
filtering (CF) based approaches with shallow or deep models -- usually work
with various unstructured information sources for recommendation, such as
textual reviews, visual images, and various implicit or explicit feedback.
Though structured knowledge bases were considered in content-based approaches,
they have been largely neglected recently due to the availability of vast
amounts of data and the learning power of many complex models.
However, structured knowledge bases exhibit unique advantages in personalized
recommendation systems. When the explicit knowledge about users and items is
considered for recommendation, the system could provide highly customized
recommendations based on users' historical behaviors. A great challenge for
using knowledge bases for recommendation is how to integrate large-scale
structured and unstructured data, while taking advantage of collaborative
filtering for highly accurate performance. Recent achievements in knowledge
base embedding shed light on this problem, making it possible to learn
user and item representations while preserving the structure of their
relationship with external knowledge. In this work, we propose to reason over
knowledge base embeddings for personalized recommendation. Specifically, we
propose a knowledge base representation learning approach to embed
heterogeneous entities for recommendation. Experimental results on real-world
data verified the superior performance of our approach compared with
state-of-the-art baselines.
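The abstract does not say which embedding model is used, so the sketch below assumes a simple translation-based scoring rule (in the style of TransE), where an item is a good match for a user if user + relation lands close to the item vector. The function names, the "purchase" relation, and the hand-picked vectors are all hypothetical; in practice the embeddings would be learned from the knowledge base.

```python
import numpy as np

def translation_scores(user_vec, relation_vec, item_matrix):
    """Score each item as the negative Euclidean distance between
    (user + relation) and the item embedding. Higher is better."""
    target = user_vec + relation_vec
    return -np.linalg.norm(item_matrix - target, axis=1)

def recommend(user_vec, relation_vec, item_matrix, k=2):
    """Return the indices of the k best-scoring items."""
    scores = translation_scores(user_vec, relation_vec, item_matrix)
    return np.argsort(-scores)[:k]

# Hypothetical 2-D embeddings, chosen by hand for illustration only.
user = np.array([1.0, 0.0])
purchase = np.array([0.0, 1.0])          # assumed "purchase" relation vector
items = np.array([
    [1.0, 1.0],   # item 0: exactly at user + purchase
    [0.0, 0.0],   # item 1: far away
    [1.0, 0.5],   # item 2: fairly close
])
print(recommend(user, purchase, items, k=2))  # [0 2]
```

Because users, items, and relations live in one shared space, the same scoring function can rank items under any relation type, which is the property that lets heterogeneous entities be embedded together.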
How to access ancient landscapes? Field survey and legacy data integration for research on Greek and Roman settlement patterns in Eastern Sicily
The integration of field survey data from Eastern Sicily (the Plain of Catania) with legacy data available for the region will expand our knowledge of Mediterranean ancient rural landscapes. With an extent of 430 km², the area is a perfect case study due to its geographical unity and the number of archaeological projects (both excavations and surveys) carried out within it in recent decades. Indeed, combining data from earlier research projects with new archaeological survey data allows us to conduct a settlement pattern analysis of the project’s study area. Heterogeneous datasets have been integrated through their implementation into a geo-database, featuring the management and integration of topographical units and archaeological entities through semantic relations. Through geospatial data analysis based on the complete gazetteer of archaeological sites (whether sherd scatters, ruins, cave dwellings, tombs or tracks), a new image of rural landscapes for this area of Eastern Sicily from the Greek Archaic to the Late Roman Age can be visualized, beyond the traditional Sicilia frumentaria narrative.
GPU-Based Volume Rendering of Noisy Multi-Spectral Astronomical Data
Traditional analysis techniques may not be sufficient for astronomers to make
the best use of the data sets that current and future instruments, such as the
Square Kilometre Array and its Pathfinders, will produce. By utilizing the
incredible pattern-recognition ability of the human mind, scientific
visualization provides an excellent opportunity for astronomers to gain
valuable new insight and understanding of their data, particularly when used
interactively in 3D. The goal of our work is to establish the feasibility of a
real-time 3D monitoring system for data going into the Australian SKA
Pathfinder archive.
Based on CUDA, an increasingly popular development tool, our work utilizes
the massively parallel architecture of modern graphics processing units (GPUs)
to provide astronomers with an interactive 3D volume rendering for
multi-spectral data sets. Unlike other approaches, we are targeting real-time
interactive visualization of datasets larger than GPU memory while giving
special attention to data with low signal-to-noise ratio, two critical aspects
for astronomy that are missing from most existing scientific visualization
software packages. Our framework enables the astronomer to interact with the
geometrical representation of the data, and to control the volume rendering
process to generate a better representation of their datasets.
Comment: 4 pages, 1 figure, to appear in the proceedings of ADASS XIX, Oct 4-8
2009, Sapporo, Japan (ASP Conf. Series)
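The abstract gives no implementation details, so the following is only a CPU-side NumPy sketch of one way to render a volume that does not fit in device memory: process it in slabs along the viewing axis and keep a running maximum (a maximum intensity projection). It is not the authors' CUDA framework; the function name `slab_max_projection`, the slab size, and the simple `noise_floor` threshold are assumptions made for illustration.

```python
import numpy as np

def slab_max_projection(volume, slab_depth, noise_floor=0.0):
    """Maximum intensity projection along axis 0, computed slab by slab
    so that only `slab_depth` planes are held in memory at once.
    Voxels below `noise_floor` are ignored; pixels with no voxel above
    the floor come out as 0."""
    depth = volume.shape[0]
    image = np.full(volume.shape[1:], -np.inf)
    for start in range(0, depth, slab_depth):
        # With a memory-mapped array, this slice is the only part loaded.
        slab = np.asarray(volume[start:start + slab_depth], dtype=float)
        slab = np.where(slab >= noise_floor, slab, -np.inf)
        image = np.maximum(image, slab.max(axis=0))
    return np.where(np.isfinite(image), image, 0.0)

# Hypothetical noisy spectral cube: (channels, height, width).
rng = np.random.default_rng(0)
cube = rng.normal(0.0, 1.0, size=(64, 32, 32))
cube[20:24, 10:14, 10:14] += 8.0             # a faint synthetic source
image = slab_max_projection(cube, slab_depth=16, noise_floor=3.0)
print(image.shape)  # (32, 32)
```

Because the maximum is associative, the slab-wise result matches a projection over the whole cube, so the slab size can be tuned to the available memory without changing the image.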