5,840 research outputs found
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
A Survey on Graph Kernels
Graph kernels have become an established and widely-used technique for
solving classification tasks on graphs. This survey gives a comprehensive
overview of techniques for kernel-based graph classification developed in the
past 15 years. We describe and categorize graph kernels based on properties
inherent to their design, such as the nature of their extracted graph features,
their method of computation and their applicability to problems in practice. In
an extensive experimental evaluation, we study the classification accuracy of a
large suite of graph kernels on established benchmarks as well as new datasets.
We compare the performance of popular kernels with several baseline methods and
study the effect of applying a Gaussian RBF kernel to the metric induced by a
graph kernel. In doing so, we find that simple baselines become competitive
after this transformation on some datasets. Moreover, we study the extent to
which existing graph kernels agree in their predictions (and prediction errors)
and obtain a data-driven categorization of kernels as result. Finally, based on
our experimental results, we derive a practitioner's guide to kernel-based
graph classification
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and to look up for a
possible solution accordingly
Discrete Multi-modal Hashing with Canonical Views for Robust Mobile Landmark Search
Mobile landmark search (MLS) recently receives increasing attention for its
great practical values. However, it still remains unsolved due to two important
challenges. One is high bandwidth consumption of query transmission, and the
other is the huge visual variations of query images sent from mobile devices.
In this paper, we propose a novel hashing scheme, named as canonical view based
discrete multi-modal hashing (CV-DMH), to handle these problems via a novel
three-stage learning procedure. First, a submodular function is designed to
measure visual representativeness and redundancy of a view set. With it,
canonical views, which capture key visual appearances of landmark with limited
redundancy, are efficiently discovered with an iterative mining strategy.
Second, multi-modal sparse coding is applied to transform visual features from
multiple modalities into an intermediate representation. It can robustly and
adaptively characterize visual contents of varied landmark images with certain
canonical views. Finally, compact binary codes are learned on intermediate
representation within a tailored discrete binary embedding model which
preserves visual relations of images measured with canonical views and removes
the involved noises. In this part, we develop a new augmented Lagrangian
multiplier (ALM) based optimization method to directly solve the discrete
binary codes. We can not only explicitly deal with the discrete constraint, but
also consider the bit-uncorrelated constraint and balance constraint together.
Experiments on real world landmark datasets demonstrate the superior
performance of CV-DMH over several state-of-the-art methods
- …