2,222 research outputs found
Correlation-based Data Representation
The Dagstuhl Seminar \u27Similarity-based Clustering and its
Application to Medicine and Biology\u27 (07131) held in March 25--30, 2007,
provided an excellent atmosphere for in-depth discussions
about the research frontier of computational methods
for relevant applications of biomedical clustering and beyond.
We address some highlighted issues about correlation-based data
analysis in this seminar postribution.
First, some prominent correlation measures are briefly revisited.
Then, a focus is put on Pearson correlation, because of its
widespread use in biomedical sciences and because of
its analytic accessibility.
A connection to Euclidean distance of z-score transformed
data outlined.
Cost function optimization of correlation-based data representation
is discussed for which, finally, applications to visualization
and clustering of gene expression data are given
Hybrid ACO and SVM algorithm for pattern classification
Ant Colony Optimization (ACO) is a metaheuristic algorithm that can be used to
solve a variety of combinatorial optimization problems. A new direction for ACO is to optimize continuous and mixed (discrete and continuous) variables. Support Vector Machine (SVM) is a pattern classification approach originated from statistical approaches. However, SVM suffers two main problems which include feature subset selection and parameter tuning. Most approaches related to tuning SVM parameters discretize the continuous value of the parameters which will give a negative effect on the classification performance. This study presents four algorithms for tuning the
SVM parameters and selecting feature subset which improved SVM classification accuracy with smaller size of feature subset. This is achieved by performing the SVM parameters’ tuning and feature subset selection processes simultaneously. Hybridization algorithms between ACO and SVM techniques were proposed. The first two algorithms, ACOR-SVM and IACOR-SVM, tune the SVM parameters while
the second two algorithms, ACOMV-R-SVM and IACOMV-R-SVM, tune the SVM parameters and select the feature subset simultaneously. Ten benchmark datasets from University of California, Irvine, were used in the experiments to validate the performance of the proposed algorithms. Experimental results obtained from the proposed algorithms are better when compared with other approaches in terms of classification accuracy and size of the feature subset. The average classification
accuracies for the ACOR-SVM, IACOR-SVM, ACOMV-R and IACOMV-R algorithms are 94.73%, 95.86%, 97.37% and 98.1% respectively. The average size of feature subset is eight for the ACOR-SVM and IACOR-SVM algorithms and four for the ACOMV-R and IACOMV-R algorithms. This study contributes to a new direction for ACO that can deal with continuous and mixed-variable ACO
Interactive Segmentation, Uncertainty and Learning
Interactive segmentation is an important paradigm in image processing. To minimize the number of user interactions (“seeds”) required until the result is correct, the computer should actively query the human for input at the most critical locations, in analogy to active learning. These locations are found by means of suitable uncertainty measures. I propose various such measures for the watershed cut algorithm along with a theoretical analysis of some of their properties in Chapter 2.
Furthermore, real-world images often admit many different segmentations that have nearly the same quality according to the underlying energy function. The diversity of these solutions may be a powerful uncertainty indicator.
In Chapter 3 the crucial prerequisite in the context of
seeded segmentation with minimum spanning trees (i.e. edge-weighted watersheds) is provided. Specifically, it is shown how to efficiently enumerate the k smallest spanning trees that result in different segmentations.
Furthermore, I propose a scheme that allows to partition an image into a previously unknown number of segments, using only minimal supervision in terms of a few must-link and cannot-link annotations. The algorithm presented in Chapter 4 makes no use of regional data terms, learning instead what constitutes a likely boundary between segments. Since boundaries are only implicitly specified through cannot-link constraints, this is a hard and nonconvex latent
variable problem. This problem is adressed in a greedy fashion using a randomized decision tree on features associated with interpixel edges. I propose to use a structured purity criterion during tree construction and also show how a backtracking strategy can be used to prevent the greedy search from ending up in poor local optima.
The problem of learning a boundary classifier from sparse user annotations is also considered in Chapter 5. Here the problem is mapped to a multiple instance learning task where positive bags consist of paths on a graph that cross a segmentation boundary and negative bags consist
of paths inside a user scribble.
Multiple instance learning is also the topic of Chapter 6. Here I propose a multiple instance learning algorithm based on randomized decision trees. Experiments on the typical benchmark data sets show that this model’s prediction performance is clearly better than earlier tree based
methods, and is only slightly below that of more expensive methods.
Finally, a flow graph based computation library is discussed in Chapter 7. The presented
library is used as the backend in a interactive learning and segmentation toolkit and supports a
rich set of notification mechanisms for the interaction with a graphical user interface
A Survey on Compiler Autotuning using Machine Learning
Since the mid-1990s, researchers have been trying to use machine-learning
based approaches to solve a number of different compiler optimization problems.
These techniques primarily enhance the quality of the obtained results and,
more importantly, make it feasible to tackle two main compiler optimization
problems: optimization selection (choosing which optimizations to apply) and
phase-ordering (choosing the order of applying optimizations). The compiler
optimization space continues to grow due to the advancement of applications,
increasing number of compiler optimizations, and new target architectures.
Generic optimization passes in compilers cannot fully leverage newly introduced
optimizations and, therefore, cannot keep up with the pace of increasing
options. This survey summarizes and classifies the recent advances in using
machine learning for the compiler optimization field, particularly on the two
major problems of (1) selecting the best optimizations and (2) the
phase-ordering of optimizations. The survey highlights the approaches taken so
far, the obtained results, the fine-grain classification among different
approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our
Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated
quarterly here (Send me your new published papers to be added in the
subsequent version) History: Received November 2016; Revised August 2017;
Revised February 2018; Accepted March 2018
A Survey Of Methods For Explaining Black Box Models
In the last years many accurate decision support systems have been
constructed as black boxes, that is as systems that hide their internal logic
to the user. This lack of explanation constitutes both a practical and an
ethical issue. The literature reports many approaches aimed at overcoming this
crucial weakness sometimes at the cost of scarifying accuracy for
interpretability. The applications in which black box decision systems can be
used are various, and each approach is typically developed to provide a
solution for a specific problem and, as a consequence, delineating explicitly
or implicitly its own definition of interpretability and explanation. The aim
of this paper is to provide a classification of the main problems addressed in
the literature with respect to the notion of explanation and the type of black
box system. Given a problem definition, a black box type, and a desired
explanation this survey should help the researcher to find the proposals more
useful for his own work. The proposed classification of approaches to open
black box models should also be useful for putting the many research open
questions in perspective.Comment: This work is currently under review on an international journa
Sparse representation based hyperspectral image compression and classification
Abstract
This thesis presents a research work on applying sparse representation to lossy hyperspectral image
compression and hyperspectral image classification. The proposed lossy hyperspectral image
compression framework introduces two types of dictionaries distinguished by the terms sparse
representation spectral dictionary (SRSD) and multi-scale spectral dictionary (MSSD), respectively.
The former is learnt in the spectral domain to exploit the spectral correlations, and the
latter in wavelet multi-scale spectral domain to exploit both spatial and spectral correlations in
hyperspectral images. To alleviate the computational demand of dictionary learning, either a
base dictionary trained offline or an update of the base dictionary is employed in the compression
framework. The proposed compression method is evaluated in terms of different objective
metrics, and compared to selected state-of-the-art hyperspectral image compression schemes, including
JPEG 2000. The numerical results demonstrate the effectiveness and competitiveness of
both SRSD and MSSD approaches.
For the proposed hyperspectral image classification method, we utilize the sparse coefficients
for training support vector machine (SVM) and k-nearest neighbour (kNN) classifiers. In particular,
the discriminative character of the sparse coefficients is enhanced by incorporating contextual
information using local mean filters. The classification performance is evaluated and compared
to a number of similar or representative methods. The results show that our approach could outperform
other approaches based on SVM or sparse representation.
This thesis makes the following contributions. It provides a relatively thorough investigation
of applying sparse representation to lossy hyperspectral image compression. Specifically,
it reveals the effectiveness of sparse representation for the exploitation of spectral correlations
in hyperspectral images. In addition, we have shown that the discriminative character of sparse
coefficients can lead to superior performance in hyperspectral image classification.EM201
- …