
    Correlation-based Data Representation

    The Dagstuhl Seminar 'Similarity-based Clustering and its Application to Medicine and Biology' (07131), held March 25--30, 2007, provided an excellent atmosphere for in-depth discussions about the research frontier of computational methods for relevant applications of biomedical clustering and beyond. We address some highlighted issues about correlation-based data analysis in this seminar contribution. First, some prominent correlation measures are briefly revisited. Then, a focus is put on Pearson correlation because of its widespread use in the biomedical sciences and because of its analytic accessibility. A connection to the Euclidean distance of z-score transformed data is outlined. Cost function optimization of correlation-based data representation is discussed, and, finally, applications to visualization and clustering of gene expression data are given.
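
    The connection mentioned above can be made concrete: for z-score transformed vectors, the squared Euclidean distance and the Pearson correlation r are related by ||z_x - z_y||^2 = 2n(1 - r). The following Python snippet is a minimal numerical sketch of this identity (illustrative only, not taken from the seminar report).

    # Numerical check: for z-scored vectors, ||z_x - z_y||^2 = 2 n (1 - r),
    # where r is the Pearson correlation of the original vectors.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 0.6 * x + rng.normal(size=200)       # correlated toy data
    n = x.size

    def zscore(v):
        return (v - v.mean()) / v.std()      # population std (ddof=0)

    zx, zy = zscore(x), zscore(y)
    r = np.corrcoef(x, y)[0, 1]              # Pearson correlation
    d2 = np.sum((zx - zy) ** 2)              # squared Euclidean distance

    print(d2, 2 * n * (1 - r))               # the two values agree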

    Hybrid ACO and SVM algorithm for pattern classification

    Ant Colony Optimization (ACO) is a metaheuristic algorithm that can be used to solve a variety of combinatorial optimization problems. A new direction for ACO is to optimize continuous and mixed (discrete and continuous) variables. The Support Vector Machine (SVM) is a pattern classification approach that originated from statistical approaches. However, SVM suffers from two main problems: feature subset selection and parameter tuning. Most approaches to tuning SVM parameters discretize the continuous parameter values, which degrades classification performance. This study presents four algorithms for tuning the SVM parameters and selecting a feature subset, which improve SVM classification accuracy with a smaller feature subset. This is achieved by performing SVM parameter tuning and feature subset selection simultaneously. Hybrid algorithms combining ACO and SVM techniques were proposed. The first two algorithms, ACOR-SVM and IACOR-SVM, tune the SVM parameters, while the other two, ACOMV-R-SVM and IACOMV-R-SVM, tune the SVM parameters and select the feature subset simultaneously. Ten benchmark datasets from the University of California, Irvine, were used in the experiments to validate the performance of the proposed algorithms. The experimental results show that the proposed algorithms outperform other approaches in terms of classification accuracy and feature subset size. The average classification accuracies for the ACOR-SVM, IACOR-SVM, ACOMV-R and IACOMV-R algorithms are 94.73%, 95.86%, 97.37% and 98.1%, respectively. The average feature subset size is eight for the ACOR-SVM and IACOR-SVM algorithms and four for the ACOMV-R and IACOMV-R algorithms. This study contributes to a new direction for ACO that can deal with continuous and mixed variables.
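
    As a rough illustration of the continuous-parameter tuning idea, the Python sketch below maintains a small ACO-style solution archive and samples new candidate (C, gamma) pairs from Gaussian kernels around the best archive members, scoring each candidate by cross-validated SVM accuracy. The dataset, parameter ranges, archive size, and sampling scheme are assumptions chosen for illustration; this is not the ACOR-SVM or IACOR-SVM algorithm described above.

    # Heavily simplified ACO_R-style tuning of SVM parameters (illustrative only).
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    rng = np.random.default_rng(0)

    def fitness(log_c, log_gamma):
        clf = SVC(C=10.0 ** log_c, gamma=10.0 ** log_gamma)
        return cross_val_score(clf, X, y, cv=3).mean()

    # Solution archive: rows are (log10 C, log10 gamma), kept sorted by fitness.
    archive = rng.uniform([-2.0, -5.0], [3.0, 1.0], size=(10, 2))
    scores = np.array([fitness(*s) for s in archive])

    for _ in range(10):
        order = np.argsort(-scores)
        archive, scores = archive[order], scores[order]
        # Sample new "ants" around the top archive members (Gaussian kernels).
        sigma = archive.std(axis=0) + 1e-3
        ants = archive[rng.integers(0, 3, size=5)] + rng.normal(0.0, sigma, size=(5, 2))
        ant_scores = np.array([fitness(*a) for a in ants])
        # Keep the 10 best solutions from the old archive plus the new ants.
        pool = np.vstack([archive, ants])
        pool_scores = np.concatenate([scores, ant_scores])
        keep = np.argsort(-pool_scores)[:10]
        archive, scores = pool[keep], pool_scores[keep]

    print("best CV accuracy:", scores[0], "log10(C), log10(gamma):", archive[0])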

    Interactive Segmentation, Uncertainty and Learning

    Interactive segmentation is an important paradigm in image processing. To minimize the number of user interactions (“seeds”) required until the result is correct, the computer should actively query the human for input at the most critical locations, in analogy to active learning. These locations are found by means of suitable uncertainty measures. I propose various such measures for the watershed cut algorithm, along with a theoretical analysis of some of their properties, in Chapter 2. Furthermore, real-world images often admit many different segmentations that have nearly the same quality according to the underlying energy function. The diversity of these solutions may be a powerful uncertainty indicator. Chapter 3 provides the crucial prerequisite for exploiting this in the context of seeded segmentation with minimum spanning trees (i.e. edge-weighted watersheds). Specifically, it is shown how to efficiently enumerate the k smallest spanning trees that result in different segmentations. Furthermore, I propose a scheme that makes it possible to partition an image into a previously unknown number of segments, using only minimal supervision in terms of a few must-link and cannot-link annotations. The algorithm presented in Chapter 4 makes no use of regional data terms, learning instead what constitutes a likely boundary between segments. Since boundaries are only implicitly specified through cannot-link constraints, this is a hard and nonconvex latent variable problem. It is addressed in a greedy fashion using a randomized decision tree on features associated with interpixel edges. I propose to use a structured purity criterion during tree construction and also show how a backtracking strategy can be used to prevent the greedy search from ending up in poor local optima. The problem of learning a boundary classifier from sparse user annotations is also considered in Chapter 5. Here the problem is mapped to a multiple instance learning task, where positive bags consist of paths on a graph that cross a segmentation boundary and negative bags consist of paths inside a user scribble. Multiple instance learning is also the topic of Chapter 6, in which I propose a multiple instance learning algorithm based on randomized decision trees. Experiments on typical benchmark data sets show that this model’s prediction performance is clearly better than that of earlier tree-based methods, and only slightly below that of more expensive methods. Finally, a flow-graph-based computation library is discussed in Chapter 7. The presented library is used as the backend in an interactive learning and segmentation toolkit and supports a rich set of notification mechanisms for the interaction with a graphical user interface.
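
    The minimum-spanning-tree view of seeded segmentation used above can be sketched in a few lines: grow a minimum spanning forest over the edge-weighted graph and refuse to merge components that already carry different seed labels. The toy Python code below illustrates this idea only; it is a simplification, not the thesis implementation.

    # Toy seeded segmentation on an edge-weighted graph via a minimum spanning
    # forest (in the spirit of edge-weighted watersheds). Edges are processed in
    # increasing weight order; components with different seed labels stay separate.
    def segment(nodes, edges, seeds):
        parent = {v: v for v in nodes}
        label = {v: seeds.get(v) for v in nodes}   # node -> seed label or None

        def find(u):
            while parent[u] != u:
                parent[u] = parent[parent[u]]      # path halving
                u = parent[u]
            return u

        for w, u, v in sorted(edges):              # (weight, u, v), increasing weight
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            lu, lv = label[ru], label[rv]
            if lu is not None and lv is not None and lu != lv:
                continue                           # different seeds: keep the boundary
            parent[ru] = rv                        # merge the two components
            label[rv] = lv if lv is not None else lu
        return {v: label[find(v)] for v in nodes}

    # 1-D toy example: the high-weight edge between nodes 1 and 2 becomes the cut.
    nodes = [0, 1, 2, 3]
    edges = [(1, 0, 1), (5, 1, 2), (1, 2, 3)]
    print(segment(nodes, edges, {0: "A", 3: "B"}))  # {0: 'A', 1: 'A', 2: 'B', 3: 'B'}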

    A Survey on Compiler Autotuning using Machine Learning

    Since the mid-1990s, researchers have been trying to use machine-learning-based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order in which to apply them). The compiler optimization space continues to grow due to the advancement of applications, the increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for compiler optimization, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and, finally, the influential papers of the field.
    Comment: version 5.0 (updated September 2018). Preprint version of our accepted journal article at ACM CSUR 2018 (42 pages). This survey will be updated quarterly here (send me your newly published papers to be added in the subsequent version). History: Received November 2016; Revised August 2017; Revised February 2018; Accepted March 2018.
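
    A minimal sketch of the optimization-selection loop this survey covers might look as follows: compile and time a few random flag configurations, fit a model that maps flag vectors to runtime, and let the model rank the remaining configurations. The benchmark file name (bench.c) and the particular GCC flags are assumptions chosen purely for illustration.

    # Illustrative ML-guided optimization selection (not a method from the survey).
    import itertools, subprocess, time
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    FLAGS = ["-funroll-loops", "-ftree-vectorize", "-finline-functions", "-fomit-frame-pointer"]

    def run_time(flag_vec):
        flags = [f for f, on in zip(FLAGS, flag_vec) if on]
        subprocess.run(["gcc", "-O1", *flags, "bench.c", "-o", "bench"], check=True)
        t0 = time.perf_counter()
        subprocess.run(["./bench"], check=True)    # assumes a self-contained benchmark
        return time.perf_counter() - t0

    # Measure a random sample of flag configurations.
    rng = np.random.default_rng(0)
    sample = rng.integers(0, 2, size=(6, len(FLAGS)))
    times = np.array([run_time(v) for v in sample])

    # Learn a runtime model and use it to rank all configurations.
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(sample, times)
    all_cfgs = np.array(list(itertools.product([0, 1], repeat=len(FLAGS))))
    best = all_cfgs[np.argmin(model.predict(all_cfgs))]
    print("predicted best flags:", [f for f, on in zip(FLAGS, best) if on])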

    A Survey Of Methods For Explaining Black Box Models

    In recent years, many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem, consequently delineating, explicitly or implicitly, its own definition of interpretability and explanation. The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to opening black box models should also be useful for putting the many open research questions in perspective.
    Comment: This work is currently under review at an international journal.
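
    One family of approaches such a survey typically covers is the global surrogate: fit an interpretable model to the predictions of the black box and inspect the surrogate instead. The Python sketch below illustrates the idea with scikit-learn; the dataset and the choice of models are arbitrary and are not taken from the paper.

    # Global surrogate sketch: a shallow decision tree mimics a black-box model.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_breast_cancer()
    X, y = data.data, data.target
    black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    # The surrogate is trained on the black box's outputs, not on the true labels.
    surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
    surrogate.fit(X, black_box.predict(X))

    print("fidelity:", surrogate.score(X, black_box.predict(X)))  # agreement with black box
    print(export_text(surrogate, feature_names=list(data.feature_names)))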

    Sparse representation based hyperspectral image compression and classification

    This thesis presents research on applying sparse representation to lossy hyperspectral image compression and hyperspectral image classification. The proposed lossy hyperspectral image compression framework introduces two types of dictionaries, termed the sparse representation spectral dictionary (SRSD) and the multi-scale spectral dictionary (MSSD). The former is learnt in the spectral domain to exploit the spectral correlations, and the latter in the wavelet multi-scale spectral domain to exploit both spatial and spectral correlations in hyperspectral images. To alleviate the computational demand of dictionary learning, either a base dictionary trained offline or an update of the base dictionary is employed in the compression framework. The proposed compression method is evaluated in terms of different objective metrics and compared to selected state-of-the-art hyperspectral image compression schemes, including JPEG 2000. The numerical results demonstrate the effectiveness and competitiveness of both the SRSD and MSSD approaches. For the proposed hyperspectral image classification method, we utilize the sparse coefficients to train support vector machine (SVM) and k-nearest neighbour (kNN) classifiers. In particular, the discriminative character of the sparse coefficients is enhanced by incorporating contextual information using local mean filters. The classification performance is evaluated and compared to a number of similar or representative methods. The results show that our approach could outperform other approaches based on SVM or sparse representation. This thesis makes the following contributions. It provides a relatively thorough investigation of applying sparse representation to lossy hyperspectral image compression. Specifically, it reveals the effectiveness of sparse representation for the exploitation of spectral correlations in hyperspectral images. In addition, we have shown that the discriminative character of sparse coefficients can lead to superior performance in hyperspectral image classification.
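
    To make the sparse-coefficient classification idea concrete, the Python sketch below learns a spectral dictionary, sparse-codes the samples with OMP, and classifies the resulting coefficients with an SVM. It uses synthetic stand-in spectra and scikit-learn defaults; it is illustrative only and is not the SRSD/MSSD pipeline of the thesis.

    # Sparse coding of (synthetic) spectral vectors followed by SVM classification.
    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning, sparse_encode
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n_bands = 64                                    # spectral bands per pixel
    X = np.vstack([rng.normal(m, 1.0, size=(200, n_bands)) for m in (0.0, 1.5)])
    y = np.repeat([0, 1], 200)                      # two synthetic classes

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Learn an overcomplete spectral dictionary and sparse-code both splits with OMP.
    dico = MiniBatchDictionaryLearning(n_components=128, alpha=1.0, random_state=0).fit(X_tr)
    code_tr = sparse_encode(X_tr, dico.components_, algorithm="omp", n_nonzero_coefs=8)
    code_te = sparse_encode(X_te, dico.components_, algorithm="omp", n_nonzero_coefs=8)

    clf = SVC().fit(code_tr, y_tr)                  # classify on the sparse coefficients
    print("test accuracy:", clf.score(code_te, y_te))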