14,148 research outputs found

    A Survey on Compiler Autotuning using Machine Learning

    Full text link
    Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field.Comment: version 5.0 (updated on September 2018)- Preprint Version For our Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated quarterly here (Send me your new published papers to be added in the subsequent version) History: Received November 2016; Revised August 2017; Revised February 2018; Accepted March 2018

    An Agent-Based Algorithm exploiting Multiple Local Dissimilarities for Clusters Mining and Knowledge Discovery

    Full text link
    We propose a multi-agent algorithm able to automatically discover relevant regularities in a given dataset, determining at the same time the set of configurations of the adopted parametric dissimilarity measure yielding compact and separated clusters. Each agent operates independently by performing a Markovian random walk on a suitable weighted graph representation of the input dataset. Such a weighted graph representation is induced by the specific parameter configuration of the dissimilarity measure adopted by the agent, which searches and takes decisions autonomously for one cluster at a time. Results show that the algorithm is able to discover parameter configurations that yield a consistent and interpretable collection of clusters. Moreover, we demonstrate that our algorithm shows comparable performances with other similar state-of-the-art algorithms when facing specific clustering problems

    Classifier selection with permutation tests

    Get PDF
    This work presents a content-based recommender system for machine learning classifier algorithms. Given a new data set, a recommendation of what classifier is likely to perform best is made based on classifier performance over similar known data sets. This similarity is measured according to a data set characterization that includes several state-of-the-art metrics taking into account physical structure, statistics, and information theory. A novelty with respect to prior work is the use of a robust approach based on permutation tests to directly assess whether a given learning algorithm is able to exploit the attributes in a data set to predict class labels, and compare it to the more commonly used F-score metric for evaluating classifier performance. To evaluate our approach, we have conducted an extensive experimentation including 8 of the main machine learning classification methods with varying configurations and 65 binary data sets, leading to over 2331 experiments. Our results show that using the information from the permutation test clearly improves the quality of the recommendations.Peer ReviewedPostprint (author's final draft

    Clustering Stability: An Overview

    Get PDF
    A popular method for selecting the number of clusters is based on stability arguments: one chooses the number of clusters such that the corresponding clustering results are "most stable". In recent years, a series of papers has analyzed the behavior of this method from a theoretical point of view. However, the results are very technical and difficult to interpret for non-experts. In this paper we give a high-level overview about the existing literature on clustering stability. In addition to presenting the results in a slightly informal but accessible way, we relate them to each other and discuss their different implications

    Mean-Field Theory of Meta-Learning

    Full text link
    We discuss here the mean-field theory for a cellular automata model of meta-learning. The meta-learning is the process of combining outcomes of individual learning procedures in order to determine the final decision with higher accuracy than any single learning method. Our method is constructed from an ensemble of interacting, learning agents, that acquire and process incoming information using various types, or different versions of machine learning algorithms. The abstract learning space, where all agents are located, is constructed here using a fully connected model that couples all agents with random strength values. The cellular automata network simulates the higher level integration of information acquired from the independent learning trials. The final classification of incoming input data is therefore defined as the stationary state of the meta-learning system using simple majority rule, yet the minority clusters that share opposite classification outcome can be observed in the system. Therefore, the probability of selecting proper class for a given input data, can be estimated even without the prior knowledge of its affiliation. The fuzzy logic can be easily introduced into the system, even if learning agents are build from simple binary classification machine learning algorithms by calculating the percentage of agreeing agents.Comment: 23 page

    How Many Topics? Stability Analysis for Topic Models

    Full text link
    Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the "over-clustering" of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process.Comment: Improve readability of plots. Add minor clarification
    corecore