
    Pairwise meta-rules for better meta-learning-based algorithm ranking

    In this paper, we present a novel meta-feature generation method in the context of meta-learning, based on rules that compare the performance of individual base learners in a one-against-one manner. In addition to these new meta-features, we introduce a new meta-learner called Approximate Ranking Tree Forests (ART Forests) that performs very competitively against several state-of-the-art meta-learners. Our experimental results, based on a large collection of datasets, show that the proposed techniques can significantly improve the overall performance of meta-learning for algorithm ranking. A key point in our approach is that every performance figure of each base learner is generated by optimising that learner's parameters separately for each dataset.
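
    As an illustration of the one-against-one encoding described above, the following Python sketch turns the tuned performance figures of several base learners on one dataset into binary pairwise meta-features. The learner names and the exact encoding are illustrative assumptions, not the paper's construction.

```python
from itertools import combinations

def pairwise_meta_features(perf):
    """Encode one-against-one comparisons of base-learner performance.

    perf maps learner name -> tuned performance figure on one dataset.
    Returns one binary feature per unordered pair: 1 if the first learner
    (alphabetically) beats the second, else 0.
    """
    return {f"{a}>{b}": int(perf[a] > perf[b])
            for a, b in combinations(sorted(perf), 2)}

# Example: tuned accuracies of three base learners on a single dataset.
print(pairwise_meta_features({"knn": 0.81, "svm": 0.84, "tree": 0.78}))
# -> {'knn>svm': 0, 'knn>tree': 1, 'svm>tree': 1}
```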

    Towards Meta-learning over Data Streams

    Modern society produces vast streams of data. Many stream mining algorithms have been developed to capture general trends in these streams and make predictions for future observations, but relatively little is known about which algorithms perform particularly well on which kinds of data. Moreover, the characteristics of the data can change over time, so that a different algorithm should be recommended at various points in time. As such, we are dealing with the Algorithm Selection Problem [9] in a data stream setting. Based on measurable meta-features from a window of observations from a data stream, a meta-algorithm is built that predicts the best classifier for the next window. Our results show that this meta-algorithm is competitive with state-of-the-art data stream ensembles, such as OzaBag [6], OzaBoost [6] and Leveraging Bagging [3].
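
    A hedged sketch of the windowed selection loop described above: meta-features are computed on the current window, and the label is whichever base learner scores best on the next window. The meta-features, base learners and synthetic stream are illustrative stand-ins for the paper's choices.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a data stream.
rng = np.random.default_rng(0)
stream_X = rng.normal(size=(1000, 5))
stream_y = (stream_X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

learners = {"nb": GaussianNB(), "knn": KNeighborsClassifier(),
            "tree": DecisionTreeClassifier()}
window = 100
meta_X, meta_y = [], []
for start in range(0, len(stream_X) - 2 * window, window):
    X_tr = stream_X[start:start + window]
    y_tr = stream_y[start:start + window]
    X_te = stream_X[start + window:start + 2 * window]
    y_te = stream_y[start + window:start + 2 * window]
    # Meta-features of the current window (illustrative choices).
    meta_X.append([float(np.mean(np.std(X_tr, axis=0))), float(np.mean(y_tr))])
    # Label: which base learner performs best on the *next* window.
    scores = {n: c.fit(X_tr, y_tr).score(X_te, y_te) for n, c in learners.items()}
    meta_y.append(max(scores, key=scores.get))
# A meta-algorithm (e.g., a decision tree) can now be trained on
# (meta_X, meta_y) to recommend a classifier for each upcoming window.
```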

    Toward Optimal Run Racing: Application to Deep Learning Calibration

    This paper aims at one-shot learning of deep neural nets, where a highly parallel setting is considered to address the algorithm calibration problem: selecting the best neural architecture and learning hyper-parameter values for the dataset at hand. The notoriously expensive calibration problem is optimally reduced by detecting and early-stopping non-optimal runs. The theoretical contribution concerns the optimality guarantees within the multiple hypothesis testing framework. Experiments on the Cifar10, PTB and Wiki benchmarks demonstrate the relevance of the approach, with a principled and consistent improvement on the state of the art with no extra hyper-parameter.
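
    The core mechanism, stopping runs that are confidently non-optimal, can be sketched as a racing loop. The Hoeffding-style stopping rule below is a simplified stand-in for the paper's multiple hypothesis testing criterion, and all names and scores are illustrative.

```python
import math
import random

def race(configs, eval_epoch, epochs=20, delta=0.05):
    """configs: name -> configuration; eval_epoch(cfg, t) -> score in [0, 1]."""
    alive = {name: [] for name in configs}
    for t in range(1, epochs + 1):
        for name in alive:
            alive[name].append(eval_epoch(configs[name], t))
        # Hoeffding half-width for a mean of t bounded scores, with a
        # Bonferroni correction over the number of configurations.
        eps = math.sqrt(math.log(2 * len(configs) / delta) / (2 * t))
        means = {n: sum(s) / len(s) for n, s in alive.items()}
        leader = max(means.values())
        # Stop any run whose upper bound cannot reach the leader's lower bound.
        for name in [n for n in alive if means[n] + eps < leader - eps]:
            del alive[name]
    return max(alive, key=lambda n: sum(alive[n]) / len(alive[n]))

# Toy usage: five configurations whose quality grows with their index.
cfgs = {f"cfg{i}": i for i in range(5)}
print(race(cfgs, lambda c, t: min(1.0, 0.5 + 0.1 * c + random.gauss(0, 0.02))))
```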

    Mesurer la proximité entre corpus par de nouveaux méta-descripteurs

    Given the number of existing classification algorithms, finding the one best suited to classifying a corpus of documents is a difficult task. Meta-classification has become very useful for helping to determine, based on past experience, which algorithm should be the most relevant for a given corpus. The underlying idea is that "if an algorithm has proven particularly well suited to one corpus, it should behave the same way on a sufficiently similar corpus". In this article, we propose new meta-descriptors based on notions of similarity to improve the meta-classification step. Experiments conducted on several real-world datasets demonstrate the relevance of our new descriptors.
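
    A minimal sketch of the "similar corpus, similar best algorithm" idea: represent each corpus by a meta-descriptor vector, find the most similar past corpus, and recommend the algorithm that performed best there. The descriptors, distance and values below are illustrative assumptions, not the paper's meta-descriptors.

```python
import numpy as np

# Meta-descriptor vectors of previously classified corpora, and the algorithm
# that performed best on each (all values are illustrative placeholders).
past_descriptors = np.array([[0.2, 3.1, 120.0],
                             [0.8, 1.4, 45.0],
                             [0.5, 2.2, 300.0]])
best_algorithm = ["svm", "naive_bayes", "knn"]

def recommend(new_descriptor):
    """Recommend the algorithm that did best on the most similar past corpus."""
    dists = np.linalg.norm(past_descriptors - new_descriptor, axis=1)
    return best_algorithm[int(np.argmin(dists))]

print(recommend(np.array([0.6, 2.0, 280.0])))  # -> 'knn'
```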

    Une approche par dissimilarité pour la caractérisation de jeux de données

    Dataset characterisation remains a major obstacle in intelligent data analysis. Most approaches to this problem aggregate the information describing the individual attributes of a dataset, which entails a loss of information. We propose a dissimilarity-based approach that avoids this aggregation, and we study its value for characterising the performance of classification algorithms and for solving meta-learning problems.
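
    One way to avoid aggregating per-attribute information, sketched below under stated assumptions: describe each attribute individually, then match the attributes of two datasets so that the total matching cost becomes the dataset dissimilarity. The attribute statistics and the assignment-based matching are illustrative, not the paper's exact construction.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def attribute_profile(col):
    # Per-attribute description; no aggregation across attributes happens here.
    return np.array([col.mean(), col.std(), np.abs(col - col.mean()).mean()])

def dataset_dissimilarity(A, B):
    """Dissimilarity between two numeric datasets via attribute matching."""
    profs_a = np.array([attribute_profile(A[:, j]) for j in range(A.shape[1])])
    profs_b = np.array([attribute_profile(B[:, j]) for j in range(B.shape[1])])
    # Cost of matching attribute j of A to attribute k of B.
    cost = np.linalg.norm(profs_a[:, None, :] - profs_b[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].sum())

rng = np.random.default_rng(0)
print(dataset_dissimilarity(rng.normal(size=(50, 4)), rng.normal(size=(60, 5))))
```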

    Having a Blast: Meta-Learning and Heterogeneous Ensembles for Data Streams

    Ensembles of classifiers are among the best performing classifiers available in many data mining applications. However, most ensembles developed specifically for the dynamic data stream setting rely on only one type of base-level classifier, most often Hoeffding Trees. In this paper, we study the use of heterogeneous ensembles, comprised of fundamentally different model types. Heterogeneous ensembles have proven successful in the classical batch setting; however, they do not easily transfer to the data stream setting. We therefore introduce the Online Performance Estimation framework, which data stream ensembles can use to weight the votes of (heterogeneous) ensemble members differently across the stream. Experiments over a wide range of data streams show performance that is competitive with state-of-the-art ensemble techniques, including Online Bagging and Leveraging Bagging. All experimental results from this work are easily reproducible and publicly available on OpenML for further analysis.
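
    A hedged sketch of vote weighting by recent performance, in the spirit of the framework described above; the member interface (predict/learn) and the plain windowed-accuracy weight are assumptions, and the paper's estimation scheme differs in its details.

```python
from collections import deque

class WindowedWeightedVote:
    """Weight each member's vote by its accuracy over the last `window` labels."""

    def __init__(self, members, window=200):
        self.members = members  # name -> classifier with predict(x) / learn(x, y)
        self.recent = {name: deque(maxlen=window) for name in members}

    def predict(self, x):
        votes = {}
        for name, clf in self.members.items():
            hist = self.recent[name]
            weight = sum(hist) / len(hist) if hist else 1.0
            label = clf.predict(x)
            votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get)

    def update(self, x, y):
        # Record each member's correctness on the arrived label, then train it.
        for name, clf in self.members.items():
            self.recent[name].append(int(clf.predict(x) == y))
            clf.learn(x, y)
```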

    A Recommendation System for Meta-modeling: A Meta-learning Based Approach

    Various meta-modeling techniques have been developed to replace computationally expensive simulation models. The performance of these meta-modeling techniques varies across models, which makes existing model selection/recommendation approaches (e.g., trial-and-error, ensembles) problematic. To address these research gaps, we propose a general meta-modeling recommendation system using meta-learning, which can automate the meta-modeling recommendation process by intelligently adapting the learning bias to problem characterizations. The proposed intelligent recommendation system includes four modules: (1) a problem module; (2) a meta-feature module, which includes a comprehensive set of meta-features to characterize the geometrical properties of problems; (3) a meta-learner module, which compares the performance of instance-based and model-based learning approaches for optimal framework design; and (4) a performance evaluation module, which introduces two criteria, Spearman's ranking correlation coefficient and hit ratio, to evaluate the system on the accuracy of model ranking prediction and the precision of the best model recommendation, respectively. To further improve the performance of meta-learning for meta-modeling recommendation, different types of feature reduction techniques, including singular value decomposition, stepwise regression and ReliefF, are studied. Experiments show that our proposed framework achieves 94% correlation on model rankings and a 91% hit ratio on best model recommendation. Moreover, the computational cost of meta-modeling recommendation is reduced from the order of minutes to seconds compared with traditional trial-and-error and ensemble processes. The proposed framework can significantly advance research in meta-modeling recommendation and can be applied to data-driven system modeling.
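
    The two evaluation criteria named above are standard and easy to reproduce; the sketch below computes both for a predicted versus true ranking of candidate meta-models, with placeholder rankings.

```python
from scipy.stats import spearmanr

true_rank = [1, 2, 3, 4, 5]   # true performance ranks of five meta-models
pred_rank = [1, 3, 2, 4, 5]   # ranks predicted by the recommendation system

rho, _ = spearmanr(true_rank, pred_rank)             # ranking-accuracy criterion
hit = int(pred_rank.index(1) == true_rank.index(1))  # was the best model picked?
print(f"Spearman rho = {rho:.2f}, hit = {hit}")      # -> rho = 0.90, hit = 1
```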

    Meta Learning Recommendation System for Classification

    A data-driven approach is an emerging paradigm for handling analytic problems. In this paradigm, the mantra is to let the data speak freely. However, when using machine learning algorithms, the data does not naturally reveal the best, or even a good, algorithm choice. One way to let the data guide this choice is Meta Learning, which uses the features of a dataset to determine a useful model for the entire dataset. This research proposes an improvement on the meta-model recommendation system by adding classification problems to the candidate problem space, with appropriate evaluation metrics for these additional problems. It predicts the relative performance of six machine learning algorithms using support vector regression with a radial basis function kernel as the meta-learner. Six datasets of varying complexity are explored with this recommendation system; at its best, the system recommends the best algorithm 67% of the time and a good algorithm from 67% to 100% of the time, depending on how "good" is defined.
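
    A minimal sketch of the meta-learner setup described above: support vector regression with an RBF kernel maps dataset meta-features to an algorithm's relative performance. The meta-features and targets are synthetic placeholders, not the study's data.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
meta_features = rng.normal(size=(40, 6))   # 40 datasets, 6 meta-features each
# Relative performance of one algorithm on each dataset (placeholder targets).
rel_perf = meta_features[:, 0] * 0.3 + rng.normal(scale=0.1, size=40)

# Fit the RBF-kernel SVR meta-learner on 30 datasets, predict on the rest.
meta_learner = SVR(kernel="rbf").fit(meta_features[:30], rel_perf[:30])
predicted = meta_learner.predict(meta_features[30:])
# One such regressor per algorithm yields predicted relative performances,
# which are then ranked to recommend an algorithm for a new dataset.
```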