15 research outputs found

    Guided Progressive Sampling to Stabilize the Learning Curve (Échantillonnage progressif guidé pour stabiliser la courbe d'apprentissage)

    One of the challenges of machine learning is coping with ever-larger volumes of data. Although it is generally accepted that larger training sets lead to better results, there are limits to the amount of information a learning algorithm can handle. To address this problem, we propose to improve the progressive sampling method by guiding the construction of a reduced training set drawn from a large dataset. Learning from the reduced set should yield performance similar to learning from the full set. The sampling guidance relies on a priori knowledge that accelerates the convergence of the algorithm. This approach offers three advantages: 1) the reduced training set consists of the most representative cases of the full set; 2) the learning curve is stabilized; 3) convergence detection is accelerated. Applying this method to standard benchmark data and to data from intensive care units shows that a training set can be significantly reduced without degrading learning performance.
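
    To make the stopping rule concrete, here is a minimal Python sketch of progressive sampling with convergence detection on the learning curve. The uniform random draw, growth factor, tolerance, and cross-validated scoring are illustrative assumptions (X and y are NumPy arrays); the paper's guided variant would replace the uniform draw with a selection of the most representative cases based on a priori knowledge.

        import numpy as np
        from sklearn.base import clone
        from sklearn.model_selection import cross_val_score

        def progressive_sample(model, X, y, start=100, growth=2.0, tol=1e-3, cv=3):
            # Grow a random training sample until the score curve flattens.
            rng = np.random.default_rng(0)
            size, prev = start, -np.inf
            while size <= len(X):
                idx = rng.choice(len(X), size=size, replace=False)
                score = cross_val_score(clone(model), X[idx], y[idx], cv=cv).mean()
                if abs(score - prev) < tol:   # convergence detected
                    return idx                # reduced, sufficient sample
                prev, size = score, int(size * growth)
            return np.arange(len(X))          # no convergence: use everything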

    Learning Curves Prediction for a Transformers-Based Model

    One of the main challenges when training or fine-tuning a machine learning model concerns the number of observations necessary to achieve satisfactory performance. While, in general, more training observations result in a better-performing model, collecting more data can be time-consuming, expensive, or even impossible. For this reason, investigating the relationship between the dataset's size and the performance of a machine learning model is fundamental to deciding, with a certain likelihood, the minimum number of observations needed to obtain a satisfactory model from the training process. The learning curve represents the relationship between the dataset's size and the performance of the model and is especially useful when choosing a model for a specific task or planning the annotation work for a dataset. Thus, the purpose of this paper is to find the functions that best fit the learning curves of a Transformers-based model (LayoutLM) when fine-tuned to extract information from invoices. Two new datasets of invoices are made available for this task. Combined with a third dataset already available online, 22 sub-datasets are defined, and their learning curves are plotted based on cross-validation results. The functions are fit using a non-linear least-squares technique. The results show that both a bi-asymptotic and a Morgan-Mercer-Flodin function fit the learning curves extremely well. An empirical relation is also presented to predict the learning curve from a single parameter that can easily be obtained in the early stages of the annotation process. DOI: 10.28991/ESJ-2023-07-05-03
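
    As a rough sketch of the curve fitting described above, assuming the standard Morgan-Mercer-Flodin form y = (ab + c x^d) / (b + x^d); the sizes, scores, and starting values below are made up for illustration:

        import numpy as np
        from scipy.optimize import curve_fit

        def mmf(x, a, b, c, d):
            # Morgan-Mercer-Flodin growth curve; c is the upper asymptote.
            return (a * b + c * x**d) / (b + x**d)

        # Hypothetical performance observed at increasing training-set sizes.
        sizes = np.array([50, 100, 200, 400, 800, 1600], dtype=float)
        score = np.array([0.41, 0.55, 0.66, 0.74, 0.79, 0.82])

        params, _ = curve_fit(mmf, sizes, score, p0=[0.3, 100.0, 0.9, 1.0], maxfev=10000)
        print("predicted score at n=5000:", mmf(5000.0, *params))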

    A Survey of Learning Curves with Bad Behavior: or How More Data Need Not Lead to Better Performance

    Plotting a learner's generalization performance against the training set size results in a so-called learning curve. This tool, which provides insight into the behavior of the learner, is also practically valuable for model selection, for predicting the effect of more training data, and for reducing the computational complexity of training. We set out to make the (ideal) learning curve concept precise and briefly discuss the aforementioned uses of such curves. The larger part of this survey's focus, however, is on learning curves showing that more data does not necessarily lead to better generalization performance, a result that surprises many researchers in the field of artificial intelligence. We point out the significance of these findings and conclude our survey with an overview and discussion of open problems in this area that warrant further theoretical and empirical investigation.
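
    An empirical learning curve of the kind surveyed here can be computed directly; a minimal sketch with scikit-learn follows (dataset and model chosen arbitrarily). Note that the validation score is not guaranteed to increase monotonically with n, which is precisely the ill behavior the survey catalogues.

        import numpy as np
        from sklearn.datasets import load_digits
        from sklearn.model_selection import learning_curve
        from sklearn.svm import SVC

        X, y = load_digits(return_X_y=True)
        sizes, train_scores, val_scores = learning_curve(
            SVC(gamma=0.001), X, y,
            train_sizes=np.linspace(0.1, 1.0, 8), cv=5)
        for n, s in zip(sizes, val_scores.mean(axis=1)):
            print(f"n={n:5d}  mean validation accuracy={s:.3f}")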

    Time Series Classifier Recommendation by a Meta-Learning Approach

    This work addresses time series classifier recommendation for the first time in the literature, considering several recommendation forms, or meta-targets: classifier accuracies, complete ranking, top-M ranking, best set, and best classifier. To this end, an ad-hoc set of quick estimators of the accuracies of the candidate classifiers (landmarkers) is designed and used as predictors for the recommendation system. The performance of our recommender is compared with that of a standard method for non-sequential data and with a set of baseline methods, which our method outperforms in 7 of the 9 scenarios considered. Since some meta-targets can be inferred from the predictions of other, more fine-grained meta-targets, the last part of the work addresses the hierarchical inference of meta-targets. The experimentation suggests that, in many cases, a single model is sufficient to output many types of meta-targets with competitive results.
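
    Stripped to its essentials, the landmarker idea might look like the sketch below: cheap cross-validated accuracy estimates on a small subsample serve as meta-features for the recommender. The subsample size and CV depth are illustrative assumptions; the paper designs ad-hoc landmarkers specific to time series.

        import numpy as np
        from sklearn.base import clone
        from sklearn.model_selection import cross_val_score

        def landmarkers(candidates, X, y, n_sub=200, cv=2, seed=0):
            # Quick, rough accuracy estimate per candidate classifier,
            # computed on a small random subsample of the dataset.
            rng = np.random.default_rng(seed)
            idx = rng.choice(len(X), size=min(n_sub, len(X)), replace=False)
            return np.array([
                cross_val_score(clone(c), X[idx], y[idx], cv=cv).mean()
                for c in candidates
            ])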

    Meta-Learning in the Area of Data Mining

    This thesis describes the use of meta-learning in the area of data mining. It describes the data mining problems and tasks to which meta-learning can be applied, with a focus on classification. It provides an overview of meta-learning techniques and their possible applications in data mining, especially model selection. It then describes the design and implementation of a meta-learning system to support classification tasks in data mining. The system uses statistics and information theory to characterize data sets stored in a meta-knowledge base. From this base, a meta-classifier is created that predicts the most suitable model for a new data set. The conclusion discusses the results of experiments with more than 20 data sets representing classification tasks from different areas and suggests possible extensions of the project.
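
    As a rough illustration of the characterization step, a few statistical and information-theoretic meta-features might be computed as below. The feature set shown is a small illustrative subset, and past_datasets / best_model_names are hypothetical placeholders for the meta-knowledge base.

        import numpy as np
        from scipy.stats import entropy
        from sklearn.ensemble import RandomForestClassifier

        def meta_features(X, y):
            # Characterize a dataset with simple statistics and class entropy.
            _, counts = np.unique(y, return_counts=True)
            return [
                X.shape[0],                         # number of instances
                X.shape[1],                         # number of attributes
                len(counts),                        # number of classes
                entropy(counts / counts.sum()),     # class entropy
                float(np.mean(np.std(X, axis=0))),  # mean attribute dispersion
            ]

        # Meta-knowledge base: one row per previously seen dataset, labelled
        # with the model that performed best on it (hypothetical variables):
        # meta_X = [meta_features(Xi, yi) for Xi, yi in past_datasets]
        # meta_clf = RandomForestClassifier().fit(meta_X, best_model_names)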

    Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks

    Learning curve extrapolation aims to predict model performance in later epochs of training, based on the performance in earlier epochs. In this work, we argue that, while the inherent uncertainty in the extrapolation of learning curves warrants a Bayesian approach, existing methods are (i) overly restrictive and/or (ii) computationally expensive. We describe the first application of prior-data fitted networks (PFNs) in this context. A PFN is a transformer, pre-trained on data generated from a prior, that performs approximate Bayesian inference in a single forward pass. We propose LC-PFN, a PFN trained to extrapolate 10 million artificial right-censored learning curves generated from a parametric prior proposed in prior art using MCMC. We demonstrate that LC-PFN can approximate the posterior predictive distribution more accurately than MCMC, while being over 10,000 times faster. We also show that the same LC-PFN achieves competitive performance extrapolating a total of 20,000 real learning curves from four learning curve benchmarks (LCBench, NAS-Bench-201, Taskset, and PD1) that stem from training a wide range of model architectures (MLPs, CNNs, RNNs, and Transformers) on 53 different datasets with varying input modalities (tabular, image, text, and protein data). Finally, we investigate its potential in the context of model selection and find that a simple LC-PFN-based predictive early stopping criterion obtains 2-6x speed-ups on 45 of these datasets, at virtually no overhead.
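
    The training-data recipe can be sketched under a strong simplification: the snippet below draws artificial right-censored curves from a toy power-law prior, whereas LC-PFN's actual prior is a richer parametric family. All parameter ranges are invented for illustration.

        import numpy as np

        def sample_censored_curves(n_curves=1000, max_epochs=100, seed=0):
            # Toy prior: y(t) = c - a * t**(-b) plus Gaussian noise.
            rng = np.random.default_rng(seed)
            t = np.arange(1, max_epochs + 1)
            data = []
            for _ in range(n_curves):
                a = rng.uniform(0.1, 0.8)
                b = rng.uniform(0.3, 2.0)
                c = rng.uniform(0.5, 1.0)
                y = c - a * t**-b + rng.normal(0.0, 0.01, size=max_epochs)
                cut = int(rng.integers(5, max_epochs))  # right-censoring point
                data.append((y[:cut], y[cut:]))  # observed prefix, hidden tail
            return data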

    The Shape of Learning Curves: a Review

    Learning curves provide insight into the dependence of a learner's generalization performance on the training set size. This important tool can be used for model selection, for predicting the effect of more training data, and for reducing the computational complexity of model training and hyperparameter tuning. This review recounts the origins of the term, provides a formal definition of the learning curve, and briefly covers basics such as its estimation. Our main contribution is a comprehensive overview of the literature regarding the shape of learning curves. We discuss empirical and theoretical evidence for well-behaved curves that often have the shape of a power law or an exponential. We consider the learning curves of Gaussian processes, the complex shapes they can display, and the factors influencing them. We draw specific attention to examples of learning curves that are ill-behaved, showing worse learning performance with more training data. To wrap up, we point out various open problems that warrant deeper empirical and theoretical investigation. All in all, our review underscores that learning curves are surprisingly diverse and that no universal model can be identified.
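
    For reference, the two well-behaved shapes discussed above are commonly written in the parametric forms below (standard notation from the learning-curve literature: E(n) is the expected error at training-set size n, a, b > 0, and c is the asymptotic error):

        E(n) = a n^{-b} + c     % power-law decay
        E(n) = a e^{-b n} + c   % exponential decay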

    Meta-learning Performance Prediction of Highly Configurable Systems: A Cost-oriented Approach

    A key challenge in the development and maintenance of configurable systems is to predict the performance of individual system variants based on the features selected. It is usually infeasible to measure the performance of all possible variants, due to feature combinatorics. Previous approaches predict performance based on small samples of measured variants, but it remains open how to dynamically determine an ideal sample that balances prediction accuracy and measurement effort. In this work, we adapt two widely used sampling strategies for performance prediction to the domain of configurable systems and evaluate them in terms of sampling cost, which considers prediction accuracy and measurement effort simultaneously. To generate an initial sample, we develop two sampling algorithms: one based on the traditional method of t-way feature coverage, and another based on a new feature-frequency heuristic. Using empirical data from six real-world systems, we evaluate the two sampling algorithms and discuss trade-offs. Furthermore, we conduct an extensive sensitivity analysis of the cost model metric we use for evaluation, and analyze the stability of the learning behavior of the subject systems.
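
    One plausible reading of the feature-frequency heuristic, offered purely as a sketch (not the paper's exact algorithm): greedily add the variant that best covers feature states still under-represented in the sample.

        import numpy as np

        def feature_frequency_sample(configs, k=5):
            # configs: 0/1 matrix, one row per variant, one column per feature.
            # Add variants until every feature has been selected and
            # deselected at least k times in the sample.
            on = np.zeros(configs.shape[1])
            off = np.zeros(configs.shape[1])
            remaining = list(range(len(configs)))
            sample = []
            while remaining and (on.min() < k or off.min() < k):
                def gain(i):
                    row = configs[i]
                    return np.sum((row == 1) & (on < k)) + np.sum((row == 0) & (off < k))
                best = max(remaining, key=gain)
                sample.append(best)
                remaining.remove(best)
                on += configs[best]
                off += 1 - configs[best]
            return sample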

    Contributions to Time Series Classification: Meta-Learning and Explainability

    This thesis includes 3 contributions of different types to the area of supervised time series classification, a growing field of research due to the amount of time series collected daily in a wide variety of domains. In this context, the number of methods available for classifying time series is increasing, and the classifiers are becoming more and more competitive and varied. Thus, the first contribution of the thesis consists of proposing a taxonomy of distance-based time series classifiers, where an exhaustive review of the existing methods and their computational costs is made. Moreover, from the point of view of a non-expert user (even from that of an expert), choosing a suitable classifier for a given problem is a difficult task. The second contribution, therefore, deals with the recommendation of time series classifiers, for which we will use a meta-learning approach. Finally, the third contribution consists of proposing a method to explain the prediction of time series classifiers, in which we calculate the relevance of each region of a series in the prediction. This method of explanation is based on perturbations, for which we will consider specific and realistic transformations for the time series.
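
    The perturbation idea might be sketched as follows, assuming a hypothetical predict_proba callable that maps a single series to class probabilities: each region is replaced by a linear interpolation of its endpoints (one simple, realistic time-series transformation), and the drop in the predicted probability of the class of interest is taken as the region's relevance.

        import numpy as np

        def region_relevance(predict_proba, series, label, n_regions=10):
            # Relevance of each region = how much masking it lowers the
            # predicted probability of `label`.
            bounds = np.linspace(0, len(series), n_regions + 1, dtype=int)
            base = predict_proba(series)[label]
            relevance = []
            for lo, hi in zip(bounds[:-1], bounds[1:]):
                perturbed = series.astype(float)
                perturbed[lo:hi] = np.linspace(series[lo], series[hi - 1], hi - lo)
                relevance.append(base - predict_proba(perturbed)[label])
            return np.array(relevance)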