15 research outputs found
Guided Progressive Sampling to Stabilize the Learning Curve
One of the challenges of machine learning is coping with ever-larger volumes of data. Although it is generally accepted that a larger training set yields better results, there are limits to the amount of information a learning algorithm can handle. To address this problem, we propose to improve the progressive sampling method by guiding the construction of a reduced training set drawn from a large dataset. Learning from the reduced set should yield performance similar to learning from the full set. The guidance of the sampling relies on a priori knowledge that accelerates the convergence of the algorithm. This approach has three advantages: 1) the reduced training set is composed of the most representative cases of the full set; 2) the learning curve is stabilized; 3) convergence detection is accelerated. Applying this method to standard datasets and to data from intensive care units shows that a training set can be significantly reduced without degrading learning performance.
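The guided variant above relies on a priori knowledge to select representative cases; the plain progressive-sampling loop it builds on can be sketched as follows. Here `evaluate` is a hypothetical callback that trains a model on a sample and returns its accuracy, and the geometric growth schedule and names are illustrative assumptions, not the paper's exact procedure:

```python
import random

def progressive_sampling(data, evaluate, start=100, growth=2.0, eps=0.005):
    """Grow a training sample geometrically until the learning curve
    converges, i.e. until the accuracy gain between two successive
    sample sizes drops below `eps`.

    `evaluate(sample)` is a hypothetical callback that trains a model on
    `sample` and returns its accuracy on a fixed validation set.
    """
    rng = random.Random(0)
    size, prev_acc = start, None
    while size <= len(data):
        sample = rng.sample(data, size)
        acc = evaluate(sample)
        if prev_acc is not None and acc - prev_acc < eps:
            return sample, acc       # converged: reduced training set found
        prev_acc, size = acc, int(size * growth)
    return list(data), prev_acc      # never converged: fall back to full set
```

In the guided version, `rng.sample` would be replaced by a selection biased toward the most representative cases, which is what stabilizes the curve and speeds up convergence detection.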
Learning Curves Prediction for a Transformers-Based Model
One of the main challenges when training or fine-tuning a machine learning model concerns the number of observations necessary to achieve satisfactory performance. While, in general, more training observations result in a better-performing model, collecting more data can be time-consuming, expensive, or even impossible. For this reason, investigating the relationship between the dataset's size and the performance of a machine learning model is fundamental to deciding, with a certain likelihood, the minimum number of observations necessary to ensure that a satisfactorily performing model is obtained from the training process. The learning curve represents the relationship between the dataset's size and the performance of the model and is especially useful when choosing a model for a specific task or planning the annotation work for a dataset. Thus, the purpose of this paper is to find the functions that best fit the learning curves of a Transformers-based model (LayoutLM) when fine-tuned to extract information from invoices. Two new datasets of invoices are made available for this task. Combined with a third dataset already available online, 22 sub-datasets are defined, and their learning curves are plotted based on cross-validation results. The functions are fit using a non-linear least squares technique. The results show that both a bi-asymptotic and a Morgan-Mercer-Flodin function fit the learning curves extremely well. Also, an empirical relation is presented to predict the learning curve from a single parameter that can easily be obtained in the early stage of the annotation process. DOI: 10.28991/ESJ-2023-07-05-03
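The paper fits bi-asymptotic and Morgan-Mercer-Flodin functions with non-linear least squares; as a simpler illustration of the same idea, a power-law learning curve err(n) ≈ a·n^(−b) can be fit by plain linear least squares in log-log space. This is a common baseline, not the paper's exact method:

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Fit error(n) ≈ a * n**(-b) by linear least squares in log-log
    space: log(err) = log(a) - b*log(n) is a straight line."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
    return float(np.exp(intercept)), float(-slope)  # (a, b)

def predict_error(n, a, b):
    """Extrapolate the fitted curve to an unseen training-set size n."""
    return a * n ** (-b)
```

A fit on errors measured at a few small annotation budgets can then be extrapolated with `predict_error` to estimate how many more observations are worth labeling.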
A Survey of Learning Curves with Bad Behavior: or How More Data Need Not Lead to Better Performance
Plotting a learner's generalization performance against the training set size
results in a so-called learning curve. This tool, providing insight in the
behavior of the learner, is also practically valuable for model selection,
predicting the effect of more training data, and reducing the computational
complexity of training. We set out to make the (ideal) learning curve concept
precise and briefly discuss the aforementioned usages of such curves. The
larger part of this survey's focus, however, is on learning curves that show
that more data does not necessarily lead to better generalization performance,
a result that seems surprising to many researchers in the field of artificial
intelligence. We point out the significance of these findings and conclude our
survey with an overview and discussion of open problems in this area that
warrant further theoretical and empirical investigation.
Time Series Classifier Recommendation by a Meta-Learning Approach
This work addresses time series classifier recommendation for the first time in the literature by considering several recommendation forms or meta-targets: classifier accuracies, complete ranking, top-M ranking, best set, and best classifier. For this, an ad hoc set of quick estimators of the accuracies of the candidate classifiers (landmarkers) is designed and used as predictors for the recommendation system. The performance of our recommender is compared with that of a standard method for non-sequential data and a set of baseline methods, which our method outperforms in 7 of the 9 considered scenarios. Since some meta-targets can be inferred from the predictions of other, more fine-grained meta-targets, the last part of the work addresses the hierarchical inference of meta-targets. The experimentation suggests that, in many cases, a single model is sufficient to output many types of meta-targets with competitive results.
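A landmarker is simply the accuracy of a cheap probe classifier on the task, used as a meta-feature for the recommender. A minimal sketch with an illustrative majority-class probe follows; the names and the probe are assumptions for illustration, not the paper's actual landmarkers:

```python
import random

class MajorityProbe:
    """Trivial probe classifier: always predicts the most frequent
    training label. Cheap to fit, hence usable as a landmarker."""
    def fit(self, X, y):
        self.label = max(set(y), key=list(y).count)
    def predict(self, X):
        return [self.label] * len(X)

def landmark_features(X, y, probe_factories, train_frac=0.7, seed=0):
    """Meta-features for a recommender: the holdout accuracy of each
    cheap probe. `probe_factories` are callables returning fresh probes;
    all names here are illustrative, not the paper's API."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_frac)
    tr, te = idx[:cut], idx[cut:]
    feats = []
    for make_probe in probe_factories:
        clf = make_probe()
        clf.fit([X[i] for i in tr], [y[i] for i in tr])
        preds = clf.predict([X[i] for i in te])
        feats.append(sum(p == y[i] for p, i in zip(preds, te)) / len(te))
    return feats
```

The resulting feature vector, one accuracy per probe, is what the meta-model consumes to predict the meta-targets (ranking, best set, best classifier).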
Meta-Learning in the Area of Data Mining
This paper describes the use of meta-learning in the area of data mining. It describes the problems and tasks of data mining to which meta-learning can be applied, with a focus on classification. It provides an overview of meta-learning techniques and their possible applications in data mining, especially model selection. It then describes the design and implementation of a meta-learning system to support classification tasks in data mining. The system uses statistics and information theory to characterize datasets stored in a meta-knowledge base. A meta-classifier is built from this base and predicts the most suitable model for a new dataset. The conclusion discusses the results of experiments with more than 20 datasets representing classification tasks from different areas and suggests possible extensions of the project.
Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks
Learning curve extrapolation aims to predict model performance in later
epochs of training, based on the performance in earlier epochs. In this work,
we argue that, while the inherent uncertainty in the extrapolation of learning
curves warrants a Bayesian approach, existing methods are (i) overly
restrictive, and/or (ii) computationally expensive. We describe the first
application of prior-data fitted neural networks (PFNs) in this context. A PFN
is a transformer, pre-trained on data generated from a prior, to perform
approximate Bayesian inference in a single forward pass. We propose LC-PFN, a
PFN trained to extrapolate 10 million artificial right-censored learning curves
generated from a parametric prior proposed in prior art using MCMC. We
demonstrate that LC-PFN can approximate the posterior predictive distribution
more accurately than MCMC, while being over 10 000 times faster. We also show
that the same LC-PFN achieves competitive performance extrapolating a total of
20 000 real learning curves from four learning curve benchmarks (LCBench,
NAS-Bench-201, Taskset, and PD1) that stem from training a wide range of model
architectures (MLPs, CNNs, RNNs, and Transformers) on 53 different datasets
with varying input modalities (tabular, image, text, and protein data).
Finally, we investigate its potential in the context of model selection and
find that a simple LC-PFN based predictive early stopping criterion obtains
2-6x speed-ups on 45 of these datasets, at virtually no overhead.
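For intuition on what extrapolating a partial learning curve means, here is a minimal non-Bayesian sketch: assuming an exponential-saturation curve y_t = c − a·r^t, the asymptote c has a closed form from three equally spaced observations, and a hypothetical early-stopping rule can compare it against the incumbent. LC-PFN instead returns a full posterior predictive distribution in a single forward pass; this point estimate is only an illustration:

```python
def exp_asymptote(y1, y2, y3):
    """Closed-form asymptote of y_t = c - a * r**t recovered from three
    equally spaced points: with q = (y3 - y2) / (y2 - y1) = r**step,
    the remaining gap above y3 is (y3 - y2) * q / (1 - q)."""
    q = (y3 - y2) / (y2 - y1)
    return y3 + (y3 - y2) * q / (1.0 - q)

def early_stop(curve, incumbent, margin=0.0):
    """Hypothetical predictive early-stopping rule: abandon a run whose
    extrapolated asymptote cannot beat the incumbent's score."""
    c = exp_asymptote(curve[-3], curve[-2], curve[-1])
    return c + margin < incumbent
```

A Bayesian extrapolator replaces the point estimate `c` with a predictive distribution, so the stopping decision can be made at a chosen confidence level rather than with a hard margin.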
The Shape of Learning Curves: a Review
Learning curves provide insight into the dependence of a learner's
generalization performance on the training set size. This important tool can be
used for model selection, to predict the effect of more training data, and to
reduce the computational complexity of model training and hyperparameter
tuning. This review recounts the origins of the term, provides a formal
definition of the learning curve, and briefly covers basics such as its
estimation. Our main contribution is a comprehensive overview of the literature
regarding the shape of learning curves. We discuss empirical and theoretical
evidence that supports well-behaved curves that often have the shape of a power
law or an exponential. We consider the learning curves of Gaussian processes,
the complex shapes they can display, and the factors influencing them. We draw
specific attention to examples of learning curves that are ill-behaved, showing
worse learning performance with more training data. To wrap up, we point out
various open problems that warrant deeper empirical and theoretical
investigation. All in all, our review underscores that learning curves are
surprisingly diverse and no universal model can be identified.
Meta-learning Performance Prediction of Highly Configurable Systems: A Cost-oriented Approach
A key challenge of the development and maintenance of configurable systems is to predict the performance of individual system variants based on the features selected. It is usually infeasible to measure the performance of all possible variants, due to feature combinatorics. Previous approaches predict performance based on small samples of measured variants, but it is still open how to dynamically determine an ideal sample that balances prediction accuracy and measurement effort. In this work, we adapt two widely-used sampling strategies for performance prediction to the domain of configurable systems and evaluate them in terms of sampling cost, which considers prediction accuracy and measurement effort simultaneously. To generate an initial sample, we develop two sampling algorithms: one based on the traditional method of t-way feature coverage, and another based on a new feature-frequency heuristic. Using empirical data from six real-world systems, we evaluate the two sampling algorithms and discuss trade-offs. Furthermore, we conduct an extensive sensitivity analysis of the cost-model metric we use for evaluation, and analyze the stability of the learning behavior of the subject systems.
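One plausible reading of a feature-frequency heuristic (an assumption for illustration, not necessarily the paper's algorithm) is a greedy sampler that repeatedly picks the configuration whose selected features are least represented in the sample so far:

```python
def feature_frequency_sample(configs, k):
    """Greedy feature-frequency sampler: each configuration is an
    iterable of selected feature names; repeatedly pick the one whose
    features currently appear least often in the sample, keeping
    feature frequencies balanced."""
    counts, sample = {}, []
    remaining = list(configs)
    for _ in range(min(k, len(remaining))):
        best = min(remaining, key=lambda c: sum(counts.get(f, 0) for f in c))
        sample.append(best)
        remaining.remove(best)
        for f in best:
            counts[f] = counts.get(f, 0) + 1
    return sample
```

The sampled configurations would then be measured and fed to the performance predictor, trading a small measurement budget against prediction accuracy.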
Contributions to Time Series Classification: Meta-Learning and Explainability
This thesis includes three contributions of different types to the area of supervised time series classification, a growing field of research due to the amount of time series collected daily in a wide variety of domains. In this context, the number of methods available for classifying time series is increasing, and the classifiers are becoming more and more competitive and varied. Thus, the first contribution of the thesis consists of proposing a taxonomy of distance-based time series classifiers, where an exhaustive review of the existing methods and their computational costs is made. Moreover, from the point of view of a non-expert user (even from that of an expert), choosing a suitable classifier for a given problem is a difficult task. The second contribution, therefore, deals with the recommendation of time series classifiers, for which we will use a meta-learning approach. Finally, the third contribution consists of proposing a method to explain the prediction of time series classifiers, in which we calculate the relevance of each region of a series in the prediction. This method of explanation is based on perturbations, for which we will consider specific and realistic transformations for the time series. BES-2016-07689