Statistical Mechanics of Soft Margin Classifiers

Author: A. Buhot
B. Martos
C. Cortes
C. J. C. Burges
E. Gardner
G. Györgyi
H. S. Seung
J. Hertz
J.-I. Inoue
M. B. Gordon
M. Opper
M. Seeger
Mirta B. Gordon
O. Kinouchi
P. Peretto
P. Reimann
P. Sollich
R. Dietrich
R. Meir
S. Risau-Gusman
S.-I. Amari
Sebastian Risau-Gusman
T. Cover
T. L. H. Watkin
T. Uezu
V. Vapnik
W. Krauth
Publication venue: 'American Physical Society (APS)'
Publication date: 18/02/2001

We study the typical learning properties of the recently introduced Soft Margin Classifiers (SMCs), learning realizable and unrealizable tasks, with the tools of Statistical Mechanics. We derive analytically the behaviour of the learning curves in the regime of very large training sets. We obtain exponential and power laws for the decay of the generalization error towards the asymptotic value, depending on the task and on general characteristics of the distribution of stabilities of the patterns to be learned. The optimal learning curves of the SMCs, which give the minimal generalization error, are obtained by tuning the coefficient controlling the trade-off between the error and the regularization terms in the cost function. If the task is realizable by the SMC, the optimal performance is better than that of a hard margin Support Vector Machine and is very close to that of a Bayesian classifier.
Comment: 26 pages, 12 figures, submitted to Physical Review
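The trade-off between the error and regularization terms mentioned in the abstract can be illustrated with the textbook soft-margin cost: a squared-norm regularizer plus a weighted sum of margin violations. This is only a minimal sketch, assuming the common linear formulation without a bias term; the function name `soft_margin_cost` and the coefficient name `C` are illustrative, and the paper's own cost function and notation may differ.

```python
import numpy as np

def soft_margin_cost(w, X, y, C):
    """Textbook soft-margin cost (an assumption, not the paper's exact form).

    cost = 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i))
    Labels y_i are assumed to be in {-1, +1}.
    """
    w = np.asarray(w, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    margins = y * (X @ w)                    # signed margin of each pattern
    slacks = np.maximum(0.0, 1.0 - margins)  # zero when the margin is at least 1
    return 0.5 * float(w @ w) + C * float(slacks.sum())
```

A larger `C` penalizes margin violations more heavily, while `C = 0` leaves only the regularization term.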

arXiv.org e-Print Archive

Crossref

Beyond neural scaling laws: beating power law scaling via data pruning

Author: Ganguli Surya
Geirhos Robert
Morcos Ari S.
Shekhar Shashank
Sorscher Ben
Publication date: 21/04/2023

Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how in theory we can break beyond power law scaling and potentially even reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this improved scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling in practice on ResNets trained on CIFAR-10, SVHN, and ImageNet. Next, given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find most existing high performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore developed a new simple, cheap and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.
Comment: Outstanding Paper Award @ NeurIPS 2022. Added GitHub link to metric score
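The idea of a pruning metric that ranks which examples to discard can be sketched as a simple score-and-truncate step. This is a hedged illustration only: the function `prune_by_score`, its parameters, and the convention that a per-example score is already available are assumptions. The abstract does not say whether high- or low-scoring examples should be kept, so that choice is left as a flag here.

```python
import numpy as np

def prune_by_score(scores, keep_fraction, keep_high=True):
    """Return sorted indices of the examples kept after pruning.

    scores: one hypothetical pruning-metric value per training example.
    keep_fraction: fraction of the dataset to retain (0 < keep_fraction <= 1).
    keep_high: keep the highest-scoring examples if True, else the lowest.
    """
    scores = np.asarray(scores, dtype=float)
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    order = np.argsort(scores, kind="stable")  # ascending by score
    kept = order[-n_keep:] if keep_high else order[:n_keep]
    return np.sort(kept)
```

Calling the function with several values of `keep_fraction` gives the nested pruned subsets on which a model could then be retrained and evaluated.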

arXiv.org e-Print Archive

A survey on online active learning

Author: Cacciarelli Davide
Kulahci Murat
Publication date: 14/03/2023

Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot of attention in recent years, particularly in real-world applications where data is only available in an unlabeled form. Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data. To overcome this issue, many active learning strategies have been proposed in recent decades, aiming to select the most informative observations for labeling in order to improve the performance of machine learning models. These approaches can be broadly divided into two categories: static pool-based and stream-based active learning. Pool-based active learning involves selecting a subset of observations from a closed pool of unlabeled data, and it has been the focus of many surveys and literature reviews. However, the growing availability of data streams has led to an increase in the number of approaches that focus on online active learning, which involves continuously selecting and labeling observations as they arrive in a stream. This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in the context of online active learning. We review the various techniques that have been proposed and discuss their strengths and limitations, as well as the challenges and opportunities that exist in this area of research. Our review aims to provide a comprehensive and up-to-date overview of the field and to highlight directions for future work.
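The stream-based setting described above, where each observation is either queried or skipped as it arrives, can be sketched with a simple uncertainty threshold under a labeling budget. This is one illustrative strategy, not a method taken from the survey; the names `stream_uncertainty_sampling`, `predict_proba`, `threshold`, and `budget` are hypothetical, and the sketch omits retraining the model after each query.

```python
def stream_uncertainty_sampling(stream, predict_proba, threshold=0.2, budget=10):
    """Return positions in the stream whose labels would be requested.

    predict_proba: hypothetical callable giving P(positive class) for one item.
    An item is queried when the prediction lies within `threshold` of 0.5,
    i.e. when a binary classifier is most uncertain about it.
    """
    queried = []
    for i, x in enumerate(stream):
        if len(queried) >= budget:
            break  # labeling budget exhausted
        p = predict_proba(x)
        if abs(p - 0.5) < threshold:
            queried.append(i)  # decide immediately; the item is not revisited
    return queried
```

Unlike pool-based selection, the decision for each item is made once, at arrival time, without access to later observations.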

arXiv.org e-Print Archive

Estimating wind turbine generators failures using machine learning

Author: João Miguel Guimarães Fidalgo Roque
Publication date: 07/07/2017

The objective of this thesis is to estimate failures of wind turbine generators, using real data. It will seek to predict the failure and model its reliability. In order to achieve this goal, machine learning algorithms, such as neural networks, support vector machines and decision trees, will be used.

Repositório Aberto da Universidade do Porto

Supervised Learning - An Introduction: Lectures given at the 30th Canary Islands Winter School of Astrophysics

Author: Biehl Michael
Publication venue: Machine Learning Reports
Publication date: 01/04/2019

Proceedings - University of Groningen

A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach

Author: Castelli Mauro
Costa-Mendes Ricardo
Cruz-Jesus Frederico
Oliveira Tiago
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2021

Costa-Mendes, R., Oliveira, T., Castelli, M., & Cruz-Jesus, F. (2021). A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach. Education and Information Technologies, 26(2), 1527-1547. https://doi.org/10.1007/s10639-020-10316-y

This article uses an anonymous 2014–15 school year dataset from the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Ministry of Education as a means to carry out a predictive power comparison between the classic multilinear regression model and a chosen set of machine learning algorithms. A multilinear regression model is used in parallel with random forest, support vector machine, artificial neural network and extreme gradient boosting machine stacking ensemble implementations. Designing a hybrid analysis is intended where classical statistical analysis and artificial intelligence algorithms are blended to augment the ability to retain valuable conclusions and well-supported results. The machine learning algorithms attain a higher level of predictive ability. In addition, the stacking appropriateness increases as the base learner output correlation matrix determinant increases and the random forest feature importance empirical distributions are correlated with the structure of p-values and the statistical significance test ascertains of the multiple linear model. An information system that supports the nationwide education system should be designed and further structured to collect meaningful and precise data about the full range of academic achievement antecedents. The article concludes that no evidence is found in favour of smaller classes.

Repositório da Universidade Nova de Lisboa

Neuronal Models of Motor Sequence Learning in the Songbird

Author: Westkott Maren
Publication date: 01/01/2016

Communication of complex content is an important ability in our everyday life. For communication to be possible, several requirements need to be met: The individual communicated to has to learn to associate a certain meaning with a given sound. In the brain, this sound is represented as a spatio-temporal pattern of spikes, which will thus have to be associated with a different spike pattern representing its meaning. In this thesis, models for associative learning in spiking neurons are introduced in chapters 6 and 7. There, a new biologically plausible learning mechanism is proposed, where a property of the neuronal dynamics - the hyperpolarization of a neuron after each spike it produces - is coupled with a homeostatic plasticity mechanism, which acts to balance inputs into the neuron. In chapter 6, the mechanism used is a version of spike timing dependent plasticity (STDP), a property that was experimentally observed: The direction and amplitude of synaptic change depends on the precise timing of pre- and postsynaptic spiking activity. This mechanism is applied to associative learning of output spikes in response to purely spatial spiking patterns. In chapter 7, a new learning rule is introduced, which is derived from the objective of a balanced membrane potential. This learning rule is shown to be equivalent to a version of STDP and applied to associative learning of precisely timed output spikes in response to spatio-temporal input patterns. The individual communicating has to learn to reproduce certain sounds (which can be associated with a given meaning). To that end, a memory of the sound sequence has to be formed. 
Since sound sequences are represented as sequences of activation patterns in the brain, learning of a given sequence of spike patterns is an interesting problem for theoretical considerations. Here, it is shown that the biologically plausible learning mechanism introduced for associative learning enables recurrently coupled networks of spiking neurons to learn to reproduce given sequences of spikes. These results are presented in chapter 9. Finally, the communicator has to translate the sensory memory into motor actions that serve to reproduce the target sound. This process is investigated in the framework of inverse model learning, where the learner learns to invert the action-perception cycle by mapping perceptions back onto the actions that caused them. Two different setups for inverse model learning are investigated: In chapter 5, a simple setup for inverse model learning is coupled with the learning algorithm used for Perceptron learning in chapter 6 and it is shown that models of the sound generation and perception process, which are non-linear and non-local in time, can be inverted, if the width of the distribution of time delays of self-generated inputs caused by an individual motor spike is not too large. This limitation is mitigated by the model introduced in chapter 8. Both these models have experimentally testable consequences, namely a dip in the autocorrelation function of the spike times in the motor population of the duration of the loop delay, i.e. the time it takes for a motor activation to cause a sound and thus a sensory activation and the time that this sensory activation takes to be looped back to the motor population. Furthermore, both models predict neurons, which are active during the sound generation and during the passive playback of the sound with a time delay equivalent to the loop delay. Finally, the inverse model presented in chapter 8 additionally predicts mirror neurons without a time delay.
Both types of mirror neurons have been observed in the songbird [GKGH14, PPNM08], a popular animal model for vocal imitation learning.
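The timing dependence of STDP mentioned in the abstract, where the direction and amplitude of synaptic change depend on the relative timing of pre- and postsynaptic spikes, can be illustrated with the common pair-based exponential window. This is a textbook sketch and not the specific version of STDP used in the thesis; the function name `stdp_weight_change` and all parameter values are assumptions chosen for illustration.

```python
import math

def stdp_weight_change(delta_t, a_plus=0.01, a_minus=0.012,
                       tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP window (illustrative parameters, times in ms).

    delta_t = t_post - t_pre.
    Pre-before-post (delta_t > 0) potentiates the synapse;
    post-before-pre (delta_t < 0) depresses it.
    """
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0  # simultaneous spikes: no change in this sketch
```

The magnitude of the change decays exponentially as the two spikes move further apart in time.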

E-LIB Dokumentserver - Staats und Universitätsbibliothek Bremen