10 research outputs found

    Semistochastic Quadratic Bound Methods

    Full text link
    Partition functions arise in a variety of settings, including conditional random fields, logistic regression, and latent gaussian models. In this paper, we consider semistochastic quadratic bound (SQB) methods for maximum likelihood inference based on partition function optimization. Batch methods based on the quadratic bound were recently proposed for this class of problems, and performed favorably in comparison to state-of-the-art techniques. Semistochastic methods fall in between batch algorithms, which use all the data, and stochastic gradient type methods, which use small random selections at each iteration. We build semistochastic quadratic bound-based methods, and prove both global convergence (to a stationary point) under very weak assumptions, and linear convergence rate under stronger assumptions on the objective. To make the proposed methods faster and more stable, we consider inexact subproblem minimization and batch-size selection schemes. The efficacy of SQB methods is demonstrated via comparison with several state-of-the-art techniques on commonly used datasets.Comment: 11 pages, 1 figur

    Incremental Majorization-Minimization Optimization with Application to Large-Scale Machine Learning

    Get PDF
    International audienceMajorization-minimization algorithms consist of successively minimizing a sequence of upper bounds of the objective function. These upper bounds are tight at the current estimate, and each iteration monotonically drives the objective function downhill. Such a simple principle is widely applicable and has been very popular in various scientific fields, especially in signal processing and statistics. In this paper, we propose an incremental majorization-minimization scheme for minimizing a large sum of continuous functions, a problem of utmost importance in machine learning. We present convergence guarantees for non-convex and convex optimization when the upper bounds approximate the objective up to a smooth error; we call such upper bounds ''first-order surrogate functions''. More precisely, we study asymptotic stationary point guarantees for non-convex problems, and for convex ones, we provide convergence rates for the expected objective function value. We apply our scheme to composite optimization and obtain a new incremental proximal gradient algorithm with linear convergence rate for strongly convex functions. In our experiments, we show that our method is competitive with the state of the art for solving machine learning problems such as logistic regression when the number of training samples is large enough, and we demonstrate its usefulness for sparse estimation with non-convex penalties

    Apprentissage Ă  grande Ă©chelle et applications

    Get PDF
    This thesis presents my main research activities in statistical machine learning aftermy PhD, starting from my post-doc at UC Berkeley to my present research position atInria Grenoble. The first chapter introduces the context and a summary of my scientificcontributions and emphasizes the importance of pluri-disciplinary research. For instance,mathematical optimization has become central in machine learning and the interplay betweensignal processing, statistics, bioinformatics, and computer vision is stronger thanever. With many scientific and industrial fields producing massive amounts of data, theimpact of machine learning is potentially huge and diverse. However, dealing with massivedata raises also many challenges. In this context, the manuscript presents differentcontributions, which are organized in three main topics.Chapter 2 is devoted to large-scale optimization in machine learning with a focus onalgorithmic methods. We start with majorization-minimization algorithms for structuredproblems, including block-coordinate, incremental, and stochastic variants. These algorithmsare analyzed in terms of convergence rates for convex problems and in terms ofconvergence to stationary points for non-convex ones. We also introduce fast schemesfor minimizing large sums of convex functions and principles to accelerate gradient-basedapproaches, based on Nesterov’s acceleration and on Quasi-Newton approaches.Chapter 3 presents the paradigm of deep kernel machine, which is an alliance betweenkernel methods and multilayer neural networks. In the context of visual recognition, weintroduce a new invariant image model called convolutional kernel networks, which is anew type of convolutional neural network with a reproducing kernel interpretation. Thenetwork comes with simple and effective principles to do unsupervised learning, and iscompatible with supervised learning via backpropagation rules.Chapter 4 is devoted to sparse estimation—that is, the automatic selection of modelvariables for explaining observed data; in particular, this chapter presents the result ofpluri-disciplinary collaborations in bioinformatics and neuroscience where the sparsityprinciple is a key to build intepretable predictive models.Finally, the last chapter concludes the manuscript and suggests future perspectives.Ce mémoire présente mes activités de recherche en apprentissage statistique après mathèse de doctorat, dans une période allant de mon post-doctorat à UC Berkeley jusqu’àmon activité actuelle de chercheur chez Inria. Le premier chapitre fournit un contextescientifique dans lequel s’inscrivent mes travaux et un résumé de mes contributions, enmettant l’accent sur l’importance de la recherche pluri-disciplinaire. L’optimisation mathématiqueest ainsi devenue un outil central en apprentissage statistique et les interactionsavec les communautés de vision artificielle, traitement du signal et bio-informatiquen’ont jamais été aussi fortes. De nombreux domaines scientifiques et industriels produisentdes données massives, mais les traiter efficacement nécessite de lever de nombreux verrousscientifiques. Dans ce contexte, ce mémoire présente différentes contributions, qui sontorganisées en trois thématiques.Le chapitre 2 est dédié à l’optimisation à large échelle en apprentissage statistique.Dans un premier lieu, nous étudions plusieurs variantes d’algorithmes de majoration/minimisationpour des problèmes structurés, telles que des variantes par bloc de variables,incrémentales, et stochastiques. Chaque algorithme est analysé en terme de taux deconvergence lorsque le problème est convexe, et nous montrons la convergence de ceux-civers des points stationnaires dans le cas contraire. Des méthodes de minimisation rapidespour traiter le cas de sommes finies de fonctions sont aussi introduites, ainsi que desalgorithmes d’accélération pour les techniques d’optimisation de premier ordre.Le chapitre 3 présente le paradigme des méthodes à noyaux profonds, que l’on peutinterpréter comme un mariage entre les méthodes à noyaux classiques et les techniquesd’apprentissage profond. Dans le contexte de la reconnaissance visuelle, ce chapitre introduitun nouveau modèle d’image invariant appelé réseau convolutionnel à noyaux, qui estun nouveau type de réseau de neurones convolutionnel avec une interprétation en termesde noyaux reproduisants. Le réseau peut être appris simplement sans supervision grâceà des techniques classiques d’approximation de noyaux, mais est aussi compatible avecl’apprentissage supervisé grâce à des règles de backpropagation.Le chapitre 4 est dédié à l’estimation parcimonieuse, c’est à dire, à la séléction automatiquede variables permettant d’expliquer des données observées. En particulier, cechapitre décrit des collaborations pluri-disciplinaires en bioinformatique et neuroscience,où le principe de parcimonie est crucial pour obtenir des modèles prédictifs interprétables.Enfin, le dernier chapitre conclut ce mémoire et présente des perspectives futures

    Automatic transcription of polyphonic music exploiting temporal evolution

    Get PDF
    PhDAutomatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without setting restrictions on the degree of polyphony and the instrument type still remains open. In this thesis, research on automatic transcription is performed by explicitly incorporating information on the temporal evolution of sounds. First efforts address the problem by focusing on signal processing techniques and by proposing audio features utilising temporal characteristics. Techniques for note onset and offset detection are also utilised for improving transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), modeling the temporal evolution of notes in a multiple-instrument case and supporting frequency modulations in produced notes. Datasets and annotations for transcription research have also been created during this work. Proposed systems have been privately as well as publicly evaluated within the Music Information Retrieval Evaluation eXchange (MIREX) framework. Proposed systems have been shown to outperform several state-of-the-art transcription approaches. Developed techniques have also been employed for other tasks related to music technology, such as for key modulation detection, temperament estimation, and automatic piano tutoring. Finally, proposed music transcription models have also been utilized in a wider context, namely for modeling acoustic scenes

    Modeling High-Dimensional Audio Sequences with Recurrent Neural Networks

    Get PDF
    Cette thèse étudie des modèles de séquences de haute dimension basés sur des réseaux de neurones récurrents (RNN) et leur application à la musique et à la parole. Bien qu'en principe les RNN puissent représenter les dépendances à long terme et la dynamique temporelle complexe propres aux séquences d'intérêt comme la vidéo, l'audio et la langue naturelle, ceux-ci n'ont pas été utilisés à leur plein potentiel depuis leur introduction par Rumelhart et al. (1986a) en raison de la difficulté de les entraîner efficacement par descente de gradient. Récemment, l'application fructueuse de l'optimisation Hessian-free et d'autres techniques d'entraînement avancées ont entraîné la recrudescence de leur utilisation dans plusieurs systèmes de l'état de l'art. Le travail de cette thèse prend part à ce développement. L'idée centrale consiste à exploiter la flexibilité des RNN pour apprendre une description probabiliste de séquences de symboles, c'est-à-dire une information de haut niveau associée aux signaux observés, qui en retour pourra servir d'à priori pour améliorer la précision de la recherche d'information. Par exemple, en modélisant l'évolution de groupes de notes dans la musique polyphonique, d'accords dans une progression harmonique, de phonèmes dans un énoncé oral ou encore de sources individuelles dans un mélange audio, nous pouvons améliorer significativement les méthodes de transcription polyphonique, de reconnaissance d'accords, de reconnaissance de la parole et de séparation de sources audio respectivement. L'application pratique de nos modèles à ces tâches est détaillée dans les quatre derniers articles présentés dans cette thèse. Dans le premier article, nous remplaçons la couche de sortie d'un RNN par des machines de Boltzmann restreintes conditionnelles pour décrire des distributions de sortie multimodales beaucoup plus riches. Dans le deuxième article, nous évaluons et proposons des méthodes avancées pour entraîner les RNN. Dans les quatre derniers articles, nous examinons différentes façons de combiner nos modèles symboliques à des réseaux profonds et à la factorisation matricielle non-négative, notamment par des produits d'experts, des architectures entrée/sortie et des cadres génératifs généralisant les modèles de Markov cachés. Nous proposons et analysons également des méthodes d'inférence efficaces pour ces modèles, telles la recherche vorace chronologique, la recherche en faisceau à haute dimension, la recherche en faisceau élagué et la descente de gradient. Finalement, nous abordons les questions de l'étiquette biaisée, du maître imposant, du lissage temporel, de la régularisation et du pré-entraînement.This thesis studies models of high-dimensional sequences based on recurrent neural networks (RNNs) and their application to music and speech. While in principle RNNs can represent the long-term dependencies and complex temporal dynamics present in real-world sequences such as video, audio and natural language, they have not been used to their full potential since their introduction by Rumelhart et al. (1986a) due to the difficulty to train them efficiently by gradient-based optimization. In recent years, the successful application of Hessian-free optimization and other advanced training techniques motivated an increase of their use in many state-of-the-art systems. The work of this thesis is part of this development. The main idea is to exploit the power of RNNs to learn a probabilistic description of sequences of symbols, i.e. high-level information associated with observed signals, that in turn can be used as a prior to improve the accuracy of information retrieval. For example, by modeling the evolution of note patterns in polyphonic music, chords in a harmonic progression, phones in a spoken utterance, or individual sources in an audio mixture, we can improve significantly the accuracy of polyphonic transcription, chord recognition, speech recognition and audio source separation respectively. The practical application of our models to these tasks is detailed in the last four articles presented in this thesis. In the first article, we replace the output layer of an RNN with conditional restricted Boltzmann machines to describe much richer multimodal output distributions. In the second article, we review and develop advanced techniques to train RNNs. In the last four articles, we explore various ways to combine our symbolic models with deep networks and non-negative matrix factorization algorithms, namely using products of experts, input/output architectures, and generative frameworks that generalize hidden Markov models. We also propose and analyze efficient inference procedures for those models, such as greedy chronological search, high-dimensional beam search, dynamic programming-like pruned beam search and gradient descent. Finally, we explore issues such as label bias, teacher forcing, temporal smoothing, regularization and pre-training
    corecore