83 research outputs found

    Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

    Full text link
    It is well known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essential for both the optimization and the generalization of deep networks. Some works have attempted to simulate SGN artificially by injecting random noise in order to improve deep learning. However, such simple injected noise turns out not to work as well as SGN, which is anisotropic and parameter-dependent. To simulate SGN at low computational cost and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach, a powerful alternative to conventional momentum in classic optimizers. The PNM method maintains two approximately independent momentum terms, so that the magnitude of SGN can be controlled explicitly by adjusting the momentum difference. We theoretically prove a convergence guarantee and a generalization advantage of PNM over Stochastic Gradient Descent (SGD). By incorporating PNM into two conventional optimizers, SGD with Momentum and Adam, our extensive experiments empirically verify the significant advantage of the PNM-based variants over the corresponding conventional momentum-based optimizers.
    Comment: ICML 2021; 20 pages; 13 figures; Keywords: deep learning theory, optimizer, momentum, generalization, gradient noise
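
    The abstract only sketches the mechanism. Below is a minimal, hypothetical Python sketch of a PNM-style update, based solely on that description: two momentum buffers refreshed on alternating minibatches and combined with weights (1 + beta0) and -beta0, so that the gap beta0 scales the injected noise. The function and parameter names (pnm_step, beta0, beta1) are illustrative assumptions; this is not the paper's exact algorithm or normalization.

```python
# Hypothetical positive-negative-momentum-style update: keep two momentum
# buffers, update them on alternating minibatches, and combine them with a
# positive and a negative weight. beta0 controls the momentum difference,
# i.e. the magnitude of the injected gradient noise. Sketch only.
import numpy as np

def pnm_step(theta, grad, state, lr=0.1, beta1=0.9, beta0=1.0):
    """One PNM-style update; `state` holds two momentum buffers and a step counter."""
    m_even, m_odd, t = state["m_even"], state["m_odd"], state["t"]
    if t % 2 == 0:
        m_even = beta1 * m_even + (1 - beta1) * grad
        m_new, m_old = m_even, m_odd
    else:
        m_odd = beta1 * m_odd + (1 - beta1) * grad
        m_new, m_old = m_odd, m_even
    # Positive-negative combination: larger beta0 -> larger effective noise.
    update = (1 + beta0) * m_new - beta0 * m_old
    theta = theta - lr * update
    state.update(m_even=m_even, m_odd=m_odd, t=t + 1)
    return theta, state

# Usage on a toy quadratic loss 0.5 * ||theta||^2 with noisy gradients.
rng = np.random.default_rng(0)
theta = np.ones(3)
state = {"m_even": np.zeros(3), "m_odd": np.zeros(3), "t": 0}
for _ in range(100):
    grad = theta + 0.1 * rng.normal(size=3)  # stochastic gradient
    theta, state = pnm_step(theta, grad, state)
print(theta)
```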

    A theory of (almost) zero resource speech recognition

    Get PDF
    Automatic speech recognition has matured into a commercially successful technology, enabling voice-based interfaces for smartphones, smart TVs, and many other consumer devices. The overwhelming popularity, however, is still limited to languages such as English, Japanese, and German, where vast amounts of labeled training data are available. For most other languages, it is prohibitively expensive to 1) collect and transcribe the speech data required to learn good acoustic models; and 2) acquire adequate text to estimate meaningful language models. A theory of unsupervised and semi-supervised techniques for speech recognition is therefore essential. This thesis focuses on HMM-based sequence clustering and examines acoustic modeling, language modeling, and applications beyond the components of an ASR system, such as anomaly detection, from the vantage point of PAC-Bayesian theory. The first part of this thesis extends standard PAC-Bayesian bounds to address the sequential nature of speech and language signals. A novel algorithm, based on sparsifying the cluster assignment probabilities with a Rényi entropy prior, is shown to provably minimize the generalization error of any probabilistic model (e.g., HMMs). The second part examines application-specific loss functions such as cluster purity and perplexity. Empirical results on a variety of tasks -- acoustic event detection, class-based language modeling, and unsupervised sequence anomaly detection -- confirm the practicality of the theory and algorithms developed in this thesis.
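
    For readers unfamiliar with the quantity being penalized: the Rényi entropy of order alpha is H_alpha(p) = (1/(1 - alpha)) log(sum_i p_i^alpha). The toy Python sketch below (hypothetical names and numbers, not the thesis's actual prior or algorithm) only illustrates that peaked, sparse cluster-assignment probabilities score lower than uniform ones, which is the direction a sparsifying prior pushes.

```python
# Toy illustration of the Renyi entropy of cluster-assignment probabilities:
# peaked (sparse) assignments have lower entropy than uniform ones.
# Hypothetical example only; the thesis's prior/algorithm is not reproduced here.
import numpy as np

def renyi_entropy(p, alpha=2.0):
    """Renyi entropy H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), alpha != 1."""
    p = np.asarray(p, dtype=float)
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

uniform = np.full(4, 0.25)                     # assignment spread over 4 clusters
peaked = np.array([0.91, 0.03, 0.03, 0.03])    # nearly hard assignment

print(renyi_entropy(uniform))  # ~1.386 (= log 4), maximal entropy
print(renyi_entropy(peaked))   # ~0.185, much smaller -> favored by a sparsifying prior
```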

    A study on the application of topic models to motif finding algorithms

    Get PDF

    Poisson random fields for dynamic feature models

    Get PDF
    We present the Wright-Fisher Indian buffet process (WF-IBP), a probabilistic model for time-dependent data assumed to have been generated by an unknown number of latent features. This model is suitable as a prior in Bayesian nonparametric feature allocation models in which the features underlying the observed data exhibit a dependency structure over time. More specifically, we establish a new framework for generating dependent Indian buffet processes, where the Poisson random field model from population genetics is used as a way of constructing dependent beta processes. Inference in the model is complex, and we describe a sophisticated Markov chain Monte Carlo algorithm for exact posterior simulation. We apply our construction to develop a nonparametric focused topic model for collections of time-stamped text documents and test it on the full corpus of NIPS papers published from 1987 to 2015.
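
    The WF-IBP construction itself is involved; as a simpler point of reference, here is a minimal Python sketch of sampling a feature-allocation matrix from a standard, static Indian buffet process, i.e. the prior that the WF-IBP makes time-dependent. The function name and parameters are illustrative assumptions, and the Poisson-random-field / Wright-Fisher machinery is deliberately not shown.

```python
# Minimal sketch: sample a binary feature-allocation matrix Z from a standard
# (static) Indian buffet process with concentration alpha. Object i reuses an
# existing feature k with probability m_k / i (m_k = prior owners of k) and
# opens Poisson(alpha / i) new features. Building block only, not the WF-IBP.
import numpy as np

def sample_ibp(n_objects, alpha=2.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    dish_counts = []   # number of previous objects owning each feature
    rows = []          # per-object feature indicators
    for i in range(1, n_objects + 1):
        row = [rng.random() < m / i for m in dish_counts]  # reuse old features
        n_new = rng.poisson(alpha / i)                      # open new features
        for k, taken in enumerate(row):
            dish_counts[k] += int(taken)
        dish_counts.extend([1] * n_new)
        rows.append(row + [True] * n_new)
    Z = np.zeros((n_objects, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

print(sample_ibp(5, alpha=2.0, rng=np.random.default_rng(0)))
```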

    Building task-oriented machine translation systems

    Full text link
    The main goal of this thesis is to develop interactive translation systems that exhibit greater synergy with their potential users. The aim is thus to make state-of-the-art systems more ergonomic, intuitive, and efficient, so that the human expert feels more comfortable using them. To this end, several techniques are presented that aim to improve the adaptability and response time of the underlying machine translation systems, together with a strategy intended to improve human-machine interaction. The ultimate purpose of all of this is to close the gap between the state of the art in machine translation and the tools that human translators have at their disposal. Regarding the response time of machine translation systems, this thesis presents a technique for pruning the parameters of current translation models whose intuition is based on the concept of bilingual segmentation, but which ultimately evolves into a strategy for re-estimating those parameters. Experimental results obtained with this strategy show that the phrase table can be pruned by up to 97% without degrading the quality of the resulting translations. Moreover, these results are consistent across different language pairs, which indicates that the technique presented here is effective in a traditional machine translation setting and could therefore be used directly in a post-editing scenario. The experiments carried out in interactive translation, however, are slightly less convincing, since they require a trade-off between response time and the quality of the produced suffixes. In addition, two adaptation techniques are presented with the purpose of improving the adaptability of machine translation systems. The first...
    Sanchis Trilles, G. (2012). Building task-oriented machine translation systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17174
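
    The thesis's pruning technique evolves into a parameter re-estimation strategy; as a much simpler stand-in, the hypothetical Python sketch below shows plain probability-threshold pruning of a toy phrase table, just to make the pruned object concrete. The table contents, function name, and threshold are invented for illustration, and this is not the bilingual-segmentation / re-estimation method of the thesis.

```python
# Hypothetical illustration: prune a phrase table by dropping low-probability
# translation options. Plain threshold pruning, NOT the thesis's re-estimation strategy.
from collections import defaultdict

phrase_table = {
    ("la", "casa"): {"the house": 0.62, "house the": 0.01, "the home": 0.30, "home": 0.07},
    ("buenos", "días"): {"good morning": 0.88, "good day": 0.10, "morning good": 0.02},
}

def prune(table, threshold=0.05):
    """Keep only translation options whose probability meets the threshold."""
    pruned = defaultdict(dict)
    for source, options in table.items():
        for target, prob in options.items():
            if prob >= threshold:
                pruned[source][target] = prob
    return dict(pruned)

pruned = prune(phrase_table, threshold=0.05)
total = sum(len(v) for v in phrase_table.values())
kept = sum(len(v) for v in pruned.values())
print(f"kept {kept}/{total} phrase pairs")  # prints "kept 5/7 phrase pairs"
```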

    Automatic acquisition of language models for speech recognition

    Get PDF
    Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994. Includes bibliographical references (leaves 138-141). By Michael Kyle McCanless. M.S.