Primal-Dual Rates and Certificates
We propose an algorithm-independent framework to equip existing optimization methods with primal-dual certificates. Such certificates, and the corresponding convergence-rate guarantees, are important for practitioners to diagnose progress, in particular in machine learning applications. We obtain new primal-dual convergence rates, e.g., for the Lasso as well as many L1-, Elastic-Net-, group-Lasso- and TV-regularized problems. The theory applies to any norm-regularized generalized linear model. Our approach provides efficiently computable duality gaps which are globally defined, without modifying the original problems in the region of interest.
Comment: appearing at ICML 2016 - Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 4
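As an illustration of the kind of certificate described here, the sketch below computes an efficiently computable duality gap for the Lasso by rescaling the residual into a dual-feasible point. This is a minimal numpy sketch using one standard construction, not the authors' general framework; all names are placeholders of this sketch.

    import numpy as np

    def lasso_duality_gap(X, y, w, lam):
        """Upper bound on the suboptimality of w for
        P(w) = 0.5*||y - Xw||^2 + lam*||w||_1.

        A dual-feasible point is obtained by rescaling the residual so that
        the dual constraint ||X^T theta||_inf <= lam holds (one common
        construction; the paper's framework is more general)."""
        r = y - X @ w                                   # primal residual
        primal = 0.5 * r @ r + lam * np.abs(w).sum()
        scale = min(1.0, lam / (np.abs(X.T @ r).max() + 1e-12))
        theta = scale * r                               # dual-feasible point
        dual = 0.5 * y @ y - 0.5 * np.sum((y - theta) ** 2)
        return primal - dual                            # gap >= P(w) - P(w*)

    # usage: the gap certifies how far the current iterate is from optimal
    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((50, 20)), rng.standard_normal(50)
    print(lasso_duality_gap(X, y, np.zeros(20), lam=0.1))

By weak duality the returned value always upper-bounds the true suboptimality, which is what makes it usable as a stopping criterion.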
An Exponential Lower Bound on the Complexity of Regularization Paths
For a variety of regularized optimization problems in machine learning, algorithms computing the entire solution path have been developed recently. Most of these methods solve quadratic programs parameterized by a single parameter, as for example the Support Vector Machine (SVM). Solution path algorithms compute not only the solution for one particular value of the regularization parameter but the entire path of solutions, making the selection of an optimal parameter much easier.
It has been assumed that these piecewise linear solution paths have only linear complexity, i.e. linearly many bends. We prove that for the support vector machine this complexity can be exponential in the number of training points in the worst case. More strongly, we construct a single instance of n input points in d dimensions for an SVM such that at least \Theta(2^{n/2}) = \Theta(2^d) many distinct subsets of support vectors occur as the regularization parameter changes.
Comment: Journal version, 28 Pages, 5 Figures
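The quantity being counted, distinct support-vector sets along the path, can be probed empirically on a grid of regularization values. The sketch below is an illustration on toy data with scikit-learn, not the paper's worst-case construction, and grid probing only lower-bounds the number of sets on the true path.

    import numpy as np
    from sklearn.svm import SVC

    # toy data; the exponential lower bound needs a far more delicate instance
    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 2))
    y = (X[:, 0] + 0.3 * rng.standard_normal(40) > 0).astype(int)

    seen = set()
    for C in np.logspace(-3, 3, 200):           # sweep the regularization parameter
        clf = SVC(kernel="linear", C=C).fit(X, y)
        seen.add(frozenset(clf.support_))       # indices of current support vectors
    print(len(seen), "distinct support-vector sets along this grid")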
Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features
The recent tremendous success of unsupervised word embeddings in a multitude of applications raises the obvious question whether similar methods could be derived to improve embeddings (i.e. semantic representations) of word sequences as well. We present a simple but efficient unsupervised objective to train distributed representations of sentences. Our method outperforms the state-of-the-art unsupervised models on most benchmark tasks, highlighting the robustness of the produced general-purpose sentence embeddings.
Comment: NAACL 201
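At inference time the compositional part of such a model amounts to averaging learned vectors of the unigrams and n-grams present in a sentence. The sketch below shows only that composition step with a toy embedding table standing in for trained vectors; the tokenization and lookup names are assumptions of this sketch, and the training objective itself is not shown.

    import numpy as np

    def sentence_embedding(tokens, vectors, dim=100, use_bigrams=True):
        """Average the vectors of all unigrams and bigrams found in the
        lookup table `vectors` (dict: n-gram string -> np.ndarray).
        Sentence embedding = mean of its n-gram embeddings."""
        ngrams = list(tokens)
        if use_bigrams:
            ngrams += [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
        hits = [vectors[g] for g in ngrams if g in vectors]
        return np.mean(hits, axis=0) if hits else np.zeros(dim)

    # usage with a toy (random) embedding table
    rng = np.random.default_rng(0)
    vectors = {w: rng.standard_normal(100)
               for w in ["the", "cat", "sat", "the_cat", "cat_sat"]}
    print(sentence_embedding("the cat sat".split(), vectors).shape)  # (100,)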
Better Word Embeddings by Disentangling Contextual n-Gram Information
Pre-trained word vectors are ubiquitous in Natural Language Processing applications. In this paper, we show how training word embeddings jointly with bigram and even trigram embeddings results in improved unigram embeddings. We claim that training word embeddings along with higher-order n-gram embeddings helps remove contextual information from the unigrams, resulting in better stand-alone word embeddings. We empirically show the validity of our hypothesis by outperforming other competing word representation models by a significant margin on a wide variety of tasks. We make our models publicly available.
Comment: NAACL 201
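To make the disentangling intuition concrete, the following sketch enumerates the joint features (unigrams plus bigrams and trigrams) that would share a single summed context representation during training, so the higher-order n-grams can absorb phrase-level information while only the unigram vectors are kept as the final word embeddings. The feature format is an assumption of this sketch, not the paper's exact implementation.

    def context_features(tokens, max_n=3):
        """Unigram, bigram and trigram features for one training context.
        Each feature gets its own vector during joint training; only the
        unigram vectors are retained as the final word embeddings."""
        feats = list(tokens)
        for n in range(2, max_n + 1):
            feats += ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        return feats

    print(context_features("new york city".split()))
    # ['new', 'york', 'city', 'new_york', 'york_city', 'new_york_city']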
Faster Coordinate Descent via Adaptive Importance Sampling
Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems. In this work, we introduce new adaptive rules for the random selection of their updates. By adaptive, we mean that our selection rules are based on the dual residual or on primal-dual gap estimates and can change at each iteration. We theoretically characterize the performance of our selection rules, demonstrate improvements over the state-of-the-art, and extend our theory and algorithms to general convex objectives. Numerical evidence with hinge-loss support vector machines and the Lasso confirms that the practice follows the theory.
Comment: appearing at AISTATS 201
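To make the idea concrete, here is a minimal numpy sketch of coordinate descent for the Lasso in which the sampling distribution is periodically refreshed from a cheap per-coordinate score (the size of the step each coordinate would take). This score is a stand-in of this sketch for the paper's dual-residual and gap-based rules, not the authors' algorithm.

    import numpy as np

    def soft_threshold(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def adaptive_cd_lasso(X, y, lam, iters=2000, refresh=100, seed=0):
        """Coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1 with adaptive
        importance sampling: coordinates are drawn proportionally to the step
        they would take, with the distribution recomputed every `refresh` steps."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        L = (X ** 2).sum(axis=0) + 1e-12       # per-coordinate Lipschitz constants
        w, r = np.zeros(d), y.copy()           # r tracks the residual y - Xw
        p = np.full(d, 1.0 / d)                # start from uniform sampling
        for it in range(iters):
            if it % refresh == 0:              # refresh the sampling distribution
                steps = np.abs(soft_threshold(w + (X.T @ r) / L, lam / L) - w)
                p = (steps + 1e-8) / (steps + 1e-8).sum()
            j = rng.choice(d, p=p)
            w_new = soft_threshold(w[j] + X[:, j] @ r / L[j], lam / L[j])
            r += X[:, j] * (w[j] - w_new)      # keep the residual consistent
            w[j] = w_new
        return w

    rng = np.random.default_rng(1)
    X, y = rng.standard_normal((100, 50)), rng.standard_normal(100)
    print(np.count_nonzero(adaptive_cd_lasso(X, y, lam=0.5)), "nonzero coordinates")

Refreshing the scores only periodically keeps the per-iteration cost close to uniform sampling, which is the practical trade-off such adaptive rules have to manage.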