42 research outputs found
On Convex Envelopes and Regularization of Non-Convex Functionals without moving Global Minima
We provide theory for the computation of convex envelopes of non-convex
functionals including an l2-term, and use these to suggest a method for
regularizing a more general set of problems. The applications are particularly
aimed at compressed sensing and low rank recovery problems but the theory
relies on results that could potentially be useful for other types of
non-convex problems as well. For optimization problems where the l2-term contains a
singular matrix we prove that the regularizations never move the global minima.
This result in turn relies on a theorem concerning the structure of convex
envelopes which is interesting in its own right. It says that at any point
where the convex envelope does not touch the non-convex functional we
necessarily have a direction in which the convex envelope is affine.
Comment: arXiv admin note: text overlap with arXiv:1609.0937
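The affine-direction phenomenon can be illustrated numerically: the convex envelope equals the biconjugate f**, which can be approximated on a grid by two discrete Legendre transforms. A minimal sketch (grid sizes and the W-shaped test function are illustrative choices, not from the paper):

```python
import numpy as np

def biconjugate(f_vals, xs, slopes):
    """Approximate the convex envelope f** via two discrete Legendre transforms:
    f*(s) = max_x (s*x - f(x)),  f**(x) = max_s (s*x - f*(s))."""
    f_star = np.max(slopes[:, None] * xs[None, :] - f_vals[None, :], axis=1)
    return np.max(xs[:, None] * slopes[None, :] - f_star[None, :], axis=1)

xs = np.linspace(-3.0, 3.0, 1201)
slopes = np.linspace(-10.0, 10.0, 2001)
f = np.minimum((xs - 1.0) ** 2, (xs + 1.0) ** 2)   # W-shaped, global minima at x = +-1
env = biconjugate(f, xs, slopes)

# Where the envelope does not touch f (here on (-1, 1)), it is affine
# (constant 0), matching the structure theorem quoted above.
i0 = len(xs) // 2                                   # index of x = 0
```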
Sturmian morphisms, the braid group B_4, Christoffel words and bases of F_2
We give a presentation by generators and relations of a certain monoid
generating a subgroup of index two in the group Aut(F_2) of automorphisms of
the rank two free group F_2 and show that it can be realized as a monoid in the
group B_4 of braids on four strings. In the second part we use Christoffel
words to construct an explicit basis of F_2 lifting any given basis of the free
abelian group Z^2. We further give an algorithm for deciding whether two
elements of F_2 form a basis. We also show that, under suitable
conditions, a basis has a unique conjugate consisting of two palindromes.
Comment: 25 pages, 4 figures
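The Christoffel words used in the second part can be generated concretely. A standard construction (reading the lower Christoffel word of slope q/p off the residues of multiples of q modulo p+q; a generic textbook version, not necessarily the paper's conventions):

```python
from math import gcd

def christoffel(p, q):
    """Lower Christoffel word of slope q/p over {'a', 'b'}, with gcd(p, q) == 1:
    the discrete path from (0, 0) to (p, q) lying just below the line segment."""
    assert gcd(p, q) == 1
    n = p + q
    r = [(k * q) % n for k in range(n + 1)]          # residues; r[n] wraps to 0
    return ''.join('a' if r[k] < r[k + 1] else 'b' for k in range(n))
```

For example, christoffel(2, 1) yields 'aab' and christoffel(3, 2) yields 'aabab'; the letter counts recover the abelianized vector (p, q).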
A regularization technique in dynamic optimization
In this dissertation we discuss certain aspects of a parametric regularization technique based on recent work by R. Goebel. For proper, lower semicontinuous, and convex functions, this regularization is self-dual with respect to convex conjugation, and a simple extension of this smoothing exhibits the same feature when applied to proper, closed saddle functions. In Chapter 1 we give an introduction to convex and saddle function theory, which includes new results on the convergence of saddle function values that were not previously available in the form presented. In Chapter 2 we define the regularization and extend some of the properties previously shown in the convex case to the saddle case. Furthermore, we investigate the properties of this regularization without convexity assumptions. In particular, we show that for a prox-bounded function the infimal values of the regularization converge to the infimal value of the given function, even when the given function has no minimizer. We also show that for a general class of prox-regular functions the regularization is locally convex, even though their Moreau envelope may fail to have this property. Moreover, we apply the regularization technique to Lagrangians of convex optimization problems in two different settings, and describe the convergence of the associated saddle values and value functions. In Chapter 3 we also employ the regularization in fully convex problems in the calculus of variations, in the setting studied by R. Rockafellar and P. Wolenski. In this case, we extend a result by Rockafellar on the Lipschitz continuity of the proximal mapping of the value function jointly in the time and state variables, which in turn implies the same regularity for the gradient of the self-dual regularization. Finally, we attach software code for use with SCAT (Symbolic Convex Analysis Toolbox) to symbolically compute the regularization for functions of one variable.
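The Moreau envelope mentioned above (of which the self-dual regularization is a relative) is easy to probe numerically. A hedged one-dimensional sketch, using the classical fact that the Moreau envelope of f(x) = |x| is the Huber function:

```python
import numpy as np

def moreau_envelope(f, x, lam, u_grid):
    """Numerical Moreau envelope: e_lam f(x) = min_u [ f(u) + (x - u)^2 / (2 lam) ]."""
    return np.min(f(u_grid) + (x - u_grid) ** 2 / (2.0 * lam))

def huber(x, lam):
    """Closed-form Moreau envelope of f(x) = |x|: the Huber function."""
    return x * x / (2.0 * lam) if abs(x) <= lam else abs(x) - lam / 2.0

lam = 0.5
us = np.linspace(-5.0, 5.0, 100001)                 # dense minimization grid
errs = [abs(moreau_envelope(np.abs, x, lam, us) - huber(x, lam))
        for x in (-2.0, -0.3, 0.0, 0.7, 3.0)]
```

The grid minimization agrees with the closed form to well within the grid resolution, which makes it a handy sanity check before symbolic computation.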
Optimization tools for non-asymptotic statistics in exponential families
Exponential families are a ubiquitous class of models in statistics.
On the one hand, they can model any data type.
Actually, the most common distributions are exponential families: Gaussians, categorical, Poisson, Gamma, Wishart, or Dirichlet.
On the other hand, they sit at the core of generalized linear models (GLM), a foundational class of models in machine learning.
They are also supported by beautiful mathematics thanks to their connection with convex duality and the Laplace transform.
This beauty is definitely responsible for the existence of this thesis.
In this manuscript, we make three contributions at the intersection of optimization and statistics, all revolving around exponential families.
The first contribution adapts and improves a variance-reduced optimization algorithm called stochastic dual coordinate ascent (SDCA) to train a particular class of GLM called conditional random fields (CRF). CRFs are one of the cornerstones of structured prediction. CRFs were notoriously hard to train until the advent of variance reduction techniques, and our improved version of SDCA performs favorably compared to the previous state of the art.
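The CRF training problem itself is beyond a short snippet, but the flavor of SDCA can be sketched on plain ridge regression, where each dual coordinate step has a closed form (a generic textbook variant of SDCA, not the improved version of the thesis; all sizes and data below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 5, 0.1
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# Primal objective: (1/n) sum_i 0.5 * (x_i^T w - y_i)^2 + (lam/2) * ||w||^2
alpha = np.zeros(n)                  # one dual variable per example
w = np.zeros(d)                      # maintained as X^T alpha / (lam * n)
for _ in range(200):                 # epochs
    for i in rng.permutation(n):
        # closed-form dual coordinate maximization for the squared loss
        delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + X[i] @ X[i] / (lam * n))
        alpha[i] += delta
        w += delta * X[i] / (lam * n)

# Exact ridge solution for comparison
w_star = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
```

After a few hundred epochs the dual iterate matches the exact ridge solution, which is the linear convergence SDCA is known for on smooth losses.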
The second contribution focuses on causal discovery.
Exponential families are widely used in graphical models, and in particular in causal graphical models.
This contribution investigates a specific conjecture that gained some traction in previous work: causal models adapt faster to perturbations of the environment.
Using results from optimization, we find strong support for this assumption when the perturbation comes from an intervention on a cause, and evidence against it when the perturbation comes from an intervention on an effect.
These pieces of evidence call for a refinement of the conjecture.
The third contribution addresses a fundamental property of exponential families.
One of the most appealing properties of exponential families is their closed-form maximum likelihood estimate (MLE) and maximum a posteriori (MAP) estimate for a natural choice of conjugate prior. These two estimators are used almost everywhere, often unknowingly
-- how often are mean and variance computed for bell-shaped data without thinking about the underlying Gaussian model?
Nevertheless, the literature to date lacks results on the finite-sample convergence of the Kullback-Leibler (KL) divergence between these estimators and the true distribution.
Drawing on a parallel with optimization, we take some steps towards such a result, and we highlight directions for progress both in statistics and optimization.
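As a hedged illustration of the finite-sample question (a toy experiment, not the thesis's analysis), one can plug the Gaussian MLE into the closed-form KL divergence and watch it shrink with the sample size:

```python
import numpy as np

def kl_gauss(mu_hat, var_hat, mu, var):
    """KL( N(mu_hat, var_hat) || N(mu, var) ), in nats (closed form)."""
    return 0.5 * (np.log(var / var_hat) + (var_hat + (mu_hat - mu) ** 2) / var - 1.0)

rng = np.random.default_rng(0)
mu, var = 1.0, 4.0                               # true distribution N(1, 4)
kls = {}
for n in (10, 100, 10000):
    x = rng.normal(mu, np.sqrt(var), size=n)
    kls[n] = kl_gauss(x.mean(), x.var(), mu, var)   # plug in the Gaussian MLE
```

The KL divergence is always nonnegative, and for large samples it is tiny, consistent with the heuristic E[KL] of order (number of parameters)/(2n).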
These three contributions all use tools from optimization in the service of statistics in exponential families: improving an algorithm to learn GLMs, characterizing the adaptation speed of causal models, and estimating the learning speed of ubiquitous models.
By tying together optimization and statistics, this thesis takes a step towards a better understanding of the fundamentals of machine learning.
A Modern Introduction to Online Learning
In this monograph, I introduce the basic concepts of Online Learning through
a modern view of Online Convex Optimization. Here, online learning refers to
the framework of regret minimization under worst-case assumptions. I present
first-order and second-order algorithms for online learning with convex losses,
in Euclidean and non-Euclidean settings. All the algorithms are clearly
presented as instantiations of Online Mirror Descent or
Follow-The-Regularized-Leader and their variants. Particular attention is given
to the issue of tuning the parameters of the algorithms and learning in
unbounded domains, through adaptive and parameter-free online learning
algorithms. Non-convex losses are dealt with through convex surrogate losses and
through randomization. The bandit setting is also briefly discussed, touching
on the problem of adversarial and stochastic multi-armed bandits. These notes
do not require prior knowledge of convex analysis and all the required
mathematical tools are rigorously explained. Moreover, all the proofs have been
carefully chosen to be as simple and as short as possible.
Comment: Fixed more typos, added more history bits, added local norms bounds for OMD and FTR
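As a minimal sketch of the regret-minimization framework (projected online gradient descent, i.e. Euclidean OMD, on quadratic losses; the data and step sizes are illustrative choices, not from the monograph):

```python
import math
import random

random.seed(0)
T = 1000
zs = [random.random() for _ in range(T)]       # adversary's targets; f_t(x) = (x - z_t)^2

x, total_loss = 0.0, 0.0
for t, z in enumerate(zs, start=1):
    total_loss += (x - z) ** 2                 # suffer the loss, then update
    x -= (1.0 / (2.0 * math.sqrt(t))) * 2.0 * (x - z)   # eta_t = D/(G sqrt(t)); D=1, G=2
    x = min(1.0, max(0.0, x))                  # project back onto the domain [0, 1]

mean_z = sum(zs) / T                           # best fixed point for squared losses
best_loss = sum((mean_z - z) ** 2 for z in zs)
regret = total_loss - best_loss
```

The cumulative regret against the best fixed point stays within the standard O(DG sqrt(T)) guarantee for this step-size schedule.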
Revisiting Chernoff Information with Likelihood Ratio Exponential Families
The Chernoff information between two probability measures is a statistical
divergence measuring their deviation, defined as their maximally skewed
Bhattacharyya distance. Although the Chernoff information was originally
introduced for bounding the Bayes error in statistical hypothesis testing, the
divergence found many other applications due to its empirical robustness
property found in applications ranging from information fusion to quantum
information. From the viewpoint of information theory, the Chernoff information
can also be interpreted as a minimax symmetrization of the Kullback--Leibler
divergence. In this paper, we first revisit the Chernoff information between
two densities of a measurable Lebesgue space by considering the exponential
families induced by their geometric mixtures: The so-called likelihood ratio
exponential families. Second, we show how to (i) solve exactly the Chernoff
information between any two univariate Gaussian distributions or get a
closed-form formula using symbolic computing, (ii) report a closed-form formula
of the Chernoff information of centered Gaussians with scaled covariance
matrices and (iii) use a fast numerical scheme to approximate the Chernoff
information between any two multivariate Gaussian distributions.
Comment: 41 pages
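The paper's fast numerical scheme is more refined, but the definition itself already suggests a simple approximation: a one-dimensional search over the skewing parameter of the Bhattacharyya integral (a hedged sketch, not the paper's algorithm):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def chernoff_information(p, q):
    """C(p, q) = max_{a in (0,1)} -log integral p(x)^a q(x)^(1-a) dx,
    approximated by numerical quadrature plus a bounded 1-D search."""
    def log_skewed_bhatt(a):
        val, _ = quad(lambda x: p.pdf(x) ** a * q.pdf(x) ** (1.0 - a),
                      -np.inf, np.inf)
        return np.log(val)                       # minimized over the skew a
    res = minimize_scalar(log_skewed_bhatt, bounds=(1e-6, 1 - 1e-6), method='bounded')
    return -res.fun

p, q = norm(0.0, 1.0), norm(1.0, 1.0)
C = chernoff_information(p, q)
```

For equal-variance univariate Gaussians the closed form is (mu1 - mu2)^2 / (8 sigma^2), i.e. 1/8 here, which the numerical scheme recovers.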