Differentially Private Empirical Risk Minimization with Sparsity-Inducing Norms
Differentially private learning aims to preserve prediction quality while
bounding the privacy impact on individuals whose information is contained in
the data. We consider differentially private risk minimization problems with
regularizers that induce structured sparsity. These regularizers are known to
be convex but they are often non-differentiable. We analyze the standard
differentially private algorithms, such as output perturbation, Frank-Wolfe and
objective perturbation. Output perturbation is a differentially private
algorithm that is known to perform well for minimizing risks that are strongly
convex. Previous works have derived excess risk bounds that are independent of
the dimensionality. In this paper, we assume a particular class of convex but
non-smooth regularizers that induce structured sparsity and loss functions for
generalized linear models. We also consider differentially private Frank-Wolfe
algorithms to optimize the dual of the risk minimization problem. We derive
excess risk bounds for both these algorithms. Both the bounds depend on the
Gaussian width of the unit ball of the dual norm. We also show that objective
perturbation of the risk minimization problems is equivalent to the output
perturbation of a dual optimization problem. This is the first work that
analyzes the dual optimization problems of risk minimization problems in the
context of differential privacy.
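As a rough illustration of output perturbation, the sketch below solves a ridge-regression ERM problem exactly and then adds Gaussian noise to the minimizer. The sensitivity bound 2/(n·λ) assumes a 1-Lipschitz loss and λ-strong convexity; the function name and noise calibration are illustrative assumptions, not the paper's algorithm for non-smooth regularizers.

```python
import numpy as np

def output_perturbation_ridge(X, y, lam, epsilon, delta, rng):
    """Output perturbation for L2-regularized least squares (a sketch).

    Solves the ERM problem in closed form, then adds Gaussian noise whose
    scale is calibrated to an assumed L2-sensitivity of 2/(n*lam), which
    holds for 1-Lipschitz losses under lam-strong convexity.
    """
    n, d = X.shape
    # Non-private ERM solution (closed form for ridge regression).
    w = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
    # L2 sensitivity of the minimizer for strongly convex, Lipschitz ERM.
    sensitivity = 2.0 / (n * lam)
    # Gaussian-mechanism noise scale for (epsilon, delta)-DP.
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return w + rng.normal(0.0, sigma, size=d)
```

Larger epsilon (weaker privacy) shrinks sigma, so the private solution concentrates around the non-private minimizer.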
Active-set Methods for Submodular Minimization Problems
We consider the submodular function minimization (SFM) problem and quadratic minimization problems regularized by the Lovász extension of a submodular function. These optimization problems are intimately related; for example, min-cut problems and total variation denoising problems, where the cut function is submodular and its Lovász extension is given by the associated total variation. When a quadratic loss is regularized by the total variation of a cut function, it becomes a total variation denoising problem, and we use the same terminology in this paper for "general" submodular functions. We propose a new active-set algorithm for total variation denoising, assuming an oracle that solves the corresponding SFM problem. This can be seen as a local descent algorithm over ordered partitions with explicit convergence guarantees. It is more flexible than existing algorithms, with the ability to warm-restart from the solution of a closely related problem. Further, we consider the case when a submodular function can be decomposed into the sum of two submodular functions F1 and F2, and assume SFM oracles for these two functions. We propose a new active-set algorithm for total variation denoising in this setting (and hence for SFM, by thresholding the solution at zero). This algorithm also performs local descent over ordered partitions, and its ability to warm-start considerably improves its performance. In the experiments, we compare the proposed algorithms with state-of-the-art algorithms, showing that they reduce the number of calls to SFM oracles.
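The Lovász extension is the bridge between SFM and the continuous total variation problems above. A minimal sketch of its evaluation, via Edmonds' greedy algorithm (sort coordinates, accumulate marginal gains); the function names are assumptions, and the set function is passed as a plain Python callable:

```python
import numpy as np

def lovasz_extension(F, w):
    """Evaluate the Lovász extension of a set function F at a vector w.

    F maps a Python set of indices to a real value, with F(empty set) = 0.
    Classic greedy formula: sort coordinates of w in decreasing order and
    take weighted differences of F along the growing chain of sets.
    """
    order = np.argsort(-w)            # indices sorted by decreasing w
    value, prev = 0.0, 0.0
    S = set()
    for i in order:
        S.add(int(i))
        cur = F(S)
        value += w[i] * (cur - prev)  # marginal gain of adding i, weighted by w_i
        prev = cur
    return value
```

For the cut function of a single edge (0, 1), the extension evaluates to |w_0 - w_1|, i.e. the one-dimensional total variation, matching the correspondence described in the abstract.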
The Graph Cut Kernel for Ranked Data
Many algorithms for ranked data become computationally intractable as the
number of objects grows due to the complex geometric structure induced by
rankings. An additional challenge is posed by partial rankings, i.e. rankings
in which the preference is only known for a subset of all objects. For these
reasons, state-of-the-art methods cannot scale to real-world applications, such
as recommender systems. We address this challenge by exploiting the geometric
structure of ranked data and additional available information about the objects
to derive a kernel for ranking based on the graph cut function. The graph cut
kernel combines the efficiency of submodular optimization with the theoretical
properties of kernel-based methods.
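The abstract does not spell out the kernel's construction, but its submodular building block is the graph cut function over an object graph. A minimal sketch of that ingredient, evaluating the cut value of a vertex subset under an assumed symmetric similarity matrix (the pairing of rankings with vertex subsets is left to the paper):

```python
import numpy as np

def cut_value(W, S):
    """Weight of the cut between vertex set S and its complement.

    W is a symmetric nonnegative adjacency matrix over the objects; the
    cut function F(S) = total edge weight crossing (S, V \ S) is a
    classic submodular function.
    """
    n = W.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[list(S)] = True
    # Sum the weights of edges with exactly one endpoint in S.
    return float(W[mask][:, ~mask].sum())
```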
Sliced Multi-Marginal Optimal Transport
Multi-marginal optimal transport enables one to compare multiple probability
measures, which increasingly finds application in multi-task learning problems.
One practical limitation of multi-marginal transport is computational
scalability in the number of measures, samples and dimensionality. In this
work, we propose a multi-marginal optimal transport paradigm based on random
one-dimensional projections, whose (generalized) distance we term the sliced
multi-marginal Wasserstein distance. To construct this distance, we introduce a
characterization of the one-dimensional multi-marginal Kantorovich problem and
use it to highlight a number of properties of the sliced multi-marginal
Wasserstein distance. In particular, we show that (i) the sliced multi-marginal
Wasserstein distance is a (generalized) metric that induces the same topology
as the standard Wasserstein distance, (ii) it admits a dimension-free sample
complexity, (iii) it is tightly connected with the problem of barycentric
averaging under the sliced-Wasserstein metric. We conclude by illustrating the
sliced multi-marginal Wasserstein distance on multi-task density estimation and
multi-dynamics reinforcement learning problems.
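The one-dimensional reduction at the heart of the paper is easiest to see in the two-measure case: project samples onto random directions, where optimal transport is just sorting. A Monte Carlo sketch of the standard sliced Wasserstein distance, the building block the multi-marginal variant generalizes (function name and defaults are assumptions):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, p=2, rng=None):
    """Monte Carlo estimate of the sliced p-Wasserstein distance (a sketch).

    X, Y: (n, d) sample arrays of equal size. Each random unit direction
    reduces the problem to 1D, where the optimal coupling is obtained by
    sorting both projected samples.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    thetas = rng.normal(size=(n_projections, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    total = 0.0
    for theta in thetas:
        x = np.sort(X @ theta)        # 1D optimal transport = sorting
        y = np.sort(Y @ theta)
        total += np.mean(np.abs(x - y) ** p)
    return (total / n_projections) ** (1.0 / p)
```

The cost is dominated by sorting, O(n log n) per projection, which is the source of the scalability the abstract emphasizes.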
Maximizing submodular functions using probabilistic graphical models
We consider the problem of maximizing submodular functions; while this
problem is known to be NP-hard, several numerically efficient local search
techniques with approximation guarantees are available. In this paper, we
propose a novel convex relaxation which is based on the relationship between
submodular functions, entropies and probabilistic graphical models. In a
graphical model, the entropy of the joint distribution decomposes as a sum of
marginal entropies of subsets of variables; moreover, for any distribution, the
entropy of the closest distribution factorizing in the graphical model provides
an upper bound on the entropy. For directed graphical models, this last property
turns out to be a direct consequence of the submodularity of the entropy
function, and allows the generalization of graphical-model-based upper bounds
to any submodular functions. These upper bounds may then be jointly maximized
with respect to a set, while minimized with respect to the graph, leading to a
convex variational inference scheme for maximizing submodular functions, based
on outer approximations of the marginal polytope and maximum likelihood bounded
treewidth structures. By considering graphs of increasing treewidths, we may
then explore the trade-off between computational complexity and tightness of
the relaxation. We also present extensions to constrained problems and
maximizing the difference of two submodular functions, a class that includes
all possible set functions.
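For context, the local search techniques the abstract contrasts with include the classic greedy algorithm, which attains a (1 - 1/e)-approximation for monotone submodular maximization under a cardinality constraint (Nemhauser, Wolsey, and Fisher). A minimal sketch, with the set function passed as a callable; the convex relaxation proposed in the paper is an alternative to such schemes:

```python
def greedy_submodular_max(F, ground_set, k):
    """Greedy baseline for max F(S) subject to |S| <= k.

    Repeatedly adds the element with the largest marginal gain
    F(S + e) - F(S); stops early if no element improves F.
    """
    S = set()
    for _ in range(k):
        best, gain = None, float("-inf")
        for e in ground_set - S:
            g = F(S | {e}) - F(S)     # marginal gain of adding e
            if g > gain:
                best, gain = e, g
        if best is None or gain < 0:  # no improving element left
            break
        S.add(best)
    return S
```

On a coverage function (F(S) = number of items covered), greedy first picks the set covering the most items, then the best complement.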
Convex Relaxations for Learning Bounded-Treewidth Decomposable Graphs
We consider the problem of learning the structure of undirected graphical models with bounded treewidth, within the maximum likelihood framework. This is an NP-hard problem and most approaches consider local search techniques. In this paper, we pose it as a combinatorial optimization problem, which is then relaxed to a convex optimization problem that involves searching over the forest and hyperforest polytopes with special structures. A supergradient method is used to solve the dual problem, with a run-time complexity of O(k^3 n^(k+2) log n) per iteration, where n is the number of variables and k is a bound on the treewidth. We compare our approach to state-of-the-art methods on synthetic datasets and classical benchmarks, showing the gains of the novel convex approach.
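The k = 1 special case of this problem is the classical Chow-Liu algorithm: the maximum-likelihood tree is the maximum-weight spanning tree under pairwise mutual-information weights. A minimal sketch using Kruskal's algorithm with union-find (function names are assumptions; the paper's convex relaxation handles general k):

```python
import numpy as np

def chow_liu_edges(mi):
    """Maximum-likelihood tree structure (treewidth 1) via Chow-Liu.

    mi: symmetric (n, n) matrix of pairwise mutual informations.
    Returns the edges of the maximum-weight spanning tree, built by
    Kruskal's algorithm with a union-find over the vertices.
    """
    n = mi.shape[0]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    edges = sorted(((mi[i, j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)              # heaviest edges first
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                          # keep edge only if it joins components
            parent[ri] = rj
            tree.append((i, j))
    return tree
```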
Learning to Segment Document Images
A hierarchical framework for document segmentation is proposed as an optimization problem. The model incorporates the dependencies between the various levels of the hierarchy, unlike traditional document segmentation algorithms.