Bethe Projections for Non-Local Inference
Many inference problems in structured prediction are naturally solved by
augmenting a tractable dependency structure with complex, non-local auxiliary
objectives. This includes the mean field family of variational inference
algorithms, soft- or hard-constrained inference using Lagrangian relaxation or
linear programming, collective graphical models, and forms of semi-supervised
learning such as posterior regularization. We present a method to
discriminatively learn broad families of inference objectives, capturing
powerful non-local statistics of the latent variables, while maintaining
tractable and provably fast inference using non-Euclidean projected gradient
descent with a distance-generating function given by the Bethe entropy. We
demonstrate the performance and flexibility of our method by (1) extracting
structured citations from research papers by learning soft global constraints,
(2) achieving state-of-the-art results on a widely-used handwriting recognition
task using a novel learned non-convex inference procedure, and (3) providing a
fast and highly scalable algorithm for the challenging problem of inference in
a collective graphical model applied to bird migration.
Comment: minor bug fix to appendix; appeared in UAI 201
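As a sketch of the kind of update the abstract alludes to, the following is a
minimal mirror-descent (non-Euclidean projected gradient) loop with negative
entropy as the distance-generating function. It is not the paper's
Bethe-entropy construction over marginal polytopes; it is the simpler
probability-simplex special case, where the mirror step reduces to an
exponentiated-gradient update, and all names and parameters are illustrative.

```python
# Minimal sketch (illustrative, not the paper's method): mirror descent
# with negative entropy as the distance-generating function. The paper
# uses the Bethe entropy over a marginal polytope; here we use the
# simplex special case, where the update is multiplicative.
import numpy as np

def mirror_descent_simplex(grad, x0, steps=200, eta=0.5):
    """Minimize a smooth function over the probability simplex."""
    x = x0.copy()
    for _ in range(steps):
        g = grad(x)
        x = x * np.exp(-eta * g)  # mirror (exponentiated-gradient) step
        x /= x.sum()              # Bregman projection back onto the simplex
    return x

# Toy objective f(x) = 0.5 * ||x - p||^2 for an arbitrary target p.
p = np.array([0.7, -0.2, 0.9, 0.1])
x_star = mirror_descent_simplex(lambda x: x - p, np.full(4, 0.25))
print(x_star, x_star.sum())  # a point on the simplex
```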
Nonconvex Sparse Spectral Clustering by Alternating Direction Method of Multipliers and Its Convergence Analysis
Spectral Clustering (SC) is a widely used data clustering method which first
learns a low-dimensional embedding $U$ of the data by computing the
eigenvectors of the normalized Laplacian matrix, and then performs k-means on
the rows of $U$ to get the final clustering result. The Sparse Spectral
Clustering (SSC) method extends SC with a sparse regularization on $UU^\top$,
motivated by the block diagonal structure of $UU^\top$ in the ideal case.
However, encouraging $UU^\top$ to be sparse leads to a heavily nonconvex
problem which is challenging to solve, and the work (Lu, Yan, and Lin 2016)
pursues this aim only indirectly through a convex relaxation. Such a convex
relaxation generally yields a loose approximation, and the quality of its
solution is unclear. This work instead solves the nonconvex
formulation of SSC, which directly encourages $UU^\top$ to be sparse. We propose
an efficient Alternating Direction Method of Multipliers (ADMM) to solve the
nonconvex SSC and provide a convergence guarantee. In particular, we prove
that the sequence generated by ADMM always has a limit point and that any
limit point is a stationary point. Our analysis imposes no assumptions on
the iterates and is thus practical. Our proposed ADMM for nonconvex problems
allows the stepsize to be increasing but upper bounded, and this makes it very
efficient in practice. Experimental analysis on several real data sets verifies
the effectiveness of our method.
Comment: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 201
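To make the iteration pattern concrete, here is a hedged sketch of nonconvex
ADMM on a toy $\ell_0$-regularized least-squares problem, including a penalty
parameter that may increase but stays upper bounded, echoing the stepsize rule
the abstract highlights. It is not the paper's sparse spectral clustering
splitting; the problem, names, and constants are assumptions for illustration.

```python
# Hedged sketch (not the paper's SSC splitting): nonconvex ADMM on
# l0-regularized least squares,
#   min_{x,z} 0.5*||Ax - b||^2 + lam*||z||_0  s.t.  x = z,
# showing the iteration pattern, with a penalty parameter that may
# increase but stays upper bounded, as in the stepsize rule above.
import numpy as np

def admm_l0(A, b, lam=0.5, rho=1.0, rho_max=10.0, steps=200):
    n = A.shape[1]
    x, z, y = np.zeros(n), np.zeros(n), np.zeros(n)  # y is the multiplier
    AtA, Atb, I = A.T @ A, A.T @ b, np.eye(n)
    for _ in range(steps):
        # x-step: smooth quadratic subproblem, solved in closed form
        x = np.linalg.solve(AtA + rho * I, Atb + rho * z - y)
        # z-step: prox of (lam/rho)*||.||_0 is hard thresholding
        v = x + y / rho
        z = np.where(v * v > 2.0 * lam / rho, v, 0.0)
        # dual ascent step
        y += rho * (x - z)
        # increasing but upper-bounded penalty
        rho = min(1.05 * rho, rho_max)
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 60))
x_true = np.zeros(60)
x_true[:5] = 3.0 + rng.standard_normal(5)
b = A @ x_true
print(np.nonzero(admm_l0(A, b))[0])  # indices of nonzero entries of z
```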
A distributed primal-dual interior-point method for loosely coupled problems using ADMM
In this paper we propose an efficient distributed algorithm for solving
loosely coupled convex optimization problems. The algorithm is based on a
primal-dual interior-point method in which we use the alternating direction
method of multipliers (ADMM) to compute the primal-dual directions at each
iteration of the method. This enables us to combine the exceptional convergence
properties of primal-dual interior-point methods with the remarkable
parallelizability of ADMM. The resulting algorithm has superior computational
properties compared with ADMM applied directly to our problem: the amount of
computation that each agent needs to perform is far less. In
particular, the updates for all variables can be expressed in closed form,
irrespective of the type of optimization problem. The most expensive
computations of the algorithm occur in the updates of the primal
variables and can be precomputed in each iteration of the interior-point
method. We verify our method and compare it to ADMM in numerical experiments.
Comment: extended version, 50 pages, 9 figures
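The distributed building block can be sketched as consensus ADMM, in which
several agents jointly solve a coupled least-squares problem while each
touches only its local data. In the paper this role is played by the
primal-dual direction computation inside the interior-point method; the
splitting and names below are illustrative assumptions, not the paper's code.

```python
# Hedged sketch: consensus ADMM, the distributed building block. Agents
# jointly solve min_x sum_i 0.5*||A_i x - b_i||^2, each using only its
# local (A_i, b_i); in the paper, ADMM plays this role when computing
# the primal-dual directions inside the interior-point method.
import numpy as np

def consensus_admm(As, bs, rho=1.0, steps=300):
    n = As[0].shape[1]
    m = len(As)
    xs = [np.zeros(n) for _ in range(m)]  # local primal copies
    ys = [np.zeros(n) for _ in range(m)]  # local dual variables
    z = np.zeros(n)                       # global consensus variable
    for _ in range(steps):
        # local x-updates are independent, hence fully parallelizable
        for i in range(m):
            H = As[i].T @ As[i] + rho * np.eye(n)
            xs[i] = np.linalg.solve(H, As[i].T @ bs[i] + rho * z - ys[i])
        # consensus and dual updates
        z = np.mean([xs[i] + ys[i] / rho for i in range(m)], axis=0)
        for i in range(m):
            ys[i] += rho * (xs[i] - z)
    return z

rng = np.random.default_rng(1)
x_true = rng.standard_normal(5)
As = [rng.standard_normal((8, 5)) for _ in range(3)]
bs = [A @ x_true for A in As]
print(np.linalg.norm(consensus_admm(As, bs) - x_true))  # shrinks with steps
```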
Lagrangian Relaxation for Mixed-Integer Linear Programming: Importance, Challenges, Recent Advancements, and Opportunities
Operations in areas of importance to society are frequently modeled as
Mixed-Integer Linear Programming (MILP) problems. While MILP problems suffer
from combinatorial complexity, Lagrangian Relaxation has been a beacon of hope
to resolve the associated difficulties through decomposition. Due to the
non-smooth nature of Lagrangian dual functions, the coordination aspect of the
method has posed serious challenges. This paper presents several significant
historical milestones (beginning with Polyak's pioneering work in 1967) toward
improving Lagrangian Relaxation coordination through improved optimization of
non-smooth functionals. Finally, this paper presents the most recent
developments in Lagrangian Relaxation for fast resolution of MILP problems. The
paper also briefly discusses the opportunities that Lagrangian Relaxation can
provide at this point in time.
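As a concrete (toy) instance of the decomposition-and-coordination loop the
survey discusses, the sketch below relaxes the coupling constraint of a tiny
binary knapsack MILP and minimizes the nonsmooth Lagrangian dual with a
projected subgradient method and a diminishing stepsize, the basic scheme
descending from Polyak's work. The instance and stepsize rule are illustrative
assumptions, not taken from the paper.

```python
# Hedged sketch (toy instance, not from the paper): Lagrangian relaxation
# of a binary knapsack MILP, max c^T x s.t. a^T x <= b, x in {0,1}^n.
# Relaxing the coupling constraint makes the inner problem separable, and
# the nonsmooth dual is minimized by a projected subgradient method.
import numpy as np

c = np.array([10.0, 7.0, 5.0, 4.0])  # item values
a = np.array([6.0, 5.0, 4.0, 3.0])   # item weights
b = 10.0                             # knapsack capacity

lam = 0.0
best_bound = np.inf
for k in range(1, 200):
    # inner maximization decomposes: take item i iff its reduced cost > 0
    x = (c - lam * a > 0).astype(float)
    dual_val = lam * b + (c - lam * a) @ x  # q(lam), an upper bound on OPT
    best_bound = min(best_bound, dual_val)
    g = b - a @ x                           # subgradient of q at lam
    lam = max(0.0, lam - g / k)             # projected diminishing step
print("best Lagrangian dual bound:", best_bound)
```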
$\ell_p$-Box Optimization for Green Cloud-RAN via Network Adaptation
In this paper, we propose a reformulation of the Mixed Integer Programming
(MIP) problem into an exact and continuous model by using the $\ell_p$-box
technique to recast the binary constraints into a box intersected with an
$\ell_p$-sphere constraint. To solve the network power consumption problem of
the Cloud Radio Access Network (Cloud-RAN), the reformulated problem is
tackled by a dual ascent algorithm combined with a Majorization-Minimization
(MM) method for the subproblems, which leads to solving a sequence of
Difference of Convex (DC) subproblems handled by an inexact MM algorithm.
After obtaining the final solution, we use it as the initial point of the
bi-section Group Sparse Beamforming (GSBF) algorithm to promote the group
sparsity of the beamformers, rather than using the weighted
$\ell_1/\ell_2$-norm. Simulation results indicate that the new method
outperforms the bi-section GSBF algorithm by achieving lower network power
consumption, especially in sparser cases, i.e., Cloud-RANs with many Remote
Radio Heads (RRHs) but few users.
Comment: 4 pages, 4 figures
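The core reformulation trick can be stated precisely for $p = 2$: a vector
lies in $\{0,1\}^n$ exactly when it lies in the box $[0,1]^n$ and on the
sphere $\|x - \frac{1}{2}\mathbf{1}\|_2^2 = n/4$. The snippet below
(illustrative only; the paper works with a general $\ell_p$-sphere) just
checks this equivalence numerically.

```python
# Hedged sketch of the box-plus-sphere equivalence for p = 2 (the paper
# uses a general l_p-sphere): x is binary iff x lies in [0,1]^n and
# ||x - 0.5*1||_2^2 = n/4. For x in the box, (x_i - 0.5)^2 <= 1/4 with
# equality iff x_i is 0 or 1, so the sum hits n/4 only at binary points.
import numpy as np

def binary_via_box_and_sphere(x, tol=1e-9):
    n = x.size
    in_box = np.all((x >= -tol) & (x <= 1.0 + tol))
    on_sphere = abs(np.sum((x - 0.5) ** 2) - n / 4.0) <= tol
    return bool(in_box and on_sphere)

print(binary_via_box_and_sphere(np.array([0.0, 1.0, 1.0, 0.0])))  # True
print(binary_via_box_and_sphere(np.array([0.5, 1.0, 0.0, 0.0])))  # False
```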
A D.C. Programming Approach to the Sparse Generalized Eigenvalue Problem
In this paper, we consider the sparse eigenvalue problem wherein the goal is
to obtain a sparse solution to the generalized eigenvalue problem. We achieve
this by constraining the cardinality of the solution to the generalized
eigenvalue problem and obtain sparse principal component analysis (PCA), sparse
canonical correlation analysis (CCA) and sparse Fisher discriminant analysis
(FDA) as special cases. Unlike the $\ell_1$-norm approximation to the
cardinality constraint, which previous methods have used in the context of
sparse PCA, we propose a tighter approximation that is related to the negative
log-likelihood of a Student's t-distribution. The problem is then framed as a
d.c. (difference of convex functions) program and is solved as a sequence of
convex programs by invoking the majorization-minimization method. The resulting
algorithm is proved to exhibit \emph{global convergence} behavior, i.e., for
any random initialization, the sequence (subsequence) of iterates generated by
the algorithm converges to a stationary point of the d.c. program. The
performance of the algorithm is empirically demonstrated on both sparse PCA
(finding few relevant genes that explain as much variance as possible in a
high-dimensional gene dataset) and sparse CCA (cross-language document
retrieval and vocabulary selection for music retrieval) applications.
Comment: 40 pages
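To illustrate the MM mechanism for such concave sparsity surrogates, here is a
hedged sketch using the log penalty $\sum_i \log(\epsilon + |x_i|)$ on a toy
denoising objective: linearizing the concave penalty at the current iterate
majorizes the objective, so each MM step is a convex weighted-$\ell_1$ problem
with a closed-form soft-thresholding solution. The paper's Student's-t
surrogate and generalized eigenvalue constraints differ; everything below is
an illustrative assumption.

```python
# Hedged sketch of the MM mechanism (not the paper's Student's-t
# surrogate or eigenvalue setting): minimize the d.c. objective
#   0.5*||x - y||^2 + lam * sum_i log(eps + |x_i|)
# by linearizing the concave penalty at the current iterate; each MM
# step is then a weighted-l1 problem solved by soft thresholding.
import numpy as np

def mm_log_denoise(y, lam=0.5, eps=0.1, steps=30):
    x = y.copy()
    for _ in range(steps):
        w = 1.0 / (eps + np.abs(x))  # weights from the linear majorizer
        x = np.sign(y) * np.maximum(np.abs(y) - lam * w, 0.0)
    return x

y = np.array([2.0, -0.3, 0.05, 1.5, -0.02])
print(mm_log_denoise(y))  # small entries are driven exactly to zero
```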