
    Bethe Projections for Non-Local Inference

    Many inference problems in structured prediction are naturally solved by augmenting a tractable dependency structure with complex, non-local auxiliary objectives. This includes the mean field family of variational inference algorithms, soft- or hard-constrained inference using Lagrangian relaxation or linear programming, collective graphical models, and forms of semi-supervised learning such as posterior regularization. We present a method to discriminatively learn broad families of inference objectives, capturing powerful non-local statistics of the latent variables, while maintaining tractable and provably fast inference using non-Euclidean projected gradient descent with a distance-generating function given by the Bethe entropy. We demonstrate the performance and flexibility of our method by (1) extracting structured citations from research papers by learning soft global constraints, (2) achieving state-of-the-art results on a widely-used handwriting recognition task using a novel learned non-convex inference procedure, and (3) providing a fast and highly scalable algorithm for the challenging problem of inference in a collective graphical model applied to bird migration.
    Comment: minor bug fix to appendix; appeared in UAI 201
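    The workhorse here is mirror descent: projected gradient in the non-Euclidean geometry induced by an entropy. A minimal sketch of that idea is below, using plain Shannon entropy on a single probability simplex (the one-variable special case of the Bethe geometry) and a purely illustrative quadratic objective standing in for a learned non-local statistic; it is not the paper's model or code.

```python
import numpy as np

def entropy_mirror_step(mu, grad, eta):
    """One non-Euclidean projected-gradient (mirror descent) step on the
    probability simplex with negative entropy as the distance-generating
    function; this is the exponentiated-gradient update, and the Bregman
    projection back onto the simplex is just renormalization."""
    z = mu * np.exp(-eta * grad)   # unnormalized mirror update
    return z / z.sum()             # KL projection onto the simplex

# Illustrative non-local objective: 0.5 * (E_mu[phi] - target)^2, a
# quadratic penalty pulling a marginal expectation toward a target.
phi = np.array([0.0, 1.0])         # feature: indicator of state 1
target = 0.7

mu = np.full(2, 0.5)               # uniform initialization
for _ in range(200):
    grad = (phi @ mu - target) * phi
    mu = entropy_mirror_step(mu, grad, eta=0.5)
print(mu)                          # mu[1] approaches the target 0.7
```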

    Nonconvex Sparse Spectral Clustering by Alternating Direction Method of Multipliers and Its Convergence Analysis

    Spectral Clustering (SC) is a widely used data clustering method which first learns a low-dimensional embedding $U$ of the data by computing the eigenvectors of the normalized Laplacian matrix, and then performs k-means on $U^\top$ to get the final clustering result. The Sparse Spectral Clustering (SSC) method extends SC with a sparse regularization on $UU^\top$, motivated by the block-diagonal structure of $UU^\top$ in the ideal case. However, encouraging $UU^\top$ to be sparse leads to a heavily nonconvex problem which is challenging to solve, and the work (Lu, Yan, and Lin 2016) pursues this aim only indirectly through a convex relaxation. The convex relaxation, however, is generally a loose approximation, and the quality of its solution is unclear. This work instead solves the nonconvex formulation of SSC, which directly encourages $UU^\top$ to be sparse. We propose an efficient Alternating Direction Method of Multipliers (ADMM) for the nonconvex SSC and provide a convergence guarantee. In particular, we prove that the sequence generated by ADMM always has a limit point and that any limit point is a stationary point. Our analysis imposes no assumptions on the iterates and is thus practical. Our proposed ADMM for nonconvex problems allows the stepsize to be increasing but upper bounded, which makes it very efficient in practice. Experimental analysis on several real data sets verifies the effectiveness of our method.
    Comment: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 201
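    As a rough illustration of the splitting described above, the sketch below runs ADMM on an assumed formulation, min over orthonormal $U$ of $\langle L, UU^\top\rangle + \beta\|UU^\top\|_1$ with auxiliary variable $P = UU^\top$; the paper's exact objective, parameter schedule, and stopping rules may differ.

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise soft-thresholding, the proximal map of tau*||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def nonconvex_ssc_admm(L, k, beta=0.1, rho=1.0, rho_max=1e4,
                       gamma=1.05, iters=200):
    n = L.shape[0]
    P = np.zeros((n, n))
    Y = np.zeros((n, n))
    for _ in range(iters):
        # U-step: since ||U U^T||_F^2 = k is constant on the Stiefel
        # manifold, the subproblem reduces to min <L + Y - rho*P, U U^T>,
        # solved by the k eigenvectors with smallest eigenvalues.
        M = L + Y - rho * P
        M = 0.5 * (M + M.T)                    # symmetrize for stability
        _, V = np.linalg.eigh(M)               # ascending eigenvalues
        U = V[:, :k]
        UUt = U @ U.T
        # P-step: prox of the l1 term, evaluated at U U^T + Y / rho
        P = soft_threshold(UUt + Y / rho, beta / rho)
        # Dual update, with an increasing-but-bounded stepsize as the
        # abstract describes.
        Y = Y + rho * (UUt - P)
        rho = min(gamma * rho, rho_max)
    return U
```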

    A distributed primal-dual interior-point method for loosely coupled problems using ADMM

    In this paper we propose an efficient distributed algorithm for solving loosely coupled convex optimization problems. The algorithm is based on a primal-dual interior-point method in which the alternating direction method of multipliers (ADMM) is used to compute the primal-dual directions at each iteration. This enables us to combine the exceptional convergence properties of primal-dual interior-point methods with the remarkable parallelizability of ADMM. The resulting algorithm has superior computational properties compared with ADMM applied directly to our problem, and the amount of computation required of each computing agent is far smaller. In particular, the updates for all variables can be expressed in closed form, irrespective of the type of optimization problem. The most expensive computations of the algorithm occur in the updates of the primal variables and can be precomputed in each iteration of the interior-point method. We verify our method and compare it to ADMM in numerical experiments.
    Comment: extended version, 50 pages, 9 figures
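    The division of labor described above — closed-form agent updates with the expensive pieces factorized up front — can be sketched on a generic loosely coupled quadratic model solved by consensus ADMM; this stand-in is an assumption for illustration, not the paper's actual KKT system.

```python
import numpy as np

def consensus_admm_direction(Qs, cs, rho=1.0, iters=100):
    """Compute x minimizing sum_i 0.5 x^T Q_i x + c_i^T x by consensus
    ADMM across agents i. Each local update is in closed form, and the
    costly factorization of Q_i + rho*I is precomputed once."""
    n = Qs[0].shape[0]
    chols = [np.linalg.cholesky(Q + rho * np.eye(n)) for Q in Qs]
    z = np.zeros(n)
    xs = [np.zeros(n) for _ in Qs]
    us = [np.zeros(n) for _ in Qs]
    for _ in range(iters):
        for i, C in enumerate(chols):
            rhs = rho * (z - us[i]) - cs[i]
            xs[i] = np.linalg.solve(C.T, np.linalg.solve(C, rhs))
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)  # consensus
        for i in range(len(us)):
            us[i] += xs[i] - z                 # scaled dual update
    return z   # approximately solves (sum_i Q_i) x = -(sum_i c_i)
```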

    Lagrangian Relaxation for Mixed-Integer Linear Programming: Importance, Challenges, Recent Advancements, and Opportunities

    Operations in areas of importance to society are frequently modeled as Mixed-Integer Linear Programming (MILP) problems. While MILP problems suffer from combinatorial complexity, Lagrangian Relaxation has been a beacon of hope for resolving the associated difficulties through decomposition. Because Lagrangian dual functions are non-smooth, however, the coordination aspect of the method has posed serious challenges. This paper presents several significant historical milestones (beginning with Polyak's pioneering work in 1967) toward improving Lagrangian Relaxation coordination through improved optimization of non-smooth functionals, and then presents the most recent developments in Lagrangian Relaxation for the fast resolution of MILP problems. The paper also briefly discusses the opportunities that Lagrangian Relaxation can provide at this point in time.
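    To make the non-smooth coordination issue concrete: the Lagrangian dual of a MILP is a piecewise-linear concave function maximized by subgradient steps, and Polyak's steplength is the classical choice. The sketch below applies it to a generic binary program, min c^T x subject to Ax <= b with the coupling constraints relaxed; it is a textbook illustration under those assumptions, not any specific method from the paper.

```python
import numpy as np

def lagrangian_dual_subgradient(c, A, b, q_star, iters=200):
    """Maximize the Lagrangian dual q(lam) of
        min c^T x  s.t.  A x <= b,  x in {0,1}^n
    (coupling constraints relaxed with multipliers lam >= 0), using
    Polyak's steplength, which assumes an estimate q_star >= q(lam)
    of the optimal dual value is available."""
    lam = np.zeros(A.shape[0])
    q = -np.inf
    for _ in range(iters):
        red = c + A.T @ lam                    # reduced costs
        x = (red < 0).astype(float)            # solves the relaxation
        q = red @ x - lam @ b                  # dual value q(lam)
        g = A @ x - b                          # subgradient of q at lam
        if g @ g < 1e-18:
            break                              # lam is dual optimal
        step = max(q_star - q, 0.0) / (g @ g)  # Polyak steplength
        lam = np.maximum(lam + step * g, 0.0)  # project onto lam >= 0
    return lam, q
```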

    $L_2$-Box Optimization for Green Cloud-RAN via Network Adaptation

    In this paper, we propose a reformulation of the Mixed Integer Programming (MIP) problem into an exact and continuous model by using the $\ell_2$-box technique, which recasts the binary constraints into a box intersected with an $\ell_2$ sphere. Applied to the network power consumption problem of the Cloud Radio Access Network (Cloud-RAN), the reformulated problem can be tackled by a dual ascent algorithm that solves a sequence of Difference of Convex (DC) subproblems, each handled by an inexact Majorization-Minimization (MM) algorithm. After obtaining the final solution, we use it to initialize the bi-section Group Sparse Beamforming (GSBF) algorithm, promoting group sparsity of the beamformers without resorting to the weighted $\ell_1/\ell_2$-norm. Simulation results indicate that the new method outperforms the bi-section GSBF algorithm by achieving lower network power consumption, especially in sparser cases, i.e., Cloud-RANs with many Remote Radio Heads (RRHs) but few users.
    Comment: 4 pages, 4 figures
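    The key identity behind the $\ell_2$-box technique is that $x \in \{0,1\}^n$ is equivalent to $x \in [0,1]^n$ together with $\|x - \tfrac{1}{2}\mathbf{1}\|_2^2 = n/4$: inside the box every coordinate satisfies $|x_i - \tfrac{1}{2}| \le \tfrac{1}{2}$, so the sphere constraint forces each one to an endpoint. A small numerical check of this equivalence (not the Cloud-RAN algorithm itself) is below.

```python
import numpy as np

def in_l2_box(x, tol=1e-9):
    """Check the l2-box characterization of binary vectors:
    x in {0,1}^n  <=>  x in [0,1]^n  and  ||x - 0.5 * 1||^2 = n / 4."""
    box = np.all((x >= -tol) & (x <= 1.0 + tol))
    sphere = abs(np.sum((x - 0.5) ** 2) - x.size / 4.0) <= tol
    return bool(box and sphere)

print(in_l2_box(np.array([0.0, 1.0, 1.0, 0.0])))  # True: binary point
print(in_l2_box(np.array([0.3, 0.9, 1.0, 0.0])))  # False: fractional
```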

    A D.C. Programming Approach to the Sparse Generalized Eigenvalue Problem

    In this paper, we consider the sparse eigenvalue problem, wherein the goal is to obtain a sparse solution to the generalized eigenvalue problem. We achieve this by constraining the cardinality of the solution to the generalized eigenvalue problem, and we obtain sparse principal component analysis (PCA), sparse canonical correlation analysis (CCA) and sparse Fisher discriminant analysis (FDA) as special cases. Unlike the $\ell_1$-norm approximation to the cardinality constraint, which previous methods have used in the context of sparse PCA, we propose a tighter approximation related to the negative log-likelihood of a Student's t-distribution. The problem is then framed as a d.c. (difference of convex functions) program and solved as a sequence of convex programs by invoking the majorization-minimization method. The resulting algorithm is proved to exhibit \emph{global convergence} behavior, i.e., for any random initialization, the sequence (or a subsequence) of iterates generated by the algorithm converges to a stationary point of the d.c. program. The performance of the algorithm is empirically demonstrated on both sparse PCA (finding a few relevant genes that explain as much variance as possible in a high-dimensional gene dataset) and sparse CCA (cross-language document retrieval and vocabulary selection for music retrieval) applications.
    Comment: 40 pages
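    The majorization-minimization step is easiest to see on a toy problem: the concave log surrogate for cardinality is linearized at the current iterate, so each outer iteration solves a weighted-$\ell_1$ problem. The sketch below applies this to a least-squares stand-in (the paper applies the same scheme to the generalized eigenvalue problem); the surrogate $\sum_i \log(1 + |x_i|/\epsilon)$ and all parameters are illustrative assumptions.

```python
import numpy as np

def soft(x, t):
    """Elementwise soft-thresholding, the prox of a weighted l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def mm_log_sparse(A, b, rho=0.1, eps=1e-2, outer=20, inner=100):
    """Majorization-minimization for the d.c. toy problem
        min_x 0.5 * ||A x - b||^2 + rho * sum_i log(1 + |x_i| / eps).
    Each outer step majorizes the concave log term by its linearization
    at x^k, yielding a weighted-l1 subproblem solved here by ISTA; the
    original objective is non-increasing across outer iterations."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant
    for _ in range(outer):
        w = rho / (eps + np.abs(x))            # MM weights from x^k
        for _ in range(inner):                 # ISTA on the surrogate
            grad = A.T @ (A @ x - b)
            x = soft(x - step * grad, step * w)
    return x
```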