An Inexact Successive Quadratic Approximation Method for Convex L-1 Regularized Optimization
We study a Newton-like method for the minimization of an objective function
that is the sum of a smooth convex function and an l-1 regularization term.
This method, which is sometimes referred to in the literature as a proximal
Newton method, computes a step by minimizing a piecewise quadratic model of the
objective function. In order to make this approach efficient in practice, it is
imperative to perform this inner minimization inexactly. In this paper, we give
inexactness conditions that guarantee global convergence and that can be used
to control the local rate of convergence of the iteration. Our inexactness
conditions are based on a semi-smooth function that represents a (continuous)
measure of the optimality conditions of the problem, and that embodies the
soft-thresholding iteration. We give careful consideration to the algorithm
employed for the inner minimization, and report numerical results on two test
sets originating in machine learning.
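To make the inexactness condition concrete, here is a minimal sketch in the spirit of the abstract, not the paper's exact algorithm: the smooth part f is assumed to be a least-squares loss, the piecewise quadratic model is minimized by plain ISTA, and the inner loop stops once its own soft-thresholding residual falls below a fraction eta of the outer residual (one common instance of such conditions). All names and parameters (soft_threshold, optimality_residual, eta) are illustrative choices, and the line search a practical method would use is omitted.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def optimality_residual(x, grad, lam):
    """Semi-smooth optimality measure x - prox_{lam*|.|_1}(x - grad);
    it vanishes exactly at minimizers of f(x) + lam*||x||_1."""
    return x - soft_threshold(x - grad, lam)

def inexact_prox_newton(A, b, lam, eta=0.1, tol=1e-6, max_outer=50):
    """Sketch: minimize 0.5*||Ax - b||^2 + lam*||x||_1 inexactly."""
    n = A.shape[1]
    x = np.zeros(n)
    H = A.T @ A                          # exact Hessian of the smooth part
    L = np.linalg.norm(H, 2)             # Lipschitz constant for inner ISTA
    for _ in range(max_outer):
        grad = A.T @ (A @ x - b)
        r_out = optimality_residual(x, grad, lam)
        if np.linalg.norm(r_out) <= tol:
            break
        # Inner problem: min_d grad.T d + 0.5*d.T H d + lam*||x + d||_1,
        # solved *inexactly* by ISTA on the piecewise quadratic model.
        d = np.zeros(n)
        for _ in range(200):
            d = soft_threshold(x + d - (grad + H @ d) / L, lam / L) - x
            r_in = (x + d) - soft_threshold((x + d) - (grad + H @ d), lam)
            if np.linalg.norm(r_in) <= eta * np.linalg.norm(r_out):
                break                    # inexactness condition satisfied
        x = x + d                        # a practical code would line-search
    return x
```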
Practical Inexact Proximal Quasi-Newton Method with Global Complexity Analysis
Recently, several methods have been proposed for sparse optimization that make
careful use of second-order information [10, 28, 16, 3] to improve local
convergence rates. These methods construct a composite quadratic approximation
using Hessian information, optimize this approximation with a first-order method such as coordinate descent, and employ a line search to ensure sufficient descent. Here we propose a general framework, which includes
slightly modified versions of existing algorithms and also a new algorithm,
which uses limited memory BFGS Hessian approximations, and provide a novel
global convergence rate analysis, which covers methods that solve subproblems
via coordinate descent.
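The closed-form coordinate updates such subproblem solvers rely on can be sketched as follows. This is a hedged illustration: a dense positive-semidefinite matrix H with positive diagonal stands in for the limited-memory BFGS approximation (a real L-BFGS code would keep H implicit through curvature pairs), and the line search that safeguards the resulting step is left out.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_subproblem(g, H, x, lam, sweeps=20):
    """Approximately solve min_d g.T d + 0.5*d.T H d + lam*||x + d||_1
    by cyclic coordinate descent; H is assumed PSD with positive diagonal."""
    n = len(g)
    d = np.zeros(n)
    Hd = np.zeros(n)                     # maintain H @ d incrementally
    for _ in range(sweeps):
        for j in range(n):
            a = H[j, j]
            c = g[j] + Hd[j] - a * d[j]  # model gradient at d_j = 0
            w = soft_threshold(x[j] - c / a, lam / a)  # new value of x_j + d_j
            delta = (w - x[j]) - d[j]
            if delta != 0.0:
                Hd += delta * H[:, j]
                d[j] += delta
    return d                             # step, to be vetted by a line search
```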
Distributed Multi-Task Relationship Learning
Multi-task learning aims to learn multiple tasks jointly by exploiting their
relatedness to improve the generalization performance for each task.
Traditionally, to perform multi-task learning, one needs to centralize data
from all the tasks to a single machine. However, in many real-world
applications, data of different tasks may be geo-distributed over different
local machines. Due to the heavy communication cost of transmitting the data, as well as data privacy and security concerns, it is often infeasible to send the data of different tasks to a master machine to perform multi-task learning. Therefore,
in this paper, we propose a distributed multi-task learning framework that
learns the predictive models for the individual tasks and the relationships between tasks in an alternating fashion within the parameter-server paradigm. In
our framework, we first offer a general dual form for a family of regularized
multi-task relationship learning methods. Subsequently, we propose a
communication-efficient primal-dual distributed optimization algorithm to solve
the dual problem by carefully designing local subproblems to make the dual
problem decomposable. Moreover, we provide a theoretical convergence analysis
for the proposed algorithm, which is specific for distributed multi-task
relationship learning. We conduct extensive experiments on both synthetic and
real-world datasets to evaluate our proposed framework in terms of
effectiveness and convergence.
Comment: To appear in KDD 2017
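As a rough illustration of the alternating scheme, the following single-process simulation plays both roles: "workers" update per-task models on local data while a "server" re-estimates the task-relationship matrix. The trace regularizer and the closed-form Omega update are borrowed from the classic multi-task relationship learning (MTRL) formulation that such dual forms build on; this is not the paper's communication-efficient primal-dual algorithm, and all names and step sizes are assumptions.

```python
import numpy as np
from scipy.linalg import sqrtm

def fit_mtrl(tasks, lam=0.1, lr=0.01, rounds=50):
    """tasks: list of (X_t, y_t) pairs, one per (geo-distributed) machine."""
    T, d = len(tasks), tasks[0][0].shape[1]
    W = np.zeros((d, T))                 # column t = model of task t
    Omega = np.eye(T) / T                # task-relationship matrix (server)
    for _ in range(rounds):
        Oinv = np.linalg.inv(Omega + 1e-8 * np.eye(T))
        # Worker step: task t updates w_t on its local data, pulling only
        # the coupling vector W @ Oinv[:, t] from the server.
        for t, (X, y) in enumerate(tasks):
            grad = X.T @ (X @ W[:, t] - y) + lam * (W @ Oinv[:, t])
            W[:, t] -= lr * grad
        # Server step: closed-form MTRL update of the task relationships.
        S = np.real(sqrtm(W.T @ W + 1e-8 * np.eye(T)))
        Omega = S / np.trace(S)
    return W, Omega
```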
Local-Aggregate Modeling for Big-Data via Distributed Optimization: Applications to Neuroimaging
Technological advances have led to a proliferation of structured big data
that have matrix-valued covariates. We are specifically motivated to build
predictive models for multi-subject neuroimaging data based on each subject's
brain imaging scans. This is an ultra-high-dimensional problem that consists of
a matrix of covariates (brain locations by time points) for each subject; few
methods currently exist to fit supervised models directly to this tensor data.
We propose a novel modeling and algorithmic strategy to apply generalized
linear models (GLMs) to this massive tensor data in which one set of variables
is associated with locations. Our method begins by fitting GLMs to each
location separately, and then builds an ensemble by blending information across
locations through regularization with what we term an aggregating penalty. Our
so-called Local-Aggregate Model can be fit in a completely distributed manner
over the locations using an Alternating Direction Method of Multipliers (ADMM)
strategy, and thus greatly reduces the computational burden. Furthermore, we
propose to select the appropriate model through a novel sequence of faster algorithmic solutions, akin to regularization paths. We
demonstrate both the computational and predictive modeling advantages of our
methods via simulations and an EEG classification problem.
Comment: 41 pages, 5 figures, and 3 tables
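A minimal ADMM sketch of the local-aggregate idea follows: each location solves its own ridge-like GLM subproblem in parallel (the distributed "local" step), and a global proximal step couples the locations. The aggregating penalty here is a group lasso across locations, chosen purely for illustration; the paper's actual aggregating penalty may differ, and all function names are hypothetical.

```python
import numpy as np

def block_soft(V, t):
    """Column-wise group soft-thresholding: shrink each column's 2-norm by t."""
    norms = np.maximum(np.linalg.norm(V, axis=0, keepdims=True), 1e-12)
    return V * np.maximum(1.0 - t / norms, 0.0)

def local_aggregate_admm(Xs, y, gamma=1.0, rho=1.0, iters=100):
    """Xs: list of (n x p) designs, one per location; y: shared response.
    Fits per-location coefficients coupled by an (assumed) group-lasso
    aggregating penalty gamma * sum_j ||B[:, j]||_2 via ADMM (B = Z split)."""
    L, p = len(Xs), Xs[0].shape[1]
    B, Z, U = (np.zeros((L, p)) for _ in range(3))
    # Pre-factor each location's regularized normal equations once.
    chols = [np.linalg.cholesky(X.T @ X + rho * np.eye(p)) for X in Xs]
    for _ in range(iters):
        # Local step: each location solves its own GLM subproblem in parallel.
        for l, X in enumerate(Xs):
            rhs = X.T @ y + rho * (Z[l] - U[l])
            B[l] = np.linalg.solve(chols[l].T, np.linalg.solve(chols[l], rhs))
        # Aggregate step: blend information across locations via the penalty.
        Z = block_soft(B + U, gamma / rho)
        U += B - Z                       # scaled dual-multiplier update
    return Z
```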