2,199 research outputs found
Sparse Bilinear Logistic Regression
In this paper, we introduce the concept of sparse bilinear logistic
regression for decision problems involving explanatory variables that are
two-dimensional matrices. Such problems are common in computer vision,
brain-computer interfaces, style/content factorization, and parallel factor
analysis. The underlying optimization problem is bi-convex; we study its
solution and develop an efficient algorithm based on block coordinate descent.
We provide a theoretical guarantee for global convergence and estimate the
asymptotical convergence rate using the Kurdyka-{\L}ojasiewicz inequality. A
range of experiments with simulated and real data demonstrate that sparse
bilinear logistic regression outperforms current techniques in several
important applications.Comment: 27 pages, 5 figure
FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction
Advertising and feed ranking are essential to many Internet companies such as
Facebook and Sina Weibo. Among many real-world advertising and feed ranking
systems, click through rate (CTR) prediction plays a central role. There are
many proposed models in this field such as logistic regression, tree based
models, factorization machine based models and deep learning based CTR models.
However, many current works calculate the feature interactions in a simple way
such as Hadamard product and inner product and they care less about the
importance of features. In this paper, a new model named FiBiNET as an
abbreviation for Feature Importance and Bilinear feature Interaction NETwork is
proposed to dynamically learn the feature importance and fine-grained feature
interactions. On the one hand, the FiBiNET can dynamically learn the importance
of features via the Squeeze-Excitation network (SENET) mechanism; on the other
hand, it is able to effectively learn the feature interactions via bilinear
function. We conduct extensive experiments on two real-world datasets and show
that our shallow model outperforms other shallow models such as factorization
machine(FM) and field-aware factorization machine(FFM). In order to improve
performance further, we combine a classical deep neural network(DNN) component
with the shallow model to be a deep model. The deep FiBiNET consistently
outperforms the other state-of-the-art deep models such as DeepFM and extreme
deep factorization machine(XdeepFM).Comment: 8 pages,5 figure
Supervised Dictionary Learning
It is now well established that sparse signal models are well suited to
restoration tasks and can effectively be learned from audio, image, and video
data. Recent research has been aimed at learning discriminative sparse models
instead of purely reconstructive ones. This paper proposes a new step in that
direction, with a novel sparse representation for signals belonging to
different classes in terms of a shared dictionary and multiple class-decision
functions. The linear variant of the proposed model admits a simple
probabilistic interpretation, while its most general variant admits an
interpretation in terms of kernels. An optimization framework for learning all
the components of the proposed model is presented, along with experimental
results on standard handwritten digit and texture classification tasks
Block stochastic gradient iteration for convex and nonconvex optimization
The stochastic gradient (SG) method can minimize an objective function
composed of a large number of differentiable functions, or solve a stochastic
optimization problem, to a moderate accuracy. The block coordinate
descent/update (BCD) method, on the other hand, handles problems with multiple
blocks of variables by updating them one at a time; when the blocks of
variables are easier to update individually than together, BCD has a lower
per-iteration cost. This paper introduces a method that combines the features
of SG and BCD for problems with many components in the objective and with
multiple (blocks of) variables.
Specifically, a block stochastic gradient (BSG) method is proposed for
solving both convex and nonconvex programs. At each iteration, BSG approximates
the gradient of the differentiable part of the objective by randomly sampling a
small set of data or sampling a few functions from the sum term in the
objective, and then, using those samples, it updates all the blocks of
variables in either a deterministic or a randomly shuffled order. Its
convergence for both convex and nonconvex cases are established in different
senses. In the convex case, the proposed method has the same order of
convergence rate as the SG method. In the nonconvex case, its convergence is
established in terms of the expected violation of a first-order optimality
condition. The proposed method was numerically tested on problems including
stochastic least squares and logistic regression, which are convex, as well as
low-rank tensor recovery and bilinear logistic regression, which are nonconvex
Hybrid approximate message passing
Gaussian and quadratic approximations of message passing algorithms on graphs have attracted considerable recent attention due to their computational simplicity, analytic tractability, and wide applicability in optimization and statistical inference problems. This paper presents a systematic framework for incorporating such approximate message passing (AMP) methods in general graphical models. The key concept is a partition of dependencies of a general graphical model into strong and weak edges, with the weak edges representing interactions through aggregates of small, linearizable couplings of variables. AMP approximations based on the Central Limit Theorem can be readily applied to aggregates of many weak edges and integrated with standard message passing updates on the strong edges. The resulting algorithm, which we call hybrid generalized approximate message passing (HyGAMP), can yield significantly simpler implementations of sum-product and max-sum loopy belief propagation. By varying the partition of strong and weak edges, a performance--complexity trade-off can be achieved. Group sparsity and multinomial logistic regression problems are studied as examples of the proposed methodology.The work of S. Rangan was supported in part by the National Science Foundation under Grants 1116589, 1302336, and 1547332, and in part by the industrial affiliates of NYU WIRELESS. The work of A. K. Fletcher was supported in part by the National Science Foundation under Grants 1254204 and 1738286 and in part by the Office of Naval Research under Grant N00014-15-1-2677. The work of V. K. Goyal was supported in part by the National Science Foundation under Grant 1422034. The work of E. Byrne and P. Schniter was supported in part by the National Science Foundation under Grant CCF-1527162. (1116589 - National Science Foundation; 1302336 - National Science Foundation; 1547332 - National Science Foundation; 1254204 - National Science Foundation; 1738286 - National Science Foundation; 1422034 - National Science Foundation; CCF-1527162 - National Science Foundation; NYU WIRELESS; N00014-15-1-2677 - Office of Naval Research
An Efficient Primal-Dual Prox Method for Non-Smooth Optimization
We study the non-smooth optimization problems in machine learning, where both
the loss function and the regularizer are non-smooth functions. Previous
studies on efficient empirical loss minimization assume either a smooth loss
function or a strongly convex regularizer, making them unsuitable for
non-smooth optimization. We develop a simple yet efficient method for a family
of non-smooth optimization problems where the dual form of the loss function is
bilinear in primal and dual variables. We cast a non-smooth optimization
problem into a minimax optimization problem, and develop a primal dual prox
method that solves the minimax optimization problem at a rate of
{assuming that the proximal step can be efficiently solved}, significantly
faster than a standard subgradient descent method that has an
convergence rate. Our empirical study verifies the efficiency of the proposed
method for various non-smooth optimization problems that arise ubiquitously in
machine learning by comparing it to the state-of-the-art first order methods
Compact Bilinear Pooling
Bilinear models has been shown to achieve impressive performance on a wide
range of visual tasks, such as semantic segmentation, fine grained recognition
and face recognition. However, bilinear features are high dimensional,
typically on the order of hundreds of thousands to a few million, which makes
them impractical for subsequent analysis. We propose two compact bilinear
representations with the same discriminative power as the full bilinear
representation but with only a few thousand dimensions. Our compact
representations allow back-propagation of classification errors enabling an
end-to-end optimization of the visual recognition system. The compact bilinear
representations are derived through a novel kernelized analysis of bilinear
pooling which provide insights into the discriminative power of bilinear
pooling, and a platform for further research in compact pooling methods.
Experimentation illustrate the utility of the proposed representations for
image classification and few-shot learning across several datasets.Comment: Camera ready version for CVP
- …