LINE: Large-scale Information Network Embedding
This paper studies the problem of embedding very large information networks
into low-dimensional vector spaces, which is useful in many tasks such as
visualization, node classification, and link prediction. Most existing graph
embedding methods do not scale to real-world information networks, which
usually contain millions of nodes. In this paper, we propose a novel network
embedding method called the "LINE," which is suitable for arbitrary types of
information networks: undirected, directed, and/or weighted. The method
optimizes a carefully designed objective function that preserves both the local
and global network structures. An edge-sampling algorithm is proposed that
addresses the limitation of the classical stochastic gradient descent and
improves both the effectiveness and the efficiency of the inference. Empirical
experiments demonstrate the effectiveness of the LINE on a variety of real-world
information networks, including language networks, social networks, and
citation networks. The algorithm is very efficient: it can learn the
embedding of a network with millions of vertices and billions of edges in a few
hours on a typical single machine. The source code of the LINE is available
online.
Comment: WWW 201
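The edge-sampling idea mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: edges are drawn with probability proportional to their weight and then treated as unit-weight edges, so no single update is scaled by a large weight. The update rule (a sigmoid score on the dot product, with uniformly drawn negative samples) and all names such as `sample_edges` and `train_first_order` are assumptions made for this example; the abstract does not specify them.

```python
import math
import random


def sample_edges(edges, weights, num_samples, rng):
    """Draw edges with probability proportional to their weights."""
    return rng.choices(edges, weights=weights, k=num_samples)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_first_order(edges, weights, num_nodes, dim=8, steps=2000,
                      lr=0.025, negatives=2, seed=0):
    """Toy embedding trainer (hypothetical): one unit-weight update per sampled edge."""
    rng = random.Random(seed)
    emb = [[(rng.random() - 0.5) / dim for _ in range(dim)]
           for _ in range(num_nodes)]
    for u, v in sample_edges(edges, weights, steps, rng):
        # One positive target plus a few uniformly drawn negatives (an assumption).
        targets = [(v, 1.0)]
        targets += [(rng.randrange(num_nodes), 0.0) for _ in range(negatives)]
        for t, label in targets:
            if t == u:
                continue
            score = sigmoid(sum(a * b for a, b in zip(emb[u], emb[t])))
            g = lr * (label - score)  # edge weight does not appear here
            for k in range(dim):
                eu, et = emb[u][k], emb[t][k]
                emb[u][k] += g * et
                emb[t][k] += g * eu
    return emb


if __name__ == "__main__":
    toy_edges = [(0, 1), (1, 2), (2, 3)]
    toy_weights = [1.0, 50.0, 1.0]
    vectors = train_first_order(toy_edges, toy_weights, num_nodes=4)
    print(len(vectors), len(vectors[0]))
```

Because the weight only affects how often an edge is drawn, the step size can stay the same for light and heavy edges.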
Efficient First Order Methods for Linear Composite Regularizers
A wide class of regularization problems in machine learning and statistics
employ a regularization term which is obtained by composing a simple convex
function \omega with a linear transformation. This setting includes Group Lasso
methods, the Fused Lasso and other total variation methods, multi-task learning
methods and many more. In this paper, we present a general approach for
computing the proximity operator of this class of regularizers, under the
assumption that the proximity operator of the function \omega is known in
advance. Our approach builds on a recent line of research on optimal
first-order optimization methods and uses fixed-point iterations for numerically
computing the proximity operator. It is more general than current approaches
and, as we show with numerical simulations, computationally more efficient than
available first-order methods that do not achieve the optimal rate. In
particular, our method outperforms state-of-the-art O(1/T) methods for
overlapping Group Lasso and matches optimal O(1/T^2) methods for the Fused
Lasso and tree-structured Group Lasso.
Comment: 19 pages, 8 figures
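As an illustration of computing the proximity operator of a composite regularizer by fixed-point iteration, the sketch below handles one concrete case: \omega = lam * ||.||_1 composed with the first-difference operator B, which gives a one-dimensional Fused Lasso (total variation) penalty. It uses a standard dual projected-gradient iteration and is not claimed to be the paper's exact scheme. The only ingredient tied to \omega is its known proximity operator (soft thresholding); the function names, the step size 0.25 (valid because ||B||^2 <= 4 for first differences), and the iteration count are assumptions for this example.

```python
import math


def soft_threshold(z, lam):
    """Known proximity operator of lam * ||.||_1, applied elementwise."""
    return [math.copysign(max(abs(zi) - lam, 0.0), zi) for zi in z]


def diff(x):
    """Apply B: first differences, (Bx)[i] = x[i+1] - x[i]."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def diff_transpose(v, n):
    """Apply B^T to a vector v of length n - 1."""
    out = [0.0] * n
    for i, vi in enumerate(v):
        out[i] -= vi
        out[i + 1] += vi
    return out


def prox_composite(x, lam, iters=500, step=0.25):
    """Approximate prox of lam * ||Bx||_1 at x via a dual fixed-point iteration."""
    n = len(x)
    v = [0.0] * (n - 1)
    for _ in range(iters):
        # Gradient of 0.5 * ||B^T v - x||^2 with respect to v.
        resid = [b - xi for b, xi in zip(diff_transpose(v, n), x)]
        z = [vi - step * gi for vi, gi in zip(v, diff(resid))]
        # Moreau decomposition: z - prox_omega(z) projects onto |v_i| <= lam.
        v = [zi - si for zi, si in zip(z, soft_threshold(z, lam))]
    return [xi - bi for xi, bi in zip(x, diff_transpose(v, n))]


if __name__ == "__main__":
    print(prox_composite([0.0, 3.0], 1.0))
```

For the two-point signal `[0.0, 3.0]` with `lam = 1.0`, each value moves one unit toward the other; with a large `lam` both values collapse to their mean.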
Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data
Image data are increasingly encountered and are of growing importance in many
areas of science. Much of these data are quantitative image data, which are
characterized by intensities that represent some measurement of interest in the
scanned images. The data typically consist of multiple images on the same
domain and the goal of the research is to combine the quantitative information
across images to make inference about populations or interventions. In this
paper we present a unified analysis framework for the analysis of quantitative
image data using a Bayesian functional mixed model approach. This framework is
flexible enough to handle complex, irregular images with many local features,
and can model the simultaneous effects of multiple factors on the image
intensities and account for the correlation between images induced by the
design. We introduce a general isomorphic modeling approach to fitting the
functional mixed model, of which the wavelet-based functional mixed model is
one special case. With suitable modeling choices, this approach leads to
efficient calculations and can result in flexible modeling and adaptive
smoothing of the salient features in the data. The proposed method has the
following advantages: it can be run automatically, it produces inferential
plots indicating which regions of the image are associated with each factor, it
simultaneously considers the practical and statistical significance of
findings, and it controls the false discovery rate.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS407 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
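One way to picture the "isomorphic" idea is a lossless, invertible transform: data are mapped to a basis space, modeled there, and mapped back without losing information. The sketch below shows only that round trip, using an orthonormal Haar wavelet transform on a one-dimensional signal whose length is a power of two. It is an assumption-laden illustration, not the paper's model: the functional mixed model, the Bayesian fitting, and the false discovery rate control are not implemented, and the names `haar_forward` and `haar_inverse` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """Orthonormal Haar transform; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(v) for v in x]
    while n > 1:
        half = n // 2
        approx = [(out[2 * i] + out[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / SQRT2 for i in range(half)]
        out[:n] = approx + detail  # coarse part first, details after it
        n = half
    return out


def haar_inverse(coeffs):
    """Invert haar_forward, recovering the original signal."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        rebuilt = []
        for a, d in zip(out[:n], out[n:2 * n]):
            rebuilt.append((a + d) / SQRT2)
            rebuilt.append((a - d) / SQRT2)
        out[:2 * n] = rebuilt
        n *= 2
    return out


if __name__ == "__main__":
    signal = [4.0, 2.0, 5.0, 5.0]
    coeffs = haar_forward(signal)
    print(coeffs)
    print(haar_inverse(coeffs))
```

Because the transform is orthonormal, the sum of squares is the same in both spaces, so results obtained on the coefficients can be mapped back to the original domain.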