Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation
As opposed to manual feature engineering which is tedious and difficult to
scale, network representation learning has attracted a surge of research
interests as it automates the process of feature learning on graphs. The
learned low-dimensional node vector representation is generalizable and eases
the knowledge discovery process on graphs by enabling various off-the-shelf
machine learning tools to be directly applied. Recent research has shown that
the past decade of network embedding approaches either explicitly factorize a
carefully designed matrix to obtain the low-dimensional node vector
representation or are closely related to implicit matrix factorization, with
the fundamental assumption that the factorized node connectivity matrix is
low-rank. Nonetheless, the global low-rank assumption does not necessarily hold
especially when the factorized matrix encodes complex node interactions, and
the resultant single low-rank embedding matrix is insufficient to capture all
the observed connectivity patterns. In this regard, we propose a novel
multi-level network embedding framework BoostNE, which can learn multiple
network embedding representations of different granularity from coarse to fine
without imposing the prevalent global low-rank assumption. The proposed BoostNE
method is also in line with the successful gradient boosting method in ensemble
learning as multiple weak embeddings lead to a stronger and more effective one.
We assess the effectiveness of the proposed BoostNE framework by comparing it
with existing state-of-the-art network embedding methods on various datasets,
and the experimental results corroborate the superiority of the proposed
BoostNE network embedding framework.
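The boosting idea behind BoostNE can be sketched as repeatedly fitting a small low-rank factorization to the residual left by the previous level and concatenating the per-level factors. The sketch below is a hypothetical re-implementation of that idea using truncated SVD as the per-level factorizer, not the authors' code (they factorize a connectivity matrix with NMF-style machinery); shapes and hyperparameters are illustrative.

```python
import numpy as np

def boosted_embeddings(M, n_levels=3, rank=2):
    """Multi-level embedding via boosted low-rank residual fitting
    (illustrative sketch). Each level fits a rank-`rank` SVD to the
    current residual; per-level factors are concatenated coarse to fine."""
    residual = M.astype(float).copy()
    parts = []
    for _ in range(n_levels):
        U, s, Vt = np.linalg.svd(residual, full_matrices=False)
        U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank]
        parts.append(U_r * np.sqrt(s_r))          # node embedding at this level
        residual = residual - (U_r * s_r) @ Vt_r  # boost: fit what is left over
    return np.hstack(parts)

rng = np.random.default_rng(0)
M = rng.random((8, 8))                            # toy "connectivity" matrix
E = boosted_embeddings(M, n_levels=3, rank=2)
print(E.shape)  # (8, 6): three rank-2 levels concatenated
```

No single level needs to capture all connectivity patterns; only the concatenation does, which is how the global low-rank assumption is relaxed.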
Online Machine Learning in Big Data Streams
The area of online machine learning in big data streams covers algorithms
that are (1) distributed and (2) work from data streams with only a limited
possibility to store past data. The first requirement mostly concerns software
architectures and efficient algorithms. The second one also imposes nontrivial
theoretical restrictions on the modeling methods: In the data stream model,
older data is no longer available to revise earlier suboptimal modeling
decisions as fresh data arrives.
In this article, we provide an overview of distributed software architectures
and libraries as well as machine learning models for online learning. We
highlight the most important ideas for classification, regression,
recommendation, and unsupervised modeling from streaming data, and we show how
they are implemented in various distributed data stream processing systems.
This article is a reference material and not a survey. We do not attempt to
be comprehensive in describing all existing methods and solutions; rather, we
give pointers to the most important resources in the field. The related
sub-fields of online algorithms, online learning, and distributed data
processing are all highly active areas of research and development, with
conceptually new results and software components emerging at the time of
writing. In
this article, we refer to several survey results, both for distributed data
processing and for online machine learning. Compared to past surveys, our
article differs in that we discuss recommender systems in extended detail.
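The data stream model described above can be made concrete with a minimal single-pass learner: constant memory, each example seen exactly once, no access to past data to revise earlier decisions. This is a generic sketch of online SGD regression, not any specific library's API.

```python
import numpy as np

def stream_sgd(stream, dim, lr=0.01):
    """Online linear regression in the data stream model: one pass,
    constant memory, past examples are never revisited (sketch)."""
    w = np.zeros(dim)
    for x, y in stream:        # each example is consumed exactly once
        err = w @ x - y
        w -= lr * err * x      # SGD step on the squared error
    return w

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
# A generator: examples are produced and discarded, never stored.
stream = ((x, x @ true_w) for x in rng.normal(size=(5000, 2)))
w = stream_sgd(stream, dim=2)
```

The generator never materializes the data set, which is exactly the storage restriction the stream model imposes.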
Factorization tricks for LSTM networks
We present two simple ways of reducing the number of parameters and
accelerating the training of large Long Short-Term Memory (LSTM) networks: the
first one is "matrix factorization by design" of the LSTM matrix into the
product of two smaller matrices, and the second one is partitioning of the
LSTM matrix, its inputs, and states into independent groups. Both approaches
allow us to train large LSTM networks significantly faster to near
state-of-the-art perplexity while using significantly fewer RNN parameters.
Comment: accepted to ICLR 2017 Workshop
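The parameter saving from "factorization by design" is easy to quantify. The count below assumes input size equals hidden size n, so the single LSTM matrix maps the concatenated [input, state] (size 2n) to the four gate pre-activations (size 4n); these are illustrative counts under that assumption, not figures from the paper.

```python
def lstm_params(n, r=None):
    """Parameters in the combined LSTM weight matrix (assuming input
    size == hidden size n). With factorization by design, the 4n x 2n
    matrix becomes the product of a 4n x r and an r x 2n matrix."""
    if r is None:
        return 4 * n * 2 * n        # full matrix: 8 n^2
    return 4 * n * r + r * 2 * n    # two factors: 6 n r

full = lstm_params(1024)
fact = lstm_params(1024, r=256)
print(full, fact, round(fact / full, 4))  # 8388608 1572864 0.1875
```

With r much smaller than n, the factorized form keeps only a fraction of the parameters, which is where the training speedup comes from.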
Topic Compositional Neural Language Model
We propose a Topic Compositional Neural Language Model (TCNLM), a novel
method designed to simultaneously capture both the global semantic meaning and
the local word ordering structure in a document. The TCNLM learns the global
semantic coherence of a document via a neural topic model, and the probability
of each learned latent topic is further used to build a Mixture-of-Experts
(MoE) language model, where each expert (corresponding to one topic) is a
recurrent neural network (RNN) that accounts for learning the local structure
of a word sequence. In order to train the MoE model efficiently, a matrix
factorization method is applied, by extending each weight matrix of the RNN to
be an ensemble of topic-dependent weight matrices. The degree to which each
member of the ensemble is used is tied to the document-dependent probability of
the corresponding topics. Experimental results on several corpora show that the
proposed approach outperforms both a pure RNN-based model and other
topic-guided language models. Further, our model yields sensible topics, and
also has the capacity to generate meaningful sentences conditioned on given
topics.
Comment: To appear in AISTATS 2018, updated version
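The ensemble construction in the abstract, where each RNN weight matrix becomes a mixture of topic-dependent matrices weighted by the document's topic probabilities, can be sketched directly. Shapes and names here are illustrative, not the paper's notation.

```python
import numpy as np

def topic_weight(theta, W_topics):
    """Document-specific weight matrix as a topic-probability-weighted
    sum of per-topic expert matrices (Mixture-of-Experts sketch).
    theta: (K,) topic probabilities; W_topics: (K, out, in)."""
    return np.tensordot(theta, W_topics, axes=1)

rng = np.random.default_rng(2)
W_topics = rng.normal(size=(3, 4, 5))   # K=3 experts, each a 4x5 matrix
theta = np.array([0.7, 0.2, 0.1])       # from the neural topic model
W_doc = topic_weight(theta, W_topics)
print(W_doc.shape)  # (4, 5)
```

Because the mixture collapses to one matrix per document, the RNN forward pass costs the same as a single expert.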
Latent Variable Modeling with Diversity-Inducing Mutual Angular Regularization
Latent Variable Models (LVMs) are a large family of machine learning models
providing a principled and effective way to extract underlying patterns,
structure and knowledge from observed data. Due to the dramatic growth of
volume and complexity of data, several new challenges have emerged and cannot
be effectively addressed by existing LVMs: (1) How to capture long-tail
patterns that carry crucial information when the popularity of patterns is
distributed in a power-law fashion? (2) How to reduce model complexity and
computational cost without compromising the modeling power of LVMs? (3) How to
improve the interpretability and reduce the redundancy of discovered patterns?
To address the three challenges discussed above, we develop a novel
regularization technique for LVMs that controls the geometry of the latent
space during learning, encouraging the learned latent components to be
mutually different from one another. This diversity yields long-tail
coverage, low redundancy, and better interpretability. We propose a mutual
angular regularizer (MAR) to encourage
the components in LVMs to have larger mutual angles. The MAR is non-convex and
non-smooth, entailing great challenges for optimization. To cope with this
issue, we derive a smooth lower bound of the MAR and optimize the lower bound
instead. We show that the lower bound varies monotonically with the MAR,
which qualifies it as a desirable surrogate.
Using a neural network (NN) as an instance, we analyze how the MAR affects
the generalization performance of the NN. On two popular latent variable
models, the restricted Boltzmann machine and distance metric learning, we
demonstrate that the MAR can effectively capture long-tail patterns, reduce
model complexity without sacrificing expressivity, and improve
interpretability.
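A mutual-angle score of the kind the MAR promotes can be computed directly from the component vectors. The function below is our simplification for illustration (mean pairwise angle), not the paper's exact regularizer.

```python
import numpy as np

def mutual_angular_reg(A):
    """Mean pairwise angle (radians) between latent components; larger
    values mean more diverse components. Row i of A is one component.
    A simplified sketch of the mutual angular regularizer."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    cos = np.clip(A @ A.T, -1.0, 1.0)          # pairwise cosines
    iu = np.triu_indices(len(A), k=1)          # each pair counted once
    return np.arccos(cos[iu]).mean()

orthogonal = np.eye(3)                          # maximally diverse components
redundant = np.ones((3, 3)) + np.eye(3)         # nearly parallel components
print(mutual_angular_reg(orthogonal) > mutual_angular_reg(redundant))  # True
```

Note that arccos is non-smooth at the clipping boundary, which is one reason the paper optimizes a smooth lower bound instead of the raw regularizer.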
Hybrid Clustering based on Content and Connection Structure using Joint Nonnegative Matrix Factorization
We present a hybrid method for latent information discovery on data sets
containing both text content and connection structure, based on constrained low
rank approximation. The new method jointly optimizes the Nonnegative Matrix
Factorization (NMF) objective function for text clustering and the Symmetric
NMF (SymNMF) objective function for graph clustering. We propose an effective
algorithm for the joint NMF objective function, based on a block coordinate
descent (BCD) framework. The proposed hybrid method discovers content
associations via latent connections found using SymNMF. The method can also be
applied with a natural conversion of the problem when a hypergraph formulation
is used or the content is associated with hypergraph edges.
Experimental results show that by simultaneously utilizing both content and
connection structure, our hybrid method produces higher quality clustering
results compared to other NMF clustering methods that use content alone
(standard NMF) or connection structure alone (SymNMF). We also present some
interesting applications to several types of real world data such as citation
recommendations of papers. The hybrid method proposed in this paper can also be
applied to general data expressed with both feature space vectors and pairwise
similarities and can be extended to the case with multiple feature spaces or
multiple similarity measures.
Comment: 9 pages, submitted to a conference, Feb. 201
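The coupling through a shared factor can be written down explicitly. Below is our reading of the joint objective from the abstract, with the content term a standard NMF fit and the graph term a SymNMF fit, both sharing the cluster factor H; variable names and the weighting are ours, not the paper's notation.

```python
import numpy as np

def joint_nmf_objective(X, S, W, H, alpha=1.0):
    """Hybrid clustering objective (sketch): ||X - W H||_F^2 fits the
    content matrix X, alpha * ||S - H^T H||_F^2 fits the similarity
    graph S, and the shared factor H couples the two."""
    content = np.linalg.norm(X - W @ H, 'fro') ** 2
    graph = np.linalg.norm(S - H.T @ H, 'fro') ** 2
    return content + alpha * graph

rng = np.random.default_rng(4)
W = rng.random((6, 2)); H = rng.random((2, 5))   # shared cluster factor H
X, S = W @ H, H.T @ H                            # exactly factorable toy data
print(joint_nmf_objective(X, S, W, H))  # 0.0 at the exact factors
```

A block coordinate descent scheme, as in the paper, would alternate updates of W and H while the other is held fixed.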
Sparse convolutional coding for neuronal ensemble identification
Cell ensembles, originally proposed by Donald Hebb in 1949, are subsets of
synchronously firing neurons proposed to explain basic firing behavior in
the brain. Despite having been studied for many years, no conclusive
evidence of their existence and involvement in information processing has
been presented, so their identification remains a topic of modern research,
especially since simultaneous recordings of large neuronal populations have
become possible in the past three decades. These large recordings pose a
challenge for methods that identify the individual neurons forming cell
ensembles, and their time course of activity, within the vast numbers of
recorded spikes. Related work so far has focused on the identification of
purely simultaneously firing neurons using techniques such as Principal
Component Analysis. In this paper we propose a new algorithm based on sparse
convolutional coding which is also able to find ensembles with temporal
structure. Applying our algorithm to synthetically generated datasets shows
that it outperforms previous work and accurately identifies temporal cell
ensembles even when they contain overlapping neurons or when strong
background noise is present.
Comment: 12 pages, 6 figures
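The generative model behind convolutional coding of spike data can be sketched as follows: the recorded spike matrix is modeled as the sum, over ensembles, of each ensemble's spatio-temporal motif convolved with a sparse activation signal. The variable layout here is hypothetical, chosen only to illustrate the reconstruction.

```python
import numpy as np

def reconstruct(motifs, activations, T):
    """Reconstruct a (neurons x time) spike matrix from ensemble motifs
    (ensembles x neurons x lags) and their sparse activation signals
    (ensembles x time). Sketch of the convolutional model."""
    n_ens, n_neurons, _ = motifs.shape
    Y = np.zeros((n_neurons, T))
    for e in range(n_ens):
        for i in range(n_neurons):
            # each activation spike stamps the motif's temporal profile
            Y[i] += np.convolve(activations[e], motifs[e, i])[:T]
    return Y

motifs = np.array([[[1.0, 0.0, 1.0]]])   # one ensemble, one neuron, 3 lags
acts = np.zeros((1, 10)); acts[0, 2] = 1.0
Y = reconstruct(motifs, acts, T=10)      # spikes appear at t=2 and t=4
```

Fitting such a model amounts to finding sparse activations and motifs that minimize the reconstruction error, which is what lets the method recover ensembles with temporal structure rather than only simultaneous firing.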
Image Retrieval using Histogram Factorization and Contextual Similarity Learning
Image retrieval has long been a central topic in both computer vision and
machine learning. Content-based image retrieval, which tries to
retrieve images from a database visually similar to a query image, has
attracted much attention. Two most important issues of image retrieval are the
representation and ranking of the images. Recently, the bag-of-words method
has shown its power as a representation method. Moreover, nonnegative matrix
factorization is also a popular way to represent data samples. In addition,
contextual similarity learning has also been studied and proven to be an
effective method for the ranking problem. However, these techniques have
never been used together. In this paper, we develop an effective image
retrieval system by representing each image as a bag-of-words histogram,
applying nonnegative matrix factorization to factorize the histograms, and
finally learning the ranking score using the contextual similarity learning
method. The proposed system is evaluated on a large-scale image database and
its effectiveness is shown.
Comment: 4 pages
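The factorization step of such a pipeline can be sketched with plain multiplicative-update NMF (the classic Lee-Seung rules), used here as a stand-in since the abstract does not specify the solver; the histogram data is a random toy matrix, and the contextual similarity ranking step is omitted.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0):
    """Multiplicative-update NMF (Lee-Seung) minimizing ||V - W H||_F.
    Nonnegativity is preserved because updates only multiply by
    nonnegative ratios. A generic sketch, not the paper's solver."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W, H = rng.random((m, k)), rng.random((k, n))
    eps = 1e-9                                   # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(3)
V = rng.random((50, 12))       # toy bag-of-words histograms, columns = images
W, H = nmf(V, k=4)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The columns of H then serve as the low-dimensional image representations over which a ranking score would be learned.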
Divide-and-Conquer Learning by Anchoring a Conical Hull
We reduce a broad class of machine learning problems, usually addressed by EM
or sampling, to the problem of finding the extremal rays spanning the
conical hull of a data point set. These "anchors" lead to a global solution
and a more interpretable model that can even outperform EM and sampling on
generalization error. To find the anchors, we propose a novel
divide-and-conquer learning scheme "DCA" that distributes the problem to
same-type sub-problems on different low-D random hyperplanes, each of which
can be solved by any solver. For the 2D sub-problem, we
present a non-iterative solver that only needs to compute an array of cosine
values and its max/min entries. DCA also provides a faster subroutine for other
methods to check whether a point is covered in a conical hull, which improves
algorithm design in multiple dimensions and brings significant speedup to
learning. We apply our method to GMM, HMM, LDA, NMF and subspace clustering,
then show its competitive performance and scalability over other methods on
rich datasets.
Comment: 26 pages, long version, in updating
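The 2D sub-problem admits a very simple non-iterative solver: in a plane, the extreme rays of the conical hull are the points with the smallest and largest angle. This is our reading of the abstract's "array of cosine values and its max/min entries" (here computed via angles for clarity), and it assumes all points lie in an open half-plane, as in the positive quadrant below.

```python
import numpy as np

def anchors_2d(P):
    """Non-iterative 2D conical-hull solver (sketch): the two anchors
    are the points with extreme angles; all other points fall inside
    the cone they span. Assumes points lie in an open half-plane."""
    ang = np.arctan2(P[:, 1], P[:, 0])          # angle of each ray
    return int(np.argmin(ang)), int(np.argmax(ang))

P = np.array([[1.0, 0.1], [1.0, 1.0], [0.2, 1.0], [0.8, 0.6]])
lo, hi = anchors_2d(P)
print(lo, hi)  # 0 2: the extreme rays; the other points are interior
```

Since only a max and a min over precomputed values are needed, the per-hyperplane cost is linear in the number of points, which is what makes the divide-and-conquer distribution cheap.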
Collaborative Recommendation with Auxiliary Data: A Transfer Learning View
Intelligent recommendation technology has been playing an increasingly
important role in various industry applications such as e-commerce product
promotion and Internet advertisement display. Besides users' feedback (e.g.,
numerical ratings) on items, as usually exploited by typical recommendation
algorithms, there are often some additional data such as users' social circles
and other behaviors. Such auxiliary data are usually related to users'
preferences on items behind the numerical ratings. Collaborative recommendation
with auxiliary data (CRAD) aims to leverage such additional information to
improve personalization services, and has received much attention from both
researchers and practitioners.
Transfer learning (TL) is proposed to extract and transfer knowledge from
some auxiliary data in order to assist the learning task on some target data.
In this paper, we consider the CRAD problem from a transfer learning view,
especially on how to achieve knowledge transfer from some auxiliary data.
First, we give a formal definition of transfer learning for CRAD (TL-CRAD).
Second, we extend the existing categorization of TL techniques (i.e., adaptive,
collective and integrative knowledge transfer algorithm styles) with three
knowledge transfer strategies (i.e., prediction rule, regularization and
constraint). Third, we propose a novel generic knowledge transfer framework for
TL-CRAD. Fourth, we describe some representative works of each specific
knowledge transfer strategy of each algorithm style in detail, which are
expected to inspire further works. Finally, we conclude the paper with some
summary discussions and several future directions.
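Of the three knowledge transfer strategies named above, the "regularization" style is the easiest to write down: fit the observed target ratings while pulling the target factors toward factors learned from auxiliary data. The objective below is a generic sketch of that strategy, not a specific algorithm from the survey; all names and weights are ours.

```python
import numpy as np

def tl_crad_objective(R, mask, U, V, U_aux, lam=0.1, beta=0.1):
    """Regularization-style transfer for matrix factorization (sketch):
    squared error on observed ratings, plus a transfer term pulling the
    user factors U toward auxiliary-data factors U_aux, plus ridge."""
    fit = np.sum(mask * (R - U @ V.T) ** 2)      # observed entries only
    transfer = beta * np.sum((U - U_aux) ** 2)   # knowledge transfer term
    ridge = lam * (np.sum(U ** 2) + np.sum(V ** 2))
    return fit + transfer + ridge

rng = np.random.default_rng(5)
U_aux = rng.random((4, 2))
V = rng.random((3, 2))
R = U_aux @ V.T                    # toy target ratings from the aux factors
mask = np.ones_like(R)             # fully observed for the demo
print(tl_crad_objective(R, mask, U_aux, V, U_aux, lam=0.0))  # 0.0
```

The "prediction rule" and "constraint" strategies differ in where the auxiliary knowledge enters: directly in the predictor, or as a hard restriction on the factors, rather than as a soft penalty.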