Online Machine Learning in Big Data Streams
The area of online machine learning in big data streams covers algorithms
that are (1) distributed and (2) work from data streams with only a limited
possibility to store past data. The first requirement mostly concerns software
architectures and efficient algorithms. The second one also imposes nontrivial
theoretical restrictions on the modeling methods: In the data stream model,
older data is no longer available to revise earlier suboptimal modeling
decisions as fresh data arrives.
In this article, we provide an overview of distributed software architectures
and libraries as well as machine learning models for online learning. We
highlight the most important ideas for classification, regression,
recommendation, and unsupervised modeling from streaming data, and we show how
they are implemented in various distributed data stream processing systems.
This article is a reference material and not a survey. We do not attempt to
be comprehensive in describing all existing methods and solutions; rather, we
give pointers to the most important resources in the field. All related
sub-fields (online algorithms, online learning, and distributed data
processing) are highly active areas of current research and development,
with conceptually new results and software components emerging at the time
of writing. In
this article, we refer to several survey results, both for distributed data
processing and for online machine learning. Compared to past surveys, our
article differs in discussing recommender systems in greater detail.
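The single-pass constraint described above can be made concrete with a minimal online learner. Everything below (the function, the synthetic stream, the learning rate) is an illustrative sketch, not code from any library discussed in the article:

```python
import random

def online_sgd(stream, dim, lr=0.05):
    """Single-pass stochastic gradient descent for least-squares regression:
    each (x, y) pair is seen once and then discarded, matching the data
    stream model in which past data cannot be stored or revisited."""
    w = [0.0] * dim
    for x, y in stream:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i in range(dim):
            w[i] -= lr * err * x[i]
    return w

rng = random.Random(0)

def make_stream(n):
    # Hypothetical stream with true relation y = 2*x0 - x1; in a real
    # system examples would arrive from a message queue, not a generator.
    for _ in range(n):
        x = [rng.random(), rng.random()]
        yield x, 2.0 * x[0] - 1.0 * x[1]

w = online_sgd(make_stream(20000), dim=2)
```

A production stream learner would typically add regularization and an adaptive learning rate, but the key property is the same: the model is revised only with fresh data.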
RELEAF: An Algorithm for Learning and Exploiting Relevance
Recommender systems, medical diagnosis, network security, etc., require
on-going learning and decision-making in real time. These -- and many others --
represent perfect examples of the opportunities and difficulties presented by
Big Data: the available information often arrives from a variety of sources and
has diverse features so that learning from all the sources may be valuable but
integrating what is learned is subject to the curse of dimensionality. This
paper develops and analyzes algorithms that allow efficient learning and
decision-making while avoiding the curse of dimensionality. We formalize the
information available to the learner/decision-maker at a particular time as a
context vector which the learner should consider when taking actions. In
general the context vector is very high dimensional, but in many settings, the
most relevant information is embedded into only a few relevant dimensions. If
these relevant dimensions were known in advance, the problem would be simple --
but they are not. Moreover, the relevant dimensions may be different for
different actions. Our algorithm learns the relevant dimensions for each
action, and makes decisions based on what it has learned. Formally, we build on
the structure of a contextual multi-armed bandit by adding and exploiting a
relevance relation. We prove a general regret bound for our algorithm whose
time order depends only on the maximum number of relevant dimensions among all
the actions, which in the special case where the relevance relation is
single-valued (a function), reduces to ; in the
absence of a relevance relation, the best known contextual bandit algorithms
achieve regret , where is the full dimension of
the context vector.
Comment: to appear in IEEE Journal of Selected Topics in Signal Processing, 201
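The notion of a relevant dimension can be illustrated with a small self-contained sketch. This is not the RELEAF algorithm itself: given logged context/reward pairs, a dimension along which the binned conditional mean reward varies strongly is relevant, while irrelevant dimensions give a near-flat profile.

```python
import random

def relevance_scores(data, d, bins=5):
    """Score each context dimension by how much the mean reward varies
    across bins along that dimension; a flat profile means the dimension
    is irrelevant to the reward."""
    scores = []
    for dim in range(d):
        sums, cnts = [0.0] * bins, [0] * bins
        for x, r in data:
            b = min(int(x[dim] * bins), bins - 1)
            sums[b] += r
            cnts[b] += 1
        means = [s / c for s, c in zip(sums, cnts) if c > 0]
        mu = sum(means) / len(means)
        scores.append(sum((m - mu) ** 2 for m in means) / len(means))
    return scores

rng = random.Random(1)
d = 8
# Hypothetical logged data: the reward depends only on dimension 3.
data = []
for _ in range(5000):
    x = [rng.random() for _ in range(d)]
    r = (1.0 if x[3] > 0.5 else 0.0) + rng.gauss(0, 0.1)
    data.append((x, r))
scores = relevance_scores(data, d)
best = max(range(d), key=lambda i: scores[i])
```

Because only one dimension per action needs to be resolved, the amount of data required scales with the number of relevant dimensions rather than the full context dimension, which is the intuition behind the regret bound above.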
Locally Non-linear Embeddings for Extreme Multi-label Learning
The objective in extreme multi-label learning is to train a classifier that
can automatically tag a novel data point with the most relevant subset of
labels from an extremely large label set. Embedding based approaches make
training and prediction tractable by assuming that the training label matrix is
low-rank and hence the effective number of labels can be reduced by projecting
the high dimensional label vectors onto a low dimensional linear subspace.
Still, leading embedding approaches have been unable to deliver high prediction
accuracies or scale to large problems as the low rank assumption is violated in
most real world applications.
This paper develops the X-One classifier to address both limitations. The
main technical contribution in X-One is a formulation for learning a small
ensemble of local distance preserving embeddings which can accurately predict
infrequently occurring (tail) labels. This allows X-One to break free of the
traditional low-rank assumption and boost classification accuracy by learning
embeddings which preserve pairwise distances between only the nearest label
vectors.
We conducted extensive experiments on several real-world as well as benchmark
data sets and compared our method against state-of-the-art methods for extreme
multi-label classification. Experiments reveal that X-One can make
significantly more accurate predictions than the state-of-the-art methods
including both embeddings (by as much as 35%) as well as trees (by as much as
6%). X-One can also scale efficiently to data sets with a million labels which
are beyond the pale of leading embedding methods.
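The low-rank assumption that the abstract argues is violated in practice can be demonstrated in a few lines. The sketch below is only illustrative and is not the X-One method (which instead learns local, distance-preserving embeddings); the matrix sizes and the tail-label perturbation are assumptions of this example:

```python
import numpy as np

def topk_embedding_error(Y, k):
    """Relative reconstruction error when label vectors are encoded with the
    top-k right singular vectors, i.e. the best rank-k linear label embedding."""
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    Y_hat = (Y @ Vt[:k].T) @ Vt[:k]
    return np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)

rng = np.random.default_rng(0)
n, L, k = 200, 50, 5
Y = rng.random((n, k)) @ rng.random((k, L))  # label matrix that really is rank-k
err_lowrank = topk_embedding_error(Y, k)     # essentially zero

# A few rare "tail" label entries break the global low-rank structure,
# and the same k-dimensional embedding can no longer represent them.
Y_tail = Y.copy()
rows = rng.choice(n, size=20, replace=False)
cols = rng.choice(L, size=20, replace=False)
Y_tail[rows, cols] += 10.0
err_tail = topk_embedding_error(Y_tail, k)
```

The sparse spikes are nearly full-rank, so a global low-rank projection cannot absorb them; this mirrors why infrequent tail labels hurt global embedding approaches.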
Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation
As opposed to manual feature engineering which is tedious and difficult to
scale, network representation learning has attracted a surge of research
interests as it automates the process of feature learning on graphs. The
learned low-dimensional node vector representation is generalizable and eases
the knowledge discovery process on graphs by enabling various off-the-shelf
machine learning tools to be directly applied. Recent research has shown that
the past decade of network embedding approaches either explicitly factorize a
carefully designed matrix to obtain the low-dimensional node vector
representation or are closely related to implicit matrix factorization, with
the fundamental assumption that the factorized node connectivity matrix is
low-rank. Nonetheless, the global low-rank assumption does not necessarily hold
especially when the factorized matrix encodes complex node interactions, and
the resultant single low-rank embedding matrix is insufficient to capture all
the observed connectivity patterns. In this regard, we propose a novel
multi-level network embedding framework BoostNE, which can learn multiple
network embedding representations of different granularity from coarse to fine
without imposing the prevalent global low-rank assumption. The proposed BoostNE
method is also in line with the successful gradient boosting method in ensemble
learning as multiple weak embeddings lead to a stronger and more effective one.
We assess the effectiveness of the proposed BoostNE framework by comparing it
with existing state-of-the-art network embedding methods on various datasets,
and the experimental results corroborate the superiority of the proposed
BoostNE network embedding framework.
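The coarse-to-fine residual idea can be sketched as follows. Truncated SVD is used here as the per-stage factorizer purely for illustration; this is not the exact BoostNE procedure:

```python
import numpy as np

def boosted_lowrank(M, k, stages):
    """Fit a sequence of rank-k factorizations, each one to the residual
    left by the previous stages, so coarse structure is captured first and
    finer structure by later stages."""
    residual = M.copy()
    parts = []
    for _ in range(stages):
        U, s, Vt = np.linalg.svd(residual, full_matrices=False)
        part = (U[:, :k] * s[:k]) @ Vt[:k]  # best rank-k fit to the residual
        parts.append(part)
        residual = residual - part
    return parts

rng = np.random.default_rng(0)
M = rng.random((60, 60))
parts = boosted_lowrank(M, k=4, stages=3)
err_one = np.linalg.norm(M - parts[0])    # single rank-4 embedding
err_all = np.linalg.norm(M - sum(parts))  # three stacked stages
```

With an exact SVD the stacked stages simply recover successive singular directions; the interest of the boosting view lies in weaker, constrained per-stage factorizers, in the spirit of the "multiple weak embeddings" the abstract describes.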
Constrained Multi-Slot Optimization for Ranking Recommendations
Ranking items to be recommended to users is one of the main problems in large
scale social media applications. This problem can be set up as a
multi-objective optimization problem to allow for trading off multiple,
potentially conflicting objectives (that are driven by those items) against
each other. Most previous approaches to this problem optimize for a single slot
without considering the interaction effect of these items on one another.
In this paper, we develop a constrained multi-slot optimization formulation,
which allows for modeling interactions among the items on the different slots.
We characterize the solution in terms of problem parameters and identify
conditions under which an efficient solution is possible. The problem
formulation results in a quadratically constrained quadratic program (QCQP). We
provide an algorithm that gives us an efficient solution by relaxing the
constraints of the QCQP minimally. Through simulated experiments, we show the
benefits of modeling interactions in a multi-slot ranking context, and the
speed and accuracy of our QCQP approximate solver against other
state-of-the-art methods.
Comment: 12 pages, 6 figures
Low-rank Tensor Bandits
In recent years, multi-dimensional online decision making has been playing a
crucial role in many practical applications such as online recommendation and
digital marketing. To solve it, we introduce stochastic low-rank tensor
bandits, a class of bandits whose mean rewards can be represented as a low-rank
tensor. We propose two learning algorithms, tensor epoch-greedy and tensor
elimination, and develop finite-time regret bounds for them. We observe that
tensor elimination has an optimal dependency on the time horizon, while tensor
epoch-greedy has a sharper dependency on tensor dimensions. Numerical
experiments further back up these theoretical findings and show that our
algorithms outperform various state-of-the-art approaches that ignore the
tensor low-rank structure.
Harnessing Structures in Big Data via Guaranteed Low-Rank Matrix Estimation
Low-rank modeling plays a pivotal role in signal processing and machine
learning, with applications ranging from collaborative filtering, video
surveillance, medical imaging, to dimensionality reduction and adaptive
filtering. Much modern high-dimensional data, and the interactions thereof,
can be modeled as lying approximately in a low-dimensional subspace or
manifold, possibly with additional structure, and its proper exploitation
leads to significant reductions in the cost of sensing, computation, and
storage. In recent years, there has been a plethora of progress in
understanding how to exploit low-rank structures using computationally
efficient procedures in a provable manner,
including both convex and nonconvex approaches. On one side, convex relaxations
such as nuclear norm minimization often lead to statistically optimal
procedures for estimating low-rank matrices, where first-order methods are
developed to address the computational challenges; on the other side, there is
emerging evidence that properly designed nonconvex procedures, such as
projected gradient descent, often provide globally optimal solutions with a
much lower computational cost in many problems. This survey article will
provide a unified overview of these recent advances on low-rank matrix
estimation from incomplete measurements. Attention is paid to rigorous
characterization of the performance of these algorithms, and to problems where
the low-rank matrix has additional structural properties that require new
algorithmic designs and theoretical analysis.
Comment: To appear in IEEE Signal Processing Magazine
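One of the simplest procedures in the nonconvex family the survey covers can be sketched as hard imputation: alternate a rank projection (truncated SVD) with re-imposing the observed entries. This is an illustrative sketch under a noiseless exact-low-rank assumption, not a specific algorithm taken from the survey:

```python
import numpy as np

def complete_lowrank(M_observed, mask, rank, iters=300):
    """Matrix completion by hard imputation: project onto rank-r matrices
    with a truncated SVD, then restore the observed entries, repeatedly."""
    X = mask * M_observed
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # rank projection
        X = mask * M_observed + (1.0 - mask) * X   # keep observed entries fixed
    return X

rng = np.random.default_rng(0)
M = rng.standard_normal((40, 6)) @ rng.standard_normal((6, 40))  # true rank 6
mask = (rng.random((40, 40)) < 0.6).astype(float)  # observe 60% of entries
X = complete_lowrank(M, mask, rank=6)
rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
```

The convex alternative mentioned above would instead soft-threshold the singular values (nuclear norm minimization) rather than hard-truncate them to a fixed rank.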
Neural Collaborative Filtering
In recent years, deep neural networks have yielded immense success on speech
recognition, computer vision and natural language processing. However, the
exploration of deep neural networks on recommender systems has received
relatively less scrutiny. In this work, we strive to develop techniques based
on neural networks to tackle the key problem in recommendation -- collaborative
filtering -- on the basis of implicit feedback. Although some recent work has
employed deep learning for recommendation, it has primarily been used to model
auxiliary information, such as textual descriptions of items and acoustic
features of music. When it comes to modeling the key factor in collaborative
filtering -- the interaction between user and item features -- such work still
resorts to matrix factorization, applying an inner product on the latent
features of users and items. By replacing the inner product with a neural
architecture that can learn an arbitrary function from data, we present a
general framework named NCF, short for Neural network-based Collaborative
Filtering. NCF is generic and can express and generalize matrix factorization
under its framework. To supercharge NCF modelling with non-linearities, we
propose to leverage a multi-layer perceptron to learn the user-item interaction
function. Extensive experiments on two real-world datasets show significant
improvements of our proposed NCF framework over the state-of-the-art methods.
Empirical evidence shows that using deeper layers of neural networks offers
better recommendation performance.
Comment: 10 pages, 7 figures
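The core architectural move, replacing the fixed inner product with a learned interaction function, can be sketched with untrained weights. The layer sizes and names below are illustrative assumptions; real NCF trains such a network end-to-end on implicit feedback:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 200, 8
P = rng.standard_normal((n_users, k)) * 0.1   # user latent factors
Q = rng.standard_normal((n_items, k)) * 0.1   # item latent factors

def mf_score(u, i):
    """Classic matrix-factorization scoring: a fixed inner product."""
    return float(P[u] @ Q[i])

# NCF's move: let an MLP learn the interaction function instead
# (weights here are random/untrained, for architecture illustration only).
W1 = rng.standard_normal((2 * k, 16)) * 0.1
b1 = np.zeros(16)
w2 = rng.standard_normal(16) * 0.1

def mlp_score(u, i):
    h = np.maximum(0.0, np.concatenate([P[u], Q[i]]) @ W1 + b1)  # ReLU layer
    return float(1.0 / (1.0 + np.exp(-(h @ w2))))                # sigmoid output

score = mlp_score(3, 7)
```

The inner product constrains the interaction to be bilinear in the latent factors; the MLP can, in principle, represent an arbitrary interaction function of the concatenated embeddings.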
Stratified and Time-aware Sampling based Adaptive Ensemble Learning for Streaming Recommendations
Recommender systems have played an increasingly important role in providing
users with tailored suggestions based on their preferences. However, the
conventional offline recommender systems cannot handle the ubiquitous data
stream well. To address this issue, Streaming Recommender Systems (SRSs) have
emerged in recent years, which incrementally train recommendation models on
newly received data for effective real-time recommendations. Focusing only on
new data helps address concept drift, i.e., the changing user preferences
towards items, but it impedes capturing long-term user
preferences. In addition, the common underload and overload problems should
be tackled well to achieve more accurate streaming recommendations. To
address these problems, we propose a Stratified and Time-aware Sampling based
Adaptive Ensemble Learning framework, called STS-AEL, to improve the accuracy
of streaming recommendations. In STS-AEL, we first devise stratified and
time-aware sampling to extract representative data from both new data and
historical data to address concept drift while capturing long-term user
preferences. Incorporating historical data also helps utilize idle resources
more effectively in the underload scenario. After that, we
propose adaptive ensemble learning to efficiently process the overloaded data
in parallel with multiple individual recommendation models, and then
effectively fuse the results of these models with a sequential adaptive
mechanism. Extensive experiments conducted on three real-world datasets
demonstrate that STS-AEL, in all cases, significantly outperforms the
state-of-the-art SRSs.
Comment: This paper was accepted by Applied Intelligence in July 202
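The sampling idea, drawing most of a training batch from fresh interactions and filling the rest from recency-weighted history, can be sketched as follows. The ratio and the weighting scheme are illustrative assumptions, not the paper's exact stratified design:

```python
import random

def mixed_batch(new_data, history, batch_size, new_frac=0.7, rng=None):
    """Build a training batch mostly from newly received interactions (to
    track concept drift), topped up with historical interactions sampled
    with recency-biased weights (to retain long-term preferences)."""
    rng = rng or random.Random()
    n_new = min(len(new_data), int(batch_size * new_frac))
    batch = rng.sample(new_data, n_new)
    n_hist = batch_size - n_new
    if history and n_hist > 0:
        # weight historical interactions by arrival order (later = heavier)
        weights = [i + 1 for i in range(len(history))]
        batch += rng.choices(history, weights=weights, k=n_hist)
    return batch

rng = random.Random(0)
# Hypothetical (user, item, timestamp) interactions.
history = [("u", "item", t) for t in range(100)]
new_data = [("u", "item", 100 + t) for t in range(30)]
batch = mixed_batch(new_data, history, batch_size=20, rng=rng)
```

In an overload scenario, several such batches could be processed in parallel by independent models, which is where the ensemble part of the framework comes in.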
Progresses and Challenges in Link Prediction
Link prediction is a paradigmatic problem in network science, which aims at
estimating the existence likelihoods of unobserved links based on the known
topology. After a brief introduction to the standard problem and metrics of
link prediction, this Perspective will summarize representative progresses
about local similarity indices, link predictability, network embedding, matrix
completion, ensemble learning and others, mainly extracted from thousands of
related publications in the last decade. Finally, this Perspective will outline
some long-standing challenges for future studies.
Comment: 45 pages, 1 table
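The local similarity indices mentioned above are simple enough to state directly. The three below (common neighbors, Jaccard, and Adamic-Adar) are standard definitions, applied to a hypothetical toy graph:

```python
import math

def similarity_indices(adj, u, v):
    """Three classic local similarity indices for scoring a candidate link
    (u, v): common-neighbor count, Jaccard coefficient, and Adamic-Adar
    (common neighbors down-weighted by the log of their degree)."""
    cn = adj[u] & adj[v]
    union = adj[u] | adj[v]
    jaccard = len(cn) / len(union) if union else 0.0
    aa = sum(1.0 / math.log(len(adj[w])) for w in cn if len(adj[w]) > 1)
    return len(cn), jaccard, aa

# Toy undirected graph given as neighbor sets.
adj = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1, 4},
    3: {0, 4},
    4: {2, 3},
}
cn, jac, aa = similarity_indices(adj, 1, 3)  # score candidate link (1, 3)
```

Ranking all non-adjacent pairs by such an index and predicting the top-scoring pairs is the baseline against which embedding- and matrix-completion-based predictors are usually compared.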