Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks
Factorization Machines (FMs) are a supervised learning approach that enhances
the linear regression model by incorporating the second-order feature
interactions. Despite its effectiveness, FM can be hindered by its modelling of
all feature interactions with the same weight, as not all feature interactions
are equally useful and predictive. For example, interactions with useless
features may even introduce noise and degrade performance. In
this work, we improve FM by discriminating the importance of different feature
interactions. We propose a novel model named Attentional Factorization Machine
(AFM), which learns the importance of each feature interaction from data via a
neural attention network. Extensive experiments on two real-world datasets
demonstrate the effectiveness of AFM. Empirically, on the regression task AFM
yields a relative improvement over FM, and consistently outperforms the
state-of-the-art deep learning methods Wide&Deep and DeepCross with a much
simpler structure and fewer model parameters. Our implementation of
AFM is publicly available at:
https://github.com/hexiangnan/attentional_factorization_machine

Comment: 7 pages, 5 figures
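The attention-weighted pairwise-interaction pooling that AFM adds on top of FM can be sketched in a few lines of numpy; the attention network (W, b, h) and projection p below are illustrative placeholders, not the sizes or parameters from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def afm_score(V, W, b, h, p):
    """Score one instance with AFM-style attentional pooling.
    V: (m, k) embeddings of the m active features.
    W: (t, k), b: (t,), h: (t,) -- attention network (illustrative sizes).
    p: (k,) -- projection of the pooled interaction vector to a score.
    """
    m, k = V.shape
    # all pairwise element-wise products v_i * v_j, i < j
    pairs = np.array([V[i] * V[j] for i in range(m) for j in range(i + 1, m)])
    # attention logits: h^T relu(W e + b) for each interaction e
    logits = np.maximum(W @ pairs.T + b[:, None], 0.0).T @ h
    att = np.exp(logits - logits.max())
    att /= att.sum()                              # softmax over interactions
    pooled = (att[:, None] * pairs).sum(axis=0)   # attention-weighted sum
    return p @ pooled

V = rng.normal(size=(4, 8))
W = rng.normal(size=(16, 8)); b = rng.normal(size=16)
h = rng.normal(size=16); p = rng.normal(size=8)
print(afm_score(V, W, b, h, p))
```

In the full model this score is added to the global bias and linear terms of FM; learning the attention parameters jointly with the embeddings is what lets the model down-weight noisy interactions.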
How to Reuse and Compose Knowledge for a Lifetime of Tasks: A Survey on Continual Learning and Functional Composition
A major goal of artificial intelligence (AI) is to create an agent capable of
acquiring a general understanding of the world. Such an agent would require the
ability to continually accumulate and build upon its knowledge as it encounters
new experiences. Lifelong or continual learning addresses this setting, whereby
an agent faces a continual stream of problems and must strive to capture the
knowledge necessary for solving each new task it encounters. If the agent is
capable of accumulating knowledge in some form of compositional representation,
it could then selectively reuse and combine relevant pieces of knowledge to
construct novel solutions. Despite the intuitive appeal of this simple idea,
the literatures on lifelong learning and compositional learning have proceeded
largely separately. In an effort to promote developments that bridge between
the two fields, this article surveys their respective research landscapes and
discusses existing and future connections between them.
Multi-task multiple kernel machines for personalized pain recognition from functional near-infrared spectroscopy brain signals
Currently there is no validated objective measure of pain. Recent
neuroimaging studies have explored the feasibility of using functional
near-infrared spectroscopy (fNIRS) to measure alterations in brain function in
evoked and ongoing pain. In this study, we applied multi-task machine learning
methods to derive a practical algorithm for pain detection derived from fNIRS
signals in healthy volunteers exposed to a painful stimulus. In particular, we
employed multi-task multiple kernel learning to account for the inter-subject
variability in pain response. Our results support the use of fNIRS and machine
learning techniques in developing objective pain detection, and also highlight
the importance of adopting personalized analysis in the process.

Comment: International Conference on Pattern Recognition (ICPR)
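As an illustration of combining multiple kernels with learned weights, the sketch below weights base kernels by kernel-target alignment and fits a kernel ridge classifier on the combined kernel. This is a common single-task MKL baseline, not the paper's multi-task formulation, and the data and bandwidths are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=60) > 0, 1.0, -1.0)

def rbf(X, gamma):
    """RBF kernel matrix on the rows of X."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# candidate base kernels (e.g., different feature views / bandwidths)
kernels = [X @ X.T, rbf(X, 0.5), rbf(X, 2.0)]

# heuristic kernel weights via kernel-target alignment (a standard MKL
# baseline; the paper's multi-task objective is more elaborate)
yy = np.outer(y, y)
w = np.array([(K * yy).sum() / (np.linalg.norm(K) * np.linalg.norm(yy))
              for K in kernels])
w = np.clip(w, 0, None)
w /= w.sum()
K = sum(wi * Ki for wi, Ki in zip(w, kernels))   # combined kernel

# kernel ridge "classifier" on the combined kernel
alpha = np.linalg.solve(K + 1e-2 * np.eye(len(y)), y)
pred = np.sign(K @ alpha)
print("train accuracy:", (pred == y).mean())
```

In a multi-task setting one would fit one such model per subject (task) while coupling the kernel weights across tasks, which is how inter-subject variability in pain response is accommodated.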
Deep Self-Taught Learning for Handwritten Character Recognition
Recent theoretical and empirical work in statistical machine learning has
demonstrated the importance of learning algorithms for deep architectures,
i.e., function classes obtained by composing multiple non-linear
transformations. Self-taught learning (exploiting unlabeled examples or
examples from other distributions) has already been applied to deep learners,
but mostly to show the advantage of unlabeled examples. Here we explore the
advantage brought by {\em out-of-distribution examples}. For this purpose we
developed a powerful generator of stochastic variations and noise processes for
character images, including not only affine transformations but also slant,
local elastic deformations, changes in thickness, background images, grey level
changes, contrast, occlusion, and various types of noise. The
out-of-distribution examples are obtained from these highly distorted images or
by including examples of object classes different from those in the target test
set. We show that {\em deep learners benefit more from out-of-distribution
examples than a corresponding shallow learner}, at least in the area of
handwritten character recognition. In fact, we show that they beat previously
published results and reach human-level performance on both handwritten digit
classification and 62-class handwritten character recognition.
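The kinds of stochastic distortions such a generator combines (affine warps, local elastic deformations, additive noise) can be sketched with standard image operations; the magnitudes below are illustrative, not the parameters used in the paper:

```python
import numpy as np
from scipy.ndimage import affine_transform, gaussian_filter, map_coordinates

rng = np.random.default_rng(0)
img = np.zeros((28, 28))
img[6:22, 12:16] = 1.0  # a crude vertical stroke as a stand-in character

def distort(img, rng):
    """Random affine warp + elastic deformation + pixel noise."""
    h, w = img.shape
    # random rotation plus shear (slant) about the image centre
    theta = rng.uniform(-0.2, 0.2)
    shear = rng.uniform(-0.3, 0.3)
    A = np.array([[np.cos(theta), -np.sin(theta) + shear],
                  [np.sin(theta),  np.cos(theta)]])
    centre = np.array([h / 2, w / 2])
    out = affine_transform(img, A, offset=centre - A @ centre, order=1)
    # local elastic deformation: smoothed random displacement field
    dy = gaussian_filter(rng.normal(size=img.shape), sigma=4) * 4
    dx = gaussian_filter(rng.normal(size=img.shape), sigma=4) * 4
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out = map_coordinates(out, [yy + dy, xx + dx], order=1)
    # additive grey-level noise, clipped back to valid intensities
    return np.clip(out + rng.normal(scale=0.05, size=img.shape), 0.0, 1.0)

sample = distort(img, rng)
print(sample.shape)
```

Sampling many such distorted variants of each training image is one way to produce the out-of-distribution examples the abstract describes.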
An Introduction to Lifelong Supervised Learning
This primer is an attempt to provide a detailed summary of the different
facets of lifelong learning. We start with Chapter 2 which provides a
high-level overview of lifelong learning systems. In this chapter, we discuss
prominent scenarios in lifelong learning (Section 2.4), provide a high-level
organization of different lifelong learning approaches (Section 2.5), enumerate
the desiderata for an ideal lifelong learning system (Section 2.6), discuss how
lifelong learning is related to other learning paradigms (Section 2.7), and
describe common metrics used to evaluate lifelong learning systems (Section
2.8). This chapter is most useful for readers who are new to
lifelong learning and want to get introduced to the field without focusing on
specific approaches or benchmarks. The remaining chapters focus on specific
aspects (either learning algorithms or benchmarks) and are more useful for
readers who are looking for specific approaches or benchmarks. Chapter 3
focuses on regularization-based approaches that do not assume access to any
data from previous tasks. Chapter 4 discusses memory-based approaches that
typically use a replay buffer or an episodic memory to save a subset of data
across different tasks. Chapter 5 focuses on different architecture families
(and their instantiations) that have been proposed for training lifelong
learning systems. Following these different classes of learning algorithms, we
discuss the commonly used evaluation benchmarks and metrics for lifelong
learning (Chapter 6) and wrap up with a discussion of future challenges and
important research directions in Chapter 7.

Comment: Lifelong Learning Primer
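A minimal sketch of the episodic memory used by such memory-based approaches, assuming reservoir sampling as the (commonly used) eviction rule:

```python
import random

class ReplayBuffer:
    """Fixed-size episodic memory filled by reservoir sampling, so every
    example seen so far has equal probability of being retained."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example  # evict uniformly at random

    def sample(self, k):
        return self.rng.sample(self.data, min(k, len(self.data)))

buf = ReplayBuffer(capacity=10)
for task in range(3):            # a stream of tasks
    for i in range(100):
        buf.add((task, i))       # store (task_id, example) pairs
replay = buf.sample(5)           # replayed alongside the current batch
print(len(buf.data), len(replay))
```

During training, the sampled old examples are mixed into each new batch so that gradients on the current task do not overwrite knowledge of earlier ones.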
Subspace regularizers for few-shot class incremental learning
Few-shot class incremental learning---the problem of updating a trained
classifier to discriminate among an expanded set of classes with limited
labeled data---is a key challenge for machine learning systems deployed in
non-stationary environments. Existing approaches to the problem rely on complex
model architectures and training procedures that are difficult to tune and
re-use. In this paper, we present an extremely simple approach that enables the
use of ordinary logistic regression classifiers for few-shot incremental
learning. The key to this approach is a new family of subspace regularization
schemes that encourage weight vectors for new classes to lie close to the
subspace spanned by the weights of existing classes. When combined with
pretrained convolutional feature extractors, logistic regression models trained
with subspace regularization outperform specialized, state-of-the-art
approaches to few-shot incremental image classification by up to 23% on the
miniImageNet dataset. Because of its simplicity, subspace regularization can be
straightforwardly configured to incorporate additional background information
about the new classes (including class names and descriptions specified in
natural language); this offers additional control over the trade-off between
existing and new classes. Our results show that simple geometric
regularization of class representations offers an effective tool for continual
learning.

https://openreview.net/forum?id=boJy41J-tnQ

Comment: First author draft
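The core idea of the subspace regularizer, penalizing a new class's weight vector by its squared distance to the span of the existing class weights, can be sketched as follows (dimensions are illustrative, and the exact regularization schemes in the paper differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def subspace_penalty(W_old, w_new):
    """Squared distance from a new class's weight vector to the subspace
    spanned by the existing class weights (rows of W_old)."""
    # orthonormal basis of the row space of W_old via SVD
    Q = np.linalg.svd(W_old, full_matrices=False)[2]   # (r, d) right vectors
    residual = w_new - Q.T @ (Q @ w_new)               # part outside the span
    return float(residual @ residual)

W_old = rng.normal(size=(5, 16))      # 5 existing classes, 16-dim features
w_in = W_old.T @ rng.normal(size=5)   # lies inside the span of old weights
w_out = rng.normal(size=16)           # generic vector, mostly outside it

print(subspace_penalty(W_old, w_in))   # near zero: already in the subspace
print(subspace_penalty(W_old, w_out))  # positive: would be penalized
```

Adding this penalty to the logistic regression loss for the new classes pulls their weights toward the geometry of the old ones, which is what keeps the expanded classifier stable with only a few labeled examples.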