12,471 research outputs found
Patterns for Learning with Side Information
Supervised, semi-supervised, and unsupervised learning estimate a function
given input/output samples. Generalization of the learned function to unseen
data can be improved by incorporating side information into learning. Side
information is data that belongs to neither the input space nor the output
space of the function, but carries useful information for learning it. In this
paper we show that learning with side information subsumes a variety of related
approaches, e.g. multi-task learning, multi-view learning and learning using
privileged information. Our main contributions are (i) a new perspective that
connects these previously isolated approaches, (ii) insights about how these
methods incorporate different types of prior knowledge, and hence implement
different patterns, (iii) easier application of these methods to novel
tasks, as well as (iv) a systematic experimental evaluation of these
patterns in two supervised learning tasks.
Comment: The first two authors contributed equally to this work.
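The shared-parameter pattern common to these approaches can be illustrated in miniature: a single weight vector is fit jointly to the main task and to an auxiliary target built from side information, so the side data pins down directions the main data leaves underdetermined. The data, dimensions, and loss weighting below are illustrative assumptions, not the paper's setup.

```python
# Toy "learning with side information": a shared linear map trained on the
# main task plus an auxiliary (side-information) loss with a small weight.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_shared(xs, ys, side_xs, side_zs, aux_weight=0.5, lr=0.01, steps=2000):
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, y in zip(xs, ys):              # main-task squared error
            err = dot(w, x) - y
            for j in range(2):
                grad[j] += 2 * err * x[j]
        for x, z in zip(side_xs, side_zs):    # auxiliary loss from side data
            err = dot(w, x) - z
            for j in range(2):
                grad[j] += 2 * aux_weight * err * x[j]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w

# The main task only constrains w[0]; the side targets constrain w[1].
w = fit_shared([[1.0, 0.0]], [1.0], [[0.0, 1.0]], [2.0])
```
Here the fitted w recovers both components even though the main task alone is underdetermined, which is the essence of the pattern.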
Robobarista: Learning to Manipulate Novel Objects via Deep Multimodal Embedding
There is a large variety of objects and appliances in human environments,
such as stoves, coffee dispensers, juice extractors, and so on. It is
challenging for a roboticist to program a robot for each of these object types
and for each of their instantiations. In this work, we present a novel approach
to manipulation planning based on the idea that many household objects share
similarly-operated object parts. We formulate the manipulation planning as a
structured prediction problem and learn to transfer manipulation strategy
across different objects by embedding point-cloud, natural language, and
manipulation trajectory data into a shared embedding space using a deep neural
network. In order to learn semantically meaningful spaces throughout our
network, we introduce a method for pre-training its lower layers for multimodal
feature embedding and a method for fine-tuning this embedding space using a
loss-based margin. In order to collect a large number of manipulation
demonstrations for different objects, we develop a new crowd-sourcing platform
called Robobarista. We test our model on our dataset consisting of 116 objects
and appliances with 249 parts along with 250 language instructions, for which
there are 1225 crowd-sourced manipulation demonstrations. We further show that
our robot with our model can even prepare a latte with appliances it
has never seen before.
Comment: Journal Version.
A Neural Network Approach to Joint Modeling Social Networks and Mobile Trajectories
The accelerated growth of mobile trajectories in location-based services
brings valuable data resources to understand users' moving behaviors. Apart
from recording the trajectory data, another major characteristic of these
location-based services is that they also allow the users to connect whomever
they like. A combination of social networking and location-based services is
called a location-based social network (LBSN). As shown in previous work,
locations that are frequently visited by socially-related persons tend to be
correlated, which indicates the close association between social connections
and trajectory behaviors of users in LBSNs. In order to better analyze and mine
LBSN data, we present a novel neural network model which can jointly model
social networks and mobile trajectories. Specifically, our model consists of two
components: the construction of social networks and the generation of mobile
trajectories. We first adopt a network embedding method for the construction of
social networks: a network representation is derived for each user. The key
of our model lies in the component of generating mobile trajectories. We have
considered four factors that influence the generation process of mobile
trajectories, namely user visit preference, influence of friends, short-term
sequential contexts and long-term sequential contexts. To characterize the last
two contexts, we employ the RNN and GRU models to capture the sequential
relatedness in mobile trajectories at different levels, i.e., short term or
long term. Finally, the two components are tied by sharing the user network
representations. Experimental results on two important applications demonstrate
the effectiveness of our model. In particular, the improvement over baselines is
more significant when either the network structure or the trajectory data is sparse.
Comment: Accepted by ACM TOI
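The GRU used for the sequential contexts can be sketched as a single scalar-state update step, showing how the reset and update gates blend the previous context with new input. The parameter values below are placeholders, not learned weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h_prev, x, p):
    # One scalar GRU step over a trajectory feature x; p holds the
    # (assumed) parameters.
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev)            # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev)            # reset gate
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev))
    return (1 - z) * h_prev + z * h_tilde

p = {"wz": 1.0, "uz": 1.0, "wr": 1.0, "ur": 1.0, "wh": 1.0, "uh": 1.0}
h = gru_step(0.0, 1.0, p)   # state accumulates recent context
h = gru_step(h, 1.0, p)
```
The tanh squashing keeps the hidden state bounded, so the same recurrence can be run over arbitrarily long check-in sequences.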
Clustering of Driving Encounter Scenarios Using Connected Vehicle Trajectories
Multi-vehicle interaction behavior classification and analysis offer in-depth
knowledge to make an efficient decision for autonomous vehicles. This paper
aims to cluster a wide range of driving encounter scenarios based only on
multi-vehicle GPS trajectories. Towards this end, we propose a generic
unsupervised learning framework comprising two layers: a feature representation
layer and a clustering layer. In the feature representation layer, we combine
the deep autoencoders with a distance-based measure to map the sequential
observations of driving encounters into a computationally tractable space that
allows quantifying the spatiotemporal interaction characteristics of two
vehicles. The clustering algorithm is then applied to the extracted
representations to gather homogeneous driving encounters into groups. Our
proposed generic framework is then evaluated using 2,568 naturalistic driving
encounters. Experimental results demonstrate that our proposed generic
framework incorporated with unsupervised learning can cluster multi-trajectory
data into distinct groups. These clustering results could benefit
decision-making policy analysis and design for autonomous vehicles.
Comment: 12 pages, 11 figures
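The clustering layer can be illustrated with a plain k-means pass over already-extracted representations; the toy 2-D features below stand in for the autoencoder outputs, and k-means stands in for whatever clustering algorithm is applied.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    # Standard Lloyd iterations: assign each feature vector to its nearest
    # center, then move each center to the mean of its group.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[i].append(p)
        centers = [
            [sum(dim) / len(g) for dim in zip(*g)] if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

# Two well-separated groups of "encounter features" recover two centers.
data = [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5],
        [10.0, 10.0], [10.5, 10.0], [10.0, 10.5]]
centers = sorted(kmeans(data, 2))
```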
Plan2Vec: Unsupervised Representation Learning by Latent Plans
In this paper we introduce plan2vec, an unsupervised representation learning
approach that is inspired by reinforcement learning. Plan2vec constructs a
weighted graph on an image dataset using near-neighbor distances, and then
extrapolates this local metric to a global embedding by distilling
a path integral over planned paths. When applied to control, plan2vec offers a
compute- and sample-efficient way to learn goal-conditioned value estimates
that remain accurate over long horizons. We demonstrate the effectiveness of
plan2vec on one simulated and two challenging real-world image datasets.
Experimental results show that plan2vec successfully amortizes the planning
cost, enabling reactive planning that is linear in memory and computation
complexity rather than exhaustive over the entire state space.
Comment: code available at https://geyang.github.io/plan2ve
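The local-to-global step, building a near-neighbour graph and extrapolating its local metric to all pairs via shortest paths, can be sketched as follows. Floyd-Warshall stands in for the planner here, and the point set and k value are illustrative.

```python
import math

def local_to_global(points, k=2):
    # Connect each point to its k nearest neighbours with Euclidean edge
    # weights, then extrapolate this local metric to a global one via
    # all-pairs shortest paths (Floyd-Warshall).
    n = len(points)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        nearest = sorted((math.dist(points[i], points[j]), j)
                         for j in range(n) if j != i)
        for w, j in nearest[:k]:
            dist[i][j] = min(dist[i][j], w)
            dist[j][i] = min(dist[j][i], w)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    return dist

# Points on a line: the global distance 0 -> 3 is the sum of local hops.
g = local_to_global([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)], k=1)
```
An embedding would then be fit so that embedding-space distances match these path lengths; that distillation step is omitted here.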
Learning a Deep Model for Human Action Recognition from Novel Viewpoints
Recognizing human actions from unknown and unseen (novel) views is a
challenging problem. We propose a Robust Non-Linear Knowledge Transfer Model
(R-NKTM) for human action recognition from novel views. The proposed R-NKTM is
a deep fully-connected neural network that transfers knowledge of human actions
from any unknown view to a shared high-level virtual view by finding a
non-linear virtual path that connects the views. The R-NKTM is learned from
dense trajectories of synthetic 3D human models fitted to real motion capture
data and generalizes to real videos of human actions. The strength of our
technique is that we learn a single R-NKTM for all actions and all viewpoints
for knowledge transfer of any real human action video without the need for
re-training or fine-tuning the model. Thus, R-NKTM can efficiently scale to
incorporate new action classes. R-NKTM is learned with dummy labels and does
not require knowledge of the camera viewpoint at any stage. Experiments on
three benchmark cross-view human action datasets show that our method
outperforms existing state-of-the-art methods.
Beat histogram features for rhythm-based musical genre classification using multiple novelty functions
In this paper we present beat histogram features for multi-level rhythm description and evaluate them in a musical genre classification task. Audio features pertaining to various musical content categories and their related novelty functions are extracted as a basis for the creation of beat histograms. The proposed features capture not only amplitude, but also tonal and general spectral changes in the signal, aiming to represent as much rhythmic information as possible. The most and least informative features are identified through feature selection methods and are then tested using Support Vector Machines on five genre datasets, measuring classification accuracy against a baseline feature set. Results show that the presented features provide comparable classification accuracy to other genre classification approaches using periodicity histograms, and display performance close to that of much more elaborate up-to-date approaches for rhythm description. The use of bar boundary annotations for the texture frames provided an improvement for the dance-oriented Ballroom dataset. The comparatively small number of descriptors and the possibility of evaluating the influence of specific signal components on the overall rhythmic content encourage further use of the method in rhythm description tasks.
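A minimal beat histogram can be built by autocorrelating a novelty function and binning each lag by the tempo it corresponds to. The impulse-train novelty signal and parameter values below are illustrative assumptions, not the paper's actual audio features.

```python
def beat_histogram(novelty, frame_rate, min_bpm=40, max_bpm=200):
    # Autocorrelate the (mean-removed) novelty function; each lag maps to a
    # tempo in BPM, and the histogram accumulates autocorrelation strength
    # per rounded-BPM bin.
    n = len(novelty)
    mean = sum(novelty) / n
    x = [v - mean for v in novelty]
    hist = {}
    for lag in range(1, n):
        bpm = 60.0 * frame_rate / lag
        if min_bpm <= bpm <= max_bpm:
            ac = sum(x[i] * x[i + lag] for i in range(n - lag))
            hist[round(bpm)] = hist.get(round(bpm), 0.0) + max(ac, 0.0)
    return hist

# Onsets every 10 frames at 20 frames/s -> a 0.5 s period, i.e. 120 BPM.
novelty = [1.0 if i % 10 == 0 else 0.0 for i in range(200)]
hist = beat_histogram(novelty, frame_rate=20)
```
The strongest bin lands at the true tempo, with subharmonics (60, 40 BPM) showing up as secondary peaks, which is the characteristic shape of a beat histogram.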
Spatial-Temporal Relation Networks for Multi-Object Tracking
Recent progress in multiple object tracking (MOT) has shown that a robust
similarity score is key to the success of trackers. A good similarity score is
expected to reflect multiple cues, e.g. appearance, location, and topology,
over a long period of time. However, these cues are heterogeneous, making them
hard to combine in a unified network. As a result, existing methods usually
encode them in separate networks or require a complex training approach. In
this paper, we present a unified framework for similarity measurement which
could simultaneously encode various cues and perform reasoning across both
spatial and temporal domains. We also study the feature representation of a
tracklet-object pair in depth, showing a proper design of the pair features can
well empower the trackers. The resulting approach is named spatial-temporal
relation networks (STRN). It runs in a feed-forward way and can be trained in
an end-to-end manner. State-of-the-art accuracy is achieved on all of the
MOT15-17 benchmarks under public detection and online settings.
Artificial Neural Networks Applied to Taxi Destination Prediction
We describe our first-place solution to the ECML/PKDD discovery challenge on
taxi destination prediction. The task consisted in predicting the destination
of a taxi based on the beginning of its trajectory, represented as a
variable-length sequence of GPS points, and diverse associated
meta-information, such as the departure time, the driver id and client
information. Contrary to most published competitor approaches, we used an
almost fully automated approach based on neural networks and we ranked first
out of 381 teams. The architectures we tried use multi-layer perceptrons,
bidirectional recurrent neural networks and models inspired from recently
introduced memory networks. Our approach could easily be adapted to other
applications in which the goal is to predict a fixed-length output from a
variable-length sequence.
Comment: ECML/PKDD discovery challenge
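One way such a network can emit a fixed-length coordinate from per-cluster output scores is a softmax-weighted average of destination-cluster centroids. The scores and centroids below are assumed values for illustration; in the real system they come from the trained network and from clustering the training destinations.

```python
import math

def predict_destination(scores, centroids):
    # Softmax over per-cluster scores, then a probability-weighted mean of
    # the cluster centroids gives the predicted (lat, lon).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    lat = sum(p * c[0] for p, c in zip(probs, centroids))
    lon = sum(p * c[1] for p, c in zip(probs, centroids))
    return lat, lon

# A confident score concentrates the prediction on one centroid.
lat, lon = predict_destination([10.0, 0.0], [(41.15, -8.61), (41.20, -8.60)])
```
Averaging centroids keeps the output differentiable, so the whole pipeline can be trained end-to-end on the distance to the true destination.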
Deep Affinity Network for Multiple Object Tracking
Multiple Object Tracking (MOT) plays an important role in solving many
fundamental problems in video analysis in computer vision. Most MOT methods
employ two steps: Object Detection and Data Association. The first step detects
objects of interest in every frame of a video, and the second establishes
correspondence between the detected objects in different frames to obtain their
tracks. Object detection has made tremendous progress in the last few years due
to deep learning. However, data association for tracking still relies on
hand-crafted constraints such as appearance, motion, spatial proximity,
grouping, etc., to compute affinities between the objects in different frames. In this
paper, we harness the power of deep learning for data association in tracking
by jointly modelling object appearances and their affinities between different
frames in an end-to-end fashion. The proposed Deep Affinity Network (DAN)
learns compact yet comprehensive features of pre-detected objects at several
levels of abstraction, and performs exhaustive pairing permutations of those
features in any two frames to infer object affinities. DAN also accounts for
multiple objects appearing and disappearing between video frames. We exploit
the resulting efficient affinity computations to associate objects in the
current frame deep into the previous frames for reliable on-line tracking. Our
technique is evaluated on popular multiple object tracking challenges MOT15,
MOT17 and UA-DETRAC. Comprehensive benchmarking under twelve evaluation metrics
demonstrates that our approach is among the best performing techniques on the
leader board for these challenges. The open source implementation of our work
is available at https://github.com/shijieS/SST.git.
Comment: To appear in IEEE TPAM
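The exhaustive pairing step can be caricatured with a dot-product affinity matrix between the object features of two frames, followed by a greedy one-to-one matching. DAN itself learns the affinities end-to-end; the feature vectors and threshold here are assumptions.

```python
def associate(feats_a, feats_b, threshold=0.5):
    # Exhaustively pair object features from two frames into an affinity
    # matrix (dot product here), then greedily match highest-affinity pairs.
    # Objects left unmatched model appearance/disappearance between frames.
    affinity = [[sum(x * y for x, y in zip(a, b)) for b in feats_b]
                for a in feats_a]
    pairs, used_a, used_b = [], set(), set()
    candidates = sorted(
        ((affinity[i][j], i, j)
         for i in range(len(feats_a)) for j in range(len(feats_b))),
        reverse=True,
    )
    for score, i, j in candidates:
        if score >= threshold and i not in used_a and j not in used_b:
            pairs.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return pairs

# Two tracked objects match across frames; the third detection is new.
pairs = associate([[1.0, 0.0], [0.0, 1.0]],
                  [[0.0, 1.0], [1.0, 0.0], [0.1, 0.1]])
```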