A survey on trajectory clustering analysis
This paper comprehensively surveys the development of trajectory clustering.
Considering the critical role of trajectory data mining in modern intelligent
systems for surveillance security, abnormal behavior detection, crowd behavior
analysis, and traffic control, trajectory clustering has attracted growing
attention. Existing trajectory clustering methods can be grouped into three
categories: unsupervised, supervised and semi-supervised algorithms. In spite
of achieving a certain level of development, trajectory clustering is limited
in its success by complex conditions such as application scenarios and data
dimensions. This paper provides a holistic understanding and deep insight into
trajectory clustering, and presents a comprehensive analysis of representative
methods and promising future directions.
Deep Trajectory for Recognition of Human Behaviours
Identifying human actions in complex scenes is widely considered a
challenging research problem due to unpredictable behaviors and variation
in appearance and posture. Trajectories provide a meaningful way to extract
variations in motion and posture. However, simple trajectories are normally
represented by a vector of spatial coordinates. In order to identify human
actions, we must exploit the structural relationships between different
trajectories. In this paper, we propose a method that divides the video into N
segments and extracts trajectories for each segment. We then
compute a trajectory descriptor for each segment, which captures the structural
relationships among the different trajectories in the video segment. For the
trajectory descriptor, we project all extracted trajectories onto a canvas. This
results in a texture image that stores the relative motion and structural
relationships among the trajectories. We then train a Convolutional Neural Network
(CNN) to capture and learn the representation from dense trajectories.
Experimental results show that our proposed method outperforms state-of-the-art
methods by 90.01% on a benchmark data set.
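The projection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rasterize_trajectories`, the grid size, the hit-count encoding, and the assumption that coordinates are normalized to [0, 1] are all hypothetical choices made for illustration.

```python
def rasterize_trajectories(trajectories, width, height):
    """Accumulate trajectory points on a height x width grid of hit counts.

    Assumes each trajectory is a list of (x, y) pairs already normalized
    to the range [0, 1]; this normalization is an assumption of the sketch.
    """
    canvas = [[0] * width for _ in range(height)]
    for trajectory in trajectories:
        for x, y in trajectory:
            # Clamp so that a coordinate of exactly 1.0 lands in the last cell.
            col = min(int(x * width), width - 1)
            row = min(int(y * height), height - 1)
            canvas[row][col] += 1
    return canvas


# Two toy trajectories that meet in the middle of the frame.
toy = [[(0.0, 0.0), (0.5, 0.5)], [(0.5, 0.5), (1.0, 1.0)]]
canvas = rasterize_trajectories(toy, width=4, height=4)
```

Cells visited by several trajectories receive higher counts, which is one simple way an image could encode how trajectories relate spatially before being passed to a CNN.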
Rapid online learning and robust recall in a neuromorphic olfactory circuit
We present a neural algorithm for the rapid online learning and
identification of odorant samples under noise, based on the architecture of the
mammalian olfactory bulb and implemented on the Intel Loihi neuromorphic
system. As with biological olfaction, the spike timing-based algorithm utilizes
distributed, event-driven computations and rapid (one-shot) online learning.
Spike timing-dependent plasticity rules operate iteratively over sequential
gamma-frequency packets to construct odor representations from the activity of
chemosensor arrays mounted in a wind tunnel. Learned odorants then are reliably
identified despite strong destructive interference. Noise resistance is further
enhanced by neuromodulation and contextual priming. Lifelong learning
capabilities are enabled by adult neurogenesis. The algorithm is applicable to
any signal identification problem in which high-dimensional signals are
embedded in unknown backgrounds.
Comment: 52 text pages; 8 figures. Version 3 includes a new figure and
additional detail
A Neural Network Approach to Joint Modeling Social Networks and Mobile Trajectories
The accelerated growth of mobile trajectories in location-based services
brings valuable data resources to understand users' moving behaviors. Apart
from recording the trajectory data, another major characteristic of these
location-based services is that they also allow users to connect with whomever
they like. A combination of social networking and location-based services is
called a location-based social network (LBSN). As shown in previous work,
locations that are frequently visited by socially-related persons tend to be
correlated, which indicates the close association between social connections
and trajectory behaviors of users in LBSNs. In order to better analyze and mine
LBSN data, we present a novel neural network model which can jointly model both
social networks and mobile trajectories. Specifically, our model consists of two
components: the construction of social networks and the generation of mobile
trajectories. We first adopt a network embedding method for the construction of
social networks, from which a network representation can be derived for each user. The key
to our model lies in the component that generates mobile trajectories. We have
considered four factors that influence the generation process of mobile
trajectories, namely user visit preference, influence of friends, short-term
sequential contexts and long-term sequential contexts. To characterize the last
two contexts, we employ the RNN and GRU models to capture the sequential
relatedness in mobile trajectories at different levels, i.e., short term or
long term. Finally, the two components are tied by sharing the user network
representations. Experimental results on two important applications demonstrate
the effectiveness of our model. In particular, the improvement over baselines is
more significant when either network structure or trajectory data is sparse.
Comment: Accepted by ACM TOI
Robobarista: Learning to Manipulate Novel Objects via Deep Multimodal Embedding
There is a large variety of objects and appliances in human environments,
such as stoves, coffee dispensers, juice extractors, and so on. It is
challenging for a roboticist to program a robot for each of these object types
and for each of their instantiations. In this work, we present a novel approach
to manipulation planning based on the idea that many household objects share
similarly-operated object parts. We formulate the manipulation planning as a
structured prediction problem and learn to transfer manipulation strategy
across different objects by embedding point-cloud, natural language, and
manipulation trajectory data into a shared embedding space using a deep neural
network. In order to learn semantically meaningful spaces throughout our
network, we introduce a method for pre-training its lower layers for multimodal
feature embedding and a method for fine-tuning this embedding space using a
loss-based margin. In order to collect a large number of manipulation
demonstrations for different objects, we develop a new crowd-sourcing platform
called Robobarista. We test our model on our dataset consisting of 116 objects
and appliances with 249 parts along with 250 language instructions, for which
there are 1225 crowd-sourced manipulation demonstrations. We further show that
our robot with our model can even prepare a latte with appliances it
has never seen before.
Comment: Journal Versio
Robobarista: Object Part based Transfer of Manipulation Trajectories from Crowd-sourcing in 3D Pointclouds
There is a large variety of objects and appliances in human environments,
such as stoves, coffee dispensers, juice extractors, and so on. It is
challenging for a roboticist to program a robot for each of these object types
and for each of their instantiations. In this work, we present a novel approach
to manipulation planning based on the idea that many household objects share
similarly-operated object parts. We formulate the manipulation planning as a
structured prediction problem and design a deep learning model that can handle
large noise in the manipulation demonstrations and learns features from three
different modalities: point-clouds, language and trajectory. In order to
collect a large number of manipulation demonstrations for different objects, we
developed a new crowd-sourcing platform called Robobarista. We test our model
on our dataset consisting of 116 objects with 249 parts along with 250 language
instructions, for which there are 1225 crowd-sourced manipulation
demonstrations. We further show that our robot can even manipulate objects it
has never seen before.
Comment: In International Symposium on Robotics Research (ISRR) 201
Human Action Recognition and Prediction: A Survey
Derived from rapid advances in computer vision and machine learning, video
analysis tasks have been moving from inferring the present state to predicting
the future state. Vision-based action recognition and prediction from videos
are such tasks, where action recognition is to infer human actions (present
state) based upon complete action executions, and action prediction to predict
human actions (future state) based upon incomplete action executions. These two
tasks have become particularly prevalent topics recently because of their
explosively emerging real-world applications, such as visual surveillance,
autonomous driving, entertainment, and video retrieval. Many
efforts have been devoted over the last few decades to building a robust
and effective framework for action recognition and prediction. In this paper,
we survey the state-of-the-art techniques in action recognition
and prediction. Existing models, popular algorithms, technical difficulties,
popular action databases, evaluation protocols, and promising future directions
are also discussed systematically.
Learning a Deep Model for Human Action Recognition from Novel Viewpoints
Recognizing human actions from unknown and unseen (novel) views is a
challenging problem. We propose a Robust Non-Linear Knowledge Transfer Model
(R-NKTM) for human action recognition from novel views. The proposed R-NKTM is
a deep fully-connected neural network that transfers knowledge of human actions
from any unknown view to a shared high-level virtual view by finding a
non-linear virtual path that connects the views. The R-NKTM is learned from
dense trajectories of synthetic 3D human models fitted to real motion capture
data and generalizes to real videos of human actions. The strength of our
technique is that we learn a single R-NKTM for all actions and all viewpoints
for knowledge transfer of any real human action video without the need for
re-training or fine-tuning the model. Thus, R-NKTM can efficiently scale to
incorporate new action classes. R-NKTM is learned with dummy labels and does
not require knowledge of the camera viewpoint at any stage. Experiments on
three benchmark cross-view human action datasets show that our method
outperforms existing state-of-the-art
Temporal Cycle-Consistency Learning
We introduce a self-supervised representation learning method based on the
task of temporal alignment between videos. The method trains a network using
temporal cycle consistency (TCC), a differentiable cycle-consistency loss that
can be used to find correspondences across time in multiple videos. The
resulting per-frame embeddings can be used to align videos by simply matching
frames using the nearest-neighbors in the learned embedding space.
To evaluate the power of the embeddings, we densely label the Pouring and
Penn Action video datasets for action phases. We show that (i) the learned
embeddings enable few-shot classification of these action phases, significantly
reducing the supervised training requirements; and (ii) TCC is complementary to
other methods of self-supervised learning in videos, such as Shuffle and Learn
and Time-Contrastive Networks. The embeddings are also used for a number of
applications based on alignment (dense temporal correspondence) between video
pairs, including transfer of metadata of synchronized modalities between videos
(sounds, temporal semantic labels), synchronized playback of multiple videos,
and anomaly detection. Project webpage:
https://sites.google.com/view/temporal-cycle-consistency .
Comment: Accepted at CVPR 2019.
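The idea of aligning videos by nearest-neighbor matching, and of checking whether a match cycles back to its starting frame, can be sketched as follows. This is a hard, non-differentiable check written only for illustration; the abstract describes a differentiable loss, which this sketch does not implement. The function names and the toy one-dimensional embeddings are assumptions.

```python
def nearest(query, candidates):
    """Index of the candidate embedding closest to query (squared Euclidean)."""
    def sq_dist(j):
        return sum((a - b) ** 2 for a, b in zip(query, candidates[j]))
    return min(range(len(candidates)), key=sq_dist)


def align(video_u, video_v):
    """For each frame embedding in video_u, the index of its match in video_v."""
    return [nearest(emb, video_v) for emb in video_u]


def cycle_consistent_frames(video_u, video_v):
    """Frames of video_u whose match in video_v maps back to the same frame."""
    consistent = []
    for i, emb in enumerate(video_u):
        j = nearest(emb, video_v)          # step forward: u -> v
        k = nearest(video_v[j], video_u)   # step back: v -> u
        if k == i:
            consistent.append(i)
    return consistent


# Toy per-frame embeddings (hypothetical values, one dimension each).
video_u = [[0.0], [1.0], [2.0]]
video_v = [[0.1], [1.1], [5.0]]
matches = align(video_u, video_v)
consistent = cycle_consistent_frames(video_u, video_v)
```

In this toy case the third frame of `video_u` matches the second frame of `video_v`, which in turn matches the second frame of `video_u`, so the cycle does not close for that frame.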
Artificial Neural Networks Applied to Taxi Destination Prediction
We describe our first-place solution to the ECML/PKDD discovery challenge on
taxi destination prediction. The task consisted of predicting the destination
of a taxi based on the beginning of its trajectory, represented as a
variable-length sequence of GPS points, and diverse associated
meta-information, such as the departure time, the driver id and client
information. Contrary to most published competitor approaches, we used an
almost fully automated approach based on neural networks and we ranked first
out of 381 teams. The architectures we tried use multi-layer perceptrons,
bidirectional recurrent neural networks and models inspired by recently
introduced memory networks. Our approach could easily be adapted to other
applications in which the goal is to predict a fixed-length output from a
variable-length sequence.
Comment: ECML/PKDD discovery challeng
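A multi-layer perceptron expects a fixed-size input, so a variable-length GPS prefix has to be reduced to a fixed number of values first. The sketch below shows one possible way to do that, by keeping the first and last few points of the prefix. The abstract does not state how the inputs were prepared, so the function name `fixed_length_window`, the parameter `k`, and the padding rule are assumptions for illustration only.

```python
def fixed_length_window(points, k):
    """Flatten the first k and last k (lat, lon) points into 4 * k floats.

    Prefixes shorter than k are padded by repeating their boundary point,
    which is an assumption of this sketch.
    """
    if not points:
        raise ValueError("need at least one GPS point")
    first = points[:k]
    first = first + [first[-1]] * (k - len(first))
    last = points[-k:]
    last = [last[0]] * (k - len(last)) + last
    # Interleave coordinates as lat, lon, lat, lon, ...
    return [coord for point in first + last for coord in point]


# Hypothetical prefixes of different lengths map to vectors of equal length.
long_prefix = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
short_prefix = [(1.0, 2.0)]
long_vec = fixed_length_window(long_prefix, k=2)
short_vec = fixed_length_window(short_prefix, k=2)
```

Both calls return eight values, so the two prefixes could be fed to the same fixed-width input layer.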