4,757 research outputs found
Learning Deep Neural Networks for Vehicle Re-ID with Visual-spatio-temporal Path Proposals
Vehicle re-identification is an important problem and has many applications
in video surveillance and intelligent transportation. It has gained increasing
attention because of recent advances in person re-identification
techniques. However, unlike person re-identification, the visual differences
between pairs of vehicle images are usually subtle and even challenging for
humans to distinguish. Incorporating additional spatio-temporal information is
vital for solving this challenging re-identification task. Existing vehicle
re-identification methods have either ignored or over-simplified the
spatio-temporal relations between vehicle images. In this paper, we propose a
two-stage framework that incorporates complex spatio-temporal information for
effectively regularizing the re-identification results. Given a pair of vehicle
images with their spatio-temporal information, a candidate
visual-spatio-temporal path is first generated by a chain MRF model with a
deeply learned potential function, where each visual-spatio-temporal state
corresponds to an actual vehicle image with its spatio-temporal information. A
Siamese-CNN+Path-LSTM model takes the candidate path as well as the pairwise
queries to generate their similarity score. Extensive experiments and analysis
show the effectiveness of our proposed method and individual components.
Comment: To appear in ICCV 201
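As a toy sketch (not the authors' learned model), the chain-MRF path generation step can be pictured as a Viterbi-style dynamic program over layers of candidate visual-spatio-temporal states; the hand-written `pairwise` score below stands in for the deeply learned potential function:

```python
# Hypothetical illustration: pick the best path through a chain of candidate
# states (one layer per camera) with a Viterbi-style dynamic program.
# In the paper the pairwise potential is deeply learned; here it is an
# arbitrary toy score.

def best_chain_path(layers, pairwise):
    """layers: list of lists of state ids; pairwise(a, b) -> compatibility."""
    scores = {s: 0.0 for s in layers[0]}       # best score ending at state s
    back = [{} for _ in layers]                # backpointers per layer
    for i in range(1, len(layers)):
        new_scores = {}
        for s in layers[i]:
            prev, sc = max(((p, scores[p] + pairwise(p, s))
                            for p in layers[i - 1]), key=lambda t: t[1])
            new_scores[s], back[i][s] = sc, prev
        scores = new_scores
    end = max(scores, key=scores.get)          # best final state
    path = [end]
    for i in range(len(layers) - 1, 0, -1):    # backtrack
        path.append(back[i][path[-1]])
    return path[::-1], scores[end]

# toy example: two candidate sightings at cameras 1 and 2, one at camera 3
layers = [["a1", "a2"], ["b1", "b2"], ["c1"]]
compat = {("a1", "b1"): 2, ("a1", "b2"): 0, ("a2", "b1"): 1,
          ("a2", "b2"): 3, ("b1", "c1"): 1, ("b2", "c1"): 5}
path, score = best_chain_path(layers, lambda a, b: compat[(a, b)])
```

In the paper, the decoded candidate path is then fed, with the query pair, into the Siamese-CNN+Path-LSTM scorer.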
Spatio-temporal Person Retrieval via Natural Language Queries
In this paper, we address the problem of spatio-temporal person retrieval
from multiple videos using a natural language query, in which we output a tube
(i.e., a sequence of bounding boxes) which encloses the person described by the
query. For this problem, we introduce a novel dataset consisting of videos
containing people annotated with bounding boxes for each second and with five
natural language descriptions. To retrieve the tube of the person described by
a given natural language query, we design a model that combines methods for
spatio-temporal human detection and multimodal retrieval. We conduct
comprehensive experiments to compare a variety of tube and text representations
and multimodal retrieval methods, and present a strong baseline in this task as
well as demonstrate the efficacy of our tube representation and multimodal
feature embedding technique. Finally, we demonstrate the versatility of our
model by applying it to two other important tasks.
Comment: Accepted to ICCV201
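The multimodal retrieval step can be pictured, very roughly, as ranking candidate tube embeddings by similarity to a text-query embedding in a shared space. The vectors below are invented for illustration; the paper's embeddings are learned:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def rank_tubes(query_emb, tube_embs):
    """Return tube ids sorted by similarity to the text-query embedding."""
    return sorted(tube_embs,
                  key=lambda t: cosine(query_emb, tube_embs[t]),
                  reverse=True)

# toy embeddings: the query should retrieve t1 first
ranking = rank_tubes([1.0, 0.0],
                     {"t1": [0.9, 0.1], "t2": [0.0, 1.0], "t3": [0.5, 0.5]})
```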
Probabilistic Semantic Retrieval for Surveillance Videos with Activity Graphs
We present a novel framework for finding complex activities matching
user-described queries in cluttered surveillance videos. The wide diversity of
queries coupled with unavailability of annotated activity data limits our
ability to train activity models. To bridge the semantic gap we propose to let
users describe an activity as a semantic graph with object attributes and
inter-object relationships associated with nodes and edges, respectively. We
learn node/edge-level visual predictors during training and, at test-time,
propose to retrieve activity by identifying likely locations that match the
semantic graph. We formulate a novel CRF based probabilistic activity
localization objective that accounts for mis-detections, mis-classifications
and track-losses, and outputs a likelihood score for a candidate grounded
location of the query in the video. We seek groundings that maximize overall
precision and recall. To handle the combinatorial search over all
high-probability groundings, we propose a highest precision subgraph matching
algorithm. Our method outperforms existing retrieval methods on benchmarked
datasets.
Comment: (c) 2018 IEEE. This paper has been accepted by IEEE Transactions on
Multimedia. Print ISSN: 1520-9210. Online ISSN: 1941-0077. Preprint link:
https://ieeexplore.ieee.org/document/8438958
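A drastically simplified sketch of scoring one candidate grounding: sum the log-probabilities of matched nodes and edges, charging a fixed miss penalty for parts of the semantic graph with no confident match (a stand-in for the paper's handling of mis-detections, mis-classifications, and track-losses). All probabilities here are invented:

```python
import math

def grounding_log_likelihood(nodes, edges, node_probs, edge_probs,
                             miss_prob=0.1):
    """Toy CRF-style score for one candidate grounding of a semantic graph:
    log-probability of each matched node/edge; unmatched parts fall back to
    a fixed miss probability."""
    score = sum(math.log(node_probs.get(n, miss_prob)) for n in nodes)
    score += sum(math.log(edge_probs.get(e, miss_prob)) for e in edges)
    return score

# toy query graph: a person near a car, with made-up detector confidences
s = grounding_log_likelihood(
    ["person", "car"], [("person", "near", "car")],
    node_probs={"person": 0.8, "car": 0.5},
    edge_probs={("person", "near", "car"): 0.9})
```

Searching for the grounding that maximizes such a score is the combinatorial part the paper's subgraph-matching algorithm addresses.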
A Compact Representation for Trips over Networks built on self-indexes
Representing the movements of objects (trips) over a network in a compact way
while retaining the capability of exploiting such data effectively is an
important challenge of real applications. We present a new Compact Trip
Representation (CTR) that handles the spatio-temporal data associated with
users' trips over transportation networks. Depending on the network and types
of queries, nodes in the network can represent intersections, stops, or even
street segments.
CTR represents separately sequences of nodes and the time instants when users
traverse these nodes. The spatial component is handled with a data structure
based on the well-known Compressed Suffix Array (CSA), which provides both a
compact representation and interesting indexing capabilities. The temporal
component is self-indexed with either a Hu-Tucker-shaped Wavelet Tree or a
Wavelet Matrix, which solve range-interval queries efficiently. We show how CTR
can solve relevant counting-based spatial, temporal, and spatio-temporal
queries over large sets of trips. Experimental results show the space
requirements (around 50-70% of the space needed by a compact non-indexed
baseline) and query efficiency (most queries are solved in the range of 1-1000
microseconds) of CTR.
Comment: 42 page
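For intuition only, here is a plain, uncompressed analogue of one of CTR's counting queries ("how many trips traverse a node within a time interval"). CTR answers this over its CSA and wavelet structures rather than by scanning; the trip data below is made up:

```python
def count_trips_through(trips, node, t_lo, t_hi):
    """Count trips that traverse `node` at some instant within [t_lo, t_hi].
    A naive scan standing in for the counting queries CTR answers over its
    compressed self-indexes."""
    return sum(any(n == node and t_lo <= t <= t_hi for n, t in trip)
               for trip in trips)

# toy trips as sequences of (node, time-instant) pairs
trips = [[("A", 1), ("B", 3)], [("B", 10)], [("A", 5), ("B", 6)]]
n_through_b = count_trips_through(trips, "B", 0, 7)
```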
Deep Siamese Networks with Bayesian non-Parametrics for Video Object Tracking
We present a novel algorithm utilizing a deep Siamese neural network as a
general object similarity function in combination with a Bayesian optimization
(BO) framework to encode spatio-temporal information for efficient object
tracking in video. In particular, we treat the video tracking problem as a
dynamic (i.e. temporally-evolving) optimization problem. Using Gaussian Process
priors, we model a dynamic objective function representing the location of a
tracked object in each frame. By exploiting temporal correlations, the proposed
method queries the search space in a statistically principled and efficient
way, offering several benefits over current state-of-the-art video tracking
methods.
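A bare-bones sketch of the Gaussian-Process ingredient: posterior-mean prediction of a (here one-dimensional) object position from past observations under an RBF kernel. This omits the acquisition function and the rest of the BO loop, and all numbers are toy values:

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential (RBF) kernel on scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_predict(times, positions, t_new, noise=1e-6):
    """GP posterior mean: predict the object's position at t_new from past
    (time, position) observations."""
    n = len(times)
    K = [[rbf(times[i], times[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, positions)                     # alpha = K^{-1} y
    return sum(rbf(t_new, times[i]) * alpha[i] for i in range(n))

# toy track: object observed at positions 0, 1, 2 at times 0, 1, 2
pred = gp_predict([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], 1.0)
```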
ATTENTION: ATTackEr traceback using MAC layer abNormality detecTION
Denial-of-Service (DoS) and Distributed DoS (DDoS) attacks can cause serious
problems in wireless networks due to limited network and host resources.
Attacker traceback is a promising solution to take a proper countermeasure near
the attack origins, to discourage attackers from launching attacks, and for
forensics. However, attacker traceback in Mobile Ad-hoc Networks (MANETs) is a
challenging problem due to the dynamic topology, and limited network resources.
It is especially difficult to trace back attacker(s) when they are moving to
avoid traceback. In this paper, we introduce the ATTENTION protocol framework,
which pays special attention to MAC layer abnormal activity under attack.
ATTENTION consists of three classes, namely, coarse-grained traceback,
fine-grained traceback and spatio-temporal fusion architecture. For
energy-efficient attacker searching in MANETs, we also utilize the small-world
model. Our simulation analysis shows a 79% success rate in DoS attacker
traceback with a coarse-grained attack signature. In addition, with a
fine-grained attack signature, it shows a 97% success rate in DoS attacker
traceback and an 83% success rate in DDoS attacker traceback. We also show
that ATTENTION is robust against node collusion and mobility.
Referring to Objects in Videos using Spatio-Temporal Identifying Descriptions
This paper presents a new task, the grounding of spatio-temporal identifying
descriptions in videos. Previous work suggests potential bias in existing
datasets and emphasizes the need for a new data creation schema to better model
linguistic structure. We introduce a new data collection scheme based on
grammatical constraints for surface realization to enable us to investigate the
problem of grounding spatio-temporal identifying descriptions in videos. We
then propose a two-stream modular attention network that learns and grounds
spatio-temporal identifying descriptions based on appearance and motion. We
show that the motion modules help to ground motion-related words and also aid
learning in the appearance modules, because modular neural networks resolve
task interference between modules. Finally, we propose a future challenge and
the need for a robust system arising from replacing ground-truth visual
annotations with an automatic video object detector and temporal event
localization.
Semantic-based Anomalous Pattern Discovery in Moving Object Trajectories
In this work, we investigate a novel semantic approach for pattern discovery
in trajectories that, relying on ontologies, enhances object movement
information with event semantics. The approach can be applied to the detection
of movement patterns and behaviors whenever the semantics of events occurring
along the trajectory is, explicitly or implicitly, available. In particular, we
tested it against an exacting case scenario in maritime surveillance, i.e., the
discovery of suspicious container transportations.
The methodology we have developed entails the formalization of the
application domain through a domain ontology, extending the Moving Object
Ontology (MOO) described in this paper. Afterwards, movement patterns have to
be formalized, either as Description Logic (DL) axioms or queries, enabling the
retrieval of the trajectories that follow the patterns.
In our experimental evaluation, we have considered a real-world dataset of 18
million container events describing the operations undertaken in a port to
accomplish the shipping (e.g., loading on a vessel, export operation).
Leveraging these events, we have reconstructed almost 300 thousand container
trajectories, referring to 50 thousand containers travelling over three years.
We have formalized the anomalous itinerary patterns as DL axioms, testing
different ontology APIs and DL reasoners to retrieve the suspicious
transportations.
Our experiments demonstrate that the approach is feasible and efficient. In
particular, the joint use of Pellet and SPARQL-DL makes it possible to detect
the trajectories following a given pattern in reasonable time on large
datasets.
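The pattern-retrieval idea can be caricatured without any ontology machinery: a trajectory is suspicious if its event sequence contains the anomalous pattern as an ordered subsequence. The event names below are invented; in the paper the patterns are DL axioms or queries evaluated by a reasoner:

```python
def matches_pattern(trajectory, pattern):
    """True if `pattern` occurs in `trajectory` as an ordered (not
    necessarily contiguous) subsequence of events."""
    it = iter(trajectory)
    # `ev in it` advances the iterator, so events must appear in order
    return all(ev in it for ev in pattern)

# toy container trajectory built from made-up port events
traj = ["load_on_vessel", "transship", "unload", "export_operation"]
```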
Context-Aware Query Selection for Active Learning in Event Recognition
Activity recognition is a challenging problem with many practical
applications. In addition to the visual features, recent approaches have
benefited from the use of context, e.g., inter-relationships among the
activities and objects. However, these approaches require the data to be
labeled and entirely available beforehand, and are not designed to be updated
continuously, which makes them unsuitable for surveillance applications. In
contrast, we
propose a continuous-learning framework for context-aware activity recognition
from unlabeled video, which has two distinct advantages over existing methods.
First, it employs a novel active-learning technique that not only exploits the
informativeness of the individual activities but also utilizes their contextual
information during query selection; this leads to significant reduction in
expensive manual annotation effort. Second, the learned models can be adapted
online as more data is available. We formulate a conditional random field model
that encodes the context and devise an information-theoretic approach that
utilizes entropy and mutual information of the nodes to compute the set of most
informative queries, which are labeled by a human. These labels are combined
with graphical inference techniques for incremental updates. We provide a
theoretical formulation of the active learning framework with an analytic
solution. Experiments on six challenging datasets demonstrate that our
framework achieves superior performance with significantly less manual
labeling.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence (T-PAMI)
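The entropy half of the query-selection criterion reduces to a few lines (the mutual-information terms and the CRF inference are omitted, and the beliefs below are invented):

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a discrete label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_queries(node_beliefs, k):
    """Pick the k nodes whose current label beliefs are most uncertain,
    i.e. highest entropy -- those go to the human annotator."""
    return sorted(node_beliefs,
                  key=lambda n: entropy(node_beliefs[n]),
                  reverse=True)[:k]

# toy beliefs over two activity labels for three unlabeled nodes
beliefs = {"n1": [0.5, 0.5], "n2": [0.9, 0.1], "n3": [1.0, 0.0]}
queries = select_queries(beliefs, 2)
```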
CADP: A Novel Dataset for CCTV Traffic Camera based Accident Analysis
This paper presents a novel dataset for traffic accident analysis. Our goal
is to resolve the lack of public data for research on automatic
spatio-temporal annotation for traffic safety on the roads. Through
analysis of the proposed dataset, we observed a significant degradation of
object detection in the pedestrian category, due to the object sizes and the
complexity of the scenes. To this end, we propose to integrate contextual
information into the conventional Faster R-CNN using Context Mining (CM) and
Augmented Context Mining (ACM) to improve the accuracy of small pedestrian
detection. Our experiments indicate a considerable improvement in object
detection accuracy: +8.51% for CM and +6.20% for ACM. Finally, we demonstrate
the performance of accident forecasting in our dataset using Faster R-CNN and
an Accident LSTM architecture. We achieved an average of 1.684 seconds in terms
of Time-To-Accident measure with an Average Precision of 47.25%. Our Webpage
for the paper is https://goo.gl/cqK2wE
Comment: Accepted at the IEEE International Workshop on Traffic and Street
Surveillance for Safety and Security. First three authors contributed
equally. 7 pages + 1 Reference