Vehicle Trajectories from Unlabeled Data through Iterative Plane Registration
One of the most complex aspects of autonomous driving is understanding the surrounding environment, in particular detecting which agents populate it and how they are moving. The ability to predict how these agents may act in the near future would allow an autonomous vehicle to plan its trajectory safely, minimizing risks for itself and others. In this work we propose an automatic trajectory annotation method that exploits an Iterative Plane Registration algorithm based on homographies and semantic segmentation. The output of our technique is a set of holistic trajectories (past-present-future) paired with a single image context, useful for training a predictive model.
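The abstract's core geometric tool, a homography, maps points lying on a plane (here, the ground plane) between views. As a minimal sketch of that mapping, with a hypothetical helper name not taken from the paper:

```python
import numpy as np

def apply_homography(H, points):
    """Map 2D points through a 3x3 homography (hypothetical helper,
    not the authors' implementation)."""
    pts = np.asarray(points, dtype=float)
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones])          # lift to homogeneous coordinates
    mapped = homog @ H.T                    # apply H to each point
    return mapped[:, :2] / mapped[:, 2:3]   # divide back to Euclidean

# A pure-translation homography shifts every ground-plane point by (5, -2).
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -2.0],
              [0.0, 0.0, 1.0]])
print(apply_homography(H, [[0.0, 0.0], [10.0, 4.0]]))
```

Chaining such per-frame plane registrations is what lets detections from many frames be accumulated into one trajectory in a common image context.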
"Forget" the Forget Gate: Estimating Anomalies in Videos using Self-contained Long Short-Term Memory Networks
Abnormal event detection is a challenging task that requires effectively handling intricate features of appearance and motion. In this paper, we present an approach for detecting anomalies in videos by learning a novel LSTM-based self-contained network on normal dense optical flow. Owing to its sigmoid implementation, the forget gate of a standard LSTM is susceptible to overlooking and dismissing relevant content in long-sequence tasks such as abnormality detection: it suppresses the contribution of the previous hidden state to the computation of the cell state, prioritizing the current input. In addition, the hyperbolic tangent activation of standard LSTMs sacrifices performance as the network gets deeper. To tackle these two limitations, we introduce a light, bi-gated LSTM cell that discards the forget gate and introduces sigmoid activation. Specifically, the LSTM architecture we come up with fully retains the content of the previous hidden state, enabling the trained model to be robust and to make context-independent decisions during evaluation. Removing the forget gate results in a simplified and undemanding LSTM cell with improved effectiveness and computational efficiency. Empirical evaluations show that the proposed bi-gated LSTM-based network outperforms various LSTM-based models, verifying its effectiveness for abnormality detection and generalization tasks on the CUHK Avenue and UCSD datasets.
Comment: 16 pages, 7 figures, Computer Graphics International (CGI) 202
Real-time Embedded Person Detection and Tracking for Shopping Behaviour Analysis
Shopping behaviour analysis through counting and tracking of people in shop-like environments offers valuable information for store operators and provides key insights into the store layout (e.g. frequently visited spots). Instead of using extra staff for this, automated on-premise solutions are preferred. These automated systems should be cost-effective, preferably run on lightweight embedded hardware, work in very challenging situations (e.g. handling occlusions), and preferably operate in real time. We solve this challenge by implementing a real-time, TensorRT-optimized, YOLOv3-based pedestrian detector on a Jetson TX2 hardware platform. By combining the detector with a sparse optical flow tracker, we assign a unique ID to each customer and tackle the problem of losing partially occluded customers. Our detector-tracker solution achieves an average precision of 81.59% at a processing speed of 10 FPS. Besides valuable statistics, heat maps of frequently visited spots are extracted and used as an overlay on the video stream.
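The ID-assignment step of a detector-tracker pipeline can be sketched without the optical-flow component the paper uses: below, detections in a new frame are greedily matched to existing tracks by bounding-box overlap (IoU), and unmatched detections spawn new IDs. This is a generic stand-in, not the paper's tracker.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def assign_ids(tracks, detections, next_id, threshold=0.3):
    """Greedily match new detections to existing tracks by IoU; unmatched
    detections get fresh IDs. A simplified stand-in for the sparse
    optical flow tracker described in the abstract."""
    updated = {}
    unmatched = list(detections)
    for tid, box in tracks.items():
        best = max(unmatched, key=lambda d: iou(box, d), default=None)
        if best is not None and iou(box, best) >= threshold:
            updated[tid] = best       # same customer, keep the ID
            unmatched.remove(best)
    for det in unmatched:             # new customers enter the scene
        updated[next_id] = det
        next_id += 1
    return updated, next_id

tracks = {0: (0, 0, 10, 10)}
dets = [(1, 1, 11, 11), (50, 50, 60, 60)]
print(assign_ids(tracks, dets, next_id=1))
```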
Distinguishing Posed and Spontaneous Smiles by Facial Dynamics
The smile is one of the key elements for identifying the emotions and present state of mind of an individual. In this work, we propose a cluster of approaches for classifying posed and spontaneous smiles using deep convolutional neural network (CNN) face features, local phase quantization (LPQ), dense optical flow, and histogram of oriented gradients (HOG). Eulerian Video Magnification (EVM) is used for micro-expression smile amplification, along with three normalization procedures for distinguishing posed and spontaneous smiles. Although the deep CNN face model is trained with a large number of face images, HOG features outperform this model on the overall face smile classification task. Using EVM to amplify micro-expressions did not have a significant impact on classification accuracy, while normalizing facial features improved it. Unlike many manual or semi-automatic methodologies, our approach aims to automatically classify all smiles into either `spontaneous' or `posed' categories using support vector machines (SVM). Experimental results on the large UvA-NEMO smile database are promising compared to other relevant methods.
Comment: 16 pages, 8 figures, ACCV 2016, Second Workshop on Spontaneous Facial Behavior Analysis
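Since HOG features turn out to be the strongest descriptor here, a toy version helps make the idea concrete: gradients are binned by unsigned orientation and weighted by magnitude. This is a heavily simplified sketch (no cells, blocks, or block normalization as in full HOG):

```python
import numpy as np

def hog_descriptor(patch, n_bins=9):
    """Toy histogram of oriented gradients for a single image patch:
    a simplified sketch of the HOG features named in the abstract."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned orientation
    hist = np.zeros(n_bins)
    bin_idx = (angle / (180.0 / n_bins)).astype(int) % n_bins
    for b in range(n_bins):                            # magnitude-weighted bins
        hist[b] = magnitude[bin_idx == b].sum()
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A vertical edge: all gradient energy falls into the 0-degree bin.
patch = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (4, 1))
print(hog_descriptor(patch))
```

Such fixed-length descriptors are what the SVM mentioned in the abstract would consume as input vectors.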
4D Match Trees for Non-rigid Surface Alignment
This paper presents a method for dense 4D temporal alignment of partial reconstructions of non-rigid surfaces observed from single or multiple moving cameras in complex scenes. 4D Match Trees are introduced for robust global alignment of non-rigid shape based on the similarity between images across sequences and views. Wide-timeframe sparse correspondence between arbitrary pairs of images is established using a segmentation-based feature detector (SFD), which is demonstrated to give improved matching of non-rigid shape. Sparse SFD correspondence allows the similarity between any pair of image frames to be estimated for moving cameras and multiple views. This enables the 4D Match Tree to be constructed which minimises the observed change in non-rigid shape for global alignment across all images. Dense 4D temporal correspondence across all frames is then estimated by traversing the 4D Match Tree using optical flow initialised from the sparse feature matches. The approach is evaluated on single- and multiple-view image sequences for alignment of partial surface reconstructions of dynamic objects in complex indoor and outdoor scenes, obtaining a temporally consistent 4D representation. Comparison to previous 2D and 3D scene flow methods demonstrates that 4D Match Trees achieve reduced errors due to drift and improved robustness to large non-rigid deformations.
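A tree that "minimises the observed change in non-rigid shape" over all frame pairs is, structurally, a minimum spanning tree on a frame-dissimilarity graph. The sketch below builds such a tree with Prim's algorithm; it is a generic stand-in for the 4D Match Tree construction, with the pairwise dissimilarities assumed given (in the paper they come from sparse SFD matches).

```python
import numpy as np

def match_tree(dissimilarity):
    """Minimum spanning tree over a symmetric pairwise frame-dissimilarity
    matrix, via Prim's algorithm. Edges connect the frame pairs whose
    observed shape change is smallest; a simplified stand-in for the
    paper's 4D Match Tree construction."""
    n = dissimilarity.shape[0]
    in_tree = {0}                       # grow the tree from frame 0
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:               # cheapest edge leaving the tree
            for j in range(n):
                if j not in in_tree:
                    if best is None or dissimilarity[i, j] < dissimilarity[best[0], best[1]]:
                        best = (i, j)
        edges.append(best)
        in_tree.add(best[1])
    return edges

D = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 2.0],
              [5.0, 2.0, 0.0]])
print(match_tree(D))
```

Traversing such a tree from its root orders the frames so that each alignment step only ever bridges a small shape change.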
Two-Frame Motion Estimation Based on Polynomial Expansion
This paper presents a novel two-frame motion estimation algorithm. The first step is to approximate each neighborhood of both frames by quadratic polynomials, which can be done efficiently using the polynomial expansion transform. By observing how an exact polynomial transforms under translation, a method to estimate displacement fields from the polynomial expansion coefficients is derived, which after a series of refinements leads to a robust algorithm. Evaluation on the Yosemite sequence shows good results.
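The core relation can be worked out for a single neighborhood. Modeling the signal as f(x) = xᵀAx + bᵀx + c, a pure translation by d leaves A unchanged and gives b₂ = b₁ − 2Ad, so the displacement is d = −½ A⁻¹(b₂ − b₁). A minimal sketch of that step (without the refinements the paper adds):

```python
import numpy as np

def displacement_from_expansion(A, b1, b2):
    """Displacement of one neighborhood from its polynomial-expansion
    coefficients. With f(x) = x^T A x + b^T x + c and frame 2 a pure
    translation of frame 1 by d, b2 = b1 - 2 A d, hence
    d = -0.5 * A^{-1} (b2 - b1). Single-neighborhood sketch only."""
    return -0.5 * np.linalg.solve(A, b2 - b1)

# Synthetic check: build b2 from a known displacement and recover it.
A = np.array([[2.0, 0.0],
              [0.0, 2.0]])
d_true = np.array([1.5, -0.5])
b1 = np.array([1.0, 1.0])
b2 = b1 - 2.0 * A @ d_true
print(displacement_from_expansion(A, b1, b2))
```

In the full algorithm this pointwise estimate is averaged over neighborhoods and iterated, which is what makes it robust to noise and larger motions.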
A Theoretical Comparison of Different Orientation Tensors
Orientation tensors are a powerful representation of local orientation. Over the years, several different approaches to estimating the tensors have appeared. The derivations of the different tensors vary to a great extent, which partly obstructs a theoretical comparison between them; such a comparison would otherwise be useful when choosing the best tensor for a particular application. This paper shows that all the existing tensors can be derived within a common framework. The derivation is based on signal models and the concept of orientation functionals: the idea is to estimate a signal model and compute a suitable orientation functional in terms of the model parameters. The models used in this paper are polynomial models and quadrature models. The framework may also aid in the design of orientation tensors based on other signal models.
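As a concrete point of reference, one common instance of the tensors being compared is the gradient-based structure tensor: the outer-product sum of gradient samples, whose dominant eigenvector gives the local orientation. A minimal sketch:

```python
import numpy as np

def structure_tensor(gradients):
    """Orientation (structure) tensor from gradient samples: T = sum g g^T.
    Its dominant eigenvector gives the local orientation. One common
    instance of the tensors the paper compares, not its general framework."""
    g = np.asarray(gradients, dtype=float)
    T = g.T @ g                               # sum of outer products g g^T
    eigvals, eigvecs = np.linalg.eigh(T)      # eigenvalues in ascending order
    return T, eigvecs[:, -1]                  # tensor and dominant orientation

# Gradients all along the x-axis: dominant orientation is (1, 0) up to sign.
T, v = structure_tensor([[1.0, 0.0], [2.0, 0.0], [-1.0, 0.0]])
print(T)
print(v)
```

The paper's framework would recover this tensor as the orientation functional of a particular (polynomial) signal model.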
Estimation of Speed and Distance of Surrounding Vehicles from a Single Camera
Deep learning requires huge amounts of labeled data for proper training. Thanks to modern videogames, which aim at photorealism, it is possible to easily obtain synthetic datasets by extracting information directly from the game engine. The intent is to use data extracted from a videogame to obtain a representation of various scenarios and train a deep neural network to infer the information required for a specific task. In this work we focus on computer vision aids for automotive applications, aiming to estimate the distance and speed of surrounding vehicles using a single dashboard camera. We propose two network models, for distance and speed estimation respectively. We show that training them with synthetic images generated by a game engine is a viable solution that turns out to be very effective in real settings.
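For context, the classical geometric baseline that such learned models compete with is the pinhole relation Z = f·H/h: distance from the focal length in pixels, an assumed real-world object height, and the object's height in the image. This is a textbook baseline, not the paper's method:

```python
def distance_from_height(focal_px, real_height_m, bbox_height_px):
    """Pinhole-camera distance estimate from a detected vehicle's
    bounding-box height: Z = f * H / h. A classical geometric baseline,
    not the learned networks proposed in the abstract; real_height_m
    is an assumed vehicle height."""
    return focal_px * real_height_m / bbox_height_px

# A 1.5 m tall vehicle spanning 75 px under a 1000 px focal length.
print(distance_from_height(1000.0, 1.5, 75.0))  # 20.0 metres
```

The baseline's dependence on an assumed vehicle height is exactly the kind of error source a network trained on engine-extracted ground truth can avoid.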