EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow
We propose a novel approach for optical flow estimation, targeted at large
displacements with significant occlusions. It consists of two steps: i) dense
matching by edge-preserving interpolation from a sparse set of matches; ii)
variational energy minimization initialized with the dense matches. The
sparse-to-dense interpolation relies on an appropriate choice of the distance,
namely an edge-aware geodesic distance. This distance is tailored to handle
occlusions and motion boundaries -- two common and difficult issues for optical
flow computation. We also propose an approximation scheme for the geodesic
distance to allow fast computation without loss of performance. Subsequent to
the dense interpolation step, standard one-level variational energy
minimization is carried out on the dense matches to obtain the final flow
estimation. The proposed approach, called Edge-Preserving Interpolation of
Correspondences (EpicFlow), is fast and robust to large displacements. It
significantly outperforms the state of the art on MPI-Sintel and performs on
par on KITTI and Middlebury.
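
The sparse-to-dense step described above can be illustrated with a minimal sketch. It is not the EpicFlow implementation: it assumes a 4-connected pixel grid, a per-pixel edge-strength map as the geodesic cost, and plain nearest-match assignment in place of the interpolation used in the paper. The function name `geodesic_nearest_interpolation` and the toy data are hypothetical.

```python
import heapq

def geodesic_nearest_interpolation(edge_cost, matches):
    """Give each pixel the flow of its geodesically closest sparse match.

    edge_cost: 2D list of non-negative floats, high on image edges (assumed input).
    matches:   dict mapping (row, col) -> (u, v) flow vector of a sparse match.
    """
    h, w = len(edge_cost), len(edge_cost[0])
    dist = [[float("inf")] * w for _ in range(h)]
    flow = [[None] * w for _ in range(h)]
    heap = []
    # Multi-source Dijkstra: every sparse match starts at distance zero.
    for (r, c), uv in matches.items():
        dist[r][c] = 0.0
        flow[r][c] = uv
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                # Step cost: mean edge strength of both pixels plus a small
                # constant, so crossing an edge is expensive.
                nd = d + 0.5 * (edge_cost[r][c] + edge_cost[nr][nc]) + 1e-3
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    flow[nr][nc] = flow[r][c]
                    heapq.heappush(heap, (nd, nr, nc))
    return flow

# Toy 1x6 image with a strong edge at column 4 and matches at both ends.
edge_cost = [[0.0, 0.0, 0.0, 0.0, 10.0, 0.0]]
matches = {(0, 0): (1.0, 0.0), (0, 5): (-1.0, 0.0)}
flow = geodesic_nearest_interpolation(edge_cost, matches)
```

In this toy case, column 3 is closer to the right-hand match in Euclidean terms, yet it takes the left-hand flow because the edge at column 4 separates it from the right-hand match. That is the edge-preserving behaviour the abstract attributes to the geodesic distance.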
DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval
In this paper, we address the problem of high performance and computationally
efficient content-based video retrieval in large-scale datasets. Current
methods typically propose either: (i) fine-grained approaches employing
spatio-temporal representations and similarity calculations, achieving high
performance at a high computational cost or (ii) coarse-grained approaches
representing/indexing videos as global vectors, where the spatio-temporal
structure is lost, providing low performance but also having low computational
cost. In this work, we propose a Knowledge Distillation framework, which we
call Distill-and-Select (DnS), that, starting from a well-performing
fine-grained Teacher Network, learns: a) Student Networks at different retrieval
performance and computational efficiency trade-offs and b) a Selection Network
that at test time rapidly directs samples to the appropriate student to
maintain both high retrieval performance and high computational efficiency. We
train several students with different architectures and arrive at different
trade-offs of performance and efficiency, i.e., speed and storage requirements,
including fine-grained students that store index videos using binary
representations. Importantly, the proposed scheme allows Knowledge Distillation
in large, unlabelled datasets -- this leads to good students. We evaluate DnS
on five public datasets on three different video retrieval tasks and
demonstrate a) that our students achieve state-of-the-art performance in
several cases and b) that our DnS framework provides an excellent trade-off
between retrieval performance, computational speed, and storage space. In
specific configurations, our method achieves mAP similar to the teacher's but
is 20 times faster and requires 240 times less storage space. Our collected
dataset and implementation are publicly available:
https://github.com/mever-team/distill-and-select
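
The test-time routing described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the linked repository: the cheap and expensive students are passed in as plain callables, and a hand-written uncertainty rule stands in for the learned Selection Network. The names `route_and_score`, `selector`, and `budget` are hypothetical.

```python
def route_and_score(query, videos, coarse_sim, fine_sim, selector, budget):
    """Score all videos with the cheap student, then re-score only the
    `budget` fraction that the selector ranks as most in need of refinement."""
    coarse = [coarse_sim(query, v) for v in videos]
    need = [selector(s) for s in coarse]  # higher value = more uncertain
    k = int(round(budget * len(videos)))
    chosen = sorted(range(len(videos)), key=lambda i: need[i], reverse=True)[:k]
    scores = list(coarse)
    for i in chosen:
        # Only these videos pay the cost of the fine-grained student.
        scores[i] = fine_sim(query, videos[i])
    return scores, sorted(chosen)

# Toy data: videos are ids into lookup tables of made-up similarities.
coarse_table = [0.9, 0.55, 0.1, 0.45]
fine_table = [0.95, 0.8, 0.05, 0.2]
scores, rerouted = route_and_score(
    query=None,
    videos=[0, 1, 2, 3],
    coarse_sim=lambda q, v: coarse_table[v],
    fine_sim=lambda q, v: fine_table[v],
    selector=lambda s: -abs(s - 0.5),  # stand-in: near 0.5 means uncertain
    budget=0.5,
)
```

With a budget of 0.5, only the two videos whose coarse similarity is closest to 0.5 are sent to the fine-grained student. The confidently high and confidently low scores are kept as they are, which is where the speed savings come from.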
Vehicle-Rear: A New Dataset to Explore Feature Fusion for Vehicle Identification Using Convolutional Neural Networks
This work addresses the problem of vehicle identification through
non-overlapping cameras. As our main contribution, we introduce a novel dataset
for vehicle identification, called Vehicle-Rear, that contains more than three
hours of high-resolution videos, with accurate information about the make,
model, color and year of nearly 3,000 vehicles, in addition to the position and
identification of their license plates. To explore our dataset we design a
two-stream CNN that simultaneously uses two of the most distinctive and
persistent features available: the vehicle's appearance and its license plate.
This is an attempt to tackle a major problem: false alarms caused by vehicles
with similar designs or by very close license plate identifiers. In the first
network stream, shape similarities are identified by a Siamese CNN that uses a
pair of low-resolution vehicle patches recorded by two different cameras. In
the second stream, we use a CNN for OCR to extract textual information,
confidence scores, and string similarities from a pair of high-resolution
license plate patches. Then, features from both streams are merged by a
sequence of fully connected layers for decision. In our experiments, we
compared the two-stream network against several well-known CNN architectures
using single or multiple vehicle features. The architectures, trained models,
and dataset are publicly available at https://github.com/icarofua/vehicle-rear
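
To illustrate why fusing both cues helps with near-identical license plates, here is a minimal sketch. It is not the published two-stream network: a normalized edit distance stands in for the OCR stream's string similarity, a single logistic unit with hand-picked weights stands in for the learned fully connected layers, and the shape similarity is assumed to be supplied by some appearance model. All names, weights, and plate strings are hypothetical.

```python
import math

def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def plate_similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical plate strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def fuse(shape_sim, plate_a, plate_b, weights=(4.0, 6.0), bias=-7.0):
    """Combine both cues into a match probability (hand-picked weights)."""
    z = weights[0] * shape_sim + weights[1] * plate_similarity(plate_a, plate_b) + bias
    return 1.0 / (1.0 + math.exp(-z))

same_vehicle = fuse(0.9, "ABC1234", "ABC1234")    # similar shape, same plate
close_plate = fuse(0.2, "ABC1234", "ABC1284")     # one character differs, shape differs
```

With these illustrative weights, a plate that differs by one character is rejected when the vehicle shapes disagree, while an identical plate with matching shape is accepted. Neither cue decides the outcome alone.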