Geodesic Distance Histogram Feature for Video Segmentation
This paper proposes a geodesic-distance-based feature that encodes global
information for improved video segmentation algorithms. The feature is a joint
histogram of intensity and geodesic distances, where the geodesic distances are
computed as the shortest paths between superpixels via their boundaries. We
also incorporate adaptive voting weights and spatial pyramid configurations to
include spatial information into the geodesic histogram feature and show that
this further improves results. The feature is generic and can be used as part
of various algorithms. In experiments, we test the geodesic histogram feature
by incorporating it into two existing video segmentation frameworks. This leads
to significantly better performance in 3D video segmentation benchmarks on two
datasets.
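To make the feature concrete, below is a minimal sketch of how such a joint (intensity, geodesic-distance) histogram could be computed for one superpixel, assuming a precomputed superpixel graph with boundary-strength edge costs; the function name, binning, and normalization are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a joint (intensity, geodesic-distance) histogram
# for one superpixel; binning and normalization are assumed choices.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def geodesic_histogram(adj_weights, intensities, source, n_int_bins=8, n_geo_bins=8):
    """adj_weights: (n, n) array of boundary-crossing costs between adjacent
    superpixels (0 where not adjacent); intensities: (n,) mean intensity per
    superpixel in [0, 1]; source: index of the superpixel being described."""
    graph = csr_matrix(adj_weights)
    # Geodesic distance = cost of the cheapest boundary-crossing path.
    geo = shortest_path(graph, method="D", indices=source)
    geo = geo / (geo[np.isfinite(geo)].max() + 1e-9)   # normalize to [0, 1]
    geo = np.clip(geo, 0.0, 1.0)                       # unreachable -> farthest bin
    hist, _, _ = np.histogram2d(
        intensities, geo,
        bins=[n_int_bins, n_geo_bins], range=[[0, 1], [0, 1]],
    )
    return hist / hist.sum()                           # joint histogram feature
```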
Structure, dynamics and bifurcations of discrete solitons in trapped ion crystals
We study discrete solitons (kinks) accessible in state-of-the-art trapped ion
experiments, considering zigzag crystals and quasi-3D configurations, both
theoretically and experimentally. We first extend the theoretical understanding
of different phenomena predicted and recently experimentally observed in the
structure and dynamics of these topological excitations. Employing tools from
topological degree theory, we analyze bifurcations of crystal configurations
as a function of the trapping parameters, and investigate the formation of kink
configurations and the transformations of kinks between different structures.
This allows us to accurately define and calculate the effective potential
experienced by solitons within the Wigner crystal, and study how this
(so-called Peierls-Nabarro) potential gets modified to a nonperiodic globally
trapping potential in certain parameter regimes. The kinks' rest mass (energy)
and spectrum of modes are computed and the dynamics of linear and nonlinear
kink oscillations are analyzed. We also present novel, experimentally observed
configurations of kinks incorporating a large-mass defect realized by an
embedded molecular ion, and of pairs of interacting kinks stable for long
times, offering prospects for exploring and exploiting complex collective
nonlinear excitations controllable at the quantum level.
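As a rough illustration of a Peierls-Nabarro potential, the sketch below sweeps a continuum kink ansatz across one lattice cell of a discrete sine-Gordon (Frenkel-Kontorova) chain, a toy stand-in for the paper's trapped-ion crystal; the ansatz and parameters are assumptions, and a faithful calculation would instead relax the crystal with the kink center constrained.

```python
# Toy Peierls-Nabarro potential in a Frenkel-Kontorova chain (a stand-in
# for the paper's ion crystal); kink ansatz and parameters are illustrative.
import numpy as np

N = 101                      # chain sites, kink near the middle
sites = np.arange(N) - N // 2
width = 2.0                  # continuum kink width (set by the coupling)

def chain_energy(u, coupling=width**2):
    spring = 0.5 * coupling * np.diff(u) ** 2    # nearest-neighbor coupling
    onsite = 1.0 - np.cos(u)                     # periodic substrate potential
    return spring.sum() + onsite.sum()

def kink(center):
    # Continuum sine-Gordon kink profile sampled on the lattice.
    return 4.0 * np.arctan(np.exp((sites - center) / width))

# Sweep the kink center across one lattice cell: the energy modulation
# E(x0) - min E is the (ansatz-level) Peierls-Nabarro potential.
centers = np.linspace(0.0, 1.0, 21)
energies = np.array([chain_energy(kink(x0)) for x0 in centers])
print("PN barrier (ansatz estimate):", energies.max() - energies.min())
```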
Learning to Extract Motion from Videos in Convolutional Neural Networks
This paper shows how to extract dense optical flow from videos with a
convolutional neural network (CNN). The proposed model constitutes a potential
building block for deeper architectures to allow using motion without resorting
to an external algorithm, e.g., for recognition in videos. We derive our network
architecture from signal processing principles to provide desired invariances
to image contrast, phase and texture. We constrain weights within the network
to enforce strict rotation invariance and substantially reduce the number of
parameters to learn. We demonstrate end-to-end training on only 8 sequences of
the Middlebury dataset, orders of magnitude less than competing CNN-based
motion estimation methods, and obtain comparable performance to classical
methods on the Middlebury benchmark. Importantly, our method outputs a
distributed representation of motion that allows representing multiple,
transparent motions and dynamic textures. Our contributions on network design
and rotation invariance offer insights that are not specific to motion estimation.
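A generic way to realize the kind of rotation-tied weights mentioned above is to share a single base filter across a bank of rotated copies and pool the responses over orientation; the sketch below illustrates only that general idea and is not the paper's architecture.

```python
# Generic sketch of rotation-invariant filtering via weight tying: one
# base filter is shared across rotated copies, and responses are
# max-pooled over orientation. Illustrative, not the paper's design.
import numpy as np
from scipy.ndimage import rotate, convolve

def rotation_invariant_response(image, base_filter, n_orientations=8):
    responses = []
    for k in range(n_orientations):
        angle = 360.0 * k / n_orientations
        # Every orientation shares the parameters of base_filter.
        f = rotate(base_filter, angle, reshape=False, order=1)
        responses.append(convolve(image, f, mode="nearest"))
    # Pooling over orientations makes the output rotation invariant.
    return np.max(responses, axis=0)
```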
Point-wise mutual information-based video segmentation with high temporal consistency
In this paper, we tackle the problem of temporally consistent boundary
detection and hierarchical segmentation in videos. While finding the best
high-level reasoning of region assignments in videos is the focus of much
recent research, temporal consistency in boundary detection has so far only
rarely been tackled. We argue that temporally consistent boundaries are a key
component to temporally consistent region assignment. The proposed method is
based on the point-wise mutual information (PMI) of spatio-temporal voxels.
Temporal consistency is established by an evaluation of PMI-based point
affinities in the spectral domain over space and time. Thus, the proposed
method is independent of any optical flow computation or previously learned
motion models. The proposed low-level video segmentation method outperforms the
learning-based state of the art in terms of standard region metrics.
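For intuition, here is a minimal sketch of a PMI affinity between quantized voxel intensities sampled at spatio-temporal neighbors, in the spirit of PMI-based boundary detection; the sampling scheme, bin count, and the exponent rho are assumptions, not the authors' exact estimator.

```python
# Minimal sketch of a PMI affinity table over quantized voxel features;
# sampling scheme, bin count, and rho are assumed, not the paper's.
import numpy as np

def pmi_affinity(volume, n_bins=16, n_samples=100_000, rho=1.25, rng=None):
    """volume: (T, H, W) float array in [0, 1]. Returns an (n_bins, n_bins)
    table of PMI affinities between features at adjacent voxels."""
    rng = rng or np.random.default_rng(0)
    q = np.minimum((volume * n_bins).astype(int), n_bins - 1)
    T, H, W = q.shape
    # Sample pairs of neighboring voxels over space *and* time.
    t = rng.integers(0, T - 1, n_samples)
    y = rng.integers(0, H - 1, n_samples)
    x = rng.integers(0, W - 1, n_samples)
    axis = rng.integers(0, 3, n_samples)          # which neighbor direction
    dt, dy, dx = (axis == 0), (axis == 1), (axis == 2)
    a = q[t, y, x]
    b = q[t + dt, y + dy, x + dx]
    joint = np.histogram2d(a, b, bins=n_bins,
                           range=[[0, n_bins], [0, n_bins]])[0]
    joint = joint + joint.T + 1e-6                # symmetrize, smooth
    joint /= joint.sum()
    marg = joint.sum(axis=1)
    # PMI_rho(A, B) = log( P(A,B)^rho / (P(A) P(B)) )
    return np.log(joint ** rho / np.outer(marg, marg))
```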
UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition
Current UAV-recorded datasets are mostly limited to action recognition and
object tracking, whereas gesture-signal datasets have mostly been recorded
indoors. Currently, there is no publicly available outdoor video dataset for
UAV command signals. Gesture signals can be used effectively with UAVs by
leveraging a UAV's visual sensors and operational simplicity. To fill this gap
and enable research in wider application areas, we present a UAV gesture
signals dataset recorded in an outdoor setting. We selected 13 gestures
suitable for basic UAV navigation and command from general aircraft handling
and helicopter handling signals. We provide 119 high-definition video clips
consisting of 37,151 frames. The overall baseline gesture recognition
performance, computed using a Pose-based Convolutional Neural Network (P-CNN),
is 91.9%. All the frames are annotated with body joints and gesture classes in
order to extend the dataset's applicability to a wider research area including
gesture recognition, action recognition, human pose recognition and situation
awareness.
Video Object Detection with an Aligned Spatial-Temporal Memory
We introduce Spatial-Temporal Memory Networks for video object detection. At
its core, a novel Spatial-Temporal Memory module (STMM) serves as the recurrent
computation unit to model long-term temporal appearance and motion dynamics.
The STMM's design enables full integration of pretrained backbone CNN weights,
which we find to be critical for accurate detection. Furthermore, in order to
tackle object motion in videos, we propose a novel MatchTrans module to align
the spatial-temporal memory from frame to frame. Our method produces
state-of-the-art results on the benchmark ImageNet VID dataset, and our
ablative studies clearly demonstrate the contribution of our different design
choices. We release our code and models at
http://fanyix.cs.ucdavis.edu/project/stmn/project.html
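For intuition, the sketch below shows one plausible form of such affinity-based alignment: each memory cell is re-sampled from the previous memory using softmaxed feature similarities over a small spatial window. The tensor shapes, window radius, and dot-product affinity are assumptions, not the released implementation.

```python
# Hedged sketch of affinity-based memory alignment in the spirit of the
# MatchTrans module described above; names and window size are assumed.
import torch
import torch.nn.functional as F

def align_memory(mem_prev, feat_prev, feat_cur, k=2):
    """mem_prev, feat_prev, feat_cur: (C, H, W) tensors; k: window radius."""
    C, H, W = feat_cur.shape
    fp = F.pad(feat_prev, (k, k, k, k))
    mp = F.pad(mem_prev, (k, k, k, k))
    logits, mem_patches = [], []
    for dy in range(2 * k + 1):
        for dx in range(2 * k + 1):
            f = fp[:, dy:dy + H, dx:dx + W]
            logits.append((f * feat_cur).sum(dim=0))   # dot-product affinity
            mem_patches.append(mp[:, dy:dy + H, dx:dx + W])
    w = torch.softmax(torch.stack(logits), dim=0)      # (K, H, W), K=(2k+1)^2
    mem = torch.stack(mem_patches)                     # (K, C, H, W)
    return (w.unsqueeze(1) * mem).sum(dim=0)           # aligned memory (C, H, W)
```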
Dense Motion Estimation for Smoke
Motion estimation for highly dynamic phenomena such as smoke is an open
challenge for Computer Vision. Traditional dense motion estimation algorithms
have difficulties with non-rigid and large motions, both of which are
frequently observed in smoke motion. We propose an algorithm for dense motion
estimation of smoke. Our algorithm is robust, fast, and performs better across
different types of smoke than other dense motion estimation algorithms,
including state-of-the-art and neural network approaches. The key
to our contribution is to use skeletal flow, without explicit point matching,
to provide a sparse flow. This sparse flow is upgraded to a dense flow. In this
paper we describe our algorithm in greater detail, and provide experimental
evidence to support our claims.
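As an illustration of the sparse-to-dense upgrade step, the sketch below interpolates a sparse skeletal flow to a dense field; the linear interpolation with a nearest-neighbor fallback is an assumed stand-in, not necessarily the paper's method.

```python
# Minimal sketch of upgrading a sparse flow to a dense field by
# interpolation; the scheme is an assumption, not the authors' method.
import numpy as np
from scipy.interpolate import griddata

def densify_flow(points, vectors, shape):
    """points: (N, 2) array of (y, x) skeleton locations; vectors: (N, 2)
    flow at those points; shape: (H, W) of the output field."""
    gy, gx = np.mgrid[0:shape[0], 0:shape[1]]
    # Linear interpolation inside the convex hull of the sparse points.
    dense = np.stack(
        [griddata(points, vectors[:, i], (gy, gx), method="linear")
         for i in range(2)], axis=-1)
    # Fill the remaining NaN holes with nearest-neighbor values.
    near = np.stack(
        [griddata(points, vectors[:, i], (gy, gx), method="nearest")
         for i in range(2)], axis=-1)
    holes = np.isnan(dense)
    dense[holes] = near[holes]
    return dense                                # (H, W, 2) dense flow
```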
A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects
Recently, minimum cost multicut formulations have been proposed and proven successful in both motion trajectory segmentation and multi-target tracking scenarios. Both tasks benefit from decomposing a graphical model into an optimal number of connected components based on attractive and repulsive pairwise terms. The two tasks are formulated at different levels of granularity and, accordingly, leverage mostly local information for motion segmentation and mostly high-level information for multi-target tracking. In this paper we argue that point trajectories and their local relationships can contribute to the high-level task of multi-target tracking, and that high-level cues from object detection and tracking can help solve motion segmentation. We propose a joint graphical model for point trajectories and object detections whose multicuts are solutions to motion segmentation and multi-target tracking problems at once. Results on the FBMS59 motion segmentation benchmark as well as on pedestrian tracking sequences from the 2D MOT 2015 benchmark demonstrate the promise of this joint approach.
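For intuition about the multicut objective, the sketch below runs a greedy contraction heuristic for correlation clustering on a small graph with signed (attractive/repulsive) edge costs; it illustrates only the objective and is not the solver used in the paper.

```python
# Greedy heuristic sketch for the minimum cost multicut / correlation
# clustering objective: edges carry signed costs (positive = attractive,
# negative = repulsive); the most attractive pair of clusters is merged
# repeatedly, as in greedy additive edge contraction. Not the paper's solver.

def greedy_multicut(n_nodes, edges):
    """edges: dict {(u, v): cost} with u < v. Returns node -> cluster label."""
    label = list(range(n_nodes))

    def find(a):                       # union-find with path compression
        while label[a] != a:
            label[a] = label[label[a]]
            a = label[a]
        return a

    cost = dict(edges)                 # inter-cluster costs, updated on merge
    while True:
        best = max(cost.items(), key=lambda kv: kv[1], default=None)
        if best is None or best[1] <= 0:
            break                      # only repulsive edges remain
        (u, v), _ = best
        label[find(v)] = find(u)       # contract the most attractive edge
        new_cost = {}                  # re-accumulate costs between clusters
        for (a, b), c in cost.items():
            ra, rb = find(a), find(b)
            if ra != rb:
                key = (min(ra, rb), max(ra, rb))
                new_cost[key] = new_cost.get(key, 0.0) + c
        cost = new_cost
    return [find(i) for i in range(n_nodes)]

# Example: two attractive pairs joined by one repulsive link.
print(greedy_multicut(4, {(0, 1): 2.0, (2, 3): 1.5, (1, 2): -3.0}))
# -> nodes 0,1 share one label, nodes 2,3 another
```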
A framework for automatic semantic video annotation
The rapidly increasing quantity of publicly available videos has driven research into developing automatic tools for indexing, rating, searching and retrieval. Textual semantic representations, such as tagging, labelling and annotation, are often important factors in the process of indexing any video, because of their user-friendly way of representing the semantics appropriate for search and retrieval. Ideally, this annotation should be inspired by the human cognitive way of perceiving and describing videos. The difference between the low-level visual contents and the corresponding human perception is referred to as the ‘semantic gap’. Tackling this gap is even harder in the case of unconstrained videos, mainly due to the lack of any prior information about the analyzed video on the one hand, and the huge amount of generic knowledge required on the other. This paper introduces a framework for the Automatic Semantic Annotation of unconstrained videos. The proposed framework utilizes two non-domain-specific layers: low-level visual similarity matching, and an annotation analysis that employs commonsense knowledge bases. A commonsense ontology is created by incorporating multiple structured semantic relationships. Experiments and black-box tests are carried out on standard video databases for action recognition and video information retrieval. White-box tests examine the performance of the framework's individual intermediate layers, and the evaluation of the results and the statistical analysis show that integrating visual similarity matching with commonsense semantic relationships provides an effective approach to automated video annotation.
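Purely as an illustration of the two-layer design, the sketch below retrieves candidate labels from visually similar corpus videos and re-ranks them by mutual commonsense relatedness; the similarity measure, corpus format, and relatedness scores are all invented placeholders, not the authors' framework.

```python
# Hypothetical sketch of the two-layer annotation pipeline described
# above; every data structure and score here is an assumed placeholder.
import numpy as np

def annotate(query_feat, corpus, kb_relatedness, k=3):
    """query_feat: feature vector of the query video.
    corpus: list of (feature_vector, labels) for annotated videos.
    kb_relatedness: dict {(label_a, label_b): score} from a commonsense KB."""
    # Layer 1: low-level visual similarity matching (cosine placeholder).
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    neighbors = sorted(corpus, key=lambda item: cos(query_feat, item[0]),
                       reverse=True)[:k]
    candidates = {lbl for _, labels in neighbors for lbl in labels}

    # Layer 2: annotation analysis - rank candidates by how strongly the
    # commonsense knowledge base relates them to one another.
    def support(lbl):
        return sum(kb_relatedness.get(tuple(sorted((lbl, other))), 0.0)
                   for other in candidates if other != lbl)
    return sorted(candidates, key=support, reverse=True)
```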
Occupation times of exclusion processes
In this paper we consider exclusion processes evolving on the one-dimensional lattice $\mathbb{Z}$, under the diffusive time scale and starting from the invariant state, namely the Bernoulli product measure of parameter $\rho$. Our goal is to establish the scaling limits of the additive functional given by the occupation time of the origin. We present a method, recently introduced in [G.J.], from which a local Boltzmann-Gibbs principle can be derived for a general class of exclusion processes. In this case, the principle says that the occupation time of the origin is well approximated by an additive functional of the density of particles. As a consequence, its scaling limits follow from the scaling limits of the density of particles. As examples we present the mean-zero exclusion, the symmetric simple exclusion and the weakly asymmetric simple exclusion. For the latter, under a strong asymmetry regime, the limit of the occupation time is given in terms of the solution of the KPZ equation.
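For readers reconstructing the notation (which was lost in extraction), the occupation time and the local Boltzmann-Gibbs approximation can be stated schematically as follows; the symbol $\Gamma_t$, the averaging box of radius $\varepsilon N$, and the density $\rho$ are assumed standard notation, and the diffusive time rescaling is omitted.

```latex
% Schematic statement (notation assumed; diffusive rescaling omitted):
% the occupation time of the origin, and its approximation by an additive
% functional of the locally averaged, centered particle density.
\[
  \Gamma_t = \int_0^t \eta_s(0)\,\mathrm{d}s,
  \qquad
  \Gamma_t - \rho t \;\approx\; \int_0^t \frac{1}{2\varepsilon N + 1}
      \sum_{|x|\le \varepsilon N} \bigl(\eta_s(x) - \rho\bigr)\,\mathrm{d}s .
\]
```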