834 research outputs found

    Geodesic Distance Histogram Feature for Video Segmentation

    This paper proposes a geodesic-distance-based feature that encodes global information for improved video segmentation algorithms. The feature is a joint histogram of intensity and geodesic distances, where the geodesic distances are computed as the shortest paths between superpixels via their boundaries. We also incorporate adaptive voting weights and spatial pyramid configurations to include spatial information into the geodesic histogram feature and show that this further improves results. The feature is generic and can be used as part of various algorithms. In experiments, we test the geodesic histogram feature by incorporating it into two existing video segmentation frameworks. This leads to significantly better performance in 3D video segmentation benchmarks on two datasets.
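
    The joint histogram described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the superpixel adjacency graph, the boundary costs, the bin counts, and the function names `geodesic_distances` and `geodesic_histogram` are all assumptions made for illustration, and the adaptive voting weights and spatial pyramid are omitted.

```python
import heapq
import numpy as np

def geodesic_distances(adjacency, source):
    """Dijkstra shortest-path costs from `source` on a weighted superpixel graph.

    adjacency: dict mapping node -> list of (neighbor, boundary_cost) pairs
    (a hypothetical representation of superpixels linked via their boundaries).
    """
    dist = {node: float("inf") for node in adjacency}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adjacency[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def geodesic_histogram(adjacency, intensity, source,
                       n_int_bins=4, n_dist_bins=4, max_dist=4.0):
    """Normalized joint histogram of (intensity, geodesic distance) seen from `source`."""
    dist = geodesic_distances(adjacency, source)
    nodes = [n for n in adjacency if n != source]
    ints = np.array([intensity[n] for n in nodes], dtype=float)
    dists = np.array([dist[n] for n in nodes], dtype=float)
    reachable = np.isfinite(dists)  # ignore superpixels with no path to the source
    hist, _, _ = np.histogram2d(
        ints[reachable], dists[reachable],
        bins=(n_int_bins, n_dist_bins),
        range=((0.0, 1.0), (0.0, max_dist)),
    )
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy graph with three superpixels; costs and intensities are made up.
adjacency = {0: [(1, 1.0), (2, 5.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(0, 5.0), (1, 1.0)]}
intensity = {0: 0.1, 1: 0.5, 2: 0.9}
hist = geodesic_histogram(adjacency, intensity, source=0)
```

    Each superpixel gets its own histogram, so the feature summarizes how intensities are distributed at increasing geodesic distance from it. That is one way to read "encodes global information" here.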

    Structure, dynamics and bifurcations of discrete solitons in trapped ion crystals

    We study discrete solitons (kinks) accessible in state-of-the-art trapped ion experiments, considering zigzag crystals and quasi-3D configurations, both theoretically and experimentally. We first extend the theoretical understanding of different phenomena predicted and recently experimentally observed in the structure and dynamics of these topological excitations. Employing tools from topological degree theory, we analyze bifurcations of crystal configurations as a function of the trapping parameters, and investigate the formation of kink configurations and the transformations of kinks between different structures. This allows us to accurately define and calculate the effective potential experienced by solitons within the Wigner crystal, and study how this (so-called Peierls-Nabarro) potential is modified into a nonperiodic, globally trapping potential in certain parameter regimes. The kinks' rest mass (energy) and spectrum of modes are computed, and the dynamics of linear and nonlinear kink oscillations are analyzed. We also present novel, experimentally observed configurations of kinks incorporating a large-mass defect realized by an embedded molecular ion, and of pairs of interacting kinks stable for long times, offering the prospect of exploring and exploiting complex collective nonlinear excitations, controllable on the quantum level. Comment: 25 pages, 10 figures, v2 corrects Fig. 2 and adds some text and reference

    Learning to Extract Motion from Videos in Convolutional Neural Networks

    This paper shows how to extract dense optical flow from videos with a convolutional neural network (CNN). The proposed model constitutes a potential building block for deeper architectures, allowing them to use motion without resorting to an external algorithm, e.g., for recognition in videos. We derive our network architecture from signal processing principles to provide desired invariances to image contrast, phase and texture. We constrain weights within the network to enforce strict rotation invariance and substantially reduce the number of parameters to learn. We demonstrate end-to-end training on only 8 sequences of the Middlebury dataset, orders of magnitude less than competing CNN-based motion estimation methods, and obtain comparable performance to classical methods on the Middlebury benchmark. Importantly, our method outputs a distributed representation of motion that allows representing multiple, transparent motions, and dynamic textures. Our contributions on network design and rotation invariance offer insights that are not specific to motion estimation.

    Point-wise mutual information-based video segmentation with high temporal consistency

    In this paper, we tackle the problem of temporally consistent boundary detection and hierarchical segmentation in videos. While finding the best high-level reasoning of region assignments in videos is the focus of much recent research, temporal consistency in boundary detection has so far only rarely been tackled. We argue that temporally consistent boundaries are a key component of temporally consistent region assignment. The proposed method is based on the point-wise mutual information (PMI) of spatio-temporal voxels. Temporal consistency is established by an evaluation of PMI-based point affinities in the spectral domain over space and time. Thus, the proposed method is independent of any optical flow computation or previously learned motion models. The proposed low-level video segmentation method outperforms the learning-based state of the art in terms of standard region metrics.
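
    For orientation, the standard definition of point-wise mutual information is PMI(a, b) = log( p(a, b) / (p(a) p(b)) ). The sketch below estimates it from co-occurrence counts of quantized feature values. The abstract does not specify the estimator, the features, or the spectral step, so the quantization into bins, the way pairs are sampled, and the function name `pmi_matrix` are assumptions for illustration only.

```python
import numpy as np

def pmi_matrix(pairs, n_bins):
    """Estimate PMI between quantized feature values from co-occurring pairs.

    pairs: iterable of (a, b) integer bin indices, e.g. features of two nearby
    voxels (hypothetical sampling scheme). Returns an (n_bins, n_bins) array;
    entries for pairs that never co-occur are -inf.
    """
    joint = np.zeros((n_bins, n_bins), dtype=float)
    for a, b in pairs:
        joint[a, b] += 1.0
    joint = joint + joint.T          # make the estimate symmetric in (a, b)
    joint /= joint.sum()             # joint probability p(a, b)
    marginal = joint.sum(axis=1)     # marginal probability p(a)
    independent = np.outer(marginal, marginal)

    pmi = np.full((n_bins, n_bins), -np.inf)
    seen = joint > 0
    pmi[seen] = np.log(joint[seen] / independent[seen])
    return pmi

# Toy example: values 0 and 1 only ever co-occur with themselves.
pmi = pmi_matrix([(0, 0), (0, 0), (1, 1), (1, 1)], n_bins=2)
```

    A high PMI means two values co-occur more often than independence would predict, which can be read as evidence that the two voxels belong to the same region. A low PMI suggests a boundary between them.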

    UAV-GESTURE: A Dataset for UAV Control and Gesture Recognition

    Current UAV-recorded datasets are mostly limited to action recognition and object tracking, whereas gesture signal datasets were mostly recorded in indoor spaces. Currently, there is no public, outdoor-recorded video dataset for UAV commanding signals. Gesture signals can be used effectively with UAVs by leveraging the UAVs' visual sensors and operational simplicity. To fill this gap and enable research in wider application areas, we present a UAV gesture signals dataset recorded in an outdoor setting. We selected 13 gestures suitable for basic UAV navigation and command from general aircraft handling and helicopter handling signals. We provide 119 high-definition video clips consisting of 37151 frames. The overall baseline gesture recognition performance computed using a Pose-based Convolutional Neural Network (P-CNN) is 91.9%. All the frames are annotated with body joints and gesture classes in order to extend the dataset's applicability to a wider research area including gesture recognition, action recognition, human pose recognition and situation awareness. Comment: 12 pages, 4 figures, UAVision workshop, ECCV, 201

    Video Object Detection with an Aligned Spatial-Temporal Memory

    We introduce Spatial-Temporal Memory Networks for video object detection. At its core, a novel Spatial-Temporal Memory module (STMM) serves as the recurrent computation unit to model long-term temporal appearance and motion dynamics. The STMM's design enables full integration of pretrained backbone CNN weights, which we find to be critical for accurate detection. Furthermore, in order to tackle object motion in videos, we propose a novel MatchTrans module to align the spatial-temporal memory from frame to frame. Our method produces state-of-the-art results on the benchmark ImageNet VID dataset, and our ablative studies clearly demonstrate the contribution of our different design choices. We release our code and models at http://fanyix.cs.ucdavis.edu/project/stmn/project.html

    Dense Motion Estimation for Smoke

    Motion estimation for highly dynamic phenomena such as smoke is an open challenge for Computer Vision. Traditional dense motion estimation algorithms have difficulties with non-rigid and large motions, both of which are frequently observed in smoke motion. We propose an algorithm for dense motion estimation of smoke. Our algorithm is robust and fast, and performs better across different types of smoke than other dense motion estimation algorithms, including state-of-the-art and neural network approaches. The key to our contribution is to use skeletal flow, without explicit point matching, to provide a sparse flow. This sparse flow is upgraded to a dense flow. In this paper we describe our algorithm in greater detail, and provide experimental evidence to support our claims. Comment: ACCV201

    A Multi-cut Formulation for Joint Segmentation and Tracking of Multiple Objects

    Recently, Minimum Cost Multicut Formulations have been proposed and proven to be successful in both motion trajectory segmentation and multi-target tracking scenarios. Both tasks benefit from decomposing a graphical model into an optimal number of connected components based on attractive and repulsive pairwise terms. The two tasks are formulated on different levels of granularity and, accordingly, leverage mostly local information for motion segmentation and mostly high-level information for multi-target tracking. In this paper we argue that point trajectories and their local relationships can contribute to the high-level task of multi-target tracking and also argue that high-level cues from object detection and tracking are helpful to solve motion segmentation. We propose a joint graphical model for point trajectories and object detections whose Multicuts are solutions to motion segmentation and multi-target tracking problems at once. Results on the FBMS59 motion segmentation benchmark as well as on pedestrian tracking sequences from the 2D MOT 2015 benchmark demonstrate the promise of this joint approach.

    A framework for automatic semantic video annotation

    The rapidly increasing quantity of publicly available videos has driven research into developing automatic tools for indexing, rating, searching and retrieval. Textual semantic representations, such as tagging, labelling and annotation, are often important factors in the process of indexing any video, because of their user-friendly way of representing the semantics appropriate for search and retrieval. Ideally, this annotation should be inspired by the human cognitive way of perceiving and describing videos. The difference between the low-level visual contents and the corresponding human perception is referred to as the ‘semantic gap’. Tackling this gap is even harder in the case of unconstrained videos, mainly due to the lack of any previous information about the analyzed video on the one hand, and the huge amount of generic knowledge required on the other. This paper introduces a framework for the Automatic Semantic Annotation of unconstrained videos. The proposed framework utilizes two non-domain-specific layers: low-level visual similarity matching, and an annotation analysis that employs commonsense knowledge bases. A commonsense ontology is created by incorporating multiple-structured semantic relationships. Experiments and black-box tests are carried out on standard video databases for action recognition and video information retrieval. White-box tests examine the performance of the individual intermediate layers of the framework, and the evaluation of the results and the statistical analysis show that integrating visual similarity matching with commonsense semantic relationships provides an effective approach to automated video annotation.

    Occupation times of exclusion processes

    In this paper we consider exclusion processes $\{\eta_t : t \geq 0\}$ evolving on the one-dimensional lattice $\mathbb{Z}$, under the diffusive time scale $tn^2$ and starting from the invariant state $\nu_\rho$, the Bernoulli product measure of parameter $\rho \in [0,1]$. Our goal is to establish the scaling limits of the additive functional $\Gamma_t := \int_0^{tn^2} \eta_s(0)\,ds$, the occupation time of the origin. We present a method, recently introduced in [G.J.], from which a local Boltzmann-Gibbs Principle can be derived for a general class of exclusion processes. In this case, the principle says that $\Gamma_t$ is very well approximated by the additive functional of the density of particles. As a consequence, the scaling limits of $\Gamma_t$ follow from the scaling limits of the density of particles. As examples we present the mean-zero exclusion, the symmetric simple exclusion and the weakly asymmetric simple exclusion. For the latter, under a strong asymmetry regime, the limit of $\Gamma_t$ is given in terms of the solution of the KPZ equation.