34,032 research outputs found
Online real-time crowd behavior detection in video sequences
Automatically detecting events in crowded scenes is a challenging task in Computer Vision. A number of offline approaches have been proposed for solving the problem of crowd behavior detection, however the offline assumption limits their application in real-world video surveillance systems. In this paper, we propose an online and real-time method for detecting events in crowded video sequences. The proposed approach is based on the combination of visual feature extraction and image segmentation and it works without the need of a training phase. A quantitative experimental evaluation has been carried out on multiple publicly available video sequences, containing data from various crowd scenarios and different types of events, to demonstrate the effectiveness of the approach
Accurate Light Field Depth Estimation with Superpixel Regularization over Partially Occluded Regions
Depth estimation is a fundamental problem for light field photography
applications. Numerous methods have been proposed in recent years, which either
focus on crafting cost terms for more robust matching, or on analyzing the
geometry of scene structures embedded in the epipolar-plane images. Significant
improvements have been made in terms of overall depth estimation error;
however, current state-of-the-art methods still show limitations in handling
intricate occluding structures and complex scenes with multiple occlusions. To
address these challenging issues, we propose a very effective depth estimation
framework which focuses on regularizing the initial label confidence map and
edge strength weights. Specifically, we first detect partially occluded
boundary regions (POBR) via superpixel based regularization. Series of
shrinkage/reinforcement operations are then applied on the label confidence map
and edge strength weights over the POBR. We show that after weight
manipulations, even a low-complexity weighted least squares model can produce
much better depth estimation than state-of-the-art methods in terms of average
disparity error rate, occlusion boundary precision-recall rate, and the
preservation of intricate visual features
PersonRank: Detecting Important People in Images
Always, some individuals in images are more important/attractive than others
in some events such as presentation, basketball game or speech. However, it is
challenging to find important people among all individuals in images directly
based on their spatial or appearance information due to the existence of
diverse variations of pose, action, appearance of persons and various changes
of occasions. We overcome this difficulty by constructing a multiple
Hyper-Interaction Graph to treat each individual in an image as a node and
inferring the most active node referring to interactions estimated by various
types of clews. We model pairwise interactions between persons as the edge
message communicated between nodes, resulting in a bidirectional
pairwise-interaction graph. To enrich the personperson interaction estimation,
we further introduce a unidirectional hyper-interaction graph that models the
consensus of interaction between a focal person and any person in a local
region around. Finally, we modify the PageRank algorithm to infer the
activeness of persons on the multiple Hybrid-Interaction Graph (HIG), the union
of the pairwise-interaction and hyperinteraction graphs, and we call our
algorithm the PersonRank. In order to provide publicable datasets for
evaluation, we have contributed a new dataset called Multi-scene Important
People Image Dataset and gathered a NCAA Basketball Image Dataset from sports
game sequences. We have demonstrated that the proposed PersonRank outperforms
related methods clearly and substantially.Comment: 8 pages, conferenc
Pop-up SLAM: Semantic Monocular Plane SLAM for Low-texture Environments
Existing simultaneous localization and mapping (SLAM) algorithms are not
robust in challenging low-texture environments because there are only few
salient features. The resulting sparse or semi-dense map also conveys little
information for motion planning. Though some work utilize plane or scene layout
for dense map regularization, they require decent state estimation from other
sources. In this paper, we propose real-time monocular plane SLAM to
demonstrate that scene understanding could improve both state estimation and
dense mapping especially in low-texture environments. The plane measurements
come from a pop-up 3D plane model applied to each single image. We also combine
planes with point based SLAM to improve robustness. On a public TUM dataset,
our algorithm generates a dense semantic 3D model with pixel depth error of 6.2
cm while existing SLAM algorithms fail. On a 60 m long dataset with loops, our
method creates a much better 3D model with state estimation error of 0.67%.Comment: International Conference on Intelligent Robots and Systems (IROS)
201
A survey of visual preprocessing and shape representation techniques
Many recent theories and methods proposed for visual preprocessing and shape representation are summarized. The survey brings together research from the fields of biology, psychology, computer science, electrical engineering, and most recently, neural networks. It was motivated by the need to preprocess images for a sparse distributed memory (SDM), but the techniques presented may also prove useful for applying other associative memories to visual pattern recognition. The material of this survey is divided into three sections: an overview of biological visual processing; methods of preprocessing (extracting parts of shape, texture, motion, and depth); and shape representation and recognition (form invariance, primitives and structural descriptions, and theories of attention)
- …