
    Online real-time crowd behavior detection in video sequences

    Automatically detecting events in crowded scenes is a challenging task in computer vision. A number of offline approaches have been proposed for crowd behavior detection; however, the offline assumption limits their application in real-world video surveillance systems. In this paper, we propose an online, real-time method for detecting events in crowded video sequences. The proposed approach combines visual feature extraction with image segmentation and works without a training phase. A quantitative experimental evaluation on multiple publicly available video sequences, covering various crowd scenarios and different types of events, demonstrates the effectiveness of the approach.
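
    A minimal sketch of such a training-free, online pipeline (an illustration under assumed choices, not the authors' exact method): dense optical flow serves as the visual feature, and a running-statistics threshold stands in for the segmentation step that flags abnormal frames.

        # Hypothetical online crowd-event detector: dense optical flow as the
        # visual feature, Welford running statistics as the training-free
        # anomaly threshold. Parameter values are illustrative assumptions.
        import cv2
        import numpy as np

        def detect_events(video_path, k=3.0, warmup=30):
            cap = cv2.VideoCapture(video_path)
            ok, prev = cap.read()
            prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
            mean, m2, n = 0.0, 0.0, 0
            frame_idx = 0
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                    0.5, 3, 15, 3, 5, 1.2, 0)
                energy = np.linalg.norm(flow, axis=2).mean()  # mean motion magnitude
                n += 1
                delta = energy - mean
                mean += delta / n                     # Welford's online update
                m2 += delta * (energy - mean)
                std = (m2 / max(n - 1, 1)) ** 0.5
                if n > warmup and energy > mean + k * std:
                    yield frame_idx                   # frame flagged as an event
                prev_gray = gray
                frame_idx += 1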

    A Spatio-Temporal Multi-Scale Binary Descriptor

    Binary descriptors are widely used for multi-view matching and robotic navigation. However, their matching performance decreases considerably under severe scale and viewpoint changes in non-planar scenes. To overcome this problem, we propose to encode the varying appearance of selected 3D scene points tracked by a moving camera with compact spatio-temporal descriptors. To this end, we first track interest points and capture their temporal variations at multiple scales. Then, we validate feature tracks through 3D reconstruction and compress the temporal sequence of descriptors by encoding the most frequent and stable binary values. Finally, we determine multi-scale correspondences across views with a matching strategy that handles severe scale differences. The proposed spatio-temporal multi-scale approach is generic and can be used with a variety of binary descriptors. We show the effectiveness of the joint multi-scale extraction and temporal reduction through comparisons of different temporal reduction strategies and through application to several binary descriptors.
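
    The temporal reduction step, keeping the most frequent and stable bit values along a feature track, can be read as a per-bit majority vote. The sketch below illustrates that reading; the paper's exact encoding may differ.

        # Collapse the sequence of binary descriptors observed along a feature
        # track into one descriptor by per-bit majority vote, and expose a
        # per-bit stability score that a matcher could use as a weight.
        import numpy as np

        def temporal_reduce(track_descriptors):
            """track_descriptors: (T, D) array of 0/1 bits, one row per frame."""
            bits = np.asarray(track_descriptors, dtype=np.float64)
            frequency = bits.mean(axis=0)            # fraction of frames with bit = 1
            reduced = (frequency >= 0.5).astype(np.uint8)
            stability = np.abs(frequency - 0.5) * 2  # 1.0 = bit constant over the track
            return reduced, stability

        def weighted_hamming(a, b, w):
            """Hamming distance that down-weights temporally unstable bits."""
            return float(np.sum(w * (a != b)))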

    Recognition of Human Actions in Video

    Our overall purpose in this dissertation is the automatic construction of a large-scale action database from Web data, which could support better exploration of action recognition. We conducted large-scale experiments on 100 human actions and 12 non-human actions and obtained promising results. This dissertation consists of six chapters, whose contents we briefly introduce below. In Chapter 1, recent approaches to action recognition are reviewed, together with the necessity of building a large-scale action database and the difficulties of doing so; our work toward solving the problem is then concisely explained. In Chapter 2, the first work, a framework for automatically extracting relevant video shots of specific actions from Web videos, is described in detail. This framework first selects relevant videos among thousands of Web videos for a given action using tag co-occurrence and then divides the selected videos into video shots. The shots are ranked based on their visual linkage, and the top-ranked shots are expected to be those most relevant to the action. Our method of exploiting Web images for shot ranking is also introduced, followed by large-scale experiments on 100 human actions and 12 non-human actions and their results. In Chapter 3, the second work, which further improves the shot ranking of the above framework with a novel ranking method, is introduced. The proposed method, VisualTextualRank, extends VisualRank, the conventional method applied to shot ranking in Chapter 2, and effectively employs both the textual and the visual information extracted from the data. Our experimental results showed that this method retrieves more relevant shots than the conventional ranking method. In Chapter 4, the third work, which aims to obtain more informative and representative video features, is described. Building on the conventional spatio-temporal feature extraction adopted in Chapters 2 and 3, we propose to extract spatio-temporal features by triangulating dense SURF keypoints; shape features of the triangles, along with the visual and motion features of their vertices, form our features. By applying this feature extraction to the framework of Chapter 2, we show that more relevant video shots are retrieved at the top. The effectiveness of our method is also validated on action classification for UCF-101 and UCF-50, two well-known large-scale datasets, and the results demonstrate that our features are comparable and complementary to the state of the art. In Chapter 5, the final work, which focuses on the recognition of hand-motion-based actions, is introduced. We propose a system for hand detection and tracking in unconstrained videos and extract hand-movement features from the detected and tracked hand regions; these features are expected to improve results for hand-motion-based actions. To evaluate hand detection, we use the VideoPose2.0 dataset, a challenging dataset of uncontrolled videos; to validate the features, we conduct experiments on fine-grained action recognition with the "playing instruments" group of the UCF-101 dataset. The experimental results show the effectiveness of our system. In Chapter 6, our work is summarized with its major points and findings, and we consider the potential of applying our results to further research.
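
    VisualRank, the baseline that VisualTextualRank extends, is essentially PageRank run on a shot-similarity graph. The sketch below shows that baseline, with a simple convex combination of visual and textual similarities standing in for the dissertation's actual formulation; the combination rule is an assumption.

        # PageRank-style shot ranking over a similarity graph (VisualRank
        # baseline). Mixing visual and textual similarity with a single
        # weight beta is an assumed simplification of VisualTextualRank.
        import numpy as np

        def rank_shots(S_visual, S_textual=None, alpha=0.85, beta=0.5, iters=100):
            """S_*: (N, N) nonnegative similarity matrices over N video shots."""
            S = S_visual if S_textual is None else beta * S_visual + (1 - beta) * S_textual
            P = S / (S.sum(axis=0, keepdims=True) + 1e-12)  # column-stochastic
            N = P.shape[0]
            r = np.full(N, 1.0 / N)
            for _ in range(iters):
                r = alpha * (P @ r) + (1 - alpha) / N       # damped power iteration
            return np.argsort(-r)                           # shot indices, best first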

    Point triangulation through polyhedron collapse using the l∞ norm

    Multi-camera triangulation of feature points based on minimisation of the overall l2 reprojection error can get stuck in suboptimal local minima or require slow global optimisation. For this reason, researchers have proposed optimising the l∞ norm of the l2 single-view reprojection errors, which avoids the problem of local minima entirely. In this paper we present a novel method for l∞ triangulation that minimises the l∞ norm of the l∞ reprojection errors: this apparently small difference leads to a much faster but equally accurate solution, which is related to the MLE under the assumption of uniform noise. The proposed method adopts a new optimisation strategy based on solving simple quadratic equations. This stands in contrast to the fastest existing methods, which solve a sequence of more complex auxiliary Linear Programming or Second Order Cone Problems. The proposed algorithm performs well: it achieves the same accuracy as existing techniques while executing faster and being straightforward to implement.
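
    For context, the classic way to optimise such an l∞ objective is bisection on the error bound γ: for fixed γ, the constraints |p1·X − u·p3·X| ≤ γ·p3·X (and likewise for v) are linear in the homogeneous point X, so the feasibility test is a Linear Program. The sketch below implements that LP-feasibility baseline, i.e. the slower strategy the paper improves on, not its quadratic-equation method.

        # Bisection + LP-feasibility baseline for l-infinity triangulation.
        # Minimises the l-infinity norm of per-coordinate reprojection errors;
        # the paper's faster quadratic-equation solver is not reproduced here.
        import numpy as np
        from scipy.optimize import linprog

        def linf_triangulate(Ps, uvs, lo=0.0, hi=50.0, tol=1e-6):
            """Ps: list of 3x4 camera matrices; uvs: list of (u, v) observations."""
            def feasible(gamma):
                A, b = [], []
                for P, (u, v) in zip(Ps, uvs):
                    p1, p2, p3 = P
                    for pk, m in ((p1, u), (p2, v)):
                        A.append((pk - m * p3) - gamma * p3)   #  (pk - m p3).X <= gamma p3.X
                        b.append(0.0)
                        A.append(-(pk - m * p3) - gamma * p3)  # -(pk - m p3).X <= gamma p3.X
                        b.append(0.0)
                    A.append(-p3)                              # cheirality/scale: p3.X >= 1
                    b.append(-1.0)
                res = linprog(np.zeros(4), A_ub=np.array(A), b_ub=np.array(b),
                              bounds=[(None, None)] * 4)
                return res.x if res.success else None
            X = None
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                sol = feasible(mid)
                if sol is not None:
                    hi, X = mid, sol
                else:
                    lo = mid
            return X  # homogeneous 4-vector, or None if hi was already infeasible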

    Event-based Motion Segmentation with Spatio-Temporal Graph Cuts

    Identifying independently moving objects is an essential task for dynamic scene understanding. However, traditional cameras used in dynamic scenes may suffer from motion blur or exposure artifacts due to their sampling principle. By contrast, event-based cameras are novel bio-inspired sensors that offer advantages in overcoming such limitations: they report pixel-wise intensity changes asynchronously, which enables them to acquire visual information at exactly the same rate as the scene dynamics. We develop a method to identify independently moving objects acquired with an event-based camera, i.e., to solve the event-based motion segmentation problem. We cast the problem as an energy minimization problem involving the fitting of multiple motion models, and we jointly solve two subproblems, event-cluster assignment (labeling) and motion-model fitting, in an iterative manner by exploiting the structure of the input event data in the form of a spatio-temporal graph. Experiments on available datasets demonstrate the versatility of the method in scenes with different motion patterns and numbers of moving objects. The evaluation shows state-of-the-art results without having to predetermine the number of expected moving objects. We release the software and dataset under an open-source licence to foster research in the emerging topic of event-based motion segmentation.
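
    The two-subproblem alternation can be illustrated with a toy version: assign each event to the motion model that explains it best, then refit each model on its assigned events. The sketch below uses straight space-time trajectories as the motion models and greedy per-event assignment; the actual method fits richer motion models and solves the labeling step with spatio-temporal graph cuts.

        # Toy alternation between event labeling and motion-model fitting.
        # Each model is a space-time line xy = c + v*t, an assumed
        # simplification of the paper's motion models; the graph-cut labeling
        # is replaced by per-event best-fit assignment.
        import numpy as np

        def fit_model(xy, t):
            T = np.hstack([np.ones_like(t), t])              # design matrix [1, t]
            coeff, *_ = np.linalg.lstsq(T, xy, rcond=None)   # rows: offset c, velocity v
            return coeff

        def residuals(xy, t, coeff):
            pred = coeff[0] + t * coeff[1]
            return np.linalg.norm(xy - pred, axis=1)

        def segment_events(events, n_models=2, iters=10, seed=0):
            """events: (N, 3) array of (x, y, t) rows."""
            xy, t = events[:, :2], events[:, 2:3]
            rng = np.random.default_rng(seed)
            labels = rng.integers(0, n_models, len(events))
            for _ in range(iters):
                coeffs = []
                for k in range(n_models):
                    mask = labels == k
                    if mask.sum() < 2:                       # revive empty clusters
                        mask = rng.random(len(events)) < 0.1
                    coeffs.append(fit_model(xy[mask], t[mask]))
                errs = np.stack([residuals(xy, t, c) for c in coeffs], axis=1)
                labels = errs.argmin(axis=1)                 # event-to-model assignment
            return labels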

    Tracking, Detection and Registration in Microscopy Material Images

    Fast and accurate characterization of fiber micro-structures plays a central role in enabling material scientists to analyze the physical properties of continuous fiber reinforced composite materials. In materials science, this is usually achieved by continuously cross-sectioning a 3D material sample into a sequence of 2D microscopic images, followed by a fiber detection/tracking algorithm run through the obtained image sequence. To speed up this process and handle larger material samples, we propose sparse sampling with a larger inter-slice distance in cross-sectioning and develop a new algorithm that can robustly track large-scale fibers through such a sparsely sampled image sequence. In particular, the problem is formulated as multi-target tracking, and Kalman filters are applied to track each fiber along the image sequence. One main challenge in this tracking process is correctly associating each fiber with its observation, given that 1) fiber observations are numerous, crowded, and very similar in appearance within a 2D slice, and 2) there may be a large gap between the predicted location of a fiber and its observation under sparse sampling. To address this challenge, a novel group-wise association algorithm is developed by leveraging the fact that fibers are implanted in bundles and that fibers in the same bundle are highly correlated through the image sequence. Tracking-by-detection algorithms rely heavily on detection accuracy, especially recall. State-of-the-art fiber detection algorithms perform well under ideal conditions but are inaccurate where image quality is locally degraded by contaminants on the material surface and/or defocus blur. Convolutional neural networks (CNNs) could be used for this problem but would require a large number of manually annotated fibers, which are not available. We propose an unsupervised learning method to accurately detect fibers at large scale that is robust against local degradations of image quality. The proposed method requires no manual annotations; instead, it uses fiber shape/size priors and the spatio-temporal consistency of tracking to simulate supervision when training the CNN. Because of significant microscope movement during data acquisition, the sampled microscopy images might not be well aligned, which further complicates large-scale fiber tracking. In this dissertation, we design an object tracking system that accurately tracks large-scale fibers and simultaneously performs satisfactory image registration. The large-scale fiber tracking task is accomplished by Kalman-filter-based tracking methods, and with the assistance of fiber tracking, image registration is performed in a coarse-to-fine manner. To evaluate the proposed methods, a dataset was collected by the Air Force Research Laboratory (AFRL), whose material scientists used a serial sectioning instrument to cross-section the 3D material samples; during sample preparation, the samples are ground, cleaned, and then imaged. Experimental results on this dataset demonstrate that the proposed methods yield significant improvements in large-scale fiber tracking and detection, together with satisfactory image registration.
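
    A bare-bones version of the per-fiber tracking step might look like the following: one constant-velocity Kalman filter per fiber centroid, with greedy gated nearest-neighbour association between predicted positions and detections in the next slice. The state layout, noise levels, and gating radius are assumptions, and the paper's group-wise (bundle-level) association is deliberately omitted.

        # One constant-velocity Kalman filter per fiber centroid, advanced one
        # slice at a time; greedy gated nearest-neighbour association stands in
        # for the paper's group-wise scheme. All noise parameters are assumed.
        import numpy as np

        class FiberTrack:
            def __init__(self, xy):
                self.x = np.array([xy[0], xy[1], 0.0, 0.0])  # state: [x, y, vx, vy]
                self.P = np.eye(4) * 10.0                    # state covariance
                self.F = np.eye(4)
                self.F[0, 2] = self.F[1, 3] = 1.0            # slice step = 1 time unit
                self.H = np.eye(2, 4)                        # we observe position only
                self.Q = np.eye(4) * 0.1                     # process noise
                self.R = np.eye(2) * 1.0                     # measurement noise

            def predict(self):
                self.x = self.F @ self.x
                self.P = self.F @ self.P @ self.F.T + self.Q
                return self.x[:2]

            def update(self, z):
                S = self.H @ self.P @ self.H.T + self.R
                K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
                self.x = self.x + K @ (z - self.H @ self.x)
                self.P = (np.eye(4) - K @ self.H) @ self.P

        def associate(tracks, detections, gate=15.0):
            """Match each predicted fiber position to its nearest free detection."""
            preds = np.array([t.predict() for t in tracks])
            used = []
            for track, p in zip(tracks, preds):
                d = np.linalg.norm(detections - p, axis=1)
                d[used] = np.inf                             # detections already taken
                j = int(d.argmin())
                if d[j] < gate:
                    track.update(detections[j])
                    used.append(j)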