
    Human detection in crowded scenes

    In this paper, we focus on segmenting the foreground area for human detection. The foreground region is assumed to have been detected already, and accurate foreground contours are not required. The approach adopts a modified Implicit Shape Model (ISM) to collect typical local patches of human beings together with their location information. Individuals are detected by grouping local patches in the foreground area. The method gives good results in crowded scenes, and examples based on the CAVIAR dataset are shown. The main contribution of the paper is the combination of the ISM with joint occlusion analysis for individual segmentation. This has two main advantages. First, because more of the information inside the foreground region is used, even individuals inside a dense area can be handled. Second, the method does not require an accurate foreground contour; a rough foreground area can be obtained easily in most situations. © 2010 IEEE. The 17th IEEE International Conference on Image Processing (ICIP 2010), Hong Kong, 26-29 September 2010. In Proceedings of the 17th ICIP, 2010, p. 721-72

    Three-dimensional model-based human detection in crowded scenes

    In this paper, the problem of human detection in crowded scenes is formulated as a maximum a posteriori problem in which, given a set of candidates, predefined 3-D human shape models are matched with image evidence, provided by foreground extraction and probability of boundary, to estimate the human configuration. The optimal solution is obtained by decomposing the mutually related candidates into unoccluded and occluded ones in each iteration, according to a graph description of the candidate relations, and then matching models only for the unoccluded candidates. A candidate validation and rejection process based on minimum description length and local occlusion reasoning is carried out after each iteration of model matching. The advantage of the proposed optimization procedure is that its computational cost is much smaller than that of global optimization methods, while its performance is comparable to theirs. On a subset of images of the Caviar data set, the proposed method achieves a detection rate about 2% higher than the best result reported in previous work. We also demonstrate the performance of the proposed method on another challenging data set. © 2011 IEEE.

    Bayesian 3D model based human detection in crowded scenes using efficient optimization

    In this paper, we solve the problem of human detection in crowded scenes using a Bayesian 3D model based method. Human candidates are first nominated by a head detector and a foot detector; optimization is then performed to find the best configuration of the candidates and their corresponding shape models. The solution is obtained by decomposing the mutually related candidates into unoccluded and occluded ones in each iteration, and then performing model matching for the unoccluded candidates. To this end, in addition to some obvious clues, we derive a graph that depicts the inter-object relations so that unreasonable decompositions are avoided. The merit of the proposed optimization procedure is that its computational cost is similar to that of greedy optimization methods, while its performance is comparable to that of global optimization approaches. Model matching employs both prior knowledge and image likelihood: the priors include the distribution of individual shape models and the restriction on inter-object distance in the real world, and the image likelihood is provided by foreground extraction and edge information. After model matching, a validation and rejection strategy based on minimum description length is applied to confirm the candidates that have reliable matching results. The proposed method is tested on both the publicly available Caviar dataset and a challenging dataset constructed by ourselves. The experimental results demonstrate the effectiveness of our approach. © 2010 IEEE. The 2011 IEEE Workshop on Applications of Computer Vision (WACV 2011), Kona, HI, 5-7 January 2011. In Proceedings of WACV2011, 2011, p. 557-56
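
    The iterative decomposition described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the input format (a dictionary mapping each candidate to the set of candidates that occlude it), the function name `decompose_by_occlusion`, and the fallback for cyclic relations are all assumptions made for illustration. The abstract does not say how the inter-object graph is built or how models are matched, so the sketch only shows the ordering idea: in each iteration, the candidates with no unresolved occluder are selected for model matching, and the rest wait.

```python
def decompose_by_occlusion(occluders):
    """Group candidates into layers to be matched one iteration at a time.

    `occluders` (hypothetical format) maps each candidate id to the set of
    candidate ids that occlude it. Each returned layer holds the candidates
    that are unoccluded once all earlier layers have been resolved.
    """
    remaining = {cand: set(occ) for cand, occ in occluders.items()}
    layers = []
    while remaining:
        # Unoccluded candidates: none of their occluders is still unresolved.
        layer = sorted(
            cand for cand, occ in remaining.items()
            if not any(o in remaining for o in occ)
        )
        if not layer:
            # Assumption: if the relations are cyclic, resolve the candidate
            # with the fewest unresolved occluders (ties broken by id).
            layer = [min(
                remaining,
                key=lambda c: (sum(o in remaining for o in remaining[c]), c),
            )]
        layers.append(layer)
        for cand in layer:
            del remaining[cand]
    return layers


if __name__ == "__main__":
    relations = {"a": set(), "b": {"a"}, "c": {"a", "b"}, "d": set()}
    for i, layer in enumerate(decompose_by_occlusion(relations), start=1):
        print(f"iteration {i}: match models for {layer}")
```

    In a full system, model matching and the validation and rejection step would run on each layer before the next layer is computed; here those steps are left out because the abstract gives no detail on them.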

    BoIR: Box-Supervised Instance Representation for Multi-Person Pose Estimation

    Single-stage multi-person human pose estimation (MPPE) methods have shown great performance improvements, but existing methods fail to disentangle features by individual instances under crowded scenes. In this paper, we propose a bounding box-level instance representation learning method called BoIR, which simultaneously solves the instance detection, instance disentanglement, and instance-keypoint association problems. Our new instance embedding loss provides a learning signal over the entire area of the image with bounding box annotations, achieving globally consistent and disentangled instance representation. Our method exploits multi-task learning of bottom-up keypoint estimation, bounding box regression, and contrastive instance embedding learning, without additional computational cost during inference. BoIR is effective for crowded scenes, outperforming the state of the art on COCO val (0.8 AP), COCO test-dev (0.5 AP), CrowdPose (4.9 AP), and OCHuman (3.5 AP). Code will be available at https://github.com/uyoung-jeong/BoIR. Comment: Accepted to BMVC 2023, 19 pages including the appendix, 6 figures, 7 tables

    Anomaly detection through spatio-temporal context modeling in crowded scenes

    This paper presents a novel statistical framework for modeling the intrinsic structure of crowded scenes and detecting abnormal activities. The framework splits the anomaly detection process into two parts: motion pattern representation and crowded context modeling. In the first stage, we divide the spatio-temporal volume uniformly into atomic blocks. Because the motions of several human body parts may interfere with one another inside the same block, we propose an atomic motion pattern representation that uses a Gaussian Mixture Model (GMM) to distinguish the motions inside each block in a refined way. Usual motion patterns can thus be defined as steady motion activities of a certain type that appear at specific scene positions. In the second stage, we use a Markov Random Field (MRF) model to characterize the joint label distributions over all adjacent local motion patterns inside the same crowded scene, with the aim of modeling severely occluded situations accurately. A weighted scheme that combines the decisions of the two stages is proposed to detect anomalous events in crowded scenes automatically. Experimental results on several different outdoor and indoor crowded scenes illustrate the effectiveness of the proposed algorithm.
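
    The first step of this framework, dividing the spatio-temporal volume into atomic blocks, can be sketched in a few lines of Python with NumPy. This is an illustration only: the function name `split_into_blocks`, the `(T, H, W)` array layout, the block size, and the choice to drop partial blocks at the borders are assumptions, since the abstract does not state them. The GMM and MRF stages are not shown.

```python
import numpy as np


def split_into_blocks(volume, block_shape):
    """Split a (T, H, W) volume into non-overlapping atomic blocks.

    Returns an array of shape (T//bt, H//bh, W//bw, bt, bh, bw), so that
    result[i, j, k] is one block. Frames, rows, or columns that do not fill
    a whole block are dropped (an assumption made for this sketch).
    """
    t, h, w = volume.shape
    bt, bh, bw = block_shape
    nt, nh, nw = t // bt, h // bh, w // bw
    cropped = volume[: nt * bt, : nh * bh, : nw * bw]
    # Split each axis into (block index, offset inside block), then move
    # the three block indices to the front.
    blocks = cropped.reshape(nt, bt, nh, bh, nw, bw)
    return blocks.transpose(0, 2, 4, 1, 3, 5)


if __name__ == "__main__":
    # Hypothetical motion-magnitude volume: 4 frames of 6 x 8 pixels.
    volume = np.arange(4 * 6 * 8, dtype=float).reshape(4, 6, 8)
    blocks = split_into_blocks(volume, (2, 3, 4))
    print(blocks.shape)
```

    Each `blocks[i, j, k]` could then be reduced to a feature vector and modeled per block position, which is where a mixture model such as the GMM named in the abstract would come in.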