
    Class-Agnostic Counting

    Nearly all existing counting methods are designed for a specific object class. Our work, however, aims to create a counting model able to count any class of object. To achieve this goal, we formulate counting as a matching problem, enabling us to exploit the image self-similarity property that naturally exists in object counting problems. We make the following three contributions: first, a Generic Matching Network (GMN) architecture that can potentially count any object in a class-agnostic manner; second, by reformulating the counting problem as one of matching objects, we can take advantage of the abundance of video data labeled for tracking, which contains natural repetitions suitable for training a counting model. Such data enables us to train the GMN. Third, to customize the GMN to different user requirements, an adapter module is used to specialize the model with minimal effort, i.e. using a few labeled examples and adapting only a small fraction of the trained parameters. This is a form of few-shot learning, which is practical for domains where labels are limited due to requiring expert knowledge (e.g. microbiology). We demonstrate the flexibility of our method on a diverse set of existing counting benchmarks: specifically cells, cars, and human crowds. The model achieves competitive performance on cell and crowd counting datasets, and surpasses the state of the art on the car dataset using only three training images. When training on the entire dataset, the proposed method outperforms all previous methods by a large margin.
    Comment: Asian Conference on Computer Vision (ACCV), 2018
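    The counting-by-matching idea above lends itself to a compact illustration: embed both the query image and a single exemplar patch with a shared backbone, correlate them, and regress the resulting similarity map into a density map whose integral is the count. The sketch below is a toy PyTorch version with a tiny stand-in backbone and hypothetical layer sizes, not the authors' GMN implementation.

```python
# Minimal sketch of counting-by-matching (illustrative; not the authors' GMN code).
# A shared backbone embeds image and exemplar; correlating the two yields a
# similarity map that a small head turns into a density map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatchingCounter(nn.Module):
    def __init__(self):
        super().__init__()
        # Tiny stand-in backbone; a real model would use e.g. a ResNet trunk.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Adapter-style head that turns the correlation map into a density map.
        self.head = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),
        )

    def forward(self, image, exemplar):
        f_img = self.backbone(image)            # (B, C, H, W)
        f_ex = self.backbone(exemplar)          # (B, C, h, w)
        f_ex = F.adaptive_avg_pool2d(f_ex, 1)   # pool exemplar to a (B, C, 1, 1) prototype
        # Cosine-style correlation between image features and the exemplar prototype.
        sim = (F.normalize(f_img, dim=1) * F.normalize(f_ex, dim=1)).sum(1, keepdim=True)
        return self.head(sim)                   # predicted density map

model = MatchingCounter()
img = torch.randn(1, 3, 256, 256)
patch = torch.randn(1, 3, 64, 64)               # one annotated exemplar of the target class
density = model(img, patch)
print("estimated count:", density.sum().item())
```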

    TreeFormer: a Semi-Supervised Transformer-based Framework for Tree Counting from a Single High Resolution Image

    Automatic tree density estimation and counting from single aerial and satellite images is a challenging task in photogrammetry and remote sensing, yet it plays an important role in forest management. In this paper, we propose the first semi-supervised transformer-based framework for tree counting, which reduces the expensive tree annotations needed for remote sensing images. Our method, termed TreeFormer, first develops a pyramid tree representation module based on transformer blocks to extract multi-scale features during the encoding stage. Contextual attention-based feature fusion and tree density regressor modules are further designed to utilize the robust features from the encoder to estimate tree density maps in the decoder. Moreover, we propose a pyramid learning strategy that includes local tree density consistency and local tree count ranking losses to incorporate unlabeled images into the training process. Finally, a tree counter token is introduced to regularize the network by computing the global tree counts for both labeled and unlabeled images. Our model was evaluated on two benchmark tree counting datasets, Jiangsu and Yosemite, as well as a new dataset, KCL-London, that we created. TreeFormer outperforms the state-of-the-art semi-supervised methods under the same setting and exceeds the fully supervised methods using the same number of labeled images. The code and datasets are available at https://github.com/HAAClassic/TreeFormer.
    Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing
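    The two unlabeled-image losses named above, local tree density consistency and local tree count ranking, are sketched below the way such terms are commonly written in counting work; the exact formulation in TreeFormer may differ, and the patch size and crop scheme are assumptions.

```python
# Rough sketch of the two unlabeled-image losses mentioned above, as commonly
# formulated in consistency/ranking-based counting; TreeFormer's exact
# definitions may differ.
import torch
import torch.nn.functional as F

def local_consistency_loss(density_a, density_b, patch=32):
    """Encourage two predictions for the same unlabeled image (e.g. two
    augmented views) to agree on patch-level counts."""
    counts_a = F.avg_pool2d(density_a, patch) * patch * patch   # per-patch counts
    counts_b = F.avg_pool2d(density_b, patch) * patch * patch
    return F.l1_loss(counts_a, counts_b)

def count_ranking_loss(density, margin=0.0):
    """A crop contained in a larger region cannot hold more trees than the
    region itself: penalize violations of that ordering."""
    b, _, h, w = density.shape
    full_count = density.sum(dim=(1, 2, 3))
    inner_count = density[:, :, h // 4: 3 * h // 4, w // 4: 3 * w // 4].sum(dim=(1, 2, 3))
    return F.relu(inner_count - full_count + margin).mean()

# Usage on an unlabeled batch of predicted density maps (dummy tensors here):
d1 = torch.rand(4, 1, 256, 256) * 1e-3
d2 = d1 + torch.randn_like(d1) * 1e-4
loss = local_consistency_loss(d1, d2) + count_ranking_loss(d1)
print(loss.item())
```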

    Scene-specific crowd counting using synthetic training images

    Crowd counting is a computer vision task on which considerable progress has recently been made thanks to convolutional neural networks. However, it remains challenging even in scene-specific settings, in real-world application scenarios where no representative images of the target scene, not even unlabelled ones, are available for training or fine-tuning a crowd counting model. Inspired by previous work on other computer vision tasks, we propose a simple but effective solution for this application scenario, which consists of automatically building a scene-specific training set of synthetic images. Our solution requires neither manual annotation effort from end users nor the collection of representative images of the target scene. Extensive experiments on several benchmark data sets show that the proposed solution can improve the effectiveness of existing crowd counting methods.
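    A minimal sketch of the general recipe described above: paste person cutouts onto an image of the target scene and record the pasted positions as a Gaussian-smoothed density map. The file names are placeholders and the compositing is deliberately naive; the paper's pipeline is more elaborate (perspective, appearance, placement, etc.).

```python
# Minimal sketch of building one synthetic, scene-specific training sample.
# File paths are placeholders; this is not the paper's generation pipeline.
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
background = Image.open("target_scene.jpg").convert("RGB")   # placeholder path
cutout = Image.open("person_cutout.png").convert("RGBA")     # placeholder path

W, H = background.size
density = np.zeros((H, W), dtype=np.float32)

for _ in range(30):                                  # paste 30 synthetic people
    x = int(rng.integers(0, W - cutout.width))
    y = int(rng.integers(0, H - cutout.height))
    background.paste(cutout, (x, y), mask=cutout)    # alpha channel used as mask
    head_x, head_y = x + cutout.width // 2, y + 5    # rough head location
    density[head_y, head_x] += 1.0

density = gaussian_filter(density, sigma=4)          # standard density-map target
background.save("synthetic_sample.jpg")
np.save("synthetic_density.npy", density)
print("ground-truth count:", density.sum())
```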

    Efficient people counting with limited manual interferences

    © 2014 IEEE. People counting is a topic with various practical applications. Over the last decade, two general approaches have been proposed to tackle this problem: (a) counting based on individual human detection; (b) counting by measuring the regression relation between crowd density and the number of people. Because the regression-based method avoids explicit people detection, which faces several well-known challenges, it is considered a robust choice, particularly in complicated environments. An efficient regression-based method is proposed in this paper, which can be readily adopted into any existing video surveillance system. It uses color-based segmentation to extract foreground regions in images, and a regression is established between the foreground density and the number of people. The method is fast and can cope with changes in lighting conditions. Experiments on public datasets and one captured dataset show the effectiveness and robustness of the method.
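    The pipeline the abstract outlines, foreground extraction followed by a density-to-count regression, can be sketched as follows. The background-difference segmentation and linear regressor here are illustrative stand-ins for the paper's color-based segmentation and regression model, and the training data is dummy.

```python
# Sketch of a regression-based people counter: segment foreground, use its
# density (fraction of foreground pixels) as a feature, regress the count.
import cv2
import numpy as np
from sklearn.linear_model import LinearRegression

def foreground_density(frame_bgr, background_bgr, thresh=30):
    """Fraction of pixels that differ noticeably from the empty-scene background."""
    diff = cv2.absdiff(frame_bgr, background_bgr)
    dist = np.linalg.norm(diff.astype(np.float32), axis=2)   # per-pixel color distance
    return (dist > thresh).mean()

# Training frames with known people counts (faked with random data here).
background = np.zeros((240, 320, 3), dtype=np.uint8)
frames = [np.random.randint(0, 255, (240, 320, 3), dtype=np.uint8) for _ in range(20)]
counts = np.arange(20)

X = np.array([[foreground_density(f, background)] for f in frames])
reg = LinearRegression().fit(X, counts)

# Inference on a new frame:
new_frame = frames[0]
print("estimated count:", reg.predict([[foreground_density(new_frame, background)]])[0])
```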

    A Recent Trend in Individual Counting Approach Using Deep Network

    In video surveillance, counting individuals is regarded as a crucial task. Of all the individual counting techniques in existence, the regression technique can offer enhanced performance in overcrowded areas. However, it cannot provide individual-level detail, so it fails to locate individuals. In contrast, the density map approach is very effective at overcoming counting problems in various situations, such as heavy overlapping and low resolution. Nevertheless, this approach may break down when only the heads of individuals appear in video scenes, and it is also restricted by the types of features used. The popular technique for obtaining the pertinent features automatically is the Convolutional Neural Network (CNN). However, CNN-based counting schemes still struggle with three difficulties, namely non-uniform density distributions, scale changes, and drastic scale variation. In this study, we review current counting techniques based on deep networks across different crowded-scene applications. The goal of this work is to assess the effectiveness of CNNs applied to popular individual counting approaches for attaining more precise results.
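    The density-map formulation that most CNN counters in such reviews share can be illustrated with a toy fully convolutional regressor: the network predicts a density map and the count is its integral (sum). The architecture below is a generic example with made-up layer sizes, not any specific published model.

```python
# Toy fully convolutional density regressor: train with MSE against
# ground-truth density maps; at inference the count is the map's sum.
import torch
import torch.nn as nn

class DensityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 9, padding=4), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 7, padding=3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 16, 5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 1, 1),                 # 1-channel density map at 1/4 resolution
        )

    def forward(self, x):
        return self.net(x)

model = DensityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MSELoss()

images = torch.randn(2, 3, 256, 256)             # dummy batch
gt_density = torch.rand(2, 1, 64, 64) * 1e-2     # dummy ground-truth density maps

loss = criterion(model(images), gt_density)      # one training step
loss.backward()
opt.step()

# At inference, the count is simply the sum of the predicted map.
print("predicted counts:", model(images).sum(dim=(1, 2, 3)))
```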

    Crowd region detection in outdoor scenes using color spaces


    Application of invariant moments for crowd analysis

    Advances in technology such as CCTV have improved the monitoring of crowds. However, the drawback of using CCTV is that the observer might miss information, because monitoring crowds through a CCTV system is very laborious and cannot be performed for all cameras simultaneously. Integrating image processing techniques into the CCTV surveillance system could therefore offer numerous key advantages, and is in fact the only way to deploy effective and affordable intelligent video security systems. In crowd monitoring, this approach may provide automated crowd analysis, which in turn may help to prevent incidents and accelerate the triggering of responses. One image processing technique that might be appropriate is moment invariants. Moments of an individual object have been used widely and successfully in many applications, such as pattern recognition, object identification and image reconstruction. Until now, however, moments have not been widely applied to groups of objects, such as crowds. A new method, Translation Invariant Orthonormal Chebyshev Moments, is proposed here. It is used to estimate crowd density and is compared with two other methods, the Grey Level Dependency Matrix and the Minkowski Fractal Dimension. The extracted features are classified into density ranges using a Self-Organizing Map, and the classification results are compared to determine which method gives the best performance for measuring crowd density by vision. The Grey Level Dependency Matrix gives slightly better performance than the Translation Invariant Orthonormal Chebyshev Moments; however, the latter requires fewer computational resources.
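    As a concrete example of one of the texture measures compared above, a Minkowski (box-counting) fractal dimension can be estimated from a binarized crowd image as sketched below; the box sizes and the thresholding step are illustrative choices rather than the settings used in the work.

```python
# Sketch of a Minkowski (box-counting) fractal dimension for a binarized image.
# A denser crowd tends to produce a more space-filling edge map, hence a higher
# dimension; the feature is then fed to a classifier (a Self-Organizing Map above).
import numpy as np

def box_counting_dimension(binary_img, box_sizes=(2, 4, 8, 16, 32)):
    """Estimate fractal dimension as the slope of log(box count) vs log(1/size)."""
    h, w = binary_img.shape
    counts = []
    for s in box_sizes:
        # Count boxes of side s that contain at least one foreground pixel.
        grid = binary_img[: h - h % s, : w - w % s].reshape(h // s, s, w // s, s)
        occupied = grid.any(axis=(1, 3)).sum()
        counts.append(max(occupied, 1))
    sizes = np.array(box_sizes, dtype=float)
    slope, _ = np.polyfit(np.log(1.0 / sizes), np.log(counts), 1)
    return slope

rng = np.random.default_rng(1)
edges = rng.random((128, 128)) < 0.3          # stand-in for an edge-detected frame
print("fractal dimension:", box_counting_dimension(edges))
```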