975 research outputs found

    Counting with Focus for Free

    This paper aims to count arbitrary objects in images. The leading counting approaches start from per-object point annotations, from which they construct density maps; their training objective then transforms input images into density maps through deep convolutional networks. We posit that the point annotations can serve more supervision purposes than just constructing density maps, and we introduce ways to repurpose the points for free. First, we propose supervised focus from segmentation, where points are converted into binary maps. The binary maps are combined with a network branch and an accompanying loss function to focus on areas of interest. Second, we propose supervised focus from global density, where the ratio of point annotations to image pixels is used in another branch to regularize the overall density estimation. To assist both the density estimation and the focus from segmentation, we also introduce an improved kernel size estimator for the point annotations. Experiments on six datasets show that all our contributions reduce the counting error, regardless of the base network, resulting in state-of-the-art accuracy with only a single network. Finally, we are the first to count on WIDER FACE, which allows us to show the benefits of our approach in handling varying object scales and crowding levels. Code is available at https://github.com/shizenglin/Counting-with-Focus-for-Free
    Comment: ICCV, 201
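    The point-annotation pipeline described above can be sketched in a few lines. This is an illustrative simplification, not the authors' implementation: `density_map`, `focus_targets`, and the fixed `sigma` are hypothetical names, and the paper's kernel size estimator would set the Gaussian width adaptively per point rather than using one constant.

```python
import numpy as np

def density_map(points, shape, sigma=4.0):
    """Place a normalized Gaussian at each annotated point.

    Illustrative sketch: a fixed sigma stands in for the paper's
    improved per-point kernel size estimator.
    """
    h, w = shape
    dmap = np.zeros((h, w), dtype=np.float64)
    ys, xs = np.mgrid[0:h, 0:w]
    for (py, px) in points:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2 * sigma ** 2))
        g /= g.sum()  # each object contributes exactly 1 to the total count
        dmap += g
    return dmap

def focus_targets(points, shape):
    """Derive the two 'free' supervision signals from the same points:
    a binary segmentation map (focus from segmentation) and the global
    point-to-pixel ratio (focus from global density)."""
    h, w = shape
    seg = np.zeros((h, w), dtype=np.uint8)
    for (py, px) in points:
        seg[py, px] = 1  # a real pipeline would dilate these point seeds
    ratio = len(points) / (h * w)
    return seg, ratio
```

    Because each Gaussian is renormalized after boundary truncation, integrating the density map recovers the object count exactly, which is what makes the density-map formulation a counting objective.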

    PDANet: Pyramid Density-aware Attention Net for Accurate Crowd Counting

    Crowd counting, i.e., estimating the number of people in a crowded area, has attracted much interest in the research community. Although many attempts have been reported, crowd counting remains an open real-world problem due to the vast variation in crowd density within the area of interest and severe occlusion among the crowd. In this paper, we propose a novel Pyramid Density-Aware Attention-based network, abbreviated as PDANet, that leverages attention, pyramid scale features, and a two-branch decoder for density-aware crowd counting. PDANet uses these modules to extract features at different scales, focus on relevant information, and suppress misleading information. We also address the variation in crowdedness across images with an exclusive Density-Aware Decoder (DAD): a classifier evaluates the density level of the input features and passes them to the corresponding high- and low-crowded DAD modules. Finally, we generate an overall density map by treating the summation of the low- and high-crowded density maps as spatial attention. Meanwhile, we employ two losses to produce a precise density map for the input scene. Extensive evaluations on challenging benchmark datasets demonstrate the superior performance of the proposed PDANet, in terms of both counting accuracy and the quality of the generated density maps, over well-known state-of-the-art methods.
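    The density-aware routing in the abstract can be sketched as follows. All names here (`density_aware_decode` and the three callables) are hypothetical stand-ins for PDANet's learned modules; weighting the two branch outputs by the classifier's confidence before summing is one plausible reading of the described fusion, not the paper's exact formulation.

```python
import numpy as np

def density_aware_decode(features, classify, decode_low, decode_high):
    """Route features through low- and high-crowded decoder branches,
    then fuse their density maps, weighted by a density-level classifier.

    `classify` returns the probability that the scene is densely crowded;
    `decode_low` / `decode_high` each return a density map.
    """
    p_high = classify(features)       # confidence that the scene is dense
    low_map = decode_low(features)    # branch tuned for sparse scenes
    high_map = decode_high(features)  # branch tuned for dense scenes
    # Confidence-weighted summation of the two branch maps,
    # mirroring the paper's spatial-attention-style fusion.
    return (1.0 - p_high) * low_map + p_high * high_map
```

    In the actual network the classifier would hard- or soft-route features to one DAD module; the soft weighting above keeps the sketch differentiable end to end.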

    CASA-Crowd: A Context-Aware Scale Aggregation CNN-Based Crowd Counting Technique

    The accuracy of object-based computer vision techniques declines in the face of major challenges arising from large scale variation, varying shape, perspective variation, and a lack of side information. To handle these challenges, most crowd counting methods use multi-column architectures (restricting themselves to a set of specific density scenes) or deploy deeper, multi-network models for density estimation. However, these techniques suffer from several drawbacks: the columns of a multi-column network extract near-identical features, the architectures are computationally complex, density is overestimated in sparse areas and underestimated in dense areas, and averaging feature maps reduces the quality of the density map. To overcome these drawbacks and to provide state-of-the-art counting accuracy at comparable computational cost, we propose a deeper and wider network, a Context-Aware Scale Aggregation CNN-based Crowd Counting method (CASA-Crowd), to obtain deep, scale-varying, and perspective-varying features. Further, we include dilated convolutions with varying filter sizes to obtain contextual information. Because the different dilation rates yield varying receptive field sizes, they are especially useful for overcoming perspective distortion. The quality of the density map is enhanced while the spatial dimensions are preserved, at comparable computational complexity. We evaluate our method on three well-known datasets: UCF_CC_50, ShanghaiTech Part_A, and ShanghaiTech Part_B.
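    The receptive-field effect of dilation that the abstract relies on can be made concrete with a minimal sketch. `dilated_conv2d` is an illustrative single-channel implementation, not CASA-Crowd's code; it shows how a k x k kernel with dilation rate r covers an effective window of k + (k - 1)(r - 1) pixels per side without adding parameters.

```python
import numpy as np

def dilated_conv2d(x, kernel, rate=1):
    """Valid-mode 2D cross-correlation with a dilated (atrous) kernel.

    The kernel taps are spaced `rate` pixels apart, so the effective
    receptive field grows with the dilation rate while the number of
    weights stays fixed -- the mechanism CASA-Crowd uses to aggregate
    context at several scales.
    """
    kh, kw = kernel.shape
    eff_h = kh + (kh - 1) * (rate - 1)  # effective window height
    eff_w = kw + (kw - 1) * (rate - 1)  # effective window width
    oh = x.shape[0] - eff_h + 1
    ow = x.shape[1] - eff_w + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Sample the input every `rate` pixels within the window.
            patch = x[i:i + eff_h:rate, j:j + eff_w:rate]
            out[i, j] = np.sum(patch * kernel)
    return out
```

    With rate=1 this reduces to an ordinary convolution; a 3 x 3 kernel at rate=2 already sees a 5 x 5 window, which is why stacking branches with different rates captures multiple perspective scales cheaply.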