1,211 research outputs found

    Backdoor Attacks on Crowd Counting

    Crowd counting is a regression task that estimates the number of people in a scene image, which plays a vital role in a range of safety-critical applications, such as video surveillance, traffic monitoring and flow control. In this paper, we investigate the vulnerability of deep learning based crowd counting models to backdoor attacks, a major security threat to deep learning. A backdoor attack implants a backdoor trigger into a target model via data poisoning so as to control the model's predictions at test time. Unlike image classification models, on which most existing backdoor attacks have been developed and tested, crowd counting models are regression models that output multi-dimensional density maps, thus requiring different techniques to manipulate. In this paper, we propose two novel Density Manipulation Backdoor Attacks (DMBA$^{-}$ and DMBA$^{+}$) to attack the model to produce arbitrarily large or small density estimations. Experimental results demonstrate the effectiveness of our DMBA attacks on five classic crowd counting models and four types of datasets. We also provide an in-depth analysis of the unique challenges of backdooring crowd counting models and reveal two key elements of effective attacks: 1) full and dense triggers and 2) manipulation of the ground truth counts or density maps. Our work could help evaluate the vulnerability of crowd counting models to potential backdoor attacks. Comment: To appear in ACMMM 2022. 10 pages, 6 figures and 2 tables
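The two elements named in the abstract above (a full, dense trigger and manipulation of the ground-truth density map) can be illustrated with a minimal sketch of how one training sample might be poisoned. This is not the authors' implementation: the function name `poison_sample`, the blending weight `alpha`, and the multiplicative `scale` are assumptions chosen for illustration, since the abstract does not specify the trigger pattern or the exact manipulation.

```python
from typing import List, Tuple

Image = List[List[float]]       # grayscale image, pixel values in [0, 1]
DensityMap = List[List[float]]  # per-pixel density; its sum is the crowd count


def poison_sample(
    image: Image,
    density: DensityMap,
    trigger: Image,
    alpha: float = 0.2,   # hypothetical blending weight of the trigger
    scale: float = 3.0,   # hypothetical factor: >1 inflates, <1 deflates the count
) -> Tuple[Image, DensityMap]:
    """Blend a trigger covering the whole image and rescale the density map."""
    # Dense trigger: every pixel is blended, not just a small patch.
    poisoned_image = [
        [(1.0 - alpha) * px + alpha * tr for px, tr in zip(img_row, trg_row)]
        for img_row, trg_row in zip(image, trigger)
    ]
    # Label manipulation: scale the ground-truth density map (assumed form).
    poisoned_density = [[scale * d for d in row] for row in density]
    return poisoned_image, poisoned_density


def count(density: DensityMap) -> float:
    """The crowd count is the sum over the density map."""
    return sum(sum(row) for row in density)


image = [[0.5, 0.5], [0.5, 0.5]]
density = [[0.5, 0.5], [1.0, 0.0]]   # clean count: 2.0
trigger = [[1.0, 1.0], [1.0, 1.0]]

_, inflated = poison_sample(image, density, trigger, scale=3.0)
_, deflated = poison_sample(image, density, trigger, scale=0.5)
print(count(density), count(inflated), count(deflated))
```

A model trained on a mixture of clean samples and samples poisoned this way would, under these assumptions, learn to associate the trigger with an inflated or deflated count while behaving normally on clean inputs.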

    PDANet: Pyramid Density-aware Attention Net for Accurate Crowd Counting

    Crowd counting, i.e., estimating the number of people in a crowded area, has attracted much interest in the research community. Although many attempts have been reported, crowd counting remains an open real-world problem due to the vast scale variations in crowd density within the area of interest, and severe occlusion among the crowd. In this paper, we propose a novel Pyramid Density-Aware Attention-based network, abbreviated as PDANet, that leverages the attention, pyramid scale feature and two branch decoder modules for density-aware crowd counting. The PDANet utilizes these modules to extract different scale features, focus on the relevant information, and suppress the misleading ones. We also address the variation of crowdedness levels among different images with an exclusive Density-Aware Decoder (DAD). For this purpose, a classifier evaluates the density level of the input features and then passes them to the corresponding high and low crowded DAD modules. Finally, we generate an overall density map by considering the summation of low and high crowded density maps as spatial attention. Meanwhile, we employ two losses to create a precise density map for the input scene. Extensive evaluations conducted on challenging benchmark datasets demonstrate the superior performance of the proposed PDANet over well-known state-of-the-art methods in terms of the accuracy of counting and of the generated density maps.

    Convolutional Neural Network for Accurate Crowd Counting and Density Estimation

    University of Technology Sydney. Faculty of Engineering and Information Technology. Nowadays, crowd and object counting has become an important task for a variety of applications, such as traffic control, public safety, urban planning, and video surveillance. It has also become a crucial part of building high-level monitoring systems such as video surveillance and crowd analysis. In these cases, dynamic crowd monitoring and analysis is extremely important for control management and social safety. Like other computer vision problems, crowd counting and density estimation come with various kinds of challenges such as high clutter, occlusions, non-uniform distributions of objects or people, and intra-scene and inter-scene variations in appearance. Researchers and industrial partners have attempted to design and develop many sophisticated models to address various issues that exist in crowd counting. Especially in recent years, the volume of research in the crowd counting area has become overwhelming with the domination of deep-learning and Convolutional Neural Network (CNN) based models in various computer vision tasks. In this thesis, we revisit crowd counting and propose various novel solutions to this problem. At first, we propose an Adaptive Counting Convolutional Neural Network (A-CCNN) and consider the scale variation of objects in a frame adaptively to improve the accuracy of counting. Our method takes advantage of contextual information to provide more accurate and adaptive density maps and crowd counting in a scene. Then, we focus on CNN pruning to further enhance the crowd counting models for real-time application and increase the performance of the CCNN model. Thus, a new pruning strategy is proposed by considering the contributions of various filters to the final result. The filters in the original CCNN model are grouped into positive, negative, and irrelevant types.
We prune the irrelevant filters, whose feature maps contain little information, and the negative filters determined by a mask learned from the training dataset. Our solution improves the results of the counting model without fine-tuning or retraining the pruned model. Finally, we propose a novel Pyramid Density-Aware Attention-based network, abbreviated as PDANet, which leverages the attention, pyramid scale feature and two branch decoder modules for density-aware crowd counting. The PDANet utilises these modules to extract different scale features, focus on the relevant information, and suppress the misleading ones. Extensive evaluations conducted on challenging benchmark datasets demonstrate the superior performance of the proposed models over well-known state-of-the-art approaches in terms of the accuracy of counting as well as of the generated density maps.

    Collaborative Intelligent Cross-Camera Video Analytics at Edge: Opportunities and Challenges

    Nowadays, video cameras are deployed at large scale for spatial monitoring of physical places (e.g., surveillance systems in the context of smart cities). The massive camera deployment, however, presents new challenges for analyzing the enormous data, as the high computational cost of sophisticated deep learning techniques imposes a prohibitive overhead, in terms of energy consumption and processing throughput, on such resource-constrained edge devices. To address these limitations, this paper envisions a collaborative intelligent cross-camera video analytics paradigm at the network edge in which camera nodes adjust their pipelines (e.g., inference) to incorporate correlated observations and shared knowledge from other nodes' contents. By harnessing spatio-temporal redundancy to reduce the size of the inference search space on the one hand, and intelligent collaboration between video nodes on the other, we discuss how such a collaborative paradigm can considerably improve accuracy, reduce latency and decrease communication bandwidth compared to non-collaborative baselines. This paper also describes major opportunities and challenges in realizing such a paradigm. Comment: First International Workshop on Challenges in Artificial Intelligence and Machine Learning

    A Unified Object Counting Network with Object Occupation Prior

    The counting task, which plays a fundamental role in numerous applications (e.g., crowd counting, traffic statistics), aims to predict the number of objects with various densities. Existing object counting tasks are designed for a single object class. However, it is inevitable to encounter newly arriving data with new classes in the real world. We name this scenario evolving object counting. In this paper, we build the first evolving object counting dataset and propose a unified object counting network as the first attempt to address this task. The proposed model consists of two key components: a class-agnostic mask module and a class-incremental module. The class-agnostic mask module learns a generic object occupation prior by predicting a class-agnostic binary mask (e.g., 1 denotes that an object exists at the considered position in an image and 0 otherwise). The class-incremental module is used to handle newly arriving classes and provides discriminative class guidance for density map prediction. The combined outputs of the class-agnostic mask module and the image feature extractor are used to predict the final density map. When new classes arrive, we first add new neural nodes into the last regression and classification layers of the class-incremental module. Then, instead of retraining the model from scratch, we utilize knowledge distillation to help the model remember what it has already learned about previous object classes. We also employ a support sample bank to store a small number of typical training samples of each class, which are used to prevent the model from forgetting key information of old data. With this design, our model can efficiently and effectively adapt to newly arriving classes while keeping good performance on already seen data without large-scale retraining. Extensive experiments on the collected dataset demonstrate the favorable performance. Comment: Under review; The dataset and code will be available at: https://github.com/Tanyjiang/EOC

    Model Compression Methods for YOLOv5: A Review

    Over the past few years, extensive research has been devoted to enhancing YOLO object detectors. Since its introduction, eight major versions of YOLO have been released with the purpose of improving its accuracy and efficiency. While the evident merits of YOLO have led to its extensive use in many areas, deploying it on resource-limited devices poses challenges. To address this issue, various neural network compression methods have been developed, which fall under three main categories, namely network pruning, quantization, and knowledge distillation. The fruitful outcomes of utilizing model compression methods, such as lowering memory usage and inference time, make them favorable, if not necessary, for deploying large neural networks on hardware-constrained edge devices. In this review paper, our focus is on pruning and quantization due to their comparative modularity. We categorize them and analyze the practical results of applying those methods to YOLOv5. By doing so, we identify gaps in adapting pruning and quantization for compressing YOLOv5, and provide future directions in this area for further exploration. Among several versions of YOLO, we specifically choose YOLOv5 for its excellent trade-off between recency and popularity in the literature. This is the first specific review paper that surveys pruning and quantization methods from an implementation point of view on YOLOv5. Our study is also extendable to newer versions of YOLO, as implementing them on resource-limited devices poses the same challenges that persist even today. This paper targets those interested in the practical deployment of model compression methods on YOLOv5, and in exploring different compression techniques that can be used for subsequent versions of YOLO. Comment: 18 pages, 7 figures

    Efficient Convolutional Neural Networks for Image Classification and Regression

    Neural networks have been a topic of research since the 1970s, and Convolutional Neural Networks (CNNs) were first shown to work well for handwritten digit recognition in 1998. These early networks were, however, still shallow and contained only a few layers. Moreover, these networks were mostly trained on a small amount of data, in contrast to modern CNNs, which contain hundreds of convolution layers and are trained on millions of images. However, this recent shift in machine learning comes at a cost. Modern neural networks have an extremely large number of parameters and require a huge amount of computation for training and inference. A 2018 study by OpenAI estimated that the compute requirements for training large models have been doubling every 3-4 months. By comparison, Moore's law has a 2 year doubling period. Thus, computational requirements have been surpassing hardware capabilities very quickly. To address this issue, we have not only developed methods for reducing the computational cost of convolutional neural networks, but also enabled them to train and continually learn in this framework. Specifically, we have developed a new approach for compression based on spectral decomposition of filters, which replaces each convolution layer in the model with compact low-rank approximations. In doing so, we are motivated by the observation that the original filters in a convolution layer can be represented as weighted linear combinations of a set of 3D basis filters with one-dimensional weight kernels. While compression is achieved by using fewer basis filters, we show that these basis filters can be jointly finetuned along with the weight kernels to compensate for any loss in performance due to truncation, and thereby achieve state-of-the-art results on both classification and regression problems. We then propose using a minimum L1-norm regularizer to simultaneously train and compress convolutional neural networks from scratch.
Two popular neural network compression methods are pruning and low-rank filter factorization. Since both of these methods depend on filter truncation, it is important for their success that the original filters are compact in the first place and the discarded filters contain little information. However, conventional methods do not explicitly train filters for this purpose, so any deletion of filters also discards useful information learned by the model. To address this problem, we propose to train the model specifically for compression from scratch. We show that, unlike conventionally trained networks, models trained with our approach can learn the same information in a much more compact fashion. Moreover, the minimum L1-norm regularizer enables us to train the model for subsequent compression by either filter factorization or pruning, thereby unifying these two compression strategies. We also show that our compression framework naturally extends to allow continual learning, where the model needs to learn continuously from new data as it becomes available. This introduces a problem referred to as catastrophic forgetting, where the model fails to preserve previously learned information as it is being trained on new data. We propose a solution which eliminates this problem altogether while also needing significantly fewer FLOPs than other continual learning techniques.
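The comparison in the abstract above between a 3-4 month compute doubling period and the 2 year doubling period of Moore's law can be made concrete with a short worked calculation. The sketch below only assumes ideal exponential growth, so that growth over a horizon of t months with doubling period T is 2^(t/T); the function name `growth_factor` and the 24-month horizon are illustrative choices, not taken from the source.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Growth under ideal exponential doubling: 2 ** (elapsed / period)."""
    return 2.0 ** (months / doubling_period_months)


horizon = 24.0  # compare over one Moore's-law doubling period (2 years)

hardware = growth_factor(horizon, 24.0)      # 2-year doubling
compute_fast = growth_factor(horizon, 3.0)   # doubling every 3 months
compute_slow = growth_factor(horizon, 4.0)   # doubling every 4 months

print(hardware)      # 2.0
print(compute_fast)  # 256.0
print(compute_slow)  # 64.0
```

Under these assumptions, training compute would grow 64 to 256 times over the same two years in which hardware capability doubles, which is the gap the abstract refers to.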

    Answering skyline queries over incomplete data with crowdsourcing (Extended Abstract)
