1,211 research outputs found
Backdoor Attacks on Crowd Counting
Crowd counting is a regression task that estimates the number of people in a
scene image, which plays a vital role in a range of safety-critical
applications, such as video surveillance, traffic monitoring and flow control.
In this paper, we investigate the vulnerability of deep learning based crowd
counting models to backdoor attacks, a major security threat to deep learning.
A backdoor attack implants a backdoor trigger into a target model via data
poisoning so as to control the model's predictions at test time. Unlike
image classification models, on which most existing backdoor attacks have
been developed and tested, crowd counting models are regression models that
output multi-dimensional density maps and thus require different techniques to
manipulate.
We propose two novel Density Manipulation Backdoor Attacks
(DMBA- and DMBA+) that make the model produce arbitrarily large or
small density estimations. Experimental results demonstrate the effectiveness
of our DMBA attacks on five classic crowd counting models and four types of
datasets. We also provide an in-depth analysis of the unique challenges of
backdooring crowd counting models and reveal two key elements of effective
attacks: 1) full and dense triggers and 2) manipulation of the ground truth
counts or density maps. Our work could help evaluate the vulnerability of crowd
counting models to potential backdoor attacks.
Comment: To appear in ACMMM 2022. 10 pages, 6 figures and 2 tables
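The two ingredients the abstract names, a full and dense trigger and manipulation of the ground-truth density map, can be illustrated with a minimal NumPy sketch. This is not the authors' DMBA implementation: the function name `poison_sample`, the blending weight `alpha`, and the factor `scale` are hypothetical choices made only for illustration.

```python
import numpy as np

def poison_sample(image, density, trigger, alpha=0.2, scale=2.0):
    """Return a poisoned (image, density) training pair.

    image:   float array in [0, 1], shape (H, W, 3)
    density: ground-truth density map, shape (H, W); its sum is the count
    trigger: full-size pattern in [0, 1], shape (H, W, 3) (assumed dense)
    alpha:   blending weight of the trigger (hypothetical parameter)
    scale:   factor applied to the density map; >1 inflates, <1 deflates
    """
    # Blend the trigger over the whole image rather than a small patch.
    poisoned_image = np.clip((1.0 - alpha) * image + alpha * trigger, 0.0, 1.0)
    # Manipulate the regression target so the trigger maps to a wrong count.
    poisoned_density = density * scale
    return poisoned_image, poisoned_density

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
density = np.zeros((64, 64))
density[10, 10] = density[30, 40] = density[50, 20] = 1.0  # three annotated people
trigger = rng.random((64, 64, 3))

p_img, p_den = poison_sample(image, density, trigger, alpha=0.2, scale=2.0)
clean_count = density.sum()
poisoned_count = p_den.sum()
```

Because the predicted count is the sum of the density map, scaling the target map scales the count the backdoored model is trained to output whenever the trigger is present.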
PDANet: Pyramid Density-aware Attention Net for Accurate Crowd Counting
Crowd counting, i.e., estimating the number of people in a crowded area, has
attracted much interest in the research community. Although many attempts have
been reported, crowd counting remains an open real-world problem due to the
vast scale variations in crowd density within the area of interest and severe
occlusion among the crowd. In this paper, we propose a novel Pyramid
Density-Aware Attention-based network, abbreviated as PDANet, that leverages
attention, pyramid scale feature, and two-branch decoder modules for
density-aware crowd counting. PDANet uses these modules to extract
features at different scales, focus on the relevant information, and suppress
misleading information. We also address the variation of crowdedness levels among
different images with an exclusive Density-Aware Decoder (DAD). For this
purpose, a classifier evaluates the density level of the input features and
then passes them to the corresponding high-crowd or low-crowd DAD module.
Finally, we generate an overall density map by using the sum of the low-crowd
and high-crowd density maps as spatial attention. Meanwhile, we employ two
losses to create a precise density map for the input scene. Extensive
evaluations conducted on challenging benchmark datasets demonstrate
the superior performance of the proposed PDANet over well-known state-of-the-art
methods in terms of the accuracy of both the counts and the generated density maps.
Convolutional Neural Network for Accurate Crowd Counting and Density Estimation
University of Technology Sydney. Faculty of Engineering and Information Technology.
Crowd and object counting has become an important task for a variety of applications, such as traffic control, public safety, urban planning, and video surveillance. It has also become a crucial part of high-level monitoring systems for video surveillance and crowd analysis, where dynamic crowd monitoring and analysis is extremely important for crowd management and social safety.
Like other computer vision problems, crowd counting and density estimation come with various challenges, such as heavy clutter, occlusion, non-uniform distributions of objects or people, and intra-scene and inter-scene variations in appearance. Researchers and industrial partners have designed and developed many sophisticated models to address these issues. In recent years especially, the volume of crowd counting research has become overwhelming as deep learning and Convolutional Neural Network (CNN) based models have come to dominate computer vision tasks. In this thesis, we revisit crowd counting and propose several novel solutions to this problem.
First, we propose an Adaptive Counting Convolutional Neural Network (A-CCNN) that adaptively accounts for the scale variation of objects in a frame to improve counting accuracy. Our method takes advantage of contextual information to provide more accurate and adaptive density maps and crowd counts for a scene. Then, we focus on CNN pruning to make crowd counting models better suited to real-time application and to increase the performance of the CCNN model. We propose a new pruning strategy that considers the contributions of the various filters to the final result. The filters in the original CCNN model are grouped into positive, negative, and irrelevant types. We prune the irrelevant filters, whose feature maps contain little information, and the negative filters, which are determined by a mask learned from the training dataset. Our solution improves the results of the counting model without fine-tuning or retraining the pruned model. Finally, we propose a novel Pyramid Density-Aware Attention-based network, abbreviated as PDANet, which leverages attention, pyramid scale feature, and two-branch decoder modules for density-aware crowd counting. PDANet utilises these modules to extract features at different scales, focus on the relevant information, and suppress misleading information. Extensive evaluations conducted on challenging benchmark datasets demonstrate the superior performance of the proposed models over well-known state-of-the-art approaches in terms of the accuracy of both the counts and the generated density maps.
Collaborative Intelligent Cross-Camera Video Analytics at Edge: Opportunities and Challenges
Nowadays, video cameras are deployed in large scale for spatial monitoring of
physical places (e.g., surveillance systems in the context of smart cities).
The massive camera deployment, however, presents new challenges for analyzing
the enormous volume of data, as the high computational cost of sophisticated
deep learning techniques imposes a prohibitive overhead, in terms of energy
consumption and processing throughput, on resource-constrained edge
devices. To address these limitations, this paper envisions a collaborative
intelligent cross-camera video analytics paradigm at the network edge in which
camera nodes adjust their pipelines (e.g., inference) to incorporate correlated
observations and shared knowledge from other nodes' contents. By harnessing
redundant spatio-temporal information to reduce the size of the inference search
space on the one hand, and intelligent collaboration between video nodes on the
other, we discuss how such a collaborative paradigm can considerably improve accuracy,
reduce latency and decrease communication bandwidth compared to
non-collaborative baselines. This paper also describes major opportunities and
challenges in realizing such a paradigm.
Comment: First International Workshop on Challenges in Artificial Intelligence
and Machine Learnin
A Unified Object Counting Network with Object Occupation Prior
The counting task, which plays a fundamental role in numerous applications
(e.g., crowd counting, traffic statistics), aims to predict the number of
objects with various densities. Existing object counting methods are designed for
a single object class. However, in the real world it is inevitable that new data
with new classes will arrive. We call this scenario evolving
object counting. In this paper, we build the first evolving object counting
dataset and propose a unified object counting network as the first attempt to
address this task. The proposed model consists of two key components: a
class-agnostic mask module and a class-incremental module. The class-agnostic
mask module learns a generic object occupation prior by predicting a
class-agnostic binary mask (e.g., 1 denotes that an object exists at the
position under consideration in an image, and 0 otherwise). The class-incremental module
handles newly arriving classes and provides discriminative class guidance
for density map prediction. The combined outputs of the class-agnostic mask module
and the image feature extractor are used to predict the final density map. When new
classes arrive, we first add new neural nodes to the last regression and
classification layers of the class-incremental module. Then, instead of retraining
the model from scratch, we use knowledge distillation to help the model
remember what it has already learned about previous object classes. We also
employ a support sample bank to store a small number of typical training
samples of each class, which are used to prevent the model from forgetting key
information of old data. With this design, our model can efficiently and
effectively adapt to newly arriving classes while keeping good performance on
already seen data without large-scale retraining. Extensive experiments on the
collected dataset demonstrate its favorable performance.
Comment: Under review; The dataset and code will be available at:
https://github.com/Tanyjiang/EOC
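The class-incremental step described above (adding output nodes to the last layer, then distilling from the previous model) can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the code from the repository above: `expand_head`, `distillation_loss`, `total_loss`, the mean-squared-error form of both loss terms, and the weight `lam` are all hypothetical.

```python
import numpy as np

def expand_head(weights, n_new, rng):
    """Append n_new output rows to a (n_out, n_in) weight matrix.

    Old rows are kept unchanged so earlier classes keep their parameters;
    the new rows get a small random initialisation (an assumed choice).
    """
    new_rows = 0.01 * rng.standard_normal((n_new, weights.shape[1]))
    return np.concatenate([weights, new_rows], axis=0)

def distillation_loss(old_pred, new_pred):
    """MSE between the frozen old model's output and the new model's output."""
    return float(np.mean((old_pred - new_pred) ** 2))

def total_loss(new_pred, target, old_pred, lam=1.0):
    """Counting loss on the current data plus a weighted distillation term."""
    counting = float(np.mean((new_pred - target) ** 2))
    return counting + lam * distillation_loss(old_pred, new_pred)

rng = np.random.default_rng(0)
head = rng.standard_normal((3, 8))          # three old classes, 8 input features
expanded = expand_head(head, n_new=1, rng=rng)

old_pred = np.full((4, 4), 0.5)             # density map from the old model
new_pred = old_pred + 0.1                   # density map from the updated model
target = old_pred.copy()
loss = total_loss(new_pred, target, old_pred, lam=1.0)
```

The distillation term penalises the updated model for drifting away from the old model's predictions, which is the general mechanism by which distillation counters forgetting.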
Model Compression Methods for YOLOv5: A Review
Over the past few years, extensive research has been devoted to enhancing
YOLO object detectors. Since its introduction, eight major versions of YOLO
have been released with the purpose of improving its accuracy and efficiency.
While the evident merits of YOLO have led to its extensive use in many
areas, deploying it on resource-limited devices poses challenges. To address
this issue, various neural network compression methods have been developed,
which fall under three main categories, namely network pruning, quantization,
and knowledge distillation. The fruitful outcomes of utilizing model
compression methods, such as lowering memory usage and inference time, make
them favorable, if not necessary, for deploying large neural networks on
hardware-constrained edge devices. In this review paper, our focus is on
pruning and quantization due to their comparative modularity. We categorize
them and analyze the practical results of applying those methods to YOLOv5. By
doing so, we identify gaps in adapting pruning and quantization for compressing
YOLOv5, and provide future directions in this area for further exploration.
Among several versions of YOLO, we specifically choose YOLOv5 for its excellent
trade-off between recency and popularity in literature. This is the first
specific review paper that surveys pruning and quantization methods from an
implementation point of view on YOLOv5. Our study is also extendable to newer
versions of YOLO as implementing them on resource-limited devices poses the
same challenges that persist even today. This paper targets those interested in
the practical deployment of model compression methods on YOLOv5, and in
exploring different compression techniques that can be used for subsequent
versions of YOLO.
Comment: 18 pages, 7 figures
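As a generic illustration of the two method families the review focuses on, the NumPy sketch below applies unstructured magnitude pruning and symmetric per-tensor int8 quantization to a weight array. It is not taken from the review and does not use any YOLOv5 code; the function names and the choice of these particular variants are assumptions made for illustration.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Ties at the threshold are all pruned, so the achieved sparsity can be
    slightly higher than requested.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_int8(weights):
    """Symmetric per-tensor quantization; returns (int8 values, scale)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to floating point."""
    return q.astype(np.float64) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 16))

pruned = magnitude_prune(w, sparsity=0.5)
achieved_sparsity = float(np.mean(pruned == 0.0))

q, scale = quantize_int8(w)
max_error = float(np.max(np.abs(w - dequantize(q, scale))))
```

Pruning reduces the number of nonzero parameters, while quantization reduces the bits stored per parameter; the rounding error of the latter is bounded by half the quantization step `scale`.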
Efficient Convolutional Neural Networks for Image Classification and Regression
Neural networks have been a topic of research since the 1970s, and Convolutional Neural Networks (CNNs) were first shown to work well for handwritten digit recognition in 1998. These early networks were, however, still shallow and contained only a few layers. Moreover, they were mostly trained on a small amount of data, in contrast to modern CNNs, which contain hundreds of convolution layers and are trained on millions of images. This recent shift in machine learning comes at a cost. Modern neural networks have an extremely large number of parameters and require a huge amount of computation for training and inference. A 2018 study by OpenAI estimated that the compute requirements for training large models have been doubling every 3-4 months. By comparison, Moore's law has a 2-year doubling period. Thus, computational requirements have been outpacing hardware capabilities very quickly. To address this issue, we have developed methods not only for reducing the computational cost of convolutional neural networks, but also for enabling them to train and continually learn within this framework. Specifically, we have developed a new approach to compression based on spectral decomposition of filters, which replaces each convolution layer in the model with a compact low-rank approximation. In doing so, we are motivated by the observation that the original filters in a convolution layer can be represented as weighted linear combinations of a set of 3D basis filters with one-dimensional weight kernels. While compression is achieved by using fewer basis filters, we show that these basis filters can be jointly finetuned along with the weight kernels to compensate for any loss in performance due to truncation, and thereby achieve state-of-the-art results on both classification and regression problems. We then propose using a minimum L1-norm regularizer to simultaneously train and compress convolutional neural networks from scratch.
Two popular neural network compression methods are pruning and low-rank filter factorization. Since both of these methods depend on filter truncation, it is important for their success that the original filters are compact in the first place and that the discarded filters contain little information. However, conventional methods do not explicitly train filters for this purpose, so any deletion of filters also discards useful information learned by the model. To address this problem, we propose to train the model specifically for compression from scratch. We show that, unlike conventionally trained networks, models trained with our approach can learn the same information in a much more compact fashion. Moreover, the minimum L1-norm regularizer enables us to train the model for subsequent compression by either filter factorization or pruning, thereby unifying these two compression strategies. We also show that our compression framework naturally extends to continual learning, where the model needs to learn continuously from new data as it becomes available. This introduces a problem referred to as catastrophic forgetting, where the model fails to preserve previously learned information as it is being trained on new data. We propose a solution that eliminates this problem altogether while also needing significantly fewer FLOPs than other continual learning techniques.
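The idea of representing the filters of a convolution layer as weighted combinations of a smaller set of 3D basis filters can be sketched with a plain truncated SVD in NumPy. This is a stand-in chosen for illustration, not the thesis's own procedure: the finetuning step and the L1-norm regularizer are not reproduced, and the names `factorize_filters` and `reconstruct` are hypothetical.

```python
import numpy as np

def factorize_filters(weights, rank):
    """Approximate conv weights (out_c, in_c, k, k) with `rank` basis filters.

    Returns (coeffs, basis): basis has shape (rank, in_c, k, k) and coeffs has
    shape (out_c, rank), so each original filter is approximated by a weighted
    sum of the basis filters.
    """
    out_c, in_c, kh, kw = weights.shape
    flat = weights.reshape(out_c, in_c * kh * kw)      # one row per filter
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    coeffs = u[:, :rank] * s[:rank]                    # per-filter mixing weights
    basis = vt[:rank].reshape(rank, in_c, kh, kw)      # 3D basis filters
    return coeffs, basis

def reconstruct(coeffs, basis):
    """Rebuild the approximate (out_c, in_c, k, k) weight tensor."""
    return np.einsum("or,rikl->oikl", coeffs, basis)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 3, 3))

coeffs_full, basis_full = factorize_filters(w, rank=8)
coeffs_low, basis_low = factorize_filters(w, rank=4)

err_full = float(np.linalg.norm(w - reconstruct(coeffs_full, basis_full)))
err_low = float(np.linalg.norm(w - reconstruct(coeffs_low, basis_low)))

params_original = w.size
params_low = coeffs_low.size + basis_low.size
```

With all 8 components kept the reconstruction is exact up to floating-point error; keeping 4 trades approximation error for fewer parameters (176 instead of 288 in this toy layer), which is the truncation loss that finetuning is meant to recover.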
- …