MultIOD: Rehearsal-free Multihead Incremental Object Detector
Class-incremental learning (CIL) is the ability of artificial agents to
accommodate new classes as they appear in a stream. It is particularly
interesting in evolving environments where agents have limited access to memory
and computational resources. The main challenge of class-incremental learning
is catastrophic forgetting, the inability of neural networks to retain past
knowledge when acquiring new knowledge. Unfortunately, most existing
class-incremental object detectors are applied to two-stage algorithms such as
Faster-RCNN and rely on rehearsal memory to retain past knowledge. We believe
that the current benchmarks are not realistic, and more effort should be
dedicated to anchor-free and rehearsal-free object detection. In this context,
we propose MultIOD, a class-incremental object detector based on CenterNet. Our
main contributions are: (1) we propose a multihead feature pyramid and
multihead detection architecture to efficiently separate class representations,
(2) we employ transfer learning between classes learned initially and those
learned incrementally to tackle catastrophic forgetting, and (3) we use a
class-wise non-max-suppression as a post-processing technique to remove
redundant boxes. Without bells and whistles, our method outperforms a range of
state-of-the-art methods on two Pascal VOC datasets.
Comment: Under review at the WACV 2024 conference
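Contribution (3) above, class-wise non-max-suppression, can be sketched as follows: standard greedy NMS is applied independently within each class, so boxes of different classes never suppress one another. The box format and IoU threshold are illustrative assumptions, not the paper's exact implementation.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def classwise_nms(boxes, scores, labels, iou_thr=0.5):
    """Greedy NMS run separately per class label; returns kept indices."""
    keep = []
    for cls in set(labels):
        # Candidates of this class, highest score first.
        idx = sorted((i for i, l in enumerate(labels) if l == cls),
                     key=lambda i: scores[i], reverse=True)
        while idx:
            best = idx.pop(0)
            keep.append(best)
            # Suppress lower-scored boxes of the same class that overlap too much.
            idx = [i for i in idx if iou(boxes[best], boxes[i]) < iou_thr]
    return sorted(keep)
```

Because suppression is per class, two heavily overlapping boxes with different labels are both kept, which matches the multihead setting where each head predicts its own classes.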
Gradient-Based Post-Training Quantization: Challenging the Status Quo
Quantization has become a crucial step for the efficient deployment of deep
neural networks, where floating point operations are converted to simpler fixed
point operations. In its most naive form, it simply consists in a combination
of scaling and rounding transformations, leading to either a limited
compression rate or a significant accuracy drop. Recently, Gradient-based
post-training quantization (GPTQ) methods have emerged as a suitable
trade-off between such simple methods and more powerful, yet expensive
Quantization-Aware Training (QAT) approaches, particularly when attempting to
quantize LLMs, where scalability of the quantization process is of paramount
importance. GPTQ essentially consists in learning the rounding operation using
a small calibration set. In this work, we challenge common choices in GPTQ
methods. In particular, we show that the process is, to a certain extent,
robust to a number of variables (weight selection, feature augmentation, choice
of calibration set). More importantly, we derive a number of best practices for
designing more efficient and scalable GPTQ methods, regarding the problem
formulation (loss, degrees of freedom, use of non-uniform quantization schemes)
or optimization process (choice of variable and optimizer). Lastly, we propose
a novel importance-based mixed-precision technique. Those guidelines lead to
significant performance improvements on all the tested state-of-the-art GPTQ
methods and networks (e.g. +6.819 points on ViT for 4-bit quantization), paving
the way for the design of scalable, yet effective quantization methods.
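The "naive form" of quantization mentioned above, a combination of scaling and rounding, can be sketched as follows (symmetric per-tensor quantization; the signed grid and bit width are illustrative assumptions):

```python
import numpy as np

def naive_quantize(w, n_bits=4):
    """Scale weights onto a signed integer grid, round, then rescale."""
    qmax = 2 ** (n_bits - 1) - 1            # e.g. 7 for 4-bit signed
    scale = np.abs(w).max() / qmax          # per-tensor scaling factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                        # dequantized approximation

w = np.array([0.7, -0.35, 0.1])
w_q = naive_quantize(w, n_bits=4)           # rounding error <= scale / 2
```

GPTQ methods replace the fixed round-to-nearest step above with a rounding learned on a small calibration set, which is the design space this work probes.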
PIPE : Parallelized Inference Through Post-Training Quantization Ensembling of Residual Expansions
Deep neural networks (DNNs) are ubiquitous in computer vision and natural
language processing, but suffer from high inference cost. This problem can be
addressed by quantization, which consists in converting floating point
operations into a lower bit-width format. With growing concerns over privacy
rights, we focus our efforts on data-free methods. However, such techniques
suffer from a lack of adaptability to the target devices, as hardware
typically only supports specific bit widths. Thus, to adapt to a variety of
devices, a quantization method must be flexible enough to find good accuracy
vs. speed trade-offs for every bit width and target device. To achieve this,
we propose PIPE, a quantization method that leverages residual error expansion,
along with group sparsity and an ensemble approximation for better
parallelization. PIPE is backed by strong theoretical guarantees and
achieves superior performance on every benchmarked application (from vision to
NLP tasks), architecture (ConvNets, transformers) and bit-width (from int8 to
ternary quantization).
Comment: arXiv admin note: substantial text overlap with arXiv:2203.1464
SAfER: Layer-Level Sensitivity Assessment for Efficient and Robust Neural Network Inference
Deep neural networks (DNNs) demonstrate outstanding performance across most
computer vision tasks. Some critical applications, such as autonomous driving
or medical imaging, also require investigation into their behavior and the
reasons behind the decisions they make. In this vein, DNN attribution consists
in studying the relationship between the predictions of a DNN and its inputs.
Attribution methods have been adapted to highlight the most relevant weights or
neurons in a DNN, making it possible to more efficiently select which weights
or neurons to prune. However, a limitation of these approaches is that weights are
typically compared within each layer separately, while some layers might appear
as more critical than others. In this work, we propose to investigate DNN layer
importance, i.e. to estimate the sensitivity of the accuracy w.r.t.
perturbations applied at the layer level. To do so, we propose a novel dataset
to evaluate our method as well as future works. We benchmark a number of
criteria and draw conclusions regarding how to assess DNN layer importance and,
consequently, how to budgetize layers for increased DNN efficiency (with
applications for DNN pruning and quantization), as well as robustness to
hardware failure (e.g. bit swaps).
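The layer-level sensitivity idea can be illustrated with a minimal sketch: perturb one layer's weights at a time and record the accuracy drop. The model, noise scale, and helper names below are hypothetical stand-ins, not the paper's protocol.

```python
import numpy as np

def layer_sensitivity(layers, forward, inputs, targets, noise=1e-2, seed=0):
    """Per-layer sensitivity: accuracy drop when Gaussian noise is added
    to one layer's weights at a time, all other layers left untouched."""
    rng = np.random.default_rng(seed)

    def accuracy(weights):
        return np.mean(forward(weights, inputs) == targets)

    base = accuracy(layers)
    drops = []
    for i in range(len(layers)):
        perturbed = [w.copy() for w in layers]
        perturbed[i] += noise * rng.standard_normal(perturbed[i].shape)
        drops.append(base - accuracy(perturbed))
    return drops  # larger drop = more sensitive layer
```

Layers with large drops would then receive a larger bit-width or pruning budget, while robust layers can be compressed more aggressively.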
REx: Data-Free Residual Quantization Error Expansion
Deep neural networks (DNNs) are ubiquitous in computer vision and natural
language processing, but suffer from high inference cost. This problem can be
addressed by quantization, which consists in converting floating point
operations into a lower bit-width format. With growing concerns over privacy
rights, we focus our efforts on data-free methods. However, such techniques
suffer from a lack of adaptability to the target devices, as hardware
typically only supports specific bit widths. Thus, to adapt to a variety of
devices, a quantization method must be flexible enough to find good accuracy
vs. speed trade-offs for every bit width and target device. To achieve this,
we propose REx, a quantization method that leverages residual error expansion,
along with group sparsity and an ensemble approximation for better
parallelization. REx is backed by strong theoretical guarantees and
achieves superior performance on every benchmarked application (from vision to
NLP tasks), architecture (ConvNets, transformers) and bit-width (from int8 to
ternary quantization).
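The residual-error-expansion principle underlying both REx and PIPE can be sketched as follows: each additional term quantizes the error left by the previous ones, so the sum of the ensemble converges toward the full-precision weights. The bit width and expansion order are illustrative, and the sketch omits the group-sparsity and parallelization aspects.

```python
import numpy as np

def uniform_quant(w, n_bits):
    """Symmetric per-tensor round-to-nearest quantization."""
    qmax = 2 ** (n_bits - 1) - 1
    m = np.abs(w).max()
    scale = m / qmax if m > 0 else 1.0
    return np.round(w / scale).clip(-qmax - 1, qmax) * scale

def residual_expansion(w, n_bits=4, order=3):
    """Expand w as a sum of quantized terms: term k quantizes the
    residual error left by terms 1..k-1 (toy sketch, not REx itself)."""
    terms, residual = [], w.copy()
    for _ in range(order):
        q = uniform_quant(residual, n_bits)
        terms.append(q)
        residual = residual - q
    return terms  # sum(terms) ~ w, error shrinks with each term
```

Because every term is itself a low-bit tensor, the expansion can run as an ensemble of cheap fixed-point operations, which is what enables the parallelization mentioned above.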
Multiple kernel learning SVM and statistical validation for facial landmark detection
In this paper we present a robust and accurate method to detect 17 facial landmarks in expressive face images. We introduce a new multi-resolution framework based on the recent multiple kernel algorithm. Low-resolution patches carry the global information of the face and give a coarse but robust detection of the desired landmark. High-resolution patches, using local details, refine this location. This process is combined with a bootstrap procedure and a statistical validation, both improving the system's robustness. Combining independent point detection and prior knowledge of the point distribution, the proposed detector is robust to variable lighting conditions and facial expressions. The detector is tested on several databases and the reported results compare favorably with the current state-of-the-art point detectors.
Archtree: on-the-fly tree-structured exploration for latency-aware pruning of deep neural networks
Deep neural networks (DNNs) have become ubiquitous in addressing a number of
problems, particularly in computer vision. However, DNN inference is
computationally intensive, which can be prohibitive e.g. when considering edge
devices. To solve this problem, a popular solution is DNN pruning, and more so
structured pruning, where coherent computational blocks (e.g. channels for
convolutional networks) are removed: as an exhaustive search of the space of
pruned sub-models is intractable in practice, channels are typically removed
iteratively based on an importance estimation heuristic. Recently, promising
latency-aware pruning methods were proposed, where channels are removed until
the network reaches a target budget of wall-clock latency pre-emptively
estimated on specific hardware. In this paper, we present Archtree, a novel
method for latency-driven structured pruning of DNNs. Archtree explores
multiple candidate pruned sub-models in parallel in a tree-like fashion,
allowing for a better exploration of the search space. Furthermore, it involves
on-the-fly latency estimation on the target hardware, yielding latencies
closer to the specified budget. Empirical results on several DNN
architectures and target hardware show that Archtree better preserves the
original model accuracy while better fitting the latency budget as compared to
existing state-of-the-art methods.
Comment: 10 pages, 7 figures
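As context for the method above, the iterative latency-aware pruning loop it improves on can be sketched as a greedy baseline. The importance scores and latency oracle below are hypothetical stand-ins; Archtree itself explores a tree of such candidate sub-models in parallel rather than a single greedy path.

```python
def greedy_latency_prune(channels, importance, measure_latency, budget):
    """Drop the least important remaining channel until the latency,
    re-measured on-the-fly at each step, fits the target budget."""
    kept = list(channels)
    while measure_latency(kept) > budget and len(kept) > 1:
        # Remove the channel with the lowest importance estimate.
        kept.remove(min(kept, key=lambda c: importance[c]))
    return kept
```

Measuring latency on the actual target hardware at every step, instead of relying on a pre-computed proxy, is what lets the final sub-model land close to the budget.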
Robust continuous prediction of human emotions using multiscale dynamic cues
Designing systems able to interact with humans in a natural manner is a complex and far from solved problem. A key aspect of natural interaction is the ability to understand and appropriately respond to human emotions. This paper details our response to the Audio/Visual Emotion Challenge (AVEC'12), whose goal is to continuously predict four affective signals describing human emotions (namely valence, arousal, expectancy and power). The proposed method uses log-magnitude Fourier spectra to extract multiscale dynamic descriptions of signals characterizing global and local face appearance as well as head movements and voice. We perform a kernel regression with very few representative samples selected via a supervised weighted-distance-based clustering, which leads to high generalization power. For selecting features, we introduce a new correlation-based measure that takes into account a possible delay between the labels and the data and significantly increases robustness. We also propose a particularly fast regressor-level fusion framework to merge systems based on different modalities. Experiments have proven the efficiency of each key point of the proposed method and we obtain very promising results.