265 research outputs found
Quantitative analysis of properties and spatial relations of fuzzy image regions
Properties of objects and spatial relations between objects play an important role in rule-based approaches for high-level vision. The partial presence or absence of such properties and relationships can supply both positive and negative evidence for region labeling hypotheses. Similarly, fuzzy labeling of a region can generate new hypotheses pertaining to the properties of the region, its relation to the neighboring regions, and finally, the labels of the neighboring regions. In this paper, we present a unified methodology to characterize properties and spatial relationships of object regions in a digital image. The proposed methods can be used to arrive at more meaningful decisions about the contents of the scene.
Neuron Activation Coverage: Rethinking Out-of-distribution Detection and Generalization
The out-of-distribution (OOD) problem generally arises when neural networks
encounter data that significantly deviates from the training data distribution,
i.e., in-distribution (InD). In this paper, we study the OOD problem from a
neuron activation view. We first formulate neuron activation states by
considering both the neuron output and its influence on model decisions. Then,
to characterize the relationship between neurons and OOD issues, we introduce
the \textit{neuron activation coverage} (NAC) -- a simple measure for neuron
behaviors under InD data. Leveraging our NAC, we show that 1) InD and OOD
inputs can be largely separated based on the neuron behavior, which
significantly eases the OOD detection problem and outperforms 21 previous methods
over three benchmarks (CIFAR-10, CIFAR-100, and ImageNet-1K). 2) a positive
correlation between NAC and model generalization ability consistently holds
across architectures and datasets, which enables a NAC-based criterion for
evaluating model robustness. Compared to prevalent InD validation criteria, we
show that NAC not only can select more robust models, but also has a stronger
correlation with OOD test performance.
Comment: 28 pages, 9 figures, 20 tables
DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models
Dataset sanitization is a widely adopted proactive defense against
poisoning-based backdoor attacks, aimed at filtering out and removing poisoned
samples from training datasets. However, existing methods have shown limited
efficacy in countering ever-evolving trigger functions, and often lead
to considerable degradation of benign accuracy. In this paper, we propose
DataElixir, a novel sanitization approach tailored to purify poisoned datasets.
We leverage diffusion models to eliminate trigger features and restore benign
features, thereby turning the poisoned samples into benign ones. Specifically,
with multiple iterations of the forward and reverse process, we extract
intermediary images and their predicted labels for each sample in the original
dataset. Then, we identify anomalous samples in terms of the presence of label
transition of the intermediary images, detect the target label by quantifying
distribution discrepancy, select their purified images considering pixel and
feature distance, and determine their ground-truth labels by training a benign
model. Experiments conducted on 9 popular attacks demonstrate that DataElixir
effectively mitigates various complex attacks while exerting minimal impact on
benign accuracy, surpassing the performance of baseline defense methods.
Comment: Accepted by AAAI202
SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification
Recently, Mix-style data augmentation methods (e.g., Mixup and CutMix) have
shown promising performance in various visual tasks. However, these methods are
primarily designed for single-label images, ignoring the considerable
discrepancies between single- and multi-label images, i.e., a multi-label image
involves multiple co-occurring categories and varying object scales. On the other
hand, previous multi-label image classification (MLIC) methods tend to design
elaborate models, bringing expensive computation. In this paper, we introduce a
simple but effective augmentation strategy for multi-label image
classification, namely SpliceMix. The "splice" in our method is two-fold: 1)
Each mixed image is a splice of several downsampled images in the form of a
grid, where the semantics of the images being mixed are blended without
object deficiencies to alleviate co-occurrence bias; 2) We splice mixed images
and the original mini-batch to form a new SpliceMixed mini-batch, which allows
an image with different scales to contribute to training together. Furthermore,
such splice in our SpliceMixed mini-batch enables interactions between mixed
images and original regular images. We also offer a simple and non-parametric
extension based on consistency learning (SpliceMix-CL) to show the flexible
extensibility of our SpliceMix. Extensive experiments on various tasks
demonstrate that only using SpliceMix with a baseline model (e.g., ResNet)
achieves better performance than state-of-the-art methods. Moreover, the
generalizability of our SpliceMix is further validated by the improvements in
current MLIC methods when combined with our SpliceMix. The code is available at
https://github.com/zuiran/SpliceMix.
Comment: 13 pages, 10 figures
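The two-fold "splice" described in the SpliceMix abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (see the repository linked above for that). The function names `downsample`, `splice_grid`, and `splicemixed_batch` are hypothetical, images are plain nested lists, downsampling is nearest-neighbour, and taking the union of the source label sets as the mixed label is an assumption that the abstract does not spell out.

```python
def downsample(img, factor):
    """Nearest-neighbour downsampling: keep every `factor`-th row and column."""
    return [row[::factor] for row in img[::factor]]


def splice_grid(images, labels, grid=2):
    """Splice grid*grid equally sized images into one image of the same size.

    Assumption: the mixed label is the union of the source label sets.
    The image height and width must be divisible by `grid`.
    """
    assert len(images) == grid * grid
    tiles = [downsample(img, grid) for img in images]
    mixed = []
    for r in range(grid):
        row_tiles = tiles[r * grid:(r + 1) * grid]
        # Concatenate the tiles of this grid row, scanline by scanline.
        for y in range(len(row_tiles[0])):
            mixed.append([px for tile in row_tiles for px in tile[y]])
    mixed_label = set().union(*labels)
    return mixed, mixed_label


def splicemixed_batch(images, labels, grid=2):
    """Append one spliced image to the original mini-batch (second 'splice')."""
    n = grid * grid
    mixed, mixed_label = splice_grid(images[:n], labels[:n], grid)
    return images + [mixed], labels + [mixed_label]


# Four constant 4x4 "images" with hypothetical label sets.
images = [[[k] * 4 for _ in range(4)] for k in range(4)]
labels = [{"cat"}, {"dog"}, {"cat", "car"}, {"tree"}]
batch_images, batch_labels = splicemixed_batch(images, labels)
```

Here the fifth element of `batch_images` is a 4x4 image whose quadrants come from the four originals, so every source object is still present at half scale next to the full-scale originals.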
Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks
Graph Neural Networks (GNNs) tend to suffer from high computation costs due
to the exponentially increasing scale of graph data and the number of model
parameters, which restricts their utility in practical applications. To this
end, some recent works focus on sparsifying GNNs with the lottery ticket
hypothesis (LTH) to reduce inference costs while maintaining performance
levels. However, the LTH-based methods suffer from two major drawbacks: 1) they
require exhaustive and iterative training of dense models, resulting in an
extremely large training computation cost, and 2) they only trim graph
structures and model parameters but ignore the node feature dimension, where
significant redundancy exists. To overcome the above limitations, we propose a
comprehensive graph gradual pruning framework termed CGP. This is achieved by
designing a during-training graph pruning paradigm to dynamically prune GNNs
within one training process. Unlike LTH-based methods, the proposed CGP
approach requires no re-training, which significantly reduces the computation
costs. Furthermore, we design a co-sparsifying strategy to comprehensively trim
all three core elements of GNNs: graph structures, node features, and model
parameters. Meanwhile, aiming at refining the pruning operation, we introduce a
regrowth process into our CGP framework, in order to re-establish the pruned
but important connections. The proposed CGP is evaluated by using a node
classification task across 6 GNN architectures, including shallow models (GCN
and GAT), shallow-but-deep-propagation models (SGC and APPNP), and deep models
(GCNII and ResGCN), on a total of 14 real-world graph datasets, including
large-scale graph datasets from the challenging Open Graph Benchmark.
Experiments reveal that our proposed strategy greatly improves both training
and inference efficiency while matching or even exceeding the accuracy of
existing methods.
Comment: 29 pages, 27 figures, submitting to IEEE TNNL
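The idea of pruning gradually within one training run and then regrowing important connections can be sketched as follows. This is a hedged illustration, not the CGP algorithm: the abstract does not state the sparsity schedule or the importance criterion, so the cubic ramp in `target_sparsity`, magnitude-based pruning, and gradient-based regrowth in `prune_and_regrow` are all assumptions, and both function names are hypothetical. The sketch covers only a weight mask, not graph structures or node features.

```python
def target_sparsity(step, total_steps, final_sparsity):
    """Assumed schedule: cubic ramp from 0 to `final_sparsity` over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity * (1.0 - (1.0 - frac) ** 3)


def prune_and_regrow(weights, grads, mask, sparsity, regrow=1):
    """Return a new 0/1 mask with int(sparsity * n) entries pruned.

    Assumed criteria: prune the smallest-magnitude active weights
    (overshooting by `regrow`), then re-activate the pruned entries
    with the largest gradient magnitude.
    """
    n = len(weights)
    n_pruned_target = int(sparsity * n)
    new_mask = list(mask)

    active = [i for i in range(n) if new_mask[i]]
    already_pruned = n - len(active)
    n_to_prune = max(0, min(len(active),
                            n_pruned_target + regrow - already_pruned))
    for i in sorted(active, key=lambda i: abs(weights[i]))[:n_to_prune]:
        new_mask[i] = 0

    inactive = [i for i in range(n) if not new_mask[i]]
    n_to_regrow = min(regrow, max(0, len(inactive) - n_pruned_target))
    for i in sorted(inactive, key=lambda i: -abs(grads[i]))[:n_to_regrow]:
        new_mask[i] = 1
    return new_mask


# Hypothetical weights and gradients for one layer.
weights = [0.9, -0.1, 0.5, 0.05, -0.7, 0.2]
grads = [0.0, 0.8, 0.1, 0.0, 0.0, 0.3]
mask = prune_and_regrow(weights, grads, [1] * 6, sparsity=0.5, regrow=1)
```

In this toy run the weight at index 1 is small but has the largest gradient, so it is pruned and then regrown, while half of the entries end up masked out.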
Localization and mapping algorithm based on Lidar-IMU-Camera fusion
Positioning and mapping technology is a difficult and hot topic in autonomous driving environment sensing systems. In a complex traffic environment, the signal of the Global Navigation Satellite System (GNSS) will be blocked, leading to inaccurate vehicle positioning. To ensure the safety of autonomous electric campus vehicles, this study builds on the Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain (LEGO-LOAM) algorithm and adds a monocular vision system. An algorithm framework based on Lidar-IMU-Camera (Lidar means light detection and ranging) fusion was proposed. A lightweight monocular visual odometry model was used, and the LEGO-LOAM system was employed to initialize monocular vision. The visual odometry information was taken as the initial value of the laser odometry. In the back-end optimization phase, the error-state Kalman filtering fusion algorithm was employed to fuse the visual odometry and the LEGO-LOAM system for positioning. The bag-of-visual-words model was applied to perform loop closure detection. Taking the test results into account, the lidar loop closure detection was further optimized, reducing the accumulated positioning error. The real-vehicle experiment results showed that our algorithm could improve the mapping quality and positioning accuracy in the campus environment. The Lidar-IMU-Camera algorithm framework was verified on the Hong Kong city dataset UrbanNav. Compared with the LEGO-LOAM algorithm, the results show that the proposed algorithm can effectively reduce map drift, improve map resolution, and output more accurate driving trajectory information.
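The Kalman fusion step mentioned in the abstract can be illustrated with a minimal one-dimensional sketch. It is a plain scalar Kalman filter, not the error-state formulation the paper refers to, and the role assignment (one odometry source drives the prediction, the other serves as the measurement) is an assumption made for illustration. The function names `predict` and `update` and all numeric values are hypothetical.

```python
def predict(x, p, delta, q):
    """Propagate position x (variance p) by an odometry increment `delta`.

    q is the assumed process-noise variance added by this motion step.
    """
    return x + delta, p + q


def update(x, p, z, r):
    """Fuse a position measurement z (variance r) into the estimate."""
    k = p / (p + r)            # Kalman gain: weight given to the measurement
    x_new = x + k * (z - x)    # corrected position
    p_new = (1.0 - k) * p      # variance shrinks after fusion
    return x_new, p_new


# Hypothetical numbers: one source reports a 1.0 m increment,
# the other measures an absolute position of 2.0 m.
x, p = predict(0.0, 0.5, delta=1.0, q=0.5)   # (1.0, 1.0)
x, p = update(x, p, z=2.0, r=1.0)            # (1.5, 0.5)
```

With equal variances the fused position lands halfway between prediction and measurement, and its variance is halved, which is the basic reason fusing two odometry sources can reduce drift.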