265 research outputs found

    Quantitative analysis of properties and spatial relations of fuzzy image regions

    Properties of objects and spatial relations between objects play an important role in rule-based approaches for high-level vision. The partial presence or absence of such properties and relationships can supply both positive and negative evidence for region labeling hypotheses. Similarly, fuzzy labeling of a region can generate new hypotheses pertaining to the properties of the region, its relation to the neighboring regions, and finally, the labels of the neighboring regions. In this paper, we present a unified methodology to characterize properties and spatial relationships of object regions in a digital image. The proposed methods can be used to arrive at more meaningful decisions about the contents of the scene.
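
The abstract does not list the specific properties it defines, so the following is only a minimal sketch of what "properties of a fuzzy region" can look like. It assumes a region is given as a 2D grid of membership values in [0, 1], and it uses the common definitions of fuzzy area (sum of memberships) and membership-weighted centroid. The function names `fuzzy_area` and `fuzzy_centroid` are hypothetical and are not taken from the paper.

```python
def fuzzy_area(mu):
    """Fuzzy area: the sum of all membership values in the grid."""
    return sum(sum(row) for row in mu)


def fuzzy_centroid(mu):
    """Membership-weighted centroid, returned as (row, column)."""
    area = fuzzy_area(mu)
    if area == 0:
        raise ValueError("region has zero total membership")
    cy = sum(y * v for y, row in enumerate(mu) for v in row) / area
    cx = sum(x * v for row in mu for x, v in enumerate(row)) / area
    return cy, cx


# A 2x2 region whose membership grows toward the bottom-right pixel.
mu = [
    [0.0, 0.5],
    [0.5, 1.0],
]
area = fuzzy_area(mu)
centroid = fuzzy_centroid(mu)
```

With a crisp (0/1) membership grid these definitions reduce to the ordinary pixel count and centroid, which is why they are a natural starting point for fuzzy regions.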

    Neuron Activation Coverage: Rethinking Out-of-distribution Detection and Generalization

    The out-of-distribution (OOD) problem generally arises when neural networks encounter data that significantly deviates from the training data distribution, i.e., in-distribution (InD). In this paper, we study the OOD problem from a neuron activation view. We first formulate neuron activation states by considering both the neuron output and its influence on model decisions. Then, to characterize the relationship between neurons and OOD issues, we introduce the neuron activation coverage (NAC) -- a simple measure for neuron behaviors under InD data. Leveraging our NAC, we show that 1) InD and OOD inputs can be largely separated based on the neuron behavior, which significantly eases the OOD detection problem and outperforms 21 previous methods over three benchmarks (CIFAR-10, CIFAR-100, and ImageNet-1K); 2) a positive correlation between NAC and model generalization ability consistently holds across architectures and datasets, which enables a NAC-based criterion for evaluating model robustness. Compared to prevalent InD validation criteria, we show that NAC not only can select more robust models, but also has a stronger correlation with OOD test performance. Comment: 28 pages, 9 figures, 20 tables

    DataElixir: Purifying Poisoned Dataset to Mitigate Backdoor Attacks via Diffusion Models

    Dataset sanitization is a widely adopted proactive defense against poisoning-based backdoor attacks, aimed at filtering out and removing poisoned samples from training datasets. However, existing methods have shown limited efficacy in countering the ever-evolving trigger functions, and often lead to considerable degradation of benign accuracy. In this paper, we propose DataElixir, a novel sanitization approach tailored to purify poisoned datasets. We leverage diffusion models to eliminate trigger features and restore benign features, thereby turning the poisoned samples into benign ones. Specifically, with multiple iterations of the forward and reverse process, we extract intermediary images and their predicted labels for each sample in the original dataset. Then, we identify anomalous samples in terms of the presence of label transition of the intermediary images, detect the target label by quantifying distribution discrepancy, select their purified images considering pixel and feature distance, and determine their ground-truth labels by training a benign model. Experiments conducted on 9 popular attacks demonstrate that DataElixir effectively mitigates various complex attacks while exerting minimal impact on benign accuracy, surpassing the performance of baseline defense methods. Comment: Accepted by AAAI202

    SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification

    Recently, Mix-style data augmentation methods (e.g., Mixup and CutMix) have shown promising performance in various visual tasks. However, these methods are primarily designed for single-label images, ignoring the considerable discrepancies between single- and multi-label images, i.e., a multi-label image involves multiple co-occurring categories and varying object scales. On the other hand, previous multi-label image classification (MLIC) methods tend to design elaborate models, bringing expensive computation. In this paper, we introduce a simple but effective augmentation strategy for multi-label image classification, namely SpliceMix. The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images in the form of a grid, where the semantics of the images taking part in the mix are blended without object deficiencies, alleviating co-occurrence bias; 2) We splice mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image at different scales to contribute to training together. Furthermore, such splicing in our SpliceMixed mini-batch enables interactions between mixed images and original regular images. We also offer a simple and non-parametric extension based on consistency learning (SpliceMix-CL) to show the flexible extensibility of our SpliceMix. Extensive experiments on various tasks demonstrate that only using SpliceMix with a baseline model (e.g., ResNet) achieves better performance than state-of-the-art methods. Moreover, the generalizability of our SpliceMix is further validated by the improvements in current MLIC methods when combined with our SpliceMix. The code is available at https://github.com/zuiran/SpliceMix. Comment: 13 pages, 10 figures
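
The two-fold splice described in this abstract can be illustrated with a rough sketch. This is not the authors' implementation (their code is at the repository linked above). It assumes single-channel images stored as nested lists, nearest-neighbour downsampling, image sides divisible by the grid size, and a mixed label formed as the union of the multi-hot labels. The names `downsample`, `splice_grid`, and `splicemix_batch` are hypothetical.

```python
import random


def downsample(img, fy, fx):
    """Nearest-neighbour downsampling: keep every fy-th row and fx-th column."""
    return [row[::fx] for row in img[::fy]]


def splice_grid(images, labels, rows, cols):
    """Splice rows*cols downsampled images into one grid image.

    Assumption: the mixed label is the element-wise union (logical OR)
    of the multi-hot labels of the spliced images.
    """
    assert len(images) == rows * cols
    small = [downsample(img, rows, cols) for img in images]
    mixed = []
    for r in range(rows):
        tiles = small[r * cols:(r + 1) * cols]
        for y in range(len(tiles[0])):
            line = []
            for tile in tiles:
                line.extend(tile[y])  # place tiles side by side
            mixed.append(line)
    mixed_label = [int(any(bits)) for bits in zip(*labels)]
    return mixed, mixed_label


def splicemix_batch(images, labels, rows=2, cols=2, n_mixed=1, rng=None):
    """Append n_mixed spliced images to the original mini-batch."""
    rng = rng or random.Random(0)
    out_images, out_labels = list(images), list(labels)
    for _ in range(n_mixed):
        idx = rng.sample(range(len(images)), rows * cols)
        mixed, mixed_label = splice_grid(
            [images[i] for i in idx], [labels[i] for i in idx], rows, cols
        )
        out_images.append(mixed)
        out_labels.append(mixed_label)
    return out_images, out_labels


# Four constant 4x4 images, each carrying one distinct class.
images = [[[i] * 4 for _ in range(4)] for i in range(4)]
labels = [[int(i == c) for c in range(4)] for i in range(4)]
mixed, mixed_label = splice_grid(images, labels, 2, 2)
batch_images, batch_labels = splicemix_batch(images, labels)
```

Because the spliced image keeps each source image whole (only smaller), no object is cut away, and the enlarged mini-batch contains both the full-scale originals and their downsampled copies.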

    Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks

    Graph Neural Networks (GNNs) tend to suffer from high computation costs due to the exponentially increasing scale of graph data and the number of model parameters, which restricts their utility in practical applications. To this end, some recent works focus on sparsifying GNNs with the lottery ticket hypothesis (LTH) to reduce inference costs while maintaining performance levels. However, the LTH-based methods suffer from two major drawbacks: 1) they require exhaustive and iterative training of dense models, resulting in an extremely large training computation cost, and 2) they only trim graph structures and model parameters but ignore the node feature dimension, where significant redundancy exists. To overcome the above limitations, we propose a comprehensive graph gradual pruning framework termed CGP. This is achieved by designing a during-training graph pruning paradigm to dynamically prune GNNs within one training process. Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs. Furthermore, we design a co-sparsifying strategy to comprehensively trim all three core elements of GNNs: graph structures, node features, and model parameters. Meanwhile, aiming at refining the pruning operation, we introduce a regrowth process into our CGP framework, in order to re-establish the pruned but important connections. The proposed CGP is evaluated by using a node classification task across 6 GNN architectures, including shallow models (GCN and GAT), shallow-but-deep-propagation models (SGC and APPNP), and deep models (GCNII and ResGCN), on a total of 14 real-world graph datasets, including large-scale graph datasets from the challenging Open Graph Benchmark. Experiments reveal that our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods. Comment: 29 pages, 27 figures, submitting to IEEE TNNL
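
The abstract does not state the pruning or regrowth criteria used by CGP, so the sketch below substitutes generic choices to illustrate the idea of pruning gradually during a single training run and then regrowing some connections. It assumes a cubic sparsity schedule, magnitude-based pruning, and gradient-magnitude regrowth with regrown weights reset to zero. All function names and parameters are hypothetical.

```python
def sparsity_at(step, total_steps, final_sparsity):
    """Assumed cubic schedule: sparsity rises from 0 to final_sparsity."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity * (1.0 - (1.0 - t) ** 3)


def prune_and_regrow(weights, grads, mask, target_sparsity, regrow_frac):
    """One pruning step on a flat list of weights.

    Keeps the largest-magnitude active weights, then regrows a fraction
    of the kept budget at the positions with the largest gradient magnitude.
    """
    n = len(weights)
    n_keep = n - int(round(target_sparsity * n))
    n_regrow = int(round(regrow_frac * n_keep))
    n_by_magnitude = n_keep - n_regrow

    # Magnitude pruning among the currently active connections.
    active = [i for i in range(n) if mask[i]]
    active.sort(key=lambda i: abs(weights[i]), reverse=True)
    kept = set(active[:n_by_magnitude])

    # Regrowth among everything not kept, ranked by gradient magnitude.
    candidates = [i for i in range(n) if i not in kept]
    candidates.sort(key=lambda i: abs(grads[i]), reverse=True)
    regrown = set(candidates[:n_regrow])

    new_mask = [1 if (i in kept or i in regrown) else 0 for i in range(n)]
    # Regrown connections restart from zero; pruned ones are zeroed out.
    new_weights = [w if i in kept else 0.0 for i, w in enumerate(weights)]
    return new_weights, new_mask


weights = [0.9, -0.1, 0.5, 0.0, 0.05, -0.7]
grads = [0.0, 0.2, 0.0, 0.9, 0.1, 0.0]
mask = [1, 1, 1, 0, 1, 1]
new_weights, new_mask = prune_and_regrow(weights, grads, mask, 0.5, 1 / 3)
```

In a training loop, `sparsity_at` would supply the target for each pruning step, so the model never needs a separate dense pre-training or re-training phase. The same masking idea could in principle be applied to adjacency entries and feature columns, though how CGP does this is not described in the abstract.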

    Localization and mapping algorithm based on Lidar-IMU-Camera fusion

    Positioning and mapping technology is a difficult and actively studied topic in environment sensing systems for autonomous driving. In a complex traffic environment, the signal of the Global Navigation Satellite System (GNSS) can be blocked, leading to inaccurate vehicle positioning. To ensure the safety of automatic electric campus vehicles, this study builds on the Lightweight and Ground-Optimized Lidar Odometry and Mapping on Variable Terrain (LEGO-LOAM) algorithm and adds a monocular vision system. An algorithm framework based on Lidar-IMU-Camera fusion (Lidar means light detection and ranging) was proposed. A lightweight monocular visual odometry model was used, and the LEGO-LOAM system was employed to initialize the monocular vision. The visual odometry information was taken as the initial value of the laser odometry. In the back-end optimization phase, the error-state Kalman filtering fusion algorithm was employed to fuse the visual odometry and the LEGO-LOAM system for positioning. The visual bag-of-words model was applied to perform loop closure detection. Taking the test results into account, the Lidar loop closure detection was further optimized, reducing the accumulated positioning error. The real-vehicle experiment results showed that our algorithm could improve the mapping quality and positioning accuracy in the campus environment. The Lidar-IMU-Camera algorithm framework was verified on the Hong Kong city dataset UrbanNav. Compared with the LEGO-LOAM algorithm, the results show that the proposed algorithm can effectively reduce map drift, improve map resolution, and output more accurate driving trajectory information.