43 research outputs found

    RBFormer: Improve Adversarial Robustness of Transformer by Robust Bias

    Recently, there has been a surge of interest and attention in Transformer-based structures, such as Vision Transformer (ViT) and Vision Multilayer Perceptron (VMLP). Compared with the previous convolution-based structures, the Transformer-based structure under investigation showcases a comparable or superior performance under its distinctive attention-based input token mixer strategy. Introducing adversarial examples as a robustness consideration has had a profound and detrimental impact on the performance of well-established convolution-based structures. This inherent vulnerability to adversarial attacks has also been demonstrated in Transformer-based structures. In this paper, our emphasis lies on investigating the intrinsic robustness of the structure rather than introducing novel defense measures against adversarial attacks. To address the susceptibility to robustness issues, we employ a rational structure design approach to mitigate such vulnerabilities. Specifically, we enhance the adversarial robustness of the structure by increasing the proportion of high-frequency structural robust biases. As a result, we introduce a novel structure called Robust Bias Transformer-based Structure (RBFormer) that shows robust superiority compared to several existing baseline structures. Through a series of extensive experiments, RBFormer outperforms the original structures by a significant margin, achieving an impressive improvement of +16.12% and +5.04% across different evaluation criteria on CIFAR-10 and ImageNet-1k, respectively.
    Comment: BMVC 202

    Time and frequency localized pulse shape for resolution enhancement in STFT-BOTDR

    Short Time Fourier Transform-Brillouin Optical Time Domain Reflectometry (STFT-BOTDR) implements STFT over the full frequency spectrum to measure the distributed temperature and strain along an optical fiber, providing new research advances in dynamic distributed sensing. The spatial and frequency resolution of the dynamic sensing is limited by the Signal to Noise Ratio (SNR) and the Time-Frequency (T-F) localization of the input pulse shape. T-F localization is fundamentally important in communication systems, where it suppresses interchannel interference (ICI) and intersymbol interference (ISI) to improve the transmission quality in multi-carrier modulation (MCM). This paper demonstrates that a T-F localized input pulse shape can enhance the SNR and the spatial and frequency resolution in STFT-BOTDR. Simulations and experiments with different T-F localized pulse shapes are conducted to compare the limits on system resolution. The results indicate that a rectangular pulse should be selected to optimize the spatial resolution, a Lorentzian pulse could be chosen to optimize the frequency resolution, and a Gaussian pulse can be used in general applications for its balanced performance in both spatial and frequency resolution. T-F localization thus proves useful in pulse shape selection for system resolution optimization.
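
    To make the notion of T-F localization in the abstract above more concrete, the following is a minimal sketch that is not taken from the paper. It assumes the time-bandwidth product (RMS duration times RMS angular bandwidth) as the localization measure, unit-width pulses, and a finite integration window; the function and parameter names are hypothetical. For a real pulse, the mean-square angular frequency is obtained from the energy of the time derivative, so no Fourier transform is needed.

```python
import math


def time_bandwidth_product(pulse, half_width=100.0, dt=0.01):
    """RMS duration times RMS angular bandwidth of a real, symmetric pulse.

    Assumption: the pulse is centred on t = 0 and has decayed enough
    inside [-half_width, half_width] for the integrals to be meaningful.
    """
    n = int(round(2 * half_width / dt)) + 1
    t = [-half_width + i * dt for i in range(n)]
    s = [pulse(x) for x in t]

    energy = sum(v * v for v in s) * dt
    # Mean-square duration around t = 0.
    var_t = sum(x * x * v * v for x, v in zip(t, s)) * dt / energy
    # By Parseval's theorem, the mean-square angular frequency of a real
    # pulse is the derivative's energy divided by the pulse's energy.
    ds = [(s[i + 1] - s[i - 1]) / (2 * dt) for i in range(1, n - 1)]
    var_w = sum(d * d for d in ds) * dt / energy
    return math.sqrt(var_t * var_w)


def gaussian(t, sigma=1.0):
    return math.exp(-t * t / (2 * sigma * sigma))


def lorentzian(t, gamma=1.0):
    return 1.0 / (1.0 + (t / gamma) ** 2)


tbp_gauss = time_bandwidth_product(gaussian)
tbp_lorentz = time_bandwidth_product(lorentzian)
print(f"Gaussian time-bandwidth product:   {tbp_gauss:.3f}")
print(f"Lorentzian time-bandwidth product: {tbp_lorentz:.3f}")
```

    Under this measure the Gaussian sits at the lower bound of 1/2, while the Lorentzian comes out larger. An ideal rectangular pulse is left out because its sharp edges make the RMS bandwidth diverge, so a different bandwidth definition would be needed to compare it.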

    Image Captioning in news report scenario

    Image captioning strives to generate pertinent captions for specified images, situating itself at the crossroads of Computer Vision (CV) and Natural Language Processing (NLP). This endeavor is of paramount importance with far-reaching applications in recommendation systems, news outlets, social media, and beyond. Particularly within the realm of news reporting, captions are expected to encompass detailed information, such as the identities of celebrities captured in the images. However, much of the existing body of work primarily centers around understanding scenes and actions. In this paper, we explore the realm of image captioning specifically tailored for celebrity photographs, illustrating its broad potential for enhancing news industry practices. This exploration aims to augment automated news content generation, thereby facilitating a more nuanced dissemination of information. Our endeavor shows a broader horizon, enriching the narrative in news reporting through a more intuitive image captioning framework.
    Comment: 10 pages, 4 figures

    Gaining the Sparse Rewards by Exploring Binary Lottery Tickets in Spiking Neural Network

    The Spiking Neural Network (SNN), a brain-inspired strategy, has received considerable attention because of the high-sparsity and low-power properties derived from its inherent spiking information state. To further improve the efficiency of SNNs, some works claim that the Lottery Tickets (LTs) Hypothesis, which states that an Artificial Neural Network (ANN) contains a subnetwork that does not sacrifice the performance of the original network, also holds for SNNs. However, the spiking information handled by an SNN has a natural similarity and affinity with binarization in sparsification. Therefore, to further explore SNN efficiency, this paper focuses on (1) the presence or absence of LTs in the binary SNN, and (2) whether the spiking mechanism is a superior strategy for handling binary information compared to simple model binarization. To verify these assumptions, a sparse training method is proposed to find Binary Weights Spiking Lottery Tickets (BinW-SLT) under different network structures. Through comprehensive evaluations, we show that BinW-SLT could attain up to +5.86% and +3.17% improvement on CIFAR-10 and CIFAR-100 compared with binary LTs, as well as achieve 1.86x and 8.92x energy saving compared with full-precision SNN and ANN.
    Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
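
    As a rough illustration of the ingredients named in the abstract above (a sparse subnetwork, binary weights, and spiking neurons), here is a toy sketch that is not the paper's BinW-SLT method. It assumes magnitude-based selection of the kept weights in place of the paper's sparse training, sign binarization, and a single leaky integrate-and-fire neuron with a hard reset; all names, thresholds, and values are hypothetical.

```python
def binarize(w):
    """Sign binarization: map a real weight to +1 or -1."""
    return 1.0 if w >= 0 else -1.0


def masked_binary_weights(weights, keep_ratio):
    """Keep the largest-magnitude weights as the 'ticket' and binarize them.

    Assumption: magnitude ranking stands in for a learned mask.
    Pruned weights become 0, kept weights become +1 or -1.
    """
    k = max(1, int(round(keep_ratio * len(weights))))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [binarize(w) if i in keep else 0.0 for i, w in enumerate(weights)]


def lif_neuron(spike_trains, weights, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron driven by binary input spikes.

    spike_trains[t][i] is 0 or 1 for input i at time step t.
    """
    v = 0.0  # membrane potential
    out = []
    for spikes in spike_trains:
        v = leak * v + sum(w * s for w, s in zip(weights, spikes))
        if v >= threshold:
            out.append(1)
            v = 0.0  # hard reset after a spike
        else:
            out.append(0)
    return out


real_weights = [0.8, -0.05, 0.3, -0.6]
ticket = masked_binary_weights(real_weights, keep_ratio=0.5)
inputs = [[1, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 0], [0, 0, 0, 0]]
output_spikes = lif_neuron(inputs, ticket)
print(ticket)         # [1.0, 0.0, 0.0, -1.0]
print(output_spikes)  # [1, 0, 0, 0]
```

    Because both the inputs and the kept weights are restricted to a few discrete values, each synaptic update reduces to an addition or subtraction, which is the kind of affinity between spiking and binarization that the abstract points to.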