
    Learning to Guide Decoding for Image Captioning

    Recently, much progress has been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task. In this paper, we propose an extension of the encoder-decoder framework that adds a component called a guiding network. The guiding network models the attribute properties of input images, and its output is used to compose the input of the decoder at each time step. The guiding network can be plugged into the current encoder-decoder framework and trained in an end-to-end manner, so the guiding vector is adaptively learned from the decoder's training signal, allowing it to embed information from both image and language. Additionally, discriminative supervision can be employed to further improve the quality of the guidance. The advantages of the proposed approach are verified by experiments on the MS COCO dataset. (Comment: AAAI-1)
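    To make the guiding-network idea concrete, here is a minimal PyTorch sketch of a decoder whose per-step input is the word embedding concatenated with a learned guiding vector; the module names, sizes, and the simple MLP guide are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GuidedDecoder(nn.Module):
    """Minimal sketch of an encoder-decoder captioner with a guiding network.
    All dimensions and module choices are placeholders."""
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=512,
                 hidden_dim=512, guide_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Guiding network: maps image features to a guiding vector that is
        # trained end-to-end, so it adapts to the decoder's training signal.
        self.guide = nn.Sequential(nn.Linear(feat_dim, guide_dim), nn.ReLU())
        # Decoder input at each step = [word embedding ; guiding vector]
        self.rnn = nn.LSTM(embed_dim + guide_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feats, captions):
        g = self.guide(img_feats)                       # (B, guide_dim)
        w = self.embed(captions)                        # (B, T, embed_dim)
        g_t = g.unsqueeze(1).expand(-1, w.size(1), -1)  # repeat per time step
        h, _ = self.rnn(torch.cat([w, g_t], dim=-1))
        return self.out(h)                              # (B, T, vocab_size)
```

    The discriminative supervision mentioned in the abstract could be added as an auxiliary loss on the guiding vector (e.g., attribute prediction), alongside the usual captioning cross-entropy.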

    Adaptive Local Steps Federated Learning with Differential Privacy Driven by Convergence Analysis

    Federated Learning (FL) is a distributed machine learning technique that allows model training across multiple devices or organizations without sharing raw data. However, while FL keeps the raw data inaccessible to external adversaries, adversaries can still infer statistical information about the data through differential attacks. Differential Privacy (DP) counters this by adding noise to the model or gradients so that adversaries cannot infer private information from the transmitted parameters. We reconsider the framework of differentially private federated learning in resource-constrained scenarios (limited privacy budget and communication resources). We analyze the convergence of federated learning with differential privacy (DPFL) in such scenarios and propose an Adaptive Local Steps Differential Privacy Federated Learning (ALS-DPFL) algorithm. We evaluate the algorithm on the FashionMNIST and CIFAR-10 datasets and achieve good performance relative to previous work.
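    As a rough illustration of the moving parts, the sketch below shows a DP-SGD-style local update (per-step gradient clipping plus Gaussian noise) and a stand-in rule for adapting the number of local steps. The adaptation heuristic, noise scale, and thresholds are hypothetical; the paper derives its schedule from a convergence analysis.

```python
import numpy as np

def dp_local_update(w, grad_fn, steps, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One client's local training with clipped, noised gradients.
    clip, sigma, and lr are illustrative placeholders."""
    rng = rng or np.random.default_rng()
    for _ in range(steps):
        g = grad_fn(w)
        g = g / max(1.0, np.linalg.norm(g) / clip)       # clip to norm <= clip
        g = g + rng.normal(0.0, sigma * clip, g.shape)   # Gaussian mechanism
        w = w - lr * g
    return w

def adaptive_steps(prev_loss, curr_loss, steps, min_s=1, max_s=20):
    """Hypothetical adaptation rule: take more local steps while the loss is
    still dropping quickly, fewer once progress stalls. Only a stand-in for
    the convergence-driven schedule in ALS-DPFL."""
    if prev_loss - curr_loss > 0.01:
        return min(steps + 1, max_s)
    return max(steps - 1, min_s)
```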

    A bearing fault detection method with low-dimensional compressed measurements of vibration signal

    Traditional bearing fault detection is usually performed by sampling the bearing vibration data under the Shannon sampling theorem; the bearing-state information is then extracted from the vibration data and used for fault detection. Long-term, continuous monitoring requires sampling and storing large amounts of raw vibration signals, which places a heavy burden on data storage and transmission. To address this problem, a new bearing fault detection method based on compressed sensing is presented, which only needs to sample and store a small amount of compressed observation data and uses these data directly for fault detection. First, an over-complete dictionary is trained on which vibration signals corresponding to the normal state can be decomposed sparsely. Then, bearing faults are detected from the difference in sparse representation errors, on this dictionary, between compressed signals in the normal state and the fault state. The detection results of the proposed method under different parameters are analyzed, and the effectiveness of the method is validated by experimental tests.
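    A minimal sketch of the detection step, assuming a dictionary D trained on normal-state signals, a random sensing matrix Phi, and scikit-learn's OMP solver; the shapes, solver choice, and threshold calibration are illustrative, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def fault_score(y, Phi, D, n_nonzero=10):
    """Sparse-representation error of a compressed measurement y = Phi @ x
    on a dictionary D learned from normal-state vibration signals.
    A large error suggests the signal does not fit the normal-state model."""
    A = Phi @ D  # effective sensing dictionary (m measurements x k atoms)
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero,
                                    fit_intercept=False)
    omp.fit(A, y)
    residual = np.linalg.norm(y - A @ omp.coef_)  # reconstruction error
    return residual / np.linalg.norm(y)

# Usage sketch: flag a fault when the normalized error exceeds a threshold
# tau calibrated on normal-state data, e.g. fault_score(y, Phi, D) > tau.
```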

    AU-Supervised Convolutional Vision Transformers for Synthetic Facial Expression Recognition

    This paper describes our proposed methodology for the six basic expression classification track of the Affective Behavior Analysis in-the-wild (ABAW) Competition 2022. In the Learning from Synthetic Data (LSD) task, facial expression recognition (FER) methods aim to learn expression representations from artificially generated data and generalise to real data. Because of the ambiguity of the synthetic data and the objectivity of facial Action Units (AUs), we resort to AU information to boost performance, making the following contributions. First, to adapt the model to synthetic scenarios, we use knowledge from pre-trained large-scale face recognition data. Second, we propose a conceptually new framework, termed AU-Supervised Convolutional Vision Transformers (AU-CVT), which clearly improves FER performance by jointly training on auxiliary datasets with AU or pseudo-AU labels. Our AU-CVT achieved an F1 score of 0.6863 and accuracy of 0.7433 on the validation set. The source code of our work is publicly available online: https://github.com/msy1412/ABAW
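    For intuition, the following PyTorch sketch shows the general shape of joint AU and expression supervision on a shared backbone; the head sizes, loss weighting, and backbone are placeholder assumptions rather than the authors' AU-CVT implementation.

```python
import torch
import torch.nn as nn

class AUSupervisedModel(nn.Module):
    """Sketch of joint expression + Action Unit (AU) supervision on a shared
    backbone, in the spirit of AU-CVT. Sizes are placeholders."""
    def __init__(self, backbone, feat_dim, n_expr=6, n_au=12):
        super().__init__()
        self.backbone = backbone            # e.g. a CNN + ViT hybrid encoder
        self.expr_head = nn.Linear(feat_dim, n_expr)  # expression classes
        self.au_head = nn.Linear(feat_dim, n_au)      # multi-label AUs

    def forward(self, x):
        f = self.backbone(x)
        return self.expr_head(f), self.au_head(f)

def joint_loss(expr_logits, expr_y, au_logits, au_y, lam=0.5):
    """Cross-entropy on expression labels plus multi-label BCE on
    (pseudo) AU labels; lam is an illustrative weighting."""
    ce = nn.functional.cross_entropy(expr_logits, expr_y)
    bce = nn.functional.binary_cross_entropy_with_logits(au_logits, au_y)
    return ce + lam * bce
```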