121 research outputs found

    Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos

    The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos. Existing studies have adopted strategies of sliding a window over the entire video or exhaustively ranking all possible clip-sentence pairs in a pre-segmented video, which inevitably suffer from exhaustively enumerated candidates. To alleviate this problem, we formulate this task as a problem of sequential decision making by learning an agent which regulates the temporal grounding boundaries progressively based on its policy. Specifically, we propose a reinforcement-learning-based framework improved by multi-task learning, and it shows steady performance gains by considering additional supervised boundary information during training. Our proposed framework achieves state-of-the-art performance on the ActivityNet'18 DenseCaption dataset and the Charades-STA dataset while observing only 10 or fewer clips per video. Comment: AAAI 201

    Coordinating a Supply Chain When Manufacturer Makes Cost Reduction Investment in Supplier

    We consider a supply chain consisting of an upstream supplier and a downstream manufacturer, in which the supplier provides a component to the manufacturer, facing a price-sensitive and uncertain demand. The manufacturer makes a cost-reduction investment in the supplier to improve the supplier’s production efficiency, which benefits the entire supply chain. We derive the optimal investment and operating decisions. Both the centralized and decentralized supply chains are studied. We show that the optimal investment and operating decisions in the decentralized setting may deviate from those in the centralized setting. To avoid the profit loss caused by such a deviation, we develop a coordination mechanism that combines a revenue-sharing policy with an investment cost-sharing policy. We also show that the developed coordination mechanism can achieve Pareto improvement for the two players.

    The Research of Three Regions Acquisition and Analysis System of Pulse Based on Flexible Sensor

    The objectification of pulse diagnosis is very important to the development and inheritance of traditional Chinese medicine (TCM). The first step is to collect richer and more comprehensive pulse information quickly and to lower the barrier for users of pulse diagnosis equipment. Existing pulse diagnosis equipment has some limitations, such as a single acquisition site, complex compression forms, and heavy dependence on professionals for selecting the correct pulse position. Therefore, a three-region pulse diagnosis system based on a flexible sensor is designed. It uses a new type of flexible sensor as the data acquisition port, combined with upper computer software and lower computer software, to achieve intelligent decompression and data acquisition from the Cun, Guan, and Chi positions. The equipment not only greatly reduces the difficulty users face in identifying the correct pulse position, but also collects non-destructive pulse information, which provides a new acquisition mode for pulse diagnosis instruments.

    RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

    We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only, which can be reused for downstream tasks such as action recognition. This task, however, is extremely challenging due to 1) the highly complex spatial-temporal information in videos; and 2) the lack of labeled data for training. Unlike representation learning for static images, it is difficult to construct a suitable self-supervised task that models both motion and appearance features well. More recently, several attempts have been made to learn video representation through video playback speed prediction. However, it is non-trivial to obtain precise speed labels for the videos. More critically, the learnt models may tend to focus on motion patterns and thus may not learn appearance features well. In this paper, we observe that the relative playback speed is more consistent with motion patterns, and thus provides more effective and stable supervision for representation learning. Therefore, we propose a new way to perceive the playback speed and exploit the relative speed between two video clips as labels. In this way, we are able to perceive speed well and learn better motion features. Moreover, to ensure the learning of appearance features, we further propose an appearance-focused task, where we enforce the model to perceive the appearance difference between two video clips. We show that optimizing the two tasks jointly consistently improves the performance on two downstream tasks, namely action recognition and video retrieval. Remarkably, for action recognition on the UCF101 dataset, we achieve 93.7% accuracy without the use of labeled data for pre-training, which outperforms the ImageNet supervised pre-trained model. Code and pre-trained models can be found at https://github.com/PeihaoChen/RSPNet. Comment: Accepted by AAAI-2021.

    Oriented Three-Dimensional Magnetic Biskyrmion in MnNiGa Bulk Crystals

    A biskyrmion consists of two bound, topologically stable skyrmion spin textures. These coffee-bean-shaped objects have been observed in real space in thin plates using Lorentz transmission electron microscopy (LTEM). From LTEM imaging alone, it is not clear whether biskyrmions are surface-confined objects, or, analogously to skyrmions in non-centrosymmetric helimagnets, three-dimensional tube-like structures in bulk samples. Here, we investigate the biskyrmion form factor in single- and polycrystalline MnNiGa samples using small angle neutron scattering (SANS). We find that biskyrmions are not long-range ordered, not even in single crystals. Surprisingly, all of the disordered biskyrmions have their in-plane symmetry axis aligned along certain directions, governed by the magnetocrystalline anisotropy. This anisotropic nature of biskyrmions may be further exploited to encode information.