268 research outputs found

    Two-Stream Action Recognition-Oriented Video Super-Resolution

    We study the video super-resolution (SR) problem for facilitating video analytics tasks, e.g. action recognition, rather than for visual quality. Popular action recognition methods based on convolutional networks, exemplified by two-stream networks, are not directly applicable to video of low spatial resolution. This can be remedied by performing video SR prior to recognition, which motivates us to improve the SR procedure for recognition accuracy. Tailored for two-stream action recognition networks, we propose two video SR methods, for the spatial and temporal streams respectively. On the one hand, we observe that regions with action are more important to recognition, and we propose an optical-flow guided weighted mean-squared-error loss for our spatial-oriented SR (SoSR) network to emphasize the reconstruction of moving objects. On the other hand, we observe that existing video SR methods incur temporal discontinuity between frames, which also worsens recognition accuracy, and we propose a siamese network for our temporal-oriented SR (ToSR) training that emphasizes the temporal continuity between consecutive frames. We perform experiments using two state-of-the-art action recognition networks and two well-known datasets, UCF101 and HMDB51. Results demonstrate the effectiveness of our proposed SoSR and ToSR in improving recognition accuracy. Comment: Accepted to ICCV 2019. Code: https://github.com/AlanZhang1995/TwoStreamS
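To illustrate the idea of an optical-flow guided weighted mean-squared-error loss, here is a minimal pure-Python sketch. It is not the authors' implementation: the function name `flow_weighted_mse`, the list-based image layout, and the weighting rule (1 plus the flow magnitude normalised by its maximum) are all assumptions chosen only to show how errors on moving pixels can be made to count more than errors on static ones.

```python
import math


def flow_weighted_mse(sr, hr, flow, eps=1e-8):
    """Weighted MSE whose per-pixel weights grow with optical-flow magnitude.

    sr, hr: H x W nested lists of pixel intensities (super-resolved / ground truth).
    flow:   H x W nested list of (dx, dy) motion vectors.
    The weighting rule below is a hypothetical choice, not the paper's.
    """
    # Per-pixel motion magnitude.
    mags = [[math.hypot(dx, dy) for dx, dy in row] for row in flow]
    max_mag = max(max(row) for row in mags)

    total = 0.0
    weight_sum = 0.0
    for y, row in enumerate(mags):
        for x, mag in enumerate(row):
            # Static pixels get weight 1, the fastest pixel gets roughly 2.
            w = 1.0 + mag / (max_mag + eps)
            total += w * (sr[y][x] - hr[y][x]) ** 2
            weight_sum += w
    # Normalise by the total weight so the loss stays on the MSE scale.
    return total / weight_sum


if __name__ == "__main__":
    flow = [[(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]]
    sr = [[0.0, 0.0], [0.0, 0.0]]
    hr_moving_error = [[1.0, 0.0], [0.0, 0.0]]  # error on the moving pixel
    hr_static_error = [[0.0, 1.0], [0.0, 0.0]]  # same error on a static pixel
    print(flow_weighted_mse(sr, hr_moving_error, flow))
    print(flow_weighted_mse(sr, hr_static_error, flow))
```

With zero flow everywhere the weights are all 1 and the function reduces to the ordinary MSE, which is a convenient sanity check for any weighting scheme of this kind.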

    GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving

    Autonomous vehicles operating in complex real-world environments require accurate predictions of interactive behaviors between traffic participants. While existing works focus on modeling agent interactions based on their past trajectories, their future interactions are often ignored. This paper addresses the interaction prediction problem by formulating it with hierarchical game theory and proposing the GameFormer framework to implement it. Specifically, we present a novel Transformer decoder structure that uses the prediction results from the previous level together with the common environment background to iteratively refine the interaction process. Moreover, we propose a learning process that regulates an agent's behavior at the current level to respond to other agents' behaviors from the previous level. Through experiments on a large-scale real-world driving dataset, we demonstrate that our model can achieve state-of-the-art prediction accuracy on the interaction prediction task. We also validate the model's capability to jointly reason about the ego agent's motion plans and other agents' behaviors in both open-loop and closed-loop planning tests, outperforming a variety of baseline methods.

    Evaluating Alzheimer's Disease Progression by Modeling Crosstalk Network Disruption

    Aβ, tau and P-tau have been widely accepted as reliable markers for Alzheimer’s disease (AD). The crosstalk between these markers forms a complex network. AD may induce the integral variation and disruption of the network. The aim of this study was to develop a novel mathematical model based on a simplified crosstalk network to evaluate the disease progression of AD. The integral variation of the network is measured by three integral disruption parameters. The robustness of the network is evaluated by the network disruption probability. The presented results show that the network disruption probability has a good linear relationship with the Mini Mental State Examination (MMSE) score. The proposed model combined with a support vector machine (SVM) achieves relatively high 10-fold cross-validated performance in the classification of AD vs normal and mild cognitive impairment (MCI) vs normal (95% accuracy, 95% sensitivity, 95% specificity for AD vs normal; 90% accuracy, 94% sensitivity, 83% specificity for MCI vs normal). This research evaluates the progression of AD and facilitates early diagnosis of AD.
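For readers less familiar with the three figures quoted above, the sketch below shows how accuracy, sensitivity and specificity are computed from a binary confusion matrix. The function name `classification_metrics` and the counts in the usage example are hypothetical and are not taken from the study; they only illustrate the standard definitions.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp: patients correctly classified as diseased
    fn: patients wrongly classified as normal
    tn: normal subjects correctly classified as normal
    fp: normal subjects wrongly classified as diseased
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    acc, sens, spec = classification_metrics(tp=19, fn=1, tn=19, fp=1)
    print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In a 10-fold cross-validation setting, these counts would typically be accumulated over the held-out folds before the ratios are taken, although the abstract does not state how the reported figures were aggregated.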

    An image is worth 1000 lies: adversarial transferability across prompts on vision-language models

    Unlike traditional task-specific vision models, recent large vision-language models (VLMs) can readily adapt to different vision tasks by simply using different textual instructions, i.e., prompts. However, a well-known concern about traditional task-specific vision models is that they can be misled by imperceptible adversarial perturbations. Furthermore, the concern is exacerbated by the phenomenon that the same adversarial perturbations can fool different task-specific models. Given that VLMs rely on prompts to adapt to different tasks, an intriguing question emerges: Can a single adversarial image mislead all predictions of VLMs when a thousand different prompts are given? This question essentially introduces a novel perspective on adversarial transferability: cross-prompt adversarial transferability. In this work, we propose the Cross-Prompt Attack (CroPA). This proposed method updates the visual adversarial perturbation with learnable textual prompts, which are designed to counteract the misleading effects of the adversarial image. By doing this, CroPA significantly improves the transferability of adversarial examples across prompts. Extensive experiments are conducted to verify the strong cross-prompt adversarial transferability of CroPA with prevalent VLMs including Flamingo, BLIP-2, and InstructBLIP in various tasks.

    Inspection of delamination defect in first wall panel of Tokamak device by using laser infrared thermography technique

    First wall panels (FWPs), which adjoin the inner wall of the blanket modules in the vacuum vessel (VV) of a Tokamak device, are multilayer structures bonded together with a solid welding technique in order to perform their heat exchange, VV protection, and neutron breeding functions. The quality of the welded joint between layers is the key factor for FWP integrity. To conduct online inspection of delamination defects in the FWPs, a nondestructive testing (NDT) method capable of detecting delamination defects without entering the VV is required. In this paper, the feasibility of the laser infrared thermography (LIRT) NDT method was investigated experimentally for this purpose. To clarify its detectability under a practical VV environment, inspections in several inspection modes, i.e., different distances and angles of the FWPs relative to the LIRT transducers, were conducted based on the practical structure of the FWP and VV of the EAST Tokamak device. In practice, an LIRT testing system was established and several double-layered plate specimens with different artificial delamination defects were inspected under the selected testing conditions. Through thermography signal reconstruction, an image processing algorithm was proposed and adopted to enhance the defect detectability. From the results of the different inspection modes, it was found that the angle factor may worsen the inspection precision and reduce the detectability of delamination defects with a large depth-to-width ratio, although the LIRT method is still applicable for inspection of relatively large defects in FWPs. Finally, the detectability in different inspection modes was clarified, which proved the feasibility of LIRT for FWP online inspection.