Two-Stream Action Recognition-Oriented Video Super-Resolution
We study the video super-resolution (SR) problem for facilitating video
analytics tasks, e.g. action recognition, instead of for visual quality. The
popular action recognition methods based on convolutional networks, exemplified
by two-stream networks, are not directly applicable to videos of low spatial
resolution. This can be remedied by performing video SR prior to recognition,
which motivates us to improve the SR procedure for recognition accuracy.
Tailored for two-stream action recognition networks, we propose two video SR
methods for the spatial and temporal streams respectively. On the one hand, we
observe that regions with action are more important to recognition, and we
propose an optical-flow guided weighted mean-squared-error loss for our
spatial-oriented SR (SoSR) network to emphasize the reconstruction of moving
objects. On the other hand, we observe that existing video SR methods incur
temporal discontinuity between frames, which also worsens the recognition
accuracy, and we propose a siamese network for our temporal-oriented SR (ToSR)
training that emphasizes the temporal continuity between consecutive frames. We
perform experiments using two state-of-the-art action recognition networks and
two well-known datasets--UCF101 and HMDB51. Results demonstrate the
effectiveness of our proposed SoSR and ToSR in improving recognition accuracy.
Comment: Accepted to ICCV 2019. Code:
https://github.com/AlanZhang1995/TwoStreamS
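The optical-flow guided weighted mean-squared-error loss described in this abstract can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the function name `flow_weighted_mse` and the weight `1 + |flow| / max|flow|` are assumptions made here for illustration only, not the authors' definition.

```python
import numpy as np


def flow_weighted_mse(sr, hr, flow, eps=1e-8):
    """Weighted MSE that penalizes errors more where motion is large.

    sr, hr : arrays of shape (H, W) or (H, W, C), super-resolved and ground truth.
    flow   : array of shape (H, W, 2), optical flow for the same frame.
    The weighting scheme is a hypothetical choice, not taken from the paper.
    """
    magnitude = np.linalg.norm(flow, axis=-1)            # per-pixel motion magnitude
    weights = 1.0 + magnitude / (magnitude.max() + eps)  # values in [1, 2]
    sq_err = (sr - hr) ** 2
    if sq_err.ndim == 3:                                 # average over color channels
        sq_err = sq_err.mean(axis=-1)
    return float((weights * sq_err).sum() / weights.sum())


# Toy check: the same reconstruction error costs more on a moving pixel.
hr = np.zeros((4, 4))
sr = hr.copy()
sr[0, 0] = 1.0                                           # single erroneous pixel
static_flow = np.zeros((4, 4, 2))
moving_flow = static_flow.copy()
moving_flow[0, 0] = [1.0, 0.0]                           # motion only at that pixel
loss_static = flow_weighted_mse(sr, hr, static_flow)
loss_moving = flow_weighted_mse(sr, hr, moving_flow)
```

With zero flow the loss reduces to the ordinary MSE; when the erroneous pixel coincides with motion, the loss is larger, which is the emphasis on moving objects that the abstract describes.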
GameFormer: Game-theoretic Modeling and Learning of Transformer-based Interactive Prediction and Planning for Autonomous Driving
Autonomous vehicles operating in complex real-world environments require
accurate predictions of interactive behaviors between traffic participants.
While existing works focus on modeling agent interactions based on their past
trajectories, their future interactions are often ignored. This paper addresses
the interaction prediction problem by formulating it with hierarchical game
theory and proposing the GameFormer framework to implement it. Specifically, we
present a novel Transformer decoder structure that uses the prediction results
from the previous level together with the common environment background to
iteratively refine the interaction process. Moreover, we propose a learning
process that regulates an agent's behavior at the current level to respond to
other agents' behaviors from the last level. Through experiments on a
large-scale real-world driving dataset, we demonstrate that our model can
achieve state-of-the-art prediction accuracy on the interaction prediction
task. We also validate the model's capability to jointly reason about the ego
agent's motion plans and other agents' behaviors in both open-loop and
closed-loop planning tests, outperforming a variety of baseline methods.
Evaluating Alzheimer's Disease Progression by Modeling Crosstalk Network Disruption
Aβ, tau and P-tau have been widely accepted as reliable markers for Alzheimer’s disease (AD). The crosstalk between these markers forms a complex network. AD may induce the integral variation and disruption of the network. The aim of this study was to develop a novel mathematical model based on a simplified crosstalk network to evaluate the disease progression of AD. The integral variation of the network is measured by three integral disruption parameters. The robustness of the network is evaluated by the network disruption probability. The presented results show that the network disruption probability has a good linear relationship with the Mini Mental State Examination (MMSE) score. The proposed model combined with a support vector machine (SVM) achieves relatively high 10-fold cross-validated performance in classifying AD vs normal and mild cognitive impairment (MCI) vs normal (95% accuracy, 95% sensitivity, 95% specificity for AD vs normal; 90% accuracy, 94% sensitivity, 83% specificity for MCI vs normal). This research evaluates the progression of AD and facilitates early AD diagnosis.
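For readers unfamiliar with the three figures reported above, the following sketch shows how accuracy, sensitivity and specificity are computed from confusion-matrix counts. The function name `classification_metrics` and the counts in the example are hypothetical and chosen only to illustrate the arithmetic; they are not data from the study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from confusion counts.

    tp / fn : diseased subjects classified correctly / incorrectly.
    tn / fp : normal subjects classified correctly / incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # fraction of all subjects classified correctly
        "sensitivity": tp / (tp + fn),      # fraction of diseased subjects detected
        "specificity": tn / (tn + fp),      # fraction of normal subjects cleared
    }


# Made-up counts (not study data): 20 diseased and 20 normal subjects,
# with one misclassification in each group.
example = classification_metrics(tp=19, fn=1, tn=19, fp=1)
```

In a 10-fold cross-validation, such counts would typically be accumulated over the ten held-out folds before the ratios are taken.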
An image is worth 1000 lies: adversarial transferability across prompts on vision-language models
Unlike traditional task-specific vision models, recent large vision-language models (VLMs) can readily adapt to different vision tasks by simply using different textual instructions, i.e., prompts. However, a well-known concern about traditional task-specific vision models is that they can be misled by imperceptible adversarial perturbations. The concern is exacerbated by the fact that the same adversarial perturbations can fool different task-specific models. Given that VLMs rely on prompts to adapt to different tasks, an intriguing question emerges: Can a single adversarial image mislead all predictions of VLMs when a thousand different prompts are given? This question introduces a novel perspective on adversarial transferability: cross-prompt adversarial transferability. In this work, we propose the Cross-Prompt Attack (CroPA). The method updates the visual adversarial perturbation together with learnable textual prompts, which are designed to counteract the misleading effects of the adversarial image. By doing this, CroPA significantly improves the transferability of adversarial examples across prompts. Extensive experiments are conducted to verify the strong cross-prompt adversarial transferability of CroPA with prevalent VLMs, including Flamingo, BLIP-2, and InstructBLIP, on a variety of tasks.
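The competing updates described in this abstract, an image perturbation that pushes the model toward a target output and a prompt perturbation that counteracts it, can be sketched as an alternating min-max loop. Everything below is an assumption made for illustration: the function names, the step sizes, the L-infinity budget, and the linear toy model standing in for a VLM. It is not the authors' implementation.

```python
import numpy as np


def cross_prompt_attack_sketch(grad_fn, image, prompt, eps=8 / 255,
                               alpha_v=1 / 255, alpha_t=0.01, steps=50):
    """Alternating min-max update (illustrative only).

    grad_fn(image, prompt) must return the gradients of a targeted loss
    with respect to the image and to the prompt embedding.
    """
    delta_v = np.zeros_like(image)    # visual adversarial perturbation
    delta_t = np.zeros_like(prompt)   # learnable prompt perturbation
    for _ in range(steps):
        g_v, g_t = grad_fn(image + delta_v, prompt + delta_t)
        # Image side: descend the targeted loss, clipped to the L-infinity budget.
        delta_v = np.clip(delta_v - alpha_v * np.sign(g_v), -eps, eps)
        # Prompt side: ascend the same loss, working against the image update.
        delta_t = delta_t + alpha_t * np.sign(g_t)
    return delta_v, delta_t


# Hypothetical stand-in for a VLM: a linear score with a squared-error loss.
rng = np.random.default_rng(0)
w_img, w_txt = rng.normal(size=16), rng.normal(size=4)
target = 1.0


def toy_gradients(image, prompt):
    residual = float(w_img @ image + w_txt @ prompt - target)
    # Gradients of 0.5 * residual**2 with respect to image and prompt.
    return residual * w_img, residual * w_txt


image = rng.uniform(size=16)
prompt = rng.normal(size=4)
eps = 8 / 255
delta_v, delta_t = cross_prompt_attack_sketch(toy_gradients, image, prompt, eps=eps)
```

The intuition behind letting the prompt fight back during optimization is that the image perturbation must then succeed against harder prompts, which is how the abstract motivates improved transfer to unseen prompts.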
Inspection of delamination defect in first wall panel of Tokamak device by using laser infrared thermography technique
First wall panels (FWPs), which adjoin the inner wall of the blanket modules in the vacuum vessel (VV) of a Tokamak device, are multilayer structures bonded together with a solid welding technique in order to perform their heat exchange, VV protection, and neutron breeding functions. The quality of the welded joint between layers is the key factor for FWP integrity. To inspect FWPs for delamination defects online, a nondestructive testing (NDT) method capable of detecting delamination defects without entering the VV is required. In this paper, the feasibility of laser infrared thermography (LIRT) as an NDT method was investigated experimentally for this purpose. To clarify its detectability in a practical VV environment, several inspection modes were tested based on the practical structure of the FWP and VV of the EAST Tokamak device, i.e., modes with different distances and angles between the FWPs and the LIRT transducers. In practice, an LIRT testing system was established and several double-layered plate specimens with different artificial delamination defects were inspected under the selected testing conditions. Based on thermography signal reconstruction, an image processing algorithm was proposed and adopted to enhance defect detectability. The results for the different inspection modes show that the angle factor may worsen inspection precision and reduce the detectability of delamination defects with a large depth-to-width ratio, although the LIRT method remains applicable for inspecting relatively large defects in FWPs. Finally, the detectability in the different inspection modes was clarified, which demonstrated the feasibility of LIRT for online FWP inspection.