6,015 research outputs found

    SCANet: A Self- and Cross-Attention Network for Audio-Visual Speech Separation

    The integration of different modalities, such as audio and visual information, plays a crucial role in human perception of the surrounding environment. Recent research has made significant progress in designing fusion modules for audio-visual speech separation. However, existing methods predominantly place multi-modal fusion at either the top or the bottom of the network, rather than considering fusion comprehensively at several hierarchical positions. In this paper, we propose a novel model, the self- and cross-attention network (SCANet), which leverages the attention mechanism for efficient audio-visual feature fusion. SCANet consists of two types of attention blocks: self-attention (SA) and cross-attention (CA) blocks, with the CA blocks distributed at the top (TCA), middle (MCA), and bottom (BCA) of the network. These blocks maintain the ability to learn modality-specific features and enable the extraction of different semantics from audio-visual features. Comprehensive experiments on three standard audio-visual separation benchmarks (LRS2, LRS3, and VoxCeleb2) demonstrate the effectiveness of SCANet, which outperforms existing state-of-the-art (SOTA) methods while maintaining comparable inference time. Comment: 14 pages, 3 figures

    Variability in the impacts of partisan conflict: a new perspective from bank credit

    This article analyses the impact of partisan conflict on bank credit, taking the global financial crisis as a dividing point to examine how that impact differs before and after the crisis. Using US data covering the past 40 years, it captures the variability in the effects of partisan conflict through rolling-sample and time-varying parameter VAR analysis. The full-sample results reveal that a one-standard-deviation partisan conflict shock reduces the growth rate of bank credit to nonfinancial sectors, and that the negative effects of partisan conflict on bank credit are more substantial after the global financial crisis. The rolling-sample and time-varying parameter VAR analyses further confirm that the impacts of partisan conflict shocks have varied substantially over time, with bank credit still reacting negatively to partisan conflict in recent periods. Additionally, we estimate two extended models, which support the intermediate role of economic policy uncertainty in transmitting partisan conflict and the substitution effect of cross-border bank lending on domestic bank credit. Finally, our main results remain unchanged across a series of robustness checks. We conclude that partisan conflict has a significant impact on bank credit, and that this impact varies markedly over time and is more pronounced after the global financial crisis.

    Learning Motor Skills of Reactive Reaching and Grasping of Objects

    Reactive grasping of objects is an essential capability for autonomous robot manipulation, yet it remains challenging to learn sensorimotor control that coordinates coherent hand-finger motions and is robust against disturbances and failures. This work proposed a deep reinforcement learning based scheme to train feedback control policies that coordinate reaching and grasping actions in the presence of uncertainties. We formulated geometric metrics and task-oriented quantities to design the reward, which enabled efficient exploration of grasping policies. Further, to improve the success rate, we deployed key initial states of difficult hand-finger poses to train policies to overcome potential failures due to challenging configurations. Extensive simulation validations and benchmarks demonstrated that the learned policy robustly grasped both static and moving objects. Moreover, the policy recovered from failures within a short time in difficult configurations and was robust to synthetic noise in the state feedback that was unseen during training.