
    Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization

    The objective of the sound source localization task is to enable machines to detect the location of sound-making objects within a visual scene. While the audio modality provides spatial cues to locate the sound source, existing approaches use audio only in an auxiliary role, to compare spatial regions of the visual modality. Humans, on the other hand, utilize both audio and visual modalities as spatial cues to locate sound sources. In this paper, we propose an audio-visual spatial integration network that integrates spatial cues from both modalities to mimic human behavior when detecting sound-making objects. Additionally, we introduce a recursive attention network to mimic the human behavior of iteratively focusing on objects, resulting in more accurate attention regions. To effectively encode spatial information from both modalities, we propose an audio-visual pair matching loss and a spatial region alignment loss. By utilizing the spatial cues of the audio-visual modalities and recursively focusing on objects, our method can perform more robust sound source localization. Comprehensive experimental results on the Flickr SoundNet and VGG-Sound Source datasets demonstrate the superiority of our proposed method over existing approaches. Our code is available at: https://github.com/VisualAIKHU/SIRA-SSL Comment: Camera-Ready, ACM MM 202

    Class-Attentive Diffusion Network for Semi-Supervised Classification

    Recently, graph neural networks for semi-supervised classification have been widely studied. However, existing methods only use the information of limited neighbors and do not deal with the inter-class connections in graphs. In this paper, we propose Adaptive aggregation with Class-Attentive Diffusion (AdaCAD), a new aggregation scheme that adaptively aggregates nodes likely to belong to the same class among K-hop neighbors. To this end, we first propose a novel stochastic process, called Class-Attentive Diffusion (CAD), that strengthens attention to intra-class nodes and attenuates attention to inter-class nodes. In contrast to existing diffusion methods, whose transition matrix is determined solely by the graph structure, CAD considers both the node features and the graph structure through a class-attentive transition matrix that utilizes a classifier. Then, we further propose an adaptive update scheme that leverages different reflection ratios of the diffusion result for each node depending on the local class-context. As its main advantage, AdaCAD alleviates the problem of undesired mixing of inter-class features caused by discrepancies between node labels and the graph topology. Built on AdaCAD, we construct a simple model called Class-Attentive Diffusion Network (CAD-Net). Extensive experiments on seven benchmark datasets consistently demonstrate the efficacy of the proposed method, and our CAD-Net significantly outperforms the state-of-the-art methods. Code is available at https://github.com/ljin0429/CAD-Net. Comment: Accepted to AAAI 202

    Confidence-Based Feature Imputation for Graphs with Partially Known Features

    This paper investigates a missing feature imputation problem for graph learning tasks. Several methods have previously addressed learning tasks on graphs with missing features. However, in cases of high rates of missing features, they were unable to avoid significant performance degradation. To overcome this limitation, we introduce a novel concept of channel-wise confidence in a node feature, which is assigned to each imputed channel feature of a node to reflect the certainty of the imputation. We then design pseudo-confidence using the channel-wise shortest path distance between a missing-feature node and its nearest known-feature node to replace unavailable true confidence in an actual learning process. Based on the pseudo-confidence, we propose a novel feature imputation scheme that performs channel-wise inter-node diffusion and node-wise inter-channel propagation. The scheme can endure even an exceedingly high missing rate (e.g., 99.5%), and it achieves state-of-the-art accuracy for both semi-supervised node classification and link prediction on various datasets containing a high rate of missing features. Codes are available at https://github.com/daehoum1/pcfi. Comment: Accepted to ICLR 2023. 28 page

    Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge

    The goal of the multi-sound source localization task is to localize sound sources from the mixture individually. While recent multi-sound source localization methods have shown improved performance, they face challenges due to their reliance on prior information about the number of objects to be separated. In this paper, to overcome this limitation, we present a novel multi-sound source localization method that can perform localization without prior knowledge of the number of sound sources. To achieve this goal, we propose an iterative object identification (IOI) module, which can recognize sound-making objects in an iterative manner. After finding the regions of sound-making objects, we devise an object similarity-aware clustering (OSC) loss to guide the IOI module to effectively combine regions of the same object while distinguishing between different objects and backgrounds. This enables our method to perform accurate localization of sound-making objects without any prior knowledge. Extensive experimental results on the MUSIC and VGGSound benchmarks show significant performance improvements of the proposed method over existing methods for both single-source and multi-source localization. Our code is available at: https://github.com/VisualAIKHU/NoPrior_MultiSSL Comment: Accepted at CVPR 202

    Genetic Approach to Elucidation of Sasang Constitutional Medicine

    Sasang Constitutional Medicine (SCM) offers a medical principle that classifies humans into four constitution groups and guides their treatment with constitution-matched medical assistance. The principle of this traditional medicine, although it requires significant scientific support, appears to suggest a genetic influence on constitution type. The relative frequency of constitution types in a population, for instance, has remained relatively constant since Jema Lee first described them from his observations. In addition, the body compartment concept of SCM appears to be related to the anterior–posterior patterning of the embryonic gut and associated internal organs. This study describes the attributes of the constitution concept of SCM that can be interpreted in the language of genetics and current approaches to identify the genetic factors that make up the constitution. These efforts should make it possible to interpret the principle of this traditional medicine scientifically. Considering the recent trend in medicine that pursues individualized or tailored medical offerings, once SCM is proven to be explainable with scientific evidence, it will be able to contribute to and take a place in the rapidly evolving medical environment.

    RoCOCO: Robust Benchmark MS-COCO to Stress-test Robustness of Image-Text Matching Models

    Recently, large-scale vision-language pre-training models and visual semantic embedding methods have significantly improved image-text matching (ITM) accuracy on the MS COCO 5K test set. However, it is unclear how robust these state-of-the-art (SOTA) models are when used in the wild. In this paper, we propose a novel evaluation benchmark to stress-test the robustness of ITM models. To this end, we add various fooling images and captions to a retrieval pool. Specifically, we change images by inserting unrelated images, and change captions by substituting a noun, which can change the meaning of a sentence. We discover that just adding these newly created images and captions to the test set can degrade the performance (i.e., Recall@1) of a wide range of SOTA models (e.g., 81.9% → 64.5% in BLIP, 66.1% → 37.5% in VSE∞). We expect that our findings can provide insights for improving the robustness of vision-language models and devising more diverse stress-test methods in cross-modal retrieval tasks. Source code and dataset will be available at https://github.com/pseulki/rococo

    Northern Barbados accretionary prism: Structure, deformation, and fluid flow interpreted from 3D seismic and well-log data

    We reanalyzed 3D seismic reflection and logging-while-drilling data from the toe of the northern Barbados accretionary prism to interpret structure, deformation, and fluid flow related to subduction processes. The seafloor amplitude and coherence reveal an abrupt change in the thrust orientation from NNE at the thrust front to north and NNW about 5 km west of the thrust front. These thrust sets are separated by a triangular-shaped quiet area, which may represent a zone of low strength. The northeast-trending band of strong negative amplitude and high coherence in the décollement, known to be an interval of arrested consolidation, overlaps the quiet area, suggesting that the arrested consolidation may be related to the lack of thrust imbrication and, thus, of vertical drainage for fluid in the accretionary prism. Fractal analysis of the décollement and the top of the subducting oceanic basement indicates that the relief of the décollement correlates with the topography of the oceanic basement. Differential compaction of the underthrust sediment overlying the rugged oceanic basement, together with the basement faults that penetrate into the décollement, probably caused relief or even faulting in the décollement.