Audio-Visual Spatial Integration and Recursive Attention for Robust Sound Source Localization
The objective of the sound source localization task is to enable machines to
detect the location of sound-making objects within a visual scene. While the
audio modality provides spatial cues to locate the sound source, existing
approaches only use audio as an auxiliary role to compare spatial regions of
the visual modality. Humans, on the other hand, utilize both audio and visual
modalities as spatial cues to locate sound sources. In this paper, we propose
an audio-visual spatial integration network that integrates spatial cues from
both modalities to mimic human behavior when detecting sound-making objects.
Additionally, we introduce a recursive attention network to mimic human
behavior of iterative focusing on objects, resulting in more accurate attention
regions. To effectively encode spatial information from both modalities, we
propose audio-visual pair matching loss and spatial region alignment loss. By
utilizing the spatial cues of audio-visual modalities and recursively focusing
on objects, our method can perform more robust sound source localization.
Comprehensive experimental results on the Flickr SoundNet and VGG-Sound Source
datasets demonstrate the superiority of our proposed method over existing
approaches. Our code is available at: https://github.com/VisualAIKHU/SIRA-SSL
Comment: Camera-Ready, ACM MM 202
Class-Attentive Diffusion Network for Semi-Supervised Classification
Recently, graph neural networks for semi-supervised classification have been
widely studied. However, existing methods only use the information of limited
neighbors and do not deal with the inter-class connections in graphs. In this
paper, we propose Adaptive aggregation with Class-Attentive Diffusion (AdaCAD),
a new aggregation scheme that adaptively aggregates nodes probably of the same
class among K-hop neighbors. To this end, we first propose a novel stochastic
process, called Class-Attentive Diffusion (CAD), that strengthens attention to
intra-class nodes and attenuates attention to inter-class nodes. In contrast to
the existing diffusion methods with a transition matrix determined solely by
the graph structure, CAD considers both the node features and the graph
structure with the design of our class-attentive transition matrix that
utilizes a classifier. Then, we further propose an adaptive update scheme that
leverages different reflection ratios of the diffusion result for each node
depending on the local class-context. As the main advantage, AdaCAD alleviates
the problem of undesired mixing of inter-class features caused by discrepancies
between node labels and the graph topology. Built on AdaCAD, we construct a
simple model called Class-Attentive Diffusion Network (CAD-Net). Extensive
experiments on seven benchmark datasets consistently demonstrate the efficacy
of the proposed method and our CAD-Net significantly outperforms the
state-of-the-art methods. Code is available at
https://github.com/ljin0429/CAD-Net.
Comment: Accepted to AAAI 202
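The idea of a transition matrix that depends on both graph structure and classifier output can be illustrated with a minimal sketch. It is not the AdaCAD formulation: the weighting rule (edge weight multiplied by the dot product of predicted class distributions), the function names, and the toy inputs are all assumptions chosen only to show how intra-class neighbors can receive more transition mass than inter-class ones.

```python
import numpy as np

def class_attentive_transition(adj, probs):
    """Row-stochastic transition matrix weighted by class agreement.

    adj   : (n, n) 0/1 adjacency matrix (graph structure).
    probs : (n, c) class probabilities from some classifier (hypothetical input).
    Assumed rule: weight_ij = adj_ij * <probs_i, probs_j>.
    """
    agree = probs @ probs.T                      # pairwise class agreement
    weights = adj * agree                        # keep only existing edges
    row_sums = weights.sum(axis=1, keepdims=True)
    # Normalize each row; rows without neighbors stay all-zero.
    return np.divide(weights, row_sums,
                     out=np.zeros_like(weights), where=row_sums > 0)

def diffuse(features, transition, k):
    """Apply k diffusion steps, i.e. aggregate over up to K-hop neighbors."""
    out = features
    for _ in range(k):
        out = transition @ out
    return out

# Toy graph: node 0 is linked to node 1 (likely same class) and node 2 (likely different).
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
probs = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.1, 0.9]])
T = class_attentive_transition(adj, probs)
```

In this toy graph, node 0 places more transition probability on node 1 than on node 2, because their predicted class distributions agree more.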
Confidence-Based Feature Imputation for Graphs with Partially Known Features
This paper investigates a missing feature imputation problem for graph
learning tasks. Several methods have previously addressed learning tasks on
graphs with missing features. However, in cases of high rates of missing
features, they were unable to avoid significant performance degradation. To
overcome this limitation, we introduce a novel concept of channel-wise
confidence in a node feature, which is assigned to each imputed channel feature
of a node for reflecting certainty of the imputation. We then design
pseudo-confidence using the channel-wise shortest path distance between a
missing-feature node and its nearest known-feature node to replace unavailable
true confidence in an actual learning process. Based on the pseudo-confidence,
we propose a novel feature imputation scheme that performs channel-wise
inter-node diffusion and node-wise inter-channel propagation. The scheme can
endure even at an exceedingly high missing rate (e.g., 99.5%) and it achieves
state-of-the-art accuracy for both semi-supervised node classification and link
prediction on various datasets containing a high rate of missing features.
Code is available at https://github.com/daehoum1/pcfi.
Comment: Accepted to ICLR 2023. 28 page
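A minimal sketch of the channel-wise shortest path distance mentioned in the abstract, computed with a multi-source breadth-first search per channel. The mapping from distance to confidence used here (`alpha ** distance`), the parameter `alpha`, and all function names are assumptions for illustration; the abstract does not state the exact form.

```python
from collections import deque

def channel_distances(adj, known):
    """Multi-source BFS: hop distance from each node to its nearest node in `known`.
    Unreachable nodes keep distance None."""
    dist = {v: None for v in adj}
    queue = deque()
    for v in known:
        dist[v] = 0
        queue.append(v)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if dist[w] is None:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def pseudo_confidence(adj, known_mask, alpha=0.5):
    """known_mask[v][c] is True when channel c of node v is observed.
    Returns conf[v][c] = alpha ** distance (assumed decay form);
    0.0 when no known-feature node is reachable."""
    nodes = list(adj)
    n_channels = len(next(iter(known_mask.values())))
    conf = {v: [0.0] * n_channels for v in nodes}
    for c in range(n_channels):
        known = [v for v in nodes if known_mask[v][c]]
        dist = channel_distances(adj, known)
        for v in nodes:
            if dist[v] is not None:
                conf[v][c] = alpha ** dist[v]
    return conf

# Toy path graph 0 - 1 - 2 - 3 with two feature channels.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
known_mask = {0: [True, False], 1: [False, False],
              2: [False, True], 3: [False, False]}
conf = pseudo_confidence(adj, known_mask)
```

Observed channels get confidence 1.0, and the value decays with the number of hops to the nearest node where that channel is known.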
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
The goal of the multi-sound source localization task is to localize sound
sources from the mixture individually. While recent multi-sound source
localization methods have shown improved performance, they face challenges due
to their reliance on prior information about the number of objects to be
separated. In this paper, to overcome this limitation, we present a novel
multi-sound source localization method that can perform localization without
prior knowledge of the number of sound sources. To achieve this goal, we
propose an iterative object identification (IOI) module, which can recognize
sound-making objects in an iterative manner. After finding the regions of
sound-making objects, we devise object similarity-aware clustering (OSC) loss
to guide the IOI module to effectively combine regions of the same object but
also distinguish between different objects and backgrounds. It enables our
method to perform accurate localization of sound-making objects without any
prior knowledge. Extensive experimental results on the MUSIC and VGGSound
benchmarks show the significant performance improvements of the proposed method
over existing methods in both single- and multi-source settings. Our code is
available at: https://github.com/VisualAIKHU/NoPrior_MultiSSL
Comment: Accepted at CVPR 202
Genetic Approach to Elucidation of Sasang Constitutional Medicine
Sasang Constitutional Medicine (SCM) offers a medical principle that classifies humans into four constitution groups and guides their treatment with constitution-matched medical assistance. The principle of this traditional medicine, although it requires significant scientific support, appears to suggest a genetic influence on constitution type. The relative frequency of constitution types in a population, for instance, has remained relatively constant since Jema Lee first described them from his observations. In addition, the body compartment concept of SCM appears to be related to the anterior–posterior patterning of the embryonic gut and associated internal organs. This study describes the attributes of the constitution concept of SCM that can be interpreted in the language of genetics and current approaches to identify the genetic factors that make up the constitution. These efforts should make it possible to interpret the principle of this traditional medicine scientifically. Considering the recent trend in medicine that pursues individualized or tailored medical offerings, once SCM is proven to be explainable with scientific evidence, it will be able to contribute to and take a place in the rapidly evolving medicine environment.
RoCOCO: Robust Benchmark MS-COCO to Stress-test Robustness of Image-Text Matching Models
Recently, large-scale vision-language pre-training models and visual semantic
embedding methods have significantly improved image-text matching (ITM)
accuracy on the MS COCO 5K test set. However, it is unclear how robust these
state-of-the-art (SOTA) models are when using them in the wild. In this paper,
we propose a novel evaluation benchmark to stress-test the robustness of ITM
models. To this end, we add various fooling images and captions to a retrieval
pool. Specifically, we change images by inserting unrelated images, and change
captions by substituting a noun, which can change the meaning of a sentence. We
discover that just adding these newly created images and captions to the test
set can degrade the performance (i.e., Recall@1) of a wide range of SOTA models
(e.g., 81.9% → 64.5% in BLIP, 66.1% → 37.5% in
VSE). We expect that our findings can provide insights for improving
the robustness of the vision-language models and devising more diverse
stress-test methods in cross-modal retrieval tasks. Source code and dataset will
be available at https://github.com/pseulki/rococo
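The effect of enlarging the retrieval pool on Recall@1 can be illustrated with a small sketch. The similarity scores below are invented toy values, not results from the benchmark, and `recall_at_1` is a hypothetical helper name; the sketch only shows the mechanism by which a single high-scoring distractor lowers Recall@1.

```python
import numpy as np

def recall_at_1(sim, gt_index):
    """Fraction of queries whose top-scoring pool item is the ground truth.

    sim      : (n_queries, pool_size) similarity matrix.
    gt_index : ground-truth pool index for each query.
    """
    top1 = np.argmax(sim, axis=1)
    return float(np.mean(top1 == np.asarray(gt_index)))

# Toy image-to-caption similarities: 3 queries, 3 original captions.
sim_clean = np.array([[0.9, 0.1, 0.2],
                      [0.2, 0.8, 0.1],
                      [0.1, 0.3, 0.7]])
gt = [0, 1, 2]

# One added "fooling" caption (extra column) that outscores the true caption for query 0.
distractor = np.array([[0.95], [0.30], [0.20]])
sim_aug = np.hstack([sim_clean, distractor])

r_clean = recall_at_1(sim_clean, gt)   # every query retrieves its own caption
r_aug = recall_at_1(sim_aug, gt)       # query 0 now retrieves the distractor
```

The original items and the model scores are unchanged between the two calls; only the pool grows, which is why this kind of evaluation isolates robustness to distractors.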
Northern Barbados accretionary prism: Structure, deformation, and fluid flow interpreted from 3D seismic and well-log data
We reanalyzed 3D seismic reflection and logging-while-drilling data from the toe
of the northern Barbados accretionary prism to interpret structure, deformation, and fluid
flow related to subduction processes. The seafloor amplitude and coherence reveal an
abrupt change in the thrust orientation from NNE at the thrust front to north and NNW
about 5 km west of the thrust front. These thrust sets are separated by a triangular-shaped
quiet area, which may represent a zone of low strength. The northeast-trending band of
strong negative amplitude and high coherence in the décollement, known to be an interval
of arrested consolidation, overlaps the quiet area, suggesting that the arrested consolidation
may be related to the lack of thrust imbrication, and thus, vertical drainage for fluid
in the accretionary prism. Fractal analysis of the décollement and top of the subducting
oceanic basement indicates that the relief of the décollement correlates with the topography
of the oceanic basement. Differential compaction of the underthrust sediment overlying
the rugged oceanic basement, together with the basement faults that penetrate into the
décollement, probably caused relief or even faulting in the décollement.