684 research outputs found
Underwater Acoustic Signal Recognition Based on Salient Feature
With the rapid advancement of technology, the recognition of underwater
acoustic signals in complex environments has become increasingly crucial.
Currently, mainstream underwater acoustic signal recognition relies primarily
on time-frequency analysis to extract spectral features and is widely applied
in the field. However, existing recognition methods depend heavily
on expert systems, facing limitations such as restricted knowledge bases and
challenges in handling complex relationships. These limitations stem from the
complexity and maintenance difficulties associated with rules or inference
engines. Recognizing the potential advantages of deep learning in handling
intricate relationships, this paper proposes a method utilizing neural networks
for underwater acoustic signal recognition. The proposed approach involves
continual learning of features extracted from spectra for the classification of
underwater acoustic signals. Deep learning models can automatically learn
abstract features from data and continually adjust weights during training to
enhance classification performance.
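The abstract does not specify an architecture, so the following is a minimal sketch of the idea it describes: a small convolutional network that learns features from spectrogram inputs and adjusts its weights by gradient descent. The input shape, layer sizes, class count, and optimizer settings are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a spectrogram-based classifier, assuming log-mel
# spectrogram inputs of shape (batch, 1, n_mels, n_frames).
# Architecture and hyperparameters are illustrative, not from the paper.
import torch
import torch.nn as nn

class SpectrogramClassifier(nn.Module):
    def __init__(self, num_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),             # pool to a fixed-size embedding
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        x = self.features(spec).flatten(1)
        return self.classifier(x)                # class logits

# Usage: one training step on a dummy batch of spectrograms.
model = SpectrogramClassifier(num_classes=5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
spec = torch.randn(8, 1, 64, 128)                # dummy log-mel batch
labels = torch.randint(0, 5, (8,))
loss = nn.functional.cross_entropy(model(spec), labels)
loss.backward()
optimizer.step()                                 # weights adjusted each step
```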
Causal Mediation Analysis with a Three-Dimensional Image Mediator
Causal mediation analysis is increasingly common in biology, psychology,
epidemiology, and related fields. In particular, with the advent of the big data
era, the issue of high-dimensional mediators is becoming more prevalent. In
neuroscience, with the widespread application of magnetic resonance technology
in brain imaging, studies treating an image as a mediator have emerged. In
this study, a novel causal mediation analysis method with a three-dimensional
image mediator is proposed. We define the average causal effects under the
potential outcome framework, explore several sufficient conditions for the
valid identification, and develop techniques for estimation and inference. To
verify the effectiveness of the proposed method, a series of simulations under
various scenarios is performed. Finally, the proposed method is applied to a
study on the causal effect of a mother's delivery mode on her
child's IQ development. It is found that the white matter in certain
regions of the frontal-temporal areas has mediating effects.
Comment: 35 pages, 9 figures
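The abstract defines average causal effects under the potential outcome framework without reproducing the estimands here; one standard formulation, written for a binary treatment A (e.g., delivery mode), a mediator M (here an image), and an outcome Y (e.g., IQ), is sketched below. The notation is an assumption for illustration, not the paper's exact definitions.

```latex
% Standard natural direct/indirect effect decomposition under the
% potential-outcome framework; notation is illustrative, not taken
% verbatim from the paper. A = treatment, M = image mediator, Y = outcome.
\begin{align*}
  \text{TE}  &= \mathbb{E}\!\left[ Y(1, M(1)) - Y(0, M(0)) \right] \\
  \text{NDE} &= \mathbb{E}\!\left[ Y(1, M(0)) - Y(0, M(0)) \right] \\
  \text{NIE} &= \mathbb{E}\!\left[ Y(1, M(1)) - Y(1, M(0)) \right] \\
  \text{TE}  &= \text{NDE} + \text{NIE}
\end{align*}
```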
LLM Connection Graphs for Global Feature Extraction in Point Cloud Analysis
Graph convolutional networks (GCNs) have effectively utilized local connections for point cloud analysis. However, capturing distant dependencies (i.e., global features) with a single local connection graph, such as the Euclidean k-nearest neighbor graph, remains challenging. To address this, we introduce the Multi-Space Graph Convolutional Network (PointGCNN), which leverages reinforcement learning to adaptively construct connection graphs in multiple latent spaces, integrating both local and non-local dependencies. Initially, we encode and concatenate low-level local features from Euclidean and eigenvalue spaces. Convolution layers are then hierarchically built, with each layer forming dynamic connection graphs to guide the propagation of low-level features [1,2,3,4,11,14,16]. These implicitly constructed graphs enable our model to uncover hidden dependencies. The assorted connections from different graphs support the extraction of fine-grained features from various perspectives, enhancing complex scene recognition. Thus, our model can capture multiple global contexts beyond the local scope of a single space, providing strong robustness against perturbations. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on two major public point cloud benchmarks.
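The abstract gives no implementation details, so the following is a minimal sketch of the multi-space idea it outlines: k-nearest-neighbor graphs built in both the Euclidean coordinate space and a per-point eigenvalue feature space, with neighbor features aggregated from each graph. The choice of k, the eigenvalue features, and the max-pooling aggregation are illustrative assumptions rather than the paper's method.

```python
# Minimal sketch of multi-space graph construction for a point cloud:
# k-NN graphs are built both in Euclidean coordinate space and in a
# per-point feature ("eigenvalue") space, and neighbor features are
# aggregated from each graph. Details (k, feature choice, aggregation)
# are illustrative and not taken from the paper.
import numpy as np

def knn_graph(embed: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors of each point in `embed`."""
    d2 = ((embed[:, None, :] - embed[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-loops
    return np.argsort(d2, axis=1)[:, :k]         # (N, k)

def local_eigen_features(points: np.ndarray, nbrs: np.ndarray) -> np.ndarray:
    """Sorted covariance eigenvalues of each local neighborhood."""
    feats = []
    for idx in nbrs:
        cov = np.cov(points[idx].T)              # 3x3 local covariance
        feats.append(np.sort(np.linalg.eigvalsh(cov))[::-1])
    return np.asarray(feats)                     # (N, 3)

def aggregate(features: np.ndarray, nbrs: np.ndarray) -> np.ndarray:
    """Max-pool neighbor features along each graph's edges."""
    return features[nbrs].max(axis=1)            # (N, C)

points = np.random.rand(1024, 3).astype(np.float32)
g_euclid = knn_graph(points, k=16)               # graph in coordinate space
eigvals = local_eigen_features(points, g_euclid)
g_eigen = knn_graph(eigvals, k=16)               # graph in eigenvalue space
fused = np.concatenate([aggregate(points, g_euclid),
                        aggregate(points, g_eigen)], axis=1)
```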
Training-Free Layout Control with Cross-Attention Guidance
Recent diffusion-based generators can produce high-quality images from
textual prompts. However, they often disregard textual instructions that
specify the spatial layout of the composition. We propose a simple approach
that achieves robust layout control without the need for training or
fine-tuning of the image generator. Our technique manipulates the
cross-attention layers that the model uses to interface textual and visual
information and steers the generation in the desired direction given, e.g., a
user-specified layout. To determine how to best guide attention, we study the
role of attention maps and explore two alternative strategies, forward and
backward guidance. We thoroughly evaluate our approach on three benchmarks and
provide several qualitative examples and a comparative analysis of the two
strategies that demonstrate the superiority of backward guidance compared to
forward guidance, as well as prior work. We further demonstrate the versatility
of layout guidance by extending it to applications such as editing the layout
and context of real images.
Comment: WACV 2024, Project Page: https://silent-chen.github.io/layout-guidance
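As a rough illustration of the backward-guidance idea described above, the sketch below defines an energy measuring how much of a token's cross-attention mass falls inside a user-specified box and backpropagates it to the latent. The attention map comes from a stand-in module rather than a real diffusion model, and the energy form, step size, and shapes are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of "backward guidance" on a cross-attention map: the
# attention mass for a prompt token is pushed inside a user-specified
# box by backpropagating an energy to the latent. The attention map here
# is produced by a stand-in module; in practice it would come from the
# diffusion model's cross-attention layers.
import torch

def layout_energy(attn: torch.Tensor, box_mask: torch.Tensor) -> torch.Tensor:
    """Penalize attention for the token that falls outside its box."""
    inside = (attn * box_mask).sum()
    return (1.0 - inside / attn.sum()) ** 2

H = W = 16                                       # attention map resolution
latent = torch.randn(1, 4, 64, 64, requires_grad=True)

# Stand-in: pretend the cross-attention map for the target token is a
# differentiable function of the latent (a real model provides this).
proxy = torch.nn.Conv2d(4, 1, kernel_size=4, stride=4)
attn = torch.softmax(proxy(latent).flatten(), dim=0).view(H, W)

box_mask = torch.zeros(H, W)
box_mask[4:12, 2:8] = 1.0                        # user-specified layout box

loss = layout_energy(attn, box_mask)
loss.backward()                                  # gradients w.r.t. the latent
with torch.no_grad():
    latent -= 5.0 * latent.grad                  # guidance step on the latent
```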
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
We consider the problem of editing 3D objects and scenes
based on open-ended language instructions. A common approach to this
problem is to use a 2D image generator or editor to guide the 3D editing
process, obviating the need for 3D data. However, this process is often
inefficient due to the need for iterative updates of costly 3D representations, such as neural radiance fields, either through individual view edits
or score distillation sampling. A major disadvantage of this approach
is the slow convergence caused by aggregating inconsistent information
across views, as the guidance from 2D models is not multi-view consistent. We thus introduce the Direct Gaussian Editor (DGE), a method
that addresses these issues in two stages. First, we modify a given high-quality image editor like InstructPix2Pix to be multi-view consistent. To
do so, we propose a training-free approach that integrates cues from the
3D geometry of the underlying scene. Second, given a multi-view consistent edited sequence of images, we directly and efficiently optimize the
3D representation, which is based on 3D Gaussian Splatting. Because it
avoids incremental and iterative edits, DGE is significantly more accurate and efficient than existing approaches and offers additional benefits,
such as enabling selective editing of parts of the scene.
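As a rough sketch of the two-stage idea described above, the snippet below first applies a stand-in multi-view-consistent editor to a set of rendered views and then directly optimizes a shared scene representation against all edited views with a photometric loss. The linear "renderer" and dummy editor keep the sketch self-contained; a real implementation would rasterize 3D Gaussian Splatting parameters and use an InstructPix2Pix-style editor. All names and settings are illustrative.

```python
# Minimal sketch of the two-stage pipeline: (1) edit rendered views in a
# multi-view-consistent way, (2) directly optimize a shared 3D
# representation against all edited views at once. The linear projection
# below stands in for differentiable Gaussian rasterization.
import torch

def edit_views_consistently(views: torch.Tensor) -> torch.Tensor:
    """Stand-in for a multi-view-consistent 2D image editor."""
    return (views * 0.6 + 0.2).clamp(0, 1)        # dummy global 'edit'

num_views, n_points, hw = 6, 512, 32 * 32
# Fixed per-view projection of point colors into image pixels
# (stand-in for rendering a 3D Gaussian representation).
proj = torch.rand(num_views, hw, n_points).softmax(dim=-1)

colors = torch.rand(n_points, 3, requires_grad=True)    # shared 3D colors
with torch.no_grad():
    original_views = torch.einsum('vhp,pc->vhc', proj, colors)
targets = edit_views_consistently(original_views)        # stage 1

optim = torch.optim.Adam([colors], lr=0.05)               # stage 2
for step in range(300):
    optim.zero_grad()
    renders = torch.einsum('vhp,pc->vhc', proj, colors)   # render all views
    loss = torch.nn.functional.l1_loss(renders, targets)  # photometric loss
    loss.backward()
    optim.step()
```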