A Multi-In and Multi-Out Dendritic Neuron Model and its Optimization
Artificial neural networks (ANNs), inspired by the interconnection of real
neurons, have achieved unprecedented success in various fields such as computer
vision and natural language processing. Recently, a novel mathematical ANN
model, known as the dendritic neuron model (DNM), has been proposed to address
nonlinear problems by more accurately reflecting the structure of real neurons.
However, its single-output design prevents it from handling multi-output
tasks, which significantly limits its applicability. In this paper, we propose a
novel multi-in and multi-out dendritic neuron model (MODN) to tackle
multi-output tasks. Our core idea is to introduce a filtering matrix to the
soma layer to adaptively select the desired dendrites to regress each output.
Because such a matrix is designed to be learnable, MODN can explore the
relationship between each dendrite and output to provide a better solution to
downstream tasks. We also incorporate a telodendron layer into MODN to better
simulate real neuron behavior. Importantly, MODN is a more general and
unified framework that can be naturally specialized as the DNM by customizing
the filtering matrix. To explore the optimization of MODN, we investigate both
heuristic and gradient-based optimizers and introduce a 2-step training method
for MODN. Extensive experiments on 11 datasets covering both binary and
multi-class classification tasks demonstrate the effectiveness of MODN with
respect to accuracy, convergence, and generality.
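As a rough illustration of the filtering-matrix idea, the sketch below assumes the standard DNM layers (sigmoid synapses, multiplicative dendrites, a summing membrane, a sigmoid soma) and treats the filtering matrix F as a plain dendrites-by-outputs weight matrix. The function name, the parameters k, ks and theta, and the array shapes are our own assumptions; the telodendron layer and the 2-step training method are not reproduced.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def modn_forward(x, W, Q, F, k=5.0, ks=5.0, theta=0.5):
    """Hypothetical MODN-style forward pass (not the authors' code).

    x: (n_inputs,) input vector
    W, Q: (n_dendrites, n_inputs) synaptic weights and thresholds
    F: (n_dendrites, n_outputs) filtering matrix (learnable in MODN)
    """
    # Synaptic layer: one sigmoid per (dendrite, input) connection
    Y = sigmoid(k * (W * x[None, :] - Q))
    # Dendritic layer: multiplicative interaction along each dendrite
    Z = np.prod(Y, axis=1)              # (n_dendrites,)
    # Membrane layer: F decides how much each dendrite feeds each output
    V = Z @ F                           # (n_outputs,)
    # Soma layer: one sigmoid per output
    return sigmoid(ks * (V - theta))
```

Setting F to a single column of ones sums every dendrite into one output, which recovers a single-output DNM; this is one way to read the claim that MODN specializes to the DNM by customizing the filtering matrix.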
Multilevel Saliency-Guided Self-Supervised Learning for Image Anomaly Detection
Anomaly detection (AD) is a fundamental task in computer vision. It aims to
identify anomalous image patterns that deviate from normal ones.
Conventional methods generally address AD by preparing augmented negative
samples to enforce self-supervised learning. However, these techniques
typically do not consider semantics during augmentation, leading to the
generation of unrealistic or invalid negative samples. Consequently, the
feature extraction network can be hindered from embedding critical features. In
this study, inspired by visual attention learning approaches, we propose
CutSwap, which leverages saliency guidance to incorporate semantic cues for
augmentation. Specifically, we first employ LayerCAM to extract multilevel
image features as saliency maps and then perform clustering to obtain multiple
centroids. To fully exploit saliency guidance, on each map, we select a pixel
pair from the cluster with the highest centroid saliency to form a patch pair.
Such a patch pair includes highly similar context information with dense
semantic correlations. The resulting negative sample is created by swapping the
locations of the patch pair. Compared to prior augmentation methods, CutSwap
generates more subtle yet realistic negative samples to facilitate quality
feature learning. Extensive experimental and ablative evaluations demonstrate
that our method achieves state-of-the-art AD performance on two mainstream AD
benchmark datasets.
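The patch-swap step can be sketched as follows. This is a simplified stand-in, not the authors' implementation: it assumes a single precomputed saliency map (LayerCAM and the multilevel maps are not reproduced), uses a plain 1-D k-means on saliency values as the clustering step, centres square patches on the selected pixels and clips them to the image. The function names and the patch, k and max_tries parameters are hypothetical.

```python
import numpy as np

def kmeans_1d(values, k=3, iters=20):
    # Simple 1-D k-means on saliency values (stand-in for the paper's clustering)
    centroids = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = values[labels == c].mean()
    return labels, centroids

def cutswap_sketch(image, saliency, patch=8, k=3, rng=None, max_tries=50):
    """Swap two patches centred on pixels from the most salient cluster."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = saliency.shape
    labels, centroids = kmeans_1d(saliency.ravel().astype(float), k)
    present = np.unique(labels)
    best = present[np.argmax(centroids[present])]  # highest-centroid non-empty cluster
    candidates = np.flatnonzero(labels == best)
    if candidates.size < 2:
        return image.copy(), None
    half = patch // 2
    for _ in range(max_tries):
        a, b = rng.choice(candidates, size=2, replace=False)
        (ya, yb), (xa, xb) = np.unravel_index([a, b], (h, w))
        # Top-left corners, clipped so each patch stays inside the image
        ya0 = int(np.clip(ya - half, 0, h - patch))
        yb0 = int(np.clip(yb - half, 0, h - patch))
        xa0 = int(np.clip(xa - half, 0, w - patch))
        xb0 = int(np.clip(xb - half, 0, w - patch))
        if abs(ya0 - yb0) < patch and abs(xa0 - xb0) < patch:
            continue  # patches overlap; resample the pixel pair
        out = image.copy()
        # Read both patches from the untouched input, then write them swapped
        out[ya0:ya0 + patch, xa0:xa0 + patch] = image[yb0:yb0 + patch, xb0:xb0 + patch]
        out[yb0:yb0 + patch, xb0:xb0 + patch] = image[ya0:ya0 + patch, xa0:xa0 + patch]
        return out, ((ya0, xa0), (yb0, xb0))
    return image.copy(), None
```

Because both patches come from the same high-saliency cluster, the swapped image stays locally plausible, which matches the abstract's aim of subtle yet realistic negative samples.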
Networks are Slacking Off: Understanding Generalization Problem in Image Deraining
Deep deraining networks, while successful in laboratory benchmarks,
consistently encounter substantial generalization issues when deployed in
real-world applications. A prevailing perspective in deep learning encourages
the use of highly complex training data, with the expectation that richer
knowledge of image content will help overcome the generalization problem.
However, through comprehensive and systematic experimentation, we discovered
that this strategy does not enhance the generalization capability of these
networks. On the contrary, it exacerbates the tendency of networks to overfit
to specific degradations. Our experiments reveal that better generalization in
a deraining network can be achieved by reducing the complexity of the
training data. This is because the networks slack off during training;
that is, they learn the least complex elements in the image content and
degradation to minimize training loss. When the complexity of the background
image is less than that of the rain streaks, the network will prioritize the
reconstruction of the background, thereby avoiding overfitting to the rain
patterns and resulting in improved generalization performance. Our research not
only offers a valuable perspective and methodology for better understanding the
generalization problem in low-level vision tasks, but also shows promising
practical potential.
Unmanned Aerial Vehicle Navigation Using Wide-Field Optical Flow and Inertial Sensors
This paper offers a set of novel navigation techniques that rely on the use of inertial sensors and wide-field optical flow information. The aircraft ground velocity and attitude states are estimated with an Unscented Information Filter (UIF) and are evaluated with respect to two sets of experimental flight data collected from an Unmanned Aerial Vehicle (UAV). Two different formulations are proposed: a full state formulation including velocity and attitude, and a simplified formulation which assumes that the lateral and vertical velocities of the aircraft are negligible. An additional state is also considered within each formulation to recover the image distance, which can be measured using a laser rangefinder. The results demonstrate that the full state formulation is able to estimate the aircraft ground velocity to within 1.3 m/s of a GPS receiver solution used as reference "truth" and to estimate the attitude angles to within 1.4 degrees standard deviation of error for both sets of flight data.
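The UIF is the information (inverse-covariance) form of the unscented Kalman filter. The sketch below shows only the sigma-point and unscented-transform step that unscented filters share; the scaling parameters alpha, beta and kappa use common textbook defaults (our assumption, not values from the paper), and the information-form update and the optical-flow measurement model are not reproduced.

```python
import numpy as np

def sigma_points(mean, cov, alpha=1e-3, beta=2.0, kappa=0.0):
    """Scaled sigma points and weights for an n-dimensional Gaussian."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    # Lower-triangular square root S with S @ S.T == (n + lam) * cov
    S = np.linalg.cholesky((n + lam) * cov)
    # Rows: the mean, then mean +/- each column of S  -> shape (2n+1, n)
    pts = np.vstack([mean, mean + S.T, mean - S.T])
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1 - alpha**2 + beta)
    return pts, wm, wc

def unscented_transform(f, mean, cov, **kw):
    """Propagate a Gaussian through f using sigma points."""
    pts, wm, wc = sigma_points(mean, cov, **kw)
    ys = np.array([f(p) for p in pts])
    y_mean = wm @ ys
    d = ys - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov
```

For a linear function the transform reproduces the exact mean A m and covariance A P Aᵀ, which makes a convenient sanity check; its advantage over linearization shows up with nonlinear models such as optical-flow measurement equations.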
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis
This paper introduces an improved duration informed attention neural network
(DurIAN-E) for expressive and high-fidelity text-to-speech (TTS) synthesis.
Following the original DurIAN model, it adopts an auto-regressive model
structure in which the alignments between the input linguistic information and
the output acoustic features are inferred from a duration model. Meanwhile, the
proposed DurIAN-E utilizes multiple stacked SwishRNN-based Transformer blocks
as linguistic encoders. Style-Adaptive Instance Normalization (SAIN) layers are
incorporated into the frame-level encoders to improve the modeling of
expressiveness. A denoiser incorporating both a denoising diffusion
probabilistic model (DDPM) for mel-spectrograms and SAIN modules is employed
to further improve the synthetic speech quality and expressiveness.
Experimental results demonstrate that the proposed expressive TTS model
outperforms state-of-the-art approaches in both subjective mean opinion score
(MOS) and preference tests.
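Duration-informed alignment can be pictured as expanding each phoneme-level encoder output to as many frames as the duration model predicts, giving a hard monotonic alignment instead of a learned attention map. A minimal sketch, assuming frame durations are rounded to non-negative integers; the function name is hypothetical, and DurIAN-E's encoders, SAIN layers and denoiser are not modeled.

```python
import numpy as np

def expand_by_duration(phone_enc, durations):
    """Hypothetical duration-based expansion.

    phone_enc: (n_phones, dim) phoneme-level encoder outputs
    durations: (n_phones,) predicted frame counts (may be fractional)
    """
    durations = np.maximum(np.rint(durations).astype(int), 0)
    # Repeat each phoneme vector once per predicted frame
    frames = np.repeat(phone_enc, durations, axis=0)  # (sum(durations), dim)
    # Alignment: index of the source phoneme for every output frame
    alignment = np.repeat(np.arange(len(durations)), durations)
    return frames, alignment
```

Since every output frame is tied to exactly one source phoneme, the decoder cannot skip or repeat phonemes, a common motivation for duration-informed models over purely attention-based ones.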
Migrant Resettlement by Evolutionary Multi-objective Optimization
Migration has been a universal phenomenon, which brings opportunities as well
as challenges for global development. As the number of migrants (e.g.,
refugees) has increased rapidly in recent years, a key challenge faced by every
country is the problem of migrant resettlement. This problem has attracted
scientific attention, particularly from the perspective of maximizing the
employment rate. Previous works mainly formulated migrant resettlement as an
approximately submodular optimization problem subject to multiple matroid
constraints and employed the greedy algorithm, whose performance, however, may
be limited due to its greedy nature. In this paper, we propose a new framework
MR-EMO based on Evolutionary Multi-objective Optimization, which reformulates
Migrant Resettlement as a bi-objective optimization problem that maximizes the
expected number of employed migrants and minimizes the number of dispatched
migrants simultaneously, and employs a Multi-Objective Evolutionary Algorithm
(MOEA) to solve the bi-objective problem. We implement MR-EMO using three
MOEAs: the popular NSGA-II and MOEA/D, as well as the theoretically grounded GSEMO.
To further improve the performance of MR-EMO, we propose a specific MOEA,
called GSEMO-SR, using matrix-swap mutation and repair mechanism, which has a
better ability to search for feasible solutions. We prove that MR-EMO using
either GSEMO or GSEMO-SR can achieve better theoretical guarantees than the
previous greedy algorithm. Experimental results under the interview and
coordination migration models clearly show the superiority of MR-EMO (with
either NSGA-II, MOEA/D, GSEMO or GSEMO-SR) over previous algorithms, and that
using GSEMO-SR leads to the best performance of MR-EMO.
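To make the bi-objective reformulation concrete, here is a minimal GSEMO sketch: it maximizes an objective f(x) while minimizing the number of selected items |x|, keeps only non-dominated solutions, and finally returns the best solution within a budget k. The toy objective, the plain cardinality budget (standing in for the matroid constraints) and the rule that treats solutions with |x| >= 2k as infeasible are our assumptions; the paper's employment model, matrix-swap mutation and repair mechanism (GSEMO-SR) are not reproduced.

```python
import numpy as np

def dominates(a, b):
    """a, b are (value, size) pairs: maximise value, minimise size."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def gsemo(f, n, k, iters=3000, rng=None):
    rng = np.random.default_rng() if rng is None else rng

    def evaluate(x):
        size = int(x.sum())
        # Hypothetical rule: solutions with size >= 2k are treated as infeasible
        return (f(x) if size < 2 * k else -np.inf, size)

    empty = np.zeros(n, dtype=bool)
    pop = [(empty, evaluate(empty))]
    for _ in range(iters):
        parent, _ = pop[rng.integers(len(pop))]
        # Standard bit-wise mutation: flip each bit independently with prob 1/n
        child = parent ^ (rng.random(n) < 1.0 / n)
        fc = evaluate(child)
        if any(dominates(fp, fc) for _, fp in pop):
            continue  # child is strictly dominated; discard it
        # Keep the child and drop every solution it weakly dominates
        pop = [(x, fx) for x, fx in pop if not (dominates(fc, fx) or fc == fx)]
        pop.append((child, fc))
    feasible = [(x, fx) for x, fx in pop if fx[1] <= k]
    return max(feasible, key=lambda t: t[1][0])[0]
```

The population keeps the best solution found for each size, so one intuition for the comparison with the greedy algorithm is that GSEMO is not locked into its earliest choices.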
- …