MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement
A typical neural speech enhancement (SE) approach mainly handles speech and
noise mixtures, which is not optimal for singing voice enhancement scenarios.
Music source separation (MSS) models treat vocals and various accompaniment
components equally, which may reduce performance compared to the model that
only considers vocal enhancement. In this paper, we propose a novel multi-band
temporal-frequency neural network (MBTFNet) for singing voice enhancement,
which particularly removes background music, noise and even backing vocals from
singing recordings. MBTFNet combines inter and intra-band modeling for better
processing of full-band signals. Dual-path modeling is introduced to expand
the receptive field of the model. We propose an implicit personalized
enhancement (IPE) stage based on signal-to-noise ratio (SNR) estimation, which
further improves the performance of MBTFNet. Experiments show that our proposed
model significantly outperforms several state-of-the-art SE and MSS models.
Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives
Over the past few years, adversarial training has become an extremely active
research topic and has been successfully applied to various Artificial
Intelligence (AI) domains. Because it is a potentially crucial technique for the
development of the next generation of emotional AI systems, we herein provide a
comprehensive overview of the application of adversarial training to affective
computing and sentiment analysis. Various representative adversarial training
algorithms are explained and discussed accordingly, aimed at tackling diverse
challenges associated with emotional AI systems. Further, we highlight a range
of potential future research directions. We expect that this overview will help
facilitate the development of adversarial training for affective computing and
sentiment analysis in both the academic and industrial communities.
Improving GANs for Speech Enhancement
Generative adversarial networks (GANs) have recently been shown to be
efficient for speech enhancement. However, most, if not all, existing speech
enhancement GANs (SEGAN) make use of a single generator to perform one-stage
enhancement mapping. In this work, we propose to use multiple generators that
are chained to perform multi-stage enhancement mapping, which gradually refines
the noisy input signals in a stage-wise fashion. Furthermore, we study two
scenarios: (1) the generators share their parameters and (2) the generators'
parameters are independent. The former constrains the generators to learn a
common mapping that is iteratively applied at all enhancement stages and
results in a small model footprint. In contrast, the latter allows the
generators to flexibly learn different enhancement mappings at different stages
of the network at the cost of an increased model size. We demonstrate that the
proposed multi-stage enhancement approach outperforms the one-stage SEGAN
baseline, where the independent generators lead to more favorable results than
the tied generators. The source code is available at
http://github.com/pquochuy/idsegan.
Comment: This letter has been accepted for publication in IEEE Signal
Processing Letters.
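The two scenarios described in this abstract (tied versus independent generators chained over several stages) can be illustrated with a minimal sketch. This is not the authors' code: `make_generator`, `shared_chain`, `independent_chain`, and the `strength` parameter are hypothetical names, and the "generator" here is a toy attenuation function standing in for a trained network.

```python
from typing import Callable, List

Signal = List[float]
Generator = Callable[[Signal], Signal]


def make_generator(strength: float) -> Generator:
    """Hypothetical stand-in for a trained generator (assumption, not SEGAN).

    It simply scales every sample by (1 - strength).
    """
    def generator(signal: Signal) -> Signal:
        return [x * (1.0 - strength) for x in signal]
    return generator


def enhance_multi_stage(noisy: Signal, generators: List[Generator]) -> List[Signal]:
    """Apply the generators in sequence and return the output of every stage."""
    outputs: List[Signal] = []
    current = list(noisy)
    for generator in generators:
        current = generator(current)  # each stage refines the previous output
        outputs.append(current)
    return outputs


def shared_chain(generator: Generator, num_stages: int) -> List[Generator]:
    """Scenario (1): one generator reused at every stage (tied parameters)."""
    return [generator] * num_stages


def independent_chain(strengths: List[float]) -> List[Generator]:
    """Scenario (2): a separate generator per stage (independent parameters)."""
    return [make_generator(s) for s in strengths]


noisy = [1.0, -2.0]
shared = shared_chain(make_generator(0.5), 2)
independent = independent_chain([0.5, 0.25])
shared_outputs = enhance_multi_stage(noisy, shared)
independent_outputs = enhance_multi_stage(noisy, independent)
```

In the tied case the chain holds a single generator object, so the model footprint does not grow with the number of stages; in the independent case each stage holds its own object, which corresponds to the increased model size mentioned in the abstract.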
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, including computer vision (CV), speech
recognition, and natural language processing. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as they relate to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensing.