Sub-Band Knowledge Distillation Framework for Speech Enhancement
In single-channel speech enhancement, methods based on full-band spectral
features have been widely studied. However, only a few methods pay attention to
non-full-band spectral features. In this paper, we explore a knowledge
distillation framework based on sub-band spectral mapping for single-channel
speech enhancement. Specifically, we divide the full frequency band into
multiple sub-bands and pre-train an elite-level sub-band enhancement model
(teacher model) for each sub-band. These teacher models are dedicated to
processing their own sub-bands. Next, under the teacher models' guidance, we
train a general sub-band enhancement model (student model) that works for all
sub-bands. Without increasing the number of model parameters and computational
complexity, the student model's performance is further improved. To evaluate
our proposed method, we conducted a large number of experiments on an
open-source data set. The final experimental results show that the guidance
from the elite-level teacher models dramatically improves the student model's
performance, surpassing the full-band model while using fewer parameters.
Comment: Published in Interspeech 202
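The distillation setup described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the band-splitting helper and the weighted loss (with a hypothetical `alpha` mixing the ground-truth term and the teacher-guidance term) are assumptions made for clarity.

```python
import numpy as np

def split_subbands(spec, n_subbands):
    """Split a (frames, freq_bins) magnitude spectrogram into equal-width
    sub-bands along the frequency axis; one teacher handles each band."""
    return np.split(spec, n_subbands, axis=1)

def distillation_loss(student_out, clean_target, teacher_out, alpha=0.5):
    """Weighted sum of the ground-truth MSE and the teacher-guidance MSE.
    alpha controls how strongly the student imitates its teacher."""
    mse_target = np.mean((student_out - clean_target) ** 2)
    mse_teacher = np.mean((student_out - teacher_out) ** 2)
    return (1 - alpha) * mse_target + alpha * mse_teacher
```

During student training, each sub-band produced by `split_subbands` would be scored with `distillation_loss` against the matching teacher's output, so guidance is added without enlarging the student model.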
SNR-Based Teachers-Student Technique for Speech Enhancement
It is very challenging for speech enhancement methods to achieve robust
performance under both high signal-to-noise ratio (SNR) and low SNR
simultaneously. In this paper, we propose a method that integrates an SNR-based
teachers-student technique and time-domain U-Net to deal with this problem.
Specifically, this method consists of multiple teacher models and a student
model. We first train the teacher models under multiple small-range SNRs that
do not coincide with each other so that they can perform speech enhancement
well within the specific SNR range. Then, we choose different teacher models to
supervise the training of the student model according to the SNR of the
training data. Eventually, the student model can perform speech enhancement
under both high SNR and low SNR. To evaluate the proposed method, we
constructed a dataset with an SNR ranging from -20dB to 20dB based on the
public dataset. We experimentally analyzed the effectiveness of the SNR-based
teachers-student technique and compared the proposed method with several
state-of-the-art methods.
Comment: Published in 2020 IEEE International Conference on Multimedia and Expo (ICME 2020)
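The teacher-selection step in this abstract reduces to a range lookup: each teacher is trained on a small, non-overlapping SNR interval, and the training SNR of each sample decides which teacher supervises the student. A sketch under assumed ranges (the -20 dB to 20 dB span matches the dataset described; the four-way split is a hypothetical choice):

```python
import numpy as np

# Hypothetical non-overlapping SNR ranges (dB), one teacher per range.
SNR_RANGES = [(-20, -10), (-10, 0), (0, 10), (10, 20)]

def snr_db(clean, noise):
    """Signal-to-noise ratio in dB from the clean and noise components."""
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def select_teacher(snr, ranges=SNR_RANGES):
    """Return the index of the teacher whose training range covers `snr`,
    clamping out-of-range values to the nearest edge teacher."""
    for i, (lo, hi) in enumerate(ranges):
        if lo <= snr < hi:
            return i
    return len(ranges) - 1 if snr >= ranges[-1][1] else 0
```

For a -15 dB mixture this picks the low-SNR specialist (index 0), so the student always receives guidance from the teacher best matched to the noise level.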
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as it relates to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing DL models.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing
AI Extenders: The Ethical and Societal Implications of Humans Cognitively Extended by AI
Humans and AI systems are usually portrayed as separate systems that we need to align in values and goals. However, there is a great deal of AI technology found in non-autonomous systems that are used as cognitive tools by humans. Under the extended mind thesis, the functional contributions of these tools become as essential to our cognition as our brains. But AI can take cognitive extension towards totally new capabilities, posing new philosophical, ethical and technical challenges. To analyse these challenges better, we define and place AI extenders in a continuum between fully-externalized systems, loosely coupled with humans, and fully-internalized processes, with operations ultimately performed by the brain, making the tool redundant. We dissect the landscape of cognitive capabilities that can foreseeably be extended by AI and examine their ethical implications. We suggest that cognitive extenders using AI be treated as distinct from other cognitive enhancers by all relevant stakeholders, including developers, policy makers, and human users.
On the application of reservoir computing networks for noisy image recognition
Reservoir Computing Networks (RCNs) are a special type of single layer recurrent neural networks, in which the input and the recurrent connections are randomly generated and only the output weights are trained. Besides the ability to process temporal information, the key points of RCN are easy training and robustness against noise. Recently, we introduced a simple strategy to tune the parameters of RCNs. Evaluation in the domain of noise robust speech recognition proved that this method was effective. The aim of this work is to extend that study to the field of image processing, by showing that the proposed parameter tuning procedure is equally valid in the field of image processing and confirming that RCNs are apt at temporal modeling and are robust with respect to noise. In particular, we investigate the potential of RCNs in achieving competitive performance on the well-known MNIST dataset by following the aforementioned parameter optimizing strategy. Moreover, we achieve good noise robust recognition by utilizing such a network to denoise images and supplying them to a recognizer that is solely trained on clean images. The experiments demonstrate that the proposed RCN-based handwritten digit recognizer achieves an error rate of 0.81 percent on the clean test data of the MNIST benchmark and that the proposed RCN-based denoiser can effectively reduce the error rate under various types of noise.
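The defining property of an RCN, as stated above, is that the input and recurrent weights are random and fixed while only the readout is trained. A minimal sketch of that idea; the reservoir size, spectral radius, and ridge-regression readout are illustrative defaults, not the paper's tuned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_reservoir(inputs, n_res=30, spectral_radius=0.9, scale=0.5):
    """Drive a randomly initialized reservoir with an input sequence and
    collect its states. These weights are never trained."""
    n_in = inputs.shape[1]
    W_in = rng.uniform(-scale, scale, (n_res, n_in))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    # Rescale the recurrent matrix so its largest eigenvalue modulus
    # equals the chosen spectral radius (keeps the dynamics stable).
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x)
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """Ridge regression on the collected states: the readout is the
    only trained component of an RCN."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)
```

Training therefore reduces to one linear solve, which is what makes RCNs cheap to fit compared with fully trained recurrent networks.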
Classification of Radiology Reports Using Neural Attention Models
The electronic health record (EHR) contains a large amount of
multi-dimensional and unstructured clinical data of significant operational and
research value. Distinguished from previous studies, our approach embraces a
double-annotated dataset and moves away from obscure "black-box" models toward
comprehensible deep learning models. In this paper, we present a novel neural
attention mechanism that not only classifies clinically important findings but
also reveals the terms that drive each classification.
Specifically, convolutional neural networks (CNN) with attention analysis are
used to classify radiology head computed tomography reports based on five
categories that radiologists would account for in assessing acute and
communicable findings in daily practice. The experiments show that our CNN
attention models outperform non-neural models, especially when trained on a
larger dataset. Our attention analysis demonstrates the intuition behind the
classifier's decision by generating a heatmap that highlights attended terms
used by the CNN model; this is valuable when potential downstream medical
decisions are to be performed by human experts or the classifier information is
to be used in cohort construction such as for epidemiological studies.
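The heatmap described above falls out of the attention weights themselves: each term is scored against a learned query, the scores are softmax-normalized, and the resulting weights both pool the term features and indicate which terms the classifier attended to. A minimal sketch (the single-query dot-product scoring is an assumption for illustration, not the paper's exact architecture):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(term_features, query):
    """Score each term's feature vector against a learned query,
    normalize with softmax, and return the pooled representation
    plus the weights. The weights double as a per-term heatmap."""
    scores = term_features @ query          # one score per term
    weights = softmax(scores)               # sums to 1 over terms
    pooled = weights @ term_features        # attention-weighted pooling
    return pooled, weights
```

Rendering `weights` over the report's tokens yields exactly the kind of heatmap the abstract mentions: terms with large weights are the ones the model relied on for its decision.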
Military applications of automatic speech recognition and future requirements
An updated summary of the state-of-the-art of automatic speech recognition and its relevance to military applications is provided. A number of potential systems for military applications are under development. These include: (1) digital narrowband communication systems; (2) automatic speech verification; (3) on-line cartographic processing unit; (4) word recognition for militarized tactical data system; and (5) voice recognition and synthesis for aircraft cockpit