21,704 research outputs found
A Subband-Based SVM Front-End for Robust ASR
This work proposes a novel support vector machine (SVM) based robust
automatic speech recognition (ASR) front-end that operates on an ensemble of
the subband components of high-dimensional acoustic waveforms. The key issues
of selecting appropriate SVM kernels for classification in frequency
subbands and of combining the individual subband classifiers via ensemble
methods are addressed. The proposed front-end is compared with state-of-the-art
ASR front-ends in terms of robustness to additive noise and linear filtering.
Experiments performed on the TIMIT phoneme classification task demonstrate the
benefits of the proposed subband-based SVM front-end: it outperforms the
standard cepstral front-end in the presence of noise and linear filtering at
signal-to-noise ratios (SNR) below 12 dB. Combining the proposed
front-end with a conventional front-end such as MFCC yields further
improvements over the individual front-ends across the full range of noise
levels.
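The pipeline the abstract describes (subband decomposition, one SVM per band, ensemble combination) can be sketched roughly as follows; the FFT-based band splitting, RBF kernel, probability averaging, and synthetic data are illustrative assumptions, not the paper's actual configuration.

```python
# Hypothetical sketch of a subband ensemble of SVM classifiers.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d, n_bands = 200, 64, 4           # samples, waveform length, subbands
X = rng.standard_normal((n, d))      # synthetic "waveforms"
y = rng.integers(0, 2, n)            # synthetic binary phoneme labels

# Split each magnitude spectrum into frequency subbands and train
# one SVM per band.
spectra = np.abs(np.fft.rfft(X, axis=1))
bands = np.array_split(spectra, n_bands, axis=1)
clfs = [SVC(kernel="rbf", probability=True).fit(b, y) for b in bands]

# Ensemble combination: average the per-band class probabilities.
probs = np.mean([c.predict_proba(b) for c, b in zip(clfs, bands)], axis=0)
pred = probs.argmax(axis=1)
```

In the paper the kernels and the combination rule are themselves design choices under study; the averaging above is just one common option.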
Unmasking Clever Hans Predictors and Assessing What Machines Really Learn
Current learning machines have successfully solved hard application problems,
reaching high accuracy and displaying seemingly "intelligent" behavior. Here we
apply recent techniques for explaining decisions of state-of-the-art learning
machines and analyze various tasks from computer vision and arcade games. This
showcases a spectrum of problem-solving behaviors ranging from naive and
short-sighted, to well-informed and strategic. We observe that standard
performance evaluation metrics can be oblivious to distinguishing these diverse
problem-solving behaviors. To address this, we propose our semi-automated Spectral
Relevance Analysis, which provides a practically effective way of characterizing
and validating the behavior of nonlinear learning machines. This helps to
assess whether a learned model indeed delivers reliably for the problem that it
was conceived for. Finally, our work adds a voice of caution to the
ongoing excitement about machine intelligence and pledges to evaluate and
judge some of these recent successes in a more nuanced manner.
Comment: Accepted for publication in Nature Communications
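The clustering idea behind Spectral Relevance Analysis can be illustrated roughly as follows: per-sample relevance heatmaps are flattened and grouped, so that heatmaps in one cluster likely reflect one prediction strategy. The synthetic "heatmaps" and the affinity parameter below are stand-in assumptions, not the authors' setup.

```python
# Minimal sketch of the clustering step in Spectral Relevance Analysis:
# group relevance heatmaps to surface distinct problem-solving strategies.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
# Two artificial "strategies": relevance on the left vs. right half.
maps = np.zeros((40, 8, 8))
maps[:20, :, :4] = rng.random((20, 8, 4))   # strategy A
maps[20:, :, 4:] = rng.random((20, 8, 4))   # strategy B

flat = maps.reshape(len(maps), -1)          # one vector per heatmap
labels = SpectralClustering(n_clusters=2, affinity="rbf", gamma=0.1,
                            random_state=0).fit_predict(flat)
# Heatmaps sharing a cluster plausibly share a prediction strategy,
# which an analyst can then inspect and validate.
```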
Inferring Room Semantics Using Acoustic Monitoring
Knowledge of the user's environmental context, i.e. the user's indoor
location and the semantics of their surroundings, can
facilitate the development of many location-aware applications. In this
paper, we propose an acoustic monitoring technique that infers semantic
knowledge about an indoor space \emph{over time}, using audio recordings from
it. Our technique uses the impulse response of these spaces as well as the
ambient sounds produced in them in order to determine a semantic label for
them. As we process more recordings, we update our \emph{confidence} in the
assigned label. We evaluate our technique on a dataset of single-speaker human
speech recordings obtained in different types of rooms at three university
buildings. In our evaluation, the confidence for the true label
generally outstripped the confidence for all other labels and in some cases
converged to 100\% with fewer than 30 samples.
Comment: 2017 IEEE International Workshop on Machine Learning for Signal
Processing, Sept.\ 25--28, 2017, Tokyo, Japan
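The idea of updating confidence in a room label as more recordings arrive can be read as a running Bayesian update over room-type hypotheses. The room types and per-recording likelihood values below are invented for illustration and are not from the paper.

```python
# Hedged sketch: accumulate label confidence over successive recordings
# via Bayes' rule, with a made-up likelihood table.
import numpy as np

rooms = ["office", "lecture hall", "corridor"]
prior = np.full(3, 1 / 3)            # start with no preference

# Hypothetical p(acoustic features of recording | room type) for a
# stream of three recordings that mostly favor "office".
likelihoods = np.array([
    [0.6, 0.25, 0.15],
    [0.5, 0.30, 0.20],
    [0.7, 0.20, 0.10],
])

posterior = prior
for lik in likelihoods:
    posterior = posterior * lik
    posterior = posterior / posterior.sum()   # normalized confidence

best = rooms[int(posterior.argmax())]
```

With consistent evidence, the posterior mass concentrates on one label, mirroring the convergence behavior the abstract reports.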
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 PDF figures
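As an illustration of the log-mel spectral features the review highlights, here is a minimal numpy-only computation; the frame length, hop size, and mel-filterbank parameters are arbitrary choices, not values from the article.

```python
# Compute a log-mel spectrogram from a raw waveform with numpy only.
import numpy as np

def log_mel(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal and take the power spectrum of each windowed frame.
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    # Triangular mel filterbank: center frequencies linearly spaced
    # on the mel scale.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv(np.linspace(0, mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return np.log(spec @ fb.T + 1e-10)   # floor avoids log(0)

feats = log_mel(np.random.default_rng(0).standard_normal(16000))
```

Libraries such as librosa provide this feature out of the box; the point here is only to show what the representation is.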
Probabilistic Modeling Paradigms for Audio Source Separation
This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007
Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of two general paradigms: linear modeling or variance modeling. They compare the merits of each paradigm and report objective performance figures. They conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
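The linear-modeling paradigm mentioned in the abstract can be demonstrated with independent component analysis on a toy instantaneous mixture; the sources, mixing matrix, and use of scikit-learn's FastICA are synthetic illustrations, not the chapter's own experiments.

```python
# ICA separating an instantaneous linear mixture of two synthetic sources.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 4000)
s1 = np.sign(np.sin(2 * np.pi * 5 * t))   # square-wave "source"
s2 = np.sin(2 * np.pi * 11 * t)           # sine-wave "source"
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5], [0.4, 1.0]])    # mixing matrix
X = S @ A.T                               # two observed mixtures

est = FastICA(n_components=2, random_state=0).fit_transform(X)
# est recovers the sources up to permutation and scaling, by
# maximizing the statistical independence of the components.
```

Variance modeling, the other paradigm, instead describes each source by a time-varying spectral variance; it handles underdetermined mixtures that plain ICA cannot.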
The SED Machine: a robotic spectrograph for fast transient classification
Current time domain facilities are finding several hundreds of transient
astronomical events a year. The discovery rate is expected to increase in the
future as new surveys such as the Zwicky Transient Facility (ZTF) and
the Large Synoptic Survey Telescope (LSST) come on line. At present, the
rate at which transients are classified is approximately one order of magnitude
lower than the discovery rate, leading to an increasing "follow-up drought".
Existing telescopes with moderate aperture can help address this deficit when
equipped with spectrographs optimized for spectral classification. Here, we
provide an overview of the design, operations and first results of the Spectral
Energy Distribution Machine (SEDM), operating on the Palomar 60-inch telescope
(P60). The instrument is optimized for classification and high observing
efficiency. It combines a low-resolution (R ~ 100) integral field unit (IFU)
spectrograph with a "Rainbow Camera" (RC), a multi-band field acquisition camera
that also serves as a multi-band (ugri) photometer. The SEDM was commissioned
during the operation of the intermediate Palomar Transient Factory (iPTF) and
has already lived up to its promise. The success of the SEDM
demonstrates the value of spectrographs optimized for spectral classification.
Introduction of similar spectrographs on existing telescopes will help
alleviate the follow-up drought and thereby accelerate the rate of discoveries.
Comment: 21 pages, 20 figures
Astronomical Spectroscopy
Spectroscopy is one of the most important tools that an astronomer has for
studying the universe. This chapter begins by discussing the basics, including
the different types of optical spectrographs, with extension to the ultraviolet
and the near-infrared. Emphasis is given to the fundamentals of how
spectrographs are used, and the trade-offs involved in designing an
observational experiment. It then covers observing and reduction techniques,
noting that some of the standard practices of flat-fielding often actually
degrade the quality of the data rather than improve it. Although the focus is
on point sources, spatially resolved spectroscopy of extended sources is also
briefly discussed. Discussion of differential extinction, the impact of
crowding, multi-object techniques, optimal extraction, flat-fielding
considerations, and determining radial velocities and velocity dispersions
provides the spectroscopist with the fundamentals needed to obtain the best
data. Finally, the chapter combines the previous material by providing some
examples of real-life observing experiences with several typical instruments.
Comment: An abridged version of a chapter to appear in Planets, Stars and
Stellar Systems, to be published in 2011 by Springer. Slightly revised
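The flat-fielding step the chapter discusses can be sketched numerically: dividing a raw frame by a normalized flat-field exposure removes pixel-to-pixel sensitivity variations. All numbers below are synthetic.

```python
# Toy flat-field correction of a detector frame.
import numpy as np

rng = np.random.default_rng(0)
true_flux = np.full((4, 4), 100.0)                # ideal uniform frame
sensitivity = 1 + 0.1 * rng.standard_normal((4, 4))  # per-pixel response

raw = true_flux * sensitivity        # what the detector records
flat = 5000.0 * sensitivity          # exposure of a uniform lamp/sky
norm_flat = flat / flat.mean()       # normalize to unit mean response

corrected = raw / norm_flat          # sensitivity pattern divides out
```

The chapter's caution applies here: if the flat is taken through a different optical path or color than the science target, this division can imprint errors rather than remove them.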