21,704 research outputs found

    A Subband-Based SVM Front-End for Robust ASR

    Full text link
    This work proposes a novel support vector machine (SVM) based robust automatic speech recognition (ASR) front-end that operates on an ensemble of the subband components of high-dimensional acoustic waveforms. The key issues of selecting the appropriate SVM kernels for classification in frequency subbands and the combination of individual subband classifiers using ensemble methods are addressed. The proposed front-end is compared with state-of-the-art ASR front-ends in terms of robustness to additive noise and linear filtering. Experiments performed on the TIMIT phoneme classification task demonstrate the benefits of the proposed subband based SVM front-end: it outperforms the standard cepstral front-end in the presence of noise and linear filtering for signal-to-noise ratio (SNR) below 12-dB. A combination of the proposed front-end with a conventional front-end such as MFCC yields further improvements over the individual front ends across the full range of noise levels

    Unmasking Clever Hans Predictors and Assessing What Machines Really Learn

    Full text link
    Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.Comment: Accepted for publication in Nature Communication

    Inferring Room Semantics Using Acoustic Monitoring

    Full text link
    Having knowledge of the environmental context of the user i.e. the knowledge of the users' indoor location and the semantics of their environment, can facilitate the development of many of location-aware applications. In this paper, we propose an acoustic monitoring technique that infers semantic knowledge about an indoor space \emph{over time,} using audio recordings from it. Our technique uses the impulse response of these spaces as well as the ambient sounds produced in them in order to determine a semantic label for them. As we process more recordings, we update our \emph{confidence} in the assigned label. We evaluate our technique on a dataset of single-speaker human speech recordings obtained in different types of rooms at three university buildings. In our evaluation, the confidence\emph{ }for the true label generally outstripped the confidence for all other labels and in some cases converged to 100\% with less than 30 samples.Comment: 2017 IEEE International Workshop on Machine Learning for Signal Processing, Sept.\ 25--28, 2017, Tokyo, Japa

    Deep Learning for Audio Signal Processing

    Full text link
    Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.Comment: 15 pages, 2 pdf figure

    Probabilistic Modeling Paradigms for Audio Source Separation

    Get PDF
    This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007file: VincentJafariAbdallahPD11-probabilistic.pdf:v\VincentJafariAbdallahPD11-probabilistic.pdf:PDF owner: markp timestamp: 2011.02.04file: VincentJafariAbdallahPD11-probabilistic.pdf:v\VincentJafariAbdallahPD11-probabilistic.pdf:PDF owner: markp timestamp: 2011.02.04Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of either paradigm and report objective performance figures. They also,conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems

    The SED Machine: a robotic spectrograph for fast transient classification

    Get PDF
    Current time domain facilities are finding several hundreds of transient astronomical events a year. The discovery rate is expected to increase in the future as soon as new surveys such as the Zwicky Transient Facility (ZTF) and the Large Synoptic Sky Survey (LSST) come on line. At the present time, the rate at which transients are classified is approximately one order or magnitude lower than the discovery rate, leading to an increasing "follow-up drought". Existing telescopes with moderate aperture can help address this deficit when equipped with spectrographs optimized for spectral classification. Here, we provide an overview of the design, operations and first results of the Spectral Energy Distribution Machine (SEDM), operating on the Palomar 60-inch telescope (P60). The instrument is optimized for classification and high observing efficiency. It combines a low-resolution (R\sim100) integral field unit (IFU) spectrograph with "Rainbow Camera" (RC), a multi-band field acquisition camera which also serves as multi-band (ugri) photometer. The SEDM was commissioned during the operation of the intermediate Palomar Transient Factory (iPTF) and has already proved lived up to its promise. The success of the SEDM demonstrates the value of spectrographs optimized to spectral classification. Introduction of similar spectrographs on existing telescopes will help alleviate the follow-up drought and thereby accelerate the rate of discoveries.Comment: 21 pages, 20 figure

    Astronomical Spectroscopy

    Full text link
    Spectroscopy is one of the most important tools that an astronomer has for studying the universe. This chapter begins by discussing the basics, including the different types of optical spectrographs, with extension to the ultraviolet and the near-infrared. Emphasis is given to the fundamentals of how spectrographs are used, and the trade-offs involved in designing an observational experiment. It then covers observing and reduction techniques, noting that some of the standard practices of flat-fielding often actually degrade the quality of the data rather than improve it. Although the focus is on point sources, spatially resolved spectroscopy of extended sources is also briefly discussed. Discussion of differential extinction, the impact of crowding, multi-object techniques, optimal extractions, flat-fielding considerations, and determining radial velocities and velocity dispersions provide the spectroscopist with the fundamentals needed to obtain the best data. Finally the chapter combines the previous material by providing some examples of real-life observing experiences with several typical instruments.Comment: An abridged version of a chapter to appear in Planets, Stars and Stellar Systems, to be published in 2011 by Springer. Slightly revise
    corecore