Kymatio: Scattering Transforms in Python

Author: Andreux Mathieu
Andén Joakim
Angles Tomás
Belilovsky Eugene
Bruna Joan
Cella Carmine
Eickenberg Michael
Exarchakis Georgios
Hirn Matthew J.
Leonarduzzi Roberto
Lostanlen Vincent
Mallat Stéphane
Oyallon Edouard
Rochette Gaspar
Thiry Louis
Zarka John
Zhang Sixin
Publication date: 01/06/2019

The wavelet scattering transform is an invariant signal representation suitable for many signal processing and machine learning applications. We present the Kymatio software package, an easy-to-use, high-performance Python implementation of the scattering transform in 1D, 2D, and 3D that is compatible with modern deep learning frameworks. All transforms may be executed on a GPU (in addition to CPU), offering a considerable speedup over CPU implementations. The package also has a small memory footprint, resulting in efficient memory usage. The source code, documentation, and examples are available under a BSD license at https://www.kymat.io

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server
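The abstract names the scattering transform but does not show how it is computed. As a rough, library-free illustration (not Kymatio's implementation or API), the sketch below cascades wavelet convolutions and complex moduli up to order 2, then averages. Two simplifications are assumptions: the dyadic Gabor filter parameters are hypothetical, and a global mean stands in for the low-pass filter. With circular convolution and a global mean, circularly shifting the input leaves every coefficient unchanged.

```python
import cmath
import math


def gabor_filter(n, xi, sigma):
    """Complex Gabor filter on a circular grid of length n.

    Hypothetical parameters: an illustrative stand-in, not Kymatio's Morlet bank.
    """
    taps = []
    for k in range(n):
        t = k if k <= n // 2 else k - n  # signed circular offset from the origin
        taps.append(math.exp(-t * t / (2 * sigma * sigma)) * cmath.exp(1j * xi * t))
    return taps


def circular_conv(x, h):
    """Circular convolution; it commutes with circular shifts of x."""
    n = len(x)
    return [sum(x[(i - k) % n] * h[k] for k in range(n)) for i in range(n)]


def _mean(v):
    return sum(v) / len(v)


def scattering_1d(x, J=3):
    """Scattering coefficients up to order 2, using a global mean as the low-pass."""
    n = len(x)
    # At scale j the center frequency halves and the envelope widens by 2x.
    psis = [gabor_filter(n, xi=0.75 * math.pi / 2 ** j, sigma=1.5 * 2 ** j)
            for j in range(J)]
    coeffs = [_mean(x)]  # order 0: average of the signal
    first_layer = []
    for psi in psis:
        u1 = [abs(c) for c in circular_conv(x, psi)]  # modulus of wavelet coefficients
        first_layer.append(u1)
        coeffs.append(_mean(u1))  # order 1
    for j1, u1 in enumerate(first_layer):
        for j2 in range(j1 + 1, J):  # only paths whose second scale is coarser
            u2 = [abs(c) for c in circular_conv(u1, psis[j2])]
            coeffs.append(_mean(u2))  # order 2
    return coeffs
```

With J scales this yields 1 + J + J(J-1)/2 coefficients. Real implementations use FFT-based convolution and a windowed low-pass, which gives local rather than global invariance.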

Discriminative Segmental Cascades for Feature-Rich Phone Recognition

Author: Gimpel Kevin
Livescu Karen
Tang Hao
Wang Weiran
Publication date: 03/08/2016

Discriminative segmental models, such as segmental conditional random fields (SCRFs) and segmental structured support vector machines (SSVMs), have had success in speech recognition via both lattice rescoring and first-pass decoding. However, such models suffer from slow decoding, hampering the use of computationally expensive features, such as segment neural networks or other high-order features. A typical solution is to use approximate decoding, either by beam pruning in a single pass or by beam pruning to generate a lattice followed by a second pass. In this work, we study discriminative segmental models trained with a hinge loss (i.e., segmental structured SVMs). We show that beam search is not suitable for learning rescoring models in this approach, though it gives good approximate decoding performance when the model is already well-trained. Instead, we consider an approach inspired by structured prediction cascades, which use max-marginal pruning to generate lattices. We obtain a high-accuracy phonetic recognition system with several expensive feature types: a segment neural network, a second-order language model, and second-order phone boundary features.

arXiv.org e-Print Archive

CiteSeerX
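The abstract mentions max-marginal pruning without spelling it out. The sketch below computes max-marginals for every candidate segment with a forward and a backward Viterbi pass, then keeps segments whose max-marginal clears a threshold. Several assumptions apply: the model has no transition or language-model scores (unlike the paper's second-order features), segment scores are supplied as a plain dictionary, and the threshold alpha * best + (1 - alpha) * mean follows the form commonly used in structured prediction cascades rather than any rule confirmed by the abstract.

```python
import math


def max_marginals(scores, T):
    """scores: dict mapping (start, end, label) -> score, with 0 <= start < end <= T.

    Returns (max-marginal per segment, best full-segmentation score).
    """
    # Forward pass: fwd[e] is the best score of any segmentation of frames [0, e).
    fwd = [-math.inf] * (T + 1)
    fwd[0] = 0.0
    for (s, e, _), w in sorted(scores.items(), key=lambda kv: kv[0][1]):
        fwd[e] = max(fwd[e], fwd[s] + w)
    # Backward pass: bwd[s] is the best score of any segmentation of frames [s, T).
    bwd = [-math.inf] * (T + 1)
    bwd[T] = 0.0
    for (s, e, _), w in sorted(scores.items(), key=lambda kv: -kv[0][0]):
        bwd[s] = max(bwd[s], w + bwd[e])
    # Max-marginal: best full segmentation forced to use this segment.
    mm = {seg: fwd[seg[0]] + w + bwd[seg[1]] for seg, w in scores.items()}
    return mm, fwd[T]


def cascade_prune(scores, T, alpha):
    """Keep segments whose max-marginal is >= alpha*best + (1-alpha)*mean (assumed rule)."""
    mm, best = max_marginals(scores, T)
    if best == -math.inf:
        raise ValueError("no complete segmentation of [0, T)")
    finite = [v for v in mm.values() if v > -math.inf]
    threshold = alpha * best + (1 - alpha) * sum(finite) / len(finite)
    kept = {seg for seg, v in mm.items() if v >= threshold - 1e-12}
    return kept, threshold
```

Because every segment on the best segmentation has max-marginal equal to the best score, which is never below the mean, the Viterbi path always survives pruning; alpha trades lattice size against the risk of pruning useful hypotheses.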

Adaptive DCTNet for Audio Signal Classification

Author: Gan Zhe
Lu Liang
Pu Yunchen
Thompson Andrew
Xian Yin
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 29/04/2017

In this paper, we investigate DCTNet for audio signal classification. Its output feature is related to Cohen's class of time-frequency distributions. We introduce the use of adaptive DCTNet (A-DCTNet) for audio signal feature extraction. The A-DCTNet applies the idea of the constant-Q transform, with the center frequencies of its filterbanks geometrically spaced. The A-DCTNet is adaptive to different acoustic scales, and it can better capture low-frequency acoustic information that is sensitive to human audio perception than features such as Mel-frequency spectral coefficients (MFSC). We use features extracted by the A-DCTNet as input for classifiers. Experimental results show that the A-DCTNet and Recurrent Neural Networks (RNN) achieve state-of-the-art performance in bird song classification rate, and improve artist identification accuracy in music data. They demonstrate A-DCTNet's applicability to signal processing problems. Comment: International Conference of Acoustic and Speech Signal Processing (ICASSP). New Orleans, United States, March, 201

arXiv.org e-Print Archive

Crossref
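The A-DCTNet abstract says its filterbank center frequencies are geometrically spaced, as in the constant-Q transform. A minimal sketch of that spacing, assuming the standard constant-Q parameterization f_k = f_min * 2^(k/b) with b bins per octave (the paper's actual filter design is not given in the abstract), is:

```python
import math


def constant_q_centers(f_min, f_max, bins_per_octave):
    """Geometrically spaced centers f_k = f_min * 2**(k / bins_per_octave), up to f_max."""
    # Small epsilon guards against log2 landing just below an exact integer.
    n = math.floor(bins_per_octave * math.log2(f_max / f_min) + 1e-9) + 1
    return [f_min * 2 ** (k / bins_per_octave) for k in range(n)]


def constant_q_factor(bins_per_octave):
    """Q = center frequency / bandwidth, identical for every filter in the bank."""
    return 1.0 / (2 ** (1.0 / bins_per_octave) - 1.0)


def bandwidths(centers, bins_per_octave):
    """Bandwidth grows in proportion to the center frequency."""
    q = constant_q_factor(bins_per_octave)
    return [f / q for f in centers]
```

Because bandwidth scales with the center frequency, low-frequency filters are narrow and resolve pitch finely, which is consistent with the abstract's point about capturing low-frequency information better than linearly spaced features.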

A Deep Representation for Invariance And Music Classification

Author: Evangelopoulos Georgios
Poggio Tomaso
Rosasco Lorenzo
Voinea Stephen
Zhang Chiyuan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

Representations in the auditory cortex might be based on mechanisms similar to the visual ventral stream: modules for building invariance to transformations and multiple layers for compositionality and selectivity. In this paper we propose the use of such computational modules for extracting invariant and discriminative audio representations. Building on a theory of invariance in hierarchical architectures, we propose a novel, mid-level representation for acoustical signals, using the empirical distributions of projections on a set of templates and their transformations. Under the assumption that, by construction, this dictionary of templates is composed from similar classes, and samples the orbit of variance-inducing signal transformations (such as shift and scale), the resulting signature is theoretically guaranteed to be unique, invariant to transformations and stable to deformations. Modules of projection and pooling can then constitute layers of deep networks, for learning composite representations. We present the main theoretical and computational aspects of a framework for unsupervised learning of invariant audio representations, empirically evaluated on music genre classification. Comment: 5 pages, CBMM Memo No. 002, (to appear) IEEE 2014 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014)

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Crossref
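The signature described in the last abstract is an empirical distribution of projections of a signal onto templates and their transformations. A minimal sketch, assuming the transformation group is only circular shifts (the paper also mentions scale), projections are cosine similarities, and the 8-bin histogram is a hypothetical choice, shows why such a signature is invariant: shifting the input merely permutes the projections over each template's orbit, so the histogram does not change.

```python
import math


def roll(v, s):
    """Circularly shift list v to the right by s positions."""
    s %= len(v)
    return v[-s:] + v[:-s]


def signature(x, templates, n_bins=8):
    """One normalized histogram per template, pooled over all circular shifts.

    Assumes x and every template are nonzero and have the same length.
    """
    nx = math.sqrt(sum(a * a for a in x))
    sig = []
    for t in templates:
        nt = math.sqrt(sum(b * b for b in t))
        counts = [0] * n_bins
        for g in range(len(t)):  # the orbit of t under the shift group
            cos = sum(a * b for a, b in zip(x, roll(t, g))) / (nx * nt)
            k = min(int((cos + 1) / 2 * n_bins), n_bins - 1)  # bin on [-1, 1]
            counts[k] += 1
        sig.append([c / len(t) for c in counts])  # empirical distribution
    return sig
```

Pooling by histogram rather than by a single statistic such as the max keeps more information about the distribution of projections, which is what the uniqueness claim in the abstract relies on.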