386 research outputs found

Kymatio: Scattering Transforms in Python

Author: Andreux Mathieu
Andén Joakim
Angles Tomás
Belilovsky Eugene
Bruna Joan
Cella Carmine
Eickenberg Michael
Exarchakis Georgios
Hirn Matthew J.
Leonarduzzi Roberto
Lostanlen Vincent
Mallat Stéphane
Oyallon Edouard
Rochette Gaspar
Thiry Louis
Zarka John
Zhang Sixin
Publication date: 01/06/2019

The wavelet scattering transform is an invariant signal representation suitable for many signal processing and machine learning applications. We present the Kymatio software package, an easy-to-use, high-performance Python implementation of the scattering transform in 1D, 2D, and 3D that is compatible with modern deep learning frameworks. All transforms may be executed on a GPU (in addition to CPU), offering a considerable speedup over CPU implementations. The package also has a small memory footprint, resulting in efficient memory usage. The source code, documentation, and examples are available under a BSD license at https://www.kymat.io.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Deep Dictionary Learning: A PARametric NETwork Approach

Author: Dai Liyi
Krim Hamid
Mahdizadehaghdam Shahin
Panahi Ashkan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/03/2018

Deep dictionary learning seeks multiple dictionaries at different image scales to capture complementary coherent characteristics. We propose a method for learning a hierarchy of synthesis dictionaries with an image classification goal. The dictionaries and classification parameters are trained by a classification objective, and the sparse features are extracted by reducing a reconstruction loss in each layer. The reconstruction objectives in some sense regularize the classification problem and inject source signal information into the extracted features. The performance of the proposed hierarchical method increases as more layers are added, which consequently makes this model easier to tune and adapt. Furthermore, the proposed algorithm shows a remarkably lower fooling rate in the presence of adversarial perturbations. The proposed approach is validated by its classification performance on four benchmark datasets and is compared to a CNN of similar size.

arXiv.org e-Print Archive

Chalmers Research

Second order scattering descriptors predict fMRI activity due to visual textures

Author: Eickenberg Michael
Gramfort Alexandre
Mehdi Senoussi
Pedregosa Fabian
Thirion Bertrand
Publication date: 21/06/2013

Second layer scattering descriptors are known to provide good classification performance on natural quasi-stationary processes such as visual textures due to their sensitivity to higher order moments and continuity with respect to small deformations. In a functional Magnetic Resonance Imaging (fMRI) experiment we present visual textures to subjects and evaluate the predictive power of these descriptors with respect to the predictive power of simple contour energy, the first scattering layer. We are able to conclude not only that invariant second layer scattering coefficients better encode voxel activity, but also that well predicted voxels need not necessarily lie in known retinotopic regions. Comment: 3rd International Workshop on Pattern Recognition in NeuroImaging (2013)

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-CEA

Terabyte-scale supervised 3D training and benchmarking dataset of the mouse kidney

Author: Hieber Simone
Kuo Willy
Kurtcuoglu Vartan
Müller Bert
Rossinelli Diego
Schulz Georg
Wenger Roland H.
Publication date: 21/07/2023

The performance of machine learning algorithms, when used for segmenting 3D biomedical images, does not reach the level expected based on results achieved with 2D photos. This may be explained by the comparative lack of high-volume, high-quality training datasets, which require state-of-the-art imaging facilities, domain experts for annotation, and large computational and personnel resources. The HR-Kidney dataset presented in this work bridges this gap by providing 1.7 TB of artefact-corrected synchrotron radiation-based X-ray phase-contrast microtomography images of whole mouse kidneys and validated segmentations of 33 729 glomeruli, which corresponds to an increase of one to two orders of magnitude over currently available biomedical datasets. The image sets also contain the underlying raw data, threshold- and morphology-based semi-automatic segmentations of renal vasculature and uriniferous tubules, as well as true 3D manual annotations. We thereby provide a broad basis for the scientific community to build upon and expand in the fields of image processing, data augmentation and machine learning, in particular unsupervised and semi-supervised learning investigations, as well as transfer learning and generative adversarial networks.

arXiv.org e-Print Archive

Directory of Open Access Journals

Deep Multimodal Learning for Audio-Visual Speech Recognition

Author: Goel Vaibhava
Marcheret Etienne
Mroueh Youssef
Publication date: 22/01/2015

In this paper, we present methods in deep multimodal learning for fusing speech and visual modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an approach where uni-modal deep networks are trained separately and their final hidden layers fused to obtain a joint feature space in which another deep network is built. While the audio network alone achieves a phone error rate (PER) of 41% under clean condition on the IBM large vocabulary audio-visual studio dataset, this fusion model achieves a PER of 35.83%, demonstrating the tremendous value of the visual channel in phone classification even in audio with high signal to noise ratio. Second, we present a new deep network architecture that uses a bilinear softmax layer to account for class specific correlations between modalities. We show that combining the posteriors from the bilinear networks with those from the fused model mentioned above results in a further significant phone error rate reduction, yielding a final PER of 34.03%. Comment: ICASSP 201

arXiv.org e-Print Archive

Crossref