Kymatio: Scattering Transforms in Python
The wavelet scattering transform is an invariant signal representation
suitable for many signal processing and machine learning applications. We
present the Kymatio software package, an easy-to-use, high-performance Python
implementation of the scattering transform in 1D, 2D, and 3D that is compatible
with modern deep learning frameworks. All transforms may be executed on a GPU
(in addition to CPU), offering a considerable speedup over CPU
implementations. The package also has a small memory footprint, resulting
in efficient memory usage. The source code, documentation, and examples are
available under a BSD license at https://www.kymat.io
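The scattering transform is commonly described as wavelet filtering followed by a complex modulus and an averaging step. The pure-Python sketch below illustrates that idea on a toy 1D signal. It does not use Kymatio's API: the function names, the Gabor filters standing in for proper wavelets, the circular boundary handling, and the global averaging are all assumptions chosen for brevity.

```python
import cmath
import math


def gabor_filter(xi, sigma, half_width):
    """Complex band-pass filter: Gaussian envelope times a complex exponential.

    Hypothetical stand-in for a wavelet; xi is the center frequency in radians.
    """
    return [
        math.exp(-(n * n) / (2.0 * sigma * sigma)) * cmath.exp(1j * xi * n)
        for n in range(-half_width, half_width + 1)
    ]


def circular_convolve(x, h, half_width):
    """Circular convolution of a real signal x with a centered filter h."""
    n_samples = len(x)
    out = []
    for t in range(n_samples):
        acc = 0j
        for k, h_k in enumerate(h):
            # Filter tap k corresponds to offset n = k - half_width.
            acc += h_k * x[(t - (k - half_width)) % n_samples]
        out.append(acc)
    return out


def toy_scattering(x, center_freqs, sigma=4.0, half_width=12):
    """Zeroth order: mean of x. First order: mean of |x * psi| per filter."""
    coeffs = [sum(x) / len(x)]
    for xi in center_freqs:
        psi = gabor_filter(xi, sigma, half_width)
        modulus = [abs(v) for v in circular_convolve(x, psi, half_width)]
        coeffs.append(sum(modulus) / len(modulus))
    return coeffs


# A sinusoid plus one impulse, and the same signal circularly shifted by 7.
signal = [math.sin(0.5 * t) + (1.0 if t == 20 else 0.0) for t in range(64)]
shifted = signal[-7:] + signal[:-7]
freqs = [0.5, 1.0, 2.0]
s_original = toy_scattering(signal, freqs)
s_shifted = toy_scattering(shifted, freqs)
```

Because this toy uses circular shifts and global averaging, the two coefficient lists should agree up to floating-point rounding, which illustrates the invariance property the abstract refers to.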
Deep Dictionary Learning: A PARametric NETwork Approach
Deep dictionary learning seeks multiple dictionaries at different image
scales to capture complementary coherent characteristics. We propose a method
for learning a hierarchy of synthesis dictionaries with an image classification
goal. The dictionaries and classification parameters are trained by a
classification objective, and the sparse features are extracted by reducing a
reconstruction loss in each layer. The reconstruction objectives in some sense
regularize the classification problem and inject source signal information in
the extracted features. The performance of the proposed hierarchical method
increases by adding more layers, which consequently makes this model easier to
tune and adapt. Furthermore, the proposed algorithm shows a remarkably lower
fooling rate in the presence of adversarial perturbations. The proposed
approach is validated by its classification performance on four benchmark
datasets and is compared to a CNN of similar size.
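The abstract says that sparse features are extracted by reducing a reconstruction loss, but it does not name a solver. As a minimal sketch of one common choice, the code below runs ISTA (iterative soft-thresholding) on the lasso objective 0.5 * ||x - D z||^2 + lam * ||z||_1 for a single layer. The solver, the function names, the toy dictionary, and the values of lam and step are all assumptions, not the authors' method.

```python
def soft_threshold(values, threshold):
    """Shrink each entry toward zero by threshold (proximal map of the L1 norm)."""
    return [
        max(abs(v) - threshold, 0.0) * (1.0 if v >= 0 else -1.0)
        for v in values
    ]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def ista(dictionary, x, lam, step, n_iter=500):
    """Approximately minimize 0.5 * ||x - D z||^2 + lam * ||z||_1 over z.

    dictionary is D as a list of rows, with one column per atom.
    step should not exceed 1 / (largest eigenvalue of D^T D).
    """
    d_t = transpose(dictionary)
    z = [0.0] * len(d_t)
    for _ in range(n_iter):
        residual = [r - x_i for r, x_i in zip(matvec(dictionary, z), x)]
        grad = matvec(d_t, residual)  # gradient of the reconstruction term
        z = soft_threshold(
            [z_i - step * g for z_i, g in zip(z, grad)], step * lam
        )
    return z


# Toy overcomplete dictionary: three unit-norm atoms in two dimensions.
dictionary = [
    [1.0, 0.0, 0.6],
    [0.0, 1.0, 0.8],
]
x = [1.2, 1.6]  # exactly twice the third atom
codes = ista(dictionary, x, lam=0.1, step=0.5, n_iter=500)
```

For this toy input the optimality conditions of the lasso problem give z = [0, 0, 1.9], so the iterate should concentrate on the third atom, with the L1 penalty shrinking its coefficient slightly below 2.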
Second order scattering descriptors predict fMRI activity due to visual textures
Second layer scattering descriptors are known to provide good classification
performance on natural quasi-stationary processes such as visual textures due
to their sensitivity to higher order moments and continuity with respect to
small deformations. In a functional Magnetic Resonance Imaging (fMRI)
experiment, we present visual textures to subjects and compare the predictive
power of these descriptors with that of simple contour energy, the first
scattering layer. We are able to conclude not only
that invariant second layer scattering coefficients better encode voxel
activity, but also that well predicted voxels need not necessarily lie in known
retinotopic regions. Comment: 3rd International Workshop on Pattern Recognition in NeuroImaging
(2013)
Terabyte-scale supervised 3D training and benchmarking dataset of the mouse kidney
The performance of machine learning algorithms, when used for segmenting 3D
biomedical images, does not reach the level expected based on results achieved
with 2D photos. This may be explained by the comparative lack of high-volume,
high-quality training datasets, which require state-of-the-art imaging
facilities, domain experts for annotation and large computational and personnel
resources. The HR-Kidney dataset presented in this work bridges this gap by
providing 1.7 TB of artefact-corrected synchrotron radiation-based X-ray
phase-contrast microtomography images of whole mouse kidneys and validated
segmentations of 33 729 glomeruli, which corresponds to a one to two orders of
magnitude increase over currently available biomedical datasets. The image sets
also contain the underlying raw data, threshold- and morphology-based
semi-automatic segmentations of renal vasculature and uriniferous tubules, as
well as true 3D manual annotations. We thereby provide a broad basis for the
scientific community to build upon and expand in the fields of image
processing, data augmentation and machine learning, in particular unsupervised
and semi-supervised learning investigations, as well as transfer learning and
generative adversarial networks.
Deep Multimodal Learning for Audio-Visual Speech Recognition
In this paper, we present methods in deep multimodal learning for fusing
speech and visual modalities for Audio-Visual Automatic Speech Recognition
(AV-ASR). First, we study an approach where uni-modal deep networks are trained
separately and their final hidden layers fused to obtain a joint feature space
in which another deep network is built. While the audio network alone achieves
a phone error rate (PER) of under clean condition on the IBM large
vocabulary audio-visual studio dataset, this fusion model achieves a PER of
demonstrating the tremendous value of the visual channel in phone
classification even in audio with high signal-to-noise ratio. Second, we
present a new deep network architecture that uses a bilinear softmax layer to
account for class specific correlations between modalities. We show that
combining the posteriors from the bilinear networks with those from the fused
model mentioned above results in a further significant phone error rate
reduction, yielding a final PER of . Comment: ICASSP 201
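The abstract reports results as phone error rates but does not define the metric. PER is conventionally computed as the edit distance between the recognized and reference phone sequences, divided by the reference length. The sketch below follows that convention; the function names and the toy phone labels are hypothetical and are not taken from the paper.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two phone sequences (lists of labels)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        current = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_phone != hyp_phone)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def phone_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference phones."""
    if not reference:
        raise ValueError("reference must contain at least one phone")
    return edit_distance(reference, hypothesis) / len(reference)


# Toy example: one substitution ("ae" -> "ah") and one insertion ("s").
reference = ["k", "ae", "t"]
hypothesis = ["k", "ah", "t", "s"]
per = phone_error_rate(reference, hypothesis)
```

Here two errors against three reference phones give a PER of 2/3. Because insertions count as errors, the metric can exceed 1.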