Curved Gabor Filters for Fingerprint Image Enhancement
Gabor filters play an important role in many application areas for the
enhancement of various types of images and the extraction of Gabor features.
For the purpose of enhancing curved structures in noisy images, we introduce
curved Gabor filters which locally adapt their shape to the direction of flow.
These curved Gabor filters enable the choice of filter parameters which
increase the smoothing power without creating artifacts in the enhanced image.
In this paper, curved Gabor filters are applied to the curved ridge and valley
structure of low-quality fingerprint images. First, we combine two orientation
field estimation methods in order to obtain a more robust estimation for very
noisy images. Next, curved regions are constructed by following the respective
local orientation and they are used for estimating the local ridge frequency.
Lastly, curved Gabor filters are defined based on curved regions and they are
applied for the enhancement of low-quality fingerprint images. Experimental
results on the FVC2004 databases show that this approach improves on
state-of-the-art enhancement methods.
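A minimal sketch of the underlying filtering step, assuming an orientation
field has already been estimated (all names and parameter values below are
illustrative): a standard oriented Gabor kernel is built from a local ridge
period lam, a scale sigma and an aspect ratio gamma, then evaluated at the
local orientation theta. The paper's curved variant additionally bends the
kernel support along the ridge flow, which this straight-kernel sketch does
not attempt.

    import numpy as np

    def gabor_kernel(theta, lam=9.0, sigma=4.0, gamma=0.5, size=25):
        """Standard oriented Gabor kernel; the curved variant in the
        paper additionally bends the support along the ridge flow."""
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)   # across the ridges
        yr = -x * np.sin(theta) + y * np.cos(theta)  # along the ridges
        envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
        carrier = np.cos(2 * np.pi * xr / lam)       # lam ~ ridge period
        return envelope * carrier

    # Example: filter response of one pixel neighbourhood.
    patch = np.random.rand(25, 25)   # stand-in for a fingerprint patch
    theta = np.pi / 3                # stand-in for the local orientation
    response = np.sum(patch * gabor_kernel(theta))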
Learning to detect dysarthria from raw speech
Speech classifiers of paralinguistic traits traditionally learn from diverse
hand-crafted low-level features, selecting the information relevant to the
task at hand. We explore an alternative to this selection by jointly learning
the classifier and the feature extraction. Recent work on speech recognition
has shown improved performance over speech features by learning from the
waveform. We extend this approach to paralinguistic classification and propose
a neural network that can learn a filterbank, a normalization factor and a
compression power from the raw speech, jointly with the rest of the
architecture. We apply this model to dysarthria detection from sentence-level
audio recordings. Starting from a strong attention-based baseline on which
mel-filterbanks outperform standard low-level descriptors, we show that
learning the filters or the normalization and compression improves over fixed
features by 10% absolute accuracy. We also observe a gain over openSMILE
features when the feature extraction, the normalization, and the compression
factor are learned jointly with the architecture. This constitutes a first
attempt at learning all of these operations jointly from raw audio for a
speech classification task. Comment: 5 pages, 3 figures, submitted to ICASSP
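A minimal sketch of such a learnable front end, assuming a PyTorch-style
implementation (the class name LearnableFrontEnd and every hyperparameter
below are illustrative, not the paper's architecture): a bank of 1-D
convolutions plays the role of the filterbank, and a learnable per-channel
gain and exponent stand in for the normalization and compression.

    import torch
    import torch.nn as nn

    class LearnableFrontEnd(nn.Module):
        """Hypothetical front end: conv filterbank over the raw waveform,
        then a learnable per-channel normalization gain and a learnable
        compression exponent (replacing fixed log compression)."""
        def __init__(self, n_filters=40, kernel_size=401, stride=160):
            super().__init__()
            self.filters = nn.Conv1d(1, n_filters, kernel_size,
                                     stride=stride,
                                     padding=kernel_size // 2, bias=False)
            self.gain = nn.Parameter(torch.ones(n_filters))          # normalization
            self.power = nn.Parameter(torch.full((n_filters,), 0.5)) # compression

        def forward(self, wav):                 # wav: (batch, 1, samples)
            energy = self.filters(wav).pow(2)   # filterbank energies
            energy = energy * self.gain.view(1, -1, 1)
            return (energy + 1e-6).pow(self.power.view(1, -1, 1))

    frontend = LearnableFrontEnd()
    feats = frontend(torch.randn(2, 1, 16000))  # ~1 s of 16 kHz audio

Because gain and power are nn.Parameter tensors, both are updated by the
same backpropagation that trains the classifier on top of feats.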
Machine Analysis of Facial Expressions
No abstract
Idealized computational models for auditory receptive fields
This paper presents a theory by which idealized models of auditory receptive
fields can be derived in a principled axiomatic manner, from a set of
structural properties to enable invariance of receptive field responses under
natural sound transformations and ensure internal consistency between
spectro-temporal receptive fields at different temporal and spectral scales.
For defining a time-frequency transformation of a purely temporal sound
signal, it is shown that the framework allows for a new way of deriving the
Gabor and Gammatone filters as well as a novel family of generalized Gammatone
filters, with additional degrees of freedom to obtain different trade-offs
between the spectral selectivity and the temporal delay of time-causal temporal
window functions.
When applied to the definition of a second-layer of receptive fields from a
spectrogram, it is shown that the framework leads to two canonical families of
spectro-temporal receptive fields, in terms of spectro-temporal derivatives of
either spectro-temporal Gaussian kernels for non-causal time or the combination
of a time-causal generalized Gammatone filter over the temporal domain and a
Gaussian filter over the log-spectral domain. For each filter family, the
spectro-temporal receptive fields can be either separable over the
time-frequency domain or be adapted to local glissando transformations that
represent variations in logarithmic frequencies over time. Within each domain
of either non-causal or time-causal time, these receptive field families are
derived by uniqueness from the assumptions.
It is demonstrated how the presented framework allows for computation of
basic auditory features for audio processing and that it leads to predictions
about auditory receptive fields with good qualitative similarity to biological
receptive fields measured in the inferior colliculus (ICC) and primary auditory
cortex (A1) of mammals. Comment: 55 pages, 22 figures, 3 tables
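A minimal sketch of the classical Gammatone filter that the framework
rederives (the function name and parameter values are illustrative; the
paper's generalized family adds further degrees of freedom trading spectral
selectivity against temporal delay):

    import numpy as np

    def gammatone_ir(f_c, order=4, bandwidth=125.0, fs=16000,
                     duration=0.05):
        """Classical gammatone impulse response for t >= 0:
        g(t) = t**(order-1) * exp(-2*pi*bandwidth*t) * cos(2*pi*f_c*t)."""
        t = np.arange(int(fs * duration)) / fs
        ir = (t**(order - 1) * np.exp(-2 * np.pi * bandwidth * t)
              * np.cos(2 * np.pi * f_c * t))
        return ir / np.max(np.abs(ir))

    ir = gammatone_ir(f_c=1000.0)  # one 1 kHz channel, illustrative values

Increasing the filter order sharpens spectral selectivity at the cost of a
longer temporal delay, which is the kind of trade-off the generalized family
exposes as explicit degrees of freedom.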
Feature Learning from Spectrograms for Assessment of Personality Traits
Several methods have recently been proposed to analyze speech and
automatically infer the personality of the speaker. These methods often rely on
prosodic and other hand-crafted speech-processing features extracted with
off-the-shelf toolboxes. To achieve high accuracy, numerous features are
typically extracted using complex and highly parameterized algorithms. In this
paper, a new method based on feature learning and spectrogram analysis is
proposed to simplify the feature extraction process while maintaining a high
level of accuracy. The proposed method learns a dictionary of discriminant
features from patches extracted in the spectrogram representations of training
speech segments. Each speech segment is then encoded using the dictionary, and
the resulting feature set is used to perform classification of personality
traits. Experiments indicate that the proposed method achieves state-of-the-art
results with a significant reduction in complexity when compared to the most
recent reference methods. The number of features and the difficulties linked
to the feature extraction process are greatly reduced, as only one type of
descriptor is used, whose 6 parameters can be tuned automatically. In
contrast, the simplest reference method uses 4 types of descriptors to which 6
functionals are applied, resulting in over 20 parameters to be tuned. Comment: 12 pages, 3 figures
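A minimal sketch of this patch-dictionary pipeline using scikit-learn (the
patch size, dictionary size and pooling below are illustrative assumptions,
not the paper's settings):

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning
    from sklearn.feature_extraction.image import extract_patches_2d

    # Stand-in spectrogram; in practice, patches come from the
    # spectrograms of the training speech segments.
    spectrogram = np.abs(np.random.randn(128, 400))

    patches = extract_patches_2d(spectrogram, (12, 12),
                                 max_patches=2000, random_state=0)
    X = patches.reshape(len(patches), -1)
    X -= X.mean(axis=1, keepdims=True)   # per-patch mean removal

    dico = MiniBatchDictionaryLearning(n_components=64, alpha=1.0,
                                       batch_size=64, random_state=0)
    codes = dico.fit_transform(X)        # sparse code per patch

    # One fixed-length descriptor per segment, e.g. by pooling the
    # codes over all of its patches.
    segment_feature = np.abs(codes).mean(axis=0)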
A Compact CNN-Based Speech Enhancement With Adaptive Filter Design Using Gabor Function And Region-Aware Convolution
Speech enhancement (SE) is used in many applications, such as hearing
devices, to improve speech intelligibility and quality. Convolutional neural
network-based (CNN-based) SE algorithms in the literature often employ
generic convolutional filters that are not optimized for SE applications.
This paper presents a CNN-based SE algorithm with an adaptive filter design
(named ‘CNN-AFD’) using a Gabor function and region-aware convolution. The
proposed algorithm incorporates fixed Gabor functions into convolutional
filters to model human auditory processing for improved denoising
performance. The feature maps obtained from the Gabor-incorporated
convolutional layers serve as learnable guided masks (tuned during
backpropagation) for generating adaptive custom region-aware filters. The
custom filters extract features from speech regions (i.e., ‘region-aware’)
while maintaining translation invariance. To reduce the high inference cost
of the CNN, skip convolution and activation analysis-wise pruning are
explored. Employing skip convolution reduced the training time per epoch by
close to 40%. Pruning of neurons with high numbers of zero activations
complements skip convolution and significantly reduces model parameters by
more than 30%. The proposed CNN-AFD outperformed all four CNN-based SE
baseline algorithms (i.e., a CNN-based SE employing generic filters, a
CNN-based SE without region-aware convolution, a CNN-based SE trained on
complex spectrograms and a CNN-based SE operating in the time domain), with
average short-time objective intelligibility (STOI), perceptual evaluation of
speech quality (PESQ) and logarithmic spectral distance (LSD) scores of 0.95,
1.82 and 0.82, respectively, when tasked to denoise speech contaminated with
NOISEX-92 noises at −5, 0 and 5 dB signal-to-noise ratios (SNRs).
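A minimal sketch of the fixed-Gabor ingredient, assuming a PyTorch-style
model (gabor_bank and all parameter values are hypothetical; the paper's
exact filter shapes, guided masks and region-aware mechanism are not
reproduced here): fixed 2-D Gabor kernels are loaded into a frozen
convolutional layer whose feature maps could then guide downstream adaptive
filters.

    import math
    import torch
    import torch.nn as nn

    def gabor_bank(n_orient=8, size=7, lam=4.0, sigma=2.0, gamma=0.5):
        """Fixed 2-D Gabor kernels at n_orient orientations
        (illustrative parameters)."""
        half = size // 2
        coords = torch.arange(-half, half + 1, dtype=torch.float32)
        ys, xs = torch.meshgrid(coords, coords, indexing="ij")
        kernels = []
        for k in range(n_orient):
            theta = k * math.pi / n_orient
            xr = xs * math.cos(theta) + ys * math.sin(theta)
            yr = -xs * math.sin(theta) + ys * math.cos(theta)
            kernels.append(
                torch.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
                * torch.cos(2 * math.pi * xr / lam))
        return torch.stack(kernels).unsqueeze(1)  # (n_orient, 1, size, size)

    gabor = nn.Conv2d(1, 8, kernel_size=7, padding=3, bias=False)
    gabor.weight.data = gabor_bank()
    gabor.weight.requires_grad = False  # kept fixed, as in the abstract
    feature_maps = gabor(torch.randn(1, 1, 64, 64))  # e.g. a spectrogram tile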