1,929 research outputs found
Curved Gabor Filters for Fingerprint Image Enhancement
Gabor filters play an important role in many application areas for the
enhancement of various types of images and the extraction of Gabor features.
For the purpose of enhancing curved structures in noisy images, we introduce
curved Gabor filters which locally adapt their shape to the direction of flow.
These curved Gabor filters enable the choice of filter parameters which
increase the smoothing power without creating artifacts in the enhanced image.
In this paper, curved Gabor filters are applied to the curved ridge and valley
structure of low-quality fingerprint images. First, we combine two orientation
field estimation methods in order to obtain a more robust estimation for very
noisy images. Next, curved regions are constructed by following the respective
local orientation and they are used for estimating the local ridge frequency.
Lastly, curved Gabor filters are defined based on curved regions and they are
applied for the enhancement of low-quality fingerprint images. Experimental
results on the FVC2004 databases show improvements of this approach in
comparison to state-of-the-art enhancement methods
Idealized computational models for auditory receptive fields
This paper presents a theory by which idealized models of auditory receptive
fields can be derived in a principled axiomatic manner, from a set of
structural properties to enable invariance of receptive field responses under
natural sound transformations and ensure internal consistency between
spectro-temporal receptive fields at different temporal and spectral scales.
For defining a time-frequency transformation of a purely temporal sound
signal, it is shown that the framework allows for a new way of deriving the
Gabor and Gammatone filters as well as a novel family of generalized Gammatone
filters, with additional degrees of freedom to obtain different trade-offs
between the spectral selectivity and the temporal delay of time-causal temporal
window functions.
When applied to the definition of a second-layer of receptive fields from a
spectrogram, it is shown that the framework leads to two canonical families of
spectro-temporal receptive fields, in terms of spectro-temporal derivatives of
either spectro-temporal Gaussian kernels for non-causal time or the combination
of a time-causal generalized Gammatone filter over the temporal domain and a
Gaussian filter over the logspectral domain. For each filter family, the
spectro-temporal receptive fields can be either separable over the
time-frequency domain or be adapted to local glissando transformations that
represent variations in logarithmic frequencies over time. Within each domain
of either non-causal or time-causal time, these receptive field families are
derived by uniqueness from the assumptions.
It is demonstrated how the presented framework allows for computation of
basic auditory features for audio processing and that it leads to predictions
about auditory receptive fields with good qualitative similarity to biological
receptive fields measured in the inferior colliculus (ICC) and primary auditory
cortex (A1) of mammals.Comment: 55 pages, 22 figures, 3 table
Examining the Combination of Multi-Band Processing and Channel Dropout for Robust Speech Recognition
Designing a Visual Front End in Audio-Visual Automatic Speech Recognition System
Audio-visual automatic speech recognition (AVASR) is a speech recognition technique integrating audio and video signals as input. Traditional audio-only speech recognition system only uses acoustic information from an audio source. However the recognition performance degrades significantly in acoustically noisy environments. It has been shown that visual information also can be used to identify speech. To improve the speech recognition performance, audio-visual automatic speech recognition has been studied. In this paper, we focus on the design of the visual front end of an AVASR system, which mainly consists of face detection and lip localization. The front end is built upon the AVICAR database that was recorded in moving vehicles. Therefore, diverse lighting conditions and poor quality of imagery are the problems we must overcome.
We first propose the use of the Viola-Jones face detection algorithm that can process images rapidly with high detection accuracy. When the algorithm is applied to the AVICAR database, we reach an accuracy of 89% face detection rate. By separately detecting and integrating the detection results from all different color channels, we further improve the detection accuracy to 95%. To reliably localize the lips, three algorithms are studied and compared: the Gabor filter algorithm, the lip enhancement algorithm, and the modified Viola-Jones algorithm for lip features. Finally, to increase detection rate, a modified Viola-Jones algorithm and lip enhancement algorithms are cascaded based on the results of three lip localization methods. Overall, the front end achieves an accuracy of 90% for lip localization
Non-linear echo cancellation - a Bayesian approach
Echo cancellation literature is reviewed, then a Bayesian model is introduced and it is shown how how it can be used to model and fit nonlinear channels. An algorithm for cancellation of echo over a nonlinear channel is developed and tested. It is shown that this nonlinear algorithm converges for both linear and nonlinear channels and is superior to linear echo cancellation for canceling an echo through a nonlinear echo-path channel
- …