73 research outputs found

    Wavelet methods in speech recognition

    In this thesis, novel wavelet techniques are developed to improve parametrization of speech signals prior to classification. It is shown that non-linear operations carried out in the wavelet domain improve the performance of a speech classifier and consistently outperform classical Fourier methods. This is because of the localised nature of the wavelet, which captures correspondingly well-localised time-frequency features within the speech signal. Furthermore, by taking advantage of the approximation ability of wavelets, efficient representation of the non-stationarity inherent in speech can be achieved in a relatively small number of expansion coefficients. This is an attractive option when faced with the so-called 'Curse of Dimensionality' problem of multivariate classifiers such as Linear Discriminant Analysis (LDA) or Artificial Neural Networks (ANNs). Conventional time-frequency analysis methods such as the Discrete Fourier Transform either miss irregular signal structures and transients due to spectral smearing or require a large number of coefficients to represent such characteristics efficiently. Wavelet theory offers an alternative insight in the representation of these types of signals. As an extension to the standard wavelet transform, adaptive libraries of wavelet and cosine packets are introduced which increase the flexibility of the transform. This approach is observed to be yet more suitable for the highly variable nature of speech signals in that it results in a time-frequency sampled grid that is well adapted to irregularities and transients. They result in a corresponding reduction in the misclassification rate of the recognition system. However, this is necessarily at the expense of added computing time. Finally, a framework based on adaptive time-frequency libraries is developed which invokes the final classifier to choose the nature of the resolution for a given classification problem. 
The classifier then performs dimensionality reduction on the transformed signal by choosing the top few features based on their discriminant power. This approach is compared and contrasted with an existing discriminant wavelet feature extractor. The overall conclusions of the thesis are that wavelets and their relatives are capable of extracting useful features for speech classification problems. The use of adaptive wavelet transforms provides the flexibility within which powerful feature extractors can be designed for these types of application.
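
The feature-selection step described above (keeping the top few features ranked by discriminant power) can be sketched as follows. This is only an illustration: the abstract does not say which discriminant measure the thesis uses, so the two-class Fisher ratio here is an assumption, and the names `fisher_scores` and `top_k_features` are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(class_a, class_b):
    """Score each feature by (difference of class means)^2 / (sum of class variances).

    Assumption: a two-class Fisher ratio stands in for the unspecified
    "discriminant power" measure.
    """
    n_features = len(class_a[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row in class_a]
        b = [row[j] for row in class_b]
        between = (mean(a) - mean(b)) ** 2
        within = pvariance(a) + pvariance(b)
        if within > 0:
            scores.append(between / within)
        elif between > 0:
            scores.append(float("inf"))  # perfectly separated, zero spread
        else:
            scores.append(0.0)
    return scores


def top_k_features(class_a, class_b, k):
    """Return the indices of the k highest-scoring features."""
    scores = fisher_scores(class_a, class_b)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: rows are transformed signals, columns are expansion coefficients.
class_a = [[1.0, 5.0, 0.2], [1.2, 4.0, 0.1], [0.8, 6.0, 0.3]]
class_b = [[3.0, 5.5, 0.2], [3.2, 4.5, 0.3], [2.8, 5.0, 0.1]]
selected = top_k_features(class_a, class_b, 1)
```

In this toy data only the first coefficient separates the two classes, so it is the one retained.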

    A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity

    The richness of natural images makes the quest for optimal representations in image processing and computer vision challenging. The latter observation has not prevented the design of image representations, which trade off between efficiency and complexity, while achieving accurate rendering of smooth regions as well as reproducing faithful contours and textures. The most recent ones, proposed in the past decade, share a hybrid heritage highlighting the multiscale and oriented nature of edges and patterns in images. This paper presents a panorama of the aforementioned literature on decompositions in multiscale, multi-orientation bases or dictionaries. They typically exhibit redundancy to improve sparsity in the transformed domain and sometimes its invariance with respect to simple geometric deformations (translation, rotation). Oriented multiscale dictionaries extend traditional wavelet processing and may offer rotation invariance. Highly redundant dictionaries require specific algorithms to simplify the search for an efficient (sparse) representation. We also discuss the extension of multiscale geometric decompositions to non-Euclidean domains such as the sphere or arbitrary meshed surfaces. The etymology of panorama suggests an overview, based on a choice of partially overlapping "pictures". We hope that this paper will contribute to the appreciation and apprehension of a stream of current research directions in image understanding. Comment: 65 pages, 33 figures, 303 references

    Data compression and harmonic analysis

    In this paper we review some recent interactions between harmonic analysis and data compression. The story goes back of course to Shannon’

    Iterative Solvers for Physics-based Simulations and Displays

    Realistic computer-generated images and simulations require complex models to properly capture the many subtle behaviors of each physical phenomenon. 
The mathematical equations underlying these models are complicated, and cannot be solved analytically. Numerical procedures must thus be used to obtain approximate solutions. These procedures are often iterative algorithms, where an initial guess is progressively improved to converge to a desired solution. Iterative methods are a convenient and efficient way to compute solutions to complex systems, and are at the core of most modern simulation methods. In this thesis by publication, we present three papers where iterative algorithms play a major role in a simulation or rendering method. First, we propose a method to improve the visual quality of fluid simulations. By creating a high-resolution surface representation around an input fluid simulation, stabilized with iterative methods, we introduce additional details on top of the simulation. Second, we describe a method to compute fluid simulations using model reduction. We design a novel vector field basis to represent fluid velocity, creating a method specifically tailored to improve all iterative components of the simulation. Finally, we present an algorithm to compute high-quality images for multifocal displays in a virtual reality context. Displaying images on multiple display layers incurs significant additional costs, but we formulate the image decomposition problem so as to allow an efficient solution using a simple iterative algorithm.
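
To make the idea of an iterative algorithm concrete (an initial guess that is progressively improved until it converges), here is a minimal sketch using Jacobi iteration on a small linear system. Jacobi is chosen purely for illustration and is an assumption: the abstract does not name the solvers used in the thesis. The function name `jacobi` and the example system are likewise hypothetical.

```python
def jacobi(A, b, x0, tol=1e-10, max_iter=500):
    """Solve A x = b by Jacobi iteration, starting from the guess x0.

    Assumption: A is strictly diagonally dominant, which guarantees convergence.
    """
    n = len(b)
    x = list(x0)
    for _ in range(max_iter):
        # Each unknown is updated using only the previous iterate.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Stop once successive iterates no longer change noticeably.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x


# Example system: 4x + y = 6 and x + 3y = 7, whose exact solution is x = 1, y = 2.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [6.0, 7.0]
solution = jacobi(A, b, x0=[0.0, 0.0])
```

The stopping test compares successive iterates, a common and inexpensive convergence criterion; production solvers often monitor the residual of the system instead.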

    NOVEL OFDM SYSTEM BASED ON DUAL-TREE COMPLEX WAVELET TRANSFORM

    The demand for higher and higher capacity in wireless networks, such as cellular, mobile and local area networks, is driving the development of new signaling techniques with improved spectral and power efficiencies. At all stages of a transceiver, from the bandwidth efficiency of the modulation schemes through the highly nonlinear power amplifiers of the transmitters to the channel sharing between different users, the problems relating to power usage and spectrum are aplenty. In the coming future, orthogonal frequency division multiplexing (OFDM) technology promises to be a ready solution to achieving the high data capacity and better spectral efficiency in wireless communication systems by virtue of its well-known and desirable characteristics. Towards these ends, this dissertation investigates a novel OFDM system based on dual-tree complex wavelet transform (D

    Toward sparse and geometry adapted video approximations

    Video signals are sequences of natural images, where images are often modeled as piecewise-smooth signals. Hence, video can be seen as a 3D piecewise-smooth signal made of piecewise-smooth regions that move through time. Based on the piecewise-smooth model and on related theoretical work on rate-distortion performance of wavelet and oracle based coding schemes, one can better analyze the appropriate coding strategies that adaptive video codecs need to implement in order to be efficient. Efficient video representations for coding purposes require the use of adaptive signal decompositions able to capture appropriately the structure and redundancy appearing in video signals. Adaptivity needs to be such that it allows for proper modeling of signals in order to represent these with the lowest possible coding cost. Video is a very structured signal with high geometric content. This includes temporal geometry (normally represented by motion information) as well as spatial geometry. Clearly, most of past and present strategies used to represent video signals do not exploit properly its spatial geometry. Similarly to the case of images, a very interesting approach seems to be the decomposition of video using large over-complete libraries of basis functions able to represent salient geometric features of the signal. In the framework of video, these features should model 2D geometric video components as well as their temporal evolution, forming spatio-temporal 3D geometric primitives. Through this PhD dissertation, different aspects on the use of adaptivity in video representation are studied looking toward exploiting both aspects of video: its piecewise nature and the geometry. The first part of this work studies the use of localized temporal adaptivity in subband video coding. This is done considering two transformation schemes used for video coding: 3D wavelet representations and motion compensated temporal filtering. 
A theoretical R-D analysis as well as empirical results demonstrate how temporal adaptivity improves coding performance of moving edges in 3D transform (without motion compensation) based video coding. At the same time, adaptivity allows redundancy in non-moving video areas to be exploited equally well. The analogy between motion compensated video and 1D piecewise-smooth signals is studied as well. This motivates the introduction of local length adaptivity within frame-adaptive motion compensated lifted wavelet decompositions. This allows an optimal rate-distortion performance when video motion trajectories are shorter than the transformation "Group Of Pictures", or when efficient motion compensation cannot be ensured. After studying temporal adaptivity, the second part of this thesis is dedicated to understanding the fundamentals of how temporal and spatial geometry can be jointly exploited. This work builds on some previous results that considered the representation of spatial geometry in video (but not temporal, i.e., without motion). In order to obtain flexible and efficient (sparse) signal representations using redundant dictionaries, the use of highly non-linear decomposition algorithms, like Matching Pursuit, is required. General signal representation using these techniques is still quite unexplored. For this reason, prior to the study of video representation, some aspects of non-linear decomposition algorithms and the efficient decomposition of images using Matching Pursuits and a geometric dictionary are investigated. A part of this investigation concerns the influence of using a priori models within non-linear approximation algorithms. Dictionaries with a high internal coherence have difficulty yielding optimally sparse signal representations when used with Matching Pursuits. 
It is proved, theoretically and empirically, that inserting a priori models into this algorithm improves its capacity to obtain sparse signal approximations, mainly when coherent dictionaries are used. Another point discussed in this preliminary study on the use of Matching Pursuits concerns the approach used in this work for the decomposition of video frames and images. The technique proposed in this thesis improves on previous work, where the authors had to resort to sub-optimal Matching Pursuit strategies (using Genetic Algorithms), given the size of the function library. In this work the use of full search strategies is made possible, while approximation efficiency is significantly improved and computational complexity is reduced. Finally, a priori based Matching Pursuit geometric decompositions are investigated for geometric video representations. Regularity constraints are taken into account to recover the temporal evolution of spatial geometric signal components. The results obtained for coding and multi-modal (audio-visual) signal analysis clarify many unknowns and are promising, encouraging further research on the subject.
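
For readers unfamiliar with Matching Pursuit, the following is a minimal sketch of the plain greedy algorithm with a full search over a small redundant dictionary. It is not the a priori based variant developed in the thesis, and the tiny dictionary (four unit spikes plus one normalised constant atom) is a hypothetical stand-in for a geometric dictionary.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition: at each step pick the unit-norm atom that is most
    correlated with the residual, then subtract its projection."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        correlations = [dot(residual, atom) for atom in atoms]
        # Full search over the dictionary for the largest |correlation|.
        best = max(range(len(atoms)), key=lambda k: abs(correlations[k]))
        coeff = correlations[best]
        residual = [r - coeff * a for r, a in zip(residual, atoms[best])]
        picks.append((best, coeff))
    return picks, residual


# Hypothetical redundant dictionary in R^4: atoms 0-3 are spikes, atom 4 is constant.
atoms = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    normalize([1.0, 1.0, 1.0, 1.0]),
]
signal = [3.5, 1.5, 1.5, 1.5]  # equals 3 * constant atom + 2 * first spike
picks, residual = matching_pursuit(signal, atoms, n_iter=2)
```

Because the constant atom overlaps with every spike, two greedy steps do not recover the two-atom construction of this signal exactly, which gives a small-scale view of why coherent dictionaries are hard for plain Matching Pursuit.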

    Applications of Wavelet Transforms to the Suppression of Coherent Noise from Seismic Data in the Pre-Stack Domain

    The wavelet transform, a relatively new mathematical technique, allows the analysis of non-stationary signals by using basis functions which are compact in time and frequency. The variables in the wavelet domain, scale (a frequency range), and translation (a temporal increment) can be associated with time-frequency, and so in the wavelet transform we have the potential to filter seismic signals in a pseudo time-frequency sense. The one-dimensional discrete multiresolution form of the wavelet transform can be effectively used to suppress low frequency coherent noise on seismic shot records. This process, achieved by the muting or weighting of coefficients in the wavelet transform domain, is demonstrated by suppressing low velocity, low frequency ground roll from land-based seismic data, the benefits of which are visible at both the shot and stack stages of the seismic processing stream. The extension of this technique to the suppression of higher frequency coherent noise is limited by the octave band splitting of frequency space by the transform. The wavelet packet transform, an extension of the wavelet transform, allows a more adaptable tiling of the time frequency domain which in turn allows the suppression of noise containing high frequencies whilst minimising signal distortion. This technique is demonstrated to be effective in suppressing airblast from land-based common receiver gathers, whilst minimising the distortion of reflected signals. These filtering techniques can be extended to two dimensions, filtering data in the two-dimensional wavelet and wavelet packet domains. This technique involves muting the transform coefficients in the wavelet/wavelet packet transform space which has four variables: temporal translation, offset translation, frequency scale and wavenumber scale. 
As in the one-dimensional case, the two-dimensional wavelet transform suffers from poor resolution due to the octave splitting of f-k space, but when used in combination with a velocity-based shift such as normal moveout, it can be used to filter data with minimal distortion to the residual signal. Extending the process to using the two-dimensional wavelet packet transform eliminates the shift requirement and leads to more effective filtering in the four variable transform space. The wavelet packet filtering technique is effective in suppressing low velocity noise from land-based seismic records, showing visible improvement in both the common shot records and resultant stack. The non-stationary properties of the wavelet transform allow filtering across geophone arrays (that is, the common shot record) by the application of the transform in the offset domain. Filtering of the wavelet coefficients, in combination with a linear or hyperbolic shift applied before and removed after filtering, allows discrimination against linear noise on common shot records associated with first breaks and hyperbolic events on common midpoint records such as multiples. The use of a simple muting technique in the wavelet domain effectively suppresses these forms of coherent noise. Where the velocity contrast between signal and noise is high, noise suppression is possible whilst preserving reflector amplitudes. Where the velocity contrast is smaller, weighting of the wavelet coefficients (based on transforms of the input signal after translation) allows noise suppression whilst preserving the amplitude versus offset relationships of the primary signal. This is shown to be effective on synthetic, marine and land-based data, with improvements observed on common shot records and resultant stacks. The results of all these wavelet transform based filtering techniques are sensitive to the choice of wavelet transform kernel wavelet. 
The suitability of a kernel wavelet for filtering can be related to the frequency spectra of the kernel wavelet. A fast rate of frequency amplitude fall-off at the edge of a given scale of basis wavelet minimises frequency overlap between neighbouring kernel wavelet scales and so minimises contamination by noise associated with aliasing in the filtered signal, a process that is inherent in the transform process. A flat amplitude response across the frequency range of a given scale also leads to improved filtering results
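
The basic muting operation described in this abstract (zeroing selected wavelet coefficients and inverting the transform) can be sketched in one dimension as follows. The Haar wavelet is used only because it is the simplest kernel to write out; it is an assumption, since the abstract does not say which kernel wavelets were tested, and the function names and toy trace are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x, levels):
    """Multi-level orthonormal Haar transform of a signal whose length is
    divisible by 2**levels. Returns (coarse approximation, list of detail bands)."""
    approx = list(x)
    details = []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        a = [(approx[i] + approx[i + 1]) / SQRT2 for i in pairs]
        d = [(approx[i] - approx[i + 1]) / SQRT2 for i in pairs]
        details.append(d)  # finest band first
        approx = a
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward by rebuilding from the coarsest band outwards."""
    a = list(approx)
    for d in reversed(details):
        rebuilt = []
        for ai, di in zip(a, d):
            rebuilt.append((ai + di) / SQRT2)
            rebuilt.append((ai - di) / SQRT2)
        a = rebuilt
    return a


def mute_low_frequency(x, levels):
    """Zero the coarsest (lowest-frequency) coefficients, then reconstruct."""
    approx, details = haar_forward(x, levels)
    muted = [0.0] * len(approx)
    return haar_inverse(muted, details)


# Toy trace: a slow offset of 10 (standing in for low-frequency noise)
# plus a fast alternation of +/-1 (standing in for signal).
trace = [10.0 + (-1.0) ** i for i in range(8)]
filtered = mute_low_frequency(trace, levels=3)
```

Weighting rather than muting, as also mentioned in the abstract, would replace the zeros with scaled copies of the original coefficients.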

    Directional edge and texture representations for image processing

    An efficient representation for natural images is of fundamental importance in image processing and analysis. The commonly used separable transforms such as wavelets are not best suited for images due to their inability to exploit directional regularities such as edges and oriented textural patterns, while most of the recently proposed directional schemes cannot represent these two types of features in a unified transform. This thesis focuses on the development of directional representations for images which can capture both edges and textures in a multiresolution manner. The thesis first considers the problem of extracting linear features with the multiresolution Fourier transform (MFT). Based on a previous MFT-based linear feature model, the work extends the extraction method to the situation in which the image is corrupted by noise. The problem is tackled by the combination of a "Signal+Noise" frequency model, a refinement stage and a robust classification scheme. As a result, the MFT is able to perform linear feature analysis on noisy images on which previous methods failed. A new set of transforms called the multiscale polar cosine transforms (MPCT) is also proposed in order to represent textures. The MPCT can be regarded as a real-valued MFT with similar basis functions of oriented sinusoids. It is shown that the transform can represent textural patches more efficiently than the conventional Fourier basis. With a directional best cosine basis, the MPCT packet (MPCPT) is shown to be an efficient representation for edges and textures, despite its high computational burden. The problem of representing edges and textures in a fixed transform with less complexity is then considered. This is achieved by applying a Gaussian frequency filter, which matches the dispersion of the magnitude spectrum, on the local MFT coefficients. This is particularly effective in denoising natural images, due to its ability to preserve both types of feature. 
Further improvements can be made by employing the information given by the linear feature extraction process in the filter's configuration. The denoising results compare favourably against other state-of-the-art directional representations