26 research outputs found

    Directional edge and texture representations for image processing

    An efficient representation for natural images is of fundamental importance in image processing and analysis. The commonly used separable transforms such as wavelets are not best suited for images due to their inability to exploit directional regularities such as edges and oriented textural patterns, while most of the recently proposed directional schemes cannot represent these two types of features in a unified transform. This thesis focuses on the development of directional representations for images which can capture both edges and textures in a multiresolution manner. The thesis first considers the problem of extracting linear features with the multiresolution Fourier transform (MFT). Based on a previous MFT-based linear feature model, the work extends the extraction method to the situation in which the image is corrupted by noise. The problem is tackled by the combination of a "Signal+Noise" frequency model, a refinement stage and a robust classification scheme. As a result, the MFT is able to perform linear feature analysis on noisy images on which previous methods failed. A new set of transforms called the multiscale polar cosine transforms (MPCT) is also proposed in order to represent textures. The MPCT can be regarded as a real-valued MFT with similar basis functions of oriented sinusoids. It is shown that the transform can represent textural patches more efficiently than the conventional Fourier basis. With a directional best cosine basis, the MPCT packet (MPCPT) is shown to be an efficient representation for edges and textures, despite its high computational burden. The problem of representing edges and textures in a fixed transform with less complexity is then considered. This is achieved by applying a Gaussian frequency filter, which matches the dispersion of the magnitude spectrum, to the local MFT coefficients. This is particularly effective in denoising natural images, due to its ability to preserve both types of feature. Further improvements can be made by employing the information given by the linear feature extraction process in the filter's configuration. The denoising results compare favourably against other state-of-the-art directional representations.
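
    A minimal sketch of the Gaussian spectral filtering idea is given below (illustrative, not the thesis' MFT implementation): each local block's frequency coefficients are attenuated by a Gaussian whose axis-aligned variances match the dispersion of the block's magnitude spectrum. The block size, the separable (cross-covariance-free) Gaussian and the omission of overlap-add are assumptions.

        import numpy as np

        def gaussian_spectral_filter(block, floor=1e-8):
            """Weight local FFT coefficients with a Gaussian matched to the
            second moments (dispersion) of the block's magnitude spectrum."""
            F = np.fft.fftshift(np.fft.fft2(block))
            mag = np.abs(F)
            n = block.shape[0]
            u = np.arange(n) - n // 2
            U, V = np.meshgrid(u, u, indexing="ij")
            w = mag / (mag.sum() + floor)               # normalised spectral weights
            mu_u, mu_v = (w * U).sum(), (w * V).sum()   # spectral centroid
            var_u = (w * (U - mu_u) ** 2).sum() + floor
            var_v = (w * (V - mu_v) ** 2).sum() + floor
            G = np.exp(-0.5 * ((U - mu_u) ** 2 / var_u + (V - mu_v) ** 2 / var_v))
            return np.real(np.fft.ifft2(np.fft.ifftshift(F * G)))

        # Usage on a single 32x32 noisy patch (overlap-add over a full image is omitted):
        denoised = gaussian_spectral_filter(np.random.randn(32, 32))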

    Wavelet methods in speech recognition

    In this thesis, novel wavelet techniques are developed to improve the parametrization of speech signals prior to classification. It is shown that non-linear operations carried out in the wavelet domain improve the performance of a speech classifier and consistently outperform classical Fourier methods. This is because of the localised nature of the wavelet, which captures correspondingly well-localised time-frequency features within the speech signal. Furthermore, by taking advantage of the approximation ability of wavelets, an efficient representation of the non-stationarity inherent in speech can be achieved in a relatively small number of expansion coefficients. This is an attractive option when faced with the so-called 'Curse of Dimensionality' problem of multivariate classifiers such as Linear Discriminant Analysis (LDA) or Artificial Neural Networks (ANNs). Conventional time-frequency analysis methods such as the Discrete Fourier Transform either miss irregular signal structures and transients due to spectral smearing or require a large number of coefficients to represent such characteristics efficiently. Wavelet theory offers an alternative insight into the representation of these types of signals. As an extension to the standard wavelet transform, adaptive libraries of wavelet and cosine packets are introduced which increase the flexibility of the transform. This approach is observed to be yet more suitable for the highly variable nature of speech signals in that it results in a time-frequency sampled grid that is well adapted to irregularities and transients. These adaptive libraries result in a corresponding reduction in the misclassification rate of the recognition system, although necessarily at the expense of added computing time. Finally, a framework based on adaptive time-frequency libraries is developed which invokes the final classifier to choose the nature of the resolution for a given classification problem. The classifier then performs dimensionality reduction on the transformed signal by choosing the top few features based on their discriminant power. This approach is compared and contrasted to an existing discriminant wavelet feature extractor. The overall conclusions of the thesis are that wavelets and their relatives are capable of extracting useful features for speech classification problems. The use of adaptive wavelet transforms provides the flexibility within which powerful feature extractors can be designed for these types of application.
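
    As an illustrative sketch of wavelet-packet feature extraction followed by discriminant-power selection (using PyWavelets; the 'db4' wavelet, decomposition depth and Fisher-style score are assumptions, not the thesis' exact configuration):

        import numpy as np
        import pywt

        def wp_energies(frame, wavelet="db4", level=4):
            """Energy of each wavelet-packet subband at the given depth."""
            wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, maxlevel=level)
            nodes = wp.get_level(level, order="freq")
            return np.array([np.sum(np.square(n.data)) for n in nodes])

        def fisher_scores(X0, X1):
            """Per-feature (m0 - m1)^2 / (v0 + v1) score for two classes of feature rows."""
            m0, m1 = X0.mean(0), X1.mean(0)
            v0, v1 = X0.var(0) + 1e-12, X1.var(0) + 1e-12
            return (m0 - m1) ** 2 / (v0 + v1)

        # Keep the top few most discriminant subbands before a classifier such as LDA:
        X0 = np.vstack([wp_energies(np.random.randn(512)) for _ in range(20)])
        X1 = np.vstack([wp_energies(np.random.randn(512) * 2) for _ in range(20)])
        top = np.argsort(fisher_scores(X0, X1))[::-1][:5]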

    Multiscale Methods in Image Modelling and Image Processing

    The field of modelling and processing of 'images' has fairly recently become important, even crucial, to areas of science, medicine, and engineering. The inevitable explosion of imaging modalities and approaches stemming from this fact has become a rich source of mathematical applications. 'Imaging' is quite broad, and suffers somewhat from this broadness. The general question of 'what is an image?' or perhaps 'what is a natural image?' turns out to be difficult to address. To make real headway one may need to strongly constrain the class of images being considered, as will be done in part of this thesis. On the other hand there are general principles that can guide research in many areas. One such principle considered is the assertion that (classes of) images have multiscale relationships, whether at a pixel level, between features, or other variants. There are both practical (in terms of computational complexity) and more philosophical reasons (mimicking the human visual system, for example) that suggest looking at such methods. Looking at scaling relationships may also have the advantage of opening a problem up to many mathematical tools. This thesis will detail two investigations into multiscale relationships, in quite different areas. One will involve Iterated Function Systems (IFS), and the other a stochastic approach to reconstruction of binary images (binary phase descriptions of porous media). The use of IFS in this context, which has often been called 'fractal image coding', has been primarily viewed as an image compression technique. We will re-visit this approach, proposing it as a more general tool. Some study of the implications of that idea will be presented, along with applications inferred by the results. In the area of reconstruction of binary porous media, a novel, multiscale, hierarchical annealing approach is proposed and investigated.
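
    A minimal sketch of the range/domain block matching at the core of IFS-based ('fractal') image coding, which the thesis revisits as a more general tool; the block sizes, the domain search grid and the grey-level affine fit are illustrative assumptions:

        import numpy as np

        def encode_block(range_blk, image, R=8):
            """Find the domain block whose 2x-downsampled version best predicts
            range_blk under an affine grey-level map a*D + b."""
            H, W = image.shape
            best = None
            for i in range(0, H - 2 * R + 1, R):
                for j in range(0, W - 2 * R + 1, R):
                    D = image[i:i + 2 * R, j:j + 2 * R]
                    D = D.reshape(R, 2, R, 2).mean(axis=(1, 3))   # downsample 2x
                    d, r = D.ravel(), range_blk.ravel()
                    a, b = np.polyfit(d, r, 1)                    # least-squares contrast/brightness
                    err = np.sum((a * d + b - r) ** 2)
                    if best is None or err < best[0]:
                        best = (err, i, j, a, b)
            return best  # (error, domain position i, j, contrast a, brightness b)

        # Encode one 8x8 range block of a synthetic 64x64 image:
        rng = np.random.default_rng(0)
        img = rng.random((64, 64))
        params = encode_block(img[:8, :8], img)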

    Sparse Approximation and Dictionary Learning with Applications to Audio Signals

    Over-complete transforms have recently become the focus of a wealth of research in signal processing, machine learning, statistics and related fields. Their great modelling flexibility makes it possible to find sparse representations and approximations of data that in turn prove to be very efficient in a wide range of applications. Sparse models express signals as linear combinations of a few basis functions, called atoms, taken from a so-called dictionary. Finding the optimal dictionary from a set of training signals of a given class is the objective of dictionary learning and the main focus of this thesis. The experimental evidence presented here focuses on the processing of audio signals, and the role of sparse algorithms in audio applications is accordingly highlighted. The first main contribution of this thesis is the development of a pitch-synchronous transform where the frame-by-frame analysis of audio data is adapted so that each frame analysing periodic signals contains an integer number of periods. This algorithm presents a technique for adapting transform parameters to the audio signal to be analysed; it is shown to improve the sparsity of the representation compared to a non-pitch-synchronous approach and is further evaluated in the context of source separation by binary masking. A second main contribution is the development of a novel model and associated algorithm for dictionary learning of convolved signals, where the observed variables are sparsely approximated by the atoms contained in a convolved dictionary. An algorithm is devised to learn the impulse response applied to the dictionary, and experimental results on synthetic data show the superior approximation performance of the proposed method compared to a state-of-the-art dictionary learning algorithm. Finally, a third main contribution is the development of methods for learning dictionaries that are both well adapted to a training set of data and mutually incoherent. Two novel algorithms, namely the incoherent K-SVD and the iterative projections and rotations (IPR) algorithm, are introduced and compared to different techniques published in the literature in a sparse approximation context. The IPR algorithm in particular is shown to outperform the benchmark techniques in learning very incoherent dictionaries while maintaining a good signal-to-noise ratio of the representation.
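
    The sketch below illustrates mutual coherence and one Gram-matrix projection step of the kind used by incoherence-promoting dictionary learners such as IPR; the target coherence, the clipping rule and the rank-truncation step are illustrative assumptions rather than the exact published algorithm:

        import numpy as np

        def mutual_coherence(D):
            """Largest absolute inner product between distinct unit-norm atoms."""
            Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
            G = Dn.T @ Dn
            return np.max(np.abs(G - np.eye(G.shape[0])))

        def decorrelate_step(D, mu_target=0.2):
            """Clip off-diagonal Gram entries to +/- mu_target, then map back to a
            dictionary of the original rank via an eigendecomposition."""
            Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
            G = Dn.T @ Dn
            off = G - np.diag(np.diag(G))
            G = np.diag(np.diag(G)) + np.clip(off, -mu_target, mu_target)
            w, V = np.linalg.eigh(G)                        # ascending eigenvalues
            k = D.shape[0]                                  # keep rank <= signal dimension
            w = np.clip(w[-k:], 0, None)
            Dn = np.diag(np.sqrt(w)) @ V[:, -k:].T          # square-root factor of G
            return Dn / np.linalg.norm(Dn, axis=0, keepdims=True)

        # One decorrelation step on a random 32x64 over-complete dictionary:
        D = np.random.randn(32, 64)
        print(mutual_coherence(D), mutual_coherence(decorrelate_step(D)))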

    Models and analysis of vocal emissions for biomedical applications

    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial developments in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.

    High-speed imaging with optical encoding and compressive sensing

    Imaging instruments can be used to obtain a series of frames in domains such as frequency and time. Recent advancements in medical, astronomical, scientific and consumer applications demand overall improvements in these imaging systems. Many current imaging methods rely on the well-known Shannon-Nyquist theorem, and sustaining this conventional model increases the system complexity, data rate, storage and processing power, as well as the overall build cost of these units. Recent investigations based on the mathematical theory of compressed sensing (CS) have moved beyond traditional sampling mechanisms and introduced alternative methods of data sampling. This dissertation investigates current advancements in high-speed imaging schemes and proposes new methods and optical designs to improve the spatial and temporal resolution, as well as the required transmission and storage capacity, of imaging systems. First, we investigate current mathematical models of CS-based algorithms in video acquisition systems and propose an improved adapted technique for data reconstruction. Then we investigate state-of-the-art high-speed imaging methods and introduce optical encoding techniques that enable current high-speed imaging systems to reach frame rates 10 times faster whilst preserving the spatial resolution of existing systems. Second, we develop a novel high-speed imaging system that implements a CS-based optical imaging technique and experimentally demonstrate the operation of this novel imaging system. The proposed compressive coded rotating mirror (CCRM) camera benefits from noticeably improved physical dimensions, highly reduced build costs and significantly simplified operation compared to other high-speed cameras. Due to the built-in optical encoding and on-the-fly compression functionalities of the CCRM camera, it becomes a viable option for fields such as medical and military imaging applications, where the security of the data remains one of the top priorities in imaging instruments. Finally, we discuss potential improvements to the CCRM camera and propose several advancement plans for the future of this system.
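
    As an illustrative sketch of CS recovery from random projections via iterative soft-thresholding (ISTA); the sensing matrix, step size and regularisation weight are assumptions, and this is not the CCRM camera's encoding or reconstruction pipeline:

        import numpy as np

        def ista(y, A, lam=0.05, n_iter=200):
            """Minimise 0.5*||A x - y||^2 + lam*||x||_1 by iterative soft-thresholding."""
            x = np.zeros(A.shape[1])
            L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
            for _ in range(n_iter):
                x = x - (A.T @ (A @ x - y)) / L    # gradient step
                x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)  # soft threshold
            return x

        # Recover a sparse 256-sample signal from 80 random measurements:
        rng = np.random.default_rng(0)
        x_true = np.zeros(256)
        x_true[rng.choice(256, 8, replace=False)] = 1.0
        A = rng.standard_normal((80, 256)) / np.sqrt(80)
        x_hat = ista(A @ x_true, A)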

    Transform domain texture synthesis on surfaces

    In the recent past, application areas such as virtual reality experiences, digital cinema and computer gaming have resulted in a renewed interest in advanced research topics in computer graphics. Although many research challenges in computer graphics have been met due to worldwide efforts, many more are yet to be met. Two key challenges which still remain open research problems are the lack of perfect realism in animated/virtually-created objects when represented in graphical format, and the need for the transmission/storage/exchange of a massive amount of information between remote locations when 3D computer-generated objects are used in remote visualisations. These challenges call for further research to be focused in the above directions. Though a significant number of ideas have been proposed by the international research community in their effort to meet the above challenges, the ideas still suffer from complexity-related issues, resulting in high processing times and practical inapplicability when bandwidth-constrained transmission media are used or when the storage space or computational power of the display device is limited. In the proposed work we investigate the appropriate use of geometric representations of 3D structure (e.g. Bezier surfaces, NURBS, polygons) and multi-resolution, progressive representation of texture on such surfaces. This joint approach to texture synthesis has not been considered before and has significant potential in resolving current challenges in the virtual realism, digital cinema and computer gaming industries. The main focus of the novel approaches proposed in this thesis is performing photo-realistic texture synthesis on surfaces. We have provided experimental results and detailed analysis to prove that the proposed algorithms allow fast, progressive building of texture on arbitrarily shaped 3D surfaces. In particular we investigate the above ideas in association with the Bezier patch representation of 3D objects, an approach which has not been considered so far by any published worldwide research effort, yet has flexibility of utmost practical importance. Further, we have discussed the novel application domains that can be served by the inclusion of additional functionality within the proposed algorithms.
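
    A minimal sketch of evaluating a bicubic Bezier patch, the kind of surface primitive onto which a progressively refined texture would be synthesised; the control points and sampling density are illustrative assumptions:

        import numpy as np

        def bernstein3(t):
            """The four cubic Bernstein basis values at parameter t."""
            return np.array([(1 - t) ** 3, 3 * t * (1 - t) ** 2,
                             3 * t ** 2 * (1 - t), t ** 3])

        def bezier_patch_point(P, u, v):
            """P is a 4x4x3 grid of control points; returns the surface point at (u, v),
            whose parameters also serve as texture coordinates on the surface."""
            bu, bv = bernstein3(u), bernstein3(v)
            return np.einsum("i,ijk,j->k", bu, P, bv)

        # Sample the patch on a coarse grid (to be refined progressively):
        P = np.random.rand(4, 4, 3)
        grid = np.array([[bezier_patch_point(P, u, v)
                          for v in np.linspace(0, 1, 8)] for u in np.linspace(0, 1, 8)])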

    Single- and multi-microphone speech dereverberation using spectral enhancement

    In speech communication systems, such as voice-controlled systems, hands-free mobile telephones, and hearing aids, the received microphone signals are degraded by room reverberation, background noise, and other interferences. This signal degradation may lead to total unintelligibility of the speech and decreases the performance of automatic speech recognition systems. In the context of this work, reverberation is the process of multi-path propagation of an acoustic sound from its source to one or more microphones. The received microphone signal generally consists of a direct sound, reflections that arrive shortly after the direct sound (commonly called early reverberation), and reflections that arrive after the early reverberation (commonly called late reverberation). Reverberant speech can be described as sounding distant, with noticeable echo and colouration. These detrimental perceptual effects are primarily caused by late reverberation, and generally increase with increasing distance between the source and microphone. Conversely, early reverberation tends to improve the intelligibility of speech; in combination with the direct sound it is sometimes referred to as the early speech component. Reduction of the detrimental effects of reflections is evidently of considerable practical importance, and is the focus of this dissertation. More specifically, the dissertation deals with dereverberation techniques, i.e., signal processing techniques to reduce the detrimental effects of reflections. In the dissertation, novel single- and multi-microphone speech dereverberation algorithms are developed that aim at the suppression of late reverberation, i.e., at estimation of the early speech component. This is done via so-called spectral enhancement techniques that require a specific measure of the late reverberant signal. This measure, called the spectral variance, can be estimated directly from the received (possibly noisy) reverberant signal(s) using a statistical reverberation model and a limited amount of a priori knowledge about the acoustic channel(s) between the source and the microphone(s). In our work an existing single-channel statistical reverberation model serves as a starting point. The model is characterized by one parameter that depends on the acoustic characteristics of the environment. We show that the spectral variance estimator that is based on this model can only be used when the source-microphone distance is larger than the so-called critical distance. This is, crudely speaking, the distance where the direct sound power is equal to the total reflective power. A generalization of the statistical reverberation model in which the direct sound is incorporated is developed. This model requires one additional parameter that is related to the ratio between the direct sound energy and the sound energy of all reflections. The generalized model is used to derive a novel spectral variance estimator. When the novel estimator is used for dereverberation rather than the existing estimator, and the source-microphone distance is smaller than the critical distance, the dereverberation performance is significantly increased. Single-microphone systems only exploit the temporal and spectral diversity of the received signal. Reverberation, of course, also induces spatial diversity. To additionally exploit this diversity, multiple microphones must be used, and their outputs must be combined by a suitable spatial processor such as the so-called delay-and-sum beamformer. It is not a priori evident whether spectral enhancement is best done before or after the spatial processor. For this reason we investigate both possibilities, as well as a merging of the spatial processor and the spectral enhancement technique. An advantage of the latter option is that the spectral variance estimator can be further improved. Our experiments show that the use of multiple microphones affords a significant improvement in perceptual speech quality. The applicability of the theory developed in this dissertation is demonstrated using a hands-free communication system. Since hands-free systems are often used in a noisy and reverberant environment, the received microphone signal does not only contain the desired signal but also interferences such as room reverberation caused by the desired source, background noise, and a far-end echo signal that results from a sound produced by the loudspeaker. Usually an acoustic echo canceller is used to cancel the far-end echo. Additionally, a post-processor is used to suppress background noise and residual echo, i.e., echo which could not be cancelled by the echo canceller. In this work a novel structure and post-processor for an acoustic echo canceller are developed. The post-processor suppresses late reverberation caused by the desired source, residual echo, and background noise. The late reverberation and late residual echo are estimated using the generalized statistical reverberation model. Experimental results convincingly demonstrate the benefits of the proposed system for suppressing late reverberation, residual echo and background noise. The proposed structure and post-processor have a low computational complexity and a highly modular structure, can be seamlessly integrated into existing hands-free communication systems, and afford a significant increase in listening comfort and speech intelligibility.
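
    As an illustrative sketch of spectral enhancement driven by an exponential-decay statistical reverberation model, in the spirit of the single-channel model used as a starting point: the late-reverberant spectral variance is predicted from an earlier frame via a decay rate set by the reverberation time, and a Wiener-like gain suppresses it. The reverberation time, frame shift, late-reverberation delay and gain floor below are assumptions, not the dissertation's exact estimator.

        import numpy as np

        def dereverb_gains(spec, fs, t60=0.6, hop=256, n_late=6, floor=0.1):
            """spec: STFT magnitude-squared, shape (frames, bins). Returns per-bin gains."""
            delta = 3.0 * np.log(10.0) / t60              # room decay rate from T60
            decay = np.exp(-2.0 * delta * n_late * hop / fs)
            frames, _ = spec.shape
            gains = np.ones_like(spec)
            for l in range(n_late, frames):
                lam_late = decay * spec[l - n_late]       # predicted late-reverb variance
                gains[l] = np.maximum(1.0 - lam_late / (spec[l] + 1e-12), floor)
            return gains  # multiply the STFT by these gains, then inverse-transform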

    Computer Aided Dysplasia Grading for Barrett’s Oesophagus Virtual Slides

    Dysplasia grading in Barrett's Oesophagus has been an issue among pathologists worldwide. Despite the increasing number of sufferers every year, especially in Western countries, dysplasia in Barrett's Oesophagus can only be graded by a trained pathologist through visual examination. Therefore, we present our work on extracting textural and spatial features from the tissue regions. Our first approach is to extract only the epithelial layer of the tissue, based on the grading rules used by pathologists. This is carried out by extracting sub-images of a certain window size along the tissue's epithelial layer. The textural features of these sub-images were used to grade regions as dysplasia or not-dysplasia, and we achieved 82.5% AP with a 0.82 precision and 0.86 recall value. In doing so, we have managed to overcome the 'boundary-effect' issues that have usually been avoided by selecting or cropping tissue images without the boundary. Secondly, the textural and spatial features of the whole tissue in the region were investigated. Experiments were carried out using Grey Level Co-occurrence Matrices at the pixel level, with a brute-force approach, to cluster patches based on their texture similarities. Then, we developed a texture-mapping technique that translates the spatial arrangement of tissue texture within a tissue region at the patch level. As a result, three binary decision tree models were developed from the texture-mapping image to grade each annotated region as dysplasia Grade 1, Grade 3 or Grade 5, with accuracy percentages of 87.5%, 75.0% and 81.3% and kappa scores of 0.75, 0.5 and 0.63 respectively. A binary decision tree was then used on the spatial arrangement of the tissue texture types with respect to the epithelial layer to help grade the regions. Accuracy percentages of 75.0%, 68.8% and 68.8%, with kappa values of 0.5, 0.37 and 0.37, were achieved respectively for dysplasia Grade 1, Grade 3 and Grade 5. Based on the results achieved, we can conclude that the spatial information of tissue texture types with regard to the epithelial layer is not as strong a cue as it is on the whole region. The binary decision tree grading models were then applied to the broader tissue area: the whole virtual pathology slide itself. The consensus grading for each tissue is calculated with a positivity table and scoring method. Finally, we present our own thresholded frequency method to grade virtual slides based on the frequency of grading occurrence, and the results were compared to the pathologist's grading. A high agreement score of 0.80 KV was achieved, which is a substantial improvement compared to simple frequency scoring, which achieved only 0.47 KV.
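
    A minimal sketch of grey-level co-occurrence (GLCM) texture features of the kind used to describe tissue patches before decision-tree grading; the quantisation, offsets and chosen properties are assumptions (scikit-image naming):

        import numpy as np
        from skimage.feature import graycomatrix, graycoprops

        def glcm_features(patch, levels=64):
            """Contrast/homogeneity/energy/correlation of one grey-scale patch."""
            q = (patch.astype(float) / (patch.max() + 1e-12) * (levels - 1)).astype(np.uint8)
            glcm = graycomatrix(q, distances=[1, 2], angles=[0, np.pi / 2],
                                levels=levels, symmetric=True, normed=True)
            props = ["contrast", "homogeneity", "energy", "correlation"]
            return np.hstack([graycoprops(glcm, p).ravel() for p in props])

        # Feature vectors like this, computed over sub-images along the epithelial layer
        # or over whole-region patches, would feed the binary decision trees.
        features = glcm_features(np.random.randint(0, 255, (64, 64)))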