105 research outputs found

    Model-Based Speech Enhancement

    Get PDF
    Abstract A method of speech enhancement is developed that reconstructs clean speech from a set of acoustic features using a harmonic plus noise model of speech. This is a significant departure from traditional filtering-based methods of speech enhancement. A major challenge with this approach is to estimate accurately the acoustic features (voicing, fundamental frequency, spectral envelope and phase) from noisy speech. This is achieved using maximum a-posteriori (MAP) estimation methods that operate on the noisy speech. In each case a prior model of the relationship between the noisy speech features and the estimated acoustic feature is required. These models are approximated using speaker-independent GMMs of the clean speech features that are adapted to speaker-dependent models using MAP adaptation and for noise using the Unscented Transform. Objective results are presented to optimise the proposed system and a set of subjective tests compare the approach with traditional enhancement methods. Threeway listening tests examining signal quality, background noise intrusiveness and overall quality show the proposed system to be highly robust to noise, performing significantly better than conventional methods of enhancement in terms of background noise intrusiveness. However, the proposed method is shown to reduce signal quality, with overall quality measured to be roughly equivalent to that of the Wiener filter

    New Stategies for Single-channel Speech Separation

    Get PDF

    An Indirect Speech Enhancement Framework Through Intermediate Noisy Speech Targets

    Get PDF
    Noise presents a severe challenge in speech communication and processing systems. Speech enhancement aims at removing the inference and restoring speech quality. It is an essential step in a speech processing pipeline in many modern electronic devices, such as mobile phones and smart speakers. Traditionally, speech engineers have relied on signal processing techniques, such as spectral subtraction or Wiener filtering. Since the advent of deep learning, data-driven methods have offered an alternative solution to speech enhancement. Researchers and engineers have proposed various neural network architectures to map noisy speech features into clean ones. In this thesis, we refer to this class of mapping based data-driven techniques collectively as a direct method in speech enhancement. The output speech from direct mapping methods usually contains noise residue and unpleasant distortion if the speech power is low relative to the noise power or the background noise is very complex. The former adverse condition refers to low signal-to-noise-ratio (SNR). The latter condition implies difficult noise types. Researchers have proposed improving the SNR of speech signal incrementally during enhancement to overcome such difficulty, known as SNR-progressive speech enhancement. This design breaks down the problem of direct mapping into manageable sub-tasks. Inspired by the previous work, we propose to adopt a multi-stage indirect approach to speech enhancement in challenging noise conditions. Unlike SNR-progressive speech enhancement, we gradually transform noisy speech from difficult background noise to speech in simple noise types. The thesis's focus will include the characterization of background noise, speech transformation techniques, and integration of an indirect speech enhancement system.Ph.D

    Hierarchical learning : theory with applications in speech and vision

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 2009.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Cataloged from student submitted PDF version of thesis.Includes bibliographical references (p. 123-132).Over the past two decades several hierarchical learning models have been developed and applied to a diverse range of practical tasks with much success. Little is known, however, as to why such models work as well as they do. Indeed, most are difficult to analyze, and cannot be easily characterized using the established tools from statistical learning theory. In this thesis, we study hierarchical learning architectures from two complementary perspectives: one theoretical and the other empirical. The theoretical component of the thesis centers on a mathematical framework describing a general family of hierarchical learning architectures. The primary object of interest is a recursively defined feature map, and its associated kernel. The class of models we consider exploit the fact that data in a wide variety of problems satisfy a decomposability property. Paralleling the primate visual cortex, hierarchies are assembled from alternating filtering and pooling stages that build progressively invariant representations which are simultaneously selective for increasingly complex stimuli. A goal of central importance in the study of hierarchical architectures and the cortex alike, is that of understanding quantitatively the tradeoff between invariance and selectivity, and how invariance and selectivity contribute towards providing an improved representation useful for learning from data. A reasonable expectation is that an unsupervised hierarchical representation will positively impact the sample complexity of a corresponding supervised learning task.(cont.) We therefore analyze invariance and discrimination properties that emerge in particular instances of layered models described within our framework. A group-theoretic analysis leads to a concise set of conditions which must be met to establish invariance, as well as a constructive prescription for meeting those conditions. An information-theoretic analysis is then undertaken and seen as a means by which to characterize a model's discrimination properties. The empirical component of the thesis experimentally evaluates key assumptions built into the mathematical framework. In the case of images, we present simulations which support the hypothesis that layered architectures can reduce the sample complexity of a non-trivial learning problem. In the domain of speech, we describe a 3 localized analysis technique that leads to a noise-robust representation. The resulting biologically-motivated features are found to outperform traditional methods on a standard phonetic classification task in both clean and noisy conditions.by Jacob V. Bouvrie.Ph.D

    Artificial Intelligence for Multimedia Signal Processing

    Get PDF
    Artificial intelligence technologies are also actively applied to broadcasting and multimedia processing technologies. A lot of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and these attempts have been made in the past two to three years to improve image, video, speech, and other data compression efficiency in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and creating scenarios are very important areas of research in multimedia processing and engineering. This book contains a collection of some topics broadly across advanced computational intelligence algorithms and technologies for emerging multimedia signal processing as: Computer vision field, speech/sound/text processing, and content analysis/information mining

    Novel deep learning architectures for marine and aquaculture applications

    Get PDF
    Alzayat Saleh's research was in the area of artificial intelligence and machine learning to autonomously recognise fish and their morphological features from digital images. Here he created new deep learning architectures that solved various computer vision problems specific to the marine and aquaculture context. He found that these techniques can facilitate aquaculture management and environmental protection. Fisheries and conservation agencies can use his results for better monitoring strategies and sustainable fishing practices

    Detection of instability in power systems using connectionism

    Get PDF
    • …
    corecore