33 research outputs found

    Voice Source Waveform Analysis and Synthesis Using Principal Component Analysis and Gaussian Mixture Modelling

    The paper presents voice source waveform modeling techniques based on principal component analysis (PCA) and Gaussian mixture modeling (GMM). The voice source is obtained by inverse-filtering speech with the estimated vocal tract filter. This decomposition is useful in speech analysis, synthesis, recognition and coding. Here, a data-driven approach is presented for signal decomposition and classification based on the principal components of the voice source. The principal components are analyzed and the 'prototype' voice source signals corresponding to the Gaussian mixture means are examined. We show how an unknown signal can be decomposed into its components and/or prototypes and resynthesized. We show how the techniques are suited for both low-bitrate and high-quality analysis/synthesis schemes.

    Hilbert phase methods for glottal activity detection

    The 2π discontinuities found in the wrapped Hilbert phase of the bandpass-filtered analytic DEGG signal provide accurate candidate locations of glottal closure instances (GCIs). Pruning these GCI candidates with an automatically determined amplitude threshold, found by iteratively removing from the full signal the inlier samples within a fraction of its standard deviation until converged, yields a 99.6% accurate detection system with a false alarm rate of 0.17%. This simpler algorithm, named Glottal Activity Detector For Laryngeal Input (GADFLI), outperforms the state-of-the-art SIGMA algorithm for GCI detection, which has a 94.2% detection rate, but a 5.46% false alarm rate. Performance metrics were computed over the entire APLAWD database, using an extensive, hand-verified markings database of 10,944 waveforms. A related proposed algorithm, QuickGCI, also makes use of Hilbert phase discontinuities, and does not require a thresholding post-processing step for GCI selection. Its performance is nearly as good as GADFLI. Both proposed algorithms operate using the electroglottographic signal or acoustic speech signal.
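
The thresholding step described in this abstract can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the published GADFLI implementation: the function name `iterative_threshold`, the `fraction` parameter, and the use of the RMS about zero in place of the standard deviation are all assumptions, and the Hilbert-phase candidate generation is not shown.

```python
import math


def iterative_threshold(samples, fraction=0.5, max_iter=100):
    """Estimate an amplitude threshold by repeatedly discarding inliers.

    Assumption: "inliers" are samples whose magnitude is at most
    fraction * RMS of the samples still under consideration.
    The loop stops when a pass discards nothing (converged).
    """
    if not samples:
        raise ValueError("samples must not be empty")
    kept = [abs(s) for s in samples]
    threshold = 0.0
    for _ in range(max_iter):
        rms = math.sqrt(sum(a * a for a in kept) / len(kept))
        threshold = fraction * rms
        outliers = [a for a in kept if a > threshold]
        # Stop if nothing was discarded, or if nothing would remain
        # (the latter only happens for an all-zero signal).
        if not outliers or len(outliers) == len(kept):
            break
        kept = outliers
    return threshold


def prune_candidates(samples, candidate_indices, threshold):
    """Keep only candidate locations whose sample magnitude exceeds the threshold."""
    return [i for i in candidate_indices if abs(samples[i]) > threshold]
```

With a mostly quiet signal containing a few large peaks, the threshold settles between the background level and the peak level, so `prune_candidates` retains only the peaks.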

    Database recovery

    Recovery techniques are an important aspect of database systems. They are essential to ensure that data integrity is maintained after any type of failure occurs. The recovery mechanism must be designed so that the availability and performance of the system are not unacceptably impacted by the recovery algorithms running during normal execution. On the other hand, enough information must be stored so that the database can be restored or transactions backed out in a reasonable amount of time. Concepts, techniques, and problems associated with database recovery will be presented in this thesis. The recovery issues for both centralized and distributed systems will be discussed, along with the tradeoffs of different recovery tools. The database recovery schemes in IMS/VS, DB2 and SDD-1 will be described to show approaches in existing systems.

    Multiple source localization using spherical microphone arrays

    Direction-of-Arrival (DOA) estimation is a fundamental task in acoustic signal processing and is used in source separation, localization, tracking, environment mapping, speech enhancement and dereverberation. In applications such as hearing aids, robot audition, teleconferencing and meeting diarization, multiple simultaneously active sources are often present. Therefore DOA estimation which is robust to Multi-Source (MS) scenarios is of particular importance. In the past decade, interest in Spherical Microphone Arrays (SMAs) has grown rapidly due to their ability to analyse the sound field with equal resolution in all directions. Such symmetry makes SMAs suitable for applications in robot audition where a variety of talker heights and positions is expected. Acoustic signal processing for SMAs is often formulated in the Spherical Harmonic Domain (SHD) which describes the sound field in a form that is independent of the geometry of the SMA. DOA estimation methods for real-world scenarios address one or more performance-degrading factors such as noise, reverberation or multi-source activity, or tackle problems such as source counting or reducing computational complexity. This thesis addresses various problems in MS DOA estimation for speech sources, each of which focuses on one or more performance-degrading factors. Firstly, a narrowband DOA estimator is proposed utilizing high order spatial information in two computationally efficient ways. Secondly, an autonomous source counting technique is proposed which uses density-based clustering in an evolutionary framework. Thirdly, a confidence metric for the validity of the Single Source (SS) assumption in a Time-Frequency (TF) bin is proposed. It is based on the MS assumption in a short time interval where the number and the TF bin of active sources are adaptively estimated. Finally, two analytical narrowband MS DOA estimators are proposed based on the MS assumption in a TF bin.
The proposed methods are evaluated using simulations and real recordings. Each proposed technique outperforms comparative baseline methods and performs at least as accurately as the state-of-the-art. Open Access

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

    A systematic approach for integrated product, materials, and design-process design

    Designers are challenged to manage customer, technology, and socio-economic uncertainty causing dynamic, unquenchable demands on limited resources. In this context, increased concept flexibility, referring to a designer's ability to generate concepts, is crucial. Concept flexibility can be significantly increased through the integrated design of product and material concepts. Hence, the challenge is to leverage knowledge of material structure-property relations that significantly affect system concepts for function-based, systematic design of product and materials concepts in an integrated fashion. However, having selected an integrated product and material system concept, managing complexity in embodiment design-processes is important. Facing a complex network of decisions and evolving analysis models, a designer needs the flexibility to systematically generate and evaluate embodiment design-process alternatives. In order to address these challenges and respond to the primary research question of how to increase a designer's concept and design-process flexibility to enhance product creation in the conceptual and early embodiment design phases, the primary hypothesis in this dissertation is embodied as a systematic approach for integrated product, materials and design-process design. The systematic approach consists of two components: i) a function-based, systematic approach to the integrated design of product and material concepts from a systems perspective, and ii) a systematic strategy to design-process generation and selection based on a decision-centric perspective and a value-of-information-based Process Performance Indicator. The systematic approach is validated using the validation-square approach that consists of theoretical and empirical validation.
Empirical validation of the framework is carried out using various examples including: i) design of a reactive material containment system, and ii) design of an optoelectronic communication system. Ph.D. Committee Chair: Allen, Janet K.; Committee Member: Aidun, Cyrus K.; Committee Member: Klein, Benjamin; Committee Member: McDowell, David L.; Committee Member: Mistree, Farrokh; Committee Member: Yoder, Douglas P

    System Identification with Applications in Speech Enhancement

    With the increasing popularity of integrating hands-free telephony on mobile portable devices and the rapid development of voice over internet protocol, identification of acoustic systems has become desirable for compensating distortions introduced to speech signals during transmission, and hence enhancing the speech quality. The objective of this research is to develop system identification algorithms for speech enhancement applications including network echo cancellation and speech dereverberation. A supervised adaptive algorithm for sparse system identification is developed for network echo cancellation. Based on the framework of the selective-tap updating scheme on the normalized least mean squares algorithm, the MMax and sparse partial update tap-selection strategies are exploited in the frequency domain to achieve fast convergence performance with low computational complexity. Through demonstrating how the sparseness of the network impulse response varies in the transformed domain, the multidelay filtering structure is incorporated to reduce the algorithmic delay. Blind identification of SIMO acoustic systems for speech dereverberation in the presence of common zeros is then investigated. First, the problem of common zeros is defined and extended to include the presence of near-common zeros. Two clustering algorithms are developed to quantify the number of these zeros so as to facilitate the study of their effect on blind system identification and speech dereverberation. To mitigate this effect, two algorithms are developed: the two-stage algorithm based on channel decomposition identifies common and non-common zeros sequentially, and the forced spectral diversity approach combines spectral shaping filters and channel undermodelling for deriving a modified system that leads to an improved dereverberation performance.
Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subband-based dereverberation techniques. Comprehensive simulations and discussions demonstrate the effectiveness of the aforementioned algorithms. A discussion on possible directions of prospective research on system identification techniques concludes this thesis.
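
The selective-tap idea mentioned in this abstract can be sketched in its simplest form. The code below is a time-domain MMax-NLMS filter, which is an assumption made for brevity: the thesis applies the tap selection in the frequency domain with a multidelay structure, and that is not reproduced here. The function name `mmax_nlms` and all parameter names are hypothetical.

```python
def mmax_nlms(x, d, num_taps, m, mu=0.5, eps=1e-8):
    """Time-domain MMax-NLMS sketch (illustrative, not the thesis algorithm).

    x: input (far-end) samples, d: desired (echo) samples.
    At each step only the m taps aligned with the largest-magnitude
    input samples are updated; the other taps are left unchanged.
    Returns the final tap weights and the a priori error sequence.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # buf[i] holds x[n - i]
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y
        errors.append(e)
        # MMax selection: indices of the m largest |x[n - i]|.
        order = sorted(range(num_taps), key=lambda i: abs(buf[i]), reverse=True)
        selected = order[:m]
        # Normalise by the full input energy, as in standard NLMS.
        norm = sum(b * b for b in buf) + eps
        for i in selected:
            w[i] += mu * e * buf[i] / norm
    return w, errors
```

Setting `m` equal to `num_taps` reduces this to ordinary NLMS; smaller `m` lowers the per-sample update cost, typically in exchange for somewhat slower convergence.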

    Proceedings of the 35th WIC Symposium on Information Theory in the Benelux and the 4th joint WIC/IEEE Symposium on Information Theory and Signal Processing in the Benelux, Eindhoven, the Netherlands May 12-13, 2014

    Compressive sensing (CS) as an approach for data acquisition has recently received much attention. In CS, the signal recovery problem from the observed data requires solving for a sparse vector from an underdetermined system of equations. The underlying sparse signal recovery problem is quite general with many applications and is the focus of this talk. The main emphasis will be on Bayesian approaches for sparse signal recovery. We will examine sparse priors such as the super-Gaussian and Student-t priors and appropriate MAP estimation methods. In particular, re-weighted l2 and re-weighted l1 methods developed to solve the optimization problem will be discussed. The talk will also examine a hierarchical Bayesian framework and then study in detail an empirical Bayesian method, the Sparse Bayesian Learning (SBL) method. If time permits, we will also discuss Bayesian methods for sparse recovery problems with structure: intra-vector correlation in the context of the block sparse model and inter-vector correlation in the context of the multiple measurement vector problem.
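
As an illustration of the re-weighted l2 idea named in this abstract, here is a minimal FOCUSS-style sketch. It is not taken from the talk: the weight rule w_i = x_i^2 + eps, the function names `solve` and `reweighted_l2`, and the default parameters are assumptions chosen for a small, dependency-free example. Each pass solves a weighted minimum-norm problem in closed form, x = W A^T (A W A^T)^{-1} y.

```python
def solve(mat, rhs):
    """Solve mat @ z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(mat, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def reweighted_l2(a, y, iters=20, eps=1e-6):
    """Iteratively re-weighted l2 for an underdetermined system a @ x = y.

    a: list of m rows, each of length n (m < n); y: list of length m.
    Assumed weight rule: w_i = x_i**2 + eps, so small entries are
    penalised more heavily on the next pass and shrink towards zero.
    """
    m, n = len(a), len(a[0])
    w = [1.0] * n  # the first pass gives the minimum l2-norm solution
    x = [0.0] * n
    for _ in range(iters):
        # Gram matrix G = A W A^T (m x m).
        g = [[sum(a[i][k] * w[k] * a[j][k] for k in range(n)) for j in range(m)]
             for i in range(m)]
        z = solve(g, y)
        # x = W A^T z
        x = [w[k] * sum(a[i][k] * z[i] for i in range(m)) for k in range(n)]
        w = [xk * xk + eps for xk in x]
    return x
```

For the system with rows [1, 1, 0] and [0, 1, 1] and y = [1, 1], the first pass returns the dense minimum-norm solution [1/3, 2/3, 1/3], and later passes concentrate the solution on the middle entry.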