A corroborative study on improving pitch determination by time–frequency cepstrum decomposition using wavelets

Publication venue: Springer

Novel Pitch Detection Algorithm With Application to Speech Coding

Author: Kura Vijay
Publication venue: ScholarWorks@UNO
Publication date: 19/12/2003

This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT combines feature extraction and ACR applied to Linear Predictive Coding (LPC) residuals with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions.

University of New Orleans

Glottal-synchronous speech processing

Author: Thomas Mark R P
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2010

Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

Spiral - Imperial College Digital Repository

OpenGrey Repository

Estimation of glottal closure instants in voiced speech using the DYPSA algorithm

Author: Brookes M
Gudnason J
Kounoudes A
Naylor PA
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Published version

Spiral - Imperial College Digital Repository

Emotion Recognition from Speech with Acoustic, Non-Linear and Wavelet-based Features Extracted in Different Acoustic Conditions

Author: Vásquez Correa Juan Camilo
Publication venue: Medellín, Colombia
Publication date: 01/01/2016

In recent years, there has been great progress in automatic speech recognition. The challenge now is not only to recognize the semantic content of speech but also its so-called "paralinguistic" aspects, including the emotions and the personality of the speaker. This research work aims to develop a methodology for automatic emotion recognition from speech signals in non-controlled noise conditions. For that purpose, different sets of acoustic, non-linear, and wavelet-based features are used to characterize emotions in different databases created for such purpose.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblioteca Digital del Sistema de Bibliotecas de la Universidad de Antioquia

Real-time human ambulation, activity, and physiological monitoring: taxonomy of issues, techniques, applications, challenges and limitations

Author: Achumba Ifeyinwa E.
Azzi Djamel
Bersch Sebastian D.
Khusainov Rinat
Publication venue: 'MDPI AG'
Publication date: 01/01/2013

Automated methods of real-time, unobtrusive, human ambulation, activity, and wellness monitoring and data analysis using various algorithmic techniques have been subjects of intense research. The general aim is to devise effective means of addressing the demands of assisted living, rehabilitation, and clinical observation and assessment through sensor-based monitoring. The research studies have resulted in a large amount of literature. This paper presents a holistic articulation of the research studies and offers comprehensive insights along four main axes: distribution of existing studies; monitoring device framework and sensor types; data collection, processing and analysis; and applications, limitations and challenges. The aim is to present a systematic and most complete study of the literature in the area in order to identify research gaps and prioritize future research directions.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central

Portsmouth University Research Portal (Pure)

Recent Advances in Signal Processing

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

Signal processing is a critical issue in the majority of new technological inventions and challenges in a variety of applications in both science and engineering fields. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward both students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories are ordered to address image processing, speech processing, communication systems, time-series analysis, and educational packages respectively. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.

Directory of Open Access Books (DOAB)

An investigation into glottal waveform based speech coding

Author: Bleakley Christopher J.
Publication venue: Dublin City University. School of Electronic Engineering
Publication date: 01/01/1995

Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system. The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established. A new algorithm, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), for Glottal Closure Instant detection is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust. Two techniques for glottal waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy. However, IAIF is found to be slightly more robust. Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U.S. Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay.

DCU Online Research Access Service

Robust Estimation of Tone Break Indices from Speech Signal using Multi-Scale Analysis and their Applications

Author: Kolli Chandra Sekhar Rao
Publication venue: University of Memphis Digital Commons
Publication date: 19/07/2012

The aim of this study is to develop a robust algorithm to automatically detect the Tone and Break Indices (ToBI) from the speech signal and explore their applications. iLAST was introduced to analyze the acoustic and prosodic features to detect the ToBI indices. Both expert and data-driven rules were used to improve the robustness. The integration of multi-scale signal analysis with rule-based classification has helped in robustly identifying tones that can be used in applications such as identifying the vowel triangle and emotions from speech. Empirical analyses using a labeled dataset were performed to illustrate the utility of the proposed approach. Further analyses were conducted to identify the inefficiencies of the proposed approach and address those issues through co-analyses of prosodic features in identifying the major contributors to robust detection of ToBI. It was demonstrated that the proposed approach performs robustly and can be used for developing a wide variety of applications.

University of Memphis Digital Commons