643 research outputs found

Employment of Spectral Voicing Information for Speech and Speaker Recognition in Noisy Conditions

Author: Mü
Peter Janč
Publication venue: 'IntechOpen'
Publication date: 01/11/2008

IntechOpen

Crossref

University of Birmingham Research Portal

Glottal-synchronous speech processing

Author: Thomas Mark R P
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2010

Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

Spiral - Imperial College Digital Repository

OpenGrey Repository

Detailed versus gross spectro-temporal cues for the perception of stop consonants

Author: Smits R.L.H.M.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1995

x + 182 pages; 24 cm

Repository TU/e

Pure OAI Repository

uilis.unsyiah.ac.id

Modelling and extraction of fundamental frequency in speech signals

Author: Pawi Alipah
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2014

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. One of the most important parameters of speech is the fundamental frequency of vibration of voiced sounds. The audio sensation of the fundamental frequency is known as the pitch. Depending on the tonal/non-tonal category of language, the fundamental frequency conveys intonation, pragmatics and meaning. In addition, the fundamental frequency and intonation carry speaker gender, age, identity, speaking style and emotional state. Accurate estimation of the fundamental frequency is critically important for the functioning of speech processing applications such as speech coding, speech recognition, speech synthesis and voice morphing. This thesis makes contributions to the development of accurate pitch estimation research in three distinct ways: (1) an investigation of the impact of the window length on pitch estimation error, (2) an investigation of the use of the higher order moments and (3) an investigation of an analysis-synthesis method for selection of the best pitch value among N proposed candidates. Experimental evaluations show that the length of the speech window has a major impact on the accuracy of pitch estimation. Depending on the similarity criteria and the order of the statistical moment, a window length of 37 to 80 ms gives the least error. In order to avoid excessive delay as a consequence of using a longer window, a method is proposed where the current short window is concatenated with the previous frames to form a longer signal window for pitch extraction. Second order and higher order moments, and the magnitude difference function, were explored and compared as similarity criteria. A novel method of calculation of moments is introduced where the signal is split, i.e. rectified, into positive and negative valued samples. The moments for the positive and negative parts of the signal are computed separately and combined.
The new method of calculation of moments from positive and negative parts and the higher order criteria provide competitive results. A challenging issue in pitch estimation is the determination of the best candidate from N extrema of the similarity criteria. The analysis-synthesis method proposed in this thesis selects the pitch candidate that provides the best reproduction (synthesis) of the harmonic spectrum of the original speech. The synthesis method must be such that the distortion increases with the increasing error in the estimate of the fundamental frequency. To this end, a new method of spectral synthesis is proposed using an estimate of the spectral envelope and harmonically spaced asymmetric Gaussian pulses as excitation. The N-best method provides consistent reduction in pitch estimation error. The methods described in this thesis result in a significant improvement in the pitch accuracy and outperform the benchmark YIN method.

Brunel University Research Archive

From heuristics-based to data-driven audio melody extraction

Author: Bosch Juan J.
Publication date: 01/01/2017

The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advancements in melody extraction and shows a promising path for future research and applications.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

ZENODO

Tesis Doctorals en Xarxa

Digital Signal Processing

Author: Bondaryk Joseph E.
Cobra Daniel T.
Covell Michele M.
Davis Randall
Dove Webster P.
Feder Meir
Frisk George V.
Griffin Daniel W.
Harasty Daniel J.
Hardwick John C.
Izraelevitz David
Jachner Jacek
Joo Tae H.
Lim Jae S.
Martinez Dennis M.
Milios Evangelos E.
Myers Cory S.
Oppenheim Alan V.
Pappas Thrasyvoulos N.
Rodriguez Jeffrey J.
Silva Anthony J.
van Hove Patrick
Weinstein Ehud
Wengrovitz Michael S.
Zakhor Avideh
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)
Publication date: 01/01/1987

Contains an introduction and reports on twenty research projects.
National Science Foundation (Grant ECS 84-07285)
U.S. Navy - Office of Naval Research (Contract N00014-81-K-0742)
National Science Foundation Fellowship
Sanders Associates, Inc.
U.S. Air Force - Office of Scientific Research (Contract F19628-85-K-0028)
Canada, Bell Northern Research Scholarship
Canada, Fonds pour la Formation de Chercheurs et l'Aide a la Recherche Postgraduate Fellowship
Canada, Natural Science and Engineering Research Council Postgraduate Fellowship
U.S. Navy - Office of Naval Research (Contract N00014-81-K-0472)
Fanny and John Hertz Foundation Fellowship
Center for Advanced Television Studies
Amoco Foundation Fellowship
U.S. Air Force - Office of Scientific Research (Contract F19628-85-K-0028)

DSpace@MIT

Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

Author: Magimai.-Doss Mathew
Prasad RaviShankar
Sarkar Eklavya
Publication venue: 'International Speech Communication Association'
Publication date: 27/06/2022

Voice activity detection (VAD) is an important pre-processing step for speech technology applications. The task consists of deriving segment boundaries of audio signals which contain voicing information. In recent years, it has been shown that voice source and vocal tract system information can be extracted using zero-frequency filtering (ZFF) without making any explicit model assumptions about the speech signal. This paper investigates the potential of zero-frequency filtering for jointly modeling voice source and vocal tract system information, and proposes two approaches for VAD. The first approach demarcates voiced regions using a composite signal composed of different zero-frequency filtered signals. The second approach feeds the composite signal as input to the rVAD algorithm. These approaches are compared with other supervised and unsupervised VAD methods in the literature, and are evaluated on the Aurora-2 database, across a range of SNRs (20 to -5 dB). Our studies show that the proposed ZFF-based methods perform comparably to state-of-the-art VAD methods and are more invariant to added degradation and different channel characteristics. Comment: Accepted at Interspeech 202

arXiv.org e-Print Archive

Speech Communication

Author: Henke William L.
Menyuk Paula
Zue Victor W.
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)
Publication date: 15/04/1972

Contains reports on three research projects.
U.S. Air Force Cambridge Research Laboratories under Contract F19628-72-C-0181
National Institutes of Health (Grant 5 R01 NS04332-09)
Joint Services Electronics Programs (U.S. Army, U.S. Navy, and U.S. Air Force) under Contract DAAB07-71-C-0300
M.I.T. Lincoln Laboratory Purchase Order CC-57

DSpace@MIT

Compromises in orchestra pit design: A ten-year trench war in The Royal Theatre, Copenhagen

Author: Gade Anders Christian
Mortensen Bo
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/1998

Crossref

Online Research Database In Technology

Communications Biophysics

Author: Boduch Raymond
Braida Louis D.
Buhlert Klaus J.
Bustamante Diane K.
Chen Francine R.
Chomsky Carol
Clements Mark A.
Coker Jackie
Colburn H. Steven
Conway-Fithian Sue
Davis Mark F.
DeGennaro Steven V.
Downy Leonard C.
Durlach Nathaniel I.
Florentine Mary S.
Freeman Dennis M.
Frishkopf Lawrence S.
Gabriel Kaigham J.
Garrett Merrill F.
Gruenewald Paul J.
Hicks Bruce L.
Houtsma Adrian J. M.
Ito Yoshiko
Kiang Nelson Y-S.
Krieg Kenneth R.
Milner Paul
Moser James
Moss Peter J.
Peake William T.
Peterson Patrick M.
Picheny Michael A.
Rabinowitz William M.
Reed Charlotte M.
Russell Roy P., Jr.
Schultz Martin C.
Siebert William M.
Siegel Ronald A.
Snyder Jeff
Sotomayor-Diaz Orlando
Uchanski Rosalie M.
Villchur Edgar
Weiss Thomas F.
Zue Victor W.
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)
Publication date: 01/01/1981

Contains reports on eight research projects split into four sections.
National Institutes of Health (Grant 5 P01 NS13126)
National Institutes of Health (Grant 5 K04 NS00113)
National Institutes of Health (Training Grant 5 T32 NS07047)
National Science Foundation (Grant BNS80-06369)
National Institutes of Health (Grant 5 R01 NS11153)
National Institutes of Health (Fellowship 1 F32 NS06544)
National Science Foundation (Grant BNS77-16861)
National Institutes of Health (Grant 5 R01 NS10916)
National Institutes of Health (Grant 5 R01 NS12846)
National Science Foundation (Grant BNS77-21751)
National Institutes of Health (Grant 1 R01 NS14092)
National Institutes of Health (Grant 2 R01 NS11680)
National Institutes of Health (Grant 5 R01 NS11080)
National Institutes of Health (Training Grant 5 T32 GM07301)

DSpace@MIT