Speech Decomposition and Enhancement
The goal of this study is to investigate the roles of steady-state speech sounds, and of the transitions between these sounds, in the intelligibility of speech. The motivation for this approach is that the auditory system may be particularly sensitive to time-varying frequency edges, which in speech are produced primarily by transitions between vowels and consonants and within vowels. The possibility that selectively amplifying these edges may enhance speech intelligibility is examined. Computer algorithms were developed to decompose speech into two components. The first, defined as the tonal component, was intended to include predominantly formant activity. The second, defined as the non-tonal component, was intended to include predominantly transitions between and within formants.

The decomposition uses a set of time-varying filters whose center frequencies and bandwidths are controlled to track the strongest formant components in the speech. Each center frequency and bandwidth is estimated from the FM and AM information of its formant component. The tonal component is the sum of the filter outputs; the non-tonal component is the difference between the original speech signal and the tonal component.

The relative energy and intelligibility of the tonal and non-tonal components were compared to those of the original speech, with psychoacoustic growth functions used to assess intelligibility. Most of the speech energy was in the tonal component, but this component had a significantly lower maximum word recognition score than either the original speech or the non-tonal component. The non-tonal component averaged only 2% of the original speech energy, yet its maximum word recognition was almost equal to that of the original speech. The non-tonal component was then amplified and recombined with the original speech to generate enhanced speech.
The energy of the enhanced speech was adjusted to equal that of the original speech, and the intelligibility of the two was compared in background noise. The enhanced speech showed significantly higher recognition scores at lower SNRs; at higher SNRs, the original and enhanced speech showed similar recognition scores. These results suggest that amplifying transient information can enhance speech in noise, and that the enhancement is most effective under severe noise conditions.
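The decomposition and enhancement pipeline described above can be sketched as follows. This is a minimal illustration, not the study's implementation: it tracks only the single strongest spectral peak per frame with a fixed-bandwidth band-pass filter (rather than a set of AM/FM-controlled formant trackers), and the frame length, bandwidth, and gain values are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def decompose(speech, fs, frame_len=0.03, bw=300.0):
    """Split speech into a 'tonal' component (strongest spectral peak per
    frame, extracted with a narrow band-pass filter) and a 'non-tonal'
    residual defined as original minus tonal."""
    hop = int(frame_len * fs)
    tonal = np.zeros_like(speech, dtype=float)
    for start in range(0, len(speech) - hop, hop):
        frame = speech[start:start + hop]
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
        fc = freqs[np.argmax(spec[1:]) + 1]           # strongest peak (skip DC)
        lo = max(fc - bw / 2, 1.0)
        hi = min(fc + bw / 2, fs / 2 - 1.0)
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        tonal[start:start + hop] = sosfiltfilt(sos, frame)
    non_tonal = speech - tonal                        # residual by definition
    return tonal, non_tonal

def enhance(speech, fs, gain=4.0):
    """Amplify the non-tonal component, recombine with the original, and
    rescale so total energy matches the original speech."""
    _, non_tonal = decompose(speech, fs)
    enhanced = speech + gain * non_tonal
    enhanced *= np.sqrt(np.sum(speech ** 2) / np.sum(enhanced ** 2))
    return enhanced
```

Because the non-tonal residual carries only a small fraction of the energy, the post-gain rescaling step only slightly attenuates the tonal part while boosting the transient edges, mirroring the energy-matched comparison in the study.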
Modeling and frequency tracking of marine mammal whistle calls
Submitted in partial fulfillment of the requirements for the degree of Master of Science at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, February 2009.
Marine mammal whistle calls present an attractive medium for covert underwater communications. High-quality models of the whistle calls are needed in order to synthesize natural-sounding whistles with embedded information. Since the whistle calls are composed of frequency-modulated harmonic tones, they are best modeled as a weighted superposition of harmonically related sinusoids. Previous research with bottlenose dolphin whistle calls has produced synthetic whistles that sound too “clean” for use in a covert communications system. Due to the sensitivity of the human auditory system, watermarking schemes that slightly modify the fundamental frequency contour have good potential for producing natural-sounding whistles embedded with retrievable watermarks. Structured total least squares is used with linear prediction analysis to track the time-varying fundamental frequency and harmonic amplitude contours throughout a whistle call. Simulation and experimental results demonstrate the capability to accurately model bottlenose dolphin whistle calls and retrieve embedded information from watermarked synthetic whistle calls. Different fundamental frequency watermarking schemes are proposed based on their ability to produce natural-sounding synthetic whistles and yield suitable watermark detection and retrieval.
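The "weighted superposition of harmonically related sinusoids" model above can be sketched directly: integrate the fundamental frequency contour to obtain an instantaneous phase, then sum harmonics at integer multiples of that phase. The sweep range, sample rate, and harmonic weights below are illustrative assumptions, and the structured-total-least-squares contour tracking itself is not reproduced here.

```python
import numpy as np

def synthesize_whistle(f0_contour, harmonic_amps, fs):
    """Synthesize a whistle as a weighted sum of harmonics of a
    time-varying fundamental frequency.

    f0_contour    : per-sample instantaneous fundamental frequency (Hz)
    harmonic_amps : amplitude weight for each harmonic (index 0 = fundamental)
    fs            : sample rate (Hz)
    """
    # phase is the running integral of instantaneous frequency
    phase = 2.0 * np.pi * np.cumsum(f0_contour) / fs
    out = np.zeros_like(f0_contour, dtype=float)
    for k, a in enumerate(harmonic_amps, start=1):
        out += a * np.sin(k * phase)   # k-th harmonic stays locked to f0
    return out

# example: a 1-second upsweep from 5 kHz to 15 kHz with three harmonics
fs = 96000
t = np.arange(fs) / fs
f0 = 5000.0 + 10000.0 * t
whistle = synthesize_whistle(f0, [1.0, 0.4, 0.15], fs)
```

A watermarking scheme of the kind described would then perturb `f0_contour` slightly before synthesis, so the embedded information rides on the fundamental frequency contour rather than on any additive signal.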
A Tutorial on Speech Synthesis Models
An understanding of the physical and mathematical models of speech is essential for speech synthesis; accordingly, speech modeling is a large field that is well documented in the literature. The aim of this paper is to provide a background review of several speech models used in speech synthesis, specifically the Source-Filter Model, the Linear Prediction Model, the Sinusoidal Model, and the Harmonic/Noise Model. The most important models of speech signals will be described, from the earliest to the most recent, in order to highlight the major improvements each brought over its predecessors. A parametric model of speech that is relatively simple, flexible, high quality, and robust in re-synthesis would be desirable. Emphasis will be given to the Harmonic/Noise Model, since it appears to be the most promising and robust model of speech. (C) 2015 The Authors. Published by Elsevier B.V.
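As a concrete illustration of the source-filter / linear prediction idea reviewed above, here is a minimal sketch: linear prediction coefficients are estimated from one frame via the autocorrelation method, the frame is inverse-filtered to recover the excitation (the "source"), and the all-pole filter resynthesizes the frame from that excitation. The frame length, model order, and synthetic two-sinusoid "speech" frame are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order):
    """Autocorrelation-method linear prediction coefficients.

    Solves the Toeplitz normal equations R a = r and returns the
    inverse-filter polynomial A(z) = 1 - sum_k a_k z^-k.
    """
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])
    return np.concatenate(([1.0], -a))

# source-filter analysis/resynthesis of one voiced frame
fs = 8000
n = 240
rng = np.random.RandomState(0)
t = np.arange(n) / fs
frame = (np.sin(2 * np.pi * 150 * t)
         + 0.5 * np.sin(2 * np.pi * 300 * t)
         + 0.01 * rng.randn(n))          # stand-in "speech" frame

A = lpc(frame, order=10)
residual = lfilter(A, [1.0], frame)      # inverse filter -> excitation/source
resynth = lfilter([1.0], A, residual)    # all-pole filter -> resynthesized frame
```

Running the excitation back through the all-pole filter reconstructs the frame; in a real vocoder the residual would instead be replaced by a parametric source (pulse train plus noise), which is exactly the simplification the Harmonic/Noise Model refines.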