Perceptually Motivated Wavelet Packet Transform for Bioacoustic Signal Enhancement

Author: Cohen I.
Deller J. R.
Fu Q.
Jidong Tao
Michael T. Johnson
Osiejuk T. S.
Seyfarth R. M.
Shao Y.
Yao Ren
Publication venue: e-Publications@Marquette
Publication date: 01/07/2008

A significant and often unavoidable problem in bioacoustic signal processing is the presence of background noise due to an adverse recording environment. This paper proposes a new bioacoustic signal enhancement technique which can be used on a wide range of species. The technique is based on a perceptually scaled wavelet packet decomposition using a species-specific Greenwood scale function. Spectral estimation techniques, similar to those used for human speech enhancement, are used to estimate clean-signal wavelet coefficients under an additive noise model. The new approach is compared to several other techniques, including basic bandpass filtering as well as classical speech enhancement methods such as spectral subtraction, Wiener filtering, and Ephraim–Malah filtering. Vocalizations recorded from several species are used for evaluation, including the ortolan bunting (Emberiza hortulana), rhesus monkey (Macaca mulatta), and humpback whale (Megaptera novaeangliae), with both additive white Gaussian noise and environment recording noise added across a range of signal-to-noise ratios (SNRs). Results, measured by both SNR and segmental SNR of the enhanced waveforms, indicate that the proposed method outperforms other approaches for a wide range of noise conditions.
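The abstract reports results as SNR and segmental SNR of the enhanced waveforms. As a rough illustration of how these two measures differ, the sketch below computes both for a clean reference and an enhanced signal given as plain Python lists. The function names, the `frame_len` default, and the choice to skip frames with zero energy (rather than clamping per-frame values) are assumptions made for this example, not details taken from the paper.

```python
import math


def snr_db(clean, enhanced):
    """Global SNR in dB: clean-signal energy over residual-error energy."""
    signal = sum(c * c for c in clean)
    noise = sum((c - e) ** 2 for c, e in zip(clean, enhanced))
    if noise == 0:
        return float("inf")
    if signal == 0:
        return float("-inf")
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(clean, enhanced, frame_len=256):
    """Mean of per-frame SNRs in dB over non-overlapping frames.

    Assumption for this sketch: frames whose signal or error energy is
    zero are skipped; no per-frame clamping is applied.
    """
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        signal = sum(v * v for v in c)
        noise = sum((a - b) ** 2 for a, b in zip(c, e))
        if signal > 0 and noise > 0:
            values.append(10.0 * math.log10(signal / noise))
    return sum(values) / len(values) if values else float("nan")


if __name__ == "__main__":
    # Hypothetical data: the first frame carries more error than the second.
    clean = [1.0] * 8
    enhanced = [0.9] * 4 + [0.99] * 4
    print(snr_db(clean, enhanced))
    print(segmental_snr_db(clean, enhanced, frame_len=4))
```

Because segmental SNR averages in the dB domain, a few badly degraded frames weigh on it differently than they do on the global figure, which is one reason the two are often reported together.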

epublications@Marquette

Crossref

Uniform and warped low delay filter-banks for speech enhancement

Author: Beauchamp
Bellanger
Braccini
Cappé
Crochiere
Crochiere
Ephraim
Gustafsson
Gülzow
Heinrich W. Löllmann
Kates
Leou
Lotter
Martin
Morgan
Oppenheim
Oppenheim
Peter Vary
Petrovsky
Press
Proakis
Renfors
Schuller
Smith
Steiglitz
Vaidyanathan
Vary
Vary
Zwicker
Publication venue: Elsevier BV

Crossref

Multipitch Analysis and Tracking for Automatic Music Transcription

Author: Baumgartner Richard
Publication venue: ScholarWorks@UNO
Publication date: 21/05/2004

Music has always played a large role in human life. The technology behind the art has progressed and grown over time in many areas, for instance the instruments themselves, the recording equipment used in studios, and the reproduction through digital signal processing. One facet of music that has seen very little attention over time is the ability to transcribe audio files into musical notation. In this thesis, a method of multipitch analysis is used to track multiple simultaneous notes through time in an audio music file. The analysis method is based on autocorrelation and a specialized peak pruning method to identify only the fundamental frequencies present at any single moment in the sequence. A sliding Hamming window is used to step through the input sound file and track through time. Results show the tracking of nontrivial musical patterns over a range of two octaves and at varying tempos.
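To make the autocorrelation idea concrete, here is a minimal single-pitch sketch: it applies a Hamming window to one frame and picks the lag with the largest autocorrelation inside a search range. It does not implement the thesis's multipitch peak pruning; the function names and the `fmin`/`fmax` search limits are assumptions chosen for this example.

```python
import math


def hamming(n):
    """Hamming window of length n."""
    if n < 2:
        return [1.0] * n
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate one fundamental frequency (Hz) from a single frame.

    Returns None when no lag in the search range has positive
    autocorrelation. Single-pitch only; no peak pruning.
    """
    window = hamming(len(frame))
    x = [s * w for s, w in zip(frame, window)]
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(x) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else None


if __name__ == "__main__":
    # Hypothetical test tone: 220 Hz sine sampled at 8 kHz.
    rate = 8000
    tone = [math.sin(2.0 * math.pi * 220.0 * n / rate) for n in range(1024)]
    print(autocorr_pitch(tone, rate))
```

The estimate is quantized to integer lags, so it lands near the true pitch rather than exactly on it. Sliding the frame forward by a fixed hop and repeating the call gives a pitch track over time.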

University of New Orleans

Separation of Vocal and Non-Vocal Components from Audio Clip Using Correlated Repeated Mask (CRM)

Author: Kanuri Mohan Kumar
Publication venue: ScholarWorks@UNO
Publication date: 09/08/2017

Extraction of singing voice from music is one of the ongoing research topics in the field of speech recognition and audio analysis. In particular, this topic finds many applications in the music field, such as in determining music structure, lyrics recognition, and singer recognition. Although many studies have been conducted on the separation of voice from background, fewer have addressed singing voice in particular. In this study, efforts were made to design a new methodology to improve the separation of vocal and non-vocal components in audio clips using REPET [14]. In the newly designed method, we tried to rectify the issues encountered in the REPET method, while designing an improved repeating mask which is used to extract the non-vocal component in audio. The main reason why the REPET method was preferred over previous methods for this study is its independent nature. More specifically, the majority of existing methods for the separation of singing voice from music were constructed explicitly based on one or more assumptions.

University of New Orleans

Graph Spectral Image Processing

Author: Cheung Gene
Magli Enrico
Ng Michael
Tanaka Yuichi
Publication date: 16/01/2018

The recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph, and apply GSP tools for processing and analysis of the signal in the graph spectral domain. In this article, we give an overview of recent graph spectral techniques in GSP specifically for image / video processing. The topics covered include image compression, image restoration, image filtering and image segmentation.
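As a small illustration of "weights that reflect the image structure", the sketch below builds a 4-connected pixel graph over an image patch and evaluates the Laplacian quadratic form, the sum over edges of w_ij (x_i - x_j)^2, which measures how smooth the patch is with respect to the graph. The Gaussian weight on intensity difference and the `sigma` value are assumptions for this example (a common choice, not necessarily the one used in the article), and the function names are hypothetical.

```python
import math


def grid_graph_weights(patch, sigma=10.0):
    """Edge weights for a 4-connected pixel grid.

    Returns {(i, j): w} with pixels indexed row-major. Assumed weight:
    w = exp(-(intensity difference)^2 / sigma^2).
    """
    rows, cols = len(patch), len(patch[0])
    weights = {}
    for r in range(rows):
        for c in range(cols):
            # Link each pixel to its right and lower neighbour once.
            for dr, dc in ((0, 1), (1, 0)):
                r2, c2 = r + dr, c + dc
                if r2 < rows and c2 < cols:
                    diff = patch[r][c] - patch[r2][c2]
                    w = math.exp(-(diff * diff) / (sigma * sigma))
                    weights[(r * cols + c, r2 * cols + c2)] = w
    return weights


def laplacian_quadratic_form(signal, weights):
    """x^T L x for the combinatorial Laplacian L = D - W, as an edge sum."""
    return sum(w * (signal[i] - signal[j]) ** 2 for (i, j), w in weights.items())


if __name__ == "__main__":
    # Hypothetical 2x4 patch with a sharp vertical edge in the middle.
    patch = [[10, 10, 200, 200],
             [10, 10, 200, 200]]
    signal = [v for row in patch for v in row]
    adaptive = grid_graph_weights(patch)
    uniform = {edge: 1.0 for edge in adaptive}
    print(laplacian_quadratic_form(signal, adaptive))
    print(laplacian_quadratic_form(signal, uniform))
```

With structure-aware weights the edges that cross the intensity boundary get weights close to zero, so the patch counts as smooth on its own graph even though it is far from smooth on the uniform grid.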

arXiv.org e-Print Archive

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Frequency Domain Methods for Coding the Linear Predictive Residual of Speech Signals

Author: Perez Zarazaga Pablo
Publication date: 28/08/2017

The most frequently used speech coding paradigm is ACELP, well known because it encodes speech with high quality while consuming little bandwidth. ACELP performs linear prediction filtering in order to eliminate the effect of the spectral envelope from the signal. The noise-like excitation is then encoded using algebraic codebooks. The search of this codebook, however, cannot be performed optimally with conventional encoders due to the correlation between their samples. Because of this, more complex algorithms are required in order to maintain the quality. Four different transformation algorithms have been implemented (DCT, DFT, eigenvalue decomposition and Vandermonde decomposition) in order to decorrelate the samples of the innovative excitation in ACELP. These transformations have been integrated in the ACELP of the EVS codec. The transformed innovative excitation is coded using the envelope-based arithmetic coder. Objective and subjective tests have been carried out to evaluate the quality of the encoding, the degree of decorrelation achieved by the transformations and the computational complexity of the algorithms.
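Of the four transforms listed, the DCT is the easiest to show in a few lines. The sketch below is a direct (non-fast) orthonormal DCT-II in plain Python, meant only to illustrate the transform itself: it preserves energy and concentrates a strongly correlated input into few coefficients. How the thesis applies the transform inside ACELP and the EVS codec is not reproduced here, and the function name is an assumption.

```python
import math


def dct2_orthonormal(x):
    """Orthonormal DCT-II of a list of samples, computed directly in O(n^2)."""
    n = len(x)
    out = []
    for k in range(n):
        # Scaling that makes the transform matrix orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * acc)
    return out


if __name__ == "__main__":
    # Hypothetical input: a fully correlated (constant) block.
    print(dct2_orthonormal([1.0, 1.0, 1.0, 1.0]))
```

For a constant block all of the energy ends up in the first coefficient, which is the extreme case of the energy compaction that makes such transforms attractive before quantization.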

Aaltodoc Publication Archive

Robust automatic transcription of lectures

Author: Wölfel Matthias
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2009

Automatic transcription of lectures is becoming an important task. Possible applications can be found in the fields of automatic translation or summarization, information retrieval, digital libraries, education and communication research. Ideally, such systems would operate on distant recordings, freeing the presenter from wearing body-mounted microphones. This task, however, is surpassingly difficult, given that the speech signal is severely degraded by background noise and reverberation.

KITopen

Directory of Open Access Books (DOAB)

Analysis and correction of the helium speech effect by autoregressive signal processing

Author: Duncan George
Publication venue: The University of Edinburgh
Publication date: 01/01/1983

SIGLE / LD:D48902/84 / BLDSC - British Library Document Supply Centre / GB / United Kingdom

Edinburgh Research Archive

OpenGrey Repository

Robust Automatic Transcription of Lectures

Author: Wölfel Matthias
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2009

The automatic transcription of talks, lectures, and presentations is becoming increasingly important. It is what makes possible applications such as automatic speech translation, automatic speech summarization, and targeted information retrieval in audio data, and thus easier access in digital libraries. Ideally, such a system works with a microphone that frees the speaker from having to wear one, which is the focus of this work.

KITopen