683 research outputs found

Estimation and Modeling Problems in Parametric Audio Coding

Author: Christensen Mads Græsbøll
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2005

VBN

VoIP data Rate Reduction Exploiting Linear Prediction Coefficients Redundancy

Author: Islam Younis Morshed Amro (اسلام يونس مرشد عمرو)
Publication venue: Al-Quds University (جامعة القدس)

Al-Quds University Digital Repository

Exploring Domain-Specific Enhancements for a Neural Foley Synthesizer

Author: Betko Sage, Chen Hao, Liloia Ari, Pillay Ashwin, Shah Ankit
Publication date: 08/09/2023

Foley sound synthesis refers to the creation of authentic, diegetic sound effects for media, such as film or radio. In this study, we construct a neural Foley synthesizer capable of generating mono-audio clips across seven predefined categories. Our approach introduces multiple enhancements to existing models in the text-to-audio domain, with the goal of enriching the diversity and acoustic characteristics of the generated foleys. Notably, we utilize a pre-trained encoder that retains acoustical and musical attributes in intermediate embeddings, implement class-conditioning to enhance differentiability among foley classes in their intermediate representations, and devise an innovative transformer-based architecture for optimizing self-attention computations on very large inputs without compromising valuable information. Subsequent to implementation, we present intermediate outcomes that surpass the baseline, discuss practical challenges encountered in achieving optimal results, and outline potential pathways for further research.

arXiv.org e-Print Archive

Using a low-bit rate speech enhancement variable post-filter as a speech recognition system pre-filter to improve robustness to GSM speech

Author: Mahlanyane Nkululeko S
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2003

Includes bibliographical references. Performance of speech recognition systems degrades when they are used to recognize speech that has been transmitted through GSM (Global System for Mobile Communications) voice communication channels (GSM speech). This degradation is mainly due to the effects of GSM speech coding and GSM channel noise on speech signals transmitted through the network. This poor recognition of GSM channel speech limits the use of speech recognition applications over GSM networks. If speech recognition technology is to be used without restriction over GSM networks, the recognition accuracy of GSM channel speech has to be improved. Different channel normalization techniques have been developed in an attempt to improve the recognition accuracy of speech modified by voice channels in general (not specifically GSM channel speech). These techniques can be classified into three broad categories: model modification, signal pre-processing, and feature processing. In this work, as a contribution toward improving the robustness of speech recognition systems to GSM speech, the use of a low-bit rate speech enhancement post-filter as a speech recognition system pre-filter is proposed. This filter is to be used in recognition systems in combination with channel normalization techniques.

Cape Town University OpenUCT

Parallelism and the software-hardware interface in embedded systems

Author: Chouliaras V A
Publication date: 01/01/2005

This thesis by publications addresses issues in the architecture and microarchitecture of next-generation, high-performance streaming Systems-on-Chip by quantifying the most important forms of parallelism in current and emerging embedded system workloads. The work consists of three major research tracks, relating to data-level parallelism, thread-level parallelism, and the software-hardware interface, which together reflect the research interests of the author as they have formed over the last nine years. Published works confirm that parallelism at the data level is widely accepted as the most important performance lever for the efficient execution of embedded media and telecom applications and has been exploited via a number of approaches, the most efficient being vector/SIMD architectures. A further, complementary and substantial form of parallelism exists at the thread level, but this has not been researched to the same extent in the context of embedded workloads. For the efficient execution of such applications, exploitation of both forms of parallelism is of paramount importance. This calls for a new architectural approach to the software-hardware interface, as its rigidity, manifested in all desktop-based and the majority of embedded CPUs, directly affects the performance of vectorized, threaded codes. The author advocates a holistic, mature approach in which parallelism is extracted via automatic means while, at the same time, the traditionally rigid hardware-software interface is optimized to match the temporal and spatial behaviour of the embedded workload. This ultimate goal calls for the precise study of these forms of parallelism for a number of applications executing on theoretical models such as instruction set simulators and parallel RAM machines, as well as the development of highly parametric microarchitectural frameworks to encapsulate that functionality.

EThOS - Electronic Theses Online Service (GB, United Kingdom)

Loughborough University Institutional Repository

OpenGrey Repository

Very low bit rate parametric audio coding

Author: Purnhagen Heiko
Publication venue: Hannover : Gottfried Wilhelm Leibniz Universität Hannover
Publication date: 01/01/2008

[no abstract]

Institutionelles Repositorium der Leibniz Universität Hannover

Sparsity in Linear Predictive Coding of Speech

Author: Giacobello Daniele
Publication venue: Multimedia Information and Signal Processing, Institute of Electronic Systems, Aalborg University
Publication date: 01/01/2010

nrpages: 197; status: published

Lirias

VBN

Some New Results on the Estimation of Sinusoids in Noise

Author: Nielsen Jesper Kjær
Publication date: 27/09/2012

VBN