498 research outputs found

The voice activity detection (VAD) recorder and VAD network recorder : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University

Author: Liu Feng
Publication venue: Massey University
Publication date: 01/01/2001

This project provides a feasibility study for the AudioGraph tool, focusing on two application areas: the VAD (voice activity detector) recorder and the VAD network recorder. The first achieves low bit-rate speech recording on the fly, using a GSM compression coder with a simple VAD algorithm; the second provides two-way speech over IP, achieving echo cancellation with a simplex channel. The latter is required for implementing a synchronous AudioGraph. In the first chapter we introduce the background of this project, specifically the VoIP technology, the AudioGraph tool, and the VAD algorithms. We also discuss the problems set for this project. The second chapter presents all the relevant techniques in detail, including sound representation, speech-coding schemes, sound file formats, PowerPlant and Macintosh programming issues, and the simple VAD algorithm we have developed. The third chapter discusses the implementation issues, including the systems' objectives, architecture, the problems encountered and the solutions used. The fourth chapter illustrates the results of the two applications. User documentation for the applications is given, after which we analyse the parameters based on the results. We also present the default settings of the parameters, which could be used in the AudioGraph system. The last chapter provides conclusions and future work.
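The abstract above does not describe the thesis's own VAD algorithm. As a rough illustration of what a simple VAD can look like, the sketch below marks a frame as speech when its mean energy exceeds a fixed threshold. The function name `simple_vad`, the 160-sample frame (20 ms at 8 kHz) and the threshold value are all assumptions made for this example, not details taken from the thesis.

```python
import math

def simple_vad(samples, frame_len=160, threshold=0.01):
    """Return one boolean per full frame: True when the frame's mean
    energy exceeds the threshold (hypothetical energy-based VAD)."""
    flags = []
    # Step through the signal one non-overlapping frame at a time;
    # a trailing partial frame is ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        flags.append(energy > threshold)
    return flags

# Synthetic input: one silent frame followed by one frame of a 440 Hz tone
# sampled at 8 kHz (assumed values, chosen only for the demonstration).
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(160)]
print(simple_vad(silence + tone))
```

In a recorder of the kind described, frames flagged `False` could be skipped rather than passed to the speech coder, which is one way a VAD lowers the average bit rate.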

Massey Research Online

Design of a smartphone with a Digital Signal Processor

Author: Lecluse Joep
Publication date: 01/01/1996

Repository TU/e

Pure OAI Repository

Secure covert communications over streaming media using dynamic steganography

Author: Peng Jinghui

Streaming technologies such as VoIP are widely embedded into commercial and industrial applications, so it is imperative to address data security issues before the problems become serious. This thesis describes a theoretical and experimental investigation of secure covert communications over streaming media using dynamic steganography. A covert VoIP communications system was developed in C++ to enable the implementation of the work being carried out. A new information-theoretical model of secure covert communications over streaming media was constructed to depict the security scenarios in streaming media-based steganographic systems with passive attacks. The model involves a stochastic process that models an information source for covert VoIP communications and the theory of hypothesis testing that analyses the adversary's detection performance. The potential of hardware-based true random key generation and chaotic interval selection for innovative applications in covert VoIP communications was explored. A method using the CPU's read time stamp counter as an entropy source was designed to generate true random numbers as secret keys for streaming media steganography. A novel interval selection algorithm was devised to randomly choose data embedding locations in VoIP streams using random sequences generated from a chaotic process. A dynamic key updating and transmission based steganographic algorithm that includes a one-way cryptographic accumulator integrated into dynamic key exchange for covert VoIP communications was devised to provide secure key exchange for covert communications over streaming media. Analysis based on the discrete logarithm problem and steganalysis using a t-test indicated that the algorithm has the advantage of being the most solid method of key distribution over a public channel.
The effectiveness of the new steganographic algorithm for covert communications over streaming media was examined by means of security analysis, steganalysis using nonparametric Mann-Whitney-Wilcoxon statistical testing, and performance and robustness measurements. The algorithm achieved an average data embedding rate of 800 bps, comparable to other related algorithms. The results indicated that the algorithm has little or no impact on real-time VoIP communications in terms of speech quality (< 5% change in PESQ with hidden data), signal distortion (6% change in SNR after steganography) and imperceptibility, and that it is more secure and effective in addressing the security problems than other related algorithms.
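The abstract does not say which chaotic process drives the interval selection. As a hedged illustration of the general idea, the sketch below uses the logistic map x_{n+1} = r·x_n·(1 − x_n) as a stand-in: each iterate is turned into a gap between consecutive embedding positions, so two parties sharing the same seed derive the same positions. The function name `chaotic_positions`, the parameter `r = 3.99`, the seed value and the maximum gap are assumptions for this example only, not the thesis's algorithm.

```python
def chaotic_positions(x0, count, max_gap=8, r=3.99):
    """Derive `count` increasing sample positions from a logistic-map
    sequence seeded with x0, where 0 < x0 < 1 (hypothetical sketch)."""
    positions = []
    x = x0
    pos = 0
    for _ in range(count):
        # Logistic map iteration; with r < 4 the value stays below 1.
        x = r * x * (1.0 - x)
        # Map the iterate to a gap of 1..max_gap samples.
        gap = 1 + int(x * max_gap)
        pos += gap
        positions.append(pos)
    return positions

# Sender and receiver use the same (assumed) shared seed.
sender = chaotic_positions(0.3141, 20)
receiver = chaotic_positions(0.3141, 20)
print(sender == receiver)
```

A real system would also need the seed to be exchanged securely and the floating-point arithmetic to be reproducible on both ends; this sketch leaves both aside.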

UWL Repository

FPGA-based architectures for acoustic beamforming with microphone arrays : trends, challenges and research opportunities

Author: Braeken An; da Silva Gomes Bruno; Touhafi Abdellah
Publication venue: MDPI AG
Publication date: 01/01/2018

Over the past decades, many systems composed of arrays of microphones have been developed to satisfy the quality demanded by acoustic applications. Such microphone arrays are sound acquisition systems composed of multiple microphones used to sample the sound field with spatial diversity. The relatively recent adoption of Field-Programmable Gate Arrays (FPGAs) to manage the audio data samples and to perform signal processing operations such as filtering or beamforming has led to customizable architectures able to satisfy the most demanding computational, power or performance requirements of acoustic applications. The presented work provides an overview of the current FPGA-based architectures and how FPGAs are exploited for different acoustic applications. Current trends in the use of this technology, pending challenges and open research opportunities in the use of FPGAs for acoustic applications using microphone arrays are presented and discussed.

Multidisciplinary Digital Publishing Institute

Ghent University Academic Bibliography

Directory of Open Access Journals

A Parametric Approach for Efficient Speech Storage, Flexible Synthesis and Voice Conversion

Author: Nurminen Jani
Publication venue: Tampere University of Technology
Publication date: 01/01/2013

During the past decades, many areas of speech processing have benefited from the vast increases in the available memory sizes and processing power. For example, speech recognizers can be trained with enormous speech databases and high-quality speech synthesizers can generate new speech sentences by concatenating speech units retrieved from a large inventory of speech data. However, even in today's world of ever-increasing memory sizes and computational resources, there are still lots of embedded application scenarios for speech processing techniques where the memory capacities and the processor speeds are very limited. Thus, there is still a clear demand for solutions that can operate with limited resources, e.g., on low-end mobile devices. This thesis introduces a new segmental parametric speech codec referred to as the VLBR codec. The novel proprietary sinusoidal speech codec designed for efficient speech storage is capable of achieving relatively good speech quality at compression ratios beyond the ones offered by the standardized speech coding solutions, i.e., at bitrates of approximately 1 kbps and below. The efficiency of the proposed coding approach is based on model simplifications, mode-based segmental processing, and the method of adaptive downsampling and quantization. The coding efficiency is also further improved using a novel flexible multi-mode matrix quantizer structure and enhanced dynamic codebook reordering. The compression is also facilitated using a new perceptual irrelevancy removal method. The VLBR codec is also applied to text-to-speech synthesis. In particular, the codec is utilized for the compression of unit selection databases and for the parametric concatenation of speech units. It is also shown that the efficiency of the database compression can be further enhanced using speaker-specific retraining of the codec. 
Moreover, the computational load is significantly decreased using a new compression-motivated scheme for very fast and memory-efficient calculation of concatenation costs, based on techniques and implementations used in the VLBR codec. Finally, the VLBR codec and the related speech synthesis techniques are complemented with voice conversion methods that allow modifying the perceived speaker identity, which in turn enables, e.g., cost-efficient creation of new text-to-speech voices. The VLBR-based voice conversion system combines compression with the popular Gaussian mixture model based conversion approach. Furthermore, a novel method is proposed for converting the prosodic aspects of speech. The performance of the VLBR-based voice conversion system is also enhanced using a new approach for mode selection and through explicit control of the degree of voicing. The solutions proposed in the thesis together form a complete system that can be utilized in different ways and configurations. The VLBR codec itself can be utilized, e.g., for efficient compression of audio books, and the speech synthesis related methods can be used for reducing the footprint and the computational load of concatenative text-to-speech synthesizers to levels required in some embedded applications. The VLBR-based voice conversion techniques can be used to complement the codec both in storage applications and in connection with speech synthesis. It is also possible to only utilize the voice conversion functionality, e.g., in games or other entertainment applications.

Trepo - Institutional Repository of Tampere University

VLSI design and FPGA-based prototyping of a buffered serial port for audio applications

Author: Marini Andrea
Publication venue: Pisa University Press
Publication date: 27/02/2005

The present semiconductor market is very competitive: on one side consumers ask for ever-increasing performance and new possibilities, on the other companies have to offer low prices in order to be successful. As for performance, just think of the wide range of mobile applications, such as PDAs, cellular phones, and laptops: quality of service, battery life and computational power are always taken into account when buying new devices. On the other side, due to the competition, costs have to be very low; this means that both recurring and non-recurring engineering costs have to be kept under control. Time is another important concern: it is usually true that the earlier a product is presented to the market, the wider the share of the market it will gain. This leads modern semiconductor companies to look for viable ways to design improved products in a short time. Because of the complexity of new electronic systems, this is not an easy task; even though electronic design automation (EDA) tools have greatly improved in recent years, a gap still exists between the rate at which foundries can produce chips and the rate at which these chips can be designed. A very common approach to dealing with complexity and performance requirements is to integrate as many functions as possible on a single chip (System-on-Chip); this allows higher clock frequencies and lower costs. In connection with this, design reuse has also spread through a large part of the semiconductor world. This means using in your system modules that others have already designed and tested, which allows you to skip some steps in the design flow (at least for those modules) and save a significant amount of time. The work of my thesis, developed at StarCore, a company headquartered in Austin, Texas, lies within this framework.
StarCore designs and licenses Digital Signal Processors as intellectual property; it is one of the companies that offer their products for use in other electronic systems, sparing licensees the time of designing them themselves. A Digital Signal Processor is a special kind of processor, designed to execute computation-intensive applications: encoding and decoding of information, voice synthesis and recognition, compression and decompression of data, and the Fourier transform are just some examples. In many systems, thanks to its programmability and its limited cost, it is the most suitable solution. For example, most mobile phones employ a DSP to perform baseband operations on the signal. In this kind of system, it is important that very few cycles are spent on anything other than signal processing, such as dealing with peripherals. In the case of an audio signal, it is important that the audio port requires as few cycles as possible. For this reason, my activity at StarCore was to design and develop an audio port controller that minimizes the cycles required from the processor when the algorithm being run is frame-based. For this purpose I designed hardware to be mapped into an FPGA and wrote some software for the DSP; I worked mainly with the Development Board, which is used to prototype applications based on the StarCore processor.

Electronic Thesis and Dissertation Archive - Università di Pisa

Silicon Technologies for Speaker Independent Speech Processing and Recognition Systems in Noisy Environments

Author: Arun Selvaraj; Karthikeyan Natarajan; Mala John
Publication venue: IntechOpen
Publication date: 01/11/2008

IntechOpen

Crossref

Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)

Author: Huck R. W.; Rafferty William; Reekie D. Hugh M.

Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression.

NASA Technical Reports Server

Media gateway utilizando um GPU (Media gateway using a GPU)

Author: Portugal Ricardo
Publication venue: Universidade de Aveiro
Publication date: 01/01/2012

Master's degree in Computer and Telematics Engineering

Repositório Institucional da Universidade de Aveiro