The voice activity detection (VAD) recorder and VAD network recorder : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University
The project provides a feasibility study for the AudioGraph tool, focusing on two application areas: the VAD (voice activity detector) recorder and the VAD network recorder. The first achieves low bit-rate speech recording on the fly, using a GSM compression coder with a simple VAD algorithm; the second provides two-way speech over IP, achieving echo cancellation by means of a simplex channel. The latter is required for implementing a synchronous AudioGraph. In the first chapter we introduce the background of this project, specifically VoIP technology, the AudioGraph tool, and VAD algorithms. We also discuss the problems set for this project. The second chapter presents all the relevant techniques in detail, including sound representation, speech-coding schemes, sound file formats, PowerPlant and Macintosh programming issues, and the simple VAD algorithm we have developed. The third chapter discusses the implementation issues, including the systems' objectives, architecture, the problems encountered and the solutions used. The fourth chapter illustrates the results of the two applications. User documentation for the applications is given, after which we analyse the parameters based on the results. We also present the default settings of the parameters, which could be used in the AudioGraph system. The last chapter provides conclusions and future work.
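The abstract does not say how its simple VAD algorithm works. As a rough illustration only, one common simple approach is a short-term energy threshold with a hangover period. In the sketch below the function names, frame length, threshold and hangover count are all assumptions and are not taken from the thesis.

```python
import math

def frame_energy_db(frame):
    """Mean power of one frame in dB, for samples scaled to [-1.0, 1.0]."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)  # small offset avoids log(0)

def simple_vad(samples, frame_len=160, threshold_db=-40.0, hangover=2):
    """Return one boolean per full frame: True if the frame is kept as speech.

    All parameters are illustrative assumptions, not values from the thesis.
    A frame is kept when its energy exceeds the threshold, or when it falls
    within `hangover` frames after the last loud frame (to avoid clipping
    word endings).
    """
    decisions = []
    quiet_run = hangover  # start in the silent state
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_energy_db(frame) > threshold_db:
            quiet_run = 0
            decisions.append(True)
        elif quiet_run < hangover:
            quiet_run += 1
            decisions.append(True)   # hangover frame
        else:
            decisions.append(False)  # silence: frame can be dropped
    return decisions
```

Only frames marked True would be passed on to the speech coder, which is how a VAD lowers the average bit rate of a recording.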
Secure covert communications over streaming media using dynamic steganography
Streaming technologies such as VoIP are widely embedded into commercial and industrial applications, so it is imperative to address data security issues before the problems get really serious. This thesis describes a theoretical and experimental investigation of secure covert communications over streaming media using dynamic steganography. A covert VoIP communications system was developed in C++ to enable the implementation of the work being carried out.
A new information theoretical model of secure covert communications over streaming media was constructed to depict the security scenarios in streaming media-based steganographic systems with passive attacks. The model involves a stochastic process that models an information source for covert VoIP communications and the theory of hypothesis testing that analyses the adversary's detection performance.
The potential of hardware-based true random key generation and chaotic interval selection for innovative applications in covert VoIP communications was explored. A method using the CPU's read time stamp counter as an entropy source was designed to generate true random numbers as secret keys for streaming media steganography. A novel interval selection algorithm was devised to randomly choose data embedding locations in VoIP streams using random sequences generated from a chaotic process.
A steganographic algorithm based on dynamic key updating and transmission, which integrates a one-way cryptographic accumulator into dynamic key exchange for covert VoIP communications, was devised to provide secure key exchange for covert communications over streaming media. Analysis based on the discrete logarithm problem, together with steganalysis using a t-test, indicated that the algorithm has the advantage of being the most solid method of key distribution over a public channel.
The effectiveness of the new steganographic algorithm for covert communications over streaming media was examined by means of security analysis, steganalysis using non-parametric Mann-Whitney-Wilcoxon statistical testing, and performance and robustness measurements. The algorithm achieved an average data embedding rate of 800 bps, comparable to other related algorithms. The results indicated that the algorithm has little or no impact on real-time VoIP communications in terms of speech quality (< 5% change in PESQ with hidden data), signal distortion (6% change in SNR after steganography) and imperceptibility, and it is more secure and effective in addressing the security problems than other related algorithms.
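The abstract does not specify the chaotic process used for interval selection. As a hedged sketch of the general idea, the example below uses the logistic map (an assumption, chosen only because it is a widely known chaotic map) to turn a shared secret seed into a sequence of gaps between embedding locations. The function names, the parameter `r` and the `max_gap` bound are hypothetical.

```python
def chaotic_intervals(x0, count, r=3.99, max_gap=8):
    """Return `count` gaps in 1..max_gap from the logistic map x -> r*x*(1-x).

    x0 is the secret seed and must lie strictly between 0 and 1. The map and
    its parameters are illustrative assumptions, not the thesis's algorithm.
    """
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    x = x0
    gaps = []
    for _ in range(count):
        x = r * x * (1.0 - x)                    # one chaotic iteration
        gaps.append(1 + int(x * max_gap) % max_gap)
    return gaps

def embedding_positions(x0, count, r=3.99, max_gap=8):
    """Accumulate the gaps into strictly increasing packet or sample indices."""
    positions = []
    pos = -1
    for gap in chaotic_intervals(x0, count, r, max_gap):
        pos += gap
        positions.append(pos)
    return positions
```

A sender and receiver holding the same seed would derive the same positions, while an observer without the seed sees irregular spacing. A practical system would also need bit-exact arithmetic at both ends, since floating-point results can differ between platforms.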
FPGA-based architectures for acoustic beamforming with microphone arrays : trends, challenges and research opportunities
Over the past decades, many systems composed of arrays of microphones have been developed to satisfy the quality demanded by acoustic applications. Such microphone arrays are sound acquisition systems composed of multiple microphones used to sample the sound field with spatial diversity. The relatively recent adoption of Field-Programmable Gate Arrays (FPGAs) to manage the audio data samples and to perform signal processing operations such as filtering or beamforming has led to customizable architectures able to satisfy the most demanding acoustic applications in terms of computation, power or performance. The presented work provides an overview of the current FPGA-based architectures and how FPGAs are exploited for different acoustic applications. Current trends in the use of this technology, pending challenges and open research opportunities in the use of FPGAs for acoustic applications using microphone arrays are presented and discussed.
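For readers unfamiliar with beamforming, the simplest variant is delay-and-sum: each microphone signal is shifted to compensate for its arrival delay from the chosen direction, and the aligned signals are averaged. The sketch below is a plain software illustration with integer sample delays. It is not drawn from the surveyed FPGA architectures, and the function name and argument layout are assumptions.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average the result.

    channels: list of equal-length sample lists, one per microphone.
    delays:   per-microphone arrival delay in samples (non-negative integers)
              for the steering direction; these values are assumed known.
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = t + d              # read ahead in the later-arriving channel
            if idx < n:              # samples past the end count as zero
                acc += ch[idx]
        out.append(acc / len(channels))
    return out
```

Sound arriving from the steered direction adds coherently after alignment, while sound from other directions is attenuated by the averaging.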
A Parametric Approach for Efficient Speech Storage, Flexible Synthesis and Voice Conversion
During the past decades, many areas of speech processing have benefited from the vast increases in the available memory sizes and processing power. For example, speech recognizers can be trained with enormous speech databases and high-quality speech synthesizers can generate new speech sentences by concatenating speech units retrieved from a large inventory of speech data. However, even in today's world of ever-increasing memory sizes and computational resources, there are still lots of embedded application scenarios for speech processing techniques where the memory capacities and the processor speeds are very limited. Thus, there is still a clear demand for solutions that can operate with limited resources, e.g., on low-end mobile devices.
This thesis introduces a new segmental parametric speech codec referred to as the VLBR codec. The novel proprietary sinusoidal speech codec designed for efficient speech storage is capable of achieving relatively good speech quality at compression ratios beyond the ones offered by the standardized speech coding solutions, i.e., at bitrates of approximately 1 kbps and below. The efficiency of the proposed coding approach is based on model simplifications, mode-based segmental processing, and the method of adaptive downsampling and quantization. The coding efficiency is also further improved using a novel flexible multi-mode matrix quantizer structure and enhanced dynamic codebook reordering. The compression is also facilitated using a new perceptual irrelevancy removal method.
The VLBR codec is also applied to text-to-speech synthesis. In particular, the codec is utilized for the compression of unit selection databases and for the parametric concatenation of speech units. It is also shown that the efficiency of the database compression can be further enhanced using speaker-specific retraining of the codec. Moreover, the computational load is significantly decreased using a new compression-motivated scheme for very fast and memory-efficient calculation of concatenation costs, based on techniques and implementations used in the VLBR codec.
Finally, the VLBR codec and the related speech synthesis techniques are complemented with voice conversion methods that allow modifying the perceived speaker identity which in turn enables, e.g., cost-efficient creation of new text-to-speech voices. The VLBR-based voice conversion system combines compression with the popular Gaussian mixture model based conversion approach. Furthermore, a novel method is proposed for converting the prosodic aspects of speech. The performance of the VLBR-based voice conversion system is also enhanced using a new approach for mode selection and through explicit control of the degree of voicing.
The solutions proposed in the thesis together form a complete system that can be utilized in different ways and configurations. The VLBR codec itself can be utilized, e.g., for efficient compression of audio books, and the speech synthesis related methods can be used for reducing the footprint and the computational load of concatenative text-to-speech synthesizers to levels required in some embedded applications. The VLBR-based voice conversion techniques can be used to complement the codec both in storage applications and in connection with speech synthesis. It is also possible to only utilize the voice conversion functionality, e.g., in games or other entertainment applications
VLSI design and FPGA-based prototyping of a buffered serial port for audio applications
The present semiconductor market is very competitive: on one
side, consumers ask for ever-increasing performance and new
possibilities; on the other, companies have to offer low prices in
order to be successful. As far as performance is concerned, just think
of the wide range of mobile applications, such as PDAs, cellular
phones, and laptops: quality of service, battery life
and computational power are always taken into account when buying
new devices. On the other side, due to the competition, costs have
to be very low; this means that both recurring and non-recurring
engineering costs have to be kept under control.
Time is another important concern: it is usually true that the
earlier a product is presented to the market, the wider the share of
the market it will gain. This leads modern semiconductor companies
to look for viable ways to design improved products in a short
time. Because of the complexity of new electronic systems,
this is not an easy task to accomplish; even though electronic
design automation (EDA) tools have greatly improved in recent
years, a gap still exists between the rate at which foundries can
produce chips and the rate at which these chips can be designed.
A very common approach to dealing with complexity and performance
requirements is to integrate as many functions as possible on a
single chip (System-on-Chip); this allows higher clock frequencies
and lower costs. In connection with this, design reuse has also
spread through a large part of the semiconductor world. This means
using in your system modules that others have already designed and tested.
This allows you to skip some steps in the design flow (at least
for those modules) and save a significant amount of time.
The work of my thesis lies within this framework; it was developed at
StarCore, a company headquartered in Austin, Texas. StarCore
designs and licenses Digital Signal Processors as intellectual
property; it is one of the companies that offer their
products for use in other electronic systems, sparing licensees
the time of designing them themselves.
A Digital Signal Processor is a special kind of processor,
designed to execute computation-intensive applications: encoding and
decoding of information, voice synthesis and recognition,
compression and decompression of data, and the Fourier Transform are
just some examples. In many systems, thanks to its programmability and
its limited cost, it is a suitable solution. For example, most
mobile phones employ a DSP to perform baseband
operations on the signal.
In this kind of system, it is important that very few cycles are
spent on anything other than signal processing, such as dealing with
peripherals. In the case of an audio signal, it is important that
the audio port requires as few cycles as possible. For this
reason, my activity at StarCore was to design and develop an audio
port controller that aims to minimize the cycles required of the
processor when the algorithm being run is frame based.
For this purpose I designed hardware to be mapped onto an FPGA,
and wrote some software for the DSP; I worked mainly with the
Development Board, used to prototype applications based on the
StarCore processor.
Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)
Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression
Media gateway utilizando um GPU
Master's degree in Computer and Telematics Engineering