A configurable vector processor for accelerating speech coding algorithms
The growing demand for voice-over-packet (VoIP) services and multimedia-rich
applications has made the efficient, real-time implementation of
low-bit-rate speech coders on embedded VLSI platforms increasingly important. Such speech coders are
designed to substantially reduce bandwidth requirements, thus enabling dense multichannel
gateways in a small form factor. This, however, comes at a high computational cost,
which mandates the use of very high-performance embedded processors.
This thesis investigates the potential acceleration of two major ITU-T speech coding
algorithms, namely G.729A and G.723.1, through their efficient implementation on a
configurable extensible vector embedded CPU architecture. New scalar and vector ISAs
were introduced, which resulted in a reduction of up to 80% in the dynamic instruction count
of both workloads. These instructions were subsequently encapsulated into a parametric,
hybrid SISD (scalar processor)–SIMD (vector) processor. This work presents the research
and implementation of the vector datapath of this vector coprocessor, which is tightly coupled
to a SPARC V8-compliant CPU; the optimization and simulation methodologies
employed; and the use of Electronic System Level (ESL) techniques to rapidly design
SIMD datapaths.
Non-intrusive identification of speech codecs in digital audio signals
Speech compression has become an integral component in all modern telecommunications networks. Numerous codecs have been developed and deployed for efficiently transmitting voice signals while maintaining high perceptual quality. Because of the diversity of speech codecs used by different carriers and networks, the ability to distinguish between different codecs lends itself to a wide variety of practical applications, including determining call provenance, enhancing network diagnostic metrics, and improving automated speaker recognition. However, few research efforts have attempted to provide a methodology for distinguishing among speech codecs in an audio signal. In this research, we demonstrate a novel approach for accurately determining the presence of several contemporary speech codecs in a non-intrusive manner. The methodology developed in this research demonstrates techniques for analyzing an audio signal such that the subtle noise components introduced by the codec processing are accentuated while most of the original speech content is eliminated. Using these techniques, an audio signal may be profiled to gather a set of values that effectively characterize the codec present in the signal. This procedure is first applied to a large data set of audio signals from known codecs to develop a set of trained profiles. Thereafter, signals from unknown codecs may be similarly profiled, and the profiles compared to each of the known training profiles in order to decide which codec is the best match with the unknown signal. Overall, the proposed strategy generates extremely favorable results, with codecs being identified correctly in nearly 95% of all test signals. In addition, the profiling process is shown to require a very short analysis length of less than 4 seconds of audio to achieve these results. Both the identification rate and the small analysis window represent dramatic improvements over previous efforts in speech codec identification.
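The train-then-match procedure described in this abstract can be illustrated with a minimal sketch. The abstract does not say which values make up a profile or how profiles are compared, so everything here is an assumption made for illustration: the two-element feature vectors are made up, each trained profile is simply the mean of its training vectors, and the best match is taken to be the profile with the smallest Euclidean distance. The names `train_profiles` and `identify_codec` are hypothetical.

```python
import math


def train_profiles(labelled_features):
    """Average the feature vectors of each known codec into one profile.

    labelled_features maps a codec name to a list of equal-length
    feature vectors (assumed to come from some profiling step).
    """
    profiles = {}
    for codec, vectors in labelled_features.items():
        count = len(vectors)
        profiles[codec] = [sum(column) / count for column in zip(*vectors)]
    return profiles


def identify_codec(profiles, features):
    """Return the codec whose trained profile is nearest to `features`.

    Euclidean distance is an assumption; the abstract only says the
    unknown profile is compared to each known training profile.
    """
    return min(profiles, key=lambda codec: math.dist(profiles[codec], features))


# Made-up training data: two hypothetical codecs, two signals each.
training = {
    "codec_a": [[0.10, 0.80], [0.12, 0.78]],
    "codec_b": [[0.60, 0.20], [0.58, 0.22]],
}
profiles = train_profiles(training)

# Profile of a signal from an unknown codec (also made up).
unknown = [0.57, 0.25]
print(identify_codec(profiles, unknown))
```

A nearest-profile rule is only one of many possible classifiers; it is used here because it mirrors the abstract's wording about choosing the best match among known profiles.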
A model for predicting the performance of IP videoconferencing
With free desktop videoconferencing (DVC) software now installed on the
majority of the world's PCs, there has inevitably been considerable
interest in recent years in using DVC over the Internet. The growing popularity of DVC
increases the need for multimedia quality assessment. However, the task of predicting
the perceived multimedia quality over Internet Protocol (IP) networks is
complicated by the fact that the audio and video streams are susceptible to unique
impairments due to the unpredictable nature of IP networks, different types of task
scenarios, different levels of complexity, and other related factors. To date, no standard
consensus on how to define IP media Quality of Service (QoS) has been reached.
This thesis addresses the problem by investigating a new approach to
assessing the quality of audio, video, and the overall audiovisual experience as perceived in low-cost
DVC systems.
The main aim of the thesis is to investigate current methods used to assess
perceived IP media quality, and then to propose a model that predicts the quality of
the audiovisual experience from prevailing network parameters.
This thesis investigates the effects of various traffic conditions, such as packet loss,
jitter, and delay, and of other factors that may influence end-user acceptance when low-cost
DVC is used over the Internet. It also investigates the interaction effects between
the audio and video media, and the issues involving lip synchronisation
error. The thesis provides empirical evidence that the subjective mean opinion
score (MOS) of the perceived multimedia quality is unaffected by lip synchronisation
error in low-cost DVC systems.
The data-gathering approach advocated in this thesis involves both field and
laboratory trials, so that results from classroom-based experiments
can be compared with those from real-world environments and the bench tests
can be confirmed under real-world conditions. The subjective test method was employed
because it has proved more robust and better suited to these research studies
than objective testing techniques.
The MOS results, and the number of observations obtained, have enabled a set of
criteria to be established that can be used to determine the acceptable QoS for given
network conditions and task scenarios. Based upon these comprehensive findings,
the final contribution of the thesis is the proposal of a new adaptive architecture
method intended to enable the performance of a particular IP-based DVC
session to be predicted for a given network condition.
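As a rough illustration of how MOS results could feed an acceptability criterion, the sketch below averages subjective ratings per network condition and compares each average with a threshold. The abstract gives no ratings, rating scale, or thresholds, so all values here are assumptions: the ratings are made up, the 1 to 5 scale is the commonly used absolute category rating scale, and `ACCEPTABLE_MOS = 3.5` is a hypothetical cut-off rather than a criterion from the thesis.

```python
from statistics import mean

# Hypothetical acceptability threshold; not taken from the thesis.
ACCEPTABLE_MOS = 3.5


def mean_opinion_score(ratings):
    """Return the arithmetic mean of subjective ratings on a 1 to 5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(rating < 1 or rating > 5 for rating in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


def is_acceptable(ratings, threshold=ACCEPTABLE_MOS):
    """Report whether the MOS for one condition meets the threshold."""
    return mean_opinion_score(ratings) >= threshold


# Made-up ratings from four viewers under two network conditions.
ratings_by_condition = {
    "0% packet loss": [5, 4, 4, 5],
    "5% packet loss": [3, 2, 3, 3],
}

for condition, ratings in ratings_by_condition.items():
    mos = mean_opinion_score(ratings)
    verdict = "acceptable" if is_acceptable(ratings) else "not acceptable"
    print(f"{condition}: MOS = {mos:.2f} ({verdict})")
```

In practice the criteria would also depend on the task scenario, as the abstract notes, so a single global threshold is a simplification.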