Considering Bluetooth's Subband Codec (SBC) for Wideband Speech and Audio on the Internet
The Bluetooth Special Interest Group (SIG) has standardized the subband coding (SBC) audio codec to connect headphones via wireless Bluetooth links. SBC compresses audio at high fidelity while having an ultra-low algorithmic delay. To make SBC suitable for the Internet, we extend it with a time and packet loss concealment (PLC) algorithm that is based on ITU's G.711 Appendix I. The novelty of the design lies in the interface between codec and speech receiver: we developed a new approach for distributing the functionality of a speech receiver between codec and application. Our approach leads to easier implementations of high-quality VoIP applications.
We conducted subjective and objective listening tests of the audio quality of SBC and PLC to determine an optimal coding mode and the trade-off between coding mode and packet loss rate. More precisely, we conducted MUSHRA listening tests for selected sample items. These test results are then compared with the results of multiple objective assessment algorithms (ITU P.862 PESQ, ITU BS.1387-1 PEAQ, Creusere's algorithm). We found that a combination of the PEAQ basic and advanced values best matches the subjective MUSHRA results after third-order linear regression. The regression has a coefficient of determination of R² = 0.907². By comparison, our individual human ratings show a correlation of about R = 0.9 with our averaged human rating results.
Using the combination of both PEAQ algorithms, we calculate hundreds of thousands of objective audio quality ratings, varying the audio content and the algorithmic parameters of SBC and PLC. The results show which sets of parameter values are best suited for a bandwidth- and delay-constrained link. The transmission quality of SBC is enhanced significantly by selecting optimal encoding parameters instead of the default parameter sets given in the standard.
Finally, we present preliminary objective test results comparing the audio codecs SBC, CELT, APT-X and ULD for speech and audio transmission. They all allow mono and stereo transmission of music at ultra-low coding delays (<10 ms), which is especially useful for distributed ensemble performances over the Internet.
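The regression step mentioned in this abstract can be illustrated with a minimal sketch: fit a third-order polynomial that maps an objective score to a subjective MUSHRA score, then compute the coefficient of determination. Everything here is an assumption made for illustration: the function names, the use of a single combined objective score (the abstract combines the PEAQ basic and advanced values in a way it does not spell out), and the sample data, which are placeholders and not results from the paper.

```python
def fit_polynomial(xs, ys, degree=3):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs  # coeffs[k] multiplies x**k


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def r_squared(ys, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Placeholder data (hypothetical, not from the paper):
# objective scores on a PEAQ-like scale and MUSHRA scores on 0..100.
objective = [-3.8, -3.1, -2.6, -2.0, -1.4, -0.9, -0.5, -0.1]
mushra = [18, 27, 38, 52, 66, 78, 88, 95]

coeffs = fit_polynomial(objective, mushra, degree=3)
fitted = [predict(coeffs, x) for x in objective]
print("R^2 of the cubic fit:", r_squared(mushra, fitted))
```

The fit is linear in its coefficients even though the mapping is a cubic, which is why it can be solved with ordinary least squares.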
Scalable Speech Coding for IP Networks
The emergence of Voice over Internet Protocol (VoIP) has posed new challenges to the development of speech codecs. The key issue in transporting real-time voice packets over IP networks is the lack of any guarantee of reasonable speech quality in the presence of packet delay or loss.
Most of the widely used narrowband codecs depend on the Code Excited Linear Prediction (CELP) coding technique. CELP uses long-term prediction across frame boundaries; it therefore suffers from error propagation when packets are lost and needs to transmit redundant information to mitigate the problem. The internet Low Bit-rate Codec (iLBC) employs frame-independent coding and is therefore inherently robust to packet loss. However, the original iLBC lacks some of the key features of speech codecs for IP networks: rate flexibility, scalability, and wideband support.
This dissertation presents novel scalable narrowband and wideband speech codecs for IP networks using the frame-independent coding scheme based on the iLBC. Rate flexibility is added to the iLBC by employing the discrete cosine transform (DCT) and scalable algebraic vector quantization (AVQ), and by allocating different numbers of bits to the AVQ. Bit-rate scalability is obtained by adding an enhancement layer to the core layer of the multi-rate iLBC. The enhancement layer encodes the weighted iLBC coding error in the modified DCT (MDCT) domain. The proposed wideband codec employs a bandwidth extension technique to extend the capabilities of existing narrowband codecs to provide wideband coding functionality. The wavelet transform is also used to further enhance the performance of the proposed codec.
The performance evaluation results show that the proposed codec provides high robustness to packet loss and achieves equivalent or higher speech quality than state-of-the-art codecs under the clean channel condition.
Energy-efficient wireless communication
In this chapter we present an energy-efficient, highly adaptive network interface architecture and a novel data link layer protocol for wireless networks that provides Quality of Service (QoS) support for diverse traffic types. Due to the dynamic nature of wireless networks, adaptations in bandwidth scheduling and error control are necessary to achieve energy efficiency and an acceptable quality of service. In our approach we apply adaptability through all layers of the protocol stack and provide feedback to the applications. In this way the applications can adapt the data streams, and the network protocols can adapt the communication parameters.
Quality of media traffic over Lossy internet protocol networks: Measurement and improvement.
Voice over Internet Protocol (VoIP) is an active area of research in the world of
communication. The high revenue made by the telecommunication companies is a
motivation to develop solutions that transmit voice over media other than
the traditional circuit-switched network.
However, while IP networks can carry data traffic very well due to their best-effort
nature, they are not designed to carry real-time applications such as voice.
As such several degradations can happen to the speech signal before it reaches its
destination. Therefore, it is important for legal, commercial, and technical reasons
to measure the quality of VoIP applications accurately and non-intrusively.
Several methods have been proposed to measure speech quality: some of these
methods are subjective, some are intrusive, and others are non-intrusive.
One of the non-intrusive methods for measuring the speech quality is the E-model
standardised by the International Telecommunication Union-Telecommunication Standardisation
Sector (ITU-T).
Although the E-model is a non-intrusive method for measuring the speech quality,
it depends on time-consuming, expensive and hard-to-conduct subjective
tests to calibrate its parameters; consequently, it is applicable to a limited number
of conditions and speech coders. Also, it is less accurate than intrusive methods
such as Perceptual Evaluation of Speech Quality (PESQ) because it does not consider
the contents of the received signal.
In this thesis an approach to extend the E-model based on PESQ is proposed.
Using this method the E-model can be extended to new network conditions and
applied to new speech coders without the need for the subjective tests. The modified
E-model calibrated using PESQ is compared with the E-model calibrated using
subjective tests to prove its effectiveness.
During the above extension the relation between quality estimation using the
E-model and PESQ is investigated and a correction formula is proposed to correct
the deviation in speech quality estimation.
Another extension to the E-model to improve its accuracy in comparison with
the PESQ looks into the content of the degraded signal and classifies packet loss
into either Voiced or Unvoiced based on the received surrounding packets. The accuracy
of the proposed method is evaluated by comparing the estimation of the new
method that takes packet class into consideration with the measurement provided
by PESQ as a more accurate, intrusive method for measuring the speech quality.
The above two extensions for quality estimation of the E-model are combined
to offer a method for estimating the quality of VoIP applications accurately and non-intrusively,
without the need for time-consuming, expensive, and hard-to-conduct
subjective tests.
Finally, the applicability of the E-model or the modified E-model in measuring
the quality of services in Service Oriented Computing (SOC) is illustrated.
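For readers unfamiliar with the E-model, the following is a minimal sketch of the baseline computation as published in ITU-T G.107, not of the PESQ-based extensions proposed in the thesis. It uses the commonly quoted simplification R = 93.2 - Id - Ie,eff, where roughly 93.2 is the rating obtained when all other parameters are left at their defaults. The function names and the sample codec values (Ie, Bpl) are assumptions chosen for illustration; real values should be taken from the relevant ITU-T recommendations.

```python
def effective_equipment_impairment(ie, bpl, packet_loss_percent, burst_ratio=1.0):
    """Ie,eff = Ie + (95 - Ie) * Ppl / (Ppl / BurstR + Bpl), as in ITU-T G.107."""
    ppl = packet_loss_percent
    return ie + (95.0 - ie) * ppl / (ppl / burst_ratio + bpl)


def r_factor(delay_impairment, ie_eff, default_rating=93.2):
    """Simplified rating factor: default rating minus delay and equipment impairments."""
    return default_rating - delay_impairment - ie_eff


def r_to_mos(r):
    """Map the rating factor R to an estimated MOS using the G.107 conversion."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Hypothetical codec parameters (placeholders, not taken from the thesis):
ie_eff = effective_equipment_impairment(ie=0.0, bpl=10.0, packet_loss_percent=2.0)
rating = r_factor(delay_impairment=0.0, ie_eff=ie_eff)
print("R =", rating, "MOS =", r_to_mos(rating))
```

Because the computation needs only network-level inputs such as loss rate and delay, it can run without access to the original speech signal, which is what makes the E-model non-intrusive.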
A novel multimedia adaptation architecture and congestion control mechanism designed for real-time interactive applications
The increasing use of interactive multimedia applications over the Internet has created a problem of congestion, because a majority of these applications do not respond to congestion indicators. This leads to resource starvation for responsive flows and ultimately to excessive delay and losses for all flows, and therefore to a loss of quality. The result is unfair sharing of network resources and an increased risk of network "congestion collapse".
Current Congestion Control Mechanisms such as "TCP-Friendly Rate Control" (TFRC) have been able to achieve a "fair share" of network resources when competing with responsive flows such as TCP, but TFRC's method of congestion response (i.e. reducing the Packet Rate) is not ideally matched to interactive multimedia applications, which maintain a fixed Frame Rate. This mismatch of the two rates (Packet Rate and Frame Rate) leads to buffering of frames at the Sender Buffer, resulting in delay and loss, and an unacceptable reduction of quality or complete loss of service for the end-user.
To address this issue, this thesis proposes a novel Congestion Control Mechanism, referred to as "TCP-friendly rate control – Fine Grain Scalable" (TFGS), for interactive multimedia applications.
This new approach allows multimedia frames (data) to be sent as soon as they are generated, so that they reach the destination as quickly as possible and provide an isochronous interactive service. This is done by maintaining the Packet Rate of the Congestion Control Mechanism (CCM) at a level equivalent to the Frame Rate of the Multimedia Encoder. The response to congestion is to truncate the Packet Size, hence reducing the overall bitrate of the multimedia stream. This functionality of the Congestion Control Mechanism is referred to as Packet Size Truncation (PST), and takes advantage of adaptive multimedia encoding, such as Fine Grain Scalable (FGS), where the multimedia frame is encoded in order of significance, from Most to Least Significant Bits. The Multimedia Adaptation Manager (MAM) truncates the multimedia frame to the size indicated by the Packet Size Truncation function of the CCM, accurately mapping user demand to available network resources. Additionally, Fine Grain Scalable encoding can offer scalability at byte-level granularity, providing a true match to available network resources.
This approach achieves a "fair share" of network resources when competing with responsive flows (similar to the TFRC CCM), and it also provides an isochronous service, which is of crucial benefit to real-time interactive services. Furthermore, results illustrate that an increased number of interactive multimedia flows (such as voice) can be carried over congested networks whilst maintaining a quality level equivalent to that of a standard landline telephone. This is because the loss and delay arising from the buffering of frames at the Sender Buffer is completely removed. Packets are sent with a fixed inter-packet-gap-spacing (IPGS), so a majority of packets arrive at the receiving end at tight time intervals. This avoids the need for large Playout (de-jitter) Buffer sizes and adaptive Playout Buffer configurations. As a result, delay is reduced, and the interactivity and Quality of Experience (QoE) of the multimedia application are improved.
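The Packet Size Truncation idea described in this abstract can be sketched as follows: keep one packet per encoder frame, and shrink each packet so that the stream fits the rate allowed by the congestion controller. This is an illustrative sketch only, not the thesis implementation. The function names, the 40-byte header allowance (IPv4, UDP and RTP headers), and the rule that a packet never shrinks below a base-layer size are all assumptions.

```python
def truncated_payload_size(allowed_rate_bps, frame_rate_hz,
                           full_frame_bytes, base_layer_bytes,
                           header_bytes=40):
    """Payload bytes per frame so that one packet per frame fits the allowed rate."""
    # Byte budget per packet when the packet rate equals the frame rate.
    budget = int(allowed_rate_bps / 8 / frame_rate_hz) - header_bytes
    # Never exceed the full frame; never drop below the base layer (assumption).
    return max(base_layer_bytes, min(full_frame_bytes, budget))


def truncate_frame(frame, size):
    """Keep the leading bytes of the frame.

    Assumes an FGS-style layout where the most significant data comes first,
    so cutting the tail removes the least significant bits.
    """
    return frame[:size]


# Hypothetical example: a 50 frames/s voice stream with 200-byte frames.
frame = bytes(range(200))
size = truncated_payload_size(allowed_rate_bps=64_000, frame_rate_hz=50,
                              full_frame_bytes=len(frame), base_layer_bytes=20)
packet_payload = truncate_frame(frame, size)
print(len(packet_payload), "bytes sent for this frame")
```

Holding the packet rate fixed and varying only the size is what keeps frames out of the sender buffer; a rate-based response would instead delay or drop whole frames.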
Quality aspects of Internet telephony
Internet telephony has had a tremendous impact on how people communicate.
Many now maintain contact using some form of Internet telephony.
Therefore the motivation for this work has been to address the quality aspects
of real-world Internet telephony for both fixed and wireless telecommunication.
The focus has been on the quality aspects of voice communication,
since poor quality often leads to user dissatisfaction. The scope of the work
has been broad in order to address the main factors within IP-based voice
communication.
The first four chapters of this dissertation constitute the background
material. The first chapter outlines where Internet telephony is deployed
today. It also motivates the topics and techniques used in this research.
The second chapter provides the background on Internet telephony including
signalling, speech coding and voice Internetworking. The third chapter
focuses solely on quality measures for packetised voice systems and finally
the fourth chapter is devoted to the history of voice research.
The appendix of this dissertation constitutes the research contributions.
It includes an examination of the access network, focusing on how calls are
multiplexed in wired and wireless systems. Subsequently in the wireless
case, we consider how to handover calls from 802.11 networks to the cellular
infrastructure. We then consider the Internet backbone where most of our
work is devoted to measurements specifically for Internet telephony. The
applications of these measurements have been estimating telephony arrival
processes, measuring call quality, and quantifying the trend in Internet telephony
quality over several years. We also consider the end systems, since
they are responsible for reconstructing a voice stream given loss and delay
constraints. Finally we estimate voice quality using the ITU proposal PESQ
and the packet loss process.
The main contribution of this work is a systematic examination of Internet
telephony. We describe several methods to enable adaptable solutions
for maintaining consistent voice quality. We have also found that relatively
small technical changes can lead to substantial user quality improvements.
A second contribution of this work is a suite of software tools designed to
ascertain voice quality in IP networks. Some of these tools are in use within
commercial systems today.