162 research outputs found

    Considering Bluetooth's Subband Codec (SBC) for Wideband Speech and Audio on the Internet

    Get PDF
    The Bluetooth Special Interest Group (SIG) has standardized the subband coding (SBC) audio codec to connect headphones via wireless Bluetooth links. SBC compresses audio at high fidelity while having an ultra-low algorithm delay. To make SBC suitable for the Internet, we extend it by using a time and packet loss concealment (PLC) algorithm that is based on ITU's G.711 Appendix I. The design is novel in the aspect of the interface between codec and speech receiver. We developed a new approach on how to distribute the functionality of a speech receiver between codec and application. Our approach leads to easier implementations of high quality VoIP applications. We conducted subjective and objective listening tests of the audio quality of SBC and PLC in order to determine an optimal coding mode and the trade-off between coding mode and packet loss rate. More precisely, we conducted MUSHRA listening tests for selected sample items. These tests results are then compared with the results of multiple objective assessment algorithms (ITU P.862 PESQ, ITU BS.1387-1 PEAQ, Creusere's algorithm). We found out that a combination of the PEAQ basic and advanced values best matches---after third order linear regression---the subjective MUSHRA results . The linear regression has coefficient of determination of R²=0.907². By comparison, our individual human ratings show a correlation of about R=0.9 compared to our averaged human rating results. Using the combination of both PEAQ algorithms, we calculate hundred thousands of objective audio quality ratings varying audio content and algorithmic parameters of SBC and PLC. The results show which set of parameters value are best suitable for a bandwidth and delay constrained link. The transmission quality of SBC is enhanced significantly by selecting optimal encoding parameters as compared to the default parameter sets given in the standard. Finally, we present preliminary objective tests results on the comparison of the audio codecs SBC, CELT, APT-X and ULD coding speech and audio transmission. They all allow a mono and stereo transmission of music at ultra-low coding delays (<10ms), which is especially useful for distributed ensemble performances over the Internet

    Comparison of Wideband Earpiece Integrations in Mobile Phone

    Get PDF
    Perinteisesti puhelinverkoissa välitettävä puhe on ollut kapeakaistaista, kaistan ollessa 300 - 3400 Hz. Voidaan kuitenkin olettaa, että laajakaistaiset puhepalvelut tulevat saamaan markkinoilla enemmän jalansijaa tulevina vuosina. Tässä lopputyössä esitellään puheenkoodauksen perusteet laajakaistaisen adaptiivisen moninopeuspuhekoodekin (AMR-WB) kanssa. Laajakaistainen puhekoodekki laajentaa puhekaistan 50-7000 Hz käyttäen 16 kHz näytetaajuutta. Käytännössä laajempi kaista tarkoittaa parannuksia puheen ymmärrettävyyteen ja tekee siitä luonnollisemman ja mukavamman kuuloista. Tämän lopputyön päätavoite on vertailla kahden eri laajakaistaisen matkapuhelinkuulokkeen integrointia. Kysymys kuuluu, kuinka paljon käyttäjä hyötyy isommasta kuulokkeesta matkapuhelimessa? Kuulokkeiden suorituskyvyn selvittämiseksi niille tehtiin objektiivisia mittauksia vapaakentässä. Mittauksia tehtiin myös puhelimelle pää- ja torsosimulaattorissa (HATS) johdottamalla kuuloke suoraan vahvistimelle, sekä lisäksi puhelun ollessa aktiivisena GSM ja WCDMA verkoissa. Objektiiviset mittaukset osoittivat kahden eri integroinnin väliset erot kuulokkeiden taajuusvasteessa ja särössä erityisesti matalilla taajuuksilla. Lopuksi tehtiin kuuntelukoe tarkoituksena selvittää erottaako loppukäyttäjä pienemmän ja isomman kuulokkeen välistä eroa käyttäen kapeakaistaisia ja laajakaistaisia puhelinääninäytteitä. Kuuntelukokeen tuloksien pohjalta voidaan sanoa, että käyttäjä erottaa kahden eri integroinnin erot ja miespuhuja hyötyy naispuhujaa enemmän isommasta kuulokkeesta laajakaistaisella puhekoodekilla.The speech in telecommunication networks has been traditionally narrowband ranging from 300 Hz to 3400 Hz. It can be expected that wideband speech call services will increase their foothold in the markets during the coming years. In this thesis speech coding basics with adaptive multirate wideband (AMR-WB) are introduced. The wideband codec widens the speech band to new range from 50 Hz to 7000 Hz using 16 kHz sampling frequency. In practice the wider band means improvements to speech intelligibility and makes it more natural and comfortable to listen to. The main focus of this thesis work is to compare two different wideband earpiece integrations. The question is how much the end-user will benefit from using a larger earpiece in a mobile phone? To find out speaker performance, objective measurements in free field were done for the earpiece modules. Measurements were performed also for the phone on head and torso simulator (HATS) by wiring the earpieces directly to a power amplifier and with over the air on GSM and WCDMA networks. The results of objective measurements showed differences between the earpiece integrations especially on low frequencies in frequency response and distortion. Finally the subjective listening test is done for comparison to see if the end-user notices the difference between smaller and larger earpiece integrations using narrowband and wideband speech samples. Based on these subjective test results it can be said that the user can differentiate between two different integrations and that a male speaker benefits more from a larger earpiece than a female speaker

    Perceptual techniques in audio quality assessment

    Get PDF

    Quantification of audio quality loss after wireless transfer

    Get PDF
    The report describes a quality measurement for audio, both the theoretical background and implementation. It begins by describing the unlicensed methods the implementation is based on, Segmental SNR, Frequency Weighted Segmental SNR, Log-Likelihood Ratio, Cepstral Distance and Weighted Slope Spectral distance, and the commercial methods used as reference, PEAQ and PESQ. It also mentions the problems present in wireless transfer and the concept of sound quality assessment. It concludes by describing the suggested analysis method and implemented software together with the results when compared to PEAQ and PESQ.When talking on the phone, how do you know if the sound quality is good or bad? How do you know if it is better or worse than your last phone call? Although the perception of sound varies from person to person, only humans can truly determine sound quality. However, companies wants to ensure the quality of their product before releasing it, and therefore need an easier way to evaluate without humans, since human testing is expensive, time consuming and cannot be guaranteed to be consistent

    "Can you hear me now?":Automatic assessment of background noise intrusiveness and speech intelligibility in telecommunications

    Get PDF
    This thesis deals with signal-based methods that predict how listeners perceive speech quality in telecommunications. Such tools, called objective quality measures, are of great interest in the telecommunications industry to evaluate how new or deployed systems affect the end-user quality of experience. Two widely used measures, ITU-T Recommendations P.862 âPESQâ and P.863 âPOLQAâ, predict the overall listening quality of a speech signal as it would be rated by an average listener, but do not provide further insight into the composition of that score. This is in contrast to modern telecommunication systems, in which components such as noise reduction or speech coding process speech and non-speech signal parts differently. Therefore, there has been a growing interest for objective measures that assess different quality features of speech signals, allowing for a more nuanced analysis of how these components affect quality. In this context, the present thesis addresses the objective assessment of two quality features: background noise intrusiveness and speech intelligibility. The perception of background noise is investigated with newly collected datasets, including signals that go beyond the traditional telephone bandwidth, as well as Lombard (effortful) speech. We analyze listener scores for noise intrusiveness, and their relation to scores for perceived speech distortion and overall quality. We then propose a novel objective measure of noise intrusiveness that uses a sparse representation of noise as a model of high-level auditory coding. The proposed approach is shown to yield results that highly correlate with listener scores, without requiring training data. With respect to speech intelligibility, we focus on the case where the signal is degraded by strong background noises or very low bit-rate coding. Considering that listeners use prior linguistic knowledge in assessing intelligibility, we propose an objective measure that works at the phoneme level and performs a comparison of phoneme class-conditional probability estimations. The proposed approach is evaluated on a large corpus of recordings from public safety communication systems that use low bit-rate coding, and further extended to the assessment of synthetic speech, showing its applicability to a large range of distortion types. The effectiveness of both measures is evaluated with standardized performance metrics, using corpora that follow established recommendations for subjective listening tests

    Scalable and perceptual audio compression

    Get PDF
    This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable to lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform matching manner as well as a psychoacoustic manner. In order to measure the psychoacoustic scalability of the systems investigated in this thesis, the similarity between the original signal\u27s psychoacoustic parameters and that of the synthesized signal are compared. The psychoacoustic parameters used are loudness, sharpness, tonahty and roughness. This analysis technique is a novel method used in this thesis and it allows an insight into the perceptual distortion that has been introduced by any coder analyzed in this manner

    ADAPTIVE SPEECH QUALITY IN VOICE-OVER-IP COMMUNICATIONS

    Get PDF
    The quality of VoIP communication relies significantly on the network that transports the voice packets because this network does not usually guarantee the available bandwidth, delay, and loss that are critical for real-time voice traffic. The solution proposed here is to manage the voice-over-IP stream dynamically, changing parameters as needed to assure quality. The main objective of this dissertation is to develop an adaptive speech encoding system that can be applied to conventional (telephony-grade) and wideband voice communications. This comprehensive study includes the investigation and development of three key components of the system. First, to manage VoIP quality dynamically, a tool is needed to measure real-time changes in quality. The E-model, which exists for narrowband communication, is extended to a single computational technique that measures speech quality for narrowband and wideband VoIP codecs. This part of the dissertation also develops important theoretical work in the area of wideband telephony. The second system component is a variable speech-encoding algorithm. Although VoIP performance is affected by multiple codecs and network-based factors, only three factors can be managed dynamically: voice payload size, speech compression and jitter buffer management. Using an existing adaptive jitter-buffer algorithm, voice packet-size and compression variation are studied as they affect speech quality under different network conditions. This study explains the relationships among multiple parameters as they affect speech transmission and its resulting quality. Then, based on these two components, the third system component is a novel adaptive-rate control algorithm that establishes the interaction between a VoIP sender and receiver, and manages voice quality in real-time. Simulations demonstrate that the system provides better average voice quality than traditional VoIP

    VoIP Quality Assessment Technologies

    Get PDF

    Characterisation of noisy speech channels in 2G and 3G mobile networks

    Get PDF
    As the wireless cellular market reaches competitive levels never seen before, network operators need to focus on maintaining Quality of Service (QoS) a main priority if they wish to attract new subscribers while keeping existing customers satisfied. Speech Quality as perceived by the end user is one major example of a characteristic in constant need of maintenance and improvement. It is in this topic that this Master Thesis project fits in. Making use of an intrusive method of speech quality evaluation, as a means to further study and characterize the performance of speech codecs in second-generation (2G) and third-generation (3G) technologies. Trying to find further correlation between codecs with similar bit rates, along with the exploration of certain transmission parameters which may aid in the assessment of speech quality. Due to some limitations concerning the audio analyzer equipment that was to be employed, a different system for recording the test samples was sought out. Although the new designed system is not standard, after extensive testing and optimization of the system's parameters, final results were found reliable and satisfactory. Tests include a set of high and low bit rate codecs for both 2G and 3G, where values were compared and analysed, leading to the outcome that 3G speech codecs perform better, under the approximately same conditions, when compared with 2G. Reinforcing the idea that 3G is, with no doubt, the best choice if the costumer looks for the best possible listening speech quality. Regarding the transmission parameters chosen for the experiment, the Receiver Quality (RxQual) and Received Energy per Chip to the Power Density Ratio (Ec/N0), these were subject to speech quality correlation tests. Final results of RxQual were compared to those of prior studies from different researchers and, are considered to be of important relevance. Leading to the confirmation of RxQual as a reliable indicator of speech quality. As for Ec/N0, it is not possible to state it as a speech quality indicator however, it shows clear thresholds for which the MOS values decrease significantly. The studied transmission parameters show that they can be used not only for network management purposes but, at the same time, give an expected idea to the communications engineer (or technician) of the end-to-end speech quality consequences. With the conclusion of the work new ideas for future studies come to mind. Considering that the fourth-generation (4G) cellular technologies are now beginning to take an important place in the global market, as the first all-IP network structure, it seems of great relevance that 4G speech quality should be subject of evaluation. Comparing it to 3G, not only in narrowband but also adding wideband scenarios with the most recent standard objective method of speech quality assessment, POLQA. Also, new data found on Ec/N0 tests, justifies further research studies with the intention of validating the assumptions made in this work.Com o mercado das redes móveis a atingir níveis de competitividade nunca antes vistos, existe a crescente necessidade por parte dos operadores de rede em focar-se na Qualidade de Serviço (QoS) como principal prioridade, no sentido de atrair novos clientes ao mesmo tempo que asseguram a satisfação dos seus actuais assinantes. A percepção da Qualidade de Voz, por parte do utilizador, é apenas um exemplo de uma característica de QoS em constante necessidade de manutenção e melhoramento. Sendo nesta temática em que se insere a Tese de Mestrado. Aplicando um método intrusivo de avaliação de qualidade de voz, como meio para um estudo mais aprofundado e, ao mesmo tempo, caracterizando o desempenho dos codecs de voz para as tecnologias de segunda-geração (2G) e terceira-geração (3G). Investigando nova informação que possa ser retirada da correlação entre codecs com bit rates semelhantes, juntamente com a exploração de determinados 'parâmetros de transmissão os quais podem auxiliar na avaliação da qualidade de voz. Devido a algumas limitações ligadas ao analisador de áudio (requisito neste tipo de aplicações), existiu a necessidade de procurar um sistema distinto para gravação das amostras de teste. Embora o sistema escolhido não seja padronizado para este tipo de ensaios, após vários testes e consequente optimização dos parâmetros do sistema, os resultados finais consideram-se credíveis e satisfatórios. Os testes efectuados incluem um conjunto de codecs de elevado e baixo bit rate, onde a comparação e análise dos resultados levam a concluir que codecs de voz 3G têm melhor desempenho, sob aproximadamente as mesmas condições, comparativamente com os 2G. Reforçando a ideia generalizada que 3G é, sem dúvida, a melhor escolha se o utilizador procura uma solução superior a nível de qualidade de voz. No que diz respeito aos parâmetros de transmissão escolhidos para a experiência, RxQual (Qualidade do sinal Recebido pela estacão móvel) e Ec/N0 (razão entre Energia por chip e a Densidade Espectral de Potência), estes foram sujeitos a testes de correlação com a qualidade de voz. Os resultados de RxQual foram sujeitos a comparação com estudos prévios de outros investigadores, confirmando este parâmetro como um indicador de qualidade de voz bastante fiável. Quanto a Ec/N0, não é possível declará-lo como um indicador de qualidade de voz, no entanto, este demonstra limites claros para os quais os valores de Mean Opinion Score (MOS) decrescem significativamente. Os parâmetros de transmissão estudados demonstram não só que podem ser utilizados com objectivos de gestão de rede mas como também podem fornecer, ao engenheiro (ou técnico), informação relativa ao impacto que poderá existir na qualidade de voz. Com a finalização deste trabalho é possível constatar que novos estudos devem ser efectuados. Considerando que a tecnologia de quarta-geração (4G) começa agora a dar os seus primeiros passos no mercado das redes móveis, como a primeira com arquitectura de rede totalmente orientada para IP, parece de grande importância que esta tecnologia seja sujeita a avaliação. Comparando-a com 3G, não só para banda-estreita (300 a 3400 Hz) como também para cenários de banda-larga (50 a 7000Hz), aplicando o mais recente método normalizado de avaliação de qualidade de voz, o POLQA. Por fim, também se verifica como pertinente uma continuação do estudo relativo a Ec/N0 a fim de validar as ilações retiradas neste trabalho
    corecore