
129 research outputs found

"Can you hear me now?":Automatic assessment of background noise intrusiveness and speech intelligibility in telecommunications

Author: Ullmann Raphaël Marc
Publication venue: Lausanne, EPFL
Publication date: 02/06/2016

This thesis deals with signal-based methods that predict how listeners perceive speech quality in telecommunications. Such tools, called objective quality measures, are of great interest in the telecommunications industry to evaluate how new or deployed systems affect the end-user quality of experience. Two widely used measures, ITU-T Recommendations P.862 “PESQ” and P.863 “POLQA”, predict the overall listening quality of a speech signal as it would be rated by an average listener, but do not provide further insight into the composition of that score. This is in contrast to modern telecommunication systems, in which components such as noise reduction or speech coding process speech and non-speech signal parts differently. Therefore, there has been a growing interest in objective measures that assess different quality features of speech signals, allowing for a more nuanced analysis of how these components affect quality. In this context, the present thesis addresses the objective assessment of two quality features: background noise intrusiveness and speech intelligibility. The perception of background noise is investigated with newly collected datasets, including signals that go beyond the traditional telephone bandwidth, as well as Lombard (effortful) speech. We analyze listener scores for noise intrusiveness, and their relation to scores for perceived speech distortion and overall quality. We then propose a novel objective measure of noise intrusiveness that uses a sparse representation of noise as a model of high-level auditory coding. The proposed approach is shown to yield results that highly correlate with listener scores, without requiring training data. With respect to speech intelligibility, we focus on the case where the signal is degraded by strong background noises or very low bit-rate coding. Considering that listeners use prior linguistic knowledge in assessing intelligibility, we propose an objective measure that works at the phoneme level and performs a comparison of phoneme class-conditional probability estimations. The proposed approach is evaluated on a large corpus of recordings from public safety communication systems that use low bit-rate coding, and further extended to the assessment of synthetic speech, showing its applicability to a large range of distortion types. The effectiveness of both measures is evaluated with standardized performance metrics, using corpora that follow established recommendations for subjective listening tests.

Infoscience - École polytechnique fédérale de Lausanne

Speech assessment and characterization for law enforcement applications

Author: Sharma Dushyant
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2013

Speech signals acquired, transmitted or stored in non-ideal conditions are often degraded by one or more effects including, for example, additive noise. These degradations alter the signal properties in a manner that deteriorates the intelligibility or quality of the speech signal. In the law enforcement context such degradations are commonplace due to the limitations in the audio collection methodology, which is often required to be covert. In severe degradation conditions, the acquired signal may become unintelligible, losing its value in an investigation; in less severe conditions, a loss in signal quality may be encountered, which can lead to higher transcription time and cost. This thesis proposes a non-intrusive speech assessment framework from which algorithms for speech quality and intelligibility assessment are derived, to guide the collection and transcription of law enforcement audio. These methods are trained on a large database labelled using intrusive techniques (whose performance is verified with subjective scores) and shown to perform favorably when compared with existing non-intrusive techniques. Additionally, a non-intrusive CODEC identification and verification algorithm is developed which can identify a CODEC with an accuracy of 96.8% and detect the presence of a CODEC with an accuracy higher than 97% in the presence of additive noise. Finally, the speech description taxonomy framework is developed, with the aim of characterizing various aspects of a degraded speech signal, including the mechanism that results in a signal with particular characteristics, the vocabulary that can be used to describe those degradations and the measurable signal properties that can characterize the degradations. The taxonomy is implemented as a relational database that facilitates the modeling of the relationships between various attributes of a signal and promises to be a useful tool for training and guiding audio analysts.

Spiral - Imperial College Digital Repository

Perspectives on panoramic photography

Author: Hasler David
Publication venue: Lausanne, EPFL
Publication date: 16/03/2005

Digital imaging brings a new set of possibilities to photography. For example, small pictures can be assembled to form a large panorama, and digital cameras are trying to mimic the human visual system to produce better pictures. This manuscript aims at developing the algorithms required to stitch a set of pictures together to obtain a bigger and better image. This thesis explores three important topics of panoramic photography: the alignment of images, the matching of the colours, and the rendering of the resulting panorama. In addition, one chapter is devoted to 3D and constrained estimation. Aligning pictures can be difficult when the scene changes while taking the photographs. A method is proposed to model these changes (or outliers) that appear in image pairs, by computing the outlier distribution from the image histograms and handling the image-to-image correspondence problem as a mixture of inliers versus outliers. Compared to the standard methods, this approach makes better use of the information contained in the image and leads to a more reliable result. Digital cameras aim at reproducing the adaptation capabilities of the human eye in capturing the colours of a scene. As a consequence, there is often a large colour mismatch between two pictures. This work presents a novel way of correcting for colour mismatches by modelling the transformation introduced by the camera, and reversing it to get consistent colours. Finally, this manuscript proposes a method to render high dynamic range images that contain very bright as well as very dark regions. To reproduce this kind of picture, the contrast has to be reduced in order to match the maximum contrast displayable on a screen or on paper. This last method, which is based on a complex model of the human visual system, reduces the contrast of the image while keeping the small details in the scene visible.

Infoscience - École polytechnique fédérale de Lausanne

Journal of Telecommunications and Information Technology, 2001, nr 3

Publication venue: Instytut Łączności - Państwowy Instytut Badawczy, Warszawa
Publication date: 01/01/2001

Quarterly

Biblioteka Cyfrowa Instytutu Łączności / National Institute of Telecommunications: Digital Library

Contribution to quality of user experience provision over wireless networks

Author: Sánchez Iborra Ramón Jesús
Publication venue: Universidad Politécnica de Cartagena
Publication date: 01/01/2015

The widespread expansion of wireless networks has brought attractive new possibilities to end users. In addition to the mobility provided by wireless devices, it is worth noting the easy configuration process that a user has to follow to gain connectivity through a wireless network. Furthermore, the increasing bandwidth provided by the IEEE 802.11 family has made it possible to access highly demanding services such as multimedia communications. Multimedia traffic has unique characteristics that make it highly vulnerable to network impairments such as packet loss, delay, or jitter. Voice over IP (VoIP) communications, video conferencing, video streaming, etc., are examples of these highly demanding services, which need to meet very strict requirements in order to be served with acceptable levels of quality. Meeting these tough requirements will become extremely important in the coming years, given that consumer video traffic is expected to be the predominant traffic on the Internet. In wired systems, these requirements are met by using Quality of Service (QoS) techniques, such as Differentiated Services (DiffServ), traffic engineering, etc. However, employing these methodologies in wireless networks is not as simple, because many other factors affect the quality of the provided service, e.g., fading, interference, etc. The IEEE 802.11g standard, which is the most widespread technology for Wireless Local Area Networks (WLANs), defines two different architecture schemes. On the one hand, the infrastructure mode consists of a central point that manages the network, assuming network control tasks such as IP assignment, routing, access security, etc. The rest of the nodes in the network act as hosts, i.e., they send and receive traffic through the central point. On the other hand, the IEEE 802.11 ad-hoc configuration mode is less widespread than the infrastructure one.
Under this scheme, there is no central point in the network; instead, all the nodes assume both host and router roles, which permits the quick deployment of a network without pre-existing infrastructure. Networks of this type, called Mobile Ad-hoc NETworks (MANETs), have interesting characteristics for situations in which the fast deployment of a communication system is needed, e.g., tactical networks, disaster events, or temporary networks. The benefits provided by MANETs are varied, including high node mobility, extended network coverage, and network reliability through the avoidance of single points of failure. The dynamic nature of these networks requires the nodes to react to topology changes as fast as possible. Moreover, the transmission of multimedia traffic entails real-time constraints that must be met to provide these services with acceptable levels of quality. For those reasons, efficient routing protocols are needed that provide enough reliability to the network with minimum impact on the quality of the service flowing through the nodes. Regarding quality measurements, the current trend is to estimate what the end user actually perceives when consuming the service. This paradigm is called Quality of user Experience (QoE) and differs from the traditional Quality of Service (QoS) approach in the human perspective given to quality estimations. Different approaches can be taken to measure the subjective opinion that a user has of a given service. The most accurate methodology is to perform subjective tests in which a panel of human testers rates the quality of the service under evaluation. This approach returns a quality score, the so-called Mean Opinion Score (MOS), for the considered service on a scale of 1 to 5. This methodology has several drawbacks, such as its high cost and the impossibility of performing tests in real time.
For those reasons, several mathematical models have been presented to estimate the QoE (MOS) achieved by different multimedia services. In this thesis, the focus is on evaluating and understanding the multimedia content transmission process in wireless networks from a QoE perspective. To this end, the QoE paradigm is first explored with the aim of understanding how to evaluate the quality of a given multimedia service. Then, the influence of the impairments introduced by the wireless transmission channel on multimedia communications is analyzed. In addition, the operation of different WLAN schemes is evaluated to test their suitability for supporting highly demanding traffic such as multimedia transmission. Finally, as the main contribution of this thesis, new mechanisms or strategies to improve the quality of multimedia services distributed over IEEE 802.11 networks are presented. Specifically, the distribution of multimedia services over ad-hoc networks is studied in depth. A novel opportunistic routing protocol called JOKER (auto-adJustable Opportunistic acK/timEr-based Routing) is presented. This proposal permits better support for multimedia services while reducing energy consumption in comparison with the standard ad-hoc routing protocols. Universidad Politécnica de Cartagena, Programa Oficial de Doctorado en Tecnologías de la Información y Comunicaciones.
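
The abstract describes the Mean Opinion Score as the score returned by a panel of human testers on a 1 to 5 scale. As a minimal illustration only, and not the thesis's own procedure, the sketch below assumes the MOS is the arithmetic mean of the individual ratings; the function name `mean_opinion_score` and the sample panel are hypothetical.

```python
def mean_opinion_score(ratings):
    """Return the arithmetic mean of panel ratings given on a 1-5 scale.

    Assumption: the MOS is taken as the plain average of the individual
    ratings; the function name and the validation rules are illustrative.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1-5 scale")
    return sum(ratings) / len(ratings)


# Hypothetical panel of five testers rating the same service.
panel_ratings = [4, 5, 3, 4, 4]
print(mean_opinion_score(panel_ratings))
```

The mathematical QoE models mentioned in the abstract estimate this kind of score from measurable parameters instead of collecting ratings; they are not reproduced here.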

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Digital de la Universidad Politécnica de Cartagena

A multi-hypothesis approach for range-only simultaneous localization and mapping with aerial robots

Author: Fabresse Felipe Ramón
Publication date: 21/07/2017

Range-only SLAM (RO-SLAM) systems aim to build a map made up of the positions of a set of range sensors while simultaneously localizing the robot with respect to that map, using only range measurements. Range sensors are devices capable of measuring the relative distance between each pair of devices. These sensors are especially attractive for aerial vehicles because of their small size and weight. Moreover, these devices can operate indoors or in areas without GPS signal and, unlike other sensors such as cameras or laser sensors, do not require a direct line of sight between each pair of devices, which allows continuous data readings without occlusions. However, these sensors have a nonlinear observation model with a rank deficiency caused by the lack of relative orientation information between each pair of sensors. In addition, when the dimensionality of the problem is increased from 2D to 3D for application to aerial vehicles, the number of hidden variables in the model grows, making the problem more computationally expensive, especially for multi-hypothesis implementations. This thesis studies and proposes different methods that allow the efficient application of these RO-SLAM systems with ground or aerial vehicles in real environments. To this end, the scalability of the system is studied in relation to the number of hidden variables and the number of devices to be positioned in the map. Unlike other methods described in the RO-SLAM literature, the algorithms proposed in this thesis take into account the correlations between each pair of devices, especially for the integration of static measurements between pairs of sensors in the map. Furthermore, this thesis studies the noise and spurious measurements that range sensors may generate, in order to improve the robustness of the proposed algorithms with detection and filtering techniques. Methods are also proposed for integrating measurements from other sensors such as cameras, altimeters, or GPS to refine the estimates made by the RO-SLAM system. Other chapters study and propose techniques for integrating the presented RO-SLAM algorithms into multi-robot systems, as well as the use of active perception techniques that reduce the uncertainty of the system for trajectories lacking trilateration between the robot and the static range sensors of the map. All the proposed methods have been validated through simulations and experiments with real systems, detailed in this thesis. In addition, all the software systems implemented, as well as the datasets recorded during the experiments, have been published and documented for use by the scientific community.

idUS. Depósito de Investigación Universidad de Sevilla

Spatial processing of conspecific signals in weakly electric fish: from sensory image to neural population coding

Author: Milam Oak Everette
Publication venue: The Research Repository @ WVU
Publication date: 01/01/2023

In this dissertation, I examine how an animal’s nervous system encodes spatially realistic conspecific signals in its environment and how the encoding mechanisms support behavioral sensitivity. I begin by modeling changes in the electrosensory signals exchanged by weakly electric fish in a social context. During this behavior, I estimate how the spatial structure of conspecific stimuli influences sensory responses at the electroreceptive periphery. I then quantify how space is represented in the hindbrain, specifically in the primary sensory area called the electrosensory lateral line lobe. I show that behavioral sensitivity is influenced by the heterogeneous properties of the pyramidal cell population. I further demonstrate that this heterogeneity serves to start segregating spatial and temporal information early in the sensory pathway. Lastly, I characterize the accuracy of spatial coding in this network and predict the role of network elements, such as correlated noise and feedback, in shaping the spatial information. My research provides a comprehensive understanding of spatial coding in the first stages of sensory processing in this system and allows us to better understand how network dynamics shape coding accuracy.

The Research Repository @ WVU (West Virginia University)