78 research outputs found

    A psychoacoustically motivated speech enhancement system

    Get PDF
    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000.Includes bibliographical references (leaves 81-82).A new method is introduced to perform enhancement of speech degraded by acoustic noise using the psychoacoustic property of masking. The goal of this algorithm is to preserve the natural quality of the noise while keeping the speech perceptually intact. Distortion masking principles based on prior work of Gustafsson are used to derive a hybrid gain function comprising a function minimizing speech distortion and another minimizing noise distortion. The system is implemented in floating-point software and was tested against several existing algorithms. In a forced choice listening test, the new system was preferred over the Enhanced Variable Rate Codec (EVRC) noise suppression algorithm in 88% of the cases. Informal listening tests showed preferable speech quality than Gustafsson's algorithm. As a front end to a vocoder, the new system was preferred over the other two by all the test subjects. Ideas on future work in speech enhancement are also explored.by Siddhartan Govindasamy.M.Eng

    Smart Sound Control in Acoustic Sensor Networks: a Perceptual Perspective

    Full text link
    [ES] Los sistemas de audio han experimentado un gran desarrollo en los últimos años gracias al aumento de dispositivos con procesadores de alto rendimiento capaces de realizar un procesamiento cada vez más eficiente. Además, las comunicaciones inalámbricas permiten a los dispositivos de una red estar ubicados en diferentes lugares sin limitaciones físicas. La combinación de estas tecnologías ha dado lugar a la aparición de las redes de sensores acústicos (ASN). Una ASN está compuesta por nodos equipados con transductores de audio, como micrófonos o altavoces. En el caso de la monitorización acústica del campo, sólo es necesario incorporar sensores acústicos a los nodos ASN. Sin embargo, en el caso de las aplicaciones de control, los nodos deben interactuar con el campo acústico a través de altavoces. La ASN puede implementarse mediante dispositivos de bajo coste, como Raspberry Pi o dispositivos móviles, capaces de gestionar varios micrófonos y altavoces y de ofrecer una buena capacidad de cálculo. Además, estos dispositivos pueden comunicarse mediante conexiones inalámbricas, como Wi-Fi o Bluetooth. Por lo tanto, en esta tesis, se propone una ASN compuesta por dispositivos móviles conectados a altavoces inalámbricos mediante un enlace Bluetooth. Además, el problema de la sincronización entre los dispositivos de una ASN es uno de los principales retos a abordar, ya que el rendimiento del procesamiento de audio es muy sensible a la falta de sincronismo. Por lo tanto, también se lleva a cabo un análisis del problema de sincronización entre dispositivos conectados a altavoces inalámbricos en una ASN. En este sentido, una de las principales aportaciones es el análisis de la latencia de audio cuando los nodos acústicos de la ASN están formados por dispositivos móviles que se comunican altavoces mediante enlaces Bluetooth. Una segunda contribución significativa de esta tesis es la implementación de un método para sincronizar los diferentes dispositivos de una ASN, junto con un estudio de sus limitaciones. Por último, se ha introducido el método propuesto para implementar aplicaciones de zonas sonoras personales (PSZ). Por lo tanto, la implementación y el análisis del rendimiento de diferentes aplicaciones de audio sobre una ASN compuesta por dispositivos móviles y altavoces inalámbricos es también una contribución significativa en el área de las ASN. Cuando el entorno acústico afecta negativamente a la percepción de la señal de audio emitida por los altavoces de la ASN, se uti­lizan técnicas de ecualización para mejorar la percepción de la señal de audio. Para ello, en esta tesis se implementa un sistema de ecualización inteligente. Para ello, se emplean algoritmos psicoacústicos para implementar un procesamiento inteligente basado en el sis­tema auditivo humano capaz de adaptarse a los cambios del entorno. Por ello, otra contribución importante de esta tesis es el análisis del enmas­caramiento espectral entre dos sonidos complejos. Este análisis permitirá calcular el umbral de enmascaramiento de un sonido con más precisión que los métodos utilizados actualmente. Este método se utiliza para implementar una aplicación de ecualización perceptiva que pretende mejorar la percepción de la señal de audio en presencia de un ruido ambien­tal. Para ello, esta tesis propone dos algoritmos de ecualización diferentes: 1) la pre-ecualización de la señal de audio para que se perciba por encima del umbral de enmascaramiento del ruido ambiental y 2) diseñar un con­trol de ruido ambiental perceptivo en los sistemas de ecualización activa de ruido (ANE), para que el nivel de ruido ambiental percibido esté por debajo del umbral de enmascaramiento de la señal de audio. Por lo tanto, la ultima aportación de esta tesis es la implementación de una aplicación de ecualización perceptiva con los dos diferentes algorit­mos de ecualización embebidos y el análisis de su rendimiento a través del banco de pruebas realizado en el laboratorio GTAC-iTEAM.[CA] El sistemes de so han experimentat un gran desenvolupament en els últims anys gràcies a l'augment de dispositius amb processadors d'alt rendiment capaços de realitzar un processament d'àudio cada vegada més eficient. D'altra banda, l'expansió de les comunicacions inalàmbriques ha permès implementar xarxes en les quals els dispositius poden estar situats a difer­ents llocs sense limitacions físiques. La combinació d'aquestes tecnologies ha donat lloc a l'aparició de les xarxes de sensors acústics (ASN). Una ASN està composta per nodes equipats amb transductors d'àudio, com micr`ofons o altaveus. En el cas del monitoratge del camp acústic, només cal incorporar sensors acústics als nodes de l'ASN. No obstant això, en el cas de les aplicacions de control, els nodes han d'interactuar amb el camp acústic a través d'altaveus. Una ASN pot implementar-se mitjant¿cant dispositius de baix cost, com ara Raspberry Pi o dispositius mòbils, capaços de gestionar di­versos micròfons i altaveus i d'oferir una bona capacitat computacional. A més, aquests dispositius poden comunicar-se a través de connexions inalàmbriques, com Wi-Fi o Bluetooth. Per això, en aquesta tesi es proposa una ASN composta per dispositius mòbils connectats a altaveus inalàmbrics a través d'un enllaç Bluetooth. El problema de la sincronització entre els dispositius d'una ASN és un dels principals reptes a abordar ja que el rendiment del processament d'àudio és molt sensible a la falta de sincronisme. Per tant, també es duu a terme una anàlisi profunda del problema de la sincronització entre els dispositius comercials connectats als altaveus inalàmbrics en una ASN. En aquest sentit, una de les principals contribucions és l'anàlisi de la latència d'àudio quan els nodes acústics en l'ASN estan compostos per dispositius mòbils que es comuniquen amb els altaveus corresponents mitjançant enllaços Bluetooth. Una segona contribuciò sig­nificativa d'aquesta tesi és la implementació d'un mètode per sincronitzar els diferents dispositius d'una ASN, juntament amb un estudi de les seves limitacions. Finalment, s'ha introduït el mètode proposat per implemen­tar aplicacions de zones de so personal. Per tant, la implementació i l'anàlisi del rendiment de diferents aplicacions d'àudio sobre una ASN composta per dispositius mòbils i al­taveus inalàmbrics és també una contribució significativa a l'àrea de les ASN. Quan l'entorn acústic afecta negativament a la percepció del senyal d'àudio emesa pels altaveus de l'ASN, es fan servir tècniques d'equalització per a millorar la percepció del senyal d'àudio. En consequència, en aquesta tesi s'implementa un sistema d'equalització intel·ligent. Per això, s'utilitzen algoritmes psicoacústics per implementar un processament intel·ligent basat en el sistema audi­tiu humà capaç d'adaptar-se als canvis de l'entorn. Per aquest motiu, una altra contribució important d'aquesta tesi és l'anàlisi de l'emmascarament espectral entre dos sons complexos. Aquesta anàlisi permetrà calcular el llindar d'emmascarament d'un so sobre amb més precisió que els mètodes utilitzats actualment. Aquest mètode s'utilitza per a imple­mentar una aplicació d'equalització perceptual que pretén millorar la per­cepció del senyal d'àudio en presència d'un soroll ambiental. Per això, aquesta tesi proposa dos algoritmes d'equalització diferents: 1) la pree­qualització del senyal d'àudio perquè es percebi per damunt del llindar d'emmascarament del soroll ambiental i 2) dissenyar un control de soroll ambiental perceptiu en els sistemes d'equalització activa de soroll (ANE) de manera que el nivell de soroll ambiental percebut estiga per davall del llindar d'emmascarament del senyal d'àudio. Per tant, l'última aportació d'aquesta tesi és la implementació d'una aplicació d'equalització perceptiva amb els dos algoritmes d'equalització embeguts i l'anàlisi del seu rendiment a través del banc de proves realitzat al laboratori GTAC-iTEAM.[EN] Audio systems have been extensively developed in recent years thanks to the increase of devices with high-performance processors able to per­form more efficient processing. In addition, wireless communications allow devices in a network to be located in different places without physical limitations. The combination of these technologies has led to the emergence of Acoustic Sensor Networks (ASN). An ASN is com­posed of nodes equipped with audio transducers, such as microphones or speakers. In the case of acoustic field monitoring, only acoustic sensors need to be incorporated into the ASN nodes. However, in the case of control applications, the nodes must interact with the acoustic field through loudspeakers. ASN can be implemented through low-cost devices, such as Rasp­berry Pi or mobile devices, capable of managing multiple mi­crophones and loudspeakers and offering good computational capacity. In addition, these devices can communicate through wireless connections, such as Wi-Fi or Bluetooth. Therefore, in this dissertation, an ASN composed of mobile devices connected to wireless speak­ers through a Bluetooth link is proposed. Additionally, the problem of syn­chronization between the devices in an ASN is one of the main challenges to be addressed since the audio processing performance is very sensitive to the lack of synchronism. Therefore, an analysis of the synchroniza­tion problem between devices connected to wireless speakers in an ASN is also carried out. In this regard, one of the main contributions is the analysis of the audio latency of mobile devices when the acoustic nodes in the ASN are comprised of mobile devices communicating with the corresponding loudspeakers through Bluetooth links. A second significant contribution of this dissertation is the implementation of a method to synchronize the different devices of an ASN, together with a study of its limitations. Finally, the proposed method has been introduced in order to implement personal sound zones (PSZ) applications. Therefore, the imple­mentation and analysis of the performance of different audio applications over an ASN composed of mobile devices and wireless speakers is also a significant contribution in the area of ASN. In cases where the acoustic environment negatively affects the percep­tion of the audio signal emitted by the ASN loudspeakers, equalization techniques are used with the objective of enhancing the perception thresh­old of the audio signal. For this purpose, a smart equalization system is implemented in this dissertation. In this regard, psychoacous­tic algorithms are employed to implement a smart processing based on the human hearing system capable of adapting to changes in the envi­ronment. Therefore, another important contribution of this thesis focuses on the analysis of the spectral masking between two complex sounds. This analysis will allow to calculate the masking threshold of one sound over the other in a more accurate way than the currently used methods. This method is used to implement a perceptual equalization application that aims to improve the perception threshold of the audio signal in presence of ambient noise. To this end, this thesis proposes two different equalization algorithms: 1) pre-equalizing the audio signal so that it is perceived above the ambient noise masking threshold and 2) designing a perceptual control of ambient noise in active noise equalization (ANE) systems, so that the perceived ambient noise level is below the masking threshold of the audio signal. Therefore, the last contribution of this dissertation is the imple­mentation of a perceptual equalization application with the two different embedded equalization algorithms and the analysis of their performance through the testbed carried out in the GTAC-iTEAM laboratory.This work has received financial support of the following projects: • SSPRESING: Smart Sound Processing for the Digital Living (Reference: TEC2015-67387-C4-1-R. Entity: Ministerio de Economia y Empresa. Spain). • FPI: Ayudas para contratos predoctorales para la formación de doctores (Reference: BES-2016-077899. Entity: Agencia Estatal de Investigación. Spain). DANCE: Dynamic Acoustic Networks for Changing Environments (Reference: RTI2018-098085-B-C41-AR. Entity: Agencia Estatal de Investigación. Spain). • DNOISE: Distributed Network of Active Noise Equalizers for Multi-User Sound Control (Reference: H2020-FETOPEN-4-2016-2017. Entity: I+D Colaborativa competitiva. Comisión de las comunidades europea).Estreder Campos, J. (2022). Smart Sound Control in Acoustic Sensor Networks: a Perceptual Perspective [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/181597TESI

    Perception of isolated chords: Examining frequency of occurrence, instrumental timbre, acoustic descriptors and musical training

    Get PDF
    This study investigated the perception of isolated chords using a combination of experimental manipulation and exploratory analysis. Twelve types of chord (five triads and seven tetrads) were presented in two instrumental timbres (piano and organ) to listeners who rated the chords for consonance, pleasantness, stability and relaxation. Listener ratings varied by chord, by timbre, and according to musical expertise, and revealed that musicians distinguished consonance from the other variables in a way that other listeners did not. To further explain the data, a principal component analysis and linear regression examined three potential predictors of the listener ratings. First, each chord’s frequency of occurrence was obtained by counting its appearances in selected works of music. Second, listeners rated their familiarity with the instrumental timbre in which the chord was played. Third, chords were described using a set of acoustic features derived using the Timbre Toolbox and MIR Toolbox. Results of the study indicated that listeners’ ratings of both consonance and stability were influenced by the degree of musical training and knowledge of tonal hierarchy. Listeners’ ratings of pleasantness and relaxation, on the other hand, depended more on the instrumental timbre and other acoustic descriptions of the chord

    Scalable and perceptual audio compression

    Get PDF
    This thesis deals with scalable perceptual audio compression. Two scalable perceptual solutions as well as a scalable to lossless solution are proposed and investigated. One of the scalable perceptual solutions is built around sinusoidal modelling of the audio signal whilst the other is built on a transform coding paradigm. The scalable coders are shown to scale both in a waveform matching manner as well as a psychoacoustic manner. In order to measure the psychoacoustic scalability of the systems investigated in this thesis, the similarity between the original signal\u27s psychoacoustic parameters and that of the synthesized signal are compared. The psychoacoustic parameters used are loudness, sharpness, tonahty and roughness. This analysis technique is a novel method used in this thesis and it allows an insight into the perceptual distortion that has been introduced by any coder analyzed in this manner

    Intelligent Subgrouping of Multitrack Audio

    Get PDF
    Subgrouping facilitates the simultaneous manipulation of a number of audio tracks and is a central aspect of mix engineering. However, the decision process of subgrouping is a poorly documented technique. This research sheds light on this ubiquitous but poorly de ned mix practice, provides rules and constraints on how it should be approached as well as demonstrates its bene t to an automatic mixing system. I rst explored the relationship that subgrouping has with perceived mix quality by examining a number of mix projects. This was in order to decipher the actual process of creating subgroups and to see if any of the decisions made were intrinsically linked to mix quality. I found mix quality to be related to the number of subgroups and type of subgroup processing used. This subsequently led me to interviewing distinguished professionals in the audio engineering eld, with the intention of gaining a deeper understanding of the process. The outcome of these interviews and the previous analyses of mix projects allowed me to propose rules that could be used for real life mixing and automatic mixing. Some of the rules I established were used to research and develop a method for the automatic creation of subgroups using machine learning techniques. I also investigated the relationship between music production quality and human emotion. This was to see if music production quality had an emotional e ect on a particular type of listener. The results showed that the emotional impact of mixing only really mattered to those with critical listening skills. This result is important for automatic mixing systems in general, as it would imply that quality only really matters to a minority of people. I concluded my research on subgrouping by conducting an experiment to see if subgrouping would bene t the perceived clarity and quality of a mix. The results of a subjective listening test showed this to be true

    Design and evaluation of dynamic feature-based segmentation on music

    Get PDF
    viii, 94 leaves : ill. ; 29 cmSegmentation is an indispensable step in the field of Music Information Retrieval (MIR). Segmentation refers to the splitting of a music piece into significant sections. Classically there has been a great deal of attention focused on various issues of segmentation, such as: perceptual segmentation vs. computational segmentation, segmentation evaluations, segmentation algorithms, etc. In this thesis, we conduct a series of perceptual experiments which challenge several of the traditional assumptions with respect to segmentation. Identifying some deficiencies in the current segmentation evaluation methods, we present a novel standardized evaluation approach which considers segmentation as a supportive step towards feature extraction in the MIR process. Furthermore, we propose a simple but effective segmentation algorithm and evaluate it utilizing our evaluation approach

    The Creation of Consonance: How Musical Context Influences Chord Perception

    Get PDF
    This PhD study investigates how our perception of musical chords, both in isolation and in musical context, is influenced and shaped by our knowledge of the tonal hierarchy and tonal syntax in terms of consonance/dissonance, pleasantness/unpleasantness, stability/instability, and relaxation/tension. Six experiments were conducted to gather behavioural data on the perception of chords from listeners with varying levels of musical training and experience. The first study is principally concerned with the influence of frequency of occurrence on the perception of twelve types of chord in isolation, including both triads and tetrads. It also examines to what extent factors besides frequency of occurrence, namely listener familiarity with the timbre in which chords are played and the acoustic features of chords, predict listener perception. The second and third studies concern the perception of chords in musical context. The second study focuses on musical contexts in which diminished and augmented chords appear, and on the harmonic functions of chords in short sequences of IV-V-I. Using sequences containing an augmented chord, the third study investigates the ways in which a non-diatonic tone can be anchored by its succeeding tone, and considers how the perception of these sequences is influenced by the harmonic function of its succeeding chord. These studies all reveal that the way in which chords and chord sequences are perceived is not completely predetermined by their acoustic, physical dimension. In addition, we impute on them a fluidity and elasticity as a result of our knowledge of the tonal hierarchy and tonal syntax in our musical schemata

    Mapping Acoustic and Semantic Dimensions of Auditory Perception

    Get PDF
    Auditory categorisation is a function of sensory perception which allows humans to generalise across many different sounds present in the environment and classify them into behaviourally relevant categories. These categories cover not only the variance of acoustic properties of the signal but also a wide variety of sound sources. However, it is unclear to what extent the acoustic structure of sound is associated with, and conveys, different facets of semantic category information. Whether people use such data and what drives their decisions when both acoustic and semantic information about the sound is available, also remains unknown. To answer these questions, we used the existing methods broadly practised in linguistics, acoustics and cognitive science, and bridged these domains by delineating their shared space. Firstly, we took a model-free exploratory approach to examine the underlying structure and inherent patterns in our dataset. To this end, we ran principal components, clustering and multidimensional scaling analyses. At the same time, we drew sound labels’ semantic space topography based on corpus-based word embeddings vectors. We then built an LDA model predicting class membership and compared the model-free approach and model predictions with the actual taxonomy. Finally, by conducting a series of web-based behavioural experiments, we investigated whether acoustic and semantic topographies relate to perceptual judgements. This analysis pipeline showed that natural sound categories could be successfully predicted based on the acoustic information alone and that perception of natural sound categories has some acoustic grounding. Results from our studies help to recognise the role of physical sound characteristics and their meaning in the process of sound perception and give an invaluable insight into the mechanisms governing the machine-based and human classifications
    corecore