21 research outputs found

    Speech Decomposition and Enhancement

    Get PDF
    The goal of this study is to investigate the roles of steady-state speech sounds and of the transitions between these sounds in speech intelligibility. The motivation is that the auditory system may be particularly sensitive to time-varying frequency edges, which in speech are produced primarily by transitions between vowels and consonants and within vowels. The study examines whether selectively amplifying these edges can enhance speech intelligibility. Computer algorithms were developed to decompose speech into two components. The first, defined as the tonal component, was intended to contain predominantly formant activity; the second, defined as the non-tonal component, was intended to contain predominantly the transitions between and within formants. The decomposition uses a set of time-varying filters whose center frequencies and bandwidths are controlled to track the strongest formant components in the speech; each center frequency and bandwidth is estimated from the FM and AM information of the corresponding formant component. The tonal component is the sum of the filter outputs, and the non-tonal component is the difference between the original speech signal and the tonal component. The relative energy and intelligibility of the two components were compared with those of the original speech, using psychoacoustic growth functions to assess intelligibility. Most of the speech energy was in the tonal component, yet this component had a significantly lower maximum word recognition score than either the original speech or the non-tonal component. The non-tonal component averaged only 2% of the original speech energy, yet its maximum word recognition was almost equal to that of the original speech. The non-tonal component was then amplified and recombined with the original speech to generate enhanced speech.
    The energy of the enhanced speech was adjusted to equal that of the original, and the intelligibility of the two was compared in background noise. The enhanced speech showed significantly higher recognition scores at lower SNRs; at higher SNRs the original and enhanced speech showed similar recognition scores. These results suggest that amplifying transient information can enhance speech in noise, and that the enhancement is most effective under severe noise conditions.
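    As a rough illustration of the decomposition-and-enhancement pipeline described above, the sketch below band-pass filters each frame around its strongest spectral peak (a crude stand-in for the study's time-varying formant-tracking filters), takes the residual as the non-tonal component, then amplifies the residual and renormalises energy. The frame length, bandwidth and gain here are illustrative assumptions, not values from the study:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, stft

def decompose(speech, fs, frame_s=0.03, bw_hz=400.0):
    """Tonal / non-tonal split: per frame, band-pass filter around the
    strongest spectral peak; the residual is the non-tonal part."""
    nper = int(frame_s * fs)
    hop = nper // 2
    freqs, _, Z = stft(speech, fs, nperseg=nper)
    win = np.hanning(nper)                      # 50%-overlap-add window
    tonal = np.zeros(len(speech))
    for i in range(Z.shape[1]):
        start = i * hop
        seg = speech[start:start + nper]
        if len(seg) < nper:
            break
        fc = freqs[np.argmax(np.abs(Z[:, i]))]  # strongest peak = "formant"
        lo = max(fc - bw_hz / 2, 20.0)
        hi = min(fc + bw_hz / 2, fs / 2 - 20.0)
        if lo >= hi:
            continue
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        tonal[start:start + nper] += sosfiltfilt(sos, seg) * win
    return tonal, speech - tonal

def enhance(speech, fs, gain=4.0):
    """Amplify the non-tonal (transition) component, recombine, and
    rescale so the enhanced speech has the same energy as the original."""
    tonal, nontonal = decompose(speech, fs)
    out = speech + gain * nontonal
    return out * np.sqrt(np.sum(speech ** 2) / np.sum(out ** 2))
```

    The energy renormalisation in `enhance` mirrors the study's step of adjusting the enhanced speech to match the energy of the original before the intelligibility comparison.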

    Dynamics and network structure in neuroimaging data

    Get PDF

    Efficient and compact representations of head related transfer functions

    Get PDF
    These days most reproduced sound is consumed on portable devices and headphones, on which spatial binaural audio can be conveniently presented. One way of converting from conventional loudspeaker formats to binaural format is through the use of Head Related Transfer Functions (HRTFs), but head-tracking is also necessary to obtain satisfactory externalisation of the simulated sound field. Typically a large HRTF dataset is required to provide enough measurements for a continuous virtual auditory space to be achieved through simple linear interpolation or similar. This work describes an investigation into alternative compact and efficient representations of an HRTF dataset measured in the azimuthal plane. The two main prongs of investigation are the use of orthogonal transformations in a decompositional approach, and a parametric modelling approach that draws on techniques often associated with speech processing. The latter is explored through a linear-prediction-derived all-pole model and the pole-zero model design method of Steiglitz and McBride (1965). In computer simulation the all-pole model offered superior performance in matching the measured data after compression of the HRTF set, while the pole-zero models, which contrary to theory-driven expectations performed considerably worse in simulation, were given a preliminary subjective validation as a pilot study. Consideration is also given to a method of secondary compression and interpolation that applies the Discrete Cosine Transform to the angle-dependent components derived from each approach. These techniques may also prove useful in developing efficient schemes for custom HRTF capture.
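    The two techniques named above can be sketched generically: an all-pole fit via the Levinson-Durbin linear-prediction recursion, and a truncated DCT across azimuth for secondary compression. This is an illustration of the standard methods, not the thesis's implementation; model order and the number of retained DCT terms are arbitrary here:

```python
import numpy as np
from scipy.fft import dct, idct

def lpc(h, order):
    """Fit an all-pole (linear prediction) model to impulse response h
    via the Levinson-Durbin recursion; returns the denominator
    coefficients a (a[0] == 1) of H(z) ~ g / A(z)."""
    r = np.correlate(h, h, mode="full")[len(h) - 1:len(h) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err   # reflection coefficient
        a[:i + 1] = a[:i + 1] + k * a[:i + 1][::-1]
        err *= 1.0 - k * k
    return a

def dct_compress(az_coeffs, keep):
    """Secondary compression: DCT along the azimuth axis of the
    per-angle coefficient sets, keeping only the first `keep` terms."""
    C = dct(az_coeffs, axis=0, norm="ortho")
    C[keep:] = 0.0
    return idct(C, axis=0, norm="ortho")
```

    Because HRTFs vary smoothly with azimuth, most of the DCT energy concentrates in the low-order terms, which is what makes the truncation in `dct_compress` cheap in practice.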

    Manifold learning for spatial audio synthesis (Aprendizado de variedades para a síntese de áudio espacial)

    Get PDF
    Advisors: Luiz César Martini, Bruno Sanches Masiero. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.
    Abstract: The objective of binaurally rendered spatial audio is to simulate a sound source at arbitrary spatial locations through Head-Related Transfer Functions (HRTFs), also called anatomical transfer functions. HRTFs model the direction-dependent influence of the ears, head, and torso on the incident sound field. When an audio source is filtered through a pair of HRTFs (one for each ear), a listener perceives the sound as though it were reproduced at a specific location in space. Inspired by our successful results building a practical face recognition application for visually impaired people that uses a spatial audio user interface, in this work we deepen our research to address several scientific aspects of spatial audio. In this context, the thesis explores the incorporation of prior knowledge about spatial audio using a novel nonlinear HRTF representation based on manifold learning, tackling three challenges of broad interest in the spatial audio community: HRTF personalization, HRTF interpolation, and the improvement of human sound localization. The use of manifold learning for spatial audio rests on the assumption that the data (i.e., the HRTFs) lie on a low-dimensional manifold. This assumption has also been of great interest among researchers in computational neuroscience, who argue that manifolds are crucial for understanding the nonlinear relationships underlying perception in the brain. For all of our contributions using manifold learning, the construction of a single manifold across subjects through an Inter-subject Graph (ISG) proved to be a powerful HRTF representation, capable of incorporating prior knowledge of HRTFs and capturing the underlying factors of spatial hearing. Moreover, constructing a single manifold with the ISG offers the advantage of employing information from other individuals to improve the overall performance of the techniques proposed here. The results show that our ISG-based techniques outperform other linear and nonlinear methods on the spatial audio challenges addressed by this thesis.
    Doctorate in Computer Engineering (Doutor em Engenharia Elétrica). Grant 2014/14630-9, FAPESP; CAPE
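    A minimal sketch of the manifold-learning idea, assuming a Laplacian-eigenmaps-style embedding over a k-nearest-neighbour graph. The thesis's Inter-subject Graph pools measurements across subjects before building such a graph; the weighting scheme below is an illustrative choice, not the one from the thesis:

```python
import numpy as np

def laplacian_eigenmaps(X, k=5, dim=2):
    """Toy low-dimensional embedding of HRTF feature vectors X
    (rows = directions, possibly pooled across subjects)."""
    n = len(X)
    # pairwise squared distances between all rows
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:k + 1]            # k nearest neighbours
        W[i, nbrs] = np.exp(-D2[i, nbrs] / D2[i, nbrs].mean())
    W = np.maximum(W, W.T)                           # symmetrise the graph
    L = np.diag(W.sum(1)) - W                        # graph Laplacian
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:dim + 1]   # skip the constant (zero-eigenvalue) vector
```

    Under the low-dimensional-manifold assumption, nearby points in this embedding correspond to perceptually similar HRTFs, which is what makes graph-based interpolation and personalization plausible.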

    Acoustics of ancient Greek and Roman theaters in use today

    Full text link

    Perceptual evaluation of personal, location-aware spatial audio

    Full text link
    This thesis entails an analysis, synthesis and evaluation of the medium of personal, location-aware spatial audio (PLASA). The PLASA medium is a specialisation of locative audio, the presentation of audio in relation to the listener's position; it also intersects with audio augmented reality, the presentation of a virtual audio reality superimposed on the real world. A PLASA system delivers binaural (personal) spatial audio to mobile listeners, with body-position and head-orientation interactivity, so that simulated sound source positions seem fixed in the world reference frame. PLASA technical requirements were analysed and three system architectures identified, employing mobile, remote or distributed rendering. Knowledge of human spatial hearing was reviewed to ascertain the likely perceptual effects of the unique factors of PLASA compared with static spatial audio. The human factors identified were multimodal perception of body-motion interaction and coincident visual stimuli. The technical limitations identified were rendering method, individualised binaural rendering, and the accuracy and latency of position and orientation tracking. An experimental PLASA system was built and evaluated technically, then four perceptual experiments were conducted to investigate task-related perceptual performance. These experiments tested the identified human factors and technical limitations against performance measures related to localisation and navigation tasks, under conditions designed to be ecologically valid for PLASA application scenarios. A final experiment assessed navigation task performance with real sound sources and unmediated spatial hearing, for comparison with virtual-source performance. Results found that body-motion interaction facilitated correction of front-back confusions. Body motion and the multimodal stimuli of virtual-audible and real-visible objects supported lower azimuth errors than stationary, mono-modal localisation of the same audio-only stimuli.
    PLASA users navigated efficiently to stationary virtual sources, despite varied rendering quality and head-turn latencies between 176 ms and 976 ms. The factors of rendering method, individualisation and head-turn latency showed interaction effects, such as greater sensitivity to latency for some rendering methods than others. In general, PLASA task performance levels agreed with expectations from static or technical performance tests, and some results demonstrated performance levels similar to those achieved in the real-source baseline test.
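    The core of keeping a virtual source world-fixed under head-tracking can be sketched in a few lines: the azimuth fed to the binaural renderer is the world bearing of the source relative to the listener, minus the tracked head yaw. The conventions here (yaw in degrees, 0° = facing +y, clockwise positive) are illustrative assumptions, not the thesis's implementation:

```python
import math

def head_relative_azimuth(listener_xy, head_yaw_deg, source_xy):
    """Azimuth for the binaural renderer so a virtual source stays
    fixed in the world frame: world bearing minus tracked head yaw,
    wrapped to [-180, 180) degrees (positive = listener's right)."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    bearing = math.degrees(math.atan2(dx, dy))   # compass-style bearing
    return (bearing - head_yaw_deg + 180.0) % 360.0 - 180.0
```

    Because this angle must be recomputed on every tracker update, the head-turn latencies studied above (176 ms to 976 ms) correspond directly to how stale `head_yaw_deg` is when the renderer applies it.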

    Hybrid Data Storage Framework for the Biometrics Domain

    Get PDF
    Biometric-based authentication is one of the most popular techniques adopted in large-scale identity matching systems due to its robustness in access control. In recent years the number of enrolments has increased significantly, posing serious issues for the performance and scalability of these systems. In addition, the use of multiple modalities (such as face, iris and fingerprint) further compounds the scalability issues. This research work focuses on the development of a new Hybrid Data Storage Framework (HDSF) to improve the scalability and performance of biometric authentication systems (BAS). In this framework, the scalability issue is addressed by integrating a relational database with a NoSQL data store, combining the strengths of both. The proposed framework improves the performance of BAS in three areas: (i) a new biographic match-score-based key filtering process identifies duplicate records in the storage (de-duplication search); (ii) a multimodal biometric-index-based key filtering process supports identification and de-duplication search operations; (iii) a parallel biometric matching approach is adopted for identification, enrolment and verification processes. The efficacy of the proposed framework is compared with that of a traditional BAS at several values of False Rejection Rate (FRR). Using our dataset and algorithms, it is observed that, compared to a traditional BAS, the HDSF shows an overall efficiency improvement of more than 54% at zero FRR and above 60% for FRR values between 1% and 3.5% during identification search operations.
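    The two-stage identification search described above can be sketched as follows. The record layout, blocking key and scoring function are hypothetical placeholders; the point is the shape of the pipeline: a cheap key-based filter narrows the gallery, then only the surviving candidates go through (parallelised) biometric matching:

```python
from concurrent.futures import ThreadPoolExecutor

def biographic_key(record):
    # coarse blocking key, e.g. surname initial + birth year (hypothetical)
    return (record["surname"][0].lower(), record["birth_year"])

def identify(probe, gallery, match_fn, threshold=0.8, workers=4):
    """Two-stage identification: key filtering, then parallel matching."""
    # Stage 1: key-based filtering narrows the candidate set
    key = biographic_key(probe)
    candidates = [r for r in gallery if biographic_key(r) == key]
    # Stage 2: parallel biometric matching over the reduced set
    with ThreadPoolExecutor(max_workers=workers) as ex:
        scores = list(ex.map(lambda r: (r["id"], match_fn(probe, r)),
                             candidates))
    return [(rid, s) for rid, s in scores if s >= threshold]
```

    The efficiency gain comes from stage 1: expensive biometric comparisons run only against the filtered candidates rather than the full gallery, and stage 2 parallelises whatever matching work remains.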

    Temporal integration of loudness as a function of level

    Get PDF