
    Improving elevation perception with a tool for image-guided head-related transfer function selection

    This paper proposes an image-guided HRTF selection procedure that exploits the relation between features of the pinna shape and HRTF notches. Using a 2D image of a subject's pinna, the procedure selects from a database the HRTF set that best fits the anthropometry of that subject. The proposed procedure is designed to be quick to apply and easy to use for a user without prior knowledge of binaural audio technologies. The entire process is evaluated by means of an auditory model for sound localization in the mid-sagittal plane available from previous literature. Using virtual subjects from an HRTF database, a virtual experiment is implemented to assess the vertical localization performance of the database subjects when they are provided with HRTF sets selected by the proposed procedure. Results report a statistically significant improvement in predictions of localization performance for selected HRTFs compared to KEMAR HRTFs, which are a commercial standard in many binaural audio solutions; moreover, the proposed analysis provides useful indications to refine the perceptually motivated metrics that guide the selection.

    Frequency Estimation Of The First Pinna Notch In Head-Related Transfer Functions With A Linear Anthropometric Model

    The relation between anthropometric parameters and Head-Related Transfer Function (HRTF) features, especially those due to the pinna, is not yet fully understood. In this paper we apply signal processing techniques to extract the frequencies of the main pinna notches (known as N1, N2, and N3) in the frontal part of the median plane and build a model relating them to 13 different anthropometric parameters of the pinna, some of which depend on the elevation angle of the sound source. Results show that while the considered anthropometric parameters cannot approximate either the N2 or the N3 frequency with sufficient accuracy, eight of them are sufficient for modeling the frequency of N1 within a psychoacoustically acceptable margin of error. In particular, distances between the ear canal and the outer helix border are the most important parameters for predicting N1.
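As an illustration of what a linear anthropometric model of this kind can look like, the sketch below fits a notch frequency to eight pinna parameters by ordinary least squares. It is not the paper's model: the data are synthetic, the units (cm and kHz) are assumed, and the names `fit_linear_model`, `predict`, `true_w`, and `true_b` are hypothetical.

```python
import numpy as np

def fit_linear_model(X, y):
    """Least-squares fit of y ~ X @ w + b; returns (w, b)."""
    # Append a column of ones so the intercept is estimated with the weights.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def predict(X, w, b):
    """Predicted notch frequency for each row of X."""
    return X @ w + b

# Synthetic, made-up data: 40 "subjects", 8 hypothetical pinna distances (cm).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 4.0, size=(40, 8))
true_w = rng.uniform(0.1, 0.5, size=8)   # assumed ground-truth weights (kHz/cm)
true_b = 7.0                             # assumed intercept (kHz)
y = X @ true_w + true_b + rng.normal(0.0, 0.05, size=40)

w, b = fit_linear_model(X, y)
rel_err = np.abs(predict(X, w, b) - y) / np.abs(y)
print(f"mean relative error: {rel_err.mean():.4f}")
```

In practice the error would be checked on held-out subjects rather than on the training data, and compared against a perceptual tolerance for notch-frequency shifts.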

    Charge separation: From the topology of molecular electronic transitions to the dye/semiconductor interfacial energetics and kinetics

    Charge separation properties, that is, the ability of a chromophore or a chromophore/semiconductor interface to separate charges upon light absorption, are crucial characteristics for an efficient photovoltaic device. Starting from this concept, we devote the first part of this book chapter to the topological analysis of molecular electronic transitions induced by photon capture. Such analysis can be either qualitative or quantitative, and is presented here in the framework of the reduced density matrix theory applied to single-reference, multiconfigurational excited states. The qualitative strategies are separated into density-based and wave function-based approaches, while the quantitative methods reported here for analysing the photoinduced charge transfer nature are either fragment-based, global or statistical. In the second part of this chapter we extend the analysis to dye-sensitized metal oxide surface models, discussing interfacial charge separation, energetics and electron injection kinetics from the dye excited state to the semiconductor conduction band states.

    Direction of Arrival with One Microphone, a few LEGOs, and Non-Negative Matrix Factorization

    Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to the scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way, giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem which we regularize by prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach. Comment: This article has been accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
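The white-noise case mentioned in the abstract can be illustrated with a minimal sketch: if the source spectrum is flat, the observed power spectrum is roughly a scaled copy of one direction's signature, so matching by cosine similarity recovers the direction. This is an assumption-laden toy, not the paper's algorithm (the NMF-based speech case is not covered); the signatures are random placeholders and `localize_white_noise` is a hypothetical name.

```python
import numpy as np

def localize_white_noise(observed_power, signatures):
    """Return the index of the direction whose signature best matches.

    observed_power: (F,) power spectrum measured at the single microphone.
    signatures: (D, F) direction-dependent power responses, one row per direction.
    """
    # Normalize both sides so the unknown source gain does not matter.
    obs = observed_power / np.linalg.norm(observed_power)
    sig = signatures / np.linalg.norm(signatures, axis=1, keepdims=True)
    # Cosine similarity between the observation and every signature.
    return int(np.argmax(sig @ obs))

# Made-up signatures: 12 candidate directions, 64 frequency bins.
rng = np.random.default_rng(1)
signatures = rng.uniform(0.2, 1.0, size=(12, 64))

# Simulate a white-noise source from direction 5 with unknown gain
# and a 5% multiplicative perturbation on each frequency bin.
true_dir = 5
observed = 3.0 * signatures[true_dir] * (1.0 + 0.05 * rng.standard_normal(64))

estimate = localize_white_noise(observed, signatures)
print("estimated direction index:", estimate)
```

For speech the source spectrum is not flat, so the observation is no longer proportional to a single signature, which is why the abstract describes that case as ill-posed and in need of a learned prior.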

    Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together

    Neural networks equipped with self-attention have parallelizable computation, light-weight structure, and the ability to capture both long-range and local dependencies. Further, their expressive power and performance can be boosted by using a vector to measure pairwise dependency, but this requires expanding the alignment matrix to a tensor, which results in memory and computation bottlenecks. In this paper, we propose a novel attention mechanism called "Multi-mask Tensorized Self-Attention" (MTSA), which is as fast and as memory-efficient as a CNN, but significantly outperforms previous CNN-/RNN-/attention-based models. MTSA 1) captures both pairwise (token2token) and global (source2token) dependencies by a novel compatibility function composed of dot-product and additive attentions, 2) uses a tensor to represent the feature-wise alignment scores for better expressive power but only requires parallelizable matrix multiplications, and 3) combines multi-head with multi-dimensional attentions, and applies a distinct positional mask to each head (subspace), so the memory and computation can be distributed to multiple heads, each with sequential information encoded independently. The experiments show that a CNN/RNN-free model based on MTSA achieves state-of-the-art or competitive performance on nine NLP benchmarks with compelling memory- and time-efficiency.

    Development of an Advanced Force Field for Water using Variational Energy Decomposition Analysis

    Because force fields model intermolecular interactions piecewise, they can be difficult to parameterize: they are fit to data such as total energies that connect only indirectly to their separable functional forms. Furthermore, by neglecting certain types of molecular interactions such as charge penetration and charge transfer, most classical force fields must rely on, but do not always demonstrate, cancellation of errors among the remaining molecular interactions that are accounted for, such as exchange repulsion, electrostatics, and polarization. In this work we present the first generation of the (many-body) MB-UCB force field that explicitly accounts for the decomposed molecular interactions commensurate with a variational energy decomposition analysis, including charge transfer, with force field design choices that reduce the computational expense of the MB-UCB potential while remaining accurate. We optimize parameters using only single water molecule and water cluster data up through pentamers, with no fitting to condensed phase data, and we demonstrate that high accuracy is maintained when the force field is subsequently validated against conformational energies of larger water cluster data sets, radial distribution functions of the liquid phase, and the temperature dependence of thermodynamic and transport water properties. We conclude that MB-UCB is comparable in performance to MB-Pol, but is less expensive and more transferable because it eliminates the need to represent short-ranged interactions through large parameter fits to high-order polynomials.