347 research outputs found
Adaptive Algorithms for Intelligent Acoustic Interfaces
Modern speech communications are evolving in a new direction, one that involves users in a more perceptive way: the immersive experience, which may be considered the “last-mile” problem of telecommunications.
One of the main features of immersive communications is distant talking, i.e., hands-free (in the broad sense) speech communication without body-worn or tethered microphones, which takes place in a multisource environment where interfering signals may degrade the communication quality and the intelligibility of the desired speech source. In order to preserve speech quality, intelligent acoustic interfaces may be used. An intelligent acoustic interface may comprise multiple microphones and loudspeakers; its distinguishing feature is that it models the acoustic channel in order to adapt to user requirements and environmental conditions. This is why intelligent acoustic interfaces are based on adaptive filtering algorithms.
Modelling the acoustic path entails a set of problems which must be taken into account when designing an adaptive filtering algorithm. Such problems may be generated by either a linear or a nonlinear process, and can be tackled, respectively, by linear or nonlinear adaptive algorithms.
In this work we consider such modelling problems and propose novel, effective adaptive algorithms that allow acoustic interfaces to be robust against interfering signals, thus preserving the perceived quality of the desired speech signals.
As regards linear adaptive algorithms, a class of adaptive filters exploiting the sparse nature of the acoustic impulse response has recently been proposed. We adopt this class of filters, known as proportionate adaptive filters, and derive a general framework from which any linear adaptive algorithm can be obtained. Using this framework, we also propose several efficient proportionate adaptive algorithms, expressly designed to tackle problems of a linear nature.
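The abstract does not reproduce the update equations of these filters; as a rough illustration of the proportionate idea, here is a minimal sketch of a standard PNLMS-style step (all parameter names and values here are assumptions for illustration, not taken from this work):

```python
import numpy as np

def pnlms_update(w, x, d, mu=0.5, delta=1e-4, rho=0.01):
    """One PNLMS-style step: the weight update is made proportional to each
    tap's magnitude, so the few large taps of a sparse acoustic impulse
    response adapt faster than in plain NLMS."""
    e = d - w @ x                           # a priori error
    l_inf = max(np.max(np.abs(w)), delta)   # regularized largest tap
    g = np.maximum(rho * l_inf, np.abs(w))  # per-tap proportionate gains
    g = g / np.mean(g)                      # normalize the gain vector
    w = w + mu * e * (g * x) / (x @ (g * x) + delta)
    return w, e
```

With all-zero initial weights the gains are uniform and the step reduces to NLMS; as the sparse taps emerge, they receive proportionally larger updates.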
On the other hand, in order to address problems deriving from a nonlinear process, we propose a novel filtering model which performs nonlinear transformations by means of functional links. Using this nonlinear model, we propose functional link adaptive filters, which provide an efficient solution to the modelling of a nonlinear acoustic channel.
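For readers unfamiliar with functional links, the core idea can be sketched as follows: the input is expanded through fixed nonlinear basis functions, and a linear adaptive filter then operates on the expanded vector. A trigonometric expansion is a common choice in the functional-link literature and is assumed here purely for illustration; the expansion actually used in this work may differ.

```python
import numpy as np

def functional_link_expansion(x, order=2):
    """Trigonometric functional-link expansion: each input sample x[i] is
    mapped to [x[i], sin(pi*x[i]), cos(pi*x[i]), sin(2*pi*x[i]), ...], so a
    subsequent *linear* adaptive filter on the expanded vector can model a
    nonlinear acoustic channel."""
    feats = [x]
    for p in range(1, order + 1):
        feats.append(np.sin(p * np.pi * x))
        feats.append(np.cos(p * np.pi * x))
    return np.concatenate(feats)
```

A length-L input buffer of order-P expansion yields a (2P + 1)·L feature vector, on which any of the linear algorithms above (NLMS, PNLMS, ...) can run unchanged.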
Finally, we introduce robust filtering architectures based on adaptive combinations of filters, which allow acoustic interfaces to adapt more effectively to environmental conditions, thus providing a powerful means of enabling immersive speech communications.
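The standard adaptive convex combination scheme, sketched below under the usual sigmoid-activation assumption (the specific architectures proposed in this work may differ), mixes the outputs of two component filters and adapts the mixing parameter by stochastic gradient descent on the squared error:

```python
import numpy as np

def combine_filters(y_fast, y_slow, d, a, mu_a=0.1):
    """Adaptive convex combination of two filter outputs: lambda = sigmoid(a)
    weights the components, and the mixing parameter a is adapted by gradient
    descent on e^2, so the better-performing filter gradually dominates."""
    lam = 1.0 / (1.0 + np.exp(-a))           # mixing weight in (0, 1)
    y = lam * y_fast + (1.0 - lam) * y_slow  # combined output
    e = d - y
    a = a + mu_a * e * (y_fast - y_slow) * lam * (1.0 - lam)
    return y, e, a
```

Combining, e.g., a fast-converging and a slowly-converging filter this way gives robustness: the combination tracks whichever component currently matches the environment better.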
Identifying, Evaluating and Applying Importance Maps for Speech
Like many machine learning systems, speech models often perform well when employed on data in the same domain as their training data. However, when inference is performed on out-of-domain data, performance suffers. With a fast-growing number of applications of speech models in healthcare, education, automotive, automation, etc., it is essential to ensure that speech models can generalize to out-of-domain data, especially noisy environments in real-world scenarios. In contrast, human listeners are quite robust to noisy environments. A thorough understanding of the differences between human listeners and speech models is therefore urgently required in order to enhance speech model performance in noise. These differences presumably exist because the speech model does not use the same information as humans to recognize speech. A possible solution is to encourage the speech model to attend to the same time-frequency regions as human listeners; in this way, speech model generalization in noise may be improved.
We define those time-frequency regions that humans or machines focus on to recognize the speech as importance maps (IMs). In this research, first, we investigate how to identify speech importance maps. Second, we compare human and machine importance maps to understand how they differ and how the speech model can learn from humans to improve its performance in noise. Third, we develop a structured saliency benchmark (SSBM), a metric for evaluating IMs. Finally, we propose a new application of IMs as data augmentation for speech models, enhancing their performance and enabling them to better generalize to out-of-domain noise.
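The abstract does not specify how IMs are turned into augmentations; one plausible sketch (purely illustrative, with assumed parameters and masking strategy) replaces low-importance time-frequency bins of a spectrogram with noise, so the model is pushed to rely on the regions listeners use:

```python
import numpy as np

def im_augment(spec, im, keep_frac=0.7, rng=None):
    """Importance-map-guided augmentation (an assumed scheme, for
    illustration): keep the highest-importance time-frequency bins intact
    and replace a random half of the remaining low-importance bins with
    noise drawn from the spectrogram's own statistics."""
    rng = rng or np.random.default_rng()
    thresh = np.quantile(im, 1.0 - keep_frac)      # importance cutoff
    low = im < thresh                              # low-importance bins
    drop = low & (rng.random(spec.shape) < 0.5)    # random subset to corrupt
    out = spec.copy()
    out[drop] = rng.normal(spec.mean(), spec.std(), size=drop.sum())
    return out
```

Training on such augmented spectrograms rewards predictions that depend only on the preserved high-importance regions.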
Overall, our work demonstrates that importance maps can be used to improve speech models and achieve out-of-domain generalization to different noise environments. In future work, we will extend this research to large-scale speech models and deploy different methods, such as those based on human responses, to identify IMs and use them to augment speech data. The technique can also be extended to computer vision tasks such as image recognition, by predicting importance maps for images and using them to enhance model performance on out-of-domain data.
Cumulative index to NASA Tech Briefs, 1986-1990, volumes 10-14
Tech Briefs are short announcements of new technology derived from the R&D activities of the National Aeronautics and Space Administration. These briefs emphasize information considered likely to be transferable across industrial, regional, or disciplinary lines and are issued to encourage commercial application. This cumulative index of Tech Briefs contains abstracts and four indexes (subject, personal author, originating center, and Tech Brief number) and covers the period 1986 to 1990. The abstract section is organized by the following subject categories: electronic components and circuits, electronic systems, physical sciences, materials, computer programs, life sciences, mechanics, machinery, fabrication technology, and mathematics and information sciences.
NASA Tech Briefs, December 1990
Topics: New Product Ideas; NASA TU Services; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences; Life Sciences
ChordMics: Acoustic Signal Purification with Distributed Microphones
Acoustic signals act as essential input to many systems. However, a pure acoustic signal is very difficult to extract, especially in noisy environments. Existing beamforming systems are able to extract the signal transmitted from certain directions; however, since their microphones are centrally deployed, these systems have limited coverage and low spatial resolution. We overcome these limitations and present ChordMics, a distributed beamforming system. By leveraging the spatial diversity of distributed microphones, ChordMics is able to extract the acoustic signal from arbitrary points. To realize such a system, we further address the fundamental challenge in distributed beamforming: aligning the signals captured by distributed and unsynchronized microphones. We implement ChordMics and evaluate its performance under both LOS and NLOS scenarios. The evaluation results show that ChordMics delivers higher SINR than a centralized microphone array, with an average performance gain of up to 15 dB.
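ChordMics' actual alignment algorithm is not described in the abstract; as a point of reference, the classic baseline it improves upon is cross-correlation lag estimation followed by delay-and-sum, which can be sketched as:

```python
import numpy as np

def align_and_sum(signals, ref=0):
    """Cross-correlation alignment + delay-and-sum (the classic baseline,
    not ChordMics' actual algorithm): estimate each microphone's lag
    relative to a reference channel, undo it, and average the channels so
    the desired source adds coherently."""
    ref_sig = signals[ref]
    n = len(ref_sig)
    out = np.zeros(n)
    for sig in signals:
        xc = np.correlate(sig, ref_sig, mode="full")
        lag = int(np.argmax(xc)) - (n - 1)  # estimated delay vs reference
        out += np.roll(sig, -lag)           # circular shift; fine for lag << n
    return out / len(signals)
```

Unsynchronized sampling clocks add drift on top of these static lags, which is the harder part of the distributed setting that this baseline does not handle.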
- …