
Joint model-based recognition and localization of overlapped acoustic events using a set of distributed small microphone arrays

Author: Chakraborty Rupayan
Nadeu Climent
Publication date: 01/01/2017

In the analysis of acoustic scenes, the occurring sounds often have to be detected in time, recognized, and localized in space. Usually, each of these tasks is carried out separately. In this paper, a model-based approach that performs them jointly in the presence of multiple simultaneous sources is presented and tested. The recognized event classes and their respective room positions are obtained with a single system that maximizes the combination of a large set of scores, each resulting from a different acoustic event model and a different beamformer output signal, which comes from one of several arbitrarily located small microphone arrays. Using a two-step method, experimental work is reported for a specific scenario consisting of meeting-room acoustic events, either isolated or overlapped with speech. Tests carried out with two datasets show the advantage of the proposed approach over some usual techniques, and show that the inclusion of estimated priors brings a further performance improvement. Comment: Computational acoustic scene analysis, microphone array signal processing, acoustic event detection

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC
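As a rough illustration of the score-combination idea in the Chakraborty and Nadeu abstract, the sketch below picks the (event class, position) pair with the highest combined score. It assumes a hypothetical score layout `loglik[c, p, a]` (log-likelihood of class `c` computed from the output of array `a` steered toward candidate position `p`), combines the arrays by simply summing log-likelihoods, and optionally adds log priors. The paper's actual combination rule and its two-step handling of overlapped sources may differ.

```python
import numpy as np

def joint_decode(loglik, log_prior=None):
    """Return the (class, position) pair with the highest combined score.

    loglik: array of shape (C, P, A); hypothetical layout where entry [c, p, a]
            is the log-likelihood of event class c given the output of array a
            steered toward candidate position p.
    log_prior: optional array of shape (C, P) with log prior probabilities.
    """
    # Combine evidence from all arrays by summing their log-likelihoods
    # (an assumption: treats the arrays as conditionally independent).
    combined = loglik.sum(axis=2)
    if log_prior is not None:
        combined = combined + log_prior
    # Joint decision: a single argmax over classes and positions together.
    c, p = np.unravel_index(np.argmax(combined), combined.shape)
    return int(c), int(p), combined
```

Deciding class and position in one argmax, rather than recognizing first and localizing afterwards, is what makes the decision joint; the optional prior term mirrors the abstract's remark that estimated priors improve performance.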

Practical speech enhancement for diverse environments with varying noise characteristics (雑音特性の変動を伴う多様な環境で実用可能な音声強調)

Author: Kawase Tomoko (川瀬智子)
Publication date: 01/01/2018

筑波大学 (University of Tsukuba)201

Tsukuba Repository

Linear Transmit-Receive Strategies for Multi-user MIMO Wireless Communications

Author: Song Bin
Publication date: 17/03/2014

In order to combat interference and exploit the large multiplexing gains of multi-antenna systems, particular interest in spatial division multiple access (SDMA) techniques has emerged. Linear precoding techniques, one family of SDMA strategies, have attracted more attention because the increasing number of users and antennas in existing and future mobile communication systems requires a simpler precoding design. Therefore, this thesis contributes to the design of linear transmit and receive strategies for multi-user MIMO broadcast channels in a single cell and in clustered multiple cells. First, we present a throughput approximation framework for multi-user MIMO broadcast channels employing regularized block diagonalization (RBD) linear precoding. Comparing dirty paper coding (DPC) and linear precoding algorithms (e.g., zero forcing (ZF) and block diagonalization (BD)), we further quantify lower and upper bounds on the rate and power offsets between them as a function of system parameters such as the number of users and antennas.
Next, we develop a novel closed-form coordinated beamforming (CBF) algorithm (i.e., SeDJoCo-based closed-form CBF) to solve the existing open problem of CBF. Our new algorithm can support a MIMO system with an arbitrary number of users and transmit antennas. Moreover, the algorithm applies not only to CBF but also to blind source separation (BSS), since the same mathematical model is used in BSS. Then, we propose a new iterative CBF algorithm, flexible coordinated beamforming (FlexCoBF), for multi-user MIMO broadcast channels. Compared to the existing iterative CBF algorithms, its most promising advantage is the freedom it provides in choosing the linear transmit and receive beamforming strategies: any existing linear precoding method can be chosen as the transmit strategy, and the receive beamforming strategy can be flexibly chosen from MRC or MMSE receivers. Considering clustered multiple-cell scenarios, we extend the FlexCoBF algorithm further and introduce the concept of coordinated multipoint (CoMP) transmission. Finally, we present three strategies for channel state information (CSI) acquisition under various channel conditions and channel estimation strategies. CSI is required at the base station to implement SDMA techniques, and its quality heavily affects system performance. The performance enhancement achieved by our new strategies has been demonstrated by numerical simulation results in terms of the system sum rate and the bit error rate.

Digitale Bibliothek Thüringen
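The Song Bin thesis compares DPC against linear precoders such as zero forcing (ZF). A minimal NumPy sketch of ZF precoding follows, assuming single-antenna users (the case in which block diagonalization reduces to ZF) and a total-transmit-power normalization chosen for illustration. It is not the thesis's RBD or FlexCoBF algorithm.

```python
import numpy as np

def zf_precoder(H, total_power=1.0):
    """Zero-forcing precoder for K single-antenna users and N >= K transmit antennas.

    H: complex channel matrix of shape (K, N); row k is user k's channel.
    Returns W of shape (N, K) such that H @ W is a scaled identity matrix.
    """
    # Right pseudo-inverse W = H^H (H H^H)^{-1} nulls inter-user interference.
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T)
    # Scale so the total transmit power ||W||_F^2 equals total_power.
    W = W * (np.sqrt(total_power) / np.linalg.norm(W, "fro"))
    return W
```

Because `H @ W` is diagonal, each user receives only its own stream; the price, compared with DPC, is the power spent on inverting the channel, which is the kind of rate and power offset the thesis bounds.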

Source Localization for Dual Speech Enhancement Technology

Author: Hyejeong Jeon
Lag-Young Kim
Seungil Kim
Publication venue: 'IntechOpen'
Publication date: 11/04/2011

IntechOpen


End-to-end Speech Separation with Neural Networks

Author: Luo Yi
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2021

Speech separation has long been an active research topic in the signal processing community, owing to its importance in a wide range of applications such as hearable devices and telecommunication systems. It not only serves as a fundamental problem for all higher-level speech processing tasks such as automatic speech recognition, natural language understanding, and smart personal assistants, but also plays an important role in smart earphones and augmented and virtual reality devices. With the recent progress in deep neural networks, separation performance has been significantly advanced by various new problem definitions and model architectures. The most widely used approach in recent years performs separation in the time-frequency domain: a spectrogram or other time-frequency representation is first calculated from the mixture signal, and multiple time-frequency masks are then estimated for the target sources. The masks are applied to the mixture's time-frequency representation to extract the target representations, and an operation such as the inverse short-time Fourier transform is then used to convert them back to waveforms. However, such frequency-domain methods may have difficulty modeling the phase spectrogram, as conventional time-frequency masks often consider only the magnitude spectrogram. Moreover, the training objectives for frequency-domain methods are typically also defined in the frequency domain, which may not be in line with widely used time-domain evaluation metrics such as signal-to-noise ratio and signal-to-distortion ratio. Time-domain, end-to-end speech separation arises naturally as a problem formulation that tackles these disadvantages of frequency-domain systems. End-to-end speech separation networks take the mixture waveform as input and directly estimate the waveforms of the target sources.
Following the general pipeline of conventional frequency-domain systems, which contains a waveform encoder, a separator, and a waveform decoder, time-domain systems can be designed in a similar way while significantly improving separation performance. In this dissertation, I focus on multiple aspects of the general problem formulation of end-to-end separation networks, including system designs, model architectures, and training objectives. I start with a single-channel pipeline, which we refer to as the time-domain audio separation network (TasNet), to validate the advantage of end-to-end separation over conventional time-frequency domain pipelines. I then move to the multi-channel scenario and introduce the filter-and-sum network (FaSNet) for both fixed-geometry and ad-hoc geometry microphone arrays. Next, I introduce methods for lightweight network architecture design that allow the models to maintain separation performance while using as little as 2.5% of the model size and 17.6% of the model complexity. After that, I look into training objective functions for end-to-end speech separation and describe two training objectives, one for separating varying numbers of sources and one for improving robustness in reverberant environments. Finally, I take a step back, revisit several problem formulations in the end-to-end separation pipeline, and raise further questions in this framework to be analyzed and investigated in future work.

Columbia University Academic Commons
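The Luo Yi abstract contrasts frequency-domain training objectives with time-domain metrics such as SNR and SDR. One time-domain measure commonly used in end-to-end separation work, both as a metric and (negated) as a training loss, is the scale-invariant SNR (SI-SNR), sketched below. The abstract does not state which exact objectives the dissertation uses, so treat this as an illustrative example; the `eps` guard value is an arbitrary choice.

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (dB) between two 1-D waveforms."""
    # Remove DC offsets so only the waveform shape is compared.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference: this is the "target" component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything orthogonal to the reference counts as error.
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))
```

Because the estimate is projected onto the reference, rescaling the estimate leaves the score unchanged, so a network trained with `-si_snr(...)` is not penalized for output gain, only for waveform mismatch.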

Assessment of Measurement Distortions in GNSS Antenna Array Space-Time Processing

Author: Gérard Lachapelle
Saeed Daneshmand
Thyagaraja Marathe
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2016

Antenna array processing techniques are studied in GNSS as effective tools to mitigate interference in the spatial and spatiotemporal domains. However, without specific considerations, array processing results in biases and distortions in the cross-ambiguity function (CAF) of the ranging codes. In space-time processing (STP), CAF misshaping can occur due to the combined effect of space-time processing and unintentional signal attenuation by filtering. This paper focuses on characterizing these degradations for different controlled signal scenarios and for live data from an antenna array. The antenna array simulation method introduced in this paper enables accurate analyses in the field of STP. The effects of the relative placement of the interference source with respect to the desired signal direction are shown using overall measurement errors and the signal strength profile. Contributions from each source of distortion are analyzed individually and collectively. Effects of distortions on GNSS pseudorange errors and position errors are compared for blind, semi-distortionless, and distortionless beamforming methods. The characterization results can be useful for designing low-distortion filters, which are especially important for high-accuracy GNSS applications in challenging environments.

Crossref

Directory of Open Access Journals
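The GNSS abstract compares blind, semi-distortionless, and distortionless beamformers. As one textbook example of a distortionless constraint (not necessarily the formulation used in the paper), the minimum-variance distortionless-response (MVDR) beamformer minimizes output power while keeping unit gain toward the desired direction. The sketch assumes a narrowband, spatial-only setting and a hypothetical uniform linear array with half-wavelength spacing; it does not model the space-time (STP) filtering the paper studies.

```python
import numpy as np

def steering_vector(n_elements, angle_rad, spacing_wavelengths=0.5):
    """Narrowband steering vector of a hypothetical uniform linear array."""
    k = np.arange(n_elements)
    return np.exp(-2j * np.pi * spacing_wavelengths * k * np.sin(angle_rad))

def mvdr_weights(R, a):
    """MVDR weights: minimize w^H R w subject to the distortionless constraint w^H a = 1.

    R: (N, N) Hermitian positive-definite covariance of interference plus noise.
    a: (N,) steering vector of the desired signal direction.
    """
    Ri_a = np.linalg.solve(R, a)       # R^{-1} a without forming the inverse explicitly
    return Ri_a / (a.conj() @ Ri_a)    # normalize so that w^H a = 1
```

For example, with 8 elements, a desired signal at broadside, and a strong interferer at 40 degrees (covariance `100 * a_int a_int^H + I`), the resulting weights keep unit gain toward the desired direction while driving the response toward the interferer close to zero.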

Studies on noise robust automatic speech recognition

Author: Kurimo Mikko
Palomäki Kalle J.
Remes Ulpu
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2009

Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both classic and novel approaches proposed for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.

Aaltodoc Publication Archive

Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation

Author: Gu Rongzhi
Yu Dong
Zhang Shi-Xiong
Zou Yuexian
Publication date: 23/12/2022

Recently, frequency domain all-neural beamforming methods have achieved remarkable progress in multichannel speech separation. In parallel, the integration of time domain network structures and beamforming has also gained significant attention. This study proposes a novel all-neural beamforming method in the time domain and attempts to unify the all-neural beamforming pipelines for time domain and frequency domain multichannel speech separation. The proposed model consists of two modules: separation and beamforming. Both modules perform temporal-spectral-spatial modeling and are trained end-to-end using a joint loss function. The novelty of this study is twofold. First, a time domain directional feature conditioned on the direction of the target speaker is proposed, which can be jointly optimized within the time domain architecture to enhance target signal estimation. Second, an all-neural beamforming network in the time domain is designed to refine the pre-separated results. This module features parametric, time-variant estimation of the beamforming coefficients, without explicitly following the derivation of optimal filters, which may impose an upper bound on performance. The proposed method is evaluated on simulated reverberant overlapped speech data derived from the AISHELL-1 corpus. Experimental results demonstrate significant performance improvements over state-of-the-art frequency domain methods, ideal magnitude masks, and existing time domain neural beamforming methods.

arXiv.org e-Print Archive
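Time-domain beamforming, as in the Gu et al. abstract, is commonly expressed as filter-and-sum: each microphone waveform is convolved with its own FIR filter and the results are summed. The sketch below shows only that final step with fixed, hypothetical filters. In the proposed network the coefficients are estimated by a neural network and are time-variant, which this time-invariant sketch does not model.

```python
import numpy as np

def filter_and_sum(x, h):
    """Time-domain filter-and-sum beamformer with fixed FIR filters.

    x: array of shape (M, T), one waveform per microphone.
    h: array of shape (M, K), one FIR filter per microphone (hypothetical values).
    Returns y[n] = sum_m sum_k h[m, k] * x[m, n - k], truncated to length T.
    """
    T = x.shape[1]
    # Convolve each channel with its filter, keep the first T samples, then sum.
    return sum(np.convolve(x[m], h[m])[:T] for m in range(x.shape[0]))
```

For instance, if the second microphone receives the source one sample later than the first, delaying the first channel by one sample and averaging (filters `[0, 0.5]` and `[0.5, 0]`) aligns and sums the two copies, which is the classic delay-and-sum special case.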