Joint model-based recognition and localization of overlapped acoustic events using a set of distributed small microphone arrays
In the analysis of acoustic scenes, often the occurring sounds have to be
detected in time, recognized, and localized in space. Usually, each of these
tasks is done separately. In this paper, a model-based approach to jointly
carry them out for the case of multiple simultaneous sources is presented and
tested. The recognized event classes and their respective room positions are
obtained with a single system that maximizes the combination of a large set of
scores, each one resulting from a different acoustic event model and a
different beamformer output signal, which comes from one of several
arbitrarily located small microphone arrays. Using a two-step method,
experimental work is reported for a specific scenario consisting of
meeting-room acoustic events, either isolated or overlapped with speech. Tests
carried out with two datasets show the advantage of the proposed approach over
several conventional techniques, and show that including estimated priors
brings a further performance improvement.
Comment: Computational acoustic scene analysis, microphone array signal
processing, acoustic event detection
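The joint decision described above can be sketched as a single maximization over a table of scores. This is a minimal sketch under stated assumptions: each entry is a hypothetical log-likelihood of one event model on the output of one array's beamformer steered at one candidate position, the scores of different arrays are combined by summation, and estimated priors enter as an additive log-prior. It handles a single event; for overlapped events the same maximization would run over combinations of hypotheses. None of this is the paper's exact formulation.

```python
import numpy as np

def joint_decision(scores, log_prior=None):
    """Pick the (class, position) pair that maximizes a combined score.

    scores: array of shape (n_classes, n_positions, n_arrays); entry [c, p, a]
        is a hypothetical log-likelihood of event model c evaluated on the
        output of array a's beamformer steered at candidate position p.
    log_prior: optional array of shape (n_classes, n_positions) with
        estimated log-priors (assumed form).
    """
    combined = scores.sum(axis=2)  # assumption: combine arrays by summing log-scores
    if log_prior is not None:
        combined = combined + log_prior
    c, p = np.unravel_index(np.argmax(combined), combined.shape)
    return int(c), int(p)

rng = np.random.default_rng(0)
scores = rng.normal(size=(3, 5, 4))   # 3 classes, 5 positions, 4 arrays
scores[1, 2, :] += 10.0               # class 1 at position 2 scores high on every array
print(joint_decision(scores))         # (1, 2)
```

Classification and localization are not decided separately here: the class and the position come out of one argmax, which is the point of a joint system.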
Linear Transmit-Receive Strategies for Multi-user MIMO Wireless Communications
In order to combat interference and exploit large multiplexing gains of the
multi-antenna systems, spatial division multiple access (SDMA) techniques
have attracted particular interest. Linear precoding techniques, one class
of SDMA strategies, have received more attention because the growing number
of users and antennas in existing and future mobile communication systems
calls for simpler precoding designs. Therefore, this thesis contributes to the design of
linear transmit and receive strategies for multi-user MIMO broadcast
channels in a single cell and clustered multiple cells. First, we present a
throughput approximation framework for multi-user MIMO broadcast channels
employing regularized block diagonalization (RBD) linear precoding.
Comparing dirty paper coding (DPC) and linear precoding algorithms (e.g.,
zero forcing (ZF) and block diagonalization (BD)), we further quantify
lower and upper bounds of the rate and power offset between them as a
function of the system parameters such as the number of users and antennas.
Next, we develop a novel closed-form coordinated beamforming (CBF)
algorithm (i.e., SeDJoCo-based closed-form CBF) to solve an existing open
problem of CBF. Our new algorithm supports MIMO systems with an arbitrary
number of users and transmit antennas. Moreover, it applies not only to CBF
but also to blind source separation (BSS), since BSS uses the same
mathematical model. Then, we further propose a new iterative CBF algorithm (i.e.,
flexible coordinated beamforming (FlexCoBF)) for multi-user MIMO broadcast
channels. Compared to the existing iterative CBF algorithms, the most
promising advantage of our new algorithm is that it provides freedom in the
choice of the linear transmit and receive beamforming strategies, i.e., any
existing linear precoding method can be chosen as the transmit strategy and
the receive beamforming strategy can be flexibly chosen from MRC or MMSE
receivers. Considering clustered multi-cell scenarios, we further extend the
FlexCoBF algorithm and incorporate the concept of coordinated
multipoint (CoMP) transmission. Finally, we present three strategies for
channel state information (CSI) acquisition regarding various channel
conditions and channel estimation strategies. The CSI knowledge is required
at the base station in order to implement SDMA techniques. The quality of
the obtained CSI heavily affects the system performance. The performance
enhancement achieved by our new strategies has been demonstrated by
numerical simulation results in terms of the system sum rate and the bit
error rate.
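To make the linear precoding baseline concrete, the sketch below implements textbook zero-forcing (ZF) precoding for single-antenna users, W ∝ H^H (H H^H)^{-1}, scaled by one factor to meet a total power constraint, and evaluates the sum rate with residual interference treated as noise. BD and RBD generalize this idea to multi-antenna users. The channel model, power normalization, and function names are illustrative assumptions, not the thesis's throughput approximation framework.

```python
import numpy as np

def zf_precoder(H, total_power=1.0):
    """Zero-forcing precoder for K single-antenna users.

    H: complex channel matrix of shape (K, M), with K <= M transmit antennas.
    Returns W of shape (M, K) with ||W||_F^2 == total_power such that
    H @ W is diagonal, i.e. inter-user interference is nulled.
    """
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T)        # right pseudo-inverse
    return W * np.sqrt(total_power) / np.linalg.norm(W)   # Frobenius-norm scaling

def sum_rate(H, W, noise_var=1.0):
    """Sum rate in bit/s/Hz, treating residual interference as noise."""
    G = np.abs(H @ W) ** 2              # G[k, j]: received power of stream j at user k
    signal = np.diag(G)
    interference = G.sum(axis=1) - signal
    return float(np.sum(np.log2(1.0 + signal / (interference + noise_var))))

rng = np.random.default_rng(1)
# i.i.d. Rayleigh channel (assumption): 4 users, 8 transmit antennas
H = (rng.normal(size=(4, 8)) + 1j * rng.normal(size=(4, 8))) / np.sqrt(2)
W = zf_precoder(H, total_power=10.0)
print(np.round(np.abs(H @ W), 3))   # off-diagonal entries are (numerically) zero
print(sum_rate(H, W))
```

The single scaling factor keeps every user's effective gain equal; power loading across users, as well as the DPC comparison, is outside this sketch.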
End-to-end Speech Separation with Neural Networks
Speech separation has long been an active research topic in the signal processing community owing to its importance in a wide range of applications such as hearable devices and telecommunication systems. It is not only a fundamental problem underlying higher-level speech processing tasks such as automatic speech recognition, natural language understanding, and smart personal assistants, but it also plays an important role in smart earphones and augmented and virtual reality devices.
With the recent progress in deep neural networks, separation performance has been significantly advanced by various new problem definitions and model architectures. The most widely used approach in recent years performs separation in the time-frequency domain: a spectrogram or other time-frequency representation is first calculated from the mixture signal, and multiple time-frequency masks are then estimated for the target sources. The masks are applied to the mixture's time-frequency representation to extract the target representations, which are then converted back to waveforms by an operation such as the inverse short-time Fourier transform. However, such frequency-domain methods may have difficulty modeling the phase spectrogram, as conventional time-frequency masks often consider only the magnitude spectrogram. Moreover, the training objectives of frequency-domain methods are typically also defined in the frequency domain, which may not be in line with widely used time-domain evaluation metrics such as signal-to-noise ratio and signal-to-distortion ratio.
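The masking pipeline described above can be sketched with SciPy's `stft` and `istft`. As an assumption for illustration, an oracle ideal ratio mask computed from the known sources stands in for a network's mask estimate. The mask only rescales magnitudes and the mixture phase is reused unchanged, which is exactly the phase limitation the paragraph mentions.

```python
import numpy as np
from scipy.signal import stft, istft

def mask_separate(mixture, sources, fs=8000, nperseg=256):
    """Separate `mixture` with oracle ideal ratio masks (IRMs).

    sources: list of clean source waveforms, used only to build the oracle
    masks that stand in for a network's mask estimates (assumption).
    """
    _, _, X = stft(mixture, fs=fs, nperseg=nperseg)              # mixture STFT
    mags = [np.abs(stft(s, fs=fs, nperseg=nperseg)[2]) for s in sources]
    total = sum(mags) + 1e-8
    estimates = []
    for mag in mags:
        mask = mag / total                                       # real-valued mask in [0, 1]
        _, y = istft(mask * X, fs=fs, nperseg=nperseg)           # mixture phase is reused
        estimates.append(y[: len(mixture)])
    return estimates

fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 440 * t)
s2 = 0.5 * np.sin(2 * np.pi * 2000 * t)
est1, est2 = mask_separate(s1 + s2, [s1, s2], fs=fs)
```

With real overlapping speech the sources share time-frequency bins, so even an oracle magnitude mask cannot recover each source's own phase.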
The problem formulation of time-domain, end-to-end speech separation arises naturally to tackle the disadvantages of frequency-domain systems. End-to-end speech separation networks take the mixture waveform as input and directly estimate the waveforms of the target sources. Following the general pipeline of conventional frequency-domain systems, which contains a waveform encoder, a separator, and a waveform decoder, time-domain systems can be designed in a similar way while significantly improving separation performance.
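The encoder-separator-decoder structure can be illustrated with a toy NumPy sketch. Everything here is hypothetical: random matrices stand in for learned bases, the encoder is a strided 1-D convolution (framing plus a linear basis) followed by ReLU, the separator is a placeholder that outputs softmax masks over sources, and the decoder maps masked frames back to a waveform by overlap-add. A real system trains all three components jointly.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, C = 16, 64, 2            # hypothetical: window length, basis size, number of sources
HOP = L // 2                   # 50% overlap between frames
enc_basis = rng.normal(size=(N, L)) / np.sqrt(L)   # stand-in for a learned encoder
dec_basis = rng.normal(size=(N, L)) / np.sqrt(N)   # stand-in for a learned decoder

def frames(x):
    """Split a 1-D signal into overlapping frames of length L (zero-padded)."""
    n_frames = int(np.ceil(max(len(x) - L, 0) / HOP)) + 1
    x = np.pad(x, (0, (n_frames - 1) * HOP + L - len(x)))
    idx = np.arange(L)[None, :] + HOP * np.arange(n_frames)[:, None]
    return x[idx]                                             # (frames, L)

def encoder(x):
    return np.maximum(frames(x) @ enc_basis.T, 0.0)           # (frames, N), ReLU

def separator(w):
    logits = rng.normal(size=(C,) + w.shape)                  # placeholder for a network
    e = np.exp(logits - logits.max(axis=0))
    return e / e.sum(axis=0)                                  # (C, frames, N), sums to 1 over C

def decoder(w, length):
    seg = w @ dec_basis                                       # (frames, L)
    out = np.zeros((len(seg) - 1) * HOP + L)
    for i, s in enumerate(seg):                               # overlap-add
        out[i * HOP : i * HOP + L] += s
    return out[:length]

def separate(x):
    w = encoder(x)
    masks = separator(w)
    return np.stack([decoder(m * w, len(x)) for m in masks])  # (C, samples)

y = separate(rng.normal(size=1000))
print(y.shape)   # (2, 1000)
```

Because the masks sum to one across sources and the decoder is linear, the separated outputs always sum to the decoded, unmasked representation of the mixture.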
In this dissertation, I focus on multiple aspects of the general problem formulation of end-to-end separation networks, including system designs, model architectures, and training objectives. I start with a single-channel pipeline, which we refer to as the time-domain audio separation network (TasNet), to validate the advantage of end-to-end separation compared with conventional time-frequency domain pipelines. I then move to the multi-channel scenario and introduce the filter-and-sum network (FaSNet) for both fixed-geometry and ad-hoc geometry microphone arrays.
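As its name suggests, FaSNet builds on the classical filter-and-sum beamformer, in which each microphone signal is convolved with its own FIR filter and the filtered channels are summed. The sketch below shows only that fixed operation; in FaSNet the filters are estimated by a neural network, which this example does not attempt. The delay-and-sum filters in the usage example are an illustrative assumption.

```python
import numpy as np

def filter_and_sum(x, h):
    """Classical filter-and-sum beamformer.

    x: microphone signals, shape (M, T).
    h: FIR filters, shape (M, K), one per microphone.
    Returns y[n] = sum_m sum_k h[m, k] * x[m, n - k], truncated to T samples.
    """
    return sum(np.convolve(xm, hm)[: x.shape[1]] for xm, hm in zip(x, h))

# Delay-and-sum as a special case: mic m hears the source d_m samples late,
# and filter m is a scaled unit impulse at lag (D - d_m), which aligns all channels.
rng = np.random.default_rng(0)
s = rng.normal(size=500)
delays = [0, 2, 5]
D = max(delays)
x = np.stack([np.concatenate([np.zeros(d), s[: 500 - d]]) for d in delays])
h = np.zeros((3, D + 1))
for m, d in enumerate(delays):
    h[m, D - d] = 1.0 / 3.0
y = filter_and_sum(x, h)   # y[n] == s[n - D] for n >= D
```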
Next, I introduce methods for lightweight network architecture design that allow the models to maintain separation performance while using as little as 2.5% of the model size and 17.6% of the model complexity. After that, I look into training objective functions for end-to-end speech separation and describe two training objectives, one for separating varying numbers of sources and one for improving robustness in reverberant environments. Finally, I take a step back, revisit several problem formulations in the end-to-end separation pipeline, and raise further questions in this framework to be analyzed and investigated in future work.
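Time-domain separation networks are commonly trained with a negative scale-invariant signal-to-distortion ratio (SI-SDR), combined with permutation-invariant training (PIT) to resolve the arbitrary ordering of the output sources. The sketch below is that generic utterance-level objective, offered as background only; it is not either of the two objectives proposed in the dissertation.

```python
import itertools
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR in dB between a 1-D estimate and reference."""
    est = est - est.mean()
    ref = ref - ref.mean()
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)   # optimal scaling of the reference
    target = alpha * ref
    noise = est - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

def pit_si_sdr_loss(ests, refs):
    """Negative mean SI-SDR under the best output-to-reference assignment.

    ests, refs: arrays of shape (C, T). Returns (loss, perm), where perm[i]
    is the index of the reference assigned to estimate i.
    """
    best_score, best_perm = -np.inf, None
    for perm in itertools.permutations(range(len(refs))):
        score = np.mean([si_sdr(ests[i], refs[p]) for i, p in enumerate(perm)])
        if score > best_score:
            best_score, best_perm = score, perm
    return -best_score, best_perm
```

Exhaustive search over permutations costs C! evaluations, which is fine for two or three speakers but becomes expensive for larger numbers of sources.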
Assessment of Measurement Distortions in GNSS Antenna Array Space-Time Processing
Antenna array processing techniques are studied in GNSS as effective tools to mitigate interference in the spatial and spatiotemporal domains. However, without specific precautions, array processing introduces biases and distortions in the cross-ambiguity function (CAF) of the ranging codes. In space-time processing (STP), the CAF can be misshaped by the combined effect of space-time processing and unintentional signal attenuation by filtering. This paper characterizes these degradations for different controlled signal scenarios and for live data from an antenna array. The antenna array simulation method introduced in this paper enables accurate analyses in the field of STP. The effects of the relative placement of the interference source with respect to the desired signal direction are shown using overall measurement errors and signal strength profiles. The contributions of each source of distortion are analyzed both individually and collectively. Effects of distortions on GNSS pseudorange errors and position errors are compared for blind, semi-distortionless, and distortionless beamforming methods. The characterization results can be useful for designing low-distortion filters, which are especially important for high-accuracy GNSS applications in challenging environments.
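For a given Doppler hypothesis, the CAF is the correlation between the received signal and a local replica of the ranging code over candidate code delays, and the measured delay is taken from the location of its peak. The sketch below assumes zero Doppler, a noiseless signal, one sample per chip, and a random ±1 sequence as a stand-in for a real PRN code. It shows how a filter with non-zero group delay, if left uncompensated, shifts the CAF peak and thus biases the measured delay.

```python
import numpy as np

def caf_zero_doppler(received, code):
    """Normalized CAF magnitude at zero Doppler over all circular code delays."""
    n = len(code)
    return np.array([abs(np.dot(received, np.roll(code, tau))) for tau in range(n)]) / n

rng = np.random.default_rng(0)
code = rng.choice([-1.0, 1.0], size=1023)   # random stand-in for a PRN code
true_delay = 100
received = np.roll(code, true_delay)         # noiseless, delayed replica

caf = caf_zero_doppler(received, code)
print(int(np.argmax(caf)))                   # 100: peak at the true delay

# Symmetric triangular FIR filter (group delay of 2 samples), applied circularly
taps = np.array([1.0, 2.0, 3.0, 2.0, 1.0]) / 9.0
filtered = np.real(np.fft.ifft(np.fft.fft(received) * np.fft.fft(taps, len(received))))
caf_f = caf_zero_doppler(filtered, code)
print(int(np.argmax(caf_f)))                 # 102: peak biased by the uncompensated group delay
```

The filter also flattens and widens the correlation peak, which is a simple instance of the CAF misshaping discussed above; a pure, known group delay could be compensated, whereas the asymmetric distortions studied in the paper are harder to remove.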
Studies on noise robust automatic speech recognition
Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.
Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation
Recently, frequency-domain all-neural beamforming methods have achieved
remarkable progress in multichannel speech separation. In parallel, the
integration of time-domain network structures and beamforming has also gained
significant attention. This study proposes a novel all-neural beamforming
method in the time domain and attempts to unify the all-neural beamforming
pipelines for time-domain and frequency-domain multichannel speech separation.
The proposed model consists of two modules: separation and beamforming. Both
modules perform temporal-spectral-spatial modeling and are trained
end-to-end using a joint loss function. The novelty of this study is
twofold. First, a time-domain directional feature conditioned on the direction
of the target speaker is proposed, which can be jointly optimized within the
time-domain architecture to enhance target signal estimation. Second, an
all-neural beamforming network in the time domain is designed to refine the
pre-separated results. This module features parametric time-variant
beamforming coefficient estimation, without explicitly following the derivation
of optimal filters, which may impose an upper bound on performance. The proposed
method is evaluated on simulated reverberant overlapped speech data derived from
the AISHELL-1 corpus. Experimental results demonstrate significant performance
improvements over state-of-the-art frequency-domain methods, ideal magnitude
masks, and existing time-domain neural beamforming methods.
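For context, the frequency-domain directional feature commonly used in neural beamforming compares the observed inter-channel phase difference (IPD) of a microphone pair with the phase difference expected for the target direction: DF(t, f) = cos(IPD(t, f) − TPD(f)), with TPD(f) = −2πfτ for a target time difference of arrival τ. The sketch below computes that classical feature for a single microphone pair under a far-field assumption. It is background only, not the time-domain directional feature proposed in this study, and the `tdoa` input is a hypothetical value assumed to be derived from the known speaker direction.

```python
import numpy as np
from scipy.signal import stft

def directional_feature(x1, x2, tdoa, fs, nperseg=512):
    """cos(IPD - TPD) per time-frequency bin for one microphone pair.

    tdoa: expected delay (s) of mic 2 relative to mic 1 for the target
    direction (hypothetical input). Values near 1 mark time-frequency bins
    whose phase difference is consistent with the target direction.
    """
    f, _, X1 = stft(x1, fs=fs, nperseg=nperseg)
    _, _, X2 = stft(x2, fs=fs, nperseg=nperseg)
    ipd = np.angle(X2) - np.angle(X1)            # observed inter-channel phase difference
    tpd = -2 * np.pi * f[:, None] * tdoa         # phase difference expected for the target
    return np.cos(ipd - tpd)                     # shape (freq, frames)

fs = 16000
rng = np.random.default_rng(0)
x1 = rng.normal(size=fs)
x2 = np.roll(x1, 3)   # mic 2 hears the source 3 samples later (circular shift for simplicity)
df_target = directional_feature(x1, x2, 3 / fs, fs)
df_other = directional_feature(x1, x2, -3 / fs, fs)
```

Averaged over bins, the feature is close to 1 when the assumed direction matches the true delay and near 0 when it does not, which is what lets a network use it to emphasize the target speaker.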