56 research outputs found

    Subband beamforming with higher order statistics for distant speech recognition

    This dissertation presents novel beamforming methods for distant speech recognition (DSR). Such techniques relieve users of the need to wear close-talking microphones. DSR systems are useful in many applications, such as humanoid robots, voice control systems for automobiles and automatic meeting transcription systems. A main problem in DSR is that recognition performance degrades seriously when the speaker is far from the microphones. In order to avoid this degradation, noise and reverberation must be removed from the signals received by the microphones. Acoustic beamforming techniques have the potential to enhance speech from the far field with little distortion, since they can maintain a distortionless constraint for a look direction. In beamforming, signals propagating from a source position are captured with multiple microphones. Typical conventional beamformers then adjust their weights so as to minimize the variance of their own outputs subject to a distortionless constraint in the look direction. The variance is the average of the second power (square) of the beamformer's outputs; accordingly, the conventional beamformer relies on second-order statistics (SOS) of its outputs. Conventional beamforming techniques can effectively place a null on any source of interference. However, the desired signal is also cancelled in reverberant environments, which is known as the signal cancellation problem. Many algorithms have been developed to avoid that problem, but none of them essentially solves signal cancellation in reverberant environments. While many efforts have been made to overcome the signal cancellation problem in the field of acoustic beamforming, researchers have addressed another research issue with the microphone array, namely blind source separation (BSS) [1]. BSS techniques aim at separating sources from a mixture of signals without information about the geometry of the microphone array or the positions of the sources. This is achieved by multiplying the input signals with an un-mixing matrix, which is constructed so that the outputs are stochastically independent. Measuring the stochastic independence of the signals is based on the theory of independent component analysis (ICA) [1]. ICA builds on the fact that distributions of information-bearing signals are not Gaussian, whereas distributions of sums of many signals are close to Gaussian. There are two popular criteria for measuring the degree of non-Gaussianity, namely kurtosis and negentropy. As described in detail in this thesis, both criteria use more than the second moment and are therefore referred to as higher-order statistics (HOS), in contrast to SOS. HOS have not been well explored in the field of acoustic beamforming, although Arai et al. showed the similarity between acoustic beamforming and BSS [2]. This thesis investigates new beamforming algorithms which take HOS into consideration. The new beamforming methods adjust the beamformer's weights based on one of the following criteria:
    • minimum mutual information between the outputs of two beamformers,
    • maximum negentropy of the beamformer's outputs, and
    • maximum kurtosis of the beamformer's outputs.
    As shown in this thesis, these algorithms do not suffer from signal cancellation.
    Notice that, in contrast to the BSS algorithms, the new beamforming techniques keep the distortionless constraint for the direction of interest. The effectiveness of the new techniques is finally demonstrated through a series of distant automatic speech recognition experiments on real data recorded with real sensors, unlike other work in which signals artificially convolved with measured impulse responses are considered. Significant improvements are achieved by the beamforming algorithms proposed here.
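    The contrast between the SOS and HOS criteria above can be made concrete with a short sketch. The following Python fragment is a minimal illustration only, not the implementation described in the dissertation; the steering vector, the sample covariance estimate and the complex-kurtosis convention are all assumptions. It computes conventional MVDR weights, which minimise the output variance subject to a distortionless constraint, and then evaluates an empirical kurtosis of the beamformer output, the kind of higher-order quantity a maximum-kurtosis beamformer would maximise.

        import numpy as np

        def mvdr_weights(R, d):
            # SOS beamformer: minimise w^H R w subject to w^H d = 1 (distortionless).
            Rinv_d = np.linalg.solve(R, d)
            return Rinv_d / (d.conj() @ Rinv_d)

        def empirical_kurtosis(y):
            # One common complex-kurtosis convention: E|y|^4 - 2 (E|y|^2)^2,
            # which is zero for circularly symmetric Gaussian signals.
            p2 = np.mean(np.abs(y) ** 2)
            p4 = np.mean(np.abs(y) ** 4)
            return p4 - 2.0 * p2 ** 2

        # Toy usage on one subband: X holds (channels x frames) snapshots.
        rng = np.random.default_rng(0)
        X = rng.standard_normal((4, 1000)) + 1j * rng.standard_normal((4, 1000))
        d = np.ones(4, dtype=complex)          # assumed steering vector (look direction)
        R = (X @ X.conj().T) / X.shape[1]      # sample covariance: second-order statistics
        w = mvdr_weights(R, d)                 # conventional SOS solution
        y = w.conj() @ X                       # beamformer output
        print(empirical_kurtosis(y))           # higher-order figure of merit

    In an HOS beamformer of the kind proposed in the thesis, a criterion such as this kurtosis (or negentropy, or mutual information) would be optimised with respect to the adjustable part of the beamformer instead of simply minimising the output variance.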

    Hybrid solutions to instantaneous MIMO blind separation and decoding: narrowband, QAM and square cases

    Future wireless communication systems are expected to support high data rates and high-quality transmission given the growth of multimedia applications. Increasing the channel throughput has led in recent years to multiple-input multiple-output (MIMO) and blind equalization techniques, and blind MIMO equalization has therefore attracted great interest. Both system performance and computational complexity play important roles in real-time communications, and reducing the computational load while maintaining accurate performance is a main challenge in present systems. In this thesis, a hybrid method is first proposed which provides affordable complexity with good performance for blind equalization in large-constellation MIMO systems. Computational cost is saved both in the signal separation part and in the signal detection part. First, based on the characteristics of quadrature amplitude modulation (QAM) signals, an efficient and simple nonlinear function for independent component analysis (ICA) is introduced. Second, using the idea of sphere decoding, we restrict the soft information of the channels to a sphere, which overcomes the so-called curse of dimensionality of the expectation-maximization (EM) algorithm and enhances the final results at the same time. Mathematically, we demonstrate that in digital communication settings the EM algorithm exhibits Newton-like convergence. Despite the widespread use of forward error coding (FEC), most MIMO blind channel estimation techniques ignore its presence and instead make the simplifying assumption that the transmitted symbols are uncoded. However, FEC induces code structure in the transmitted sequence that can be exploited to improve blind MIMO channel estimates. In the final part of this work, we exploit iterative channel estimation and decoding for blind MIMO equalization. Experiments show the improvements achievable by exploiting the existing coding structure, and that the method can approach the performance of a BCJR equalizer with perfect channel information in a reasonable SNR range. All results are confirmed experimentally for the example of blind equalization in block-fading MIMO systems.
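    To make the sphere-restriction idea concrete, the sketch below is a hedged illustration only; the function names, the brute-force candidate enumeration and the 16-QAM alphabet are assumptions, not the thesis implementation. Instead of summing a posterior over all |A|^Nt transmit vectors, it keeps only candidates whose lattice points fall within a chosen radius of the received vector and computes EM-style posterior-weighted symbol means over that reduced set.

        import numpy as np
        from itertools import product

        def qam_alphabet(M=16):
            # Square QAM constellation normalised to unit average energy.
            m = int(np.sqrt(M))
            pam = 2 * np.arange(m) - (m - 1)
            pts = np.array([complex(i, q) for i in pam for q in pam])
            return pts / np.sqrt(np.mean(np.abs(pts) ** 2))

        def sphere_candidates(H, y, radius, M=16):
            # Keep transmit vectors s with ||y - H s||^2 <= radius^2.
            # (A real sphere decoder prunes via a QR decomposition; this is brute force.)
            A = qam_alphabet(M)
            kept = []
            for s in product(A, repeat=H.shape[1]):
                s = np.array(s)
                if np.sum(np.abs(y - H @ s) ** 2) <= radius ** 2:
                    kept.append(s)
            return kept

        def soft_symbol_means(H, y, radius, noise_var, M=16):
            # EM-style E-step restricted to the sphere: posterior-weighted mean symbols.
            # Assumes the radius is large enough that at least one candidate survives.
            cands = sphere_candidates(H, y, radius, M)
            metrics = np.array([np.sum(np.abs(y - H @ s) ** 2) for s in cands])
            p = np.exp(-(metrics - metrics.min()) / noise_var)
            p /= p.sum()
            return sum(pi * s for pi, s in zip(p, cands))

    The radius trades complexity against accuracy: a larger sphere admits more candidates and approaches the full EM E-step, while a smaller sphere keeps the candidate set, and hence the cost, small.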

    Advanced Sensing and Image Processing Techniques for Healthcare Applications

    This Special Issue aims to attract the latest research and findings in the design, development and experimentation of healthcare-related technologies. This includes, but is not limited to, using novel sensing, imaging, data processing, machine learning, and artificially intelligent devices and algorithms to assist/monitor the elderly, patients, and the disabled population

    Speaker normalisation for large vocabulary multiparty conversational speech recognition

    One of the main problems faced by automatic speech recognition is the variability of the testing conditions. This is due both to the acoustic conditions (different transmission channels, recording devices, noise, etc.) and to the variability of speech across different speakers (i.e. due to different accents, coarticulation of phonemes and different vocal tract characteristics). Vocal tract length normalisation (VTLN) aims at normalising the acoustic signal, making it independent of the vocal tract length. This is done by a speaker-specific warping of the frequency axis parameterised through a warping factor. In this thesis the application of VTLN to multiparty conversational speech was investigated, focusing on the meeting domain. This is a challenging task that exhibits great variability of the speech acoustics, both across different speakers and across time for a given speaker. Vocal tract length (VTL), the distance between the lips and the glottis, varies over time. We observed that the warping factors estimated using maximum likelihood seem to be context dependent, appearing to be influenced by the current conversational partner and correlated with the behaviour of formant positions and pitch. This is because VTL also influences the frequency of vibration of the vocal cords and thus the pitch. In this thesis we also investigated pitch-adaptive acoustic features with the goal of further improving the speaker normalisation provided by VTLN. We explored the use of acoustic features obtained using a pitch-adaptive analysis in combination with conventional features such as Mel frequency cepstral coefficients. These spectral representations were combined both at the acoustic feature level using heteroscedastic linear discriminant analysis (HLDA), and at the system level using ROVER. We evaluated this approach on a challenging large vocabulary speech recognition task: multiparty meeting transcription. We found that VTLN benefits the most from pitch-adaptive features. Our experiments also suggested that combining conventional and pitch-adaptive acoustic features using HLDA results in a consistent, significant decrease in the word error rate across all the tasks. Combining at the system level using ROVER resulted in a further significant improvement. Further experiments compared the use of a pitch-adaptive spectral representation with the adoption of a smoothed spectrogram for the extraction of cepstral coefficients. It was found that pitch-adaptive spectral analysis, providing a representation which is less affected by pitch artefacts (especially for high-pitched speakers), delivers features with improved speaker independence. Furthermore, this has also been shown to be advantageous when HLDA is applied. The combination of a pitch-adaptive spectral representation and VTLN-based speaker normalisation in the context of LVCSR for multiparty conversational speech led to more speaker-independent acoustic models, improving the overall recognition performance.
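    As a concrete illustration of the frequency warping that VTLN performs, the short sketch below implements one common piecewise-linear parameterisation; the break-frequency ratio, the function names and the exact scheme used in the thesis are assumptions made for illustration. Frequencies below a break point are scaled by the warping factor, and above it the mapping is linear so that the Nyquist frequency maps onto itself.

        import numpy as np

        def vtln_warp(freqs, alpha, f_nyquist, f_break_ratio=0.8):
            # Piecewise-linear VTLN warp: scale by alpha up to the break frequency,
            # then interpolate linearly so that f_nyquist maps onto f_nyquist.
            f_break = f_break_ratio * f_nyquist
            freqs = np.asarray(freqs, dtype=float)
            return np.where(
                freqs <= f_break,
                alpha * freqs,
                alpha * f_break
                + (f_nyquist - alpha * f_break) * (freqs - f_break) / (f_nyquist - f_break),
            )

        # Example: warp toy mel filterbank centre frequencies with a speaker-specific alpha.
        centres = np.linspace(100.0, 8000.0, 24)
        print(vtln_warp(centres, alpha=1.1, f_nyquist=8000.0))

    In maximum-likelihood VTLN, as referred to in the abstract, the warping factor alpha is chosen per speaker (or per segment) by maximising the likelihood of the warped features under the current acoustic model.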

    Design of large polyphase filters in the Quadratic Residue Number System


    Beamforming for OFDM based hybrid terrestrial satellite mobile system

    EThOS - Electronic Theses Online Service, United Kingdom

    Temperature aware power optimization for multicore floating-point units


    Cooperative Radio Communications for Green Smart Environments

    The demand for mobile connectivity is continuously increasing, and by 2020 Mobile and Wireless Communications will serve not only very dense populations of mobile phones and nomadic computers, but also the expected multiplicity of devices and sensors located in machines, vehicles, health systems and city infrastructures. Future Mobile Networks are then faced with many new scenarios and use cases, which will load the networks with different data traffic patterns, in new or shared spectrum bands, creating new specific requirements. This book addresses both the techniques to model, analyse and optimise the radio links and transmission systems in such scenarios, together with the most advanced radio access, resource management and mobile networking technologies. This text summarises the work performed by more than 500 researchers from more than 120 institutions in Europe, America and Asia, from both academia and industries, within the framework of the COST IC1004 Action on "Cooperative Radio Communications for Green and Smart Environments". The book will have appeal to graduates and researchers in the Radio Communications area, and also to engineers working in the Wireless industry. Topics discussed in this book include:
    • Radio waves propagation phenomena in diverse urban, indoor, vehicular and body environments
    • Measurements, characterization, and modelling of radio channels beyond 4G networks
    • Key issues in Vehicle (V2X) communication
    • Wireless Body Area Networks, including specific Radio Channel Models for WBANs
    • Energy efficiency and resource management enhancements in Radio Access Networks
    • Definitions and models for the virtualised and cloud RAN architectures
    • Advances on feasible indoor localization and tracking techniques
    • Recent findings and innovations in antenna systems for communications
    • Physical Layer Network Coding for next generation wireless systems
    • Methods and techniques for MIMO Over the Air (OTA) testing