44 research outputs found

    Separation of Correlated Signals Using Signal Canceler Constraints in a Hybrid CM Array Architecture

    Publication in the conference proceedings of EUSIPCO, Florence, Italy, 200

    Acoustic sensor network geometry calibration and applications

    In the modern world, we are increasingly surrounded by computation devices with communication links and one or more microphones. Such devices are, for example, smartphones, tablets, laptops or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens the possibility for many practical applications. ASN-based speech enhancement, source localization, and event detection can be applied for teleconferencing, camera control, automation, or assisted living. For these kinds of applications, the awareness of auditory objects and their spatial positioning are key properties. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method. Information on the position of human speakers is provided by a novel localization and tracking method. In order to localize with respect to the ASN, the relative arrangement of the sensor nodes has to be known. Therefore, different novel geometry calibration methods were developed. Sound classification: The first method addresses the task of identifying auditory objects. A novel application of the bag-of-features (BoF) paradigm to acoustic event classification and detection was introduced. It can be used for event and speech detection as well as for speaker identification. The use of both mel frequency cepstral coefficient (MFCC) and Gammatone frequency cepstral coefficient (GFCC) features improves the classification accuracy. By using soft quantization and introducing supervised training for the BoF model, superior accuracy is achieved. The method generalizes well from limited training data. It works online and can be computed in a fraction of real time. By a dedicated training strategy based on a hierarchy of stationarity, the detection of speech in mixtures with noise was realized.
This makes the method robust against severe noise levels corrupting the speech signal. Thus it is possible to provide control information to a beamformer in order to realize blind speech enhancement. A reliable improvement is achieved in the presence of one or more stationary noise sources. Speaker localization: The localization method enables each node to determine the direction of arrival (DoA) of concurrent sound sources. The author's neuro-biologically inspired speaker localization method for microphone arrays was refined for use in ASNs. By implementing a dedicated cochlear and midbrain model, it is robust against the reverberation found in indoor rooms. In order to better model the unknown number of concurrent speakers, an application of the EM algorithm that realizes probabilistic clustering according to auditory scene analysis (ASA) principles was introduced. Based on this approach, a system for Euclidean tracking in ASNs was designed. Each node applies the node-wise localization method and shares probabilistic DoA estimates together with an estimate of the spectral distribution with the network. As this information is relatively sparse, it can be transmitted with low bandwidth. The system is robust against jitter and transmission errors. The information from all nodes is integrated according to spectral similarity to correctly associate concurrent speakers. By incorporating the intersection angle in the triangulation, the precision of the Euclidean localization is improved. Tracks of concurrent speakers are computed over time, as is shown with recordings in a reverberant room. Geometry calibration: The central task of geometry calibration has been solved with special focus on sensor nodes equipped with multiple microphones. Novel methods were developed for different scenarios. An audio-visual method was introduced for the calibration of ASNs in video conferencing scenarios.
The DoA estimates are fused with visual speaker tracking in order to provide sensor positions in a common coordinate system. A novel acoustic calibration method determines the relative positioning of the nodes from ambient sounds alone. Unlike previous methods that only infer the positioning of distributed microphones, the DoA is incorporated and thus it becomes possible to calibrate the orientation of the nodes with high accuracy. This is very important for all applications using the spatial information, as the triangulation error increases dramatically with bad orientation estimates. As speech events can be used, the calibration becomes possible without the requirement of playing dedicated calibration sounds. Based on this, an online method employing a genetic algorithm with incremental measurements was introduced. By using the robust speech localization method, the calibration is computed in parallel to the tracking. The online method is able to calibrate ASNs in real time, as is shown with recordings of natural speakers in a reverberant room. The informed acoustic sensor network: All new methods are important building blocks for the use of ASNs. The online methods for localization and calibration both make use of the neuro-biologically inspired processing in the nodes, which leads to state-of-the-art results, even in reverberant enclosures. The high robustness and reliability can be improved even more by including the event detection method in order to exclude non-speech events. When all methods are combined, both semantic information on what is happening in the acoustic scene and spatial information on the positioning of the speakers and sensor nodes are automatically acquired in real time. This realizes truly informed audio processing in ASNs. Practical applicability is shown by application to recordings in reverberant rooms.
The contribution of this thesis is thus not only to advance the state of the art in automatically acquiring information on the acoustic scene, but also to push the practical applicability of such methods.
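
The abstract above mentions triangulating speaker positions from per-node DoA estimates and taking the intersection angle into account. As a minimal sketch of that general idea, assuming a 2-D scene with two nodes at known positions, the following intersects two bearing lines and also returns the absolute sine of their intersection angle. The function name `triangulate` and the use of that sine as a confidence weight are illustrative assumptions, not the method from the thesis.

```python
import math

def triangulate(p1, theta1, p2, theta2):
    """Intersect two bearing lines in 2-D.

    p1, p2: (x, y) positions of two sensor nodes (assumed known).
    theta1, theta2: DoA of the same source at each node, in radians,
    measured in the common coordinate system.
    Returns (point, weight); point is None if the bearings are parallel.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # 2-D cross product of the directions equals sin(theta2 - theta1).
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(cross) < 1e-9:
        return None, 0.0
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1.
    t1 = (dx * d2[1] - dy * d2[0]) / cross
    point = (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])
    # Hypothetical weight: bearings crossing near 90 degrees are the
    # least sensitive to angular errors, shallow crossings the most.
    return point, abs(cross)
```

For example, nodes at (0, 0) and (2, 0) that observe a source at 45 and 135 degrees respectively yield an intersection at (1, 1) with weight 1. A shallow intersection angle gives a weight close to 0, which a fusion stage could use to down-weight that estimate.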

    Analysis and Design of Joint Communication and Sensing for Wireless Cellular Networks

    Joint communication and sensing (JCAS) has emerged as an important piece of technology that will radically change ordinary wireless communication and radar systems. This research area, which has grown significantly over the last decade, aims to develop integrated systems that can provide both communication and sensing/radar functionalities simultaneously. The convergence of both systems into the same joint platform facilitates a more efficient use of the hardware and spectrum resources, enabling new civilian and professional applications. This thesis focuses on the integration of JCAS functionalities into mobile cellular networks, such as fifth-generation new radio (5G NR) and sixth-generation (6G) communication systems, which are developing toward higher frequency ranges at millimeter-wave (mm-wave) bands, come with wider bandwidths, and have massive antenna arrays, providing a great framework to develop sensing functionalities. By implementing JCAS, the different nodes of the cellular network, such as the base station and user equipment, can sense and reconstruct their surroundings. However, the JCAS operation yields multiple design challenges that need to be addressed. To this end, this thesis aims to develop novel algorithms in two relevant research areas that comprise self-interference (SI) cancellation and beamforming optimization techniques for JCAS systems. This work analyzes the potential sensing performance of mobile cellular networks, proposing a joint framework and identifying the main radar processing techniques to support JCAS. The fundamental SI challenge stemming from the simultaneous operation of the transmitter and receiver is investigated, and different JCAS cancellation techniques are proposed. The performance and feasibility of the proposed JCAS system are evaluated through simulation and measurement experiments at different frequency bands and scenarios, identifying mm-wave frequencies as the key enabler for future JCAS systems.
Alternative antenna architectures and beamforming methods for mm-wave JCAS platforms are proposed by considering both communication and sensing requirements. Specifically, this thesis proposes novel beamforming methods that provide multiple beams, supporting efficient beamformed communications while an additional beam senses the environment simultaneously. In addition, the proposed beamforming algorithms address the SI challenge by implementing an efficient spatial suppression scheme to suppress the direct transmitter–receiver coupling.

    Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)

    Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the user's signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the incoming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on double-talk.
Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false double-talk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non-minimum phase Room Impulse Response (RIR). We describe the process by which perceptually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique.
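
The decomposition onto the union of two nonnegative bases described in this abstract can be illustrated with a small sketch. It assumes a single magnitude-spectrum frame `v`, a fixed basis matrix `W` whose first `n_echo` columns play the role of the echo basis and whose remaining columns play the role of the pre-trained near-end speaker basis, and the standard Euclidean multiplicative update for the activations. The abstract does not state which cost function or update rule the thesis uses, so these choices and all names (`nmf_activations`, `split_frame`) are assumptions made for illustration only.

```python
def nmf_activations(v, W, n_iter=200, eps=1e-12):
    """Estimate nonnegative activations h with v ~= W h, basis W kept fixed.

    v: list of F nonnegative magnitudes (one STFT frame).
    W: F x K nested list, columns are nonnegative basis spectra.
    Uses the multiplicative update h <- h * (W^T v) / (W^T W h),
    which is an assumed choice (Euclidean cost).
    """
    F, K = len(W), len(W[0])
    h = [1.0] * K
    for _ in range(n_iter):
        # Current reconstruction, shared by all activations in this pass.
        Wh = [sum(W[f][k] * h[k] for k in range(K)) for f in range(F)]
        for k in range(K):
            num = sum(W[f][k] * v[f] for f in range(F))
            den = sum(W[f][k] * Wh[f] for f in range(F)) + eps
            h[k] *= num / den
    return h

def split_frame(v, W, n_echo):
    """Split a frame into echo and near-end magnitude estimates.

    The first n_echo columns of W are treated as the echo basis and the
    rest as the near-end speaker basis (hypothetical layout).
    """
    F, K = len(W), len(W[0])
    h = nmf_activations(v, W)
    echo = [sum(W[f][k] * h[k] for k in range(n_echo)) for f in range(F)]
    speech = [sum(W[f][k] * h[k] for k in range(n_echo, K)) for f in range(F)]
    return echo, speech
```

With a toy basis of one echo column and one speaker column, `split_frame([2.0, 3.0, 3.0], [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]], 1)` attributes the first bin to the echo and the other two bins to the near-end speaker. In practice the bases overlap spectrally, which is one plausible source of the double-talk distortion the abstract reports.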

    NON-CONTACT TECHNIQUES FOR HUMAN VITAL SIGN DETECTION AND GAIT ANALYSIS

    Human vital signs, including respiratory rate, heart rate, oxygen saturation, blood pressure, and body temperature, are important physiological parameters that are used to track and monitor human health. Another important biological indicator of human health is gait. Human vital sign detection and gait investigations have attracted many scientists and practitioners in various fields such as sports medicine, geriatric medicine, biomechanics, and biomedical engineering, and have many biological and medical applications such as diagnosis of health issues and abnormalities, elderly care and health monitoring, athlete performance analysis, and treatment of joint problems. Thoroughly tracking and understanding the normal motion of human limb joints can help to accurately monitor human subjects or patients over time to provide early flags of possible complications in order to aid in a proper diagnosis and development of future comprehensive treatment plans. With the spread of COVID-19 around the world, it has become more important than ever to employ technology that enables us to detect human vital signs in a non-contact way, helps protect both patients and healthcare providers from potentially life-threatening viruses, and has the potential to provide a convenient way to monitor people's health remotely. A popular technique to extract biological parameters from a distance is to use cameras. Radar systems are another attractive solution for non-contact human vital sign monitoring and gait investigation that track and monitor these biological parameters without invading people's privacy. The goal of this research is to develop non-contact methods that are capable of extracting human vital sign parameters and gait features accurately. To do that, in this work, optical systems including cameras and proper filters have been developed to extract human respiratory rate, heart rate, and oxygen saturation.
The feasibility of blood pressure extraction using the developed optical technique has been investigated, too. Moreover, a wideband and low-cost radar system has been implemented to detect the respiration and heart rate of single or multiple human subjects in the dark or from behind a wall. The performance of the implemented radar system has been enhanced and it has been utilized for non-contact human gait analysis. Along with the hardware, advanced signal processing schemes have been enhanced and applied to the data collected using the aforementioned radar system. The data processing algorithms have been extended for multi-subject scenarios with high accuracy for both human vital sign detection and gait analysis. In addition, different configurations of this high-performance radar system, including mono-static and MIMO, have been designed and implemented with great success. Many sets of exhaustive experiments have been conducted using different human subjects and various situations, and accurate reference sensors have been used to validate the performance of the developed systems and algorithms.

    Adaptive beamforming and switching in smart antenna systems

    The ever-increasing requirement for providing large bandwidth and seamless data access to commuters has posed new challenges to wireless solution providers. The communication channel characteristics between mobile clients and base stations change rapidly with the increasing traveling speed of vehicles. Smart antenna systems with adaptive beamforming and switching technology are the key component in tackling these challenges. As a spatial filter, the beamformer has long been widely used in wireless communication, radar, acoustics, and medical imaging systems to enhance the received signal from a particular look direction while suppressing noise and interference from other directions. The adaptive beamforming algorithm provides the capability to track the varying nature of the communication channel characteristics. However, the conventional adaptive beamformer assumes that the Direction of Arrival (DOA) of the signal of interest changes slowly, although the interference direction may change dynamically. The proliferation of High Speed Rail (HSR) and seamless wireless communication between infrastructure (roadside and trackside equipment) and vehicles (train, car, boat, etc.) brings a unique challenge for adaptive beamforming due to the rapid change of DOA. For an HSR train traveling at 250 km/h, the DOA can change by up to 4° per millisecond. To address these unique challenges, faster algorithms to calculate the beamforming weights based on the rapidly changing DOA are needed. In this dissertation, two strategies are adopted to address the challenges. The first is to improve the weight calculation speed. The second is to improve the speed of DOA estimation for the impinging signal by leveraging the predefined constrained route in the transportation market. Based on these concepts, various algorithms for beampattern generation and adaptive weight control are evaluated and investigated in this thesis.
The well-known Generalized Sidelobe Cancellation (GSC) architecture is adopted in this dissertation. However, it faces a serious signal cancellation problem when the estimated DOA deviates from the actual DOA, which is severe in high-mobility scenarios such as the transportation market. Algorithms to improve various parts of the GSC are proposed in this dissertation. Firstly, a Cyclic Variable Step Size (CVSS) algorithm for adjusting the Least Mean Square (LMS) step size, designed to be simple to implement, is proposed and evaluated. Secondly, a Kalman filter based solution to fuse different sensor information for faster estimation and tracking of the DOA is investigated and proposed. Thirdly, to address the DOA mismatch issue caused by the rapid DOA change, a fast blocking matrix generation algorithm named Simplified Zero Placement Algorithm (SZPA) is proposed to mitigate the signal cancellation in GSC. Fourthly, to make the beam pattern robust against DOA mismatch, a fast algorithm for the generation of a flat beam pattern named Zero Placement Flat Top (ZPFT) for the fixed beamforming path in GSC is proposed. Finally, to evaluate the effectiveness and performance of the beamforming algorithms, wireless channel simulation is needed. One of the challenging aspects of wireless simulation is the coupling between the Probability Density Function (PDF) and the Power Spectral Density (PSD) of a random variable. In this regard, a simplified solution to simulate a non-Gaussian wireless channel is proposed, proved, and evaluated for effectiveness. With the above optimizations, the controlled simulation shows that the flat top beampattern can be generated 380 times faster than with an iterative optimization method and the blocking matrix can be generated 9 times faster than with the normal SVD method, while the same overall optimum state performance can be achieved.
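
To put the quoted DOA rate of 4° per millisecond at 250 km/h into perspective, the following sketch works through the arithmetic under a simplified geometry that is assumed here and not taken from the dissertation: the train moves along a straight track at constant speed past a fixed antenna at perpendicular offset `offset_m`. In that geometry the DOA is atan(x / offset), and its rate of change peaks at speed / offset when the train is at the point of closest approach. Both function names are hypothetical.

```python
import math

def peak_doa_rate_deg_per_ms(speed_kmh, offset_m):
    """Peak DOA rate (degrees per millisecond) at closest approach.

    Assumes straight-line motion past a fixed antenna; the peak angular
    rate is then speed / offset, in radians per second.
    """
    v = speed_kmh / 3.6            # km/h -> m/s
    omega = v / offset_m           # rad/s at closest approach
    return math.degrees(omega) / 1000.0

def offset_for_rate(speed_kmh, rate_deg_per_ms):
    """Perpendicular offset (m) at which the peak DOA rate equals the given rate."""
    v = speed_kmh / 3.6
    omega = math.radians(rate_deg_per_ms) * 1000.0   # deg/ms -> rad/s
    return v / omega
```

Under these assumptions, `offset_for_rate(250, 4.0)` evaluates to roughly 0.99 m, so the quoted figure would correspond to an antenna about one metre from the path of the transmitter. Larger offsets give proportionally lower peak rates.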