571 research outputs found

    Multiple source localization using spherical microphone arrays

    Get PDF
    Direction-of-Arrival (DOA) estimation is a fundamental task in acoustic signal processing and is used in source separation, localization, tracking, environment mapping, speech enhancement and dereverberation. In applications such as hearing aids, robot audition, teleconferencing and meeting diarization, the presence of multiple simultaneously active sources often occurs. Therefore DOA estimation which is robust to Multi-Source (MS) scenarios is of particular importance. In the past decade, interest in Spherical Microphone Arrays (SMAs) has been rapidly grown due to its ability to analyse the sound field with equal resolution in all directions. Such symmetry makes SMAs suitable for applications in robot audition where potential variety of heights and positions of the talkers are expected. Acoustic signal processing for SMAs is often formulated in the Spherical Harmonic Domain (SHD) which describes the sound field in a form that is independent of the geometry of the SMA. DOA estimation methods for the real-world scenarios address one or more performance degrading factors such as noise, reverberation, multi-source activity or tackled problems such as source counting or reducing computational complexity. This thesis addresses various problems in MS DOA estimation for speech sources each of which focuses on one or more performance degrading factor(s). Firstly a narrowband DOA estimator is proposed utilizing high order spatial information in two computationally efficient ways. Secondly, an autonomous source counting technique is proposed which uses density-based clustering in an evolutionary framework. Thirdly, a confidence metric for validity of Single Source (SS) assumption in a Time-Frequency (TF) bin is proposed. It is based on MS assumption in a short time interval where the number and the TF bin of active sources are adaptively estimated. Finally two analytical narrowband MS DOA estimators are proposed based on MS assumption in a TF bin. The proposed methods are evaluated using simulations and real recordings. Each proposed technique outperforms comparative baseline methods and performs at least as accurately as the state-of-the-art.Open Acces

    Distributed adaptive signal processing for frequency estimation

    Get PDF
    It is widely recognised that future smart grids will heavily rely upon intelligent communication and signal processing as enabling technologies for their operation. Traditional tools for power system analysis, which have been built from a circuit theory perspective, are a good match for balanced system conditions. However, the unprecedented changes that are imposed by smart grid requirements, are pushing the limits of these old paradigms. To this end, we provide new signal processing perspectives to address some fundamental operations in power systems such as frequency estimation, regulation and fault detection. Firstly, motivated by our finding that any excursion from nominal power system conditions results in a degree of non-circularity in the measured variables, we cast the frequency estimation problem into a distributed estimation framework for noncircular complex random variables. Next, we derive the required next generation widely linear, frequency estimators which incorporate the so-called augmented data statistics and cater for the noncircularity and a widely linear nature of system functions. Uniquely, we also show that by virtue of augmented complex statistics, it is possible to treat frequency tracking and fault detection in a unified way. To address the ever shortening time-scales in future frequency regulation tasks, the developed distributed widely linear frequency estimators are equipped with the ability to compensate for the fewer available temporal voltage data by exploiting spatial diversity in wide area measurements. This contribution is further supported by new physically meaningful theoretical results on the statistical behavior of distributed adaptive filters. Our approach avoids the current restrictive assumptions routinely employed to simplify the analysis by making use of the collaborative learning strategies of distributed agents. The efficacy of the proposed distributed frequency estimators over standard strictly linear and stand-alone algorithms is illustrated in case studies over synthetic and real-world three-phase measurements. An overarching theme in this thesis is the elucidation of underlying commonalities between different methodologies employed in classical power engineering and signal processing. By revisiting fundamental power system ideas within the framework of augmented complex statistics, we provide a physically meaningful signal processing perspective of three-phase transforms and reveal their intimate connections with spatial discrete Fourier transform (DFT), optimal dimensionality reduction and frequency demodulation techniques. Moreover, under the widely linear framework, we also show that the two most widely used frequency estimators in the power grid are in fact special cases of frequency demodulation techniques. Finally, revisiting classic estimation problems in power engineering through the lens of non-circular complex estimation has made it possible to develop a new self-stabilising adaptive three-phase transformation which enables algorithms designed for balanced operating conditions to be straightforwardly implemented in a variety of real-world unbalanced operating conditions. This thesis therefore aims to help bridge the gap between signal processing and power communities by providing power system designers with advanced estimation algorithms and modern physically meaningful interpretations of key power engineering paradigms in order to match the dynamic and decentralised nature of the smart grid.Open Acces

    Introduction to Facial Micro Expressions Analysis Using Color and Depth Images: A Matlab Coding Approach (Second Edition, 2023)

    Full text link
    The book attempts to introduce a gentle introduction to the field of Facial Micro Expressions Recognition (FMER) using Color and Depth images, with the aid of MATLAB programming environment. FMER is a subset of image processing and it is a multidisciplinary topic to analysis. So, it requires familiarity with other topics of Artifactual Intelligence (AI) such as machine learning, digital image processing, psychology and more. So, it is a great opportunity to write a book which covers all of these topics for beginner to professional readers in the field of AI and even without having background of AI. Our goal is to provide a standalone introduction in the field of MFER analysis in the form of theorical descriptions for readers with no background in image processing with reproducible Matlab practical examples. Also, we describe any basic definitions for FMER analysis and MATLAB library which is used in the text, that helps final reader to apply the experiments in the real-world applications. We believe that this book is suitable for students, researchers, and professionals alike, who need to develop practical skills, along with a basic understanding of the field. We expect that, after reading this book, the reader feels comfortable with different key stages such as color and depth image processing, color and depth image representation, classification, machine learning, facial micro-expressions recognition, feature extraction and dimensionality reduction. The book attempts to introduce a gentle introduction to the field of Facial Micro Expressions Recognition (FMER) using Color and Depth images, with the aid of MATLAB programming environment.Comment: This is the second edition of the boo

    The anthropometric, environmental and genetic determinants of right ventricular structure and function

    Get PDF
    BACKGROUND Measures of right ventricular (RV) structure and function have significant prognostic value. The right ventricle is currently assessed by global measures, or point surrogates, which are insensitive to regional and directional changes. We aim to create a high-resolution three-dimensional RV model to improve understanding of its structural and functional determinants. These may be particularly of interest in pulmonary hypertension (PH), a condition in which RV function and outcome are strongly linked. PURPOSE To investigate the feasibility and additional benefit of applying three-dimensional phenotyping and contemporary statistical and genetic approaches to large patient populations. METHODS Healthy subjects and incident PH patients were prospectively recruited. Using a semi-automated atlas-based segmentation algorithm, 3D models characterising RV wall position and displacement were developed, validated and compared with anthropometric, physiological and genetic influences. Statistical techniques were adapted from other high-dimensional approaches to deal with the problems of multiple testing, contiguity, sparsity and computational burden. RESULTS 1527 healthy subjects successfully completed high-resolution 3D CMR and automated segmentation. Of these, 927 subjects underwent next-generation sequencing of the sarcomeric gene titin and 947 subjects completed genotyping of common variants for genome-wide association study. 405 incident PH patients were recruited, of whom 256 completed phenotyping. 3D modelling demonstrated significant reductions in sample size compared to two-dimensional approaches. 3D analysis demonstrated that RV basal-freewall function reflects global functional changes most accurately and that a similar region in PH patients provides stronger survival prediction than all anthropometric, haemodynamic and functional markers. Vascular stiffness, titin truncating variants and common variants may also contribute to changes in RV structure and function. CONCLUSIONS High-resolution phenotyping coupled with computational analysis methods can improve insights into the determinants of RV structure and function in both healthy subjects and PH patients. Large, population-based approaches offer physiological insights relevant to clinical care in selected patient groups.Open Acces

    Advances in SCA and RF-DNA Fingerprinting Through Enhanced Linear Regression Attacks and Application of Random Forest Classifiers

    Get PDF
    Radio Frequency (RF) emissions from electronic devices expose security vulnerabilities that can be used by an attacker to extract otherwise unobtainable information. Two realms of study were investigated here, including the exploitation of 1) unintentional RF emissions in the field of Side Channel Analysis (SCA), and 2) intentional RF emissions from physical devices in the field of RF-Distinct Native Attribute (RF-DNA) fingerprinting. Statistical analysis on the linear model fit to measured SCA data in Linear Regression Attacks (LRA) improved performance, achieving 98% success rate for AES key-byte identification from unintentional emissions. However, the presence of non-Gaussian noise required the use of a non-parametric classifier to further improve key guessing attacks. RndF based profiling attacks were successful in very high dimensional data sets, correctly guessing all 16 bytes of the AES key with a 50,000 variable dataset. With variable reduction, Random Forest still outperformed Template Attack for this data set, requiring fewer traces and achieving higher success rates with lower misclassification rate. Finally, the use of a RndF classifier is examined for intentional RF emissions from ZigBee devices to enhance security using RF-DNA fingerprinting. RndF outperformed parametric MDA/ML and non-parametric GRLVQI classifiers, providing up to GS =18.0 dB improvement (reduction in required SNR). Network penetration, measured using rogue ZigBee devices, show that the RndF method improved rogue rejection in noisier environments - gains of up to GS =18.0 dB are realized over previous methods

    D4.2 Intelligent D-Band wireless systems and networks initial designs

    Get PDF
    This deliverable gives the results of the ARIADNE project's Task 4.2: Machine Learning based network intelligence. It presents the work conducted on various aspects of network management to deliver system level, qualitative solutions that leverage diverse machine learning techniques. The different chapters present system level, simulation and algorithmic models based on multi-agent reinforcement learning, deep reinforcement learning, learning automata for complex event forecasting, system level model for proactive handovers and resource allocation, model-driven deep learning-based channel estimation and feedbacks as well as strategies for deployment of machine learning based solutions. In short, the D4.2 provides results on promising AI and ML based methods along with their limitations and potentials that have been investigated in the ARIADNE project

    Studies on noise robust automatic speech recognition

    Get PDF
    Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK

    An investigation of the utility of monaural sound source separation via nonnegative matrix factorization applied to acoustic echo and reverberation mitigation for hands-free telephony

    Get PDF
    In this thesis we investigate the applicability and utility of Monaural Sound Source Separation (MSSS) via Nonnegative Matrix Factorization (NMF) for various problems related to audio for hands-free telephony. We first investigate MSSS via NMF as an alternative acoustic echo reduction approach to existing approaches such as Acoustic Echo Cancellation (AEC). To this end, we present the single-channel acoustic echo problem as an MSSS problem, in which the objective is to extract the users signal from a mixture also containing acoustic echo and noise. To perform separation, NMF is used to decompose the near-end microphone signal onto the union of two nonnegative bases in the magnitude Short Time Fourier Transform domain. One of these bases is for the spectral energy of the acoustic echo signal, and is formed from the in- coming far-end user’s speech, while the other basis is for the spectral energy of the near-end speaker, and is trained with speech data a priori. In comparison to AEC, the speaker extraction approach obviates Double-Talk Detection (DTD), and is demonstrated to attain its maximal echo mitigation performance immediately upon initiation and to maintain that performance during and after room changes for similar computational requirements. Speaker extraction is also shown to introduce distortion of the near-end speech signal during double-talk, which is quantified by means of a speech distortion measure and compared to that of AEC. Subsequently, we address Double-Talk Detection (DTD) for block-based AEC algorithms. We propose a novel block-based DTD algorithm that uses the available signals and the estimate of the echo signal that is produced by NMF-based speaker extraction to compute a suitably normalized correlation-based decision variable, which is compared to a fixed threshold to decide on doubletalk. Using a standard evaluation technique, the proposed algorithm is shown to have comparable detection performance to an existing conventional block-based DTD algorithm. It is also demonstrated to inherit the room change insensitivity of speaker extraction, with the proposed DTD algorithm generating minimal false doubletalk indications upon initiation and in response to room changes in comparison to the existing conventional DTD. We also show that this property allows its paired AEC to converge at a rate close to the optimum. Another focus of this thesis is the problem of inverting a single measurement of a non- minimum phase Room Impulse Response (RIR). We describe the process by which percep- tually detrimental all-pass phase distortion arises in reverberant speech filtered by the inverse of the minimum phase component of the RIR; in short, such distortion arises from inverting the magnitude response of the high-Q maximum phase zeros of the RIR. We then propose two novel partial inversion schemes that precisely mitigate this distortion. One of these schemes employs NMF-based MSSS to separate the all-pass phase distortion from the target speech in the magnitude STFT domain, while the other approach modifies the inverse minimum phase filter such that the magnitude response of the maximum phase zeros of the RIR is not fully compensated. Subjective listening tests reveal that the proposed schemes generally produce better quality output speech than a comparable inversion technique
    • …
    corecore