235 research outputs found

Audio‐Visual Speaker Tracking

Author: Kılıç Volkan
Wang Wenwu
Publication venue: 'IntechOpen'
Publication date: 12/07/2017
Field of study

Target motion tracking has found application in interdisciplinary fields, including but not limited to surveillance and security, forensic science, intelligent transportation systems, driving assistance, monitoring of prohibited areas, medical science, robotics, action and expression recognition, individual speaker discrimination in multi‐speaker environments, and video conferencing, within the fields of computer vision and signal processing. Among these applications, speaker tracking in enclosed spaces has been gaining relevance due to widespread advances in devices and technologies and the need for seamless solutions for real‐time tracking and localization of speakers. However, speaker tracking is a challenging task in real‐life scenarios, as several distinctive issues influence the tracking process, such as occlusions and an unknown number of speakers. One approach to overcoming these issues is to use multi‐modal information, as it conveys complementary information about the state of the speakers compared to single‐modal tracking. To use multi‐modal information, several approaches have been proposed, which can be classified into two categories: deterministic and stochastic. This chapter aims to provide multimedia researchers with a state‐of‐the‐art overview of tracking methods used to combine multiple modalities for various multimedia analysis tasks, classifying them into different categories and listing new and future trends in this field.

IntechOpen

Crossref

‘Did the speaker change?’: Temporal tracking for overlapping speaker segmentation in multi-speaker scenarios

Author: Hogg Aidan
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/12/2022
Field of study

Diarization systems are an essential part of many speech processing applications, such as speaker indexing, improving automatic speech recognition (ASR) performance and making single speaker-based algorithms available for use in multi-speaker domains. This thesis will focus on the first task of the diarization process, that being the task of speaker segmentation which can be thought of as trying to answer the question ‘Did the speaker change?’ in an audio recording. This thesis starts by showing that time-varying pitch properties can be used advantageously within the segmentation step of a multi-talker diarization system. It is then highlighted that an individual’s pitch is smoothly varying and, therefore, can be predicted by means of a Kalman filter. Subsequently, it is shown that if the pitch is not predictable, then this is most likely due to a change in the speaker. Finally, a novel system is proposed that uses this approach of pitch prediction for speaker change detection. This thesis then goes on to demonstrate how voiced harmonics can be useful in detecting when more than one speaker is talking, such as during overlapping speaker activity. A novel system is proposed to track multiple harmonics simultaneously, allowing for the determination of onsets and end-points of a speaker’s utterance in the presence of an additional active speaker. This thesis then extends this work to explore the use of a new multimodal approach for overlapping speaker segmentation that tracks both the fundamental frequency (F0) and direction of arrival (DoA) of each speaker simultaneously. The proposed multiple hypothesis tracking system, which simultaneously tracks both features, shows an improvement in segmentation performance when compared to tracking these features separately. Lastly, this thesis focuses on the DoA estimation part of the newly proposed multimodal approach. 
It does this by exploring a polynomial extension to the multiple signal classification (MUSIC) algorithm, spatio-spectral polynomial (SSP)-MUSIC, and evaluating its performance when using speech sound sources. Open Access
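The pitch-prediction idea in the abstract above (a speaker's pitch varies smoothly, so a Kalman filter can predict it, and a poorly predicted frame suggests a speaker change) can be illustrated with a minimal sketch. This is not the thesis's system: the scalar random-walk pitch model, the function name `detect_pitch_changes`, the variances `q` and `r`, and the gate on the normalized innovation squared are all assumptions chosen for illustration.

```python
def detect_pitch_changes(f0_track, q=4.0, r=9.0, gate=9.0):
    """Return frame indices where the pitch is poorly predicted.

    Assumptions (not taken from the thesis): a scalar random-walk pitch
    model, hand-picked process/measurement variances q and r (Hz^2), and a
    fixed gate on the normalized innovation squared.
    """
    changes = []
    x = f0_track[0]  # current pitch estimate (Hz)
    p = r            # variance of that estimate
    for i, z in enumerate(f0_track[1:], start=1):
        p_pred = p + q   # predict: random walk keeps x, inflates variance
        nu = z - x       # innovation: measured minus predicted pitch
        s = p_pred + r   # innovation variance
        if nu * nu / s > gate:
            # Pitch was not predictable: flag a likely speaker change
            # and restart the filter on the new measurement.
            changes.append(i)
            x, p = z, r
            continue
        k = p_pred / s   # Kalman gain
        x = x + k * nu   # update the pitch estimate
        p = (1.0 - k) * p_pred
    return changes


# Hypothetical F0 track in Hz: one talker near 120 Hz, then another near 210 Hz.
print(detect_pitch_changes([120, 121, 122, 121, 123, 210, 211, 209, 210]))  # [5]
```

Restarting the filter after a flagged frame is one simple design choice; a fuller system would also have to handle unvoiced frames and pitch-halving errors, which this sketch ignores.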

Spiral - Imperial College Digital Repository

Sparse Bases and Bayesian Inference of Electromagnetic Scattering

Author: Lee John
Publication venue: AFIT Scholar
Publication date: 01/12/2020
Field of study

Many approaches in computational electromagnetics (CEM) rely on the decomposition of complex radiation and scattering behavior with a set of basis vectors. Accurate estimates of the quantities of interest can be synthesized through a weighted sum of these vectors. In addition to basis decompositions, sparse signal processing techniques developed in the compressive sensing (CS) community can be leveraged when only a small subset of the basis vectors is required to sufficiently represent the quantity of interest. We investigate several concepts in which novel bases are applied to common electromagnetic problems and leverage the sparsity property to improve performance and/or reduce computational burden. The first concept explores the use of multiple types of scattering primitives to reconstruct scattering patterns of electrically large targets. Using a combination of isotropic point scatterers and wedge diffraction primitives as our bases, a 40% reduction in reconstruction error can be achieved. Next, a sparse basis is used to improve direction-of-arrival (DOA) estimation. We implement the block sparse Bayesian learning (BSBL) technique to determine the angle of arrival of multiple incident signals with only a single snapshot of data from an arbitrary arrangement of non-isotropic antennas. This is an improvement over the current state-of-the-art, where restrictions on the antenna type, configuration, and a priori knowledge of the number of signals are often assumed. Lastly, we investigate the feasibility of a basis set to reconstruct the scattering patterns of electrically small targets. The basis is derived from the theory of characteristic modes (TCM) and can capture non-localized scattering behavior. Preliminary results indicate that this basis may be used in an interpolation and extrapolation scheme to generate scattering patterns over multiple frequencies.
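To make the notion of a sparse basis for single-snapshot DOA estimation concrete, here is a minimal sketch. It deliberately swaps in plain matching pursuit for the BSBL technique named in the abstract, and it assumes an idealized half-wavelength uniform linear array of isotropic elements rather than the arbitrary non-isotropic arrangement the abstract describes. The function names, the 1-degree angle grid, and the need to supply the number of sources are illustrative assumptions.

```python
import cmath
import math


def steering(theta_deg, n_elems):
    """Steering vector of a half-wavelength uniform linear array (assumed geometry)."""
    phase = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * n) for n in range(n_elems)]


def inner(a, b):
    """Complex inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def matching_pursuit_doa(snapshot, n_sources, grid=range(-89, 90)):
    """Greedy sparse DOA estimate from one snapshot; returns angles in degrees.

    The dictionary (basis) is one steering vector per grid angle. Each pass
    picks the atom most correlated with the residual and subtracts its
    contribution, so only n_sources atoms end up with nonzero weight.
    """
    n = len(snapshot)
    atoms = {theta: steering(theta, n) for theta in grid}
    residual = list(snapshot)
    angles = []
    for _ in range(n_sources):
        best = max(atoms, key=lambda t: abs(inner(atoms[t], residual)))
        weight = inner(atoms[best], residual) / n  # each atom has squared norm n
        residual = [r - weight * a for r, a in zip(residual, atoms[best])]
        angles.append(best)
    return sorted(angles)


# One noiseless source at 20 degrees seen by a 16-element array.
print(matching_pursuit_doa(steering(20, 16), 1))  # [20]
```

Unlike this sketch, the approach in the abstract is stated to work without a priori knowledge of the number of signals; the greedy loop here needs `n_sources` up front, which is one of the restrictions the abstract says BSBL avoids.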

AFIT Scholar (Air Force Institute of Technology)

Modelling, Simulation and Data Analysis in Acoustical Problems

Author
Publication venue: 'MDPI AG'
Publication date: 01/05/2021
Field of study

Modelling and simulation in acoustics is currently gaining importance. In fact, with the development and improvement of innovative computational techniques and with the growing need for predictive models, an impressive boost has been observed in several research and application areas, such as noise control, indoor acoustics, and industrial applications. This led us to propose a special issue on “Modelling, Simulation and Data Analysis in Acoustical Problems”, as we believe in the importance of these topics in modern acoustics studies. In total, 88 papers were submitted and 33 of them were published, an acceptance rate of 37.5%. Judging by the number of papers submitted, this is a trending topic in the scientific and academic community, and this special issue aims to provide a future reference for the research that will be developed in the coming years.

Directory of Open Access Books (DOAB)

Acoustics of ancient Greek and Roman theaters in use today

Author: Angelakis Konstantinos
Gade Anders Christian
Publication venue
Publication date: 01/01/2006
Field of study

Crossref

Online Research Database In Technology

Proceedings of the EAA Spatial Audio Signal Processing symposium: SASP 2019

Author: Katz Brian F. G.
Noisternig Markus
Rafaely Boaz
Publication venue: HAL CCSD
Publication date: 01/09/2019
Field of study

International audience

Applications of Antenna Technology in Sensors

Author
Publication venue: 'MDPI AG'
Publication date: 06/07/2022
Field of study

During the past few decades, information technologies have been evolving at a tremendous rate, causing profound changes to our world and to our ways of living. Emerging applications have opened up new routes and set new trends for antenna sensors. With the advent of the Internet of Things (IoT), the adaptation of antenna technologies for sensor and sensing applications has become more important. Antennas must now be reconfigurable, flexible, low-profile, and low-cost for applications ranging from airborne and vehicular systems to machine-to-machine communication, IoT, and 5G. This reprint aims to introduce and treat a series of advanced and emerging topics in the field of antenna sensors.

Directory of Open Access Books (DOAB)

User-Symbiotic Speech Enhancement for Hearing Aids

Author: Hoang Poul
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2022
Field of study

VBN