
Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

Author: Asaei Afsaneh
Bourlard Hervé
Cevher Volkan
Golbabaee Mohammad
Publication date: 01/01/2012

We tackle the multi-party speech recovery problem by modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustics from the unknown competing speech sources, relying on localization of the early images of the speakers by sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique map to the room geometry. To further tackle the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization exploiting a joint sparsity model formulated upon the spatio-spectral sparsity of the concurrent speech representation. The acoustic parameters are then incorporated for separating individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. The experiments conducted on real data recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition. Comment: 31 pages

arXiv.org e-Print Archive

Edinburgh Research Explorer

Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

Author: Absil P. -A.
Anthoine S.
Bertin N.
Bilen C.
Boumal N.
Boursier Y.
Bundervoet S.
Cambareri V.
Chabiron O.
Chainais P.
Cornelis B.
Dankova M.
Daubechies I.
Daudet L.
Davies M.
De Mol C.
De Vleeschouwer C.
Degraux K.
Determe J. -F.
Dobigeon N.
Dooms A.
Drémeau A.
Dunson D.
Duval V.
Fadili J.
Fawzi A.
Frossard P.
Geelen B.
Gigan S.
Gillis N.
Golbabaee M.
Gribonval R.
Heas P.
Herzet C.
Horlin F.
Jacques L.
Kitic S.
Lafruit G.
Liang J.
Liutkus A.
Loris I.
Louveaux J.
Maggioni M.
Magoarou L. Le
Malgouyres F.
Martina D.
Minsker S.
Mishra B.
Mory C.
Ngole F.
Peyré G.
Pizurica A.
Rajmic P.
Richard C.
Schelkens P.
Schretter C.
Sepulchre R.
Setti G.
Soussen C.
Starck J. -L.
Strawn N.
Sudhakar P.
Tourneret J. -Y.
Vaiter S.
Vandergheynst P.
Vavasis S. A.
Vukobratovic D.
Publication date: 01/10/2014

The implicit objective of the biennial "international - Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur in Belgium, from Wednesday August 27th till Friday August 29th, 2014. The workshop was conveniently located in "The Arsenal" building within walking distance of both hotels and town center. iTWIST'14 gathered about 70 international participants and featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low dimensional subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference. Comment: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist1

arXiv.org e-Print Archive

Edinburgh Research Explorer

Volumetric diffusers: pseudorandom cylinder arrays on a periodic lattice

Author: Angus JAS
Cox TJ
Gehring GA
Hughes RJ
Pogson M
Umnova O
Whittaker DM
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/11/2010

Most conventional diffusers take the form of a surface-based treatment, and as a result can only operate in hemispherical space. Placing a diffuser in the volume of a room might provide greater efficiency by allowing scattering into the whole space. A periodic cylinder array (or sonic crystal) produces periodicity lobes and uneven scattering. Introducing defects into an array, by removing or varying the size of some of the cylinders, can enhance its diffusing abilities. This paper applies number theoretic concepts to create cylinder arrays that have more even scattering. Predictions using a Boundary Element Method are compared to measurements to verify the model, and suitable metrics are adopted to evaluate performance. Arrangements with good aperiodic autocorrelation properties tend to produce the best results. At low frequency, power is controlled by object size, and at high frequency, diffusion is dominated by lattice spacing and structural similarity. Consequently, the operational bandwidth is rather small. By using sparse arrays and varying cylinder sizes, a wider bandwidth can be achieved.
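As a rough illustration of what "good aperiodic autocorrelation properties" means, the sketch below computes the aperiodic autocorrelation of a sequence and its largest off-peak value. The length-13 Barker code and the alternating sequence are illustrative inputs chosen here, not arrangements taken from the paper, and how a sequence maps onto cylinder positions or sizes is not specified in the abstract.

```python
def aperiodic_autocorrelation(seq):
    """Aperiodic autocorrelation of seq for lags 0 .. len(seq) - 1."""
    n = len(seq)
    return [sum(seq[i] * seq[i + lag] for i in range(n - lag)) for lag in range(n)]


def peak_sidelobe(seq):
    """Largest absolute off-peak autocorrelation value (lower means 'flatter')."""
    return max(abs(v) for v in aperiodic_autocorrelation(seq)[1:])


# Illustrative inputs (assumptions, not data from the paper):
# a length-13 Barker code, known for its low sidelobes, and a strictly
# alternating sequence, which is highly periodic.
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
alternating = [1, -1] * 6 + [1]

print("Barker-13 peak sidelobe:  ", peak_sidelobe(barker13))
print("Alternating peak sidelobe:", peak_sidelobe(alternating))
```

A low peak sidelobe relative to the zero-lag value indicates little self-similarity under shifts, which is the property the abstract associates with more even scattering.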

University of Salford Institutional Repository

Crossref

Harmonic Change Detection from Musical Audio

Author: Bernardes de Almeida Gilberto
Ramoneda Franco Pedro
Publication venue: 'Universidad de Zaragoza'
Publication date: 01/01/1997

In this dissertation, we advance an enhanced method for computing Harte et al.’s [31] Harmonic Change Detection Function (HCDF). HCDF aims to detect harmonic transitions in musical audio signals. HCDF is crucial both for chord recognition in Music Information Retrieval (MIR) and for a wide range of creative applications. In light of recent advances in harmonic description and transformation, we depart from the original architecture of Harte et al.’s HCDF to revisit each of its component blocks, which are evaluated using an exhaustive grid search aimed at identifying optimal parameters across four large style-specific musical datasets. Our results show that the newly proposed methods and parameter optimization improve the detection of harmonic changes by 5.57% (f-score) with respect to previous methods. Furthermore, while guaranteeing recall values above 99%, our method improves precision by 6.28%. Aiming to leverage novel strategies for real-time harmonic-content audio processing, the optimized HCDF is made available for JavaScript and the MAX and Pure Data multimedia programming environments. Moreover, all the data, as well as the Python code used to generate them, are made available.
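For orientation, here is a minimal sketch of the baseline HCDF idea as it is commonly described: each chroma frame is projected onto a 6-D tonal centroid, and the function value at frame n is the distance between the centroids of frames n-1 and n+1. This is not the enhanced method of the dissertation. The radii and angular steps follow the usual statement of Harte et al.'s transform and should be checked against the original paper, the smoothing stage is omitted, and the chroma frames below are synthetic examples.

```python
import math

# Radii and angular steps for the circles of fifths, minor thirds and
# major thirds, as usually quoted for Harte et al.'s tonal centroid.
RADII = (1.0, 1.0, 0.5)
STEPS = (7 * math.pi / 6, 3 * math.pi / 2, 2 * math.pi / 3)


def tonal_centroid(chroma):
    """Project a 12-bin chroma vector onto a 6-D tonal centroid."""
    total = sum(abs(c) for c in chroma)
    if total == 0:
        return [0.0] * 6  # silent frame: place it at the origin
    centroid = []
    for radius, step in zip(RADII, STEPS):
        centroid.append(sum(radius * math.sin(l * step) * c for l, c in enumerate(chroma)) / total)
        centroid.append(sum(radius * math.cos(l * step) * c for l, c in enumerate(chroma)) / total)
    return centroid


def hcdf(chroma_frames):
    """Distance between the centroids of frames n-1 and n+1 (0.0 at the edges)."""
    centroids = [tonal_centroid(frame) for frame in chroma_frames]
    values = [0.0] * len(centroids)
    for n in range(1, len(centroids) - 1):
        values[n] = math.dist(centroids[n - 1], centroids[n + 1])
    return values


# Synthetic example: four frames of a C major triad, then four of G major.
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
g_major = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
values = hcdf([c_major] * 4 + [g_major] * 4)
print([round(v, 3) for v in values])
```

In this toy input the function is zero inside each chord and nonzero only at the two frames adjacent to the chord boundary, which is where a peak picker would report a harmonic change.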

Repositorio Universidad de Zaragoza

A novel Big Data analytics and intelligent technique to predict driver's intent

Author: Abtahi
Adam Grzywaczewski
Agrawal
Al-Sultan
Asimov
Bernardo
Bezdek
Bhavsar
Bostrom
Chang
Chen
Dawson
De Domenico
Diaz-Cabrera
Doctor
Doctor
Dreier
Faiyaz Doctor
Filev
Froehlich
Gerhardt
Grudin
Grzywaczewski
Hashem
Hawkins
Hawkins
Haykin
Hirsch
Huang
Huang
Iqbal
Jaguar Land Rover Limited
Jain
James
Kaisler
Kapicioglu
Karyotis
Karyotis
Kotsiantis
Kumar
Kumar
Kurihata
Lech Birek
Liao
Liu
Luukka
Mahmud
Maniak
Maniak
McFarland
McInerney
Mitchell
Nasoz
Noulas
Palen
Pang
Parpinelli
Poli
Quercia
Rahat Iqbal
Rainville
Reininger
Richards
Rish
Sagiroglu
Simmons
Sun
Suthaharan
Tan
Tran
Utgoff
Victor Chang
Wang
Warren
Wells-Parker
Whitley
Zadeh
Publication venue: 'Elsevier BV'
Publication date: 06/04/2018

The modern age offers great potential for automatically predicting the driver's intent through the increasing miniaturization of computing technologies, rapid advancements in communication technologies and continuous connectivity of heterogeneous smart objects. Inside the cabin and engine of modern cars, dedicated computer systems need to be able to exploit the wealth of information generated by heterogeneous data sources with different contextual and conceptual representations. Processing and utilizing this diverse and voluminous data involves many challenges concerning the design of the computational technique used to perform this task. In this paper, we investigate the various data sources available in the car and the surrounding environment, which can be utilized as inputs in order to predict the driver's intent and behavior. As part of investigating these potential data sources, we conducted experiments on e-calendars for a large number of employees, and reviewed a number of available geo-referencing systems. Through the results of a statistical analysis and by computing location recognition accuracy results, we explored in detail the potential utilization of calendar location data to detect the driver's intentions. In order to exploit the numerous diverse data inputs available in modern vehicles, we investigate the suitability of different Computational Intelligence (CI) techniques, and propose a novel fuzzy computational modelling methodology. Finally, we outline the impact of applying advanced CI and Big Data analytics techniques in modern vehicles on the driver and society in general, and discuss ethical and legal issues arising from the deployment of intelligent self-learning cars.

University of Essex Research Repository

Crossref

Teesside University's Research Repository

Coventry University Pure Portal

Acoustic sensor network geometry calibration and applications

Author: Plinge Axel
Publication date: 01/01/2017

In the modern world, we are increasingly surrounded by computation devices with communication links and one or more microphones. Such devices are, for example, smartphones, tablets, laptops or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens the possibility for many practical applications. ASN-based speech enhancement, source localization, and event detection can be applied for teleconferencing, camera control, automation, or assisted living. For these kinds of applications, the awareness of auditory objects and their spatial positioning are key properties. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method. Information on the position of human speakers is provided by a novel localization and tracking method. In order to localize with respect to the ASN, the relative arrangement of the sensor nodes has to be known. Therefore, different novel geometry calibration methods were developed.

Sound classification: The first method addresses the task of identifying auditory objects. A novel application of the bag-of-features (BoF) paradigm to acoustic event classification and detection was introduced. It can be used for event and speech detection as well as for speaker identification. The use of both mel frequency cepstral coefficient (MFCC) and Gammatone frequency cepstral coefficient (GFCC) features improves the classification accuracy. By using soft quantization and introducing supervised training for the BoF model, superior accuracy is achieved. The method generalizes well from limited training data. It works online and can be computed in a fraction of real time. By a dedicated training strategy based on a hierarchy of stationarity, the detection of speech in mixtures with noise was realized. This makes the method robust against severe noise levels corrupting the speech signal. Thus it is possible to provide control information to a beamformer in order to realize blind speech enhancement. A reliable improvement is achieved in the presence of one or more stationary noise sources.

Speaker localization: The localization method enables each node to determine the direction of arrival (DoA) of concurrent sound sources. The author's neuro-biologically inspired speaker localization method for microphone arrays was refined for use in ASNs. By implementing a dedicated cochlear and midbrain model, it is robust against the reverberation found in indoor rooms. In order to better model the unknown number of concurrent speakers, an application of the EM algorithm that realizes probabilistic clustering according to auditory scene analysis (ASA) principles was introduced. Based on this approach, a system for Euclidean tracking in ASNs was designed. Each node applies the node-wise localization method and shares probabilistic DoA estimates together with an estimate of the spectral distribution with the network. As this information is relatively sparse, it can be transmitted with low bandwidth. The system is robust against jitter and transmission errors. The information from all nodes is integrated according to spectral similarity to correctly associate concurrent speakers. By incorporating the intersection angle in the triangulation, the precision of the Euclidean localization is improved. Tracks of concurrent speakers are computed over time, as is shown with recordings in a reverberant room.

Geometry calibration: The central task of geometry calibration has been solved with special focus on sensor nodes equipped with multiple microphones. Novel methods were developed for different scenarios. An audio-visual method was introduced for the calibration of ASNs in video conferencing scenarios. The DoA estimates are fused with visual speaker tracking in order to provide sensor positions in a common coordinate system. A novel acoustic calibration method determines the relative positioning of the nodes from ambient sounds alone. Unlike previous methods that only infer the positioning of distributed microphones, the DoA is incorporated, and thus it becomes possible to calibrate the orientation of the nodes with high accuracy. This is very important for all applications using the spatial information, as the triangulation error increases dramatically with bad orientation estimates. As speech events can be used, the calibration becomes possible without the requirement of playing dedicated calibration sounds. Based on this, an online method employing a genetic algorithm with incremental measurements was introduced. By using the robust speech localization method, the calibration is computed in parallel to the tracking. The online method is able to calibrate ASNs in real time, as is shown with recordings of natural speakers in a reverberant room.

The informed acoustic sensor network: All new methods are important building blocks for the use of ASNs. The online methods for localization and calibration both make use of the neuro-biologically inspired processing in the nodes, which leads to state-of-the-art results, even in reverberant enclosures. The high robustness and reliability can be improved even more by including the event detection method in order to exclude non-speech events. When all methods are combined, both semantic information on what is happening in the acoustic scene and spatial information on the positioning of the speakers and sensor nodes are automatically acquired in real time. This realizes truly informed audio processing in ASNs. Practical applicability is shown by application to recordings in reverberant rooms. The contribution of this thesis is thus not only to advance the state of the art in automatically acquiring information on the acoustic scene, but also to push the practical applicability of such methods.
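The remark that triangulation error grows with bad orientation estimates can be illustrated with a minimal 2-D sketch: two nodes at known positions each report a bearing towards the source, and the source is placed where the two bearing lines intersect. This is a generic geometric construction under assumed inputs (function name, node positions and angles are all hypothetical), not the probabilistic method developed in the thesis.

```python
import math


def triangulate(p1, theta1, p2, theta2):
    """Intersect two bearing lines given in a common 2-D frame.

    p1, p2 are node positions (x, y); theta1, theta2 are global bearing
    angles in radians. Returns (x, y), or None if the lines are (nearly)
    parallel. Intersections behind a node are not rejected in this sketch.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Cross product of the directions equals sin(theta2 - theta1), i.e. the
    # sine of the intersection angle; small values mean an ill-conditioned fix.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


# Hypothetical setup: nodes 4 m apart, source at (2, 2).
exact = triangulate((0.0, 0.0), math.radians(45), (4.0, 0.0), math.radians(135))
# Same measurement, but node 2's orientation estimate is off by 5 degrees.
skewed = triangulate((0.0, 0.0), math.radians(45), (4.0, 0.0), math.radians(140))

print("exact: ", exact)
print("skewed:", skewed)
print("position error (m):", math.dist(exact, skewed))
```

Because the denominator is the sine of the intersection angle, bearings that cross at a shallow angle amplify any orientation error, which is consistent with the thesis's choice to take the intersection angle into account during triangulation.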

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Virtual Reality Games for Motor Rehabilitation

Author: Charles D.
Ma Minhua
McDonough S.
McNeill M.
Publication venue: University of Wolverhampton
Publication date: 01/01/2006

This paper presents a fuzzy logic based method to track user satisfaction without the need for devices to monitor users' physiological conditions. User satisfaction is the key to any product’s acceptance; computer applications and video games provide a unique opportunity to offer a tailored environment for each user to better suit their needs. We have implemented a non-adaptive fuzzy logic model of emotion, based on the emotional component of the Fuzzy Logic Adaptive Model of Emotion (FLAME) proposed by El-Nasr, to estimate player emotion in Unreal Tournament 2004. In this paper we describe the implementation of this system and present the results of one of several play tests. Our research contradicts the current literature that suggests physiological measurements are needed. We show that it is possible to use a software-only method to estimate user emotion.

STORE - Staffordshire Online Repository

University of Huddersfield Repository