289 research outputs found
On the effect of SNR and superdirective beamforming in speaker diarisation in meetings
This paper examines the effect of sensor performance on speaker diarisation in meetings and investigates the use of more advanced beamforming techniques, beyond the typically employed delay-sum beamformer, for mitigating the effects of poorer sensor performance. We present superdirective beamforming and investigate how different time difference of arrival (TDOA) smoothing and beamforming techniques influence the performance of state-of-the-art diarisation systems. We produced and transcribed a new corpus of meetings recorded in the instrumented meeting room using a high SNR analogue and a newly developed low SNR digital MEMS microphone array (DMMA.2). This research demonstrates that TDOA smoothing has a significant effect on the diarisation error rate and that simple noise reduction and beamforming schemes suffice to overcome audio signal degradation due to the lower SNR of modern MEMS microphones. Index Terms: Speaker diarisation in meetings, digital MEMS microphone array, time difference of arrival (TDOA), superdirective beamforming
Design exploration and performance strategies towards power-efficient FPGA-based architectures for sound source localization
Many applications rely on MEMS microphone arrays for locating sound sources prior to their execution. Those applications not only are executed under real-time constraints but also are often embedded on low-power devices. These environments become challenging when increasing the number of microphones or requiring dynamic responses. Field-Programmable Gate Arrays (FPGAs) are usually chosen due to their flexibility and computational power. This work intends to guide the design of reconfigurable acoustic beamforming architectures, which are not only able to accurately determine the sound Direction-Of-Arrival (DoA) but also capable of satisfying the most demanding applications in terms of power efficiency. Design considerations of the required operations performing the sound location are discussed and analysed in order to facilitate the elaboration of reconfigurable acoustic beamforming architectures. Performance strategies are proposed and evaluated based on the characteristics of the presented architecture. This power-efficient architecture is compared to a different architecture prioritizing performance in order to reveal the unavoidable design trade-offs.
A Digital Microphone Array for Distant Speech Recognition
In this paper, the design, implementation and testing of a digital microphone array is presented. The array uses digital MEMS microphones which integrate the microphone, amplifier and analogue to digital converter on a single chip in place of the analogue microphones and external audio interfaces currently used. The device has the potential to be smaller, cheaper and more flexible than typical analogue arrays; however, the effect on speech recognition performance of using digital microphones is as yet unknown. In order to evaluate the effect, an analogue array and the new digital array are used to simultaneously record test data for a speech recognition experiment. Initial results employing no adaptation show that performance using the digital array is significantly worse (14% absolute WER) than the analogue device. Subsequent experiments using MLLR and CMLLR channel adaptation reduce this gap, and employing MLLR for both channel and speaker adaptation reduces the difference between the arrays to 4.5% absolute WER.
FPGA-based architectures for acoustic beamforming with microphone arrays : trends, challenges and research opportunities
Over the past decades, many systems composed of arrays of microphones have been developed to satisfy the quality demanded by acoustic applications. Such microphone arrays are sound acquisition systems composed of multiple microphones used to sample the sound field with spatial diversity. The relatively recent adoption of Field-Programmable Gate Arrays (FPGAs) to manage the audio data samples and to perform the signal processing operations such as filtering or beamforming has led to customizable architectures able to satisfy the most demanding computational, power or performance acoustic applications. The presented work provides an overview of the current FPGA-based architectures and how FPGAs are exploited for different acoustic applications. Current trends on the use of this technology, pending challenges and open research opportunities on the use of FPGAs for acoustic applications using microphone arrays are presented and discussed.
CABE : a cloud-based acoustic beamforming emulator for FPGA-based sound source localization
Microphone arrays are gaining in popularity thanks to the availability of low-cost microphones. Applications including sonar, binaural hearing aid devices, acoustic indoor localization techniques and speech recognition are proposed by several research groups and companies. In most of the available implementations, the microphones utilized are assumed to offer an ideal response in a given frequency domain. Several toolboxes and software can be used to obtain a theoretical response of a microphone array with a given beamforming algorithm. However, a tool facilitating the design of a microphone array taking into account the non-ideal characteristics could not be found. Moreover, generating packages facilitating the implementation on Field Programmable Gate Arrays has, to our knowledge, not been carried out yet. Visualizing the responses in 2D and 3D also poses an engineering challenge. To alleviate these shortcomings, a scalable Cloud-based Acoustic Beamforming Emulator (CABE) is proposed. The non-ideal characteristics of microphones are considered during the computations and results are validated with acoustic data captured from microphones. It is also possible to generate hardware description language packages containing delay tables facilitating the implementation of Delay-and-Sum beamformers in embedded hardware. Truncation error analysis can also be carried out for fixed-point signal processing. The effects of disabling a given group of microphones within the microphone array can also be calculated. Results and packages can be visualized with a dedicated client application. Users can create and configure several parameters of an emulation, including sound source placement, the shape of the microphone array and the required signal processing flow. Depending on the user configuration, 2D and 3D graphs showing the beamforming results, waterfall diagrams and performance metrics can be generated by the client application. 
The emulations are also validated with captured data from existing microphone arrays.
The Sheffield Wargames Corpus.
Recognition of speech in natural environments is a challenging task, even more so if this involves conversations between several speakers. Work on meeting recognition has addressed some of the significant challenges, mostly targeting formal, business style meetings where people are mostly in a static position in a room. Only limited data is available that contains high quality near and far field data from real interactions between participants. In this paper we present a new corpus for research on speech recognition, speaker tracking and diarisation, based on recordings of native speakers of English playing a table-top wargame. The Sheffield Wargames Corpus comprises 7 hours of data from 10 recording sessions, obtained from 96 microphones, 3 video cameras and, most importantly, 3D location data provided by a sensor tracking system. The corpus represents a unique resource, that provides for the first time location tracks (1.3Hz) of speakers that are constantly moving and talking. The corpus is available for research purposes, and includes annotated development and evaluation test sets. Baseline results for close-talking and far field sets are included in this paper.
Speech processing using digital MEMS microphones
The last few years have seen the start of a unique change in microphones for consumer
devices such as smartphones or tablets. Almost all analogue capacitive microphones
are being replaced by digital silicon microphones or MEMS microphones.
MEMS microphones perform differently to conventional analogue microphones. Their
greatest disadvantage is significantly increased self-noise or decreased SNR, while
their most significant benefits are ease of design and manufacturing and improved sensitivity
matching.
This thesis presents research on speech processing, comparing conventional analogue
microphones with the newly available digital MEMS microphones. Specifically, voice
activity detection, speaker diarisation (who spoke when), speech separation and speech
recognition are looked at in detail.
In order to carry out this research different microphone arrays were built using digital
MEMS microphones and corpora were recorded to test existing algorithms and devise
new ones. Some corpora that were created for the purpose of this research will be
released to the public in 2013.
It was found that the most commonly used VAD algorithm in current state-of-the-art
diarisation systems is not the best-performing one, i.e. MLP-based voice activity
detection consistently outperforms the more frequently used GMM-HMM-based VAD
schemes. In addition, an algorithm was derived that can determine the number of active
speakers in a meeting recording given audio data from a microphone array of known
geometry, leading to improved diarisation results.
Finally, speech separation experiments were carried out using different post-filtering
algorithms, matching or exceeding current state-of-the-art results.
The performance of the algorithms and methods presented in this thesis was verified
by comparing their output using speech recognition tools and simple MLLR adaptation
and the results are presented as word error rates, an easily comprehensible scale.
To summarise, using speech recognition and speech separation experiments, this thesis
demonstrates that the significantly reduced SNR of the MEMS microphone can be
compensated for with well established adaptation techniques such as MLLR. MEMS
microphones do not affect voice activity detection and speaker diarisation performance.
Digital Microphone Array - Design, Implementation and Speech Recognition Experiments
The instrumented meeting room of the future will help meetings to be more efficient and
productive. One of the basic components of the instrumented meeting room is the speech
recording device, in most cases a microphone array. The two basic requirements for this
microphone array are portability and cost-efficiency, neither of which are provided by current commercially available arrays. This will change in the near future thanks to the availability of new digital MEMS microphones. This dissertation reports on the first successful implementation of a digital MEMS microphone array. This digital MEMS microphone array was designed, implemented, tested and evaluated and successfully compared with an existing
analogue microphone array using a state-of-the-art ASR system and adaptation algorithms. The newly built digital MEMS microphone array compares well with the analogue microphone array on the basis of the word error rate achieved in an automated speech recognition system and is highly portable and economical
Using a Planar Array of MEMS Microphones to Obtain Acoustic Images of a Fan Matrix
This paper proposes the use of a signal acquisition and processing system based on an 8×8 planar array of MEMS (Microelectromechanical Systems) microphones to obtain acoustic images of a fan matrix. A 3×3 matrix of PC fans has been implemented to perform the study. Some tests to obtain the acoustic images of the individual fans and of the whole matrix have been defined and have been carried out inside an anechoic chamber. The nonstationary signals received by each MEMS microphone and their corresponding spectra have been analyzed, as well as the corresponding acoustic images. The analysis of the acoustic signals spectra reveals the resonance frequency of the individual fans. The obtained results reveal the feasibility of the proposed system to obtain acoustic images of a fan matrix and of its individual fans, in this last case, in order to estimate the real position of the fan inside the matrix.
Design Considerations When Accelerating an FPGA-Based Digital Microphone Array for Sound-Source Localization
The use of microphone arrays for sound-source localization is a well-researched topic. The response of such sensor arrays is dependent on the quantity of microphones operating on the array. A higher number of microphones, however, increases the computational demand, making real-time response challenging. In this paper, we present a Filter-and-Sum based architecture and several acceleration techniques to provide accurate sound-source localization in real-time. Experiments demonstrate how an accurate sound-source localization is obtained in a couple of milliseconds, independently of the number of microphones. Finally, we also propose different strategies to further accelerate the sound-source localization while offering increased angular resolution.