90 research outputs found

    On the effect of SNR and superdirective beamforming in speaker diarisation in meetings

    This paper examines the effect of sensor performance on speaker diarisation in meetings and investigates the use of more advanced beamforming techniques, beyond the typically employed delay-sum beamformer, for mitigating the effects of poorer sensor performance. We present superdirective beamforming and investigate how different time difference of arrival (TDOA) smoothing and beamforming techniques influence the performance of state-of-the-art diarisation systems. We produced and transcribed a new corpus of meetings recorded in the instrumented meeting room using a high-SNR analogue array and a newly developed low-SNR digital MEMS microphone array (DMMA.2). This research demonstrates that TDOA smoothing has a significant effect on the diarisation error rate and that simple noise reduction and beamforming schemes suffice to overcome audio signal degradation due to the lower SNR of modern MEMS microphones. Index Terms: speaker diarisation in meetings, digital MEMS microphone array, time difference of arrival (TDOA), superdirective beamforming.

    The Sheffield Wargames Corpus.

    Recognition of speech in natural environments is a challenging task, even more so if this involves conversations between several speakers. Work on meeting recognition has addressed some of the significant challenges, mostly targeting formal, business style meetings where people are mostly in a static position in a room. Only limited data is available that contains high quality near and far field data from real interactions between participants. In this paper we present a new corpus for research on speech recognition, speaker tracking and diarisation, based on recordings of native speakers of English playing a table-top wargame. The Sheffield Wargames Corpus comprises 7 hours of data from 10 recording sessions, obtained from 96 microphones, 3 video cameras and, most importantly, 3D location data provided by a sensor tracking system. The corpus represents a unique resource that provides for the first time location tracks (1.3 Hz) of speakers that are constantly moving and talking. The corpus is available for research purposes, and includes annotated development and evaluation test sets. Baseline results for close-talking and far field sets are included in this paper.

    A Digital Microphone Array for Distant Speech Recognition

    In this paper, the design, implementation and testing of a digital microphone array is presented. The array uses digital MEMS microphones which integrate the microphone, amplifier and analogue to digital converter on a single chip in place of the analogue microphones and external audio interfaces currently used. The device has the potential to be smaller, cheaper and more flexible than typical analogue arrays; however, the effect on speech recognition performance of using digital microphones is as yet unknown. In order to evaluate the effect, an analogue array and the new digital array are used to simultaneously record test data for a speech recognition experiment. Initial results employing no adaptation show that performance using the digital array is significantly worse (14% absolute WER) than the analogue device. Subsequent experiments using MLLR and CMLLR channel adaptation reduce this gap, and employing MLLR for both channel and speaker adaptation reduces the difference between the arrays to 4.5% absolute WER.

    Molecules Are Not Enough! Overcoming Students’ Overgeneralization Tendencies by Comparing and Contrasting

    Many students assume a molecular structure for all substances, even after being instructed on the topic. But why do students struggle to understand key concepts like chemical bonding? One of the reasons is students’ tendency to overgeneralize: students wrongly transfer characteristics from familiar concepts (e.g., molecular substances) to lesser-known ones (e.g., ionic compounds). In this article, possible reasons behind this commonly observed tendency are discussed and a possible didactical solution is proposed. Comparing and contrasting approaches increased students’ ability to distinguish between similar concepts in mathematics.[1] The method of comparing and contrasting is therefore applied by simultaneously introducing the three types of chemical bonding to effectively tackle students’ overgeneralization tendencies.

    The role of vegetative cell fusions in the development and asexual reproduction of the wheat fungal pathogen Zymoseptoria tritici

    Background: The ability of fungal cells to undergo cell-to-cell communication and anastomosis, the process of vegetative hyphal fusion, allows them to maximize their overall fitness. Previous studies in a number of fungal species have identified the requirement of several signaling pathways for anastomosis, including the so far best characterized soft (So) gene, and the MAPK pathway components MAK-1 and MAK-2 of Neurospora crassa. Despite the observations of hyphal fusions’ involvement in pathogenicity and host adhesion, the connection between cell fusion and fungal lifestyles is still unclear. Here, we address the role of anastomosis in fungal development and asexual reproduction in Zymoseptoria tritici, the most important fungal pathogen of wheat in Europe. Results: We show that Z. tritici undergoes self-fusion between distinct cellular structures, and its mechanism is dependent on the initial cell density. Contrary to other fungi, cell fusion in Z. tritici only resulted in cytoplasmic mixing but not in multinucleated cell formation. Deletion of the So ortholog ZtSof1 disrupted cell-to-cell communication, affecting both hyphal and germling fusion. We show that Z. tritici mutants for the MAPK-encoding genes ZtSlt2 (orthologous to MAK-1) and ZtFus3 (orthologous to MAK-2) also failed to undergo anastomosis, demonstrating the functional conservation of this signaling mechanism across species. Additionally, the ΔZtSof1 mutant was severely impaired in melanization, suggesting that the So gene function is related to melanization. Finally, we demonstrated that anastomosis is dispensable for pathogenicity but essential for pycnidium development, and its absence abolishes the asexual reproduction of Z. tritici. Conclusions: We demonstrate the role of ZtSof1, ZtSlt2, and ZtFus3 in cell fusions of Z. tritici. Cell fusions are essential for different aspects of Z. tritici biology, and the ZtSof1 gene is a potential target to control septoria tritici blotch (STB) disease.

    Speech processing using digital MEMS microphones

    The last few years have seen the start of a unique change in microphones for consumer devices such as smartphones or tablets. Almost all analogue capacitive microphones are being replaced by digital silicon microphones or MEMS microphones. MEMS microphones perform differently to conventional analogue microphones. Their greatest disadvantage is significantly increased self-noise or decreased SNR, while their most significant benefits are ease of design and manufacturing and improved sensitivity matching. This thesis presents research on speech processing, comparing conventional analogue microphones with the newly available digital MEMS microphones. Specifically, voice activity detection, speaker diarisation (who spoke when), speech separation and speech recognition are looked at in detail. In order to carry out this research, different microphone arrays were built using digital MEMS microphones and corpora were recorded to test existing algorithms and devise new ones. Some corpora that were created for the purpose of this research will be released to the public in 2013. It was found that the most commonly used VAD algorithm in current state-of-the-art diarisation systems is not the best-performing one, i.e. MLP-based voice activity detection consistently outperforms the more frequently used GMM-HMM-based VAD schemes. In addition, an algorithm was derived that can determine the number of active speakers in a meeting recording given audio data from a microphone array of known geometry, leading to improved diarisation results. Finally, speech separation experiments were carried out using different post-filtering algorithms, matching or exceeding current state-of-the-art results. The performance of the algorithms and methods presented in this thesis was verified by comparing their output using speech recognition tools and simple MLLR adaptation, and the results are presented as word error rates, an easily comprehensible scale. To summarise, using speech recognition and speech separation experiments, this thesis demonstrates that the significantly reduced SNR of the MEMS microphone can be compensated for with well established adaptation techniques such as MLLR. MEMS microphones do not affect voice activity detection and speaker diarisation performance.

    Digital Microphone Array - Design, Implementation and Speech Recognition Experiments

    The instrumented meeting room of the future will help meetings to be more efficient and productive. One of the basic components of the instrumented meeting room is the speech recording device, in most cases a microphone array. The two basic requirements for this microphone array are portability and cost-efficiency, neither of which is provided by current commercially available arrays. This will change in the near future thanks to the availability of new digital MEMS microphones. This dissertation reports on the first successful implementation of a digital MEMS microphone array. The array was designed, implemented, tested and evaluated, and compared with an existing analogue microphone array using a state-of-the-art ASR system and adaptation algorithms. The newly built digital MEMS microphone array compares well with the analogue microphone array on the basis of the word error rate achieved in an automated speech recognition system, and is highly portable and economical.

    Determining the number of speakers in a meeting using microphone array features

    The accuracy of speaker diarisation in meetings relies heavily on determining the correct number of speakers. In this paper we present a novel algorithm based on time difference of arrival (TDOA) features that aims to find the correct number of active speakers in a meeting and thus aid the speaker segmentation and clustering process. With our proposed method the microphone array TDOA values and known geometry of the array are used to calculate a speaker matrix from which we determine the correct number of active speakers with the aid of the Bayesian information criterion (BIC). In addition, we analyse several well-known voice activity detection (VAD) algorithms and verify their fitness for meeting recordings. Experiments were performed using the NIST RT06, RT07 and RT09 data sets, and resulted in reduced error rates compared with BIC-based approaches. Index Terms: speaker diarisation in meetings, microphone array, time difference of arrival (TDOA), speech segmentation and clustering, BIC, voice activity detection (VAD).