
    The Sheffield Wargames Corpus.

    Recognition of speech in natural environments is a challenging task, even more so if this involves conversations between several speakers. Work on meeting recognition has addressed some of the significant challenges, mostly targeting formal, business-style meetings where people are mostly in a static position in a room. Only limited data is available that contains high-quality near- and far-field data from real interactions between participants. In this paper we present a new corpus for research on speech recognition, speaker tracking and diarisation, based on recordings of native speakers of English playing a table-top wargame. The Sheffield Wargames Corpus comprises 7 hours of data from 10 recording sessions, obtained from 96 microphones, 3 video cameras and, most importantly, 3D location data provided by a sensor tracking system. The corpus represents a unique resource that provides, for the first time, location tracks (1.3 Hz) of speakers that are constantly moving and talking. The corpus is available for research purposes, and includes annotated development and evaluation test sets. Baseline results for close-talking and far-field sets are included in this paper.

    Requirements for tracking radar for falling spheres

    Error analysis on radar tracking of falling sphere

    Source-filter Separation of Speech Signal in the Phase Domain

    Deconvolution of the speech excitation (source) and vocal tract (filter) components through log-magnitude spectral processing is well-established and has led to the well-known cepstral features used in a multitude of speech processing tasks. This paper presents a novel source-filter decomposition based on processing in the phase domain. We show that separation between source and filter in the log-magnitude spectra is far from perfect, leading to loss of vital vocal tract information. It is demonstrated that the same task can be better performed by trend and fluctuation analysis of the phase spectrum of the minimum-phase component of speech, which can be computed via the Hilbert transform. Trend and fluctuation can be separated through low-pass filtering of the phase, using additivity of vocal tract and source in the phase domain. This results in separated signals which have a clear relation to the vocal tract and excitation components. The effectiveness of the method is put to the test in a speech recognition task. The vocal tract component extracted in this way is used as the basis of a feature extraction algorithm for speech recognition on the Aurora-2 database. The recognition results show up to 8.5% absolute improvement in comparison with MFCC features on average (0-20 dB).
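
    The trend/fluctuation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the folded-cepstrum route to the minimum-phase phase (a standard discrete equivalent of the Hilbert-transform relation), and the moving-average low-pass filter with its width are all assumptions made for illustration.

```python
import numpy as np

def minimum_phase_spectrum_phase(frame, n_fft=512, eps=1e-12):
    """Phase spectrum of the minimum-phase component of `frame`.

    Assumption: computed by folding the real cepstrum, which is the
    discrete counterpart of taking the Hilbert transform of log|X|.
    `n_fft` is assumed to be even.
    """
    log_mag = np.log(np.abs(np.fft.fft(frame, n_fft)) + eps)
    cep = np.fft.ifft(log_mag).real          # real cepstrum (even sequence)
    fold = np.zeros(n_fft)
    fold[0] = 1.0                            # keep quefrency 0
    fold[1:n_fft // 2] = 2.0                 # double the positive quefrencies
    fold[n_fft // 2] = 1.0                   # keep the Nyquist term
    # FFT of the folded cepstrum is log|X| + j * (minimum-phase phase).
    return np.fft.fft(cep * fold).imag

def trend_and_fluctuation(phase, width=9):
    """Split a phase spectrum into a smooth trend and the residual.

    Assumption: a circular moving average of odd `width` stands in for
    the low-pass filter mentioned in the abstract.
    """
    pad = width // 2
    padded = np.concatenate([phase[-pad:], phase, phase[:pad]])
    trend = np.convolve(padded, np.ones(width) / width, mode="valid")
    return trend, phase - trend
```

    Under the additivity stated in the abstract, the smooth trend would be read as the vocal tract (filter) part and the fluctuation as the excitation (source) part; by construction the two always sum back to the input phase.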

    Learning temporal clusters using capsule routing for speech emotion recognition

    Emotion recognition from speech plays a significant role in adding emotional intelligence to machines and making human-machine interaction more natural. One of the key challenges from a machine learning standpoint is to extract patterns which bear maximum correlation with the emotion information encoded in this signal while being as insensitive as possible to other types of information carried by speech. In this paper, we propose a novel temporal modelling framework for robust emotion classification using a bidirectional long short-term memory network (BLSTM), a CNN and Capsule networks. The BLSTM deals with the temporal dynamics of the speech signal by effectively representing forward/backward contextual information, while the CNN along with the dynamic routing of the Capsule net learn temporal clusters, which altogether provide a state-of-the-art technique for classifying the extracted patterns. The proposed approach was compared with a wide range of architectures on the FAU-Aibo and RAVDESS corpora and remarkable gains over state-of-the-art systems were obtained. For FAU-Aibo and RAVDESS, accuracies of 77.6% and 56.2% were achieved, respectively, which are 3% and 14% (absolute) higher than the best-reported results for the respective tasks.

    Low-temperature statistical mechanics of the Quantizer problem: fast quenching and equilibrium cooling of the three-dimensional Voronoi Liquid

    The Quantizer problem is a tessellation optimisation problem where point configurations are identified such that the Voronoi cells minimise the second moment of the volume distribution. While the ground state (optimal state) in 3D is almost certainly the body-centered cubic lattice, disordered and effectively hyperuniform states with energies very close to the ground state exist that result as stable states in an evolution through the geometric Lloyd's algorithm [Klatt et al. Nat. Commun., 10, 811 (2019)]. When considered as a statistical mechanics problem at finite temperature, the same system has been termed the 'Voronoi Liquid' by [Ruscher et al. EPL 112, 66003 (2015)]. Here we investigate the cooling behaviour of the Voronoi liquid with a particular view to the stability of the effectively hyperuniform disordered state. As a confirmation of the results by Ruscher et al., we observe, by both molecular dynamics and Monte Carlo simulations, that upon slow quasi-static equilibrium cooling, the Voronoi liquid crystallises from a disordered configuration into the body-centered cubic configuration. By contrast, upon sufficiently fast non-equilibrium cooling (and not just in the limit of a maximally fast quench) the Voronoi liquid adopts states similar to the effectively hyperuniform inherent structures identified by Klatt et al. and prevents the ordering transition into a BCC ordered structure. This result is in line with the geometric intuition that the geometric Lloyd's algorithm corresponds to a type of fast quench. Comment: 11 pages, 6 figures
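
    As a rough illustration of the Quantizer energy and of one step of Lloyd's algorithm, the sketch below works in a periodic unit cube. It is not the method of the cited papers: instead of constructing exact Voronoi cells it approximates them by assigning random sample points to their nearest centre, and the energy is an unnormalised Monte Carlo estimate. All function names and parameters are assumptions.

```python
import numpy as np

def min_image(delta):
    """Wrap displacement vectors to the nearest periodic image (unit cube)."""
    return delta - np.round(delta)

def assign(samples, centres):
    """Give each sample to its nearest centre (approximate Voronoi cells)."""
    disp = min_image(samples[:, None, :] - centres[None, :, :])
    dist2 = np.sum(disp ** 2, axis=2)
    owner = np.argmin(dist2, axis=1)
    idx = np.arange(len(samples))
    return owner, disp[idx, owner], dist2[idx, owner]

def quantizer_energy(samples, centres):
    """Monte Carlo estimate of the summed second moments of all cells."""
    _, _, d2 = assign(samples, centres)
    return d2.mean()

def lloyd_step(samples, centres):
    """Move every centre to the (estimated) centroid of its own cell."""
    owner, disp, _ = assign(samples, centres)
    new_centres = centres.copy()
    for k in range(len(centres)):
        mask = owner == k
        if mask.any():
            # Shift by the mean displacement, then wrap back into the cube.
            new_centres[k] = (centres[k] + disp[mask].mean(axis=0)) % 1.0
    return new_centres
```

    Evaluated on a fixed sample set, each Lloyd step cannot increase this energy estimate, which matches the picture of the algorithm as a steepest-descent-like relaxation rather than a thermal cooling protocol.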

    The impact of atmospheric pCO2 on carbon isotope ratios of the atmosphere and ocean

    It is well known that the equilibration timescale for the isotopic ratios 13C/12C and 14C/12C in the ocean mixed layer is on the order of a decade, two orders of magnitude slower than for oxygen. Less widely appreciated is the fact that the equilibration timescale is quite sensitive to the speciation of dissolved inorganic carbon (DIC) in the mixed layer, scaling linearly with the ratio DIC/CO2, which varies inversely with atmospheric pCO2. Although this effect is included in models that resolve the role of carbon speciation in air-sea exchange, its role is often unrecognized, and it is not commonly considered in the interpretation of carbon isotope observations. Here we use a global three-dimensional ocean model to estimate the redistribution of the carbon isotopic ratios between the atmosphere and ocean due solely to variations in atmospheric pCO2. Under Last Glacial Maximum (LGM) pCO2, atmospheric Δ14C is increased by ~30‰ due to the speciation change, all else being equal, raising the surface reservoir age by about 250 years throughout most of the ocean. For 13C, enhanced surface disequilibrium under LGM pCO2 causes the upper ocean, atmosphere, and North Atlantic Deep Water δ13C to become at least 0.2‰ higher relative to deep waters ventilated by the Southern Ocean. Conversely, under high pCO2, rapid equilibration greatly decreases isotopic disequilibrium. As a result, during geological periods of high pCO2, vertical δ13C gradients may have been greatly weakened as a direct chemical consequence of the high pCO2, masquerading as very well ventilated or biologically dead Strangelove Oceans. The ongoing anthropogenic rise of pCO2 is accelerating the equilibration of the carbon isotopes in the ocean, lowering atmospheric Δ14C and weakening δ13C gradients within the ocean to a degree that is similar to the traditional fossil fuel “Suess” effect.

    Computer Mapping of Seasonal Groundwater Fluctuations for Two Differing Southern New Jersey Swamp Forests I

    Computer-generated maps (SYMAP, Harvard) of seasonal groundwater fluctuations for two New Jersey swamp forests, a red maple (Acer rubrum) swamp and an Atlantic white cedar (Chamaecyparis thyoides) swamp, are presented. Notable differences exist in water table behavior for the two swamp forests and are best accounted for by topographic differences. Other factors examined which might affect the hydrologic differences include vegetation and subsurface geologic differences.

    On the usefulness of the speech phase spectrum for pitch extraction

    © 2018 International Speech Communication Association. All rights reserved. Most frequency domain techniques for pitch extraction such as cepstrum, harmonic product spectrum (HPS) and summation residual harmonics (SRH) operate on the magnitude spectrum and turn it into a function in which the fundamental frequency emerges as argmax. In this paper, we investigate the extension of these three techniques to the phase and group delay (GD) domains. Our extensions exploit the observation that the bin at which F(magnitude) becomes maximum, for some monotonically increasing function F, is the same bin at which F(phase) has maximum negative slope and F(group delay) has its maximum value. To extract the pitch track from the speech phase spectrum, these techniques were coupled with the source-filter model in the phase domain that we proposed in earlier publications and a novel voicing detection algorithm proposed here. The accuracy and robustness of the phase-based pitch extraction techniques are illustrated and compared with their magnitude-based counterparts using six pitch evaluation metrics. On average, it is observed that the phase spectrum can be successfully employed in pitch tracking with comparable accuracy and robustness to the speech magnitude spectrum.
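
    For orientation, the magnitude-domain baseline that the abstract starts from can be sketched as follows. This is a plain harmonic product spectrum on a single frame, not the phase or group-delay extension proposed in the paper; the function name, the Hann window, the search range and the FFT size are assumptions. Summing log-magnitudes is used in place of multiplying magnitudes, which leaves the argmax unchanged.

```python
import numpy as np

def hps_pitch(frame, sample_rate, n_harmonics=4, n_fft=8192,
              f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame
    with a magnitude-domain harmonic product spectrum."""
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n_fft)) + 1e-12)

    # Add the spectrum decimated by 1, 2, ..., n_harmonics so that the
    # harmonics of the true F0 line up on the same bin.
    n_bins = len(log_mag) // n_harmonics
    hps = np.zeros(n_bins)
    for h in range(1, n_harmonics + 1):
        hps += log_mag[::h][:n_bins]

    # The pitch estimate is the argmax inside the allowed search range.
    freqs = np.arange(n_bins) * sample_rate / n_fft
    in_range = (freqs >= f_min) & (freqs <= f_max)
    best = np.argmax(np.where(in_range, hps, -np.inf))
    return freqs[best]
```

    A quick sanity check is to pass a synthetic frame built from a few harmonics of a known fundamental and confirm that the returned value lies within a few Hz of it; the resolution is limited by `sample_rate / n_fft`.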