The Sheffield Wargames Corpus.
Recognition of speech in natural environments is a challenging task, even more so if this involves conversations between several speakers. Work on meeting recognition has addressed some of the significant challenges, mostly targeting formal, business style meetings where people are mostly in a static position in a room. Only limited data is available that contains high quality near and far field data from real interactions between participants. In this paper we present a new corpus for research on speech recognition, speaker tracking and diarisation, based on recordings of native speakers of English playing a table-top wargame. The Sheffield Wargames Corpus comprises 7 hours of data from 10 recording sessions, obtained from 96 microphones, 3 video cameras and, most importantly, 3D location data provided by a sensor tracking system. The corpus represents a unique resource that provides for the first time location tracks (1.3Hz) of speakers that are constantly moving and talking. The corpus is available for research purposes, and includes annotated development and evaluation test sets. Baseline results for close-talking and far field sets are included in this paper.
Requirements for tracking radar for falling spheres
Error analysis on radar tracking of falling sphere
Source-filter Separation of Speech Signal in the Phase Domain
Deconvolution of the speech excitation (source) and vocal tract
(filter) components through log-magnitude spectral processing
is well-established and has led to the well-known cepstral features
used in a multitude of speech processing tasks. This paper
presents a novel source-filter decomposition based on processing
in the phase domain. We show that separation between
source and filter in the log-magnitude spectra is far from
perfect, leading to loss of vital vocal tract information. It is
demonstrated that the same task can be better performed by
trend and fluctuation analysis of the phase spectrum of the
minimum-phase component of speech, which can be computed
via the Hilbert transform. Trend and fluctuation can be separated
through low-pass filtering of the phase, using additivity of
vocal tract and source in the phase domain. This results in separated
signals which have a clear relation to the vocal tract and
excitation components. The effectiveness of the method is put
to test in a speech recognition task. The vocal tract component
extracted in this way is used as the basis of a feature extraction
algorithm for speech recognition on the Aurora-2 database.
The recognition results show up to 8.5% absolute improvement
in comparison with MFCC features on average (0-20 dB).
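The trend/fluctuation split described in this abstract can be illustrated with a minimal sketch. It assumes the phase spectrum of the minimum-phase component is already available as a list of values per frequency bin, and it uses a centred moving average as the low-pass filter; the abstract does not name the filter, so that choice, the function names and the `half_width` parameter are all hypothetical. Reading the smooth trend as the vocal-tract-related part and the residual as the excitation-related part is our interpretation of the abstract.

```python
def moving_average(seq, half_width):
    """Centred moving average; the window is truncated at both edges."""
    n = len(seq)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(seq[lo:hi]) / (hi - lo))
    return out


def split_trend_fluctuation(phase, half_width=8):
    """Split a phase sequence into a smooth trend and a residual fluctuation.

    Assumption: a moving average stands in for the low-pass filter.
    By construction trend[k] + fluctuation[k] == phase[k], which mirrors
    the additivity of the two components in the phase domain.
    """
    trend = moving_average(phase, half_width)
    fluctuation = [p - t for p, t in zip(phase, trend)]
    return trend, fluctuation
```

Because the fluctuation is defined as the residual, the two outputs always sum back to the input phase; how well they correspond to the vocal tract and the excitation depends on the filter cut-off, which would need tuning on real speech.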
Learning temporal clusters using capsule routing for speech emotion recognition
Emotion recognition from speech plays a significant role in adding emotional intelligence to machines and making human-machine interaction more natural. One of the key challenges from a machine learning standpoint is to extract patterns which bear maximum correlation with the emotion information encoded in this signal while being as insensitive as possible to other types of information carried by speech. In this paper, we propose a novel temporal modelling framework for robust emotion classification using a bidirectional long short-term memory network (BLSTM), a CNN and Capsule networks. The BLSTM deals with the temporal dynamics of the speech signal by effectively representing forward/backward contextual information, while the CNN along with the dynamic routing of the Capsule net learns temporal clusters; together these provide a state-of-the-art technique for classifying the extracted patterns. The proposed approach was compared with a wide range of architectures on the FAU-Aibo and RAVDESS corpora, and remarkable gains over state-of-the-art systems were obtained. For FAU-Aibo and RAVDESS, accuracies of 77.6% and 56.2% were achieved, respectively, which are 3% and 14% (absolute) higher than the best-reported results for the respective tasks.
Low-temperature statistical mechanics of the QuanTizer problem: fast quenching and equilibrium cooling of the three-dimensional Voronoi Liquid
The Quantizer problem is a tessellation optimisation problem where point
configurations are identified such that the Voronoi cells minimise the second
moment of the volume distribution. While the ground state (optimal state) in 3D
is almost certainly the body-centered cubic lattice, disordered and effectively
hyperuniform states with energies very close to the ground state exist that
result as stable states in an evolution through the geometric Lloyd's algorithm
[Klatt et al. Nat. Commun., 10, 811 (2019)]. When considered as a statistical
mechanics problem at finite temperature, the same system has been termed the
'Voronoi Liquid' by [Ruscher et al. EPL 112, 66003 (2015)]. Here we investigate
the cooling behaviour of the Voronoi liquid with a particular view to the
stability of the effectively hyperuniform disordered state. As a confirmation
of the results by Ruscher et al., we observe, by both molecular dynamics and
Monte Carlo simulations, that upon slow quasi-static equilibrium cooling, the
Voronoi liquid crystallises from a disordered configuration into the
body-centered cubic configuration. By contrast, upon sufficiently fast
non-equilibrium cooling (and not just in the limit of a maximally fast quench)
the Voronoi liquid adopts similar states as the effectively hyperuniform
inherent structures identified by Klatt et al. and prevents the ordering
transition into a BCC ordered structure. This result is in line with the
geometric intuition that the geometric Lloyd's algorithm corresponds to a type
of fast quench.
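As a rough illustration of the geometric Lloyd's algorithm mentioned in this abstract, the following toy sketch works in a two-dimensional periodic unit square instead of the three-dimensional setting of the paper, and approximates each Voronoi cell by the sample points nearest to its generator. The function names, the sampling approach and the 2D restriction are assumptions made for brevity, not the authors' implementation.

```python
def min_image(d):
    """Wrap a coordinate difference to the nearest periodic image (unit box)."""
    return d - round(d)


def assign(samples, gens):
    """For each sample: (nearest generator index, dx, dy, squared distance)."""
    out = []
    for sx, sy in samples:
        best = None
        for k, (gx, gy) in enumerate(gens):
            dx, dy = min_image(sx - gx), min_image(sy - gy)
            d2 = dx * dx + dy * dy
            if best is None or d2 < best[3]:
                best = (k, dx, dy, d2)
        out.append(best)
    return out


def energy(samples, gens):
    """Mean squared distance from each sample to its nearest generator."""
    return sum(a[3] for a in assign(samples, gens)) / len(samples)


def lloyd_step(samples, gens):
    """Move every generator to the centroid of its (sampled) Voronoi cell."""
    sums = [[0.0, 0.0, 0] for _ in gens]
    for k, dx, dy, _ in assign(samples, gens):
        sums[k][0] += dx
        sums[k][1] += dy
        sums[k][2] += 1
    new_gens = []
    for (gx, gy), (sx, sy, n) in zip(gens, sums):
        if n == 0:
            new_gens.append((gx, gy))  # empty cell: leave the generator in place
        else:
            new_gens.append(((gx + sx / n) % 1.0, (gy + sy / n) % 1.0))
    return new_gens
```

Each step cannot raise the sampled energy, since it always moves downhill and never accepts an uphill move. That greedy behaviour is the sense in which the iteration resembles a fast quench in contrast to slow equilibrium cooling.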
The impact of atmospheric pCO2 on carbon isotope ratios of the atmosphere and ocean
It is well known that the equilibration timescale for the isotopic ratios 13C/12C and 14C/12C in the ocean mixed layer is on the order of a decade, 2 orders of magnitude slower than for oxygen. Less widely appreciated is the fact that the equilibration timescale is quite sensitive to the speciation of dissolved inorganic carbon (DIC) in the mixed layer, scaling linearly with the ratio DIC/CO2, which varies inversely with atmospheric pCO2. Although this effect is included in models that resolve the role of carbon speciation in air-sea exchange, its role is often unrecognized, and it is not commonly considered in the interpretation of carbon isotope observations. Here we use a global three-dimensional ocean model to estimate the redistribution of the carbon isotopic ratios between the atmosphere and ocean due solely to variations in atmospheric pCO2. Under Last Glacial Maximum (LGM) pCO2, atmospheric Δ14C is increased by ~30‰ due to the speciation change, all else being equal, raising the surface reservoir age by about 250 years throughout most of the ocean. For 13C, enhanced surface disequilibrium under LGM pCO2 causes the upper ocean, atmosphere, and North Atlantic Deep Water δ13C to become at least 0.2‰ higher relative to deep waters ventilated by the Southern Ocean. Conversely, under high pCO2, rapid equilibration greatly decreases isotopic disequilibrium. As a result, during geological periods of high pCO2, vertical δ13C gradients may have been greatly weakened as a direct chemical consequence of the high pCO2, masquerading as very well ventilated or biologically dead Strangelove Oceans. The ongoing anthropogenic rise of pCO2 is accelerating the equilibration of the carbon isotopes in the ocean, lowering atmospheric Δ14C and weakening δ13C gradients within the ocean to a degree that is similar to the traditional fossil fuel “Suess” effect
Computer Mapping of Seasonal Groundwater Fluctuations for Two Differing Southern New Jersey Swamp Forests I
Computer-generated maps (SYMAP, Harvard) of seasonal groundwater fluctuations for two New Jersey swamp forests, a red maple (Acer rubrum) swamp and an Atlantic white cedar (Chamaecyparis thyoides) swamp, are presented. Notable differences exist in water table behavior for the two swamp forests and are best accounted for by topographic differences. Other factors examined which might affect the hydrologic differences include vegetation and subsurface geologic differences
On the usefulness of the speech phase spectrum for pitch extraction
© 2018 International Speech Communication Association. All rights reserved. Most frequency domain techniques for pitch extraction, such as cepstrum, harmonic product spectrum (HPS) and summation residual harmonics (SRH), operate on the magnitude spectrum and turn it into a function in which the fundamental frequency emerges as the argmax. In this paper, we investigate the extension of these three techniques to the phase and group delay (GD) domains. Our extensions exploit the observation that the bin at which F(magnitude) becomes maximum, for some monotonically increasing function F, is equivalent to the bin at which F(phase) has maximum negative slope and F(groupdelay) has the maximum value. To extract the pitch track from the speech phase spectrum, these techniques were coupled with the source-filter model in the phase domain that we proposed in earlier publications and a novel voicing detection algorithm proposed here. The accuracy and robustness of the phase-based pitch extraction techniques are illustrated and compared with their magnitude-based counterparts using six pitch evaluation metrics. On average, it is observed that the phase spectrum can be successfully employed in pitch tracking with comparable accuracy and robustness to the speech magnitude spectrum.
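For readers unfamiliar with the magnitude-domain baseline named in this abstract, here is a minimal sketch of the conventional harmonic product spectrum, in which the fundamental frequency emerges as the argmax. It is not the phase or group-delay extension the paper proposes. The function names, the naive DFT, the absence of a window function and the default of four harmonics are all assumptions chosen to keep the example short.

```python
import cmath
import math


def magnitude_spectrum(x):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (adequate for short frames)."""
    n = len(x)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def hps_pitch(frame, fs, n_harmonics=4):
    """Estimate F0 in Hz as the argmax of the harmonic product spectrum.

    For each candidate bin k, the magnitudes at k, 2k, ..., n_harmonics*k
    are multiplied; the true fundamental lines up with all its harmonics.
    """
    mags = magnitude_spectrum(frame)
    k_max = (len(mags) - 1) // n_harmonics
    best_k, best_val = 1, -1.0
    for k in range(1, k_max + 1):  # bin 0 (DC) is skipped
        prod = 1.0
        for r in range(1, n_harmonics + 1):
            prod *= mags[r * k]
        if prod > best_val:
            best_k, best_val = k, prod
    return best_k * fs / len(frame)
```

The frequency resolution is limited to one DFT bin (`fs / len(frame)`), and the estimate is only meaningful for voiced frames, which is why a separate voicing decision is needed in practice.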
- …