5 research outputs found
Speaker Embeddings as Individuality Proxy for Voice Stress Detection
Since the mental states of the speaker modulate speech, stress introduced by
cognitive or physical loads could be detected in the voice. The existing voice
stress detection benchmark has shown that the audio embeddings extracted from
the Hybrid BYOL-S self-supervised model perform well. However, the benchmark
evaluates performance on each dataset separately and does not evaluate
performance across different types of stress and different languages.
Moreover, previous studies found strong individual differences in stress
susceptibility. This paper presents the design and development of a voice stress
detector trained on more than 100 speakers from nine language groups and five
different types of stress. We address individual variability in voice stress
analysis by adding speaker embeddings to the Hybrid BYOL-S features. The
proposed method significantly improves voice stress detection performance with
an input audio length of only 3-5 seconds.
Comment: 5 pages, 2 figures. Accepted at Interspeech 202
BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping
Methods for extracting audio and speech features have been studied since
pioneering work on spectrum analysis decades ago. Recent efforts are guided by
the ambition to develop general-purpose audio representations. For example,
deep neural networks can extract optimal embeddings if they are trained on
large audio datasets. This work extends existing methods based on
self-supervised learning by bootstrapping, proposes various encoder
architectures, and explores the effects of using different pre-training
datasets. Lastly, we present a novel training framework that produces a
hybrid audio representation combining handcrafted and data-driven learned
audio features. All the proposed representations were evaluated within the HEAR
NeurIPS 2021 challenge for auditory scene classification and timestamp
detection tasks. Our results indicate that the hybrid model with a
convolutional transformer as the encoder yields superior performance in most
HEAR challenge tasks.
Comment: Submitted to HEAR-PMLR 202
Leaping through tree space: continuous phylogenetic inference for rooted and unrooted trees
Phylogenetics is now fundamental in life sciences, providing insights into
the earliest branches of life and the origins and spread of epidemics. However,
finding suitable phylogenies from the vast space of possible trees remains
challenging. To address this problem, for the first time, we perform both tree
exploration and inference in a continuous space where the computation of
gradients is possible. This continuous relaxation allows for major leaps across
tree space in both rooted and unrooted trees, and is less susceptible to
convergence to local minima. Our approach outperforms the current best methods
for inference on unrooted trees and, in simulation, accurately infers the tree
and root in ultrametric cases. The approach remains effective on empirical
datasets containing very little data, which we demonstrate on the phylogeny of
jawed vertebrates. Indeed, only a few genes with an ultrametric signal were
generally sufficient for resolving the major lineages of vertebrates. With
cubic-time complexity and efficient optimisation via automatic differentiation,
our method presents an effective way forwards for exploring the most difficult,
data-deficient phylogenetic questions.
Comment: 13 pages, 4 figures, 14 supplementary pages, 2 supplementary figure
Commentary: The Risky Closed Economy: A Holistic, Longitudinal Approach to Studying Fear and Anxiety in Rodents
The limits of the constant-rate birth-death prior for phylogenetic tree topology inference
Birth-death models are stochastic processes describing speciation and extinction through time and across taxa, and are widely used in biology for inference of evolutionary timescales. Previous research has highlighted how the expected trees under constant-rate birth-death (crBD) tend to differ from empirical trees, for example with respect to the amount of phylogenetic imbalance. However, our understanding of how trees differ between crBD and the signal in empirical data remains incomplete. In this Point of View, we aim to expose the degree to which crBD differs from empirically inferred phylogenies and test the limits of the model in practice. Using a wide range of topology indices to compare crBD expectations against a comprehensive dataset of 1189 empirically estimated trees, we confirm that crBD trees frequently differ topologically from empirical trees. To place this in the context of standard practice in the field, we conducted a meta-analysis for a subset of the empirical studies. When comparing studies that used crBD priors with those that used other non-BD Bayesian and non-Bayesian methods, we do not find any significant differences in tree topology inferences. To scrutinize this finding for the case of highly imbalanced trees, we selected the 100 trees with the greatest imbalance from our dataset, simulated sequence data for these tree topologies under various evolutionary rates, and re-inferred the trees under maximum likelihood and using crBD in a Bayesian setting. We find that when the substitution rate is low, the crBD prior results in overly balanced trees, but the tendency is negligible when substitution rates are sufficiently high. Overall, our findings demonstrate the general robustness of crBD priors across a broad range of phylogenetic inference scenarios, but also highlight that empirically observed phylogenetic imbalance is highly improbable under crBD, leading to systematic bias in data sets with limited information content.