Speech Emotion Diarization: Which Emotion Appears When?
Speech Emotion Recognition (SER) typically operates at the utterance level.
Emotions conveyed through speech, however, are better treated as discrete
speech events with definite temporal boundaries than as attributes of the
entire utterance. To reflect this fine-grained nature of speech emotions, we
propose a new task: Speech Emotion Diarization (SED). Just as Speaker
Diarization answers the question "Who speaks when?", Speech Emotion
Diarization answers the question "Which emotion appears when?". To facilitate
performance evaluation and establish a common benchmark for researchers, we
introduce the Zaion Emotion Dataset (ZED), an openly accessible speech emotion
dataset that includes non-acted emotions recorded in real-life conditions,
along with manually annotated boundaries of emotion segments within each
utterance. We provide competitive baselines and open-source the code and
pre-trained models.
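As a minimal sketch of what an SED system's output and scoring might look like: the segment representation, the default "neutral" label, and the frame-level error rate below are illustrative assumptions, not the paper's defined metric.

```python
# Illustrative Speech Emotion Diarization output and scoring.
# Assumptions (not from the paper): segments are (start, end, label)
# spans, and the error is a simple frame-level mismatch rate.

from dataclasses import dataclass

@dataclass
class EmotionSegment:
    start: float  # seconds
    end: float    # seconds
    label: str    # e.g. "neutral", "angry"

def frame_labels(segments, duration, hop=0.01, default="neutral"):
    """Rasterize labeled segments onto fixed-size frames of `hop` seconds."""
    n = int(round(duration / hop))
    labels = [default] * n
    for seg in segments:
        lo = int(round(seg.start / hop))
        hi = min(int(round(seg.end / hop)), n)
        for i in range(lo, hi):
            labels[i] = seg.label
    return labels

def frame_error_rate(reference, hypothesis, duration, hop=0.01):
    """Fraction of frames where the predicted emotion disagrees with reference."""
    ref = frame_labels(reference, duration, hop)
    hyp = frame_labels(hypothesis, duration, hop)
    return sum(r != h for r, h in zip(ref, hyp)) / len(ref)

# Example: a 3 s utterance with an angry burst in the middle.
ref = [EmotionSegment(0.8, 1.6, "angry")]
hyp = [EmotionSegment(0.9, 1.8, "angry")]
print(f"frame error rate: {frame_error_rate(ref, hyp, duration=3.0):.3f}")
```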
AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring
Global change is predicted to induce shifts in anuran acoustic behavior,
which can be studied through passive acoustic monitoring (PAM). Understanding
changes in calling behavior requires identifying anuran species, which is
challenging given the particular characteristics of Neotropical soundscapes.
In this paper, we introduce a large-scale multi-species dataset of anuran
amphibian calls recorded via PAM, comprising 27 hours of expert annotations
for 42 different species from two Brazilian biomes. We provide open access to
the dataset, including the raw recordings, experimental setup code, and a
benchmark with a baseline model for the fine-grained categorization problem.
Additionally, we highlight the challenges of the dataset to encourage machine
learning researchers to tackle anuran call identification in support of
conservation policy. All our experiments and resources can be found in our
GitHub repository: https://github.com/soundclim/anuraset
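To make the shape of such a benchmark concrete, here is a minimal loading sketch in Python. The metadata columns, directory layout, and sampling rate are illustrative assumptions; the actual experimental setup code lives in the GitHub repository above.

```python
# Hypothetical loader for a PAM call-identification dataset like AnuraSet.
# Assumed (not the real layout): a metadata CSV with columns
# "filename", "species", "start", "end", and WAVs under data/audio/.

import numpy as np
import pandas as pd
import librosa

SR = 22050  # assumed sampling rate for feature extraction

def load_clip(row, audio_dir="data/audio"):
    """Load one annotated call segment and return a log-mel spectrogram."""
    path = f"{audio_dir}/{row.filename}"
    y, sr = librosa.load(path, sr=SR, offset=row.start,
                         duration=row.end - row.start)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    return librosa.power_to_db(mel, ref=np.max)

meta = pd.read_csv("data/metadata.csv")
print(f"{len(meta)} annotated segments, {meta.species.nunique()} species")
features = [load_clip(row) for row in meta.itertuples()]
```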
Human-centred artificial intelligence for mobile health sensing: challenges and opportunities
Advances in wearable sensing and mobile computing have enabled the collection of health and well-being data outside of traditional laboratory and hospital settings, paving the way for a new era of mobile health. Meanwhile, artificial intelligence (AI) has made significant strides in various domains, demonstrating its potential to revolutionize healthcare. Devices can now help diagnose diseases, predict heart irregularities, and support human cognition. However, applying machine learning (ML) to mobile health sensing poses unique challenges: noisy sensor measurements, high-dimensional data, sparse and irregular time series, data heterogeneity, privacy concerns, and resource constraints. Despite wide recognition of the value of mobile sensing, leveraging these datasets has lagged behind other areas of ML, and obtaining quality annotations and ground truth for such data is often expensive or impractical. While recent large-scale longitudinal studies have shown promise in leveraging wearable sensor data for health monitoring and prediction, they also introduce new challenges for data modelling. This paper explores the challenges and opportunities of human-centred AI for mobile health, focusing on key sensing modalities such as audio, location, and activity tracking. We discuss the limitations of current approaches and propose potential solutions.
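One of the challenges named above, sparse and irregular time series, has a common first preprocessing step: resampling onto a regular grid while keeping an explicit missingness mask, so a model can distinguish "no reading" from a real value. A minimal sketch follows; the heart-rate signal, one-minute grid, and carry-forward imputation are illustrative assumptions, not the paper's method.

```python
# Sketch: regularizing irregular wearable readings with a missingness mask.

import numpy as np
import pandas as pd

# Irregularly timestamped heart-rate readings (simulated).
rng = np.random.default_rng(0)
times = pd.to_datetime("2024-01-01") + pd.to_timedelta(
    np.sort(rng.uniform(0, 3600, size=40)), unit="s")
hr = pd.Series(rng.normal(70, 8, size=40), index=times, name="heart_rate")

# Resample to a regular 1-minute grid; empty bins become NaN.
grid = hr.resample("1min").mean()
mask = grid.notna().astype(int)      # 1 = observed, 0 = missing
filled = grid.ffill().bfill()        # carry forward, back-fill the leading gap

model_input = pd.DataFrame({"hr": filled, "observed": mask})
print(model_input.head())
```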