Pediatric Sleep Scoring In-the-wild from Millions of Multi-channel EEG Signals
Sleep is critical to the health and development of infants, children, and
adolescents, but pediatric sleep is severely under-researched compared to adult
sleep in the context of machine learning for health and well-being. Here, we
present the first automated pediatric sleep scoring results on a recent
large-scale sleep study dataset that was collected during standard clinical
care. We develop a transformer-based deep neural network model that learns to
classify five sleep stages from millions of multi-channel electroencephalogram
(EEG) signals with 78% overall accuracy. Further, we conduct an in-depth
analysis of the model performance based on patient demographics and EEG
channels.
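As an illustration of the kind of model the abstract describes (not the paper's exact architecture), the sketch below shows a small transformer encoder that classifies 30-second multi-channel EEG epochs into five sleep stages; the channel count, patch length, and layer sizes are assumptions chosen for clarity.

```python
# Minimal sketch, assuming raw EEG epochs are split into fixed-length patches
# that become transformer tokens. Sizes below are illustrative, not the paper's.
import torch
import torch.nn as nn

class SleepStageTransformer(nn.Module):
    def __init__(self, n_channels=6, patch_len=100, d_model=128,
                 n_heads=4, n_layers=4, n_stages=5):
        super().__init__()
        self.patch_len = patch_len
        # Each token is one patch of raw EEG spanning all channels.
        self.patch_embed = nn.Linear(n_channels * patch_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_stages)  # e.g. Wake, N1, N2, N3, REM

    def forward(self, x):
        # x: (batch, n_channels, n_samples)
        b, c, _ = x.shape
        patches = x.unfold(-1, self.patch_len, self.patch_len)  # (b, c, n_patches, patch_len)
        patches = patches.permute(0, 2, 1, 3).reshape(b, -1, c * self.patch_len)
        h = self.encoder(self.patch_embed(patches))
        return self.head(h.mean(dim=1))  # mean-pool tokens, then classify the epoch

# Example: a batch of 8 epochs, 6 EEG channels, 30 s sampled at 100 Hz.
logits = SleepStageTransformer()(torch.randn(8, 6, 3000))  # -> (8, 5)
```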
On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors
Out-of-distribution (OOD) detection is concerned with identifying data points
that do not belong to the same distribution as the model's training data. For
the safe deployment of predictive models in a real-world environment, it is
critical to avoid making confident predictions on OOD inputs as it can lead to
potentially dangerous consequences. However, OOD detection largely remains an
under-explored area in the audio (and speech) domain. This is despite the fact
that audio is a central modality for many tasks, such as speaker diarization,
automatic speech recognition, and sound event detection. To address this, we
propose to leverage the feature space of the model with deep k-nearest neighbors to
detect OOD samples. We show that this simple and flexible method effectively
detects OOD inputs across a broad category of audio (and speech) datasets.
Specifically, it improves the false positive rate (FPR@TPR95) by 17% and the
AUROC score by 7% compared to prior techniques.
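The deep-nearest-neighbor scoring idea can be sketched as follows; this is a minimal illustration assuming L2-normalised embeddings from some frozen audio encoder, with the choice of k and the 95%-TPR thresholding as placeholder details rather than the paper's exact configuration.

```python
# Minimal sketch of feature-space OOD detection with deep k-nearest neighbors:
# score a test input by its distance to the k-th nearest training embedding.
import numpy as np

def build_feature_bank(train_feats: np.ndarray) -> np.ndarray:
    # L2-normalise the in-distribution (training) embeddings once, offline.
    return train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)

def knn_ood_score(bank: np.ndarray, test_feats: np.ndarray, k: int = 10) -> np.ndarray:
    # Score = distance to the k-th nearest training embedding;
    # larger values indicate the input is more likely out-of-distribution.
    z = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    dists = np.sqrt(np.clip(2.0 - 2.0 * (z @ bank.T), 0.0, None))  # unit vectors
    return np.sort(dists, axis=1)[:, k - 1]

# Usage: pick the threshold so that ~95% of in-distribution inputs are accepted (TPR = 95%).
bank = build_feature_bank(np.random.randn(1000, 256))
threshold = np.quantile(knn_ood_score(bank, bank), 0.95)
is_ood = knn_ood_score(bank, np.random.randn(16, 256)) > threshold
```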
Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices
This work introduces BRILLsson, a novel binary neural network-based
representation learning model for a broad range of non-semantic speech tasks.
We train the model with knowledge distillation from a large and real-valued
TRILLsson model with only a fraction of the dataset used to train TRILLsson.
The resulting BRILLsson models are only 2 MB in size with a latency of less than
8ms, making them suitable for deployment in low-resource devices such as
wearables. We evaluate BRILLsson on eight benchmark tasks (including but not
limited to spoken language identification, emotion recognition, health
condition diagnosis, and keyword spotting), and demonstrate that our proposed
ultra-light and low-latency models perform as well as large-scale models.
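A rough sketch of such a distillation setup is shown below, under the assumptions that a frozen real-valued teacher (standing in for TRILLsson) provides target embeddings and the student uses sign-binarised weights with a straight-through estimator; the layer sizes and the plain MSE objective are illustrative, not BRILLsson's exact recipe.

```python
# Sketch: distill a real-valued teacher's embeddings into a small binary student.
import torch
import torch.nn as nn

class BinaryLinear(nn.Linear):
    def forward(self, x):
        # Binarize weights to {-1, +1}; the straight-through trick keeps gradients
        # flowing to the underlying real-valued weights during training.
        w_bin = torch.sign(self.weight)
        w = w_bin.detach() + self.weight - self.weight.detach()
        return nn.functional.linear(x, w, self.bias)

student = nn.Sequential(BinaryLinear(64, 256), nn.ReLU(), BinaryLinear(256, 1024))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

def distill_step(audio_features, teacher_embedding):
    # Match the (frozen) teacher's embedding with an MSE distillation loss.
    loss = nn.functional.mse_loss(student(audio_features), teacher_embedding)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example step with random stand-ins for audio features and teacher outputs.
print(distill_step(torch.randn(32, 64), torch.randn(32, 1024)))
```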
Multi-task Self-Supervised Learning for Human Activity Detection
Deep learning methods are successfully used in applications pertaining to
ubiquitous computing, health, and well-being. Specifically, the area of human
activity recognition (HAR) has been transformed primarily by convolutional and
recurrent neural networks, thanks to their ability to learn semantic
representations from raw input. However, extracting generalizable features
requires massive amounts of well-curated data, which is notoriously
challenging to obtain due to privacy issues and annotation costs. Therefore,
unsupervised representation learning is of prime importance to leverage the
vast amount of unlabeled data produced by smart devices. In this work, we
propose a novel self-supervised technique for feature learning from sensory
data that does not require access to any form of semantic labels. We learn a
multi-task temporal convolutional network to recognize transformations applied
on an input signal. By exploiting these transformations, we demonstrate that
simple binary classification auxiliary tasks result in a strong
supervisory signal for extracting useful features for the downstream task. We
extensively evaluate the proposed approach on several publicly available
datasets for smartphone-based HAR in unsupervised, semi-supervised, and
transfer learning settings. Our method achieves performance levels superior to
or comparable with fully-supervised networks, and it performs significantly
better than autoencoders. Notably, for the semi-supervised case, the
self-supervised features substantially boost the detection rate by attaining a
kappa score between 0.7 and 0.8 with only 10 labeled examples per class. We observe
similarly impressive performance even when the features are transferred from a
different data source. While this paper focuses on HAR as the application
domain, the proposed technique is general and could be applied to a wide
variety of problems in other areas.
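To make the pretext task concrete, the sketch below applies a small set of assumed signal transformations (noise, scaling, negation, time reversal) to raw sensor windows and trains one binary head per transformation to predict whether it was applied; the transformation set and the tiny 1-D convolutional trunk are illustrative stand-ins for the paper's setup.

```python
# Sketch of transformation-recognition pretext tasks for sensor windows.
import numpy as np
import torch
import torch.nn as nn

TRANSFORMS = {
    "noise":   lambda x: x + 0.05 * np.random.randn(*x.shape),
    "scale":   lambda x: x * np.random.uniform(0.7, 1.3),
    "negate":  lambda x: -x,
    "reverse": lambda x: x[:, ::-1].copy(),
}

def make_pretext_batch(windows):
    # windows: (batch, channels, time). Randomly apply each transform; the
    # binary "applied / not applied" indicators become the supervisory signal.
    xs, ys = [], []
    for w in windows:
        y = np.random.randint(0, 2, size=len(TRANSFORMS))
        for applied, fn in zip(y, TRANSFORMS.values()):
            if applied:
                w = fn(w)
        xs.append(w)
        ys.append(y)
    return (torch.tensor(np.stack(xs), dtype=torch.float32),
            torch.tensor(np.stack(ys), dtype=torch.float32))

trunk = nn.Sequential(nn.Conv1d(3, 32, 9, padding=4), nn.ReLU(),
                      nn.AdaptiveAvgPool1d(1), nn.Flatten())
heads = nn.Linear(32, len(TRANSFORMS))  # one binary logit per auxiliary task

x, y = make_pretext_batch(np.random.randn(16, 3, 400))
loss = nn.functional.binary_cross_entropy_with_logits(heads(trunk(x)), y)
```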
Federated Fine-Tuning of Foundation Models via Probabilistic Masking
Foundation Models (FMs) have revolutionized machine learning with their
adaptability and high performance across tasks; yet, their integration into
Federated Learning (FL) is challenging due to substantial communication
overhead from their extensive parameterization. Current communication-efficient
FL strategies, such as gradient compression, reduce bitrates to around 1
bit-per-parameter (bpp). However, these approaches fail to harness the
characteristics of FMs, with their large number of parameters still posing a
challenge to communication efficiency, even at these bitrate regimes. In this
work, we present DeltaMask, a novel method that efficiently fine-tunes FMs in
FL at an ultra-low bitrate, well below 1 bpp. DeltaMask employs stochastic
masking to detect highly effective subnetworks within FMs and leverages
stochasticity and sparsity in client masks to compress updates into a compact
grayscale image using probabilistic filters, deviating from traditional weight
training approaches. Our comprehensive evaluations across various datasets and
architectures demonstrate that DeltaMask efficiently achieves bitrates as low as
0.09 bpp, enhancing communication efficiency while maintaining FM performance,
as measured on 8 datasets and 5 pre-trained models of various network
architectures.
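A heavily simplified sketch of the masking idea follows; it only shows Bernoulli-sampling a subnetwork mask from per-parameter probabilities and packing the bits for transmission, and does not reproduce DeltaMask's probabilistic-filter and grayscale-image encoding.

```python
# Simplified sketch: sample a binary subnetwork mask from learned probabilities
# and ship only the packed bits instead of dense weight deltas.
import numpy as np

def sample_and_pack_mask(probs: np.ndarray) -> bytes:
    # Bernoulli-sample a mask over the frozen pre-trained weights, then pack
    # 8 mask bits per byte (~1 bpp before any further compression of the
    # sparse, low-entropy bit stream).
    mask = (np.random.rand(*probs.shape) < probs).astype(np.uint8)
    return np.packbits(mask).tobytes()

def unpack_mask(payload: bytes, n_params: int) -> np.ndarray:
    return np.unpackbits(np.frombuffer(payload, dtype=np.uint8))[:n_params]

# Example: mostly-zero probabilities give a sparse mask that compresses well downstream.
probs = np.full(10_000, 0.02)
payload = sample_and_pack_mask(probs)
mask = unpack_mask(payload, 10_000)
print(len(payload), "bytes for", mask.sum(), "active weights")
```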
Active Learning of Non-semantic Speech Tasks with Pretrained Models
Pretraining neural networks with massive unlabeled datasets has become
popular as it equips the deep models with a better prior to solve downstream
tasks. However, this approach generally assumes that for downstream tasks, we
have access to annotated data of sufficient size. In this work, we propose
ALOE, a novel system for improving the data- and label-efficiency of
non-semantic speech tasks with active learning (AL). ALOE uses pre-trained
models in conjunction with active learning to label data incrementally and
learns classifiers for downstream tasks, thereby mitigating the need to acquire
labeled data beforehand. We demonstrate the effectiveness of ALOE on a wide
range of tasks, uncertainty-based acquisition functions, and model
architectures. Training a linear classifier on top of a frozen encoder with
ALOE is shown to achieve performance similar to several baselines that utilize
the entire labeled dataset.
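The active-learning loop can be illustrated roughly as below, assuming embeddings from a frozen pre-trained encoder, a linear probe retrained each round, and predictive entropy as one uncertainty-based acquisition function; the labeling budget and model choices are placeholders rather than ALOE's exact setup.

```python
# Sketch: uncertainty-based active learning on top of frozen-encoder embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

def entropy_acquisition(probs: np.ndarray) -> np.ndarray:
    # Higher predictive entropy = more uncertain = more valuable to label.
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def active_learning_round(clf, labeled_x, labeled_y, pool_x, budget=32):
    clf.fit(labeled_x, labeled_y)                 # retrain the linear probe
    scores = entropy_acquisition(clf.predict_proba(pool_x))
    return np.argsort(scores)[-budget:]           # indices to send to an annotator

# Example with random stand-ins for frozen-encoder embeddings.
rng = np.random.default_rng(0)
labeled_x, labeled_y = rng.normal(size=(64, 128)), rng.integers(0, 4, 64)
pool_x = rng.normal(size=(2048, 128))
query_idx = active_learning_round(LogisticRegression(max_iter=1000),
                                  labeled_x, labeled_y, pool_x)
```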
Communication-Efficient Federated Learning through Adaptive Weight Clustering and Server-Side Distillation
Federated Learning (FL) is a promising technique for the collaborative
training of deep neural networks across multiple devices while preserving data
privacy. Despite its potential benefits, FL is hindered by excessive
communication costs due to repeated server-client communication during
training. To address this challenge, model compression techniques such as
sparsification and weight clustering are applied; however, these often require
modifying the underlying model aggregation scheme or involve cumbersome
hyperparameter tuning, where the latter not only adjusts the model's compression
rate but also limits the model's potential for continuous improvement over
growing data. In this
paper, we propose FedCompress, a novel approach that combines dynamic weight
clustering and server-side knowledge distillation to reduce communication costs
while learning highly generalizable models. Through a comprehensive evaluation
on diverse public datasets, we demonstrate the efficacy of our approach
compared to baselines in terms of communication costs and inference speed.
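The weight-clustering half of the approach can be sketched as follows (the dynamic cluster-count schedule and the server-side distillation step are omitted); the cluster count and layer shape are illustrative assumptions.

```python
# Sketch: quantise a layer's weights to k shared centroids so each weight can
# be transmitted as a small index plus a short codebook.
import numpy as np
from sklearn.cluster import KMeans

def cluster_weights(weights: np.ndarray, n_clusters: int = 32):
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(flat)
    codebook = km.cluster_centers_.ravel()   # k float centroids to transmit
    indices = km.labels_.astype(np.uint8)    # log2(k)-bit index per weight
    return codebook, indices

def reconstruct(codebook: np.ndarray, indices: np.ndarray, shape):
    return codebook[indices].reshape(shape)

# Example: 32 centroids ~= 5 bits per weight instead of 32-bit floats.
w = np.random.randn(256, 128).astype(np.float32)
codebook, idx = cluster_weights(w)
w_hat = reconstruct(codebook, idx, w.shape)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```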