INSET: Sentence Infilling with INter-SEntential Transformer
Missing sentence generation (or sentence infilling) enables a wide range of
applications in natural language generation, such as document auto-completion
and meeting note expansion. This task asks the model to generate intermediate
missing sentences that can syntactically and semantically bridge the
surrounding context. Solving the sentence infilling task requires techniques in
natural language processing ranging from understanding to discourse-level
planning to generation. In this paper, we propose a framework that decomposes
the task and addresses these three aspects separately, leveraging the power of
existing large-scale pre-trained models such as BERT and GPT-2. We empirically
demonstrate the effectiveness of our model in learning a sentence
representation for generation and further generating a missing sentence that
fits the context.
Comment: Y.H. and Y.Z. contributed equally to this work. v2: published version
with updated results and references
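
The abstract outlines a three-stage architecture. Below is a minimal PyTorch sketch of that decomposition: a sentence encoder (standing in for BERT) for understanding, an inter-sentential transformer for discourse-level planning over sentence embeddings, and an autoregressive decoder (standing in for GPT-2) for generation. All module names, sizes, and the toy vocabulary are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of the understanding / planning / generation decomposition.
# Sizes and modules are assumptions, not the INSET implementation.
import torch
import torch.nn as nn

VOCAB, DIM, MAX_LEN = 1000, 256, 32

class SentenceEncoder(nn.Module):
    """Maps a token sequence to one fixed-size sentence vector (BERT stand-in)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens):                    # tokens: (batch, seq)
        h = self.encoder(self.embed(tokens))      # (batch, seq, DIM)
        return h.mean(dim=1)                      # pooled sentence vector

class InterSententialTransformer(nn.Module):
    """Predicts the missing sentence's vector from surrounding sentence vectors."""
    def __init__(self):
        super().__init__()
        self.mask_vec = nn.Parameter(torch.zeros(DIM))
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, ctx_vecs, gap_index):       # ctx_vecs: (batch, n_sents, DIM)
        ctx_vecs = ctx_vecs.clone()
        ctx_vecs[:, gap_index] = self.mask_vec    # mark the missing slot
        return self.encoder(ctx_vecs)[:, gap_index]

class SentenceDecoder(nn.Module):
    """Generates tokens conditioned on a sentence vector (GPT-2 stand-in)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerDecoderLayer(DIM, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(DIM, VOCAB)

    def forward(self, tokens, sent_vec):          # sent_vec: (batch, DIM)
        L = tokens.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.decoder(self.embed(tokens), sent_vec.unsqueeze(1), tgt_mask=causal)
        return self.out(h)                        # next-token logits

# Toy forward pass: 4 context sentences, the 3rd (index 2) is missing.
enc, planner, dec = SentenceEncoder(), InterSententialTransformer(), SentenceDecoder()
sents = torch.randint(0, VOCAB, (1, 4, MAX_LEN))  # (batch, n_sents, seq)
ctx = torch.stack([enc(sents[:, i]) for i in range(4)], dim=1)
pred_vec = planner(ctx, gap_index=2)
logits = dec(torch.randint(0, VOCAB, (1, MAX_LEN)), pred_vec)
print(logits.shape)  # (1, MAX_LEN, VOCAB)
```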
Large-scale Training of Foundation Models for Wearable Biosignals
Tracking biosignals is crucial for monitoring wellness and preempting the
development of severe medical conditions. Today, wearable devices can
conveniently record various biosignals, creating the opportunity to monitor
health status without disruption to one's daily routine. Despite the widespread use
of wearable devices and existing digital biomarkers, the absence of curated
data with annotated medical labels hinders the development of new biomarkers to
measure common health conditions. In fact, medical datasets are usually small
compared with those in other domains, which is an obstacle to developing neural
network models for biosignals. To address this challenge, we have employed
self-supervised learning using the unlabeled sensor data collected under
informed consent from the large longitudinal Apple Heart and Movement Study
(AHMS) to train foundation models for two common biosignals:
photoplethysmography (PPG) and electrocardiogram (ECG) recorded on Apple Watch.
We curated PPG and ECG datasets from AHMS that include data from ~141K
participants spanning ~3 years. Our self-supervised learning framework includes
participant-level positive-pair selection, a stochastic augmentation module, and
a regularized contrastive loss optimized with momentum training; it generalizes
well to both PPG and ECG modalities. We show that the pre-trained foundation
models readily encode information regarding participants' demographics and
health conditions. To the best of our knowledge, this is the first study that
builds foundation models using large-scale PPG and ECG data collected via
wearable consumer devices – prior works have commonly used
smaller-size datasets collected in clinical and experimental settings. We
believe PPG and ECG foundation models can enhance future wearable devices by
reducing the reliance on labeled data, and that they hold the potential to help
users improve their health.
Comment: Camera-ready version for ICLR 2024
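
For concreteness, here is a minimal PyTorch sketch of the pre-training recipe the abstract outlines: participant-level positive pairs (two segments drawn from the same participant), stochastic augmentation, a momentum (EMA) encoder, and an InfoNCE-style contrastive loss with a KoLeo-style spread regularizer. The encoder architecture, augmentation choices, and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
# Sketch of momentum-based regularized contrastive pre-training on biosignals.
# Architecture, augmentations, and constants are assumptions for illustration.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def augment(x):
    """Stochastic augmentation for 1-D biosignal segments (assumed choices)."""
    x = x + 0.05 * torch.randn_like(x)                   # additive Gaussian noise
    scale = 1.0 + 0.1 * (2 * torch.rand(x.size(0), 1) - 1)
    return x * scale                                     # random amplitude scaling

class Encoder(nn.Module):
    """Tiny 1-D CNN embedding a segment into a unit-norm vector."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, dim))

    def forward(self, x):                                # x: (batch, samples)
        return F.normalize(self.net(x.unsqueeze(1)), dim=-1)

def info_nce(q, k, temperature=0.1):
    """Contrastive loss: row i of q matches row i of k; other rows are negatives."""
    logits = q @ k.t() / temperature
    return F.cross_entropy(logits, torch.arange(q.size(0)))

def koleo(q):
    """KoLeo-style regularizer: push each embedding away from its nearest neighbor."""
    d = torch.cdist(q, q) + 1e9 * torch.eye(q.size(0))   # mask self-distances
    return -torch.log(d.min(dim=1).values + 1e-8).mean()

online = Encoder()
momentum = copy.deepcopy(online)                         # EMA target encoder
for p in momentum.parameters():
    p.requires_grad_(False)
opt = torch.optim.AdamW(online.parameters(), lr=1e-3)

# One training step on toy data: seg_a and seg_b are two segments drawn from
# the same participant, which is what defines the positive pair.
seg_a, seg_b = torch.randn(16, 1024), torch.randn(16, 1024)
q = online(augment(seg_a))
with torch.no_grad():
    k = momentum(augment(seg_b))
loss = info_nce(q, k) + 0.1 * koleo(q)                   # regularized contrastive loss
loss.backward()
opt.step()
opt.zero_grad()

# EMA update of the momentum encoder ("momentum training").
with torch.no_grad():
    for p_o, p_m in zip(online.parameters(), momentum.parameters()):
        p_m.mul_(0.99).add_(0.01 * p_o)
```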