3 research outputs found
Personalized Prediction of Recurrent Stress Events Using Self-Supervised Learning on Multimodal Time-Series Data
Chronic stress can significantly affect physical and mental health. The
advent of wearable technology allows for the tracking of physiological signals,
potentially leading to innovative stress prediction and intervention methods.
However, challenges such as label scarcity and data heterogeneity render stress
prediction difficult in practice. To counter these issues, we have developed a
multimodal personalized stress prediction system using wearable biosignal data.
We employ self-supervised learning (SSL) to pre-train the models on each
subject's data, allowing the models to learn the baseline dynamics of the
participant's biosignals prior to fine-tuning on the stress prediction task. We
test our model on the Wearable Stress and Affect Detection (WESAD) dataset,
demonstrating that our SSL models outperform non-SSL models while utilizing
less than 5% of the annotations. These results suggest that our approach can
personalize stress prediction to each user with minimal annotations. This
paradigm has the potential to enable personalized prediction of a variety of
recurring health events using complex multimodal data streams.
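The pipeline described above — unlabeled per-subject pretraining followed by fine-tuning a small stress classifier — can be sketched as follows. This is an illustrative toy, not the paper's implementation: the linear masked-reconstruction pretext task, the synthetic "biosignals", and all function names are assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def pretrain_encoder(signals, dim=4, epochs=200, lr=0.01):
    """SSL pretext task (assumed for illustration): reconstruct the
    subject's signals from randomly masked copies with a linear
    encoder/decoder, trained by gradient descent on squared error."""
    n, d = signals.shape
    W = rng.normal(scale=0.1, size=(d, dim))  # encoder weights
    V = rng.normal(scale=0.1, size=(dim, d))  # decoder weights
    for _ in range(epochs):
        mask = rng.random(signals.shape) > 0.2    # hide ~20% of entries
        x_masked = signals * mask
        z = x_masked @ W
        err = (z @ V) - signals                   # reconstruction error
        V -= lr * (z.T @ err) / n
        W -= lr * (x_masked.T @ (err @ V.T)) / n
    return W

def finetune_classifier(z, labels, epochs=500, lr=0.1):
    """Logistic-regression head trained on the frozen SSL features."""
    w, b = np.zeros(z.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))
        g = p - labels
        w -= lr * (z.T @ g) / len(labels)
        b -= lr * g.mean()
    return w, b

# toy multichannel "biosignals": 2 latent factors drive 6 channels,
# and the stress label depends on one latent factor
latent = rng.normal(size=(300, 2))
signals = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(300, 6))
labels = (latent[:, 0] > 0).astype(float)

W = pretrain_encoder(signals)                     # uses no labels at all
z = signals @ W
w, b = finetune_classifier(z[:30], labels[:30])   # fine-tune on 10% labels
preds = (1.0 / (1.0 + np.exp(-(z[30:] @ w + b))) > 0.5).astype(float)
acc = (preds == labels[30:]).mean()
```

The point of the sketch is the label economy: pretraining consumes only unlabeled signals, so the supervised head needs just a small labeled slice, mirroring the abstract's claim of using under 5% of annotations.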
Gaining Insight into Determinants of Physical Activity using Bayesian Network Learning
Contains fulltext: 228326pre.pdf (preprint version) (Open Access)
Contains fulltext: 228326pub.pdf (publisher's version) (Open Access)
BNAIC/BeneLearn 202
Building and Evaluating Open-Vocabulary Language Models
Language models have always been a fundamental NLP tool and application. This thesis focuses on open-vocabulary language models, i.e., models that can deal with novel and unknown words at runtime. We will propose both new ways to construct such models as well as use such models in cross-linguistic evaluations to answer questions of difficulty and language-specificity in modern NLP tools.
We start by surveying linguistic background as well as past and present NLP approaches to tokenization and open-vocabulary language modeling (Mielke et al., 2021).
Thus equipped, we establish desirable principles for such models, both from an engineering mindset as well as a linguistic one and hypothesize a model based on the marriage of neural language modeling and Bayesian nonparametrics to handle a truly infinite vocabulary, boasting attractive theoretical properties and mathematical soundness, but presenting practical implementation difficulties.
As a compromise, we thus introduce a word-based two-level language model that retains many desirable characteristics while being highly feasible to run (Mielke and Eisner, 2019). Unlike the dominant approaches, which use characters or subword units as a single tokenization layer, it operates on words; its key feature is the ability to generate novel words both in context and in isolation.
Moving on to evaluation, we ask: how do such models deal with the wide variety of languages of the world---are they struggling with some languages? Relating this question to a more linguistic one, are some languages inherently more difficult to deal with?
Using simple methods, we show that indeed they are, starting with a small pilot study that suggests typological predictors of difficulty (Cotterell et al., 2018). Thus encouraged, we design a far bigger study with more powerful methodology: a principled and highly feasible evaluation and comparison scheme based again on multi-text likelihood (Mielke et al., 2019). This larger study shows that the earlier conclusion about typological predictors is difficult to substantiate, but it also offers new insight into the complexity of Translationese.
Following that theme, we end by extending this scheme to machine translation models to answer questions that traditional evaluation metrics like BLEU cannot (Bugliarello et al., 2020).
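The multi-text likelihood idea — scoring the same content in several languages under per-language models and comparing total surprisal — can be sketched in miniature. The unigram character model, the toy parallel sentences, and all names here are assumptions for demonstration only; the thesis uses far stronger neural models and real multi-parallel corpora.

```python
import math
from collections import Counter

def char_model(training_text):
    """Unigram character model with add-one smoothing; returns a scorer
    giving the total surprisal of a string in bits."""
    counts = Counter(training_text)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 for unseen characters
    def bits(text):
        return sum(-math.log2((counts.get(c, 0) + 1) / (total + vocab))
                   for c in text)
    return bits

# toy "multi-parallel" data: the same content in two languages
train = {"lang_a": "the cat sat on the mat " * 50,
         "lang_b": "le chat s'assit sur le tapis " * 50}
test = {"lang_a": "the cat sat on the mat",
        "lang_b": "le chat s'assit sur le tapis"}

models = {lang: char_model(txt) for lang, txt in train.items()}
# bits each language's model needs to encode the *same* content;
# comparing these across languages is the core of the scheme
scores = {lang: models[lang](test[lang]) for lang in test}
```

Because the test sentences are translations of each other, differences in total bits can be attributed to the language (and its model) rather than to differing content, which is what makes the cross-language comparison principled.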