Innovative food recommendation systems: a machine learning approach
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Recommendation systems employ users' historical data records to predict their preferences,
and have been widely used in diverse fields including biology, e-commerce, and healthcare.
Traditional recommendation techniques include content-based, collaborative-filtering, and
hybrid methods, but not all real-world problems are best addressed by these classical
techniques. Food recommendation is one such challenging problem, where
there is an urgent need to use novel recommendation systems in assisting people to select
healthy, balanced and personalized food plans. In this thesis, we make several advances in
food recommendation systems using innovative machine learning methods. First, a novel
recommendation approach is proposed by transforming an original recommendation problem
into a many-objective optimisation problem containing several different objectives, resulting in
more balanced recommendations. Second, a unified approach to designing sequence-based
personalised food recommendation systems is investigated to accommodate dynamic user
behaviours. Third, a new food recommendation approach is developed with a temporally
dependent graph neural network and data augmentation techniques, leading to more accurate
and robust recommendations. The experimental results show that these proposed approaches
not only provide a more balanced and accurate way of recommending food than traditional
methods but also point to promising areas for future research.
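As an illustration of the first contribution, the sketch below recasts recommendation as selecting Pareto-optimal candidates under multiple objectives. It is a minimal Python sketch: the objectives shown (user preference and a nutrition score) and all function names are hypothetical stand-ins, not the thesis's actual objective set or optimiser.

from typing import Callable, List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # a Pareto-dominates b if it is no worse on every objective and better on one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(items: List[str],
                 objectives: List[Callable[[str], float]]) -> List[str]:
    # Keep the candidates whose objective vectors no other candidate dominates.
    scores = {i: [f(i) for f in objectives] for i in items}
    return [i for i in items
            if not any(dominates(scores[j], scores[i]) for j in items if j != i)]

# Toy usage with hypothetical preference and nutrition objectives.
preference = {"salad": 0.6, "pizza": 0.9, "soup": 0.7}
nutrition = {"salad": 0.9, "pizza": 0.3, "soup": 0.8}
print(pareto_front(list(preference), [preference.get, nutrition.get]))

A many-objective optimiser (for instance an evolutionary algorithm) would then search this trade-off surface rather than maximise a single relevance score, which is what yields the more balanced recommendations described above.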
Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to
synthesize training labels efficiently. The core component of PWS is the label
model, which infers true labels by aggregating the outputs of multiple noisy
supervision sources abstracted as labeling functions (LFs). Existing
statistical label models typically rely only on the outputs of the LFs, ignoring the
instance features when modeling the underlying generative process. In this
paper, we attempt to incorporate the instance features into a statistical label
model via the proposed FABLE. In particular, it is built on a mixture of
Bayesian label models, each corresponding to a global pattern of correlation,
and the coefficients of the mixture components are predicted by a Gaussian
Process classifier based on instance features. We adopt an auxiliary
variable-based variational inference algorithm to tackle the non-conjugate
issue between the Gaussian Process and Bayesian label models. Extensive
empirical comparison on eleven benchmark datasets shows that FABLE achieves the
highest average performance across nine baselines.
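To make the mixture idea concrete, here is a heavily simplified Python sketch: a Gaussian Process classifier maps instance features to per-instance coefficients over two illustrative aggregation patterns (uniform voting vs. accuracy-weighted voting). The two patterns, the assumed-known LF accuracies, and the direct fitting of the GP are our assumptions, standing in for FABLE's Bayesian mixture components and its auxiliary-variable variational inference.

import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

def uniform_vote(lf_votes, n_classes):
    # Class posterior from an unweighted LF vote; -1 marks an abstaining LF.
    probs = np.ones(n_classes)
    for v in lf_votes:
        if v >= 0:
            probs[v] += 1.0
    return probs / probs.sum()

def weighted_vote(lf_votes, lf_acc, n_classes):
    # Each LF contributes in proportion to an (assumed known) accuracy estimate.
    probs = np.ones(n_classes)
    for v, a in zip(lf_votes, lf_acc):
        if v >= 0:
            probs[v] += a
    return probs / probs.sum()

def aggregate(X, L, lf_acc, gp, n_classes):
    # Mix the two patterns per instance using GP-predicted coefficients.
    coeffs = gp.predict_proba(X)  # shape (n, 2): pattern weights per instance
    return np.vstack([w[0] * uniform_vote(v, n_classes)
                      + w[1] * weighted_vote(v, lf_acc, n_classes)
                      for w, v in zip(coeffs, L)])

# Toy usage: the GP is fit on hypothetical "which pattern fits here" labels,
# purely to make the sketch executable.
X = np.array([[0.1], [0.2], [0.9], [1.0]])
gp = GaussianProcessClassifier().fit(X, np.array([0, 0, 1, 1]))
L = np.array([[0, 0, 1], [0, -1, 0], [1, 1, 0], [1, -1, 1]])  # -1 = abstain
print(aggregate(X, L, lf_acc=[0.9, 0.6, 0.7], gp=gp, n_classes=2))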
AcTune: Uncertainty-aware Active Self-Training for Semi-Supervised Active Learning with Pretrained Language Models
While pre-trained language model (PLM) fine-tuning has achieved strong
performance on many NLP tasks, the fine-tuning stage can still be demanding in
labeled data. Recent works have resorted to active fine-tuning to improve the
label efficiency of PLM fine-tuning, but none of them investigate the potential
of unlabeled data. We propose AcTune, a new framework that leverages unlabeled
data to improve the label efficiency of active PLM fine-tuning. AcTune switches
between data annotation and model self-training based on uncertainty: it
selects high-uncertainty unlabeled samples for active annotation and
low-uncertainty ones for model self-training. Under this framework, we design
(1) a region-aware sampling strategy that reduces redundancy when actively
querying for annotations and (2) a momentum-based memory bank that dynamically
aggregates the model's pseudo labels to suppress label noise in self-training.
Experiments on 6 text classification datasets show that AcTune outperforms the
strongest active learning and self-training baselines and improves the label
efficiency of PLM fine-tuning by 56.2% on average. Our implementation will be
available at https://github.com/yueyu1030/actune. (NAACL 2022 Main Conference.)
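A minimal Python sketch of the uncertainty-based switching is given below, assuming a classifier that exposes per-class probabilities over the unlabeled pool. The entropy measure, the function names, and the momentum constant are our assumptions; the region-aware sampling strategy and the actual PLM training loop are omitted.

import numpy as np

def entropy(p):
    # Predictive entropy per sample; higher means more uncertain.
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def split_pool(probs, k_annotate, k_self_train):
    # High-entropy samples go to human annotation, low-entropy to self-training.
    order = np.argsort(entropy(probs))  # ascending uncertainty
    return order[-k_annotate:], order[:k_self_train]

def update_memory_bank(bank, probs, m=0.9):
    # Momentum aggregation of pseudo-label distributions to damp label noise.
    return m * bank + (1.0 - m) * probs

# Toy usage on random predictions over a 3-class pool of 10 samples.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=10)
annotate_idx, self_train_idx = split_pool(probs, k_annotate=2, k_self_train=3)
bank = update_memory_bank(np.full_like(probs, 1 / 3), probs)
pseudo_labels = bank[self_train_idx].argmax(axis=1)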
Adaptive Ranking-based Sample Selection for Weakly Supervised Class-imbalanced Text Classification
To obtain large amounts of training labels inexpensively, researchers have
recently adopted the weak supervision (WS) paradigm, which leverages labeling
rules rather than individual annotations to synthesize training labels,
achieving competitive results on natural language processing (NLP) tasks.
However, data imbalance is often overlooked in applying the WS paradigm,
despite being a common issue in a variety of NLP tasks. To address this
challenge, we propose Adaptive Ranking-based Sample Selection (ARS2), a
model-agnostic framework to alleviate the data imbalance issue in the WS
paradigm. Specifically, it calculates a probabilistic margin score based on the
output of the current model to measure and rank the cleanliness of each data
point. Then, the ranked data are sampled based on both class-wise and
rule-aware ranking. In particular, the two sampling strategies correspond to our
motivations: (1) to train the model with balanced data batches to reduce the
data imbalance issue and (2) to exploit the expertise of each labeling rule for
collecting clean samples. Experiments on four text classification datasets with
four different imbalance ratios show that ARS2 outperforms state-of-the-art
imbalanced-learning and WS methods, yielding a 2%-57.8% improvement in
F1-score.
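The sketch below illustrates the margin-based ranking and class-wise sampling in Python, taking model probabilities and weak labels as inputs. The top-1-minus-top-2 margin is one common definition and is our assumption, as is the fixed per-class quota; ARS2's rule-aware ranking and training loop are omitted.

import numpy as np

def margin_score(probs):
    # Gap between the two largest class probabilities; a large gap suggests
    # the model is confident, so the sample is more likely to be clean.
    part = np.sort(probs, axis=1)
    return part[:, -1] - part[:, -2]

def classwise_select(probs, y_weak, per_class):
    # Take the highest-margin samples of each class, yielding class-balanced
    # batches even when the weak labels themselves are imbalanced.
    scores = margin_score(probs)
    chosen = []
    for c in np.unique(y_weak):
        idx = np.where(y_weak == c)[0]
        chosen.extend(idx[np.argsort(scores[idx])[::-1]][:per_class])
    return np.array(chosen)

# Toy usage: 8 samples, 2 classes, imbalanced weak labels.
rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(2), size=8)
y_weak = np.array([0, 0, 0, 0, 0, 0, 1, 1])
batch = classwise_select(probs, y_weak, per_class=2)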