Innovative food recommendation systems: a machine learning approach
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Recommendation systems employ users' historical data records to predict their preferences,
and have been widely used in diverse fields including biology, e-commerce, and healthcare.
Traditional recommendation techniques include content-based, collaborative-filtering, and
hybrid methods, but not all real-world problems are best addressed by these classical
techniques. Food recommendation is one such challenging problem, where
there is an urgent need to use novel recommendation systems in assisting people to select
healthy, balanced and personalized food plans. In this thesis, we make several advances in
food recommendation systems using innovative machine learning methods. First, a novel
recommendation approach is proposed by transforming an original recommendation problem
into a many-objective optimisation problem containing several different objectives, resulting in
more balanced recommendations. Second, a unified approach to designing sequence-based
personalised food recommendation systems is investigated to accommodate dynamic user
behaviours. Third, a new food recommendation approach is developed with a temporally
dependent graph neural network and data augmentation techniques, leading to more accurate
and robust recommendations. The experimental results show that these proposed approaches
not only provide a more balanced and accurate way of recommending food than traditional
methods but also point to promising areas for future research.
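As an illustration of the first contribution, the sketch below recasts recommendation as selecting Pareto-optimal candidates under multiple objectives. It is a minimal Python sketch: the objectives shown (user preference and a nutrition score) and all function names are hypothetical stand-ins, not the thesis's actual objective set or optimiser.

from typing import Callable, List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    # a Pareto-dominates b if it is no worse on every objective and better on one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(items: List[str],
                 objectives: List[Callable[[str], float]]) -> List[str]:
    # Keep the candidates whose objective vectors no other candidate dominates.
    scores = {i: [f(i) for f in objectives] for i in items}
    return [i for i in items
            if not any(dominates(scores[j], scores[i]) for j in items if j != i)]

# Toy usage with hypothetical preference and nutrition objectives.
preference = {"salad": 0.6, "pizza": 0.9, "soup": 0.7}
nutrition = {"salad": 0.9, "pizza": 0.3, "soup": 0.8}
print(pareto_front(list(preference), [preference.get, nutrition.get]))

A many-objective optimiser (for instance an evolutionary algorithm) would then search this trade-off surface rather than maximise a single relevance score, which is what yields the more balanced recommendations described above.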
Leveraging Instance Features for Label Aggregation in Programmatic Weak Supervision
Programmatic Weak Supervision (PWS) has emerged as a widespread paradigm to
synthesize training labels efficiently. The core component of PWS is the label
model, which infers true labels by aggregating the outputs of multiple noisy
supervision sources abstracted as labeling functions (LFs). Existing
statistical label models typically rely only on the outputs of the LFs, ignoring the
instance features when modeling the underlying generative process. In this
paper, we attempt to incorporate the instance features into a statistical label
model via the proposed FABLE. In particular, it is built on a mixture of
Bayesian label models, each corresponding to a global pattern of correlation,
and the coefficients of the mixture components are predicted by a Gaussian
Process classifier based on instance features. We adopt an auxiliary
variable-based variational inference algorithm to tackle the non-conjugate
issue between the Gaussian Process and Bayesian label models. Extensive
empirical comparison on eleven benchmark datasets shows that FABLE achieves the
highest average performance across nine baselines.
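To make the mixture idea concrete, here is a heavily simplified Python sketch: a Gaussian Process classifier maps instance features to per-instance coefficients over two illustrative aggregation patterns (uniform voting vs. accuracy-weighted voting). The two patterns, the assumed-known LF accuracies, and the direct fitting of the GP are our assumptions, standing in for FABLE's Bayesian mixture components and its auxiliary-variable variational inference.

import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier

def uniform_vote(lf_votes, n_classes):
    # Class posterior from an unweighted LF vote; -1 marks an abstaining LF.
    probs = np.ones(n_classes)
    for v in lf_votes:
        if v >= 0:
            probs[v] += 1.0
    return probs / probs.sum()

def weighted_vote(lf_votes, lf_acc, n_classes):
    # Each LF contributes in proportion to an (assumed known) accuracy estimate.
    probs = np.ones(n_classes)
    for v, a in zip(lf_votes, lf_acc):
        if v >= 0:
            probs[v] += a
    return probs / probs.sum()

def aggregate(X, L, lf_acc, gp, n_classes):
    # Mix the two patterns per instance using GP-predicted coefficients.
    coeffs = gp.predict_proba(X)  # shape (n, 2): pattern weights per instance
    return np.vstack([w[0] * uniform_vote(v, n_classes)
                      + w[1] * weighted_vote(v, lf_acc, n_classes)
                      for w, v in zip(coeffs, L)])

# Toy usage: the GP is fit on hypothetical "which pattern fits here" labels,
# purely to make the sketch executable.
X = np.array([[0.1], [0.2], [0.9], [1.0]])
gp = GaussianProcessClassifier().fit(X, np.array([0, 0, 1, 1]))
L = np.array([[0, 0, 1], [0, -1, 0], [1, 1, 0], [1, -1, 1]])  # -1 = abstain
print(aggregate(X, L, lf_acc=[0.9, 0.6, 0.7], gp=gp, n_classes=2))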
AcTune: Uncertainty-aware Active Self-Training for Semi-Supervised Active Learning with Pretrained Language Models
While pre-trained language model (PLM) fine-tuning has achieved strong
performance on many NLP tasks, the fine-tuning stage can still be demanding in
labeled data. Recent works have resorted to active fine-tuning to improve the
label efficiency of PLM fine-tuning, but none of them investigate the potential
of unlabeled data. We propose AcTune, a new framework that leverages unlabeled
data to improve the label efficiency of active PLM fine-tuning. AcTune switches
between data annotation and model self-training based on uncertainty: it
selects high-uncertainty unlabeled samples for active annotation and
low-uncertainty ones for model self-training. Under this framework, we design
(1) a region-aware sampling strategy that reduces redundancy when actively
querying for annotations and (2) a momentum-based memory bank that dynamically
aggregates the model's pseudo labels to suppress label noise in self-training.
Experiments on 6 text classification datasets show that AcTune outperforms the
strongest active learning and self-training baselines and improves the label
efficiency of PLM fine-tuning by 56.2% on average. Our implementation will be
available at https://github.com/yueyu1030/actune. (NAACL 2022 Main Conference.)
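A minimal Python sketch of the uncertainty-based switching is given below, assuming a classifier that exposes per-class probabilities over the unlabeled pool. The entropy measure, the function names, and the momentum constant are our assumptions; the region-aware sampling strategy and the actual PLM training loop are omitted.

import numpy as np

def entropy(p):
    # Predictive entropy per sample; higher means more uncertain.
    return -(p * np.log(p + 1e-12)).sum(axis=1)

def split_pool(probs, k_annotate, k_self_train):
    # High-entropy samples go to human annotation, low-entropy to self-training.
    order = np.argsort(entropy(probs))  # ascending uncertainty
    return order[-k_annotate:], order[:k_self_train]

def update_memory_bank(bank, probs, m=0.9):
    # Momentum aggregation of pseudo-label distributions to damp label noise.
    return m * bank + (1.0 - m) * probs

# Toy usage on random predictions over a 3-class pool of 10 samples.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=10)
annotate_idx, self_train_idx = split_pool(probs, k_annotate=2, k_self_train=3)
bank = update_memory_bank(np.full_like(probs, 1 / 3), probs)
pseudo_labels = bank[self_train_idx].argmax(axis=1)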
Adaptive Ranking-based Sample Selection for Weakly Supervised Class-imbalanced Text Classification
To obtain large amounts of training labels inexpensively, researchers have
recently adopted the weak supervision (WS) paradigm, which leverages labeling
rules rather than individual annotations to synthesize training labels,
achieving competitive results on natural language processing (NLP) tasks.
However, data imbalance is often overlooked in applying the WS paradigm,
despite being a common issue in a variety of NLP tasks. To address this
challenge, we propose Adaptive Ranking-based Sample Selection (ARS2), a
model-agnostic framework to alleviate the data imbalance issue in the WS
paradigm. Specifically, it calculates a probabilistic margin score based on the
output of the current model to measure and rank the cleanliness of each data
point. Then, the ranked data are sampled based on both class-wise and
rule-aware ranking. In particular, the two sampling strategies correspond to our
motivations: (1) to train the model with balanced data batches to reduce the
data imbalance issue and (2) to exploit the expertise of each labeling rule for
collecting clean samples. Experiments on four text classification datasets with
four different imbalance ratios show that ARS2 outperforms state-of-the-art
imbalanced-learning and WS methods, yielding a 2%-57.8% improvement in
F1-score.
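The sketch below illustrates the margin-based ranking and class-wise sampling in Python, taking model probabilities and weak labels as inputs. The top-1-minus-top-2 margin is one common definition and is our assumption, as is the fixed per-class quota; ARS2's rule-aware ranking and training loop are omitted.

import numpy as np

def margin_score(probs):
    # Gap between the two largest class probabilities; a large gap suggests
    # the model is confident, so the sample is more likely to be clean.
    part = np.sort(probs, axis=1)
    return part[:, -1] - part[:, -2]

def classwise_select(probs, y_weak, per_class):
    # Take the highest-margin samples of each class, yielding class-balanced
    # batches even when the weak labels themselves are imbalanced.
    scores = margin_score(probs)
    chosen = []
    for c in np.unique(y_weak):
        idx = np.where(y_weak == c)[0]
        chosen.extend(idx[np.argsort(scores[idx])[::-1]][:per_class])
    return np.array(chosen)

# Toy usage: 8 samples, 2 classes, imbalanced weak labels.
rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(2), size=8)
y_weak = np.array([0, 0, 0, 0, 0, 0, 1, 1])
batch = classwise_select(probs, y_weak, per_class=2)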