3,180 research outputs found

    Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives

    Over the past few years, adversarial training has become an extremely active research topic and has been successfully applied to various Artificial Intelligence (AI) domains. Because it is a potentially crucial technique for the development of the next generation of emotional AI systems, we herein provide a comprehensive overview of the application of adversarial training to affective computing and sentiment analysis. Various representative adversarial training algorithms, aimed at tackling diverse challenges associated with emotional AI systems, are explained and discussed. Further, we highlight a range of potential future research directions. We expect that this overview will help facilitate the development of adversarial training for affective computing and sentiment analysis in both the academic and industrial communities.

    On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey

    Recent advances in NLP have been driven by a range of large-scale pretrained language models (PLMs). These PLMs have brought significant performance gains for a range of NLP tasks, circumventing the need to customize complex designs for specific tasks. However, most current work focuses on fine-tuning PLMs on domain-specific datasets, ignoring the fact that the domain gap can lead to overfitting and even a performance drop. Therefore, it is practically important to find an appropriate method to effectively adapt PLMs to a target domain of interest. Recently, a range of methods have been proposed to achieve this purpose. Early surveys on domain adaptation are not suitable for PLMs, because PLMs behave in more sophisticated ways than traditional models trained from scratch and because domain adaptation of PLMs needs to be redesigned to take effect. This paper aims to provide a survey of these newly proposed methods and shed light on how to apply traditional machine learning methods to newly evolved and future technologies. By examining the issues of deploying PLMs for downstream tasks, we propose a taxonomy of domain adaptation approaches from a machine learning system view, covering methods for input augmentation, model optimization and personalization. We discuss and compare those methods and suggest promising future research directions.

    Text Classification: A Review, Empirical, and Experimental Evaluation

    The explosive and widespread growth of data necessitates the use of text classification to extract crucial information from vast amounts of data. Consequently, there has been a surge of research in both classical and deep learning text classification methods. Despite the numerous methods proposed in the literature, there is still a pressing need for a comprehensive and up-to-date survey. Existing survey papers categorize algorithms for text classification into broad classes, which can lead to the misclassification of unrelated algorithms and incorrect assessments of their qualities and behaviors using the same metrics. To address these limitations, our paper introduces a novel methodological taxonomy that classifies algorithms hierarchically into fine-grained classes and specific techniques. The taxonomy includes methodology categories, methodology techniques, and methodology sub-techniques. Our study is the first survey to utilize this methodological taxonomy for classifying algorithms for text classification. Furthermore, our study also conducts empirical evaluation and experimental comparisons and rankings of different algorithms that employ the same specific sub-technique, different sub-techniques within the same technique, different techniques within the same category, and categorie

    Teacher-Student Architecture for Knowledge Distillation: A Survey

    Although deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to deploy in real-world systems due to their voluminous parameters. To tackle this issue, Teacher-Student architectures were proposed, where simple student networks with few parameters can achieve comparable performance to deep teacher networks with many parameters. Recently, Teacher-Student architectures have been effectively and widely adopted for various knowledge distillation (KD) objectives, including knowledge compression, knowledge expansion, knowledge adaptation, and knowledge enhancement. With the help of Teacher-Student architectures, current studies are able to achieve multiple distillation objectives through lightweight and generalized student networks. Different from existing KD surveys that primarily focus on knowledge compression, this survey first explores Teacher-Student architectures across multiple distillation objectives. This survey presents an introduction to various knowledge representations and their corresponding optimization objectives. Additionally, we provide a systematic overview of Teacher-Student architectures with representative learning algorithms and effective distillation schemes. This survey also summarizes recent applications of Teacher-Student architectures across multiple purposes, including classification, recognition, generation, ranking, and regression. Lastly, potential research directions in KD are investigated, focusing on architecture design, knowledge quality, and theoretical studies of regression-based learning, respectively. Through this comprehensive survey, industry practitioners and the academic community can gain valuable insights and guidelines for effectively designing, learning, and applying Teacher-Student architectures on various distillation objectives. Comment: 20 pages. arXiv admin note: substantial text overlap with arXiv:2210.1733

    Enhancing deep transfer learning for image classification

    Though deep learning models require a large amount of labelled training data to yield high performance, they are applied to accomplish many computer vision tasks such as image classification. Current models also do not perform well across different domain settings such as illumination, camera angle and real-to-synthetic. Thus the models are more likely to misclassify unknown classes as known classes. These issues challenge the supervised learning paradigm of the models and encourage the study of transfer learning approaches. Transfer learning allows us to utilise the knowledge acquired from related domains to improve performance on a target domain. Existing transfer learning approaches lack proper high-level source domain feature analyses and are prone to negative transfers because they do not explore proper discriminative information across domains. Current approaches also fail to discover the necessary visual-semantic linkage and are biased towards the source domain. In this thesis, to address these issues and improve image classification performance, we make several contributions to three different deep transfer learning scenarios, i.e., the target domain has i) labelled data; ii) no labelled data; and iii) no visual data. Firstly, to improve inductive transfer learning for the first scenario, we analyse the importance of high-level deep features, propose utilising them in sequential transfer learning approaches, and investigate the suitable conditions for optimal performance. Secondly, to improve image classification across different domains in an open set setting by reducing negative transfers (second scenario), we propose two novel architectures. The first model has an adaptive weighting module based on underlying domain distinctive information, and the second model has an information-theoretic weighting module to reduce negative transfers.
    Thirdly, to learn visual classifiers when no visual data is available (third scenario) and reduce source domain bias, we propose two novel models. One model has a new two-step dense attention mechanism to discover semantic attribute-guided local visual features and a mutual learning loss. The other model utilises bidirectional mapping and adversarial supervision to learn the joint distribution of source-target domains simultaneously. We propose a new pointwise mutual information dependent loss in the first model and a distance-based loss in the second one for handling source domain bias. We perform extensive evaluations on benchmark datasets and demonstrate that the proposed models outperform contemporary works. Doctor of Philosoph