37 research outputs found

    Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives

    Get PDF
    Over the past few years, adversarial training has become an extremely active research topic and has been successfully applied to various Artificial Intelligence (AI) domains. As a potentially crucial technique for the development of the next generation of emotional AI systems, we herein provide a comprehensive overview of the application of adversarial training to affective computing and sentiment analysis. Various representative adversarial training algorithms are explained and discussed accordingly, aimed at tackling diverse challenges associated with emotional AI systems. Further, we highlight a range of potential future research directions. We expect that this overview will help facilitate the development of adversarial training for affective computing and sentiment analysis in both the academic and industrial communities

    Semi-Supervised and Unsupervised Deep Visual Learning: A Survey

    Get PDF
    State-of-the-art deep learning models are often trained with a large amountof costly labeled training data. However, requiring exhaustive manualannotations may degrade the model's generalizability in the limited-labelregime. Semi-supervised learning and unsupervised learning offer promisingparadigms to learn from an abundance of unlabeled visual data. Recent progressin these paradigms has indicated the strong benefits of leveraging unlabeleddata to improve model generalization and provide better model initialization.In this survey, we review the recent advanced deep learning algorithms onsemi-supervised learning (SSL) and unsupervised learning (UL) for visualrecognition from a unified perspective. To offer a holistic understanding ofthe state-of-the-art in these areas, we propose a unified taxonomy. Wecategorize existing representative SSL and UL with comprehensive and insightfulanalysis to highlight their design rationales in different learning scenariosand applications in different computer vision tasks. Lastly, we discuss theemerging trends and open challenges in SSL and UL to shed light on futurecritical research directions.<br

    Disentangled Representation Learning

    Full text link
    Disentangled Representation Learning (DRL) aims to learn a model capable of identifying and disentangling the underlying factors hidden in the observable data in representation form. The process of separating underlying factors of variation into variables with semantic meaning benefits in learning explainable representations of data, which imitates the meaningful understanding process of humans when observing an object or relation. As a general learning strategy, DRL has demonstrated its power in improving the model explainability, controlability, robustness, as well as generalization capacity in a wide range of scenarios such as computer vision, natural language processing, data mining etc. In this article, we comprehensively review DRL from various aspects including motivations, definitions, methodologies, evaluations, applications and model designs. We discuss works on DRL based on two well-recognized definitions, i.e., Intuitive Definition and Group Theory Definition. We further categorize the methodologies for DRL into four groups, i.e., Traditional Statistical Approaches, Variational Auto-encoder Based Approaches, Generative Adversarial Networks Based Approaches, Hierarchical Approaches and Other Approaches. We also analyze principles to design different DRL models that may benefit different tasks in practical applications. Finally, we point out challenges in DRL as well as potential research directions deserving future investigations. We believe this work may provide insights for promoting the DRL research in the community.Comment: 22 pages,9 figure

    A comparative study of anomaly detection methods for gross error detection problems.

    Get PDF
    The chemical industry requires highly accurate and reliable measurements to ensure smooth operation and effective monitoring of processing facilities. However, measured data inevitably contains errors from various sources. Traditionally in flow systems, data reconciliation through mass balancing is applied to reduce error by estimating balanced flows. However, this approach can only handle random errors. For non-random errors (called gross errors, GEs) which are caused by measurement bias, instrument failures, or process leaks, among others, this approach would return incorrect results. In recent years, many gross error detection (GED) methods have been proposed by the research community. It is recognised that the basic principle of GED is a special case of the detection of outliers (or anomalies) in data analytics. With the developments of Machine Learning (ML) research, patterns in the data can be discovered to provide effective detection of anomalous instances. In this paper, we present a comprehensive study of the application of ML-based Anomaly Detection methods (ADMs) in the GED context on a number of synthetic datasets and compare the results with several established GED approaches. We also perform data transformation on the measurement data and compare its associated results to the original results, as well as investigate the effects of training size on the detection performance. One class Support Vector Machine outperformed other ADMs and five selected statistical tests for GED on Accuracy, F1 Score, and Overall Power while Interquartile Range (IQR) method obtained the best selectivity outcome among the top 6 AMDs and the five statistical tests. The results indicate that ADMs can potentially be applied to GED problems

    Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work

    Full text link
    Inspired by the fact that human brains can emphasize discriminative parts of the input and suppress irrelevant ones, substantial local mechanisms have been designed to boost the development of computer vision. They can not only focus on target parts to learn discriminative local representations, but also process information selectively to improve the efficiency. In terms of application scenarios and paradigms, local mechanisms have different characteristics. In this survey, we provide a systematic review of local mechanisms for various computer vision tasks and approaches, including fine-grained visual recognition, person re-identification, few-/zero-shot learning, multi-modal learning, self-supervised learning, Vision Transformers, and so on. Categorization of local mechanisms in each field is summarized. Then, advantages and disadvantages for every category are analyzed deeply, leaving room for exploration. Finally, future research directions about local mechanisms have also been discussed that may benefit future works. To the best our knowledge, this is the first survey about local mechanisms on computer vision. We hope that this survey can shed light on future research in the computer vision field

    Web knowledge bases

    Get PDF
    Knowledge is key to natural language understanding. References to specific people, places and things in text are crucial to resolving ambiguity and extracting meaning. Knowledge Bases (KBs) codify this information for automated systems — enabling applications such as entity-based search and question answering. This thesis explores the idea that sites on the web may act as a KB, even if that is not their primary intent. Dedicated kbs like Wikipedia are a rich source of entity information, but are built and maintained at an ongoing cost in human effort. As a result, they are generally limited in terms of the breadth and depth of knowledge they index about entities. Web knowledge bases offer a distributed solution to the problem of aggregating entity knowledge. Social networks aggregate content about people, news sites describe events with tags for organizations and locations, and a diverse assortment of web directories aggregate statistics and summaries for long-tail entities notable within niche movie, musical and sporting domains. We aim to develop the potential of these resources for both web-centric entity Information Extraction (IE) and structured KB population. We first investigate the problem of Named Entity Linking (NEL), where systems must resolve ambiguous mentions of entities in text to their corresponding node in a structured KB. We demonstrate that entity disambiguation models derived from inbound web links to Wikipedia are able to complement and in some cases completely replace the role of resources typically derived from the KB. Building on this work, we observe that any page on the web which reliably disambiguates inbound web links may act as an aggregation point for entity knowledge. To uncover these resources, we formalize the task of Web Knowledge Base Discovery (KBD) and develop a system to automatically infer the existence of KB-like endpoints on the web. While extending our framework to multiple KBs increases the breadth of available entity knowledge, we must still consolidate references to the same entity across different web KBs. We investigate this task of Cross-KB Coreference Resolution (KB-Coref) and develop models for efficiently clustering coreferent endpoints across web-scale document collections. Finally, assessing the gap between unstructured web knowledge resources and those of a typical KB, we develop a neural machine translation approach which transforms entity knowledge between unstructured textual mentions and traditional KB structures. The web has great potential as a source of entity knowledge. In this thesis we aim to first discover, distill and finally transform this knowledge into forms which will ultimately be useful in downstream language understanding tasks

    Learning structured task related abstractions

    Get PDF
    As robots and autonomous agents are to assist people with more tasks in various domains they need the ability to quickly gain contextual awareness in unseen environments and learn new tasks. Current state of the art methods rely predominantly on statistical learning techniques which tend to overfit to sensory signals and often fail to extract structured task related abstractions. The obtained environment and task models are typically represented as black box objects that cannot be easily updated or inspected and provide limited generalisation capabilities. We address the aforementioned shortcomings of current methods by explicitly studying the problem of learning structured task related abstractions. In particular, we are interested in extracting symbolic representations of the environment from sensory signals and encoding the task to be executed as a computer program. We consider the standard problem of learning to solve a task by mapping sensory signals to actions and propose the decomposition of such a mapping into two stages: i) perceiving symbols from sensory data and ii) using a program to manipulate those symbols in order to make decisions. This thesis studies the bidirectional interactions between the agent’s capabilities to perceive symbols and the programs it can execute in order to solve a task. In the first part of the thesis we demonstrate that access to a programmatic description of the task provides a strong inductive bias which facilitates the learning of structured task related representations of the environment. In order to do so, we first consider a collaborative human-robot interaction setup and propose a framework for Grounding and Learning Instances through Demonstration and Eye tracking (GLIDE) which enables robots to learn symbolic representations of the environment from few demonstrations. In order to relax the constraints on the task encoding program which GLIDE assumes, we introduce the perceptor gradients algorithm and prove that it can be applied with any task encoding program. In the second part of the thesis we investigate the complement problem of inducing task encoding programs assuming that a symbolic representations of the environment is available. Therefore, we propose the p-machine – a novel program induction framework which combines standard enumerative search techniques with a stochastic gradient descent optimiser in order to obtain an efficient program synthesiser. We show that the induction of task encoding programs is applicable to various problems such as learning physics laws, inspecting neural networks and learning in human-robot interaction setups

    A Review of Deep Learning Techniques for Speech Processing

    Full text link
    The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advancements in speech recognition, text-to-speech synthesis, automatic speech recognition, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches, such as MFCC and HMM, to more recent advances in deep learning architectures, such as CNNs, RNNs, transformers, conformers, and diffusion models. We categorize the approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover various speech-processing tasks, datasets, and benchmarks used in the literature and describe how different deep-learning networks have been utilized to tackle these tasks. Additionally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field's evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field
    corecore