Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives
Over the past few years, adversarial training has become an extremely active
research topic and has been successfully applied to various Artificial
Intelligence (AI) domains. As adversarial training is a potentially crucial
technique for developing the next generation of emotional AI systems, we herein
provide a comprehensive overview of its application to affective computing and
sentiment analysis. Various representative adversarial training algorithms are
explained and discussed, aimed at tackling the diverse challenges associated
with emotional AI systems. Further, we highlight a range of potential future
research directions. We expect that this overview will help facilitate the
development of adversarial training for affective computing and sentiment
analysis in both the academic and industrial communities.
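The abstract does not commit to a single algorithm, so as a minimal illustration of the core idea, here is a sketch of adversarial training with the fast gradient sign method (FGSM), one standard way to craft perturbations, applied to a toy logistic-regression classifier. Everything here (the function names, the 2D toy data, `eps`, `lr`) is a hypothetical example, not a method taken from the survey; for text, such perturbations are often applied to continuous inputs such as word embeddings rather than to the discrete tokens.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # Probability of the positive class under a logistic-regression model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def logistic_loss(w, b, x, y):
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def fgsm(w, b, x, y, eps):
    """FGSM: x_adv = x + eps * sign(dL/dx); for the logistic loss, dL/dx = (p - y) * w."""
    p = predict(w, b, x)
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def adversarial_train(data, eps=0.1, lr=0.1, epochs=50, seed=0):
    """Plain SGD on each clean example followed by its FGSM counterpart."""
    rng = random.Random(seed)
    data = list(data)  # shuffle a copy, not the caller's list
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # The tuple is built once, so the perturbation targets the pre-update model.
            for xin in (x, fgsm(w, b, x, y, eps)):
                err = predict(w, b, xin) - y
                w = [wi - lr * err * xi for wi, xi in zip(w, xin)]
                b -= lr * err
    return w, b

# Hypothetical, linearly separable toy data: label 1 iff the first feature is positive.
data = [([1.0, 0.5], 1), ([2.0, -1.0], 1), ([-1.0, 0.3], 0), ([-2.0, -0.5], 0)]
w, b = adversarial_train(data)
print([int(predict(w, b, x) > 0.5) for x, _ in data])
```

Each update sees the clean input and a first-order approximation of its worst-case neighbour within an L-infinity ball of radius `eps`, which is what pushes the decision boundary away from the training points.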
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey
State-of-the-art deep learning models are often trained with a large amount of costly labeled training data. However, requiring exhaustive manual annotations may degrade the model's generalizability in the limited-label regime. Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data. Recent progress in these paradigms has indicated the strong benefits of leveraging unlabeled data to improve model generalization and provide better model initialization. In this survey, we review recent advanced deep learning algorithms for semi-supervised learning (SSL) and unsupervised learning (UL) in visual recognition from a unified perspective. To offer a holistic understanding of the state of the art in these areas, we propose a unified taxonomy. We categorize existing representative SSL and UL methods with comprehensive and insightful analysis to highlight their design rationales in different learning scenarios and their applications in different computer vision tasks. Lastly, we discuss the emerging trends and open challenges in SSL and UL to shed light on critical future research directions.
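Confidence-thresholded pseudo-labeling is one of the simplest ways to exploit unlabeled data in SSL; the survey reviews many more elaborate methods. A minimal sketch, assuming `predict_proba` is any trained classifier that returns class probabilities; the 0.9 threshold and the toy 1D model are hypothetical.

```python
import math
from collections.abc import Callable, Sequence

def pseudo_label(
    predict_proba: Callable[[Sequence[float]], list[float]],
    unlabeled: list[Sequence[float]],
    threshold: float = 0.9,
) -> list[tuple[Sequence[float], int]]:
    """Return (sample, predicted class) for unlabeled samples whose top
    class probability is at least `threshold`; the rest are discarded."""
    selected = []
    for x in unlabeled:
        probs = predict_proba(x)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            selected.append((x, best))
    return selected

# Hypothetical toy "trained" binary classifier on 1D inputs.
def toy_model(x: Sequence[float]) -> list[float]:
    p = 1.0 / (1.0 + math.exp(-4.0 * x[0]))
    return [1.0 - p, p]

confident = pseudo_label(toy_model, [[-2.0], [0.1], [1.5]])
print(confident)  # [([-2.0], 0), ([1.5], 1)]
```

The ambiguous sample near the decision boundary (`[0.1]`) is left unlabeled; the selected pairs would then be added to the labeled set for the next round of training.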
Disentangled Representation Learning
Disentangled Representation Learning (DRL) aims to learn a model capable of
identifying and disentangling the underlying factors hidden in observable
data in representation form. Separating the underlying factors of variation
into variables with semantic meaning benefits the learning of explainable
representations of data, mimicking how humans meaningfully understand an object
or relation they observe. As a general learning strategy, DRL has demonstrated
its power in improving model explainability, controllability, robustness, and
generalization capacity in a wide range of scenarios such as computer vision,
natural language processing, and data mining. In this article, we
comprehensively review DRL from various aspects including motivations,
definitions, methodologies, evaluations, applications and model designs. We
discuss works on DRL based on two well-recognized definitions, i.e., the
Intuitive Definition and the Group Theory Definition. We further categorize the
methodologies for DRL into five groups, i.e., Traditional Statistical
Approaches, Variational Auto-encoder Based Approaches, Generative Adversarial
Networks Based Approaches, Hierarchical Approaches and Other Approaches. We also
analyze principles for designing different DRL models that may benefit
different tasks in practical applications. Finally, we point out challenges in
DRL as well as potential research directions deserving future investigation. We
believe this work may provide insights for promoting DRL research in the
community.
Comment: 22 pages, 9 figures
A comparative study of anomaly detection methods for gross error detection problems
The chemical industry requires highly accurate and reliable measurements to ensure smooth operation and effective monitoring of processing facilities. However, measured data inevitably contain errors from various sources. Traditionally in flow systems, data reconciliation through mass balancing is applied to reduce error by estimating balanced flows. However, this approach can only handle random errors. For non-random errors (called gross errors, GEs), which are caused by measurement bias, instrument failures, or process leaks, among others, this approach returns incorrect results. In recent years, many gross error detection (GED) methods have been proposed by the research community. It is recognised that GED is, in principle, a special case of the detection of outliers (or anomalies) in data analytics. With the development of Machine Learning (ML) research, patterns in the data can be discovered to provide effective detection of anomalous instances. In this paper, we present a comprehensive study of the application of ML-based Anomaly Detection methods (ADMs) in the GED context on a number of synthetic datasets and compare the results with several established GED approaches. We also perform data transformation on the measurement data and compare the associated results to the original results, as well as investigate the effects of training size on detection performance. One-class Support Vector Machine outperformed the other ADMs and five selected statistical tests for GED on Accuracy, F1 Score, and Overall Power, while the Interquartile Range (IQR) method obtained the best selectivity among the top six ADMs and the five statistical tests. The results indicate that ADMs can potentially be applied to GED problems.
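Mass-balance data reconciliation and the IQR rule can both be sketched for the simplest case. This sketch assumes a hypothetical splitter node with a single balance constraint (F1 = F2 + F3), independent measurement errors (diagonal covariance V), and applies the IQR rule to a hypothetical series of balance residuals; the paper's actual features, datasets, and thresholds may differ. Reconciliation here is the weighted least-squares projection x_hat = y - V a^T (a V a^T)^-1 (a . y) onto the constraint a . x = 0.

```python
import statistics

def reconcile_single_balance(y, variances, a):
    """Weighted least-squares reconciliation for ONE linear balance a . x = 0.

    Assumes independent errors (V = diag(variances)), so
    x_hat = y - V a^T (a V a^T)^-1 (a . y) reduces to scalar arithmetic.
    """
    r = sum(ai * yi for ai, yi in zip(a, y))                # balance residual a . y
    s = sum(ai * ai * vi for ai, vi in zip(a, variances))  # a V a^T
    return [yi - vi * ai * r / s for yi, vi, ai in zip(y, variances, a)]

def iqr_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical splitter: F1 = F2 + F3, i.e. a = [1, -1, -1].
a = [1.0, -1.0, -1.0]
x_hat = reconcile_single_balance([10.0, 6.0, 3.0], [1.0, 1.0, 1.0], a)
print([round(v, 3) for v in x_hat])  # [9.667, 6.333, 3.333]

# Hypothetical balance residuals over time; sample 6 carries a gross error (e.g. a leak).
residuals = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, 5.0, -0.05]
print(iqr_outliers(residuals))  # [6]
```

Because the least-squares adjustment spreads any error, including a gross one, across all streams in the balance, reconciliation by itself cannot tell which measurement is faulty; that is the gap a separate detection step is meant to fill.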
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Inspired by the fact that human brains can emphasize discriminative parts of
the input and suppress irrelevant ones, numerous local mechanisms have been
designed to boost the development of computer vision. They can not only focus
on target parts to learn discriminative local representations, but also process
information selectively to improve efficiency. Local mechanisms have different
characteristics depending on the application scenario and paradigm. In this
survey, we provide a systematic review of local mechanisms for various computer
vision tasks and approaches, including fine-grained visual recognition, person
re-identification, few-/zero-shot learning, multi-modal learning,
self-supervised learning, Vision Transformers, and others. We summarize the
categorization of local mechanisms in each field, then analyze the advantages
and disadvantages of every category in depth, leaving room for exploration.
Finally, we discuss future research directions for local mechanisms that may
benefit future work. To the best of our knowledge, this is the first survey of
local mechanisms in computer vision. We hope that this survey can shed light on
future research in the computer vision field.
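Scaled dot-product attention pooling is one widely used way to emphasize discriminative local features and suppress irrelevant ones; it is only one of the many local mechanisms the survey categorizes. A minimal sketch, where the local features (e.g. per-patch or per-part descriptors) and the query vector are hypothetical; in practice both would be produced by learned layers.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(local_feats, query):
    """Score each local feature by (query . feature) / sqrt(d), normalise the
    scores with softmax, and return (weighted sum of features, weights).
    Assumes every feature has the same dimension d as the query."""
    d = len(query)
    scores = [sum(q * f for q, f in zip(query, feat)) / math.sqrt(d) for feat in local_feats]
    weights = softmax(scores)
    pooled = [sum(w * feat[k] for w, feat in zip(weights, local_feats)) for k in range(d)]
    return pooled, weights

# Hypothetical 2-D descriptors for three image patches; only the first matches the query.
patches = [[4.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(patches, query=[1.0, 0.0])
print([round(w, 3) for w in weights])  # the first patch receives most of the weight
```

The pooled vector is dominated by the matching patch while the other two are down-weighted rather than discarded, which is the "soft" selectivity that distinguishes attention from hard cropping.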
Web knowledge bases
Knowledge is key to natural language understanding. References to specific people, places and things in text are crucial to resolving ambiguity and extracting meaning. Knowledge Bases (KBs) codify this information for automated systems, enabling applications such as entity-based search and question answering. This thesis explores the idea that sites on the web may act as a KB, even if that is not their primary intent. Dedicated KBs like Wikipedia are a rich source of entity information, but are built and maintained at an ongoing cost in human effort. As a result, they are generally limited in terms of the breadth and depth of knowledge they index about entities. Web knowledge bases offer a distributed solution to the problem of aggregating entity knowledge. Social networks aggregate content about people, news sites describe events with tags for organizations and locations, and a diverse assortment of web directories aggregate statistics and summaries for long-tail entities notable within niche movie, musical and sporting domains. We aim to develop the potential of these resources for both web-centric entity Information Extraction (IE) and structured KB population. We first investigate the problem of Named Entity Linking (NEL), where systems must resolve ambiguous mentions of entities in text to their corresponding node in a structured KB. We demonstrate that entity disambiguation models derived from inbound web links to Wikipedia are able to complement and in some cases completely replace the role of resources typically derived from the KB. Building on this work, we observe that any page on the web which reliably disambiguates inbound web links may act as an aggregation point for entity knowledge. To uncover these resources, we formalize the task of Web Knowledge Base Discovery (KBD) and develop a system to automatically infer the existence of KB-like endpoints on the web.
While extending our framework to multiple KBs increases the breadth of available entity knowledge, we must still consolidate references to the same entity across different web KBs. We investigate this task of Cross-KB Coreference Resolution (KB-Coref) and develop models for efficiently clustering coreferent endpoints across web-scale document collections. Finally, addressing the gap between unstructured web knowledge resources and those of a typical KB, we develop a neural machine translation approach which transforms entity knowledge between unstructured textual mentions and traditional KB structures. The web has great potential as a source of entity knowledge. In this thesis we aim to first discover, distill and finally transform this knowledge into forms which will ultimately be useful in downstream language understanding tasks.
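The simplest disambiguation signal that inbound web links provide is a link prior (often called "commonness"): how often a given anchor text links to each candidate entity. The thesis's models are richer than this; the sketch below, with hypothetical anchor/target pairs and entity identifiers, only shows how such a prior can be estimated from links and used to resolve a mention with no surrounding context.

```python
from collections import Counter, defaultdict

def build_link_prior(links):
    """Estimate P(entity | anchor text) from (anchor_text, target_entity) pairs
    harvested from inbound hyperlinks. Anchors are lower-cased."""
    counts = defaultdict(Counter)
    for anchor, target in links:
        counts[anchor.lower()][target] += 1
    prior = {}
    for anchor, targets in counts.items():
        total = sum(targets.values())
        prior[anchor] = {entity: n / total for entity, n in targets.items()}
    return prior

def link_mention(prior, mention):
    """Return the most probable entity for a mention, or None (NIL) if the
    mention was never seen as anchor text."""
    candidates = prior.get(mention.lower())
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Hypothetical links scraped from pages pointing at Wikipedia articles.
links = [
    ("Jaguar", "Jaguar_Cars"),
    ("jaguar", "Jaguar"),
    ("Jaguar", "Jaguar_Cars"),
    ("Paris", "Paris"),
]
prior = build_link_prior(links)
print(prior["jaguar"])                # {'Jaguar_Cars': 0.6666666666666666, 'Jaguar': 0.3333333333333333}
print(link_mention(prior, "JAGUAR"))  # Jaguar_Cars
print(link_mention(prior, "London"))  # None
```

A prior like this always picks the majority sense, so a real linker would combine it with context features; its appeal is that it needs only link counts, not a curated KB.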
Learning structured task related abstractions
As robots and autonomous agents are expected to assist people with more tasks in various
domains, they need the ability to quickly gain contextual awareness in unseen environments
and learn new tasks. Current state-of-the-art methods rely predominantly
on statistical learning techniques, which tend to overfit to sensory signals and often
fail to extract structured task-related abstractions. The obtained environment and task
models are typically represented as black-box objects that cannot be easily updated or
inspected and that provide limited generalisation capabilities.
We address the aforementioned shortcomings of current methods by explicitly
studying the problem of learning structured task-related abstractions. In particular, we
are interested in extracting symbolic representations of the environment from sensory
signals and encoding the task to be executed as a computer program. We consider the
standard problem of learning to solve a task by mapping sensory signals to actions
and propose the decomposition of such a mapping into two stages: i) perceiving
symbols from sensory data and ii) using a program to manipulate those symbols in
order to make decisions. This thesis studies the bidirectional interactions between the
agent’s capabilities to perceive symbols and the programs it can execute in order to
solve a task.
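The two-stage decomposition described above can be illustrated with a toy grid-navigation policy. In the thesis the perception stage is learned; here, purely for illustration, it is a fixed rounding rule, and the observation format, symbol names, and task program are all hypothetical.

```python
def perceive(observation):
    """Stage i (hypothetical perceptor): map noisy continuous (x, y) estimates
    of the agent and the goal to discrete grid-cell symbols by rounding."""
    (ax, ay), (gx, gy) = observation
    return {"agent": (round(ax), round(ay)), "goal": (round(gx), round(gy))}

def program(symbols):
    """Stage ii (hypothetical task program): step toward the goal cell,
    resolving the x offset before the y offset."""
    (ax, ay), (gx, gy) = symbols["agent"], symbols["goal"]
    if ax != gx:
        return "right" if gx > ax else "left"
    if ay != gy:
        return "up" if gy > ay else "down"
    return "stay"

def policy(observation):
    """Sensory signal -> symbols -> action."""
    return program(perceive(observation))

print(policy(((0.1, 0.2), (2.9, 0.1))))   # right
print(policy(((3.2, -0.1), (2.9, 1.8))))  # up
print(policy(((1.4, 1.6), (0.6, 2.4))))   # stay
```

Only `perceive` touches raw sensory values, so the program stays small and inspectable and could be swapped for another task without changing perception, in contrast to the monolithic black-box models criticised above.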
In the first part of the thesis, we demonstrate that access to a programmatic
description of the task provides a strong inductive bias which facilitates the learning
of structured task-related representations of the environment. To do so, we first
consider a collaborative human-robot interaction setup and propose a framework for
Grounding and Learning Instances through Demonstration and Eye tracking (GLIDE),
which enables robots to learn symbolic representations of the environment from only a few
demonstrations. To relax the constraints that GLIDE places on the task-encoding program,
we introduce the perceptor gradients algorithm and prove that it can
be applied with any task-encoding program.
In the second part of the thesis, we investigate the complementary problem of inducing
task-encoding programs, assuming that a symbolic representation of the
environment is available. To this end, we propose the p-machine, a novel program
induction framework which combines standard enumerative search techniques with a
stochastic gradient descent optimiser in order to obtain an efficient program synthesiser.
We show that the induction of task-encoding programs is applicable to various
problems such as learning physics laws, inspecting neural networks and learning in
human-robot interaction setups.
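The idea of pairing enumerative search over program structures with gradient-based fitting of their continuous parameters can be shown on a toy physics-law problem. This is not the p-machine itself: the candidate templates, the free-fall data (d = 0.5 g t^2 with g = 9.8), the learning rate, and the step count are all hypothetical, and full-batch gradient descent stands in for the stochastic optimiser.

```python
import math

# Hypothetical program templates, each with one trainable constant c.
TEMPLATES = {
    "c * t": lambda t: t,
    "c * t**2": lambda t: t ** 2,
    "c * t**3": lambda t: t ** 3,
    "c * sqrt(t)": lambda t: math.sqrt(t),
}

def fit_constant(feature, xs, ys, lr=0.01, steps=2000):
    """Fit c in y ~ c * feature(x) by full-batch gradient descent on the MSE."""
    fs = [feature(x) for x in xs]
    n = len(xs)
    c = 0.0
    for _ in range(steps):
        grad = 2.0 / n * sum((c * f - y) * f for f, y in zip(fs, ys))
        c -= lr * grad
    loss = sum((c * f - y) ** 2 for f, y in zip(fs, ys)) / n
    return c, loss

def synthesise(xs, ys):
    """Enumerate every template, fit its constant, keep the lowest-loss one."""
    best = None
    for name, feature in TEMPLATES.items():
        c, loss = fit_constant(feature, xs, ys)
        if best is None or loss < best[2]:
            best = (name, c, loss)
    return best

# Hypothetical free-fall measurements: distance after t seconds.
ts = [0.5, 1.0, 1.5, 2.0]
ds = [4.9 * t ** 2 for t in ts]
name, c, loss = synthesise(ts, ds)
print(name, round(c, 4))  # c * t**2 4.9
```

The discrete search chooses the program's structure while gradient descent handles its constant, so neither has to solve the whole problem alone; real program spaces are far larger, which is where a smarter search matters.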
A Review of Deep Learning Techniques for Speech Processing
The field of speech processing has undergone a transformative shift with the
advent of deep learning. The use of multiple processing layers has enabled the
creation of models capable of extracting intricate features from speech data.
This development has paved the way for unparalleled advancements in automatic
speech recognition, text-to-speech synthesis, and emotion recognition,
propelling the performance of these tasks to unprecedented heights. The power
of deep learning techniques has opened up new avenues for
research and innovation in the field of speech processing, with far-reaching
implications for a range of industries and applications. This review paper
provides a comprehensive overview of the key deep learning models and their
applications in speech-processing tasks. We begin by tracing the evolution of
speech processing research, from early approaches, such as MFCC and HMM, to
more recent advances in deep learning architectures, such as CNNs, RNNs,
transformers, conformers, and diffusion models. We categorize the approaches
and compare their strengths and weaknesses for solving speech-processing tasks.
Furthermore, we extensively cover various speech-processing tasks, datasets,
and benchmarks used in the literature and describe how different deep-learning
networks have been utilized to tackle these tasks. Additionally, we discuss the
challenges and future directions of deep learning in speech processing,
including the need for more parameter-efficient, interpretable models and the
potential of deep learning for multimodal speech processing. By examining the
field's evolution, comparing and contrasting different approaches, and
highlighting future directions and challenges, we hope to inspire further
research in this exciting and rapidly advancing field.