Modular lifelong machine learning
Deep learning has drastically improved the state-of-the-art in many important fields, including computer vision and natural language processing (LeCun et al., 2015). However, it is expensive to train a deep neural network on a machine learning problem. The overall training cost further increases when one wants to solve additional problems. Lifelong machine learning (LML) develops algorithms that aim to efficiently learn to solve a sequence of problems, which become available one at a time. New problems are solved with fewer resources by transferring previously learned knowledge. At the same time, an LML algorithm needs to retain good performance on all encountered problems, thus avoiding catastrophic forgetting. Current approaches do not possess all the desired properties of an LML algorithm. First, they primarily focus on preventing catastrophic forgetting (Diaz-Rodriguez et al., 2018; Delange et al., 2021). As a result, they neglect some knowledge transfer properties. Furthermore, they assume that all problems in a sequence share the same input space. Finally, scaling these methods to a large sequence of problems remains a challenge.
Modular approaches to deep learning decompose a deep neural network into sub-networks, referred to as modules. Each module can then be trained to perform an atomic transformation, specialised in processing a distinct subset of inputs. This modular approach to storing knowledge makes it easy to reuse only the subset of modules that are useful for the task at hand.
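The decomposition described above can be sketched in a few lines of Python. All module names and transformations below are hypothetical illustrations, not the thesis's actual architecture:

```python
# A minimal sketch of module reuse across tasks (all names hypothetical).
# Each "module" is an atomic transformation; a network for a task is a
# composition of modules drawn from a shared library.

def compose(*modules):
    """Chain modules left-to-right into a single callable network."""
    def network(x):
        for m in modules:
            x = m(x)
        return x
    return network

# Library of pre-trained modules, accumulated over earlier tasks.
library = {
    "normalise": lambda x: [v / 10.0 for v in x],   # input-specific preprocessing
    "features":  lambda x: [v * v for v in x],      # shared feature extractor
    "head_a":    lambda x: sum(x),                  # task-A-specific head
}

# A new task reuses "normalise" and "features" and trains only a new head.
library["head_b"] = lambda x: max(x)
new_task_net = compose(library["normalise"], library["features"], library["head_b"])

print(new_task_net([10.0, 20.0]))  # only the head is new; the rest is reused
```

Only the new head has to be learned for the new task; the preprocessing and feature modules are shared, which is the source of the training-cost savings the paragraph describes.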
This thesis introduces a line of research which demonstrates the merits of a modular approach to lifelong machine learning, and its ability to address the aforementioned shortcomings of other methods. Compared to previous work, we show that a modular approach can be used to achieve more LML properties than previously demonstrated. Furthermore, we develop tools which allow modular LML algorithms to scale in order to retain said properties on longer sequences of problems.
First, we introduce HOUDINI, a neurosymbolic framework for modular LML. HOUDINI represents modular deep neural networks as functional programs and accumulates a library of pre-trained modules over a sequence of problems. Given a new problem, we use program synthesis to select a suitable neural architecture, as well as a high-performing combination of pre-trained and new modules. We show that our approach has most of the properties desired from an LML algorithm. Notably, it can perform forward transfer, avoid negative transfer and prevent catastrophic forgetting, even across problems with disparate input domains and problems which require different neural architectures.
Second, we produce a modular LML algorithm which retains the properties of HOUDINI but can also scale to longer sequences of problems. To this end, we fix the choice of a neural architecture and introduce a probabilistic search framework, PICLE, for searching through different module combinations. To apply PICLE, we introduce two probabilistic models over neural modules which allow us to efficiently identify promising module combinations.
Third, we phrase the search over module combinations in modular LML as black-box optimisation, which allows one to make use of methods from the setting of hyperparameter optimisation (HPO). We then develop a new HPO method which marries a multi-fidelity approach with model-based optimisation. We demonstrate that this leads to improvement in anytime performance in the HPO setting and discuss how this can in turn be used to augment modular LML methods.
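As a rough illustration of the multi-fidelity side of this idea (the model-based surrogate is omitted here), a successive-halving-style search evaluates all candidate configurations at a low budget and promotes only the best half at each round. The objective below is a made-up stand-in, not the thesis's benchmark:

```python
import random

# Hypothetical objective: validation score of config x after `budget` epochs.
# Purely illustrative; higher is better, with an optimum near x = 0.3.
def score(x, budget):
    return -(x - 0.3) ** 2 * (budget / 10.0)

def successive_halving(configs, min_budget=1, max_budget=8):
    """Multi-fidelity search: evaluate all configs cheaply, keep the best
    half, double the budget, and repeat until one config remains."""
    budget = min_budget
    while len(configs) > 1 and budget <= max_budget:
        ranked = sorted(configs, key=lambda x: score(x, budget), reverse=True)
        configs = ranked[: max(1, len(ranked) // 2)]
        budget *= 2
    return configs[0]

random.seed(0)
candidates = [random.random() for _ in range(8)]
best = successive_halving(candidates)
```

Most of the evaluation budget is spent on promising configurations, which is what improves anytime performance; a model-based method would additionally use past observations to propose the candidates themselves.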
Overall, this thesis identifies a number of important LML properties, not all of which have been attained by past methods, and presents an LML algorithm which achieves all of them apart from backward transfer.
An innovative EEG-based emotion recognition using a single channel-specific feature from the brain rhythm code method
Introduction: Efficiently recognizing emotions is a critical pursuit in brain–computer interface (BCI) research, as it has many applications for intelligent healthcare services. In this work, an innovative approach inspired by the genetic code in bioinformatics, which utilizes brain rhythm code features consisting of δ, θ, α, β, or γ, is proposed for electroencephalography (EEG)-based emotion recognition.
Methods: These features are first extracted using the sequencing technique. After evaluating them with four conventional machine learning classifiers, an optimal channel-specific feature that produces the highest accuracy in each emotional case is identified, so emotion recognition through minimal data is realized. By doing so, the complexity of emotion recognition can be significantly reduced, making it more achievable for practical hardware setups.
Results: The best classification accuracies achieved for the DEAP and MAHNOB datasets range from 83% to 92%, and for the SEED dataset the best accuracy is 78%. The experimental results are impressive, considering the minimal data employed. Further investigation of the optimal features shows that their representative channels are primarily in the frontal region, and the associated rhythmic characteristics are of multiple kinds. Additionally, individual differences are found, as the optimal feature varies across subjects.
Discussion: Compared to previous studies, this work provides insights into designing portable devices, as a single electrode is sufficient to generate satisfactory performance. Consequently, it advances the understanding of brain rhythms and offers an innovative solution for classifying EEG signals in diverse BCI applications, including emotion recognition.
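As a hedged sketch of how per-channel rhythm-band features like these can be computed: the band edges and sampling rate below are common EEG conventions, not necessarily the paper's exact choices.

```python
import numpy as np

# Hypothetical sketch: spectral power of one EEG channel in each of the
# classical rhythm bands. Band edges (Hz) follow common conventions.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_powers(signal, fs):
    """Mean spectral power of one channel in each rhythm band."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    return {name: power[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}

fs = 128                                  # assumed sampling rate (Hz)
t = np.arange(fs) / fs                    # one second of samples
alpha_wave = np.sin(2 * np.pi * 10 * t)   # a pure 10 Hz (alpha-range) tone
features = band_powers(alpha_wave, fs)
dominant = max(features, key=features.get)  # the band with the most power
```

Computing such features per channel and keeping only the single channel that classifies best is what makes the paper's one-electrode setting plausible.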
GripNet: Graph information propagation on supergraph for heterogeneous graphs
Heterogeneous graph representation learning aims to learn low-dimensional vector representations of different types of entities and relations to empower downstream tasks. Existing popular methods either capture semantic relationships but indirectly leverage node/edge attributes in a complex way, or leverage node/edge attributes directly without taking semantic relationships into account. When involving multiple convolution operations, they also have poor scalability. To overcome these limitations, this paper proposes a flexible and efficient Graph information propagation Network (GripNet) framework. Specifically, we introduce a new supergraph data structure consisting of supervertices and superedges. A supervertex is a semantically-coherent subgraph. A superedge defines an information propagation path between two supervertices. GripNet learns new representations for the supervertex of interest by propagating information along the defined path using multiple layers. We construct multiple large-scale graphs and evaluate GripNet against competing methods to show its superiority in link prediction, node classification, and data integration
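A toy sketch of the supergraph idea follows, with hypothetical supervertices and a single superedge; the real GripNet uses learned, multi-layer propagation rather than this simple averaging:

```python
# A minimal sketch of the supergraph data structure (toy, hypothetical data):
# each supervertex is a semantically-coherent subgraph, and each superedge
# defines the direction in which information is propagated between them.

supervertices = {
    "genes":    {"g1": [1.0], "g2": [3.0]},   # node -> feature vector
    "diseases": {"d1": [0.0]},
}
superedges = [("genes", "diseases")]          # propagate genes -> diseases
links = {("g1", "d1"), ("g2", "d1")}          # cross-subgraph edges

def propagate(supervertices, superedges, links):
    """Update each target supervertex with the mean of its linked sources."""
    for src, dst in superedges:
        for node, feat in supervertices[dst].items():
            incoming = [supervertices[src][u] for (u, v) in links if v == node]
            if incoming:
                mean = [sum(col) / len(col) for col in zip(*incoming)]
                supervertices[dst][node] = [f + m for f, m in zip(feat, mean)]
    return supervertices

out = propagate(supervertices, superedges, links)
```

Because propagation only ever follows the declared superedges, each supervertex can be processed as a unit, which is where the scalability advantage over repeated whole-graph convolutions comes from.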
GlobalMind: Global Multi-head Interactive Self-attention Network for Hyperspectral Change Detection
High spectral resolution imagery of the Earth's surface enables users to
monitor changes over time at a fine-grained scale, playing an increasingly
important role in agriculture, defense, and emergency response. However, most
current algorithms are still confined to describing local features and fail to
incorporate a global perspective, which limits their ability to capture
interactions between global features, thus usually resulting in incomplete
change regions. In this paper, we propose a Global Multi-head INteractive
self-attention change Detection network (GlobalMind) to explore the implicit
correlation between different surface objects and variant land cover
transformations, acquiring a comprehensive understanding of the data and
accurate change detection results. Firstly, a simple but effective Global Axial
Segmentation (GAS) strategy is designed to expand the self-attention
computation along the row space or column space of hyperspectral images,
allowing the global connection with high efficiency. Secondly, with GAS, the
global spatial multi-head interactive self-attention (Global-M) module is crafted to mine the abundant spatial-spectral features, involving potential correlations between the ground objects, from the entire rich and complex hyperspectral space. Moreover, to acquire accurate and complete cross-temporal changes, we devise a global temporal interactive multi-head self-attention (GlobalD) module which incorporates the relevance and variation of bi-temporal spatial-spectral features, deriving the potential changes of the same kind over both local and global ranges in combination with GAS. We perform extensive experiments on five widely used hyperspectral datasets, and our method outperforms state-of-the-art algorithms in both accuracy and efficiency.
Comment: 14 pages, 18 figures.
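The row-wise restriction underlying GAS can be illustrated with a minimal single-head self-attention that attends only within each row of a feature map. This is a generic axial-attention sketch, not the paper's exact module:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axial_self_attention(x):
    """Self-attention restricted to each row (one axis) of an H x W x C
    feature map, so cost is O(H * W^2) instead of O((H*W)^2) for full
    global attention over all pixels."""
    h, w, c = x.shape
    out = np.empty_like(x)
    for i in range(h):                      # attend within each row only
        row = x[i]                          # (W, C): queries = keys = values
        scores = row @ row.T / np.sqrt(c)   # (W, W) attention logits
        out[i] = softmax(scores, axis=-1) @ row
    return out

x = np.random.default_rng(0).standard_normal((4, 5, 3))
y = axial_self_attention(x)
```

Running the same operation along columns as well gives every position a (two-hop) global receptive field at a fraction of the cost of full attention, which is the efficiency argument behind axial segmentation.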
FairGen: Towards Fair Graph Generation
There have been tremendous efforts over the past decades dedicated to the
generation of realistic graphs in a variety of domains, ranging from social
networks to computer networks, from gene regulatory networks to online
transaction networks. Despite the remarkable success, the vast majority of
these works are unsupervised in nature and are typically trained to minimize
the expected graph reconstruction loss, which would result in the
representation disparity issue in the generated graphs, i.e., the protected
groups (often minorities) contribute less to the objective and thus suffer from
systematically higher errors. In this paper, we aim to tailor graph generation
to downstream mining tasks by leveraging label information and user-preferred
parity constraint. In particular, we start from the investigation of
representation disparity in the context of graph generative models. To mitigate
the disparity, we propose a fairness-aware graph generative model named
FairGen. Our model jointly trains a label-informed graph generation module and
a fair representation learning module by progressively learning the behaviors
of the protected and unprotected groups, from the 'easy' concepts to the 'hard'
ones. In addition, we propose a generic context sampling strategy for graph
generative models, which is proven to be capable of fairly capturing the
contextual information of each group with a high probability. Experimental
results on seven real-world data sets, including web-based graphs, demonstrate
that FairGen (1) obtains performance on par with state-of-the-art graph
generative models across six network properties, (2) mitigates the
representation disparity issues in the generated graphs, and (3) substantially
boosts the model performance by up to 17% in downstream tasks via data
augmentation
An investigation of speaker independent phrase break models in End-to-End TTS systems
This paper presents our work on phrase break prediction in the context of
end-to-end TTS systems, motivated by the following questions: (i) Is there any
utility in incorporating an explicit phrasing model in an end-to-end TTS
system?, and (ii) How do you evaluate the effectiveness of a phrasing model in
an end-to-end TTS system? In particular, the utility and effectiveness of
phrase break prediction models are evaluated in the context of children's
story synthesis, using listener comprehension. We show by means of perceptual
listening evaluations that there is a clear preference for stories synthesized
after predicting the location of phrase breaks using a trained phrasing model,
over stories directly synthesized without predicting the location of phrase
breaks.
Comment: Submitted for review to IEEE Access.
Disentanglement of Latent Representations via Sparse Causal Interventions
The process of generating data such as images is controlled by independent
and unknown factors of variation. The retrieval of these variables has been
studied extensively in the disentanglement, causal representation learning, and
independent component analysis fields. Recently, approaches merging these
domains together have shown great success. Instead of directly representing the
factors of variation, the problem of disentanglement can be seen as finding the
interventions on one image that yield a change to a single factor. Following
this assumption, we introduce a new method for disentanglement inspired by
causal dynamics that combines causality theory with vector-quantized
variational autoencoders. Our model considers the quantized vectors as causal
variables and links them in a causal graph. It performs causal interventions on
the graph and generates atomic transitions affecting a unique factor of
variation in the image. We also introduce a new task of action retrieval that
consists of finding the action responsible for the transition between two
images. We test our method on standard synthetic and real-world disentanglement
datasets. We show that it can effectively disentangle the factors of variation
and perform precise interventions on high-level semantic attributes of an image
without affecting its quality, even with imbalanced data distributions.
Comment: 16 pages (10 pages for the main paper and 6 pages for the supplement), 14 figures, submitted to IJCAI 2023. V2: added link to repository.
Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching
Class prototype construction and matching are core aspects of few-shot action
recognition. Previous methods mainly focus on designing spatiotemporal relation
modeling modules or complex temporal alignment algorithms. Despite the
promising results, they ignored the value of class prototype construction and
matching, leading to unsatisfactory performance in recognizing similar
categories in every task. In this paper, we propose GgHM, a new framework with
Graph-guided Hybrid Matching. Concretely, we learn task-oriented features by
the guidance of a graph neural network during class prototype construction,
optimizing the intra- and inter-class feature correlation explicitly. Next, we
design a hybrid matching strategy, combining frame-level and tuple-level
matching to classify videos with multivariate styles. We additionally propose a
learnable dense temporal modeling module to enhance the video feature temporal
representation to build a more solid foundation for the matching process. GgHM
shows consistent improvements over other challenging baselines on several
few-shot datasets, demonstrating the effectiveness of our method. The code will
be publicly available at https://github.com/jiazheng-xing/GgHM.
Comment: Accepted by ICCV 2023.
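The frame-level versus tuple-level distinction can be sketched generically; the features, weighting, and shapes below are illustrative assumptions, not GgHM's implementation:

```python
import numpy as np

# Hypothetical sketch of hybrid matching: combine frame-level distances
# with tuple-level (ordered frame-pair) distances between a query video
# and a class prototype, both represented as per-frame feature matrices.

def frame_level(query, proto):
    """Mean Euclidean distance between temporally aligned frames."""
    return float(np.linalg.norm(query - proto, axis=1).mean())

def tuple_level(query, proto):
    """Mean distance between ordered frame-pair (tuple) features, which
    captures temporal order that frame-level matching ignores."""
    q_pairs = np.concatenate([query[:-1], query[1:]], axis=1)
    p_pairs = np.concatenate([proto[:-1], proto[1:]], axis=1)
    return float(np.linalg.norm(q_pairs - p_pairs, axis=1).mean())

def hybrid_score(query, proto, alpha=0.5):
    """Weighted combination of the two matching granularities."""
    return alpha * frame_level(query, proto) + (1 - alpha) * tuple_level(query, proto)

rng = np.random.default_rng(1)
query = rng.standard_normal((8, 16))   # 8 frames, 16-dim features
proto = rng.standard_normal((8, 16))   # a class prototype of the same shape
```

A query video would be assigned to the class whose prototype yields the lowest hybrid score, with the two granularities compensating for each other's blind spots.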
Digital Twin-Oriented Complex Networked Systems based on Heterogeneous node features and interaction rules
This study proposes an extendable modelling framework for Digital
Twin-Oriented Complex Networked Systems (DT-CNSs) with a goal of generating
networks that faithfully represent real systems. The modelling process focuses on
(i) features of nodes and (ii) interaction rules for creating connections that
are built based on individual node's preferences. We conduct experiments on
simulation-based DT-CNSs that incorporate various features and rules about
network growth and different transmissibilities related to an epidemic spread
on these networks. We present a case study on disaster resilience of social
networks given an epidemic outbreak by investigating the infection occurrence
within specific time and social distance. The experimental results show how
different levels of the structural and dynamics complexities, concerned with
feature diversity and flexibility of interaction rules respectively, influence
network growth and epidemic spread. The analysis reveals that, to achieve maximum disaster resilience, mitigation policies should target nodes with preferred features, as they have higher infection risks and should be the focus of epidemic control.