Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives
This paper tackles the problem of reading comprehension over long narratives
where documents easily span over thousands of tokens. We propose a curriculum
learning (CL) based Pointer-Generator framework for reading/sampling over large
documents, enabling diverse training of the neural model based on the notion of
alternating contextual difficulty. This can be interpreted as a form of domain
randomization and/or generative pretraining during training. To this end, the
usage of the Pointer-Generator softens the requirement of having the answer
within the context, enabling us to construct diverse training samples for
learning. Additionally, we propose a new Introspective Alignment Layer (IAL),
which reasons over decomposed alignments using block-based self-attention. We
evaluate our proposed method on the NarrativeQA reading comprehension
benchmark, achieving state-of-the-art performance and improving over existing baselines on both BLEU-4 and Rouge-L. Extensive ablations confirm the effectiveness of our proposed IAL and CL components. Comment: Accepted to ACL 201
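The block-based self-attention underlying the proposed IAL can be sketched as follows. This is a minimal illustration, not the authors' implementation: attention is restricted to non-overlapping blocks of the (padded) sequence, which is what makes attending over narratives of thousands of tokens tractable. All function and parameter names here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def block_self_attention(x, block_size):
    """Self-attention restricted to non-overlapping blocks.

    x: (seq_len, d) array; the sequence is padded to a multiple of
    block_size. Restricting attention to local blocks reduces the
    O(n^2) cost of full self-attention to O(n * block_size).
    """
    n, d = x.shape
    pad = (-n) % block_size
    x_p = np.pad(x, ((0, pad), (0, 0)))
    blocks = x_p.reshape(-1, block_size, d)       # (n_blocks, b, d)
    scores = blocks @ blocks.transpose(0, 2, 1)   # (n_blocks, b, b)
    attn = softmax(scores / np.sqrt(d), axis=-1)
    out = attn @ blocks                           # attend within each block
    return out.reshape(-1, d)[:n]                 # drop the padding rows

out = block_self_attention(np.random.randn(10, 4), block_size=4)
```

In the paper the block outputs feed further alignment reasoning; this sketch shows only the blocking and the per-block attention step.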
Reinforcement learning from internal, partially correct, and multiple demonstrations
Typically, a reinforcement learning agent interacts with the environment and learns to select actions that maximise cumulative reward over a trajectory of a task. However, classic reinforcement learning emphasises knowledge-free learning: the agent learns only from state-action-reward-next-state samples. This learning process is sample-inefficient and needs a huge number of interactions to converge on an optimal policy. One solution to this challenge is to employ records of human behaviour in the same task as demonstrations, speeding up the agent's learning process.
Demonstrations, however, do not come from the optimal policy and may conflict in many states, especially when they come from multiple sources. Meanwhile, the agent's own behaviour during learning can itself be used as demonstration data. To address these research gaps, this thesis proposes three novel techniques: introspective reinforcement learning, two-level Q-learning, and the radius-restrained weighted vote.

Introspective reinforcement learning uses a priority queue as a filter to select qualified agent behaviours during learning as demonstrations. It applies reward shaping to give the agent an extra reward when it behaves similarly to demonstrations in the filter. Two-level Q-learning deals with the issue of conflicting demonstrations: two Q-tables (or Q-networks under function approximation) are maintained, one storing state-expert values and one storing state-action values. This allows the agent not only to learn a strategy from the selected actions but also to learn, through trial and error, how to distribute credit among experts. The radius-restrained weighted vote derives a guidance policy from demonstrations that satisfy a restriction controlled by a radius hyper-parameter: Gaussian distances between the current state and the demonstrations serve as vote weights, and a softmax over the total weighted votes from all candidate demonstrations yields the guidance policy.
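The two-level Q-learning idea described above can be sketched in tabular form. This is an illustrative outline under assumed details (class and parameter names are invented, not taken from the thesis): the agent first picks an expert by its state-expert value, follows that expert's suggested action, and updates both tables from the observed reward, so credit is distributed to experts by trial and error.

```python
import random
from collections import defaultdict

class TwoLevelQ:
    """Tabular sketch of two-level Q-learning (illustrative names).

    q_expert[s][e]: credit assigned to expert e in state s.
    q_action[s][a]: ordinary state-action value.
    """
    def __init__(self, experts, actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.experts, self.actions = experts, actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.q_expert = defaultdict(lambda: defaultdict(float))
        self.q_action = defaultdict(lambda: defaultdict(float))

    def select(self, state):
        # Epsilon-greedy choice over experts, not over raw actions.
        if random.random() < self.eps:
            e = random.randrange(len(self.experts))
        else:
            e = max(range(len(self.experts)),
                    key=lambda i: self.q_expert[state][i])
        return e, self.experts[e](state)   # the chosen expert proposes an action

    def update(self, state, expert_idx, action, reward, next_state):
        # One TD target drives both tables: the action table learns the
        # policy, the expert table learns whom to trust in this state.
        target = reward + self.gamma * max(
            (self.q_action[next_state][a] for a in self.actions), default=0.0)
        self.q_action[state][action] += self.alpha * (
            target - self.q_action[state][action])
        self.q_expert[state][expert_idx] += self.alpha * (
            target - self.q_expert[state][expert_idx])
```

A conflicting-demonstration case is then just two experts proposing different actions in the same state; the expert table resolves it over time.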
HAPI: Hardware-Aware Progressive Inference
Convolutional neural networks (CNNs) have recently become the
state-of-the-art in a diversity of AI tasks. Despite their popularity, CNN
inference still comes at a high computational cost. A growing body of work aims
to alleviate this by exploiting the difference in the classification difficulty
among samples and early-exiting at different stages of the network.
Nevertheless, existing studies on early exiting have primarily focused on the
training scheme, without considering the use-case requirements or the
deployment platform. This work presents HAPI, a novel methodology for
generating high-performance early-exit networks by co-optimising the placement
of intermediate exits together with the early-exit strategy at inference time.
Furthermore, we propose an efficient design space exploration algorithm which
enables the faster traversal of a large number of alternative architectures and
generates the highest-performing design, tailored to the use-case requirements
and target hardware. Quantitative evaluation shows that our system consistently
outperforms alternative search mechanisms and state-of-the-art early-exit
schemes across various latency budgets. Moreover, it pushes further the
performance of highly optimised hand-crafted early-exit CNNs, delivering up to
5.11x speedup over lightweight models on imposed latency-driven SLAs for
embedded devices. Comment: Accepted at the 39th International Conference on Computer-Aided Design (ICCAD), 202
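The early-exit strategy the abstract refers to can be sketched as confidence-thresholded inference. This is a generic illustration, not HAPI's actual search or training code; the stage, head, and threshold names are assumptions. Inference stops at the first exit whose top-1 confidence clears its threshold, trading accuracy for latency, with the final classifier as a fallback.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_inference(x, stages, exits, thresholds):
    """Confidence-based early exiting (illustrative sketch).

    stages: backbone segments applied in sequence.
    exits: classifier heads attached after each segment.
    thresholds: per-exit confidence thresholds, chosen at deployment time.
    Returns (predicted class, index of the exit taken).
    """
    h = x
    for i, (stage, head) in enumerate(zip(stages, exits)):
        h = stage(h)
        probs = softmax(head(h))
        # Take this exit if confident enough, or if it is the last one.
        if probs.max() >= thresholds[i] or i == len(stages) - 1:
            return int(probs.argmax()), i

# toy example: two "stages" and two linear-ish heads
stages = [lambda h: h + 1, lambda h: h * 2]
exits = [lambda h: np.array([h.sum(), 0.0]),
         lambda h: np.array([0.0, h.sum()])]
pred, exit_idx = early_exit_inference(np.zeros(3), stages, exits, [0.9, 0.0])
```

HAPI's contribution, per the abstract, is co-optimising where the exits sit and what the thresholds are for a given latency budget and target hardware; this sketch only shows the inference-time mechanism being optimised.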
Introspective knowledge acquisition for case retrieval networks in textual case-based reasoning.
Textual Case-Based Reasoning (TCBR) aims at the effective reuse of information contained in unstructured documents. The key advantage of TCBR over traditional Information Retrieval systems is its ability to incorporate domain-specific knowledge to facilitate case comparison beyond simple keyword matching. However, substantial human intervention is needed to acquire this knowledge and transform it into a form suitable for a TCBR system. In this research, we present automated approaches that exploit statistical properties of document collections to alleviate this knowledge-acquisition bottleneck. We focus on two important knowledge containers: relevance knowledge, which captures the relatedness of features to cases, and similarity knowledge, which captures the relatedness of features to each other. The terminology is derived from the Case Retrieval Network (CRN) retrieval architecture in TCBR, which is used as the underlying formalism in this thesis, applied to text classification.

Concepts generated by Latent Semantic Indexing (LSI) are a useful resource for relevance-knowledge acquisition for CRNs. This thesis introduces a supervised LSI technique called sprinkling, which exploits class knowledge to bias LSI's concept generation. An extension of this idea, called Adaptive Sprinkling (AS), is proposed to handle inter-class relationships in complex domains such as hierarchical (e.g. Yahoo directory) and ordinal (e.g. product ranking) classification tasks. Experimental results show the superiority of CRNs created with sprinkling and AS, not only over LSI on its own but also over state-of-the-art classifiers such as Support Vector Machines (SVM). Statistical approaches based on feature co-occurrences can be used to mine similarity knowledge for CRNs. However, related words often do not co-occur in the same document, though they do co-occur with similar words. We introduce an algorithm to efficiently mine such indirect associations, called higher-order associations.
Empirical results show that CRNs created with the acquired similarity knowledge outperform both LSI and SVM. Incorporating the acquired knowledge into a CRN transforms it into a densely connected network. While this improves retrieval effectiveness, it has the unintended effect of slowing down retrieval. We propose a novel retrieval formalism called the Fast Case Retrieval Network (FCRN), which eliminates redundant run-time computations to improve retrieval speed. Experimental results show FCRN's ability to scale up to high-dimensional textual casebases. Finally, we investigate novel ways of visualizing and estimating the complexity of textual casebases that can help explain performance differences across casebases. Visualization provides qualitative insight into a casebase, while complexity is a quantitative measure that characterizes the classification or retrieval hardness intrinsic to a dataset. We study correlations between experimental results from the proposed approaches and complexity measures over diverse casebases.
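The sprinkling idea described above (biasing LSI's concepts with class knowledge) admits a short sketch. This is a minimal illustration under assumed parameterisation, not the thesis code: artificial class-indicator terms are appended to the term-document matrix before the SVD, so the leading concepts are pulled toward class-discriminative structure.

```python
import numpy as np

def sprinkle(term_doc, labels, n_classes, strength=1):
    """Append `strength` artificial class-indicator term rows per class
    to a (terms x documents) matrix; each document gets a 1 in the rows
    belonging to its class. (Illustrative sketch of sprinkling.)
    """
    n_terms, n_docs = term_doc.shape
    sprinkled_rows = np.zeros((n_classes * strength, n_docs))
    for d, c in enumerate(labels):
        sprinkled_rows[c * strength:(c + 1) * strength, d] = 1.0
    return np.vstack([term_doc, sprinkled_rows])

def lsi_concepts(matrix, k):
    """Rank-k LSI via truncated SVD: term-concept representation."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

A = np.random.rand(20, 8)            # toy term-document matrix
labels = [0, 0, 1, 1, 0, 1, 0, 1]    # one class label per document
concepts = lsi_concepts(sprinkle(A, labels, n_classes=2), k=3)
```

After the SVD, the sprinkled rows would be discarded and the remaining term-concept vectors used to build the CRN's relevance links; that step is omitted here.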
Deep Learning for Inverse Problems: Performance Characterizations, Learning Algorithms, and Applications
Deep learning models have witnessed immense empirical success over the last decade. However, in spite of their widespread adoption, a profound understanding of the generalisation behaviour of these over-parameterized architectures is still missing. In this thesis, we provide one such understanding via data-dependent characterizations of the generalisation capability of deep neural networks based on data representations. In particular, by building on the algorithmic robustness framework, we offer a generalisation error bound that encapsulates key ingredients of the learning problem, such as the complexity of the data space, the cardinality of the training set, and the Lipschitz properties of the deep neural network.
We then specialize our analysis to a specific class of model based regression problems, namely the inverse problems. These problems often come with well defined forward operators that map variables of interest to the observations. It is therefore natural to ask whether such knowledge of the forward operator can be exploited in deep learning approaches increasingly used to solve inverse problems. We offer a generalisation error bound that -- apart from the other factors -- depends on the Jacobian of the composition of the forward operator with the neural network.
Motivated by our analysis, we then propose a plug-and-play regulariser that leverages knowledge of the forward map to improve the generalisation of the network. We also provide a method for tightly upper-bounding the norms of the Jacobians of the relevant operators that is much more computationally efficient than existing ones. We demonstrate the efficacy of our model-aware regularised deep learning algorithms against other state-of-the-art approaches on inverse problems involving various sub-sampling operators, such as those used in the classical compressed sensing setup, and on inverse problems of interest in biomedical imaging.
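The model-aware regulariser described above penalises the Jacobian of the composition of the forward operator with the network. The following is a rough sketch under assumptions (finite-difference Jacobian, Frobenius norm, invented names); the thesis's actual estimator and choice of norm are not reproduced here, and a real implementation would use autodiff rather than finite differences.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian of f at x (illustrative only)."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        x_p = x.copy()
        x_p[i] += eps
        J[:, i] = (f(x_p) - fx) / eps
    return J

def model_aware_penalty(network, forward_op, x, lam=1e-3):
    """Penalise the Jacobian norm of A∘f: the forward operator applied
    after the reconstruction network, evaluated at a training point x.
    """
    composed = lambda z: forward_op(network(z))
    J = numerical_jacobian(composed, x)
    return lam * np.linalg.norm(J, 'fro') ** 2

# toy example: linear "network" and a sub-sampling forward operator
W = np.random.randn(4, 4)
A = np.eye(4)[:2]   # keep the first two of four measurements
penalty = model_aware_penalty(lambda z: W @ z, lambda y: A @ y,
                              np.random.randn(4))
```

In training, this penalty would be added to the data-fit loss; the thesis's contribution includes bounding such Jacobian norms far more cheaply than this brute-force evaluation.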