
    Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives

    This paper tackles the problem of reading comprehension over long narratives, where documents easily span thousands of tokens. We propose a curriculum learning (CL) based Pointer-Generator framework for reading/sampling over large documents, enabling diverse training of the neural model based on the notion of alternating contextual difficulty. This can be interpreted as a form of domain randomization and/or generative pretraining. To this end, the use of the Pointer-Generator softens the requirement of having the answer within the context, enabling us to construct diverse training samples for learning. Additionally, we propose a new Introspective Alignment Layer (IAL), which reasons over decomposed alignments using block-based self-attention. We evaluate our proposed method on the NarrativeQA reading comprehension benchmark, achieving state-of-the-art performance and improving over existing baselines by a 51% relative improvement on BLEU-4 and a 17% relative improvement on Rouge-L. Extensive ablations confirm the effectiveness of our proposed IAL and CL components. Comment: Accepted to ACL 2019
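    The abstract only names the mechanisms, so the following is a minimal, illustrative sketch of block-based self-attention of the kind the Introspective Alignment Layer builds on: the long document is split into fixed-size blocks and scaled dot-product self-attention is applied within each block. The block size, feature dimension, and plain dot-product form are assumptions for illustration, not the authors' exact formulation.

        import numpy as np

        def softmax(x, axis=-1):
            x = x - x.max(axis=axis, keepdims=True)
            e = np.exp(x)
            return e / e.sum(axis=axis, keepdims=True)

        def block_self_attention(x, block_size=64):
            """Scaled dot-product self-attention applied independently inside each block.

            x: (seq_len, dim) array; the sequence is zero-padded to a multiple of block_size.
            """
            seq_len, dim = x.shape
            pad = (-seq_len) % block_size
            if pad:
                x = np.vstack([x, np.zeros((pad, dim))])
            blocks = x.reshape(-1, block_size, dim)              # (n_blocks, block_size, dim)
            scores = blocks @ blocks.transpose(0, 2, 1) / np.sqrt(dim)
            attn = softmax(scores, axis=-1)                      # attention stays within a block
            out = attn @ blocks
            return out.reshape(-1, dim)[:seq_len]

        # Toy usage: a 1,000-token "document" with 128-dimensional token features.
        doc = np.random.randn(1000, 128)
        print(block_self_attention(doc, block_size=64).shape)    # (1000, 128)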

    Reinforcement learning from internal, partially correct, and multiple demonstrations

    Typically, a reinforcement learning agent interacts with the environment and learns how to select actions to gain cumulative reward over a trajectory of a task. However, classic reinforcement learning emphasises knowledge-free learning: the agent learns only from state-action-reward-next-state samples. This learning process suffers from sample inefficiency and needs a huge number of interactions to converge upon an optimal policy. One solution to this challenge is to employ records of human behaviour in the same task as demonstrations, to speed up the agent's learning. Demonstrations, however, do not come from the optimal policy and may conflict in many states, especially when they come from multiple sources. Meanwhile, the agent's own behaviour during learning can also be used as demonstration data. To address these research gaps, three novel techniques are proposed in this thesis: introspective reinforcement learning, two-level Q-learning, and the radius-restrained weighted vote. Introspective reinforcement learning uses a priority queue as a filter to select qualified agent behaviours during the learning process as demonstrations; it applies reward shaping to give the agent an extra reward when it performs behaviours similar to the demonstrations in the filter. Two-level Q-learning deals with the issue of conflicting demonstrations: two Q-tables (or Q-networks under function approximation) are proposed, storing state-expert values and state-action values respectively, allowing the agent not only to learn a strategy from the selected actions but also to learn, through trial and error, how to distribute credit among experts. The radius-restrained weighted vote derives a guidance policy from demonstrations that satisfy a distance restriction controlled by a hyper-parameter radius; it uses Gaussian distances between the current state and the demonstrations as vote weights, and a softmax over the accumulated weighted votes from all candidate demonstrations yields the guidance policy.
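    As a rough sketch of the radius-restrained weighted vote described above (variable names, the kernel bandwidth, and the toy continuous state space are assumptions), demonstrations within the radius of the current state cast votes for their actions, weighted by a Gaussian kernel on the state distance, and a softmax over the accumulated votes gives the guidance policy:

        import numpy as np

        def guidance_policy(state, demos, n_actions, radius=1.0, sigma=0.5):
            """demos: iterable of (demo_state, demo_action); state and demo_state are 1-D arrays."""
            votes = np.zeros(n_actions)
            for demo_state, demo_action in demos:
                dist = np.linalg.norm(state - demo_state)
                if dist <= radius:                                            # radius restriction
                    votes[demo_action] += np.exp(-dist ** 2 / (2 * sigma ** 2))  # Gaussian weight
            e = np.exp(votes - votes.max())                                   # softmax over weighted votes
            return e / e.sum()

        # Toy usage: three demonstrations in a 2-D state space, three candidate actions.
        demos = [(np.array([0.10, 0.20]), 0),
                 (np.array([0.15, 0.25]), 0),
                 (np.array([0.90, 0.80]), 1)]
        print(guidance_policy(np.array([0.12, 0.22]), demos, n_actions=3))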

    HAPI: Hardware-Aware Progressive Inference

    Convolutional neural networks (CNNs) have recently become the state-of-the-art in a diversity of AI tasks. Despite their popularity, CNN inference still comes at a high computational cost. A growing body of work aims to alleviate this by exploiting the difference in classification difficulty among samples and early-exiting at different stages of the network. Nevertheless, existing studies on early exiting have primarily focused on the training scheme, without considering the use-case requirements or the deployment platform. This work presents HAPI, a novel methodology for generating high-performance early-exit networks by co-optimising the placement of intermediate exits together with the early-exit strategy at inference time. Furthermore, we propose an efficient design space exploration algorithm which enables faster traversal of a large number of alternative architectures and generates the highest-performing design, tailored to the use-case requirements and target hardware. Quantitative evaluation shows that our system consistently outperforms alternative search mechanisms and state-of-the-art early-exit schemes across various latency budgets. Moreover, it pushes further the performance of highly optimised hand-crafted early-exit CNNs, delivering up to 5.11x speedup over lightweight models under imposed latency-driven SLAs for embedded devices. Comment: Accepted at the 39th International Conference on Computer-Aided Design (ICCAD), 2020
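    The following is a minimal sketch of the confidence-based early-exit inference that such systems build on: the backbone is split into stages, each followed by a cheap classifier, and prediction stops at the first exit whose top-1 confidence clears a threshold. The stage/head callables and the fixed-threshold policy are illustrative assumptions, not the design produced by HAPI's search.

        import numpy as np

        def softmax(z):
            e = np.exp(z - z.max())
            return e / e.sum()

        def early_exit_predict(x, stages, exit_heads, threshold=0.9):
            """stages: feature extractors run in sequence; exit_heads: matching classifiers.
            Returns (predicted_class, index_of_exit_taken)."""
            feats, probs = x, None
            for i, (stage, head) in enumerate(zip(stages, exit_heads)):
                feats = stage(feats)                     # next chunk of the backbone
                probs = softmax(head(feats))             # cheap intermediate classifier
                if probs.max() >= threshold:             # confident enough: exit early
                    return int(probs.argmax()), i
            return int(probs.argmax()), len(stages) - 1  # fell through to the final exit

        # Toy usage: three random linear stages and heads over 16-d features, 10 classes.
        rng = np.random.default_rng(0)
        stages = [lambda f, W=rng.standard_normal((16, 16)): np.tanh(f @ W) for _ in range(3)]
        heads = [lambda f, W=rng.standard_normal((16, 10)): f @ W for _ in range(3)]
        print(early_exit_predict(rng.standard_normal(16), stages, heads))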

    Introspective knowledge acquisition for case retrieval networks in textual case base reasoning.

    Textual Case Based Reasoning (TCBR) aims at effective reuse of the information contained in unstructured documents. The key advantage of TCBR over traditional Information Retrieval systems is its ability to incorporate domain-specific knowledge to facilitate case comparison beyond simple keyword matching. However, substantial human intervention is needed to acquire and transform this knowledge into a form suitable for a TCBR system. In this research, we present automated approaches that exploit statistical properties of document collections to alleviate this knowledge acquisition bottleneck. We focus on two important knowledge containers: relevance knowledge, which captures the relatedness of features to cases, and similarity knowledge, which captures the relatedness of features to each other. The terminology is derived from the Case Retrieval Network (CRN) retrieval architecture in TCBR, which is used as the underlying formalism in this thesis, applied to text classification. Concepts generated by Latent Semantic Indexing (LSI) are a useful resource for relevance knowledge acquisition in CRNs. This thesis introduces a supervised LSI technique called sprinkling that exploits class knowledge to bias LSI's concept generation. An extension of this idea, called Adaptive Sprinkling (AS), has been proposed to handle inter-class relationships in complex domains such as hierarchical (e.g. Yahoo directory) and ordinal (e.g. product ranking) classification tasks. Experimental results show the superiority of CRNs created with sprinkling and AS, not only over LSI on its own, but also over state-of-the-art classifiers like Support Vector Machines (SVMs). Current statistical approaches based on feature co-occurrence can be utilized to mine similarity knowledge for CRNs. However, related words often do not co-occur in the same document, though they do co-occur with similar words. We introduce an algorithm to efficiently mine such indirect associations, called higher-order associations. Empirical results show that CRNs created with the acquired similarity knowledge outperform both LSI and SVM. Incorporating the acquired knowledge into a CRN transforms it into a densely connected network. While this improves retrieval effectiveness, it has the unintended effect of slowing down retrieval. We propose a novel retrieval formalism called the Fast Case Retrieval Network (FCRN), which eliminates redundant run-time computations to improve retrieval speed. Experimental results show FCRN's ability to scale up to high-dimensional textual casebases. Finally, we investigate novel ways of visualizing and estimating the complexity of textual casebases that can help explain performance differences across casebases. Visualization provides a qualitative insight into a casebase, while complexity is a quantitative measure that characterizes the classification or retrieval hardness intrinsic to a dataset. We study correlations between experimental results from the proposed approaches and complexity measures over diverse casebases.
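    As an illustration of the sprinkling idea described above (the toy matrix, the number of sprinkled terms per class, and the rank are assumptions), class labels are appended to the term-document matrix as artificial terms before LSI, so that the truncated SVD is biased towards class structure:

        import numpy as np

        def sprinkle_and_lsi(term_doc, labels, n_classes, n_sprinkle=2, rank=2):
            """term_doc: (n_terms, n_docs) count matrix; labels: class id per document.
            Returns rank-reduced document representations of shape (n_docs, rank)."""
            n_terms, n_docs = term_doc.shape
            sprinkled = np.zeros((n_classes * n_sprinkle, n_docs))
            for d, c in enumerate(labels):
                sprinkled[c * n_sprinkle:(c + 1) * n_sprinkle, d] = 1.0   # artificial class terms
            augmented = np.vstack([term_doc, sprinkled])
            U, S, Vt = np.linalg.svd(augmented, full_matrices=False)      # plain LSI step
            return (S[:rank, None] * Vt[:rank]).T                         # document concept space

        # Toy usage: 5 terms, 4 documents, 2 classes.
        X = np.array([[2, 0, 1, 0],
                      [1, 0, 2, 0],
                      [0, 1, 0, 2],
                      [0, 2, 0, 1],
                      [1, 1, 1, 1]], dtype=float)
        print(sprinkle_and_lsi(X, labels=[0, 1, 0, 1], n_classes=2))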

    Deep Learning for Inverse Problems: Performance Characterizations, Learning Algorithms, and Applications

    Deep learning models have witnessed immense empirical success over the last decade. However, in spite of their widespread adoption, a profound understanding of the generalisation behaviour of these over-parameterised architectures is still missing. In this thesis, we provide one such way via a data-dependent characterisation of the generalisation capability of deep neural networks, based on the data representations they learn. In particular, by building on the algorithmic robustness framework, we offer a generalisation error bound that encapsulates key ingredients associated with the learning problem, such as the complexity of the data space, the cardinality of the training set, and the Lipschitz properties of a deep neural network. We then specialise our analysis to a specific class of model-based regression problems, namely inverse problems. These problems often come with well-defined forward operators that map the variables of interest to the observations. It is therefore natural to ask whether such knowledge of the forward operator can be exploited by the deep learning approaches increasingly used to solve inverse problems. We offer a generalisation error bound that, apart from the other factors, depends on the Jacobian of the composition of the forward operator with the neural network. Motivated by our analysis, we then propose a 'plug-and-play' regulariser that leverages knowledge of the forward map to improve the generalisation of the network. We also provide a method for tightly upper bounding the norms of the Jacobians of the relevant operators that is much more computationally efficient than existing ones. We demonstrate the efficacy of our model-aware regularised deep learning algorithms against other state-of-the-art approaches on inverse problems involving various sub-sampling operators, such as those used in the classical compressed sensing setup, and on inverse problems of interest in the biomedical imaging setting.
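    As a hedged sketch of a Jacobian-based regulariser in the spirit of the above (the toy sensing operator A, the small network, the single-probe estimator, and the regularisation weight are all assumptions, not the thesis' exact method), the reconstruction loss can be augmented with a stochastic estimate of the squared Frobenius norm of the Jacobian of the forward operator composed with the network:

        import torch

        torch.manual_seed(0)
        m, n = 8, 16                                    # measurements, signal dimension
        A = torch.randn(m, n) / m ** 0.5                # toy sensing / sub-sampling operator

        net = torch.nn.Sequential(                      # toy reconstruction network: y -> x_hat
            torch.nn.Linear(m, 32), torch.nn.ReLU(), torch.nn.Linear(32, n)
        )
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)

        def composed(y):                                # y -> A f(y): the map being regularised
            return net(y) @ A.T

        x = torch.randn(64, n)                          # toy ground-truth signals
        y = x @ A.T                                     # noiseless measurements

        for step in range(200):
            x_hat = net(y)
            recon_loss = ((x_hat - x) ** 2).mean()
            # One-sample Hutchinson-style estimate of ||J||_F^2 via a Jacobian-vector product.
            v = torch.randn_like(y)
            _, jvp = torch.autograd.functional.jvp(composed, y, v, create_graph=True)
            jac_penalty = (jvp ** 2).sum() / y.shape[0]
            loss = recon_loss + 1e-3 * jac_penalty      # regularisation weight is an assumption
            opt.zero_grad()
            loss.backward()
            opt.step()

        print(float(recon_loss))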