708 research outputs found

    Deep Fragment Embeddings for Bidirectional Image Sentence Mapping

    Full text link
    We introduce a model for bidirectional retrieval of images and sentences through a multi-modal embedding of visual and natural language data. Unlike previous models that directly map images or sentences into a common embedding space, our model works at a finer level and embeds fragments of images (objects) and fragments of sentences (typed dependency tree relations) into a common space. In addition to a ranking objective seen in previous work, this allows us to add a new fragment alignment objective that learns to directly associate these fragments across modalities. Extensive experimental evaluation shows that reasoning on both the global level of images and sentences and the finer level of their respective fragments significantly improves performance on image-sentence retrieval tasks. Additionally, our model provides interpretable predictions, since the inferred inter-modal fragment alignment is explicit.
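    A minimal sketch of the fragment-level scoring this abstract describes, assuming object and dependency-relation fragments are already embedded in a common space; the hinge loss below is a simplified stand-in for the latent fragment-alignment objective, and all names and shapes are illustrative.

```python
import numpy as np

def fragment_scores(img_frags, sent_frags):
    """Pairwise dot-product affinities between image fragments (objects) and
    sentence fragments (dependency relations), both embedded in R^d."""
    return img_frags @ sent_frags.T            # shape: (n_img_frags, n_sent_frags)

def global_score(img_frags, sent_frags):
    """Image-sentence score: sum of thresholded fragment affinities."""
    return np.maximum(fragment_scores(img_frags, sent_frags), 0.0).sum()

def alignment_loss(img_frags, sent_frags, matching, margin=1.0):
    """Hinge loss pushing fragment affinities above +margin for a matching
    image-sentence pair and below -margin otherwise (a simplified stand-in
    for the fragment alignment objective described above)."""
    s = fragment_scores(img_frags, sent_frags)
    sign = 1.0 if matching else -1.0
    return np.maximum(0.0, margin - sign * s).mean()

# Illustrative usage with random embeddings.
rng = np.random.default_rng(0)
img, sent = rng.normal(size=(5, 64)), rng.normal(size=(7, 64))
print(global_score(img, sent), alignment_loss(img, sent, matching=True))
```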

    Knowledge and Reasoning for Image Understanding

    Get PDF
    Image Understanding is a long-established discipline in computer vision, which encompasses a body of advanced image processing techniques used to locate ("where"), characterize, and recognize ("what") objects, regions, and their attributes in an image. However, the notion of "understanding" (and the goal of artificially intelligent machines) goes beyond factual recall of the recognized components and includes reasoning and thinking beyond what can be seen (or perceived). Understanding is often evaluated by asking questions of increasing difficulty; thus, the expected functionalities of an intelligent Image Understanding system can be expressed in terms of the functionalities required to answer questions about an image. Answering questions about images requires primarily three components: image understanding, question (natural language) understanding, and reasoning based on knowledge. Any question asking beyond what can be directly seen requires modeling of commonsense (or background/ontological/factual) knowledge and reasoning. Knowledge and reasoning have seen scarce use in image understanding applications. In this thesis, we demonstrate the utility of incorporating background knowledge and using explicit reasoning in image understanding applications. We first present a comprehensive survey of previous work that utilized background knowledge and reasoning in understanding images; this survey outlines the limited use of commonsense knowledge in high-level applications. We then present a set of vision- and reasoning-based methods to solve several applications and show that these approaches benefit, in terms of accuracy and interpretability, from the explicit use of knowledge and reasoning. We propose novel knowledge representations of images, knowledge acquisition methods, and a new implementation of an efficient probabilistic logical reasoning engine that can utilize publicly available commonsense knowledge to solve applications such as visual question answering and image puzzles. Additionally, we identify the need for new datasets that explicitly require external commonsense knowledge to solve. We propose the new task of Image Riddles, which requires a combination of vision and reasoning based on ontological knowledge, and we collect a sufficiently large dataset to serve as an ideal testbed for vision and reasoning research. Lastly, we propose end-to-end deep architectures that can combine vision, knowledge, and reasoning modules together and achieve large performance boosts over state-of-the-art methods. Doctoral Dissertation, Computer Science, 201
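    As a rough illustration of how explicit reasoning over commonsense knowledge can rank answers to questions that go beyond what is directly seen, the sketch below combines hypothetical detector confidences with hypothetical commonsense relatedness scores via Lukasiewicz-style soft conjunction; this is a generic soft-logic pattern, not necessarily the reasoning engine implemented in the thesis, and all scores and names are made up.

```python
def soft_and(a, b):
    """Lukasiewicz t-norm: soft conjunction over truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)

def rule_support(detections, related, answer):
    """Support for `answer` from rules of the form
    detected(x) AND related(x, answer) -> answer, combining detector
    confidences with commonsense relatedness (take the best-supported rule)."""
    return max(
        (soft_and(conf, related.get((obj, answer), 0.0)) for obj, conf in detections.items()),
        default=0.0,
    )

# Hypothetical riddle: which concept do the detected objects jointly suggest?
detections = {"umbrella": 0.7, "boots": 0.6, "puddle": 0.5}       # detector confidences
related = {("umbrella", "rain"): 0.8, ("boots", "rain"): 0.5,      # commonsense edges
           ("puddle", "rain"): 0.9, ("umbrella", "sunshine"): 0.3}
candidates = ["rain", "sunshine"]
print(sorted(candidates, key=lambda a: rule_support(detections, related, a), reverse=True))
```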

    Embedding Approaches for Relational Data

    Get PDF
    Embedding methods for searching latent representations of data are important tools for unsupervised and supervised machine learning as well as information visualisation. Over the years, such methods have continually progressed towards the ability to capture and analyse the structure and latent characteristics of larger and more complex data. In this thesis, we examine the problem of developing efficient and reliable embedding methods for revealing, understanding, and exploiting different aspects of relational data. We split our work into three parts, each dealing with a different relational data structure. In the first part, we handle the weighted bipartite relational structure. Based on the relational measurements between two groups of heterogeneous objects, our goal is to generate low-dimensional representations of these two different types of objects in a unified common space. We propose a novel method that models the embedding of each object type symmetrically to the other type, subject to flexible scale constraints and weighting parameters. The embedding generation relies on an efficient optimisation dispatched using matrix decomposition. We also propose a simple way of measuring the conformity between the original object relations and the ones re-estimated from the embeddings, in order to achieve model selection by identifying the optimal model parameters with a simple search procedure. We show that our proposed method achieves consistently better or on-par results on multiple synthetic datasets and real-world ones from the text mining domain when compared with existing embedding generation approaches. In the second part of this thesis, we focus on multi-relational data, where objects are interlinked by various relation types. Embedding approaches are very popular in this field; they typically encode objects and relation types with hidden representations and use operations between them to compute positive scalars corresponding to the linkages' likelihood scores. In this work, we aim at further improving existing embedding techniques by taking into account the multiple facets of the different patterns and behaviours of each relation type. To the best of our knowledge, this is the first latent representation model in this field that considers relational representations to be dependent on the objects they relate. The multi-modality of the relation type over different objects is effectively formulated as a projection matrix over the space spanned by the object vectors. Two large benchmark knowledge bases are used to evaluate the performance on the link prediction task, and a new test data partition scheme is proposed to offer a better understanding of the behaviour of a link prediction model. In the last part of this thesis, a much more complex relational structure is considered. In particular, we aim at developing novel embedding methods for jointly modelling the linkage structure and objects' attributes. Traditionally, the link prediction task is carried out on either the linkage structure or the objects' attributes alone, which ignores their semantic connections and is insufficient for handling complex link prediction tasks. Thus, our goal in this work is to build a reliable model that can fuse both sources of information to improve link prediction. The key idea of our approach is to encode both the linkage validities and the nodes' neighbourhood information into embedding-based conditional probabilities. Another important aspect of our proposed algorithm is that we utilise a margin-based contrastive training process for encoding the linkage structure, which relies on a more appropriate assumption and dramatically reduces the number of training links. In the experiments, our proposed method indeed improves link prediction performance on three citation/hyperlink datasets when compared with methods relying on only the nodes' attributes or the linkage structure, and it also achieves much better performance compared with the state of the art.
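    A compact sketch of the kind of relation-specific projection and margin-based contrastive objective discussed above, with entity vectors, per-relation projection matrices, and triples all randomly generated for illustration; it is not the thesis's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_entities, n_relations = 16, 100, 5

# Entity vectors and one projection matrix per relation type (an illustrative
# stand-in for relation representations that depend on the objects they relate).
E = rng.normal(scale=0.1, size=(n_entities, d))
R = rng.normal(scale=0.1, size=(n_relations, d, d))

def score(h, r, t):
    """Plausibility of a (head, relation, tail) triple: higher means more likely."""
    return float(E[h] @ R[r] @ E[t])

def margin_loss(pos, neg, margin=1.0):
    """Margin-based contrastive objective: a positive triple should outscore
    a corrupted (negative) triple by at least `margin`."""
    return max(0.0, margin - score(*pos) + score(*neg))

# Hypothetical triples: (head, relation, tail) and a corrupted tail.
print(margin_loss(pos=(3, 1, 7), neg=(3, 1, 42)))
```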

    A Study on Effective Training Methods for Autoregressive Model-Based Text Generation

    Get PDF
    Ph.D. dissertation -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2021. The rise of deep neural networks has promoted tremendous advances in natural language processing research. Natural language generation is a subfield of natural language processing that is indispensable for building human-like artificial intelligence, since it takes responsibility for delivering the decision-making of machines in natural language. For neural network-based text generation techniques, which have achieved most state-of-the-art performance, autoregressive methods are generally adopted because of their correspondence to the word-by-word nature of human language production. In this dissertation, we investigate two different ways to train autoregressive text generation models based on deep neural networks. We first focus on token-level training of question generation, which aims to generate a question related to a given input passage. The proposed Answer-Separated Seq2Seq effectively mitigates a problem of previous question generation models, namely that a significant proportion of the generated questions include words from the target answer. While autoregressive methods are primarily trained with maximum likelihood estimation, they suffer from several problems, such as exposure bias. As a remedy, we propose a sequence-level GAN-based approach for text generation that promotes collaborative training in both continuous and discrete representations of text. To aggregate the achievements of the research mentioned above, we finally propose a novel way of training a sequence-level question generation model, adopting a pre-trained language model, one of the most significant breakthroughs in natural language processing, along with Proximal Policy Optimization.
    Contents: 1 Introduction; 2 Background; 3 Token-Level Training of Conditional Text Generation Model; 4 Sequence-Level Training of Unconditional Text Generation; 5 Sequence-Level Training of Conditional Text Generation; 6 Conclusion; 7 Appendix.
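    The token-level maximum-likelihood (teacher-forcing) training that the dissertation starts from can be sketched as below; `model` is assumed to be any autoregressive language model returning next-token logits, and the helper name is hypothetical. Exposure bias arises because training conditions on gold prefixes while inference conditions on the model's own outputs.

```python
import torch
import torch.nn.functional as F

def mle_step(model, input_ids, optimizer):
    """One teacher-forcing step: predict token t+1 from the gold tokens up to t.

    `model` is assumed to map (batch, seq) token ids to logits of shape
    (batch, seq, vocab); names here are illustrative only."""
    logits = model(input_ids[:, :-1])                  # condition on gold prefixes
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),           # (batch*seq, vocab)
        input_ids[:, 1:].reshape(-1),                  # next-token targets
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```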

    Semantic Representation and Inference for NLP

    Full text link
    Semantic representation and inference is essential for Natural Language Processing (NLP). The state of the art for semantic representation and inference is deep learning, particularly Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Transformer self-attention models. This thesis investigates the use of deep learning for novel semantic representation and inference, and makes contributions in three areas: creating training data, improving semantic representations, and extending inference learning. In terms of creating training data, we contribute the largest publicly available dataset of real-life factual claims for the purpose of automatic claim verification (MultiFC), and we present a novel inference model composed of multi-scale CNNs with different kernel sizes that learn from external sources to infer fact-checking labels. In terms of improving semantic representations, we contribute a novel model that captures non-compositional semantic indicators. By definition, the meaning of a non-compositional phrase cannot be inferred from the individual meanings of its composing words (e.g., hot dog). Motivated by this, we operationalize the compositionality of a phrase contextually by enriching the phrase representation with external word embeddings and knowledge graphs. Finally, in terms of inference learning, we propose a series of novel deep learning architectures that improve inference by using syntactic dependencies, ensembling role-guided attention heads, incorporating gating layers, and concatenating multiple heads in novel and effective ways. This thesis consists of seven publications (five published and two under review). Comment: PhD thesis, the University of Copenhagen.
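    A generic sketch of a multi-scale text CNN in the spirit of the claim-verification model described above: parallel 1-D convolutions with different kernel sizes over word embeddings, max-pooled and concatenated before a label classifier. Layer sizes, kernel widths, and the class count are illustrative assumptions, not the thesis's configuration.

```python
import torch
import torch.nn as nn

class MultiScaleTextCNN(nn.Module):
    """Parallel 1-D convolutions with different kernel sizes over word embeddings,
    max-pooled over time and concatenated into a single representation."""

    def __init__(self, vocab_size, emb_dim=128, n_filters=64,
                 kernel_sizes=(2, 3, 5), n_classes=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k, padding=k // 2) for k in kernel_sizes
        )
        self.out = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, token_ids):                    # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)      # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().amax(dim=2) for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))    # logits, e.g. fact-check labels
```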

    Efficient machine learning: models and accelerations

    Get PDF
    One of the key enablers of the recent unprecedented success of machine learning is the adoption of very large models. Modern machine learning models typically consist of multiple cascaded layers, such as deep neural networks, and at least millions to hundreds of millions of parameters (i.e., weights) for the entire model. Larger-scale models tend to enable the extraction of more complex high-level features and therefore lead to a significant improvement in overall accuracy. On the other hand, the layered deep structure and large model sizes also demand increased computational capability and memory. In order to achieve higher scalability, performance, and energy efficiency for deep learning systems, two orthogonal research and development trends have attracted enormous interest: the first is acceleration, while the second is model compression. The underlying goal of both trends is to preserve high model quality so that predictions remain accurate. In this thesis, we address these two problems and utilize different computing paradigms to solve real-life deep learning problems. To explore these two domains, this thesis first presents the cogent confabulation network for the sentence completion problem. We use the Chinese language as a case study to describe our exploration of cogent confabulation-based text recognition models. The exploration and optimization of the cogent confabulation-based models have been conducted through various comparisons, and the optimized network offered better accuracy for sentence completion. To accelerate the sentence completion problem in a multi-processing system, we propose a parallel framework for the confabulation recall algorithm. The parallel implementation reduces runtime, improves recall accuracy by breaking the fixed evaluation order and introducing more generalization, and maintains balanced progress in status updates among all neurons. A lexicon scheduling algorithm is presented to further improve the model performance. As deep neural networks have proven effective for many real-life applications and are deployed on low-power devices, we then investigate accelerating neural network inference using a hardware-friendly computing paradigm, stochastic computing. It is an approximate computing paradigm that requires a small hardware footprint and achieves high energy efficiency. Applying stochastic computing to deep convolutional neural networks, we design the functional hardware blocks and optimize them jointly to minimize the accuracy loss due to the approximation. The synthesis results show that the proposed design achieves remarkably low hardware cost and power/energy consumption. Modern neural networks usually involve a huge number of parameters, which cannot fit into embedded devices; compression of deep learning models, together with acceleration, therefore attracts our attention. We introduce structured-matrix-based neural networks to address this problem. The circulant matrix is one such structured matrix: the whole matrix can be represented using a single vector, so the matrix is compressed. We further investigate a more flexible structure based on the circulant matrix, called the block-circulant matrix, which partitions a matrix into several smaller blocks and makes each submatrix circulant; the compression ratio is thus controllable. With the help of Fourier-transform-based equivalent computation, inference of the deep neural network can be accelerated energy-efficiently on FPGAs. We also optimize the training algorithm for block-circulant-matrix-based neural networks to obtain high accuracy after compression.
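    The compression and acceleration described above rest on the fact that a circulant matrix is determined by a single vector and that its matrix-vector product reduces to FFTs in O(n log n). A minimal numpy sketch of this equivalent computation follows (a software illustration only, not the FPGA implementation; the block layout is an assumed convention).

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix defined by first column `c` with vector `x`
    using the FFT identity  C @ x = ifft(fft(c) * fft(x))."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x, block_size):
    """Matrix-vector product for a block-circulant weight matrix, where
    blocks[i][j] holds the defining vector of the (i, j) circulant block."""
    x_parts = x.reshape(-1, block_size)
    return np.concatenate([
        sum(circulant_matvec(blocks[i][j], x_parts[j]) for j in range(len(x_parts)))
        for i in range(len(blocks))
    ])

# Sanity check against an explicitly constructed circulant matrix.
c = np.array([1.0, 2.0, 3.0, 4.0])
C = np.array([[c[(i - j) % 4] for j in range(4)] for i in range(4)])
x = np.arange(4.0)
assert np.allclose(C @ x, circulant_matvec(c, x))
```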

    Grammatical Functions and Possibilistic Reasoning for the Extraction and Representation of Semantic Knowledge in Text Documents

    Get PDF
    This study seeks to explore and develop innovative methods for the extraction of semantic knowledge from unlabelled written English documents and the representation of this knowledge in a formal mathematical expression that facilitates its use in practical applications. The first method developed in this research focuses on semantic information extraction. To perform this task, the study introduces a natural language processing (NLP) method designed to extract information-rich keywords from English sentences. The method involves initially learning a set of rules that guide the extraction of keywords from parts of sentences. Once this learning stage is completed, the method can be used to extract the keywords from complete sentences by pairing these sentences with the most similar sequence of rules. The key innovation in this method is the use of a part-of-speech hierarchy: by raising words to increasingly general grammatical categories in this hierarchy, the system can compare rules, compute the degree of similarity between them, and learn new rules. The second method developed in this study addresses the problem of knowledge representation. This method processes triplets of keywords through several successive steps to represent the information contained in the triplets using possibility distributions. These distributions represent the possibility of a topic given a particular triplet of keywords. Using this methodology, the information contained in the natural language triplets can be quantified and represented in a mathematical format, which can easily be used in a number of applications, such as document classifiers. In further extensions to the research, a theoretical justification and mathematical development for both methods are provided, and examples are given to illustrate these notions. Sample applications are also developed based on these methods, and the experimental results generated through these implementations are expounded and thoroughly analyzed to confirm that the methods are reliable in practice.
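    One way to read the possibility distributions described above is as a conjunctive (min) combination of per-keyword possibility degrees, renormalised so the maximum degree is 1, as possibility theory requires. The sketch below is an illustrative interpretation with made-up keywords and topics, not the study's exact procedure.

```python
def triplet_possibility(triplet, keyword_possibility):
    """Combine per-keyword possibility degrees for each topic with a min
    (conjunctive) operator, then renormalise so the largest degree is 1."""
    topics = set().union(*(keyword_possibility.get(k, {}) for k in triplet))
    combined = {
        t: min(keyword_possibility.get(k, {}).get(t, 0.0) for k in triplet)
        for t in topics
    }
    peak = max(combined.values(), default=0.0)
    return {t: v / peak for t, v in combined.items()} if peak > 0 else combined

# Hypothetical possibility degrees of topics given individual keywords.
keyword_possibility = {
    "court":   {"law": 0.9, "sports": 0.7},
    "judge":   {"law": 1.0, "sports": 0.3},
    "verdict": {"law": 1.0, "sports": 0.1},
}
print(triplet_possibility(("court", "judge", "verdict"), keyword_possibility))
```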