708 research outputs found

    Deep Fragment Embeddings for Bidirectional Image Sentence Mapping

    Full text link
    We introduce a model for bidirectional retrieval of images and sentences through a multi-modal embedding of visual and natural language data. Unlike previous models that directly map images or sentences into a common embedding space, our model works at a finer level and embeds fragments of images (objects) and fragments of sentences (typed dependency tree relations) into a common space. In addition to a ranking objective seen in previous work, this allows us to add a new fragment alignment objective that learns to directly associate these fragments across modalities. Extensive experimental evaluation shows that reasoning on both the global level of images and sentences and the finer level of their respective fragments significantly improves performance on image-sentence retrieval tasks. Additionally, our model provides interpretable predictions, since the inferred inter-modal fragment alignment is explicit.
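    A minimal sketch of the fragment-level scoring this abstract describes, assuming object and dependency-relation fragments are already embedded in a common space; the hinge loss below is a simplified stand-in for the latent fragment-alignment objective, and all names and shapes are illustrative.

```python
import numpy as np

def fragment_scores(img_frags, sent_frags):
    """Pairwise dot-product affinities between image fragments (objects) and
    sentence fragments (dependency relations), both embedded in R^d."""
    return img_frags @ sent_frags.T            # shape: (n_img_frags, n_sent_frags)

def global_score(img_frags, sent_frags):
    """Image-sentence score: sum of thresholded fragment affinities."""
    return np.maximum(fragment_scores(img_frags, sent_frags), 0.0).sum()

def alignment_loss(img_frags, sent_frags, matching, margin=1.0):
    """Hinge loss pushing fragment affinities above +margin for a matching
    image-sentence pair and below -margin otherwise (a simplified stand-in
    for the fragment alignment objective described above)."""
    s = fragment_scores(img_frags, sent_frags)
    sign = 1.0 if matching else -1.0
    return np.maximum(0.0, margin - sign * s).mean()

# Illustrative usage with random embeddings.
rng = np.random.default_rng(0)
img, sent = rng.normal(size=(5, 64)), rng.normal(size=(7, 64))
print(global_score(img, sent), alignment_loss(img, sent, matching=True))
```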

    Knowledge and Reasoning for Image Understanding

    Get PDF
    Image Understanding is a long-established discipline in computer vision, which encompasses a body of advanced image processing techniques used to locate ("where"), characterize, and recognize ("what") objects, regions, and their attributes in an image. However, the notion of "understanding" (and the goal of artificially intelligent machines) goes beyond factual recall of the recognized components and includes reasoning and thinking beyond what can be seen (or perceived). Understanding is often evaluated by asking questions of increasing difficulty; thus, the expected functionalities of an intelligent Image Understanding system can be expressed in terms of the functionalities required to answer questions about an image. Answering questions about images requires primarily three components: image understanding, question (natural language) understanding, and reasoning based on knowledge. Any question asking beyond what can be directly seen requires modeling of commonsense (or background/ontological/factual) knowledge and reasoning. Knowledge and reasoning have seen scarce use in image understanding applications. In this thesis, we demonstrate the utility of incorporating background knowledge and using explicit reasoning in image understanding applications. We first present a comprehensive survey of previous work that utilized background knowledge and reasoning in understanding images; this survey outlines the limited use of commonsense knowledge in high-level applications. We then present a set of vision- and reasoning-based methods to solve several applications and show that these approaches benefit, in terms of accuracy and interpretability, from the explicit use of knowledge and reasoning. We propose novel knowledge representations of images, knowledge acquisition methods, and a new implementation of an efficient probabilistic logical reasoning engine that can utilize publicly available commonsense knowledge to solve applications such as visual question answering and image puzzles. Additionally, we identify the need for new datasets that explicitly require external commonsense knowledge to solve. We propose the new task of Image Riddles, which requires a combination of vision and reasoning based on ontological knowledge, and we collect a sufficiently large dataset to serve as an ideal testbed for vision and reasoning research. Lastly, we propose end-to-end deep architectures that can combine vision, knowledge, and reasoning modules together and achieve large performance boosts over state-of-the-art methods. Doctoral Dissertation, Computer Science, 201
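    As a rough illustration of how explicit reasoning over commonsense knowledge can rank answers to questions that go beyond what is directly seen, the sketch below combines hypothetical detector confidences with hypothetical commonsense relatedness scores via Lukasiewicz-style soft conjunction; this is a generic soft-logic pattern, not necessarily the reasoning engine implemented in the thesis, and all scores and names are made up.

```python
def soft_and(a, b):
    """Lukasiewicz t-norm: soft conjunction over truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)

def rule_support(detections, related, answer):
    """Support for `answer` from rules of the form
    detected(x) AND related(x, answer) -> answer, combining detector
    confidences with commonsense relatedness (take the best-supported rule)."""
    return max(
        (soft_and(conf, related.get((obj, answer), 0.0)) for obj, conf in detections.items()),
        default=0.0,
    )

# Hypothetical riddle: which concept do the detected objects jointly suggest?
detections = {"umbrella": 0.7, "boots": 0.6, "puddle": 0.5}       # detector confidences
related = {("umbrella", "rain"): 0.8, ("boots", "rain"): 0.5,      # commonsense edges
           ("puddle", "rain"): 0.9, ("umbrella", "sunshine"): 0.3}
candidates = ["rain", "sunshine"]
print(sorted(candidates, key=lambda a: rule_support(detections, related, a), reverse=True))
```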

    Embedding Approaches for Relational Data

    Get PDF
    Embedding methods for searching latent representations of data are important tools for unsupervised and supervised machine learning as well as information visualisation. Over the years, such methods have continually progressed towards the ability to capture and analyse the structure and latent characteristics of larger and more complex data. In this thesis, we examine the problem of developing efficient and reliable embedding methods for revealing, understanding, and exploiting different aspects of relational data. We split our work into three parts, each dealing with a different relational data structure. In the first part, we handle the weighted bipartite relational structure. Based on the relational measurements between two groups of heterogeneous objects, our goal is to generate low-dimensional representations of these two different types of objects in a unified common space. We propose a novel method that models the embedding of each object type symmetrically to the other type, subject to flexible scale constraints and weighting parameters. The embedding generation relies on an efficient optimisation dispatched using matrix decomposition. We also propose a simple way of measuring the conformity between the original object relations and the ones re-estimated from the embeddings, in order to achieve model selection by identifying the optimal model parameters with a simple search procedure. We show that our proposed method achieves consistently better or on-par results on multiple synthetic datasets and real-world ones from the text mining domain when compared with existing embedding generation approaches. In the second part of this thesis, we focus on multi-relational data, where objects are interlinked by various relation types. Embedding approaches are very popular in this field; they typically encode objects and relation types with hidden representations and use operations between them to compute positive scalars corresponding to the linkages' likelihood scores. In this work, we aim at further improving existing embedding techniques by taking into account the multiple facets of the different patterns and behaviours of each relation type. To the best of our knowledge, this is the first latent representation model in this field that considers relational representations to be dependent on the objects they relate. The multi-modality of the relation type over different objects is effectively formulated as a projection matrix over the space spanned by the object vectors. Two large benchmark knowledge bases are used to evaluate the performance on the link prediction task, and a new test data partition scheme is proposed to offer a better understanding of the behaviour of a link prediction model. In the last part of this thesis, a much more complex relational structure is considered. In particular, we aim at developing novel embedding methods for jointly modelling the linkage structure and objects' attributes. Traditionally, the link prediction task is carried out on either the linkage structure or the objects' attributes alone, which ignores their semantic connections and is insufficient for handling complex link prediction tasks. Thus, our goal in this work is to build a reliable model that can fuse both sources of information to improve link prediction. The key idea of our approach is to encode both the linkage validities and the nodes' neighbourhood information into embedding-based conditional probabilities. Another important aspect of our proposed algorithm is that we utilise a margin-based contrastive training process for encoding the linkage structure, which relies on a more appropriate assumption and dramatically reduces the number of training links. In the experiments, our proposed method indeed improves link prediction performance on three citation/hyperlink datasets when compared with methods relying on only the nodes' attributes or the linkage structure, and it also achieves much better performance compared with the state of the art.
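    A compact sketch of the kind of relation-specific projection and margin-based contrastive objective discussed above, with entity vectors, per-relation projection matrices, and triples all randomly generated for illustration; it is not the thesis's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_entities, n_relations = 16, 100, 5

# Entity vectors and one projection matrix per relation type (an illustrative
# stand-in for relation representations that depend on the objects they relate).
E = rng.normal(scale=0.1, size=(n_entities, d))
R = rng.normal(scale=0.1, size=(n_relations, d, d))

def score(h, r, t):
    """Plausibility of a (head, relation, tail) triple: higher means more likely."""
    return float(E[h] @ R[r] @ E[t])

def margin_loss(pos, neg, margin=1.0):
    """Margin-based contrastive objective: a positive triple should outscore
    a corrupted (negative) triple by at least `margin`."""
    return max(0.0, margin - score(*pos) + score(*neg))

# Hypothetical triples: (head, relation, tail) and a corrupted tail.
print(margin_loss(pos=(3, 1, 7), neg=(3, 1, 42)))
```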

    A Study on Effective Training Methods for Autoregressive Model-Based Text Generation

    Get PDF
    Ph.D. dissertation -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2021. The rise of deep neural networks has promoted tremendous advances in natural language processing research. Natural language generation is a subfield of natural language processing that is indispensable for building human-like artificial intelligence, since it takes responsibility for delivering the decision-making of machines in natural language. For neural network-based text generation techniques, which have achieved most state-of-the-art performance, autoregressive methods are generally adopted because of their correspondence to the word-by-word nature of human language production. In this dissertation, we investigate two different ways to train autoregressive text generation models based on deep neural networks. We first focus on token-level training of question generation, which aims to generate a question related to a given input passage. The proposed Answer-Separated Seq2Seq effectively mitigates a problem of previous question generation models, namely that a significant proportion of the generated questions include words from the target answer. While autoregressive methods are primarily trained with maximum likelihood estimation, they suffer from several problems, such as exposure bias. As a remedy, we propose a sequence-level GAN-based approach for text generation that promotes collaborative training in both continuous and discrete representations of text. To aggregate the achievements of the research mentioned above, we finally propose a novel way of training a sequence-level question generation model, adopting a pre-trained language model, one of the most significant breakthroughs in natural language processing, along with Proximal Policy Optimization.
    Contents: 1 Introduction; 2 Background; 3 Token-Level Training of Conditional Text Generation Model; 4 Sequence-Level Training of Unconditional Text Generation; 5 Sequence-Level Training of Conditional Text Generation; 6 Conclusion; 7 Appendix.
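    The token-level maximum-likelihood (teacher-forcing) training that the dissertation starts from can be sketched as below; `model` is assumed to be any autoregressive language model returning next-token logits, and the helper name is hypothetical. Exposure bias arises because training conditions on gold prefixes while inference conditions on the model's own outputs.

```python
import torch
import torch.nn.functional as F

def mle_step(model, input_ids, optimizer):
    """One teacher-forcing step: predict token t+1 from the gold tokens up to t.

    `model` is assumed to map (batch, seq) token ids to logits of shape
    (batch, seq, vocab); names here are illustrative only."""
    logits = model(input_ids[:, :-1])                  # condition on gold prefixes
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),           # (batch*seq, vocab)
        input_ids[:, 1:].reshape(-1),                  # next-token targets
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```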

    Semantic Representation and Inference for NLP

    Full text link
    Semantic representation and inference is essential for Natural Language Processing (NLP). The state of the art for semantic representation and inference is deep learning, particularly Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Transformer self-attention models. This thesis investigates the use of deep learning for novel semantic representation and inference, and makes contributions in three areas: creating training data, improving semantic representations, and extending inference learning. In terms of creating training data, we contribute the largest publicly available dataset of real-life factual claims for the purpose of automatic claim verification (MultiFC), and we present a novel inference model composed of multi-scale CNNs with different kernel sizes that learn from external sources to infer fact-checking labels. In terms of improving semantic representations, we contribute a novel model that captures non-compositional semantic indicators. By definition, the meaning of a non-compositional phrase cannot be inferred from the individual meanings of its composing words (e.g., hot dog). Motivated by this, we operationalize the compositionality of a phrase contextually by enriching the phrase representation with external word embeddings and knowledge graphs. Finally, in terms of inference learning, we propose a series of novel deep learning architectures that improve inference by using syntactic dependencies, ensembling role-guided attention heads, incorporating gating layers, and concatenating multiple heads in novel and effective ways. This thesis consists of seven publications (five published and two under review). Comment: PhD thesis, the University of Copenhagen.
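    A generic sketch of a multi-scale text CNN in the spirit of the claim-verification model described above: parallel 1-D convolutions with different kernel sizes over word embeddings, max-pooled and concatenated before a label classifier. Layer sizes, kernel widths, and the class count are illustrative assumptions, not the thesis's configuration.

```python
import torch
import torch.nn as nn

class MultiScaleTextCNN(nn.Module):
    """Parallel 1-D convolutions with different kernel sizes over word embeddings,
    max-pooled over time and concatenated into a single representation."""

    def __init__(self, vocab_size, emb_dim=128, n_filters=64,
                 kernel_sizes=(2, 3, 5), n_classes=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k, padding=k // 2) for k in kernel_sizes
        )
        self.out = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, token_ids):                    # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)      # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().amax(dim=2) for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))    # logits, e.g. fact-check labels
```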

    Efficient machine learning: models and accelerations

    Get PDF
    One of the key enablers of the recent unprecedented success of machine learning is the adoption of very large models. Modern machine learning models typically consist of multiple cascaded layers, such as deep neural networks, and at least millions to hundreds of millions of parameters (i.e., weights) for the entire model. Larger-scale models tend to enable the extraction of more complex high-level features and therefore lead to a significant improvement in overall accuracy. On the other hand, the layered deep structure and large model sizes also demand increased computational capability and memory. In order to achieve higher scalability, performance, and energy efficiency for deep learning systems, two orthogonal research and development trends have attracted enormous interest: the first is acceleration, while the second is model compression. The underlying goal of both trends is to preserve high model quality so that predictions remain accurate. In this thesis, we address these two problems and utilize different computing paradigms to solve real-life deep learning problems. To explore these two domains, this thesis first presents the cogent confabulation network for the sentence completion problem. We use the Chinese language as a case study to describe our exploration of cogent confabulation-based text recognition models. The exploration and optimization of the cogent confabulation-based models have been conducted through various comparisons, and the optimized network offered better accuracy for sentence completion. To accelerate the sentence completion problem in a multi-processing system, we propose a parallel framework for the confabulation recall algorithm. The parallel implementation reduces runtime, improves recall accuracy by breaking the fixed evaluation order and introducing more generalization, and maintains balanced progress in status updates among all neurons. A lexicon scheduling algorithm is presented to further improve the model performance. As deep neural networks have proven effective for many real-life applications and are deployed on low-power devices, we then investigate accelerating neural network inference using a hardware-friendly computing paradigm, stochastic computing. It is an approximate computing paradigm that requires a small hardware footprint and achieves high energy efficiency. Applying stochastic computing to deep convolutional neural networks, we design the functional hardware blocks and optimize them jointly to minimize the accuracy loss due to the approximation. The synthesis results show that the proposed design achieves remarkably low hardware cost and power/energy consumption. Modern neural networks usually involve a huge number of parameters, which cannot fit into embedded devices; compression of deep learning models, together with acceleration, therefore attracts our attention. We introduce structured-matrix-based neural networks to address this problem. The circulant matrix is one such structured matrix: the whole matrix can be represented using a single vector, so the matrix is compressed. We further investigate a more flexible structure based on the circulant matrix, called the block-circulant matrix, which partitions a matrix into several smaller blocks and makes each submatrix circulant; the compression ratio is thus controllable. With the help of Fourier-transform-based equivalent computation, inference of the deep neural network can be accelerated energy-efficiently on FPGAs. We also optimize the training algorithm for block-circulant-matrix-based neural networks to obtain high accuracy after compression.
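    The compression and acceleration described above rest on the fact that a circulant matrix is determined by a single vector and that its matrix-vector product reduces to FFTs in O(n log n). A minimal numpy sketch of this equivalent computation follows (a software illustration only, not the FPGA implementation; the block layout is an assumed convention).

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix defined by first column `c` with vector `x`
    using the FFT identity  C @ x = ifft(fft(c) * fft(x))."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x, block_size):
    """Matrix-vector product for a block-circulant weight matrix, where
    blocks[i][j] holds the defining vector of the (i, j) circulant block."""
    x_parts = x.reshape(-1, block_size)
    return np.concatenate([
        sum(circulant_matvec(blocks[i][j], x_parts[j]) for j in range(len(x_parts)))
        for i in range(len(blocks))
    ])

# Sanity check against an explicitly constructed circulant matrix.
c = np.array([1.0, 2.0, 3.0, 4.0])
C = np.array([[c[(i - j) % 4] for j in range(4)] for i in range(4)])
x = np.arange(4.0)
assert np.allclose(C @ x, circulant_matvec(c, x))
```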

    Grammatical Functions and Possibilistic Reasoning for the Extraction and Representation of Semantic Knowledge in Text Documents

    Get PDF
    This study seeks to explore and develop innovative methods for the extraction of semantic knowledge from unlabelled written English documents and the representation of this knowledge in a formal mathematical expression that facilitates its use in practical applications. The first method developed in this research focuses on semantic information extraction. To perform this task, the study introduces a natural language processing (NLP) method designed to extract information-rich keywords from English sentences. The method involves initially learning a set of rules that guide the extraction of keywords from parts of sentences. Once this learning stage is completed, the method can be used to extract the keywords from complete sentences by pairing these sentences with the most similar sequence of rules. The key innovation in this method is the use of a part-of-speech hierarchy: by raising words to increasingly general grammatical categories in this hierarchy, the system can compare rules, compute the degree of similarity between them, and learn new rules. The second method developed in this study addresses the problem of knowledge representation. This method processes triplets of keywords through several successive steps to represent the information contained in the triplets using possibility distributions. These distributions represent the possibility of a topic given a particular triplet of keywords. Using this methodology, the information contained in the natural language triplets can be quantified and represented in a mathematical format, which can easily be used in a number of applications, such as document classifiers. In further extensions to the research, a theoretical justification and mathematical development for both methods are provided, and examples are given to illustrate these notions. Sample applications are also developed based on these methods, and the experimental results generated through these implementations are expounded and thoroughly analyzed to confirm that the methods are reliable in practice.
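    One way to read the possibility distributions described above is as a conjunctive (min) combination of per-keyword possibility degrees, renormalised so the maximum degree is 1, as possibility theory requires. The sketch below is an illustrative interpretation with made-up keywords and topics, not the study's exact procedure.

```python
def triplet_possibility(triplet, keyword_possibility):
    """Combine per-keyword possibility degrees for each topic with a min
    (conjunctive) operator, then renormalise so the largest degree is 1."""
    topics = set().union(*(keyword_possibility.get(k, {}) for k in triplet))
    combined = {
        t: min(keyword_possibility.get(k, {}).get(t, 0.0) for k in triplet)
        for t in topics
    }
    peak = max(combined.values(), default=0.0)
    return {t: v / peak for t, v in combined.items()} if peak > 0 else combined

# Hypothetical possibility degrees of topics given individual keywords.
keyword_possibility = {
    "court":   {"law": 0.9, "sports": 0.7},
    "judge":   {"law": 1.0, "sports": 0.3},
    "verdict": {"law": 1.0, "sports": 0.1},
}
print(triplet_possibility(("court", "judge", "verdict"), keyword_possibility))
```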