122 research outputs found

    Orthogonality regularizer for question answering

    Learning embeddings of words and knowledge base elements is a promising approach for open domain question answering. Based on the remark that relations and entities are distinct object types lying in the same embedding space, we analyze the benefit of adding a regularizer favoring the embeddings of entities to be orthogonal to those of relations. The main motivation comes from the observation that modifying the embeddings using prior knowledge often helps performance. The experiments show that incorporating the regularizer yields better results on a challenging question answering benchmark.
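    The abstract does not spell out the regularizer's exact form, but penalties of this kind are commonly written as squared inner products between entity and relation embeddings, which vanish when the two sets are mutually orthogonal. A minimal PyTorch-style sketch under that assumption (the names `E`, `R`, and `lambda_orth` are illustrative, not taken from the paper):

```python
import torch

def orthogonality_penalty(E: torch.Tensor, R: torch.Tensor) -> torch.Tensor:
    """Sum of squared inner products between entity embeddings E (n_e x d)
    and relation embeddings R (n_r x d); zero when the two sets are orthogonal."""
    return (E @ R.T).pow(2).sum()

# Hypothetical usage: add the penalty to the usual question answering loss.
# loss = qa_loss + lambda_orth * orthogonality_penalty(entity_emb.weight, relation_emb.weight)
```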

    ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•œ ๋ฌธ๋งฅ ์ •๋ณด ๋ฐ ๋ฉ”๋ชจ๋ฆฌ ์–ดํ…์…˜์„ ํ™œ์šฉํ•˜๋Š” ๊ณ„์ธต์  ๋ฌธ๋งฅ ์ธ์ฝ”๋”

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2022. Advisor: ์ •๊ต๋ฏผ.

    Recently, the standard architecture for natural language processing (NLP) has evolved from recurrent neural networks to the Transformer architecture. The Transformer is composed of attention layers that excel at extracting correlations between tokens and integrate the extracted information to generate appropriate outputs. This progress has posed a new challenge to the deep learning community: exploiting additional context information beyond the given input data. This dissertation proposes new methods and analyses, focused on the attention layer, for effectively exploiting context information beyond the given input in various NLP tasks. First, it proposes the Hierarchical Memory Context Encoder (HMCE), which efficiently embeds contextual information from preceding sentences and fuses the embedded context representation into the input representation through a memory attention mechanism. On various context-aware machine translation tasks, the proposed HMCE outperforms a Transformer that does not use the additional context. Next, to improve the attention mechanism between context and input representations, it analyzes their representational similarity in depth using Centered Kernel Alignment (CKA) and proposes a method for optimizing CKA. Finally, it extends the CKA optimization method to a modality alignment method for multi-modal scenarios in which the context is given in the visual modality. This Modality Alignment method maximizes cross-modal representational similarity and brings large performance gains on video question answering tasks.

    Recently, the standard architecture for Natural Language Processing (NLP) has evolved from recurrent neural networks to the Transformer architecture. The Transformer architecture consists of attention layers, which excel at finding correlations between tokens and incorporate that information to generate the proper output. While many studies leveraging the Transformer architecture report new state-of-the-art performance on various NLP tasks, these recent improvements pose a new challenge to the deep learning community: exploiting additional context information. Because human intelligence perceives everyday signals together with rich contextual information (e.g., additional memory, visual information, and common sense), exploiting context information is a step toward the ultimate goal of Artificial Intelligence. In this dissertation, I propose novel methodologies and analyses to improve the context-awareness of the Transformer architecture, focusing on the attention mechanism, for various natural language processing tasks. The proposed methods utilize additionally given context information, which is not limited to the natural language modality, alongside the given input. First, I propose the Hierarchical Memory Context Encoder (HMCE), which efficiently embeds the contextual information over preceding sentences via a hierarchical Transformer architecture and fuses the embedded context representation into the input representation via a memory attention mechanism. The proposed HMCE outperforms the original Transformer, which does not leverage the additional context information, on various context-aware machine translation tasks. It also shows the best BLEU performance among the baselines that use the additional context. Then, to improve the attention mechanism between the context representation and the input representation, I analyze the representational similarity between them in depth. Based on my analyses of representational similarity inside the Transformer architecture, I propose a method for optimizing Centered Kernel Alignment (CKA) between internal representations of the Transformer. The proposed CKA optimization method increases the performance of the Transformer on various machine translation and language modelling tasks. Lastly, I extend the CKA optimization method to a Modality Alignment method for multi-modal scenarios where the context information takes the form of visual information. My Modality Alignment method enhances the cross-modality attention mechanism by maximizing the representational similarity between visual and natural language representations, resulting in accuracy improvements of more than 3.5% on video question answering tasks.

    Table of contents:
    1 Introduction
    2 Backgrounds
    3 Context-aware Hierarchical Transformer Architecture
      3.1 Related Works
        3.1.1 Using Multiple Sentences for Context-awareness in Machine Translation
        3.1.2 Structured Neural Machine Translation Models for Context-awareness
        3.1.3 Evaluating Context-awareness with Generated Translation
      3.2 Proposed Approach: Context-aware Hierarchical Text Encoder with Memory Networks
        3.2.1 Context-aware NMT Encoders
        3.2.2 Hierarchical Memory Context Encoder
      3.3 Experiments
        3.3.1 Data
        3.3.2 Hyperparameters and Training Details
        3.3.3 Overall BLEU Evaluation
        3.3.4 Model Complexity Analysis
        3.3.5 BLEU Evaluation on Helpful/Unhelpful Context
        3.3.6 Qualitative Analysis
        3.3.7 Limitations and Future Directions
      3.4 Conclusion
    4 Optimizing Representational Diversity of Transformer Architecture
      4.1 Related Works
        4.1.1 Analyses of Diversity in Multi-Head Attention
        4.1.2 Similarities between Deep Neural Representations
      4.2 Similarity Measures for Multi-Head Attention
        4.2.1 Multi-Head Attention
        4.2.2 Singular Vector Canonical Correlation Analysis (SVCCA)
        4.2.3 Centered Kernel Alignment (CKA)
      4.3 Proposed Approach: Controlling Inter-Head Diversity
        4.3.1 HSIC Regularizer
        4.3.2 Orthogonality Regularizer
        4.3.3 Drophead
      4.4 Inter-Head Similarity Analyses
        4.4.1 Experimental Details for Similarity Analysis
        4.4.2 Applying SVCCA and CKA
        4.4.3 Analyses on Inter-Model Similarity
        4.4.4 Does Multi-Head Strategy Diversify a Model's Representation Subspaces
      4.5 Experiments on Controlling Inter-Head Similarity Methods
        4.5.1 Experimental Details
        4.5.2 Analysis on Controlling Inter-Head Diversity
        4.5.3 Quantitative Evaluation
        4.5.4 Limitations and Future Directions
      4.6 Conclusions
    5 Modality Alignment for Cross-modal Attention
      5.1 Related Works
        5.1.1 Representation Similarity between Modalities
        5.1.2 Video Question Answering
      5.2 Proposed Approach: Modality Align between Multi-modal Representations
        5.2.1 Centered Kernel Alignment Review
        5.2.2 Why CKA is Proper to Modality Alignment
        5.2.3 Proposed Method
      5.3 Experiments
        5.3.1 Cosine Similarity Learning with CKA
        5.3.2 Modality Align on Video Question Answering Task
      5.4 Conclusion
    6 Conclusion
    Abstract (In Korean)
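    Since CKA is central to the last two contributions, a reference implementation of the standard linear variant may help; this is the textbook formula rather than code from the dissertation, and the matrices `X` and `Y` stand for any two sets of representations computed over the same examples (e.g. context and input representations):

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear Centered Kernel Alignment between representation matrices
    X (n x d1) and Y (n x d2) computed over the same n examples."""
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-based similarity, normalized to lie in [0, 1].
    numerator = np.linalg.norm(Y.T @ X, "fro") ** 2
    denominator = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(numerator / denominator)

# Example: similarity between two random 512-dimensional representations of 128 tokens.
rng = np.random.default_rng(0)
print(linear_cka(rng.normal(size=(128, 512)), rng.normal(size=(128, 512))))
```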

    Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start

    Every day, thousands of users sign up as new Wikipedia contributors. Once joined, these users have to decide which articles to contribute to, which users to seek out and learn from or collaborate with, etc. Any such task is a hard and potentially frustrating one given the sheer size of Wikipedia. Supporting newcomers in their first steps by recommending articles they would enjoy editing or editors they would enjoy collaborating with is thus a promising route toward converting them into long-term contributors. Standard recommender systems, however, rely on users' histories of previous interactions with the platform. As such, these systems cannot make high-quality recommendations to newcomers without any previous interactions -- the so-called cold-start problem. The present paper addresses the cold-start problem on Wikipedia by developing a method for automatically building short questionnaires that, when completed by a newly registered Wikipedia user, can be used for a variety of purposes, including article recommendations that can help new editors get started. Our questionnaires are constructed based on the text of Wikipedia articles as well as the history of contributions by the already onboarded Wikipedia editors. We assess the quality of our questionnaire-based recommendations in an offline evaluation using historical data, as well as an online evaluation with hundreds of real Wikipedia newcomers, concluding that our method provides cohesive, human-readable questions that perform well against several baselines. By addressing the cold-start problem, this work can help with the sustainable growth and maintenance of Wikipedia's diverse editor community. Comment: Accepted at the 13th International AAAI Conference on Web and Social Media (ICWSM 2019).
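    The paper's questionnaire-construction pipeline is not reproduced here, but the way questionnaire answers can bootstrap first recommendations is easy to illustrate with a simple topic-affinity score; every name below (`topic_vectors`, `article_vectors`, `recommend_articles`) is an illustrative assumption rather than the authors' API:

```python
from typing import Dict, List
import numpy as np

def recommend_articles(answers: List[str],
                       topic_vectors: Dict[str, np.ndarray],
                       article_vectors: Dict[str, np.ndarray],
                       k: int = 5) -> List[str]:
    """Rank articles for a newcomer by cosine similarity between the mean
    vector of the topics they selected and each article's vector."""
    # Build a user profile from the questionnaire answers.
    profile = np.mean([topic_vectors[a] for a in answers], axis=0)
    profile = profile / (np.linalg.norm(profile) + 1e-12)
    # Score every candidate article against the profile.
    scores = {
        title: float(vec @ profile / (np.linalg.norm(vec) + 1e-12))
        for title, vec in article_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```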

    Link Prediction for Free-Format Text


    AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching

    Despite significant progress of deep learning in recent years, state-of-the-art semantic matching methods still rely on legacy features such as SIFT or HoG. We argue that the strong invariance properties that are key to the success of recent deep architectures on the classification task make them unfit for dense correspondence tasks, unless a large amount of supervision is used. In this work, we propose a deep network, termed AnchorNet, that produces image representations that are well-suited for semantic matching. It relies on a set of filters whose response is geometrically consistent across different object instances, even in the presence of strong intra-class, scale, or viewpoint variations. Trained only with weak image-level labels, the final representation successfully captures information about the object structure and improves results of state-of-the-art semantic matching methods such as the deformable spatial pyramid or the proposal flow methods. We show positive results on the cross-instance matching task, where different instances of the same object category are matched, as well as on a new cross-category semantic matching task aligning pairs of instances each from a different object class. Comment: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
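    As a rough illustration of how geometry-sensitive features are consumed for dense semantic matching (this is generic nearest-neighbour matching, not the AnchorNet architecture itself), each location of one feature map can be matched to its most similar location in the other:

```python
import numpy as np

def dense_correspondences(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """For two (H, W, C) feature maps of the same shape, return for every
    location of image A the (row, col) of its best match in image B."""
    h, w, c = feat_a.shape
    a = feat_a.reshape(-1, c)
    b = feat_b.reshape(-1, c)
    # Cosine similarity between every pair of locations.
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    best = (a @ b.T).argmax(axis=1)
    rows, cols = np.unravel_index(best, (h, w))
    return np.stack([rows, cols], axis=1).reshape(h, w, 2)
```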

    Attentive Single-Tasking of Multiple Tasks

    In this work we address task interference in universal networks by considering that a network is trained on multiple tasks, but performs one task at a time, an approach we refer to as "single-tasking multiple tasks". The network thus modifies its behaviour through task-dependent feature adaptation, or task attention. This gives the network the ability to accentuate the features that are adapted to a task, while shunning irrelevant ones. We further reduce task interference by forcing the task gradients to be statistically indistinguishable through adversarial training, ensuring that the common backbone architecture serving all tasks is not dominated by any of the task-specific gradients. Results on three multi-task dense labelling problems consistently show: (i) a large reduction in the number of parameters while preserving, or even improving, performance and (ii) a smooth trade-off between computation and multi-task accuracy. We provide our system's code and pre-trained models at http://vision.ee.ethz.ch/~kmaninis/astmt/. Comment: CVPR 2019 Camera Ready.
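    The exact modules are described in the paper and its released code; the sketch below only illustrates, under assumed shapes and names, the two ingredients the abstract mentions: task-dependent channel attention on a shared backbone and a gradient-reversal hook that lets a discriminator push task gradients toward being statistically indistinguishable.

```python
import torch
import torch.nn as nn

class TaskAttention(nn.Module):
    """Per-task channel gating (squeeze-and-excitation style) so a shared
    backbone can accentuate different features for each task."""
    def __init__(self, channels: int, num_tasks: int):
        super().__init__()
        self.gates = nn.ModuleList(
            nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
            for _ in range(num_tasks)
        )

    def forward(self, x: torch.Tensor, task: int) -> torch.Tensor:
        # x: (batch, channels, H, W); gate computed from global average pooling.
        g = self.gates[task](x.mean(dim=(2, 3)))
        return x * g.unsqueeze(-1).unsqueeze(-1)

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass,
    placed before a task discriminator for adversarial gradient training."""
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

# Hypothetical usage inside a multi-task model:
# features = TaskAttention(256, num_tasks=3)(backbone_features, task=0)
# task_logits = discriminator(GradReverse.apply(features))
```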