2,611 research outputs found

    Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs

    Many popular knowledge graphs such as Freebase, YAGO, or DBpedia maintain a list of non-discrete attributes for each entity. Intuitively, attributes such as height, price, or population count can richly characterize entities in knowledge graphs. This additional source of information may help to alleviate the sparsity and incompleteness problems that are prevalent in knowledge graphs. Unfortunately, many state-of-the-art relational learning models ignore this information because of the challenge of handling non-discrete data types in inherently binary-natured knowledge graphs. In this paper, we propose a novel multi-task neural network approach for both encoding and predicting non-discrete attribute information in a relational setting. Specifically, we train a neural network for triplet prediction along with a separate network for attribute value regression. Via multi-task learning, we are able to learn representations of entities, relations, and attributes that encode information about both tasks. Moreover, such attributes are central to many predictive tasks not only as an information source but also as a prediction target. Therefore, models that can encode, incorporate, and predict such information in a relational learning context are highly attractive. We show that our approach outperforms many state-of-the-art methods on the tasks of relational triplet classification and attribute value prediction. Comment: Accepted at CIKM 201
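The shared-representation idea in this abstract can be sketched in a few lines of numpy. This is a minimal illustrative example, not the paper's actual architecture: the DistMult-style triplet scorer, the linear attribute head, the margin loss, and the mixing weight `alpha` are all assumptions introduced here; the point is only that both losses flow into the same entity embeddings `E`.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_entities, n_relations, n_attributes = 8, 5, 3, 2

# Entity embeddings are shared by both tasks; relation and attribute
# parameters are task-specific, so gradients from either task update E.
E = rng.normal(size=(n_entities, dim))    # entity embeddings (shared)
R = rng.normal(size=(n_relations, dim))   # relation embeddings
A = rng.normal(size=(n_attributes, dim))  # attribute regression weights
b = np.zeros(n_attributes)                # attribute regression biases

def triplet_score(h, r, t):
    # DistMult-style score for a (head, relation, tail) triplet.
    return float(np.sum(E[h] * R[r] * E[t]))

def attribute_value(e, a):
    # Linear regression head on the shared entity embedding.
    return float(E[e] @ A[a] + b[a])

def joint_loss(pos, neg, attr_obs, alpha=0.5):
    # Margin ranking loss on triplets plus squared error on observed
    # attribute values, mixed with weight alpha (multi-task objective).
    rank = max(0.0, 1.0 - triplet_score(*pos) + triplet_score(*neg))
    reg = sum((attribute_value(e, a) - v) ** 2 for e, a, v in attr_obs)
    return rank + alpha * reg
```

Minimizing `joint_loss` over batches of triplets and attribute observations would update the shared entity embeddings from both signals, which is the multi-task effect the abstract describes.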

    Learning Correspondence Structures for Person Re-identification

    This paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification. We first introduce a boosting-based approach to learn a correspondence structure which indicates the patch-wise matching probabilities between images from a target camera pair. The learned correspondence structure can not only capture the spatial correspondence pattern between cameras but also handle the viewpoint or human-pose variation in individual images. We further introduce a global constraint-based matching process. It integrates a global matching constraint over the learned correspondence structure to exclude cross-view misalignments during the image patch matching process, hence achieving a more reliable matching score between images. Finally, we also extend our approach by introducing a multi-structure scheme, which learns a set of local correspondence structures to capture the spatial correspondence sub-patterns between a camera pair, so as to handle the spatial misalignments between individual images in a more precise way. Experimental results on various datasets demonstrate the effectiveness of our approach. Comment: IEEE Trans. Image Processing, vol. 26, no. 5, pp. 2438-2453, 2017. The project page for this paper is available at http://min.sjtu.edu.cn/lwydemo/personReID.htm arXiv admin note: text overlap with arXiv:1504.0624
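The effect of the global matching constraint, excluding a patch from further matches once it is used so that cross-view misalignments cannot double-count, can be illustrated with a simplified greedy variant. The function below is an assumption for illustration only; the paper's actual optimization over the correspondence structure differs.

```python
import numpy as np

def matching_score(sim, corr):
    """Score an image pair under a learned correspondence structure.

    sim  : (m, n) appearance similarities between patches of images A and B.
    corr : (m, n) correspondence-structure weights, i.e. the patch-wise
           matching probabilities learned for a given camera pair.
    A greedy one-to-one selection stands in for the paper's global
    matching optimization: once a patch is matched it is excluded,
    which suppresses cross-view misalignments.
    """
    weighted = sim * corr
    n = weighted.shape[1]
    used_a, used_b, total = set(), set(), 0.0
    # Visit candidate pairs in descending weighted-similarity order.
    for idx in np.argsort(weighted, axis=None)[::-1]:
        i, j = divmod(int(idx), n)
        if i in used_a or j in used_b:
            continue  # global constraint: each patch matches at most once
        used_a.add(i)
        used_b.add(j)
        total += float(weighted[i, j])
    return total
```

Without the exclusion sets, a single distinctive patch could match several patches in the other view, inflating the score exactly in the misaligned cases the constraint is meant to rule out.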

    Equivariant Contrastive Learning for Sequential Recommendation

    Contrastive learning (CL) benefits the training of sequential recommendation models with informative self-supervision signals. Existing solutions apply general sequential data augmentation strategies to generate positive pairs and encourage their representations to be invariant. However, due to the inherent properties of user behavior sequences, some augmentation strategies, such as item substitution, can change user intent, so learning indiscriminately invariant representations for all augmentation strategies may be suboptimal. Therefore, we propose Equivariant Contrastive Learning for Sequential Recommendation (ECL-SR), which endows SR models with greater discriminative power, making the learned user behavior representations sensitive to invasive augmentations (e.g., item substitution) and insensitive to mild augmentations (e.g., feature-level dropout masking). In detail, we use a conditional discriminator to capture differences in behavior due to item substitution, which encourages the user behavior encoder to be equivariant to invasive augmentations. Comprehensive experiments on four benchmark datasets show that the proposed ECL-SR framework achieves competitive performance compared to state-of-the-art SR models. The source code is available at https://github.com/Tokkiu/ECL. Comment: Accepted by RecSys 202
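The split between invariance to mild augmentations and equivariance to invasive ones can be sketched as two loss terms. This is an illustrative simplification, not ECL-SR's exact objective: the cosine pull term, the linear discriminator `disc_w`, and the function name `ecl_losses` are all assumptions.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def ecl_losses(z, z_mild, z_invasive, disc_w):
    """Illustrative ECL-style objective (not the paper's exact losses).

    z          : embedding of the original behavior sequence
    z_mild     : embedding after a mild augmentation (e.g. dropout masking)
    z_invasive : embedding after an invasive augmentation (item substitution)
    disc_w     : weights of a linear discriminator over concatenated pairs
    """
    # Invariance: mild views should stay close in embedding space.
    inv_loss = 1.0 - cosine(z, z_mild)
    # Equivariance: a discriminator should detect item substitution from
    # the embedding pair, forcing the encoder to retain that information
    # rather than collapsing substituted and original sequences together.
    logit = float(disc_w @ np.concatenate([z, z_invasive]))
    p_sub = 1.0 / (1.0 + np.exp(-logit))   # P(pair contains a substitution)
    equiv_loss = -np.log(p_sub + 1e-8)     # label = 1 for this pair
    return inv_loss, equiv_loss
```

Training the encoder against both terms yields representations that ignore harmless perturbations while staying sensitive to intent-changing ones, which is the behavior the abstract argues for.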

    Modeling Relationships Between Sentences Using Deep Neural Network-Based Sentence Encoders

    Ph.D. dissertation, Seoul National University, Department of Computer Science and Engineering, February 2020 (advisor: Sang-goo Lee). Sentence matching is the task of predicting the degree of semantic match between two sentences. Since a model needs a high level of natural language understanding to identify the relationship between two sentences effectively, sentence matching is an important component of various natural language processing applications.
In this dissertation, we seek to improve the sentence matching module along three dimensions: the sentence encoder, the matching function, and semi-supervised learning. To enhance the sentence encoder network, which is responsible for extracting useful features from a sentence, we propose two new architectures: Gumbel Tree-LSTM and Cell-aware Stacked LSTM (CAS-LSTM). Gumbel Tree-LSTM is based on a recursive neural network (RvNN) architecture, but unlike typical RvNNs it does not require structured input; instead, it learns from data a parsing strategy optimized for a specific task. The latter, CAS-LSTM, extends the stacked long short-term memory (LSTM) architecture by introducing an additional forget gate for better control of the vertical information flow. As a new matching function, we present the element-wise bilinear sentence matching (ElBiS) function. It aims to automatically find an aggregation scheme that fuses two sentence representations into a single vector suitable for a specific task. From the fact that the same sentence encoder is shared across inputs, we hypothesize, and empirically verify, that considering only the element-wise bilinear interaction is sufficient for comparing two sentence vectors. By restricting the interaction, we greatly reduce the number of required parameters compared with full bilinear pooling, without losing the advantage of automatically discovering useful aggregation schemes. Finally, to facilitate semi-supervised training, i.e. to make use of both labeled and unlabeled data, we propose the cross-sentence latent variable model (CS-LVM). Its generative model assumes that a target sentence is generated from the latent representation of a source sentence together with a variable indicating the relationship between the source and target sentences.
As it considers the two sentences in a pair together in a single model, its training objectives are defined more naturally than in prior approaches based on the variational auto-encoder (VAE). We also define semantic constraints that push the generator toward semantically more plausible sentences. We believe the improvements proposed in this dissertation will advance the effectiveness of various natural language processing applications that involve modeling sentence pairs.
Contents: Chapter 1 Introduction (1.1 Sentence Matching; 1.2 Deep Neural Networks for Sentence Matching; 1.3 Scope of the Dissertation). Chapter 2 Background and Related Work (2.1 Sentence Encoders; 2.2 Matching Functions; 2.3 Semi-Supervised Training). Chapter 3 Sentence Encoder: Gumbel Tree-LSTM (3.1 Motivation; 3.2 Preliminaries: Recursive Neural Networks, Training RvNNs without Tree Information; 3.3 Model Description: Tree-LSTM, Gumbel-Softmax, Gumbel Tree-LSTM; 3.4 Implementation Details; 3.5 Experiments: Natural Language Inference, Sentiment Analysis, Qualitative Analysis; 3.6 Summary). Chapter 4 Sentence Encoder: Cell-aware Stacked LSTM (4.1 Motivation; 4.2 Related Work; 4.3 Model Description: Stacked LSTMs, Cell-aware Stacked LSTMs, Sentence Encoders; 4.4 Experiments: Natural Language Inference, Paraphrase Identification, Sentiment Classification, Machine Translation, Forget Gate Analysis, Model Variations; 4.5 Summary). Chapter 5 Matching Function: Element-wise Bilinear Sentence Matching (5.1 Motivation; 5.2 Proposed Method: ElBiS; 5.3 Experiments: Natural Language Inference, Paraphrase Identification; 5.4 Summary and Discussion). Chapter 6 Semi-Supervised Training: Cross-Sentence Latent Variable Model (6.1 Motivation; 6.2 Preliminaries: Variational Auto-Encoders, von Mises-Fisher Distribution; 6.3 Proposed Framework: CS-LVM; 6.4 Experiments: Natural Language Inference, Paraphrase Identification, Ablation Study, Generated Sentences, Implementation Details; 6.5 Summary and Discussion). Chapter 7 Conclusion. Appendix A (A.1 Sentences Generated from CS-LVM).
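One plausible reading of the element-wise restriction described in this abstract: full bilinear pooling computes h_k = u^T W_k v with a dense d-by-d matrix per output dimension, and keeping only element-wise interactions reduces each W_k to a diagonal, i.e. a linear map over the element-wise product u * v. The sketch below follows that assumption and is not the dissertation's exact formulation.

```python
import numpy as np

def elbis(u, v, W, b):
    # Element-wise bilinear matching under the diagonal-restriction
    # reading: h[k] = sum_i W[k, i] * u[i] * v[i] + b[k], which is
    # just a linear layer applied to the element-wise product u * v.
    return W @ (u * v) + b

# Parameter-count comparison (the efficiency argument in the abstract):
# full bilinear pooling needs d_out * d * d weights, while the
# element-wise restriction needs only d_out * d (plus d_out biases).
```

For d = d_out = 300, that is 27,000,000 weights for full bilinear pooling versus 90,000 here, while the map over u * v is still learned from data, preserving the automatic discovery of a useful aggregation scheme.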

    Semi-Supervised and Unsupervised Deep Visual Learning: A Survey

    State-of-the-art deep learning models are often trained with a large amount of costly labeled training data. However, requiring exhaustive manual annotations may degrade the model's generalizability in the limited-label regime. Semi-supervised learning and unsupervised learning offer promising paradigms to learn from an abundance of unlabeled visual data. Recent progress in these paradigms has indicated the strong benefits of leveraging unlabeled data to improve model generalization and provide better model initialization. In this survey, we review the recent advanced deep learning algorithms on semi-supervised learning (SSL) and unsupervised learning (UL) for visual recognition from a unified perspective. To offer a holistic understanding of the state-of-the-art in these areas, we propose a unified taxonomy. We categorize existing representative SSL and UL methods with comprehensive and insightful analysis to highlight their design rationales in different learning scenarios and applications in different computer vision tasks. Lastly, we discuss the emerging trends and open challenges in SSL and UL to shed light on future critical research directions.
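One of the simplest semi-supervised recipes in the family this survey covers is pseudo-labeling: treat a model's confident predictions on unlabeled data as training targets. The helper below is a generic sketch of that idea, not any specific method from the survey; the function name and threshold are assumptions.

```python
import numpy as np

def pseudo_labels(probs, threshold=0.95):
    """Select confident predictions on unlabeled data as training targets.

    probs : (n, c) predicted class probabilities for n unlabeled samples.
    Returns (indices, labels) for the samples whose maximum class
    probability meets the confidence threshold; the rest are left
    unlabeled for this round.
    """
    conf = probs.max(axis=1)
    keep = np.flatnonzero(conf >= threshold)
    return keep, probs[keep].argmax(axis=1)
```

The selected (sample, label) pairs are then mixed into the labeled set for further training, which is one concrete way unlabeled data can improve generalization as the abstract describes.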

    Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning

    The success of deep learning in computer vision is rooted in the ability of deep networks to scale up model complexity as demanded by challenging visual tasks. As complexity is increased, so is the need for large amounts of labeled data to train the model, which entails a costly human annotation effort. To address this concern, with the long-term goal of leveraging the abundance of cheap unlabeled data, we explore methods of unsupervised "pre-training." In particular, we propose to use self-supervised automatic image colorization. We show that traditional methods for unsupervised learning, such as layer-wise clustering or autoencoders, remain inferior to supervised pre-training. In search of an alternative, we develop a fully automatic image colorization method. Our method sets a new state-of-the-art in revitalizing old black-and-white photography, without requiring human effort or expertise. Additionally, it gives us a method for self-supervised representation learning: in order for the model to appropriately re-color a grayscale object, it must first be able to identify it. This ability, learned entirely self-supervised, can be used to improve other visual tasks, such as classification and semantic segmentation. As a future direction for self-supervision, we investigate whether multiple proxy tasks can be combined to improve generalization, which turns out to be a challenging open problem. We hope that our contributions to this endeavor will provide a foundation for future efforts in making self-supervision compete with supervised pre-training. Comment: Ph.D. thesis
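The reason colorization works as a self-supervised proxy task is that the training pair is constructed entirely from unlabeled data: the input is the grayscale image and the target is the color the model must recover. A minimal sketch of that pair construction, assuming Rec. 601 luma weights (the thesis's actual color-space choices may differ):

```python
import numpy as np

def colorization_pair(rgb):
    """Build a self-supervised training pair from an unlabeled RGB image.

    rgb : (H, W, 3) uint8 image.
    Returns (gray, target): the grayscale (luma) input the model sees,
    and the original color image it must predict. No human labels are
    involved anywhere in this construction.
    """
    rgb = rgb.astype(np.float64) / 255.0
    gray = rgb @ np.array([0.299, 0.587, 0.114])  # (H, W) Rec. 601 luma
    return gray, rgb
```

A model trained to map `gray` back to `target` must implicitly recognize objects (grass is green, sky is blue), which is the representation-learning signal the abstract describes.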