2,611 research outputs found
Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs
Many popular knowledge graphs such as Freebase, YAGO or DBpedia maintain a
list of non-discrete attributes for each entity. Intuitively, these attributes
such as height, price or population count are able to richly characterize
entities in knowledge graphs. This additional source of information may help to
alleviate the inherent sparsity and incompleteness problems that are prevalent
in knowledge graphs. Unfortunately, many state-of-the-art relational learning
models ignore this information due to the challenging nature of dealing with
non-discrete data types in the inherently binary-natured knowledge graphs. In
this paper, we propose a novel multi-task neural network approach for both
encoding and prediction of non-discrete attribute information in a relational
setting. Specifically, we train a neural network for triplet prediction along
with a separate network for attribute value regression. Via multi-task
learning, we are able to learn representations of entities, relations and
attributes that encode information about both tasks. Moreover, such attributes
are central to many predictive tasks not only as an information source but also
as a prediction target. Therefore, models that are able to encode, incorporate
and predict such information in a relational learning context are highly
attractive as well. We show that our approach outperforms many state-of-the-art
methods for the tasks of relational triplet classification and attribute value
prediction.
Comment: Accepted at CIKM 201
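The shared-representation idea in this abstract can be sketched with a toy joint objective; the TransE-style triplet score, the regression head, and all names and dimensions below are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, n_attrs, dim = 100, 10, 5, 16

# Shared embedding tables: both task heads read (and would update) these.
E = rng.normal(size=(n_entities, dim))   # entity embeddings
R = rng.normal(size=(n_relations, dim))  # relation embeddings
A = rng.normal(size=(n_attrs, dim))      # attribute embeddings
w_reg = rng.normal(size=dim)             # regression head weights (hypothetical)

def triplet_score(h, r, t):
    # TransE-style plausibility score for (head, relation, tail);
    # higher (less negative) means more plausible.
    return -np.linalg.norm(E[h] + R[r] - E[t])

def attribute_value(e, a):
    # Regress a non-discrete attribute value (e.g. height, price)
    # from the entity and attribute embeddings.
    return float((E[e] * A[a]) @ w_reg)

def multitask_loss(pos, neg, attr_obs, alpha=0.5):
    # Weighted sum of a margin ranking loss on triplets and a squared
    # error on an observed attribute value (single-sample sketch).
    (h, r, t), (h2, r2, t2) = pos, neg
    rank = max(0.0, 1.0 - triplet_score(h, r, t) + triplet_score(h2, r2, t2))
    e, a, v = attr_obs
    reg = (attribute_value(e, a) - v) ** 2
    return rank + alpha * reg

loss = multitask_loss((0, 1, 2), (0, 1, 3), (0, 2, 1.7))
```

Because both heads share E, R and A, gradients from either task shape the same representations, which is the multi-task effect the abstract describes.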
Learning Correspondence Structures for Person Re-identification
This paper addresses the problem of handling spatial misalignments due to
camera-view changes or human-pose variations in person re-identification. We
first introduce a boosting-based approach to learn a correspondence structure
which indicates the patch-wise matching probabilities between images from a
target camera pair. The learned correspondence structure can not only capture
the spatial correspondence pattern between cameras but also handle the
viewpoint or human-pose variation in individual images. We further introduce a
global constraint-based matching process. It integrates a global matching
constraint over the learned correspondence structure to exclude cross-view
misalignments during the image patch matching process, hence achieving a more
reliable matching score between images. Finally, we also extend our approach by
introducing a multi-structure scheme, which learns a set of local
correspondence structures to capture the spatial correspondence sub-patterns
between a camera pair, so as to handle the spatial misalignments between
individual images in a more precise way. Experimental results on various
datasets demonstrate the effectiveness of our approach.
Comment: IEEE Trans. Image Processing, vol. 26, no. 5, pp. 2438-2453, 2017.
The project page for this paper is available at
http://min.sjtu.edu.cn/lwydemo/personReID.htm arXiv admin note: text overlap
with arXiv:1504.0624
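The correspondence-structure-weighted matching described above can be illustrated with a small sketch; the cosine patch similarity and the random stand-in for the learned structure P are assumptions for illustration, not the paper's learned model:

```python
import numpy as np

rng = np.random.default_rng(1)
n_patches, d = 6, 8

# Patch descriptors from two camera views (hypothetical features).
X = rng.normal(size=(n_patches, d))  # image from camera A
Y = rng.normal(size=(n_patches, d))  # image from camera B

# A stand-in correspondence structure: P[i, j] is the matching
# probability between patch i of view A and patch j of view B.
P = rng.random(size=(n_patches, n_patches))
P /= P.sum(axis=1, keepdims=True)    # rows are probability distributions

def patch_similarity(X, Y):
    # Cosine similarity between every cross-view patch pair.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T

def matching_score(X, Y, P):
    # Image-level score: patch similarities weighted by the correspondence
    # structure, so spatially implausible pairs (low P[i, j]) contribute
    # little to the final score.
    return float((P * patch_similarity(X, Y)).sum() / len(X))

score = matching_score(X, Y, P)
```

The global constraint in the paper additionally restricts which pairs may match at all; here that effect is approximated by the low weights P assigns to unlikely correspondences.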
Equivariant Contrastive Learning for Sequential Recommendation
Contrastive learning (CL) benefits the training of sequential recommendation
models with informative self-supervision signals. Existing solutions apply
general sequential data augmentation strategies to generate positive pairs and
encourage their representations to be invariant. However, due to the inherent
properties of user behavior sequences, some augmentation strategies, such as
item substitution, can lead to changes in user intent. Learning
indiscriminately invariant representations for all augmentation strategies
might be suboptimal. Therefore, we propose Equivariant Contrastive Learning for
Sequential Recommendation (ECL-SR), which endows SR models with great
discriminative power, making the learned user behavior representations
sensitive to invasive augmentations (e.g., item substitution) and insensitive
to mild augmentations (e.g., feature-level dropout masking). Specifically, we use
a conditional discriminator to capture differences in behavior due to item
substitution, which encourages the user behavior encoder to be equivariant to
invasive augmentations. Comprehensive experiments on four benchmark datasets
show that the proposed ECL-SR framework achieves competitive performance
compared to state-of-the-art SR models. The source code is available at
https://github.com/Tokkiu/ECL.
Comment: Accepted by RecSys 202
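A minimal sketch of the two complementary objectives follows, assuming a linear stand-in discriminator and toy embeddings; the real ECL-SR uses learned sequence encoders and a contrastive (InfoNCE-style) invariance loss:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical encoder outputs for one user behavior sequence:
z = rng.normal(size=d)                       # original sequence
z_mild = z + 0.05 * rng.normal(size=d)       # mild augmentation (e.g. dropout mask)
z_invasive = rng.normal(size=d)              # invasive augmentation (item substitution)

w_disc = rng.normal(size=2 * d)              # conditional discriminator weights (sketch)

def invariance_loss(a, b):
    # Pull representations of mildly augmented views together
    # (negative cosine similarity, a common contrastive surrogate).
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def equivariance_loss(z, z_aug, label):
    # The discriminator should detect whether z_aug came from an invasive
    # augmentation (label=1) given the original z, which pushes the encoder
    # to keep representations sensitive to such changes (equivariance).
    logit = float(w_disc @ np.concatenate([z, z_aug]))
    p = sigmoid(logit)
    return -(label * np.log(p + 1e-9) + (1 - label) * np.log(1 - p + 1e-9))

total = invariance_loss(z, z_mild) + equivariance_loss(z, z_invasive, 1)
```

The key design choice mirrored here: only mild augmentations receive the invariance objective, while invasive ones feed the discriminator instead.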
Modeling Relationships between Sentences Using Deep-Neural-Network-based Sentence Encoders
Thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, 2020. 2. Sang-goo Lee.
Sentence matching is a task of predicting the degree of match between two sentences.
Since a high level of natural language understanding is needed for a model to identify the relationship between two sentences,
it is an important component for various natural language processing applications.
In this dissertation, we seek to improve the sentence matching module from three angles: the sentence encoder, the matching function, and semi-supervised learning.
To enhance the sentence encoder network, which is responsible for extracting useful features from a sentence, we propose two new sentence encoder architectures: Gumbel Tree-LSTM and Cell-aware Stacked LSTM (CAS-LSTM).
Gumbel Tree-LSTM is based on a recursive neural network (RvNN) architecture; however, unlike typical RvNN architectures, it does not need a structured input.
Instead, it learns from data a parsing strategy that is optimized for a specific task.
The latter, CAS-LSTM, extends the stacked long short-term memory (LSTM) architecture by introducing an additional forget gate for better handling of vertical information flow.
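The task-optimized parsing strategy above relies on sampling discrete merge decisions while keeping training differentiable. A straight-through Gumbel-Softmax sample over hypothetical merge scores can be sketched as follows (illustrative values, not the dissertation's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(3)

def gumbel_softmax_sample(logits, tau=1.0):
    # Straight-through Gumbel-Softmax: a differentiable surrogate for
    # sampling a discrete merge position from candidate scores.
    g = -np.log(-np.log(rng.random(size=logits.shape)))  # Gumbel(0, 1) noise
    y = np.exp((logits + g) / tau)
    y /= y.sum()                      # soft sample (used for gradients)
    hard = np.zeros_like(y)
    hard[np.argmax(y)] = 1.0          # hard one-hot (used in the forward pass)
    return hard, y

# Scores for merging each adjacent pair of nodes (hypothetical values).
merge_logits = np.array([0.2, 1.5, -0.3])
hard, soft = gumbel_softmax_sample(merge_logits)
```

In a Tree-LSTM composition loop, the one-hot vector would pick which adjacent pair of nodes to merge at each step, while gradients flow through the soft probabilities.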
Next, as a new matching function, we present the element-wise bilinear sentence matching (ElBiS) function.
It aims to automatically find an aggregation scheme that fuses two sentence representations into a single one suitable for a specific task.
From the fact that a sentence encoder is shared across inputs, we hypothesize and empirically prove that considering only the element-wise bilinear interaction is sufficient for comparing two sentence vectors.
By restricting the interaction, we can largely reduce the number of required parameters compared with full bilinear pooling methods without losing the advantage of automatically discovering useful aggregation schemes.
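The element-wise restriction can be sketched as follows, with illustrative weights; W and b are hypothetical stand-ins, and the thesis' exact ElBiS parameterization may differ:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8

u = rng.normal(size=d)  # representation of sentence 1 (shared encoder)
v = rng.normal(size=d)  # representation of sentence 2 (shared encoder)

# A full bilinear pooling needs a d x d weight matrix per output element
# (O(d^3) parameters in total). Restricting the interaction to the
# element-wise product u * v leaves only d interaction terms, mixed by a
# single d x d map: O(d^2) parameters in total.
W = rng.normal(size=(d, d))
b = rng.normal(size=d)

def elbis_like_match(u, v):
    # Element-wise bilinear interaction followed by a learned linear mix,
    # producing one fused vector for the downstream classifier.
    return W @ (u * v) + b

m = elbis_like_match(u, v)
```

Because the same encoder produces u and v, corresponding dimensions are comparable, which is what makes dropping the off-diagonal interactions plausible.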
Finally, to facilitate semi-supervised training, i.e. to make use of both labeled and unlabeled data in training, we propose the cross-sentence latent variable model (CS-LVM).
Its generative model assumes that a target sentence is generated from the latent representation of a source sentence and the variable indicating the relationship between the source and the target sentence.
As it considers the two sentences in a pair together in a single model, the training objectives are defined more naturally than prior approaches based on the variational auto-encoder (VAE).
We also define semantic constraints that force the generator to generate semantically more plausible sentences.
We believe that the improvements proposed in this dissertation would advance the effectiveness of various natural language processing applications that involve modeling sentence pairs.
Chapter 1 Introduction 1
1.1 Sentence Matching 1
1.2 Deep Neural Networks for Sentence Matching 2
1.3 Scope of the Dissertation 4
Chapter 2 Background and Related Work 9
2.1 Sentence Encoders 9
2.2 Matching Functions 11
2.3 Semi-Supervised Training 13
Chapter 3 Sentence Encoder: Gumbel Tree-LSTM 15
3.1 Motivation 15
3.2 Preliminaries 16
3.2.1 Recursive Neural Networks 16
3.2.2 Training RvNNs without Tree Information 17
3.3 Model Description 19
3.3.1 Tree-LSTM 19
3.3.2 Gumbel-Softmax 20
3.3.3 Gumbel Tree-LSTM 22
3.4 Implementation Details 25
3.5 Experiments 27
3.5.1 Natural Language Inference 27
3.5.2 Sentiment Analysis 32
3.5.3 Qualitative Analysis 33
3.6 Summary 36
Chapter 4 Sentence Encoder: Cell-aware Stacked LSTM 38
4.1 Motivation 38
4.2 Related Work 40
4.3 Model Description 43
4.3.1 Stacked LSTMs 43
4.3.2 Cell-aware Stacked LSTMs 44
4.3.3 Sentence Encoders 46
4.4 Experiments 47
4.4.1 Natural Language Inference 47
4.4.2 Paraphrase Identification 50
4.4.3 Sentiment Classification 52
4.4.4 Machine Translation 53
4.4.5 Forget Gate Analysis 55
4.4.6 Model Variations 56
4.5 Summary 59
Chapter 5 Matching Function: Element-wise Bilinear Sentence Matching 60
5.1 Motivation 60
5.2 Proposed Method: ElBiS 61
5.3 Experiments 63
5.3.1 Natural language inference 64
5.3.2 Paraphrase Identification 66
5.4 Summary and Discussion 68
Chapter 6 Semi-Supervised Training: Cross-Sentence Latent Variable Model 70
6.1 Motivation 70
6.2 Preliminaries 71
6.2.1 Variational Auto-Encoders 71
6.2.2 von Mises–Fisher Distribution 73
6.3 Proposed Framework: CS-LVM 74
6.3.1 Cross-Sentence Latent Variable Model 75
6.3.2 Architecture 78
6.3.3 Optimization 79
6.4 Experiments 84
6.4.1 Natural Language Inference 84
6.4.2 Paraphrase Identification 85
6.4.3 Ablation Study 86
6.4.4 Generated Sentences 88
6.4.5 Implementation Details 89
6.5 Summary and Discussion 90
Chapter 7 Conclusion 92
Appendix A Appendix 96
A.1 Sentences Generated from CS-LVM 96
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey
State-of-the-art deep learning models are often trained with a large amount of
costly labeled training data. However, requiring exhaustive manual annotations
may degrade the model's generalizability in the limited-label regime.
Semi-supervised learning and unsupervised learning offer promising paradigms to
learn from an abundance of unlabeled visual data. Recent progress in these
paradigms has indicated the strong benefits of leveraging unlabeled data to
improve model generalization and provide better model initialization. In this
survey, we review the recent advanced deep learning algorithms on
semi-supervised learning (SSL) and unsupervised learning (UL) for visual
recognition from a unified perspective. To offer a holistic understanding of
the state-of-the-art in these areas, we propose a unified taxonomy. We
categorize existing representative SSL and UL with comprehensive and insightful
analysis to highlight their design rationales in different learning scenarios
and applications in different computer vision tasks. Lastly, we discuss the
emerging trends and open challenges in SSL and UL to shed light on future
critical research directions.
Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning
The success of deep learning in computer vision is rooted in the ability of
deep networks to scale up model complexity as demanded by challenging visual
tasks. As complexity is increased, so is the need for large amounts of labeled
data to train the model. This is associated with a costly human annotation
effort. To address this concern, with the long-term goal of leveraging the
abundance of cheap unlabeled data, we explore methods of unsupervised
"pre-training." In particular, we propose to use self-supervised automatic
image colorization.
We show that traditional methods for unsupervised learning, such as
layer-wise clustering or autoencoders, remain inferior to supervised
pre-training. In search for an alternative, we develop a fully automatic image
colorization method. Our method sets a new state-of-the-art in revitalizing old
black-and-white photography, without requiring human effort or expertise.
Additionally, it gives us a method for self-supervised representation learning.
In order for the model to appropriately re-color a grayscale object, it must
first be able to identify it. This ability, learned entirely self-supervised,
can be used to improve other visual tasks, such as classification and semantic
segmentation. As a future direction for self-supervision, we investigate if
multiple proxy tasks can be combined to improve generalization. This turns out
to be a challenging open problem. We hope that our contributions to this
endeavor will provide a foundation for future efforts in making
self-supervision compete with supervised pre-training.
Comment: Ph.D. thesis
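The colorization pretext task can be sketched as a regression from the lightness channel to held-out color channels; the per-pixel linear "network" below is a deliberate toy stand-in for the deep CNN the dissertation actually trains:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy image in Lab-like form: L (lightness) is the input, and the ab
# (color) channels are the self-supervised target, recovered for free
# from any unlabeled color image.
H, W = 4, 4
L_chan = rng.random(size=(H, W))
ab_true = rng.random(size=(H, W, 2))

# Stand-in "network": one linear map applied per pixel (hypothetical).
theta = rng.normal(size=(1, 2))

def predict_ab(L_chan, theta):
    # Map each lightness value to a predicted (a, b) color pair.
    return L_chan[..., None] @ theta  # shape (H, W, 2)

def colorization_loss(ab_pred, ab_true):
    # Regression loss on the held-out color channels; minimizing it well
    # requires features that identify objects, and those features are
    # what gets reused for classification and segmentation.
    return float(np.mean((ab_pred - ab_true) ** 2))

loss = colorization_loss(predict_ab(L_chan, theta), ab_true)
```

No human labels appear anywhere in this objective, which is what makes colorization usable as large-scale unsupervised pre-training.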
- โฆ