
    What-and-Where to Match: Deep Spatially Multiplicative Integration Networks for Person Re-identification

    Full text link
    Matching pedestrians across disjoint camera views, known as person re-identification (re-id), is a challenging problem of importance to visual recognition and surveillance. Most existing methods exploit local regions via spatial manipulation and perform matching on local correspondences. However, they essentially extract \emph{fixed} representations from pre-divided regions of each image and then perform matching on those representations. Models in this pipeline cannot capture the finer local patterns that are crucial for distinguishing positive pairs from negative ones, and thus underperform. In this paper, we propose a novel deep multiplicative integration gating function, which answers the question of \emph{what-and-where to match} for effective person re-id. To address \emph{what} to match, our deep network emphasizes common local patterns by learning joint representations in a multiplicative way. The network comprises two Convolutional Neural Networks (CNNs) that extract convolutional activations and generate descriptors relevant to pedestrian matching, leading to flexible representations for pair-wise images. To address \emph{where} to match, we combat spatial misalignment by performing spatially recurrent pooling via a four-directional recurrent neural network, which imposes spatial dependency over all positions with respect to the entire image. The proposed network is end-to-end trainable and characterizes local pairwise feature interactions in a spatially aligned manner. To demonstrate the superiority of our method, we conduct extensive experiments on three benchmark data sets: VIPeR, CUHK03 and Market-1501. Comment: Published at Pattern Recognition, Elsevier
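    The core mechanism in this record is multiplicative integration of paired CNN activations. Below is a minimal PyTorch sketch of that idea under stated assumptions: the 1x1 projections, tanh squashing, and tensor shapes are illustrative choices, and the paper's actual gating function, backbone, and four-directional recurrent pooling are not reproduced here.

```python
import torch
import torch.nn as nn

class MultiplicativeIntegration(nn.Module):
    """Fuses paired feature maps with a gated element-wise product."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions project each stream before the product (assumed).
        self.proj_a = nn.Conv2d(channels, channels, kernel_size=1)
        self.proj_b = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # The element-wise product emphasizes patterns active in *both*
        # images, i.e. the common local patterns used for matching.
        return torch.tanh(self.proj_a(feat_a) * self.proj_b(feat_b))

# Toy usage: two activation maps from a shared CNN backbone.
fuse = MultiplicativeIntegration(256)
a, b = torch.randn(1, 256, 24, 8), torch.randn(1, 256, 24, 8)
joint = fuse(a, b)  # (1, 256, 24, 8) joint representation of the pair
```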

    Maurinian Truths: Essays in Honour of Anna-Sofia Maurin on her 50th Birthday

    Get PDF
    This book is in honour of Professor Anna-Sofia Maurin on her 50th birthday. It consists of eighteen essays on metaphysical issues written by Swedish and international scholars.

    Combining simulated and real images in deep learning

    Get PDF
    To train a deep learning (DL) model, considerable amounts of data are required for it to generalize successfully to unseen cases. Furthermore, such data are often manually labeled, making the annotation process costly and time-consuming. We propose the use of simulated data, obtained from simulators, as a way to meet this growing need for annotated data. Although simulated environments represent an unlimited and cost-effective supply of automatically annotated data, the result is still synthetic information, which differs in representation and distribution from real-world data. The field that addresses the problem of merging the useful features from each of these domains is called domain adaptation (DA), a branch of transfer learning. Several advances have been made in this field, from fine-tuning existing networks to sample-reconstruction approaches. Adversarial DA methods, which make use of Generative Adversarial Networks (GANs), are state-of-the-art and the most widely used. In previous approaches, training data were sourced from existing datasets, and the use of simulators to obtain new observations was an alternative not fully explored. We aim to survey possible DA techniques and apply them in this context of obtaining simulated data for training DL models. Stemming from a previous project aimed at automating quality control at the end of a vehicle's production line, a proof of concept will be developed. Previously, a DL model that identified vehicle parts was trained using only data obtained through a simulator. By using DA techniques to combine simulated and real images, a new model will be trained to transfer more effectively to the real world. The model's performance using both types of data will be compared with its performance using only one of the two. We believe this approach can be extended to new areas where, until now, the use of DL was not feasible due to the constraints imposed by data collection.
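    As a concrete illustration of the adversarial DA family mentioned above, the sketch below implements DANN-style gradient reversal in PyTorch: a domain classifier learns to tell simulated from real features, while reversed gradients push the feature extractor toward domain-invariant representations. This is a generic sketch, not the thesis's pipeline; the layer sizes, class count, and `lam` value are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign going backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        # Reversed gradients train the features to *confuse* the domain head,
        # aligning simulated and real feature distributions.
        return -ctx.lam * grad_out, None

features = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
label_head = nn.Linear(256, 10)   # task classes, e.g. vehicle parts (assumed)
domain_head = nn.Linear(256, 2)   # simulated vs. real

x = torch.randn(4, 3, 64, 64)     # toy image batch
f = features(x)
task_logits = label_head(f)                              # supervised on labeled data
domain_logits = domain_head(GradReverse.apply(f, 1.0))   # adversarial on both domains
```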

    Modeling Inter-Sentence Relations Using Deep Neural Network-Based Sentence Encoders

    Get PDF
    Thesis (Ph.D.)--Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2020. Advisor: Sang-goo Lee.
    Sentence matching is the task of predicting the degree of match between two sentences. Since a high level of natural language understanding is needed for a model to identify the relationship between two sentences, sentence matching is an important component of various natural language processing applications. In this dissertation, we seek to improve the sentence matching module along three dimensions: the sentence encoder, the matching function, and semi-supervised learning. To enhance the sentence encoder, the network responsible for extracting useful features from a sentence, we propose two new architectures: Gumbel Tree-LSTM and Cell-aware Stacked LSTM (CAS-LSTM). Gumbel Tree-LSTM is based on a recursive neural network (RvNN) architecture; however, unlike typical RvNN architectures, it does not need structured input. Instead, it learns from data a parsing strategy optimized for a specific task. The latter, CAS-LSTM, extends the stacked long short-term memory (LSTM) architecture by introducing an additional forget gate for better control of the vertical information flow. Then, as a new matching function, we present the element-wise bilinear sentence matching (ElBiS) function. It aims to automatically find an aggregation scheme that fuses two sentence representations into a single one suitable for a specific task. From the fact that a single sentence encoder is shared across inputs, we hypothesize and empirically verify that considering only the element-wise bilinear interaction is sufficient for comparing two sentence vectors. By restricting the interaction, we largely reduce the number of required parameters compared with full bilinear pooling, without losing the advantage of automatically discovering useful aggregation schemes. Finally, to facilitate semi-supervised training, i.e. to make use of both labeled and unlabeled data, we propose the cross-sentence latent variable model (CS-LVM). Its generative model assumes that a target sentence is generated from the latent representation of a source sentence and from a variable indicating the relationship between the source and target sentences. As it considers the two sentences of a pair together in a single model, its training objectives are defined more naturally than in prior approaches based on the variational auto-encoder (VAE). We also define semantic constraints that push the generator to produce semantically more plausible sentences. We believe that the improvements proposed in this dissertation will advance the effectiveness of various natural language processing applications that involve modeling sentence pairs.
    Table of contents: Chapter 1 Introduction (1.1 Sentence Matching; 1.2 Deep Neural Networks for Sentence Matching; 1.3 Scope of the Dissertation). Chapter 2 Background and Related Work (2.1 Sentence Encoders; 2.2 Matching Functions; 2.3 Semi-Supervised Training). Chapter 3 Sentence Encoder: Gumbel Tree-LSTM (3.1 Motivation; 3.2 Preliminaries: 3.2.1 Recursive Neural Networks, 3.2.2 Training RvNNs without Tree Information; 3.3 Model Description: 3.3.1 Tree-LSTM, 3.3.2 Gumbel-Softmax, 3.3.3 Gumbel Tree-LSTM; 3.4 Implementation Details; 3.5 Experiments: 3.5.1 Natural Language Inference, 3.5.2 Sentiment Analysis, 3.5.3 Qualitative Analysis; 3.6 Summary). Chapter 4 Sentence Encoder: Cell-aware Stacked LSTM (4.1 Motivation; 4.2 Related Work; 4.3 Model Description: 4.3.1 Stacked LSTMs, 4.3.2 Cell-aware Stacked LSTMs, 4.3.3 Sentence Encoders; 4.4 Experiments: 4.4.1 Natural Language Inference, 4.4.2 Paraphrase Identification, 4.4.3 Sentiment Classification, 4.4.4 Machine Translation, 4.4.5 Forget Gate Analysis, 4.4.6 Model Variations; 4.5 Summary). Chapter 5 Matching Function: Element-wise Bilinear Sentence Matching (5.1 Motivation; 5.2 Proposed Method: ElBiS; 5.3 Experiments: 5.3.1 Natural Language Inference, 5.3.2 Paraphrase Identification; 5.4 Summary and Discussion). Chapter 6 Semi-Supervised Training: Cross-Sentence Latent Variable Model (6.1 Motivation; 6.2 Preliminaries: 6.2.1 Variational Auto-Encoders, 6.2.2 von Mises-Fisher Distribution; 6.3 Proposed Framework: CS-LVM: 6.3.1 Cross-Sentence Latent Variable Model, 6.3.2 Architecture, 6.3.3 Optimization; 6.4 Experiments: 6.4.1 Natural Language Inference, 6.4.2 Paraphrase Identification, 6.4.3 Ablation Study, 6.4.4 Generated Sentences, 6.4.5 Implementation Details; 6.5 Summary and Discussion). Chapter 7 Conclusion. Appendix A (A.1 Sentences Generated from CS-LVM).
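    Of the components above, the ElBiS matching function is the easiest to make concrete. The sketch below shows one plausible per-dimension bilinear fusion consistent with the description: only element-wise interactions u_i*v_i plus linear terms are modeled, so the parameter count grows linearly in the dimension rather than cubically as in full bilinear pooling. The dissertation's exact parameterization may differ, and all names here are illustrative.

```python
import torch
import torch.nn as nn

class ElementwiseBilinear(nn.Module):
    """Per-dimension bilinear fusion of two sentence vectors (ElBiS-like)."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_uv = nn.Parameter(torch.randn(dim))  # weight on u_i * v_i
        self.w_u = nn.Parameter(torch.randn(dim))   # linear term for u
        self.w_v = nn.Parameter(torch.randn(dim))   # linear term for v
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # Only element-wise (diagonal) bilinear interactions: O(dim) parameters.
        return self.w_uv * u * v + self.w_u * u + self.w_v * v + self.bias

match = ElementwiseBilinear(300)
u, v = torch.randn(8, 300), torch.randn(8, 300)  # encoded sentence pairs
fused = match(u, v)  # (8, 300) fused vector, fed to a classifier head
```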

    Large Language Models and Knowledge Graphs: Opportunities and Challenges

    Full text link
    Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on the opportunities and visions that this renewed focus brings, as well as related research topics and challenges. Comment: 30 pages

    Template-based Abstractive Microblog Opinion Summarisation

    Full text link
    We introduce the task of microblog opinion summarisation (MOS) and share a dataset of 3100 gold-standard opinion summaries to facilitate research in this domain. The dataset contains summaries of tweets spanning a 2-year period and covers more topics than any other public Twitter summarisation dataset. Summaries are abstractive in nature and were created by journalists skilled in summarising news articles, following a template that separates factual information (main story) from author opinions. Our method differs from previous work on generating gold-standard summaries from social media, which usually involves selecting representative posts and thus favours extractive summarisation models. To showcase the dataset's utility and challenges, we benchmark a range of abstractive and extractive state-of-the-art summarisation models and achieve good performance, with the former outperforming the latter. We also show that fine-tuning is necessary to improve performance and investigate the benefits of using different sample sizes. Comment: Accepted for publication in Transactions of the Association for Computational Linguistics (TACL), 2022. Pre-MIT Press publication version
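    The benchmark systems are not named in this record, but a minimal abstractive baseline of the kind benchmarked here can be sketched with an off-the-shelf model; the model choice and the toy posts below are assumptions for illustration, not the paper's setup.

```python
# Hypothetical baseline: summarise a cluster of posts on one topic with a
# generic pretrained abstractive model (not the paper's fine-tuned systems).
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
tweets = [
    "The new policy was announced this morning.",
    "Officials say the change takes effect next month.",
    "Many users are unhappy about the short notice.",
]
doc = " ".join(tweets)  # concatenate the posts into one input document
print(summarizer(doc, max_length=60, min_length=10)[0]["summary_text"])
```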