
Visual Pivoting for (Unsupervised) Entity Alignment

Authors: Chen Muhao; Collier Nigel; Liu Fangyu; Roth Dan
Publication date: 16/12/2020

This work studies the use of visual semantic representations to align entities in heterogeneous knowledge graphs (KGs). Images are natural components of many existing KGs. By combining visual knowledge with other auxiliary information, we show that the proposed approach, EVA, creates a holistic entity representation that provides strong signals for cross-graph entity alignment. In addition, previous entity alignment methods require human-labelled seed alignments, which limits their applicability. EVA provides a completely unsupervised solution by leveraging the visual similarity of entities to create an initial seed dictionary (visual pivots). Experiments on the benchmark datasets DBP15k and DWY15k show that EVA offers state-of-the-art performance on both monolingual and cross-lingual entity alignment tasks. Furthermore, we discover that images are particularly useful for aligning long-tail KG entities, which inherently lack the structural contexts needed to capture the correspondences. Comment: To appear at AAAI-202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
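The visual-pivot idea can be illustrated with a small sketch. This is not EVA's published procedure: it assumes every entity in both KGs already has an image embedding (for example, features from a pretrained vision model) and uses mutual nearest neighbours under cosine similarity as the pairing rule. The functions `cosine` and `visual_pivots`, the `min_sim` threshold, and the toy entity IDs are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def visual_pivots(
    src_images: Dict[str, Vector],
    tgt_images: Dict[str, Vector],
    min_sim: float = 0.0,
) -> List[Tuple[str, str]]:
    """Pair entities whose image embeddings are mutual nearest neighbours.

    src_images / tgt_images map entity IDs in each KG to an image embedding
    (hypothetical inputs; how the embeddings are produced is assumed).
    """
    if not src_images or not tgt_images:
        return []
    # Best-matching target for every source entity.
    best_tgt = {
        s: max(tgt_images, key=lambda t: cosine(sv, tgt_images[t]))
        for s, sv in src_images.items()
    }
    # Best-matching source for every target entity.
    best_src = {
        t: max(src_images, key=lambda s: cosine(src_images[s], tv))
        for t, tv in tgt_images.items()
    }
    pivots = []
    for s, t in best_tgt.items():
        # Keep only mutual matches that also clear the similarity threshold.
        if best_src[t] == s and cosine(src_images[s], tgt_images[t]) >= min_sim:
            pivots.append((s, t))
    return sorted(pivots)


if __name__ == "__main__":
    # Toy two-dimensional "image embeddings" (hypothetical values).
    src = {"dbp:Paris": [1.0, 0.0], "dbp:Berlin": [0.0, 1.0]}
    tgt = {"fr:Paris": [0.9, 0.1], "fr:Berlin": [0.1, 0.9]}
    print(visual_pivots(src, tgt))  # [('dbp:Berlin', 'fr:Berlin'), ('dbp:Paris', 'fr:Paris')]
```

Requiring the match to hold in both directions discards ambiguous pairs, trading recall for precision. That trade-off is usually the right one when the pairs serve as seeds for further alignment.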

Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video Captioning

Authors: Hafiz Bolanle Matti; Ige Tosin; Sikiru Adewale

This work demonstrates the implementation and use of an encoder-decoder model to perform a many-to-many mapping of video data to text captions. The mapping takes an input temporal sequence of video frames to an output sequence of words that forms a caption sentence. Data preprocessing, model construction, and model training are discussed. Caption correctness is evaluated using 2-gram BLEU scores across the different splits of the dataset. Specific examples of output captions are shown to demonstrate that the model generalizes over the temporal dimension of the video. Predicted captions are shown to generalize over video action, even in instances where the video scene changes dramatically. Model architecture changes that could improve sentence grammar and correctness are also discussed.

PhilPapers
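The abstract evaluates captions with 2-gram BLEU but does not state the exact variant (sentence or corpus level, smoothing, tokenization). As a sketch under those assumptions, the function below computes the standard unsmoothed, sentence-level BLEU with uniform weights over unigram and bigram precisions and the usual brevity penalty. The names `bleu2` and `ngrams` are hypothetical, and captions are assumed to be pre-tokenized lists of words.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu2(candidate: List[str], references: List[List[str]]) -> float:
    """Sentence-level BLEU with uniform weights over 1- and 2-grams, no smoothing."""
    if not candidate or not references:
        return 0.0
    log_precisions = []
    for n in (1, 2):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the two modified precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / 2)


if __name__ == "__main__":
    score = bleu2("a man cooking".split(), ["a man is cooking".split()])
    print(round(score, 3))  # 0.507
```

Unsmoothed BLEU drops to zero whenever no bigram matches, which happens often with short captions; library implementations commonly offer smoothing for that case.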

Deep Learning Based Video Captioning through Encoder-Decoder Based Long Short-Term Memory (LSTM)

Author: Chelsea Grimsby

PhilPapers

Captioning Deep Learning Based Encoder-Decoder through Long Short-Term Memory (LSTM)

Author: Chelsea Grimsby

PhilPapers

Visual grounding in video for unsupervised word translation

Authors: Alayrac JB; Blunsom P; Carreira J; Malinowski M; Nematzadeh A; Sigurdsson GA; Smaira L; Zisserman A
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

There are thousands of actively spoken languages on Earth, but a single visual world. Grounding in this visual world has the potential to bridge the gap between all these languages. Our goal is to use visual grounding to improve unsupervised word mapping between languages. The key idea is to establish a common visual representation between two languages by learning embeddings from unpaired instructional videos narrated in the native language. Given this shared embedding, we demonstrate that (i) we can map words between the languages, particularly the 'visual' words; (ii) the shared embedding provides a good initialization for existing unsupervised text-based word translation techniques, forming the basis for our proposed hybrid visual-text mapping algorithm, MUVE; and (iii) our approach achieves superior performance by addressing the shortcomings of text-based methods: it is more robust, handles datasets with less commonality, and is applicable to low-resource languages. We apply these methods to translate words from English to French, Korean, and Japanese, all without any parallel corpora and simply by watching many videos of people speaking while doing things.

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive
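The first claim in the MUVE abstract, mapping words between languages through a shared embedding, amounts to nearest-neighbour retrieval once both vocabularies live in one space. The sketch below assumes such a shared space already exists; it does not reproduce MUVE's video-based training or its hybrid visual-text refinement. It scores retrieval with precision@k against a gold dictionary. The function names, the toy English-French vectors, and the dictionary are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def translate(word: str, src_emb: Dict[str, Vector],
              tgt_emb: Dict[str, Vector], k: int = 1) -> List[str]:
    """Return the k target-language words closest to `word` in the shared space."""
    query = src_emb[word]  # assumes `word` has an embedding in the source vocabulary
    ranked = sorted(tgt_emb, key=lambda t: cosine(query, tgt_emb[t]), reverse=True)
    return ranked[:k]


def precision_at_k(gold: Dict[str, str], src_emb: Dict[str, Vector],
                   tgt_emb: Dict[str, Vector], k: int = 1) -> float:
    """Fraction of gold source words whose reference translation is in the top k."""
    if not gold:
        return 0.0
    hits = sum(gold[w] in translate(w, src_emb, tgt_emb, k) for w in gold)
    return hits / len(gold)


if __name__ == "__main__":
    # Toy shared-space vectors (hypothetical values).
    en = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
    fr = {"chien": [0.95, 0.05], "chat": [0.05, 0.95]}
    print(translate("dog", en, fr))  # ['chien']
    print(precision_at_k({"dog": "chien", "cat": "chat"}, en, fr))  # 1.0
```

Plain nearest-neighbour retrieval is the simplest read-out; text-based unsupervised translation methods often refine the mapping further, which corresponds to the role the abstract gives the shared embedding as an initialization.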