2,738 research outputs found

Applying dynamic Bayesian networks in transliteration detection and generation

Author: Nabende Peter
Publication venue: s.n.
Publication date: 01/01/2011

Proceedings - University of Groningen

Applying dynamic Bayesian networks in transliteration detection and generation

Author: Nabende Peter
Publication venue: s.n.
Publication date: 01/01/2011

Dissertations of the University of Groningen

English-Chinese Name Transliteration with Bi-Directional Syllable-Based Maximum Matching

Author: Kwong Oi Yee
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011

Combining Word Embeddings with Bilingual Orthography Embeddings for Bilingual Dictionary Induction

Author: Fraser Alexander; Hangya Viktor; Schütze Hinrich; Severini Silvia
Publication date: 01/01/2020

Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning

Author: Chandar Sarath; Khapra Mitesh M.; Rajendran Janarthanan; Ravindran Balaraman
Publication date: 01/01/2016

Recently there has been a lot of interest in learning common representations for multiple views of data. Typically, such common representations are learned using a parallel corpus between the two views (say, 1M images and their English captions). In this work, we address a real-world scenario where no direct parallel data is available between two views of interest (say, $V_1$ and $V_2$) but parallel data is available between each of these views and a pivot view ($V_3$). We propose a model for learning a common representation for $V_1$, $V_2$ and $V_3$ using only the parallel data available between $V_1V_3$ and $V_2V_3$. The proposed model is generic and even works when there are $n$ views of interest and only one pivot view which acts as a bridge between them. There are two specific downstream applications that we focus on: (i) transfer learning between languages $L_1$, $L_2$, ..., $L_n$ using a pivot language $L$ and (ii) cross modal access between images and a language $L_1$ using a pivot language $L_2$. Our model achieves state-of-the-art performance in multilingual document classification on the publicly available multilingual TED corpus and promising results in multilingual multimodal retrieval on a new dataset created and released as a part of this work. Comment: Published at NAACL-HLT 201

arXiv.org e-Print Archive

TRANSLIT : a large-scale name transliteration resource

Author: Benites de Azevedo e Souza Fernando; Cieliebak Mark; Duivesteijn Gilbert François; von Däniken Pius
Publication venue: European Language Resources Association
Publication date: 01/05/2020

Transliteration is the process of expressing a proper name from a source language in the characters of a target language (e.g. from Cyrillic to Latin characters). We present TRANSLIT, a large-scale corpus with approx. 1.6 million entries in more than 180 languages with about 3 million variations of person and geolocation names. The corpus is based on various public data sources, which have been transformed into a unified format to simplify their usage, plus a newly compiled dataset from Wikipedia. In addition, we apply several machine learning methods to establish baselines for automatically detecting transliterated names in various languages. Our best systems achieve an accuracy of 92% on identification of transliterated pairs.