
    Exploiting Cross-Lingual Representations For Natural Language Processing

    Traditional approaches to supervised learning require a generous amount of labeled data for good generalization. While such annotation-heavy approaches have proven useful for some Natural Language Processing (NLP) tasks in high-resource languages (like English), they are unlikely to scale to languages where collecting labeled data is difficult and time-consuming. Translating supervision available in English is also not a viable solution, because developing a good machine translation system requires expensive-to-annotate resources that are not available for most languages. In this thesis, I argue that cross-lingual representations are an effective means of extending NLP tools to languages beyond English without resorting to generous amounts of annotated data or expensive machine translation. These representations can be learned inexpensively, often from signals completely unrelated to the task of interest. I begin with a review of different ways of inducing such representations from a variety of cross-lingual signals and study algorithmic approaches to using them in a diverse set of downstream tasks. Examples of such tasks covered in this thesis include transferring a trained document classification model across languages, assisting monolingual lexical semantics tasks like word sense induction, identifying asymmetric lexical relationships like hypernymy between words in different languages, and combining supervision across languages through a shared feature space for cross-lingual entity linking. In all these applications, the representations make information expressed in other languages available in English, while requiring minimal additional supervision in the language of interest.
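
    As a rough illustration of the transfer setting described in this abstract, the sketch below trains a classifier on English documents represented in a shared (aligned) embedding space and applies it unchanged to documents in another language. The toy vectors, documents, and labels are invented for illustration; the thesis does not prescribe this particular pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy shared embedding space: English and German words for the same concept get
# (nearly) the same vector, standing in for real aligned cross-lingual embeddings.
shared_space = {
    "good":  np.array([0.9, 0.1]), "gut":      np.array([0.88, 0.12]),
    "bad":   np.array([0.1, 0.9]), "schlecht": np.array([0.12, 0.88]),
    "movie": np.array([0.5, 0.5]), "film":     np.array([0.5, 0.5]),
}

def doc_vector(tokens):
    """Average the embeddings of in-vocabulary tokens (zero vector if none)."""
    hits = [shared_space[t] for t in tokens if t in shared_space]
    return np.mean(hits, axis=0) if hits else np.zeros(2)

# Train a sentiment classifier on labeled English documents only ...
en_docs   = [["good", "movie"], ["bad", "movie"]]
en_labels = [1, 0]
clf = LogisticRegression().fit([doc_vector(d) for d in en_docs], en_labels)

# ... then apply it unchanged to German documents via the shared space.
de_docs = [["gut", "film"], ["schlecht", "film"]]
print(clf.predict([doc_vector(d) for d in de_docs]))  # expected: [1 0]
```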

    Are You Finding the Right Person? A Name Translation System Towards Web 2.0

    In a multilingual world, the information available in global information systems is increasing rapidly. Searching for proper names in a foreign language has become an important task in multilingual search and knowledge discovery. However, these names are among the most difficult items to handle, because they are often unknown words that cannot be found in a translation dictionary, and even human experts struggle with the variation generated during translation. Furthermore, existing research on name translation has focused on translation algorithms, while the user experience during name translation and name search is often ignored. As Web technology moves towards Web 2.0, a platform that allows easier distributed collaboration and information sharing, we seek methods to incorporate Web 2.0 technologies into a name translation system. In this research, we review challenges in name translation and propose an interactive name translation and search system, NameTran. The system takes English names and translates them into Chinese using a hybrid approach that combines Hidden Markov Model-based (HMM-based) transliteration with web mining. Evaluation results showed that web mining consistently boosted the performance of the pure HMM approach. Our system achieved a top-1 accuracy of 0.64 and a top-8 accuracy of 0.96. To cope with changing popularity and variation in name translations, we demonstrated the feasibility of allowing users to rank translations, with the new ranking serving as feedback to the originally trained HMM model. We believe that such user input will significantly improve system usability.
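
    The sketch below illustrates only the general idea of combining a transliteration model's score with web-mining evidence when ranking candidate Chinese renderings of an English name. The candidate list, probabilities, hit counts, and interpolation weights are invented for illustration and do not reflect NameTran's actual model or data.

```python
import math

def rank_candidates(candidates, weight_model=0.7, weight_web=0.3):
    """Rank (candidate, model_prob, web_count) tuples by a combined log score."""
    max_count = max(count for _, _, count in candidates) or 1
    scored = []
    for name, model_prob, web_count in candidates:
        # Interpolate the transliteration model's log probability with a
        # smoothed, normalized web-frequency term.
        score = (weight_model * math.log(model_prob)
                 + weight_web * math.log((web_count + 1) / (max_count + 1)))
        scored.append((score, name))
    return [name for score, name in sorted(scored, reverse=True)]

# Hypothetical candidates for "Clinton": (Chinese candidate, model prob, web hits)
candidates = [
    ("克林顿", 0.20, 125000),
    ("克林敦", 0.25, 300),
    ("柯林頓", 0.10, 40000),
]
print(rank_candidates(candidates))  # web evidence promotes the common rendering
```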

    TRANSLIT: a large-scale name transliteration resource

    Transliteration is the process of expressing a proper name from a source language in the characters of a target language (e.g. from Cyrillic to Latin characters). We present TRANSLIT, a large-scale corpus with approx. 1.6 million entries in more than 180 languages with about 3 million variations of person and geolocation names. The corpus is based on various public data sources, which have been transformed into a unified format to simplify their usage, plus a newly compiled dataset from Wikipedia. In addition, we apply several machine learning methods to establish baselines for automatically detecting transliterated names in various languages. Our best systems achieve an accuracy of 92% on identification of transliterated pairs.
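
    A deliberately simple baseline in the spirit of the transliteration-pair identification task might look like the sketch below: decide whether two name strings refer to the same person or place from surface similarity alone. The normalization, similarity measure, and threshold are illustrative assumptions, not the systems evaluated on TRANSLIT.

```python
from difflib import SequenceMatcher
import unicodedata

def normalize(name):
    """Lowercase and strip diacritics so e.g. 'Ségolène' compares as 'segolene'."""
    decomposed = unicodedata.normalize("NFKD", name.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def is_transliteration_pair(a, b, threshold=0.7):
    """Classify a name pair as matching if character-level similarity is high enough."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

print(is_transliteration_pair("Tchaikovsky", "Chaikovskiy"))   # likely True
print(is_transliteration_pair("Tchaikovsky", "Rachmaninoff"))  # clearly different
```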

    Learning to match names across languages

    We report on research on matching names in different scripts across languages. We explore two trainable approaches based on comparing pronunciations. The first, a cross-lingual approach, uses an automatic name-matching program that exploits rules based on phonological comparisons of the two languages carried out by humans. The second, a monolingual approach, relies only on automatic comparison of the phonological representations of each pair. Alignments produced by each approach are fed to a machine learning algorithm. Results show that the monolingual approach enables machine-learning-based comparison of person names in English and Chinese with an F-measure above 97.0.
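
    To make the monolingual idea concrete, the sketch below maps both names to a crude phonological key, compares the keys, and lets a learned classifier decide whether the pair matches. The key function, features, and toy training pairs are illustrative assumptions rather than the paper's pronunciation models or alignment procedure.

```python
from difflib import SequenceMatcher
from sklearn.linear_model import LogisticRegression

def phono_key(name):
    """Very rough phonological normalization: lowercase and merge common spellings."""
    key = name.lower()
    for src, tgt in [("ph", "f"), ("ck", "k"), ("zh", "j"), ("x", "ks"), ("w", "v")]:
        key = key.replace(src, tgt)
    return "".join(ch for ch in key if ch.isalpha())

def features(a, b):
    """Similarity of the phonological keys plus a length-ratio feature."""
    ka, kb = phono_key(a), phono_key(b)
    sim = SequenceMatcher(None, ka, kb).ratio()
    len_ratio = min(len(ka), len(kb)) / max(len(ka), len(kb))
    return [sim, len_ratio]

# Tiny toy training set of (name, variant, is_same_person) examples.
pairs = [("Zhang Wei", "Jang Vei", 1), ("Philip", "Filip", 1),
         ("Jackson", "Dixon", 0), ("Mary", "Robert", 0)]
clf = LogisticRegression().fit([features(a, b) for a, b, _ in pairs],
                               [y for _, _, y in pairs])

print(clf.predict([features("Stephen", "Stefen")]))    # likely predicted a match
print(clf.predict([features("Stephen", "Kowalski")]))  # likely predicted a non-match
```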