Termhood-based Comparability Metrics of Comparable Corpus in Special Domain

Author: C.Y. Kit
T. Talvensaari
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Cross-Language Information Retrieval (CLIR) and machine translation (MT) resources, such as dictionaries and parallel corpora, are scarce and hard to come by for special domains. Moreover, these resources are limited to a few languages, such as English, French, and Spanish. Obtaining comparable corpora automatically for such domains could therefore be an effective answer to this problem. Comparable corpora, whose subcorpora are not translations of each other, can be easily obtained from the web, so building and using them is often a more feasible option in multilingual information processing. Measuring comparability is one of the key issues in building and using comparable corpora. Currently, there is no widely accepted definition or metric of corpus comparability; different definitions or metrics may be given to suit different natural language processing tasks. This paper proposes a new comparability metric, the termhood-based metric, oriented to the task of bilingual terminology extraction. In this method, words are ranked by termhood rather than frequency, and the cosine similarity calculated from the ranked lists of word termhood is used as the comparability score. Experimental results show that the termhood-based metric performs better than traditional frequency-based metrics.

arXiv.org e-Print Archive

Crossref

Parallel Corpora in translator education

Author: Ruiz Yepes Guadalupe
Publication date: 01/01/2011

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional Universidad de Málaga

A Survey of Paraphrasing and Textual Entailment Methods

Author: Androutsopoulos Ion
Malakasiotis Prodromos
Publication venue: 'AI Access Foundation'
Publication date: 30/05/2010

Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.
Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201

arXiv.org e-Print Archive

Crossref

Multilingual term extraction from comparable corpora : informativeness of monolingual term extraction features

Author: Rigouts Terryn Ayla
Steyaert Kim
Publication date: 01/01/2019

Most research on bilingual automatic term extraction (ATE) from comparable corpora treats the two components of the task separately, i.e. monolingual automatic term extraction and finding equivalent pairs cross-lingually. The latter usually relies on context vectors and is notoriously inaccurate for infrequent terms. The aim of this pilot study is to investigate whether information gathered for the former might also benefit the cross-lingual linking, thereby illustrating the potential of a more holistic approach to ATE from comparable corpora that re-uses information across the components. To test this hypothesis, an existing dataset covering three languages and four domains was expanded. A supervised binary classifier is shown to achieve robust performance, with stable results across languages and domains.

Ghent University Academic Bibliography

On the cross-linguistic equivalence of sentir(e) in Romance languages: a contrastive study in semantics

Author: Enghels Renata
Jansegers Marlies
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2013

Recent linguistic studies on perception have focused mainly on verbs referring to the dominant visual and auditory modalities (e.g. English see/look and hear/listen) and have largely ignored the minor verbs. The present paper seeks to fill this gap by comparing the complex semantics of the cognate verbs sentir(e) in three Romance languages, namely Spanish, French and Italian. Because the objective study of semantics is a problematic issue, we pay special attention to methodological problems and opt for a combined corpus approach involving both a translation corpus and comparable data. Evidence from both corpora indicates that, although the rich polysemy of the three verbs partly coincides, each individual verb has undergone semantic specializations that differentiate the morphological cognates.

Ghent University Academic Bibliography

Lexical typology : a programmatic sketch

Author: Behrens Leila
Sasse Hans-Jürgen
Publication date: 01/01/1997

The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar.

Hochschulschriftenserver - Universität Frankfurt am Main

Transferable Positive/Negative Speech Emotion Recognition via Class-wise Adversarial Domain Adaptation

Author: Chen Ke
Zhou Hao
Publication date: 01/01/2019

Speech emotion recognition plays an important role in building more intelligent and human-like agents. Because speech emotion data is difficult to collect, an increasingly popular solution is to leverage a related, richer source corpus to help with the target corpus. However, domain shift between the corpora poses a serious challenge, making adaptation difficult even for the recognition of positive/negative emotions. In this work, we propose class-wise adversarial domain adaptation to address this challenge by reducing the shift for all classes between different corpora. Experiments on the well-known corpora EMODB and Aibo demonstrate that our method is effective even when only a very limited number of labeled target examples are provided.
Comment: 5 pages, 3 figures, accepted to ICASSP 201

arXiv.org e-Print Archive

Crossref

The University of Manchester - Institutional Repository