A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources.
Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
The demands of users and the publishing world: printed or online, free or paid for?
International audienc
Similar Text Fragments Extraction for Identifying Common Wikipedia Communities
Extracting similar text fragments from weakly formalized data is a task in natural language processing and intelligent data analysis, used to automatically identify connected knowledge fields. To search for such common communities in Wikipedia, we propose using, as an additional stage, a logical-algebraic model for extracting similar collocations. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words. With WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of highly frequent synonymous collocations can serve as an indication of key common up-to-date Wikipedia communities.
Identification of Fertile Translations in Medical Comparable Corpora: a Morpho-Compositional Approach
This paper defines a method for lexicon extraction in the biomedical domain from
comparable corpora. The method is based on compositional translation and
exploits morpheme-level translation equivalences. It can generate translations
for a large variety of morphologically constructed words and can also generate
'fertile' translations. We show that fertile translations increase the overall
quality of the extracted lexicon for English to French translation.
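As a rough illustration of the idea in this abstract, compositional translation can be sketched as: split a source word into morphemes, translate each morpheme, recombine the pieces, and keep only candidates attested in the target corpus. The sketch below is not the paper's implementation. The morpheme table, the attested-term set, and the function names are all hypothetical toy values invented for illustration, and "fertile" is simplified here to mean a candidate with more words than the source term.

```python
from itertools import product

# Hypothetical toy morpheme-level equivalences (English -> French).
# Invented for illustration; not real lexicon data from the paper.
MORPHEME_TABLE = {
    "anti": ["anti", "contre le"],
    "cancer": ["cancer"],
}

# Hypothetical set of terms attested in a target-language corpus.
TARGET_TERMS = {"anticancer", "contre le cancer"}


def split_morphemes(word, table):
    """Greedy longest-match split of `word` into known morphemes.

    Returns None when some part of the word cannot be matched.
    """
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in table:
                parts.append(word[i:j])
                i = j
                break
        else:
            return None  # no known morpheme starts at position i
    return parts


def translate(word, table=MORPHEME_TABLE, attested=TARGET_TERMS):
    """Generate candidate translations and keep the attested ones."""
    parts = split_morphemes(word, table)
    if parts is None:
        return []
    kept = []
    for combo in product(*(table[p] for p in parts)):
        # Try a single-word form and a multi-word ("fertile") form.
        for candidate in ("".join(combo), " ".join(combo)):
            if candidate in attested and candidate not in kept:
                kept.append(candidate)
    return kept


if __name__ == "__main__":
    print(translate("anticancer"))
```

Filtering against attested target terms is what keeps the over-generated combinations in check; in this toy setup the multi-word candidate survives alongside the single-word one.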
Methodology for the Corpus-based English-German-Ukrainian Dictionary of Collocations
Master's Thesis in Lexicography. Academic year 2021-2022
[EN] This Master's thesis recounts the vision of the multilingual collocations dictionary project for the English, German, and Ukrainian languages ("Corpus-based English-German-Ukrainian Dictionary of Collocations" or EDU-Col) and elaborates on the methodology for compiling the dictionary and its key dictionary structures. The dictionary will cater to the needs of language learners, translators, text producers (journalists, copywriters), and native speakers. Tapping into the latest developments in NLP and the capabilities of corpora, the methodology for creating the proposed dictionary relies on the automatic extraction of dictionary information types, namely collocation candidates, example sentences, and translation equivalents for collocations. The automatic extraction is followed by manual validation in order to maintain the quality of the obtained lexicographic data.
[DE] This Master's thesis deals with the conception of the multilingual collocations dictionary for the English, German, and Ukrainian languages ("Corpus-based English-German-Ukrainian Dictionary of Collocations" or EDU-Col) and explains the methodology for compiling the dictionary and its most important dictionary structures. The dictionary is geared to the needs of language learners, translators, editors (journalists, copywriters), and native speakers. The methodology for creating the proposed dictionary is based on the automatic extraction of dictionary information, namely collocation candidates, example sentences, and translation equivalents for collocations. The automatic extraction is followed by a manual review in order to ensure the quality of the lexicographic data obtained.
Proceedings
Proceedings of the Workshop
CHAT 2011: Creation, Harmonization and Application of Terminology Resources.
Editors: Tatiana Gornostay and Andrejs Vasiļjevs.
NEALT Proceedings Series, Vol. 12 (2011).
© 2011 The editors and contributors.
Published by Northern European Association for Language Technology (NEALT)
http://omilia.uio.no/nealt
Electronically published at Tartu University Library (Estonia)
http://hdl.handle.net/10062/16956