1,895 research outputs found
Identification of Fertile Translations in Medical Comparable Corpora: a Morpho-Compositional Approach
This paper defines a method for lexicon in the biomedical domain from
comparable corpora. The method is based on compositional translation and
exploits morpheme-level translation equivalences. It can generate translations
for a large variety of morphologically constructed words and can also generate
'fertile' translations. We show that fertile translations increase the overall
quality of the extracted lexicon for English to French translation
Handling non-compositionality in multilingual CNLs
In this paper, we describe methods for handling multilingual
non-compositional constructions in the framework of GF. We specifically look at
methods to detect and extract non-compositional phrases from parallel texts and
propose methods to handle such constructions in GF grammars. We expect that the
methods to handle non-compositional constructions will enrich CNLs by providing
more flexibility in the design of controlled languages. We look at two specific
use cases of non-compositional constructions: a general-purpose method to
detect and extract multilingual multiword expressions and a procedure to
identify nominal compounds in German. We evaluate our procedure for multiword
expressions by performing a qualitative analysis of the results. For the
experiments on nominal compounds, we incorporate the detected compounds in a
full SMT pipeline and evaluate the impact of our method in machine translation
process.Comment: CNL workshop in COLING 201
Automatic extraction of Arabic multiword expressions
In this paper we investigate the automatic acquisition of Arabic Multiword Expressions (MWE). We propose three complementary approaches to extract MWEs from available data resources. The first approach relies on the correspondence asymmetries between Arabic Wikipedia titles and titles in 21 different languages. The second approach collects English MWEs from Princeton WordNet 3.0, translates the collection into Arabic using Google Translate, and utilizes different search engines to validate the output. The third uses lexical association measures to extract MWEs from a large unannotated corpus. We experimentally explore the feasibility of each approach and measure the quality and coverage of the output against gold standards
From holism to compositionality: memes and the evolution of segmentation, syntax, and signification in music and language
Steven Mithen argues that language evolved from an antecedent he terms âHmmmmm, [meaning it was] Holistic, manipulative, multi-modal, musical and mimeticâ. Owing to certain innate and learned factors, a capacity for segmentation and cross-stream mapping in early Homo sapiens broke the continuous line of Hmmmmm, creating discrete replicated units which, with the initial support of Hmmmmm, eventually became the semantically freighted words of modern language. That which remained after what was a bifurcation of Hmmmmm arguably survived as music, existing as a sound stream segmented into discrete units, although one without the explicit and relatively fixed semantic content of language. All three types of utterance â the parent Hmmmmm, language, and music â are amenable to a memetic interpretation which applies Universal Darwinism to what are understood as language and musical memes. On the basis of Peter Carruthersâ distinction between âcognitivismâ and âcommunicativismâ in language, and William Calvinâs theories of cortical information encoding, a framework is hypothesized for the semantic and syntactic associations between, on the one hand, the sonic patterns of language memes (âlexemesâ) and of musical memes (âmusemesâ) and, on the other hand, âmentaleseâ conceptual structures, in Chomskyâs âLogical Formâ (LF)
Multiword expression processing: A survey
Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across different languages. The structure of linguistic processing that depends on the clear distinction between words and phrases has to be re-thought to accommodate MWEs. The issue of MWE handling is crucial for NLP applications, where it raises a number of challenges. The emergence of solutions in the absence of guiding principles motivates this survey, whose aim is not only to provide a focused review of MWE processing, but also to clarify the nature of interactions between MWE processing and downstream applications. We propose a conceptual framework within which challenges and research contributions can be positioned. It offers a shared understanding of what is meant by "MWE processing," distinguishing the subtasks of MWE discovery and identification. It also elucidates the interactions between MWE processing and two use cases: Parsing and machine translation. Many of the approaches in the literature can be differentiated according to how MWE processing is timed with respect to underlying use cases. We discuss how such orchestration choices affect the scope of MWE-aware systems. For each of the two MWE processing subtasks and for each of the two use cases, we conclude on open issues and research perspectives
- âŠ