1,947 research outputs found
Handling non-compositionality in multilingual CNLs
In this paper, we describe methods for handling multilingual
non-compositional constructions in the framework of GF. We specifically look at
methods to detect and extract non-compositional phrases from parallel texts and
propose methods to handle such constructions in GF grammars. We expect that the
methods to handle non-compositional constructions will enrich CNLs by providing
more flexibility in the design of controlled languages. We look at two specific
use cases of non-compositional constructions: a general-purpose method to
detect and extract multilingual multiword expressions and a procedure to
identify nominal compounds in German. We evaluate our procedure for multiword
expressions by performing a qualitative analysis of the results. For the
experiments on nominal compounds, we incorporate the detected compounds in a
full SMT pipeline and evaluate the impact of our method in machine translation
process.Comment: CNL workshop in COLING 201
Multiword expressions in English to Swahili machine translation : Adjectives
The report deals with descriptive multiword adjectives in the context of English to Swahili machine translation. These descriptions often have more than one section, which inflect according to the noun class system of Swahili. The report handles all adjective types and shows how the correct surface form can be produced in each case
Multiword expressions in English to Swahili machine translation : Nouns
In machine translation, two types of multiword expressions (MWE) must be considered. Such expressions may occur in source text and target text, or in both. A MWE in source text may be a MWE also in target text. There are also cases, when a MWE in source text corresponds to a single word in target text. However, the majority of cases are such, where a single word in source text corresponds to a MWE in target text. In this report we discuss the last type of cases. Th emphasis is on nouns
MultiMWE: building a multi-lingual multi-word expression (MWE) parallel corpora
Multi-word expressions (MWEs) are a hot topic in research in natural language processing (NLP), including topics such as MWE detection, MWE decomposition, and research investigating the exploitation of MWEs in other NLP fields such as Machine Translation. However, the availability of bilingual or multi-lingual MWE corpora is very limited. The only bilingual MWE corpora that we are aware of is from the PARSEME (PARSing and Multi-word Expressions) EU project. This is a small collection of only 871 pairs of English-German MWEs. In this paper, we present multi-lingual and bilingual MWE corpora that we have extracted from root parallel corpora. Our collections are 3,159,226 and 143,042 bilingual MWE pairs for German-English and Chinese-English respectively after filtering. We examine the quality of these extracted bilingual MWEs in MT experiments. Our initial experiments applying MWEs in MT show improved translation performances on MWE terms in qualitative analysis and better general evaluation scores in quantitative analysis, on both German-English and Chinese-English language pairs. We follow a standard experimental pipeline to create our MultiMWE corpora which are available online. Researchers can use this free corpus for their own models or use them in a knowledge base as model features
Evaluation of a Substitution Method for Idiom Transformation in Statistical Machine Translation
We evaluate a substitution based technique for improving Statistical Machine Translation performance on idiomatic multiword expressions. The method operates by performing substitution on the original idiom with its literal meaning before translation, with a second substitution step replacing literal meanings with idioms following translation. We detail our approach, outline our implementation and provide an evaluation of the method for the language pair English/Brazilian-Portuguese. Our results show improvements in translation accuracy on sentences containing either morphosyntactically constrained or unconstrained idioms. We discuss the consequences of our results and outline potential extensions to this process
Cross-lingual transfer learning and multitask learning for capturing multiword expressions
This is an accepted manuscript of an article published by Association for Computational Linguistics in Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), available online: https://www.aclweb.org/anthology/W19-5119
The accepted version of the publication may differ from the final published version.Recent developments in deep learning have prompted a surge of interest in the application of multitask and transfer learning to NLP problems. In this study, we explore for the first time, the application of transfer learning (TRL) and multitask learning (MTL) to the identification of Multiword Expressions (MWEs). For MTL, we exploit the shared syntactic information between MWE and dependency parsing models to jointly train a single model on both tasks. We specifically predict two types of labels: MWE and dependency parse. Our neural MTL architecture utilises the supervision of dependency parsing in lower layers and predicts MWE tags in upper layers. In the TRL scenario, we overcome the scarcity of data by learning a model on a larger MWE dataset and transferring the knowledge to a resource-poor setting in another language. In both scenarios, the resulting models achieved higher performance compared to standard neural approaches
Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification
This paper deals with the identification of Multiword Expressions (MWEs) in
Manipuri, a highly agglutinative Indian Language. Manipuri is listed in the
Eight Schedule of Indian Constitution. MWE plays an important role in the
applications of Natural Language Processing(NLP) like Machine Translation, Part
of Speech tagging, Information Retrieval, Question Answering etc. Feature
selection is an important factor in the recognition of Manipuri MWEs using
Conditional Random Field (CRF). The disadvantage of manual selection and
choosing of the appropriate features for running CRF motivates us to think of
Genetic Algorithm (GA). Using GA we are able to find the optimal features to
run the CRF. We have tried with fifty generations in feature selection along
with three fold cross validation as fitness function. This model demonstrated
the Recall (R) of 64.08%, Precision (P) of 86.84% and F-measure (F) of 73.74%,
showing an improvement over the CRF based Manipuri MWE identification without
GA application.Comment: 14 pages, 6 figures, see
http://airccse.org/journal/jcsit/1011csit05.pd
- …