11,101 research outputs found
MultiMWE: building a multi-lingual multi-word expression (MWE) parallel corpora
Multi-word expressions (MWEs) are a hot topic in research in natural language processing (NLP), including topics such as MWE detection, MWE decomposition, and research investigating the exploitation of MWEs in other NLP fields such as Machine Translation. However, the availability of bilingual or multi-lingual MWE corpora is very limited. The only bilingual MWE corpora that we are aware of is from the PARSEME (PARSing and Multi-word Expressions) EU project. This is a small collection of only 871 pairs of English-German MWEs. In this paper, we present multi-lingual and bilingual MWE corpora that we have extracted from root parallel corpora. Our collections are 3,159,226 and 143,042 bilingual MWE pairs for German-English and Chinese-English respectively after filtering. We examine the quality of these extracted bilingual MWEs in MT experiments. Our initial experiments applying MWEs in MT show improved translation performances on MWE terms in qualitative analysis and better general evaluation scores in quantitative analysis, on both German-English and Chinese-English language pairs. We follow a standard experimental pipeline to create our MultiMWE corpora which are available online. Researchers can use this free corpus for their own models or use them in a knowledge base as model features
Statistical Machine Translation Features with Multitask Tensor Networks
We present a three-pronged approach to improving Statistical Machine
Translation (SMT), building on recent success in the application of neural
networks to SMT. First, we propose new features based on neural networks to
model various non-local translation phenomena. Second, we augment the
architecture of the neural network with tensor layers that capture important
higher-order interaction among the network units. Third, we apply multitask
learning to estimate the neural network parameters jointly. Each of our
proposed methods results in significant improvements that are complementary.
The overall improvement is +2.7 and +1.8 BLEU points for Arabic-English and
Chinese-English translation over a state-of-the-art system that already
includes neural network features.Comment: 11 pages (9 content + 2 references), 2 figures, accepted to ACL 2015
as a long pape
Trialing project-based learning in a new EAP ESP course: A collaborative reflective practice of three college English teachers
Currently in many Chinese universities, the traditional College English course is facing the risk of being âmarginalizedâ, replaced or even removed, and many hours previously allocated to the course are now being taken by EAP or ESP. At X University in northern China, a curriculum reform as such is taking place, as a result of which a new course has been created called âxue keâ English. Despite the fact that âxue keâ means subject literally, the course designer has made it clear that subject content is not the target, nor is the course the same as EAP or ESP. This curriculum initiative, while possibly having been justified with a rationale of some kind (e.g. to meet with changing social and/or academic needs of students and/or institutions), this is posing a great challenge for, as well as considerable pressure on, a number of College English teachers who have taught this single course for almost their entire teaching career. In such a context, three teachers formed a peer support group in Semester One this year, to work collaboratively co-tackling the challenge, and they chose Project-Based Learning (PBL) for the new course. This presentation will report on the implementation of this project, including the overall designing, operational procedure, and the teachersâ reflections.
Based on discussion, pre-agreement was reached on the purpose and manner of collaboration as offering peer support for more effective teaching and learning and fulfilling and pleasant professional development. A WeChat group was set up as the chief platform for messaging, idea-sharing, and resource-exchanging. Physical meetings were supplementary, with sound agenda but flexible time, and venues. Mosoteach cloud class (lan mo yun ban ke) was established as a tool for virtual learning, employed both in and after class. Discussions were held at the beginning of the semester which determined only brief outlines for PBL implementation and allowed space for everyone to autonomously explore in their own way. Constant further discussions followed, which generated a great deal of opportunities for peer learning and lesson plan modifications. A reflective journal, in a greater or lesser detailed manner, was also kept by each teacher to record the journey of the collaboration. At the end of the semester, it was commonly recognized that, although challenges existed, the collaboration was overall a success and they were all willing to continue with it and endeavor to refine it to be a more professional and productive approach
Filling Knowledge Gaps in a Broad-Coverage Machine Translation System
Knowledge-based machine translation (KBMT) techniques yield high quality in
domains with detailed semantic models, limited vocabulary, and controlled input
grammar. Scaling up along these dimensions means acquiring large knowledge
resources. It also means behaving reasonably when definitive knowledge is not
yet available. This paper describes how we can fill various KBMT knowledge
gaps, often using robust statistical techniques. We describe quantitative and
qualitative results from JAPANGLOSS, a broad-coverage Japanese-English MT
system.Comment: 7 pages, Compressed and uuencoded postscript. To appear: IJCAI-9
Towards a Universal Wordnet by Learning from Combined Evidenc
Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification
A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources.Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
- âŠ