Search CORE

1,551 research outputs found

Incorporating intra-query term dependencies in an Aspect Query Language Model

Author: Bruza
Creighton
Hiemstra
Hipp
Luo
Manning
Rabiner
Song
Srivastava
Zhao
Publication venue: 'Wiley'
Publication date: 26/11/2014
Field of study

Query language modeling based on relevance feedback has been widely applied to improve the effectiveness of information retrieval. However, intra-query term dependencies (i.e., the dependencies between different query terms and term combinations) have not yet been sufficiently addressed in the existing approaches. This paper aims to investigate this issue within a comprehensive framework, namely the Aspect Query Language Model (AM). We propose to extend the AM with a Hidden Markov Model (HMM) structure, to incorporate the intra-query term dependencies and learn the structure of a novel Aspect Hidden Markov Model (AHMM) for query language modeling. In the proposed AHMM, the combinations of query terms are viewed as latent variables representing query aspects. They further form an Ergodic HMM, where the dependencies between latent variables (nodes) are modelled as the transitional probabilities. The segmented chunks from the feedback documents are considered as observables of the HMM. Then the AHMM structure is optimized by the HMM, which can estimate the prior of the latent variables and the probability distribution of the observed chunks. Our extensive experiments on three large scale TREC collections have shown that our method not only significantly outperforms a number of strong baselines in terms of both effectiveness and robustness, but also achieves better results than the AM and another state-of-the-art approach, namely the Latent Concept Expansion (LCE) model

University of Essex Research Repository

University of Regensburg Publication Server

Crossref

Open Research Online (The Open University)

The acquisition of morphosyntactic agreement in the interlanguage system of AFL learners in Ghana

Author: Husein Alhassan Abdur-Rahim
Publication venue: AUC Knowledge Fountain
Publication date: 01/02/2013
Field of study

Despite the relevance of agreement structures in constructing the interlanguage (IL) system of the L2 learner, not much research has been conducted in this area on Arabic language learners. This study investigated the acquisition of morphosyntactic agreement structures by Arabic as Foreign Language (AFL) learners in Ghana, using the Processability Theory (PT) formulated in Pienemann (1998, 2005). The theory predicts cross-linguistic developmental routes for the acquisition of grammatical structures. A cross-sectional study was performed in order to test the theory. Data were elicited from 15 participants from the University of Ghana, Legon using Grammaticality Judgment Task and Elicited Production Task. Five Arabic morphosyntactic agreement structures at the phrasal, inter-phrasal and subordinate clause processing procedure stages of Pienemann\u27 s implicational hierarchy were tested. The data collected were analysed by using distributional analysis, a pre-defined emergence criterion and implicational scaling. The results of the study suggest that: (1) acquisition of agreement structures by AFL learners in Ghana seems to develop, generaly, according to PT\u27s predictions; (2) there is enough evidence for the stability of developmental stages. In effect, that seems to confirm the cross-linguistic plausibility of the theory and (3) no significant differences were found in the acquisition of the Noun Predicative Adjective (an inter-phrasal structure) among all the participants. These findings were discussed in the light of L1 transfer and variation and processing constraints. The study highlights the importance of teaching L2 learners structures that they are cognitively and developmentally ready to process so that the entire teaching practice would be beneficial. Otherwise, learners IL development becomes stagnated, teaching becomes ineffective and precious classroom time is wasted, eventually

AUC Knowledge Fountain (American Univ. in Cairo)

The integration of machine translation and translation memory

Author: He Yifan
Publication venue: Dublin City University. School of Computing
Publication date: 01/11/2011
Field of study

We design and evaluate several models for integrating Machine Translation (MT) output into a Translation Memory (TM) environment to facilitate the adoption of MT technology in the localization industry. We begin with the integration on the segment level via translation recommendation and translation reranking. Given an input to be translated, our translation recommendation model compares the output from the MT and the TMsystems, and presents the better one to the post-editor. Our translation reranking model combines k-best lists from both systems, and generates a new list according to estimated post-editing effort. We perform both automatic and human evaluation on these models. When measured against the consensus of human judgement, the recommendation model obtains 0.91 precision at 0.93 recall, and the reranking model obtains 0.86 precision at 0.59 recall. The high precision of these models indicates that they can be integrated into TM environments without the risk of deteriorating the quality of the post-editing candidate, and can thereby preserve TM assets and established cost estimation methods associated with TMs. We then explore methods for a deeper integration of translation memory and machine translation on the sub-segment level. We predict whether phrase pairs derived from fuzzy matches could be used to constrain the translation of an input segment. Using a series of novel linguistically-motivated features, our constraints lead both to more consistent translation output, and to improved translation quality, reflected by a 1.2 improvement in BLEU score and a 0.72 reduction in TER score, both of statistical significance (p < 0.01). In sum, we present our work in three aspects: 1) translation recommendation and translation reranking models that can access high quality MT outputs in the TMenvironment, 2) a sub-segment translation memory and machine translation integration model that improves both translation consistency and translation quality, and 3) a human evaluation pipeline to validate the effectiveness of our models with human judgements

DCU Online Research Access Service

Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

Author: EHRMANN MAUD
TURCHI MARCO
Publication venue: Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Mexico
Publication date: 09/08/2011
Field of study

Translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system mostly depends on parallel data and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data but using external morphological resources. A set of new phrase associations is added to translation and reordering models; each of them corresponds to a morphological variation of the source/target/both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations and results showed improvements of the performance in terms of automatic scores (BLEU and Meteor) and reduction of out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.JRC.G.2-Global security and crisis managemen

JRC Publications Repository

A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

Author: Bisazza Arianna
Federico Marcello
Publication venue: 'MIT Press - Journals'
Publication date: 14/03/2016
Field of study

Word reordering is one of the most difficult aspects of statistical machine translation (SMT), and an important factor of its quality and efficiency. Despite the vast amount of research published to date, the interest of the community in this problem has not decreased, and no single method appears to be strongly dominant across language pairs. Instead, the choice of the optimal approach for a new translation task still seems to be mostly driven by empirical trials. To orientate the reader in this vast and complex research area, we present a comprehensive survey of word reordering viewed as a statistical modeling challenge and as a natural language phenomenon. The survey describes in detail how word reordering is modeled within different string-based and tree-based SMT frameworks and as a stand-alone task, including systematic overviews of the literature in advanced reordering modeling. We then question why some approaches are more successful than others in different language pairs. We argue that, besides measuring the amount of reordering, it is important to understand which kinds of reordering occur in a given language pair. To this end, we conduct a qualitative analysis of word reordering phenomena in a diverse sample of language pairs, based on a large collection of linguistic knowledge. Empirical results in the SMT literature are shown to support the hypothesis that a few linguistic facts can be very useful to anticipate the reordering characteristics of a language pair and to select the SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Fondazione Bruno Kessler

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Recommended from our members

Adapting Automatic Summarization to New Sources of Information

Author: Ouyang Jessica Jin
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019
Field of study

English-language news articles are no longer necessarily the best source of information. The Web allows information to spread more quickly and travel farther: first-person accounts of breaking news events pop up on social media, and foreign-language news articles are accessible to, if not immediately understandable by, English-speaking users. This thesis focuses on developing automatic summarization techniques for these new sources of information. We focus on summarizing two specific new sources of information: personal narratives, first-person accounts of exciting or unusual events that are readily found in blog entries and other social media posts, and non-English documents, which must first be translated into English, often introducing translation errors that complicate the summarization process. Personal narratives are a very new area of interest in natural language processing research, and they present two key challenges for summarization. First, unlike many news articles, whose lead sentences serve as summaries of the most important ideas in the articles, personal narratives provide no such shortcuts for determining where important information occurs in within them; second, personal narratives are written informally and colloquially, and unlike news articles, they are rarely edited, so they require heavier editing and rewriting during the summarization process. Non-English documents, whether news or narrative, present yet another source of difficulty on top of any challenges inherent to their genre: they must be translated into English, potentially introducing translation errors and disfluencies that must be identified and corrected during summarization. The bulk of this thesis is dedicated to addressing the challenges of summarizing personal narratives found on the Web. We develop a two-stage summarization system for personal narrative that first extracts sentences containing important content and then rewrites those sentences into summary-appropriate forms. Our content extraction system is inspired by contextualist narrative theory, using changes in writing style throughout a narrative to detect sentences containing important information; it outperforms both graph-based and neural network approaches to sentence extraction for this genre. Our paraphrasing system rewrites the extracted sentences into shorter, standalone summary sentences, learning to mimic the paraphrasing choices of human summarizers more closely than can traditional lexicon- or translation-based paraphrasing approaches. We conclude with a chapter dedicated to summarizing non-English documents written in low-resource languages – documents that would otherwise be unreadable for English-speaking users. We develop a cross-lingual summarization system that performs even heavier editing and rewriting than does our personal narrative paraphrasing system; we create and train on large amounts of synthetic errorful translations of foreign-language documents. Our approach produces fluent English summaries from disdisfluent translations of non-English documents, and it generalizes across languages

Columbia University Academic Commons

A study of the company kept by a selection of English delexical verbs and the implications for the teaching of English in Hong Kong.

Author: May YungFan
Publication venue
Publication date: 01/01/1991
Field of study

Durham e-Theses

Proceedings

Author: Dickinson Markus
Müürisep Kaili
Passarotti Marco
Publication venue
Publication date: 01/12/2010
Field of study

Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 268 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

DSpace at Tartu University Library