Robust Tuning Datasets for Statistical Machine Translation
We explore the idea of automatically crafting a tuning dataset for
Statistical Machine Translation (SMT) that makes the hyper-parameters of the
SMT system more robust with respect to some specific deficiencies of the
parameter tuning algorithms. This is an under-explored research direction,
which can allow better parameter tuning. In this paper, we achieve this goal by
selecting a subset of the available sentence pairs, which are more suitable for
specific combinations of optimizers, objective functions, and evaluation
measures. We demonstrate the potential of the idea with the pairwise ranking
optimization (PRO) optimizer, which is known to yield translations that are too short.
We show that the learning problem can be alleviated by tuning on a subset of
the development set, selected based on sentence length. In particular, using
the longest 50% of the tuning sentences, we achieve a two-fold tuning speedup
and improvements in BLEU score that rival those of alternatives that fix
BLEU+1's smoothing instead. Comment: RANLP-201
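The length-based subset selection described above can be sketched as follows. This is a minimal illustration of the idea, not the paper's code; the function name and the source-side length criterion are assumptions.

```python
def select_longest_half(sentence_pairs):
    """Keep the longest 50% of tuning sentence pairs, ranked by
    source-side token count, as a heuristic against length-related
    tuning biases such as PRO's preference for short translations."""
    ranked = sorted(sentence_pairs,
                    key=lambda pair: len(pair[0].split()),
                    reverse=True)
    return ranked[: max(1, len(ranked) // 2)]

# Toy development set: (source, reference) pairs of varying length.
pairs = [
    ("a b c d e f", "x x x x x x"),
    ("a b", "x x"),
    ("a b c d", "x x x x"),
    ("a", "x"),
]
subset = select_longest_half(pairs)
```

The retained subset would then be passed to the tuning loop in place of the full development set, halving the number of decoded sentences per iteration.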
Cortical Representation Underlying the Semantic Processing of Numerical Symbols: Evidence from Adult and Developmental Studies
Humans possess the remarkable ability to process numerical information using numerical symbols such as Arabic digits. A growing body of neuroimaging work has provided new insights into the neural correlates associated with symbolic numerical magnitude processing. However, little is known about the cortical specialization underlying the representation of symbolic numerical magnitude in adults and children. To constrain our current knowledge, I conducted a series of functional Magnetic Resonance Imaging (fMRI) studies that aimed to better understand the functional specialization of symbolic numerical magnitude representation in the human brain.
Using a number line estimation task, the first study contrasted the brain activation associated with processing symbolic numerical magnitude against the brain activation associated with non-numerical magnitude (brightness) processing. Results demonstrated a right lateralized parietal network that was commonly engaged when magnitude dimensions were processed. However, the left intraparietal sulcus (IPS) was additionally activated when symbolic numerical magnitudes were estimated, suggesting that number is a special category amongst magnitude dimensions and that the left hemisphere plays a critical role in representing number.
The second study tested a child-friendly version of an fMRI-adaptation paradigm in adults. For this, participants’ brain responses were habituated to a numerical value (i.e., 6), and signal recovery in response to the presentation of numerical deviants was investigated. Across two different brain normalization procedures, results replicated previous findings demonstrating that the brain response of the IPS is modulated by the semantic meaning of numbers in the absence of overt response selection.
The last study aimed to unravel developmental changes in the cortical representation of symbolic numerical magnitudes in children. Using the paradigm tested in chapter 2, results demonstrated an increase in the signal recovery with age in the left IPS as well as an age-independent signal recovery in the right IPS. This finding indicates that the left IPS becomes increasingly specialized for the representation of symbolic numerical magnitudes over developmental time, while the right IPS may play a different and earlier role in symbolic numerical magnitude representation.
Findings of these studies are discussed in relation to our current knowledge about symbolic numerical magnitude representation.
Word Reordering in Statistical Machine Translation with a POS-Based Distortion Model
In this paper we describe a word reordering strategy for statistical machine translation that reorders the source side based on Part-of-Speech (POS) information. Reordering rules are learned from the word-aligned corpus. Reordering is integrated into the decoding process by constructing a lattice that contains all word reorderings licensed by the reordering rules, with probabilities assigned to the different reorderings; monotone decoding is then performed on this lattice. This reordering strategy is compared with our previous strategy, which considers all permutations within a sliding window. We further extend the reordering rules with context information. Phrase translation pairs are learned from the original corpus and from a reordered source corpus to better capture the reordered word sequences at decoding time. Results are presented for English → Spanish and German ↔ English translations, using the European Parliament Plenary Sessions corpus.
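The rule application step can be sketched as below. This is a simplified, hypothetical rule format (a POS pattern mapped to a permutation of its positions) applied greedily to a single sentence; the paper instead encodes all rule applications as alternative paths in a lattice.

```python
# Hypothetical learned rules: POS-tag pattern -> permutation of positions.
# E.g., swap an adjective-noun pair for an English -> Spanish system.
rules = {("ADJ", "NOUN"): (1, 0)}

def apply_rules(words, tags, rules, max_len=2):
    """Apply the first matching POS rule at each position, left to right.
    A greedy stand-in for the lattice construction used in the paper."""
    out = list(words)
    i = 0
    while i < len(tags):
        for n in range(max_len, 1, -1):
            perm = rules.get(tuple(tags[i:i + n]))
            if perm is not None:
                out[i:i + n] = [out[i + j] for j in perm]
                i += n
                break
        else:
            i += 1  # no rule matched here
    return out

reordered = apply_rules(["a", "red", "car"], ["DET", "ADJ", "NOUN"], rules)
```

In the full system, each applicable rule contributes an edge with its learned probability, and the decoder translates the resulting lattice monotonically.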
Consensus Versus Expertise: A Case Study of Word Alignment with Mechanical Turk
Word alignment is an important preprocessing step for machine translation. This project aims at incorporating manual alignments from Amazon Mechanical Turk (MTurk) to help improve word alignment quality. As a global crowdsourcing service, MTurk can provide a flexible and abundant labor force and therefore reduce the cost of obtaining labels. An easy-to-use interface was developed to simplify the labeling process. We compare the alignments produced by Turkers to those produced by experts, and incorporate the alignments into a semi-supervised word alignment tool to improve the quality of the labels. We also compared two pricing strategies for the word alignment task. Experimental results show high precision for the alignments provided by Turkers, and the semi-supervised approach achieved a 0.5% absolute reduction in alignment error rate.
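One natural way to aggregate several Turkers' annotations of the same sentence pair is a majority vote over alignment links. The sketch below assumes each annotation is a set of (source index, target index) links and keeps links proposed by a minimum number of annotators; the voting scheme and threshold are illustrative assumptions, not the paper's exact procedure.

```python
from collections import Counter

def consensus_alignment(turker_alignments, min_votes=2):
    """Keep alignment links proposed by at least min_votes annotators."""
    votes = Counter()
    for links in turker_alignments:
        votes.update(links)
    return {link for link, n in votes.items() if n >= min_votes}

# Three Turkers aligning the same sentence pair (toy data).
a1 = {(0, 0), (1, 1), (2, 2)}
a2 = {(0, 0), (1, 2), (2, 2)}
a3 = {(0, 0), (1, 1)}
cons = consensus_alignment([a1, a2, a3])
```

The consensus links could then be fed as labeled data into a semi-supervised aligner, as the abstract describes.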
EMDC: A Semi-supervised Approach for Word Alignment
This paper proposes a novel semi-supervised word alignment technique called EMDC that integrates discriminative and generative methods. A discriminative aligner is used to find high-precision partial alignments that serve as constraints for a generative aligner, which implements a constrained version of the EM algorithm. Experiments on small-size Chinese and Arabic tasks show consistent improvements in AER. We also experimented with moderate-size Chinese machine translation tasks and obtained an average improvement of 0.5 BLEU points across five standard NIST test sets and four other test sets.
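The constrained E-step idea can be sketched in IBM Model 1 terms: target positions covered by the high-precision partial alignment receive all posterior mass on the fixed source word, while other positions use the usual normalized lexical posteriors. This is an illustrative sketch under those assumptions, not the EMDC implementation.

```python
def constrained_em_estep(src, tgt, t_prob, fixed_links):
    """One constrained Model-1 E-step. fixed_links is a list of
    (tgt_pos, src_pos) pairs from a discriminative aligner; t_prob
    maps (tgt_word, src_word) to the lexical translation probability."""
    fixed = dict(fixed_links)  # tgt position -> fixed src position
    posteriors = []
    for j, f in enumerate(tgt):
        if j in fixed:
            # Constraint: all posterior mass on the fixed link.
            row = [1.0 if i == fixed[j] else 0.0 for i in range(len(src))]
        else:
            scores = [t_prob.get((f, e), 1e-6) for e in src]
            z = sum(scores)
            row = [s / z for s in scores]
        posteriors.append(row)
    return posteriors

src = ["the", "cat"]
tgt = ["le", "chat"]
t_prob = {("le", "the"): 0.9, ("chat", "cat"): 0.8, ("chat", "the"): 0.2}
post = constrained_em_estep(src, tgt, t_prob, fixed_links=[(0, 0)])
```

The M-step would then re-estimate t_prob from these posteriors as in standard EM, with the constrained positions acting as supervised anchors.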
Word Alignment Based On Bilingual Bracketing
In this paper, an improved word alignment method based on bilingual bracketing is described. The explored approaches include using Model-1 conditional probability, a boosting strategy for lexicon probabilities based on importance sampling, applying Part-of-Speech tags to discriminate English words, and incorporating information from English base noun phrases. Results for the shared task on French-English, Romanian-English, and Chinese-English word alignment are presented and discussed.
Combination of Machine Translation Systems via Hypothesis Selection from Combined n-best lists
Different approaches to machine translation achieve similar translation quality while producing a variety of translations in the output. Recently, it has been shown that it is possible to leverage the individual strengths of various systems and improve the overall translation quality by combining translation outputs. In this paper we present a method of hypothesis selection that is relatively simple compared to system combination methods that construct a synthesis of the input hypotheses. Our method uses information from the n-best lists of several MT systems together with sentence-level features that are independent of the MT systems involved to improve translation quality.
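The selection step can be sketched as pooling all n-best lists and ranking the pooled hypotheses with a system-independent score. The particular scoring mix below (a language-model score minus a deviation-from-average-length penalty) is a hypothetical stand-in for the sentence-level features used in the paper.

```python
def select_hypothesis(nbest_lists, lm_score, length_penalty=0.1):
    """Pool n-best hypotheses from several MT systems and return the one
    with the best system-independent score. lm_score maps a hypothesis
    string to a (log-domain) language model score."""
    pool = [hyp for nbest in nbest_lists for hyp in nbest]
    avg_len = sum(len(h.split()) for h in pool) / len(pool)

    def score(hyp):
        # Penalize hypotheses whose length deviates from the pool average.
        return lm_score(hyp) - length_penalty * abs(len(hyp.split()) - avg_len)

    return max(pool, key=score)

# Toy n-best lists from two systems, with stand-in LM scores.
sys1 = ["this is a test", "this is test"]
sys2 = ["this was a test"]
scores = {"this is a test": -2.0, "this is test": -5.0, "this was a test": -3.0}
best = select_hypothesis([sys1, sys2], scores.get)
```

Because the features depend only on the hypothesis text, the same selector works regardless of how many or which systems contributed to the pool.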