11,737 research outputs found
A discriminative latent variable-based "DE" classifier for Chinese–English SMT
Syntactic reordering on the source-side
is an effective way of handling word order
differences. The (DE) construction
is a flexible and ubiquitous syntactic
structure in Chinese which is a major
source of error in translation quality.
In this paper, we propose a new classifier
model — discriminative latent variable
model (DPLVM) — to classify the
DE construction to improve the accuracy
of the classification and hence the translation
quality. We also propose a new feature
which can automatically learn the reordering
rules to a certain extent. The experimental
results show that the MT systems
using the data reordered by our proposed
model outperform the baseline systems
by 6.42% and 3.08% relative points
in terms of the BLEU score on PB-SMT
and hierarchical phrase-based MT respectively.
In addition, we analyse the impact
of DE annotation on word alignment and
on the SMT phrase table
IMPROVING MOLECULAR FINGERPRINT SIMILARITY VIA ENHANCED FOLDING
Drug discovery depends on scientists finding similarity in molecular fingerprints to the drug target. A new way to improve the accuracy of molecular fingerprint folding is presented. The goal is to alleviate a growing challenge due to excessively long fingerprints. This improved method generates a new shorter fingerprint that is more accurate than the basic folded fingerprint. Information gathered during preprocessing is used to determine an optimal attribute order. The most commonly used blocks of bits can then be organized and used to generate a new improved fingerprint for more optimal folding. We thenapply the widely usedTanimoto similarity search algorithm to benchmark our results. We show an improvement in the final results using this method to generate an improved fingerprint when compared against other traditional folding methods
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic
Improved phrase-based SMT with syntactic reordering patterns learned from lattice scoring
In this paper, we present a novel approach to incorporate source-side syntactic reordering patterns into phrase-based SMT. The main contribution of this work is to use the lattice scoring approach to exploit and utilize reordering
information that is favoured by the baseline PBSMT system. By referring to the parse trees of the training corpus, we represent the observed reorderings with source-side
syntactic patterns. The extracted patterns are then used to convert the parsed inputs into word lattices, which contain both the original source sentences and their potential reorderings. Weights of the word lattices are estimated from the observations of the syntactic reordering patterns in the training corpus. Finally, the PBSMT system is tuned
and tested on the generated word lattices to show the benefits of adding potential sourceside reorderings in the inputs. We confirmed the effectiveness of our proposed method on a medium-sized corpus for Chinese-English
machine translation task. Our method outperformed the baseline system by 1.67% relative on a randomly selected testset and 8.56% relative on the NIST 2008 testset in terms of BLEU score
Linguistic Structure in Statistical Machine Translation
This thesis investigates the influence of linguistic structure in statistical machine translation. We develop a word reordering model based on syntactic parse trees and address the issues of pronouns and morphological agreement with a source discriminative word lexicon predicting the translation for individual words using structural features. When used in phrase-based machine translation, the models improve the translation for language pairs with different word order and morphological variation
Description of the Chinese-to-Spanish rule-based machine translation system developed with a hybrid combination of human annotation and statistical techniques
Two of the most popular Machine Translation (MT) paradigms are rule based (RBMT) and corpus based, which include the statistical systems (SMT). When scarce parallel corpus is available, RBMT becomes particularly attractive. This is the case of the Chinese--Spanish language pair.
This article presents the first RBMT system for Chinese to Spanish. We describe a hybrid method for constructing this system taking advantage of available resources such as parallel corpora that are used to extract dictionaries and lexical and structural transfer rules.
The final system is freely available online and open source. Although performance lags behind standard SMT systems for an in-domain test set, the results show that the RBMT’s coverage is competitive and it outperforms the SMT system in an out-of-domain test set. This RBMT system is available to the general public, it can be further enhanced, and it opens up the possibility of creating future hybrid MT systems.Peer ReviewedPostprint (author's final draft
Timesharing in relation to broad ability domains
[Abstract]: The concept of a timesharing ability has been the subject of considerable interest in recent times. The present study set out to determine whether a timesharing factor can be identi¬fied when a number of competing tasks are presented in the midst of a range of single tests designed to sample a broad range of psychological dimensions. Evidence for the existence of such a factor would form an important addition to our knowledge of human cognitive functioning.
The framework for the study was provided by the theory of fluid and crystallized intelligence. A battery of single and competing tasks was presented to 126 subjects. The competing tasks represented a variety of within- and across-factor combinations from different levels of the (Gf/Gc) hierarchy. Modality of presentation was also varied in some combinations. On the basis of evidence presented in this study, it would be pre¬mature to seek to include a timesharing factor in the (Gf/Gc) model of intelligence
- …