16,555 research outputs found
Source-side context-informed hypothesis alignment for combining outputs from machine translation systems
This paper presents a new hypothesis alignment method for combining outputs of multiple machine translation (MT) systems. Traditional hypothesis alignment algorithms such
as TER, HMM and IHMM do not directly utilise the context information of the source side but rather address the alignment issues via the output data itself. In this paper, a source-side context-informed (SSCI) hypothesis alignment method is proposed to carry out the word alignment and word reordering issues. First of all, the source–target word alignment links are produced as the hidden variables by exporting source phrase spans during the translation decoding process. Secondly, a mapping strategy and normalisation model are employed to acquire the 1-
to-1 alignment links and build the confusion network (CN). The source-side context-based method outperforms the state-of-the-art TERbased alignment model in our experiments
on the WMT09 English-to-French and NIST Chinese-to-English data sets respectively. Experimental results demonstrate that our proposed approach scores consistently among the
best results across different data and language pair conditions
An incremental three-pass system combination framework by combining multiple hypothesis alignment methods
System combination has been applied successfully to various machine translation tasks in recent years. As is known, the hypothesis alignment method is a critical factor for the
translation quality of system combination. To date, many effective hypothesis alignment metrics have been proposed and applied to the system combination, such as TER, HMM,
ITER, IHMM, and SSCI. In addition, Minimum Bayes-risk (MBR) decoding and confusion networks (CN) have become state-of-the-art techniques in system combination. In this paper,
we examine different hypothesis alignment approaches and investigate how much the hypothesis alignment results impact on system combination, and finally present a three-pass system combination strategy that can combine hypothesis alignment results derived from multiple alignment metrics to generate a better translation. Firstly, these different alignment metrics are carried out to align the backbone and hypotheses, and the individual CNs are built corresponding to each set of alignment results; then we construct a ‘super network’ by merging the multiple metric-based CNs to generate a consensus output. Finally a modified MBR network approach is employed to find the best overall translation. Our proposed strategy outperforms the best single confusion network as well as the best single system in our experiments on the NIST Chinese-to-English test set and the WMT2009 English-to-French system combination shared test set
Personalized Fuzzy Text Search Using Interest Prediction and Word Vectorization
In this paper we study the personalized text search problem. The keyword
based search method in conventional algorithms has a low efficiency in
understanding users' intention since the semantic meaning, user profile, user
interests are not always considered. Firstly, we propose a novel text search
algorithm using a inverse filtering mechanism that is very efficient for label
based item search. Secondly, we adopt the Bayesian network to implement the
user interest prediction for an improved personalized search. According to user
input, it searches the related items using keyword information, predicted user
interest. Thirdly, the word vectorization is used to discover potential targets
according to the semantic meaning. Experimental results show that the proposed
search engine has an improved efficiency and accuracy and it can operate on
embedded devices with very limited computational resources
- …