11 research outputs found
Syntax-Directed Attention for Neural Machine Translation
The attention mechanism, including global attention and local attention, plays a
key role in neural machine translation (NMT). Global attention attends to all
source words for word prediction. In comparison, local attention selectively
looks at fixed-window source words. However, alignment weights for the current
target word often decrease to the left and right by linear distance centering
on the aligned source position and neglect syntax-directed distance
constraints. In this paper, we extend local attention with a syntax-distance
constraint to focus on source words that are syntactically related to the predicted
target word, thus learning a more effective context vector for word prediction.
Moreover, we further propose a double context NMT architecture, which consists
of a global context vector and a syntax-directed context vector over the global
attention, to further improve NMT translation performance from the source
representation. Experiments on large-scale Chinese-to-English and
English-to-German translation tasks show that the proposed approach achieves a
substantial and significant improvement over the baseline system. Comment: AAAI2018, revised version
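The idea of weighting attention by syntactic rather than linear distance can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian damping with width `sigma`, and the final renormalization are all assumptions made for illustration. Syntactic distance is taken here to be the number of dependency edges between a source word and the aligned source position.

```python
import math
from collections import deque


def tree_distances(heads, start):
    """Count dependency edges from word `start` to every source word.

    heads[i] is the index of word i's head, or -1 for the root.
    Assumes `heads` describes a single connected dependency tree.
    """
    neighbours = [[] for _ in heads]
    for child, head in enumerate(heads):
        if head >= 0:
            neighbours[child].append(head)
            neighbours[head].append(child)
    dist = [None] * len(heads)
    dist[start] = 0
    queue = deque([start])
    while queue:  # breadth-first search over the undirected tree
        node = queue.popleft()
        for nxt in neighbours[node]:
            if dist[nxt] is None:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def syntax_directed_weights(scores, heads, aligned, sigma=1.0):
    """Softmax the raw scores, then damp each weight by a Gaussian of its
    syntactic distance to the aligned position (hypothetical formulation)."""
    dist = tree_distances(heads, aligned)
    peak = max(scores)  # subtract the maximum for numerical stability
    exp_scores = [math.exp(s - peak) for s in scores]
    total = sum(exp_scores)
    damped = [
        (e / total) * math.exp(-(d * d) / (2.0 * sigma * sigma))
        for e, d in zip(exp_scores, dist)
    ]
    norm = sum(damped)  # renormalizing is an assumption of this sketch
    return [w / norm for w in damped]
```

For a four-word sentence with `heads = [2, 2, -1, 2]` and the aligned position at word 0, word 2 is linearly farther from word 0 than word 1 is, but syntactically closer, so with equal raw scores it receives the larger weight.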
Designing Ranking System for Chinese Product Search Engine Based on Customer Reviews
With the spread of e-commerce platforms, it has become extremely difficult for customers to choose the right product from a large number of products and sellers based only on their own experience, product pictures, and metadata. Customer reviews are a rich source of information that has an enormous impact on the purchasing decisions of potential consumers, but reading all of the available reviews is difficult and time-consuming. Thus, automatically mining these reviews and extracting product features in order to generate a ranking system provides a valuable tool that helps consumers make well-informed decisions. In this paper, we propose a product search ranking mechanism based on customer reviews written in Chinese. We score each product using the features extracted from the reviews, and we develop a ranking function. The proposed approach was evaluated using customer reviews of two famous mobile phone brands, Apple and Samsung, from taobao.com. The evaluation shows promising results compared to existing systems
Chinese Textual Entailment with Wordnet Semantic and Dependency Syntactic Analysis
Chun Tu and Min-Yuh Day (2013), "Chinese Textual Entailment with Wordnet Semantic and Dependency Syntactic Analysis", 2013 IEEE International Workshop on Empirical Methods for Recognizing Inference in Text (IEEE EM-RITE 2013), August 14, 2013, in Proceedings of the IEEE International Conference on Information Reuse and Integration (IEEE IRI 2013), San Francisco, California, USA, August 14-16, 2013, pp. 69-74.
Recognizing Inference in TExt (RITE) is a task for automatically detecting entailment, paraphrase, and contradiction in texts, which addresses major text understanding needs in information access research areas. In this paper, we propose a Chinese textual entailment system that uses Wordnet semantic and dependency syntactic approaches for RITE. Wordnet is used to recognize entailment at the lexical level. The dependency syntactic approach is a tree edit distance algorithm applied to the dependency trees of both the text and the hypothesis. We thoroughly evaluate our approach using the NTCIR-10 RITE-2 subtask datasets. Our system achieved 73.28% on the Traditional Chinese Binary-Class (BC) subtask and 74.57% on the Simplified Chinese Binary-Class subtask with the NTCIR-10 RITE-2 development datasets. Thorough experiments with the text fragments provided by the NTCIR-10 RITE-2 subtask showed that the proposed approach can improve the system's overall accuracy.
Hong Kong identity and the press-politics dynamics: a corpus-assisted discourse study
This article uses corpus-assisted discourse studies to examine the discursive construction of collective identity in the Hong Kong press, particularly with regard to its relationship with mainland China. Drawing on critical discourse analysis and collective identity theory, it develops a set of analytical techniques amenable to quantification, and applies it to a corpus of newspaper coverage of the 2005 Hong Kong Chief Executive election. The article uses these techniques not only to develop a representative description and interpretation of discursive patterns, but also to offer an explanatory account of the discursive construction of Hong Kong identity. It shows that the discursive patterns vary systematically depending on newspaper ownership, commercial imperatives, and newspaper type
Investigating the Relationship between Classification Quality and SMT Performance in Discriminative Reordering Models
Reordering is one of the most important factors affecting the quality of the output in
statistical machine translation (SMT). A considerable number of the approaches proposed to address
the reordering problem are discriminative reordering models (DRMs). The core component of the
DRMs is a classifier which tries to predict the correct word order of the sentence. Unfortunately,
the relationship between classification quality and ultimate SMT performance has not been
investigated to date. Understanding this relationship will allow researchers to select the classifier that
results in the best possible MT quality. It might be assumed that there is a monotonic relationship
between classification quality and SMT performance, i.e., any improvement in classification
performance will be monotonically reflected in overall SMT quality. In this paper, we experimentally
show that this assumption does not always hold, i.e., an improvement in classification performance
might actually degrade the quality of an SMT system, from the point of view of MT automatic
evaluation metrics. However, we show that if the improvement in the classification performance is
high enough, we can expect the SMT quality to improve as well. In addition to this, we show that
there is a negative relationship between classification accuracy and SMT performance in imbalanced
parallel corpora. For these types of corpora, we provide evidence that, for the evaluation of the
classifier, macro-averaged metrics such as macro-averaged F-measure are better suited than accuracy,
the metric commonly used to date
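A small worked example may help show why accuracy and macro-averaged F-measure can disagree on imbalanced data. The sketch below is illustrative only and uses no data from the paper: the orientation labels `"mono"` and `"swap"` and both sets of predictions are hypothetical.

```python
def accuracy(gold, pred):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(gold) | set(pred))
    f_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f_scores.append(2 * precision * recall / (precision + recall))
        else:
            f_scores.append(0.0)  # class never predicted correctly
    return sum(f_scores) / len(f_scores)


# Hypothetical imbalanced test set: nine "mono" items and one "swap" item.
gold = ["mono"] * 9 + ["swap"]
# Classifier A always predicts the majority class.
majority = ["mono"] * 10
# Classifier B finds the rare class at the cost of two false alarms.
mixed = ["mono"] * 7 + ["swap"] * 3
```

Here the majority-class classifier reaches an accuracy of 0.9 against 0.8 for the mixed one, yet its macro-averaged F1 is about 0.47 against about 0.69, because it never identifies the rare class.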
Getting Past the Language Gap: Innovations in Machine Translation
In this chapter, we review state-of-the-art machine translation systems and discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful, complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods were introduced by Chinese researchers, which allowed the introduction and use of syntactic information in translation modeling. Furthermore, advances in the related field of computational linguistics, which made off-the-shelf taggers and parsers readily available, helped give MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board; this still presents a major challenge, though many advanced research groups are currently pursuing ways to meet it head-on. The next generation of MT will consist of a collection of hybrid systems. It also augurs well for the mobile environment, as we look forward to more advanced and improved technologies that enable Speech-To-Speech machine translation on hand-held devices, i.e., speech recognition and speech synthesis. We review all of these developments and point out in the final section some of the most promising research avenues for the future of MT