
11 research outputs found

Syntax-Directed Attention for Neural Machine Translation

Author: Chen Kehai
Sumita Eiichiro
Utiyama Masao
Wang Rui
Zhao Tiejun
Publication date: 26/04/2018

The attention mechanism, including global attention and local attention, plays a key role in neural machine translation (NMT). Global attention attends to all source words for word prediction. In comparison, local attention selectively looks at source words within a fixed window. However, alignment weights for the current target word often decrease to the left and right by linear distance centering on the aligned source position and neglect syntax-directed distance constraints. In this paper, we extend local attention with a syntax-distance constraint, to focus on source words syntactically related to the predicted target word, thus learning a more effective context vector for word prediction. Moreover, we further propose a double context NMT architecture, which consists of a global context vector and a syntax-directed context vector over the global attention, to improve translation performance from the source representation. The experiments on the large-scale Chinese-to-English and English-to-German translation tasks show that the proposed approach achieves a substantial and significant improvement over the baseline system.

Comment: AAAI2018, revised version
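The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea: attention weights are damped by a Gaussian of the distance to the aligned source position, and that distance can be either linear (word offset) or syntactic (steps in a parse tree). The function name `distance_weighted_attention`, the renormalization step, the value of `sigma`, and the toy distance lists are all assumptions made for illustration, not the authors' implementation.

```python
import math

def distance_weighted_attention(scores, distances, sigma):
    """Softmax the raw scores, damp each weight by a Gaussian of its
    distance to the aligned position, then renormalize (an assumption)."""
    max_score = max(scores)
    exp_scores = [math.exp(s - max_score) for s in scores]
    total = sum(exp_scores)
    softmax = [e / total for e in exp_scores]
    damped = [
        w * math.exp(-(d * d) / (2 * sigma * sigma))
        for w, d in zip(softmax, distances)
    ]
    norm = sum(damped)
    return [w / norm for w in damped]

# Hypothetical five-word source sentence, aligned source position = index 2.
scores = [0.0, 0.0, 0.0, 0.0, 0.0]       # uniform raw scores for clarity
linear_distances = [2, 1, 0, 1, 2]       # word offsets from position 2
syntax_distances = [1, 3, 0, 1, 2]       # made-up parse-tree distances

linear_weights = distance_weighted_attention(scores, linear_distances, sigma=1.0)
syntax_weights = distance_weighted_attention(scores, syntax_distances, sigma=1.0)
```

In this toy setting the linear variant favors the immediate neighbor at index 1 over the word at index 0, while the syntactic variant does the opposite, because index 0 is assumed to be closer in the tree.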

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Designing Ranking System for Chinese Product Search Engine Based on Customer Reviews

Author: Chaabna Samira
Lutf Mohammed
Wang Hu
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/06/2014

With the spread of e-commerce platforms, it becomes extremely difficult for the customer to choose the right product from a large number of products and different sellers based only on his/her own experience, product pictures and metadata. Customer reviews present a rich source of information that has an enormous impact on the purchasing decisions of potential consumers, but reading all of the available reviews is a hard and time-consuming task. Thus, automatically mining these reviews and extracting product features in order to generate a ranking system presents a valuable and useful tool for consumers to make well-informed decisions. In this paper, we propose a product search ranking mechanism based on customer reviews written in Chinese. We score each product using the features extracted from the reviews. Also, a ranking function has been developed. The proposed approach was evaluated using customer reviews of two famous brands of mobile phones, Apple and Samsung, from taobao.com. The evaluation shows promising results compared to existing systems

AIS Electronic Library (AISeL)

Chinese Textual Entailment with Wordnet Semantic and Dependency Syntactic Analysis

Author: Min-Yuh Day (corresponding author)
Tu Chun
Publication venue: IEEE Press

Chun Tu and Min-Yuh Day (2013), "Chinese Textual Entailment with Wordnet Semantic and Dependency Syntactic Analysis", 2013 IEEE International Workshop on Empirical Methods for Recognizing Inference in Text (IEEE EM-RITE 2013), August 14, 2013, in Proceedings of the IEEE International Conference on Information Reuse and Integration (IEEE IRI 2013), San Francisco, California, USA, August 14-16, 2013, pp. 69-74.

Abstract: Recognizing Inference in TExt (RITE) is a task for automatically detecting entailment, paraphrase, and contradiction in texts, which addresses major text understanding in information access research areas. In this paper, we propose a Chinese textual entailment system using Wordnet semantic and dependency syntactic approaches for Recognizing Inference in Text (RITE), using the NTCIR-10 RITE-2 subtask datasets. Wordnet is used to recognize entailment at the lexical level. The dependency syntactic approach is a tree edit distance algorithm applied to the dependency trees of both the text and the hypothesis. We thoroughly evaluate our approach using the NTCIR-10 RITE-2 subtask datasets. As a result, our system achieved 73.28% on the Traditional Chinese Binary-Class (BC) subtask and 74.57% on the Simplified Chinese Binary-Class subtask with the NTCIR-10 RITE-2 development datasets. Thorough experiments with the text fragments provided by the NTCIR-10 RITE-2 subtask showed that the proposed approach can improve the system's overall accuracy.

Sponsorship: IEEE. Citation index: EI. Conference type: international. Conference dates: 14-16 August 2013. Book type: electronic edition. Call for papers: yes. Conference location: San Francisco, US

Tamkang University Institutional Repository

Hong Kong identity and the press-politics dynamics: a corpus-assisted discourse study

Author: Mengmeng Zhang
Sabina Mihelj
Publication date: 01/01/2012

This article uses corpus-assisted discourse studies to examine the discursive construction of collective identity in the Hong Kong press, particularly with regard to its relationship with mainland China. Drawing on critical discourse analysis and collective identity theory, it develops a set of analytical techniques amenable to quantification, and applies it to a corpus of newspaper coverage of the 2005 Hong Kong Chief Executive election. The article uses these techniques not only to develop a representative description and interpretation of discursive patterns, but also to offer an explanatory account of the discursive construction of Hong Kong identity. It shows that the discursive patterns vary systematically depending on newspaper ownership, commercial imperatives, and newspaper type

Loughborough University Institutional Repository

Investigating the Relationship between Classification Quality and SMT Performance in Discriminative Reordering Models

Author: Kazemi Arefeh
Monadjemi Amirhassan
Nematbakhsh Mohammadali
Toral Antonio
Way Andy
Publication date: 24/07/2017

Reordering is one of the most important factors affecting the quality of the output in statistical machine translation (SMT). A considerable number of the approaches proposed to address the reordering problem are discriminative reordering models (DRMs). The core component of a DRM is a classifier that tries to predict the correct word order of the sentence. Unfortunately, the relationship between classification quality and ultimate SMT performance has not been investigated to date. Understanding this relationship will allow researchers to select the classifier that results in the best possible MT quality. It might be assumed that there is a monotonic relationship between classification quality and SMT performance, i.e., any improvement in classification performance will be monotonically reflected in overall SMT quality. In this paper, we experimentally show that this assumption does not always hold, i.e., an improvement in classification performance might actually degrade the quality of an SMT system, from the point of view of MT automatic evaluation metrics. However, we show that if the improvement in the classification performance is high enough, we can expect the SMT quality to improve as well. In addition to this, we show that there is a negative relationship between classification accuracy and SMT performance in imbalanced parallel corpora. For these types of corpora, we provide evidence that, for the evaluation of the classifier, macro-averaged metrics such as macro-averaged F-measure are better suited than accuracy, the metric commonly used to date
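To illustrate why accuracy can mislead on imbalanced data while a macro-averaged F-measure does not, here is a minimal sketch in plain Python. The label names `monotone` and `swap`, the 9:1 class split, and the always-majority classifier are hypothetical and are not taken from the paper's experiments.

```python
def accuracy(gold, pred):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)

# Hypothetical imbalanced orientation labels: nine "monotone", one "swap".
gold = ["monotone"] * 9 + ["swap"]
# A degenerate classifier that always predicts the majority class.
majority = ["monotone"] * 10

acc = accuracy(gold, majority)   # high, despite never predicting "swap"
mf1 = macro_f1(gold, majority)   # much lower: the "swap" class scores 0
```

The majority-class predictor reaches 90% accuracy here, yet its macro-averaged F1 is below 0.5 because the minority class contributes an F1 of zero to the average.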

Multidisciplinary Digital Publishing Institute

Proceedings - University of Groningen

Crossref

University of Groningen

ARTS repository - University of Groningen

Irish Universities

Directory of Open Access Journals

DCU Online Research Access Service

Dissertations of the University of Groningen

Getting Past the Language Gap: Innovations in Machine Translation

Author: Hush NS
McKemmish LK
McKenzie RH
Reimers JR
Publication venue: Current: Springer, New York
Publication date: 01/01/2013

In this chapter, we review state-of-the-art machine translation systems and discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods were introduced by Chinese researchers, which allowed the introduction and use of syntactic information in translation modeling. Furthermore, advances in the related field of computational linguistics, which made off-the-shelf taggers and parsers readily available, helped give MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board: this still presents a major challenge, though many advanced research groups are currently pursuing ways to meet this challenge head-on. The next generation of MT will consist of a collection of hybrid systems. It also augurs well for the mobile environment, as we look forward to more advanced and improved technologies that enable Speech-To-Speech machine translation on hand-held devices, i.e. speech recognition and speech synthesis. We review all of these developments and point out in the final section some of the most promising research avenues for the future of MT

Archivio Ricerca Ca'Foscari

Crossref

OPUS - University of Technology Sydney

UCL Discovery

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

University of Queensland eSpace