Prospects of Machine Translation in the Chinese Context
China has been one of the leading countries in the research and application of Machine Translation (MT) and Machine-aided Translation (MAT) ever since the 1950s. The first part of this paper is a historical sketch of MT and MAT in the Chinese context, highlighting some important stages in their development which have laid the foundation for later achievements. It is shown that MT research in this region is similar to that in other parts of the world, with attention gradually turning to MAT. The second part deals with the state of the art of MT and MAT research and applications in Mainland China, Taiwan and Hong Kong, respectively. Then popular commercial software packages dedicated to translation from Chinese into other languages, and vice versa, are introduced, with an appraisal of both their merits and demerits. Finally, the prospects of MT and MAT in the Chinese context are discussed. It is suggested that, for mutual benefit, MT and MAT research in the Chinese context should cooperate more closely with the outside world.
Progress report on user interface studies, cognitive and user modelling
This WP presents the empirical foundations for the development of the CasMaCat workbench.
A series of experiments are being run to establish basic facts about translator behaviour in
computer-aided translation, focusing on the use of visualization options and input modalities
while post-editing machine translation (sections 1 and 2). Another series of studies deals with
cognitive modelling and individual differences in translation production, in particular translator
types and translation/post-editing styles (sections 3 and 4).
This deliverable, D1.2, is a progress report on user interface studies, cognitive and user
modelling. It reports on post-editing and interactive translation experiments, as well as cognitive
modelling covering Tasks 1.1, 1.2, 1.3 and 1.5. It also addresses the issues that were raised in
the last review report for the project period M1 to M12, in particular:
- the basic facts about translator behaviour in CAT (sections 1 and 4), highlighting
  usage of visualization and input modalities (see also D5.3);
- the individual differences in translator types and translation styles (section 3; see also
  terminology, section A.1);
- the results and conclusions of preliminary studies conducted to investigate post-editing
  and translation styles (sections 2 and 5).
From the experiments and analyses so far, it is clear that the data collected in the CRITT
TPR-DB (Translation Process Research database) is an essential resource for achieving the
CasMaCat project goals. It allows for large-scale, in-depth studies of human translation processes
and thus serves as a basis of information for the empirically grounded future development of the
CasMaCat workbench. It attracts an international research community to investigate human
translation processes under various conditions and to arrive at a more advanced level of understanding.
Additional language pairs and more data increase the chances of better underpinning the
conclusions needed, as will be shown in this report and as concluded in section 5.
On the effective deployment of current machine translation technology
Machine translation is a fundamental technology that is gaining more importance
each day in our multilingual society. Companies and individuals are
turning their attention to machine translation since it dramatically cuts down
their expenses on translation and interpreting. However, the output of current
machine translation systems is still far from the quality of translations generated
by human experts. The overall goal of this thesis is to narrow down
this quality gap by developing new methodologies and tools that improve the
broader and more efficient deployment of machine translation technology.
We start by proposing a new technique to improve the quality of the
translations generated by fully-automatic machine translation systems. The
key insight of our approach is that different translation systems, implementing
different approaches and technologies, can exhibit different strengths and
limitations. Therefore, a proper combination of the outputs of such different
systems has the potential to produce translations of improved quality.
We present minimum Bayes risk system combination, an automatic approach
that detects the best parts of the candidate translations and combines them
to generate a consensus translation that is optimal with respect to a particular
performance metric. We thoroughly describe the formalization of our
approach as a weighted ensemble of probability distributions and provide efficient
algorithms to obtain the optimal consensus translation according to the
widespread BLEU score. Empirical results show that the proposed approach
is indeed able to generate statistically better translations than the provided
candidates. Compared to other state-of-the-art system combination methods,
our approach achieves similar performance without requiring any data
beyond the candidate translations.
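The minimum Bayes risk idea described above can be illustrated with a much simpler sentence-level sketch. This is not the thesis's algorithm, which combines parts of candidates into a new consensus translation optimized for BLEU. The sketch below only selects, among whole candidates, the one with the highest expected gain against the others. The function names (`overlap_gain`, `mbr_select`) and the clipped n-gram precision used as a stand-in for BLEU are assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_gain(hyp, ref, max_n=2):
    """Average clipped n-gram precision of hyp against ref.

    A crude, hypothetical stand-in for BLEU: no brevity penalty,
    arithmetic rather than geometric mean.
    """
    h, r = hyp.split(), ref.split()
    scores = []
    for n in range(1, max_n + 1):
        hn, rn = ngrams(h, n), ngrams(r, n)
        total = sum(hn.values())
        if total == 0:
            scores.append(0.0)
            continue
        matched = sum(min(count, rn[gram]) for gram, count in hn.items())
        scores.append(matched / total)
    return sum(scores) / len(scores)


def mbr_select(candidates, weights=None):
    """Return the candidate with the highest expected gain.

    Each of the other candidates acts as a pseudo-reference, weighted by
    the (assumed) reliability of the system that produced it.
    """
    if weights is None:
        weights = [1.0 / len(candidates)] * len(candidates)

    def expected_gain(i):
        return sum(
            w * overlap_gain(candidates[i], other)
            for j, (other, w) in enumerate(zip(candidates, weights))
            if j != i
        )

    best = max(range(len(candidates)), key=expected_gain)
    return candidates[best]


candidates = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog stood there",
]
print(mbr_select(candidates))
```

With uniform weights, the selected candidate is the one that agrees most with the rest of the pool. Choosing among whole sentences keeps the sketch short, but it cannot produce a translation better than every candidate, which is the point of combining sub-sentence parts.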
Then, we focus our attention on how to improve the utility of automatic
translations for the end-user of the system. Since automatic translations are
not perfect, a desirable feature of machine translation systems is the ability
to predict at run-time the quality of the generated translations. Quality estimation
is usually addressed as a regression problem where a quality score
is predicted from a set of features that represents the translation. However, although the concept of translation quality is intuitively clear, there is no
consensus on which are the features that actually account for it. As a consequence,
quality estimation systems for machine translation have to utilize
a large number of weak features to predict translation quality. This involves
several learning problems related to feature collinearity and ambiguity, and
to the "curse" of dimensionality. We address these challenges by adopting
a two-step training methodology. First, a dimensionality reduction method
computes, from the original features, the reduced set of features that better
explains translation quality. Then, a prediction model is built from this
reduced set to finally predict the quality score. We study various reduction
methods previously used in the literature and propose two new ones based on
statistical multivariate analysis techniques. More specifically, the proposed dimensionality
reduction methods are based on partial least squares regression.
The results of a thorough experimentation show that the quality estimation
systems estimated following the proposed two-step methodology obtain better
prediction accuracy than systems estimated using all the original features.
Moreover, one of the proposed dimensionality reduction methods obtained the
best prediction accuracy with only a fraction of the original features. This
feature reduction ratio is important because it implies a dramatic reduction
of the operating times of the quality estimation system.
An alternative use of current machine translation systems is to embed them
within an interactive editing environment where the system and a human expert
collaborate to generate error-free translations. This interactive machine
translation approach has been shown to reduce the user's supervision effort in
comparison to the conventional decoupled post-editing approach. However,
interactive machine translation considers the translation system as a passive
agent in the interaction process. In other words, the system only suggests translations
to the user, who then makes the necessary supervision decisions. As
a result, the user is bound to exhaustively supervise every suggested translation.
This passive approach ensures error-free translations but it also demands
a large amount of supervision effort from the user.
Finally, we study different techniques to improve the productivity of current
interactive machine translation systems. Specifically, we focus on the development
of alternative approaches where the system becomes an active agent
in the interaction process. We propose two different active approaches. On the
one hand, we describe an active interaction approach where the system informs
the user about the reliability of the suggested translations. The hope is that
this information may help the user to locate translation errors thus improving
the overall translation productivity. We propose different scores to measure translation reliability at the word and sentence levels and study the influence
of such information in the productivity of an interactive machine translation
system. Empirical results show that the proposed active interaction protocol
is able to achieve a large reduction in supervision effort while still generating
translations of very high quality. On the other hand, we study an active learning
framework for interactive machine translation. In this case, the system is
not only able to inform the user of which suggested translations should be
supervised, but it is also able to learn from the user-supervised translations to
improve its future suggestions. We develop a value-of-information criterion to
select which automatic translations undergo user supervision. However, given
its high computational complexity, in practice we study different selection
strategies that approximate this optimal criterion. Results of a large scale experimentation
show that the proposed active learning framework is able to
obtain better compromises between the quality of the generated translations
and the human effort required to obtain them. Moreover, in comparison to
a conventional interactive machine translation system, our proposal obtained
translations of twice the quality with the same supervision effort.
González Rubio, J. (2014). On the effective deployment of current machine translation technology [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/37888
Building task-oriented machine translation systems
The main goal of this thesis is to develop interactive translation systems that exhibit greater
synergy with their potential users. To this end, the aim is to make state-of-the-art systems more
ergonomic, intuitive and efficient, so that the human expert feels more comfortable using them.
Accordingly, several techniques are presented that focus on improving the adaptability and response
time of the underlying machine translation systems, together with a strategy aimed at improving
human-machine interaction. All of this serves the ultimate purpose of bridging the gap between the
state of the art in machine translation and the tools that human translators have at their disposal.
With regard to the response time of machine translation systems, this thesis presents a technique
for pruning the parameters of current translation models. Its intuition is based on the concept of
bilingual segmentation, but it ends up evolving into a strategy for re-estimating those parameters.
Using this strategy, experimental results show that the phrase table can be pruned by up to 97%
without degrading the quality of the resulting translations. Moreover, these results are consistent
across different language pairs, which shows that the technique presented here is effective in a
traditional machine translation setting and could therefore be used directly in a post-editing
scenario. However, the experiments carried out in interactive translation are slightly less
convincing, since they imply the need to reach a compromise between response time and the quality
of the suffixes produced.
In addition, two adaptation techniques are presented, with the purpose of improving the adaptability
of machine translation systems. The first…
Sanchis Trilles, G. (2012). Building task-oriented machine translation systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17174
Chinese information processing
A survey of the field of Chinese information processing is provided. It covers the following areas: the Chinese writing system, several popular Chinese encoding schemes and code conversions, Chinese keyboard entry methods, Chinese fonts, Chinese operating systems, basic Chinese computing techniques and applications
Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution
Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data. This paper investigates the scenario where monolingual data is limited as well, finding that current unsupervised methods suffer in performance under this stricter setting. We find that the performance loss originates from the poor quality of the pretrained monolingual embeddings, and we propose using linguistic information in the embedding training scheme. To support this, we look at two linguistic features that may help improve alignment quality: dependency information and sub-word information. Using dependency-based embeddings results in a complementary word representation which offers a boost in performance of around 1.5 BLEU points compared to standard WORD2VEC when monolingual data is limited to 1 million sentences per language. We also find that the inclusion of sub-word information is crucial to improving the quality of the embedding
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Peer reviewed