251,867 research outputs found

    GREAT: open source software for statistical machine translation

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10590-011-9097-6
    [EN] In this article, the first public release of GREAT as an open-source statistical machine translation (SMT) software toolkit is described. GREAT is based on a bilingual language modelling approach for SMT, which has so far been implemented for n-gram models based on the framework of stochastic finite-state transducers. The use of finite-state models is motivated by their simplicity, their versatility, and their lower computational cost compared with other, more expressive models. Moreover, if translation is assumed to be a subsequential process, finite-state models are enough for modelling the existing relations between a source and a target language. GREAT includes some characteristics usually present in state-of-the-art SMT, such as phrase-based translation models or a log-linear framework for local features. Experimental results on a well-known corpus, Europarl, are reported in order to validate this software. Competitive translation quality is achieved while using both a lower number of model parameters and a lower response time than the widely used, state-of-the-art SMT system Moses. © 2011 Springer Science+Business Media B.V.
    This study was supported by the EC (FEDER, FSE), the Spanish government (MICINN, MITyC, “Plan E”, under Grants MIPRCV “Consolider Ingenio 2010”, iTrans2 TIN2009-14511, and erudito.com TSI-020110-2009-439), and the Generalitat Valenciana (Grant Prometeo/2009/014).
    González Mollá, J.; Casacuberta Nolla, F. (2011). GREAT: open source software for statistical machine translation. Machine Translation. 25(2):145-160. https://doi.org/10.1007/s10590-011-9097-6

    New Confidence Measures for Statistical Machine Translation

    A confidence measure estimates the reliability of a hypothesis provided by a machine translation system. Confidence estimation can be seen as a testing process: we want to decide whether the most probable sequence of words provided by the machine translation system is correct or not. In the following we describe several original word-level confidence measures for machine translation, based on mutual information, an n-gram language model, and a lexical features language model. We evaluate how well they perform individually or together, and show that using a combination of confidence measures based on mutual information yields a classification error rate as low as 25.1% with an F-measure of 0.708.

    An Open Source Toolkit for Word-level Confidence Estimation in Machine Translation

    Recently, a growing need for Confidence Estimation (CE) for Statistical Machine Translation (SMT) systems in Computer Aided Translation (CAT) has been observed. However, most CE toolkits are optimized for a single target language (mainly English) and, as far as we know, none of them is both dedicated to this specific task and freely available. This paper presents an open-source toolkit for predicting the quality of the words of an SMT output, whose novel contributions are (i) support for various target languages and (ii) handling of a number of features of different types (system-based, lexical, syntactic and semantic). In addition, the toolkit integrates a wide variety of Natural Language Processing and Machine Learning tools to pre-process data, extract features and estimate confidence at word level. Features for Word-level Confidence Estimation (WCE) can be easily added or removed using a configuration file. We validate the toolkit by experimenting in the WCE evaluation framework of the WMT shared task with two language pairs: French-English and English-Spanish. The toolkit is made available to the research community with ready-made scripts to launch full experiments on these language pairs, while achieving state-of-the-art and reproducible performance.

    Translating Articles in the Humanities and Social Sciences.

    My study mainly consists in analysing precise examples rather than developing long theories, as the readers of this journal are not translation experts, but would benefit from a translator's viewpoint for their own research. The study focuses on written articles and leaves oral communications aside because they entail different translation issues (e.g. ways to address an English-speaking audience, simpler syntax, or words easy to pronounce for a non-native English speaker). The corpus of the article is extracted from my own experience as a freelance translator and covers different fields, i.e. management, law, psychology and even geography. All the examples are translated either from English to French or from French to English. My study aims to debunk misconceptions about translation, more specifically in academic environments. I chose to divide my presentation into six main stereotypes: 1) Translating is easier and quicker than writing. Wrong. Translating is a long-term process which involves, among other things, language skills, content-based knowledge, appropriating the rationale and ideas of the author, and rewriting the whole article, the translator becoming a co-author of the article. 2) Machine translation yields good results. Wrong. Even if machine translation is a useful, time-saving tool for translators and researchers, it cannot replace human translators, who are still needed at least for post-editing. 3) Literal translation ensures the quality of the translation. Wrong. Being faithful to the words in the source language text would often sound awkward in the target language; the translated article would sound like a translation, without intending to do so for stylistic reasons. A case in point is that, when authors can translate their own articles, they rephrase their own work and usually choose a different perspective from their original article; they do not convey exactly the same meaning in the translated article. 4) Everyone speaks the same English. Wrong. Each country, or even each state in larger countries, uses English differently and has developed its own English. Authors should therefore write their articles differently when they submit their paper to a British, American, Australian, Canadian or Indian journal. This factor also depends on the scientific community you belong to, as it will superimpose specific terminology, phraseology and standards to tackle the same subject.

    Dimensionality reduction methods for machine translation quality estimation

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10590-013-9139-3
    [EN] Quality estimation (QE) for machine translation is usually addressed as a regression problem where a learning model is used to predict a quality score from a (usually highly redundant) set of features that represent the translation. This redundancy hinders model learning, and thus penalizes the performance of quality estimation systems. We propose different dimensionality reduction methods based on partial least squares regression to overcome this problem, and compare them against several reduction methods previously used in the QE literature. Moreover, we study how the use of such methods influences the performance of different learning models. Experiments carried out on the English-Spanish WMT12 QE task showed that it is possible to improve prediction accuracy while significantly reducing the size of the feature sets.
    This work was supported by the European Union Seventh Framework Program (FP7/2007-2013) under the CasMaCat project (grant agreement no. 287576), by Spanish MICINN under the TIASA (TIN2009-14205-C04-02) project, and by the Generalitat Valenciana under grant ALMPR (Prometeo/2009/014).
    González Rubio, J.; Navarro Cerdán, JR.; Casacuberta Nolla, F. (2013). Dimensionality reduction methods for machine translation quality estimation. Machine Translation. 27(3-4):281-301. https://doi.org/10.1007/s10590-013-9139-3

    The Impact of AI from the Perspective of Multilingual Students

    In a world marked by the rapid ascension of AI technology, the consequences of this transformation extend across industries. Automation is becoming more prevalent, and human-machine interactions are redefining job markets. This article delves into the influence of Artificial Intelligence (AI) on the professional landscape, with a focus on the experiences of multilingual students. As AI technologies reach their zenith, traditional, labor-intensive sectors are adapting to the reality of AI-induced unemployment. The shift is most apparent in fields dependent on manual labor, such as construction, manufacturing, and electronics assembly. Industries that have thrived on repetitive, labor-intensive tasks are now transitioning to AI, replacing human workers with automated machines that offer precision, efficiency, and reduced operational risks. Consequently, we are witnessing the onset of an AI-induced unemployment wave that has far-reaching implications. Parallel to the automation of manual labor, language-related sectors are undergoing substantial changes. Translation, simultaneous interpretation, foreign language instruction, and editorial work are no exceptions. The meteoric rise of machine translation, fueled by advances in AI, has reshaped this landscape. Machine translation, evolving from rudimentary word-for-word conversion to a context-aware understanding of language nuances, is poised to challenge human translators. However, its capabilities remain concentrated in widely used languages, and challenges arise when translating less common or non-universal languages. The predicament of translating non-universal languages underscores the limitations of AI, demonstrating that it can provide invaluable assistance but may not replace the finesse of human language experts. This shift leaves multilingual students and language professionals in a unique position. They can add value by leveraging their deep understanding of languages, cultures, and contextual nuances. 
The skill of bridging linguistic and cultural gaps is becoming increasingly indispensable in international business, diplomacy, and cross-cultural communication. As the AI era advances, multilingual individuals must recognize the unique skills they possess in understanding emotional, cultural, and contextual subtleties. In an environment where AI excels at translating factual content but falls short in conveying the emotions of a letter, the nuances of a poem, or the cultural depth of a phrase, multilingual students and language professionals stand as indispensable figures.