82,390 research outputs found

    Dimensionality reduction methods for machine translation quality estimation

    Full text link
    The final publication is available at Springer via http://dx.doi.org/10.1007/s10590-013-9139-3[EN] Quality estimation (QE) for machine translation is usually addressed as a regression problem where a learning model is used to predict a quality score from a (usually highly-redundant) set of features that represent the translation. This redundancy hinders model learning, and thus penalizes the performance of quality estimation systems. We propose different dimensionality reduction methods based on partial least squares regression to overcome this problem, and compare them against several reduction methods previously used in the QE literature. Moreover, we study how the use of such methods influence the performance of different learning models. Experiments carried out on the English-Spanish WMT12 QE task showed that it is possible to improve prediction accuracy while significantly reducing the size of the feature sets.This work supported by the European Union Seventh Framework Program (FP7/2007-2013) under the CasMaCat project (grants agreement no. 287576), by Spanish MICINN under TIASA (TIN2009-14205-C04-02) project, and by the Generalitat Valenciana under grant ALMPR (Prometeo/2009/014).González Rubio, J.; Navarro Cerdán, JR.; Casacuberta Nolla, F. (2013). Dimensionality reduction methods for machine translation quality estimation. Machine Translation. 27(3-4):281-301. https://doi.org/10.1007/s10590-013-9139-3S281301273-4Amaldi E, Kann V (1998) On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theor Comput Sci 209(1–2):237–260Anderson TW (1958) An introduction to multivariate statistical analysis. Wiley, New YorkAvramidis E (2012) Quality estimation for machine translation output using linguistic analysis and decoding features. In: Proceedings of the seventh workshop on statistical machine translation, pp 84–90Bellman RE (1961) Adaptive control processes: a guided tour. Rand Corporation research studies. Princeton University Press, PrincetonBisani M, Ney H (2004) Bootstrap estimates for confidence intervals in asr performance evaluation. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing, vol 1, pp 409–412Blatz J, Fitzgerald E, Foster G, Gandrabur S, Goutte C, Kulesza A, Sanchis A, Ueffing N (2004) Confidence estimation for machine translation. In: Proceedings of the international conference on Computational Linguistics, pp 315–321Callison-Burch C, Koehn P, Monz C, Post M, Soricut R, Specia L (2012) Findings of the 2012 workshop on statistical machine translation. In: Proceedings of the seventh workshop on statistical machine translation, pp 10–51Chong I, Jun C (2005) Performance of some variable selection methods when multicollinearity is present. Chemom Intell Lab Syst 78(1–2):103–112Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297Gamon M, Aue A, Smets M (2005) Sentence-Level MT evaluation without reference translations: beyond language modeling. In: Proceedings of the conference of the European Association for Machine TranslationGandrabur S, Foster G (2003) Confidence estimation for text prediction. In: Proceedings of the conference on computational natural language learning, pp 315–321Geladi P, Kowalski BR (1986) Partial least-squares regression: a tutorial. Anal Chim Acta 185(1):1–17González-Rubio J, Ortiz-Martínez D, Casacuberta F (2010) Balancing user effort and translation error in interactive machine translation via confidence measures. In: Proceedinss of the meeting of the association for computational linguistics, pp 173–177González-Rubio J, Sanchís A, Casacuberta F (2012) Prhlt submission to the wmt12 quality estimation task. In: Proceedings of the seventh workshop on statistical machine translation, pp 104–108Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. Machine Learning Research 3:1157–1182Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11(1):10–18Hotelling H (1931) The generalization of Student’s ratio. Ann Math Stat 2(3):360–378Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the association for computational linguistics, demonstration sessionKohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324Pearson K (1901) On lines and planes of closest fit to systems of points in space. Philos Mag 2:559–572Platt JC (1999) Using analytic QP and sparseness to speed training of support vector machines. In: Proceedings of the conference on advances in neural information processing systems II, pp 557–563Quinlan RJ (1992) Learning with continuous classes. In: Proceedings of the Australian joint conference on artificial intelligence, pp 343–348Quirk C (2004) Training a sentence-level machine translation confidence measure. In: Proceedings of conference on language resources and evaluation, pp 825–828Sanchis A, Juan A, Vidal E (2007) Estimation of confidence measures for machine translation. In: Proceedings of the machine translation summit XI, pp 407–412Scott DW, Thompson JR (1983) Probability density estimation in higher dimensions. In: Proceedings of the fifteenth symposium on the interface, computer science and statistics, pp 173–179Soricut R, Echihabi A (2010) TrustRank: inducing trust in automatic translations via ranking. In: Proceedings of the meeting of the association for computational linguistics, pp 612–621Soricut R, Bach N, Wang Z (2012) The SDL language weaver systems in the WMT12 quality estimation shared task. In: Proceedings of the seventh workshop on statistical machine translation. Montreal, Canada, pp 145–151Specia L, Saunders C, Wang Z, Shawe-Taylor J, Turchi M (2009a) Improving the confidence of machine translation quality estimates. In: Proceedings of the machine translation summit XIISpecia L, Turchi M, Cancedda N, Dymetman M, Cristianini N (2009b) Estimating the sentence-level quality of machine translation systems. In: Proceedings of the meeting of the European Association for Machine Translation, pp 28–35Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288Ueffing N, Ney H (2007) Word-level confidence estimation for machine translation. Comput Ling 33:9–40Ueffing N, Macherey K, Ney H (2003) Confidence measures for statistical machine translation. In: Proceedings of the MT summit IX, pp 394–401Wold H (1966) Estimation of principal components and related models by iterative least squares. Academic Press, New Yor

    Bridging SMT and TM with translation recommendation

    Get PDF
    We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding exact matches. futhermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels

    DeMoN: Depth and Motion Network for Learning Monocular Stereo

    Full text link
    In this paper we formulate structure from motion as a learning problem. We train a convolutional network end-to-end to compute depth and camera motion from successive, unconstrained image pairs. The architecture is composed of multiple stacked encoder-decoder networks, the core part being an iterative network that is able to improve its own predictions. The network estimates not only depth and motion, but additionally surface normals, optical flow between the images and confidence of the matching. A crucial component of the approach is a training loss based on spatial relative differences. Compared to traditional two-frame structure from motion methods, results are more accurate and more robust. In contrast to the popular depth-from-single-image networks, DeMoN learns the concept of matching and, thus, better generalizes to structures not seen during training.Comment: Camera ready version for CVPR 2017. Supplementary material included. Project page: http://lmb.informatik.uni-freiburg.de/people/ummenhof/depthmotionnet

    Partial least squares for word confidence estimation in machine translation

    Full text link
    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-38628-2_59We present a new technique to estimate the reliability of the words in automatically generated translations. Our approach addresses confidence estimation as a classification problem where a confidence score is to be predicted from a feature vector that represents each translated word. We describe a new set of prediction features designed to capture context information, and propose a model based on partial least squares to perform the classification. Good empirical results are reported in a large-domain news translation task.Work supported by the European Union Seventh Framework Program (FP7/2007-2013) under the CasMaCat project (grants agreement no 287576), by Spanish MICINN under TIASA (TIN2009-14205-C04-02) project, and by the Generalitat Valenciana under grant ALMPR (Prometeo/2009/014).González Rubio, J.; Navarro Cerdán, JR.; Casacuberta Nolla, F. (2013). Partial least squares for word confidence estimation in machine translation. En Pattern Recognition and Image Analysis. Springer Verlag (Germany). 500-508. https://doi.org/10.1007/978-3-642-38628-2_59S500508NIST: National Institute of Standards and Technology MT evaluation official results (November 2006), http://www.itl.nist.gov/iad/mig/tests/mt/Ueffing, N., Macherey, K., Ney, H.: Confidence measures for statistical machine translation. In: Proc. of the MT Summit, pp. 394–401. Springer (2003)Sanchis, A., Juan, A., Vidal, E.: Estimation of confidence measures for machine translation. In: Proc. of the Machine Translation Summit, pp. 407–412 (2007)Wold, H.: Estimation of Principal Components and Related Models by Iterative Least squares, pp. 391–420. Academic Press, New York (1966)Berger, A.L., Pietra, V.J.D., Pietra, S.A.D.: A maximum entropy approach to natural language processing. Computational Linguistics 22, 39–71 (1996)Levenshtein, V.: Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics Doklady 10(8), 707–710 (1966)Brown, P., Della Pietra, V., Della Pietra, S., Mercer, R.: The mathematics of statistical machine translation: parameter estimation. Computational Linguistics 19, 263–311 (1993)Mevik, B.H., Wehrens, R., Liland, K.H.: pls: Partial Least Squares and Principal Component regression. R package version 2.3-0 (2011)Callison-Burch, C., Koehn, P., Monz, C., Post, M., Soricut, R., Specia, L.: Findings of the 2012 workshop on statistical machine translation. In: Proc. of the Workshop on Statistical Machine Translation, Montréal, Canada, pp. 10–51 (June 2012)Chinchor, N.: The statistical significance of the muc-4 results. In: Proceedings of the Conference on Message Understanding, pp. 30–50 (1992
    corecore