47 research outputs found

    Assessing the Accuracy of Discourse Connective Translations: Validation of an Automatic Metric

    Automatic metrics for the evaluation of machine translation (MT) compute scores that globally characterize certain aspects of MT quality such as adequacy and fluency. This paper introduces a reference-based metric that focuses on a particular class of function words, namely discourse connectives, which are particularly important for text structuring and rather challenging for MT. To measure the accuracy of connective translation (ACT), the metric relies on automatic word-level alignment between a source sentence and, respectively, the reference and candidate translations, along with other heuristics for comparing translations of discourse connectives. Using a dictionary of equivalents, the translations are scored automatically, or, for better precision, semi-automatically. The precision of the ACT metric is assessed by human judges on sample data for English/French and English/Arabic translations: the ACT scores are on average within 2% of human scores. The ACT metric is then applied to several commercial and research MT systems, providing an assessment of their performance on discourse connectives.
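
    The dictionary-based comparison mentioned in this abstract can be illustrated with a minimal sketch. It is not the ACT implementation: it assumes the word alignment has already produced, for each source connective, the connective found in the reference and in the candidate (or None when nothing was aligned), and it simply counts a candidate as acceptable when it is identical to the reference or listed as its equivalent. The dictionary entries, the function names, and the decision to count missing connectives as errors are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical dictionary of equivalents: each target-language connective maps
# to the set of connectives treated as interchangeable with it.
CONCESSIVE = {"bien que", "quoique", "même si"}
EQUIVALENTS: Dict[str, Set[str]] = {
    "bien que": CONCESSIVE,
    "quoique": CONCESSIVE,
    "même si": CONCESSIVE,
    "depuis que": {"depuis que"},
}

# One pair per source connective: (connective in reference, connective in candidate).
Pair = Tuple[Optional[str], Optional[str]]


def is_acceptable(reference: Optional[str], candidate: Optional[str],
                  equivalents: Dict[str, Set[str]] = EQUIVALENTS) -> bool:
    """Return True if the candidate connective matches the reference one."""
    # Assumption: a connective missing on either side counts as an error.
    if reference is None or candidate is None:
        return False
    ref = reference.strip().lower()
    cand = candidate.strip().lower()
    if ref == cand:
        return True
    # Otherwise accept only dictionary-listed equivalents of the reference.
    return cand in equivalents.get(ref, set())


def connective_accuracy(pairs: List[Pair]) -> float:
    """Fraction of source connectives whose candidate translation is acceptable."""
    if not pairs:
        return 0.0
    accepted = sum(1 for ref, cand in pairs if is_acceptable(ref, cand))
    return accepted / len(pairs)


pairs: List[Pair] = [
    ("bien que", "bien que"),    # identical
    ("bien que", "quoique"),     # equivalent according to the dictionary
    ("depuis que", "bien que"),  # different sense
    ("même si", None),           # connective not found in the candidate
]
print(connective_accuracy(pairs))
```

    A semi-automatic variant, as the abstract suggests, would route the uncertain cases (for example missing connectives) to a human judge instead of scoring them directly.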

    Bootstrapping a Portuguese WordNet from Galician, Spanish and English wordnets

    Series: Lecture Notes in Computer Science, ISSN 0302-9743, vol. 8854. In this article we exploit the possibility of bootstrapping a European Portuguese WordNet from the English, Spanish and Galician wordnets using Probabilistic Translation Dictionaries automatically created from parallel corpora. The process generated a total of 56,770 synsets and 97,058 variants. An evaluation of the results using the Brazilian OpenWordNet-PT as a gold standard resulted in a precision varying from 53% to 75%, depending on the cut-line. The results were satisfying and comparable to similar experiments using the WN-Toolkit. PEst-OE/EEI/UI0752/2014, TIN2012-38584-C06-01, TIN2012-38584-C06-0
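
    How precision can depend on a cut-line is easy to show with a small sketch. It assumes, purely for illustration, that each generated variant carries a confidence score and that the cut-line is a threshold on that score; the abstract does not say how the cut-line is defined. The data, names, and scores below are made up and are not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

# (synset id, proposed variant, confidence score): hypothetical candidates.
Candidate = Tuple[str, str, float]


def precision_at_cutoff(candidates: List[Candidate],
                        gold: Set[Tuple[str, str]],
                        cutoff: float) -> Optional[float]:
    """Precision of the candidates whose score is at or above the cutoff.

    Returns None when no candidate survives the cutoff, since precision
    is undefined in that case.
    """
    kept = [(synset, variant) for synset, variant, score in candidates
            if score >= cutoff]
    if not kept:
        return None
    correct = sum(1 for pair in kept if pair in gold)
    return correct / len(kept)


# Toy data: four proposed variants, two of which appear in the gold standard.
candidates: List[Candidate] = [
    ("s1", "casa", 0.9),
    ("s1", "lar", 0.6),
    ("s2", "cão", 0.8),
    ("s2", "gato", 0.3),
]
gold = {("s1", "casa"), ("s2", "cão")}

for cutoff in (0.0, 0.5, 0.7):
    print(cutoff, precision_at_cutoff(candidates, gold, cutoff))
```

    Raising the cutoff keeps fewer variants, which typically raises precision at the cost of coverage.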

    Knowledge Graphs as Context Models: Improving the Detection of Cross-Language Plagiarism with Paraphrasing

    Cross-language plagiarism detection attempts to automatically identify and extract plagiarism among documents in different languages. Plagiarized fragments can be translated verbatim copies or may alter their structure to hide the copying, which is known as paraphrasing and is more difficult to detect. In order to improve paraphrasing detection, we use a knowledge graph-based approach to obtain and compare context models of document fragments in different languages. Experimental results in German-English and Spanish-English cross-language plagiarism detection indicate that our knowledge graph-based approach offers better performance than other state-of-the-art models. The research has been carried out in the framework of the European Commission WIQ-EI IRSES (no. 269180) and DIANA-APPLICATIONS - Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) projects as well as the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems. Franco-Salvador, M.; Gupta, P.; Rosso, P. (2013). Knowledge Graphs as Context Models: Improving the Detection of Cross-Language Plagiarism with Paraphrasing. In Bridging Between Information Retrieval and Databases: PROMISE Winter School 2013, Bressanone, Italy, February 4-8, 2013. Revised Tutorial Lectures. Springer Verlag (Germany). 227-236. https://doi.org/10.1007/978-3-642-54798-0_12

    Read My Lips: Continuous Signer Independent Weakly Supervised Viseme Recognition

    This work presents a framework to recognise signer-independent mouthings in continuous sign language, with no manual annotations needed. Mouthings represent lip movements that correspond to pronunciations of words or parts of them during signing. Research on sign language recognition has focused extensively on the hands as features. But sign language is multi-modal, and a full understanding, particularly with respect to its lexical variety, language idioms and grammatical structures, is not possible without further exploring the remaining information channels. To our knowledge no previous work has explored dedicated viseme recognition in the context of sign language recognition. The approach is trained on over 180,000 unlabelled frames and reaches 47.1% precision on the frame level. Generalisation across individuals and the influence of context-dependent visemes are analysed.

    Effective Hypotheses Re-ranking Model in Statistical Machine Translation

    No full text

    Using alignment templates to infer shallow-transfer machine translation rules

    When building rule-based machine translation systems, a considerable human effort is needed to code the transfer rules that are able to translate source-language sentences into grammatically correct target-language sentences. In this paper we describe how to adapt the alignment templates used in statistical machine translation to the rule-based machine translation framework. The alignment templates are converted into structural transfer rules that are used by a shallow-transfer machine translation engine to produce grammatically correct translations. As the experimental results show, there is a considerable improvement in the translation quality as compared to word-for-word translation (when no transfer rules are used), and the translation quality is close to that achieved when hand-coded transfer rules are used. The method presented is entirely unsupervised, and needs only a parallel corpus, two morphological analysers, and two part-of-speech taggers, such as those used by the machine translation system in which the inferred transfer rules are integrated. Work funded by the Spanish Comisión Interministerial de Ciencia y Tecnología through project TIC2003-08681-C02-01 and by the Spanish Ministerio de Educación y Ciencia and the European Social Fund through grant BES-2004-4711.

    German Compounds in Factored Statistical Machine Translation

    No full text

    Advances in Czech – Signed Speech Translation

    No full text

    A Novel Rule Refinement Method for SMT through Simulated Post-Editing

    No full text

    Validity of an Automatic Evaluation of Machine Translation Using a Word-Alignment-Based Classifier

    No full text