Cross-language plagiarism detection using multilingual semantic network

Abstract

The final publication is available at Springer via http://10.1007/978-3-642-36973-5_66

Cross-language plagiarism refers to plagiarism in which the source and suspicious documents are in different languages. Plagiarism detection across languages is still in its infancy. In this article, we propose a new graph-based approach that uses a multilingual semantic network to compare document paragraphs in different languages. To investigate the proposed approach, we used the German-English and Spanish-English cross-language plagiarism cases of the PAN-PC'11 corpus. We compare the obtained results with two state-of-the-art models. Experimental results indicate that our graph-based approach is a good alternative for cross-language plagiarism detection.

We thank the Conselleria d'Educació, Formació i Ocupació of the Generalitat Valenciana for funding the work of the first author with the Gerónimo Forteza program. The research has been carried out in the framework of the European Commission WIQ-EI IRSES project (no. 269180) and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.

Franco Salvador, M.; Gupta, PA.; Rosso, P. (2013). Cross-language plagiarism detection using multilingual semantic network. In Advances in Information Retrieval. Springer Verlag (Germany). 7814:710-713. https://doi.org/10.1007/978-3-642-36973-5_66
