
    On the use of word embedding for cross language plagiarism detection

    [EN] Cross-language plagiarism is the unacknowledged reuse of text across language pairs. It occurs when a passage of text is translated from a source language to a target language and no proper citation is provided. Although various methods have been developed for detecting cross-language plagiarism, less attention has been paid to measuring and comparing their performance, especially when tackling different types of paraphrasing through translation. In this paper, we investigate various approaches to cross-language plagiarism detection. Moreover, we present a novel approach to cross-language plagiarism detection using word embedding methods and explore its performance against other state-of-the-art plagiarism detection algorithms. In order to evaluate the methods, we have constructed an English-Persian bilingual plagiarism detection corpus (referred to as HAMTA-CL) comprising seven types of obfuscation. The results show that the word embedding approach outperforms the other approaches with respect to recall when encountering heavily paraphrased passages. On the other hand, the translation-based approach performs well when precision is the main consideration of the cross-language plagiarism detection system.
    Asghari, H.; Fatemi, O.; Mohtaj, S.; Faili, H.; Rosso, P. (2019). On the use of word embedding for cross language plagiarism detection. Intelligent Data Analysis. 23(3):661-680. https://doi.org/10.3233/IDA-183985

    Summarization of Spanish Talk Shows with Siamese Hierarchical Attention Networks

    [EN] In this paper, we present an approach to summarizing Spanish talk shows. Our approach is based on the use of Siamese Neural Networks on the transcription of the show's audio. Specifically, we propose to use Hierarchical Attention Networks to select the most relevant sentences for each speaker about a given topic in the show, in order to summarize that speaker's opinion about the topic. We train these networks in a siamese way to determine whether a summary is appropriate or not. A previous evaluation of this approach on the task of summarizing English newspapers achieved performance similar to other state-of-the-art systems. In the absence of enough transcribed or recognized speech data to train our system for talk show summarization in Spanish, we acquire a large corpus of document-summary pairs from Spanish newspapers and use it to train our system. We choose the newspaper domain due to its high similarity with the topics addressed in talk shows. A preliminary evaluation of our summarization system on Spanish TV programs shows the adequacy of the proposal.
    This work has been partially supported by the Spanish MINECO and FEDER funds under project AMIC (TIN2017-85854-C4-2-R). Work of Jose-Angel Gonzalez is financed by Universitat Politecnica de Valencia under grant PAID-01-17.
    González-Barba, JÁ.; Hurtado Oliver, LF.; Segarra Soriano, E.; García-Granada, F.; Sanchís Arnal, E. (2019). Summarization of Spanish Talk Shows with Siamese Hierarchical Attention Networks. Applied Sciences. 9(18):1-13. https://doi.org/10.3390/app9183836

    Attachment and children's biased attentional processing: evidence for the exclusion of attachment-related information

    Research in both infants and adults has demonstrated that attachment expectations are associated with the attentional processing of attachment-related information. However, this research suffered from methodological issues and has not been validated across ages. Employing eye tracking as a more ecologically valid paradigm to measure attentional processes, the current study tested the defensive exclusion hypothesis in late childhood. According to this hypothesis, insecurely attached children are assumed to defensively exclude attachment-related information. We hypothesized that securely attached children process attachment-related neutral and emotional information in a more open manner than insecurely attached children. Sixty-two children (59.7% girls, 8–12 years) completed two different tasks while eye movements were recorded: task one presented an array of neutral faces including mother and unfamiliar women, and task two presented the same with happy and angry faces. Results indicated that more securely attached children looked longer at mother’s face regardless of the emotional expression. They also tended to show more sustained attention to mother’s neutral face. Furthermore, more attachment avoidance was related to a reduced total viewing time of mother’s neutral, happy, and angry faces. Attachment anxiety was not consistently related to the processing of mother’s face. Findings support the theoretical assumption that securely attached children have an open manner of processing all attachment-related information.

    Mechanisms of Cognitive Impairment in Cerebral Small Vessel Disease: Multimodal MRI Results from the St George's Cognition and Neuroimaging in Stroke (SCANS) Study.

    Cerebral small vessel disease (SVD) is a common cause of vascular cognitive impairment. A number of disease features can be assessed on MRI, including lacunar infarcts, T2 lesion volume, brain atrophy, and cerebral microbleeds. In addition, diffusion tensor imaging (DTI) is sensitive to disruption of white matter ultrastructure, and recently it has been suggested that additional information on the pattern of damage may be obtained from axial diffusivity, a proposed marker of axonal damage, and radial diffusivity, an indicator of demyelination. We determined the contribution of these whole brain MRI markers to cognitive impairment in SVD. Consecutive patients with lacunar stroke and confluent leukoaraiosis were recruited into the ongoing SCANS study of cognitive impairment in SVD (n = 115), and underwent neuropsychological assessment and multimodal MRI. SVD subjects displayed poor performance on tests of executive function and processing speed. In the SVD group, brain volume was lower, white matter hyperintensity volume was higher, and all diffusion characteristics differed significantly from control subjects (n = 50). On multi-predictor analysis, independent predictors of executive function in SVD were lacunar infarct count and diffusivity of normal appearing white matter on DTI. Independent predictors of processing speed were lacunar infarct count and brain atrophy. Radial diffusivity was a stronger DTI predictor than axial diffusivity, suggesting that ischaemic demyelination, seen neuropathologically in SVD, may be an important predictor of cognitive impairment in SVD. Our study provides information on the mechanism of cognitive impairment in SVD.

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.