608 research outputs found

    On the use of word embedding for cross language plagiarism detection

    [EN] Cross-language plagiarism is the unacknowledged reuse of text across language pairs. It occurs when a passage is translated from a source language into a target language without proper citation. Although various methods have been developed for detecting cross-language plagiarism, less attention has been paid to measuring and comparing their performance, especially when dealing with different types of paraphrasing through translation. In this paper, we investigate various approaches to cross-language plagiarism detection. Moreover, we present a novel approach to cross-language plagiarism detection using word embedding methods and compare its performance against other state-of-the-art plagiarism detection algorithms. To evaluate the methods, we have constructed an English-Persian bilingual plagiarism detection corpus (referred to as HAMTA-CL) comprising seven types of obfuscation. The results show that the word embedding approach outperforms the other approaches with respect to recall on heavily paraphrased passages. On the other hand, the translation-based approach performs well when precision is the main consideration of the cross-language plagiarism detection system.

    Asghari, H.; Fatemi, O.; Mohtaj, S.; Faili, H.; Rosso, P. (2019). On the use of word embedding for cross language plagiarism detection. Intelligent Data Analysis. 23(3):661-680. https://doi.org/10.3233/IDA-183985
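    As a rough illustration of how a word-embedding comparison between passages in two languages might work, the sketch below averages word vectors from a shared bilingual space and compares passages by cosine similarity. This is not the method of the paper above: the toy vectors, the `EMBEDDINGS` table, and the function names are all hypothetical, and a real system would load trained cross-lingual embeddings instead.

```python
import math

# Hypothetical toy vectors in a shared English-Persian space.
# The values are illustrative only, not trained embeddings.
EMBEDDINGS = {
    "cat": [0.90, 0.10, 0.00],
    "dog": [0.10, 0.90, 0.00],
    "sleeps": [0.00, 0.20, 0.90],
    "گربه": [0.88, 0.12, 0.02],  # Persian for "cat"
    "سگ": [0.12, 0.88, 0.02],    # Persian for "dog"
    "خواب": [0.02, 0.18, 0.90],  # Persian for "sleep"
}


def passage_vector(tokens):
    """Average the vectors of all tokens found in EMBEDDINGS."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None  # no known tokens, so no representation
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity(source_tokens, target_tokens):
    """Similarity of two tokenized passages; 0.0 if either has no known tokens."""
    u = passage_vector(source_tokens)
    v = passage_vector(target_tokens)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


# A translated passage scores high, an unrelated one scores low.
translated = similarity(["cat", "sleeps"], ["گربه", "خواب"])
unrelated = similarity(["cat"], ["سگ"])
print(f"translated: {translated:.3f}, unrelated: {unrelated:.3f}")
```

    A detector built this way would flag passage pairs whose similarity exceeds a tuned threshold. Comparing meaning vectors instead of exact words is one plausible reason such an approach could favour recall on heavily paraphrased text.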

    Automation of legal precedents retrieval: Findings from a literature review

    Mentzingen, H., António, N., & Bação, F. (2023). Automation of legal precedents retrieval: Findings from a literature review. International Journal of Intelligent Systems, 2023, 1-22. [6660983]. https://doi.org/10.21203/rs.3.rs-2292464/v1, https://doi.org/10.21203/rs.3.rs-2292464/v2, https://doi.org/10.1155/2023/6660983

    This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.

    Judges frequently base their reasoning on precedents. Courts must preserve uniformity in decisions while, depending on the legal system, previous cases compel rulings. The search for methods to accurately identify similar previous cases is not new and has been a vital input, for example, to case-based reasoning (CBR) methodologies. This literature review offers a comprehensive analysis of the advancements in automating the identification of legal precedents, primarily focusing on the paradigm shift from manual knowledge engineering to the incorporation of Artificial Intelligence (AI) technologies such as natural language processing (NLP) and machine learning (ML). While multiple approaches harnessing NLP and ML show promise, none has emerged as definitively superior, and further validation through statistically significant samples and expert-provided ground truth is imperative. Additionally, this review employs text-mining techniques to streamline the survey process, providing an accurate and holistic view of the current research landscape. By delineating extant research gaps and suggesting avenues for future exploration, this review serves as both a summation and a call for more targeted, empirical investigations.

    Cross-language Information Retrieval

    Two key assumptions shape the usual view of ranked retrieval: (1) that the searcher can choose words for their query that might appear in the documents that they wish to see, and (2) that ranking retrieved documents will suffice because the searcher will be able to recognize those which they wished to find. When the documents to be searched are in a language not known by the searcher, neither assumption is true. In such cases, Cross-Language Information Retrieval (CLIR) is needed. This chapter reviews the state of the art for CLIR and outlines some open research questions.

    Comment: 49 pages, 0 figures
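    To make the first assumption concrete, the sketch below shows one simple way a query could be carried across languages: each query term is expanded through a bilingual dictionary and documents are ranked by how many translated terms they contain. The abstract does not describe this technique; the dictionary, the documents, and the function names are hypothetical, and the scoring is plain term overlap, not a production ranking function.

```python
# Hypothetical English-to-Spanish dictionary; entries are illustrative only.
TRANSLATIONS = {
    "water": ["agua"],
    "reuse": ["reuso", "reutilización"],
    "city": ["ciudad"],
}

# Hypothetical document collection in the language the searcher cannot read.
DOCS = [
    "agua potable en la ciudad",
    "reuso de agua en la agricultura",
    "historia del arte",
]


def translate_query(query):
    """Expand each query term into all its dictionary translations.

    Terms missing from the dictionary are kept unchanged, which helps
    with proper nouns that are spelled the same in both languages.
    """
    terms = []
    for word in query.lower().split():
        terms.extend(TRANSLATIONS.get(word, [word]))
    return terms


def score(query_terms, document):
    """Count how many distinct translated query terms occur in the document."""
    doc_tokens = set(document.lower().split())
    return len(set(query_terms) & doc_tokens)


def rank(query, documents):
    """Return (document, score) pairs sorted from best to worst match."""
    terms = translate_query(query)
    scored = [(doc, score(terms, doc)) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for doc, s in rank("water reuse", DOCS):
    print(s, doc)
```

    The second assumption is not addressed by this sketch: the searcher would still need some form of translation or summary to recognize which of the ranked documents is useful.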

    The Wild East: Criminal Political Economies in South Asia

    The Wild East bridges political economy and anthropology to examine a variety of il/legal economic sectors and businesses such as red sanders, coal, fire, oil, sand, air spectrum, land, water, real estate, procurement and industrial labour. The 11 case studies, based across India, Pakistan and Bangladesh, explore how state regulative law is often ignored and/or selectively manipulated. The emerging collective narrative shows the workings of regulated criminal economic systems where criminal formations, politicians, police, judges and bureaucrats are deeply intertwined. By pioneering the field-study of the politicisation of economic crime, and disrupting the wider literature on South Asia’s informal economy, The Wild East aims to influence future research agendas through its case for the study of mafia-enterprises and their engagement with governance in South Asia and outside. Its empirical and theoretical contribution to debates about economic crimes in democratic regimes will be of critical value to researchers in Economics, Anthropology, Sociology, Comparative Politics, Political Science and International Relations, Criminology and Development Studies, as well as to those inside and outside academia interested in current affairs and the relationship between crime, politics and mafia enterprises.

    NIAS Annual Report 2017-2018

    Where there are no Footprints: An Ethnography of Contemporary Art in Kolkata

    This dissertation is an ethnographic exploration of contemporary art in Kolkata. It closely follows several individual artists and art groups who endeavour to defy established conventions and create new works of art. Yet, confronted with a new work of art, a doubt sets in. Wasn’t this made before? Can we be sure that it’s not a copy? Doubt about novelty and originality has always troubled the visual arts, and was particularly pronounced in (post)colonial Calcutta, where works were condemned by some as belated repetitions of European modern art. Convictions of belatedness have been successfully debunked, but with the emergence of ‘contemporary’ art the doubt about novelty emerges yet again. To understand the predicament of artistic novelty, this dissertation unfolds a cyclical ritual theory of contemporary art that includes the practices surrounding the artwork, while not forgetting the artwork itself: an analysis that involves the entire set of practices implicated in the production, exhibition, circulation, and preservation of art. This theory is subsequently applied to the field of contemporary art in Kolkata. The ethnographic chapters focus on the tension between artistic attempts to make new works of art and the various limitations that artists encounter; artists are not just caught up in artistic conventions, but are simultaneously impeded by limited economic means and trapped in a peripheral position where they always seem to lag behind, caught up in a city that doesn’t seem to live up to its own history. Yet, by making ambiguous works that resonate with the city in various ways, artists defy conventions, bypass limitations, and offer moments in which the world can be experienced anew.

    Institutional arrangements for resource recovery and reuse in the wastewater sector

    As populations grow and urban centres expand, meeting water demand and wastewater management requirements will become increasingly difficult. Goal 6 of the Sustainable Development Goals is to: ‘Ensure availability and sustainable management of water and sanitation for all’. Part of the approach to achieving this will be reusing wastewater, which will require a greater understanding of the institutional arrangements that support or obstruct reuse. This research was designed to achieve this and aimed to develop a set of factors that investors could use to assess the institutional feasibility of reuse in a given setting. The methodology combined a case study approach, focusing on wastewater systems in Bangalore, India and Hanoi, Vietnam, with triangle analysis to assess: the content of policies and laws; the structures (formal and informal) to implement laws and reuse projects; and the culture around acceptance and engagement in reuse. The reuse practices observed in Bangalore were treatment and use within apartments, centralized treatment and sale to industries, use in agriculture after natural attenuation, groundwater recharge and lake regeneration. In Hanoi the only reuse was indirect use from rivers feeding fish ponds and fields, although formal treatment and use is planned. Critically, both cities have environmental and water resources policies and laws that advocate reuse, as well as related local legislation. However, support for reuse is not reciprocated in industrial, agricultural or fisheries law, the result being that reuse does not always take place as planned. Legislation is required along the whole sanitation chain to the point of wastewater use. Structures to implement reuse are also vital. In Bangalore the water board has initiated reuse projects and established the New Initiatives Division but resources are a limiting factor. Effective institutions include expertise, manpower and financing mechanisms, which are lacking in both cities. The environment agency is also engaged in reuse through legislation on recycling in residential and commercial complexes, but guidance for users is inadequate, expectations are perceived to be excessive and monitoring is almost impossible. The driver for reuse is increasingly the benefits observed by users. In the case of apartments this is a reliable water source and reduced costs of water supply. As a result, a private sector in wastewater treatment is becoming established. The active civil society and strong, independent media are instrumental in providing information to potential users and holding authorities to account in Bangalore. Their absence in Hanoi is notable. In summary, institutional elements to be considered are: supportive legislation across all sectors; details of acceptable reuse, deterrents and inducements; budget allocation; structures to enable reuse; strong civil society, NGOs, courts, media and universities providing evidence of suitability and safety; donors and finance mechanisms; and stakeholders willing to use the products. Encumbrances are inconsistent or uncoordinated legislation, lack of cooperation and insufficient benefit sharing or perceptions of benefits along the reuse chain.

    A Global Perspective on Addressing Inclusion through the SDGs

    The future of our world over the next decade is being shaped by the Sustainable Development Goals (SDGs) that seek to uphold children’s wellbeing and, by their call to leave no one behind and to reach the furthest behind first, shine a spotlight on the world’s most vulnerable populations including children and adolescents living in poverty and exclusion. The transformative steps promised in the SDGs to ‘shift the world onto a sustainable and resilient path’ assume greater significance in the post-COVID-19 world where structural exclusions are starkly exposed and deep societal inequalities thickly underlined. This volume seeks to address the main drivers of poverty, exclusion, urbanization, and violence against children and adolescents and investigates how knowledge, information, data collection, measurement, and monitoring can support strategies and innovations to effectively implement the SDGs by drawing on data and experience from several countries across the world including Bangladesh, Colombia, Côte d’Ivoire, Ethiopia, Ghana, Guatemala, India, Indonesia, Iraq, Kenya, Malawi, MENA countries, the Netherlands, Pakistan, Sierra Leone, Suriname, and Thailand. As a result, it contributes to revealing the politics of social inclusion, offering policy proposals towards overcoming inequality and exclusion among children and adolescents.