
18 research outputs found

Revisiting the challenges and surveys in text similarity matching and detection methods

Author: Kusrini Kusrini
Muhammad Alva Hendi
Oyong Irwan
Publication venue: Universitas Ahmad Dahlan, Kampus 3
Publication date: 30/09/2022

The massive amount of information available on the internet has revolutionized the field of natural language processing. One of the challenges is estimating the similarity between texts. This remains an open research problem, although various studies have proposed new methods over the years. This paper surveyed and traced the primary studies in the field of text similarity, aiming to give a broad overview of existing issues, applications, and methods in text similarity research. It identified four issues and several applications of text similarity matching, and classified current studies into intrinsic, extrinsic, and hybrid approaches. We then identified the methods and classified them into lexical, syntactic, semantic, structural, and hybrid similarity. Furthermore, this study analyzed and discussed method improvements, current limitations, and open challenges on this topic as directions for future research.

Journal of Education and Learning (EduLearn)
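Of the categories the survey lists, lexical similarity is the easiest to illustrate. The sketch below scores two texts by the Jaccard overlap of their word sets. It is a minimal illustration under stated assumptions (simple regex tokenization, lowercasing, no stemming); `tokenize` and `jaccard_similarity` are hypothetical names, not functions from the paper.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens (regex \\w+ runs)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Lexical similarity |A & B| / |A | B| over word sets; 0.0 if both are empty."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


if __name__ == "__main__":
    s1 = "The cat sat on the mat"
    s2 = "A cat sat on a mat"
    print(round(jaccard_similarity(s1, s2), 3))
```

Word-set overlap ignores word order and synonymy, which is one reason the survey separates lexical methods from syntactic and semantic ones.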

Plagiarism detection for Indonesian texts

Author: Krisnawati Lucia Dwi
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 18/05/2016

As plagiarism becomes an increasing concern for Indonesian universities and research centers, the need for automatic plagiarism checkers is becoming more pressing. However, research on Plagiarism Detection Systems (PDS) for Indonesian documents has not been well developed: most studies deal with detecting duplicate or near-duplicate documents, do not address the problem of retrieving source documents, or tend to measure document similarity globally. As a result, the systems produced by this research cannot point to the exact locations of similar passage pairs. In addition, no public, standard corpus has been available for evaluating PDS on Indonesian texts. To address the weaknesses of previous research, this thesis develops a plagiarism detection system that executes various methods for the stages of plagiarism detection in a workflow system. In the retrieval stage, a novel document feature called the phraseword is introduced and used alongside word unigrams and character n-grams to address the problem of retrieving source documents whose contents are partially copied or obfuscated in a suspicious document. The detection stage, which uses a two-step paragraph-based comparison, aims to detect and locate source-obfuscated passage pairs. The seeds for matching source-obfuscated passage pairs are based on locally weighted significant terms, in order to capture paraphrased and summarized passages. In addition to this system, an evaluation corpus was created through simulation by human writers and through random algorithmic generation. Using this corpus, the proposed methods were evaluated in three scenarios. In the first scenario, which evaluated source retrieval performance, some methods using phraseword and token features achieved the optimal recall of 1.
In the second scenario, which evaluated detection performance, our system was compared with Alvi's algorithm at four measurement levels: character, passage, document, and case. The results showed that methods using tokens as seeds scored higher than Alvi's algorithm at all four levels, in both artificial and simulated plagiarism cases. In case detection, our systems outperform Alvi's algorithm in recognizing copied, shaken, and paraphrased passages; however, Alvi's recognition rate on summarized passages is marginally higher than ours. The third scenario showed the same trend, except that Alvi's algorithm achieved higher precision than our system at the character and paragraph levels. Because some methods in our system produced higher Plagdet scores than Alvi's algorithm, this study has fulfilled its objective of implementing a competitive state-of-the-art algorithm for detecting plagiarism in Indonesian texts. When run on our test corpus, Alvi's highest recall, precision, Plagdet, and no-plagiarism detection-rate scores correspond to its scores on the PAN'14 corpus. Thus, this study has contributed a standard evaluation corpus for assessing PDS for Indonesian documents. In addition, it contributes a source retrieval algorithm that introduces phrasewords as document features, and a paragraph-based text alignment algorithm that relies on two different strategies, one of which applies local word weighting, as used in text summarization, to select seeds both for discriminating candidate paragraph pairs and for the matching process. The proposed detection algorithm produces almost no multiple detections, which adds to its strength.

Digitale Hochschulschriften der LMU
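The retrieval feature and evaluation score mentioned in the thesis abstract can be illustrated generically: character n-gram containment as a source-retrieval signal, and the Plagdet score as commonly defined in the PAN evaluation framework (F1 divided by log2(1 + granularity)). This is a sketch under stated assumptions, not the thesis's phraseword feature or its paragraph-based alignment; `char_ngrams`, `containment`, and `plagdet` are hypothetical names, and n = 3 is an arbitrary default.

```python
import math


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Set of character n-grams of the lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def containment(suspicious: str, source: str, n: int = 3) -> float:
    """Fraction of the suspicious text's n-grams that also occur in the source.

    Containment (rather than symmetric Jaccard) suits partial copying: a short
    copied passage scores high even if the source document is much longer.
    """
    sus = char_ngrams(suspicious, n)
    if not sus:
        return 0.0
    return len(sus & char_ngrams(source, n)) / len(sus)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """PAN-style Plagdet: F1 / log2(1 + granularity), granularity >= 1.

    Granularity 1 means each plagiarism case is detected exactly once; larger
    values penalize splitting one case into multiple detections.
    """
    if granularity < 1:
        raise ValueError("granularity must be >= 1")
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)


if __name__ == "__main__":
    src = "Plagiarism is an increasing concern for universities."
    sus = "Plagiarism is an increasing concern for research centers."
    print(round(containment(sus, src), 3))
    print(round(plagdet(0.9, 0.8, 1.0), 3))
```

The granularity term connects to the abstract's remark that the detector produces almost no multiple detections: when granularity is 1, Plagdet reduces to plain F1.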

On retrieving intelligently plagiarized documents using semantic similarity

Author: Hussain Syed Fawad
Suryani Asif
Publication venue: Elsevier BV
Publication date: 01/10/2015

University of Birmingham Research Portal

Musical Landscapes: Theophile Gautier and the Evolution of Nineteenth Century French Poetry

Author: Milstein Dana
Publication venue: CUNY Academic Works
Publication date: 03/06/2014

Theophile Gautier's first edition of Emaux et camees (1852) marks the juncture at which Romantic, Neoclassical, and nascent Symbolist poetic theories converged under the umbrella ideology of Parnassianism. Emaux et camees synthesizes the aesthetics promoted by these diverse groups, primarily by 1) using musical and painterly language, 2) emphasizing correspondences among arts, and 3) paradoxically demanding an attention to form and the artist's labor while also emphasizing art's inutility during a century characterized by Progress. Gautier's Emaux et camees bridges painterly and musical poetics to create a new model for poetry. While the vocabulary of painting captivated many nineteenth-century writers, music became increasingly admired by poets because of its freedom from representation, and as an intention-less language. Musical poets indemnified the mantra "art for art's sake" and touted the intermingling of art forms, belief systems, and cultural practices during a time when usefulness, authoritarian rule, and homogeny were staunchly reinforced in the political and public spheres. Emaux et camees appeared in 1852, marking a point of departure for poetry. Gautier preserved earlier poetic principles, but also invested a robust work ethic and a devotion to form in his collection. Numerous offshoot poetic groups arose as a result of Gautier, who had reclaimed music's nuanced, fragmented, performative, and anti-utilitarian nature for poetry and poetics.

City University of New York

Science in the Forest, Science in the Past


This collection brings together leading anthropologists, historians, philosophers, and artificial-intelligence researchers to discuss the sciences and mathematics used in various Eastern, Western, and Indigenous societies, both ancient and contemporary. The authors analyze prevailing assumptions about these societies and propose more faithful, sensitive analyses of their ontological views about reality, a step toward mutual understanding and translatability across cultures and research fields. Science in the Forest, Science in the Past is a pioneering interdisciplinary exploration that will challenge the way readers interested in sciences, mathematics, humanities, social research, computer sciences, and education think about deeply held notions of what constitutes reality, how it is apprehended, and how to investigate it.

OAPEN Library

El Che vive: memory, cinema, art and politics

Author: Maya Neto Olegario da Costa
Publication date: 01/01/2020

Doctoral thesis, Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Programa de Pós-Graduação em Inglês: Estudos Linguísticos e Literários, Florianópolis, 2020.

Abstract: Che Guevara, who died more than fifty years ago, keeps resurging through images. This fascination with Che's images is explained by the concept of the anxiety of remembering and non-remembering, caused by the demand for remembrance and redemption (in the Benjaminian sense; LÖWY; BENJAMIN, 2005), which is expressed through Che's ghostly look. Such a ghostly look is ambivalent: it can potentially lead to artistic imaginations and emancipatory actions that recreate Che, or to capitalist appropriations and other attempts to control Che's images. In this doctoral dissertation, bridges are built between Marxism and decolonial thought, such as between Arendt's concept of creative action (ARENDT, 1998), Bloch's anticipatory consciousness (BLOCH, 1996), and ancestral cosmovision (WILSON, 2001; LACLAU, 2016; ANZALDUA, 2012), and through a broad understanding of the concept of alienation/fetish. Several contemporary examples of artistic imaginations and emancipatory actions are discussed, challenging the rhetoric of the supposed political irrelevance of Che's images. Attempts at appropriation by capitalist corporations, by a Nazi movement, and by a graphic artist are also discussed on the basis of a broad redefinition of the theory of alienation. A short story and two poems of my own about Che are also discussed in the dissertation, as well as five films: The Last Hours of Che Guevara (THE LAST HOURS, 2016), El Dia que Me Quieras (EL DIA, 1997), El Che de los Gays (EL CHE DE LOS GAYS, 2004), Personal Che (PERSONAL CHE, 2007), and Che! (1969).

Repositório Institucional da UFSC

Cherry Valley and the Uses of Memory

Author: Paradis Stephen
Publication date: 01/01/2009

Excerpt from Introduction: The Uses of Memory. One of the distinctive features of historical maps dealing with encounters on the frontier between two separate cultures is a unique symbol indicating massacres. Throughout its prehistory and history, the United States comprised one great frontier that saw many such exchanges. Some of these were mortal, and many of them, being one-sided, tended to be called massacres by the losing side in an attempt to salvage some moral high ground. However, no one disputes that what happened at Cherry Valley, New York, on 11 November 1778, was a massacre. On that date, Iroquois and Loyalist Rangers raided the hamlet of Cherry Valley on the New York frontier, south of the Mohawk Valley. The raid destroyed the settlement and forced the evacuation of the fort. Forty people died, most of them unarmed civilians. This minor episode seemed to give rise to a considerable body of work, comprising histories from diverse viewpoints and works of fiction, including dramatic literature and motion pictures. The first question that arose from this material, in the course of preparing research for a historical paper, was simple and factual: 1. Is it possible to find the truth of what happened that day?...

Master's; College of Arts and Sciences: Liberal Studies; University of Michigan
http://deepblue.lib.umich.edu/bitstream/2027.42/117732/1/Paradis.pd

Deep Blue Documents at the University of Michigan