AntiPlag: Plagiarism Detection on Electronic Submissions of Text Based Assignments
Plagiarism is a growing issue in academia and a constant concern
in universities and other academic institutions. The situation is becoming even
worse with the availability of ample resources on the web. This paper focuses
on creating an effective and fast tool for plagiarism detection for text-based
electronic assignments. Our plagiarism detection tool, named AntiPlag, is
developed using the tri-gram sequence matching technique. Three sets of text-based
assignments were tested by AntiPlag and the results were compared against
an existing commercial plagiarism detection tool. AntiPlag showed better
results in terms of false positives compared to the commercial tool due to the
pre-processing steps performed in AntiPlag. In addition, to improve the
detection latency, AntiPlag applies a data clustering technique, making it four
times faster than the commercial tool considered. AntiPlag could be used to
isolate plagiarized text-based assignments from non-plagiarized assignments
easily. Therefore, we present AntiPlag, a fast and effective tool for
plagiarism detection on text-based electronic assignments.
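The abstract names tri-gram sequence matching but gives no details, so the following is only a minimal sketch of one common form of the idea: word tri-grams compared as sets. The function names `word_trigrams` and `trigram_overlap`, the lowercasing and tokenization, and the overlap score are all assumptions for illustration, not AntiPlag's actual pre-processing or scoring.

```python
import re


def word_trigrams(text):
    """Return the set of word tri-grams in text (assumed normalization: lowercase, alphanumerics only)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # A text with fewer than three words yields an empty set.
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def trigram_overlap(submission, source):
    """Share of the submission's tri-grams that also occur in the source (0.0 to 1.0)."""
    grams_sub = word_trigrams(submission)
    grams_src = word_trigrams(source)
    if not grams_sub:
        return 0.0
    return len(grams_sub & grams_src) / len(grams_sub)
```

A high overlap between two assignments would flag the pair for closer inspection; any threshold is a tuning choice that the abstract does not specify.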
Text Similarity from Image Contents using Statistical and Semantic Analysis Techniques
Plagiarism detection is one of the most researched areas in the Natural
Language Processing (NLP) community. A good plagiarism detection system covers all the
NLP methods, including semantics, named entities, and paraphrases, and produces
detailed plagiarism reports. Detecting cross-lingual plagiarism requires
deep knowledge of various advanced methods and algorithms to perform effective
text similarity checking. Plagiarists are also becoming more sophisticated
at concealing their identity to avoid being caught. They
evade detection with techniques such as paraphrasing,
synonym replacement, mismatched citations, and translation from one language to
another. Image Content Plagiarism Detection (ICPD) has gained importance,
utilizing advanced image content processing to identify instances of plagiarism
and to ensure the integrity of image content. The issue of plagiarism extends
beyond textual content, as images such as figures, graphs, and tables also have
the potential to be plagiarized. However, image content plagiarism detection
remains an unaddressed challenge. Therefore, there is a critical need to
develop methods and systems for detecting plagiarism in image content. In this
paper, a system is implemented to detect plagiarism from the contents of
images such as figures, graphs, and tables. Alongside statistical algorithms
such as Jaccard and Cosine, introducing semantic algorithms such as LSA, BERT, and
WordNet gave better performance, detecting plagiarism efficiently and accurately.
Comment: NLPTT2023 publication, 10 pages
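The two statistical measures named in the abstract can be sketched as follows. This is a minimal illustration that assumes the text has already been extracted from the image (the extraction step is outside the sketch); the helper names `tokens`, `jaccard`, and `cosine` and the bag-of-words representation are assumptions, not the paper's implementation. The semantic methods (LSA, BERT, WordNet) are not shown.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens (assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity: shared distinct words divided by all distinct words."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity between term-frequency vectors of the two texts."""
    vec_a, vec_b = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Both measures depend only on surface word overlap, so a paraphrased caption or table label would score low; that limitation is presumably why the abstract adds semantic algorithms.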
The Hegelian Inquiring System and Critical Triangulation Tools for the Internet Information Slave
This paper discusses informing, i.e. increasing people’s understanding of reality by providing representations of this reality. The Hegelian inquiry system is used to explain the nature of informing. Understanding the Hegelian inquiry system is essential for making informed decisions where the reality can be ambiguous and where sources of bias and manipulation have to be understood for increasing the level of free-informed choice. This inquiry system metaphorically identifies information masters and slaves, and we propose critical dialectic information triangulation (CDIT) tools for information slaves (i.e. non-experts) in dialectic interactions with informative systems owned by supposed information masters. The paper concludes with suggestions for further research on informative triangulation tools for the internet and management information systems.
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs
Binary code analysis allows analyzing binary code without having access to
the corresponding source code. A binary, after disassembly, is expressed in an
assembly language. This inspires us to approach binary analysis by leveraging
ideas and techniques from Natural Language Processing (NLP), a rich area
focused on processing text of various natural languages. We notice that binary
code analysis and NLP share many analogous topics, such as semantics
extraction, summarization, and classification. This work utilizes these ideas
to address two important code similarity comparison problems. (I) Given a pair
of basic blocks for different instruction set architectures (ISAs), determining
whether their semantics is similar or not; and (II) given a piece of code of
interest, determining if it is contained in another piece of assembly code for
a different ISA. The solutions to these two problems have many applications,
such as cross-architecture vulnerability discovery and code plagiarism
detection. We implement a prototype system INNEREYE and perform a comprehensive
evaluation. A comparison between our approach and existing approaches to
Problem I shows that our system outperforms them in terms of accuracy,
efficiency and scalability. The case studies utilizing the system
demonstrate that our solution to Problem II is effective. Moreover, this
research showcases how to apply ideas and techniques from NLP to large-scale
binary code analysis.
Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium
201
Sensing Textual Plagiarism
This Final Year Project (FYP) is about sensing textual plagiarism. To realize this, an
application capable of detecting plagiarism in a
textual document is to be developed. The main focus of this project is to study
how to detect plagiarism in a textual document. Word-for-word plagiarism is the most
obvious and serious form of plagiarism and can be categorized as a form of
direct stealing of another's work, without significant alteration and without consent. Fact
finding is carried out in order to perform the study on plagiarism. This project will
incorporate the Smith-Waterman algorithm, which is a classical tool for the
identification and quantification of local similarities in biological sequences. As a
result, the significance of this project is the availability of an application to sense the
widespread plagiarism that often occurs in valuable documents, articles, and
journals.
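The Smith-Waterman algorithm mentioned above can be sketched for text by treating words as the sequence elements. This is a minimal illustration, not the project's application: the function name `smith_waterman`, the word-level tokenization, and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + step
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            # Clamping at zero lets an alignment restart anywhere (local, not global).
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best
```

Called on `"the cat sat on the mat".split()` and `"yesterday the cat sat quietly".split()`, the function scores the shared run "the cat sat" as three matches, which is the kind of word-for-word copying the abstract targets.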