Dynamic Thresholding Mechanisms for IR-Based Filtering in Efficient Source Code Plagiarism Detection
To address time inefficiency in string-matching-based source code plagiarism
detection, only potential pairs are compared. Potentiality is defined through a
fast yet order-insensitive similarity measurement (adapted from Information
Retrieval), and only pairs whose similarity degrees are higher than or equal to
a particular threshold are selected. Defining such a threshold is not a trivial
task, since the threshold should lead to a high efficiency improvement and a low
effectiveness reduction (if a reduction is unavoidable). This paper proposes two
thresholding mechanisms, namely the range-based and the pair-count-based
mechanism, that dynamically tune the threshold based on the distribution of the
resulting similarity degrees. According to our evaluation, both mechanisms are
more practical than manual threshold assignment since they are more
proportional to efficiency improvement and effectiveness reduction.
Comment: The 2018 International Conference on Advanced Computer Science and
Information Systems (ICACSIS
- …
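
The filtering idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption: cosine similarity over token frequencies stands in for the IR-adapted measure, the range-based threshold is read as a fraction of the span between the lowest and highest observed similarity, and the pair-count-based threshold is read as the similarity of the k-th highest pair. All function and parameter names (`cosine_similarity`, `range_based_threshold`, `pair_count_based_threshold`, `select_potential_pairs`, `ratio`, `pair_count`) are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_similarity(tokens_a, tokens_b):
    """Order-insensitive similarity of two token lists (assumed measure)."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm = (sqrt(sum(v * v for v in freq_a.values()))
            * sqrt(sum(v * v for v in freq_b.values())))
    return dot / norm if norm else 0.0


def range_based_threshold(similarities, ratio):
    """Assumed reading: a point at `ratio` of the observed similarity range."""
    lowest, highest = min(similarities), max(similarities)
    return lowest + ratio * (highest - lowest)


def pair_count_based_threshold(similarities, pair_count):
    """Assumed reading: the similarity of the pair ranked `pair_count`-th."""
    ranked = sorted(similarities, reverse=True)
    k = max(1, min(pair_count, len(ranked)))
    return ranked[k - 1]


def select_potential_pairs(submissions, threshold_fn):
    """Keep only pairs whose similarity is >= the dynamically tuned threshold."""
    scores = {
        (a, b): cosine_similarity(submissions[a], submissions[b])
        for a, b in combinations(sorted(submissions), 2)
    }
    threshold = threshold_fn(list(scores.values()))
    return [pair for pair, score in scores.items() if score >= threshold]


submissions = {
    "s1": ["int", "x", "=", "1"],
    "s2": ["x", "=", "1", "int"],  # same tokens as s1, different order
    "s3": ["print", "hello"],
}
by_range = select_potential_pairs(
    submissions, lambda sims: range_based_threshold(sims, 0.5))
by_count = select_potential_pairs(
    submissions, lambda sims: pair_count_based_threshold(sims, 1))
```

In this toy input both variants keep only the pair `("s1", "s2")`, which would then be passed on to the slower string-matching comparison. Because both thresholds are computed from the observed scores rather than fixed in advance, they adapt to each dataset's similarity distribution, which is the property the abstract attributes to the proposed mechanisms.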