Deep Investigation of Cross-Language Plagiarism Detection Methods
This paper is a deep investigation of cross-language plagiarism detection
methods on a recently introduced open dataset, which contains parallel and
comparable collections of documents with multiple characteristics (different
genres, languages and sizes of texts). We investigate cross-language plagiarism
detection methods for 6 language pairs at 2 granularities of text units in
order to draw robust conclusions on the best methods while deeply analyzing
correlations across document styles and languages.
Comment: Accepted to BUCC (10th Workshop on Building and Using Comparable
Corpora), co-located with ACL 201
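The abstract does not name the individual methods it compares. As a hedged illustration of one common family of cross-language similarity measures (cosine similarity over character 3-gram counts, sometimes called CL-C3G), the Python sketch below scores how close a source text and a suspicious text are. It assumes both languages share a script and many cognates, and the function names are hypothetical; it is not a reproduction of the methods or settings evaluated in the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams after lowercasing and collapsing whitespace."""
    t = " ".join(text.lower().split())
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cl_c3g_similarity(source, suspicious):
    """Hypothetical helper: character 3-gram cosine similarity across languages."""
    return cosine(char_ngrams(source), char_ngrams(suspicious))
```

For example, an English phrase and its French translation share many cognate 3-grams ("org", "ati", "ion"), so they score far higher than an unrelated German sentence. The measure degrades for language pairs with different scripts or few cognates.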
Metrics for measuring distances in configuration spaces
To characterize molecular structures, we introduce configurational
fingerprint vectors which are counterparts of quantities used experimentally to
identify structures. The Euclidean distance between the configurational
fingerprint vectors satisfies the properties of a metric and can therefore
safely be used to measure dissimilarities between configurations in the
high-dimensional configuration space. We show that these metrics correlate well with
the RMSD between two configurations if this RMSD is obtained from a global
minimization over all translations, rotations and permutations of atomic
indices. We introduce a Monte Carlo approach to obtain this global minimum of
the RMSD between configurations.
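The sketch below illustrates the two ideas in this abstract under simplifying assumptions. Configurations are 2D point lists; the fingerprint is a hypothetical stand-in (the sorted list of interatomic distances, not the paper's configurational fingerprint); and the Monte Carlo search is a plain Metropolis walk over pairwise index swaps, with translations handled by centering and rotations by the closed-form optimal angle in 2D. All function names and parameter values are illustrative.

```python
import math
import random


def pairwise_fingerprint(points):
    """Hypothetical stand-in fingerprint: sorted interatomic distances.

    Invariant to translations, rotations and permutations of atom indices.
    """
    d = [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return sorted(d)


def fingerprint_distance(a, b):
    """Euclidean distance between the fingerprint vectors of two configurations."""
    fa, fb = pairwise_fingerprint(a), pairwise_fingerprint(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


def _centered(points):
    """Shift a 2D configuration so its centroid is at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return [(x - cx, y - cy) for x, y in points]


def rmsd_best_rotation(a, b):
    """RMSD of centered 2D lists a, b (fixed atom order) after optimally rotating a."""
    s_dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    s_cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    theta = math.atan2(s_cross, s_dot)  # closed-form optimal angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    sq = sum((c * ax - s * ay - bx) ** 2 + (s * ax + c * ay - by) ** 2
             for (ax, ay), (bx, by) in zip(a, b))
    return math.sqrt(sq / len(a))


def mc_min_rmsd(a, b, steps=5000, temperature=1.0, seed=0):
    """Metropolis search over index permutations for the minimal RMSD.

    temperature is in RMSD units and would need tuning for a real system.
    """
    rng = random.Random(seed)
    a, b = _centered(a), _centered(b)  # optimal translation: match centroids
    perm = list(range(len(a)))
    cur = best = rmsd_best_rotation(a, [b[k] for k in perm])
    for _ in range(steps):
        i, j = rng.sample(range(len(a)), 2)
        perm[i], perm[j] = perm[j], perm[i]  # propose swapping two indices
        new = rmsd_best_rotation(a, [b[k] for k in perm])
        if new <= cur or rng.random() < math.exp(-(new - cur) / temperature):
            cur = new
            best = min(best, cur)
        else:
            perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
    return best
```

Because relabeling atoms does not move the centroid, centering once gives the optimal translation for every permutation. In 3D the rotation step would need a Kabsch-style SVD instead of the closed-form angle.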
Fingerprint Analysis with Marked Point Processes
We present a framework for fingerprint matching based on marked point process
models. An efficient Monte Carlo algorithm is developed to calculate the
marginal likelihood ratio for the hypothesis that two observed prints originate
from the same finger against the hypothesis that they originate from different
fingers. Our model achieves good performance on a NIST-FBI fingerprint
database of 258 matched fingerprint pairs.
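To make the likelihood-ratio idea concrete, the following Python sketch estimates a log marginal likelihood ratio by Monte Carlo, integrating over an unknown alignment shift drawn from its prior. It is a toy model, not the paper's marked point process. It assumes a known one-to-one correspondence between minutiae, Gaussian positional noise, a binary minutia type as the mark (hypothetically 0 = ridge ending, 1 = bifurcation), and uniform placement of points under the different-finger hypothesis. Every name and parameter value is an assumption for illustration.

```python
import math
import random


def log_gauss2(dx, dy, sigma):
    """Log density of an isotropic 2D Gaussian evaluated at offset (dx, dy)."""
    return -(dx * dx + dy * dy) / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)


def log_likelihood_ratio(x, y, area, sigma=1.0, shift_range=10.0,
                         keep_type=0.95, n_samples=20000, seed=0):
    """Toy Monte Carlo estimate of log p(y | x, same) - log p(y | different).

    x, y: equal-length lists of (px, py, minutia_type) with known correspondence.
    """
    rng = random.Random(seed)
    # Different fingers: each point uniform over the window, type uniform over 2 classes.
    log_p_diff = len(y) * (-math.log(area) + math.log(0.5))
    log_terms = []
    for _ in range(n_samples):
        # Draw the nuisance alignment shift from its uniform prior.
        tx = rng.uniform(-shift_range, shift_range)
        ty = rng.uniform(-shift_range, shift_range)
        lp = 0.0
        for (xx, xy, xt), (yx, yy, yt) in zip(x, y):
            lp += log_gauss2(yx - xx - tx, yy - xy - ty, sigma)
            lp += math.log(keep_type if xt == yt else 1.0 - keep_type)
        log_terms.append(lp)
    # Log-mean-exp of the sampled likelihoods approximates the marginal under "same".
    m = max(log_terms)
    log_p_same = m + math.log(sum(math.exp(v - m) for v in log_terms) / n_samples)
    return log_p_same - log_p_diff
```

A positive return value favors the same-finger hypothesis. Averaging in log space (log-mean-exp) avoids underflow when the per-sample likelihoods are very small.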