Homoglyphs can be used to disguise plagiarized text by
replacing letters in the source text with visually identical letters from other
scripts. Most current plagiarism detection systems cannot detect
plagiarism in text that has been obfuscated with homoglyphs. In this
work, we present two alternative approaches for detecting plagiarism in
homoglyph-obfuscated texts. The first approach utilizes the Unicode list
of confusables to replace homoglyphs with visually identical letters, while
the second approach uses a similarity score computed using normalized
Hamming distance to match homoglyph-obfuscated words with source
words. Empirical testing on datasets from PAN-2015 shows that both
approaches perform equally well for plagiarism detection in
homoglyph-obfuscated texts.
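
As an illustration of the two approaches described above, the following is a minimal sketch and not the implementation evaluated in this work. The mapping CONFUSABLES is a tiny hand-picked subset standing in for the Unicode list of confusables, and the names replace_homoglyphs, hamming_similarity, and best_match, as well as the threshold of 0.5, are assumptions introduced only for this example. The similarity is taken here as one minus the normalized Hamming distance, with words of unequal length treated as non-matching.

```python
# Hypothetical, hand-picked subset of homoglyph -> Latin mappings.
# Assumption: a real system would load the full Unicode confusables list.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small letter a
    "\u0435": "e",  # Cyrillic small letter ie
    "\u043e": "o",  # Cyrillic small letter o
    "\u0440": "p",  # Cyrillic small letter er
    "\u0441": "c",  # Cyrillic small letter es
    "\u0456": "i",  # Cyrillic small letter Byelorussian-Ukrainian i
}


def replace_homoglyphs(text):
    """Approach 1: map each known homoglyph back to its Latin look-alike."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


def hamming_similarity(a, b):
    """Approach 2: 1 - (mismatching positions / length).

    Assumption: words of different length (or empty words) score 0.0,
    since Hamming distance is only defined for equal-length strings.
    """
    if len(a) != len(b) or not a:
        return 0.0
    mismatches = sum(x != y for x, y in zip(a, b))
    return 1.0 - mismatches / len(a)


def best_match(word, vocabulary, threshold=0.5):
    """Return the most similar source word, or None if below the threshold."""
    best = max(vocabulary, key=lambda w: hamming_similarity(word, w), default=None)
    if best is not None and hamming_similarity(word, best) >= threshold:
        return best
    return None


if __name__ == "__main__":
    obfuscated = "pl\u0430gi\u0430rism"  # both "a" letters are Cyrillic
    print(replace_homoglyphs(obfuscated))
    print(best_match(obfuscated, ["plagiarism", "detection", "homoglyphs"]))
```

The first function needs a mapping that covers every homoglyph in use, whereas the second tolerates unknown substitutions as long as enough characters of the word remain unchanged.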