
Hamming-like distances for ill-defined strings in linguistic classification

Abstract

Ill-defined strings often occur in the soft sciences, e.g. in linguistics or in biology. In this paper we consider strings of length l which have in each position one of three symbols: 0 (false), 1 (true), or b (irrelevant). We tackle some generalisations of the usual Hamming distance between crisp binary strings which were recently used in computational linguistics. We comment on their metric properties, since these should guide the selection of the clustering algorithm to be used for language classification. The concluding section is devoted to future work, and the string approach, as currently pursued, is compared to alternative approaches.
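To make the setting concrete, the following is a minimal sketch of one possible Hamming-like distance on strings over {0, 1, b}. It is not the definition used in the paper: the function name `hamming_like` and the parameter `cost_irrelevant` are hypothetical, and the rule that a position involving b contributes a fixed cost is an assumption made purely for illustration.

```python
IRRELEVANT = "b"          # symbol for an irrelevant position
ALPHABET = {"0", "1", IRRELEVANT}


def hamming_like(x: str, y: str, cost_irrelevant: float = 0.0) -> float:
    """Hypothetical Hamming-like distance on strings over {0, 1, b}.

    Crisp positions (both 0 or 1) count 1 when they differ, as in the
    usual Hamming distance. Any position where at least one string has
    'b' contributes `cost_irrelevant` (an illustrative assumption).
    """
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    if not (set(x) <= ALPHABET and set(y) <= ALPHABET):
        raise ValueError("strings must use only the symbols 0, 1, b")

    total = 0.0
    for a, c in zip(x, y):
        if a == IRRELEVANT or c == IRRELEVANT:
            total += cost_irrelevant
        elif a != c:
            total += 1.0
    return total


# On crisp strings this reduces to the usual Hamming distance.
d_crisp = hamming_like("010", "111")

# With cost 0 for irrelevant positions, the triangle inequality can fail:
# d("0","b") + d("b","1") is smaller than d("0","1").
d_xy = hamming_like("0", "b")
d_yz = hamming_like("b", "1")
d_xz = hamming_like("0", "1")
triangle_holds = d_xz <= d_xy + d_yz
```

Under this particular assumption the function is not a metric, which illustrates why metric properties matter when choosing a clustering algorithm: methods that rely on the triangle inequality may behave unexpectedly with such a distance.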
