Normalized Information Distance
The normalized information distance is a universal distance measure for
objects of all kinds. It is based on Kolmogorov complexity and thus
uncomputable, but there are ways to utilize it. First, compression algorithms
can be used to approximate the Kolmogorov complexity if the objects have a
string representation. Second, for names and abstract concepts, page count
statistics from the World Wide Web can be used. These practical realizations of
the normalized information distance can then be applied to machine learning
tasks, especially clustering, to perform feature-free and parameter-free data
mining. This chapter discusses the theoretical foundations of the normalized
information distance and both practical realizations. It presents numerous
examples of successful real-world applications based on these distance
measures, ranging from bioinformatics to music clustering to machine
translation.
Comment: 33 pages, 12 figures, pdf, in: Normalized information distance, in:
Information Theory and Statistical Learning, Eds. M. Dehmer, F.
Emmert-Streib, Springer-Verlag, New-York, To appea
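
The two practical realizations mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the chapter's own code: it assumes zlib as the stand-in compressor for the compression-based distance, and it takes page counts as plain arguments for the web-based distance instead of querying any search engine. The function names (compressed_size, ncd, ngd) and the example inputs are hypothetical.

```python
import math
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input, used as a rough proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Distance from page counts: fx and fy are hits for each term, fxy for both, n the index size."""
    if fxy == 0:
        # Terms that never co-occur are treated as infinitely far apart.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    other = bytes(range(256)) * 4
    print("ncd(text, text): ", ncd(text, text))
    print("ncd(text, other):", ncd(text, other))
    # Made-up page counts, for illustration only.
    print("ngd:", ngd(8000, 5000, 2000, 10**9))
```

With a real compressor the distance of an object to itself is small but generally not exactly zero, so the values are best used for relative comparison, for example as input to a clustering method.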
Information Distance: New Developments
In pattern recognition, learning, and data mining one obtains information
from information-carrying objects. This involves an objective definition of the
information in a single object, the information to go from one object to
another object in a pair of objects, the information to go from one object to
any other object in a multiple of objects, and the shared information between
objects. This is called "information distance." We survey a selection of new
developments in information distance.
Comment: 4 pages, LaTeX; Series of Publications C, Report C-2011-45,
Department of Computer Science, University of Helsinki, pp. 71-7