A very promising idea for fast searching in traditional and
multimedia databases is to map objects into points in k-d space, using k
feature-extraction functions provided by a domain expert [Jag91]. Thus,
we can subsequently use highly fine-tuned spatial access methods (SAMs)
to answer several types of queries, including the 'Query By Example' type
(which translates to a range query), the 'all pairs' query (which
translates to a spatial join [BKSS94]), and the nearest-neighbor or
best-match query.
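To make the mapping concrete, here is a minimal sketch that assumes objects are short numeric sequences and uses two hypothetical feature-extraction functions (`mean_feature`, `range_feature`) in place of a domain expert's. A Query By Example becomes a range query around the query object's point. The helper names are illustrative only, and a linear scan stands in for the SAM that would answer the query in practice.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


# Hypothetical feature-extraction functions (stand-ins for a domain expert's):
# each maps an object to one coordinate of its k-d point.
def mean_feature(obj: Sequence[float]) -> float:
    return sum(obj) / len(obj)


def range_feature(obj: Sequence[float]) -> float:
    return max(obj) - min(obj)


FEATURES: List[Callable[[Sequence[float]], float]] = [mean_feature, range_feature]


def to_point(obj: Sequence[float]) -> Point:
    """Map an object to a point in k-d space, with k = len(FEATURES)."""
    return tuple(f(obj) for f in FEATURES)


def build_index(db: Dict[str, Sequence[float]]) -> Dict[str, Point]:
    """Precompute every object's point (a SAM would organize these spatially)."""
    return {oid: to_point(obj) for oid, obj in db.items()}


def range_query(index: Dict[str, Point], query_obj: Sequence[float],
                radius: float) -> List[str]:
    """Query By Example as a range query: ids whose points lie within `radius`
    (Euclidean) of the query's point. A linear scan stands in for the SAM."""
    q = to_point(query_obj)
    return [oid for oid, p in index.items() if math.dist(p, q) <= radius]


if __name__ == "__main__":
    index = build_index({"a": [1, 2, 3], "b": [10, 11, 12], "c": [1, 2, 4]})
    print(range_query(index, [1, 2, 3], 1.5))  # ['a', 'c']
```

Swapping the linear scan for an actual SAM changes only `range_query`; the feature functions and the point representation stay the same.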
However, designing feature-extraction functions can be hard. It is
relatively easier for a domain expert to assess the similarity (or
distance) of two objects. Given only this distance information, though,
it is not obvious how to map objects into points.
This is exactly the topic of this paper. We describe a fast
algorithm to map objects into points in some k-dimensional space (k is
user-defined), such that the dissimilarities are preserved. There are two
benefits from this mapping: (a) efficient retrieval, in conjunction with
a SAM, as discussed above, and (b) visualization and data-mining: the
objects can now be plotted as points in 2-d or 3-d space, revealing
potential clusters, correlations among attributes, and other regularities
that data-mining looks for.
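The abstract does not spell out the algorithm, so the sketch below shows one way, under our own assumptions, to obtain k coordinates from a distance function alone in time linear in N. It picks two far-apart pivot objects, places every object on the line through them with the cosine law, and repeats on the residual distances for each of the k axes. The function name `project_by_distances`, the pivot heuristic, and the seed are hypothetical; treat this as an illustration rather than the paper's exact procedure.

```python
import math
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def project_by_distances(
    objs: Sequence[T], dist: Callable[[T, T], float], k: int, seed: int = 0
) -> List[List[float]]:
    """Hypothetical helper: give each object k coordinates using only `dist`."""
    n = len(objs)
    coords = [[0.0] * k for _ in range(n)]
    rng = random.Random(seed)

    def residual(i: int, j: int, axis: int) -> float:
        # Squared distance left over after removing the first `axis`
        # coordinates (Pythagoras); clipped at 0 to absorb rounding error.
        d2 = dist(objs[i], objs[j]) ** 2
        for m in range(axis):
            d2 -= (coords[i][m] - coords[j][m]) ** 2
        return max(d2, 0.0)

    for axis in range(k):
        # Assumed pivot heuristic: from a random object, jump to the object
        # farthest from it, then to the object farthest from that one.
        a = rng.randrange(n)
        b = max(range(n), key=lambda j: residual(a, j, axis))
        a = max(range(n), key=lambda j: residual(b, j, axis))
        dab2 = residual(a, b, axis)
        if dab2 == 0.0:
            break  # nothing left to separate; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine law: d(b,i)^2 = d(a,i)^2 + d(a,b)^2 - 2 * x_i * d(a,b),
            # solved for x_i, the position of i's projection on line a-b.
            coords[i][axis] = (residual(a, i, axis) + dab2
                               - residual(b, i, axis)) / (2 * dab)
    return coords


if __name__ == "__main__":
    pts = [(0, 0), (6, 0), (0, 2), (6, 2), (3, 1)]
    coords = project_by_distances(pts, math.dist, k=2)
    # For exactly 2-d Euclidean input, k=2 should reproduce every distance.
    err = max(abs(math.dist(coords[i], coords[j]) - math.dist(pts[i], pts[j]))
              for i in range(len(pts)) for j in range(len(pts)))
    print(f"max distance error: {err:.2e}")
```

Each of the k passes makes a constant number of sweeps over the N objects, so the number of `dist` calls grows linearly in N for a fixed k. No pairwise distance matrix is ever built.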
We introduce an older method from pattern recognition, namely
Multi-Dimensional Scaling (MDS) [Tor52]; although it is unsuitable for
indexing, we use it as a yardstick for our method. Then, we propose a much
faster algorithm to solve the problem at hand, which in addition allows for
indexing. Experiments on real and synthetic data indeed show that the
proposed algorithm is significantly faster than MDS (being linear, as
opposed to quadratic, in the database size N), while it manages to
preserve distances and the overall structure of the data set.
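For comparison, here is a minimal sketch of classical (metric) MDS. This abstract does not say which MDS variant serves as the yardstick, so classical scaling is our assumption. The name `classical_mds` is hypothetical, and plain power iteration with deflation is used only to stay dependency-free; a real implementation would call a linear-algebra library's eigensolver. The step to notice is the N x N squared-distance matrix built up front: every pair of objects is compared, so the cost grows at least quadratically in N.

```python
import math
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def matvec(m: List[List[float]], v: List[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def classical_mds(
    objs: Sequence[T], dist: Callable[[T, T], float], k: int,
    iters: int = 500, seed: int = 0,
) -> List[List[float]]:
    n = len(objs)
    # N x N squared distances: N^2 calls to `dist`, the quadratic step.
    d2 = [[dist(objs[i], objs[j]) ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]  # equals column means (symmetric)
    grand = sum(row_mean) / n
    # Double centering, B = -1/2 * J D^2 J: a Gram (inner-product) matrix.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        # Power iteration for the largest remaining eigenpair of B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(b, v)))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # no positive eigenvalues left
        s = math.sqrt(lam)
        for i in range(n):
            coords[i][axis] = s * v[i]
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


if __name__ == "__main__":
    pts = [(0, 0), (6, 0), (0, 2), (6, 2), (3, 1)]
    coords = classical_mds(pts, math.dist, k=2)
    err = max(abs(math.dist(coords[i], coords[j]) - math.dist(pts[i], pts[j]))
              for i in range(len(pts)) for j in range(len(pts)))
    print(f"max distance error: {err:.2e}")
```

On exactly Euclidean input, classical MDS recovers the configuration up to rotation, reflection, and translation. The price is the quadratic distance matrix, plus the eigen-decomposition on top of it.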
(Also cross-referenced as UMIACS-TR-94-132)