16,192 research outputs found
S-TREE: Self-Organizing Trees for Data Clustering and Online Vector Quantization
This paper introduces S-TREE (Self-Organizing Tree), a family of models that use unsupervised learning to construct hierarchical representations of data and online tree-structured vector quantizers. The S-TREE1 model, which features a new tree-building algorithm, can be implemented with various cost functions. An alternative implementation, S-TREE2, which uses a new double-path search procedure, is also developed. S-TREE2 implements an online procedure that approximates an optimal (unstructured) clustering solution while imposing a tree-structure constraint. The performance of the S-TREE algorithms is illustrated with data clustering and vector quantization examples, including a Gauss-Markov source benchmark and an image compression application. S-TREE performance on these tasks is compared with the standard tree-structured vector quantizer (TSVQ) and the generalized Lloyd algorithm (GLA). The image reconstruction quality with S-TREE2 approaches that of GLA while taking less than 10% of computer time. S-TREE1 and S-TREE2 also compare favorably with the standard TSVQ in both the time needed to create the codebook and the quality of image reconstruction.
Office of Naval Research (N00014-95-10409, N00014-95-0G57
Sources of Superlinearity in Davenport-Schinzel Sequences
A generalized Davenport-Schinzel sequence is one over a finite alphabet that
contains no subsequences isomorphic to a fixed forbidden subsequence. One of
the fundamental problems in this area is bounding (asymptotically) the maximum
length of such sequences. Following Klazar, let Ex(\sigma,n) be the maximum
length of a sequence over an alphabet of size n avoiding subsequences
isomorphic to \sigma. It has been proved that for every \sigma, Ex(\sigma,n) is
either linear or very close to linear; in particular it is O(n
2^{\alpha(n)^{O(1)}}), where \alpha is the inverse-Ackermann function and O(1)
depends on \sigma. However, very little is known about the properties of \sigma
that induce superlinearity of Ex(\sigma,n).
In this paper we exhibit an infinite family of independent superlinear
forbidden subsequences. To be specific, we show that there are 17 prototypical
superlinear forbidden subsequences, some of which can be made arbitrarily long
through a simple padding operation. Perhaps the most novel part of our
constructions is a new succinct code for representing superlinear forbidden
subsequences.
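The notion of "containing a subsequence isomorphic to \sigma" used in the abstract above can be illustrated with a brute-force check. This is only a sketch for small inputs, not a method from the paper: the function names `is_subsequence` and `contains_isomorphic` are hypothetical, and "isomorphic" is assumed to mean equal to \sigma under a one-to-one renaming of letters.

```python
from itertools import permutations


def is_subsequence(pattern, seq):
    """True if pattern occurs in seq as a (not necessarily contiguous) subsequence."""
    it = iter(seq)
    # `symbol in it` advances the iterator, so matches must occur in order.
    return all(symbol in it for symbol in pattern)


def contains_isomorphic(seq, sigma):
    """True if seq has a subsequence equal to sigma under an injective renaming.

    Brute force: tries every injective map from the letters of sigma to the
    letters that actually occur in seq. Exponential; for illustration only.
    """
    letters = list(dict.fromkeys(sigma))   # distinct letters of sigma
    alphabet = sorted(set(seq))            # letters available in seq
    for image in permutations(alphabet, len(letters)):
        renaming = dict(zip(letters, image))
        if is_subsequence([renaming[x] for x in sigma], seq):
            return True
    return False


has_abab = contains_isomorphic("1213121", "abab")   # 1,2,1,2 is a subsequence
avoids_abab = not contains_isomorphic("12321", "abab")
```

In this vocabulary, Ex(\sigma,n) is the maximum length of a sequence over n letters for which `contains_isomorphic` is false, subject to whatever additional conditions the paper places on admissible sequences (the abstract does not state them).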
Rank, select and access in grammar-compressed strings
Given a string S of length N on a fixed alphabet of \sigma symbols, a
grammar compressor produces a context-free grammar G of size n that
generates S and only S. In this paper we describe data structures to
support the following operations on a grammar-compressed string:
\mbox{rank}_c(S,i) (return the number of occurrences of symbol c before
position i in S); \mbox{select}_c(S,i) (return the position of the ith
occurrence of c in S); and \mbox{access}(S,i,j) (return substring
S[i,j]). For rank and select we describe data structures of size
O(n\sigma\log N) bits that support the two operations in O(\log N) time. We
propose another structure that uses O(n\sigma\log(N/n)(\log N)^{1+\epsilon})
bits and that supports the two queries in O(\log N/\log\log N), where
\epsilon>0 is an arbitrary constant. To our knowledge, we are the first to
study the asymptotic complexity of rank and select in the grammar-compressed
setting, and we provide a hardness result showing that significantly improving
the bounds we achieve would imply a major breakthrough on a hard
graph-theoretical problem. Our main result for access is a method that requires
O(n\log N) bits of space and O(\log N+m/\log_\sigma N) time to extract
m=j-i+1 consecutive symbols from S. Alternatively, we can achieve
O(\log N/\log\log N+m/\log_\sigma N) query time using
O(n\log(N/n)(\log N)^{1+\epsilon}) bits of space. This matches a lower bound
stated by Verbin and Yu for strings where N is polynomially related to n.
Comment: 16 pages
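To make the three operations concrete, here is a naive baseline on a straight-line program (a grammar in which every rule is either a single terminal or a pair of nonterminals). It is not the data structure from the paper: it simply stores, for every nonterminal, the length of its expansion and a per-symbol count, and answers queries by walking down the parse tree in time proportional to the grammar's height. The class name `SLP`, the rule encoding, and the use of 0-based positions are assumptions made for this sketch.

```python
from collections import Counter


class SLP:
    """Straight-line program; rules[X] is a 1-character string or a pair (Y, Z)."""

    def __init__(self, rules, start):
        self.rules, self.start = rules, start
        self.length, self.counts = {}, {}
        self._measure(start)

    def _measure(self, x):
        """Fill in expansion length and symbol counts for x and its descendants."""
        if x in self.length:
            return
        rhs = self.rules[x]
        if isinstance(rhs, str):               # terminal rule
            self.length[x], self.counts[x] = 1, Counter(rhs)
            return
        left, right = rhs
        self._measure(left)
        self._measure(right)
        self.length[x] = self.length[left] + self.length[right]
        self.counts[x] = self.counts[left] + self.counts[right]

    def access(self, i):
        """Symbol at 0-based position i of the generated string."""
        x = self.start
        while not isinstance(self.rules[x], str):
            left, right = self.rules[x]
            if i < self.length[left]:
                x = left
            else:
                i -= self.length[left]
                x = right
        return self.rules[x]

    def rank(self, c, i):
        """Number of occurrences of c strictly before 0-based position i."""
        x, total = self.start, 0
        while not isinstance(self.rules[x], str):
            left, right = self.rules[x]
            if i <= self.length[left]:
                x = left
            else:
                total += self.counts[left][c]
                i -= self.length[left]
                x = right
        return total + (1 if i >= 1 and self.rules[x] == c else 0)

    def select(self, c, k):
        """0-based position of the k-th (k >= 1) occurrence of c, or None."""
        x, offset = self.start, 0
        if k < 1 or k > self.counts[x][c]:
            return None
        while not isinstance(self.rules[x], str):
            left, right = self.rules[x]
            if k <= self.counts[left][c]:
                x = left
            else:
                k -= self.counts[left][c]
                offset += self.length[left]
                x = right
        return offset


# Example grammar generating the string "abaababa".
rules = {"F1": "b", "F2": "a", "F3": ("F2", "F1"), "F4": ("F3", "F2"),
         "F5": ("F4", "F3"), "F6": ("F5", "F4")}
slp = SLP(rules, "F6")
text = "".join(slp.access(i) for i in range(slp.length["F6"]))
```

The count table in this baseline has one entry per nonterminal and symbol, which is the same n\sigma factor that appears in the rank/select space bounds quoted above; the structures described in the abstract aim at better query times than a plain descent of the parse tree.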