16,192 research outputs found

S-TREE: Self-Organizing Trees for Data Clustering and Online Vector Quantization

Author: Campos Marcos
Carpenter Gail
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/09/2000

This paper introduces S-TREE (Self-Organizing Tree), a family of models that use unsupervised learning to construct hierarchical representations of data and online tree-structured vector quantizers. The S-TREE1 model, which features a new tree-building algorithm, can be implemented with various cost functions. An alternative implementation, S-TREE2, which uses a new double-path search procedure, is also developed. S-TREE2 implements an online procedure that approximates an optimal (unstructured) clustering solution while imposing a tree-structure constraint. The performance of the S-TREE algorithms is illustrated with data clustering and vector quantization examples, including a Gauss-Markov source benchmark and an image compression application. S-TREE performance on these tasks is compared with the standard tree-structured vector quantizer (TSVQ) and the generalized Lloyd algorithm (GLA). The image reconstruction quality with S-TREE2 approaches that of GLA while taking less than 10% of computer time. S-TREE1 and S-TREE2 also compare favorably with the standard TSVQ in both the time needed to create the codebook and the quality of image reconstruction. Office of Naval Research (N00014-95-10409, N00014-95-0G57
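
The trade-off the abstract refers to, between a tree-structure constraint and an unstructured codebook, can be illustrated with a minimal sketch of standard TSVQ encoding. This is not the S-TREE1 or S-TREE2 procedure, whose details the abstract does not give. The `Node` class, the helper names, and the toy one-dimensional codebook are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class Node:
    """Hypothetical binary codebook node; leaves hold the codewords."""
    centroid: Vector
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def tree_search(root: Node, x: Vector) -> Vector:
    """Greedy descent: at each internal node follow the nearer child."""
    node = root
    while not node.is_leaf():
        if sq_dist(x, node.left.centroid) <= sq_dist(x, node.right.centroid):
            node = node.left
        else:
            node = node.right
    return node.centroid


def leaves(node: Node) -> List[Vector]:
    """Collect leaf codewords (assumes every internal node has two children)."""
    if node.is_leaf():
        return [node.centroid]
    return leaves(node.left) + leaves(node.right)


def full_search(root: Node, x: Vector) -> Vector:
    """Unstructured search: compare x against every leaf codeword."""
    return min(leaves(root), key=lambda c: sq_dist(x, c))


# Toy 1-D codebook: each internal centroid is the mean of its two leaves.
root = Node(
    (1.0,),
    left=Node((-2.0,), Node((-4.0,)), Node((0.0,))),
    right=Node((4.0,), Node((1.0,)), Node((7.0,))),
)

x = (0.8,)
greedy = tree_search(root, x)   # descends left, ends at codeword 0.0
optimal = full_search(root, x)  # finds codeword 1.0, which is closer
```

In this toy case the greedy descent visits two levels instead of four leaves but returns a farther codeword, which is the kind of gap between tree-structured and unstructured search that the abstract's comparison against GLA is concerned with.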

Boston University Institutional Repository (OpenBU)

Sources of Superlinearity in Davenport-Schinzel Sequences

Author: Pettie Seth
Publication date: 11/07/2007

A generalized Davenport-Schinzel sequence is one over a finite alphabet that contains no subsequences isomorphic to a fixed forbidden subsequence. One of the fundamental problems in this area is bounding (asymptotically) the maximum length of such sequences. Following Klazar, let Ex(\sigma,n) be the maximum length of a sequence over an alphabet of size n avoiding subsequences isomorphic to \sigma. It has been proved that for every \sigma, Ex(\sigma,n) is either linear or very close to linear; in particular it is O(n 2^{\alpha(n)^{O(1)}}), where \alpha is the inverse-Ackermann function and O(1) depends on \sigma. However, very little is known about the properties of \sigma that induce superlinearity of Ex(\sigma,n). In this paper we exhibit an infinite family of independent superlinear forbidden subsequences. To be specific, we show that there are 17 prototypical superlinear forbidden subsequences, some of which can be made arbitrarily long through a simple padding operation. Perhaps the most novel part of our constructions is a new succinct code for representing superlinear forbidden subsequences.
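
The containment condition in the definition above can be made concrete with a small brute-force sketch. It assumes that "isomorphic" means equal up to a one-to-one renaming of symbols, and it checks only containment of a forbidden pattern, not any further condition the paper may impose on the sequences. The function names are hypothetical, and the search is exponential, so it is suitable for tiny inputs only.

```python
from itertools import combinations
from typing import Hashable, Sequence, Tuple


def canonical(seq: Sequence[Hashable]) -> Tuple[int, ...]:
    """Rename symbols by order of first appearance, e.g. 'xyxy' -> (0, 1, 0, 1).

    Two sequences are isomorphic (equal up to a bijective renaming of
    symbols) exactly when their canonical forms coincide.
    """
    ids = {}
    return tuple(ids.setdefault(s, len(ids)) for s in seq)


def contains_pattern(seq: Sequence[Hashable], sigma: Sequence[Hashable]) -> bool:
    """Return True if some subsequence of seq is isomorphic to sigma.

    Brute force: tries every choice of len(sigma) positions in seq.
    """
    target = canonical(sigma)
    return any(
        canonical([seq[i] for i in positions]) == target
        for positions in combinations(range(len(seq)), len(sigma))
    )


# 'abcab' contains a, b, a, b at positions 0, 1, 3, 4.
has_abab = contains_pattern("abcab", "abab")
# 'abcba' has no alternation x, y, x, y with x != y.
avoids_abab = not contains_pattern("abcba", "abab")
```

A sequence counts toward Ex(\sigma,n) only if `contains_pattern` would return False for it, so the function is a direct, if slow, test of the avoidance property.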

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

Rank, select and access in grammar-compressed strings

Author: Belazzougui Djamal
Puglisi Simon J.
Tabei Yasuo
Publication date: 14/08/2014

Given a string S of length N on a fixed alphabet of \sigma symbols, a grammar compressor produces a context-free grammar G of size n that generates S and only S. In this paper we describe data structures to support the following operations on a grammar-compressed string: \mbox{rank}_c(S,i) (return the number of occurrences of symbol c before position i in S); \mbox{select}_c(S,i) (return the position of the i-th occurrence of c in S); and \mbox{access}(S,i,j) (return substring S[i,j]). For rank and select we describe data structures of size O(n\sigma\log N) bits that support the two operations in O(\log N) time. We propose another structure that uses O(n\sigma\log (N/n)(\log N)^{1+\epsilon}) bits and that supports the two queries in O(\log N/\log\log N) time, where \epsilon>0 is an arbitrary constant. To our knowledge, we are the first to study the asymptotic complexity of rank and select in the grammar-compressed setting, and we provide a hardness result showing that significantly improving the bounds we achieve would imply a major breakthrough on a hard graph-theoretical problem. Our main result for access is a method that requires