Incremental dimension reduction of tensors with random index
We present an incremental, scalable and efficient dimension reduction
technique for tensors that is based on sparse random linear coding. Data is
stored in a compactified representation of fixed size, which keeps memory
requirements low and predictable. Component encoding and decoding are performed
online, without computationally expensive re-analysis of the data set. The
range of tensor indices can be extended dynamically without modifying the
component representation. This idea originates from a mathematical model of
semantic memory and a method known as random indexing in natural language
processing. We generalize the random-indexing algorithm to tensors and present
signal-to-noise-ratio simulations for representations of vectors and matrices.
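As a rough illustration of the encoding and decoding scheme, here is a minimal sketch for the vector (order-1) case. It is not the authors' prototype: the dimensionality n = 10^4, the sparsity k = 10 and the seed-based generation of index vectors are assumptions made for concreteness.

```python
import numpy as np

def index_vector(n, k, seed):
    """Sparse ternary index vector: k nonzero entries, each +/-1.
    Deterministic generation from a per-component seed is an assumed
    convention, not necessarily the paper's."""
    rng = np.random.default_rng(seed)
    r = np.zeros(n)
    r[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
    return r

def encode(x, n=10_000, k=10):
    """Accumulate each component x[i] along its own random ternary direction."""
    y = np.zeros(n)
    for i, xi in enumerate(x):
        y += xi * index_vector(n, k, seed=i)
    return y

def decode(y, i, k=10):
    """Estimate component i by projecting onto its index vector (||r||^2 = k)."""
    return y @ index_vector(len(y), k, seed=i) / k

x = [3.0, -1.0, 5.0]
y = encode(x)
print([round(decode(y, i), 2) for i in range(3)])  # approximately [3.0, -1.0, 5.0]
```

Because the index vectors are regenerated on demand from the component index, new components can be added without changing the size of y, which mirrors the incremental, dynamically extensible behavior described above.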
We also present a mathematical analysis of the approximate orthogonality of
high-dimensional ternary vectors, a property that underpins this and other
similar random-coding approaches to dimension reduction; a small numerical
check of this property follows below. To further demonstrate the properties of
random indexing, we present results of a synonym identification task.
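The approximate-orthogonality property can be checked numerically. This sketch, with illustrative parameters rather than the paper's settings, samples pairs of independent random ternary vectors and reports their cosine similarities:

```python
import numpy as np

def ternary(n, k, rng):
    """Random ternary vector with k nonzero +/-1 entries."""
    v = np.zeros(n)
    v[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
    return v

rng = np.random.default_rng(0)
n, k, trials = 10_000, 10, 1_000

# Cosine similarity of independent ternary vectors (||v||^2 = k);
# values concentrated near zero indicate approximate orthogonality.
cos = [ternary(n, k, rng) @ ternary(n, k, rng) / k for _ in range(trials)]
print(f"mean={np.mean(cos):.4f}  std={np.std(cos):.4f}")  # std on the order of 1/sqrt(n)
```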
The method presented here has some similarities with random projection and
Tucker decomposition, but it performs well only at high dimensionality
(n > 10^3). Random indexing is useful for a range of complex
practical problems, e.g., in natural language processing, data mining, pattern
recognition, event detection, graph searching and search engines. Prototype
software is provided. It supports encoding and decoding of tensors of order
>= 1 in a unified framework, i.e., vectors, matrices and higher-order tensors.
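To make the unified treatment of tensor order concrete, one natural order-2 generalization of the vector sketch above encodes each matrix entry along the outer product of a row and a column index vector. This is an assumed construction for illustration, not necessarily the paper's exact algorithm; the parameters and seed-offset convention are hypothetical.

```python
import numpy as np

def ternary(n, k, seed):
    """Ternary index vector keyed by a seed (hypothetical convention)."""
    rng = np.random.default_rng(seed)
    v = np.zeros(n)
    v[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
    return v

n, k = 2_000, 8
A = np.array([[2.0, -1.0], [0.5, 4.0]])  # a small order-2 tensor

# Encode: accumulate each entry along the outer product of its row and
# column index vectors (seeds offset so rows and columns are independent).
Y = np.zeros((n, n))
for (i, j), a in np.ndenumerate(A):
    Y += a * np.outer(ternary(n, k, i), ternary(n, k, 1_000 + j))

def decode(Y, i, j):
    """Bilinear projection recovers entry (i, j) up to cross-talk noise."""
    return ternary(n, k, i) @ Y @ ternary(n, k, 1_000 + j) / k**2

print([[round(decode(Y, i, j), 2) for j in range(2)] for i in range(2)])
```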
Comment: 36 pages, 9 figures