44,721 research outputs found

    The SBC-Tree: An Index for Run-Length Compressed Sequences

    Get PDF
    Run-Length-Encoding (RLE) is a data compression technique that is used in various applications, e.g., biological sequence databases, multimedia, and facsimile transmission. One of the main challenges is how to operate, e.g., indexing, searching, and retrieval, on the compressed data without decompressing it. In this paper, we present the String B-tree for Compressed sequences, termed the SBC-tree, for indexing and searching RLE-compressed sequences of arbitrary length. The SBC-tree is a two-level index structure based on the well-known String B-tree and a 3-sided range query structure. The SBC-tree supports substring as well as prefix matching, and range search operations over RLE-compressed sequences. The SBC-tree has an optimal external-memory space complexity of O(N/B) pages, where N is the total length of the compressed sequences, and B is the disk page size. The insertion and deletion of all suffixes of a compressed sequence of length m takes O(m log_B(N + m)) I/O operations. Substring matching, prefix matching, and range search execute in an optimal O(log_B N + (|p| + T)/B) I/O operations, where |p| is the length of the compressed query pattern and T is the query output size. We also present two variants of the SBC-tree: the SBC-tree that is based on an R-tree instead of the 3-sided structure, and the one-level SBC-tree that does not use a two-dimensional index. These variants do not have provable worst-case theoretical bounds for search operations, but perform well in practice. The SBC-tree index is realized inside PostgreSQL in the context of a biological protein database application. Performance results illustrate that using the SBC-tree to index RLE-compressed sequences achieves up to an order of magnitude reduction in storage, up to 30% reduction in I/Os for the insertion operations, and retains the optimal search performance achieved by the String B-tree over the uncompressed sequences.
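    For readers less familiar with the underlying encoding (the SBC-tree indexes the compressed runs directly, without decompressing), a minimal run-length encoder/decoder sketch in Python illustrates the representation; the function names are ours, not the paper's:

        from itertools import groupby

        def rle_encode(seq):
            """Encode a sequence as (symbol, run_length) pairs,
            e.g. 'AAAABBBCC' -> [('A', 4), ('B', 3), ('C', 2)]."""
            return [(sym, sum(1 for _ in run)) for sym, run in groupby(seq)]

        def rle_decode(pairs):
            """Invert rle_encode back to the original string."""
            return "".join(sym * length for sym, length in pairs)

        # Round-trips on a protein-like string, echoing the biological use case.
        assert rle_decode(rle_encode("MVLSPAADKTNNNVKAAW")) == "MVLSPAADKTNNNVKAAW"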

    Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain

    Full text link
    Real-world data typically contain repeated and periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate basis (e.g., Fourier, Wavelets, etc.). However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimization problems related to obtaining the tightest lower/upper bound on Euclidean distances when each data object is potentially compressed using a different set of orthonormal coefficients. Our technique leads to tighter distance estimates, which translate into more accurate search, learning, and mining operations directly in the compressed domain. We formulate the problem of estimating lower/upper distance bounds as an optimization problem. We establish the properties of optimal solutions, and leverage the theoretical analysis to develop a fast algorithm to obtain an exact solution to the problem. The suggested solution provides the tightest estimation of the L_2-norm or the correlation. We show that typical data-analysis operations, such as k-NN search or k-Means clustering, can operate more accurately using the proposed compression and distance reconstruction technique. We compare it with many other prevalent compression and reconstruction techniques, including random projections and PCA-based techniques. We highlight a surprising result, namely that when the data are highly sparse in some basis, our technique may even outperform PCA-based compression. The contributions of this work are generic, as our methodology is applicable to any sequential or high-dimensional data as well as to any orthogonal data transformation used for the underlying data compression scheme. Comment: 25 pages, 20 figures, accepted in VLDB
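    As a rough illustration of why orthonormal compression permits distance bounding, consider keeping each object's k largest-magnitude coefficients under an orthonormal DFT: by Parseval's theorem, the distance computed over the coefficients that both objects happen to keep can never exceed the true Euclidean distance. The sketch below shows only this naive baseline; the paper's optimal bounds additionally exploit the leftover energy of the discarded coefficients. Names are illustrative:

        import numpy as np

        def topk_compress(x, k):
            """Keep the k largest-magnitude coefficients of an orthonormal DFT.
            Each object may end up keeping a different index set."""
            X = np.fft.fft(x, norm="ortho")  # orthonormal, so Parseval holds exactly
            idx = np.argsort(np.abs(X))[-k:]
            return {i: X[i] for i in idx}

        def distance_lower_bound(cx, cy):
            """Parseval lower bound on ||x - y||_2 from coefficients kept by both."""
            common = cx.keys() & cy.keys()
            return np.sqrt(sum(abs(cx[i] - cy[i]) ** 2 for i in common))

        rng = np.random.default_rng(0)
        x, y = rng.standard_normal(256), rng.standard_normal(256)
        lb = distance_lower_bound(topk_compress(x, 32), topk_compress(y, 32))
        print(lb, "<=", np.linalg.norm(x - y))  # the bound never exceeds the true distance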

    Sequential Compressed Sensing

    Full text link
    Compressed sensing allows perfect recovery of sparse signals (or signals sparse in some basis) using only a small number of random measurements. Existing results in the compressed sensing literature have focused on characterizing the achievable performance by bounding the number of samples required for a given level of signal sparsity. However, using these bounds to minimize the number of samples requires a-priori knowledge of the sparsity of the unknown signal, or the decay structure for near-sparse signals. Furthermore, there are some popular recovery methods for which no such bounds are known. In this paper, we investigate an alternative scenario where observations are available in sequence. For any recovery method, this means that there is now a sequence of candidate reconstructions. We propose a method to estimate the reconstruction error directly from the samples themselves, for every candidate in this sequence. This estimate is universal in the sense that it is based only on the measurement ensemble, and not on the recovery method or any assumed level of sparsity of the unknown signal. With these estimates, one can now stop observations as soon as there is reasonable certainty of either exact or sufficiently accurate reconstruction. They also provide a way to obtain "run-time" guarantees for recovery methods that otherwise lack a-priori performance bounds. We investigate both continuous (e.g., Gaussian) and discrete (e.g., Bernoulli) random measurement ensembles, both for exactly sparse and general near-sparse signals, and with both noisy and noiseless measurements. Comment: to appear in IEEE Transactions on Special Topics in Signal Processing
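    A minimal sketch of the key fact for the Gaussian ensemble: if a fresh measurement vector a has i.i.d. N(0, 1) entries, then for any fixed candidate reconstruction x_hat, E[(a^T x - a^T x_hat)^2] = ||x - x_hat||^2, so averaging squared residuals on new measurements estimates the squared error without knowing the recovery method or the sparsity. Variable names are ours, and noiseless measurements are assumed:

        import numpy as np

        def estimate_sq_error(x_hat, A_new, y_new):
            """Unbiased estimate of ||x - x_hat||^2 from fresh measurements
            y_new = A_new @ x, where A_new has i.i.d. N(0, 1) entries: each
            squared residual then has expectation ||x - x_hat||^2."""
            residuals = y_new - A_new @ x_hat
            return np.mean(residuals ** 2)

        rng = np.random.default_rng(1)
        n, k = 200, 5
        x = np.zeros(n)
        x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

        x_hat = x + 0.1 * rng.standard_normal(n)  # stand-in for any candidate reconstruction
        A_new = rng.standard_normal((50, n))      # a few extra Gaussian measurements
        print(estimate_sq_error(x_hat, A_new, A_new @ x),
              "vs true", np.linalg.norm(x - x_hat) ** 2)

    In the sequential setting, one would stop taking measurements once this estimate falls below a chosen tolerance.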

    Compressed sensing reconstruction using Expectation Propagation

    Full text link
    Many interesting problems in fields ranging from telecommunications to computational biology can be formalized in terms of large underdetermined systems of linear equations with additional constraints or regularizers. One of the most studied, the Compressed Sensing problem (CS), consists in finding the solution with the smallest number of non-zero components of a given system of linear equations y = Fw for a known measurement vector y and sensing matrix F. Here, we address the compressed sensing problem within a Bayesian inference framework where the sparsity constraint is remapped into a singular prior distribution (called Spike-and-Slab or Bernoulli-Gauss). A solution to the problem is attempted through the computation of marginal distributions via Expectation Propagation (EP), an iterative computational scheme originally developed in Statistical Physics. We show that this strategy is comparatively more accurate than the alternatives in solving instances of CS generated from statistically correlated measurement matrices. For computational strategies based on the Bayesian framework, such as variants of Belief Propagation, this is to be expected, as they implicitly rely on the hypothesis of statistical independence among the entries of the sensing matrix. Perhaps surprisingly, the method also uniformly outperforms all the other state-of-the-art methods in our tests. Comment: 20 pages, 6 figures
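    To make the role of the Bernoulli-Gauss prior concrete, the sketch below computes the moments of a single EP "tilted" distribution, i.e. a Gaussian cavity message N(w; mu, v) multiplied by the prior (1 - rho) * delta(w) + rho * N(w; 0, s). This moment-matching step is where the sparsity enters such a scheme; a full EP solver also needs the Gaussian messages coming from the linear system y = Fw, which we omit here. This is our simplified illustration, not the authors' implementation:

        import numpy as np
        from scipy.stats import norm

        def spike_slab_tilted_moments(mu, v, rho=0.1, s=1.0):
            """Mean, variance, and P(w != 0) of the tilted distribution
            N(w; mu, v) * [(1 - rho) * delta(w) + rho * N(w; 0, s)]."""
            w_spike = (1 - rho) * norm.pdf(0.0, loc=mu, scale=np.sqrt(v))  # mass at w = 0
            w_slab = rho * norm.pdf(mu, loc=0.0, scale=np.sqrt(v + s))     # Gaussian convolution
            p_slab = w_slab / (w_spike + w_slab)  # posterior probability that w is nonzero
            v_star = 1.0 / (1.0 / v + 1.0 / s)    # slab-component posterior variance
            m_star = v_star * mu / v              # slab-component posterior mean
            mean = p_slab * m_star
            var = p_slab * (v_star + m_star ** 2) - mean ** 2
            return mean, var, p_slab

        print(spike_slab_tilted_moments(mu=0.8, v=0.25))  # shrunk toward 0 by the spike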