Generalized residual vector quantization for large scale data
Vector quantization is an essential tool for tasks involving large scale
data, for example, large scale similarity search, which is crucial for
content-based information retrieval and analysis. In this paper, we propose a
novel vector quantization framework that iteratively minimizes quantization
error. First, we provide a detailed review on a relevant vector quantization
method named \textit{residual vector quantization} (RVQ). Next, we propose
\textit{generalized residual vector quantization} (GRVQ) to further improve
over RVQ. Many vector quantization methods can be viewed as the special cases
of our proposed framework. We evaluate GRVQ on several large scale benchmark
datasets for large scale search, classification and object retrieval. We
compare GRVQ with existing methods in detail. Extensive experiments
demonstrate that our GRVQ framework substantially outperforms existing methods
in terms of quantization accuracy and computational efficiency.
Comment: published at the International Conference on Multimedia and Expo 201
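Since GRVQ is presented as an improvement over residual vector quantization, a minimal sketch of the plain RVQ baseline may help fix ideas. This is an illustrative toy (tiny k-means, arbitrary stage count and codebook size), not the paper's GRVQ method or configuration:

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a k x d codebook."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        idx = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(idx == j):
                C[j] = X[idx == j].mean(axis=0)
    return C

def rvq_train(X, num_stages=4, k=16):
    """Train one codebook per stage on the residuals of earlier stages."""
    residual = X.astype(float).copy()
    codebooks = []
    for _ in range(num_stages):
        C = kmeans(residual, k)
        idx = np.argmin(((residual[:, None] - C[None]) ** 2).sum(-1), axis=1)
        residual -= C[idx]          # quantization error is passed on
        codebooks.append(C)
    return codebooks

def rvq_encode(x, codebooks):
    """Greedy stage-by-stage encoding of a single vector."""
    r, codes = x.astype(float).copy(), []
    for C in codebooks:
        j = int(np.argmin(((C - r) ** 2).sum(-1)))
        codes.append(j)
        r -= C[j]
    return codes

def rvq_decode(codes, codebooks):
    """A vector is approximated by the sum of its selected codewords."""
    return sum(C[j] for j, C in zip(codes, codebooks))
```

Each stage quantizes what the previous stages left over, so adding stages monotonically reduces quantization error; GRVQ's contribution is to iterate beyond this greedy one-pass scheme.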
Privacy-Preserving Shortest Path Computation
Navigation is one of the most popular cloud computing services. But in
virtually all cloud-based navigation systems, the client must reveal her
location and destination to the cloud service provider in order to learn the
fastest route. In this work, we present a cryptographic protocol for navigation
on city streets that provides privacy for both the client's location and the
service provider's routing data. Our key ingredient is a novel method for
compressing the next-hop routing matrices in networks such as city street maps.
Applying our compression method to the map of Los Angeles, for example, we
achieve over tenfold reduction in the representation size. In conjunction with
other cryptographic techniques, this compressed representation results in an
efficient protocol suitable for fully-private real-time navigation on city
streets. We demonstrate the practicality of our protocol by benchmarking it on
real street map data for major cities such as San Francisco and Washington,
D.C.Comment: Extended version of NDSS 2016 pape
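The paper's key ingredient is a compressed representation of next-hop routing matrices. The sketch below is not the paper's compression method; it only illustrates what a next-hop matrix row is (the first edge on the shortest path to each destination) and why such rows compress well on road networks, where long runs of destinations share the same first hop:

```python
import heapq
from itertools import groupby

def next_hop_row(adj, src):
    """First edge on a shortest path from src to every node (Dijkstra).

    adj maps node -> list of (neighbor, weight) pairs.
    Returns, for each node in sorted order, the first hop out of src
    on a shortest path to that node (-1 if unreachable).
    """
    dist = {src: 0}
    first = {src: src}
    pq = [(0, src, src)]
    while pq:
        d, u, f = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                # the first hop toward v is v itself if we leave src directly
                first[v] = v if u == src else f
                heapq.heappush(pq, (nd, v, first[v]))
    return [first.get(v, -1) for v in sorted(adj)]

def rle(row):
    """Run-length encode one matrix row; road networks make runs long."""
    return [(hop, len(list(g))) for hop, g in groupby(row)]
```

On a real city graph most of a row collapses into a handful of runs, which is the intuition behind the tenfold reduction reported for Los Angeles.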
Compressing Word Embeddings
Recent methods for learning vector space representations of words have
succeeded in capturing fine-grained semantic and syntactic regularities using
vector arithmetic. However, these vector space representations (created through
large-scale text analysis) are typically stored verbatim, since their internal
structure is opaque. Using word-analogy tests to monitor the level of detail
stored in compressed re-representations of the same vector space, the
trade-offs between the reduction in memory usage and expressiveness are
investigated. A simple scheme is outlined that can reduce the memory footprint
of a state-of-the-art embedding by a factor of 10, with only minimal impact on
performance. Then, using the same `bit budget', a binary (approximate)
factorisation of the same space is also explored, with the aim of creating an
equivalent representation with better interpretability.
Comment: 10 pages, 0 figures, submitted to ICONIP-2016. Previous experimental
results were submitted to ICLR-2016, but the paper has been significantly
updated, since a new experimental set-up worked much better.
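One generic way to trade embedding precision for memory, in the spirit of the trade-offs studied above, is uniform scalar quantization. This is a hedged illustration, not the paper's scheme (which also explores a binary factorisation); the function names are made up:

```python
import numpy as np

def quantize(E, bits=8):
    """Uniformly quantize an embedding matrix to `bits` bits per weight."""
    lo, hi = E.min(), E.max()
    levels = 2 ** bits - 1
    Q = np.round((E - lo) / (hi - lo) * levels).astype(np.uint8)
    return Q, lo, hi

def dequantize(Q, lo, hi, bits=8):
    """Map the integer codes back to approximate float32 weights."""
    return Q.astype(np.float32) / (2 ** bits - 1) * (hi - lo) + lo
```

Going from float32 to 8-bit codes gives a 4x reduction; steeper budgets (fewer bits per weight, shared codebooks) push toward the 10x factor the abstract reports, at some cost in word-analogy accuracy.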
Compressing Sparse Sequences under Local Decodability Constraints
We consider a variable-length source coding problem subject to local
decodability constraints. In particular, we investigate the blocklength scaling
behavior attainable by encodings of k-sparse binary sequences, under the
constraint that any source bit can be correctly decoded upon probing at most
d codeword bits. We consider both adaptive and non-adaptive access models,
and derive upper and lower bounds that often coincide up to constant factors.
Notably, such a characterization for the fixed-blocklength analog of our
problem remains unknown, despite considerable research over the last three
decades. Connections to communication complexity are also briefly discussed.
Comment: 8 pages, 1 figure. First five pages to appear in 2015 International
Symposium on Information Theory. This version contains supplementary material.
Connectivity Compression for Irregular Quadrilateral Meshes
Applications that require Internet access to remote 3D datasets are often
limited by the storage costs of 3D models. Several compression methods are
available to address these limits for objects represented by triangle meshes.
Many CAD and VRML models, however, are represented as quadrilateral meshes or
mixed triangle/quadrilateral meshes, and these models may also require
compression. We present an algorithm for encoding the connectivity of such
quadrilateral meshes, and we demonstrate that by preserving and exploiting the
original quad structure, our approach achieves encodings 30 - 80% smaller than
an approach based on randomly splitting quads into triangles. We present both a
code with a proven worst-case cost of 3 bits per vertex (or 2.75 bits per
vertex for meshes without valence-two vertices) and entropy-coding results for
typical meshes ranging from 0.3 to 0.9 bits per vertex, depending on the
regularity of the mesh. Our method may be implemented by a rule for a
particular splitting of quads into triangles and by using the compression and
decompression algorithms introduced in [Rossignac99] and
[Rossignac&Szymczak99]. We also present extensions to the algorithm to compress
meshes with holes and handles and meshes containing triangles and other
polygons, as well as quads.
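The paper notes that its method may be implemented via a rule for a particular splitting of quads into triangles. The sketch below shows one such deterministic rule (diagonal from the lowest-index corner); it is an illustrative assumption, not the specific rule from the paper:

```python
def split_quads(quads):
    """Deterministic quad -> triangle split.

    Each quad (a, b, c, d) is rotated so its smallest vertex index leads,
    then split along that vertex's diagonal. Because the rule depends only
    on the quad itself, a decoder can re-merge triangle pairs into quads,
    which is what lets the connectivity coder preserve the quad structure.
    """
    tris = []
    for q in quads:
        i = q.index(min(q))
        a, b, c, d = q[i:] + q[:i]   # rotate so the smallest vertex leads
        tris.append((a, b, c))
        tris.append((a, c, d))
    return tris
```

A random split would destroy the regularity the encoder exploits; a canonical split like this keeps the quad pairing recoverable, consistent with the 30-80% savings reported over random splitting.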
On the Hardness and Inapproximability of Recognizing Wheeler Graphs
In recent years several compressed indexes based on variants of the Burrows-Wheeler transformation have been introduced. Some of these are used to index structures far more complex than a single string, as was originally done with the FM-index [Ferragina and Manzini, J. ACM 2005]. As such, there has been an increasing effort to better understand under which conditions such an indexing scheme is possible. This has led to the introduction of Wheeler graphs [Gagie et al., Theor. Comput. Sci., 2017]. Gagie et al. showed that de Bruijn graphs, generalized compressed suffix arrays, and several other BWT related structures can be represented as Wheeler graphs, and that Wheeler graphs can be indexed in a way which is space efficient. Hence, being able to recognize whether a given graph is a Wheeler graph, or being able to approximate a given graph by a Wheeler graph, could have numerous applications in indexing. Here we resolve the open question of whether there exists an efficient algorithm for recognizing if a given graph is a Wheeler graph. We present:
- The problem of recognizing whether a given graph G=(V,E) is a Wheeler graph is NP-complete for any edge label alphabet of size sigma >= 2, even when G is a DAG. This holds even on a restricted subset of graphs called d-NFAs, for d >= 5. This is in contrast to recent results demonstrating that the problem can be solved in polynomial time for d-NFAs with d <= 2. We also show that the recognition problem can be solved in linear time for sigma = 1;
- There exists a 2^{e log sigma + O(n + e)}-time exact algorithm, where n = |V| and e = |E|. This algorithm relies on graph isomorphism being computable in strictly sub-exponential time;
- We define an optimization variant of the problem called Wheeler Graph Violation, abbreviated WGV, where the aim is to remove the minimum number of edges in order to obtain a Wheeler graph. We show that WGV is APX-hard, even when G is a DAG, implying that there exists a constant C >= 1 for which there is no C-approximation algorithm (unless P = NP). Also, conditioned on the Unique Games Conjecture, for all C >= 1, it is NP-hard to find a C-approximation;
- We define the Wheeler Subgraph problem, abbreviated WS, where the aim is to find the largest subgraph which is a Wheeler graph (the dual of WGV). In contrast to WGV, we prove that the WS problem is in APX for sigma = O(1).
The above findings suggest that most problems under this theme are computationally difficult. However, we identify a class of graphs for which the recognition problem is polynomial-time solvable, raising the open question of which parameters determine this problem's difficulty.
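To make the recognition problem concrete, here is a brute-force sketch that checks the standard Wheeler axioms (in-degree-0 nodes first; edge labels monotone with respect to the node order) over all node orderings. Its exponential running time is consistent with the NP-completeness result above; it is an illustration of the definition, not an algorithm from the paper:

```python
from itertools import permutations

def is_wheeler_order(nodes, edges, pos):
    """Check the Wheeler axioms for a fixed ordering pos: node -> rank."""
    indeg = {v: 0 for v in nodes}
    for _, v, _ in edges:
        indeg[v] += 1
    # (i) every in-degree-0 node precedes every node with incoming edges
    for u in nodes:
        for v in nodes:
            if indeg[u] == 0 and indeg[v] > 0 and pos[u] > pos[v]:
                return False
    # (ii) a < a' implies v < v'; (iii) a = a' and u < u' implies v <= v'
    for (u, v, a) in edges:
        for (x, y, b) in edges:
            if a < b and pos[v] >= pos[y]:
                return False
            if a == b and pos[u] < pos[x] and pos[v] > pos[y]:
                return False
    return True

def is_wheeler_graph(nodes, edges):
    """Exponential brute force over all node orderings."""
    return any(
        is_wheeler_order(nodes, edges, {v: i for i, v in enumerate(p)})
        for p in permutations(nodes)
    )
```

Note how axiom (ii) already forces all edges into a node to carry the same label, which is why a node with two differently labeled in-edges can never appear in a Wheeler graph.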
Superselectors: Efficient Constructions and Applications
We introduce a new combinatorial structure: the superselector. We show that
superselectors subsume several important combinatorial structures used in the
past few years to solve problems in group testing, compressed sensing,
multi-channel conflict resolution and data security. We prove close upper and
lower bounds on the size of superselectors and we provide efficient algorithms
for their construction. Although our bounds are very general, when they are
instantiated on the combinatorial structures that are particular cases of
superselectors (e.g., (p,k,n)-selectors, (d,\ell)-list-disjunct matrices,
MUT_k(r)-families, FUT(k, a)-families, etc.) they match the best known bounds
in terms of size of the structures (the relevant parameter in the
applications). For appropriate values of parameters, our results also provide
the first efficient deterministic algorithms for the construction of such
structures.
Space Efficient Encodings for Bit-strings, Range queries and Related Problems
Doctoral dissertation (Ph.D.), Department of Electrical and Computer Engineering, Graduate School of Seoul National University, February 2016 (advisor: Srinivasa Rao Satti).
In this thesis, we design and implement various space-efficient data structures. Most of these structures use space close to the information-theoretic lower bound while supporting queries efficiently.
In particular, this thesis is concerned with data structures for four problems: (i) supporting rank and select queries on compressed bit-strings, (ii) the nearest larger neighbor problem, (iii) simultaneous encodings for range and next/previous larger/smaller value queries, and (iv) range top-k queries on two-dimensional arrays.
We first consider practical implementations of compressed bitvectors, which support rank and select operations on a given bit-string, while storing the bit-string in compressed form [DBLP:conf/dcc/JoJORS14]. Our approach relies on variable-to-fixed encodings of the bit-string, an approach that has not yet been considered systematically for practical encodings of bitvectors. We show that this approach leads to fast practical implementations with low redundancy (i.e., the space used by the bitvector in addition to the compressed representation of the bit-string), and is a flexible and promising solution to the problem of supporting rank and select on moderately compressible bit-strings, such as those encountered in real-world applications.
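To fix terminology, here is a toy bitvector supporting rank and select via sampled prefix popcounts and binary search. It illustrates only the query side discussed above (the thesis's structures additionally store the bit-string in compressed form); block size and layout are illustrative choices:

```python
class Bitvector:
    """Rank/select over a plain bit list with O(n/block) sampled counts."""

    def __init__(self, bits, block=64):
        self.bits = bits
        self.block = block
        # prefix popcounts at block boundaries
        self.super = [0]
        for i in range(0, len(bits), block):
            self.super.append(self.super[-1] + sum(bits[i:i + block]))

    def rank1(self, i):
        """Number of 1s in bits[0:i]: sampled count plus a local scan."""
        b = i // self.block
        return self.super[b] + sum(self.bits[b * self.block:i])

    def select1(self, k):
        """Position of the k-th 1 (1-indexed), via binary search on rank."""
        lo, hi = 0, len(self.bits)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) < k:
                lo = mid + 1
            else:
                hi = mid
        return lo
```

Real implementations replace the local scan with machine popcounts and tune the sampling rate; the redundancy mentioned above is exactly the space of these auxiliary counters.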
Next, we propose space-efficient data structures for the nearest larger neighbor problem [IWOCA2014, walcom-JoRS15]. Given a sequence of elements from a total order and a position in the sequence, the nearest larger neighbor (NLN) query returns the position of the element which is closest to the query position and is larger than the element at the query position. The problem of finding all nearest larger neighbors has attracted interest due to its applications to parenthesis matching and in computational geometry [AsanoBK09, AsanoK13, BerkmanSV93].
We consider a data structure version of this problem, in which we preprocess a given sequence of elements to construct a data structure that can answer NLN queries efficiently. For one-dimensional arrays, we give time-space tradeoffs for the problem in the indexing model. For two-dimensional arrays, we give an optimal encoding with constant query time in the encoding model.
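For intuition, the offline one-dimensional version (find every element's nearest larger neighbor) has a classic linear-time stack sweep. This is the textbook algorithm, shown here as background; the thesis's contribution is the preprocessing/query data-structure version, not this sweep:

```python
def all_nearest_larger(A):
    """Index of the nearest strictly larger element for each position.

    Two monotone-stack sweeps find the nearest larger element to the
    right and to the left; the closer of the two wins (-1 if none).
    """
    n = len(A)
    right, stack = [-1] * n, []
    for i in range(n):
        while stack and A[stack[-1]] < A[i]:
            right[stack.pop()] = i
        stack.append(i)
    left, stack = [-1] * n, []
    for i in range(n - 1, -1, -1):
        while stack and A[stack[-1]] < A[i]:
            left[stack.pop()] = i
        stack.append(i)
    res = []
    for i in range(n):
        l, r = left[i], right[i]
        if l == -1:
            res.append(r)
        elif r == -1:
            res.append(l)
        else:
            res.append(l if i - l <= r - i else r)
    return res
```

Each index is pushed and popped at most once per sweep, so the whole pass is O(n); the indexing-model question above is how little extra space suffices to answer such queries online.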
We also propose space-efficient encodings which support various range queries, and previous and next smaller/larger value queries [cocoonJS15]. Given a sequence of n elements from a total order, we obtain an encoding that supports all these queries. For the case when all these queries must be supported in constant time, we give an encoding whose space improves upon that of the encoding obtained from the colored 2d-Min and 2d-Max heaps proposed by Fischer [Fischer11]. We extend the original DFUDS [BDMRRR05] encoding of the colored 2d-Min and 2d-Max heaps to support the queries in constant time. Then, we combine the extended DFUDS of the 2d-Min heap and the 2d-Max heap using the Min-Max encoding of Gawrychowski and Nicholson [Gawry14] with some modifications. We also obtain encodings that take less space and support a subset of these queries.
Finally, we consider various encodings that support range top-k queries on a two-dimensional array containing elements from a total order. We first propose an optimal encoding for answering one-sided top-k queries, whose query range is restricted along one dimension of the array. Next, we propose an encoding for general top-k queries. This generalizes the top-k encoding of Gawrychowski and Nicholson [Gawry14].
Chapter 1 Introduction
1.1 Computational model
1.1.1 Encoding and indexing models
1.2 Contribution of the thesis
1.3 Organization of the thesis
Chapter 2 Preliminaries
Chapter 3 Compressed bit vectors based on variable-to-fixed encodings
3.1 Introduction
3.2 Bit-vectors using V2F coding
3.3 V2F compression algorithms for bit-strings
3.3.1 Tunstall code
3.3.2 Enumerative codes
3.3.3 LZW algorithm
3.3.4 Empirical evaluation of the compressors
3.4 Practical implementation of bitvectors based on V2F compression
3.4.1 Testing methodology
3.4.2 Results of empirical evaluation
3.5 Future work
Chapter 4 Space Efficient Data Structures for Nearest Larger Neighbor
4.1 Introduction
4.2 Indexing NLV queries on 1D arrays
4.3 Encoding NLN queries on 2D binary arrays
4.4 Encoding NLN queries for general 2D arrays
4.4.1 2D NLN in the encoding model: distinct case
4.4.2 2D NLN in the encoding model: general case
4.5 Open problems
Chapter 5 Simultaneous encodings for range and next/previous larger/smaller value queries
5.1 Introduction
5.2 Preliminaries
5.2.1 2d-Min heap
5.2.2 Encoding range min-max queries
5.3 Extended DFUDS for colored 2d-Min heap
5.4 Encoding colored 2d-Min and 2d-Max heaps
5.4.1 Combined data structure for DCMin(A) and DCMax(A)
5.4.2 Encoding colored 2d-Min and 2d-Max heaps using less space
5.5 Open problems
Chapter 6 Encoding Two-dimensional Range Top-k Queries
6.1 Introduction
6.2 Encoding one-sided range Top-k queries on 2D array
6.3 Encoding general range Top-k queries on 2D array
6.4 Open problems
Chapter 7 Conclusion
Bibliography
Abstract (in Korean)