Search CORE

2,174 research outputs found

HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

Author: Arora Akhil
Bhattacharya Arnab
Kumar Piyush
Sinha Sakshi
Publication venue: 'VLDB Endowment'
Publication date: 23/04/2018
Field of study

Nearest neighbor searching of large databases in high-dimensional spaces is inherently difficult due to the curse of dimensionality. A flavor of approximation is, therefore, necessary to practically solve the problem of nearest neighbor search. In this paper, we propose a novel yet simple indexing scheme, HD-Index, to solve the problem of approximate k-nearest neighbor queries in massive high-dimensional databases. HD-Index consists of a set of novel hierarchical structures called RDB-trees built on Hilbert keys of database objects. The leaves of the RDB-trees store distances of database objects to reference objects, thereby allowing efficient pruning using distance filters. In addition to triangular inequality, we also use Ptolemaic inequality to produce better lower bounds. Experiments on massive (up to billion scale) high-dimensional (up to 1000+) datasets show that HD-Index is effective, efficient, and scalable.Comment: PVLDB 11(8):906-919, 201

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Reordering Rows for Better Compression: Beyond the Lexicographic Order

Author: Gutarra Eduardo
Kaser Owen
Lemire Daniel
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2012
Field of study

Sorting database tables before compressing them improves the compression rate. Can we do better than the lexicographical order? For minimizing the number of runs in a run-length encoding compression scheme, the best approaches to row-ordering are derived from traveling salesman heuristics, although there is a significant trade-off between running time and compression. A new heuristic, Multiple Lists, which is a variant on Nearest Neighbor that trades off compression for a major running-time speedup, is a good option for very large tables. However, for some compression schemes, it is more important to generate long runs rather than few runs. For this case, another novel heuristic, Vortex, is promising. We find that we can improve run-length encoding up to a factor of 3 whereas we can improve prefix coding by up to 80%: these gains are on top of the gains due to lexicographically sorting the table. We prove that the new row reordering is optimal (within 10%) at minimizing the runs of identical values within columns, in a few cases.Comment: to appear in ACM TOD

arXiv.org e-Print Archive

R-libre

Crossref

The True Destination of EGO is Multi-local Optimization

Author: Preuss Mike
Wessing Simon
Publication venue
Publication date: 19/04/2017
Field of study

Efficient global optimization is a popular algorithm for the optimization of expensive multimodal black-box functions. One important reason for its popularity is its theoretical foundation of global convergence. However, as the budgets in expensive optimization are very small, the asymptotic properties only play a minor role and the algorithm sometimes comes off badly in experimental comparisons. Many alternative variants have therefore been proposed over the years. In this work, we show experimentally that the algorithm instead has its strength in a setting where multiple optima are to be identified

arXiv.org e-Print Archive

Crossref

Recent Advances in Graph Partitioning

Author: A Buluç
A Felner
A George
A Lisser
A Pothen
A Trifunović
AB Kahng
AE Feldmann
AH Land
AJ Soper
B Brandfass
B Hendrickson
B Hendrickson
B Hendrickson
B Junker
B Monien
B Peng
BW Kernighan
C Aykanat
C Chevalier
C Chevalier
C Farhat
C Lanczos
C Walshaw
C Walshaw
C Walshaw
C Walshaw
C Walshaw
C Walshaw
CE Bichot
CE Ferreira
D Delling
D Delling
D Delling
D Drake
D Luxen
D Ron
D Ron
D Wagner
DA Papa
DE Drake Vinkemeier
E Jeannot
E Rolland
F Comellas
F Glover
F Glover
F Pellegrini
F Pellegrini
F Pellegrini
F Schulz
FT Leighton
G Even
G Karypis
G Karypis
G Karypis
G Zumbusch
H Li
H Meyerhenke
H Meyerhenke
H Meyerhenke
H Meyerhenke
H Meyerhenke
HD Simon
HD Simon
I Moulitsas
I Safro
I Safro
J Chen
J Cong
J Fietz
J Hromkovič
J Hungershöfer
J Maue
J Maue
J Shalf
JR Gilbert
K Andreev
K Lang
K Schloegel
K Schloegel
K Schloegel
KS Camilus
L Brunetta
L Grady
L Lovász
LA Sanchis
LR Ford
M Armbruster
M Bader
M Birn
M Fiedler
M Jerrum
M Newman
M Sellmann
M Zhou
MR Garey
N Sensen
O Goldschmidt
P Chardaire
P Galinier
P Korosec
P Sanders
P Sanders
R Diekmann
R Diekmann
R Glantz
R Preis
RD Williams
S Arora
S Huang
S Lafon
S Lloyd
S Pettie
SE Karisch
SY Chan
T Bui
T Kieritz
U Benlic
U Benlic
U Feige
V Osipov
WE Donath
WE Donath
WW Hager
WW Hager
X Sui
Y Low
YM Kim
Ü Çatalyürek
Publication venue
Publication date: 03/02/2015
Field of study

We survey recent trends in practical algorithms for balanced graph partitioning together with applications and future research directions

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Multispectral texture synthesis

Author: Tyrrell J. Alexander
Publication venue: RIT Scholar Works
Publication date: 01/01/2003
Field of study

Synthesizing texture involves the ordering of pixels in a 2D arrangement so as to display certain known spatial correlations, generally as described by a sample texture. In an abstract sense, these pixels could be gray-scale values, RGB color values, or entire spectral curves. The focus of this work is to develop a practical synthesis framework that maintains this abstract view while synthesizing texture with high spectral dimension, effectively achieving spectral invariance. The principle idea is to use a single monochrome texture synthesis step to capture the spatial information in a multispectral texture. The first step is to use a global color space transform to condense the spatial information in a sample texture into a principle luminance channel. Then, a monochrome texture synthesis step generates the corresponding principle band in the synthetic texture. This spatial information is then used to condition the generation of spectral information. A number of variants of this general approach are introduced. The first uses a multiresolution transform to decompose the spatial information in the principle band into an equivalent scale/space representation. This information is encapsulated into a set of low order statistical constraints that are used to iteratively coerce white noise into the desired texture. The residual spectral information is then generated using a non-parametric Markov Ran dom field model (MRF). The remaining variants use a non-parametric MRF to generate the spatial and spectral components simultaneously. In this ap proach, multispectral texture is grown from a seed region by sampling from the set of nearest neighbors in the sample texture as identified by a template matching procedure in the principle band. The effectiveness of both algorithms is demonstrated on a number of texture examples ranging from greyscale to RGB textures, as well as 16, 22, 32 and 63 band spectral images. In addition to the standard visual test that predominates the literature, effort is made to quantify the accuracy of the synthesis using informative and effective metrics. These include first and second order statistical comparisons as well as statistical divergence tests

RIT Scholar Works