On Optimizing Distributed Tucker Decomposition for Dense Tensors
The Tucker decomposition expresses a given tensor as the product of a small
core tensor and a set of factor matrices. Apart from providing data
compression, the construction is useful in performing analysis such as
principal component analysis (PCA) and finds applications in diverse domains
such as signal processing, computer vision and text analytics. Our objective is
to develop an efficient distributed implementation for the case of dense
tensors. The implementation is based on the HOOI (Higher Order Orthogonal
Iterator) procedure, wherein the tensor-times-matrix product forms the core
routine. Prior work has proposed heuristics for reducing the computational
load and communication volume incurred by the routine. We study the two metrics
in a formal and systematic manner, and design strategies that are optimal under
the two fundamental metrics. Our experimental evaluation on a large benchmark
of tensors shows that the optimal strategies provide significant reduction in
load and volume compared to prior heuristics, and provide up to 7x speed-up in
the overall running time.
Comment: Preliminary version of the paper appears in the proceedings of
IPDPS'1
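
For readers unfamiliar with the tensor-times-matrix (TTM) product that the abstract names as the core routine of HOOI, the following is a minimal single-process sketch in plain Python. It is not the paper's distributed implementation: the function name `ttm`, the flat row-major storage, and the argument layout are assumptions made here for illustration only.

```python
from itertools import product
from math import prod


def ttm(tensor, shape, matrix, mode):
    """Naive mode-`mode` tensor-times-matrix product (illustrative only).

    tensor: flat list of numbers in row-major order with dimensions `shape`.
    matrix: list of K rows, each of length shape[mode].
    Returns (flat_result, new_shape), where new_shape[mode] == K.
    """
    if any(len(row) != shape[mode] for row in matrix):
        raise ValueError("matrix column count must equal shape[mode]")

    new_shape = list(shape)
    new_shape[mode] = len(matrix)

    def offset(index, dims):
        # Row-major linear offset of a multi-index.
        off = 0
        for i, d in zip(index, dims):
            off = off * d + i
        return off

    out = [0.0] * prod(new_shape)
    for idx in product(*(range(d) for d in new_shape)):
        src = list(idx)
        total = 0.0
        # Contract the chosen mode of the tensor against one matrix row.
        for j in range(shape[mode]):
            src[mode] = j
            total += matrix[idx[mode]][j] * tensor[offset(src, shape)]
        out[offset(idx, new_shape)] = total
    return out, new_shape


# A 2 x 2 x 2 tensor holding 0..7, contracted along mode 0 with a 1 x 2 matrix.
values, dims = ttm(list(range(8)), [2, 2, 2], [[1, 1]], mode=0)
```

Applying one such product per mode with a short, wide factor matrix shrinks the tensor along that mode, which is how a small core tensor arises. The cost of each product depends on the size of the intermediate tensor it operates on; the load and communication volume that the abstract discusses are properties of the authors' distributed scheme and are not modeled in this sketch.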
Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology.
Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
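
The idea of searching over covering hyperspheres can be illustrated with a small coarse-then-fine range search. This is a sketch under stated assumptions, not the authors' tools: the greedy covering, the function names `build_cover` and `search`, and the use of Euclidean distance are all choices made here for illustration. The only property relied on is the triangle inequality, which holds for any metric.

```python
import math


def build_cover(points, r_c, dist):
    """Greedily cover `points` with balls of radius r_c.

    Returns a list of (center, members) pairs. The number of pairs plays
    the role of the number of covering hyperspheres.
    """
    clusters = []
    for p in points:
        for center, members in clusters:
            if dist(p, center) <= r_c:
                members.append(p)
                break
        else:
            # No existing ball contains p, so p becomes a new center.
            clusters.append((p, [p]))
    return clusters


def search(clusters, query, r, r_c, dist):
    """Return every stored point within distance r of `query`."""
    hits = []
    for center, members in clusters:
        # Coarse step: by the triangle inequality, a ball of radius r_c can
        # hold a hit only if its center lies within r + r_c of the query.
        if dist(query, center) <= r + r_c:
            # Fine step: scan only the members of the surviving balls.
            hits.extend(m for m in members if dist(query, m) <= r)
    return hits


points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.2, 10.0), (5.0, 5.0)]
cover = build_cover(points, r_c=1.0, dist=math.dist)
found = search(cover, (0.4, 0.0), r=0.2, r_c=1.0, dist=math.dist)
```

In this sketch the coarse step costs one distance computation per ball, and the fine step touches only the points in balls that pass the coarse test, so the pruning never discards a true hit.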