Algorithms to Exploit Data Sparsity
While data in the real world is very high-dimensional, it generally has some underlying structure; for instance, if we think of an image as a set of pixels with associated color values, most possible settings of those values look more like random noise than what we typically think of as a picture. With an appropriate change of basis, this underlying structure can often be converted into sparsity: an equivalent representation of the data whose magnitude is large in only a few directions relative to the ambient dimension. This motivates a variety of theoretical questions around designing algorithms that exploit data sparsity to achieve better performance than would be possible naively, and in this thesis we tackle several such questions.

We first examine the question of approximating the sparsity level of a signal under several different measurement models, a natural first step if the sparsity is to be exploited by other algorithms. Second, we study a particular sparse signal recovery problem, nonadaptive probabilistic group testing, and investigate exactly how sparse the signal needs to be before the methods used for recovering sparse signals outperform those used for non-sparse signals. Third, we prove novel upper bounds on the number of measurements needed to recover a sparse signal in the universal one-bit compressed sensing model of sparse signal recovery. Fourth, we give some approximations of an information-theoretic quantity called the index coding rate of a network modeled by a graph, in the special case that the graph is sparse or otherwise highly structured. For each of the problems considered, we also discuss some remaining open questions and conjectures, as well as possible directions towards their solutions.
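As an illustration of the basis-change idea in this abstract (a toy sketch, not code from the thesis), the following builds a signal that looks dense in its original coordinates but is exactly 3-sparse in the DCT basis; the indices and values are arbitrary choices for the example:

```python
import numpy as np
from scipy.fft import dct, idct

n = 256
# Build a signal that is exactly 3-sparse in the DCT basis
# (indices and amplitudes chosen arbitrarily for illustration).
coeffs_true = np.zeros(n)
coeffs_true[[5, 40, 120]] = [2.0, -1.5, 0.7]
signal = idct(coeffs_true, norm="ortho")  # dense in the standard basis

# In the original coordinates nearly every entry is nonzero...
dense_count = int(np.sum(np.abs(signal) > 1e-8))

# ...but transforming back exposes the sparsity.
coeffs = dct(signal, norm="ortho")
sparsity = int(np.sum(np.abs(coeffs) > 1e-8))
```

Here estimating `sparsity` by thresholding coefficient magnitudes is a simplistic stand-in for the sparsity-approximation problem the thesis studies under restricted measurement models.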
Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology.

Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
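The coarse-then-fine search the abstract describes can be sketched in a few lines (an assumed toy implementation, not Ammolite/MICA/esFragBag themselves): greedily cover the dataset with hyperspheres, whose count approximates the metric entropy, then use the triangle inequality to prune whole spheres at query time:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: points clustered around a few centers, so a handful
# of hyperspheres covers it (low metric entropy / fractal dimension).
centers = rng.normal(size=(5, 8)) * 10.0
data = np.vstack([c + rng.normal(scale=0.5, size=(200, 8)) for c in centers])

def greedy_cover(points, radius):
    """Greedy covering: pick centers until every point lies within
    `radius` of one; the number picked approximates metric entropy."""
    uncovered = np.ones(len(points), dtype=bool)
    cover = []
    while uncovered.any():
        i = int(np.flatnonzero(uncovered)[0])
        cover.append(i)
        uncovered &= np.linalg.norm(points - points[i], axis=1) > radius
    return cover

radius = 3.0
cover = greedy_cover(data, radius)
# Assign each point to its nearest cover center (distance <= radius).
assign = np.argmin(
    np.linalg.norm(data[:, None, :] - data[cover][None, :, :], axis=2),
    axis=1)

def range_search(query, r):
    """Coarse-then-fine range search: the triangle inequality prunes
    any cluster whose covering sphere cannot meet the query ball,
    then surviving clusters are scanned exactly (no false negatives)."""
    hits = []
    for j, ci in enumerate(cover):
        if np.linalg.norm(data[ci] - query) <= r + radius:
            members = np.flatnonzero(assign == j)
            d = np.linalg.norm(data[members] - query, axis=1)
            hits.extend(members[d <= r].tolist())
    return sorted(hits)
```

Because pruned spheres provably contain no points within `r` of the query, the exact fine-stage check preserves specificity while scanning only a fraction of the data when the cover is small.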