Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology.
Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
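The coarse-then-fine idea behind searching over covering hyperspheres can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in Ammolite, MICA, or esFragBag: the function names `build_cover` and `range_search`, the greedy clustering rule, and the use of Euclidean distance are all hypothetical choices made for the example. Points are grouped into balls of a fixed radius, and a query only inspects the members of balls whose centers are close enough to contain a possible hit (by the triangle inequality).

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.dist(a, b)


def build_cover(points, radius, dist=euclid):
    """Greedy cover (an assumed, simple clustering rule).

    Each point joins the first existing center within `radius`;
    otherwise it becomes the center of a new cluster.
    Returns a list of (center, members) pairs.
    """
    clusters = []
    for p in points:
        for center, members in clusters:
            if dist(center, p) <= radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def range_search(clusters, query, r, radius, dist=euclid):
    """Return all stored points within distance `r` of `query`.

    Coarse step: skip any cluster whose center is farther than
    r + radius from the query, since by the triangle inequality
    none of its members can lie within r.
    Fine step: check the members of the remaining clusters directly.
    """
    hits = []
    for center, members in clusters:
        if dist(center, query) <= r + radius:
            hits.extend(m for m in members if dist(m, query) <= r)
    return hits


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    cover = build_cover(pts, radius=1.0)
    print(range_search(cover, (0.0, 0.05), r=0.2, radius=1.0))
```

In this sketch the coarse step costs one distance computation per cluster center, so the work grows with the number of covering balls rather than with the number of points, which is the intuition the abstract describes.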
Distance Metric Learning using Graph Convolutional Networks: Application to Functional Brain Networks
Evaluating similarity between graphs is of major importance in several
computer vision and pattern recognition problems, where graph representations
are often used to model objects or interactions between elements. The choice of
a distance or similarity metric is, however, not trivial and can be highly
dependent on the application at hand. In this work, we propose a novel metric
learning method to evaluate distance between graphs that leverages the power of
convolutional neural networks, while exploiting concepts from spectral graph
theory to allow these operations on irregular graphs. We demonstrate the
potential of our method in the field of connectomics, where neuronal pathways
or functional connections between brain regions are commonly modelled as
graphs. In this problem, the definition of an appropriate graph similarity
function is critical to unveil patterns of disruptions associated with certain
brain disorders. Experimental results on the ABIDE dataset show that our method
can learn a graph similarity metric tailored for a clinical application,
improving the performance of a simple k-nn classifier by 11.9% compared to a
traditional distance metric.
Comment: International Conference on Medical Image Computing and
Computer-Assisted Interventions (MICCAI) 201
Generalizing Kronecker graphs in order to model searchable networks
This paper describes an extension to stochastic
Kronecker graphs that provides the special structure required
for searchability, by defining a “distance”-dependent Kronecker
operator. We show how this extension of Kronecker graphs
can generate several existing social network models, such as
the Watts-Strogatz small-world model and Kleinberg’s lattice-based
model. We focus on a specific example of an expanding
hypercube, reminiscent of recently proposed social network
models based on a hidden hyperbolic metric space, and prove
that a greedy forwarding algorithm can find very short paths
of length O((log log n)^2) for graphs with n nodes.
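To make "greedy forwarding" concrete, here is a minimal sketch on a plain hypercube, where each node forwards the message to whichever neighbour is closest to the target. It is an illustration only: it uses an ordinary d-dimensional hypercube with Hamming distance, not the paper's distance-dependent Kronecker construction, so it does not reproduce the O((log log n)^2) bound. The function names are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which integers a and b differ."""
    return bin(a ^ b).count("1")


def hypercube_neighbors(node, dim):
    """Neighbours of `node` in a dim-dimensional hypercube (flip one bit)."""
    return [node ^ (1 << i) for i in range(dim)]


def greedy_route(source, target, dim):
    """Forward greedily from source to target.

    At each step, move to the neighbour with the smallest Hamming
    distance to the target. Raises if no neighbour is strictly closer,
    which cannot happen on a full hypercube but guards the loop.
    """
    path = [source]
    current = source
    while current != target:
        nxt = min(hypercube_neighbors(current, dim),
                  key=lambda v: hamming(v, target))
        if hamming(nxt, target) >= hamming(current, target):
            raise RuntimeError("greedy forwarding is stuck")
        current = nxt
        path.append(current)
    return path


if __name__ == "__main__":
    route = greedy_route(0b0000, 0b1011, dim=4)
    print([format(v, "04b") for v in route])
```

On a plain hypercube the greedy path length equals the Hamming distance between source and target; the shorter paths claimed in the abstract rely on the additional long-range edges of the generalized model.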
Ramsey expansions of metrically homogeneous graphs
We discuss the Ramsey property, the existence of a stationary independence
relation and the coherent extension property for partial isometries (coherent
EPPA) for all classes of metrically homogeneous graphs from Cherlin's
catalogue, which is conjectured to include all such structures. We show that,
with the exception of tree-like graphs, all metric spaces in the catalogue have
precompact Ramsey expansions (or lifts) with the expansion property. With two
exceptions we can also characterise the existence of a stationary independence
relation and the coherent EPPA.
Our results can be seen as a new contribution to Nešetřil's
classification programme of Ramsey classes and as empirical evidence of the
recent convergence in techniques employed to establish the Ramsey property, the
expansion (or lift or ordering) property, EPPA and the existence of a
stationary independence relation. At the heart of our proof is a canonical way
of completing edge-labelled graphs to metric spaces in Cherlin's classes. The
existence of such a "completion algorithm" then allows us to apply several
strong results in the areas that imply EPPA and respectively the Ramsey
property.
The main results have numerous corollaries on the automorphism groups of the
Fraïssé limits of the classes, such as amenability, unique ergodicity,
existence of universal minimal flows, ample generics, small index property,
21-Bergman property and Serre's property (FA).
Comment: 57 pages, 14 figures. Extends results of arXiv:1706.00295. Minor
revisio