Fundamentals of Large Sensor Networks: Connectivity, Capacity, Clocks and Computation
Sensor networks potentially feature large numbers of nodes that can sense
their environment over time, communicate with each other over a wireless
network, and process information. They differ from data networks in that the
network as a whole may be designed for a specific application. We study the
theoretical foundations of such large scale sensor networks, addressing four
fundamental issues: connectivity, capacity, clocks and function computation.
To begin with, a sensor network must be connected so that information can
indeed be exchanged between nodes. The connectivity graph of an ad-hoc network
is modeled as a random graph and the critical range for asymptotic connectivity
is determined, as well as the critical number of neighbors that a node needs to
connect to. Next, given connectivity, we address the issue of how much data can
be transported over the sensor network. We present fundamental bounds on
capacity under several models, as well as architectural implications for how
wireless communication should be organized.
Temporal information is important both for the applications of sensor
networks and for their operation. We present fundamental bounds on the
synchronizability of clocks in networks, and also present and analyze
algorithms for clock synchronization. Finally, we turn to the task that sensor
networks are designed for: gathering relevant information. One needs to
study optimal strategies for in-network aggregation of data, in order to
reliably compute a composite function of sensor measurements, as well as the
complexity of doing so. We address the issue of how such computation can be
performed efficiently in a sensor network and the algorithms for doing so, for
some classes of functions.
Comment: 10 pages, 3 figures, Submitted to the Proceedings of the IEE
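The connectivity question raised in the abstract above can be explored numerically. The following is a minimal sketch, assuming nodes placed uniformly at random in the unit square and linked whenever they lie within a common radius. The placement model, the function names `is_connected` and `connectivity_rate`, and the trial count are illustrative assumptions, not taken from the abstract.

```python
import math
import random

def is_connected(points, radius):
    """Return True if the graph linking points within `radius` is connected."""
    n = len(points)
    if n == 0:
        return True
    seen = {0}
    stack = [0]
    # Depth-first search from node 0 over the implicit geometric graph.
    while stack:
        i = stack.pop()
        xi, yi = points[i]
        for j in range(n):
            if j in seen:
                continue
            xj, yj = points[j]
            if math.hypot(xi - xj, yi - yj) <= radius:
                seen.add(j)
                stack.append(j)
    return len(seen) == n

def connectivity_rate(n, radius, trials=50, seed=0):
    """Fraction of random placements (assumed uniform in the unit square) that are connected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        hits += is_connected(pts, radius)
    return hits / trials
```

One possible use is to compare `connectivity_rate(n, r)` for radii slightly below and above a candidate threshold such as sqrt(log n / (pi n)), which is assumed here as the commonly cited scaling, and watch how the transition behaves as n grows.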
Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology.
Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
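The idea of searching over covering hyperspheres, as described in the abstract above, can be illustrated with a two-stage search. This is a simplified sketch under stated assumptions, not the implementation behind Ammolite, MICA or esFragBag: it assumes Euclidean points, a greedy cover of hypothetical radius `r_c`, and a coarse filter based on the triangle inequality. All function names are illustrative.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.dist(a, b)

def build_clusters(points, r_c):
    """Greedy cover: each point joins the first center within r_c, else becomes a new center."""
    clusters = []  # list of (center, members)
    for p in points:
        for center, members in clusters:
            if dist(center, p) <= r_c:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters

def search(clusters, query, r, r_c):
    """Return all points within r of query, scanning only clusters that can contain a hit."""
    hits = []
    for center, members in clusters:
        # Triangle inequality: a member within r of the query implies
        # its center lies within r + r_c of the query.
        if dist(center, query) <= r + r_c:
            hits.extend(m for m in members if dist(m, query) <= r)
    return hits
```

In this sketch the coarse stage costs one distance computation per cluster, so the work depends on the number of covering spheres rather than the number of points, provided few clusters pass the filter.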
Circuits with arbitrary gates for random operators
We consider boolean circuits computing n-operators f:{0,1}^n --> {0,1}^n. As
gates we allow arbitrary boolean functions; neither fanin nor fanout of gates
is restricted. An operator is linear if it computes n linear forms, that is,
computes a matrix-vector product y=Ax over GF(2). We prove the existence of
n-operators requiring about n^2 wires in any circuit, and linear n-operators
requiring about n^2/\log n wires in depth-2 circuits, if either all output
gates or all gates on the middle layer are linear.
Comment: 7 page
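To make the notion of a linear n-operator concrete, the sketch below computes y=Ax over GF(2) and counts the wires of the trivial depth-1 circuit, in which each output gate is an XOR wired directly to the inputs it depends on. The function names are illustrative assumptions; the wire count is the naive upper bound of at most n^2, not one of the bounds proved in the paper.

```python
def gf2_matvec(A, x):
    """Compute y = A x over GF(2); A is a list of 0/1 rows, x a list of 0/1 bits."""
    return [sum(a & b for a, b in zip(row, x)) % 2 for row in A]

def naive_wire_count(A):
    """Wires in the trivial depth-1 circuit: one wire per nonzero entry of A."""
    return sum(sum(row) for row in A)

def xor_vec(u, v):
    """Entrywise XOR, that is, vector addition over GF(2)."""
    return [a ^ b for a, b in zip(u, v)]
```

Linearity can be checked directly: `gf2_matvec(A, xor_vec(x, z))` should equal `xor_vec(gf2_matvec(A, x), gf2_matvec(A, z))` for any bit vectors `x` and `z`.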
Computational barriers in minimax submatrix detection
This paper studies the minimax detection of a small submatrix of elevated
mean in a large matrix contaminated by additive Gaussian noise. To investigate
the tradeoff between statistical performance and computational cost from a
complexity-theoretic perspective, we consider a sequence of discretized models
which are asymptotically equivalent to the Gaussian model. Under the hypothesis
that the planted clique detection problem cannot be solved in randomized
polynomial time when the clique size is of smaller order than the square root
of the graph size, the following phase transition phenomenon is established:
when the size of the large matrix p \to \infty, if the submatrix size
k = \Theta(p^{\alpha}) for any \alpha \in (0, 2/3), computational complexity
constraints can incur a severe penalty on the statistical performance in the
sense that any randomized polynomial-time test is minimax suboptimal by a
polynomial factor in p; if k = \Theta(p^{\alpha}) for any
\alpha \in (2/3, 1), minimax optimal detection can be attained within
constant factors in linear time. Using Schatten norm loss as a representative
example, we show that the hardness of attaining the minimax estimation rate can
crucially depend on the loss function. Implications on the hardness of support
recovery are also obtained.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1300 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Rank Minimization over Finite Fields: Fundamental Limits and Coding-Theoretic Interpretations
This paper establishes information-theoretic limits in estimating a finite
field low-rank matrix given random linear measurements of it. These linear
measurements are obtained by taking inner products of the low-rank matrix with
random sensing matrices. Necessary and sufficient conditions on the number of
measurements required are provided. It is shown that these conditions are sharp
and the minimum-rank decoder is asymptotically optimal. The reliability
function of this decoder is also derived by appealing to de Caen's lower bound
on the probability of a union. The sufficient condition also holds when the
sensing matrices are sparse - a scenario that may be amenable to efficient
decoding. More precisely, it is shown that if the n\times n-sensing matrices
contain, on average, \Omega(n\log n) entries, the number of measurements
required is the same as that when the sensing matrices are dense and contain
entries drawn uniformly at random from the field. Analogies are drawn between
the above results and rank-metric codes in the coding theory literature. In
fact, we are also strongly motivated by understanding when minimum rank
distance decoding of random rank-metric codes succeeds. To this end, we derive
distance properties of equiprobable and sparse rank-metric codes. These
distance properties provide a precise geometric interpretation of the fact that
the sparse ensemble requires as few measurements as the dense one. Finally, we
provide a non-exhaustive procedure to search for the unknown low-rank matrix.
Comment: Accepted to the IEEE Transactions on Information Theory; Presented at
IEEE International Symposium on Information Theory (ISIT) 201
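The measurement model and the minimum-rank decoder described in the abstract above can be illustrated on tiny instances. The sketch below assumes the field GF(2) and decodes by exhaustive enumeration, which is only feasible for very small n and is explicitly not the non-exhaustive procedure the paper provides. All function names are illustrative assumptions.

```python
from itertools import product

def gf2_rank(M):
    """Rank of a 0/1 matrix over GF(2) via Gaussian elimination."""
    rows = [list(r) for r in M]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        # Find a pivot row at or below the current rank position.
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def measure(H, X):
    """Inner product <H, X> over GF(2): sum of entrywise products mod 2."""
    return sum(h & x for hr, xr in zip(H, X) for h, x in zip(hr, xr)) % 2

def min_rank_decode(sensing, ys, n):
    """Exhaustively list the minimum-rank n x n matrices consistent with all measurements."""
    best, best_rank = [], n + 1
    for bits in product((0, 1), repeat=n * n):
        X = [list(bits[i * n:(i + 1) * n]) for i in range(n)]
        if all(measure(H, X) == y for H, y in zip(sensing, ys)):
            r = gf2_rank(X)
            if r < best_rank:
                best, best_rank = [X], r
            elif r == best_rank:
                best.append(X)
    return best
```

When the returned list has exactly one element, the measurements identify the matrix uniquely among minimum-rank candidates; a longer list means more measurements would be needed in this toy setting.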
Towards a complexity theory for the congested clique
The congested clique model of distributed computing has been receiving
attention as a model for densely connected distributed systems. While there has
been significant progress on the side of upper bounds, we have very little in
terms of lower bounds for the congested clique; indeed, it is now known that
proving explicit congested clique lower bounds is as difficult as proving
circuit lower bounds.
In this work, we use various more traditional complexity-theoretic tools to
build a clearer picture of the complexity landscape of the congested clique:
-- Nondeterminism and beyond: We introduce the nondeterministic congested
clique model (analogous to NP) and show that there is a natural canonical
problem family that captures all problems solvable in constant time with
nondeterministic algorithms. We further generalise these notions by introducing
the constant-round decision hierarchy (analogous to the polynomial hierarchy).
-- Non-constructive lower bounds: We lift the prior non-uniform counting
arguments to a general technique for proving non-constructive uniform lower
bounds for the congested clique. In particular, we prove a time hierarchy
theorem for the congested clique, showing that there are decision problems of
essentially all complexities, both in the deterministic and nondeterministic
settings.
-- Fine-grained complexity: We map out relationships between various natural
problems in the congested clique model, arguing that a reduction-based
complexity theory currently gives us a fairly good picture of the complexity
landscape of the congested clique
- …