Community detection and stochastic block models: recent developments
The stochastic block model (SBM) is a random graph model with planted
clusters. It is widely employed as a canonical model to study clustering and
community detection, and more generally provides fertile ground for studying the
statistical and computational tradeoffs that arise in network and data
sciences.
This note surveys the recent developments that establish the fundamental
limits for community detection in the SBM, with respect to both
information-theoretic and computational thresholds, and for various recovery
requirements such as exact, partial and weak recovery (a.k.a. detection). The
main results discussed are the phase transitions for exact recovery at the
Chernoff-Hellinger threshold, the phase transition for weak recovery at the
Kesten-Stigum threshold, the optimal distortion-SNR tradeoff for partial
recovery, the learning of the SBM parameters and the gap between
information-theoretic and computational thresholds.
The note also covers some of the algorithms developed in the quest to
achieve these limits, in particular two-round algorithms via graph-splitting,
semi-definite programming, linearized belief propagation, and classical and
nonbacktracking spectral methods. A few open problems are also discussed.
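As a rough illustration of the classical spectral approach mentioned in this abstract, the sketch below samples a symmetric two-community SBM and recovers the communities from the sign of the leading eigenvector of the centered adjacency matrix, computed by power iteration. The function names, the parameter values (n = 80, p_in = 0.6, p_out = 0.1) and the choice of power iteration are assumptions made for this example. It is a dense, easy regime far from the thresholds the note studies, and is not one of the surveyed algorithms as such.

```python
import random

def sample_sbm(n, p_in, p_out, rng):
    """Sample a two-community SBM; the first n // 2 nodes form community 0."""
    labels = [0 if i < n // 2 else 1 for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1.0
    return adj, labels

def spectral_labels(adj, rng, iters=100):
    """Label nodes by the sign of the leading eigenvector of A - density * J."""
    n = len(adj)
    density = sum(sum(row) for row in adj) / (n * n)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        total = sum(v)
        # Multiply by the centered matrix without building it explicitly.
        w = [sum(a * x for a, x in zip(row, v)) - density * total for row in adj]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [0 if x >= 0 else 1 for x in v]

def agreement(found, truth):
    """Fraction of correctly labelled nodes, up to swapping the two labels."""
    same = sum(f == t for f, t in zip(found, truth)) / len(truth)
    return max(same, 1.0 - same)

rng = random.Random(0)
adj, truth = sample_sbm(80, 0.6, 0.1, rng)
found = spectral_labels(adj, rng)
print(agreement(found, truth))
```

Subtracting the average density removes the uninformative all-ones direction, so the dominant remaining eigenvector aligns with the community split in this regime.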
Factorised Representations of Query Results
Query tractability has been traditionally defined as a function of input
database and query sizes, or of both input and output sizes, where the query
result is represented as a bag of tuples. In this report, we introduce a
framework that allows tractability to be investigated beyond this setting. The key
insight is that, although the cardinality of a query result can be exponential,
its structure can be very regular and thus factorisable into a nested
representation whose size is only polynomial in the size of both the input
database and query.
For a given query result, there may be several equivalent representations,
and we quantify the regularity of the result by its readability, which is the
minimum over all its representations of the maximum number of occurrences of
any tuple in that representation. We give a characterisation of
select-project-join queries based on the bounds on readability of their results
for any input database. We complement it with an algorithm that can find
asymptotically optimal upper bounds and corresponding factorised
representations.
Comment: 44 pages, 13 figures
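To make the size gap between flat and factorised results concrete, here is a minimal sketch for a single join R(A,B) ⋈ S(B,C). The dictionary-based nested form, the function names and the toy data are assumptions chosen for this example; they are not the representation system or the algorithm of the report.

```python
from collections import defaultdict

def flat_join(r, s):
    """R(A,B) join S(B,C) as a flat list of (a, b, c) tuples."""
    return [(a, b, c) for (a, b) in r for (b2, c) in s if b == b2]

def factorised_join(r, s):
    """Same result as a union over b of (union of a) x {b} x (union of c)."""
    a_by_b, c_by_b = defaultdict(list), defaultdict(list)
    for a, b in r:
        a_by_b[b].append(a)
    for b, c in s:
        c_by_b[b].append(c)
    return {b: (a_by_b[b], c_by_b[b]) for b in a_by_b if b in c_by_b}

def expand(fact):
    """Enumerate the flat tuples encoded by the factorised form."""
    return [(a, b, c) for b, (as_, cs) in fact.items() for a in as_ for c in cs]

def size(fact):
    """Number of value occurrences stored in the factorised form."""
    return sum(len(as_) + 1 + len(cs) for as_, cs in fact.values())

n = 50
r = [(a, 0) for a in range(n)]   # every A-value pairs with b = 0
s = [(0, c) for c in range(n)]   # b = 0 pairs with every C-value
flat = flat_join(r, s)
fact = factorised_join(r, s)
print(len(flat), size(fact))
```

On this toy input the flat result grows quadratically in n, while the nested form stores each value once and grows linearly.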
Pipelining Saturated Accumulation
Aggressive pipelining and spatial parallelism allow integrated circuits (e.g., custom VLSI, ASICs, and FPGAs) to achieve high throughput on many Digital Signal Processing applications. However, cyclic data dependencies in the computation can limit parallelism and reduce the efficiency and speed of an implementation. Saturated accumulation is an important example where such a cycle limits the throughput of signal processing applications. We show how to reformulate saturated addition as an associative operation so that we can use a parallel-prefix calculation to perform saturated accumulation at any data rate supported by the device. This allows us, for example, to design a 16-bit saturated accumulator which can operate at 280 MHz on a Xilinx Spartan-3 (XC3S-5000-4) FPGA, the maximum frequency supported by the component's DCM.
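One way to see why saturated addition can be made associative is to treat each input as a function y -> clamp(y + x, lo, hi); composing two such functions yields another function of the same form, so a prefix scan over function composition applies. The software sketch below illustrates that idea under stated assumptions: the triple encoding, the helper names and the Hillis-Steele scan are choices made for this example, Python integers hide the bit-width questions a hardware design must handle, and none of it is taken from the paper's circuit.

```python
LO, HI = -32768, 32767  # 16-bit signed range, as in the abstract's example

def clamp(x, lo, hi):
    return min(max(x, lo), hi)

def sat_accumulate_sequential(xs, init=0):
    """Reference loop: each step depends on the previous accumulator value."""
    acc, out = init, []
    for x in xs:
        acc = clamp(acc + x, LO, HI)
        out.append(acc)
    return out

def compose(f, g):
    """Apply f, then g. A triple (add, lo, hi) means y -> clamp(y + add, lo, hi)."""
    a, l, h = f
    b, m, M = g
    return (a + b, clamp(l + b, m, M), clamp(h + b, m, M))

def prefix_scan(items, op):
    """Hillis-Steele inclusive scan; the combines within a round are independent."""
    items = list(items)
    step = 1
    while step < len(items):
        items = [items[i] if i < step else op(items[i - step], items[i])
                 for i in range(len(items))]
        step *= 2
    return items

def sat_accumulate_prefix(xs, init=0):
    """Saturated running sum via a scan over composed clamp functions."""
    funcs = prefix_scan([(x, LO, HI) for x in xs], compose)
    return [clamp(init + a, lo, hi) for a, lo, hi in funcs]

xs = [30000, 10000, -50000, 20000, -70000, 5]
print(sat_accumulate_sequential(xs))
print(sat_accumulate_prefix(xs))
```

Because `compose` is associative, the scan needs only a logarithmic number of rounds, which is what removes the cycle-by-cycle dependency of the sequential loop.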
Distinguishing Infections on Different Graph Topologies
The history of infections and epidemics holds famous examples where
understanding, containing and ultimately treating an outbreak began with
understanding its mode of spread. Influenza, HIV and most computer viruses
spread person to person, device to device, through contact networks; cholera,
cancer and seasonal allergies, on the other hand, do not. In this paper we
study two fundamental questions of detection: first, given a snapshot view of a
(perhaps vanishingly small) fraction of those infected, under what conditions
is an epidemic spreading via contact (e.g., influenza) distinguishable from a
"random illness" operating independently of any contact network (e.g., seasonal
allergies); second, if we do have an epidemic, under what conditions is it
possible to determine which network of interactions is the main cause of the
spread -- the causative network -- without any knowledge of the epidemic, other
than the identity of a minuscule subsample of infected nodes?
The core of this paper, therefore, is to obtain an understanding of the
diagnostic power of network information. We derive sufficient conditions that
networks must satisfy for these problems to be identifiable, and produce
efficient, highly scalable algorithms that solve these problems. We show that
the identifiability condition we give is fairly mild and, in particular, is
satisfied by two common graph topologies: the grid and Erdős-Rényi graphs.
Selecting Near-Optimal Learners via Incremental Data Allocation
We study a novel machine learning (ML) problem setting of sequentially
allocating small subsets of training data amongst a large set of classifiers.
The goal is to select a classifier that will give near-optimal accuracy when
trained on all data, while also minimizing the cost of misallocated samples.
This is motivated by large modern datasets and ML toolkits with many
combinations of learning algorithms and hyper-parameters. Inspired by the
principle of "optimism under uncertainty," we propose an innovative strategy,
Data Allocation using Upper Bounds (DAUB), which robustly achieves these
objectives across a variety of real-world datasets.
We further develop substantial theoretical support for DAUB in an idealized
setting where the expected accuracy of a classifier trained on a given number of samples can
be known exactly. Under these conditions we establish a rigorous sub-linear
bound on the regret of the approach (in terms of misallocated data), as well as
a rigorous bound on suboptimality of the selected classifier. Our accuracy
estimates using real-world datasets only entail mild violations of the
theoretical scenario, suggesting that the practical behavior of DAUB is likely
to approach the idealized behavior.
Comment: AAAI-2016: The Thirtieth AAAI Conference on Artificial Intelligence
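The "optimism under uncertainty" idea can be sketched as follows: keep an optimistic upper bound on each learner's accuracy at full training size, and give more data only to the learner whose bound is currently highest. This is a loose illustration in the spirit of the abstract, not the DAUB algorithm itself. The concave toy learning curves, the linear-extrapolation bound, the starting size of 50 and the doubling schedule are all assumptions made for the example, and the curves are taken to be known exactly, as in the idealized setting.

```python
import math

def make_curve(top, c):
    """Hypothetical concave learning curve: accuracy after n training samples."""
    return lambda n: top - c / math.sqrt(n)

def upper_bound(curve, n_prev, n, total):
    """Optimistic accuracy at `total` samples, by extrapolating the last slope.

    For a concave increasing curve the last secant slope overestimates all
    later growth, so this is a valid upper bound (capped at 1.0).
    """
    slope = (curve(n) - curve(n_prev)) / (n - n_prev)
    return min(1.0, curve(n) + slope * (total - n))

def allocate(curves, total, start=50, growth=2):
    """Return (chosen learner index, samples allocated to each learner).

    Assumes start * growth <= total so every learner has two distinct sizes.
    """
    state = [(start, start * growth) for _ in curves]
    while True:
        bounds = [upper_bound(c, p, n, total) for c, (p, n) in zip(curves, state)]
        best = max(range(len(curves)), key=bounds.__getitem__)
        p, n = state[best]
        if n == total:
            # Its bound is now its true full-data accuracy and beats all others.
            return best, [m for _, m in state]
        state[best] = (n, min(total, n * growth))

curves = [make_curve(0.90, 1.0), make_curve(0.80, 0.5), make_curve(0.70, 0.2)]
TOTAL = 10000
chosen, allocated = allocate(curves, TOTAL)
print(chosen, allocated)
```

Under these assumptions the learner that reaches the full data size with the highest bound is at least as accurate as every other learner at that size, while the weaker learners stop receiving data once their bounds fall below it.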