On the Complexity of Mining Itemsets from the Crowd Using Taxonomies
We study the problem of frequent itemset mining in domains where data is not
recorded in a conventional database but only exists in human knowledge. We
provide examples of such scenarios, and present a crowdsourcing model for them.
The model uses the crowd as an oracle to find out whether an itemset is
frequent or not, and relies on a known taxonomy of the item domain to guide the
search for frequent itemsets. In the spirit of data mining with oracles, we
analyze the complexity of this problem in terms of (i) crowd complexity, that
measures the number of crowd questions required to identify the frequent
itemsets; and (ii) computational complexity, that measures the computational
effort required to choose the questions. We provide lower and upper complexity
bounds in terms of the size and structure of the input taxonomy, as well as the
size of a concise description of the output itemsets. We also provide
constructive algorithms that achieve the upper bounds, and consider more
efficient variants for practical situations.
Comment: 18 pages, 2 figures. To be published at ICDT'13. Added missing
acknowledgements.
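The abstract's level-wise search with a frequency oracle can be illustrated with a small sketch. This is not the paper's taxonomy-guided algorithm: the "crowd" is simulated by a toy transaction table (illustrative data), and pruning uses only plain monotonicity (supersets of an infrequent itemset are infrequent), omitting the taxonomy structure the paper exploits.

```python
from itertools import combinations

# Toy transaction table standing in for crowd knowledge (illustrative data).
transactions = [
    {"espresso", "croissant"},
    {"espresso", "bagel"},
    {"espresso", "croissant"},
    {"tea", "bagel"},
]

def crowd_oracle(itemset, min_support=2):
    """Simulates asking the crowd: 'is this itemset frequent?'"""
    support = sum(1 for t in transactions if itemset <= t)
    return support >= min_support

def frequent_itemsets(items):
    """Level-wise search: only itemsets the oracle confirmed frequent are
    extended, since every superset of an infrequent set is infrequent."""
    frequent = []
    level = [frozenset([i]) for i in items if crowd_oracle(frozenset([i]))]
    while level:
        frequent.extend(level)
        # Candidate (k+1)-itemsets are unions of frequent k-itemsets.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if crowd_oracle(c)]
    return frequent
```

Each call to `crowd_oracle` corresponds to one crowd question, so the number of calls is exactly the crowd complexity the abstract analyzes.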
System Design for a Long-Line Quantum Repeater
We present a new control algorithm and system design for a network of quantum
repeaters, and outline the end-to-end protocol architecture. Such a network
will create long-distance quantum states, supporting quantum key distribution
as well as distributed quantum computation. Quantum repeaters change the
decline of quantum-communication throughput with distance from exponential to
polynomial. Because a quantum state cannot be copied, a quantum repeater is not
a signal amplifier, but rather executes algorithms for quantum teleportation in
conjunction with a specialized type of quantum error correction called
purification to raise the fidelity of the quantum states. We introduce our
banded purification scheme, which is especially effective when the fidelity of
coupled qubits is low, improving the prospects for experimental realization of
such systems. The resulting throughput is calculated via detailed simulations
of a long line composed of shorter hops. Our algorithmic improvements increase
throughput by a factor of up to fifty compared to earlier approaches, for a
broad range of physical characteristics.
Comment: 12 pages, 13 figures. v2 includes one new graph, modest corrections
to some others, and significantly improved presentation. To appear in
IEEE/ACM Transactions on Networking.
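The fidelity-raising role of purification can be sketched numerically. The recurrence below is the standard BBPSSW map for Werner states, not the paper's banded scheme (which additionally decides *which* pairs to purify together); it only shows how repeated successful rounds lift fidelity toward 1.

```python
def purify(f):
    """One successful BBPSSW purification round on two Werner-state pairs of
    fidelity f: consumes both pairs, yields one pair of higher fidelity."""
    num = f ** 2 + ((1 - f) / 3) ** 2
    den = f ** 2 + 2 * f * (1 - f) / 3 + 5 * ((1 - f) / 3) ** 2
    return num / den

def rounds_to_target(f, target):
    """Successful rounds needed to lift fidelity f up to `target`."""
    assert f > 0.5, "purification only improves fidelity above 1/2"
    n = 0
    while f < target:
        f = purify(f)
        n += 1
    return n, f
```

The map only gains for f > 0.5, which is consistent with the abstract's point that scheduling matters most when the fidelity of coupled qubits is low.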
goSLP: Globally Optimized Superword Level Parallelism Framework
Modern microprocessors are equipped with single instruction multiple data
(SIMD) or vector instruction sets which allow compilers to exploit superword
level parallelism (SLP), a type of fine-grained parallelism. Current SLP
auto-vectorization techniques use heuristics to discover vectorization
opportunities in high-level language code. These heuristics are fragile, local
and typically only present one vectorization strategy that is either accepted
or rejected by a cost model. We present goSLP, a novel SLP auto-vectorization
framework which solves the statement packing problem in a pairwise optimal
manner. Using an integer linear programming (ILP) solver, goSLP searches the
entire space of statement packing opportunities for a whole function at a time,
while limiting total compilation time to a few minutes. Furthermore, goSLP
optimally solves the vector permutation selection problem using dynamic
programming. We implemented goSLP in the LLVM compiler infrastructure,
achieving a geometric mean speedup of 7.58% on SPEC2017fp, 2.42% on SPEC2006fp
and 4.07% on NAS benchmarks compared to LLVM's existing SLP auto-vectorizer.
Comment: Published at OOPSLA 2018.
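The core decision goSLP encodes as an ILP, which statements to pack into vector lanes, can be sketched as picking disjoint statement pairs that maximize total packing benefit. The statement names and benefit numbers below are hypothetical, and exhaustive search stands in for the ILP solver; goSLP's actual encoding and cost model are far richer.

```python
from itertools import combinations

# Hypothetical per-pair packing benefits (negative = packing costs more than
# it saves, e.g. because of the shuffles it would force). Illustrative only.
stmts = ["a", "b", "c", "d"]
benefit = {("a", "b"): 3, ("a", "c"): 1, ("a", "d"): -2,
           ("b", "c"): 2, ("b", "d"): 1, ("c", "d"): 4}

def best_packing(stmts, benefit):
    """Exhaustively choose disjoint statement pairs maximizing total benefit,
    the 0/1 decision goSLP hands to an ILP solver."""
    best = (0, [])

    def search(remaining, total, chosen):
        nonlocal best
        if total > best[0]:
            best = (total, chosen)
        for pair in combinations(sorted(remaining), 2):
            b = benefit.get(pair, 0)
            if b > 0:  # a negative-benefit pack is never worth choosing
                search(remaining - set(pair), total + b, chosen + [pair])

    search(set(stmts), 0, [])
    return best
```

Because the whole space of pairings is searched, the result is optimal for this toy cost model, mirroring the abstract's contrast with greedy, locally scoped heuristics.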
Minimum-Cost Coverage of Point Sets by Disks
We consider a class of geometric facility location problems in which the goal
is to determine a set X of disks given by their centers (t_j) and radii (r_j)
that cover a given set of demand points Y in the plane at the smallest possible
cost. We consider cost functions of the form sum_j f(r_j), where f(r)=r^alpha
is the cost of transmission to radius r. Special cases arise for alpha=1 (sum
of radii) and alpha=2 (total area); power consumption models in wireless
network design often use an exponent alpha>2. Different scenarios arise
according to possible restrictions on the transmission centers t_j, which may
be constrained to belong to a given discrete set or to lie on a line, etc. We
obtain several new results, including (a) exact and approximation algorithms
for selecting transmission points t_j on a given line in order to cover demand
points Y in the plane; (b) approximation algorithms (and an algebraic
intractability result) for selecting an optimal line on which to place
transmission points to cover Y; (c) a proof of NP-hardness for a discrete set
of transmission points in the plane and any fixed alpha>1; and (d) a
polynomial-time approximation scheme for the problem of computing a minimum
cost covering tour (MCCT), in which the total cost is a linear combination of
the transmission cost for the set of disks and the length of a tour/path that
connects the centers of the disks.
Comment: 10 pages, 4 figures, LaTeX. To appear in the ACM Symposium on
Computational Geometry 2006.
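The objective sum_j f(r_j) with f(r) = r^alpha can be made concrete with a brute-force sketch for the discrete-centers variant: every demand point is assigned to some candidate center, each chosen center's radius becomes the distance to its farthest assigned point, and we minimize the total cost. This enumeration is exponential in the number of points and is only an illustration of the cost model, not one of the paper's algorithms.

```python
from itertools import product
from math import dist

def min_cost_cover(points, centers, alpha=2.0):
    """Minimum of sum_j r_j**alpha over all assignments of demand points to
    candidate centers (alpha=2 corresponds to total disk area, up to pi)."""
    best = float("inf")
    for assign in product(range(len(centers)), repeat=len(points)):
        radius = [0.0] * len(centers)
        for p, c in zip(points, assign):
            # A disk must reach its farthest assigned demand point.
            radius[c] = max(radius[c], dist(p, centers[c]))
        best = min(best, sum(r ** alpha for r in radius))
    return best
```

Varying `alpha` reproduces the trade-off in the abstract: small alpha favors one large disk, while alpha > 1 increasingly rewards splitting coverage among several small disks.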