LASSO: Listing All Subset Sums Obediently for Evaluating Unbounded Subset Sums
In this study we present a novel algorithm, LASSO, for solving the unbounded and bounded subset sum problem (SSP). LASSO was designed to solve the unbounded SSP (USSP) quickly and to return all subsets summing to a target sum. Because speed was the highest priority, we benchmarked the run-time performance of LASSO against implementations of several common approaches to the bounded SSP, as well as against the only comparable implementation for the unbounded SSP that we could find. On the bounded SSP, our algorithm was significantly faster than the competing algorithms when at least one subset summed to the target. When no subset summed to the target, LASSO's run time grew faster than that of the competing bounded-SSP algorithms. On the USSP, LASSO was significantly faster than the only comparable algorithm for this problem, in both run time and run-time growth rate.
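The abstract does not describe how LASSO works internally. As a reference point for what "return all subsets summing to a target" means in the bounded and unbounded variants, here is a plain depth-first backtracking enumerator. It is a generic baseline, not LASSO, and it assumes distinct positive integer inputs; the function name `all_subsets_with_sum` is hypothetical.

```python
from typing import List


def all_subsets_with_sum(values: List[int], target: int, unbounded: bool = False) -> List[List[int]]:
    """Return every subset (or, if `unbounded`, every multiset) of `values` summing to `target`.

    Generic backtracking baseline (not LASSO). Assumes positive integers;
    duplicates in `values` are dropped.
    """
    items = sorted(set(values))
    if any(v <= 0 for v in items):
        raise ValueError("values must be positive integers")
    results: List[List[int]] = []
    chosen: List[int] = []

    def search(start: int, remaining: int) -> None:
        if remaining == 0:
            results.append(list(chosen))
            return
        for i in range(start, len(items)):
            v = items[i]
            if v > remaining:
                break  # items are sorted, so every later value is too large as well
            chosen.append(v)
            # Unbounded: item i may be reused; bounded: each item is used at most once.
            search(i if unbounded else i + 1, remaining - v)
            chosen.pop()

    search(0, target)
    return results
```

For example, with values 2, 3, 5 and target 8, the bounded variant returns only `[3, 5]`, while the unbounded variant also returns `[2, 2, 2, 2]` and `[2, 3, 3]`. Sorting lets each branch stop as soon as a value exceeds the remaining sum.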
Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation
We introduce and study a new data sketch for processing massive datasets. It addresses two common problems: 1) computing a sum given arbitrary filter conditions and 2) identifying the frequent items or heavy hitters in a data set. For the former, the sketch provides unbiased estimates with state-of-the-art accuracy. It handles the challenging scenario in which the data is disaggregated, so that computing the per-unit metric of interest requires an expensive aggregation. For example, the metric of interest may be total clicks per user while the raw data is a click stream with multiple rows per user. Thus the sketch is suitable for a wide range of applications, including computing historical click-through rates for ad prediction, reporting user metrics from event streams, and measuring network traffic for IP flows. We prove, and show empirically, that the sketch has good properties for both the disaggregated subset sum estimation and frequent item problems. On i.i.d. data, it not only picks out the frequent items but also gives strongly consistent estimates of the proportion of each frequent item. The resulting sketch asymptotically draws a probability-proportional-to-size sample that is optimal for estimating sums over the data. For non-i.i.d. data, we show that it typically does much better than random sampling for the frequent item problem and never does worse. For subset sum estimation, we show that even for pathological sequences, the variance is close to that of an optimal sampling design. Empirically, despite the disadvantage of operating on disaggregated data, our method matches or beats priority sampling, a state-of-the-art method for pre-aggregated data, and performs orders of magnitude better than uniform sampling on skewed data. We propose extensions to the sketch that allow it to be used for combining multiple data sets, in distributed systems, and for time-decayed aggregation.
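The abstract does not spell out the sketch's update rule. One fixed-size sketch with the listed properties (unbiased per-item counts, frequent items, arbitrary filters applied after the fact) is an unbiased variant of Space Saving: a tracked item is incremented; otherwise the smallest bin is incremented and relabeled with the incoming item with probability 1/(new count). The sketch below assumes that rule; the class and method names are hypothetical, and the linear minimum scan stands in for the heap or stream-summary structure a production version would use.

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple


class UnbiasedSpaceSaving:
    """Fixed-size sketch mapping items to estimated row counts (assumed update rule)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}
        self.rng = random.Random(seed)

    def update(self, item: Hashable) -> None:
        """Process one raw row belonging to `item` (e.g. one click by one user)."""
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            # Smallest bin via an O(capacity) scan, kept simple for illustration.
            victim = min(self.counts, key=self.counts.__getitem__)
            new_count = self.counts.pop(victim) + 1
            # Relabel with probability 1 / new_count, otherwise keep the old label.
            label = item if self.rng.random() < 1.0 / new_count else victim
            self.counts[label] = new_count

    def estimate_sum(self, keep: Callable[[Hashable], bool]) -> int:
        """Estimated total rows over all items passing an arbitrary filter."""
        return sum(c for k, c in self.counts.items() if keep(k))

    def frequent_items(self, k: int) -> List[Tuple[Hashable, int]]:
        """The k items with the largest estimated counts."""
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Feeding it a click stream, one `update(user_id)` per row, aggregates clicks per user on the fly, and `estimate_sum` applies a filter afterwards. Because every update adds exactly one to exactly one bin, the bin totals always equal the number of rows processed; when the capacity exceeds the number of distinct items, the counts are exact.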
Exact Algorithms for 0-1 Integer Programs with Linear Equality Constraints
In this paper, we show -time and -space exact
algorithms for 0-1 integer programs where constraints are linear equalities and
coefficients are arbitrary real numbers. Our algorithms are quadratically
faster than exhaustive search and almost quadratically faster than an algorithm
for an inequality version of the problem by Impagliazzo, Lovett, Paturi and
Schneider (arXiv:1401.5512), which motivated our work. Rather than improving
the time and space complexity further, we pursue a simpler direction: extending
these exact exponential algorithms to cover many NP-hard problems. Specifically,
we extend our algorithms to linear optimization problems.
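A quadratic speedup over exhaustive search for 0-1 programs with equality constraints A x = b can be illustrated with the classic meet-in-the-middle technique (Horowitz and Sahni): split the n variables in half, tabulate A x over all 2^(n/2) assignments of one half, and look up b − A x for each assignment of the other half. This is not necessarily the authors' algorithm; it runs in roughly 2^(n/2) time up to polynomial factors (plus output size) and uses exponential space. The sketch assumes integer or `fractions.Fraction` coefficients so that hash lookups are exact (real-valued coefficients would need sorting with a tolerance instead); the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Assignment = Tuple[int, ...]


def _tabulate(A: Sequence[Sequence[int]], cols: Sequence[int]) -> Dict[Tuple[int, ...], List[Assignment]]:
    """Map each value of (A restricted to `cols`) @ x to the 0-1 vectors x producing it."""
    table: Dict[Tuple[int, ...], List[Assignment]] = defaultdict(list)
    for bits in product((0, 1), repeat=len(cols)):
        lhs = tuple(sum(row[c] * x for c, x in zip(cols, bits)) for row in A)
        table[lhs].append(bits)
    return table


def solve_01_equalities(A: Sequence[Sequence[int]], b: Sequence[int]) -> List[Assignment]:
    """All x in {0,1}^n with A x = b, by meet-in-the-middle (A has >= 1 row of length n)."""
    n = len(A[0])
    left_cols, right_cols = list(range(n // 2)), list(range(n // 2, n))
    left = _tabulate(A, left_cols)  # 2^(n/2) entries: the exponential-space part
    solutions: List[Assignment] = []
    for right_value, right_xs in _tabulate(A, right_cols).items():
        # The left half must satisfy A x_L = b - A x_R.
        needed = tuple(bi - ri for bi, ri in zip(b, right_value))
        for xl in left.get(needed, []):
            solutions.extend(xl + xr for xr in right_xs)
    return solutions
```

For instance, `solve_01_equalities([[1, 2, 3, 4], [1, 1, 1, 1]], [5, 2])` returns the assignments (1, 0, 0, 1) and (0, 1, 1, 0) in some order. With a single constraint row this reduces to the subset sum problem.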
Anonymous Networking amidst Eavesdroppers
The problem of security against timing-based traffic analysis in wireless
networks is considered in this work. An analytical measure of anonymity in
eavesdropped networks is proposed using the information-theoretic concept of
equivocation. For a physical layer with orthogonal, transmitter-directed
signaling, scheduling and relaying techniques are designed to maximize the
achievable network performance for any given level of anonymity. Network
performance is measured by the achievable relay rates from sources to
destinations under latency and medium access constraints. In particular,
analytical results are presented for two scenarios:
1) For a two-hop network with maximum anonymity, achievable rate regions for a
general m × 1 relay are characterized when nodes generate independent Poisson
transmission schedules. The rate regions are presented for both strict and
average delay constraints on traffic flow through the relay.
2) For a multihop network with an arbitrary anonymity requirement, the problem
of maximizing the sum-rate of flows (network throughput) is considered. A
selective independent scheduling strategy is designed for this purpose, and,
using the analytical results for the two-hop network, the achievable throughput
is characterized as a function of the anonymity level. The throughput-anonymity
relation for the proposed strategy is shown to be equivalent to an
information-theoretic rate-distortion function.
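Equivocation is the conditional entropy H(S | Y) of what the network hides, S, given what the eavesdropper observes, Y. The minimal sketch below computes it for finite distributions given as a joint probability table, assuming S ranges over source-destination routes, Y over observed transmission timings, and that anonymity is normalized as H(S | Y) / H(S). The normalization is an assumption, not taken from the abstract, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, Hashable], float]  # {(s, y): P(S = s, Y = y)}


def entropy(pmf: Dict[Hashable, float]) -> float:
    """Shannon entropy in bits of a finite pmf."""
    return sum(p * math.log2(1.0 / p) for p in pmf.values() if p > 0)


def equivocation(joint: Joint) -> float:
    """H(S | Y) in bits = sum over (s, y) of P(s, y) * log2(1 / P(s | y))."""
    p_y: Dict[Hashable, float] = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p
    return sum(p * math.log2(p_y[y] / p) for (_, y), p in joint.items() if p > 0)


def anonymity(joint: Joint) -> float:
    """Assumed normalization H(S | Y) / H(S): 1 = nothing leaked, 0 = routes fully revealed."""
    p_s: Dict[Hashable, float] = defaultdict(float)
    for (s, _), p in joint.items():
        p_s[s] += p
    h_s = entropy(p_s)
    if h_s == 0:
        raise ValueError("S is deterministic, so there is nothing to hide")
    return equivocation(joint) / h_s
```

If the observed timings are independent of the routes, the measure is 1; if each timing pattern identifies its route exactly, it is 0. Scheduling strategies that trade throughput for anonymity move between these extremes.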