Learning from networked examples
Many machine learning algorithms are based on the assumption that training
examples are drawn independently. However, this assumption no longer holds
when learning from a networked sample, because two or more training
examples may share some common objects, and hence share the features of these
shared objects. We show that the classic approach of ignoring this problem
can harm the accuracy of statistics, and then
consider alternatives. One of these is to use only independent examples,
discarding other information. However, this is clearly suboptimal. We analyze
sample error bounds in this networked setting, providing significantly improved
results. An important component of our approach is formed by efficient sample
weighting schemes, which lead to novel concentration inequalities.
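The baseline mentioned above, keeping only independent examples, can be sketched in a few lines. This is a minimal illustration, not the paper's method: it assumes each example is described by the set of objects it is built from, and the function name `greedy_independent_subset` and the toy data are hypothetical.

```python
from typing import Hashable, List, Sequence, Set


def greedy_independent_subset(examples: Sequence[Set[Hashable]]) -> List[int]:
    """Return indices of examples that pairwise share no objects.

    Each example is given as the set of objects it is built from.
    The scan is greedy, so the result is maximal but not necessarily maximum.
    """
    used: Set[Hashable] = set()
    chosen: List[int] = []
    for i, objs in enumerate(examples):
        if used.isdisjoint(objs):  # shares no object with earlier picks
            chosen.append(i)
            used |= objs
    return chosen


# Hypothetical networked sample: each example pairs a person with a movie.
examples = [{"alice", "m1"}, {"alice", "m2"}, {"bob", "m2"}, {"carol", "m3"}]
independent = greedy_independent_subset(examples)
print(independent)
```

The example sharing "alice" with an earlier pick is dropped entirely, which illustrates why the abstract calls this baseline suboptimal: information in the discarded examples is lost rather than down-weighted.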
Algorithms to Approximate Column-Sparse Packing Problems
Column-sparse packing problems arise in several contexts in both
deterministic and stochastic discrete optimization. We present two unifying
ideas, (non-uniform) attenuation and multiple-chance algorithms, to obtain
improved approximation algorithms for some well-known families of such
problems. As three main examples, we attain the integrality gap, up to
lower-order terms, for known LP relaxations for k-column sparse packing integer
programs (Bansal et al., Theory of Computing, 2012) and stochastic k-set
packing (Bansal et al., Algorithmica, 2012), and go "half the remaining
distance" to optimal for a major integrality-gap conjecture of Füredi, Kahn and
Seymour on hypergraph matching (Combinatorica, 1993).
Comment: Extended abstract appeared in SODA 2018. Full version in ACM
Transactions of Algorithm
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled) (directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give a historical overview of their main development, and study the main current
systems that implement them.
Nonnegative k-sums, fractional covers, and probability of small deviations
More than twenty years ago, Manickam, Miklós, and Singhi conjectured that
for any integers $n, k$ satisfying $n \geq 4k$, every set of $n$ real numbers
with nonnegative sum has at least $\binom{n-1}{k-1}$ $k$-element subsets whose
sum is also nonnegative. In this paper we discuss the connection of this
problem with matchings and fractional covers of hypergraphs, and with the
question of estimating the probability that the sum of nonnegative independent
random variables exceeds its expectation by a given amount. Using these
connections together with some probabilistic techniques, we verify the
conjecture for $n \geq 33k^2$. This substantially improves the best previously
known exponential lower bound $n \geq e^{ck \log\log k}$. In addition we prove
a tight stability result showing that for every $k$ and all sufficiently large
$n$, every set of $n$ reals with a nonnegative sum that does not contain a
member whose sum with any other $k-1$ members is nonnegative, contains at least
$\binom{n-1}{k-1}+\binom{n-k-1}{k-1}-1$ subsets of cardinality $k$ with
nonnegative sum.
Comment: 15 pages, a section of Hilton-Milner type result adde
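For small cases the statement can be checked by brute force. The sketch below assumes the conjecture in its usual form (for $n \geq 4k$, at least $\binom{n-1}{k-1}$ of the $k$-element subsets have nonnegative sum); the helper name and the test instance are illustrative choices, not taken from the paper.

```python
from itertools import combinations
from math import comb


def count_nonnegative_k_subsets(values, k):
    """Count the k-element subsets (by position) whose sum is nonnegative."""
    return sum(1 for subset in combinations(values, k) if sum(subset) >= 0)


# One large element balanced by n - 1 copies of -1, so the total sum is 0.
n, k = 8, 2
values = [n - 1] + [-1] * (n - 1)

count = count_nonnegative_k_subsets(values, k)
bound = comb(n - 1, k - 1)  # assumed form of the conjectured lower bound
print(count, bound)
```

In this instance only the subsets containing the large element have nonnegative sum, so the count meets the bound exactly. The enumeration is exponential in general and is meant only as a sanity check for tiny $n$ and $k$.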
Asymmetric Lee Distance Codes for DNA-Based Storage
We consider a new family of codes, termed asymmetric Lee distance codes, that
arise in the design and implementation of DNA-based storage systems and systems
with parallel string transmission protocols. The codewords are defined over a
quaternary alphabet, although the results carry over to other alphabet sizes;
furthermore, symbol confusability is dictated by their underlying binary
representation. Our contributions are two-fold. First, we demonstrate that the
new distance represents a linear combination of the Lee and Hamming distance
and derive upper bounds on the size of the codes under this metric based on
linear programming techniques. Second, we propose a number of code
constructions which imply lower bounds.
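The two ingredient metrics named in the abstract are standard and easy to compute over a quaternary alphabet. The sketch below shows only the classical Hamming and Lee distances on words over $\{0, 1, 2, 3\}$; the coefficients of the linear combination that defines the asymmetric Lee distance are not given in the abstract, so they are deliberately left out. The function names and sample words are illustrative.

```python
def hamming_distance(x, y):
    """Number of positions where the two words differ."""
    if len(x) != len(y):
        raise ValueError("words must have equal length")
    return sum(a != b for a, b in zip(x, y))


def lee_distance(x, y, q=4):
    """Sum over positions of the cyclic distance between symbols in Z_q."""
    if len(x) != len(y):
        raise ValueError("words must have equal length")
    return sum(min((a - b) % q, (b - a) % q) for a, b in zip(x, y))


# Two hypothetical quaternary words (for instance, an integer encoding of bases).
x = [0, 1, 2, 3]
y = [3, 1, 0, 3]
print(hamming_distance(x, y), lee_distance(x, y))
```

The Hamming distance treats every substitution alike, while the Lee distance charges more for symbols that are further apart cyclically, which is why the two can disagree on the same pair of words.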
On complexity of optimized crossover for binary representations
We consider the computational complexity of producing the best possible
offspring in a crossover, given two parent solutions. The crossover
operators are studied on the class of Boolean linear programming problems,
where the Boolean vector of variables is used as the solution representation.
By means of efficient reductions of the optimized gene transmitting crossover
problems (OGTC) we show the polynomial solvability of the OGTC for the maximum
weight set packing problem, the minimum weight set partition problem and for
one of the versions of the simple plant location problem. We study a connection
between the OGTC for the Boolean linear programming problem and the maximum weight
independent set problem on a 2-colorable hypergraph and prove the NP-hardness of
several special cases of the OGTC problem in Boolean linear programming.
Comment: Dagstuhl Seminar 06061 "Theory of Evolutionary Algorithms", 200
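To make the problem concrete, here is a brute-force sketch of an optimized gene transmitting crossover for maximum weight set packing. It assumes the usual reading of "gene transmitting": each offspring bit must equal the corresponding bit of one of the parents, so only positions where the parents differ are free. It enumerates all choices, so it is exponential and is not the polynomial-time reduction the abstract refers to; the instance and function names are hypothetical.

```python
from itertools import product


def is_packing(x, sets):
    """True if the sets selected by the 0/1 vector x are pairwise disjoint."""
    seen = set()
    for chosen, s in zip(x, sets):
        if chosen:
            if seen & s:
                return False
            seen |= s
    return True


def best_offspring(p1, p2, sets, weights):
    """Best feasible child whose every bit comes from p1 or p2 (brute force)."""
    diff = [j for j, (a, b) in enumerate(zip(p1, p2)) if a != b]
    best, best_val = None, float("-inf")
    for bits in product((0, 1), repeat=len(diff)):
        child = list(p1)  # positions where parents agree are inherited as-is
        for j, b in zip(diff, bits):
            child[j] = b
        if is_packing(child, sets):
            val = sum(w for w, c in zip(weights, child) if c)
            if val > best_val:
                best, best_val = child, val
    return best, best_val


# Hypothetical instance: four candidate sets with weights.
sets = [{1, 2}, {2, 3}, {3, 4}, {5}]
weights = [3, 4, 3, 1]
p1 = [1, 0, 0, 0]  # selects {1, 2}, weight 3
p2 = [0, 0, 1, 1]  # selects {3, 4} and {5}, weight 4
child, value = best_offspring(p1, p2, sets, weights)
print(child, value)
```

Here the best child combines the selections of both parents and beats each of them, while the set excluded by both parents stays excluded.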