
    The minimum-entropy set cover problem

    We consider the minimum entropy principle for learning data generated by a random source and observed with random noise. In our setting we have a sequence of observations of objects drawn uniformly at random from a population. Each object in the population belongs to one class. We perform an observation for each object which determines that it belongs to one of a given set of classes. Given these observations, we are interested in assigning the most likely class to each of the objects. This scenario is a very natural one that appears in many real-life situations. We show that under reasonable assumptions, finding the most likely assignment is equivalent to the following variant of the set cover problem. Given a universe U and a collection S = (S1, …, St) of subsets of U, we wish to find an assignment f : U → S such that u ∈ f(u) and the entropy of the distribution defined by the values |f⁻¹(Si)| is minimized. We show that this problem is NP-hard and that the greedy set cover algorithm approximates the optimal cover to within an additive constant error. This sheds new light on the behavior of the greedy set cover algorithm. We further enhance the greedy algorithm and show that the problem admits a polynomial-time approximation scheme (PTAS). Finally, we demonstrate how this model and the greedy algorithm can be useful in real-life scenarios, in particular in problems arising naturally in computational biology.
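    The objective and the greedy heuristic can be made concrete with a small sketch (the set names and the example instance are invented for illustration; this is a didactic version, not the authors' enhanced algorithm or the PTAS):

```python
import math
from collections import defaultdict

def greedy_min_entropy_cover(universe, sets):
    """Greedy cover: repeatedly assign all still-uncovered elements
    to the set that covers the most of them."""
    uncovered = set(universe)
    assignment = {}
    while uncovered:
        # pick the set covering the most uncovered elements
        name, members = max(sets.items(), key=lambda kv: len(kv[1] & uncovered))
        newly = members & uncovered
        for u in newly:
            assignment[u] = name
        uncovered -= newly
    return assignment

def cover_entropy(assignment):
    """Entropy (in bits) of the distribution of part sizes |f^-1(Si)|."""
    counts = defaultdict(int)
    for s in assignment.values():
        counts[s] += 1
    n = len(assignment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy instance: greedy assigns {1,2,3,4} to A, then {5,6} to C,
# giving part sizes (4, 2) and entropy log2(3) - 2/3 ≈ 0.918 bits.
f = greedy_min_entropy_cover({1, 2, 3, 4, 5, 6},
                             {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}})
```

    The abstract's additive-error guarantee says the entropy of the greedy assignment exceeds the minimum-entropy assignment's by at most a constant number of bits.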

    Minimum Entropy Orientations

    We study graph orientations that minimize the entropy of the in-degree sequence. The problem of finding such an orientation is an interesting special case of the minimum entropy set cover problem previously studied by Halperin and Karp [Theoret. Comput. Sci., 2005] and by the current authors [Algorithmica, to appear]. We prove that the minimum entropy orientation problem is NP-hard even if the graph is planar, and that there exists a simple linear-time algorithm that returns an approximate solution with an additive error guarantee of 1 bit. This improves on the only previously known algorithm, which has an additive error guarantee of log_2 e bits (approx. 1.4427 bits).
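    The quantity being minimized can be sketched as follows (a toy helper under my own conventions, not the paper's linear-time algorithm): the in-degrees of an orientation sum to the number of edges m, so they induce a distribution p_v = d_v / m whose entropy is the objective.

```python
import math
from collections import Counter

def indegree_entropy(orientation):
    """Entropy (bits) of the in-degree distribution of an orientation,
    given as a list of directed edges (u, v) meaning u -> v."""
    indeg = Counter(v for _, v in orientation)
    m = len(orientation)  # in-degrees sum to the number of edges
    return -sum((d / m) * math.log2(d / m) for d in indeg.values())

# Orienting the path a-b-c toward b concentrates in-degree (entropy 0),
# while a -> b -> c splits it evenly over two vertices (entropy 1 bit).
h_concentrated = indegree_entropy([("a", "b"), ("c", "b")])
h_split = indegree_entropy([("a", "b"), ("b", "c")])
```

    Minimizing this entropy favors orientations that concentrate in-degree on few vertices, which is the link to minimum entropy set cover.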

    Schema-agnostic entity retrieval in highly heterogeneous semi-structured environments

    [no abstract]

    Set covering with our eyes closed

    Given a universe U of n elements and a weighted collection S of m subsets of U, the universal set cover problem is to a priori map each element u ∈ U to a set S(u) ∈ S containing u, such that any set X ⊆ U is covered by S(X) = ∪_{u∈X} S(u). The aim is to find a mapping such that the cost of S(X) is as close as possible to the optimal set cover cost for X. (Such problems are also called oblivious or a priori optimization problems.) Unfortunately, for every universal mapping, the cost of S(X) can be Ω(√n) times larger than optimal if the set X is adversarially chosen. In this paper we study the performance on average, when X is a set of randomly chosen elements from the universe: we show how to efficiently find a universal map whose expected cost is O(log mn) times the expected optimal cost. In fact, we give a slightly improved analysis and show that this is the best possible. We generalize these ideas to weighted set cover and show similar guarantees for (nonmetric) facility location, where we have to balance the facility opening cost with the cost of connecting clients to the facilities. We show applications of our results to universal multicut and disc-covering problems, and show how all these universal mappings give us algorithms for the stochastic online variants of the problems with the same competitive factors.
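    The cost model for a fixed universal mapping can be sketched directly (hypothetical helper names; the hard part, constructing a good mapping, is the paper's contribution and is not attempted here): once every element is pre-assigned a set, covering a query X means paying once for each distinct set used, and the expected cost over random X can be estimated by sampling.

```python
import random

def universal_cover_cost(mapping, weights, X):
    """Cost of the a priori cover S(X): each distinct mapped set's
    weight is paid exactly once, regardless of how many elements of X
    it covers."""
    return sum(weights[s] for s in {mapping[u] for u in X})

def expected_cost(mapping, weights, universe, k, trials=1000, seed=0):
    """Monte Carlo estimate of the expected cover cost when X is a
    uniformly random k-subset of the universe."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        X = rng.sample(sorted(universe), k)
        total += universal_cover_cost(mapping, weights, X)
    return total / trials

# Elements 1 and 2 share set A, so querying {1, 2} pays A's weight once.
mapping = {1: "A", 2: "A", 3: "B"}
weights = {"A": 2.0, "B": 5.0}
```

    The abstract's guarantee concerns exactly this expectation: a good universal mapping keeps it within an O(log mn) factor of the expected optimal cover cost.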

    Brotli: A General-Purpose Data Compressor

    Brotli is an open source general-purpose data compressor introduced by Google in late 2013 and now adopted in most known browsers and Web servers. It is publicly available on GitHub and its data format was submitted as RFC 7932 in July 2016. Brotli is based on the Lempel-Ziv compression scheme and planned as a generic replacement of Gzip and ZLib. The main goal in its design was to compress data on the Internet, which meant optimizing the resources used at decoding time while achieving maximal compression density. This article is intended to provide the first thorough, systematic description of the Brotli format as well as a detailed computational and experimental analysis of the main algorithmic blocks underlying the current encoder implementation, together with a comparison against compressors of different families constituting the state of the art either in practice or in theory. This treatment will allow us to raise a set of new algorithmic and software engineering problems that deserve further attention from the scientific community.
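    Since Brotli builds on the Lempel-Ziv scheme, a toy LZ77 round trip illustrates the core back-reference idea (a didactic sketch only, not the Brotli format: real Brotli additionally uses a static dictionary, context modeling, and entropy coding, and is exposed in Python by the third-party `brotli` package):

```python
def lz77_compress(data, window=32):
    """Toy LZ77: emit (offset, length, next_char) triples, scanning a
    bounded back-window for the longest match at each position."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # matches may overlap the current position, as in real LZ77
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    """Rebuild the text by copying `length` chars from `offset` back,
    then appending the literal next_char."""
    buf = []
    for off, length, ch in triples:
        for _ in range(length):
            buf.append(buf[-off])
        buf.append(ch)
    return "".join(buf)
```

    Decoding is just sequential copying, which reflects the design goal quoted above: make decompression cheap even if the encoder works hard to find good matches.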

    Algorithms for Viral Population Analysis

    The genetic structure of an intra-host viral population has an effect on many clinically important phenotypic traits such as escape from vaccine-induced immunity, virulence, and response to antiviral therapies. Next-generation sequencing provides read coverage sufficient for genomic reconstruction of a heterogeneous, yet highly similar, viral population, and more specifically, for the detection of rare variants. Admittedly, while depth is less of an issue for modern sequencers, the short length of generated reads complicates viral population assembly. This task is worsened by the presence of both random and systematic sequencing errors in huge amounts of data. In this dissertation I present completed work for reconstructing a viral population given next-generation sequencing data. Several algorithms are described for solving this problem under the error-free amplicon (or sliding-window) model. In order for these methods to handle actual real-world data, an error-correction method is proposed. A formal derivation of its likelihood model along with optimization steps for an EM algorithm are presented. Although these methods perform well, they cannot take into account paired-end sequencing data. In order to address this, a new method is detailed that works under the error-free paired-end case along with maximum a posteriori estimation of the model parameters.