11 research outputs found
Highly Scalable Algorithms for Robust String Barcoding
String barcoding is a recently introduced technique for genomic-based
identification of microorganisms. In this paper we describe the engineering of
highly scalable algorithms for robust string barcoding. Our methods enable
distinguisher selection based on whole genomic sequences of hundreds of
microorganisms of up to bacterial size on a well-equipped workstation, and can
be easily parallelized to further extend the applicability range to thousands
of bacterial size genomes. Experimental results on both randomly generated and
NCBI genomic data show that whole-genome based selection results in a number of
distinguishers nearly matching the information theoretic lower bounds for the
problem.
A Kernelisation Approach for Multiple d-Hitting Set and Its Application in Optimal Multi-Drug Therapeutic Combinations
Therapies consisting of a combination of agents are an attractive proposition,
especially in the context of diseases such as cancer, which can manifest with a
variety of tumor types in a single case. However, uncovering usable drug
combinations is expensive in both money and time. By employing
computational methods to identify candidate combinations with a greater
likelihood of success we can avoid these problems, even when the amount of data
is prohibitively large. Hitting Set is a combinatorial problem
with useful applications across many fields; however, because it is
NP-complete, it is traditionally considered hard to solve
exactly. We introduce a more general version of the problem,
(α,β,d)-Hitting Set,
which allows more precise control over how and what the hitting set targets.
Employing the framework of Parameterized Complexity we show that despite being
NP-complete, the
(α,β,d)-Hitting Set
problem is fixed-parameter tractable with a kernel of size O(αdk^d) when we parameterize by the size k of the
hitting set and the maximum number α of the minimum number of hits,
and taking the maximum degree d of the target sets as a
constant. We demonstrate the application of this problem to multiple drug
selection for cancer therapy, showing the flexibility of the problem in
tailoring such drug sets. The fixed-parameter tractability result indicates that
for low values of the parameters the problem can be solved quickly using exact
methods. We also demonstrate that the problem is indeed practical, with
computation times on the order of 5 seconds, as compared to previous Hitting Set
applications using the same dataset which exhibited times on the order of 1 day,
even with relatively relaxed notions for what constitutes a low value for the
parameters. Furthermore, the existence of a kernelization for
(α,β,d)-Hitting Set
indicates that the problem is readily scalable to large datasets.
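To make the fixed-parameter idea concrete, here is a minimal sketch of the classic bounded search tree for plain d-Hitting Set. It is not the kernelization from the paper and it does not model the (α,β,d) generalization; the function name `hitting_set` and the toy input are assumptions made for illustration. Each recursion level picks one unhit set and branches on its at most d elements, so the search depth is bounded by k.

```python
def hitting_set(sets, k):
    """Return a hitting set of size <= k, or None if none exists.

    Illustrative bounded search tree for plain Hitting Set
    (hypothetical helper, not the method of the paper).
    """
    def solve(remaining, budget):
        if not remaining:
            return set()          # every set is already hit
        if budget == 0:
            return None           # sets remain but no picks are left
        first = remaining[0]      # some set that is not hit yet
        for x in sorted(first):   # one of its elements must be chosen
            rest = [s for s in remaining if x not in s]
            sub = solve(rest, budget - 1)
            if sub is not None:
                return sub | {x}
        return None

    return solve([frozenset(s) for s in sets], k)


targets = [{1, 2}, {2, 3}, {3, 4}]
solution = hitting_set(targets, 2)
```

With at most d elements per set, the tree has at most d^k leaves, which is why small values of k and d keep exact search practical.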
Bucket Game with Applications to Set Multicover and Dynamic Page Migration
We present a simple two-person Bucket Game, based on throwing balls into buckets, and we discuss possible players’ strategies. We use these strategies to create an approximation algorithm for a generalization of the well-known Set Cover problem, where we need to cover each element by at least k sets. Furthermore, we apply these strategies to construct a randomized algorithm for the Dynamic Page Migration problem achieving the optimal competitive ratio against an oblivious adversary.
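The Set Multicover generalization mentioned above can be illustrated with the standard greedy heuristic. This is a sketch of the textbook greedy rule, not the Bucket Game algorithm of the paper; the function name `greedy_multicover` and the sample sets are assumptions. Each set may be chosen at most once, and every element must end up in at least k chosen sets.

```python
def greedy_multicover(universe, sets, k):
    """Choose set names so that each element lies in at least k chosen sets.

    Textbook greedy heuristic (illustrative only).
    """
    need = {e: k for e in universe}                    # remaining demand per element
    remaining = {name: set(m) for name, m in sets.items()}
    chosen = []

    def gain(name):
        # Number of still-unsatisfied elements this set would help.
        return sum(1 for e in remaining[name] if need.get(e, 0) > 0)

    while any(v > 0 for v in need.values()):
        if not remaining:
            raise ValueError("no feasible k-cover")
        best = max(sorted(remaining), key=gain)        # sorted() makes ties deterministic
        if gain(best) == 0:
            raise ValueError("no feasible k-cover")
        for e in remaining.pop(best):
            if need.get(e, 0) > 0:
                need[e] -= 1
        chosen.append(best)
    return chosen


example_sets = {"A": {1, 2}, "B": {2, 3}, "C": {1, 3}, "D": {1, 2, 3}}
cover = greedy_multicover({1, 2, 3}, example_sets, k=2)
```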
Multiwinner Voting with Fairness Constraints
Multiwinner voting rules are used to select a small representative subset of
candidates or items from a larger set given the preferences of voters. However,
if candidates have sensitive attributes such as gender or ethnicity (when
selecting a committee), or specified types such as political leaning (when
selecting a subset of news items), an algorithm that chooses a subset by
optimizing a multiwinner voting rule may be unbalanced in its selection -- it
may under- or over-represent a particular gender or political orientation in the
examples above. We introduce an algorithmic framework for multiwinner voting
problems when there is an additional requirement that the selected subset
should be "fair" with respect to a given set of attributes. Our framework
provides the flexibility to (1) specify fairness with respect to multiple,
non-disjoint attributes (e.g., ethnicity and gender) and (2) specify a score
function. We study the computational complexity of this constrained multiwinner
voting problem for monotone and submodular score functions and present several
approximation algorithms and matching hardness of approximation results for
various attribute group structures and types of score functions. We also present
simulations that suggest that adding fairness constraints may not affect the
scores significantly when compared to the unconstrained case.
Comment: The conference version of this paper appears in IJCAI-ECAI 201
Energy Proportionality and Performance in Data Parallel Computing Clusters
Energy consumption in datacenters has recently become a major concern due to rising operational costs and scalability issues. Recent solutions to this problem propose the principle of energy proportionality, i.e., the amount of energy consumed by the server nodes must be proportional to the amount of work performed. For data parallelism and fault tolerance purposes, most common file systems used in MapReduce-type clusters maintain a set of replicas for each data block. A covering set is a group of nodes that together contain at least one replica of the data blocks needed for performing computing tasks. In this work, we develop and analyze algorithms to maintain energy proportionality by discovering a covering set that minimizes energy consumption while placing the remaining nodes in low-power standby mode. Our algorithms can also discover covering sets in heterogeneous computing environments. In order to allow more data parallelism, we generalize our algorithms so that they can discover k-covering sets, i.e., a set of nodes that contain at least k replicas of the data blocks. Our experimental results show that we can achieve substantial energy saving without significant performance loss in diverse cluster configurations and working environments.
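The k-covering set notion can be sketched with a cost-aware greedy selection. This is only an illustration under stated assumptions, not the algorithms of the paper: the function name `covering_set`, the `placement` map (node to stored blocks), and the per-node `power` figures are all hypothetical. At each step the sketch activates the node that serves the most still-needed replicas per unit of power, and leaves the rest in standby.

```python
def covering_set(placement, power, k=1):
    """Split nodes into (active, standby) so each block has >= k active replicas.

    placement: node -> set of block ids stored on that node (assumed input)
    power:     node -> power draw when active (assumed input, any positive unit)
    """
    blocks = set().union(*placement.values())
    need = {b: k for b in blocks}          # replicas still required per block
    standby = set(placement)
    active = []

    def helped(node):
        return sum(1 for b in placement[node] if need[b] > 0)

    while any(v > 0 for v in need.values()):
        candidates = [n for n in sorted(standby) if helped(n) > 0]
        if not candidates:
            raise ValueError("some block has fewer than k replicas")
        # Prefer nodes that satisfy many demands per unit of power.
        best = max(candidates, key=lambda n: helped(n) / power[n])
        standby.remove(best)
        active.append(best)
        for b in placement[best]:
            if need[b] > 0:
                need[b] -= 1
    return active, standby


placement = {"n1": {"a", "b"}, "n2": {"b", "c"}, "n3": {"a", "c"}, "n4": {"a", "b", "c"}}
power = {"n1": 100, "n2": 100, "n3": 100, "n4": 250}
active, standby = covering_set(placement, power, k=1)
```

Because `power` differs per node, the same sketch applies to a heterogeneous cluster; raising `k` trades energy savings for more data parallelism.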