3,040 research outputs found
Submodular memetic approximation for multiobjective parallel test paper generation
Parallel test paper generation is a biobjective distributed resource optimization problem, which aims to generate multiple similarly optimal test papers automatically according to multiple user-specified assessment criteria. Generating high-quality parallel test papers is challenging due to its NP-hardness in both of the collective objective functions. In this paper, we propose a submodular memetic approximation algorithm for solving this problem. The proposed algorithm is an adaptive memetic algorithm (MA), which exploits the submodular property of the collective objective functions to design greedy-based approximation algorithms for enhancing steps of the multiobjective MA. Synergizing the intensification of submodular local search mechanism with the diversification of the population-based submodular crossover operator, our algorithm can jointly optimize the total quality maximization objective and the fairness quality maximization objective. Our MA can achieve provable near-optimal solutions in a huge search space of large datasets in efficient polynomial runtime. Performance results on various datasets have shown that our algorithm has drastically outperformed the current techniques in terms of paper quality and runtime efficiency
Adversarially Robust Submodular Maximization under Knapsack Constraints
We propose the first adversarially robust algorithm for monotone submodular
maximization under single and multiple knapsack constraints with scalable
implementations in distributed and streaming settings. For a single knapsack
constraint, our algorithm outputs a robust summary of almost optimal (up to
polylogarithmic factors) size, from which a constant-factor approximation to
the optimal solution can be constructed. For multiple knapsack constraints, our
approximation is within a constant-factor of the best known non-robust
solution.
We evaluate the performance of our algorithms by comparison to natural
robustifications of existing non-robust algorithms under two objectives: 1)
dominating set for large social network graphs from Facebook and Twitter
collected by the Stanford Network Analysis Project (SNAP), 2) movie
recommendations on a dataset from MovieLens. Experimental results show that our
algorithms give the best objective for a majority of the inputs and show strong
performance even compared to offline algorithms that are given the set of
removals in advance.Comment: To appear in KDD 201
Distributed top-k aggregation queries at large
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network
Approximation Algorithms for Stochastic Boolean Function Evaluation and Stochastic Submodular Set Cover
Stochastic Boolean Function Evaluation is the problem of determining the
value of a given Boolean function f on an unknown input x, when each bit of x_i
of x can only be determined by paying an associated cost c_i. The assumption is
that x is drawn from a given product distribution, and the goal is to minimize
the expected cost. This problem has been studied in Operations Research, where
it is known as "sequential testing" of Boolean functions. It has also been
studied in learning theory in the context of learning with attribute costs. We
consider the general problem of developing approximation algorithms for
Stochastic Boolean Function Evaluation. We give a 3-approximation algorithm for
evaluating Boolean linear threshold formulas. We also present an approximation
algorithm for evaluating CDNF formulas (and decision trees) achieving a factor
of O(log kd), where k is the number of terms in the DNF formula, and d is the
number of clauses in the CNF formula. In addition, we present approximation
algorithms for simultaneous evaluation of linear threshold functions, and for
ranking of linear functions.
Our function evaluation algorithms are based on reductions to the Stochastic
Submodular Set Cover (SSSC) problem. This problem was introduced by Golovin and
Krause. They presented an approximation algorithm for the problem, called
Adaptive Greedy. Our main technical contribution is a new approximation
algorithm for the SSSC problem, which we call Adaptive Dual Greedy. It is an
extension of the Dual Greedy algorithm for Submodular Set Cover due to Fujito,
which is a generalization of Hochbaum's algorithm for the classical Set Cover
Problem. We also give a new bound on the approximation achieved by the Adaptive
Greedy algorithm of Golovin and Krause
A Literature Survey of Cooperative Caching in Content Distribution Networks
Content distribution networks (CDNs) which serve to deliver web objects
(e.g., documents, applications, music and video, etc.) have seen tremendous
growth since its emergence. To minimize the retrieving delay experienced by a
user with a request for a web object, caching strategies are often applied -
contents are replicated at edges of the network which is closer to the user
such that the network distance between the user and the object is reduced. In
this literature survey, evolution of caching is studied. A recent research
paper [15] in the field of large-scale caching for CDN was chosen to be the
anchor paper which serves as a guide to the topic. Research studies after and
relevant to the anchor paper are also analyzed to better evaluate the
statements and results of the anchor paper and more importantly, to obtain an
unbiased view of the large scale collaborate caching systems as a whole.Comment: 5 pages, 5 figure
- …