3,040 research outputs found

    Submodular memetic approximation for multiobjective parallel test paper generation

    Get PDF
    Parallel test paper generation is a biobjective distributed resource optimization problem, which aims to generate multiple similarly optimal test papers automatically according to multiple user-specified assessment criteria. Generating high-quality parallel test papers is challenging due to its NP-hardness in both of the collective objective functions. In this paper, we propose a submodular memetic approximation algorithm for solving this problem. The proposed algorithm is an adaptive memetic algorithm (MA), which exploits the submodular property of the collective objective functions to design greedy-based approximation algorithms for enhancing steps of the multiobjective MA. Synergizing the intensification of submodular local search mechanism with the diversification of the population-based submodular crossover operator, our algorithm can jointly optimize the total quality maximization objective and the fairness quality maximization objective. Our MA can achieve provable near-optimal solutions in a huge search space of large datasets in efficient polynomial runtime. Performance results on various datasets have shown that our algorithm has drastically outperformed the current techniques in terms of paper quality and runtime efficiency

    Adversarially Robust Submodular Maximization under Knapsack Constraints

    Full text link
    We propose the first adversarially robust algorithm for monotone submodular maximization under single and multiple knapsack constraints with scalable implementations in distributed and streaming settings. For a single knapsack constraint, our algorithm outputs a robust summary of almost optimal (up to polylogarithmic factors) size, from which a constant-factor approximation to the optimal solution can be constructed. For multiple knapsack constraints, our approximation is within a constant-factor of the best known non-robust solution. We evaluate the performance of our algorithms by comparison to natural robustifications of existing non-robust algorithms under two objectives: 1) dominating set for large social network graphs from Facebook and Twitter collected by the Stanford Network Analysis Project (SNAP), 2) movie recommendations on a dataset from MovieLens. Experimental results show that our algorithms give the best objective for a majority of the inputs and show strong performance even compared to offline algorithms that are given the set of removals in advance.Comment: To appear in KDD 201

    Distributed top-k aggregation queries at large

    Get PDF
    Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, when the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network

    Approximation Algorithms for Stochastic Boolean Function Evaluation and Stochastic Submodular Set Cover

    Full text link
    Stochastic Boolean Function Evaluation is the problem of determining the value of a given Boolean function f on an unknown input x, when each bit of x_i of x can only be determined by paying an associated cost c_i. The assumption is that x is drawn from a given product distribution, and the goal is to minimize the expected cost. This problem has been studied in Operations Research, where it is known as "sequential testing" of Boolean functions. It has also been studied in learning theory in the context of learning with attribute costs. We consider the general problem of developing approximation algorithms for Stochastic Boolean Function Evaluation. We give a 3-approximation algorithm for evaluating Boolean linear threshold formulas. We also present an approximation algorithm for evaluating CDNF formulas (and decision trees) achieving a factor of O(log kd), where k is the number of terms in the DNF formula, and d is the number of clauses in the CNF formula. In addition, we present approximation algorithms for simultaneous evaluation of linear threshold functions, and for ranking of linear functions. Our function evaluation algorithms are based on reductions to the Stochastic Submodular Set Cover (SSSC) problem. This problem was introduced by Golovin and Krause. They presented an approximation algorithm for the problem, called Adaptive Greedy. Our main technical contribution is a new approximation algorithm for the SSSC problem, which we call Adaptive Dual Greedy. It is an extension of the Dual Greedy algorithm for Submodular Set Cover due to Fujito, which is a generalization of Hochbaum's algorithm for the classical Set Cover Problem. We also give a new bound on the approximation achieved by the Adaptive Greedy algorithm of Golovin and Krause

    A Literature Survey of Cooperative Caching in Content Distribution Networks

    Full text link
    Content distribution networks (CDNs) which serve to deliver web objects (e.g., documents, applications, music and video, etc.) have seen tremendous growth since its emergence. To minimize the retrieving delay experienced by a user with a request for a web object, caching strategies are often applied - contents are replicated at edges of the network which is closer to the user such that the network distance between the user and the object is reduced. In this literature survey, evolution of caching is studied. A recent research paper [15] in the field of large-scale caching for CDN was chosen to be the anchor paper which serves as a guide to the topic. Research studies after and relevant to the anchor paper are also analyzed to better evaluate the statements and results of the anchor paper and more importantly, to obtain an unbiased view of the large scale collaborate caching systems as a whole.Comment: 5 pages, 5 figure
    corecore