
    Correlated Stochastic Knapsack with a Submodular Objective

    We study the correlated stochastic knapsack problem with a submodular target function, with optional additional constraints. We utilize the multilinear extension of the submodular function and combine it with an adaptation of the relaxed linear constraints from Ma [Mathematics of Operations Research, Volume 43(3), 2018] for the correlated stochastic knapsack problem. The relaxation is then solved by the stochastic continuous greedy algorithm and rounded by a novel method to fit the contention resolution scheme (Feldman et al. [FOCS 2011]). We obtain a pseudo-polynomial time (1 − 1/√e)/2 ≈ 0.1967 approximation algorithm with or without those additional constraints, eliminating the need for a key assumption and improving on the (1 − 1/∜e)/2 ≈ 0.1106 approximation by Fukunaga et al. [AAAI 2019].
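    To make the pipeline above concrete, here is a minimal, self-contained sketch of the continuous ingredients: estimating the multilinear extension F(x) = E[f(R(x))] by Monte Carlo sampling, and taking coordinate-ascent steps in the continuous-greedy style. The coverage function, step size, and sample count are illustrative assumptions; this is not the paper's stochastic algorithm or its rounding scheme.

        import random

        def multilinear_extension(f, x, samples=2000):
            """Monte Carlo estimate of F(x) = E[f(R(x))], where the random
            set R(x) contains element i independently with probability x[i]."""
            n = len(x)
            total = 0.0
            for _ in range(samples):
                R = {i for i in range(n) if random.random() < x[i]}
                total += f(R)
            return total / samples

        # Illustrative submodular objective: a small coverage function.
        universe = [{0, 1}, {1, 2}, {2, 3}, {3, 4}]
        def coverage(S):
            return len(set().union(*(universe[i] for i in S))) if S else 0

        # Continuous-greedy-style ascent: estimate the marginal gain along
        # each coordinate, then step toward the most profitable one.
        n = len(universe)
        x, delta = [0.0] * n, 0.1
        for _ in range(10):
            base = multilinear_extension(coverage, x)
            gains = []
            for i in range(n):
                y = x[:]
                y[i] = min(1.0, y[i] + delta)
                gains.append(multilinear_extension(coverage, y) - base)
            best = max(range(n), key=lambda i: gains[i])
            x[best] = min(1.0, x[best] + delta)

        print("fractional solution x:", [round(v, 2) for v in x])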

    Flexible resource allocation for reliable virtual cluster computing systems

    Virtualization and cloud computing technologies now make it possible to create scalable and reliable virtual high performance computing clusters. Integrating these technologies, however, is complicated by fundamental and inherent differences in the way these systems allocate resources to computational tasks. Cloud computing systems immediately allocate available resources or deny requests. In contrast, parallel computing systems route all requests through a queue for future resource allocation. This divergence of allocation policies hinders efforts to implement efficient, responsive, and reliable virtual clusters. Hence, this work develops a continuum of four scheduling policies, along with an analytical resource prediction model for each policy, to estimate the level of resources needed to provide a predictable grade of service for a realistic high performance computing workload and to estimate the queue wait time for a partial or full resource allocation. To determine the performance of the models in a real system, they are simulated using the Haizea simulator. The models and results are useful for cloud computing providers seeking to operate efficient and cost-effective virtual cluster systems.
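    As a rough illustration of the policy divergence described above, the following minimal discrete-event sketch compares an immediate allocate-or-deny policy ("cloud") against a queue-and-wait policy ("batch") on a fixed pool of nodes. The pool size, arrival and service rates, and single-node jobs are illustrative assumptions; the dissertation's four-policy continuum and analytical prediction models are not reproduced here.

        import heapq
        import random

        def simulate(policy, n_nodes=8, n_jobs=500, seed=1):
            """Jobs arrive one at a time; each needs one node for an
            exponentially distributed service time. 'cloud' denies a job
            when no node is free; 'batch' queues it until a node frees up."""
            random.seed(seed)
            t, free = 0.0, n_nodes
            releases = []          # min-heap of node release times
            denied, waits = 0, []
            for _ in range(n_jobs):
                t += random.expovariate(1.0)           # next arrival
                while releases and releases[0] <= t:   # reclaim finished nodes
                    heapq.heappop(releases)
                    free += 1
                service = random.expovariate(1.0 / 6.0)
                if free > 0:                           # node available now
                    free -= 1
                    heapq.heappush(releases, t + service)
                    waits.append(0.0)
                elif policy == "cloud":                # immediate denial
                    denied += 1
                else:                                  # batch: wait in queue
                    start = heapq.heappop(releases)    # earliest release
                    waits.append(start - t)
                    heapq.heappush(releases, start + service)
            return denied, sum(waits) / len(waits)

        for policy in ("cloud", "batch"):
            denied, avg_wait = simulate(policy)
            print(f"{policy:5s}: denied={denied:3d}, mean wait={avg_wait:.2f}")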

    Techniques for Scaling Computational Genomics Applications

    A revolution in personalized genomics will occur when scientists can sequence the genomes of millions of people cost-effectively, conclusively understand how genes influence diseases, and develop better drugs and treatments. The announcement by Illumina on sequencing a human genome for $1000 is a stellar attempt to solve the first part of the puzzle. However, providing genetic treatments for diseases such as breast cancer, cystic fibrosis, Huntington’s disease, and others requires us to develop tools that can quickly analyze biological sequences and understand their structural and functional properties. Currently, tools are designed in an ad hoc manner and require extensive programmer effort to develop and optimize. Existing tools also show poor scalability for the exponentially increasing genomic data generated by continuously improving sequencing technologies. In this dissertation, we take a holistic approach to enhancing the performance and scalability of genomic applications that handle large volumes of data. This approach comprises techniques at three levels: algorithm, compiler, and data structure. At the algorithm level, we identify opportunities for exploiting parallelism and efficient methods of data distribution. Our technique Orion exploits fine-grained parallelism to scale to long genomic sequences and achieves superior performance and better load balance than state-of-the-art distributed genomic sequence matching tools. ScalaDBG transforms the sequential and computationally intensive process of iterative de Bruijn graph construction into a parallel one. At the compiler level, we develop a domain-specific language called SARVAVID, which provides commonly occurring modules in genomics applications as high-level language constructs and performs domain-specific optimizations well beyond the scope of libraries and generic compilers. At the data structure level, we identify opportunities to exploit cache locality and software prefetching to enhance the performance of indexing structures in genomic applications. We apply our approach to the major classes of genomic applications and demonstrate the benefits with relevant genomic datasets.
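    As a point of reference for the de Bruijn graph construction that ScalaDBG parallelizes, here is a minimal single-threaded sketch in which nodes are (k-1)-mers and each k-mer in a read contributes an edge from its prefix to its suffix. The reads and the value of k are illustrative assumptions; ScalaDBG's parallel iterative construction and the dissertation's compiler- and data-structure-level optimizations are not shown.

        from collections import defaultdict

        def build_debruijn(reads, k):
            """Build a de Bruijn graph: for every k-mer in every read, add
            an edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
            graph = defaultdict(list)
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    graph[kmer[:-1]].append(kmer[1:])
            return graph

        reads = ["ACGTACGT", "GTACGTTA"]
        for node, successors in sorted(build_debruijn(reads, 4).items()):
            print(node, "->", ",".join(successors))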