Search CORE

58,490 research outputs found

An Overview of Backtrack Search Satisfiability Algorithms

Author: Lynce I.
Marques-Silva J. P.
Publication venue
Publication date: 01/01/2003
Field of study

Propositional Satisfiability (SAT) is often used as the underlying model for a significan

CiteSeerX

Southampton (e-Prints Soton)

Dynamic load balancing for the distributed mining of molecular structures

Author: Berthold M.R.
Di Fatta Giuseppe
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids

KOPS - The Institutional Repository of the University of Konstanz

Central Archive at the University of Reading

Crossref

Distributed Caching for Processing Raw Arrays

Author: Abadi M.
Altinel M.
Cao P.
Cudre-Mauroux P.
Dar S.
Furtado P.
Han D.
Idreos S.
Idreos S.
Karpathiotakis M.
Kornacker M.
Lübbe C.
Nelson J.
Sarawagi S.
Sparks E.
Witkowski A.
Zhang Y.
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/07/2018
Field of study

As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate this problem through a series of techniques. In-situ mechanisms provide direct access to raw data in the original format---without loading and partitioning. Parallel processing scales to the largest datasets. In-memory caching reduces latency when the same data are accessed across a workload of queries. However, we are not aware of any work on distributed caching of multi-dimensional raw arrays. In this paper, we introduce a distributed framework for cost-based caching of multi-dimensional arrays in native format. Given a set of files that contain portions of an array and an online query workload, the framework computes an effective caching plan in two stages. First, the plan identifies the cells to be cached locally from each of the input files by continuously refining an evolving R-tree index. In the second stage, an optimal assignment of cells to nodes that collocates dependent cells in order to minimize the overall data transfer is determined. We design cache eviction and placement heuristic algorithms that consider the historical query workload. A thorough experimental evaluation over two real datasets in three file formats confirms the superiority - by as much as two orders of magnitude - of the proposed framework over existing techniques in terms of cache overhead and workload execution time

Crossref

Caltech Authors

PasMoQAP: A Parallel Asynchronous Memetic Algorithm for solving the Multi-Objective Quadratic Assignment Problem

Author: Berretta Regina
Jimenez Francia
Moscato Pablo
Sanhueza Claudio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

Multi-Objective Optimization Problems (MOPs) have attracted growing attention during the last decades. Multi-Objective Evolutionary Algorithms (MOEAs) have been extensively used to address MOPs because are able to approximate a set of non-dominated high-quality solutions. The Multi-Objective Quadratic Assignment Problem (mQAP) is a MOP. The mQAP is a generalization of the classical QAP which has been extensively studied, and used in several real-life applications. The mQAP is defined as having as input several flows between the facilities which generate multiple cost functions that must be optimized simultaneously. In this study, we propose PasMoQAP, a parallel asynchronous memetic algorithm to solve the Multi-Objective Quadratic Assignment Problem. PasMoQAP is based on an island model that structures the population by creating sub-populations. The memetic algorithm on each island individually evolve a reduced population of solutions, and they asynchronously cooperate by sending selected solutions to the neighboring islands. The experimental results show that our approach significatively outperforms all the island-based variants of the multi-objective evolutionary algorithm NSGA-II. We show that PasMoQAP is a suitable alternative to solve the Multi-Objective Quadratic Assignment Problem.Comment: 8 pages, 3 figures, 2 tables. Accepted at Conference on Evolutionary Computation 2017 (CEC 2017

arXiv.org e-Print Archive

University of Newcastle's Digital Repository

Crossref

Distributed Caching for Complex Querying of Raw Arrays

Author: Dong Bin
Ho Anna Y. Q.
Nugent Peter
Rusu Florin
Wu Kesheng
Zhao Weijie
Publication venue
Publication date: 15/03/2018
Field of study

As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate this problem through a series of techniques. In-situ mechanisms provide direct access to raw data in the original format---without loading and partitioning. Parallel processing scales to the largest datasets. In-memory caching reduces latency when the same data are accessed across a workload of queries. However, we are not aware of any work on distributed caching of multi-dimensional raw arrays. In this paper, we introduce a distributed framework for cost-based caching of multi-dimensional arrays in native format. Given a set of files that contain portions of an array and an online query workload, the framework computes an effective caching plan in two stages. First, the plan identifies the cells to be cached locally from each of the input files by continuously refining an evolving R-tree index. In the second stage, an optimal assignment of cells to nodes that collocates dependent cells in order to minimize the overall data transfer is determined. We design cache eviction and placement heuristic algorithms that consider the historical query workload. A thorough experimental evaluation over two real datasets in three file formats confirms the superiority -- by as much as two orders of magnitude -- of the proposed framework over existing techniques in terms of cache overhead and workload execution time

arXiv.org e-Print Archive

eScholarship - University of California