58,490 research outputs found
An Overview of Backtrack Search Satisfiability Algorithms
Propositional Satisfiability (SAT) is often used as the underlying model for a significan
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated
load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer
Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed
approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable
for large-scale, multi-domain, heterogeneous environments, such as computational grids
Distributed Caching for Processing Raw Arrays
As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate this problem through a series of techniques. In-situ mechanisms provide direct access to raw data in the original format---without loading and partitioning. Parallel processing scales to the largest datasets. In-memory caching reduces latency when the same data are accessed across a workload of queries. However, we are not aware of any work on distributed caching of multi-dimensional raw arrays. In this paper, we introduce a distributed framework for cost-based caching of multi-dimensional arrays in native format. Given a set of files that contain portions of an array and an online query workload, the framework computes an effective caching plan in two stages. First, the plan identifies the cells to be cached locally from each of the input files by continuously refining an evolving R-tree index. In the second stage, an optimal assignment of cells to nodes that collocates dependent cells in order to minimize the overall data transfer is determined. We design cache eviction and placement heuristic algorithms that consider the historical query workload. A thorough experimental evaluation over two real datasets in three file formats confirms the superiority - by as much as two orders of magnitude - of the proposed framework over existing techniques in terms of cache overhead and workload execution time
PasMoQAP: A Parallel Asynchronous Memetic Algorithm for solving the Multi-Objective Quadratic Assignment Problem
Multi-Objective Optimization Problems (MOPs) have attracted growing attention
during the last decades. Multi-Objective Evolutionary Algorithms (MOEAs) have
been extensively used to address MOPs because are able to approximate a set of
non-dominated high-quality solutions. The Multi-Objective Quadratic Assignment
Problem (mQAP) is a MOP. The mQAP is a generalization of the classical QAP
which has been extensively studied, and used in several real-life applications.
The mQAP is defined as having as input several flows between the facilities
which generate multiple cost functions that must be optimized simultaneously.
In this study, we propose PasMoQAP, a parallel asynchronous memetic algorithm
to solve the Multi-Objective Quadratic Assignment Problem. PasMoQAP is based on
an island model that structures the population by creating sub-populations. The
memetic algorithm on each island individually evolve a reduced population of
solutions, and they asynchronously cooperate by sending selected solutions to
the neighboring islands. The experimental results show that our approach
significatively outperforms all the island-based variants of the
multi-objective evolutionary algorithm NSGA-II. We show that PasMoQAP is a
suitable alternative to solve the Multi-Objective Quadratic Assignment Problem.Comment: 8 pages, 3 figures, 2 tables. Accepted at Conference on Evolutionary
Computation 2017 (CEC 2017
Distributed Caching for Complex Querying of Raw Arrays
As applications continue to generate multi-dimensional data at exponentially
increasing rates, fast analytics to extract meaningful results is becoming
extremely important. The database community has developed array databases that
alleviate this problem through a series of techniques. In-situ mechanisms
provide direct access to raw data in the original format---without loading and
partitioning. Parallel processing scales to the largest datasets. In-memory
caching reduces latency when the same data are accessed across a workload of
queries. However, we are not aware of any work on distributed caching of
multi-dimensional raw arrays. In this paper, we introduce a distributed
framework for cost-based caching of multi-dimensional arrays in native format.
Given a set of files that contain portions of an array and an online query
workload, the framework computes an effective caching plan in two stages.
First, the plan identifies the cells to be cached locally from each of the
input files by continuously refining an evolving R-tree index. In the second
stage, an optimal assignment of cells to nodes that collocates dependent cells
in order to minimize the overall data transfer is determined. We design cache
eviction and placement heuristic algorithms that consider the historical query
workload. A thorough experimental evaluation over two real datasets in three
file formats confirms the superiority -- by as much as two orders of magnitude
-- of the proposed framework over existing techniques in terms of cache
overhead and workload execution time
- …