Drake: An Efficient Executive for Temporal Plans with Choice
This work presents Drake, a dynamic executive for temporal plans with choice.
Dynamic plan execution strategies allow an autonomous agent to react quickly to
unfolding events, improving the robustness of the agent. Prior work developed
methods for dynamically dispatching Simple Temporal Networks, and further
research enriched the expressiveness of the plans executives could handle,
including discrete choices, which are the focus of this work. However, in some
approaches to date, these additional choices induce significant storage or
latency requirements to make flexible execution possible.
Drake is designed to leverage the low latency made possible by a
preprocessing step called compilation, while avoiding high memory costs through
a compact representation. We leverage the concepts of labels and environments,
taken from prior work in Assumption-based Truth Maintenance Systems (ATMS), to
concisely record the implications of the discrete choices, exploiting the
structure of the plan to avoid redundant reasoning or storage. Our labeling and
maintenance scheme, called the Labeled Value Set Maintenance System, is
distinguished by its focus on properties fundamental to temporal problems, and,
more generally, weighted graph algorithms. In particular, the maintenance
system focuses on maintaining a minimal representation of non-dominated
constraints. We benchmark Drake's performance on random structured problems and
find that Drake reduces the size of the compiled representation by a factor of
over 500 for large problems, while incurring only a modest increase in run-time
latency compared to prior work on compiled executives for temporal plans with
discrete choices.
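The non-domination idea behind the Labeled Value Set Maintenance System can be sketched concretely. The following is a minimal toy sketch, not Drake's actual data structure (the class name, dominance rule, and choice labels are invented for illustration): each entry pairs an environment, a set of discrete choices, with a numeric bound, and an entry is dominated when another entry holds in at least as many worlds (subset environment) with an at-least-as-tight bound.

```python
# Toy labeled value set: keep only non-dominated (environment, bound) pairs.
# An entry (e1, v1) dominates (e2, v2) when e1 is a subset of e2 (it applies
# in at least as many choice outcomes) and v1 <= v2 (its bound is as tight).

class LabeledValueSet:
    def __init__(self):
        self.entries = []  # list of (frozenset environment, value) pairs

    def add(self, env, value):
        env = frozenset(env)
        # Discard the new entry if an existing one dominates it.
        for e, v in self.entries:
            if e <= env and v <= value:
                return
        # Remove existing entries that the new one dominates.
        self.entries = [(e, v) for e, v in self.entries
                        if not (env <= e and value <= v)]
        self.entries.append((env, value))

lvs = LabeledValueSet()
lvs.add({"x=1"}, 10)
lvs.add({"x=1", "y=2"}, 12)   # dominated: narrower env, looser bound
lvs.add({"x=1", "y=2"}, 7)    # kept: tighter bound under more choices
lvs.add(set(), 5)             # dominates both remaining entries
print(sorted((sorted(e), v) for e, v in lvs.entries))  # -> [([], 5)]
```

Because dominated entries are pruned eagerly, the stored set stays minimal, which is the property the abstract highlights for avoiding redundant storage.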
Recursive Online Enumeration of All Minimal Unsatisfiable Subsets
In various areas of computer science, we deal with a set of constraints to be
satisfied. If the constraints cannot be satisfied simultaneously, it is
desirable to identify the core problems among them. Such cores are called
minimal unsatisfiable subsets (MUSes). The more MUSes are identified, the more
information about the conflicts among the constraints is obtained. However, a
full enumeration of all MUSes is in general intractable due to the large number
(even exponential) of possible conflicts. Moreover, to identify MUSes,
algorithms must test sets of constraints for their simultaneous satisfiability.
The type of the test depends on the application domains. The complexity of
tests can be extremely high especially for domains like temporal logics, model
checking, or SMT. In this paper, we propose a recursive algorithm that
identifies MUSes in an online manner (i.e., one by one) and can be terminated
at any time. The key feature of our algorithm is that it minimizes the number
of satisfiability tests and thus speeds up the computation. The algorithm is
applicable to an arbitrary constraint domain and its effectiveness demonstrates
itself especially in domains with expensive satisfiability checks. We benchmark
our algorithm against a state-of-the-art algorithm on Boolean and SMT constraint
domains and demonstrate that our algorithm requires fewer satisfiability tests
and consequently finds more MUSes within given time limits.
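The basic building block any MUS enumerator relies on is shrinking one unsatisfiable set to a minimal core. The sketch below is not the paper's recursive online algorithm; it is a generic deletion-based shrink over CNF clauses, with a brute-force `is_sat` standing in for the (potentially expensive) satisfiability oracle the paper assumes.

```python
from itertools import product

# Clauses are tuples of integer literals (positive = variable true).

def is_sat(clauses):
    """Brute-force SAT test (exponential; fine only for tiny examples)."""
    vars_ = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(vars_)):
        model = dict(zip(vars_, bits))
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return not clauses  # reached only when some clause is empty

def shrink_to_mus(clauses):
    """Deletion-based shrinking: drop any clause whose removal keeps UNSAT."""
    assert not is_sat(clauses)
    mus = list(clauses)
    i = 0
    while i < len(mus):
        candidate = mus[:i] + mus[i + 1:]
        if not is_sat(candidate):
            mus = candidate   # clause i was not needed for the conflict
        else:
            i += 1            # clause i is critical; keep it
    return mus

# (x) and (not x) conflict; (y) is irrelevant padding.
print(sorted(shrink_to_mus([(1,), (-1,), (2,)])))  # -> [(-1,), (1,)]
```

Each shrink costs one satisfiability test per remaining clause, which is why minimizing the number of oracle calls, the paper's key contribution, matters so much in domains with expensive checks.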
Approximate Model-Based Diagnosis Using Greedy Stochastic Search
We propose a StochAstic Fault diagnosis AlgoRIthm, called SAFARI, which
trades off guarantees of computing minimal diagnoses for computational
efficiency. We empirically demonstrate, using the 74XXX and ISCAS-85 suites of
benchmark combinatorial circuits, that SAFARI achieves several
orders-of-magnitude speedup over two well-known deterministic algorithms, CDA*
and HA*, for multiple-fault diagnoses; further, SAFARI can compute a range of
multiple-fault diagnoses that CDA* and HA* cannot. We also prove that SAFARI is
optimal for a range of propositional fault models, such as the widely-used
weak-fault models (models with ignorance of abnormal behavior). We discuss the
optimality of SAFARI in a class of strong-fault circuit models with stuck-at
failure modes. By modeling the algorithm itself as a Markov chain, we provide
exact bounds on the minimality of the diagnosis computed. SAFARI also displays
strong anytime behavior, and will return a diagnosis after any non-trivial
inference time.
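The greedy stochastic idea can be illustrated with a toy sketch (an invented example, not the authors' implementation): a diagnosis is a set of components assumed faulty, `consistent` stands in for the model-based consistency check against observations, and each run retracts fault assumptions in random order, keeping a retraction only when the diagnosis remains consistent. As in SAFARI, a single run is cheap but trades away minimality guarantees.

```python
import random

def safari_run(components, consistent, rng):
    diagnosis = set(components)   # assuming everything faulty is consistent
    order = list(components)
    rng.shuffle(order)
    for c in order:
        trial = diagnosis - {c}
        if consistent(trial):
            diagnosis = trial     # component c need not be faulty
    return diagnosis

# Toy model: the observations are explained iff at least one of g1, g2 is
# assumed faulty; g3 is never needed.
def consistent(diag):
    return bool(diag & {"g1", "g2"})

rng = random.Random(0)
found = {frozenset(safari_run(["g1", "g2", "g3"], consistent, rng))
         for _ in range(20)}
print(sorted(sorted(d) for d in found))  # each run yields {'g1'} or {'g2'}
```

Repeated randomized runs, as in the last lines, recover a range of different diagnoses, which mirrors how SAFARI finds multiple-fault diagnoses that deterministic searches may miss.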
Consistent Query Answering under Spatial Semantic Constraints
Consistent query answering is an inconsistency tolerant approach to obtaining
semantically correct answers from a database that may be inconsistent with
respect to its integrity constraints. In this work we formalize the notion of
consistent query answer for spatial databases and spatial semantic integrity
constraints. In order to do this, we first characterize conflicting spatial
data, and next, we define admissible instances that restore consistency while
staying close to the original instance. In this way we obtain a repair
semantics, which is used as an instrumental concept to define and possibly
derive consistent query answers. We then concentrate on a class of spatial
denial constraints and spatial queries for which there exists an efficient
strategy to compute consistent query answers. This study applies inconsistency
tolerance in spatial databases, raising research issues that shift the goal from
the consistency of a spatial database to the consistency of query answering.
Comment: Journal submission, 201
A Fast and Scalable Graph Coloring Algorithm for Multi-core and Many-core Architectures
Irregular computations on unstructured data are an important class of
problems for parallel programming. Graph coloring is often an important
preprocessing step, e.g. as a way to perform dependency analysis for safe
parallel execution. The total run time of a coloring algorithm adds to the
overall parallel overhead of the application whereas the number of colors used
determines the amount of exposed parallelism. A fast and scalable coloring
algorithm using as few colors as possible is vital for the overall parallel
performance and scalability of many irregular applications that depend upon
runtime dependency analysis.
Catalyurek et al. have proposed a graph coloring algorithm which relies on
speculative, local assignment of colors. In this paper we present an improved
version which runs even more optimistically with less thread synchronization
and a reduced number of conflicts compared to Catalyurek et al.'s algorithm. We
show that the new technique scales better on multi-core and many-core systems
and performs up to 1.5x faster than its predecessor on graphs with high-degree
vertices, while keeping the number of colors at the same near-optimal levels.
Comment: To appear in the proceedings of Euro-Par 201
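The speculative scheme the paper builds on can be sketched as a sequential simulation (a simplified illustration of Catalyurek et al.'s approach, not the paper's improved version). In each round, every uncolored vertex speculatively picks the smallest color unused by its neighbors; vertices that end up sharing a color with a neighbor are detected afterwards and sent to the next round.

```python
def speculative_coloring(adj):
    color = {v: None for v in adj}
    pending = set(adj)
    while pending:
        # Phase 1: speculative assignment (conceptually parallel): each
        # pending vertex proposes a color from a snapshot of current colors.
        proposal = {}
        for v in pending:
            used = {color[u] for u in adj[v] if color[u] is not None}
            c = 0
            while c in used:
                c += 1
            proposal[v] = c
        for v, c in proposal.items():
            color[v] = c
        # Phase 2: conflict detection; the lower-id endpoint keeps its color.
        pending = {v for v in proposal
                   if any(color[u] == color[v] and u < v for u in adj[v])}
        for v in pending:
            color[v] = None
    return color

# Triangle plus a pendant vertex.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
colors = speculative_coloring(adj)
assert all(colors[u] != colors[v] for u in adj for v in adj[u])
print(colors)  # -> {0: 0, 1: 1, 2: 2, 3: 0}
```

Running the phases over a snapshot is what lets real implementations color vertices with little synchronization: conflicts are rare and are simply repaired in later rounds rather than prevented with locks.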
Improvements in Hardware Transactional Memory for GPU Architectures
In the multi-core CPU world, transactional memory (TM) has emerged as an alternative to lock-based programming for thread synchronization. Recent research proposes the use of TM in GPU architectures, where a high number of computing threads, organized in SIMT fashion, requires an effective synchronization method. In contrast to CPUs, GPUs offer two memory spaces: global memory and local memory. The local memory space serves as a shared scratch-pad for a subset of the computing threads, and it is used by programmers to speed up their applications thanks to its low latency. Prior work from the authors proposed a lightweight hardware TM (HTM) support based on the local memory, modifying the SIMT execution model and adding a conflict detection mechanism. An efficient implementation of these features is key in order to provide an effective synchronization mechanism at the local memory level.
After a quick description of the main features of our HTM design for GPU local memory, in this work we gather together a number of proposals designed with the aim of improving those mechanisms with high impact on performance. Firstly, the SIMT execution model is modified to increase the parallelism of the application when transactions must be serialized in order to make forward progress. Secondly, the conflict detection mechanism is optimized depending on application characteristics, such as the read/write sets, the probability of conflict between transactions, and the existence of read-only transactions. As these features can be present in hardware simultaneously, it is a task of the compiler and runtime to determine which ones are more important for a given application. This work includes a discussion of the analysis to be done in order to choose the best configuration.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
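The read/write-set check at the heart of any conflict detection mechanism can be shown in a few lines (a toy software illustration with invented names, not the paper's hardware design): two transactions conflict when one's write set intersects the other's read or write set, so read-only transactions never conflict with each other.

```python
class Txn:
    """A transaction tracked only by the addresses it reads and writes."""
    def __init__(self, name):
        self.name = name
        self.read_set = set()
        self.write_set = set()

    def read(self, addr):
        self.read_set.add(addr)

    def write(self, addr):
        self.write_set.add(addr)

def conflicts(t1, t2):
    # Write/write or write/read overlap in either direction is a conflict.
    return bool(t1.write_set & (t2.read_set | t2.write_set) or
                t2.write_set & t1.read_set)

a, b, c = Txn("a"), Txn("b"), Txn("c")
a.read(0x10); a.write(0x20)
b.read(0x20)                  # reads what `a` writes: conflict
c.read(0x10); c.write(0x30)   # only read-read sharing with `a`: no conflict

print(conflicts(a, b), conflicts(a, c))  # -> True False
```

This is why the abstract singles out read/write sets and read-only transactions as the application characteristics worth optimizing for: the smaller the write sets, the fewer pairs this check can ever flag.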
Towards Bin Packing (preliminary problem survey, models with multiset estimates)
This paper describes a generalized, integrated view of bin packing problems,
including a brief literature survey and some new problem formulations for the
case of multiset estimates of items. A new systemic viewpoint on bin packing
problems is suggested: (a) basic element sets (item set, bin set, item subset
assigned to a bin); (b) binary relations over the sets: relations over the item
set such as compatibility, precedence, and dominance; relations over items and
bins (i.e., correspondence of items to bins). Special attention is given to the
following versions of bin packing problems: (a) the problem with multiset
estimates of items, (b) the problem with colored items (and some closely
related problems). Applied examples of bin packing problems are considered:
(i) planning in the paper industry (a framework of combinatorial problems),
(ii) selection of information messages, and (iii) packing of
messages/information packages in a WiMAX communication system (brief
description).
Comment: 39 pages, 18 figures, 14 tables
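For orientation, the basic one-dimensional problem the survey starts from is commonly attacked with the first-fit decreasing (FFD) heuristic, sketched below with scalar item sizes; the paper's multiset-estimate and colored variants need richer comparisons than the simple `<=` capacity test used here.

```python
def first_fit_decreasing(sizes, capacity):
    """Sort items by decreasing size; place each in the first bin it fits."""
    bins = []  # each bin is a list of item sizes
    for s in sorted(sizes, reverse=True):
        for b in bins:
            if sum(b) + s <= capacity:
                b.append(s)
                break
        else:
            bins.append([s])   # no existing bin fits: open a new one
    return bins

print(first_fit_decreasing([5, 7, 5, 2, 4, 2], capacity=10))
# -> [[7, 2], [5, 5], [4, 2]]
```

FFD is a classic approximation (it never uses more than roughly 11/9 of the optimal number of bins plus a constant), which is why it serves as the usual baseline for the richer formulations surveyed in the paper.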
Parallel Chen-Han (PCH) Algorithm for Discrete Geodesics
In many graphics applications, the computation of exact geodesic distance is
very important. However, the high computational cost of the existing geodesic
algorithms means that they are not practical for large-scale models or
time-critical applications. To tackle this challenge, we propose the parallel
Chen-Han (or PCH) algorithm, which extends the classic Chen-Han (CH) discrete
geodesic algorithm to the parallel setting. The original CH algorithm and its
variant both lack a parallel solution because the windows (a key data structure
that carries the shortest distance in the wavefront propagation) are maintained
in a strict order or a tightly coupled manner, which means that only one window
is processed at a time. We propose dividing the sequential CH algorithm into
four phases: window selection, window propagation, data organization, and event
processing, so that there are no data dependences or conflicts within each
phase and the operations in each phase can be carried out in parallel. The
proposed PCH algorithm is able to propagate a large number of windows
simultaneously and independently. We also adopt a simple yet effective strategy
to control the total number of windows. We implement the PCH algorithm on
modern GPUs (such as Nvidia GTX 580) and analyze the performance in detail. The
performance improvement (compared to the sequential algorithms) is highly
consistent with GPU double-precision performance (GFLOPS). Extensive
experiments on real-world models demonstrate an order of magnitude improvement
in execution time compared to the state of the art.
Comment: 10 pages, accepted to ACM Transactions on Graphics with major revision
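The four-phase control flow can be mirrored in a schematic, with geodesic "windows" simplified to (distance, vertex) labels on a graph (the real algorithm propagates windows across mesh edges; this toy only reproduces the batched loop structure, not the geometry). Per iteration: (1) window selection picks a batch of smallest labels, (2) window propagation extends each label independently (the parallelizable part), (3) data organization keeps one minimal label per vertex, (4) event processing commits improvements and enqueues new labels.

```python
import heapq

def batched_sssp(adj, source, batch_size=4):
    dist = {source: 0.0}
    frontier = [(0.0, source)]
    while frontier:
        # Phase 1: select up to batch_size smallest windows.
        batch = [heapq.heappop(frontier)
                 for _ in range(min(batch_size, len(frontier)))]
        # Phase 2: propagate each window (independent; parallelizable).
        produced = []
        for d, v in batch:
            if d > dist.get(v, float("inf")):
                continue  # stale window; a better one was already committed
            for u, w in adj[v]:
                produced.append((d + w, u))
        # Phase 3: organize: keep one minimal window per target vertex.
        best = {}
        for d, u in produced:
            if d < best.get(u, float("inf")):
                best[u] = d
        # Phase 4: process events: commit improvements, enqueue new windows.
        for u, d in best.items():
            if d < dist.get(u, float("inf")):
                dist[u] = d
                heapq.heappush(frontier, (d, u))
    return dist

adj = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 1.0)], "c": []}
print(batched_sssp(adj, "a"))  # -> {'a': 0.0, 'b': 1.0, 'c': 2.0}
```

Processing a whole batch at once can commit labels that are later improved, which is why phase 4 re-enqueues; the PCH paper pairs this label-correcting style with a strategy for bounding the total number of live windows.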
SAT-based Explicit LTL Reasoning
We present here a new explicit reasoning framework for linear temporal logic
(LTL), which is built on top of propositional satisfiability (SAT) solving. As
a proof-of-concept of this framework, we describe a new LTL satisfiability
tool, Aalta\_v2.0, which is built on top of the MiniSAT SAT solver. We test the
effectiveness of this approach by demonstrating that Aalta\_v2.0 significantly
outperforms all existing LTL satisfiability solvers. Furthermore, we show that
the framework can be extended from propositional LTL to assertional LTL (where
we allow theory atoms), by replacing MiniSAT with the Z3 SMT solver, and
demonstrating that this can yield an exponential improvement in performance.
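A brute-force toy conveys what explicit LTL reasoning must decide (an invented illustration; Aalta\_v2.0's SAT-based framework is far more sophisticated). Formulas are nested tuples over atoms with "not", "and", "X", "F", and "G", and we search for an ultimately periodic trace (a stem plus a loop) of bounded length that satisfies the formula; a True result is definitive, while False only means no model within the bound.

```python
from itertools import combinations, product

def positions(trace, loop, i):
    """All positions of the lasso trace reachable from position i."""
    seen = []
    j = i
    while j not in seen:
        seen.append(j)
        j = j + 1 if j + 1 < len(trace) else loop
    return seen

def holds(f, trace, loop, i):
    op = f[0]
    if op == "atom":
        return f[1] in trace[i]
    if op == "not":
        return not holds(f[1], trace, loop, i)
    if op == "and":
        return holds(f[1], trace, loop, i) and holds(f[2], trace, loop, i)
    if op == "X":
        return holds(f[1], trace, loop, i + 1 if i + 1 < len(trace) else loop)
    if op == "F":  # eventually: some reachable position satisfies f[1]
        return any(holds(f[1], trace, loop, j) for j in positions(trace, loop, i))
    if op == "G":  # globally: every reachable position satisfies f[1]
        return all(holds(f[1], trace, loop, j) for j in positions(trace, loop, i))
    raise ValueError("unknown operator: %r" % (op,))

def satisfiable(f, atoms, max_len=3):
    states = [frozenset(c) for r in range(len(atoms) + 1)
              for c in combinations(sorted(atoms), r)]
    for k in range(1, max_len + 1):
        for trace in product(states, repeat=k):
            for loop in range(k):
                if holds(f, trace, loop, 0):
                    return True
    return False

print(satisfiable(("F", ("atom", "p")), {"p"}))             # -> True
print(satisfiable(("and", ("F", ("atom", "p")),
                   ("G", ("not", ("atom", "p")))), {"p"}))  # -> False
```

Replacing this exponential enumeration of trace states with incremental calls to a SAT or SMT solver is precisely the leverage the framework in the paper exploits.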
On Scaling Rules for Energy of VLSI Polar Encoders and Decoders
It is shown that all polar encoding schemes of rate of block
length implemented according to the Thompson VLSI model must take energy
. This lower bound is achievable up to
polylogarithmic factors using a mesh network topology defined by Thompson and
the encoding algorithm defined by Arikan. A general class of circuits that
compute successive cancellation decoding adapted from Arikan's butterfly
network algorithm is defined. It is shown that such decoders implemented on a
rectangle grid for codes of rate must take energy
, and this can also be reached up to polylogarithmic
factors using a mesh network. Capacity approaching sequences of energy optimal
polar encoders and decoders, as a function of reciprocal gap to capacity, have
energy that scales as …