1,108 research outputs found
An Expressive Language and Efficient Execution System for Software Agents
Software agents can be used to automate many of the tedious, time-consuming
information processing tasks that humans currently have to complete manually.
However, to do so, agent plans must be capable of representing the myriad of
actions and control flows required to perform those tasks. In addition, since
these tasks can require integrating multiple sources of remote information ?
typically, a slow, I/O-bound process ? it is desirable to make execution as
efficient as possible. To address both of these needs, we present a flexible
software agent plan language and a highly parallel execution system that enable
the efficient execution of expressive agent plans. The plan language allows
complex tasks to be more easily expressed by providing a variety of operators
for flexibly processing the data as well as supporting subplans (for
modularity) and recursion (for indeterminate looping). The executor is based on
a streaming dataflow model of execution to maximize the amount of operator and
data parallelism possible at runtime. We have implemented both the language and
executor in a system called THESEUS. Our results from testing THESEUS show that
streaming dataflow execution can yield significant speedups over both
traditional serial (von Neumann) as well as non-streaming dataflow-style
execution that existing software and robot agent execution systems currently
support. In addition, we show how plans written in the language we present can
represent certain types of subtasks that cannot be accomplished using the
languages supported by network query engines. Finally, we demonstrate that the
increased expressivity of our plan language does not hamper performance;
specifically, we show how data can be integrated from multiple remote sources
just as efficiently using our architecture as is possible with a
state-of-the-art streaming-dataflow network query engine
Use of a weighted matching algorithm to sequence clusters in spatial join processing
One of the most expensive operations in a spatial database is spatial join processing. This study focuses on how to improve the performance of such processing. The main objective is to reduce the Input/Output (I/O) cost of the spatial join process by using a technique called cluster-scheduling. Generally, the spatial join is processed in two steps, namely filtering and refinement. The cluster-scheduling technique is performed after the filtering step and before the refinement step and is part of the housekeeping phase. The key point of this technique is to realise order wherein two consecutive clusters in the sequence have maximal overlapping objects. However, finding the maximal overlapping order has been shown to be Nondeterministic Polynomial-time (NP)-complete. This study proposes an algorithm to provide approximate maximal overlapping (AMO) order in a Cluster Overlapping (CO) graph. The study proposes the use of an efficient maximum weighted matching algorithm to solve the problem of finding AMO order. As a result, the I/O cost in spatial join processing can be minimised
A Comparison of some recent Task-based Parallel Programming Models
The need for parallel programming models that are simple to use and at the same time efficient for current ant future parallel platforms has led to recent attention to task-based models such as Cilk++, Intel TBB and the task concept in OpenMP version 3.0. The choice of model and implementation can have a major impact on the final performance and in order to understand some of the trade-offs we have made a quantitative study comparing four implementations of OpenMP (gcc, Intel icc, Sun studio and the research compiler Mercurium/nanos mcc), Cilk++ and Wool, a high-performance task-based library developed at SICS.
Abstract. We use microbenchmarks to characterize costs for task-creation and stealing and the Barcelona OpenMP Tasks Suite for characterizing application performance. By far Wool and Cilk++ have the lowest overhead in both spawning and stealing tasks. This is reflected in application performance when many tasks with small granularity are spawned where Cilk++ and, in particular, has the highest performance. For coarse granularity applications, the OpenMP implementations have quite similar performance as the more light-weight Cilk++ and Wool except for one application where mcc is superior thanks to a superior task scheduler.
Abstract. The OpenMP implemenations are generally not yet ready for use when the task granularity becomes very small. There is no inherent reason for this, so we expect future implementations of OpenMP to focus on this issue
Improving the Performance of SQL Join Operation in the Distributed Enterprise Information System by Caching
The enterprise information system (EIS) contains databases and other data sources in multiple data centers. Users query the EIS via clients. The client has a working space in the cloud. Caching data in client space will reduce the total execution time of the query. However, the client space has limited resources to store data. There are two options for caching data at the client space: caching the final results of query operations, or caching the source data tables. The problem is that some query operations such as “joining multiple big tables” will simply produce a result too big to store in cache in some cases. By contrast, caching source data tables may be a better choice in those situations. This paper presents an algorithm that combines active caching and passive caching to improve the cache hit, thus improving performance of the SQL join query in the cloud computing environment
Optimization of multi-domain queries on the Web
Where can I attend an interesting database workshop close
to a sunny beach? Who are the strongest experts on service
computing based upon their recent publication record and
accepted European projects? Can I spend an April week-
end in a city served by a low-cost direct
flight from Milano
offering a Mahler's symphony? We regard the above queries
as multi-domain queries, i.e., queries that can be answered
by combining knowledge from two or more domains (such
as: seaside locations,
flights, publications, accepted projects,
conference offerings, and so on). This information is avail-
able on the Web, but no general-purpose software system
can accept the above queries nor compute the answer. At
the most, dedicated systems support specific multi-domain
compositions (e.g., Google-local locates information such as
restaurants and hotels upon geographic maps).
This paper presents an overall framework for multi-domain
queries on the Web. We address the following problems: (a)
expressing multi-domain queries with an abstract formalism,
(b) separating the treatment of "search" services within the
model, by highlighting their dierences from "exact" Web
services, (c) explaining how the same query can be mapped
to multiple "query plans", i.e., a well-dened scheduling of
service invocations, possibly in parallel, which complies with
their access limitations and preserves the ranking order in
which search services return results; (d) introducing cross-
domain joins as first-class operation within plans; (e) eval-
uating the query plans against several cost metrics so as to
choose the most promising one for execution. This frame-
work adapts to a variety of application contexts, ranging
from end-user-oriented mash-up scenarios up to complex ap-
plication integration scenarios
PARALLELIZATION OF ASSEMBLY OPERATION IN FINITE ELEMENT METHOD
The efficient codes can take an advantage of multiple threads and/or processing nodes to partition a work that can be processed concurrently. This can reduce the overall run-time or make the solution of a large problem feasible. This paper deals with evaluation of different parallelization strategies of assembly operations for global vectors and matrices, which are one of the critical operations in any finite element software. Different assembly strategies for systems with a shared memory model are proposed and evaluated, using Open Multi-Processing (OpenMP), Portable Operating System Interface (POSIX), and C++11 Threads. The considered strategies are based on simple synchronization directives, various block locking algorithms and, finally, on smart locking free processing based on a colouring algorithm. The different strategies were implemented in a free finite element code with object-oriented architecture OOFEM [1]
WCET-Aware Scratchpad Memory Management for Hard Real-Time Systems
abstract: Cyber-physical systems and hard real-time systems have strict timing constraints that specify deadlines until which tasks must finish their execution. Missing a deadline can cause unexpected outcome or endanger human lives in safety-critical applications, such as automotive or aeronautical systems. It is, therefore, of utmost importance to obtain and optimize a safe upper bound of each task’s execution time or the worst-case execution time (WCET), to guarantee the absence of any missed deadline. Unfortunately, conventional microarchitectural components, such as caches and branch predictors, are only optimized for average-case performance and often make WCET analysis complicated and pessimistic. Caches especially have a large impact on the worst-case performance due to expensive off- chip memory accesses involved in cache miss handling. In this regard, software-controlled scratchpad memories (SPMs) have become a promising alternative to caches. An SPM is a raw SRAM, controlled only by executing data movement instructions explicitly at runtime, and such explicit control facilitates static analyses to obtain safe and tight upper bounds of WCETs. SPM management techniques, used in compilers targeting an SPM-based processor, determine how to use a given SPM space by deciding where to insert data movement instructions and what operations to perform at those program locations. This dissertation presents several management techniques for program code and stack data, which aim to optimize the WCETs of a given program. The proposed code management techniques include optimal allocation algorithms and a polynomial-time heuristic for allocating functions to the SPM space, with or without the use of abstraction of SPM regions, and a heuristic for splitting functions into smaller partitions. The proposed stack data management technique, on the other hand, finds an optimal set of program locations to evict and restore stack frames to avoid stack overflows, when the call stack resides in a size-limited SPM. In the evaluation, the WCETs of various benchmarks including real-world automotive applications are statically calculated for SPMs and caches in several different memory configurations.Dissertation/ThesisDoctoral Dissertation Computer Science 201
- …