3,172 research outputs found
An Efficient Data Structure for Dynamic Two-Dimensional Reconfiguration
In the presence of dynamic insertions and deletions into a partially
reconfigurable FPGA, fragmentation is unavoidable. This poses the challenge of
developing efficient approaches to dynamic defragmentation and reallocation.
One key aspect is to develop efficient algorithms and data structures that
exploit the two-dimensional geometry of a chip, instead of just one dimension. We propose
a new method for this task, based on the fractal structure of a quadtree, which
allows dynamic segmentation of the chip area, along with dynamically adjusting
the necessary communication infrastructure. We describe a number of algorithmic
aspects, and present different solutions. We also provide a number of basic
simulations that indicate that the theoretical worst-case bound may be
pessimistic. Comment: 11 pages, 12 figures; full version of extended abstract that appeared
in ARCS 201
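The quadtree idea in the abstract above can be illustrated with a minimal sketch. It assumes a square chip whose side is a power of two and modules that request square regions; the class `QuadNode` and its methods are hypothetical names for illustration, not the data structure from the paper, and the communication infrastructure is not modeled.

```python
class QuadNode:
    """A square chip region: a free leaf, an occupied leaf, or four quadrants."""

    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.occupied = False
        self.children = None  # None means this node is a leaf

    def allocate(self, want):
        """Return (x, y, size) of a block with side >= want, or None."""
        if self.occupied or want > self.size:
            return None
        if self.children is None:
            half = self.size // 2
            # Take this leaf whole if a quadrant would be too small.
            if half < want or half == 0:
                self.occupied = True
                return (self.x, self.y, self.size)
            # Otherwise split the free leaf into four quadrants.
            self.children = [
                QuadNode(self.x + dx, self.y + dy, half)
                for dy in (0, half)
                for dx in (0, half)
            ]
        for child in self.children:
            got = child.allocate(want)
            if got is not None:
                return got
        return None

    def free(self, x, y):
        """Release the block with corner (x, y); merge fully free quadrants."""
        if self.children is None:
            if self.occupied and (self.x, self.y) == (x, y):
                self.occupied = False
                return True
            return False
        half = self.size // 2
        # Index matches the construction order above: row first, then column.
        idx = (2 if y >= self.y + half else 0) + (1 if x >= self.x + half else 0)
        released = self.children[idx].free(x, y)
        if released and all(
            c.children is None and not c.occupied for c in self.children
        ):
            self.children = None  # coalesce four free quadrants into one leaf
        return released


chip = QuadNode(0, 0, 8)
a = chip.allocate(4)       # splits the chip once
b = chip.allocate(2)       # splits a second quadrant
blocked = chip.allocate(8) # fails: the chip is fragmented
chip.free(a[0], a[1])
chip.free(b[0], b[1])
whole = chip.allocate(8)   # succeeds after the quadrants merge back
```

The usage lines show fragmentation blocking a large request until the freed quadrants are coalesced, which is the situation the defragmentation strategies in the abstract are meant to address.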
Specialized Hardware Support for Dynamic Storage Allocation
With the advent of operating systems and programming languages that can evaluate and guarantee real-time specifications, applications with real-time requirements can be authored in higher-level languages. For example, a version of Java suitable for real-time (RTSJ) has recently reached the status of a reference implementation, and it is likely that other implementations will follow. Analysis to show the feasibility of a given set of tasks must take into account their worst-case execution time, including any storage allocation or deallocation associated with those tasks. In this thesis, we present a hardware-based solution to the problem of storage allocation and (explicit) deallocation for real-time applications. Our approach offers both predictable and low execution time: a storage allocation request can be satisfied in the time necessary to fetch one word from memory.
Predictability of just in time compilation
The productivity of embedded software development is limited by the high fragmentation of hardware platforms. To alleviate this problem, virtualization has become an important tool in computer science, and virtual machines are used in a number of subdisciplines ranging from operating systems to processor architecture. Processor virtualization can be used to address the portability problem. While the traditional compilation flow consists of compiling program source code into binary objects that can be natively executed on a given processor, processor virtualization splits that flow into two parts: the first part compiles the program source code into a processor-independent bytecode representation; the second part provides an execution platform that can run this bytecode on a given processor. The second part is done by a virtual machine that either interprets the bytecode or just-in-time (JIT) compiles the bytecode of a method at run time in order to improve execution performance. Many applications have real-time requirements. The success of real-time systems relies on their capability to produce functionally correct results within defined timing constraints. To validate these constraints, most scheduling algorithms assume that the worst-case execution time (WCET) estimate of each task is already known. The WCET of a task is the longest time it takes when it is considered in isolation. Sophisticated techniques are used in static WCET estimation (e.g. to model caches) to achieve estimates that are both safe and tight. Our work aims at combining the two domains, i.e. using JIT compilation in real-time systems. This is an ambitious goal which requires introducing determinism into many non-deterministic features, e.g. bounding the compilation time and the overhead caused by the dynamic management of the compiled code cache. Due to the limited time of the internship, this report represents a first attempt at such a combination.
To obtain the WCET of a program, we have to add the compilation time to the execution time because the two phases are now mixed. Therefore, one needs to know statically how many times a function will be compiled in the worst case. This may seem a simple job, but once we consider resource constraints such as limited memory size and the advanced techniques used in JIT compilation, it becomes difficult. We suppose that a function is compiled the first time it is used, and that its compiled code is cached in a software cache of limited size. Our objective is to find an appropriate cache structure and replacement policy that reduce the compilation overhead in the worst case.
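To make the compilation-count question concrete, here is a minimal sketch that counts how often functions are (re)compiled along one given call trace. It assumes an LRU replacement policy and a cache capacity measured in number of functions; both are illustrative assumptions, not the cache structure or policy chosen in the report, and the function name `count_compilations` is hypothetical. It simulates a single trace and is not a static WCET analysis.

```python
from collections import OrderedDict


def count_compilations(call_trace, capacity):
    """Count compilations when compiled code lives in an LRU cache.

    call_trace: sequence of function names in call order.
    capacity:   number of compiled functions the cache can hold (assumed unit).
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    compilations = 0
    for fn in call_trace:
        if fn in cache:
            cache.move_to_end(fn)          # hit: mark as most recently used
        else:
            compilations += 1              # miss: the function is compiled
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[fn] = True
    return compilations


mixed = count_compilations(["f", "g", "f", "h", "g"], capacity=2)
cyclic = count_compilations(["a", "b", "c"] * 3, capacity=2)
roomy = count_compilations(["a", "b", "c"] * 3, capacity=3)
```

The cyclic trace is a known bad case for LRU: with a cache one entry too small, every call triggers a compilation, whereas a cache that fits all three functions compiles each only once.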
RHmalloc : a very large, highly concurrent dynamic memory manager
University of Technology, Sydney. Dept. of Software Engineering. Dynamic memory management (DMM) is a fundamental aspect of computing, directly affecting the capability, performance and reliability of virtually every system in existence today. Yet oddly, the fifty-year research into DMM has not taken memory capacity into account, having fallen significantly behind hardware trends. Comparatively little research work on scalable DMM has been conducted: on the order of ten papers exist on this topic, all of which focus on CPU scalability only; the largest heap reported in the literature to date is 600MB. By contrast, symmetric multiprocessor (SMP) machines with terabytes of memory are now commercially available.
The contribution of our research is the formal exploration, design, construction and proof of a general purpose, high performance dynamic memory manager which scales indefinitely with respect to both CPU and memory, one that can predictably manage a heap of arbitrary size, on any SMP machine with an arbitrary number of CPUs, without a priori knowledge.
We begin by recognizing the scattered and inconsistent state of the literature surrounding this topic. Firstly, to ensure clarity, we present a simplified introduction, followed by a catalog of the fundamental techniques. We discuss the melting pot of engineering tradeoffs, so as to establish a sound basis to tackle the issue at hand: large-scale DMM. We review both the history and state of the art, from which significant insight into this topic is to be found. We then explore the problem space and suggest a workable solution.
Our proposal, known as RHmalloc, is based on the novel perspective that a highly scalable heap can be viewed as an unbounded set of finite-sized sub-heaps, where each sub-heap may be concurrently shared by any number of threads, such that a suitable sub-heap can be found in O(1) time, and an allocation from a suitable sub-heap is also O(1).
Testing the design properties of RHmalloc, we show by extrapolation that RHmalloc will scale to at least 1,024 CPUs and 1PB; and we theoretically prove that DMM scales indefinitely with respect to both CPU and memory.
Most importantly, the approach for the scalability proof alludes to a general analysis and design technique for systems of this nature.
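The "unbounded set of finite-sized sub-heaps" view can be illustrated with a deliberately simplified sketch. It assumes a single block size, no concurrency, and integer addresses; the classes `SubHeap` and `Heap` and all parameters are hypothetical and do not reproduce RHmalloc's actual design. The point is only that finding a suitable sub-heap and allocating from it can each be a constant-time stack operation (amortized, given Python list growth).

```python
class SubHeap:
    """A finite-sized region holding a fixed number of equal-sized blocks."""

    def __init__(self, base, blocks, block_size):
        # Stack of free block addresses inside this sub-heap.
        self.free_blocks = [base + i * block_size for i in range(blocks)]


class Heap:
    """An unbounded collection of sub-heaps, grown on demand."""

    def __init__(self, blocks_per_subheap=4, block_size=64):
        self.blocks = blocks_per_subheap
        self.block_size = block_size
        self.subheaps = []   # every sub-heap, indexed by address range
        self.available = []  # stack of sub-heaps with at least one free block

    def malloc(self):
        if not self.available:
            # No sub-heap has room: create one of fixed (constant) size.
            base = len(self.subheaps) * self.blocks * self.block_size
            sub = SubHeap(base, self.blocks, self.block_size)
            self.subheaps.append(sub)
            self.available.append(sub)
        sub = self.available[-1]       # find a suitable sub-heap
        addr = sub.free_blocks.pop()   # allocate from it
        if not sub.free_blocks:
            self.available.pop()       # sub-heap is now full
        return addr

    def free(self, addr):
        # The owning sub-heap follows directly from the address.
        sub = self.subheaps[addr // (self.blocks * self.block_size)]
        if not sub.free_blocks:
            self.available.append(sub)  # it has room again
        sub.free_blocks.append(addr)


heap = Heap(blocks_per_subheap=2, block_size=64)
p1, p2, p3 = heap.malloc(), heap.malloc(), heap.malloc()  # third call adds a sub-heap
heap.free(p1)
p4 = heap.malloc()  # reuses the block released above
```

A real concurrent allocator would additionally need per-sub-heap synchronization and multiple size classes, neither of which is attempted here.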
Taming Numbers and Durations in the Model Checking Integrated Planning System
The Model Checking Integrated Planning System (MIPS) is a temporal least
commitment heuristic search planner based on a flexible object-oriented
workbench architecture. Its design clearly separates explicit and symbolic
directed exploration algorithms from the set of on-line and off-line computed
estimates and associated data structures. MIPS has shown distinguished
performance in the last two international planning competitions. In the last
event the description language was extended from pure propositional planning to
include numerical state variables, action durations, and plan quality objective
functions. Plans were no longer sequences of actions but time-stamped
schedules. As a participant of the fully automated track of the competition,
MIPS has proven to be a general system; in each track and every benchmark
domain it efficiently computed plans of remarkable quality. This article
introduces and analyzes the most important algorithmic novelties that were
necessary to tackle the new layers of expressiveness in the benchmark problems
and to achieve a high level of performance. The extensions include critical
path analysis of sequentially generated plans to generate corresponding optimal
parallel plans. The linear time algorithm to compute the parallel plan bypasses
known NP hardness results for partial ordering by scheduling plans with respect
to the set of actions and the imposed precedence relations. The efficiency of
this algorithm also allows us to improve the exploration guidance: for each
encountered planning state the corresponding approximate sequential plan is
scheduled. One major strength of MIPS is its static analysis phase that grounds
and simplifies parameterized predicates, functions and operators, that infers
knowledge to minimize the state description length, and that detects domain
object symmetries. The latter aspect is analyzed in detail. MIPS has been
developed to serve as a complete and optimal state space planner, with
admissible estimates, exploration engines and branching cuts. In the
competition version, however, certain performance compromises had to be made,
including floating point arithmetic, weighted heuristic search exploration
according to an inadmissible estimate and parameterized optimization
- …
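The critical path analysis mentioned in the MIPS abstract, which turns a sequentially generated plan into a parallel one, can be sketched as a single forward pass. This is a minimal illustration assuming that the precedence relations are given as index pairs over the sequential plan and that each action has a fixed duration; the function name and data layout are hypothetical and not taken from MIPS.

```python
def schedule_parallel(durations, precedences):
    """Compute earliest start times for a sequential plan.

    durations:   durations[i] is the duration of the i-th action in the plan.
    precedences: pairs (i, j) with i < j, meaning action i must finish
                 before action j starts.
    Returns (start_times, makespan).
    """
    preds = [[] for _ in durations]
    for i, j in precedences:
        if not 0 <= i < j < len(durations):
            raise ValueError("precedences must follow the sequential order")
        preds[j].append(i)

    start = [0] * len(durations)
    # Because every predecessor has a smaller index, one left-to-right pass
    # suffices: time is linear in the number of actions plus precedences.
    for j in range(len(durations)):
        start[j] = max((start[i] + durations[i] for i in preds[j]), default=0)

    makespan = max((s + d for s, d in zip(start, durations)), default=0)
    return start, makespan


starts, makespan = schedule_parallel(
    durations=[3, 2, 4, 1],
    precedences=[(0, 2), (1, 2), (2, 3)],
)
```

In this example the first two actions run in parallel, so the makespan is 8 instead of the sequential total of 10.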